

A large-scale dataset of manually annotated audio events.

AudioSet is a large-scale dataset of manually annotated audio events, designed to provide a common evaluation task for audio event detection and a starting point for a comprehensive vocabulary of sound events. It consists of an expanding ontology of 632 audio event classes and a collection of over 2 million human-labeled 10-second sound clips drawn from YouTube videos. The ontology is structured as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments, and common environmental sounds. The data collection process involves human annotators verifying the presence of sounds within YouTube segments nominated based on metadata and content-based search. Machine-extracted features are available for download alongside the dataset, facilitating machine learning model training and evaluation.
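To make the "hierarchical graph" idea concrete, here is a minimal sketch of walking such an ontology to collect every class beneath a given category. It assumes each entry carries `id`, `name`, and `child_ids` fields, which is how the downloadable ontology file is commonly described; the entries and ids below are hypothetical placeholders, not real AudioSet identifiers.

```python
from collections import deque

# Hypothetical ontology excerpt. The ids are placeholders for illustration,
# and the field names (id, name, child_ids) are an assumption about the
# layout of the downloadable ontology file.
ONTOLOGY = [
    {"id": "/x/human", "name": "Human sounds", "child_ids": ["/x/voice"]},
    {"id": "/x/voice", "name": "Human voice", "child_ids": ["/x/speech", "/x/shout"]},
    {"id": "/x/speech", "name": "Speech", "child_ids": []},
    {"id": "/x/shout", "name": "Shout", "child_ids": []},
    {"id": "/x/animal", "name": "Animal", "child_ids": []},
]


def descendants(entries, root_id):
    """Return the names of all classes below root_id, in breadth-first order."""
    by_id = {entry["id"]: entry for entry in entries}
    visited = {root_id}  # guards against a class reachable via two parents
    names = []
    queue = deque(by_id[root_id]["child_ids"])
    while queue:
        class_id = queue.popleft()
        if class_id in visited:
            continue
        visited.add(class_id)
        names.append(by_id[class_id]["name"])
        queue.extend(by_id[class_id]["child_ids"])
    return names


names = descendants(ONTOLOGY, "/x/human")
print(names)
```

The visited set matters because a graph, unlike a strict tree, can reach the same class through more than one parent.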
Contains over 2 million 10-second audio clips with human-verified labels, providing a large-scale resource for training and evaluating audio event detection models.
The ontology consists of 632 audio event classes organized as a hierarchical graph, providing a structured vocabulary of sounds.
Pre-computed audio features are available for download, reducing the computational burden of feature extraction.
Labels in the AudioSet dataset are verified by human annotators, who confirm whether each nominated sound is actually present in the segment.
YouTube segments are nominated for annotation using both metadata and content-based search, which broadens the pool of candidate clips beyond what either method would surface alone.
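As an illustration of how pre-computed features can cut down on work, the sketch below collapses a clip's frame-level feature vectors into a single clip-level vector by averaging. The text above does not specify the layout of the downloadable features, so this assumes each clip arrives as a short sequence of fixed-length vectors; the tiny 3-dimensional frames and the `mean_pool` helper are hypothetical.

```python
def mean_pool(frames):
    """Average a list of equal-length frame vectors into one clip vector."""
    if not frames:
        raise ValueError("a clip needs at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same length")
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


# Assumed layout: one feature vector per frame of the clip (toy values).
frames = [
    [0.0, 2.0, 4.0],
    [2.0, 4.0, 8.0],
]
clip_vector = mean_pool(frames)
print(clip_vector)  # [1.0, 3.0, 6.0]
```

Averaging discards the ordering of frames, which is acceptable for a simple clip-level classifier but not for models that need to locate an event in time.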
Download the AudioSet dataset and ontology from the download page.
Explore the ontology to understand the hierarchical structure of audio event classes.
Familiarize yourself with the data format and available machine-extracted features.
Implement a data loading pipeline to process the audio clips and labels.
Train a machine learning model using the AudioSet dataset for audio event detection.
Evaluate the model's performance on held-out clips, using a metric suited to multi-label detection such as mean average precision.
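The loading and evaluation steps above can be sketched in a few functions. This assumes the segment lists are CSV files with `#` comment lines followed by rows of clip id, start time, end time, and a quoted comma-separated label list; the sample rows and ids are hypothetical placeholders. Average precision is used here as one common choice for multi-label evaluation, not as a metric this page prescribes.

```python
import csv
import io

# Hypothetical excerpt in the assumed layout of a segment CSV file.
# Clip ids and label ids are placeholders, not real AudioSet entries.
SAMPLE_CSV = '''# YTID, start_seconds, end_seconds, positive_labels
clipA, 30.000, 40.000, "/x/speech,/x/music"
clipB, 0.000, 10.000, "/x/dog"
'''


def load_segments(text):
    """Parse segment rows, skipping comment and blank lines."""
    lines = (ln for ln in io.StringIO(text) if ln.strip() and not ln.startswith("#"))
    rows = []
    # skipinitialspace lets the quoted label field be recognised after ", ".
    for ytid, start, end, labels in csv.reader(lines, skipinitialspace=True):
        rows.append({
            "ytid": ytid,
            "start": float(start),
            "end": float(end),
            "labels": labels.split(","),
        })
    return rows


def multi_hot(labels, classes):
    """Encode a clip's labels as a 0/1 vector over the class list."""
    index = {c: i for i, c in enumerate(classes)}
    vector = [0] * len(classes)
    for label in labels:
        vector[index[label]] = 1
    return vector


def average_precision(scores, targets):
    """Average precision for one class: mean precision at each positive hit."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, target) in enumerate(ranked, start=1):
        if target:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


segments = load_segments(SAMPLE_CSV)
classes = sorted({label for seg in segments for label in seg["labels"]})
targets = [multi_hot(seg["labels"], classes) for seg in segments]
print(classes, targets)
```

Mean average precision is then the mean of `average_precision` over all classes, computed from a model's per-class scores and the multi-hot targets.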
