AudioSet is a large-scale dataset of manually annotated audio events, designed to provide a common evaluation task for audio event detection and a starting point for a comprehensive vocabulary of sound events. It consists of an expanding ontology of 632 audio event classes and a collection of over 2 million human-labeled 10-second sound clips drawn from YouTube videos. The ontology is structured as a hierarchical graph of event categories covering a wide range of human and animal sounds, musical instruments, and common environmental sounds. Labels were collected by human annotators who verified the presence of sounds in YouTube segments; candidate segments were nominated through metadata-based and content-based searches. Precomputed audio features (128-dimensional embeddings extracted at 1 Hz) are available for download alongside the segment lists, so models can be trained and evaluated without processing the raw audio.
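The description mentions two structural pieces: labeled 10-second segments and a hierarchical ontology graph. The sketch below illustrates one common way to combine them, propagating each clip's labels up to all ancestor classes. It assumes the layout of AudioSet's segment CSV files (comment lines starting with `#`, then `YTID, start_seconds, end_seconds, positive_labels` with the labels quoted) and uses only the `id` and `child_ids` fields of the ontology JSON entries. All IDs, names, and rows are made-up toy data, and the helper names (`parse_segments`, `build_parents`, `expand_with_ancestors`) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the assumed segment-CSV layout. The IDs are invented.
SEGMENTS_CSV = '''\
# YTID, start_seconds, end_seconds, positive_labels
abc123xyz00, 30.000, 40.000, "/x/leaf_shared,/x/leaf_other"
def456uvw11, 0.000, 10.000, "/x/mid"
'''

# Toy ontology shaped like ontology.json entries (only 'id' and 'child_ids'
# are used). '/x/leaf_shared' has two parents, so this is a graph, not a tree.
ONTOLOGY = [
    {"id": "/x/root_a", "name": "Root A", "child_ids": ["/x/mid"]},
    {"id": "/x/root_b", "name": "Root B", "child_ids": ["/x/leaf_shared"]},
    {"id": "/x/mid", "name": "Mid", "child_ids": ["/x/leaf_shared"]},
    {"id": "/x/leaf_shared", "name": "Shared leaf", "child_ids": []},
    {"id": "/x/leaf_other", "name": "Other leaf", "child_ids": []},
]


def parse_segments(text):
    """Yield (ytid, start, end, labels) tuples from segment-CSV text."""
    # Drop '#' comment/header lines before handing the rest to csv.
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    # skipinitialspace lets csv recognise the quote after ", " so the
    # comma-separated label list stays a single field.
    for row in csv.reader(rows, skipinitialspace=True):
        if not row:
            continue
        ytid, start, end, labels = row
        yield ytid, float(start), float(end), labels.split(",")


def build_parents(ontology):
    """Invert child_ids into a child -> set-of-parents map."""
    parents = defaultdict(set)
    for node in ontology:
        for child in node["child_ids"]:
            parents[child].add(node["id"])
    return parents


def expand_with_ancestors(labels, parents):
    """Return the labels plus every class reachable via parent links."""
    seen = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label in seen:
            continue  # a class may be reached through several parents
        seen.add(label)
        stack.extend(parents.get(label, ()))
    return seen


if __name__ == "__main__":
    parents = build_parents(ONTOLOGY)
    for ytid, start, end, labels in parse_segments(SEGMENTS_CSV):
        print(ytid, end - start, sorted(expand_with_ancestors(labels, parents)))
```

Because a class can have more than one parent, the traversal keeps a `seen` set instead of following a single path to a root. Whether ancestor propagation is appropriate depends on the training or evaluation setup.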

AudioSet


Main Tasks

Audio Event Detection
Sound Classification
Acoustic Scene Understanding
