Where can I find the documentation for this corpus?

Documentation is available on the LDC website under LDC97S62 Documents.

Switchboard-1 Release 2

Overview

Switchboard-1 Release 2 is a corpus of approximately 260 hours of conversational telephone speech developed by Texas Instruments and distributed by the Linguistic Data Consortium (LDC). It comprises around 2,400 two-sided telephone conversations among 543 speakers from across the United States. The data was collected by a computer-driven robot operator system, which introduced a topic for discussion and recorded each participant's speech on a separate channel. This release corrects errors in the original NIST publication (Release 1) and modifies the NIST Sphere headers for consistency. An ISIP update of the phonetic transcriptions developed by the International Computer Science Institute (ICSI), along with corrected word alignments, is available from ISIP. Researchers have used the data for annotation projects, discourse analysis, part-of-speech tagging, and phonetic transcription. The corpus also includes speaker attribution tables, updated file lists, and documentation.
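Because the audio ships with NIST Sphere headers, a quick way to confirm properties such as the sample rate and the number of channels is to read the plain-text header at the start of a file. The following is a minimal sketch, assuming the standard Sphere layout (a `NIST_1A` magic line, a header-size line, `name -type value` fields, and a closing `end_head`). The function name `parse_sphere_header` is hypothetical, and the example header is synthetic: its values are illustrative and are not copied from a corpus file.

```python
def parse_sphere_header(raw: bytes):
    """Parse the ASCII header found at the start of a NIST Sphere file.

    Returns (header_size, fields), where fields maps names to values.
    """
    text = raw.decode("ascii", errors="replace")
    lines = text.split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST Sphere header")
    header_size = int(lines[1].strip())  # total header length in bytes

    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, type_tag, value = parts
        if type_tag == "-i":            # integer field
            fields[name] = int(value)
        elif type_tag == "-r":          # real-valued field
            fields[name] = float(value)
        elif type_tag.startswith("-s"):  # string field of N characters
            fields[name] = value[: int(type_tag[2:])]
    return header_size, fields


# Synthetic header for illustration only (assumed values, not corpus data).
example = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 2\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024)

size, fields = parse_sphere_header(example)
print(size, fields)
```

For a real file, passing the first 1024 bytes read in binary mode should be enough when the header-size line reports 1024. The field names present in actual corpus files should be checked against the LDC documentation.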

Common tasks

Speech Recognition, Speaker Identification, Discourse Analysis

FAQ

What is Switchboard-1 Release 2?

Switchboard-1 Release 2 is a corpus of approximately 260 hours of conversational telephone speech developed by Texas Instruments and distributed by the Linguistic Data Consortium (LDC).

What is the sampling rate of the audio data?

The sample rate is 8000 Hz.

What kind of license is required to use this data?

An LDC User Agreement for Non-Members is required.

What are some of the applications of this corpus?

Applications include speaker identification and speech recognition.
