Overview
The VCTK Corpus, also known as the CSTR VCTK Corpus, contains speech from 110 English speakers with a variety of accents. Each speaker read approximately 400 sentences drawn from newspaper text, the Rainbow Passage, and an elicitation paragraph. The corpus was designed to support text-to-speech research, particularly speaker-adaptive synthesis and neural waveform modeling. Recordings were made with high-quality microphones in a hemi-anechoic chamber, then converted to 16 bits and downsampled to 48 kHz. Transcript files are included for most speakers, which simplifies alignment and training. The corpus is well suited to training HMM-based and DNN-based speech synthesis systems and is also used in voice cloning research. Google DeepMind referenced it in their work on WaveNet.
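
Because transcripts exist for most but not all speakers, a loading script usually has to pair each recording with its transcript and skip the ones that have none. The following is a minimal sketch of that step, not an official loader. It assumes a hypothetical local layout of `wav48/<speaker>/<utterance>.wav` and `txt/<speaker>/<utterance>.txt`; the directory names, file extension, and function names are illustrative assumptions and may not match your copy of the corpus (some releases ship audio in other formats). The `wav_format` helper reads the sample rate and bit depth from a WAV header so the 48 kHz, 16-bit format can be checked locally.

```python
import wave
from pathlib import Path


def pair_utterances(root, audio_dir="wav48", text_dir="txt"):
    """Return (speaker, wav_path, transcript) triples for utterances with both files.

    The directory names are assumptions about the local layout, not an
    official specification; adjust them to match your copy of the corpus.
    """
    root = Path(root)
    pairs = []
    for wav_path in sorted((root / audio_dir).glob("*/*.wav")):
        speaker = wav_path.parent.name
        txt_path = root / text_dir / speaker / (wav_path.stem + ".txt")
        if not txt_path.is_file():
            # Not every speaker has transcripts; skip instead of failing.
            continue
        transcript = txt_path.read_text(encoding="utf-8").strip()
        pairs.append((speaker, wav_path, transcript))
    return pairs


def wav_format(path):
    """Return (sample_rate_hz, bits_per_sample) read from a WAV header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate(), wf.getsampwidth() * 8
```

Calling `pair_utterances("/path/to/corpus")` returns only the utterances that have a matching transcript, and `wav_format` can then be applied to any returned path to confirm the sample rate and bit depth before training.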