Overview

The VCTK Corpus, also known as the CSTR VCTK Corpus, contains speech from 110 English speakers with a range of accents. Each speaker read approximately 400 sentences drawn from newspaper text, the Rainbow Passage, and an elicitation paragraph. The corpus is designed to support text-to-speech research, particularly speaker-adaptive synthesis and neural waveform modeling. Recordings were made with high-quality microphones in a hemi-anechoic chamber, then converted to 16 bits and downsampled to 48 kHz. Transcript files are included for most speakers, which supports alignment and training. The corpus is well suited to training HMM-based and DNN-based speech synthesis systems and to voice cloning research. Google DeepMind referenced it in their work on WaveNet.
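Because transcripts exist for most but not all speakers, it can help to index the corpus before training and to mark utterances that have no text. The following is a minimal sketch, not an official loader. The function name `index_corpus`, the directory names `wav48` and `txt`, and the `.wav` extension are assumptions made for illustration; the actual layout and file format may differ between releases, so adjust the arguments to match your copy.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def index_corpus(
    root: Path,
    audio_dir: str = "wav48",   # assumed folder name, one subfolder per speaker
    text_dir: str = "txt",      # assumed folder name, mirrors the audio layout
    audio_ext: str = ".wav",    # assumed extension; change if your copy differs
) -> Dict[str, List[Tuple[Path, Optional[str]]]]:
    """Map each speaker ID to a list of (audio_path, transcript_or_None) pairs."""
    index: Dict[str, List[Tuple[Path, Optional[str]]]] = {}
    for speaker_dir in sorted((root / audio_dir).iterdir()):
        if not speaker_dir.is_dir():
            continue
        pairs: List[Tuple[Path, Optional[str]]] = []
        for audio in sorted(speaker_dir.glob(f"*{audio_ext}")):
            # Look for a transcript with the same stem under the text folder.
            txt = root / text_dir / speaker_dir.name / f"{audio.stem}.txt"
            transcript = (
                txt.read_text(encoding="utf-8").strip() if txt.is_file() else None
            )
            pairs.append((audio, transcript))
        index[speaker_dir.name] = pairs
    return index
```

Utterances whose transcript is `None` can then be skipped for supervised text-to-speech training while still being usable for tasks that need audio only.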

Common tasks

Training speech synthesis models
Developing voice cloning systems
Researching speaker adaptation techniques
Evaluating text-to-speech algorithms
Creating multi-speaker speech datasets
Analyzing regional accents in speech
Experimenting with neural waveform modeling

FAQ

Full FAQ is available in the detailed profile.

Pricing

Pricing varies

Plan-level pricing details are still being validated for this tool.

Pros & Cons

Pros/cons are still being audited for this tool.

VCTK Dataset

Should you use VCTK Dataset?
