VCTK Dataset
The VCTK Corpus provides diverse English speech data from 110 speakers, ideal for voice cloning and speech synthesis research.

The VCTK Corpus, also known as the CSTR VCTK Corpus, is a collection of speech data from 110 English speakers with varied accents. Each speaker recorded approximately 400 sentences, sourced from newspapers, the rainbow passage, and an elicitation paragraph. The dataset is designed to support research in text-to-speech synthesis, particularly speaker-adaptive methods and neural waveform modeling. The recordings were captured with high-quality microphones in a hemi-anechoic chamber, then downsampled to 48 kHz and converted to 16-bit. The corpus includes transcript files for all but one speaker, facilitating alignment and training. It is widely used for training HMM-based and DNN-based speech synthesis systems, and was referenced by Google DeepMind in their work on WaveNet.
The VCTK Dataset is commonly used for:
Training speech synthesis models.
Developing voice cloning systems.
Researching speaker adaptation techniques.
Evaluating text-to-speech algorithms.
Creating multi-speaker speech datasets.
Analyzing regional accents in speech.
The dataset includes speech from 110 different speakers, enabling the training of models that can generalize across diverse voices.
Speech data was recorded using professional-grade microphones in a controlled acoustic environment, ensuring minimal noise and high fidelity.
The speakers represent a range of English accents, allowing for the development of accent-agnostic or accent-specific speech models.
The dataset includes text transcripts for 109 of the 110 speakers, facilitating accurate alignment and training of speech models.
Each speaker reads from a combination of newspaper text, the rainbow passage, and an elicitation paragraph, ensuring diverse phonetic coverage.
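Given the structure above (110 speakers, with transcripts missing for one of them), a minimal sketch of how to locate one utterance's audio and transcript files. The directory and file names here follow the VCTK 0.92 release layout, but treat them as assumptions and verify against the README shipped with the download; speaker p315 is, per the release notes, the speaker without transcripts.

```python
from pathlib import Path

# Assumed layout of the extracted VCTK 0.92 release -- confirm against
# the dataset's README before relying on these names.
AUDIO_DIR = "wav48_silence_trimmed"
TEXT_DIR = "txt"
NO_TRANSCRIPT = {"p315"}  # the one speaker whose transcripts are missing

def utterance_paths(root, speaker, utt, mic="mic1"):
    """Return (audio_path, transcript_path) for one utterance.

    transcript_path is None for the speaker without transcripts.
    """
    root = Path(root)
    stem = f"{speaker}_{utt}"
    audio = root / AUDIO_DIR / speaker / f"{stem}_{mic}.flac"
    if speaker in NO_TRANSCRIPT:
        text = None
    else:
        text = root / TEXT_DIR / speaker / f"{stem}.txt"
    return audio, text
```

Keeping the path logic in one helper makes it easy to adjust if the local copy of the corpus uses a different layout.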
Visit the Edinburgh DataShare website at https://datashare.ed.ac.uk/handle/10283/3443.
Review the dataset description and available files.
Accept the terms of the license agreement.
Download the main file containing audio and text files (approximately 10.94GB).
Download the README file for detailed information about the dataset.
Extract the downloaded files to a local directory.
Explore the audio and text files for each speaker.
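After extracting, a quick way to explore the speaker files is to walk the transcript directory. This sketch assumes the txt/<speaker>/<speaker>_<id>.txt layout of the VCTK 0.92 release; check the README from step 5 if your copy differs.

```python
from pathlib import Path

def list_transcribed_utterances(root):
    """Yield (speaker, utterance_id, transcript_text) for every transcript
    found under <root>/txt/<speaker>/<speaker>_<id>.txt.

    The txt/ layout is an assumption based on the VCTK 0.92 release;
    verify against the dataset's own README after extraction.
    """
    txt_root = Path(root) / "txt"
    for txt_file in sorted(txt_root.glob("p*/p*_*.txt")):
        speaker, utt = txt_file.stem.split("_", 1)
        yield speaker, utt, txt_file.read_text().strip()
```

For example, iterating the generator and collecting the distinct speaker IDs gives a quick sanity check that all expected speakers extracted correctly.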