Overview
WaveGAN is a TensorFlow implementation of a generative adversarial network (GAN) for synthesizing raw audio waveforms. It learns from many examples of real audio and generates new samples that mimic the characteristics of the training data. WaveGAN uses a DCGAN-like architecture adapted to the specific challenges of audio synthesis. Key capabilities include generating up to 4 seconds of audio at 16 kHz, supporting a range of sample rates, and handling multi-channel audio. Its streaming data loaders let it train on datasets of arbitrary audio files (such as MP3, WAV, and OGG) with little preprocessing. A related approach, SpecGAN, applies image-generating GANs to audio spectrograms rather than raw waveforms, which makes it a natural point of comparison. Use cases include speech synthesis, sound effect generation, and music excerpt generation.
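To make "DCGAN-like, adapted for audio" concrete, the sketch below traces how output length grows through a generator. It assumes the configuration described in the WaveGAN paper: a dense layer maps the latent vector to a length-16 sequence, followed by 1-D transposed convolutions with stride 4 (DCGAN's image generator uses 5x5 kernels with stride 2). The exact layer configuration in this implementation may differ, and the helper names `upsample_lengths` and `duration_seconds` are hypothetical.

```python
from typing import List


def upsample_lengths(initial_len: int = 16, num_layers: int = 5, stride: int = 4) -> List[int]:
    """Sequence length after the dense/reshape step and after each transposed conv.

    Assumption: every layer is a 1-D transposed convolution with 'same' padding,
    which multiplies the sequence length by its stride.
    """
    lengths = [initial_len]
    for _ in range(num_layers):
        lengths.append(lengths[-1] * stride)
    return lengths


def duration_seconds(num_samples: int, sample_rate: int = 16000) -> float:
    """Convert a sample count to seconds at the given sample rate."""
    return num_samples / sample_rate


if __name__ == "__main__":
    five = upsample_lengths()                # 16 -> ... -> 16384 samples
    six = upsample_lengths(num_layers=6)     # one more stride-4 layer -> 65536 samples
    print(five, duration_seconds(five[-1]))  # about 1 second at 16 kHz
    print(six[-1], duration_seconds(six[-1]))  # about 4 seconds at 16 kHz
```

With stride 4, five layers take 16 samples to 16384 (about 1 s at 16 kHz), and a sixth layer reaches 65536 (about 4 s). With DCGAN's stride of 2, the same five layers would only reach 512 samples, which is one reason a larger stride suits long raw waveforms.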
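Training on "arbitrary audio files" still requires fixed-length examples, because the network produces and consumes fixed-size inputs. The following is a minimal NumPy sketch of one way a loader might cut a decoded waveform of any length into equal-length, zero-padded slices. It is not this repository's loader: the function `slice_waveform`, the slice length of 65536 samples (2^16, about 4.1 s at 16 kHz), and the zero-padding policy are all assumptions, and decoding MP3/WAV/OGG into an array is left out.

```python
import numpy as np


def slice_waveform(x: np.ndarray, slice_len: int, hop: int) -> np.ndarray:
    """Cut a waveform into slices of shape [num_slices, slice_len, num_channels].

    x is [num_samples] (mono) or [num_samples, num_channels]. The tail is
    zero-padded so that every input sample lands in at least one slice.
    """
    if x.ndim == 1:
        x = x[:, np.newaxis]  # treat mono audio as a single channel
    n = x.shape[0]
    if n <= slice_len:
        num_slices = 1
    else:
        # ceil((n - slice_len) / hop) extra slices after the first one
        num_slices = -(-(n - slice_len) // hop) + 1
    padded_len = (num_slices - 1) * hop + slice_len
    x = np.pad(x, ((0, padded_len - n), (0, 0)))  # zero-pad the end of the time axis
    return np.stack([x[i * hop : i * hop + slice_len] for i in range(num_slices)])


if __name__ == "__main__":
    sample_rate = 16000
    stereo = np.zeros((10 * sample_rate, 2), dtype=np.float32)  # 10 s of stereo silence
    slices = slice_waveform(stereo, slice_len=65536, hop=65536)
    print(slices.shape)  # (3, 65536, 2)
```

A hop smaller than `slice_len` yields overlapping slices, which extracts more training examples from short files at the cost of correlated samples.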
