Overview
StarGAN v2 is a PyTorch-based image-to-image translation model that learns mappings across multiple visual domains while producing diverse outputs from a single framework. It addresses the limitations of earlier methods, which either required a separate model per domain pair or produced a single deterministic output per input. The architecture comprises a generator and a discriminator trained adversarially, together with a mapping network and a style encoder that supply domain-specific style codes, enabling realistic and diverse translations. Key use cases include translating between different animal faces (AFHQ dataset) and manipulating attributes such as hairstyle on human faces (CelebA-HQ dataset). Performance is evaluated with Fréchet Inception Distance (FID) and Learned Perceptual Image Patch Similarity (LPIPS), on which the model shows significant improvements over baseline methods. Pre-trained networks and datasets can be downloaded via the provided bash scripts for easy setup.
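To make the FID metric concrete: FID fits a Gaussian to the Inception feature statistics of real and generated images, then measures the Fréchet distance FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½) between them (lower is better). The sketch below is a minimal, illustrative implementation of that formula with NumPy, assuming the feature means and covariances have already been extracted; it is not the repository's own evaluation code, which operates on Inception-v3 activations.

```python
import numpy as np

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    In FID, (mu, sigma) are the mean and covariance of Inception features
    computed over a set of real or generated images.
    """
    diff = mu1 - mu2
    # Tr((S1 S2)^1/2) equals the sum of square roots of the eigenvalues of
    # S1 @ S2. For PSD covariances these eigenvalues are real and
    # non-negative up to numerical noise, so clip before the square root.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff
                 + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * covmean_trace)
```

Identical statistics give a distance of zero, and shifting one mean by a unit vector (with identity covariances) gives exactly 1.0, which is a quick sanity check when wiring up an evaluation pipeline.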