
Transformers
A model-definition framework for state-of-the-art machine learning models.

Transformers is a centralized model-definition framework supporting state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal domains. It covers both inference and training and acts as the model-definition pivot across the ecosystem: training frameworks such as Axolotl, Unsloth, and DeepSpeed build on it, inference engines such as vLLM, SGLang, and TGI consume its model definitions, and adjacent modeling libraries such as llama.cpp and mlx stay compatible with it.

By centralizing the model definition, Transformers keeps models customizable and efficient for developers, machine learning engineers, and researchers alike, and its pretrained models reduce compute costs and carbon footprint compared with training from scratch. Key features include Pipelines for simple, optimized inference; a comprehensive Trainer with mixed-precision and distributed training support; and fast text generation with large language models (LLMs) and vision language models (VLMs).
Simplified inference class optimized for various machine learning tasks. Supports text generation, image segmentation, and speech recognition.
Comprehensive training module with mixed precision, torch.compile, and FlashAttention support. Enables distributed training for PyTorch models.
Fast text generation with large language models (LLMs) and vision language models (VLMs). Supports streaming and multiple decoding strategies.
Access to over 1M model checkpoints on the Hugging Face Hub. Checkpoints closely reproduce the original implementations and offer state-of-the-art performance.
Compatible with various training frameworks (Axolotl, Unsloth, DeepSpeed) and inference engines (vLLM, SGLang, TGI).
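The text-generation feature above can be sketched end to end as follows. This is a minimal illustration, not the library's only API: the `gpt2` checkpoint and the prompt are chosen here purely as small examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small causal language model; "gpt2" is used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenize a prompt and generate a continuation with greedy decoding.
inputs = tokenizer("Machine learning is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Passing `do_sample=True` to `generate` switches from greedy decoding to sampling, and the library's `TextStreamer` can stream tokens as they are produced.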
Install the Transformers library using pip: `pip install transformers`
Import necessary modules: `from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer`
Load a pre-trained model and tokenizer (note that `AutoModelForCausalLM` expects a causal checkpoint such as `gpt2`, not an encoder-only model like `bert-base-uncased`): `model_name = 'gpt2'; model = AutoModelForCausalLM.from_pretrained(model_name); tokenizer = AutoTokenizer.from_pretrained(model_name)`
Use the Pipeline for inference: `classifier = pipeline('sentiment-analysis'); result = classifier('This is a great tool!')`
Fine-tune the model using the Trainer class: `trainer = Trainer(model=model, ...)`
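The fine-tuning step can be sketched as below, assuming a tiny in-memory dataset; the two example sentences, their labels, and the `distilbert-base-uncased` checkpoint are invented here purely for illustration.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # small model, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A toy two-example dataset; a real run would use a proper labeled corpus.
texts = ["This is a great tool!", "This is a terrible tool."]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps the tokenized examples in the Dataset interface Trainer expects."""
    def __len__(self):
        return len(labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in encodings.items()}
        item["labels"] = torch.tensor(labels[idx])
        return item

args = TrainingArguments(output_dir="toy-output", num_train_epochs=1,
                         per_device_train_batch_size=2, report_to="none")
trainer = Trainer(model=model, args=args, train_dataset=ToyDataset())
trainer.train()
```

In practice the dataset is usually loaded with the `datasets` library rather than a hand-rolled `Dataset` class; the structure of the `Trainer` call stays the same.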
Explore available models on the Hugging Face Hub.
Customize model configurations using configuration classes.
Implement distributed training with DeepSpeed or FSDP (Fully Sharded Data Parallel).
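Distributed training as described above is enabled through `TrainingArguments`. The fragment below is a configuration sketch only: the flag values follow recent Transformers releases and should be checked against the documentation for your installed version, and `ds_config.json` is a hypothetical file path.

```python
from transformers import TrainingArguments

# Configuration sketch: enable PyTorch FSDP (Fully Sharded Data Parallel),
# or point `deepspeed` at a DeepSpeed JSON config instead.
args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",        # shard parameters, gradients, and optimizer state
    # deepspeed="ds_config.json",       # alternative: delegate sharding to DeepSpeed
)
```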
