DVC (Data Version Control)
DVC brings software engineering best practices to data, AI/ML, and data science teams using a Git-like model for data version control.
Manage data and machine learning models with version control, making AI/ML projects reproducible and collaborative.

DVC (Data Version Control) is an open-source version control system for machine learning projects. It extends Git to handle large datasets and machine learning models, enabling teams to track changes, reproduce experiments, and collaborate effectively. DVC manages data in a separate storage system (like S3, GCS, or Azure Blob Storage) while keeping metadata in Git. This approach allows for versioning of data without bloating the Git repository. It provides features like data pipelines, experiment tracking, and model management. DVC is designed for data scientists, machine learning engineers, and AI teams looking to apply software engineering best practices to their data science workflows.
Typical uses of DVC: version-controlling large datasets and ML models, tracking ML experiments and their results, reproducing ML experiments from data and code snapshots, creating and managing data pipelines, collaborating on data science projects, and managing data dependencies.
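Tracks changes to large datasets and ML models by storing metadata in Git and the data itself in external storage. Small `.dvc` files, committed to Git, record a hash of each tracked file or directory so that the matching content can be fetched from remote storage.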
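The pointer-file idea can be illustrated with a short, self-contained Python sketch. This is not DVC's implementation: the cache layout, the `.pointer.json` suffix, and the function names `track` and `restore` are assumptions made for illustration (real `.dvc` files are YAML and are written by `dvc add`). The sketch only shows why a small hash record in Git is enough to recover a large file stored elsewhere.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer is the small file that would be committed to Git; cache_dir
    stands in for external storage. Names and layout are hypothetical.
    """
    md5 = file_md5(data_file)
    cached = cache_dir / md5[:2] / md5[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    meta = {"md5": md5, "size": data_file.stat().st_size, "path": data_file.name}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to the pointer using only its metadata."""
    meta = json.loads(pointer.read_text())
    cached = cache_dir / meta["md5"][:2] / meta["md5"][2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target
```

Calling `track(Path("data.csv"), Path("cache"))` writes `data.csv.pointer.json`. Deleting `data.csv` and calling `restore` on the pointer brings the file back from the cache, which is the same round trip that checking out a Git commit and pulling data performs at a larger scale.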
Captures the code, data, and parameters of each experiment so that results can be compared and reproduced. Experiments are managed with the `dvc exp` family of commands, such as `dvc exp run`, `dvc exp show`, and `dvc exp diff`.
Defines data processing workflows as a series of stages with dependencies, enabling automated and reproducible data transformations. Uses `dvc.yaml` files to define pipelines.
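The stage-and-dependency model can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not DVC's code: the `Stage` class, the `reproduce` function, and the JSON lock file are hypothetical stand-ins for `dvc.yaml`, `dvc repro`, and `dvc.lock`, and the stages are assumed to be listed in dependency order (DVC derives the order from the dependency graph). Each stage's command is represented by a Python callable so the example stays self-contained.

```python
import hashlib
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str
    deps: List[Path]         # inputs the stage reads
    outs: List[Path]         # outputs the stage writes
    run: Callable[[], None]  # stand-in for the stage's shell command


def _hash(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def reproduce(stages: List[Stage], lock_file: Path) -> List[str]:
    """Run stages in order, skipping any whose inputs are unchanged.

    Returns the names of the stages that actually ran.
    """
    lock: Dict[str, Dict[str, str]] = (
        json.loads(lock_file.read_text()) if lock_file.exists() else {}
    )
    executed: List[str] = []
    for stage in stages:
        # Hash deps at this point so changes made by earlier stages are seen.
        dep_hashes = {str(p): _hash(p) for p in stage.deps}
        outs_present = all(p.exists() for p in stage.outs)
        if lock.get(stage.name) == dep_hashes and outs_present:
            continue  # same inputs as the last recorded run: skip
        stage.run()
        executed.append(stage.name)
        lock[stage.name] = dep_hashes
    lock_file.write_text(json.dumps(lock, indent=2))
    return executed
```

Running `reproduce` twice in a row executes every stage the first time and none the second. Editing an input file causes the stage that reads it to run again, along with any downstream stage whose input actually changed.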
Version-controls machine learning models alongside the code and data that produced them, so a deployed model can be traced back to an exact revision. Because DVC tracks models as files, it works independently of the ML framework used to train them.
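Makes ML experiments reproducible by others, or by you at a later date, even with large datasets and complex pipelines. Each result is tied to a snapshot of the data, code, and parameters that produced it; reproducing it requires that the referenced data is still available in remote storage.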
Install DVC using pip: `pip install dvc` (add the extra for your storage backend if needed, for example `pip install "dvc[s3]"` for S3)
Initialize DVC in your Git repository: `dvc init`
Configure remote storage (e.g., S3, GCS, Azure Blob Storage): `dvc remote add -d myremote s3://mybucket`
Track your data files or ML models with DVC: `dvc add data.csv`
Commit the `.dvc` file, and the `.gitignore` entry that `dvc add` creates, to your Git repository: `git add data.csv.dvc .gitignore && git commit -m "Add data tracking"`
Push your data to the remote storage: `dvc push`
Use `dvc repro` to reproduce the pipeline stages defined in `dvc.yaml`; stages whose dependencies have not changed are skipped.
Explore experiment tracking with `dvc exp run`.
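The setup steps above can be collected into a small Python helper, shown here as a sketch rather than an official workflow. The function names `setup_commands` and `run_all` are hypothetical; the commands themselves are the ones listed above, with `.gitignore` and `.dvc/config` added to the Git commit on the assumption that you want the ignore entry and the remote configuration versioned as well. Installation (`pip install dvc`) is assumed to have been done already.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def setup_commands(remote_url: str, data_path: str,
                   remote_name: str = "myremote") -> List[List[str]]:
    """Build the command sequence for tracking one file with DVC."""
    # `dvc add` writes a .gitignore entry in the data file's directory.
    gitignore = str(Path(data_path).parent / ".gitignore")
    return [
        ["dvc", "init"],
        ["dvc", "remote", "add", "-d", remote_name, remote_url],
        ["dvc", "add", data_path],
        ["git", "add", f"{data_path}.dvc", gitignore, ".dvc/config"],
        ["git", "commit", "-m", "Add data tracking"],
        ["dvc", "push"],
    ]


def run_all(commands: List[List[str]], cwd: str = ".") -> None:
    """Run each command in order inside an existing Git repository.

    Stops at the first failure because later steps depend on earlier ones.
    """
    for tool in ("git", "dvc"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_all(setup_commands("s3://mybucket", "data.csv"))` would run the sequence in the current directory. Building the command list separately from executing it makes it easy to print and review the steps before anything touches the repository or the remote.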
Verified feedback from other users.
"DVC empowers users to manage their data and models effectively by bringing software engineering best practices to the ML lifecycle. It seems well-regarded for reproducibility and collaboration."