Sourcify
Effortlessly find and manage open-source dependencies for your projects.

The Pythonic framework for high-scale data science and MLOps orchestration.

Metaflow is a human-centric framework originally developed at Netflix to help data scientists build and manage real-life data science projects. Architecturally, it sits as a layer above infrastructure, abstracting away the complexities of cloud compute, storage, and orchestration, and it is designed to bridge the gap between local development and production-grade execution. Workflows are structured as a DAG (directed acyclic graph): each step is a Python method marked with the @step decorator, and decorators such as @batch or @kubernetes assign individual steps to remote compute. Its core strength is a content-addressed data store that automatically versions every artifact produced by every run, which makes results reproducible and runs easy to inspect and debug. Because it integrates with AWS Step Functions, Argo Workflows, and Kubernetes, teams can scale from a single laptop to large GPU clusters without rewriting their flows. The framework's philosophy emphasizes developer productivity: scientists focus on modeling while Metaflow handles the 'plumbing' of infrastructure, dependency management, and state persistence.
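Metaflow builds a flow's DAG from the self.next() transitions between its steps. As a standard-library illustration (not Metaflow code), the sketch below runs Python's graphlib module (Python 3.9+) on a hypothetical graph with one fan-out and one join to show the two things a DAG gives an orchestrator: a valid execution order, and groups of steps with no mutual dependencies that could run in parallel. The step names and helper functions are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical flow graph: each step maps to the set of steps it depends on.
# "start" fans out to two training steps, which are merged again in "join".
FLOW_GRAPH: Dict[str, Set[str]] = {
    "start": set(),
    "train_a": {"start"},
    "train_b": {"start"},
    "join": {"train_a", "train_b"},
    "end": {"join"},
}


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one valid serial order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def parallel_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into stages; steps within one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # validates the graph and raises graphlib.CycleError on a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are finished
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


print(execution_order(FLOW_GRAPH))
print(parallel_stages(FLOW_GRAPH))
```

Here parallel_stages returns [['start'], ['train_a', 'train_b'], ['join'], ['end']]: the two training steps share a stage, which is the opportunity an orchestrator exploits when it runs branches concurrently. The acyclic requirement is what makes such an order exist at all.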
Primary use cases: orchestrating ML pipelines, processing large-scale data, deploying machine learning models, and model versioning.
Versioned artifacts: every variable assigned to 'self' in a step is automatically serialized and stored in a content-addressed data store (S3/Azure/GCS).
Declarative resources: developers request compute with decorators such as @resources(cpu=4, gpu=1, memory=16000) directly in Python code (memory is given in MB); the resources are provisioned when the step runs on a cloud backend such as AWS Batch or Kubernetes.
Dependency management: with @conda or @pypi, Metaflow captures the exact software environment for each step and recreates it in remote containers.
Metaflow Cards: an extensible framework for generating HTML-based visual reports (plots, tables, images) attached to specific steps.
Parallel fan-out: native support for foreach fan-out patterns, allowing thousands of parallel tasks to run across a cluster.
Fault tolerance: built-in @retry and @catch decorators handle transient infrastructure failures and edge-case data errors.
Client API: a Python API to query metadata, logs, and artifacts from any historical run programmatically.
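The content-addressed data store described above is what makes per-run versioning cheap. The toy sketch below is not Metaflow's implementation: it assumes an in-memory dict in place of S3/Azure/GCS, uses SHA-1 as the content hash, and introduces hypothetical names (ToyArtifactStore, save, load, blob_count). Each artifact is keyed by a hash of its serialized bytes, so an unchanged value produced by many runs is stored only once, while every (run, step, name) entry still resolves to the exact value that run produced.

```python
import hashlib
import pickle
from typing import Any, Dict, Tuple


class ToyArtifactStore:
    """Hypothetical in-memory stand-in for a content-addressed artifact store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}                    # content hash -> serialized bytes
        self._index: Dict[Tuple[str, str, str], str] = {}     # (run, step, name) -> content hash

    def save(self, run_id: str, step: str, name: str, value: Any) -> str:
        blob = pickle.dumps(value)
        key = hashlib.sha1(blob).hexdigest()  # the address is derived from the content
        self._blobs.setdefault(key, blob)     # identical content is stored only once
        self._index[(run_id, step, name)] = key
        return key

    def load(self, run_id: str, step: str, name: str) -> Any:
        # Look up the hash recorded for this run, then deserialize the shared blob.
        return pickle.loads(self._blobs[self._index[(run_id, step, name)]])

    def blob_count(self) -> int:
        return len(self._blobs)


store = ToyArtifactStore()
key_1 = store.save("run-1", "start", "params", {"lr": 0.01})
key_2 = store.save("run-2", "start", "params", {"lr": 0.01})  # same content, same key
store.save("run-3", "start", "params", {"lr": 0.1})           # new content, new blob
print(store.blob_count())  # 2 blobs back three (run, step, name) entries
```

Because old entries are never overwritten, loading ("run-1", "start", "params") still returns the value run-1 produced after run-3 has stored a different one, which is the property that makes historical runs inspectable.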
Install the open-source package via 'pip install metaflow'.
Configure your cloud backend (AWS, Azure, or GCP) using 'metaflow configure', for example 'metaflow configure aws'.
Define your workflow by inheriting from the FlowSpec class in Python.
Mark each step method with @step and connect steps with self.next() calls, which define the DAG's execution order.
Use @batch or @kubernetes decorators to assign compute resources to specific steps.
Pass data between steps using instance variables (self.data) for automatic state persistence.
Execute the flow locally for debugging using 'python flow.py run'.
Deploy to a production orchestrator, for example with 'python flow.py step-functions create' for AWS Step Functions or 'python flow.py argo-workflows create' for Argo Workflows.
Use the Metaflow Client API to programmatically inspect results in a Jupyter Notebook.
Enable Metaflow Cards with @card to generate automated visual reports for stakeholders.
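The steps above describe the core programming model: methods marked as steps, transitions declared with self.next(), and state carried on self. The sketch below is a standard-library toy that mimics that pattern so it runs without Metaflow installed; ToyFlow, run_flow, and AverageFlow are hypothetical names, and the runner is a simplification (linear flows only, with no branching, foreach, persistence, or remote execution). A real flow instead subclasses FlowSpec, decorates each method with @step (both imported from metaflow), and is launched with 'python flow.py run'.

```python
import copy
from typing import Any, Callable, Dict, Optional


class ToyFlow:
    """Hypothetical base class that mimics the step/next pattern; not Metaflow's FlowSpec."""

    def __init__(self) -> None:
        self._next_step: Optional[str] = None

    def next(self, step: Callable[[], None]) -> None:
        # Record the transition; the runner follows it after the current step returns.
        self._next_step = step.__name__


def run_flow(flow: ToyFlow, start: str = "start") -> Dict[str, Dict[str, Any]]:
    """Follow next() transitions and snapshot the flow's public attributes after each step."""
    snapshots: Dict[str, Dict[str, Any]] = {}
    current: Optional[str] = start
    while current is not None:
        flow._next_step = None
        getattr(flow, current)()
        # Deep-copy so later steps cannot mutate an earlier step's snapshot.
        snapshots[current] = {
            name: copy.deepcopy(value)
            for name, value in vars(flow).items()
            if not name.startswith("_")
        }
        current = flow._next_step  # None after the last step ends the loop
    return snapshots


class AverageFlow(ToyFlow):
    def start(self) -> None:
        self.data = [1, 2, 3, 4]
        self.next(self.train)

    def train(self) -> None:
        self.model = sum(self.data) / len(self.data)  # reads state set by start
        self.next(self.end)

    def end(self) -> None:
        print(f"model = {self.model}")


history = run_flow(AverageFlow())
```

Each entry in history records the state as that step left it, so history["start"] holds only the input data while history["train"] also holds the fitted value; this per-step view is a small in-process analogue of what the Client API exposes for historical runs.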
"Users praise Metaflow for its 'magical' ability to handle data persistence and infrastructure, though some find the initial cloud setup complex."