Overview
Kedro is an open-source Python framework that helps data scientists and engineers build production-ready data pipelines. Originally developed at McKinsey's QuantumBlack and now part of the LF AI & Data Foundation, it closes the 'notebook-to-production' gap by bringing software engineering practices such as modularity, separation of concerns, and versioning into the data science workflow.

Kedro's architecture centers on two abstractions: a 'Data Catalog', which declares datasets and handles all data access, and a 'Pipeline', a directed graph of 'Nodes' (pure Python functions). Because nodes contain no I/O logic of their own, teams can swap data sources or execution environments without rewriting core code.

As of 2026, Kedro remains a leading choice for enterprise-grade data orchestration where governance, auditability, and team collaboration are paramount. It integrates with modern stack components such as MLflow, Great Expectations, and Airflow, and its standardized, Cookiecutter-based project template speeds onboarding and supports scale-out to distributed computing environments like Apache Spark or Dask.
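The decoupling described above can be sketched in plain Python. This is a toy illustration of the pattern, not Kedro's actual API: the function names, the tuple-based pipeline, and the dict-based catalog are all invented for illustration. In real Kedro, nodes are declared with `kedro.pipeline.node` and datasets are configured in the Data Catalog's YAML.

```python
# Toy sketch of Kedro's core pattern (NOT Kedro's real API): pure-function
# "nodes" declare named inputs/outputs, and a "catalog" maps those names
# to data, so swapping storage never touches the transformation logic.

def clean(raw_rows):
    """A node: a pure function with no I/O of its own."""
    return [r.strip().lower() for r in raw_rows if r.strip()]

def count(cleaned_rows):
    """Another node, consuming the first node's named output."""
    return {"n_rows": len(cleaned_rows)}

# The "pipeline": each entry is (function, input name, output name).
pipeline = [
    (clean, "raw_rows", "cleaned_rows"),
    (count, "cleaned_rows", "summary"),
]

# The "catalog": dataset names resolved to data. In real Kedro these
# entries would point at files, databases, or cloud storage via YAML.
catalog = {"raw_rows": ["  Alpha ", "", "Beta"]}

# A minimal sequential runner: resolve inputs by name, store outputs.
for func, inp, out in pipeline:
    catalog[out] = func(catalog[inp])

print(catalog["summary"])  # {'n_rows': 2}
```

Because each node only sees plain Python objects, replacing the in-memory `raw_rows` entry with, say, a CSV loader changes the catalog, not the nodes; this is the separation of concerns the framework enforces.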
