

A declarative Python micro-framework for modular, testable, and self-documenting dataflows.

Hamilton is a specialized micro-framework designed to solve the 'Big Ball of Mud' problem in data science and machine learning pipelines. Originally developed at Stitch Fix and now maintained by DAGWorks, Hamilton changes how data transformations are written: a function's name is the name of the variable it produces, and its argument names are its dependencies. From these functions Hamilton assembles a Directed Acyclic Graph (DAG) that is decoupled from the underlying compute infrastructure. Hamilton is also used in LLM-based RAG (Retrieval-Augmented Generation) applications, where modularity makes it possible to swap embedding models, vector databases, and prompt templates without breaking the rest of the system. By enforcing a functional style, it lets teams move quickly while keeping every transformation unit-testable, supporting data validation through integrations such as Pandera, and documenting data lineage automatically. As organizations shift toward 'Data-as-Code,' Hamilton provides the structure needed to move from experimental Jupyter notebooks to hardened production environments on Spark, Ray, Dask, or local Python executors.
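Because every transformation is an ordinary function whose inputs arrive as named arguments, each one can be unit-tested directly, without building the DAG at all. A minimal pytest-style sketch; the node names `spend_per_signup` and `spend_is_high` are hypothetical, and no Hamilton import is needed to test them.

```python
# Hypothetical Hamilton-style "clean functions": the function name is the
# output variable, and each parameter name refers to an upstream dependency.
def spend_per_signup(spend: float, signups: int) -> float:
    """Marketing spend divided by the number of signups."""
    return spend / signups


def spend_is_high(spend_per_signup: float, threshold: float) -> bool:
    """Flags a cost per signup above the given threshold.

    It depends on the node above simply by naming a parameter after it.
    """
    return spend_per_signup > threshold


# pytest-style tests: each node is exercised in isolation by passing its
# dependencies directly, with no DAG or driver involved.
def test_spend_per_signup() -> None:
    assert spend_per_signup(spend=100.0, signups=8) == 12.5


def test_spend_is_high() -> None:
    assert spend_is_high(spend_per_signup=12.5, threshold=10.0)
    assert not spend_is_high(spend_per_signup=12.5, threshold=20.0)
```

Running `pytest` on the file collects both `test_*` functions.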
Specializations: orchestrating data pipelines and feature engineering.
Separates the 'what' (the logic) from the 'how' (the execution engine), allowing the same code to run on a laptop or a massive Spark cluster.
Extracts the DAG structure directly from function signatures and module inspection.
Extensible API to inject custom logic before/after function execution for logging, telemetry, or data validation.
Support for conditional execution and parameter sweeps using the @config.when and @parameterize decorators.
Optimized execution for Polars dataframes, leveraging lazy evaluation within the Hamilton DAG.
A UI component to track versions, execution status, and data artifacts over time.
Can run as the internal orchestrator inside Airflow or Prefect tasks.
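Two of the features above, separating the logic from the execution engine and reading the DAG from function signatures, can be illustrated with a standard-library toy. It is a sketch under stated assumptions, not Hamilton's code: the node functions are hypothetical, and `run_sequential` and `run_threaded` stand in for what Hamilton does with its graph adapters and executors. The node functions never change; only the scheduler that runs them does.

```python
import inspect
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set


# "What": plain node functions (hypothetical names). Each parameter name
# refers to another node or to an external input.
def raw(n: int) -> List[int]:
    return list(range(n))


def squares(raw: List[int]) -> List[int]:
    return [x * x for x in raw]


def cubes(raw: List[int]) -> List[int]:
    return [x ** 3 for x in raw]


def total(squares: List[int], cubes: List[int]) -> int:
    return sum(squares) + sum(cubes)


NODES: Dict[str, Callable[..., Any]] = {f.__name__: f for f in (raw, squares, cubes, total)}


def dependencies(inputs: Dict[str, Any]) -> Dict[str, Set[str]]:
    """DAG read from signatures: node -> upstream nodes (external inputs excluded)."""
    return {name: set(inspect.signature(fn).parameters) - set(inputs) for name, fn in NODES.items()}


def call(name: str, values: Dict[str, Any]) -> Any:
    """Run one node, pulling its arguments out of the already-computed values."""
    fn = NODES[name]
    return fn(**{p: values[p] for p in inspect.signature(fn).parameters})


# "How" #1: run nodes one at a time in topological order.
def run_sequential(inputs: Dict[str, Any]) -> Dict[str, Any]:
    values = dict(inputs)
    for name in TopologicalSorter(dependencies(inputs)).static_order():
        values[name] = call(name, values)
    return values


# "How" #2: submit every node whose dependencies are finished to a thread pool.
def run_threaded(inputs: Dict[str, Any], max_workers: int = 4) -> Dict[str, Any]:
    sorter = TopologicalSorter(dependencies(inputs))
    sorter.prepare()
    values = dict(inputs)
    pending: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                # Pass a snapshot so workers never read a dict that is being mutated.
                pending[pool.submit(call, name, dict(values))] = name
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                values[name] = future.result()
                sorter.done(name)
    return values
```

Both runners return identical values; for `{"n": 5}`, `total` is 130 either way. Threads are only a convenient stand-in for a remote backend such as Ray or Dask; for CPU-bound pure-Python nodes the GIL means they will not speed anything up.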
Install via pip: pip install 'sf-hamilton[visualization]'
Define your logic in a Python module using 'clean functions' where function name = variable name.
Specify dependencies as function arguments in your module.
Import the 'driver' from the Hamilton library.
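Instantiate the Driver with your defined modules (plus an optional config dictionary); the 'final_vars' list naming the outputs you want is passed when you execute the graph.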
(Optional) Add lifecycle hooks for telemetry or data validation.
Execute the graph using driver.execute() or driver.materialize().
Visualize the DAG using driver.display_all_functions() for debugging.
Integrate with remote executors like Ray or Dask by swapping the GraphAdapter.
Deploy as a microservice using FastAPI or a batch job in Airflow.
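The steps above can be mirrored with a standard-library toy, which may make the workflow easier to picture before reading Hamilton's own docs. This is a hypothetical stand-in, not Hamilton's driver: `ToyDriver`, `TimingHook`, and `to_dot` are illustrative names. In Hamilton itself the driver is built from your modules (for example with `driver.Builder().with_modules(...).build()`) and the requested outputs are passed to `execute`; check the Hamilton documentation for exact signatures and return types.

```python
import inspect
import time
import types
from typing import Any, Callable, Dict, Iterable, List, Tuple


class TimingHook:
    """Hypothetical telemetry hook: records how long each node took."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def after(self, node: str, result: Any, seconds: float) -> None:
        self.records.append((node, seconds))


class ToyDriver:
    """Hypothetical stand-in for a Hamilton-style driver (not Hamilton's code)."""

    def __init__(self, *modules: types.ModuleType, hooks: Iterable[Any] = ()) -> None:
        # Every public function in the modules is a node; its parameters are dependencies.
        self.nodes: Dict[str, Callable[..., Any]] = {
            name: fn
            for module in modules
            for name, fn in inspect.getmembers(module, inspect.isfunction)
            if not name.startswith("_")
        }
        self.hooks = list(hooks)

    def deps(self, name: str) -> List[str]:
        return list(inspect.signature(self.nodes[name]).parameters)

    def execute(self, final_vars: Iterable[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Compute only what final_vars need; each node runs at most once."""
        cache: Dict[str, Any] = dict(inputs)

        def compute(name: str, path: Tuple[str, ...] = ()) -> Any:
            if name in cache:
                return cache[name]
            if name in path:
                raise ValueError(f"cycle through {name!r}")
            if name not in self.nodes:
                raise KeyError(f"{name!r} is neither a node nor a provided input")
            kwargs = {p: compute(p, path + (name,)) for p in self.deps(name)}
            start = time.perf_counter()  # timing excludes upstream nodes
            cache[name] = self.nodes[name](**kwargs)
            for hook in self.hooks:
                hook.after(name, cache[name], time.perf_counter() - start)
            return cache[name]

        return {var: compute(var) for var in final_vars}

    def to_dot(self) -> str:
        """Graphviz DOT text for the whole DAG (a stand-in for visualization)."""
        edges = [f'  "{d}" -> "{n}";' for n in sorted(self.nodes) for d in self.deps(n)]
        return "digraph dag {\n" + "\n".join(edges) + "\n}"


# Example dataflow module (hypothetical node names).
def cleaned(raw: List[float]) -> List[float]:
    return [v for v in raw if v >= 0]


def mean(cleaned: List[float]) -> float:
    return sum(cleaned) / len(cleaned)


def maximum(cleaned: List[float]) -> float:
    return max(cleaned)


pipeline = types.ModuleType("pipeline")
for _fn in (cleaned, mean, maximum):
    setattr(pipeline, _fn.__name__, _fn)

timing = TimingHook()
dr = ToyDriver(pipeline, hooks=[timing])
result = dr.execute(["mean"], inputs={"raw": [4.0, -1.0, 2.0]})
print(result)  # {'mean': 3.0}
print([node for node, _ in timing.records])  # ['cleaned', 'mean']
```

Because the toy resolves dependencies on demand, `maximum` is never computed when only `mean` is requested, and `cleaned` runs once even if several requested outputs share it.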
Verified feedback from other users.
"Highly praised for its ability to turn 'spaghetti code' into clean, maintainable systems. Users note the learning curve but emphasize it is indispensable for large-scale data teams."

Fast distributed SQL query engine for big data analytics.

Unlocking insights from unstructured data.

A visual data science platform combining visual analytics, data science, and data wrangling.

Open Source OCR Engine capable of recognizing over 100 languages.

Liberating data tables locked inside PDF files.

Move your data easily, securely, and efficiently with Stitch, now part of Qlik Talend Cloud.

Open Source High-Performance Data Warehouse delivering Sub-Second Analytics for End Users and Agents at Scale.