
Alation
The Data Intelligence Platform for the Modern AI and Cloud Enterprise.

The open-source data discovery and metadata engine for modern data-driven enterprises.

Amundsen is an open-source data discovery platform originally developed at Lyft and now part of the LF AI & Data Foundation. It is designed to improve the productivity of data scientists and engineers by providing a 'Google-like' search interface for internal data assets. Amundsen follows a microservices architecture consisting of a front-end service, a search service (backed by Elasticsearch), and a metadata service (backed by Neo4j or Apache Atlas). Its Databuilder framework, a generic data ingestion library, pulls metadata from sources such as Snowflake, BigQuery, and Redshift. In the 2026 market, Amundsen distinguishes itself by remaining vendor-neutral: organizations keep full control over their metadata graph without the licensing costs of proprietary catalogs. Its integration with lineage tools and automated metadata extraction also supports AI-readiness, because it supplies the structured, documented context that RAG (Retrieval-Augmented Generation) systems need.
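The extract, transform, and load flow that Databuilder is described as providing can be illustrated with a minimal sketch in plain Python. This does not use the real Databuilder API: the `TableMetadata` record, the `extract`/`transform`/`load` functions, the key format, and the sample rows are all hypothetical and stand in for a warehouse extractor and a metadata store.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class TableMetadata:
    """Hypothetical record shape for illustration; not Amundsen's actual model."""
    database: str
    schema: str
    name: str
    description: str

    @property
    def key(self) -> str:
        # Simplified, made-up key format used only in this sketch.
        return f"{self.database}://{self.schema}/{self.name}"


def extract(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Stand-in for a warehouse extractor: yields raw catalog rows one by one."""
    yield from rows


def transform(raw: Iterable[Dict[str, str]], database: str) -> Iterator[TableMetadata]:
    """Normalize raw rows into metadata records (lowercase names, trimmed comments)."""
    for row in raw:
        yield TableMetadata(
            database=database,
            schema=row["schema"].lower(),
            name=row["table"].lower(),
            description=row.get("comment", "").strip(),
        )


def load(records: Iterable[TableMetadata], store: Dict[str, TableMetadata]) -> int:
    """Write records into an in-memory store and return how many were loaded."""
    count = 0
    for record in records:
        store[record.key] = record
        count += 1
    return count


# Sample rows are invented; a real job would read them from the warehouse catalog.
raw_rows = [
    {"schema": "ANALYTICS", "table": "RIDES", "comment": " Completed rides "},
    {"schema": "ANALYTICS", "table": "DRIVERS"},
]
metadata_store: Dict[str, TableMetadata] = {}
loaded = load(transform(extract(raw_rows), database="snowflake"), metadata_store)
print(loaded, sorted(metadata_store))
```

Because each stage is a generator, records stream through the pipeline one at a time, which is the usual reason ingestion libraries separate the stages this way.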
Uses PageRank-inspired algorithms to rank tables based on query frequency and user interaction.
Supports both Neo4j (graph) and Apache Atlas as the primary metadata store.
Ingests lineage from tools like dbt and Airflow to visualize upstream and downstream dependencies.
Databuilder is a highly extensible Python framework for extracting, transforming, and loading metadata from any source.
Enables automated documentation updates via API without manual UI interaction.
Captures user behavior (frequent users, owners) to build a social graph of data experts.
Integrates with profiling tools like Great Expectations to show data quality scores and sample rows.
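The upstream and downstream dependencies mentioned in the lineage feature above come down to traversing a directed graph of tables. The following is a minimal sketch of that idea, not Amundsen's implementation: the edge list, table names, and the helper functions `build_adjacency` and `reachable` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (upstream table, downstream table)


def build_adjacency(edges: Iterable[Edge]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (downstream, upstream) adjacency maps built from lineage edges."""
    downstream: Dict[str, Set[str]] = defaultdict(set)
    upstream: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start: str, adjacency: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: every node reachable from start, excluding start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Invented lineage edges, as might be derived from dbt models or Airflow tasks.
edges = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "mart.revenue"),
    ("staging.customers", "mart.revenue"),
    ("mart.revenue", "dashboard.kpi"),
]
downstream_map, upstream_map = build_adjacency(edges)

upstream_of_revenue = reachable("mart.revenue", upstream_map)
downstream_of_orders = reachable("staging.orders", downstream_map)
print(sorted(upstream_of_revenue))
print(sorted(downstream_of_orders))
```

Keeping two adjacency maps makes both directions a plain forward search, so the same `reachable` function answers "what feeds this table" and "what breaks if this table changes".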
Clone the amundsen-io/amundsen repository from GitHub.
Configure Docker and Docker-Compose on the host machine.
Launch the core services (Front-end, Metadata, Search) using the provided docker-amundsen.yml.
Initialize the Neo4j graph database and Elasticsearch indices.
Install the Amundsen Databuilder library in a Python environment.
Configure an ingestion script (ETL job) for your primary data warehouse (e.g., Snowflake).
Execute the Databuilder job to populate the Metadata and Search services.
Configure OIDC or Header-based authentication for user access control.
Integrate Slack or Email for notification-based data ownership requests.
Verify data visibility and lineage in the Amundsen Web UI.
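The first few steps above (clone, launch the services, install Databuilder) can be scripted. The sketch below only prints the commands by default and runs them when `dry_run=False`. Several details are assumptions not stated in the steps: the `--recursive` clone flag, the detached `-d` flag, the `amundsen-databuilder` package name, and the `amundsen` checkout directory. Check them against the current upstream documentation before running anything.

```python
import subprocess
from typing import List, Optional, Tuple

# Assumed values; only the repository name and compose file come from the steps above.
REPO_URL = "https://github.com/amundsen-io/amundsen.git"
COMPOSE_FILE = "docker-amundsen.yml"

Step = Tuple[List[str], Optional[str]]  # (command, working directory or None)


def quickstart_steps(checkout_dir: str = "amundsen") -> List[Step]:
    """Build the command list for cloning, launching services, and installing Databuilder."""
    return [
        (["git", "clone", "--recursive", REPO_URL, checkout_dir], None),
        (["docker-compose", "-f", COMPOSE_FILE, "up", "-d"], checkout_dir),
        (["python3", "-m", "pip", "install", "amundsen-databuilder"], None),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Render each step as a shell-like line; execute it only when dry_run is False."""
    rendered: List[str] = []
    for cmd, cwd in steps:
        line = " ".join(cmd)
        if cwd is not None:
            line = f"(cd {cwd} && {line})"
        rendered.append(line)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, cwd=cwd, check=True)
    return rendered


# Dry run: prints the commands without touching the host.
for rendered_line in run_steps(quickstart_steps()):
    print(rendered_line)
```

The remaining steps (ingestion job configuration, authentication, notifications) depend on site-specific settings and are left out of the sketch.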
Verified feedback from other users.
"Highly praised for its simplicity and 'search-first' approach, though some users find the initial Databuilder configuration complex."

The DataHub metadata platform gives context for AI to safely manage and use data.

The open-standard for unified metadata management, data discovery, and collaborative governance.

Free, open-source database management tool for personal and professional use.

Global and Unified Access to Knowledge Graphs.

Dimensions provides linked data solutions for smarter research analysis.