
Trino
Fast distributed SQL query engine for big data analytics.

The open standard for unified metadata management, data discovery, and collaborative governance.

OpenMetadata is an open-source metadata management platform that centralizes data discovery, governance, and collaboration. It is built on JSON Schema-based standards and treats metadata as a first-class citizen, which gives it a consistent, extensible architecture across the data stack. In the 2026 market, OpenMetadata positions itself as the primary alternative to expensive proprietary catalogs by offering native, automated data lineage and integrated data quality profiling. It stores metadata in a central repository (MySQL or PostgreSQL) and uses an indexing layer (Elasticsearch or OpenSearch) for low-latency discovery. The platform ships with more than 50 native connectors, including Snowflake, BigQuery, Databricks, and various BI tools. Beyond simple documentation, OpenMetadata emphasizes 'Social Metadata': teams collaborate through integrated feeds, tasks, and announcements. Its API-first design keeps metadata from becoming a silo and makes it an active participant in data engineering workflows, with support for programmatic updates and automated governance policies. For enterprise-scale deployments, the managed version offered through Collate adds advanced security, hosted ingestion, and dedicated support, combining open-source flexibility with managed-service reliability.
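As a rough illustration of what "programmatic updates and automated governance policies" can look like, the sketch below scans a table record and emits RFC 6902 JSON Patch operations that add a PII tag to sensitive-looking columns. The record layout, the `PII_HINTS` rule, and the `PII.Sensitive` tag name are assumptions made for this example, not the OpenMetadata schema; whether and how the REST API accepts such a patch should be checked against the API documentation.

```python
import json

# Hypothetical asset record; field names are illustrative only.
asset = {
    "name": "customers",
    "columns": [
        {"name": "id", "tags": []},
        {"name": "email", "tags": []},
        {"name": "signup_date", "tags": []},
    ],
}

# Hypothetical governance rule: column names containing these fragments
# should carry the (assumed) PII tag below.
PII_HINTS = ("email", "phone", "ssn")
PII_TAG = "PII.Sensitive"


def build_tag_patch(asset: dict) -> list[dict]:
    """Return RFC 6902 JSON Patch operations that add the PII tag where missing."""
    ops = []
    for index, column in enumerate(asset["columns"]):
        looks_sensitive = any(hint in column["name"].lower() for hint in PII_HINTS)
        if looks_sensitive and PII_TAG not in column["tags"]:
            ops.append({
                "op": "add",
                "path": f"/columns/{index}/tags/-",  # "-" appends to the array
                "value": PII_TAG,
            })
    return ops


patch = build_tag_patch(asset)
print(json.dumps(patch, indent=2))
```

Producing a patch document instead of mutating the record keeps the policy step separate from the delivery step, so the same operations can be reviewed, logged, or sent to an API.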
Uses JSON Schema to define all metadata entities, ensuring consistent API responses and strict data typing across all services.
Parses SQL query logs and Airflow DAGs to build column-level lineage without manual intervention.
Natively supports Great Expectations-style tests and profiling directly within the metadata service.
Maintains a history of metadata changes, allowing users to see how schemas or descriptions evolved over time.
Activity stream functionality where users can mention others, request descriptions, and assign tasks directly on data assets.
Granular policy engine that controls visibility and edit rights based on user attributes and asset tags.
Triggers external workflows whenever metadata is updated (e.g., notifying Slack when a PII tag is added).
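The versioning and event-trigger features listed above can be pictured with a small sketch: compare two versions of a table's metadata, find newly added tags, and build an alert for any PII tag. The version records, the helper names `added_tags` and `pii_alerts`, and the tag name are hypothetical and chosen for illustration; they do not reproduce OpenMetadata's change-event payload.

```python
def added_tags(previous: dict, current: dict) -> dict[str, list[str]]:
    """Map column name -> tags present in `current` but not in `previous`."""
    before = {c["name"]: set(c["tags"]) for c in previous["columns"]}
    changes = {}
    for column in current["columns"]:
        new = set(column["tags"]) - before.get(column["name"], set())
        if new:
            changes[column["name"]] = sorted(new)
    return changes


def pii_alerts(table: str, changes: dict[str, list[str]]) -> list[str]:
    """Build one human-readable alert per PII tag that a column gained."""
    return [
        f"PII tag {tag} added to {table}.{column}"
        for column, tags in changes.items()
        for tag in tags
        if tag.startswith("PII")
    ]


# Two hypothetical versions of the same table's metadata.
v1 = {"version": 0.1, "columns": [{"name": "id", "tags": []},
                                  {"name": "email", "tags": []}]}
v2 = {"version": 0.2, "columns": [{"name": "id", "tags": []},
                                  {"name": "email", "tags": ["PII.Sensitive"]}]}

changes = added_tags(v1, v2)
alerts = pii_alerts("customers", changes)
for line in alerts:
    print(line)  # a real handler would post this to a chat webhook instead
```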
Deploy OpenMetadata using Docker Compose or Helm Charts for Kubernetes.
Access the UI and configure the Metadata Store (MySQL or PostgreSQL) and Search Index (Elasticsearch or OpenSearch).
Install the OpenMetadata Ingestion Framework via pip install openmetadata-ingestion.
Create a Service Connection for your data source (e.g., Snowflake or BigQuery) using the UI.
Define a Metadata Ingestion pipeline to extract tables, schemas, and views.
Configure a Lineage Ingestion workflow to parse SQL queries and map dependencies.
Schedule a Data Profiler workflow to calculate statistics and distributions.
Set up Data Quality Tests (assertions) for critical tables using the UI or YAML.
Invite team members and assign Ownership and Tiering to discovered assets.
Integrate with Slack or MS Teams for real-time alerting on metadata changes.
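The metadata ingestion step above is driven by a workflow configuration. The sketch below assembles one as a plain Python dict and writes it out as JSON, which YAML parsers also accept. The key names (`source`, `sink`, `workflowConfig`, `openMetadataServerConfig`, and so on), the `snowflake_prod` service name, and the host URL are assumptions modelled on the ingestion framework's configuration layout; verify them against the connector documentation for the installed version. The service connection itself is left out on the assumption that it was created in the UI.

```python
import json


def build_ingestion_config(service_name: str, host_port: str, jwt_token: str) -> dict:
    """Assemble a metadata-ingestion workflow config as a plain dict.

    All key names are assumptions for illustration; check them against
    the documentation for your OpenMetadata version before use.
    """
    return {
        "source": {
            "type": "snowflake",
            "serviceName": service_name,  # service assumed to exist already
            "sourceConfig": {
                "config": {
                    "type": "DatabaseMetadata",
                    "includeTables": True,
                    "includeViews": True,
                }
            },
        },
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": host_port,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        },
    }


def missing_sections(config: dict) -> list[str]:
    """Return the top-level sections this sketch expects but does not find."""
    required = ("source", "sink", "workflowConfig")
    return [key for key in required if key not in config]


config = build_ingestion_config("snowflake_prod", "http://localhost:8585/api", "<token>")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Keeping the configuration in code makes it easy to generate one file per service and to run a basic shape check such as `missing_sections` before handing the file to the ingestion framework.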
Verified feedback from other users.
"Highly praised for its UI/UX and standard-based approach; some complexity in scaling self-hosted ingestion workers."


Open-source RAG evaluation tool for assessing accuracy, context quality, and latency of RAG systems.

Unlocking insights from unstructured data.

A visual data science platform combining visual analytics, data science, and data wrangling.

Open Source OCR Engine capable of recognizing over 100 languages.

Liberating data tables locked inside PDF files.

Move your data easily, securely, and efficiently with Stitch, now part of Qlik Talend Cloud.

Open Source High-Performance Data Warehouse delivering Sub-Second Analytics for End Users and Agents at Scale.