

The Easy and Open Data Lakehouse Platform built for sub-second SQL queries and Git-like data management.

Dremio is a high-performance data lakehouse platform that provides a unified, self-service interface to data across diverse storage environments. Built on open-source technologies including Apache Arrow, Project Nessie, and Apache Iceberg, Dremio lets users query data in place, which reduces the need for complex and costly ETL pipelines. As of 2026, Dremio is positioned as a leading platform for 'Git-for-Data' workflows, in which data engineers branch, merge, and version-control data lakes the way they do code. Its Columnar Cloud Cache (C3) and 'Data Reflections' technology build on Apache Arrow to deliver sub-second response times on petabyte-scale datasets. The architecture targets modern AI workloads: it provides the high-throughput data streams needed to train Large Language Models (LLMs) and supports vector search within the lakehouse. Dremio’s 2026 positioning emphasizes its role as the 'Open' alternative to proprietary data warehouses and promotes a decentralized data mesh architecture in which analysts access governed data across S3, Azure Data Lake, and Google Cloud Storage through a single SQL-compliant semantic layer.
Explore all tools that specialize in tracking data lineage. This domain focus ensures Dremio delivers optimized results for this specific requirement.
Explore all tools that specialize in data versioning. This domain focus ensures Dremio delivers optimized results for this specific requirement.
Data Reflections: uses Apache Arrow to create and persist optimized physical representations of data that automatically accelerate a range of query patterns.
Git-like data versioning (Project Nessie): provides Git-like operations (commit, branch, merge) for Apache Iceberg tables.
Apache Arrow Flight: a high-performance protocol for big data transfer that bypasses the bottlenecks of JDBC/ODBC.
Gandiva: an LLVM-based compiler for Apache Arrow that compiles query expressions into code that uses high-performance SIMD instructions.
Columnar Cloud Cache (C3): automatically caches data from remote object stores onto local NVMe storage on executor nodes.
Semantic layer: a virtual layer where users can define business logic and security policies across multiple sources without moving data.
Cloud-native compute engines that automatically scale up or down based on query concurrency.
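To make the local caching of remote object-store data more concrete, here is a minimal, purely illustrative sketch of a read-through cache. It is not Dremio's implementation: the class name `ReadThroughCache`, the `fetch_remote` callback, and the least-recently-used eviction policy are all assumptions chosen for the example, and an in-memory dictionary stands in for local NVMe storage.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy read-through cache with least-recently-used (LRU) eviction.

    Hypothetical illustration only; the eviction policy is an assumption.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch_remote = fetch_remote   # hypothetical object-store reader
        self._capacity = capacity_bytes     # stand-in for local NVMe space
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        """Return the block for key, fetching from remote storage on a miss."""
        if key in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(key)   # mark as most recently used
            return self._blocks[key]
        self.misses += 1
        data = self._fetch_remote(key)      # slow path: remote object store
        self._store(key, data)
        return data

    def _store(self, key: str, data: bytes) -> None:
        if len(data) > self._capacity:
            return                          # too large to cache; serve uncached
        while self._used + len(data) > self._capacity:
            _, evicted = self._blocks.popitem(last=False)  # drop the LRU block
            self._used -= len(evicted)
        self._blocks[key] = data
        self._used += len(data)
```

Repeated reads of the same key are served from the local copy, so only the first read (or a read after eviction) pays the cost of the remote fetch.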
Sign up for Dremio Cloud or deploy Dremio Software via Helm Chart on Kubernetes.
Connect your cloud storage (S3, ADLS, or GCS) using IAM roles or service principals.
Configure a Project Nessie catalog for Git-like data versioning and Iceberg table management.
Run an initial discovery scan to index metadata and schema from your data lake.
Define a Semantic Layer by creating Virtual Data Sets (VDS) to represent business logic.
Enable 'Data Reflections' on high-traffic tables to accelerate query performance using Arrow-based materialization.
Connect your BI tool (Tableau, Power BI, or Looker) via the native Dremio connector.
Set up User/Role-based Access Control (RBAC) to govern data access at the row and column level.
Implement data versioning workflows by creating branches for ETL testing before merging to production.
Monitor query execution and resource consumption via the Dremio UI performance profiler.
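The branch-then-merge step above can be sketched as a small helper that generates the SQL statements for one ETL test cycle. This is a hedged sketch, not an official client: the function name `branch_workflow_sql` and the catalog name `nessie` are hypothetical, and the statement shapes (`CREATE BRANCH ... IN`, `USE BRANCH ... IN`, `MERGE BRANCH ... INTO ... IN`) are assumed from Dremio's branch SQL commands and should be checked against the SQL reference for your Dremio version.

```python
import re
from typing import List

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Accept only plain identifiers so names cannot inject extra SQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def branch_workflow_sql(catalog: str, branch: str, target: str = "main") -> List[str]:
    """Return the statements for an isolated ETL test on a branch.

    Statement syntax is an assumption; verify it against your Dremio version.
    """
    catalog, branch, target = _check(catalog), _check(branch), _check(target)
    return [
        f"CREATE BRANCH {branch} IN {catalog}",            # isolate the changes
        f"USE BRANCH {branch} IN {catalog}",               # run ETL on the branch
        f"MERGE BRANCH {branch} INTO {target} IN {catalog}",  # publish after tests
    ]


if __name__ == "__main__":
    for statement in branch_workflow_sql("nessie", "etl_test"):
        print(statement)
```

The ETL and validation queries would run between the `USE BRANCH` and `MERGE BRANCH` statements, so production readers on the target branch never see intermediate results.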
Verified feedback from other users.
"Highly praised for its query speed and ease of use compared to Presto/Trino, though some users find the initial cluster configuration complex."

Fast distributed SQL query engine for big data analytics.

Unlocking insights from unstructured data.

A visual data science platform combining visual analytics, data science, and data wrangling.

Open Source OCR Engine capable of recognizing over 100 languages.

Liberating data tables locked inside PDF files.

Move your data easily, securely, and efficiently with Stitch, now part of Qlik Talend Cloud.

Open Source High-Performance Data Warehouse delivering Sub-Second Analytics for End Users and Agents at Scale.