
The gold-standard research framework for high-performance data mining and spatial indexing.

ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) is an open-source Java framework for developing and evaluating knowledge discovery in databases (KDD) algorithms. Its main architectural differentiator is the strict decoupling of data structures from algorithms, which lets researchers evaluate spatial and multidimensional index structures independently of the mining logic. ELKI is widely used for academic benchmarking and for anomaly detection, largely because of its implementations of density-based clustering (DBSCAN, OPTICS) and local outlier detection (LOF). Unlike general-purpose libraries such as scikit-learn or Spark MLlib, it ships over 100 specialized algorithms and high-dimensional distance functions, many of which commercial SaaS offerings omit. It can serve as a backend engine for systems that require precision in geometric and topological data analysis. Its modularity allows custom distance measures and data types to be plugged in, which suits complex spatio-temporal datasets and bioinformatics applications.
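The decoupling of index structures from mining logic can be illustrated with a minimal sketch. This is Python rather than ELKI's own Java API, and every name in it (`RangeSearcher`, `LinearScan`, `GridIndex`, `core_points`) is hypothetical and chosen only for illustration. The toy grid stands in for a real index such as an R*-tree; the point is that the mining function never needs to know which one it was given.

```python
import math
from collections import defaultdict
from typing import Protocol, Sequence

Point = tuple[float, float]


class RangeSearcher(Protocol):
    """Anything that can return the indices of points within eps of a query."""

    def range_query(self, query: Point, eps: float) -> list[int]: ...


class LinearScan:
    """Baseline: compare the query against every point, O(N) per query."""

    def __init__(self, points: Sequence[Point]) -> None:
        self.points = points

    def range_query(self, query: Point, eps: float) -> list[int]:
        return [i for i, p in enumerate(self.points) if math.dist(p, query) <= eps]


class GridIndex:
    """Toy spatial index: bucket points into square cells of a fixed size."""

    def __init__(self, points: Sequence[Point], cell: float) -> None:
        self.points = points
        self.cell = cell
        self.buckets: dict[tuple[int, int], list[int]] = defaultdict(list)
        for i, p in enumerate(points):
            self.buckets[self._key(p)].append(i)

    def _key(self, p: Point) -> tuple[int, int]:
        return (math.floor(p[0] / self.cell), math.floor(p[1] / self.cell))

    def range_query(self, query: Point, eps: float) -> list[int]:
        # Only visit cells that can overlap the eps-ball around the query.
        reach = math.ceil(eps / self.cell)
        cx, cy = self._key(query)
        hits = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for i in self.buckets.get((cx + dx, cy + dy), ()):
                    if math.dist(self.points[i], query) <= eps:
                        hits.append(i)
        return sorted(hits)


def core_points(points: Sequence[Point], searcher: RangeSearcher,
                eps: float, min_pts: int) -> list[int]:
    """Mining logic: finds dense points without knowing how neighbors are found."""
    return [i for i, p in enumerate(points)
            if len(searcher.range_query(p, eps)) >= min_pts]
```

Calling `core_points(pts, LinearScan(pts), eps, min_pts)` and `core_points(pts, GridIndex(pts, cell), eps, min_pts)` should return the same indices; only the cost of each neighbor query differs, which is what makes it possible to benchmark the index separately from the algorithm.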
Supports R*-trees, M-trees, and cover trees, which can reduce the cost of answering neighbor queries for all N points from O(N^2) with a linear scan to roughly O(N log N) when the index suits the data.
Includes Local Outlier Factor (LOF), COF, LoOP, LOCI, and HiCS for high-dimensional anomaly detection.
Complete implementations of DBSCAN, OPTICS, DeLiClu, and GDBSCAN.
Algorithms to estimate the fractal dimension and local intrinsic dimensionality of datasets.
Modular interface for defining non-metric distance measures and similarity matrices.
A lightweight UI for rapid prototyping of algorithm parameters and real-time scatter plot inspection.
The Java codebase makes extensive use of generics for compile-time type safety and is designed with memory efficiency in mind.
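To make the outlier-detection feature concrete, here is a minimal sketch of the Local Outlier Factor computation. It is plain Python with a brute-force neighbor search, not ELKI's implementation; the function names `knn` and `lof_scores` are hypothetical. It assumes the data has no duplicate points and breaks distance ties by index instead of keeping all tied neighbors, which the full definition of LOF would do.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbors of point i as sorted (distance, index) pairs."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return dists[:k]


def lof_scores(points, k):
    """Compute one LOF score per point; values well above 1 suggest an outlier."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to its neighbors. Assumes no duplicates,
        # otherwise the sum below could be zero.
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrds[j] for _, j in neighbors[i]) / (k * lrds[i])
            for i in range(n)]
```

On a small regular grid with one distant point appended, the distant point should receive by far the largest score, while grid points stay close to 1. The brute-force `knn` is the part an index structure would replace.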
Ensure Java Runtime Environment (JRE) 17 or higher is installed on the host machine.
Download the latest ELKI .jar file from the official GitHub releases or Maven Central.
Launch the MiniGUI by executing 'java -jar elki.jar' to explore algorithm parameters visually.
Prepare data in CSV or ARFF format, with numeric columns for the vector components and, optionally, non-numeric columns for IDs or labels.
Select a parser (e.g., NumberVectorLabelParser) to ingest your specific data schema.
Configure the 'Database' component to choose an indexing structure like R*-tree or M-tree.
Choose a distance function such as Euclidean, Manhattan, or cosine distance.
Set algorithm-specific parameters (e.g., epsilon and minPts for DBSCAN).
Run the task and use the built-in visualizer to analyze clusters or outliers.
Export results using the 'ResultWriter' for integration into downstream pipelines.
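The data-facing steps above (parse, pick a distance, set epsilon and minPts, run, export) can be sketched end to end. This is a language-neutral illustration in Python, not ELKI code: `parse_rows`, `dbscan`, `write_results`, and the distance helpers are hypothetical names, and the parser's rule that numeric cells become vector components while other cells become labels is an assumption made for this sketch.

```python
import csv
import io
import math

NOISE = -1


def parse_rows(text):
    """Split each CSV row into a numeric vector and a label string."""
    vectors, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        vec, lab = [], []
        for cell in row:
            try:
                vec.append(float(cell))
            except ValueError:
                lab.append(cell.strip())
        vectors.append(tuple(vec))
        labels.append(" ".join(lab))
    return vectors, labels


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def dbscan(vectors, eps, min_pts, distance=euclidean):
    """Textbook DBSCAN with a linear-scan neighborhood; one cluster id per point."""
    def region(i):
        return [j for j, v in enumerate(vectors) if distance(vectors[i], v) <= eps]

    assignment = [None] * len(vectors)
    cluster = -1
    for i in range(len(vectors)):
        if assignment[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            assignment[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        assignment[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if assignment[j] == NOISE:
                assignment[j] = cluster  # border point reached from a core point
            if assignment[j] is not None:
                continue
            assignment[j] = cluster
            nearby = region(j)
            if len(nearby) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nearby)
    return assignment


def write_results(vectors, labels, assignment):
    """Serialize vectors, labels, and cluster ids as CSV for downstream use."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for vec, lab, cid in zip(vectors, labels, assignment):
        writer.writerow([*vec, lab, "noise" if cid == NOISE else f"cluster_{cid}"])
    return out.getvalue()
```

Swapping `distance=manhattan` or `distance=cosine_distance` into the `dbscan` call changes the notion of neighborhood without touching the clustering logic, which mirrors the separate distance-function and parameter steps in the list.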
"Highly praised in academic circles for technical correctness and efficiency, though noted for a steep learning curve for non-Java developers."
