
Citrine Informatics
The AI platform for materials and chemicals development, shortening R&D cycles through materials informatics.

The data curation and active learning platform for computer vision.

Lightly is a data curation and active learning platform for computer vision, designed to turn massive raw datasets into high-quality training data. Following a data-centric AI approach, it uses self-supervised learning (SSL) to generate vector embeddings of visual data without requiring labels. ML engineers can use these embeddings to identify redundant samples, find edge cases, and select the most informative samples for labeling, which can reduce annotation costs by up to 90%. As of 2026, Lightly is positioned as a standard choice for industrial-scale vision pipelines and integrates with cloud storage providers and annotation platforms such as Labelbox and Scale AI. Its core engine supports diversity sampling through coreset algorithms and model-in-the-loop active learning, so that labeling effort goes to the images expected to benefit the model most. The platform is optimized for petabyte-scale datasets and provides a web-based visualization suite alongside a Python SDK for automated workflow integration.
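To make the idea of finding redundancies in embedding space concrete, here is a minimal sketch in plain NumPy. It is not Lightly's implementation: the function name find_near_duplicates and the 0.95 threshold are illustrative assumptions, and the embeddings are assumed to come from any SSL model as one row per image.

```python
import numpy as np


def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose cosine similarity is >= threshold.

    Hypothetical helper for illustration; the threshold is an assumed value.
    """
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Only look at the strict upper triangle to skip self-matches and mirrored pairs.
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)
    i_idx, j_idx = np.where(upper & (sim >= threshold))
    return list(zip(i_idx.tolist(), j_idx.tolist()))


if __name__ == "__main__":
    toy = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
    print(find_near_duplicates(toy))
```

The full similarity matrix grows quadratically with the number of samples, so a sketch like this only suits small datasets; at larger scale an approximate nearest-neighbor index would be the usual substitute.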
Uses the SimCLR and VICReg self-supervised learning methods to create meaningful vector representations of data without labels.
Coreset selection picks a subset of the data that preserves the geometric properties of the original distribution.
Integrates model predictions to calculate entropy and uncertainty scores for intelligent sampling.
Streams data directly from AWS S3, Google Cloud Storage, or Azure Blob Storage without storing client data on Lightly servers.
Allows combining visual embeddings with custom metadata (weather, location, camera ID) for complex queries.
Temporal analysis to remove highly similar frames within video streams.
Monitors changes in the embedding distribution over time to detect dataset shift.
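The subset selection that preserves the geometry of the data, mentioned in the feature list above, is commonly approximated with a greedy k-center procedure. The sketch below shows that general technique in NumPy; it is an assumption about how such a selector can work, not a description of Lightly's internal algorithm, and k_center_greedy and its parameters are hypothetical names.

```python
import numpy as np


def k_center_greedy(embeddings: np.ndarray, n_select: int, first_index: int = 0) -> list[int]:
    """Greedy k-center selection: repeatedly add the point farthest from the current selection.

    Hypothetical helper; first_index is an assumed, arbitrary starting point.
    """
    n_points = embeddings.shape[0]
    if not 0 < n_select <= n_points:
        raise ValueError("n_select must be between 1 and the number of points")
    selected = [first_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(embeddings - embeddings[first_index], axis=1)
    while len(selected) < n_select:
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        dist_to_new = np.linalg.norm(embeddings - embeddings[next_index], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


if __name__ == "__main__":
    points = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.0, 0.1], [5.0, 5.0]])
    print(k_center_greedy(points, n_select=3))
```

In the toy data, two tight pairs and one isolated point are reduced to one representative per region, which is the behavior a diversity-based strategy aims for.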
Install the Lightly Python SDK via pip: pip install lightly.
Initialize the Lightly client using your API key obtained from the web platform.
Connect your data source (AWS S3, Azure Blob, or Google Cloud Storage) using IAM roles for secure access.
Create a new dataset in the Lightly platform via the SDK or UI.
Run the embedding process to generate vector representations of your images/videos locally or on Lightly's infrastructure.
Upload the generated embeddings and metadata to the Lightly platform.
Configure a selection strategy (e.g., Coreset, Random, or Diversity-based) to identify the optimal data subset.
Visualize the dataset in the 'Explore' tab to confirm the selection and identify potential bias.
Export the selected filenames or IDs directly to your labeling partner (e.g., Labelbox, CVAT).
Trigger the active learning loop by feeding model predictions back into Lightly to find high-uncertainty samples.
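The last step, ranking samples by model uncertainty, can be sketched independently of any SDK. The example below computes Shannon entropy over predicted class probabilities and returns the most uncertain samples. The function names are hypothetical, and it assumes the model outputs one probability row per image; it does not call the Lightly API.

```python
import numpy as np


def prediction_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of class probabilities."""
    # Clip inside the log only, so zero probabilities contribute exactly zero.
    clipped = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(clipped), axis=1)


def most_uncertain(probs: np.ndarray, n_select: int) -> list[int]:
    """Indices of the n_select samples with the highest entropy, most uncertain first."""
    entropy = prediction_entropy(probs)
    order = np.argsort(-entropy, kind="stable")
    return order[:n_select].tolist()


if __name__ == "__main__":
    probs = np.array([
        [0.98, 0.01, 0.01],  # confident prediction
        [0.34, 0.33, 0.33],  # nearly uniform, very uncertain
        [0.60, 0.30, 0.10],  # moderately uncertain
    ])
    print(most_uncertain(probs, n_select=2))
```

The returned indices would then be mapped back to filenames and sent for labeling, closing the loop described in the steps above.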
Verified feedback from other users.
"Users praise Lightly for its ability to handle massive datasets and its seamless integration into existing MLOps stacks. The open-source library is highly regarded for its performance, while the cloud platform is noted for its intuitive visualization."
