

The world's leading open-source research data repository for sharing, citing, and archiving scholarly datasets.

Harvard Dataverse is an open-source research data repository for sharing, preserving, and citing scholarly data. It runs on the Dataverse software, a Java application built on Payara and PostgreSQL, and serves as a central node in the global Dataverse Project network. As of 2026, it supports the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles and gives researchers automated DOI (Digital Object Identifier) minting via DataCite. Its metadata schema is modular and supports domain-specific standards such as DDI, Dublin Core, and Schema.org. The API-first design allows integration with computational environments such as Jupyter and RStudio, and with institutional identity providers via Shibboleth and OAuth2. As a non-profit, community-governed alternative to commercial repositories, it offers fine-grained metadata management and long-term digital preservation through integration with Archivematica. It suits both individual researchers who need to meet funder mandates and large institutions that require scalable data infrastructure.
Integration with DataCite and EZID to automatically assign a persistent Digital Object Identifier upon publication.
Supports the Open Archives Initiative Protocol for Metadata Harvesting to increase dataset discoverability.
Permissions can be set at the Dataverse, Dataset, or individual File level, including IP-based restrictions.
Automatically extracts metadata and variable-level information from SPSS, Stata, and R files.
Customizable guestbook forms that users must fill out before downloading data, capturing information about who downloads a dataset and how it will be used.
Configurable backends for S3, Swift, or local file systems via the Dataverse storage abstraction layer.
Dataset versioning with Major.Minor version numbers (e.g., 1.0, 1.1, 2.0) and side-by-side comparison of metadata changes between versions.
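The metadata harvesting feature listed above can be illustrated with a short Python sketch. It builds an OAI-PMH `ListIdentifiers` request URL and parses a response. The endpoint path (`/oai`), the set name, and the sample response are assumptions made for illustration; only the verb, argument names, and XML namespace come from the OAI-PMH protocol itself. The sketch does not contact any server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>doi:10.5072/FK2/EXAMPLE1</identifier></header>
    <header><identifier>doi:10.5072/FK2/EXAMPLE2</identifier></header>
    <resumptionToken>token-123</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                         resumption_token=None):
    """Build an OAI-PMH ListIdentifiers request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix and set on follow-up requests.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers",
                  "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_identifiers(xml_text):
    """Return (identifiers, resumption_token_or_None) from a response body."""
    root = ET.fromstring(xml_text)
    ids = [el.text
           for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


if __name__ == "__main__":
    # Assumed endpoint; check the installation's documentation for the real one.
    print(list_identifiers_url("https://dataverse.harvard.edu/oai",
                               set_spec="my_set"))
    print(parse_identifiers(SAMPLE))
```

A harvester would typically repeat the request with each returned resumption token until the response no longer contains one.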
Create a Harvard Dataverse account using institutional SSO or email.
Generate an API Token from the user account settings for automated deposits.
Create a 'Dataverse' (a container for your datasets) and define its theme and widgets.
Configure metadata blocks (e.g., Social Science, Life Sciences, or Geospatial).
Create a 'Dataset' within the Dataverse, providing mandatory citation metadata.
Upload data files and documentation (e.g., README, Codebooks).
Utilize the 'Explore' tool to verify tabular data ingest and summary statistics.
Set file-level permissions to 'Restricted' or 'Public' and configure Guestbooks.
Submit the dataset for review or click 'Publish' to mint a permanent DOI.
Use the Dataverse Native API to integrate the dataset into external web pages or tools.
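As a rough illustration of the automated-deposit steps above (API token, dataset creation, publishing), the following Python sketch prepares Native API requests without sending them. The endpoint paths and the `X-Dataverse-key` header follow the Dataverse Native API as commonly documented, but should be verified against the version your installation runs. The collection alias, token, DOI, and metadata values are placeholders, and the set of required citation fields can differ between installations.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://dataverse.harvard.edu"  # assumption: target installation


def _field(name, value, type_class="primitive", multiple=False):
    """Build one metadata field in the Dataverse dataset JSON layout."""
    return {"typeName": name, "typeClass": type_class,
            "multiple": multiple, "value": value}


def citation_payload(title, author, contact_email, description, subject):
    """Minimal citation block; required fields may vary by installation."""
    fields = [
        _field("title", title),
        _field("author",
               [{"authorName": _field("authorName", author)}],
               "compound", True),
        _field("datasetContact",
               [{"datasetContactEmail":
                 _field("datasetContactEmail", contact_email)}],
               "compound", True),
        _field("dsDescription",
               [{"dsDescriptionValue":
                 _field("dsDescriptionValue", description)}],
               "compound", True),
        _field("subject", [subject], "controlledVocabulary", True),
    ]
    return {"datasetVersion":
            {"metadataBlocks": {"citation": {"fields": fields}}}}


def create_dataset_request(collection_alias, payload, api_token):
    """Prepare (but do not send) a request that creates a draft dataset."""
    url = f"{BASE_URL}/api/dataverses/{quote(collection_alias)}/datasets"
    return Request(url, data=json.dumps(payload).encode("utf-8"),
                   headers={"X-Dataverse-key": api_token,
                            "Content-Type": "application/json"},
                   method="POST")


def publish_request(persistent_id, api_token, release="major"):
    """Prepare (but do not send) a request that publishes a dataset."""
    query = urlencode({"persistentId": persistent_id, "type": release})
    url = f"{BASE_URL}/api/datasets/:persistentId/actions/:publish?{query}"
    return Request(url, headers={"X-Dataverse-key": api_token}, method="POST")


if __name__ == "__main__":
    payload = citation_payload("Example Survey 2024", "Doe, Jane",
                               "jane@example.org", "Placeholder description.",
                               "Social Sciences")
    req = create_dataset_request("mylab", payload, "PLACEHOLDER-TOKEN")
    print(req.get_method(), req.full_url)
```

Passing a prepared request to `urllib.request.urlopen` would actually send it. Because publishing mints a permanent DOI, it is safer to try such calls against a test installation first.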
Verified feedback from other users.
"Highly regarded for its scientific integrity and metadata standards, though some users find the UI slightly dated compared to commercial alternatives."
