Overview

Harvard Dataverse is a research data repository built on the open-source Dataverse software, designed for sharing, preserving, and citing scholarly data. It runs on a Java-based stack (the Payara application server and a PostgreSQL database) and serves as a central node in the global Dataverse Project network. As of 2026, it remains a prominent implementation of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, and it mints DOIs (Digital Object Identifiers) automatically through DataCite. Its metadata schema is modular and supports domain-specific standards such as DDI, Dublin Core, and Schema.org. The platform's API-first design enables integration with computational tools such as Jupyter notebooks and RStudio, and with institutional identity providers via Shibboleth and OAuth2. It is positioned as a non-profit, community-governed alternative to commercial repositories, offering fine-grained metadata management and long-term digital preservation through integration with Archivematica. It suits both individual researchers who need to meet funder mandates and large institutions that require scalable data infrastructure.
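As a rough illustration of the API-first design, the sketch below builds (but does not send) a request for a dataset's metadata by its persistent identifier. It assumes the Dataverse native API route `/api/datasets/:persistentId/` and the `X-Dataverse-key` token header. The base URL, the DOI, and the helper name `build_dataset_request` are placeholders chosen for this example, not values taken from this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a demo installation; substitute your own Dataverse host.
BASE_URL = "https://demo.dataverse.org"


def build_dataset_request(base_url, persistent_id, api_token=None):
    """Build a GET request for dataset metadata by persistent ID.

    Hypothetical helper for illustration. Nothing is sent over the
    network here; pass the result to urllib.request.urlopen to execute it.
    """
    # The DOI goes in the query string, so it must be URL-encoded.
    query = urlencode({"persistentId": persistent_id})
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/?{query}"

    headers = {"Accept": "application/json"}
    if api_token:
        # A token is only needed for unpublished or restricted datasets.
        headers["X-Dataverse-key"] = api_token

    return Request(url, headers=headers, method="GET")


# Placeholder DOI, not a real dataset.
request = build_dataset_request(BASE_URL, "doi:10.5072/FK2/EXAMPLE")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL and headers easy to inspect or test before any call reaches a live repository.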

Common tasks

Research Data Archiving
Persistent Identifier Generation
Metadata Harvesting
Restricted Data Access Control
Data Versioning