Data Repositories List
When you're ready to deposit your data for sharing and archiving, look into whether there is already a repository in your field where the most likely users of your data would look. The following sites maintain lists of many repositories that accept research data:
- Research Data Repositories - database compiled by Duke Libraries
- Data Repositories - list compiled by the Open Access Directory
- Other Data Repositories - list compiled by Purdue University Libraries
- Repository List - from the DataCite project
- Archives and Repositories for Data - list compiled by the University of Minnesota Libraries
The following is a selected list of data repositories available through other institutions. If you know of any other data repositories that should be included, please send the details to the ITS Service Desk (firstname.lastname@example.org). CWRU is not responsible for the content of the sites listed here.
The Long Term Ecological Research (LTER) Network is a collaborative effort involving more than 1800 scientists and students investigating ecological processes over long temporal and broad spatial scales. The Network promotes synthesis and comparative research across sites and ecosystems and among other related national and international research programs.
This site is an interface to a crystal structure database that includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy, and Physics and Chemistry of Minerals, as well as selected datasets from other journals.
The Arts and Humanities Data Service (AHDS) is a UK national service aiding the discovery, creation and preservation of digital resources in the arts and humanities. Its collection covers history, archaeology, literature, languages and linguistics, and the visual and performing arts. Funding for the AHDS ceased in 2008; however, links to its partner sites are still active.
The Crystallography Open Database (COD), once finalized, will be a keyword-searchable Web server of crystal structure atomic coordinates, preserving published as well as unpublished data.
DLESE is a distributed community effort involving educators, students, and scientists working together to improve the quality, quantity, and efficiency of teaching and learning about the Earth system. In pursuing this mission DLESE provides access to Earth data sets and imagery, including the tools and interfaces that enable their effective use in educational settings.
An archive of digital data on archaeological research from the Netherlands.
eCrystals - Southampton is the archive for crystal structures generated by the Southampton Chemical Crystallography Group and the EPSRC UK National Crystallography Service.
The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) is a NASA-sponsored source for biogeochemical and ecological data and models useful in environmental research. All of its data sets and model products are free of charge, including shipping.
The Inter-university Consortium for Political and Social Research is an organization of member institutions working together to acquire and preserve social science data, provide open and equitable access to these data, and promote effective data use.
IFPRI provides the following types of agricultural and socio-economic datasets: geospatial data, household and community-level surveys, institution-level surveys, regional data, and social accounting matrices.
The National Digital Archive of Datasets (NDAD) preserves and provides online access to archived digital datasets and documents from UK central government departments on a wide range of subjects.
The NGDRS is a system of geoscience data repositories whose respective holdings are described in a web-based supercatalog.
Climate and atmospheric data from the University Corporation for Atmospheric Research (UCAR) and other participating institutions.
PANGAEA is a public digital library for science aimed at archiving, publishing and distributing geo-referenced data with special emphasis on environmental, marine and geological basic research.
The Reciprocal Net is a distributed database used by research crystallographers to store information about molecular structures; much of the data is available to the general public. The Reciprocal Net project is still under development.
The RRUFF Project is an integrated database of Raman spectra, X-ray diffraction and chemistry data for minerals, with the goal of creating a complete set of high-quality spectral data from well-characterized minerals.
Data, documents and images from 822 expeditions by the Scripps Institution of Oceanography (SIO) since 1903.
The CDS is a data center dedicated to the collection and worldwide distribution of astronomical data and related information.
DANS is responsible for providing permanent access to research material from the humanities and social sciences. The current DANS collection contains the datasets of the Netherlands Historical Data Archive (NHDA), the Steinmetz Archive and the Scientific Statistical Agency (WSA).
The BADC is the Natural Environment Research Council's (NERC) Designated Data Centre for the Atmospheric Sciences.
A comprehensive collection of information about the subsurface of any given area in Great Britain. The NGDC comprises data gathered or generated by the British Geological Survey in addition to data provided by external organizations.
The NEODC is tasked with the acquisition, archiving and provision of access to remotely sensed data of the surface of the Earth acquired by satellite and airborne sensors.
BODC holds a wealth of publicly accessible marine data collected using a variety of instruments and samplers and collated from many sources. It handles biological, chemical, physical and geophysical data, and its databanks contain measurements of nearly 10,000 different oceanographic variables.
The AEDC coordinates the management of data collected by UK-funded scientists in Antarctica and the Southern Ocean.
The UK Data Archive (UKDA) is a centre of expertise in data acquisition, preservation, dissemination and promotion and is curator of the largest collection of digital data in the social sciences and humanities in the UK.
CEH is a major custodian of environmental data for the UK. It has significant capabilities in data collation and management and in information systems development, and it uses these skills, together with its data archives, to support large-scale, long-term environmental research.
Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system that produces and maintains automatic annotation on selected eukaryotic genomes.
NVO's objective is to enable new science by greatly enhancing access to data and computing resources. NVO makes it easy to locate, retrieve, and analyze data from archives and catalogs worldwide.
The Protein Data Bank (PDB) is the single worldwide depository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans.
Established as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information.
The EBI is a centre for research and services in bioinformatics. The Institute manages databases of biological data including nucleic acid and protein sequences and macromolecular structures.
MaizeGDB is the community database for biological information about the crop plant Zea mays ssp. mays. Genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the data types accessible through this site.
The Scholars Digital Library of Analytics prides itself on being an intact repository of data sets for use in research, education, and reference. Each data set comes with a description of what the data was originally used for, its subject area, and its number of rows and columns.
The NNDC collects, evaluates, and disseminates nuclear physics data for basic nuclear research and for applied nuclear technologies. The NNDC is a worldwide resource for nuclear data.
The VMDB compiles patient encounter data from nearly all North American veterinary medical colleges. Related databases include those from the Canine Eye Registration Foundation, Health Information Managers and the Equine Eye Registration Foundation, as well as a registry of dogs that have passed DNA tests for various genetic disorders.
The GEON project is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure in support of an environment for integrative geoscience research.
IRIS is a university research consortium dedicated to exploring the Earth's interior through the collection and distribution of seismographic data. Its collection includes waveform data, channel response data, and event (earthquake) catalogs.
The SCEC's mission is to gather data on earthquakes in Southern California and elsewhere, integrate this information into a comprehensive, physics-based understanding of earthquake phenomena, and communicate that understanding to society at large as useful knowledge for reducing earthquake risk.
The UNAVCO Facility exists to support research investigators in their use of Global Positioning System technology for Earth sciences research. The Facility performs this task in part by archiving GPS data and data products for current and future applications.
To further promote a collaborative research environment, the BIRN has undertaken the development of the public BIRN Data Repository (BDR) for the biomedical research community. The BDR will give researchers a venue to share and exchange their data with the broader biomedical research community, providing the means to capture, curate, store, query, view, and download imaging and related data.
Data sets include information collected from research facilities and tools, as well as information from climate and weather models created and compiled by NCAR scientists and the wider research community.
The National Human Genome Research Institute launched ENCODE (the Encyclopedia of DNA Elements) to identify all functional elements in the human genome sequence. The project is being conducted in three phases: a pilot project phase, a technology development phase and a planned production phase.
The Arabidopsis Information Resource collects information and maintains a database of genetic and molecular biology data for Arabidopsis thaliana, a widely used model plant.
The Alaska Satellite Facility downlinks, processes, archives, and distributes SAR data from the European Space Agency's ERS-1 and ERS-2 satellites, NASDA's JERS-1 satellite, and the Canadian Space Agency's RADARSAT-1 satellite.
The GES DISC is the home (archive) of precipitation and atmospheric chemistry and dynamics data and information. It is one of eight NASA Science Mission Directorate DAACs that offer Earth science data, information, and services to research scientists, applications scientists, applications users, and students.
The GHRC provides both historical and current Earth science data, information, and products from satellite, airborne, and surface-based instruments. The GHRC acquires basic data streams and produces derived products from many instruments spread across a variety of instrument platforms.
NODC maintains and updates a national ocean archive with environmental data acquired from domestic and foreign activities and produces products and research from these data which help monitor global environmental changes. These data include physical, biological and chemical measurements derived from in situ oceanographic observations, satellite remote sensing of the oceans, and ocean model simulations.
The UniProt consortium aims to support biological research by maintaining a high-quality database that serves as a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community.
The ARM Archive supports the scientific field experiments of the Atmospheric Radiation Measurement (ARM) Program by storing and distributing the large quantities of data collected from these experiments. These data are used to research atmospheric radiation balance and cloud feedback processes, which are critical to the understanding of global climate change.
The National Space Science Data Center serves as the permanent archive for NASA space science mission data. "Space science" means astronomy and astrophysics, solar and space plasma physics, and planetary and lunar science.
HMDC is the principal distributor of quantitative social science data from major international data consortia for Harvard and MIT.
PiiMS provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accessions and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations.
NSIDC supports research into the world's frozen realms: the snow, ice, glaciers, frozen ground, and climate interactions that make up Earth's cryosphere. Scientific data, whether taken in the field or relayed from satellites orbiting Earth, form the foundation for the scientific research that informs the world about our planet and our climate systems.
Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences, including ecology, biology, and medicine. It was developed by the National Evolutionary Synthesis Center (NESCent) and the University of North Carolina Metadata Research Center, in coordination with a large group of journals and societies.