What is a data repository?
A data repository is a service that commits to long-term storage and preservation of data, provides a persistent identifier (typically a Digital Object Identifier, or DOI) that points to the data and a description of the data, and usually makes the data publicly accessible (sometimes with restrictions).
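To illustrate how a persistent identifier "points to" a dataset, here is a minimal Python sketch that turns a DOI into a resolvable link through the public doi.org resolver. The function name doi_to_url and the example DOI 10.1234/example-dataset are hypothetical, chosen only for illustration; the sketch also does not URL-encode unusual characters, which some DOIs may require.

```python
# Minimal sketch: build a resolvable link from a DOI.
# doi_to_url and the example DOI below are hypothetical illustrations.
DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations and emails.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Return an https://doi.org/ link for a DOI given in any common form."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI begins with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    return DOI_RESOLVER + doi


if __name__ == "__main__":
    print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the link goes through the resolver rather than to a specific web address, it should keep working even if the repository later moves the dataset's landing page.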
Can't I just email people my data when they ask for it?
Funders increasingly require data to be deposited in a repository for long-term preservation and sharing rather than simply being shared "upon request", since the latter model results in low levels of data sharing and lost data. Submitting to a data repository decreases the burden of sharing for both parties.
Choosing a data repository
Data repositories range from hyper-specific (accepting only one type of data and requiring very specific metadata, such as GenBank for DNA sequences) to general (accepting any type of data from anyone on any topic with minimal required description). The best choice for your data is the place where it is most likely to be found by others who are generating or interested in similar data.
To select a repository:
Contact a librarian using the information in the sidebar for assistance with selecting a repository and using Figshare.
If you've been following your data management plan, your data should already be well-organized and nearly ready to publish. The next steps are to:
De-identify your data. If your data contain personal information, you will need to share them according to your IRB-approved protocol and in a way that protects individual research participants. This guide from Johns Hopkins Libraries provides basic and advanced techniques for data de-identification.
Organize your data. Name files uniquely and in a way that logically reflects your research, and organize them in a folder hierarchy. Files over ~5 GB in size may cause problems when being uploaded to or downloaded from a repository, so keep individual files smaller than that where possible.
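The file-organization checks above can be partly automated before you upload. The following is a minimal Python sketch, not a repository requirement: it walks a data folder and reports files above a size limit and file names that occur more than once. The function name check_data_folder, the folder name my_dataset, and the choice to compare names case-insensitively are all assumptions made for illustration; the 5 GB limit is the approximate figure mentioned above.

```python
# Minimal sketch of a pre-upload check; names here are hypothetical.
from collections import defaultdict
from pathlib import Path

SIZE_LIMIT_BYTES = 5 * 1000**3  # roughly 5 GB


def check_data_folder(root, size_limit=SIZE_LIMIT_BYTES):
    """Return (oversized_files, duplicate_names) for all files under root.

    oversized_files: relative paths of files larger than size_limit.
    duplicate_names: maps a lowercased file name to the relative paths
    that share it, for names that occur more than once.
    """
    root = Path(root)
    oversized = []
    by_name = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.stat().st_size > size_limit:
            oversized.append(rel)
        # Compare names case-insensitively (an assumption of this sketch).
        by_name[path.name.lower()].append(rel)
    duplicates = {n: p for n, p in by_name.items() if len(p) > 1}
    return oversized, duplicates


if __name__ == "__main__":
    big, dupes = check_data_folder("my_dataset")  # hypothetical folder
    for rel in big:
        print(f"Over size limit: {rel}")
    for name, paths in dupes.items():
        print(f"File name used more than once ({name}): {', '.join(paths)}")
```

A file that is flagged as too large can often be split into smaller pieces (for example, by year or by site) before deposit, and duplicated names can be made unique by adding a short descriptive element such as a date or sample identifier.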
Additional resources on preparing data for publication:
Have questions about preparing and publishing your data? Know of a resource you think we should add? Let us know.