Registering datasets

Our university is creating a data repository and we have three questions about the Crossref dataset registry:

  1. Is it necessary to use a different DOI prefix to register datasets?
  2. In the DOI registration form we usually fill out, we can’t find an option for dataset registration. Where can we find it?
  3. Does Crossref have a technical manual for dataset registration?

Thanks in advance.

Hello @Soledad,

You can use the prefix assigned to your account to register all content types.

Unfortunately, you can’t use the Web Deposit Form to register datasets. However, there is an example XML file for datasets here: best-practice-examples/dataset.5.3.0.xml · master · crossref / Schema · GitLab

You can adapt that XML file and then deposit it with us by following these instructions:

  • Log in here: using the credentials sent to you when you registered
  • Click on ‘Submissions’
  • Click on ‘Upload’
  • Select your XML file by clicking the ‘Choose file’ button
  • Select the type ‘Metadata’
  • Click ‘Upload’

Here is the markup guide for datasets: Datasets markup guide - Crossref

I hope this helps.


Hi @Soledad

Although it is possible to register datasets with Crossref, we wouldn’t necessarily recommend it. If your university is creating a data repository, it would be much better to work with a different Registration Agency for this: DataCite.

DataCite develop and support tools and methods that make data more accessible and more useful. If someone in the scholarly community is trying to find data, they will search and use the DataCite database; they wouldn’t necessarily think to come to Crossref for that sort of content. Although we have a schema for datasets, it’s very basic and doesn’t contain rich metadata fields. Our schema and reference linking infrastructure are set up specifically to support and provide services around published content, rather than data.

I would recommend that you contact DataCite to discuss the needs of your data repository. You can find out more about this on our website.

All the best



The prefix can always be the same, but the suffix must be different for each dataset. For example:
dataset 1 : doi:10.12345/0987yhy
dataset 2 : doi:10.12345/edfthkjjjoij
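
If you generate your DOIs with a script, the rule above can be checked before you deposit anything. This is only a minimal sketch in Python: the function names make_doi and assert_unique are made up for illustration, and 10.12345 is the example prefix from this post, not a real one. It compares DOIs case-insensitively, since DOIs are not case sensitive.

```python
PREFIX = "10.12345"  # example prefix from the post above; use the one assigned to your account


def make_doi(prefix: str, suffix: str) -> str:
    """Join a prefix and a suffix into a DOI string (hypothetical helper)."""
    if not prefix.startswith("10.") or "/" in prefix:
        raise ValueError(f"unexpected prefix: {prefix!r}")
    if not suffix:
        raise ValueError("suffix must not be empty")
    return f"{prefix}/{suffix}"


def assert_unique(dois) -> None:
    """Raise ValueError if any DOI appears twice, ignoring case (hypothetical helper)."""
    seen = set()
    for doi in dois:
        key = doi.lower()
        if key in seen:
            raise ValueError(f"duplicate DOI: {doi}")
        seen.add(key)


# Same prefix, different suffixes: this passes the check.
dois = [make_doi(PREFIX, "0987yhy"), make_doi(PREFIX, "edfthkjjjoij")]
assert_unique(dois)
print(dois)
```

The check only looks at the list you give it; it does not ask Crossref whether a DOI has already been registered.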


I can’t seem to figure out what the difference is between dataset_type="collection" and dataset_type="record", because these terms aren’t defined anywhere in the schema. To my mind, all databases are inherently ‘collections’, especially relational databases. What value is there in creating a DOI for an individual database ‘record’ (like a line in a spreadsheet)? Neither the markup guide nor the XML example for datasets explains this distinction.

Also, what is the difference between the ‘Database Level’ and the ‘Dataset Level’? I see in the schema that a ‘dataset’ is contained in a ‘database’, but what do those terms actually mean for registering DOIs and how do they relate to actual databases? It looks like most of the metadata you can record under the ‘dataset’ tag matches what you can record in the ‘database_metadata’ tag, except for funding, format, and citations.
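
To make the nesting I’m describing concrete, here is a rough Python sketch of how I read the structure: one ‘database’ element holding a ‘database_metadata’ block plus one or more ‘dataset’ elements, each with its own dataset_type. It is not a valid deposit file. The child element names (titles, title, doi_data, doi, resource) are my assumption and should be checked against the example XML, the outer batch wrapper and namespaces are left out, and the function name, DOIs, and URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def build_database(db_title, datasets):
    """Build a <database> element containing one <dataset> per item (illustrative only)."""
    database = ET.Element("database")

    # Database level: metadata describing the database as a whole.
    meta = ET.SubElement(database, "database_metadata")
    meta_titles = ET.SubElement(meta, "titles")
    ET.SubElement(meta_titles, "title").text = db_title

    # Dataset level: each item is nested inside the database and gets its own DOI.
    for item in datasets:
        ds = ET.SubElement(database, "dataset", dataset_type=item["type"])
        ds_titles = ET.SubElement(ds, "titles")
        ET.SubElement(ds_titles, "title").text = item["title"]
        doi_data = ET.SubElement(ds, "doi_data")
        ET.SubElement(doi_data, "doi").text = item["doi"]
        ET.SubElement(doi_data, "resource").text = item["url"]
    return database


root = build_database(
    "Example University Data Repository",  # placeholder title
    [
        {"type": "record", "title": "Survey results 2020",
         "doi": "10.12345/0987yhy", "url": "https://example.org/data/1"},
        {"type": "collection", "title": "All survey results",
         "doi": "10.12345/edfthkjjjoij", "url": "https://example.org/data/all"},
    ],
)
print(ET.tostring(root, encoding="unicode"))
```

This only shows where the two levels sit relative to each other; it doesn’t settle what ‘record’ and ‘collection’ are supposed to mean.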