New and existing members often have questions about how to construct their DOI suffixes. As a reminder, Crossref doesn’t assign DOIs to your content for you. Instead, we assign you a DOI prefix (a string of numbers beginning “10.” and followed by a slash “/”) and you assign your own DOIs by using that prefix and creating a suffix (any string of numbers, letters, and/or certain allowed punctuation marks following the prefix and “/”). Your DOI suffix can be any alphanumeric string, using the approved characters “a-z”, “A-Z”, “0-9” and “-._;()/”.
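As a rough illustration of that approved character list, here is a minimal Python sketch that reports any suffix characters outside it. The function name `invalid_suffix_chars` and the simple split on the first “/” are assumptions made for this example. It is not how Crossref's deposit system performs its own validation.

```python
import string

# Approved suffix characters as listed above: a-z, A-Z, 0-9 and -._;()/
APPROVED = set(string.ascii_letters + string.digits + "-._;()/")


def invalid_suffix_chars(doi: str) -> list[str]:
    """Return the suffix characters that fall outside the approved list.

    Hypothetical helper for illustration: everything after the first "/"
    is treated as the suffix.
    """
    prefix, slash, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not slash or not suffix:
        raise ValueError(f"expected a DOI of the form 10.xxxx/suffix, got {doi!r}")
    return [ch for ch in suffix if ch not in APPROVED]


if __name__ == "__main__":
    print(invalid_suffix_chars("10.5555/abc_123"))
    print(invalid_suffix_chars("10.5555/abc&123"))
```

An empty list means every character in the suffix is on the approved list. Anything returned is a character the suffix should not contain.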
To review our suggested best practices for establishing your DOI suffix pattern, please review the following:
Occasionally members will attempt to register a DOI that contains characters outside that list of approved characters. Something like a < or > or an & is clearly outside of that approved list of characters, but recently our support team received a question about the hyphen in our approved character list. Which one was it?
A member was attempting to register a new DOI and was receiving an error message like this:
<record_diagnostic status="Failure">
<doi>10.5555/u743z‐04‐ii</doi>
<msg>DOI: 10.5555/u743z‐04‐ii , contains invalid characters</msg>
</record_diagnostic>
That error message is familiar to us on the technical support team - we see it for everything from blank spaces to asterisks - but what was so wrong with DOI: 10.5555/u743z‐04‐ii?
Yeah, it was the hyphens!
Unicode view of the disallowed hyphens U+2011 in the suffix of DOI 10.5555/u743z‐04‐ii
U+0031 : DIGIT ONE
U+0030 : DIGIT ZERO
U+002E : FULL STOP {period, dot, decimal point}
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+002F : SOLIDUS {slash, forward slash, virgule}
U+0075 : LATIN SMALL LETTER U
U+0037 : DIGIT SEVEN
U+0034 : DIGIT FOUR
U+0033 : DIGIT THREE
U+007A : LATIN SMALL LETTER Z
U+2011 : NON-BREAKING HYPHEN
U+0030 : DIGIT ZERO
U+0034 : DIGIT FOUR
U+2011 : NON-BREAKING HYPHEN
U+0069 : LATIN SMALL LETTER I
U+0069 : LATIN SMALL LETTER I
Unicode view of that DOI 10.5555/u743z-04-ii corrected to use approved hyphen U+002D
U+0031 : DIGIT ONE
U+0030 : DIGIT ZERO
U+002E : FULL STOP {period, dot, decimal point}
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+0035 : DIGIT FIVE
U+002F : SOLIDUS {slash, forward slash, virgule}
U+0075 : LATIN SMALL LETTER U
U+0037 : DIGIT SEVEN
U+0034 : DIGIT FOUR
U+0033 : DIGIT THREE
U+007A : LATIN SMALL LETTER Z
U+002D : HYPHEN-MINUS {hyphen, dash; minus sign}
U+0030 : DIGIT ZERO
U+0034 : DIGIT FOUR
U+002D : HYPHEN-MINUS {hyphen, dash; minus sign}
U+0069 : LATIN SMALL LETTER I
U+0069 : LATIN SMALL LETTER I
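If you want to produce a similar character-by-character view of your own DOIs, the following Python sketch uses the standard library's `unicodedata` module. The function name `unicode_view` is an assumption for this example, and the output contains only the formal Unicode character name, without the informal aliases shown in braces above.

```python
import unicodedata


def unicode_view(text: str) -> list[str]:
    """Return one 'U+XXXX : CHARACTER NAME' line per character in text."""
    return [
        # unicodedata.name() falls back to the given default for unnamed characters
        f"U+{ord(ch):04X} : {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
    ]


if __name__ == "__main__":
    # \u2011 is written as an escape so the non-breaking hyphen is visible in the source
    for line in unicode_view("10.5555/u743z\u201104\u2011ii"):
        print(line)
```

Writing suspicious characters as escapes such as `\u2011` in test data makes it harder to confuse them with the hyphen-minus on screen.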
The member was using a non-breaking hyphen (U+2011) instead of a hyphen-minus (U+002D). The only approved hyphen for use in a DOI suffix is the hyphen-minus (U+002D).
As a result, we added this note to our documentation for DOI suffixes: the non-breaking hyphen (U+2011), figure dash (U+2012), en dash (U+2013), and em dash (U+2014) are not approved characters. The only approved hyphen is the hyphen-minus (U+002D).
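If these characters tend to creep into your suffixes from word processors or spreadsheets, one option is to replace them before you build your deposit. The Python sketch below is only an illustration: the name `normalize_hyphens` is hypothetical, the mapping covers just the four characters named in the note above, and whether to replace such characters silently or flag them for review is a decision for your own workflow.

```python
# The four lookalike characters named in the documentation note,
# each mapped to the approved hyphen-minus (U+002D).
LOOKALIKE_HYPHENS = {
    0x2011: "-",  # non-breaking hyphen
    0x2012: "-",  # figure dash
    0x2013: "-",  # en dash
    0x2014: "-",  # em dash
}


def normalize_hyphens(doi: str) -> str:
    """Replace the lookalike hyphens and dashes with hyphen-minus."""
    return doi.translate(LOOKALIKE_HYPHENS)


if __name__ == "__main__":
    fixed = normalize_hyphens("10.5555/u743z\u201104\u2011ii")
    print(fixed)
```

Run this only on DOIs that have not been registered yet, since changing any character in a suffix produces a different DOI.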
With many of us using different types of keyboards in different areas of the world, it’s important to understand the differences between these Unicode characters and the errors that may result from their use.
Thanks for reading,
Isaac