I noticed a very minor data discrepancy in the ISSN of a publication between the JSON and XML endpoints. In the latter the ISSN is missing the hyphen, so it appears as eight digits run together.
And I was wondering if this was a one-off or symptomatic of a wider issue - perhaps with that publisher? Here is another DOI exhibiting the same issue:
Thanks for your question.
The XML reflects the ISSN exactly how the publisher submitted it to us. We accept them with or without the hyphen. They’re treated exactly the same either way.
When the XML is processed for indexing in our REST API (JSON), the ISSNs are normalized so they all include the hyphen, even if the publisher didn’t use the hyphen in the metadata they submitted to us. It’s just for consistency. There’s no difference in meaning.
OK, good to know, then we can do the same when pulling from the XML endpoint - many thanks for the rapid answer!
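For anyone wanting to do the same on the XML side, here is a minimal sketch of that kind of normalisation. It is an assumption about what "include the hyphen" amounts to, not Crossref's actual code: the function name `normalize_issn` and the regular expression are hypothetical, and it only checks the shape of the value (eight characters, the last of which may be `X`), not the check digit.

```python
import re

# Hypothetical pattern: four digits, an optional hyphen, three digits,
# then a final digit or the check character X.
_ISSN_PATTERN = re.compile(r"(\d{4})-?(\d{3}[\dX])")


def normalize_issn(raw: str) -> str:
    """Return the ISSN in hyphenated NNNN-NNNC form.

    Accepts input with or without the hyphen. Raises ValueError if the
    value does not look like an ISSN at all.
    """
    candidate = raw.strip().upper()
    match = _ISSN_PATTERN.fullmatch(candidate)
    if match is None:
        raise ValueError(f"not an ISSN: {raw!r}")
    return f"{match.group(1)}-{match.group(2)}"
```

Calling `normalize_issn("12345679")` and `normalize_issn("1234-5679")` should both give `1234-5679`, so values pulled from the XML can be compared directly with those in the JSON. The ISSN used here is an illustrative placeholder, not one taken from this thread.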
As a follow-up, do you have a list of fields where you apply similar normalisations? Mainly it would help us catch where we can also do this, rather than switch over our current implementation to the JSON endpoint.
Hi, I’m still looking into this but wanted to share some preliminary findings. Unfortunately we don’t have any user-friendly documentation about what normalization we do for the JSON output. As I look through the codebase, it looks to me like ISSNs are the only metadata field that we currently normalize.
However, we are in the process of standing up a new internal data model at Crossref, and many more metadata fields are planned for normalization. Here is the issue for that work, and here are the tests/specs. Be advised that this work is still pending and the specification is subject to change.
Also, I know you said that you are trying to avoid having to switch to the REST JSON API, but you may want to reconsider that approach.
First, as you have already noted, you will continually have to apply normalisations to the XML to match the normalisations applied to the JSON.
The XML is mostly just going to represent what the member registered with us. The REST API and JSON, on the other hand, will increasingly include:
- additional metadata from Crossref and other sources
- additional metadata types that are not registered via XML
- normalisations (as with the ISSN) to help make the metadata more usable (e.g. by citation formatters, etc)
- access to non-work metadata and functionality (e.g. member data, submissions status information, billing data, etc)
For example, our recently announced opening of the RetractionWatch data will only ever be made available via the REST API. Our upcoming enhanced relationships support will also only be available via the REST API.
Finally, it is also worth noting that the REST API follows the “be conservative in what you send, be liberal in what you accept” principle. So, even though the REST API represents the ISSN with the hyphen according to the ISSN guidelines, you can search and filter for ISSNs with or without the hyphen. For example:
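As a rough sketch of what such a query could look like from a client, the snippet below builds two request URLs for the same journal, one with the hyphen and one without. It assumes the public `https://api.crossref.org/works` route with an `issn` filter; the helper name `works_by_issn_url` and the ISSN itself are illustrative placeholders, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Assumed base URL of the public Crossref REST API.
API_BASE = "https://api.crossref.org"


def works_by_issn_url(issn: str, rows: int = 5) -> str:
    """Build a /works query URL filtered by ISSN (hypothetical helper)."""
    query = urlencode({"filter": f"issn:{issn}", "rows": rows})
    return f"{API_BASE}/works?{query}"


# Placeholder ISSN, written both ways. Per the reply above, the API is
# expected to treat these two filters the same.
with_hyphen = works_by_issn_url("1234-5679")
without_hyphen = works_by_issn_url("12345679")
```

Fetching both URLs with any HTTP client and comparing the result counts would be a quick way to confirm the behaviour for a journal you care about.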
Many thanks for the detailed response - I will feed it back to the team and advocate that we switch over to capture the full benefits of the data you provide.
I appreciate the detailed answer and links - many thanks (and apologies for the delayed response, I was out of office for the last few weeks).