OAI-PMH ListRecords returns items with broken encoding


I’m getting results with broken UTF-8 encoding when performing a ListRecords request on the OAI-PMH endpoint with the set J:10.1007:753.


Here is a test showing the problem using command line tools:

Command (set CROSSREF_TOKEN to your authorization token and replace the placeholder with the actual host name, which the forum software won’t let me insert):

curl --silent -H "Authorization: Bearer $CROSSREF_TOKEN" "https://[HOST NAME HERE]/oai?verb=ListRecords&set=J:10.1007:753" | xml_grep '//title' | grep Riemanns

Output (I’ve inserted <?> where you should see a broken character):

Riemanns fr<?>he Notizen zum Mannigfaltigkeitsbegriff und zu den Grundlagen der Geometrie

Thanks for your help!

Hi, and thanks for your post.

OAI-PMH is working correctly in this case. The broken character came from the metadata supplied to us by the publisher when they registered 10.1007/bf00327859.
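
If you want to check this kind of thing on your side, one rough way to separate a transport encoding problem from a character that is already broken in the stored metadata is to look at the raw bytes of the response. The sketch below is only an illustration: the function name `diagnose` and the sample byte strings are made up for this example (including the guess that the intended word is "frühe"), and it is not part of any Crossref tooling. The idea is that a response that fails to decode as UTF-8 points to an encoding problem in delivery, while a response that decodes cleanly but contains U+FFFD (the Unicode replacement character) means the broken character is part of the data itself.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def diagnose(raw: bytes) -> str:
    """Classify raw response bytes (hypothetical helper for illustration)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8: the encoding was broken in transit
        # or the payload was mislabeled.
        return "invalid-utf8"
    if REPLACEMENT_CHAR in text:
        # Valid UTF-8, but the data itself contains U+FFFD: the character
        # was already lost before the record was served.
        return "replacement-char-in-data"
    return "clean"


# Illustrative samples, not real API responses.
samples = {
    "stored replacement character": "Riemanns fr\ufffdhe Notizen".encode("utf-8"),
    "Latin-1 bytes served as UTF-8": b"Riemanns fr\xfche Notizen",
    "correct UTF-8": "Riemanns frühe Notizen".encode("utf-8"),
}

for label, raw in samples.items():
    print(f"{label}: {diagnose(raw)}")
```

With the record discussed here, you would expect the second outcome (valid UTF-8 containing U+FFFD), which is consistent with the character having been lost before the metadata reached us.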

You can see the same replacement character if you query 10.1007/bf00327859 in the API or metadata search as well.

So, it’s not a problem with OAI-PMH. It’s just a metadata quality issue. Because Crossref can’t supply or update bibliographic metadata directly (it always has to come from the publisher or an organization working on the publisher’s behalf), what we can do in these situations is pass your concerns along to our contacts at the relevant publisher and ask them to submit a corrected metadata record for that item. In this case, the publisher is Springer Nature. You may wish to contact them directly as well, for the fastest possible response.