Encoding when querying DOI

slCr · 22 July 2023 11:46

When I query some DOIs to retrieve the publication text (Crossref API URL + DOI + /transform/text/plain), I sometimes get strange characters. For example, this happens with DOIs 10.1145%2F3544548.3580875 and 10.1016%2Fj.inffus.2006.10.007.

It seems to be an encoding problem. Can this be solved? Any tips are welcome.

Shayn · 22 July 2023 22:58

Hello, and thanks for your post.

Yes, it’s an encoding problem. The sequence â€“ is what the UTF-8-encoded character – (an en dash) looks like when its bytes are decoded as ISO-8859-1 (strictly speaking, as Windows-1252, which browsers use in place of ISO-8859-1). You’ll see similar garbled sequences for other characters, such as apostrophes. So whatever context you’re viewing that text in is probably not decoding it as UTF-8, or at least isn’t doing so by default.
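A short Python illustration of the mechanism described above: encoding – and ’ as UTF-8 and then decoding those bytes as Windows-1252 (the codec browsers actually apply to text labelled ISO-8859-1) reproduces the garbled sequences, and reversing the two steps recovers the original characters. The variable names are purely illustrative.

```python
# Illustrative only: reproduce mojibake by decoding UTF-8 bytes with the wrong codec.
en_dash = "\u2013"       # – (en dash)
apostrophe = "\u2019"    # ’ (right single quotation mark)

# UTF-8 turns the en dash into three bytes: E2 80 93.
dash_bytes = en_dash.encode("utf-8")

# A viewer that assumes Windows-1252 maps each of those bytes to its own character.
garbled_dash = dash_bytes.decode("cp1252")
garbled_apostrophe = apostrophe.encode("utf-8").decode("cp1252")

# No bytes were lost, so re-encoding as cp1252 and decoding as UTF-8 repairs the text.
repaired_dash = garbled_dash.encode("cp1252").decode("utf-8")

print(garbled_dash)              # â€“
print(garbled_apostrophe)        # â€™
print(repaired_dash == en_dash)  # True
```

Repairing after the fact only works while the bytes are intact; the cleaner fix is to decode the response as UTF-8 in the first place.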

"When I query some DOIs to retrieve the publication text (API of Crossref URL + DOI + /transform/text/plain)"

Just to clarify, that query will not retrieve the publication text. It will only retrieve the bibliographic metadata, of the sort that you would find in a formatted citation.

Can you tell me more about the method (script, program, client?) you’re using to make these queries and how you’re processing the results? Once I know that, I can suggest possible solutions or ask my colleagues on our technical team to do so.

Thanks,
Shayn

slCr · 23 July 2023 07:01

I’m just accessing it through a URL in the browser. Doing it that way is very useful in my workflow. Any suggestions are welcome!

Shayn · 24 July 2023 14:23

The API isn’t really meant to be accessed via a browser. It’s intended for programmatic use.
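For anyone who does want to move to programmatic access, here is a minimal Python sketch using only the standard library. It assumes the endpoint is https://api.crossref.org/works/{DOI}/transform/text/plain (the thread only says "Crossref URL + DOI + /transform/text/plain"), and transform_url and fetch_plain_citation are hypothetical helper names. The key step is decoding the response bytes explicitly as UTF-8 so en dashes and apostrophes come through intact; urllib.parse.quote with safe="" produces the percent-encoded DOI form (10.1145%2F...) seen in the original question.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL for single works in the Crossref REST API.
API_BASE = "https://api.crossref.org/works/"


def transform_url(doi: str) -> str:
    """Hypothetical helper: build the /transform/text/plain URL for a DOI.

    safe="" makes quote() percent-encode the "/" in the DOI as %2F.
    """
    return f"{API_BASE}{quote(doi, safe='')}/transform/text/plain"


def fetch_plain_citation(doi: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: fetch the formatted citation as text."""
    with urlopen(transform_url(doi), timeout=timeout) as response:
        # Decode explicitly as UTF-8 instead of relying on a guessed charset.
        return response.read().decode("utf-8")
```

For example, print(fetch_plain_citation("10.1145/3544548.3580875")) should print the formatted citation with its special characters intact, provided the machine has network access.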

If you want to query for DOI citations manually, you can use our Metadata Search Interface. Each result will have a link beneath it that says “Actions”. If you click “Actions” then “Cite”, and then select any citation style, you’ll get the same results as you do using the /transform/text/plain API query, but without the character encoding problems.

Alternatively, the Citation Formatter at https://citation.crosscite.org/ does the same thing, and it works for DataCite and mEDRA DOIs as well as Crossref DOIs.

If you prefer to use the API in the browser, the only workaround I’m aware of for getting those characters displayed properly is:

1. Save the page as an HTML file. (You will be prompted to save it as .txt; replace the .txt extension with .html manually.)
2. Open that HTML file in a text editor and add <meta charset="utf-8"> at the top of the file, before the citation text.
3. Save that change, then re-open the edited file in your browser.
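The manual steps above can also be scripted. This minimal Python sketch assumes the browser saved the API’s UTF-8 bytes unchanged into a .txt file; citation_to_html and convert_saved_txt are hypothetical names. It prepends the same <meta charset="utf-8"> declaration, writes an .html file beside the original, and additionally escapes characters such as & and < so they are not read as markup.

```python
import html
from pathlib import Path


def citation_to_html(citation: str) -> str:
    """Wrap plain citation text in a minimal page that declares UTF-8.

    html.escape keeps characters such as < and & in the citation from being
    treated as markup (a small step beyond the manual workaround).
    """
    return '<meta charset="utf-8">\n<p>' + html.escape(citation) + "</p>\n"


def convert_saved_txt(txt_path: str) -> Path:
    """Hypothetical helper: turn a citation saved from the browser as .txt
    into an .html file next to it, ready to re-open in the browser."""
    source = Path(txt_path)
    # Assumption: the saved file holds the API's UTF-8 bytes unchanged.
    citation = source.read_text(encoding="utf-8")
    target = source.with_suffix(".html")
    # Write UTF-8 so the bytes match the <meta charset> declaration.
    target.write_text(citation_to_html(citation), encoding="utf-8")
    return target
```

Calling convert_saved_txt("citation.txt") writes citation.html in the same folder, which can then be opened in the browser.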

And finally, if you’d like to get familiar with programmatic API querying in a relatively user-friendly environment, I’d recommend this webinar recording showing how to use the Crossref API in the Postman client.

Related topics

- Question about characters in DOI suffixes (Metadata Retrieval)
- Citation DOIs not accepted in XML upload (Content Registration)
- OAI-PMH ListRecords returns items with broken encoding (Technical Support)
- Fix for querying DOIs with non-ASCII characters (Metadata Retrieval)
- "Cite" functionality in Crossref Metadata Search not working (Technical Support)