When I query some DOIs to retrieve the publication text (Crossref API URL + DOI + /transform/text/plain), I sometimes get strange characters. For example, this happens with DOIs 10.1145%2F3544548.3580875 and 10.1016%2Fj.inffus.2006.10.007.
It seems to be an encoding problem. Can this be solved? Any tips are welcome.
Yes, it is due to encoding. "â€“" is the ISO-8859-1 version of the UTF-8 encoded en dash (–). You'll see similar encoding presentations for other characters, for example apostrophes. So it's likely that whatever context you're accessing that text in is not equipped to handle UTF-8 encoding, or at least isn't doing so by default.
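The effect can be reproduced in a few lines of Python. This is only an illustrative sketch: the page range is a made-up value, and it assumes the Windows-1252 codec (`cp1252`), which is what browsers actually apply when a page is labelled ISO-8859-1.

```python
# Illustrative value only: a page range containing an en dash (U+2013).
text = "143\u2013156"

# The API sends the text as UTF-8 bytes; the en dash becomes three bytes.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes with the wrong charset yields one character per byte,
# which is where the "strange characters" come from.
garbled = utf8_bytes.decode("cp1252")

# As long as nothing was lost, the damage can be reversed by undoing
# the wrong decode and then decoding as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired)
```

The round trip at the end only works when every byte maps to a character in the wrong charset; a few byte values are undefined in Windows-1252, so it is safer to decode correctly in the first place than to repair afterwards.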
When I query some DOIs to retrieve the publication text (API of Crossref URL + DOI + /transform/text/plain),
Just to clarify, that query will not retrieve the publication text. It will only retrieve the bibliographic metadata, of the sort that you would find in a formatted citation.
Can you tell me more about the method (script, program, client?) you're using to make these queries and how you're processing the results? Once I know that, I can suggest possible solutions or ask my colleagues on our technical team to do so.
The API isn't really meant to be accessed via a browser. It's intended for programmatic use.
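For anyone scripting this, here is a minimal sketch of a programmatic query using only the Python standard library. It assumes the "Crossref URL" from the question is `https://api.crossref.org/works`, and the function names are hypothetical helpers, not part of any Crossref client. The key step is decoding the response bytes explicitly as UTF-8 rather than relying on a default.

```python
import urllib.parse
import urllib.request

# Assumption: base URL of the Crossref REST API "works" route.
API_BASE = "https://api.crossref.org/works"


def build_url(doi: str) -> str:
    """Build the text/plain transform URL, percent-encoding the DOI (/ becomes %2F)."""
    encoded = urllib.parse.quote(doi, safe="")
    return f"{API_BASE}/{encoded}/transform/text/plain"


def decode_citation(raw: bytes) -> str:
    """Decode the response body explicitly as UTF-8."""
    return raw.decode("utf-8")


def fetch_citation(doi: str, timeout: float = 10.0) -> str:
    """Fetch the formatted citation for a DOI (performs a network request)."""
    with urllib.request.urlopen(build_url(doi), timeout=timeout) as response:
        return decode_citation(response.read())
```

Calling `fetch_citation("10.1145/3544548.3580875")` should then return the citation as a normal Python string, with dashes and apostrophes intact, provided the terminal or file you print it to is also using UTF-8.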
If you want to query for DOI citations manually, you can use our Metadata Search Interface. Each result will have a link beneath it that says "Actions". If you click "Actions" then "Cite", and then select any citation style, you'll get the same results as you do using the /transform/text/plain API query, but without the character encoding problems.
Alternatively, the Citation Formatter at https://citation.crosscite.org/ will do the same thing, but for DataCite and mEDRA DOIs, in addition to Crossref DOIs.
If you prefer to use the API in the browser, the only workaround that I'm aware of to get those characters encoded properly is:
save the page as an html file (you will be prompted to save it as .txt, but replace the .txt extension with .html manually)
open that html file in a text editor, and add <meta charset="utf-8"> at the top of the file, before the citation text.
save that change and then re-open the edited file in your browser.
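If you need to do this for more than a couple of citations, the same three steps could be scripted. The sketch below assumes the page was saved to disk exactly as the API returned it (UTF-8 bytes); the function names and file paths are hypothetical.

```python
from pathlib import Path

# The declaration from the steps above, as bytes so the file is never re-encoded.
META_TAG = b'<meta charset="utf-8">\n'


def add_charset_declaration(raw: bytes) -> bytes:
    """Prepend the UTF-8 charset declaration unless the file already starts with one."""
    if raw.lstrip().lower().startswith(b"<meta charset"):
        return raw
    return META_TAG + raw


def convert_saved_page(src: str, dest: str) -> None:
    """Read the saved .txt file and write an .html copy with the declaration on top."""
    Path(dest).write_bytes(add_charset_declaration(Path(src).read_bytes()))
```

For example, `convert_saved_page("citation.txt", "citation.html")` would produce a file that the browser opens as UTF-8.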
And, finally, if you'd like to familiarize yourself with programmatic API querying in a relatively user-friendly environment, I'd recommend this webinar recording showing how to use the Crossref API in the Postman client.