I have been using CrossRef for some time now, mostly for linking journal article DOIs to the DOIs of their references in order to build a graph for exploring a research field.
I started to look into more aspects of the API and this is my current use case:
If I find an interesting article by a certain author, I often like to find more works by that author.
This is easy to do if you have the author’s ORCID, but here is the catch: 92% of all works in CrossRef are attributed to authors without an ORCID. This figure can be inferred by comparing the total-results of these API calls: “/works?rows=0” and “/works?filter=has-orcid:true&rows=0”.
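For anyone who wants to reproduce that comparison, here is a minimal Python sketch. It is an illustration rather than an official client: the helper names are hypothetical, and it assumes the public endpoint https://api.crossref.org/works and the message.total-results field of its JSON response.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works"  # assumed public endpoint


def count_url(filters=None):
    """Build a /works URL that asks only for the total count (rows=0)."""
    params = {"rows": 0}
    if filters:
        params["filter"] = filters
    # Keep ':' and ',' readable, since filter values use them as separators.
    return API + "?" + urllib.parse.urlencode(params, safe=":,")


def total_results(url):
    """Fetch a /works URL and read message.total-results (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]["total-results"]


def share_without_orcid(total, with_orcid):
    """Percentage of works that carry no ORCID at all."""
    return 100.0 * (total - with_orcid) / total
```

Calling total_results(count_url()) and total_results(count_url("has-orcid:true")) and passing both numbers to share_without_orcid should give the share of works without any ORCID at the time of the request.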
How do you find works of a specific author when you use query.author= without going through 10k plus results? Even if I locate the correct author name, how do I filter my results to include works of only that author?
How can we refine the query so it will match the string in the query.author exactly and not all matching strings?
Would it be useful to combine query.author with other queries such as query.affiliation to narrow down our results?
The best way is to use query.author, but this returns results for every name entered in the search.
From here there is no way to return only the exact matches unless it is done programmatically. The API returns the highest-scoring results first, that is, the results that match the most terms from the query. So you might find that the results you need are all on the first page and that the remaining results are only slight matches to the query.
I don’t think adding query.affiliation will help, as I believe it would run as a separate query rather than an “AND” query.
Can you send me some examples so that we can have a look at them and see how we might be able to improve the results?
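As a rough idea of what filtering programmatically could look like, here is a small Python sketch that keeps only works where at least one author matches a family name exactly and a given-name prefix. It assumes each returned item carries an author list with "given" and "family" keys; the function names and the sample records (including the DOIs) are made up for illustration.

```python
def normalize(name):
    """Lowercase and drop periods and extra spaces, so 'Jane A.' becomes 'jane a'."""
    return " ".join(name.replace(".", " ").lower().split())


def has_author(item, family, given_prefix=""):
    """True if any author has exactly this family name and a given name
    that starts with given_prefix."""
    for author in item.get("author", []):
        if normalize(author.get("family", "")) != normalize(family):
            continue
        if normalize(author.get("given", "")).startswith(normalize(given_prefix)):
            return True
    return False


def exact_author_matches(items, family, given_prefix=""):
    """Filter a list of work items down to those with a matching author."""
    return [item for item in items if has_author(item, family, given_prefix)]


# Made-up records standing in for the "items" of a query.author response.
sample = [
    {"DOI": "10.0000/example-1", "author": [{"given": "Jane A.", "family": "Doe"}]},
    {"DOI": "10.0000/example-2", "author": [{"given": "John", "family": "Doe"}]},
    {"DOI": "10.0000/example-3", "author": [{"given": "Jane", "family": "Doering"}]},
    {"DOI": "10.0000/example-4"},  # no author metadata at all
]
matches = exact_author_matches(sample, "Doe", "Jane")
```

This only narrows results by name string. Two different people with the same name will still both pass the filter, so it does not replace an identifier such as an ORCID.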
I am looking for the author named Pathak (Arjun K.) who worked for Ames Laboratory on materials research until 2019. The author does not have an ORCID in CrossRef for articles published before 2019.
This leaves me between a rock and a hard place, as I can neither perform an exact query on surname and initials nor add a filter on affiliation. Therefore, I seem unable to find only the works published by a single author in CrossRef.
I retrieved the author’s ORCID from ORCID: 0000-0003-0690-7300 and now I can query all the author’s articles published after 2017 with the ORCID in CrossRef as so:
Yes, you are correct; I now realise you can use multiple field queries. However, you are still performing a string query search, which can never be 100% accurate. This is why we have been recommending ORCiD iDs within metadata for a while now.
It is really hard to return exact results when author names are entered as free-text strings. If you try your query without the sort section, it will default back to sorting by score, so the most accurate matches are at the top of the list. If you also add score to the select field, you can monitor the score and see where it drops off; that is probably where the matches you don’t want begin.
The score element is a welcome addition; I had not seen or used it before.
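The score drop-off idea could be sketched in Python roughly as follows. The URL builder combines query.author and query.affiliation and asks for the score via select, leaving out sort so results come back in score order. The helper names and the 0.5 threshold are assumptions chosen for illustration, not anything the API prescribes.

```python
import urllib.parse

API = "https://api.crossref.org/works"  # assumed public endpoint


def author_query_url(author, affiliation=None, rows=100):
    """Build a field query that also returns each result's score."""
    params = {"query.author": author}
    if affiliation:
        params["query.affiliation"] = affiliation
    params["select"] = "DOI,title,author,score"
    params["rows"] = rows
    return API + "?" + urllib.parse.urlencode(params, safe=",")


def score_cutoff(scores, min_ratio=0.5):
    """Given scores in descending order, return the index of the first result
    whose score is less than min_ratio times the previous one. If there is no
    such drop, return len(scores), meaning every result is kept."""
    for i in range(1, len(scores)):
        if scores[i] < min_ratio * scores[i - 1]:
            return i
    return len(scores)


url = author_query_url("Arjun K. Pathak", "Ames Laboratory", rows=50)
```

Slicing the returned items with items[:score_cutoff(scores)] would then keep only the results before the first sharp drop. The right threshold depends on the query, so it is worth looking at the scores by eye first.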
This is why we have been recommending ORCiD iDs within metadata for a while now.
From a REST API user’s perspective I welcome the same, but I feel limited in what I can retrieve with an ORCID in CrossRef metadata, because most works are not associated with an ORCID yet.
From what I can infer from the metadata in the API, ORCIDs have been deposited in CrossRef since 2012.
My scope is journal articles, so I was interested to see the adoption rate of ORCIDs for all works published since 2012. Below are charts of my findings:
[Chart: Cumulative number of articles]
Of all journal articles published since 2012, 20% have at least one ORCID.
[Chart: Articles per year]
However, the number of articles published per year in CrossRef is increasing, and so is the percentage of articles with an ORCID. In 2022, 43% of the articles published that year had at least one ORCID.
Over time, more ORCIDs are also being registered for older published works.
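A per-year breakdown like this can be put together from count-only requests (rows=0) with one filter string per year. The Python sketch below shows one way to build those filter strings and turn the resulting counts into percentages. It assumes the type, from-pub-date, until-pub-date and has-orcid filters; the function names are hypothetical.

```python
def year_filter(year, with_orcid=False):
    """Filter string for journal articles published in a given calendar year."""
    parts = [
        "type:journal-article",
        f"from-pub-date:{year}-01-01",
        f"until-pub-date:{year}-12-31",
    ]
    if with_orcid:
        parts.append("has-orcid:true")
    return ",".join(parts)


def adoption_by_year(counts):
    """counts maps year -> (total, with_orcid), both taken from total-results.
    Returns year -> percentage of articles with at least one ORCID."""
    return {
        year: round(100.0 * with_orcid / total, 1)
        for year, (total, with_orcid) in sorted(counts.items())
        if total  # skip years with no articles to avoid dividing by zero
    }
```

Each year needs two requests, one with year_filter(year) and one with year_filter(year, with_orcid=True). Because ORCIDs keep being added to older records, the counts will shift depending on when the requests are made.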