Ambiguities in the FundRef schema

I’m in the process of building a new submission and publishing system (sort of like OJS, but tailored specifically to LaTeX content for science and engineering). We want authors to encode their funding information into their publications using unique identifiers for authors, affiliations, and funding agencies. The Funder Registry is quite useful for this, but I have identified a few ambiguities in the way FundRef information is specified. Some of my comments could be addressed by better documentation, but future versions of the schema might wish to address the other issues.

  1. there are few examples of how to encode funding information. The documentation says you can do it as part of Crossmark, and best-practice-examples / journal.article5.3.0.xml has it as part of Crossmark. This seems odd to me, since the stated purpose of Crossmark seems orthogonal to the purpose of funding information. The schema says “If a DOI is not participating in CrossMark, FundRef data may be deposited as part of the <journal_article> metadata.” but unfortunately there is no example, and I cannot find any reference to it in the <journal_article> element in 5.3.1. This should be clarified (a sketch of the markup I have in mind follows this list). The recent poster by de Jonge and Kramer at Live22, “The availability and completeness of funder metadata in Crossref”, found that many publishers are not reporting funding information, and I think that could be due in part to incomplete documentation.
  2. the funder registry appears to be hierarchical. If you look at 100000083, it’s the computer science directorate for NSF, but there are five subdirectorates listed there, including the Division of Computer and Network Systems. The funding agencies probably want to have grants identified at the finest level of detail possible. This is not really mentioned in the documentation.
  3. it seems that funding is linked to publications rather than authors, whereas affiliations are linked to authors. This is ambiguous, because it isn’t obvious how the funding relates to the individual authors of a publication. An author may have multiple funding sources just as they might have multiple affiliations, and multiple authors may share the same funding sources. These many-to-many relationships between authors and funding sources don’t seem to be reflected in the schema.
  4. the tools for authors or publishers to identify the IDs of their funding agencies are not very user-friendly. The CSV data for the FundRef registry only lists a single name, but the JSON actually provides many alternate names (and contains more than the RDF). I built a small web app that lets an author enter the name of their funding agency and search the registry to find the correct identifier - this seems to be lacking in the Crossref tools. I will probably open-source it independently, although we also plan to include it in our submission system. It uses advanced search features like stemming (not just autocomplete).
  5. some funding agencies are proxies for others, and this is not reflected in the Funder Registry. As an example, the American Institute of Mathematics is listed, but it is funded by the Mathematical Sciences Directorate of NSF, as well as the NSF Directorate for STEM Education and others. Whose responsibility is it to properly identify the indirect funding agencies: the American Institute of Mathematics, NSF, or the author?
  6. some funding agencies are not identified in the FundRef registry at all. An example is the Mathematical Sciences Research Institute (MSRI). Is it possible to report funding from this agency given that it lacks an identifier?
  7. in the Crossref API, funding agencies have a “location” field that appears to be just a country, although some have a finer-grained location (an example is 100000900). The “location” field should probably be renamed to “country” or made more fine-grained.
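
For item 1, here is roughly the markup I have in mind, assembled in Python for clarity. This is a sketch based on my reading of the fundref.xsd schema, not a validated deposit: as I understand it, the `<fr:program>` block goes either inside `<crossmark>`/`<custom_metadata>` or, per the quoted note, directly in the `<journal_article>` metadata. The award number below is a made-up placeholder.

```python
# Sketch only: build the FundRef <fr:program> block that carries funder name,
# funder identifier, and award number for one funding source.
import xml.etree.ElementTree as ET

FR = "http://www.crossref.org/fundref.xsd"
ET.register_namespace("fr", FR)

program = ET.Element(f"{{{FR}}}program", {"name": "fundref"})
fundgroup = ET.SubElement(program, f"{{{FR}}}assertion", {"name": "fundgroup"})

funder_name = ET.SubElement(fundgroup, f"{{{FR}}}assertion", {"name": "funder_name"})
funder_name.text = "National Science Foundation"
funder_id = ET.SubElement(funder_name, f"{{{FR}}}assertion", {"name": "funder_identifier"})
funder_id.text = "https://doi.org/10.13039/100000001"  # registry ID for NSF

award = ET.SubElement(fundgroup, f"{{{FR}}}assertion", {"name": "award_number"})
award.text = "ABC-1234567"  # hypothetical award number, placeholder only

print(ET.tostring(program, encoding="unicode"))
```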

I’m going to defer to our product team regarding the Funder Registry, as they currently maintain it, but I can comment on some of your questions:

there are few examples of how to encode funding information. The documentation says you can do it as part of Crossmark, and best-practice-examples / journal.article5.3.0.xml has it as part of Crossmark.

the funder registry appears to be hierarchical. If you look at 100000083, it’s the computer science directorate for NSF, but there are five subdirectorates listed there, including the Division of Computer and Network Systems. The funding agencies probably want to have grants identified at the finest level of detail possible. This is not really mentioned in the documentation.

Thanks for this feedback. I think the markup in particular was documented at some point, but our documentation has gone through a lot of revision since we started collecting funding data and could use some cleaning up - I’ll see what we can do to make this clearer.

it seems that funding is linked to publications rather than authors, whereas affiliations are linked to authors

This is correct - the scope of the funding data we collect in DOI records is to connect research outputs to funders. It’s a bit crude, but it provides basic but important information about which articles and other outputs result from which funding. Connecting authors directly with funding is beyond the scope of the funding data we collect, and is probably beyond the scope of what we collect in general (for now at least) - our records describe publications, grants, and other digital objects, not people.

That said, we do collect investigator details within grant records, and part of our ‘research nexus’ vision is to make these connections explicit. The metadata we collect will help enable direct person-to-funding connections in the future - probably not in Crossref metadata records themselves, but perhaps through ORCID iDs or relationships between grant records and other objects.
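
To make the current scope concrete, here is a rough sketch of what that looks like in the REST API today: a work record carries a flat funder list alongside, but not linked to, its author list. The DOI below is just a placeholder; substitute one you know carries funding data, and the field names are as I understand the current API.

```python
# Sketch: fetch a work record and print its funders and authors as the two
# separate, unconnected lists the record holds today.
import json, urllib.request

doi = "10.5555/12345678"  # placeholder DOI, replace with a real one
with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}") as resp:
    work = json.load(resp)["message"]

for f in work.get("funder", []):
    print("funder:", f.get("DOI"), f.get("name"), f.get("award", []))

for a in work.get("author", []):
    affs = [aff.get("name") for aff in a.get("affiliation", [])]
    print("author:", a.get("given"), a.get("family"), affs)
```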


Thanks Patricia, my addition on the funder registry side is:

  • the tools for authors or publishers to identify the IDs of their funding agencies are not very user-friendly. The CSV data for the FundRef registry only lists a single name, but the JSON actually provides many alternate names (and contains more than the RDF). I built a small web app that lets an author enter the name of their funding agency and search the registry to find the correct identifier - this seems to be lacking in the Crossref tools. I will probably open-source it independently, although we also plan to include it in our submission system. It uses advanced search features like stemming (not just autocomplete).

When we launched the Funder Registry we built a Labs/R&D ‘fundref’ widget. I think what we found was that service providers/manuscript submission systems said they preferred to build their own integrations, but the widget aimed to do some of the things you describe, e.g. querying the .json directly.
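
For anyone reading along, the kind of lookup the widget aimed at can also be sketched against the public /funders endpoint (illustrative only; the query string and row count here are arbitrary, and fields are as the API currently documents them):

```python
# Sketch: search the Funder Registry by name via the REST API and print IDs,
# primary names, and a few alternate names for each match.
import json, urllib.parse, urllib.request

query = urllib.parse.quote("national science foundation")
url = f"https://api.crossref.org/funders?query={query}&rows=5"
with urllib.request.urlopen(url) as resp:
    items = json.load(resp)["message"]["items"]

for f in items:
    print(f["id"], f["name"], f.get("alt-names", [])[:3])
```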

  • some funding agencies are proxies for others, and this is not reflected in the Funder Registry. As an example, the American Institute of Mathematics is listed, but it is funded by the Mathematical Sciences Directorate of NSF, as well as the NSF Directorate for STEM Education and others. Whose responsibility is it to properly identify the indirect funding agencies: the American Institute of Mathematics, NSF, or the author?

We maintain the registry in collaboration with the funding operations team at Elsevier. Changes to the registry come from a number of sources - sometimes requests come from the funder (often the case for multi-layered US agencies like NSF), sometimes from publishers, sometimes from researchers. We compile those every few weeks and send them to Elsevier, who evaluate them against the data they collect too. The changes and additions that are accepted are added to the registry roughly every six weeks.

  • some funding agencies are not identified in the FundRef registry at all. An example is the Mathematical Sciences Research Institute (MSRI). Is it possible to report funding from this agency given that it lacks an identifier?

We can request additional funders be added to the registry - shall we put in a request for MSRI? I know that most integrations do have an ‘other’ option, but matching a funder to its identifier helps us support matching/linking in the metadata rather than dealing with text strings. So if you have funders we should add, let me know and we can put in a request.

  • in the Crossref API, funding agencies have a “location” field that appears to be just a country, although some have a finer-grained location (an example is 100000900). The “location” field should probably be renamed to “country” or made more fine-grained.

Yes, that’s a good point. I’ve been working on the registry for a few months now and I’m not sure if this has been raised before, but I’ll pick it up and ask.


I saw the funding widget, but it has a bunch of problems, like a jQuery dependency. It also depends on the Crossref API, which doesn’t do search very well (it appears not to support stemming, but I may be biased since I worked at Google for 12 years). It also lacks the narrower value in the return, though there is apparently an undocumented descendants=true feature. I also couldn’t get pagination to work on the /funders endpoint. In the end I built my own search based on the RDF. Just for the record, RDF is a godawful™ format that nobody likes. Unfortunately the JSON and CSV formats lack some of the data.
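
For what it’s worth, the approach I ended up with is roughly the following. This is a toy sketch, not our production code: it assumes NLTK for the stemmer, and the two hard-coded rows stand in for names and alt-names loaded from the registry data.

```python
# Toy sketch of stemmed matching over primary and alternate funder names.
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")

def stems(text):
    return {stemmer.stem(tok) for tok in text.lower().replace(",", " ").split()}

# Placeholder rows; in practice these come from the registry dump.
funders = [
    ("100000001", "National Science Foundation", ["NSF"]),
    ("100000002", "National Institutes of Health", ["NIH"]),
]

def search(query):
    q = stems(query)
    for fid, name, alts in funders:
        indexed = stems(name) | {s for alt in alts for s in stems(alt)}
        if q <= indexed:  # every stemmed query token must appear in the index
            yield fid, name

print(list(search("science foundations")))  # plural still matches "Foundation"
print(list(search("NSF")))                  # alt-names are indexed too
```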

I don’t have any association with MSRI, other than having received funding decades ago. :neutral_face: They have a ROR ID (05hs5r386), a GRID ID, and an ISNI. NSF has assigned them the entity ID LWLJAPATKEL8 as part of their grant relationship. They are getting $5-7M per year from NSF and recently received an endowment grant of $70M from Simons/Laufer. Their omission seems notable, but I have no idea what else is missing from the registry.

I think this relates to my question of who “owns” a record in the funder registry, and who takes responsibility for creating and maintaining it. It’s mostly in the interest of the institution to have its funding recognized in publications, but that doesn’t mean it will do the work. A funder ID appears to require registration of a DOI under the 10.13039 prefix, which I assume Crossref owns. Anyone can register an ORCID iD, and there are a lot of bogus records; the funder registry is much smaller (currently 32k entries), so maybe a single person could vet the data.

Authors are being encouraged to record their grants, but it’s much better if they can use unique IDs for the grantor. If they don’t find an ID they may simply not mention the funding, so it’s a chicken-and-egg problem. We plan to also support ROR IDs on funding agencies in order to increase coverage. That’s particularly relevant for scientists who receive travel or visitor support but lack an affiliation with the institution they visit. I have very little confidence that publishers can figure this out.

Some entities may not even want to be listed, since they sometimes have a political motivation they wish to conceal (e.g., the Tobacco Institute or the CIA). The Tobacco Institute is now disbanded, but the CIA is listed. They may still request that some grant recipients keep the funding secret, but that’s up to them, I guess. There are others that fund anti-climate-change research in secret (I know of one that keeps changing its name). There are policy issues behind these decisions, and it may be hard for you to recognize who is authorized to make a request for inclusion or exclusion. I would be particularly careful with the “narrower”, “broader”, and “alt-names” attributes. Good luck with that. :grimacing:

I found a few suspicious values for “narrower” and “broader” in the database by trying a few queries, like

  1. id=100007183, name=“University of California, Santa Barbara”, has a narrower relationship to id=100011453, name=“Arkansas Pharmacists Association”. That seems incredibly unlikely.
  2. id=100005582, name=“Center for Retirement Research, Boston College” is not shown with a broader relationship to Boston College (the two are unlinked).

It’s possible that these are legitimate - who decides? And presumably this is the wrong place to report data problems.
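
For reference, my spot checks were along these lines (assuming the /funders/{id} response keeps exposing the descendants and hierarchy-names fields, which is how I read the current API):

```python
# Sketch: pull a funder record and print its narrower (descendant) funders.
import json, urllib.request

def descendants(funder_id):
    url = f"https://api.crossref.org/funders/{funder_id}"
    with urllib.request.urlopen(url) as resp:
        msg = json.load(resp)["message"]
    return msg["name"], msg.get("descendants", []), msg.get("hierarchy-names", {})

name, children, names = descendants("100007183")  # UC Santa Barbara
print(name)
for child in children:
    print("  narrower:", child, names.get(child, "?"))
```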

Notice also that the location field sometimes contains “EUE”. That’s the ISO 3166-1 alpha-3 code reserved for the European Union, which I guess is not typically thought of as a country (though they do issue passports).


Just wanted to chime in here, Kevin, to say that since you’re planning to “support ROR IDs on funding agencies in order to increase coverage”, we’d be interested in your feedback on the ROR API and are happy to take requests for additions or modifications to records for funders in the ROR registry. (I’m the Technical Community Manager for ROR.) We’ve actually been doing some work to reconcile the Funder Registry with ROR over the past few weeks, and that will continue for quite a while.

Regarding the first, we’ve got a couple of open requests for feedback on 1) testing our new method of handling organization status in ROR, and 2) ROR schema and API versioning, at Discussions · Ideas · ror-community/ror-roadmap · GitHub, which you could take a look at if you’re interested/willing.

Regarding the second, you can request a single change to a record by submitting a GitHub issue at Issues · ror-community/ror-updates · GitHub or via our request form. If you want to make a batch request, you can contact registry@ror.org and we’ll help with that.

I’ll be honest and say that I haven’t looked at the ROR API very much. My first attempt, searching on “Mathematics”, didn’t find the American Mathematical Society (I can’t link to URLs here). That’s the same stemming problem. We don’t use Elasticsearch on our servers because it’s a resource hog; if we expected higher load we might have made different choices. Search ranking is another issue - it isn’t clear which is the most important result for “Mathematics”, so you have to just guess. If you had data about how many publications came from a given ROR ID then you could do better ranking (hint hint). It’s certainly not “Illustrative Mathematics”.

I’ll try to take a look at the ROR roadmap in the copious free time of the days to come. :grinning:

Gotcha! Yep, we do use ElasticSearch, and no stemming, so any query on “Mathematics” isn’t going to bring up the AMS. We would definitely never ever rank search results by the number of publications associated with an organization: that would be completely contrary to the mission of ROR, though others can certainly use ROR to create that kind of importance ranking if they like.

It’s not really what you’re asking for, but we do have an endpoint that takes long messy text strings and ranks the results based on the most likely match, e.g., https://api.ror.org/organizations?affiliation=“American%20Mathematical%20Society%20(AMS)%20·%20201%20Charles%20Street%20Providence,%20Rhode%20Island%2002904-2213” since a lot of publishers and universities have those kinds of text strings in their systems already.
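
In code, a call to that endpoint looks roughly like this (a sketch; the score, chosen, and organization fields are as the API currently returns them, and the affiliation string is just the example above):

```python
# Sketch: send a messy affiliation string to the ROR affiliation matcher and
# print the top candidates with their scores and the "chosen" flag.
import json, urllib.parse, urllib.request

affiliation = "American Mathematical Society (AMS), 201 Charles Street, Providence, Rhode Island 02904-2213"
url = "https://api.ror.org/organizations?affiliation=" + urllib.parse.quote(affiliation)
with urllib.request.urlopen(url) as resp:
    items = json.load(resp)["items"]

for item in items[:3]:
    org = item["organization"]
    print(round(item["score"], 2), item.get("chosen", False), org["id"], org["name"])
```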

But yes! Comments on the ROR roadmap welcome!