Linking references of a publication to other publications

Hello,

I have just downloaded the data from the bulk package of April ’23 and am getting my first look at the metadata. I am working on an example which has the following value for the “URL” field: “doi.org/10.1080/09603100010014041”.

The “reference” field is an array of references, and many of these look like this:

{
  "key": "CIT0001",
  "doi-asserted-by": "publisher",
  "DOI": "10.1080/07350015.1989.10509739"
}

So I am assuming that, using the DOI, I would be able to relate the reference to the publication being referenced. However, some references are less precise, for example:

{
  "key": "CIT0010",
  "series-title": "SSRJ Working Paper, 8762",
  "volume-title": "A test for independence based on the correlation dimension",
  "author": "Brock A.",
  "year": "1987"
}
Connecting to the original article in this case might be a bit more difficult… And worse still, some references look like this, without a title and with only one of the authors:
{
  "key": "CIT0019",
  "first-page": "5",
  "volume": "17",
  "author": "French K. R.",
  "year": "1986",
  "journal-title": "Journal of Finance"
}
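For context, here is how I am currently separating the two kinds of references in Python (a minimal sketch; the field names are taken from the JSON above, and the sample data is abbreviated):

```python
# Partition a Crossref "reference" array into entries that carry a DOI
# (directly linkable) and entries that only have bibliographic fields.
# Field names follow the JSON examples above; the sample is illustrative.

def partition_references(references):
    """Split reference dicts into (with_doi, without_doi) lists."""
    with_doi = [r for r in references if "DOI" in r]
    without_doi = [r for r in references if "DOI" not in r]
    return with_doi, without_doi

sample = [
    {"key": "CIT0001", "doi-asserted-by": "publisher",
     "DOI": "10.1080/07350015.1989.10509739"},
    {"key": "CIT0019", "first-page": "5", "volume": "17",
     "author": "French K. R.", "year": "1986",
     "journal-title": "Journal of Finance"},
]
linked, unlinked = partition_references(sample)
```

The references without a DOI are the ones I do not know how to resolve.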

So my question is: how do people usually go about linking references to publications when no unique identifier is available? Is the key CIT0019 useful in any way, or is it just local to the current reference? Is an attempt at locating the article via [volume, author, year, journal title, …] doomed to fail? Or is it just that the example I grabbed is incomplete, in which case I wonder if there is a way to flag this to someone?

So far, I have stumbled on this question (on Tagging data citations in a journal article reference list) and this post (on Data Citation: what and how for publishers - Crossref), but it seems this has more to do with how the data is entered than how to relate two publications together…

Thanks for your help, Stephane.

Hi @sga ,

Thanks for your message, and welcome to the Community Forum.

Very good questions here.

So, if you see a DOI within the reference metadata in the JSON or XML output, that means we have matched the reference to another Crossref DOI, and you should easily be able to find the referenced metadata and/or content.

Like in your example above:

The referenced DOI is https://0-doi-org.libus.csd.mu.edu/10.1080/07350015.1989.10509739 and the metadata for that DOI is available in our REST API here: api.crossref.org/works/10.1080/07350015.1989.10509739
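If it helps, that lookup can be scripted with the standard library alone; a minimal sketch, assuming the endpoint shape shown above (the network call itself is left as a comment):

```python
# Look up a referenced work's metadata by DOI via the Crossref REST API.
import json
import urllib.request

API_BASE = "https://api.crossref.org/works/"

def works_url(doi):
    """URL of the metadata record for a DOI, e.g. the one in CIT0001."""
    return API_BASE + doi

def fetch_work(doi):
    """Fetch the record; the bibliographic payload sits under 'message'."""
    with urllib.request.urlopen(works_url(doi)) as resp:
        return json.load(resp)["message"]

# Example (requires network access):
# work = fetch_work("10.1080/07350015.1989.10509739")
# print(work["title"])
```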

Now, CIT0010 is for a work published in 1987 and CIT0019 for a work published in 1986. While we do encourage our members to register legacy DOIs for their older content, we do not require it. So, it is more likely that a work published in 1986 or 1987 would not be registered with us when compared with a work published in, say, 2016 or 2023.

I suspect that neither the work referenced in CIT0010 nor the work referenced in CIT0019 is currently registered with us. Most likely, the publisher in question has not registered those works yet, so there just isn’t a DOI for the content in question. Another possibility, although less likely, is that the work was registered with a different registration agency, like DataCite or mEDRA or one of our other sister registration agencies.
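As an aside, if you do have a DOI in hand and want to know which agency registered it, the REST API exposes an agency route; a small sketch, assuming that route shape:

```python
# Ask the Crossref REST API which registration agency a DOI belongs to
# (Crossref, DataCite, mEDRA, ...). Route shape: /works/{doi}/agency
import json
import urllib.request

def agency_url(doi):
    """URL of the agency lookup for a DOI."""
    return "https://api.crossref.org/works/" + doi + "/agency"

def registration_agency(doi):
    """Return the agency id reported for the DOI, e.g. 'crossref'."""
    with urllib.request.urlopen(agency_url(doi)) as resp:
        return json.load(resp)["message"]["agency"]["id"]

# Example (requires network access):
# registration_agency("10.1080/07350015.1989.10509739")
```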

What we do is store that reference metadata in CIT0010 and CIT0019 and retry the matching process about annually to see if the publisher member in question has registered a DOI with us since the last time we attempted the matching process. I’d say that the metadata completeness for both CIT0010 and CIT0019 could conceivably result in a match, if the content was registered with us (i.e., the metadata isn’t so incomplete that we can’t work with it). Now, metadata quality is another story. If the titles, authors, and page numbers did not match the metadata registered with us then obviously that would make it much harder for us to establish a reference match.

You can always use our Simple Text Query form to see if a citation or list of citations has been registered with us. For instance, I retried that CIT0010 in the form and do not see that we have any metadata that matches that information.

If there were a match, a DOI would be present in the results of the form.

Warm regards,
Isaac


Dear @ifarley , thanks so much for your reply. I did not expect so much detail in this initial back and forth; I really appreciate it! My reply is in two parts which relate to 1. further exploration of my initial question with respect to your comments about older publications and 2. your introduction to the Simple Text Query form tool which I did not know about.

  1. Below is a screenshot of some of the references provided below the DOI link for the original example I gave (https://0-www-tandfonline-com.libus.csd.mu.edu/doi/epdf/10.1080/09603100010014041). My first remark is that certain references have a Crossref link, but not all of them. (There is some correlation with older articles not having the link, though this is not always the case.) In particular, CIT0019 does not, so I thought this was related to the issue I raised; however, CIT0018 does not have one either, whereas in the Crossref metadata it does have a DOI. My second remark is that despite there being no DOI and the journal information being wrong (the journal in the metadata is “Journal of Finance”, which exists and often publishes those authors, but the article is actually from the “Journal of Financial Economics”, another reputable journal which routinely published those authors), the Google Scholar link resolves to the correct article (it seems Google Scholar does the heavy lifting here, since the URL passed to it does in fact contain the partially incorrect metadata).

These remarks are meant to point out that some systems are apparently able to do what I am struggling to achieve with the Crossref metadata: do you know of any way to work around these roadblocks? In particular, how is it that some of this metadata is related to what I see online (the wrong journal) but incomplete (I do not have the article title in the metadata)?

  2. Regarding the tool, I am unsure how it works: CIT0009 has a proper DOI (api.crossref.org/works/10.1016/0304-4076(95)01736-4), yet pasting the title of that publication (“Modeling and pricing long memory in stock market volatility”) yields no results:
    [screenshot: the Simple Text Query form returns no results]

Thanks again for your help! Stephane.

Thanks for following up. See my answers below.

The reference metadata that is available within Crossref for DOI https://0-doi-org.libus.csd.mu.edu/10.1080/09603100010014041 (and for all DOIs registered with us) is provided to us by our members. We make it freely available to our metadata users for exactly the reasons you’re surfacing now: so that it is open, so that we can ask questions of it, and, hopefully, so that we can enrich it over time.

The short of it is, the reference metadata registered with us for DOI 10.1080/09603100010014041 could use some updating. You’re absolutely right that we do have DOIs registered for the 18th and 19th citations on this list. That is reflected properly in the metadata for the 18th reference, as you can see below, but Taylor & Francis should update the display of the references on the DOI’s landing page for that 18th reference to include DOI https://0-doi-org.libus.csd.mu.edu/10.1080/07474938608800095

As for the 19th reference on the list, a typo in the metadata registered with us (and in the display of the reference metadata in the screenshot you provided) is preventing us from matching the reference. If, instead, I update the reference to the correct journal, “Journal of Financial Economics”, I get a match, as you can see here:

So, our matching process is as good as the metadata provided to us. As for Google Scholar, they have yet to openly share how their matching algorithm/process works.

As for your second question, about how robust the matching within the Simple Text Query form is: remember that we are trying to match your search terms against the metadata records of more than 152 million DOIs. You’ll need to provide a little more bibliographic metadata than just the title in order to get a hit. A standard citation style would yield better results; I’ve used APA below:

Bollerslev, T., & Ole Mikkelsen, H. (1996). Modeling and pricing long memory in stock market volatility. In Journal of Econometrics (Vol. 73, Issue 1, pp. 151–184). Elsevier BV.
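Programmatically, the REST API’s query.bibliographic parameter does a similar free-form match; a rough sketch (the score check at the end is just an illustration of how one might judge a hit, not an official cutoff):

```python
# Match a free-form citation against Crossref via the REST API's
# query.bibliographic parameter (a programmatic cousin of the form).
import json
import urllib.parse
import urllib.request

def bibliographic_url(citation, rows=1):
    """Build a works search URL for a free-form citation string."""
    query = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows})
    return "https://api.crossref.org/works?" + query

def best_match(citation):
    """Return the top-scoring item for the citation, or None."""
    with urllib.request.urlopen(bibliographic_url(citation)) as resp:
        items = json.load(resp)["message"]["items"]
    return items[0] if items else None

# Example (requires network access):
# hit = best_match("Bollerslev, T., & Ole Mikkelsen, H. (1996). Modeling "
#                  "and pricing long memory in stock market volatility.")
# print(hit["DOI"], hit["score"])
```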

I’ve flagged the metadata improvements we’ve discussed above to our technical contacts at T&F so they can review and make the necessary improvements to the article landing page and reference metadata registered with us.

Let me know if you have any additional questions or comments,
Isaac

Thanks again @ifarley! This all makes sense, and I appreciate your insights as to the limitations due to user input as well as you flagging that specific reference. I do have two follow-up questions:

  1. Is there any way for us metadata readers to flag these for your team’s review when we stumble upon such typos / incomplete data?
  2. I have a fairly involved follow-up on the Simple Text Query tool. I have looked at different ways to get results with less information than the full citation (to address the issues related to incomplete and/or erroneous data). Many of them work, but it is unclear to me what will work and what won’t: for example, changing the year by +1 or -1 works as long as the title and the first author are there, even without the full citation. (The reason I try these is to understand how much tolerance there is for error when searching for an article.) I am ready to post these trials and my findings here, or in another question if that is more appropriate (please let me know which). The short version of the question is: is there any documentation on how the Simple Text Query form works behind the scenes, so that I can adapt my queries? For example, is there, for every Simple Text Query search, an equivalent API filtering method?

Best, Stephane

Yes, our current mechanism is for you to flag that to our support team at support@crossref.org and then we forward that to the technical contact that we have on file for the member in question.

As for your second question, about whether there is documentation on how the Simple Text Query form works behind the scenes (and whether there is an API equivalent for it):

The best I can do on this is to refer you to what Dominika Tkaczyk, our Head of Strategic Initiatives, wrote about the topic back in these two blogs from late 2018:

and

The information in those posts is still accurate today.

-Isaac

Our technical contacts at Taylor and Francis have made these improvements (and others) to the citation list on the article landing page, as requested, @sga:

Warm regards,
Isaac