CrossRef bibtex export & improving it

I understand that CrossRef exports BibTeX bibliographies: I can access doi dot org with an “Accept” header that indicates I want BibTeX format, and CrossRef will return a BibTeX bibliography. This is very useful! However, I have suggestions about how CrossRef transforms its internal format (which we can see using a JSON query) into BibTeX, and about some of the choices it makes in that export.

I’d like to understand CrossRef’s export mechanism (I’m presuming there is a script that inputs JSON or another internal format and exports BibTeX). I would be happy to offer a pull request if there are ways I can improve this export. (For instance, paper titles should be “Title: Subtitle” but CrossRef exports only “Title”, and months are formatted incorrectly.)

CrossRef has requested that I open a discussion here so that CrossRef and the community can discuss this.

@jowens thanks for your interest in this. We’re very happy to have community contributions to our code and to look at how it can be improved. You can find the script for the BibTeX transformation at src/cayenne/formats/bibtex.clj (main branch of the crossref / REST API repository on GitLab). If there are any points to discuss, feel free to carry on the conversation here.

Oh, fantastic. I will figure out how to spin up Clojure! OK, just to confirm then:

  • This code imports something ~equivalent to the JSON that I get through a query like https://api.crossref.org/v1/works/10.x/y.z ? (The data structure appears to be metadata, and is probably a Clojure data structure, but content-wise, does it contain everything that the JSON does?)
  • This code outputs something ~equivalent to the BibTeX that I get through a query like one to https://doi.org/10.x/y.z with headers set to {'Accept': 'text/bibliography;style=bibtex, application/x-bibtex'} ?

I’m not a developer, but I’ve passed this on to a colleague who will be able to give you some answers on how the code works.

Just politely following up about my understanding of the input and output formats here. Understanding the input schema would be very helpful, thanks!

Hi John, you are right: the function ->bibtex is in charge of transforming a Clojure data structure (containing the same information as the JSON output) into BibTeX format.

As for the second question, you are correct, although you only need application/x-bibtex for the header.
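
To make the request described above concrete, here is a minimal Python sketch that builds (but does not send) a content-negotiation request with only the application/x-bibtex Accept header. The helper name bibtex_request is hypothetical, and the DOI is the one discussed later in this thread; treat this as an illustration rather than an official client.

```python
from urllib.request import Request


def bibtex_request(doi: str) -> Request:
    """Build (but do not send) a content-negotiation request for a DOI.

    bibtex_request is a hypothetical helper name used only for this sketch.
    """
    return Request(
        f"https://doi.org/{doi}",
        # Per the reply above, this single media type is enough.
        headers={"Accept": "application/x-bibtex"},
    )


req = bibtex_request("10.1145/3448016.3452841")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen and decoding the response body should then give the BibTeX text, assuming network access.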

In the source code, you can see a list of available months.

And add-titles contains the logic for constructing the title depending on the kind of document that is being processed.

I hope this is useful, please let us know if you have more questions.

Thank you! Let me note a few places where I believe the bibtex could be improved. I have never written any Clojure so this is less useful than I think you’d like, but let’s at least start with “here are three things that could be done better” and then see the next steps.

Months

It appears that add-month will always return a string surrounded with quotes (e.g., "jan"). BibTeX instead has (built-in) abbreviations for months, and understands those abbreviations, and those abbreviations do NOT have quotes around them. (Then a local style could understand this abbreviation and produce a bibliography with, say, “January” or “Jan.” or “1/” or in a foreign language.) So instead of emitting

month = "jan", or month = {Jan},

->bibtex needs to emit

month = jan,

It appears that the months are capitalized somewhere in the process and I haven’t figured out how yet (because bibtex-month has uncapitalized strings). But correct BibTeX months are lower-case and not in quotes.

(“It’s best to use the three-letter abbreviations for the month, rather than spelling out the month yourself. This lets the bibliography style be consistent.”—BibTeXing, Oren Patashnik [BibTeX’s author], February 8, 1988. https://bibtexml.sourceforge.net/btxdoc.pdf)
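
As a rough illustration of the requested output (in Python rather than the Clojure of the real exporter), the sketch below maps a month number to the bare, lower-case BibTeX macro. The names BIBTEX_MONTHS and month_field are hypothetical and are not taken from bibtex.clj.

```python
# The twelve month macros predefined by the standard BibTeX styles.
BIBTEX_MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec")


def month_field(month_number: int) -> str:
    """Return a BibTeX month field using the unquoted, lower-case macro."""
    if not 1 <= month_number <= 12:
        raise ValueError(f"month out of range: {month_number}")
    # No quotes or braces: BibTeX must see the macro name, not a string.
    return f"month = {BIBTEX_MONTHS[month_number - 1]},"


print(month_field(1))  # month = jan,
```

Because the value is emitted without delimiters, a bibliography style remains free to expand it however it likes.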

Extra credit: issued/date-parts might have multiple dates. If there are two dates, does this encode things like “March–April”? If so, the correct BibTeX for a two-month span (August–September, for example) is month = aug # "\slash " # sep.
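
If a two-month span can in fact be recovered from the metadata (an open question above), the concatenated form could be produced along these lines. This is only a sketch: the function name month_range_field is hypothetical, and it assumes the two month numbers are already known.

```python
MONTH_MACROS = ("jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec")


def month_range_field(first: int, second: int) -> str:
    """Join two month macros with BibTeX's # concatenation operator."""
    for m in (first, second):
        if not 1 <= m <= 12:
            raise ValueError(f"month out of range: {m}")
    a, b = MONTH_MACROS[first - 1], MONTH_MACROS[second - 1]
    # Only the separator is a quoted string; the macros stay bare.
    return f'month = {a} # "\\slash " # {b},'


print(month_range_field(8, 9))  # month = aug # "\slash " # sep,
```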

@inproceedings

I’m looking at an article (doi:10.1145/3448016.3452841) that’s labeled as type: proceedings-article that should be output as an @inproceedings with the container-title mapping to the field booktitle. Instead the BibTeX output is an @article with a journal that contains container-title.

Extra credit: It’d be nice for @inproceedings to also output series = with the contents of event/acronym, if present.
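
The mapping being asked for could look roughly like the Python sketch below. It assumes the input resembles the Crossref JSON (type as a string, container-title as a list of strings, event as an object with an optional acronym); the function name and the sample values are hypothetical, and treating every other type as @article is a simplification for illustration only.

```python
def entry_type_and_fields(work: dict) -> tuple[str, dict]:
    """Pick a BibTeX entry type and the field that holds the container title."""
    titles = work.get("container-title") or []
    container = titles[0] if titles else None
    fields = {}
    if work.get("type") == "proceedings-article":
        entry_type = "inproceedings"
        if container:
            fields["booktitle"] = container
        # Extra credit: use the event acronym as the series, if present.
        acronym = (work.get("event") or {}).get("acronym")
        if acronym:
            fields["series"] = acronym
    else:
        # Simplification: everything else is treated as a journal article.
        entry_type = "article"
        if container:
            fields["journal"] = container
    return entry_type, fields


# Hypothetical sample record; the values are placeholders, not real metadata.
sample = {
    "type": "proceedings-article",
    "container-title": ["Proceedings of an Example Conference"],
    "event": {"acronym": "EXAMPLE '21"},
}
print(entry_type_and_fields(sample))
```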

Title + subtitle → title only

Same DOI as above. It has a title and a subtitle field. I believe add-titles only takes into account title. If subtitle is also present, then title and subtitle should be joined with a colon and output as the BibTeX field title = ....
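
A minimal Python sketch of the joining rule, assuming title and subtitle arrive as lists of strings as they do in the JSON output. The function name bibtex_title and the sample strings are hypothetical, and this is not the logic of add-titles itself.

```python
def bibtex_title(work: dict) -> str:
    """Join the first title and first subtitle as "Title: Subtitle"."""
    titles = work.get("title") or []
    subtitles = work.get("subtitle") or []
    title = titles[0].strip() if titles else ""
    subtitle = subtitles[0].strip() if subtitles else ""
    if title and subtitle:
        return f"{title}: {subtitle}"
    # Fall back to whichever part is present (possibly an empty string).
    return title or subtitle


# Hypothetical sample record; the strings are placeholders.
print(bibtex_title({"title": ["Main Title"], "subtitle": ["A Subtitle"]}))
# Main Title: A Subtitle
```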

Thanks for the suggestions @jowens !

Thanks, @ifarley! With the greatest of respect, I don’t think of these as “suggestions” but instead more properly “bugs”. They’re definitely issues that should be fixed! Looking forward to working with you and your team to fix 'em up.

Thanks @jowens, this is really useful feedback. We know that it needs some attention, and that content negotiation (output in a variety of formats, not just BibTeX) needs an overhaul. I’ve filed a ticket for us to prioritise and work on. I’m afraid I can’t give you a date: we have a few large projects ongoing at the moment, and some of the changes to content negotiation will need progress on those first.

Relevant to this discussion is the software from Martin Fenner which handles content negotiation for DataCite metadata: talbot · PyPI. We’ll be looking at this to see whether it can be adapted to our needs.

Any progress on this issue? Thanks.

Yes, your timing is good. In the last couple of days we’ve managed to identify a potential bug, see [CR-1064] - Jira. I need to check again with the dev team how straightforward a fix will be, but we’re working on it.

Any further progress on this issue? Thanks.