Content negotiation for bibtex does not return article number

Hi, thank you for opening this forum. I have a question about the spec of bibtex returned by the content negotiation.

I found the article-number field is not included in the bibtex obtained from content negotiation. I think it may be great if this is included as the pages field in case there is no pages field, and would like to know the opinion of the community.

For example, when I execute

curl -LH "Accept: application/x-bibtex" http://0-dx-doi-org.libus.csd.mu.edu/10.1103/physrevlett.104.198101

I get the following entry without the pages entry:

@article{M_ller_2010, title={Image Scanning Microscopy}, volume={104}, ISSN={1079-7114}, url={http://0-dx-doi-org.libus.csd.mu.edu/10.1103/physrevlett.104.198101}, DOI={10.1103/physrevlett.104.198101}, number={19}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Müller, Claus B. and Enderlein, Jörg}, year={2010}, month=may }

On the other hand, it is a common custom to include “article-number” as the “page” entry when one cites this article. The entry would be cited, for example, as follows:

1. Müller C B and Enderlein J (2010) Image scanning microscopy. Physical Review Letters 104: 198101.

With this, I believe it is necessary to include the article-number field as the pages to format the resulting entry. Would it make sense to change the API in this way? If so, and there’s a way to contribute to the codebase, I’m happy to take a look at the repository.
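
For anyone reproducing this from a script, the curl request above can be sketched in Python with the standard library only. This is a minimal sketch under assumptions: the `https://doi.org/` resolver URL and the helper names `build_request` and `fetch_bibtex` are illustrative and do not come from the original post.

```python
import urllib.request

# Assumption: the public DOI resolver, not the URL used in the post.
DOI_RESOLVER = "https://doi.org/"


def build_request(doi: str, accept: str = "application/x-bibtex") -> urllib.request.Request:
    """Build a content-negotiation request for a DOI (nothing is sent yet)."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request; urlopen follows redirects by default, much like curl -L."""
    with urllib.request.urlopen(build_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_bibtex("10.1103/physrevlett.104.198101")` needs network access and should return an entry like the one quoted above.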

Hello, and thanks for your feedback. It’s a reasonable expectation that article numbers would be treated like page numbers in BibTeX and other citation formats.

We’ve had this same request from several users over the years. It has been escalated to the API/metadata retrieval product manager and technical team for consideration in future improvements to our metadata retrieval services.

You can follow any updates or progress here.

Hi, thanks @Shayn for your reply, and for guiding me to the issue tracker! I hope this issue is solved in the near future.

I hesitate to add something here, but I also found that some entries do not even have the article-number entry. For example, the entry at 10.1126/science.1234168 does not return the article number that appears in the “How to cite” instruction on the article webpage:

Philipp J. Keller, Imaging Morphogenesis: Technological Advances and Biological Insights. Science 340, 1234168 (2013). DOI:10.1126/science.1234168

since the returned CSL-JSON by the content negotiation is as follows:

{
    "indexed": {
        "date-parts": [
            [
                2023,
                12,
                21
            ]
        ],
        "date-time": "2023-12-21T08:40:29Z",
        "timestamp": 1703148029375
    },
    "reference-count": 51,
    "publisher": "American Association for the Advancement of Science (AAAS)",
    "issue": "6137",
    "content-domain": {
        "domain": [],
        "crossmark-restriction": false
    },
    "published-print": {
        "date-parts": [
            [
                2013,
                6,
                7
            ]
        ]
    },
    "abstract": "<jats:p>Morphogenesis, the development of the shape of an organism, is a dynamic process on a multitude of scales, from fast subcellular rearrangements and cell movements to slow structural changes at the whole-organism level. Live-imaging approaches based on light microscopy reveal the intricate dynamics of this process and are thus indispensable for investigating the underlying mechanisms. This Review discusses emerging imaging techniques that can record morphogenesis at temporal scales from seconds to days and at spatial scales from hundreds of nanometers to several millimeters. To unlock their full potential, these methods need to be matched with new computational approaches and physical models that help convert highly complex image data sets into biological insights.</jats:p>",
    "DOI": "10.1126/science.1234168",
    "type": "journal-article",
    "created": {
        "date-parts": [
            [
                2013,
                6,
                6
            ]
        ],
        "date-time": "2013-06-06T18:13:33Z",
        "timestamp": 1370542413000
    },
    "source": "Crossref",
    "is-referenced-by-count": 155,
    "title": "Imaging Morphogenesis: Technological Advances and Biological Insights",
    "prefix": "10.1126",
    "volume": "340",
    "author": [
        {
            "given": "Philipp J.",
            "family": "Keller",
            "sequence": "first",
            "affiliation": [
                {
                    "name": "Howard Hughes Medical Institute, Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147, USA."
                }
            ]
        }
    ],
    "member": "221",
    "reference": [
        {
            "key": "e_1_3_2_2_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1167094"
        },
        {
            "key": "e_1_3_2_3_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1162493"
        },
        {
            "key": "e_1_3_2_4_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1073/pnas.1108494108"
        },
        {
            "key": "e_1_3_2_5_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nbt.2281"
        },
        {
            "key": "e_1_3_2_6_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/j.cell.2008.01.053"
        },
        {
            "key": "e_1_3_2_7_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/j.ceb.2010.12.004"
        },
        {
            "key": "e_1_3_2_8_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/j.devcel.2011.12.007"
        },
        {
            "key": "e_1_3_2_9_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1225399"
        },
        {
            "key": "e_1_3_2_10_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1235249"
        },
        {
            "key": "e_1_3_2_11_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.2321027"
        },
        {
            "key": "e_1_3_2_12_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1111/j.1365-2818.1993.tb03346.x"
        },
        {
            "key": "e_1_3_2_13_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1100035"
        },
        {
            "key": "e_1_3_2_14_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1002/andp.19023150102"
        },
        {
            "key": "e_1_3_2_15_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1709"
        },
        {
            "key": "e_1_3_2_16_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1002/dvg.20698"
        },
        {
            "key": "e_1_3_2_17_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.2434"
        },
        {
            "key": "e_1_3_2_18_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1195929"
        },
        {
            "key": "e_1_3_2_19_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1411"
        },
        {
            "key": "e_1_3_2_20_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nphoton.2012.205"
        },
        {
            "key": "e_1_3_2_21_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.2098"
        },
        {
            "key": "e_1_3_2_22_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1586"
        },
        {
            "key": "e_1_3_2_23_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/j.cell.2012.10.008"
        },
        {
            "key": "e_1_3_2_24_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nbt.1928"
        },
        {
            "key": "e_1_3_2_25_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1652"
        },
        {
            "key": "e_1_3_2_26_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.2062"
        },
        {
            "key": "e_1_3_2_27_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.2064"
        },
        {
            "key": "e_1_3_2_28_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nphoton.2010.204"
        },
        {
            "key": "e_1_3_2_29_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1364/OL.22.001905"
        },
        {
            "key": "e_1_3_2_30_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1529/biophysj.107.120345"
        },
        {
            "key": "e_1_3_2_31_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1476"
        },
        {
            "key": "e_1_3_2_32_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1098/rsta.2007.0013"
        },
        {
            "key": "e_1_3_2_33_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nphoton.2010.306"
        },
        {
            "key": "e_1_3_2_34_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nbt1037"
        },
        {
            "key": "e_1_3_2_35_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1152/physrev.00038.2009"
        },
        {
            "key": "e_1_3_2_36_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nature11589"
        },
        {
            "key": "e_1_3_2_37_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1221071"
        },
        {
            "key": "e_1_3_2_38_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1224143"
        },
        {
            "key": "e_1_3_2_39_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nprot.2009.130"
        },
        {
            "key": "e_1_3_2_40_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nature10938"
        },
        {
            "key": "e_1_3_2_41_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1007/978-1-61779-210-6_9"
        },
        {
            "key": "e_1_3_2_42_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/j.gde.2010.05.008"
        },
        {
            "key": "e_1_3_2_43_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1371/journal.pbio.1001256"
        },
        {
            "key": "e_1_3_2_44_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1472"
        },
        {
            "key": "e_1_3_2_45_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1073/pnas.0511111103"
        },
        {
            "key": "e_1_3_2_46_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.1228"
        },
        {
            "key": "e_1_3_2_47_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nbt.1612"
        },
        {
            "key": "e_1_3_2_48_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nmeth.2084"
        },
        {
            "key": "e_1_3_2_49_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nrg2548"
        },
        {
            "key": "e_1_3_2_50_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1038/nature09198"
        },
        {
            "key": "e_1_3_2_51_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1126/science.1189428"
        },
        {
            "key": "e_1_3_2_52_2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1242/dev.085001"
        }
    ],
    "container-title": "Science",
    "original-title": [],
    "language": "en",
    "link": [
        {
            "URL": "https://0-syndication-highwire-org.libus.csd.mu.edu/content/doi/10.1126/science.1234168",
            "content-type": "unspecified",
            "content-version": "vor",
            "intended-application": "similarity-checking"
        }
    ],
    "deposited": {
        "date-parts": [
            [
                2022,
                1,
                14
            ]
        ],
        "date-time": "2022-01-14T09:25:26Z",
        "timestamp": 1642152326000
    },
    "score": 1,
    "resource": {
        "primary": {
            "URL": "https://0-www-science-org.libus.csd.mu.edu/doi/10.1126/science.1234168"
        }
    },
    "subtitle": [],
    "short-title": [],
    "issued": {
        "date-parts": [
            [
                2013,
                6,
                7
            ]
        ]
    },
    "references-count": 51,
    "journal-issue": {
        "issue": "6137",
        "published-print": {
            "date-parts": [
                [
                    2013,
                    6,
                    7
                ]
            ]
        }
    },
    "alternative-id": [
        "10.1126/science.1234168"
    ],
    "URL": "http://0-dx-doi-org.libus.csd.mu.edu/10.1126/science.1234168",
    "relation": {},
    "ISSN": [
        "0036-8075",
        "1095-9203"
    ],
    "subject": [
        "Multidisciplinary"
    ],
    "container-title-short": "Science",
    "published": {
        "date-parts": [
            [
                2013,
                6,
                7
            ]
        ]
    }
}

It seems that the only way to reproduce this citation entry is to guess it from the DOI, but the rule for transforming the DOI appears to differ between journals such as “Development” and “eLife”. I’m not sure whether the issue stems from the metadata deposition process or the transformation process, but I hope those entries will also return the appropriate “page” (or at least “article-number”) entry for citation formatting.
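
To illustrate why guessing from the DOI is fragile, here is a minimal sketch of a per-prefix rule table. The two rules are assumptions that merely reproduce the two examples in this thread (198101 and 1234168). They are not documented publisher conventions, and the same prefixes also cover DOIs whose last component is not an article number.

```python
from typing import Callable, Dict, Optional

# Hypothetical rules: one guess per DOI prefix, based only on the thread's examples.
RULES: Dict[str, Callable[[str], str]] = {
    "10.1103": lambda suffix: suffix.rsplit(".", 1)[-1],  # physrevlett.104.198101
    "10.1126": lambda suffix: suffix.rsplit(".", 1)[-1],  # science.1234168
}


def guess_article_number(doi: str) -> Optional[str]:
    """Return a guessed article number, or None when no rule applies."""
    prefix, _, suffix = doi.partition("/")
    rule = RULES.get(prefix)
    if rule is None or not suffix:
        return None
    candidate = rule(suffix)
    # Only accept purely numeric guesses; anything else is treated as unknown.
    return candidate if candidate.isdigit() else None
```

Every other journal would need its own entry in `RULES`, which is why a value deposited by the publisher is preferable to a guess.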

Unfortunately there is no standard way to identify a paper number in BibTeX. BibTeX has lagged for decades on embracing new bibliographic fields. As an example, URLs for papers have shown up in howpublished, note, and a url field that isn’t part of the original BibTeX standard. Unfortunately some people put the DOI as a URL field, again because the standard has never been modernized. There has been some discussion about how to handle paper numbers over the years in tex stackexchange and here. I think there is a strong argument not to reuse the pages field for this. The pages field for a book is the number of pages in the book, and using the pages field for a paper number might lead people to think that the paper has 198101 pages in it. Moreover, a bibliographic style is used with bibtex to render the entry in a suitable format, and some styles will insert the word “pages” or “pp” in front of the pages field. This is inappropriate. There are dozens of bibtex styles in use and they may render things inappropriately.

Let’s think about the purpose of a bibliographic citation in the first place: to help the reader find the article, and to identify it uniquely. In the old paper-only days, the page number told the reader where to thumb through an enormous volume to find the paper. It also served as part of the unique identifier for the paper, and gave a hint as to how large the paper was. Nowadays the DOI is the universal identifier of a paper, and serves both as the unique identifier and the canonical URL for the online version. Paper numbers don’t have any particular use for finding it, and they are inferior to a DOI for an online paper.

There are lots of non-standard fields that have been created for BibTeX entries, including things like the eprint and archivePrefix from arxiv, etc. I think the best choice for a paper number field would be the eid field that is used by biblatex.
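
As a rough illustration of the eid suggestion, the following sketch appends an `eid` field to a single BibTeX entry string like the one quoted earlier in the thread. The function name `add_eid` is hypothetical, and the string handling assumes one well-formed entry that ends with its closing brace. It is not a general BibTeX parser.

```python
import re


def add_eid(entry: str, article_number: str) -> str:
    """Append eid={...} to one BibTeX entry unless an eid field already exists."""
    body, brace, _ = entry.rstrip().rpartition("}")
    if not brace:
        raise ValueError("not a BibTeX entry: no closing brace found")
    if re.search(r"\beid\s*=", body, flags=re.IGNORECASE):
        return entry  # leave entries that already carry an eid untouched
    # Drop trailing whitespace and a dangling comma before adding the new field.
    return f"{body.rstrip().rstrip(',')}, eid={{{article_number}}} }}"
```

For example, `add_eid("@article{k, title={T}, month=may }", "198101")` yields an entry ending in `month=may, eid={198101} }`. Whether a given .bst style actually prints `eid` depends on the style.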

Yes, that’s part of the same request in the issue tracker.

Publishers send metadata to us in xml format, which is fairly directly accessible via an xml API. We then index that into our REST API, which formats the metadata in JSON. Theoretically, we should be indexing all metadata elements present in the xml in the REST API records, but in practice, we’re still gradually finding some elements that were missed in the initial build of the REST API and that we haven’t yet prioritized in subsequent updates.

If you check the xml API record for 10.1126/science.1234168, you will see the article ID:

<publisher_item>
<item_number item_number_type="article_number">1234168</item_number>
<identifier id_type="doi">10.1126/science.1234168</identifier>
</publisher_item>

But we’re not indexing article IDs in the REST API, which is precisely why they’re not available via bibtex. Any article IDs that you do find in the REST API have been incorrectly tagged as page numbers.
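
Until article IDs are indexed in the REST API, the value can be read from the XML directly. Below is a minimal sketch that parses the fragment shown above with the standard library. The function name is hypothetical, and it assumes un-namespaced tags as in that fragment. A full XML API record may declare XML namespaces, in which case the tag match would need adjusting.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The fragment quoted in this thread, used here as sample input.
FRAGMENT = """<publisher_item>
<item_number item_number_type="article_number">1234168</item_number>
<identifier id_type="doi">10.1126/science.1234168</identifier>
</publisher_item>"""


def extract_article_number(xml_text: str) -> Optional[str]:
    """Return the first item_number typed as article_number, or None."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item_number"):
        if item.get("item_number_type") == "article_number":
            text = (item.text or "").strip()
            return text or None
    return None
```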

All that said, not every publisher that uses article IDs opts to supply them in their DOIs’ metadata records. Most metadata fields are optional. We encourage publishers to be as thorough as possible, but we can’t strictly require things like article IDs or page numbers, because not all publications have them.

I had carefully read through the documentation for the 5.3.1 schema, and when I saw the description of item_number, I inferred that it was a globally unique identifier that fulfilled the role of tracking an article within crossref since it didn’t yet have a DOI assigned. This might not be something that is intended for public usage. By contrast, a paper number in a bibliographic reference is sometimes only unique within the issue (e.g., Volume 5, issue 2, paper 3). The paper number would then be reused for the next issue, so it only identifies the paper within that issue. Of course some journals will have unique identifiers, but that isn’t necessarily the thing that publishers want used to identify it in a bibliographic reference. This could be cleared up by refining the documentation of item_number.

All the bibliographic metadata elements in our schema, including item_number, necessarily pertain to items that do have DOIs, since we only have metadata records for DOIs that are registered through Crossref.

The documentation for item_number in our earlier schema versions went into a lot more detail. Some of this guidance is outdated, but the general idea is still true:

This identifier is a publisher-assigned number that uniquely identifies the entity being registered. This element should be used for identifiers based on publisher internal standards. Use identifier for a publisher identifier that is based on a public standard such as PII or SICI. If the item_number and identifier are identical, there is no need to submit both. In this case, the preferred element to use is identifier. Data may be alpha, numeric or a combination. item_number has an optional attribute, item_number_type. It is assigned by the publisher to provide context for the data in item_number. If item_number contains only a publisher’s tracking number, this attribute need not be supplied. If the item_number contains other data, this attribute can be used to define the content. For example, if a journal is published online (i.e. it has no page numbers), and each article on the table of contents is assigned a sequential number, this article number can be placed in item_number, and the item_number_type attribute can be set to “article_number”. Although Crossref has not provided a set of enumerated types for this attribute, please check with Crossref before using this attribute to determine if a standard attribute has already been defined for your specific needs. If a dissertation DAI has been assigned, it should be deposited in the identifier element with the id_type attribute set to “dai”. If an institution has its own numbering system, it should be deposited in item_number, and the item_number_type should be set to “institution”. If the report number of an item follows Z39.23, the number should be deposited in the identifier element with the id_type attribute set to “Z39.23”.
If a report number uses its own numbering system, it should be deposited in the identifier element, and the id_type should be set to “report-number”. The designation for a standard should be placed inside the identifier element with the id_type attribute set to “ISO-std-ref” or “std-designation” (more generic label).

I think you can see why that needed to be pared down for clarity and brevity!

item_number should definitely only be used for unique identifiers that are publisher-specific and basically work in parallel with the DOI. It’s not meant for something like “article 4” within a given issue.

Our clearer and most up-to-date best-practice recommendation can be found on our documentation site page for Article numbers or IDs.

Thanks, @mccurley @Shayn, for your comments and discussion. I’m sorry for taking so long to respond. It’s nice that the article_number is at least accessible from the XML API. I understand the disadvantage of putting the non-page article number in the “pages” section and also that it may be challenging to standardize the alternative article-specific identifiers.

I’m trying to find a way to verify citation entries automatically using the CrossRef database (for researchers). Regarding this, I would like to add two comments from practical perspectives.

First, if we distinguish between the pages and article numbers (which I think is reasonable), it would be reasonable to consider how we make them compatible with the Citation Style Language repositories. For example, many styles seem to have only “pages” entries (american-physics-society.csl has only <text variable="page-first" form="short"/>).

I think we have two ways to go:

  1. Make changes in the CSL repository
  2. Make an option to output a “dirty” version of CSL-JSON or BibTeX that current CSL repository entries can format.

Option 1 would be preferable in the long term, but I understand it requires enormous work. Is anyone already working on this?
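
Option 2 can be approximated on the client side in the meantime. The sketch below copies an article number into the page variable when no page is present, so that styles reading only the page can still format the entry. The key names `page` and `article-number` follow the field names used in this thread, and the function name is hypothetical. This is a workaround under those assumptions, not a description of how the service behaves.

```python
from typing import Any, Dict


def with_page_fallback(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a CSL-JSON-like item with page filled from article-number."""
    out = dict(item)  # shallow copy so the caller's record stays unchanged
    if not out.get("page") and out.get("article-number"):
        out["page"] = str(out["article-number"])
    return out
```

Because the original record is left untouched, the clean and the “dirty” versions can be kept side by side.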

Second, maybe I’m missing an existing solution, but I’m wondering if there’s a way to format the XML entry directly using citation styles (in CSL, for example). Currently, even citation.crosscite.org uses BibTeX (or CSL-JSON?) output and fails to format the papers in reasonably major journals such as Science, Physical Review Letters, etc. Would it make sense to change the source of this to the deposited XML, or would it be technically difficult?

We seem to be speaking totally different languages. The discussion of CSL is pretty irrelevant to the world of mathematics, engineering, and computer science where BibTeX is the dominant format of bibliographic databases. As far as I can tell, nobody uses CSL in these fields, and I don’t know anyone who uses zotero. BibTeX is a data record format that is independent of citation style, and each journal typically has their own bibtex bst style file.

I don’t know what you use to store citation information in your backend, but citation.crosscite.org seems to export incorrect bibtex for the DOI 10.1007/978-3-031-15802-5_14. It should be @inproceedings type for bibtex rather than @misc, and it should not have a journal field. It is missing several other fields such as volume, series, booktitle, month, and address. I suspect that the formatting of bibtex records requires more work than just adding an article number. It’s worth noting that other services like google scholar also supply limited, inadequate bibtex entries, but at least they get this one right as @inproceedings. bibtex may have a few inadequacies like no article number, but at least it’s a public standard that is pretty well documented.

Hi, @mccurley, thanks for your comment. I’m afraid I unconsciously switched from BibTeX to CSL because I thought the issue (which field to assign the article identifier to) is analogous irrespective of the format. I’m fine with either of the following, with a technical preference for the latter option.

  • Crossref->BibTeX->formatting with a *.bst file
  • Crossref->CSL-JSON->formatting with CSL

You might agree that the BibTeX output should be consistent with current commonly-used *.bst styles, and CSL-JSON output should be consistent with CSL styles.

I’ve played a bit with RevTeX v4.2 citation styles, and the eid worked as expected. Maybe the BibTeX side provides more complete field support.

export incorrect bibtex for the DOI 10.1007/978-3-031-15802-5_14.

That would be a problem anyway and I hope it’ll be solved in the near future.