Issues with pagination using API

mkurowski · 17 January 2025 20:58

We are trying to pull the full set of records for a particular journal, 1000 records at a time. Every request succeeds, but some DOIs are returned more than once, which means we are missing others: our script assumes the total row count and stops once it has fetched that many rows. For instance, rows 1-1000 contain some of the same records as rows 1001-2000. If 5 records are duplicated, we miss the last 5 that would have been returned had the duplicates not been there.

Here is an example:
https://0-api-crossref-org.libus.csd.mu.edu/journals/1522-1466/works?filter=from-deposit-date:1900-01-01,until-deposit-date:2019-09-30&offset=1000&rows=1000

ifarley · 17 January 2025 22:59

Hello @mkurowski ,

Welcome to the Community Forum, and thank you for your message!

Using large offset values can result in extremely long response times. An alternative for paging through very large result sets is to use cursors. Any combination of query, filters and facets may be used with cursors. While rows may be specified along with cursor, offset and sample cannot be used. To use a cursor, include the cursor parameter with a value of * in the first request:

https://0-api-crossref-org.libus.csd.mu.edu/journals/1522-1466/works?filter=from-deposit-date:1900-01-01,until-deposit-date:2019-09-30&cursor=*

A next-cursor field will be provided in the JSON response. To get the next page of results, pass the value of next-cursor as the cursor parameter. For example:

https://0-api-crossref-org.libus.csd.mu.edu/journals/1522-1466/works?filter=from-deposit-date:1900-01-01,until-deposit-date:2019-09-30&rows=500&cursor=DnF1ZXJ5VGhlbkZldGNoBgAAAAAP5b9LFlo0ZUxVcFdrUnZHX0U1VHJLM2wyRmcAAAAACSi6ThZzdDctS25FTlRseXVSMmFKM2x1Z2pnAAAAAAHlYE0WOThNaG5lOXJRbFNxcUY4TmhDanV6UQAAAAAkIyAZFjNsQ3YwX25kUnFtNjhDTVFFZmVCNHcAAAAADQGubBZhVVJRRXBxZ1NQS1dfbkl6bk5JZDZ3AAAAAA_3xPUWbXNDZkMzblNRQldDelVoSm5BNXNSdw==

Note that the actual cursor value will be different from this illustration.

For each set of results, you should check the number of returned items. If the number of returned items is fewer than the number of expected rows then the end of the result set has been reached. Using next-cursor beyond this point will result in responses with an empty items list.
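The loop described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: it calls the public api.crossref.org host rather than the proxied host in the URLs above, it assumes the JSON response wraps items and next-cursor inside a top-level message object, and the function names (fetch_page, iter_items, collect_unique_dois) are made up for this example. Collecting records in a dictionary keyed by lower-cased DOI is an extra safeguard against duplicates, not something the API requires.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public API host; swap in your proxied host if you need it.
BASE_URL = "https://api.crossref.org/journals/1522-1466/works"
DATE_FILTER = "from-deposit-date:1900-01-01,until-deposit-date:2019-09-30"


def fetch_page(cursor, rows):
    """Request one page of results and return the 'message' part of the JSON."""
    params = {"filter": DATE_FILTER, "rows": rows, "cursor": cursor}
    # urlencode percent-encodes the cursor, which may contain '=' or '/'.
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["message"]


def iter_items(get_page, rows=1000):
    """Yield every item, following next-cursor until a short page arrives."""
    cursor = "*"  # a cursor value of * starts a new cursor session
    while True:
        message = get_page(cursor, rows)
        items = message.get("items", [])
        yield from items
        # Fewer items than requested means the end of the result set.
        if len(items) < rows:
            break
        cursor = message["next-cursor"]


def collect_unique_dois(get_page, rows=1000):
    """Return a dict of records keyed by lower-cased DOI (first copy wins)."""
    records = {}
    for item in iter_items(get_page, rows):
        doi = item.get("DOI", "").lower()
        if doi:
            records.setdefault(doi, item)
    return records
```

Calling collect_unique_dois(fetch_page, rows=1000) would then walk the whole result set. Because the loop stops on a short page instead of on an assumed total row count, a duplicate record does not cause the last records to be skipped.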

I’m also attaching a .csv file with all of the 7000+ results. Perhaps that is helpful?

-Isaac

user_url_query_2025-01-17-22-57.csv.zip (6.8 MB)
