My colleagues on the technical team have some suggestions:
You could divide the set you need to download by creation date, and download different creation date ranges in parallel. Creation date should be safe because it does not change, and every DOI has only one creation date. So a DOI should belong to exactly one creation date range, provided the ranges do not overlap and all of them are downloaded. The full range to cover is from 2002-07-25 (inclusive; this is the oldest creation date in our data) to the current date.
For example, I can download DOIs updated since April and created in 2020, in parallel download DOIs updated since April and created in 2019, …, and in parallel download DOIs updated since April and created in 2002, using parallel requests https://0-api-crossref-org.libus.csd.mu.edu/works?filter=from-update-date:2020-04-01,until-update-date:2020-11-16,from-created-date:2020,until-created-date:2020&cursor=…
and https://0-api-crossref-org.libus.csd.mu.edu/works?filter=from-update-date:2020-04-01,until-update-date:2020-11-16,from-created-date:2019,until-created-date:2019&cursor=…
and so on.
Or, I could use smaller ranges and separately download DOIs updated since April and created in 2020-11, DOIs updated since April and created in 2020-10, and so on down to 2002-07. Or use just a few days as the range. The smallest possible range is 1 day, as this is the "resolution" of the creation date filter.
Those subsets may not be well balanced in terms of the number of DOIs, but splitting the work this way should allow you to speed the whole thing up a bit.
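To make the splitting concrete, here is a minimal sketch in Python of how the monthly creation date ranges and their request URLs could be generated. It is only an illustration: the function names `month_ranges` and `build_url` are made up for this example, the base URL and the update-date window are copied from the example requests above, and the cursor handling is left out, so each URL would still need to be paged through with its own cursor by whichever worker picks it up.

```python
from datetime import date, timedelta

# Base URL copied from the example requests above.
BASE_URL = "https://0-api-crossref-org.libus.csd.mu.edu/works"


def month_ranges(start, end):
    """Yield (first_day, last_day) pairs, one per calendar month,
    that together cover start..end (both inclusive) without overlap."""
    current = start
    while current <= end:
        # Work out the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The range ends on the last day of this month, or on `end` if earlier.
        last_day = min(next_month - timedelta(days=1), end)
        yield current, last_day
        current = next_month


def build_url(update_from, update_until, created_from, created_until):
    """Build one request URL (hypothetical helper; no cursor parameter added)."""
    filters = [
        f"from-update-date:{update_from.isoformat()}",
        f"until-update-date:{update_until.isoformat()}",
        f"from-created-date:{created_from.isoformat()}",
        f"until-created-date:{created_until.isoformat()}",
    ]
    return f"{BASE_URL}?filter={','.join(filters)}"


# Creation dates from the oldest one in the data up to the example end date.
ranges = list(month_ranges(date(2002, 7, 25), date(2020, 11, 16)))

# One URL per creation-month; each can be handed to a separate worker.
urls = [
    build_url(date(2020, 4, 1), date(2020, 11, 16), created_from, created_until)
    for created_from, created_until in ranges
]
```

Because every range starts the day after the previous one ends, each DOI falls into exactly one of them. Switching to yearly or daily ranges would only change how the ranges are generated; the URL building stays the same.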
Does that make sense?