Hello CrossRef community,
In August I set up a weekly job to search for new research on poverty in developing countries. I used quite a long search string and some basic filters, as follows:
query: "global poverty reduction evidence climate urban migration gender developing low middle income country LIC MIC",
filter: "from-pub-date:[7 days ago],has-abstract:1,type:journal-article,type:report",
rows: 1000
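A minimal sketch of how a request with these settings might be assembled in Python. The function name `build_works_url`, its parameters, and the default host `https://api.crossref.org/works` are assumptions made for illustration; the thread only gives the query string, the filters, and the row count.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_works_url(terms, days_back=7, today=None, rows=1000,
                    base_url="https://api.crossref.org/works"):
    """Build a /works URL for records published in the last `days_back` days.

    `base_url` is an assumed default; swap in whichever host you query.
    """
    today = today or date.today()
    # "[7 days ago]" in the post becomes a concrete ISO date here.
    since = (today - timedelta(days=days_back)).isoformat()
    filters = [
        f"from-pub-date:{since}",
        "has-abstract:1",
        "type:journal-article",
        "type:report",
    ]
    params = {"query": terms, "filter": ",".join(filters), "rows": rows}
    # urlencode turns spaces into "+" and percent-encodes ":" and ",".
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_works_url("poverty low middle income country LIC MIC"))
```

Passing `today` explicitly makes it possible to re-run the search for an earlier window, such as the August week mentioned below.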
When I ran this search initially it returned around 800 results for a single week. During September, there seems to have been a crash in volumes, and for the last few weeks, the same query has been returning only a handful of results - about 20 per week.
This week, I repeated the search for the initial time window in August, and also retrieved only around 20 records this time.
Has the behaviour of this endpoint changed recently? Is there any way I can get back to the previous volumes of results?
Apologies for the vague issue - any help would be greatly appreciated.
Hi @tomwagstaff-opml,
I assume your API query looks like this, right?
https://0-api-crossref-org.libus.csd.mu.edu/works?query.bibliographic=global+poverty+reduction+evidence+climate+urban+migration+gender+developing+low+middle+income+country+LIC+MIC&filter=from-pub-date:2024-11-08,has-abstract:1,type:journal-article,type:report - 29 results
Compared to a query like this: https://0-api-crossref-org.libus.csd.mu.edu/works?query.bibliographic=global+poverty+reduction+evidence+climate+urban+migration+gender+developing+low+middle+income+country+LIC+MIC&filter=from-pub-date:2024-01-01,has-abstract:1,type:journal-article,type:report - 1343 results
I would expect to see more results for both of these queries, so I am checking with a couple of my colleagues about what I am seeing here.
More as I have it,
Isaac
Hi @ifarley,
Thanks very much for picking this up. Glad it's not just me who thought this was peculiar behaviour.
You're correct on my query URL, except I'm using the general query rather than query.bibliographic. As a sidenote - do you know what the difference is? I know bibliographic searches title, author and some publication details - is an unqualified query looking at all possible fields?
Here is my query URL (with apologies for the percent encoding):
https://0-api-crossref-org.libus.csd.mu.edu/works?query=global+poverty+reduction+evidence+climate+urban+migration+gender+developing+low+middle+income+country+LIC+MIC&filter=from-pub-date%3A2024-11-08%2Chas-abstract%3A1%2Ctype%3Ajournal-article%2Ctype%3Abook-chapter%2Ctype%3Areport
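For readers who want to see what the percent-encoded URL contains, here is a small sketch using Python's standard `urllib.parse`. The helper name `decode_works_url` is hypothetical; the URL is the one quoted above.

```python
from urllib.parse import parse_qs, urlsplit


def decode_works_url(url):
    """Return (query terms, list of filters) from a /works URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs undoes the percent encoding and turns "+" back into spaces.
    query = params["query"][0]
    filters = params["filter"][0].split(",")
    return query, filters


URL = (
    "https://0-api-crossref-org.libus.csd.mu.edu/works?query=global+poverty"
    "+reduction+evidence+climate+urban+migration+gender+developing+low+middle"
    "+income+country+LIC+MIC&filter=from-pub-date%3A2024-11-08%2Chas-abstract"
    "%3A1%2Ctype%3Ajournal-article%2Ctype%3Abook-chapter%2Ctype%3Areport"
)

if __name__ == "__main__":
    query, filters = decode_works_url(URL)
    print(query)
    for f in filters:
        print(f)
```

Decoded this way, the filter list also includes `type:book-chapter`, which the two URLs earlier in the thread do not.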
And in case it helps - I've been doing a bit more investigating. These are the numbers of results I got on Friday, using another 7-day query:
| Query | Results (for 3/11 - 10/11, searched now) |
| --- | --- |
| global poverty reduction evidence climate urban migration gender developing low middle income country LIC MIC | 25 |
| poverty map | 56 |
| poverty | 26 |
| poverty low middle income country LIC MIC | 629 |
| poverty low middle income country LIC MIC climate urban migration gender | 71 |
I'm surprised at how these results swing around. I thought the query terms were basically ORed together, so more terms should always bring back more results, but it doesn't seem to be the case here...
Hello @tomwagstaff-opml,
Thanks for posting your query and some example results.
query.bibliographic is specifically for querying bibliographic information, which is useful for citation lookup. query.bibliographic includes matches from titles, authors, ISSNs, and publication years.
I think I was also under the impression that we performed something very similar to an OR query when a query included a list of multiple terms. I likely have referenced it that way in other posts within this community forum (I'm going to search and see if I can clarify in other similar community forum posts). And, I believe I came to that assumption because I was reviewing queries with fewer search terms. After discussing your example with Martyn on our program team and Dominika on our technical team, I learned that my understanding was incorrect and not quite nuanced enough.
So, we don't quite have an OR query at play here. Instead, query and query.bibliographic require that at least 20% of query words match. We added this requirement a few years ago for performance reasons. So, with 10 input query words, 20% becomes 2 words, and the query results start to drop works that match only one word at that threshold. This is why a query containing poverty+low+middle+income+country+LIC+MIC has more results than a query containing poverty+low+middle+income+country+LIC+MIC+climate+urban+migration+gender. We've simply dropped results from the latter of the two queries because of that 20% matching rule that we implemented.
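The arithmetic behind the 20% rule can be sketched as follows. This is an illustration only: the function name `min_terms_required` is hypothetical, words are split on whitespace, and the result is rounded down with a floor of one word. The rounding is an assumption, since the thread gives only the 10-word example.

```python
def min_terms_required(query, percent=20):
    """Estimate how many query words a work must match.

    Assumption: percent of the word count, rounded down, never below 1.
    Integer arithmetic avoids floating-point surprises.
    """
    n_words = len(query.split())
    return max(1, n_words * percent // 100)


QUERIES = [
    "poverty",
    "poverty map",
    "poverty low middle income country LIC MIC",
    "poverty low middle income country LIC MIC climate urban migration gender",
    "global poverty reduction evidence climate urban migration gender "
    "developing low middle income country LIC MIC",
]

if __name__ == "__main__":
    for q in QUERIES:
        print(f"{len(q.split()):>2} words -> at least "
              f"{min_terms_required(q)} must match: {q}")
```

Under these assumptions the 7-word query still accepts works matching a single word, while the 11-word query needs two and the full 15-word query needs three, which is consistent with the longer queries returning fewer results.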
Based on your use case, you might be better served by something like OpenAlex which is designed more with search in mind.
I hope this is helpful!
-Isaac
A couple of additional things I failed to mention in yesterday's post:
- We'll update our Swagger documentation to capture this behavior
- As you reported @tomwagstaff-opml, there do appear to be some discrepancies with the recent totals you're reporting; my colleague @mrittman is investigating that
Thanks again for raising this!
More as we have it,
Isaac
Hi @ifarley,
Thanks very much for all this feedback. It's good to understand there is this extra constraint at play of at least 20% matching - this explains the different performance of the various queries.
Thanks also for the suggestion of OpenAlex - it might as you say be better suited to our use case, although I would regret moving over from CrossRef, which has been such a good source overall.
Final question - and I know it's vague and fuzzy and probably impossible to answer - but do you have any idea why we saw these major drops in volumes since early September, even with the same query? (Apologies if that's what @mrittman is already looking into!)
Thanks for following up @tomwagstaff-opml. Yes, that's what @mrittman was going to investigate. I'll follow up with him next week to check on his progress.
Have a lovely weekend,
Isaac