Advanced set/boolean operators

Is there any functionality to do OR and NOT operations when doing a search query with the /get/works endpoint?

e.g., something like
(
ALL(zebrafish) AND ALL(pesticides) AND (
ALL(behaviour) OR ALL(behavior)
) AND NOT (
ALL(larva) OR ALL(larvae)
)
)

Hello @joshuasy10. Thanks for your message, and welcome to the community forum.

No, our REST API does not support boolean operators, but there are ways to get the information you seek.

Let’s use this query as a starting point:
https://0-api-crossref-org.libus.csd.mu.edu/works?query=zebrafish+pesticides+behaviour+behavior&select=DOI,title,type,published,score&rows=750

In this example query, the API returns any registered content whose metadata record contains any of these words: zebrafish, pesticides, behaviour, OR behavior. Because the query includes four search terms, it matches many results, over 1.4 million. The results are ordered by relevance score, though, so works with all four words in their metadata rank higher than works with only one of the search terms. I’ve requested the top 750 results and limited the returned metadata to each work’s DOI, title, work type (e.g., journal article), publication date, and relevance score.
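As a rough sketch of how this could be scripted (not official Crossref code), the snippet below builds the same query string and then applies the AND / OR / NOT logic from the original question on the client side. The host `https://api.crossref.org/works`, the function names `build_works_url` and `matches`, and the choice to test only the `title` field (assumed here to be a list of strings) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlencode

# Assumption: the public REST API host; swap in whatever base URL you use.
API_BASE = "https://api.crossref.org/works"


def build_works_url(terms, fields, rows, base=API_BASE):
    """Build a /works URL with query, select, and rows parameters."""
    params = {
        "query": " ".join(terms),    # spaces are encoded as '+'
        "select": ",".join(fields),  # comma-separated list of fields
        "rows": rows,
    }
    # safe="," keeps the commas in the select list readable.
    return f"{base}?{urlencode(params, safe=',')}"


def title_words(work):
    """Lowercased set of words found in a work's title(s)."""
    text = " ".join(work.get("title", []))
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(work):
    """Client-side version of the boolean expression in the question:
    zebrafish AND pesticides AND (behaviour OR behavior)
    AND NOT (larva OR larvae). Checks titles only."""
    words = title_words(work)
    return (
        "zebrafish" in words
        and "pesticides" in words
        and bool(words & {"behaviour", "behavior"})
        and not (words & {"larva", "larvae"})
    )


url = build_works_url(
    ["zebrafish", "pesticides", "behaviour", "behavior"],
    ["DOI", "title", "type", "published", "score"],
    750,
)

# Hypothetical records standing in for items returned by the API.
sample = [
    {"title": ["Pesticides alter zebrafish behaviour"]},
    {"title": ["Pesticides and behavior of zebrafish larvae"]},
]
kept = [work for work in sample if matches(work)]
```

Because this filter only looks at titles, it is stricter than a search across the full metadata record; extending it to other selected fields would follow the same pattern.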

As you can see from the limited metadata that I have requested, the first two results look much more relevant to your query than the 748th and 749th results:

These are the first two results on that first page:

These are the last two results on that first page of results:

My best,
Isaac

An update on something that I said in this thread. I incorrectly said that our REST API performs an OR search when given multiple terms in a query or query.bibliographic query. I was wrong about that.

We don’t quite have an OR query at play within our REST API. Instead, query and query.bibliographic require that at least 20% of the query words match. We added this requirement a few years ago for performance reasons. So, with 10 input query words, 20% becomes 2 words, and at that threshold the results start to drop works that match only one word. This is why a query containing poverty+low+middle+income+country+LIC+MIC has more results than a query containing poverty+low+middle+income+country+LIC+MIC+climate+urban+migration+gender. The longer query has more words, so the 20% matching rule requires more of them to match, and works that fall short are dropped from its results.
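To make the arithmetic concrete, here is a small sketch of the rule as described above. The function name `min_matching_words` and the rounding behavior (round down, with a floor of one word) are assumptions for illustration; the only figures given above are the 20% requirement and the 10-word example.

```python
def min_matching_words(query: str, percent: int = 20) -> int:
    """Estimate how many query words must match under a percent rule.

    Assumption: the threshold is rounded down and is never below 1.
    """
    words = query.replace("+", " ").split()
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, len(words) * percent // 100)


short_query = "poverty+low+middle+income+country+LIC+MIC"
long_query = short_query + "+climate+urban+migration+gender"

short_threshold = min_matching_words(short_query)  # 7 words
long_threshold = min_matching_words(long_query)    # 11 words
```

Under these assumptions the 7-word query needs one matching word while the 11-word query needs two, which is consistent with the longer query returning fewer results.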

For more context on this, see this thread.

-Isaac