
Performance regression in BooleanQuery with multiple MUST_NOT clauses on interleaved posting lists #16049

@sgup432

Description


For one of our OpenSearch users, we observed a query like the one below (the full boolean query is redacted; only the must_not part is kept for reference):

{"bool":{"must_not":[{"terms":{"category":["type_a","type_b","type_c","type_d","type_e","type_f","type_g","type_h","type_i","type_j","type_k"]}}]}}

They have an OpenSearch index of ~2B docs across many shards, and after upgrading from Lucene 9.x to Lucene 10.x, we observed a significant performance regression on this query pattern.

I see that major changes were made in ReqExclBulkScorer (which handles the exclusion part), where a docIDRunEnd() optimization was added to exclude a batch of matching docs at once rather than going over each matching doc one by one. When a term matches a dense contiguous block of documents, this lets the scorer jump over the entire block in a single step instead of iterating through each excluded doc individually.
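
A rough sketch of that batch-exclusion idea, assuming a simplified required/excluded scorer loop; this is not the actual ReqExclBulkScorer code, and BatchExclusionSketch / forEachIncludedDoc are illustrative names only:

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class BatchExclusionSketch {
  /** Iterate docs matched by {@code required} but not by {@code excluded}. */
  static void forEachIncludedDoc(DocIdSetIterator required, DocIdSetIterator excluded,
                                 DocConsumer consumer) throws IOException {
    int doc = required.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      int exclDoc = excluded.docID() < doc ? excluded.advance(doc) : excluded.docID();
      if (doc < exclDoc) {
        consumer.accept(doc);                       // not excluded: keep it
        doc = required.nextDoc();
      } else {
        // doc is excluded: jump past the entire contiguous run of excluded docs
        // in one step instead of re-checking each excluded doc individually.
        doc = required.advance(excluded.docIDRunEnd());
      }
    }
  }

  interface DocConsumer {
    void accept(int doc) throws IOException;
  }
}
```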

How Lucene 9 handled it:

In Lucene 9, multiple MUST_NOT terms were merged using a min-heap (DisiPriorityQueue). The heap always surfaces the smallest excluded doc ID across all term iterators. Each excluded doc costs one heap operation, i.e. O(log K), where K is the number of excluded terms. Simple, predictable, and scales well regardless of how the excluded docs are distributed.
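
A simplified illustration of that min-heap merging, using java.util.PriorityQueue in place of Lucene's DisiPriorityQueue; the class and method names here are hypothetical:

```java
import java.io.IOException;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import org.apache.lucene.search.DocIdSetIterator;

public class HeapExclusionSketch {
  // Min-heap ordered by each iterator's current doc ID.
  private final PriorityQueue<DocIdSetIterator> heap =
      new PriorityQueue<>(Comparator.comparingInt(DocIdSetIterator::docID));

  HeapExclusionSketch(List<DocIdSetIterator> excludedTerms) throws IOException {
    for (DocIdSetIterator it : excludedTerms) {
      if (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        heap.add(it);                 // position each term iterator on its first doc
      }
    }
  }

  /** Smallest doc ID currently excluded by any of the K terms. */
  int currentExcludedDoc() {
    return heap.isEmpty() ? DocIdSetIterator.NO_MORE_DOCS : heap.peek().docID();
  }

  /** Step to the next excluded doc: a handful of O(log K) heap operations. */
  void advanceToNextExcludedDoc() throws IOException {
    int current = currentExcludedDoc();
    // Advance every iterator sitting on the current doc (usually just one when
    // the posting lists are interleaved) and push it back into the heap.
    while (!heap.isEmpty() && heap.peek().docID() == current) {
      DocIdSetIterator top = heap.poll();
      if (top.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        heap.add(top);
      }
    }
  }
}
```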

What changed in Lucene 10:

In Lucene 10, ReqExclBulkScorer now calls docIDRunEnd() on every excluded doc hit to try to skip an entire run at once. For a single MUST_NOT term with a dense posting list, this is a genuine win. The scorer can skip thousands of docs in one step. However, when there are many MUST_NOT terms (like 11 in this case) whose values are interleaved across the index, docIDRunEnd() triggers DisjunctionDISIApproximation.computeTopList(), which linearly scans all K iterators on every excluded doc hit.

In this case, with 11 interleaved terms, the lead iterator can rarely claim a run longer than 1 (or some very small value), because the next excluded doc almost always belongs to a different term at a different position. Yet computeTopList() still has to linearly scan all K iterators on every hit just to confirm this, paying an O(K) cost for a result that's no better than the old upTo += 1. This turns the per-excluded-doc cost from O(log K) in Lucene 9 to O(K) in Lucene 10 for this pattern.
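
A hedged model of that per-hit cost, not the actual DisjunctionDISIApproximation.computeTopList() implementation; it only makes the O(K) scan concrete (the real code also chains runs contributed by different terms, which is omitted here):

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class DisjunctionRunEndSketch {
  /**
   * End (exclusive) of the run of excluded docs starting at {@code doc}. Every call
   * scans all K sub-iterators, even when the answer turns out to be just doc + 1,
   * which is the O(K)-per-excluded-doc cost described above.
   */
  static int runEnd(DocIdSetIterator[] excludedTerms, int doc) throws IOException {
    int runEnd = doc + 1;
    for (DocIdSetIterator it : excludedTerms) {      // linear scan over all K iterators
      if (it.docID() == doc) {
        runEnd = Math.max(runEnd, it.docIDRunEnd()); // this term's own contiguous run
      }
    }
    return runEnd;
  }
}
```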

This optimization seems to work well with few MUST_NOT terms (1-2), but it regresses with many MUST_NOT terms (like 11) whose posting lists are interleaved, which is a common pattern when filtering by multiple category values on a keyword field.

Sample hot thread dump we saw:

70.1% (350.2ms out of 500ms) cpu usage by thread 'opensearch[...][search][T#42]'
  3/10 snapshots sharing following 34 elements
    app//org.apache.lucene.search.DisjunctionDISIApproximation.computeTopList(DisjunctionDISIApproximation.java:179)
    app//org.apache.lucene.search.DisjunctionDISIApproximation.topList(DisjunctionDISIApproximation.java:169)
    app//org.apache.lucene.search.DisjunctionDISIApproximation.docIDRunEnd(DisjunctionDISIApproximation.java:194)
    app//org.apache.lucene.search.ReqExclBulkScorer.score(ReqExclBulkScorer.java:62)
    app//org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:367)
    app//org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)

Suggested fix

Maybe skip this approach when the exclusion is a disjunction of many terms (> 2), falling back to the simpler one-doc-at-a-time approach from Lucene 9, which seems more efficient for this pattern.
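
A rough sketch of that heuristic, not actual Lucene code; MAX_RUN_END_CLAUSES and nextCandidate() are hypothetical names used only for illustration:

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class ExclusionAdvanceSketch {
  private static final int MAX_RUN_END_CLAUSES = 2; // hypothetical threshold from this issue

  /** Next candidate doc to score after hitting an excluded doc at {@code doc}. */
  static int nextCandidate(DocIdSetIterator exclusion, int doc, int exclClauseCount)
      throws IOException {
    if (exclClauseCount <= MAX_RUN_END_CLAUSES) {
      // Few exclusion clauses: runs are cheap to compute and often long, so
      // jumping to the end of the excluded run is a win.
      return exclusion.docIDRunEnd();
    }
    // Many interleaved exclusion clauses: runs are usually length 1 but cost
    // O(K) to confirm, so just step past the current excluded doc.
    return doc + 1;
  }
}
```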
