CPU profiles often show SingletonSortedNumericDocValues#nextDoc() using a high percentage of CPU when running luceneutil, even though nextDoc() in the dense case should be rather simple. So I suspect that the many layers of abstraction (and wrapping) are what stress the JVM. Unwrapping it to NumericDocValues shows around a 30% speedup on the Browse*TaxoFacets tasks.
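To illustrate the idea, here is a minimal, self-contained sketch of the unwrapping pattern. It does not use Lucene: `NumericValues`, `SortedNumericValues`, `DenseNumericValues`, `SingletonSortedNumericValues` and the `count*` methods are simplified, hypothetical stand-ins for the real classes, and this is not the code from the pull request. In Lucene itself the equivalent helper is `DocValues.unwrapSingleton(SortedNumericDocValues)`, which returns `null` when the field is not single-valued; whether the patch uses exactly that call is not stated here.

```java
import java.util.Arrays;

public class UnwrapSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a single-valued doc-values iterator. */
    interface NumericValues {
        int nextDoc();
        long longValue();
    }

    /** Hypothetical stand-in for a multi-valued doc-values iterator. */
    interface SortedNumericValues {
        int nextDoc();
        int docValueCount();
        long nextValue();
    }

    /** Dense case: every doc in [0, values.length) has exactly one value. */
    static final class DenseNumericValues implements NumericValues {
        private final long[] values;
        private int doc = -1;

        DenseNumericValues(long[] values) { this.values = values; }

        @Override public int nextDoc() {
            if (doc == NO_MORE_DOCS || doc + 1 >= values.length) {
                doc = NO_MORE_DOCS;
                return doc;
            }
            return ++doc;
        }

        @Override public long longValue() { return values[doc]; }
    }

    /** Exposes a single-valued iterator through the multi-valued interface. */
    static final class SingletonSortedNumericValues implements SortedNumericValues {
        private final NumericValues in;

        SingletonSortedNumericValues(NumericValues in) { this.in = in; }

        @Override public int nextDoc() { return in.nextDoc(); }
        @Override public int docValueCount() { return 1; }
        @Override public long nextValue() { return in.longValue(); }
    }

    /** Returns the wrapped single-valued iterator, or null if dv is not a singleton wrapper. */
    static NumericValues unwrapSingleton(SortedNumericValues dv) {
        if (dv instanceof SingletonSortedNumericValues) {
            return ((SingletonSortedNumericValues) dv).in;
        }
        return null;
    }

    /** Generic path: every call goes through the multi-valued interface. */
    static void countWrapped(SortedNumericValues dv, int[] counts) {
        for (int doc = dv.nextDoc(); doc != NO_MORE_DOCS; doc = dv.nextDoc()) {
            for (int i = 0, n = dv.docValueCount(); i < n; i++) {
                counts[(int) dv.nextValue()]++;
            }
        }
    }

    /** Fast path: unwrap once, then iterate the single-valued iterator directly. */
    static void count(SortedNumericValues dv, int[] counts) {
        NumericValues single = unwrapSingleton(dv);
        if (single == null) {
            countWrapped(dv, counts);
            return;
        }
        for (int doc = single.nextDoc(); doc != NO_MORE_DOCS; doc = single.nextDoc()) {
            counts[(int) single.longValue()]++;
        }
    }

    public static void main(String[] args) {
        long[] ords = {2, 0, 2, 1};
        int[] viaWrapper = new int[3];
        countWrapped(new SingletonSortedNumericValues(new DenseNumericValues(ords)), viaWrapper);
        int[] viaUnwrap = new int[3];
        count(new SingletonSortedNumericValues(new DenseNumericValues(ords)), viaUnwrap);
        System.out.println(Arrays.toString(viaWrapper)); // [1, 1, 2]
        System.out.println(Arrays.toString(viaUnwrap));  // [1, 1, 2]
    }
}
```

Both paths produce the same counts. The fast path only removes one level of delegation and the inner `docValueCount()` loop from the hot loop, which is the kind of saving the profiles in this issue point at. The sketch makes no claim about how much that saves on a real JVM.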
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
HighTermTitleBDVSort 132.24 (20.6%) 125.67 (9.9%) -5.0% ( -29% - 32%) 0.330
LowTerm 1424.13 (3.2%) 1381.34 (4.4%) -3.0% ( -10% - 4%) 0.014
OrHighNotHigh 707.82 (3.3%) 687.49 (6.0%) -2.9% ( -11% - 6%) 0.062
TermDTSort 155.32 (10.9%) 151.02 (10.2%) -2.8% ( -21% - 20%) 0.406
OrNotHighMed 618.46 (3.7%) 602.65 (4.4%) -2.6% ( -10% - 5%) 0.047
Fuzzy1 76.22 (5.3%) 74.71 (6.6%) -2.0% ( -13% - 10%) 0.293
HighTermMonthSort 174.89 (10.4%) 171.45 (10.6%) -2.0% ( -20% - 21%) 0.554
OrHighNotMed 776.08 (4.9%) 761.70 (7.8%) -1.9% ( -13% - 11%) 0.367
HighTermDayOfYearSort 56.23 (10.7%) 55.26 (10.9%) -1.7% ( -21% - 22%) 0.615
MedTerm 1449.48 (3.7%) 1425.87 (5.1%) -1.6% ( -10% - 7%) 0.250
OrNotHighHigh 687.92 (4.9%) 677.06 (5.5%) -1.6% ( -11% - 9%) 0.339
OrHighNotLow 742.99 (4.7%) 732.23 (5.9%) -1.4% ( -11% - 9%) 0.390
OrNotHighLow 789.37 (2.7%) 778.80 (4.7%) -1.3% ( -8% - 6%) 0.270
HighPhrase 75.84 (2.2%) 75.14 (3.0%) -0.9% ( -6% - 4%) 0.269
HighSloppyPhrase 20.71 (5.9%) 20.56 (5.2%) -0.7% ( -11% - 11%) 0.678
IntNRQ 106.38 (18.4%) 105.67 (18.2%) -0.7% ( -31% - 44%) 0.908
OrHighMed 45.10 (1.5%) 44.83 (1.8%) -0.6% ( -3% - 2%) 0.261
MedSpanNear 192.49 (2.5%) 191.51 (3.5%) -0.5% ( -6% - 5%) 0.593
OrHighLow 489.82 (5.5%) 487.79 (5.7%) -0.4% ( -11% - 11%) 0.815
MedSloppyPhrase 27.33 (2.9%) 27.22 (2.3%) -0.4% ( -5% - 5%) 0.623
MedPhrase 208.94 (2.9%) 208.09 (3.7%) -0.4% ( -6% - 6%) 0.696
Respell 71.84 (2.4%) 71.55 (2.4%) -0.4% ( -5% - 4%) 0.600
OrHighHigh 36.26 (1.3%) 36.13 (1.1%) -0.4% ( -2% - 2%) 0.344
BrowseMonthSSDVFacets 15.95 (2.7%) 15.90 (2.5%) -0.4% ( -5% - 5%) 0.672
AndHighMed 85.83 (2.2%) 85.53 (2.7%) -0.3% ( -5% - 4%) 0.658
Prefix3 123.15 (2.6%) 122.74 (2.5%) -0.3% ( -5% - 4%) 0.678
Fuzzy2 76.41 (4.7%) 76.23 (4.2%) -0.2% ( -8% - 9%) 0.867
BrowseDayOfYearSSDVFacets 14.52 (2.4%) 14.49 (2.2%) -0.2% ( -4% - 4%) 0.747
MedIntervalsOrdered 56.39 (4.2%) 56.27 (4.1%) -0.2% ( -8% - 8%) 0.871
HighIntervalsOrdered 9.29 (4.7%) 9.27 (4.4%) -0.2% ( -8% - 9%) 0.896
AndHighMedDayTaxoFacets 119.76 (2.5%) 119.53 (2.9%) -0.2% ( -5% - 5%) 0.831
HighSpanNear 20.89 (2.0%) 20.85 (2.3%) -0.2% ( -4% - 4%) 0.803
LowIntervalsOrdered 45.51 (4.9%) 45.47 (4.8%) -0.1% ( -9% - 10%) 0.952
LowPhrase 64.17 (2.6%) 64.14 (2.6%) -0.1% ( -5% - 5%) 0.951
LowSpanNear 104.45 (2.2%) 104.41 (1.9%) -0.0% ( -4% - 4%) 0.959
Wildcard 103.83 (2.8%) 103.80 (2.8%) -0.0% ( -5% - 5%) 0.970
AndHighHigh 42.33 (2.6%) 42.33 (2.4%) -0.0% ( -4% - 5%) 0.991
BrowseRandomLabelSSDVFacets 10.62 (2.5%) 10.62 (1.8%) 0.0% ( -4% - 4%) 0.981
AndHighHighDayTaxoFacets 29.75 (2.3%) 29.76 (2.7%) 0.1% ( -4% - 5%) 0.949
MedTermDayTaxoFacets 26.56 (3.0%) 26.58 (2.5%) 0.1% ( -5% - 5%) 0.945
AndHighLow 1012.26 (4.5%) 1013.62 (4.3%) 0.1% ( -8% - 9%) 0.923
LowSloppyPhrase 78.82 (6.8%) 79.03 (6.0%) 0.3% ( -11% - 14%) 0.897
PKLookup 204.09 (3.0%) 204.82 (2.9%) 0.4% ( -5% - 6%) 0.703
OrHighMedDayTaxoFacets 14.53 (3.4%) 14.59 (2.7%) 0.4% ( -5% - 6%) 0.694
HighTerm 1607.26 (5.2%) 1623.99 (5.6%) 1.0% ( -9% - 12%) 0.543
BrowseRandomLabelTaxoFacets 11.93 (6.9%) 15.52 (2.5%) 30.1% ( 19% - 42%) 0.000
BrowseDateTaxoFacets 13.46 (9.0%) 18.28 (3.6%) 35.8% ( 21% - 53%) 0.000
BrowseDayOfYearTaxoFacets 13.59 (9.1%) 18.53 (3.6%) 36.3% ( 21% - 53%) 0.000
BrowseMonthTaxoFacets 13.93 (10.9%) 19.70 (14.9%) 41.4% ( 14% - 75%) 0.000
Baseline
PERCENT CPU SAMPLES STACK
3.85% 12316 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.78% 12076 org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
3.72% 11905 org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
2.88% 9199 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
2.31% 7380 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.27% 7270 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.25% 7211 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
2.23% 7139 org.apache.lucene.index.SingletonSortedNumericDocValues#nextValue()
1.88% 6006 java.nio.Buffer#checkIndex()
1.86% 5965 jdk.internal.misc.Unsafe#convEndian()
1.85% 5916 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.72% 5491 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.49% 4780 java.nio.DirectByteBuffer#ix()
1.42% 4548 java.nio.Buffer#scope()
1.40% 4465 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
1.39% 4434 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.33% 4254 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.32% 4219 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
1.28% 4109 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#nextDoc()
1.28% 4089 jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.16% 3709 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.10% 3517 org.apache.lucene.store.ByteBufferGuard#getInt()
1.07% 3427 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
0.98% 3149 org.apache.lucene.search.ConjunctionDISI#doNext()
0.98% 3120 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
0.93% 2969 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
0.92% 2927 org.apache.lucene.store.ByteBufferGuard#getByte()
0.88% 2828 com.carrotsearch.hppc.IntIntHashMap#indexOf()
0.82% 2635 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
0.82% 2633 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
Candidate
PERCENT CPU SAMPLES STACK
4.15% 12823 org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
3.94% 12186 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.32% 10266 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.98% 9208 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
2.38% 7351 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.07% 6386 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#nextDoc()
1.85% 5723 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
1.81% 5600 jdk.internal.misc.Unsafe#convEndian()
1.81% 5588 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.75% 5409 java.nio.Buffer#checkIndex()
1.72% 5310 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.50% 4631 java.nio.Buffer#scope()
1.44% 4437 jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.43% 4408 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.39% 4297 java.nio.DirectByteBuffer#ix()
1.39% 4280 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
1.33% 4111 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.31% 4052 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#nextDoc()
1.29% 3974 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
1.22% 3761 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.13% 3502 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.04% 3219 org.apache.lucene.search.ConjunctionDISI#doNext()
1.00% 3099 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
0.99% 3067 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
0.99% 3052 org.apache.lucene.store.ByteBufferGuard#getInt()
0.89% 2762 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
0.87% 2690 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
0.86% 2663 org.apache.lucene.store.ByteBufferGuard#getByte()
0.80% 2476 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.78% 2420 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
Migrated from LUCENE-10346 by Feng Guo (@gf2121), updated Jan 04 2022
Pull requests: #574