Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: concurrent, overlapping L0→Lbase compactions permitted #875

Open
jbowens opened this issue Sep 4, 2020 · 2 comments · May be fixed by #885
Open

db: concurrent, overlapping L0→Lbase compactions permitted #875

jbowens opened this issue Sep 4, 2020 · 2 comments · May be fixed by #885

Comments

@jbowens
Copy link
Collaborator

@jbowens jbowens commented Sep 4, 2020

I was testing a nightly metamorphic tests TeamCity configuration and it uncovered an invariant violation.

https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_Pebble_Metamorphic/2249146:id/stress.log

[16:06:59][Step 1/1]             ===== SEED =====
[16:06:59][Step 1/1]             1599250004201769425
[16:06:59][Step 1/1]             ===== ERR =====
[16:06:59][Step 1/1]             exit status 1
[16:06:59][Step 1/1]             ===== OUT =====
[16:06:59][Step 1/1]             pebble: internal error: L5 files 000907 and 000908 have overlapping ranges: [jrcmlrtjrp#536,MERGE-jrcmlrtjrp#536,MERGE] vs [ijwnptq#535,MERGE-zxngb#72057594037927935,RANGEDEL]
[16:06:59][Step 1/1]             
[16:06:59][Step 1/1]             ===== OPTIONS =====
[16:06:59][Step 1/1]             [Version]
[16:06:59][Step 1/1]               pebble_version=0.1
[16:06:59][Step 1/1]             
[16:06:59][Step 1/1]             [Options]
[16:06:59][Step 1/1]               bytes_per_sync=32768
[16:06:59][Step 1/1]               cache_size=16777216
[16:06:59][Step 1/1]               cleaner=delete
[16:06:59][Step 1/1]               comparer=leveldb.BytewiseComparator
[16:06:59][Step 1/1]               delete_range_flush_delay=0s
[16:06:59][Step 1/1]               disable_wal=true
[16:06:59][Step 1/1]               flush_split_bytes=131072
[16:06:59][Step 1/1]               l0_compaction_concurrency=1
[16:06:59][Step 1/1]               l0_compaction_threshold=2
[16:06:59][Step 1/1]               l0_stop_writes_threshold=44
[16:06:59][Step 1/1]               l0_sublevel_compactions=true
[16:06:59][Step 1/1]               lbase_max_bytes=4096
[16:06:59][Step 1/1]               max_concurrent_compactions=3
[16:06:59][Step 1/1]               max_manifest_file_size=524288
[16:06:59][Step 1/1]               max_open_files=1000
[16:06:59][Step 1/1]               mem_table_size=1024
[16:06:59][Step 1/1]               mem_table_stop_writes_threshold=3
[16:06:59][Step 1/1]               min_compaction_rate=4194304
[16:06:59][Step 1/1]               min_flush_rate=1048576
[16:06:59][Step 1/1]               merger=pebble.concatenate
[16:06:59][Step 1/1]               table_property_collectors=[]
[16:06:59][Step 1/1]               wal_dir=wal
[16:06:59][Step 1/1]             
[16:06:59][Step 1/1]             [Level "0"]
[16:06:59][Step 1/1]               block_restart_interval=64
[16:06:59][Step 1/1]               block_size=512
[16:06:59][Step 1/1]               compression=Default
[16:06:59][Step 1/1]               filter_policy=none
[16:06:59][Step 1/1]               filter_type=table
[16:06:59][Step 1/1]               index_block_size=524288
[16:06:59][Step 1/1]               target_file_size=8388608
[16:06:59][Step 1/1]             
[16:06:59][Step 1/1]             [TestOptions]
[16:06:59][Step 1/1]               ingest_using_apply=true
@jbowens
Copy link
Collaborator Author

@jbowens jbowens commented Sep 4, 2020

Running

go test -mod=vendor -tags 'invariants' -exec 'stress -p 1' -timeout 0 -test.v \
   -run TestMeta$ ./internal/metamorphic -seed 1599250004201769425

on master ( c29cc16 ) should reproduce locally.

@jbowens
Copy link
Collaborator Author

@jbowens jbowens commented Sep 4, 2020

            // INFO: [JOB 1495] flushing: sstable created 000907
            // INFO: [JOB 1496] compacting L0 [000905 000906] (1.6 K) + L5 [] (0 B)
            // INFO: [JOB 1496] compacting: sstable created 000908
            // INFO: [JOB 1495] flushed 2 memtables to L0 [000907] (793 B), in 0.0s, output rate 2.4 M/s
            // INFO: write stall ending
            db.DeleteRange("craaxyri", "jjyjut") // <nil> #1533
            // INFO: [JOB 1497] compacting L0 [000907] (793 B) + L5 [] (0 B)
            // INFO: [JOB 1497] compacted L0 [000907] (793 B) + L5 [] (0 B) -> L5 [000907] (793 B), in 0.0s, output rate 259 M/s
            // INFO: [JOB 1498] flushing 2 memtables to L0
            // INFO: [JOB 1498] flushing: sstable created 000909
            // INFO: [JOB 1496] compaction to L5 error: pebble: internal error: L5 files 000907 and 000908 have overlapping ranges: [jrcmlrtjrp#536,MERGE-jrcmlrtjrp#536,MERGE] vs [ijwnptq#535,MERGE-zxngb#72057594037927935,RANGEDEL]

I’m kinda surprised that we haven’t hit this before. It looks like two files were moved from L0→L5 concurrently. AFAIK, we don’t have any checks in place to prevent this. We just check to make sure compaction inputs aren’t already compacting, but two L0→Lbase compactions can overlap without sharing any compaction inputs if their intersection does not overlap any tables in Lbase.

This pickFile comment is relevant:

pebble/compaction_picker.go

Lines 772 to 774 in b66f9d0

// TODO(peter): For concurrent compactions, we may want to try harder to
// pick a seed file whose resulting compaction bounds do not overlap with
// an in-progress compaction.

@jbowens jbowens changed the title db: internal error: L5 files 000907 and 000908 have overlapping ranges db: concurrent, overlapping L0→Lbase compactions permitted Sep 4, 2020
jbowens added a commit to jbowens/pebble that referenced this issue Sep 9, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 9, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 9, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 10, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 11, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 11, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
jbowens added a commit to jbowens/pebble that referenced this issue Sep 14, 2020
Previously, it was possible for compaction picking to pick an L0->LBase
compaction that conflicted with an existing L0->LBase compaction under
specific conditions.

A flush of a new sstable constructs a new Version and sublevel
intervals. If a flushed table does not overlap any individual
currently-compacting files, its intervals within sublevels
datastructures are not marked as compacting, although the flushed table
may still overlap with the active compaction's key range (eg, if the new
sstable sits squarely between two compacting files' keyspaces). If no
files exist in LBase that intersect with the intersection of the newly
flushed sstable's keyspace and the active compaction's keyspace, a
new compaction of the flush sstable would be allowed to start.

This change augments our existing concurrent compaction failsafe to
compare picked L0->LBase compaction key ranges against the key ranges of
in-progress compactions, preventing the compaction if necessary.

I think we'll probably want to also adapt L0 sublevels to avoid picking
these base compactions earlier through passing active compactions into
InitCompactingFileInfo, but adding this to the failsafe also patches the
non-sublevels codepaths and will provide defense in depth once we fix
L0-sublevel picking.

Fix cockroachdb#875.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

1 participant
You can’t perform that action at this time.