Description
Describe the bug
When doing a bulk import into tables that have a Bloom filter enabled, external compactions fail with the following error:
compactor_q1 org.apache.accumulo.compactor.Compactor 449 ERROR Compactor thread was interrupted waiting for compaction to start, cancelling job
java.lang.UnsupportedOperationException
at org.apache.accumulo.core.file.BloomFilterLayer$Reader.estimateOverlappingEntries(BloomFilterLayer.java:434)
at org.apache.accumulo.compactor.Compactor.estimateOverlappingEntries(Compactor.java:635)
at org.apache.accumulo.compactor.Compactor$2.lambda$initialize$0(Compactor.java:546)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at org.apache.accumulo.compactor.Compactor$2.initialize(Compactor.java:540)
at org.apache.accumulo.compactor.Compactor.run(Compactor.java:751)
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
at java.base/java.lang.Thread.run(Thread.java:1583)
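Judging from the stack trace, BloomFilterLayer.Reader does not implement estimateOverlappingEntries and throws instead of delegating to the reader it wraps. A rough sketch of the shape of the failure follows; the method signature, the wrapped reader field, and the suggested delegation are assumptions based on the trace, not the actual 2.1.3 source:

// Hedged sketch only: the signature and the wrapped "reader" field are assumptions.
// Inside BloomFilterLayer.Reader, the trace suggests the method is left unimplemented:
public long estimateOverlappingEntries(KeyExtent extent) {
  throw new UnsupportedOperationException(); // BloomFilterLayer.java:434
}
// A plausible remedy would be delegating to the wrapped FileSKVIterator instead,
// e.g. return reader.estimateOverlappingEntries(extent);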
Versions (OS, Maven, Java, and others, as appropriate):
- Affected version(s) of this project: 2.1.3
To Reproduce
Steps to reproduce the behavior (or a link to an example repository that reproduces the problem):
Start a local fluo-uno cluster with the default external compactors enabled, as defined in fluo-uno/install/accumulo-2.1.3/conf/cluster.yaml.
fluo-uno branch: main (e8f3ba9), Accumulo version 2.1.3
Generate local bulk import files with accumulo-examples/src/main/java/org/apache/accumulo/examples/mapreduce/bulk/BulkIngestExample.java.
I disabled the client.tableOperations().importDirectory call and performed the bulk import in the Accumulo shell instead (a sketch of the equivalent programmatic call follows below), and I changed the default 1k rows to 2M rows.
accumulo-examples branch: 2.1 (9d400cd)
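For reference, a minimal sketch of the programmatic bulk import that was skipped in favor of the shell; it assumes the 2.x bulk import API with placeholder paths, not the example's exact code:

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

public class BulkImportSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder client properties path; adjust to the local fluo-uno setup.
    try (AccumuloClient client = Accumulo.newClient()
        .from("/path/to/accumulo-client.properties").build()) {
      // 2.x bulk import API: load the RFiles under the directory into table test1.
      client.tableOperations().importDirectory("/tmp/bulkWork/bulkWork/files")
          .to("test1").load();
    }
  }
}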
Then copy the generated files to HDFS:
hadoop fs -mkdir -p /tmp/bulkWork
hadoop fs -copyFromLocal /.../accumulo-examples/tmp/bulkWork/ /tmp/bulkWork
In the Accumulo shell:
createtable test1
Enable bloom filter
config -t test1 -s table.bloom.enabled=true
The following setting is very likely not necessary, but I figured it would help trigger compactions:
config -t test1 -s table.split.threshold=100K
Configure external compactions in the shell:
config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"large","type":"external","queue":"q1"}]'
config -t test1 -s table.compaction.dispatcher.opts.service=cs1
Do the bulk load
importdirectory -t test1 /tmp/bulkWork/bulkWork/files true
Start compaction in the shell
compact -t test1 -w
This results in the errors shown above in the Monitor.
Expected behavior
No errors when externally compacting bulk-loaded tables with bloom filters enabled.
Additional context
Note unrelated to the problem:
I had to disable
<arg>-Xlint:all</arg>
in the root pom.xml for the project to compile (with a clean clone, Java 21).