External compactions fail after a bulk import on a Bloom filter-enabled table #5517

Open
@DarwinKatanamp

Description


Describe the bug
When performing a bulk import into a table that has a Bloom filter enabled, external compactions fail with the following error:

compactor_q1	org.apache.accumulo.compactor.Compactor	449	ERROR	Compactor thread was interrupted waiting for compaction to start, cancelling job	
java.lang.UnsupportedOperationException
	at org.apache.accumulo.core.file.BloomFilterLayer$Reader.estimateOverlappingEntries(BloomFilterLayer.java:434)
	at org.apache.accumulo.compactor.Compactor.estimateOverlappingEntries(Compactor.java:635)
	at org.apache.accumulo.compactor.Compactor$2.lambda$initialize$0(Compactor.java:546)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at org.apache.accumulo.compactor.Compactor$2.initialize(Compactor.java:540)
	at org.apache.accumulo.compactor.Compactor.run(Compactor.java:751)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:1583)
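
Judging from the trace, BloomFilterLayer$Reader appears to throw UnsupportedOperationException from estimateOverlappingEntries instead of delegating to the file reader it wraps. A minimal sketch of the kind of delegation I would expect (the method signature here is my assumption; I have not verified it against the FileSKVIterator interface):

// Assumed sketch only: BloomFilterLayer.Reader wraps another FileSKVIterator
// (called "reader" here). Forwarding the call, rather than throwing, is the
// behavior I would expect; the exact signature is an assumption on my part.
@Override
public long estimateOverlappingEntries(KeyExtent extent) throws IOException {
  return reader.estimateOverlappingEntries(extent); // delegate to wrapped reader
}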

Versions (OS, Maven, Java, and others, as appropriate):

  • Affected version(s) of this project: 2.1.3

To Reproduce
Steps to reproduce the behavior (or a link to an example repository that reproduces the problem):
Start a local fluo-uno cluster with the default external compactors enabled as defined in fluo-uno/install/accumulo-2.1.3/conf/cluster.yaml
branch: main (e8f3ba9), accumulo version 2.1.3

Generate local bulk import files with accumulo-examples/src/main/java/org/apache/accumulo/examples/mapreduce/bulk/BulkIngestExample.java.
I disabled the client.tableOperations().importDirectory call and performed the bulk import in the Accumulo shell instead (a sketch of the disabled call follows below).
I changed the default 1k rows to 2M rows.
branch: 2.1 (9d400cd)
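
For reference, the disabled client-side call looks roughly like this (the client-properties path below is a placeholder for my local setup, not the example's actual value):

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

// Sketch of the bulk import via the client API, equivalent to the shell's
// importdirectory command used later in these steps.
try (AccumuloClient client = Accumulo.newClient()
    .from("conf/accumulo-client.properties").build()) {
  client.tableOperations()
      .importDirectory("/tmp/bulkWork/bulkWork/files")
      .to("test1")
      .load();
}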

Then copy the files to HDFS:

hadoop fs -mkdir -p /tmp/bulkWork
hadoop fs -copyFromLocal /.../accumulo-examples/tmp/bulkWork/ /tmp/bulkWork

In the Accumulo shell:
createtable test1

Enable the Bloom filter:
config -t test1 -s table.bloom.enabled=true

The following setting is very likely not necessary (but I figured it would help trigger compactions):
config -t test1 -s table.split.threshold=100K

Configure external compactions in the shell (a Java client equivalent is sketched after these commands):

config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"large","type":"external","queue":"q1"}]'
config -t test1 -s table.compaction.dispatcher.opts.service=cs1
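
For completeness, the same configuration via the Java client API should look roughly like this (hedged sketch; the client-properties path is a placeholder):

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

try (AccumuloClient client = Accumulo.newClient()
    .from("conf/accumulo-client.properties").build()) {
  // System-wide compaction service properties (same values as the shell commands):
  client.instanceOperations().setProperty(
      "tserver.compaction.major.service.cs1.planner",
      "org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner");
  client.instanceOperations().setProperty(
      "tserver.compaction.major.service.cs1.planner.opts.executors",
      "[{\"name\":\"large\",\"type\":\"external\",\"queue\":\"q1\"}]");
  // Point the table's compaction dispatching at the cs1 service:
  client.tableOperations().setProperty("test1",
      "table.compaction.dispatcher.opts.service", "cs1");
}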

Perform the bulk load:
importdirectory -t test1 /tmp/bulkWork/bulkWork/files true

Start a compaction in the shell:
compact -t test1 -w
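
The equivalent call through the client API would be roughly as follows ("client" being the AccumuloClient from the sketches above; setWait(true) mirrors the shell's -w flag):

import org.apache.accumulo.core.client.admin.CompactionConfig;

// Block until the compaction finishes, like the shell's -w flag:
client.tableOperations().compact("test1",
    new CompactionConfig().setWait(true));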

This results in the errors shown above appearing in the Monitor.

Expected behavior
No errors when externally compacting bulk-loaded, Bloom filter-enabled tables.

Additional context
A side note, unrelated to the problem: I had to disable
<arg>-Xlint:all</arg>
in the root pom.xml for the project to compile (with a clean clone, Java 21).


Labels: bug (This issue has been verified to be a bug.)
