External compactions fail after a bulk import on a Bloom filter-enabled table #5517

Open
@DarwinKatanamp

Description


Describe the bug
When performing a bulk import into a table that has a Bloom filter enabled, external compactions fail with the following error:

compactor_q1	org.apache.accumulo.compactor.Compactor	449	ERROR	Compactor thread was interrupted waiting for compaction to start, cancelling job	
java.lang.UnsupportedOperationException
	at org.apache.accumulo.core.file.BloomFilterLayer$Reader.estimateOverlappingEntries(BloomFilterLayer.java:434)
	at org.apache.accumulo.compactor.Compactor.estimateOverlappingEntries(Compactor.java:635)
	at org.apache.accumulo.compactor.Compactor$2.lambda$initialize$0(Compactor.java:546)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at org.apache.accumulo.compactor.Compactor$2.initialize(Compactor.java:540)
	at org.apache.accumulo.compactor.Compactor.run(Compactor.java:751)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:1583)
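
Judging from the trace, BloomFilterLayer$Reader appears to throw UnsupportedOperationException from estimateOverlappingEntries instead of delegating to the file reader it wraps. A minimal sketch of the kind of delegation I would expect (the method signature here is my assumption; I have not verified it against the FileSKVIterator interface):

// Assumed sketch only: BloomFilterLayer.Reader wraps another FileSKVIterator
// (called "reader" here). Forwarding the call, rather than throwing, is the
// behavior I would expect; the exact signature is an assumption on my part.
@Override
public long estimateOverlappingEntries(KeyExtent extent) throws IOException {
  return reader.estimateOverlappingEntries(extent); // delegate to wrapped reader
}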

Versions (OS, Maven, Java, and others, as appropriate):

  • Affected version(s) of this project: 2.1.3

To Reproduce
Steps to reproduce the behavior (or a link to an example repository that reproduces the problem):
Start a local fluo-uno cluster with the default external compactors enabled as defined in fluo-uno/install/accumulo-2.1.3/conf/cluster.yaml
branch: main (e8f3ba9), accumulo version 2.1.3

Generate local bulk import files with accumulo-examples/src/main/java/org/apache/accumulo/examples/mapreduce/bulk/BulkIngestExample.java.
I disabled the client.tableOperations().importDirectory call and performed the bulk import in the Accumulo shell instead (a sketch of the disabled call follows below).
I changed the default 1k rows to 2M rows.
branch: 2.1 (9d400cd)
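
For reference, the disabled client-side call looks roughly like this (the client-properties path below is a placeholder for my local setup, not the example's actual value):

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

// Sketch of the bulk import via the client API, equivalent to the shell's
// importdirectory command used later in these steps.
try (AccumuloClient client = Accumulo.newClient()
    .from("conf/accumulo-client.properties").build()) {
  client.tableOperations()
      .importDirectory("/tmp/bulkWork/bulkWork/files")
      .to("test1")
      .load();
}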

Then copy the files to HDFS:

hadoop fs -mkdir -p /tmp/bulkWork
hadoop fs -copyFromLocal /.../accumulo-examples/tmp/bulkWork/ /tmp/bulkWork

In the Accumulo shell:
createtable test1

Enable the Bloom filter:
config -t test1 -s table.bloom.enabled=true

The following setting is very likely not necessary (but I figured it would help trigger compactions):
config -t test1 -s table.split.threshold=100K

Configure external compactions in the shell (a Java client equivalent is sketched after these commands):

config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"large","type":"external","queue":"q1"}]'
config -t test1 -s table.compaction.dispatcher.opts.service=cs1
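
For completeness, the same configuration via the Java client API should look roughly like this (hedged sketch; the client-properties path is a placeholder):

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

try (AccumuloClient client = Accumulo.newClient()
    .from("conf/accumulo-client.properties").build()) {
  // System-wide compaction service properties (same values as the shell commands):
  client.instanceOperations().setProperty(
      "tserver.compaction.major.service.cs1.planner",
      "org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner");
  client.instanceOperations().setProperty(
      "tserver.compaction.major.service.cs1.planner.opts.executors",
      "[{\"name\":\"large\",\"type\":\"external\",\"queue\":\"q1\"}]");
  // Point the table's compaction dispatching at the cs1 service:
  client.tableOperations().setProperty("test1",
      "table.compaction.dispatcher.opts.service", "cs1");
}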

Perform the bulk load:
importdirectory -t test1 /tmp/bulkWork/bulkWork/files true

Start a compaction in the shell:
compact -t test1 -w
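
The equivalent call through the client API would be roughly as follows ("client" being the AccumuloClient from the sketches above; setWait(true) mirrors the shell's -w flag):

import org.apache.accumulo.core.client.admin.CompactionConfig;

// Block until the compaction finishes, like the shell's -w flag:
client.tableOperations().compact("test1",
    new CompactionConfig().setWait(true));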

This results in the errors shown above appearing in the Monitor.

Expected behavior
No errors when externally compacting bulk-loaded, Bloom filter-enabled tables.

Additional context
A side note, unrelated to the problem: I had to disable
<arg>-Xlint:all</arg>
in the root pom.xml for the project to compile (with a clean clone, Java 21).


Labels: bug (This issue has been verified to be a bug.)
