Use parallel consumers for bulk import indexing #1039

Open
henrik242 wants to merge 3 commits into komoot:master from entur:henrik/import-perf-threading-only


@henrik242
Contributor

Use 5 parallel consumer threads for OpenSearch bulk indexing, one per shard. Each thread has its own Importer instance with an independent BulkRequest buffer, sharing the thread-safe OpenSearchClient.

Replaces the sentinel-based single-thread shutdown with a volatile flag + poll timeout for clean multi-thread termination.
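
To illustrate the pattern described above, here is a minimal sketch, not the PR's actual code. It assumes a shared `BlockingQueue`, five consumer threads that each own a private buffer, and a volatile "producer done" flag checked whenever `poll` times out. The class name `ParallelConsumerSketch`, the buffer size, and the `flush` helper are hypothetical; `flush` only counts documents and stands in for sending a bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are not taken from the PR.
class ParallelConsumerSketch {
    private static final int CONSUMERS = 5;   // one per shard, as in the PR description
    private static final int BULK_SIZE = 100; // assumed buffer size
    private static final long POLL_MS = 500;  // poll timeout mentioned in the PR

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final AtomicInteger indexed = new AtomicInteger();
    // Set once by the producer after the last put(); replaces a sentinel element.
    private volatile boolean producerDone = false;

    // Stand-in for sending one bulk request: count the documents, then clear the buffer.
    private void flush(List<String> buffer) {
        indexed.addAndGet(buffer.size());
        buffer.clear();
    }

    private void consume() {
        List<String> buffer = new ArrayList<>(BULK_SIZE); // private to this thread
        try {
            while (true) {
                String doc = queue.poll(POLL_MS, TimeUnit.MILLISECONDS);
                if (doc != null) {
                    buffer.add(doc);
                    if (buffer.size() >= BULK_SIZE) {
                        flush(buffer);
                    }
                } else if (producerDone && queue.isEmpty()) {
                    // The flag is read before isEmpty(), so nothing can arrive afterwards.
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            flush(buffer); // send whatever is left in the partial buffer
        }
    }

    // Feeds docCount documents through the consumers and returns how many were flushed.
    static int run(int docCount) {
        ParallelConsumerSketch sketch = new ParallelConsumerSketch();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < CONSUMERS; i++) {
            Thread t = new Thread(sketch::consume, "consumer-" + i);
            t.start();
            threads.add(t);
        }
        try {
            for (int i = 0; i < docCount; i++) {
                sketch.queue.put("doc-" + i);
            }
            sketch.producerDone = true;
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while importing", e);
        }
        return sketch.indexed.get();
    }
}
```

With a sentinel, each consumer would need its own end marker in the queue. With the flag, every consumer simply exits on the first poll timeout after the producer has finished and the queue is empty.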

Benchmark (Belgium, 11 GB, 4.9M docs): 221 s -> 166 s (about 25% less import time).

close() replaces the manual shutdown() + awaitTermination() sequence (Java 19+).
The poll timeout only matters for detecting the end of the import, so 500 ms is plenty.
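
As a rough illustration of the `close()` change, the sketch below assumes the consumers run on a standard `ExecutorService`. Since Java 19, `ExecutorService` implements `AutoCloseable`, and `close()` starts an orderly shutdown and waits for submitted tasks to finish. The class name `ExecutorCloseSketch` and the counter task are hypothetical and only there to make the example self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; requires Java 19 or newer.
class ExecutorCloseSketch {
    // Runs the given number of trivial tasks and returns how many completed.
    static int runAll(int tasks) {
        AtomicInteger finished = new AtomicInteger();
        // try-with-resources calls pool.close() on exit, which blocks until
        // all submitted tasks are done. No explicit shutdown() or
        // awaitTermination() call is needed.
        try (ExecutorService pool = Executors.newFixedThreadPool(5)) {
            for (int i = 0; i < tasks; i++) {
                pool.execute(finished::incrementAndGet);
            }
        }
        return finished.get();
    }
}
```

Because `close()` only returns once the tasks have terminated, the count read after the `try` block is already final.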
@lonvia
Collaborator

lonvia commented Mar 23, 2026

Last time I experimented with parallel threads, it resulted in quite noticeable index bloat. So, let me test what this does to a planet.

The other question I'd have: is there an official recommendation to have as many import threads as shards? Have you tried with more or fewer threads?

@henrik242
Contributor Author

henrik242 commented Mar 23, 2026

@lonvia I tried various numbers of threads. Having the thread count equal to the shard count gave by far the best result. (I have also prepared another commit that depends on this one and gives another 8% improvement.)
