
fix: Tokenizer initialization race condition (multiple parallel downloads) #291

Open

albertoperdomo2 wants to merge 3 commits into llm-d:main from albertoperdomo2:fix/tokenizer-initialization-leak

Conversation

albertoperdomo2 (Contributor) commented Feb 4, 2026

Summary

Under high QPS during cold start, multiple concurrent requests triggered duplicate tokenizer downloads due to a check-then-act race condition. This was mentioned in #191 and caused:

  • Parallel downloads for a single model tokenizer
  • High request failure rate from file corruption
  • High memory usage (for parallel downloads)

The steps I followed to verify the condition with meta-llama/Llama-3.2-1B are:

1. Create a venv and install requirements:

   ```shell
   cd services/uds_tokenizer
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   pip install psutil  # for memory monitoring
   ```

2. Clear the tokenizer cache:

   ```shell
   rm -rf ~/.cache/huggingface/hub/models--*
   rm -rf models/meta-llama
   ```

3. Ensure the socket directory exists:

   ```shell
   mkdir -p /tmp/tokenizer
   ```

4. Start the gRPC service:

   ```shell
   python run_grpc_server.py
   ```

In a new terminal window, I ran a script to verify the race condition existed.
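The PR does not include the script itself, so here is a minimal sketch of what such a load generator could look like: it releases N requests simultaneously via a barrier to maximize the chance of hitting the check-then-act window. The `send_request` callable is a hypothetical stand-in for the real gRPC call over the Unix socket.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_load_test(send_request, n=50):
    # All n workers block on the barrier, then fire their requests at the
    # same instant, stressing the tokenizer-initialization path.
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # synchronize the start of every request
        return send_request("meta-llama/Llama-3.2-1B")

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker) for _ in range(n)]
        return [f.result() for f in futures]
```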

To solve it, I implemented double-checked locking with per-model locks: a fast path (cache hit) with zero locking overhead, and a slow path (cache miss) using threading.Lock to ensure only one thread downloads each model.

Test plan

The initial runs of the script (which launches 50 requests at the same time) reported:

```
============================================================
Memory Summary
============================================================
Initial:  75.7 MB
Peak:     494.5 MB
Final:    298.6 MB
Increase: 223.0 MB

Total test duration: 31.33s
```

with several errors in the service logs like:

```
2026-02-04 10:30:15,891 [ERROR] [root] Failed to load tokenizer from /<path>/<to>/llm-d-kv-cache/services/uds_tokenizer/models/meta-llama/Llama-3.2-1B: No such file or directory (os error 2)
2026-02-04 10:30:15,891 [ERROR] [root] Failed to initialize tokenizer for model meta-llama/Llama-3.2-1B: Failed to load tokenizer: No such file or directory (os error 2)
2026-02-04 10:30:20,153 [ERROR] [root] Failed to initialize tokenizer for model meta-llama/Llama-3.2-1B: Failed to load tokenizer: ...
```

And after the proposed fix:

```
============================================================
Memory Summary
============================================================
Initial:  75.3 MB
Peak:     173.7 MB
Final:    173.7 MB
Increase: 98.4 MB

Total test duration: 2.16s
```

This confirms the initial memory leak is gone, which directly improves performance as well (31.33s down to 2.16s).

Related issues

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
```python
acquired_locks.append(lock)

_tokenizer_cache.clear()
return "Tokenizer caches cleared"
```
Collaborator:

does this mean you will never have more than 1 item in acquired_locks given you return immediately after acquiring one?

albertoperdomo2 (author):

This should be outside of the loop, my bad.

```python
_tokenizer_cache[key] = tokenizer
return key

lock = _cache_locks[key]
```
Collaborator:

don't we need a lock to synchronize access to the _cache_locks dict?

albertoperdomo2 (author):

You are right, that would be necessary. It follows from my choice of per-model locking; if we do not need that granularity, we can simplify to a single global lock.
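One way to do what the review suggests is to guard only the lock-map access with a short global lock, keeping per-model downloads parallel. This is a sketch under that assumption, with illustrative names:

```python
import threading
from collections import defaultdict

_cache_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()  # serializes mutation of the lock map itself

def get_model_lock(model):
    # Hold the global lock only long enough to fetch or create the entry,
    # so threads working on different models do not block each other.
    with _locks_guard:
        return _cache_locks[model]
```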

```python
_tokenizer_cache.clear()
return "Tokenizer caches cleared"
# Sorted locks to avoid deadlock
keys_to_lock = sorted(_cache_locks.keys())
```
Collaborator:

It looks like we never insert any item to _cache_locks map (relying on defaultdict to get the lock). So this will always be empty?

albertoperdomo2 (author) commented Feb 6, 2026:

It is a defaultdict() so if you call it and the key does not exist, it will automatically create it on first access.

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
@vMaroon vMaroon requested a review from liu-cong February 6, 2026 22:15
vMaroon (Member) commented Feb 7, 2026

Does this still occur with the new approach of loading tokenizer on module initialization and not after requests come in? Might have missed closing that issue. cc @sagearc

albertoperdomo2 (author) commented Feb 7, 2026

I could certainly reproduce the issue (3 days ago).

@vMaroon vMaroon requested a review from liu-cong February 13, 2026 21:52
github-actions:

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

sagearc (Collaborator) commented Feb 17, 2026

@vMaroon Initially, the tokenizer initialization at startup was intended for daulet/tokenizers. I'm unsure of the status for tokenizers initialized within the chat template python file. If I remember correctly, even the python interpreter itself is not safely initialized (it should only happen once) and is currently prone to race conditions within the CGO bindings file.
Soon it won't be an issue.

@albertoperdomo2 albertoperdomo2 changed the base branch from main to release/v0.4.0 February 26, 2026 21:39
@albertoperdomo2 albertoperdomo2 changed the base branch from release/v0.4.0 to main February 27, 2026 09:04
