Fix tokenizer lifecycle close races by tazarov · Pull Request #114 · amikos-tech/pure-tokenizers

tazarov · 2026-03-17T15:07:51Z

Summary

guard Tokenizer lifecycle with an RWMutex and ErrTokenizerClosed so Close() cannot race in-flight operations
fix EncodePairs buffer cleanup on all return paths and harden Close() teardown ordering
add concurrent lifecycle coverage for Encode, EncodePairs, Decode, VocabSize, and concurrent double-close

Testing

go test ./... -count=1
go test -race ./... -count=1

Closes #108

claude · 2026-03-17T15:10:32Z

Review: Fix tokenizer lifecycle close races

Overall this is a well-structured fix for the races described in #108. The beginOperation() helper cleanly avoids lock/check boilerplate, and the LIFO defer ordering ensures the RLock is held while freeBuffer defers run in EncodePairs. The test coverage is solid.

Issues

time.Sleep in concurrent tests is fragile
TestConcurrentEncodeAndClose and TestConcurrentMixedOperationsAndClose both call time.Sleep(10ms) to try to ensure goroutines are mid-encode before Close() is called. On a heavily loaded CI runner the goroutines may not have started yet, meaning Close() fires before any FFI call begins — the race you're trying to test is never exercised. Consider using a sync.WaitGroup or a shared counter (e.g. atomic.Int64) that increments when each worker has started its first encode, then block Close() until that counter reaches the goroutine count.

EncodePairs buffer cleanup doesn't cover the error path
The new defer freeBuffer block is placed after the rc != 0 early return:

if rc != 0 {
    return nil, errors.Wrap(...)  // buffers not freed here
}
defer func() { /* free buffers */ }()

If the Rust side returns an error but has partially populated buffers, those slots will leak. Even if the current Rust implementation never does this, placing the defer before the error check (and guarding with buffers[i] != zero) would be safer and explicit.

TestCloseWaitsForActiveOperations directly manipulates lifecycleMu

tok.lifecycleMu.RLock()

This is fine for a white-box test in the same package, but it could silently break if the locking strategy changes. A small comment noting the intent ("simulate an in-flight FFI call") would help future readers.

Minor

EncodePair delegates to EncodePairs, so it's implicitly protected — no change needed, just noting it's covered.
The err named return in Close() combined with the defer closeLibrary pattern is correct but subtle; a brief comment explaining the propagation intent would help.
Related open issues [TST] Add regression coverage for concurrent tokenizer use and shutdown #112/[DOC] Document tokenizer concurrency and shutdown semantics #113 (tests) and [BUG] Guard Rust FFI entrypoints against process-aborting panics #109 (guard Rust FFI panics) are not addressed here — worth linking them to this PR or a follow-up.

github-actions · 2026-03-17T15:28:14Z

Benchmark Comparison

goos: linux
goarch: amd64
pkg: github.com/amikos-tech/pure-tokenizers
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
                                  │ base_bench.txt │            pr_bench.txt            │
                                  │     sec/op     │    sec/op     vs base              │
Encode/Short-4                        9.913µ ± ∞ ¹   9.829µ ± ∞ ¹       ~ (p=0.135 n=5)
Encode/Medium-4                       41.75µ ± ∞ ¹   42.42µ ± ∞ ¹  +1.61% (p=0.016 n=5)
Encode/Long-4                         322.0µ ± ∞ ¹   309.2µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeWithOptions/Default-4           41.96µ ± ∞ ¹   42.09µ ± ∞ ¹       ~ (p=0.222 n=5)
EncodeWithOptions/WithTypeIDs-4       41.94µ ± ∞ ¹   42.27µ ± ∞ ¹       ~ (p=0.151 n=5)
EncodeWithOptions/WithTokens-4        41.83µ ± ∞ ¹   41.82µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeWithOptions/WithOffsets-4       42.00µ ± ∞ ¹   42.20µ ± ∞ ¹       ~ (p=0.151 n=5)
EncodeWithOptions/AllOptions-4        44.52µ ± ∞ ¹   44.59µ ± ∞ ¹       ~ (p=1.000 n=5)
Decode/WithSpecialTokens-4            17.92µ ± ∞ ¹   18.67µ ± ∞ ¹       ~ (p=0.222 n=5)
Decode/SkipSpecialTokens-4            17.99µ ± ∞ ¹   18.86µ ± ∞ ¹  +4.84% (p=0.032 n=5)
BatchEncode-4                         425.5µ ± ∞ ¹   423.6µ ± ∞ ¹       ~ (p=1.000 n=5)
FromHuggingFace/CreationOnly-4        41.96m ± ∞ ¹   39.24m ± ∞ ¹  -6.47% (p=0.008 n=5)
FromHuggingFace/FullLifecycle-4       41.60m ± ∞ ¹   39.11m ± ∞ ¹  -5.98% (p=0.008 n=5)
VocabSize-4                           3.471m ± ∞ ¹   3.217m ± ∞ ¹  -7.31% (p=0.008 n=5)
EncodeDecode/Short-4                  14.30µ ± ∞ ¹   14.27µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeDecode/Medium-4                 61.68µ ± ∞ ¹   61.76µ ± ∞ ¹       ~ (p=0.841 n=5)
EncodeDecode/Long-4                   433.6µ ± ∞ ¹   451.3µ ± ∞ ¹  +4.09% (p=0.032 n=5)
Truncation-4                          320.5µ ± ∞ ¹   313.3µ ± ∞ ¹       ~ (p=0.548 n=5)
Padding-4                             114.2µ ± ∞ ¹   115.2µ ± ∞ ¹       ~ (p=0.310 n=5)
ConcurrentCacheRead-4                 3.439µ ± ∞ ¹   3.433µ ± ∞ ¹       ~ (p=0.460 n=5)
ConcurrentCacheValidation-4           4.265µ ± ∞ ¹   4.268µ ± ∞ ¹       ~ (p=1.000 n=5)
ConcurrentHFCacheLookup-4             6.468µ ± ∞ ¹   6.452µ ± ∞ ¹       ~ (p=0.222 n=5)
DownloadWithFailureRecovery-4          1.122 ± ∞ ¹    1.101 ± ∞ ¹       ~ (p=0.690 n=5)
ConcurrentDownloadsWithFailures-4     44.26m ± ∞ ¹   43.97m ± ∞ ¹       ~ (p=1.000 n=5)
LoadFromCache-4                       5.428µ ± ∞ ¹   5.414µ ± ∞ ¹  -0.26% (p=0.048 n=5)
FromHuggingFaceWithoutCache-4         110.2µ ± ∞ ¹   110.9µ ± ∞ ¹       ~ (p=0.310 n=5)
geomean                               152.7µ         151.9µ        -0.53%
¹ need >= 6 samples for confidence interval at level 0.95

                                  │ base_bench.txt │             pr_bench.txt              │
                                  │      B/op      │     B/op       vs base                │
Encode/Short-4                         920.0 ± ∞ ¹     936.0 ± ∞ ¹  +1.74% (p=0.008 n=5)
Encode/Medium-4                      1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
Encode/Long-4                        6.703Ki ± ∞ ¹   6.719Ki ± ∞ ¹  +0.23% (p=0.008 n=5)
EncodeWithOptions/Default-4          1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
EncodeWithOptions/WithTypeIDs-4      1.609Ki ± ∞ ¹   1.625Ki ± ∞ ¹  +0.97% (p=0.008 n=5)
EncodeWithOptions/WithTokens-4       1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
EncodeWithOptions/WithOffsets-4      1.703Ki ± ∞ ¹   1.719Ki ± ∞ ¹  +0.92% (p=0.008 n=5)
EncodeWithOptions/AllOptions-4       2.109Ki ± ∞ ¹   2.125Ki ± ∞ ¹  +0.74% (p=0.008 n=5)
Decode/WithSpecialTokens-4             740.0 ± ∞ ¹     756.0 ± ∞ ¹  +2.16% (p=0.008 n=5)
Decode/SkipSpecialTokens-4             740.0 ± ∞ ¹     756.0 ± ∞ ¹  +2.16% (p=0.008 n=5)
BatchEncode-4                        11.30Ki ± ∞ ¹   11.38Ki ± ∞ ¹  +0.69% (p=0.008 n=5)
FromHuggingFace/CreationOnly-4       6.127Mi ± ∞ ¹   6.144Mi ± ∞ ¹  +0.26% (p=0.032 n=5)
FromHuggingFace/FullLifecycle-4      6.156Mi ± ∞ ¹   6.133Mi ± ∞ ¹       ~ (p=0.690 n=5)
VocabSize-4                            288.0 ± ∞ ¹     304.0 ± ∞ ¹  +5.56% (p=0.008 n=5)
EncodeDecode/Short-4                 1.516Ki ± ∞ ¹   1.547Ki ± ∞ ¹  +2.06% (p=0.008 n=5)
EncodeDecode/Medium-4                2.242Ki ± ∞ ¹   2.273Ki ± ∞ ¹  +1.39% (p=0.008 n=5)
EncodeDecode/Long-4                  8.430Ki ± ∞ ¹   8.461Ki ± ∞ ¹  +0.37% (p=0.008 n=5)
Truncation-4                         5.500Ki ± ∞ ¹   5.516Ki ± ∞ ¹  +0.28% (p=0.008 n=5)
Padding-4                            15.89Ki ± ∞ ¹   15.92Ki ± ∞ ¹  +0.20% (p=0.008 n=5)
ConcurrentCacheRead-4                2.062Ki ± ∞ ¹   2.062Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
ConcurrentCacheValidation-4          3.023Ki ± ∞ ¹   3.023Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
ConcurrentHFCacheLookup-4            3.196Ki ± ∞ ¹   3.195Ki ± ∞ ¹       ~ (p=1.000 n=5)
DownloadWithFailureRecovery-4        58.08Ki ± ∞ ¹   60.30Ki ± ∞ ¹       ~ (p=0.841 n=5)
ConcurrentDownloadsWithFailures-4    18.98Ki ± ∞ ¹   18.98Ki ± ∞ ¹       ~ (p=0.841 n=5)
LoadFromCache-4                      1.695Ki ± ∞ ¹   1.695Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
FromHuggingFaceWithoutCache-4        16.36Ki ± ∞ ¹   16.36Ki ± ∞ ¹       ~ (p=0.516 n=5)
geomean                              5.420Ki         5.474Ki        +1.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                  │ base_bench.txt │             pr_bench.txt              │
                                  │   allocs/op    │  allocs/op    vs base                 │
Encode/Short-4                         16.00 ± ∞ ¹    17.00 ± ∞ ¹   +6.25% (p=0.008 n=5)
Encode/Medium-4                        35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
Encode/Long-4                          165.0 ± ∞ ¹    166.0 ± ∞ ¹   +0.61% (p=0.008 n=5)
EncodeWithOptions/Default-4            35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
EncodeWithOptions/WithTypeIDs-4        36.00 ± ∞ ¹    37.00 ± ∞ ¹   +2.78% (p=0.008 n=5)
EncodeWithOptions/WithTokens-4         35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
EncodeWithOptions/WithOffsets-4        36.00 ± ∞ ¹    37.00 ± ∞ ¹   +2.78% (p=0.008 n=5)
EncodeWithOptions/AllOptions-4         41.00 ± ∞ ¹    42.00 ± ∞ ¹   +2.44% (p=0.008 n=5)
Decode/WithSpecialTokens-4             10.00 ± ∞ ¹    11.00 ± ∞ ¹  +10.00% (p=0.008 n=5)
Decode/SkipSpecialTokens-4             10.00 ± ∞ ¹    11.00 ± ∞ ¹  +10.00% (p=0.008 n=5)
BatchEncode-4                          261.0 ± ∞ ¹    266.0 ± ∞ ¹   +1.92% (p=0.008 n=5)
FromHuggingFace/CreationOnly-4        92.20k ± ∞ ¹   92.20k ± ∞ ¹        ~ (p=0.405 n=5)
FromHuggingFace/FullLifecycle-4       92.21k ± ∞ ¹   92.21k ± ∞ ¹        ~ (p=1.000 n=5)
VocabSize-4                            5.000 ± ∞ ¹    6.000 ± ∞ ¹  +20.00% (p=0.008 n=5)
EncodeDecode/Short-4                   26.00 ± ∞ ¹    28.00 ± ∞ ¹   +7.69% (p=0.008 n=5)
EncodeDecode/Medium-4                  45.00 ± ∞ ¹    47.00 ± ∞ ¹   +4.44% (p=0.008 n=5)
EncodeDecode/Long-4                    175.0 ± ∞ ¹    177.0 ± ∞ ¹   +1.14% (p=0.008 n=5)
Truncation-4                           127.0 ± ∞ ¹    128.0 ± ∞ ¹   +0.79% (p=0.008 n=5)
Padding-4                              535.0 ± ∞ ¹    537.0 ± ∞ ¹   +0.37% (p=0.008 n=5)
ConcurrentCacheRead-4                  25.00 ± ∞ ¹    25.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
ConcurrentCacheValidation-4            43.00 ± ∞ ¹    43.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
ConcurrentHFCacheLookup-4              39.00 ± ∞ ¹    39.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
DownloadWithFailureRecovery-4          456.0 ± ∞ ¹    457.0 ± ∞ ¹        ~ (p=0.889 n=5)
ConcurrentDownloadsWithFailures-4      233.0 ± ∞ ¹    233.0 ± ∞ ¹        ~ (p=1.000 n=5)
LoadFromCache-4                        7.000 ± ∞ ¹    7.000 ± ∞ ¹        ~ (p=1.000 n=5) ²
FromHuggingFaceWithoutCache-4          219.0 ± ∞ ¹    219.0 ± ∞ ¹        ~ (p=1.000 n=5) ²
geomean                                89.93          92.62         +2.98%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

tazarov · 2026-03-22T14:25:52Z

@claude PTAL

claude · 2026-03-22T14:26:07Z

Claude finished @tazarov's task in 1m 37s —— View job

Re-review: Fix tokenizer lifecycle close races

Read current state of changed files
Check if previous review feedback was addressed
Provide updated review

Good progress since the last review. The follow-up commits added TestConcurrentCloseIsIdempotent, TestConcurrentMixedOperationsAndClose, improved beginOperation() to return the unlock function directly (cleaner API), and restructured Close() to unlock early before slow cleanup. The core design is solid.

Here's what I found on re-review:

Remaining issue from previous review

EncodePairs buffer cleanup still doesn't cover the error path (tokenizers.go:558-567)

The defer freeBuffer is placed after the rc < 0 early return, unchanged from what was flagged previously:

if rc < 0 {
    lastError := getErrorForCode(rc)
    return nil, errors.Wrap(lastError, "failed to encode pairs")  // buffers not freed
}
defer func() {
    for i := range buffers {
        t.freeBuffer(&buffers[i])
    }
}()

Note: Encode has the same pattern (tokenizers.go:461-467), but that predates this PR. For EncodePairs with its array of buffers, the risk of a partial-populate-then-error from Rust is higher. Moving the defer above the error check (guarding each freeBuffer call with a nil/zero check on the buffer fields) would close this gap. Fix this →

`time.Sleep` in concurrent tests (noted, not blocking)

TestConcurrentEncodeAndClose (line 125) and TestConcurrentMixedOperationsAndClose (line 207) still use time.Sleep(10ms) before calling Close(). The close(start) barrier ensures goroutines begin running, which is the important part — the sleep just gives them time to reach the FFI call. On a loaded CI runner this could be too short, making Close() fire before any FFI call begins.

This isn't blocking — the tests still validate correctness (no panics, no unexpected errors). But replacing the sleep with an atomic.Int64 counter that each worker increments after its first successful encode, then blocking Close() until the counter reaches goroutines, would make the race window deterministic.

Everything else looks good

beginOperation() returning the unlock func is clean and eliminates the defer t.lifecycleMu.RUnlock() pattern that previously separated the lock from its unlock site
Close() teardown ordering is correct: write-lock → set closed → snapshot locals → nil pointers → unlock → freeTokenizer → closeLibrary (LIFO via defer)
The early unlock in Close() means in-flight operations drain before the write lock is acquired, then slow cleanup (FFI free + dlclose) runs without holding the lock — good for latency
TestCloseWaitsForActiveOperations directly holding lifecycleMu.RLock() is a valid white-box approach for simulating an in-flight FFI call
GetLibraryVersion correctly returns "unknown" when closed rather than erroring

Verdict: One actionable item (buffer cleanup on error path). The time.Sleep nit is low priority. Otherwise LGTM.

tazarov added 3 commits March 17, 2026 14:59

Guard tokenizer lifecycle against close races

fb3de25

Address lifecycle review feedback

8fc64d5

Tighten lifecycle shutdown coverage

53b9229

tazarov mentioned this pull request Mar 17, 2026

[ENH] Raise Go support floor to 1.25.x #115

Open

tazarov mentioned this pull request Mar 22, 2026

[BUG] Reclaim Rust encode allocations on CString conversion failures #116

Open

tazarov merged commit 00c068a into main Mar 22, 2026
13 of 15 checks passed

tazarov deleted the codex/evaluate-concurrent-tokenization branch March 22, 2026 15:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix tokenizer lifecycle close races#114

Fix tokenizer lifecycle close races#114
tazarov merged 3 commits into
mainfrom
codex/evaluate-concurrent-tokenization

tazarov commented Mar 17, 2026

Uh oh!

claude Bot commented Mar 17, 2026

Uh oh!

github-actions Bot commented Mar 17, 2026

Uh oh!

tazarov commented Mar 22, 2026

Uh oh!

claude Bot commented Mar 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tazarov commented Mar 17, 2026

Summary

Testing

Uh oh!

claude Bot commented Mar 17, 2026

Review: Fix tokenizer lifecycle close races

Issues

Minor

Uh oh!

github-actions Bot commented Mar 17, 2026

Benchmark Comparison

Uh oh!

tazarov commented Mar 22, 2026

Uh oh!

claude Bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: Fix tokenizer lifecycle close races

Remaining issue from previous review

time.Sleep in concurrent tests (noted, not blocking)

Everything else looks good

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Mar 22, 2026 •

edited

Loading

`time.Sleep` in concurrent tests (noted, not blocking)