Skip to content

Fix tokenizer lifecycle close races#114

Merged
tazarov merged 3 commits into
mainfrom
codex/evaluate-concurrent-tokenization
Mar 22, 2026
Merged

Fix tokenizer lifecycle close races#114
tazarov merged 3 commits into
mainfrom
codex/evaluate-concurrent-tokenization

Conversation

@tazarov

@tazarov tazarov commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Summary

  • guard Tokenizer lifecycle with an RWMutex and ErrTokenizerClosed so Close() cannot race in-flight operations
  • fix EncodePairs buffer cleanup on all return paths and harden Close() teardown ordering
  • add concurrent lifecycle coverage for Encode, EncodePairs, Decode, VocabSize, and concurrent double-close

Testing

  • go test ./... -count=1
  • go test -race ./... -count=1

Closes #108

@claude

claude Bot commented Mar 17, 2026

Copy link
Copy Markdown

Review: Fix tokenizer lifecycle close races

Overall this is a well-structured fix for the races described in #108. The beginOperation() helper cleanly avoids lock/check boilerplate, and the LIFO defer ordering ensures the RLock is held while freeBuffer defers run in EncodePairs. The test coverage is solid.

Issues

time.Sleep in concurrent tests is fragile
TestConcurrentEncodeAndClose and TestConcurrentMixedOperationsAndClose both call time.Sleep(10ms) to try to ensure goroutines are mid-encode before Close() is called. On a heavily loaded CI runner the goroutines may not have started yet, meaning Close() fires before any FFI call begins — the race you're trying to test is never exercised. Consider using a sync.WaitGroup or a shared counter (e.g. atomic.Int64) that increments when each worker has started its first encode, then block Close() until that counter reaches the goroutine count.

EncodePairs buffer cleanup doesn't cover the error path
The new defer freeBuffer block is placed after the rc != 0 early return:

if rc != 0 {
    return nil, errors.Wrap(...)  // buffers not freed here
}
defer func() { /* free buffers */ }()

If the Rust side returns an error but has partially populated buffers, those slots will leak. Even if the current Rust implementation never does this, placing the defer before the error check (and guarding with buffers[i] != zero) would be safer and explicit.

TestCloseWaitsForActiveOperations directly manipulates lifecycleMu

tok.lifecycleMu.RLock()

This is fine for a white-box test in the same package, but it could silently break if the locking strategy changes. A small comment noting the intent ("simulate an in-flight FFI call") would help future readers.

Minor

@github-actions

Copy link
Copy Markdown

Benchmark Comparison

goos: linux
goarch: amd64
pkg: github.com/amikos-tech/pure-tokenizers
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
                                  │ base_bench.txt │            pr_bench.txt            │
                                  │     sec/op     │    sec/op     vs base              │
Encode/Short-4                        9.913µ ± ∞ ¹   9.829µ ± ∞ ¹       ~ (p=0.135 n=5)
Encode/Medium-4                       41.75µ ± ∞ ¹   42.42µ ± ∞ ¹  +1.61% (p=0.016 n=5)
Encode/Long-4                         322.0µ ± ∞ ¹   309.2µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeWithOptions/Default-4           41.96µ ± ∞ ¹   42.09µ ± ∞ ¹       ~ (p=0.222 n=5)
EncodeWithOptions/WithTypeIDs-4       41.94µ ± ∞ ¹   42.27µ ± ∞ ¹       ~ (p=0.151 n=5)
EncodeWithOptions/WithTokens-4        41.83µ ± ∞ ¹   41.82µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeWithOptions/WithOffsets-4       42.00µ ± ∞ ¹   42.20µ ± ∞ ¹       ~ (p=0.151 n=5)
EncodeWithOptions/AllOptions-4        44.52µ ± ∞ ¹   44.59µ ± ∞ ¹       ~ (p=1.000 n=5)
Decode/WithSpecialTokens-4            17.92µ ± ∞ ¹   18.67µ ± ∞ ¹       ~ (p=0.222 n=5)
Decode/SkipSpecialTokens-4            17.99µ ± ∞ ¹   18.86µ ± ∞ ¹  +4.84% (p=0.032 n=5)
BatchEncode-4                         425.5µ ± ∞ ¹   423.6µ ± ∞ ¹       ~ (p=1.000 n=5)
FromHuggingFace/CreationOnly-4        41.96m ± ∞ ¹   39.24m ± ∞ ¹  -6.47% (p=0.008 n=5)
FromHuggingFace/FullLifecycle-4       41.60m ± ∞ ¹   39.11m ± ∞ ¹  -5.98% (p=0.008 n=5)
VocabSize-4                           3.471m ± ∞ ¹   3.217m ± ∞ ¹  -7.31% (p=0.008 n=5)
EncodeDecode/Short-4                  14.30µ ± ∞ ¹   14.27µ ± ∞ ¹       ~ (p=0.690 n=5)
EncodeDecode/Medium-4                 61.68µ ± ∞ ¹   61.76µ ± ∞ ¹       ~ (p=0.841 n=5)
EncodeDecode/Long-4                   433.6µ ± ∞ ¹   451.3µ ± ∞ ¹  +4.09% (p=0.032 n=5)
Truncation-4                          320.5µ ± ∞ ¹   313.3µ ± ∞ ¹       ~ (p=0.548 n=5)
Padding-4                             114.2µ ± ∞ ¹   115.2µ ± ∞ ¹       ~ (p=0.310 n=5)
ConcurrentCacheRead-4                 3.439µ ± ∞ ¹   3.433µ ± ∞ ¹       ~ (p=0.460 n=5)
ConcurrentCacheValidation-4           4.265µ ± ∞ ¹   4.268µ ± ∞ ¹       ~ (p=1.000 n=5)
ConcurrentHFCacheLookup-4             6.468µ ± ∞ ¹   6.452µ ± ∞ ¹       ~ (p=0.222 n=5)
DownloadWithFailureRecovery-4          1.122 ± ∞ ¹    1.101 ± ∞ ¹       ~ (p=0.690 n=5)
ConcurrentDownloadsWithFailures-4     44.26m ± ∞ ¹   43.97m ± ∞ ¹       ~ (p=1.000 n=5)
LoadFromCache-4                       5.428µ ± ∞ ¹   5.414µ ± ∞ ¹  -0.26% (p=0.048 n=5)
FromHuggingFaceWithoutCache-4         110.2µ ± ∞ ¹   110.9µ ± ∞ ¹       ~ (p=0.310 n=5)
geomean                               152.7µ         151.9µ        -0.53%
¹ need >= 6 samples for confidence interval at level 0.95

                                  │ base_bench.txt │             pr_bench.txt              │
                                  │      B/op      │     B/op       vs base                │
Encode/Short-4                         920.0 ± ∞ ¹     936.0 ± ∞ ¹  +1.74% (p=0.008 n=5)
Encode/Medium-4                      1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
Encode/Long-4                        6.703Ki ± ∞ ¹   6.719Ki ± ∞ ¹  +0.23% (p=0.008 n=5)
EncodeWithOptions/Default-4          1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
EncodeWithOptions/WithTypeIDs-4      1.609Ki ± ∞ ¹   1.625Ki ± ∞ ¹  +0.97% (p=0.008 n=5)
EncodeWithOptions/WithTokens-4       1.516Ki ± ∞ ¹   1.531Ki ± ∞ ¹  +1.03% (p=0.008 n=5)
EncodeWithOptions/WithOffsets-4      1.703Ki ± ∞ ¹   1.719Ki ± ∞ ¹  +0.92% (p=0.008 n=5)
EncodeWithOptions/AllOptions-4       2.109Ki ± ∞ ¹   2.125Ki ± ∞ ¹  +0.74% (p=0.008 n=5)
Decode/WithSpecialTokens-4             740.0 ± ∞ ¹     756.0 ± ∞ ¹  +2.16% (p=0.008 n=5)
Decode/SkipSpecialTokens-4             740.0 ± ∞ ¹     756.0 ± ∞ ¹  +2.16% (p=0.008 n=5)
BatchEncode-4                        11.30Ki ± ∞ ¹   11.38Ki ± ∞ ¹  +0.69% (p=0.008 n=5)
FromHuggingFace/CreationOnly-4       6.127Mi ± ∞ ¹   6.144Mi ± ∞ ¹  +0.26% (p=0.032 n=5)
FromHuggingFace/FullLifecycle-4      6.156Mi ± ∞ ¹   6.133Mi ± ∞ ¹       ~ (p=0.690 n=5)
VocabSize-4                            288.0 ± ∞ ¹     304.0 ± ∞ ¹  +5.56% (p=0.008 n=5)
EncodeDecode/Short-4                 1.516Ki ± ∞ ¹   1.547Ki ± ∞ ¹  +2.06% (p=0.008 n=5)
EncodeDecode/Medium-4                2.242Ki ± ∞ ¹   2.273Ki ± ∞ ¹  +1.39% (p=0.008 n=5)
EncodeDecode/Long-4                  8.430Ki ± ∞ ¹   8.461Ki ± ∞ ¹  +0.37% (p=0.008 n=5)
Truncation-4                         5.500Ki ± ∞ ¹   5.516Ki ± ∞ ¹  +0.28% (p=0.008 n=5)
Padding-4                            15.89Ki ± ∞ ¹   15.92Ki ± ∞ ¹  +0.20% (p=0.008 n=5)
ConcurrentCacheRead-4                2.062Ki ± ∞ ¹   2.062Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
ConcurrentCacheValidation-4          3.023Ki ± ∞ ¹   3.023Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
ConcurrentHFCacheLookup-4            3.196Ki ± ∞ ¹   3.195Ki ± ∞ ¹       ~ (p=1.000 n=5)
DownloadWithFailureRecovery-4        58.08Ki ± ∞ ¹   60.30Ki ± ∞ ¹       ~ (p=0.841 n=5)
ConcurrentDownloadsWithFailures-4    18.98Ki ± ∞ ¹   18.98Ki ± ∞ ¹       ~ (p=0.841 n=5)
LoadFromCache-4                      1.695Ki ± ∞ ¹   1.695Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
FromHuggingFaceWithoutCache-4        16.36Ki ± ∞ ¹   16.36Ki ± ∞ ¹       ~ (p=0.516 n=5)
geomean                              5.420Ki         5.474Ki        +1.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                  │ base_bench.txt │             pr_bench.txt              │
                                  │   allocs/op    │  allocs/op    vs base                 │
Encode/Short-4                         16.00 ± ∞ ¹    17.00 ± ∞ ¹   +6.25% (p=0.008 n=5)
Encode/Medium-4                        35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
Encode/Long-4                          165.0 ± ∞ ¹    166.0 ± ∞ ¹   +0.61% (p=0.008 n=5)
EncodeWithOptions/Default-4            35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
EncodeWithOptions/WithTypeIDs-4        36.00 ± ∞ ¹    37.00 ± ∞ ¹   +2.78% (p=0.008 n=5)
EncodeWithOptions/WithTokens-4         35.00 ± ∞ ¹    36.00 ± ∞ ¹   +2.86% (p=0.008 n=5)
EncodeWithOptions/WithOffsets-4        36.00 ± ∞ ¹    37.00 ± ∞ ¹   +2.78% (p=0.008 n=5)
EncodeWithOptions/AllOptions-4         41.00 ± ∞ ¹    42.00 ± ∞ ¹   +2.44% (p=0.008 n=5)
Decode/WithSpecialTokens-4             10.00 ± ∞ ¹    11.00 ± ∞ ¹  +10.00% (p=0.008 n=5)
Decode/SkipSpecialTokens-4             10.00 ± ∞ ¹    11.00 ± ∞ ¹  +10.00% (p=0.008 n=5)
BatchEncode-4                          261.0 ± ∞ ¹    266.0 ± ∞ ¹   +1.92% (p=0.008 n=5)
FromHuggingFace/CreationOnly-4        92.20k ± ∞ ¹   92.20k ± ∞ ¹        ~ (p=0.405 n=5)
FromHuggingFace/FullLifecycle-4       92.21k ± ∞ ¹   92.21k ± ∞ ¹        ~ (p=1.000 n=5)
VocabSize-4                            5.000 ± ∞ ¹    6.000 ± ∞ ¹  +20.00% (p=0.008 n=5)
EncodeDecode/Short-4                   26.00 ± ∞ ¹    28.00 ± ∞ ¹   +7.69% (p=0.008 n=5)
EncodeDecode/Medium-4                  45.00 ± ∞ ¹    47.00 ± ∞ ¹   +4.44% (p=0.008 n=5)
EncodeDecode/Long-4                    175.0 ± ∞ ¹    177.0 ± ∞ ¹   +1.14% (p=0.008 n=5)
Truncation-4                           127.0 ± ∞ ¹    128.0 ± ∞ ¹   +0.79% (p=0.008 n=5)
Padding-4                              535.0 ± ∞ ¹    537.0 ± ∞ ¹   +0.37% (p=0.008 n=5)
ConcurrentCacheRead-4                  25.00 ± ∞ ¹    25.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
ConcurrentCacheValidation-4            43.00 ± ∞ ¹    43.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
ConcurrentHFCacheLookup-4              39.00 ± ∞ ¹    39.00 ± ∞ ¹        ~ (p=1.000 n=5) ²
DownloadWithFailureRecovery-4          456.0 ± ∞ ¹    457.0 ± ∞ ¹        ~ (p=0.889 n=5)
ConcurrentDownloadsWithFailures-4      233.0 ± ∞ ¹    233.0 ± ∞ ¹        ~ (p=1.000 n=5)
LoadFromCache-4                        7.000 ± ∞ ¹    7.000 ± ∞ ¹        ~ (p=1.000 n=5) ²
FromHuggingFaceWithoutCache-4          219.0 ± ∞ ¹    219.0 ± ∞ ¹        ~ (p=1.000 n=5) ²
geomean                                89.93          92.62         +2.98%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

@tazarov

tazarov commented Mar 22, 2026

Copy link
Copy Markdown
Contributor Author

@claude PTAL

@claude

claude Bot commented Mar 22, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 1m 37s —— View job


Re-review: Fix tokenizer lifecycle close races

  • Read current state of changed files
  • Check if previous review feedback was addressed
  • Provide updated review

Good progress since the last review. The follow-up commits added TestConcurrentCloseIsIdempotent, TestConcurrentMixedOperationsAndClose, improved beginOperation() to return the unlock function directly (cleaner API), and restructured Close() to unlock early before slow cleanup. The core design is solid.

Here's what I found on re-review:

Remaining issue from previous review

EncodePairs buffer cleanup still doesn't cover the error path (tokenizers.go:558-567)

The defer freeBuffer is placed after the rc < 0 early return, unchanged from what was flagged previously:

if rc < 0 {
    lastError := getErrorForCode(rc)
    return nil, errors.Wrap(lastError, "failed to encode pairs")  // buffers not freed
}
defer func() {
    for i := range buffers {
        t.freeBuffer(&buffers[i])
    }
}()

Note: Encode has the same pattern (tokenizers.go:461-467), but that predates this PR. For EncodePairs with its array of buffers, the risk of a partial-populate-then-error from Rust is higher. Moving the defer above the error check (guarding each freeBuffer call with a nil/zero check on the buffer fields) would close this gap. Fix this →

time.Sleep in concurrent tests (noted, not blocking)

TestConcurrentEncodeAndClose (line 125) and TestConcurrentMixedOperationsAndClose (line 207) still use time.Sleep(10ms) before calling Close(). The close(start) barrier ensures goroutines begin running, which is the important part — the sleep just gives them time to reach the FFI call. On a loaded CI runner this could be too short, making Close() fire before any FFI call begins.

This isn't blocking — the tests still validate correctness (no panics, no unexpected errors). But replacing the sleep with an atomic.Int64 counter that each worker increments after its first successful encode, then blocking Close() until the counter reaches goroutines, would make the race window deterministic.

Everything else looks good

  • beginOperation() returning the unlock func is clean and eliminates the defer t.lifecycleMu.RUnlock() pattern that previously separated the lock from its unlock site
  • Close() teardown ordering is correct: write-lock → set closed → snapshot locals → nil pointers → unlock → freeTokenizer → closeLibrary (LIFO via defer)
  • The early unlock in Close() means in-flight operations drain before the write lock is acquired, then slow cleanup (FFI free + dlclose) runs without holding the lock — good for latency
  • TestCloseWaitsForActiveOperations directly holding lifecycleMu.RLock() is a valid white-box approach for simulating an in-flight FFI call
  • GetLibraryVersion correctly returns "unknown" when closed rather than erroring

Verdict: One actionable item (buffer cleanup on error path). The time.Sleep nit is low priority. Otherwise LGTM.

@tazarov tazarov merged commit 00c068a into main Mar 22, 2026
13 of 15 checks passed
@tazarov tazarov deleted the codex/evaluate-concurrent-tokenization branch March 22, 2026 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Serialize tokenizer lifecycle operations to prevent Close/Encode races

1 participant