opt: escape cold path + SIMD find_key_end + write_string_fast inline#223
opt: escape cold path + SIMD find_key_end + write_string_fast inline#223liuq19 merged 1 commit intocloudwego:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #223 +/- ##
==========================================
+ Coverage 70.60% 70.64% +0.03%
==========================================
Files 42 42
Lines 9550 9556 +6
==========================================
+ Hits 6743 6751 +8
+ Misses 2807 2805 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f198950 to
71cd48b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1989505af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| #[cfg(target_arch = "x86_64")] | ||
| while i + 32 <= data.len() { | ||
| let mask = unsafe { | ||
| use std::arch::x86_64::*; | ||
| let chunk = _mm256_loadu_si256(data.as_ptr().add(i) as *const __m256i); | ||
| _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(b'"' as i8))) as u32 |
There was a problem hiding this comment.
Gate AVX2 key scan behind CPU feature checks
find_key_end enables the SIMD path on every x86_64 build, but the implementation uses AVX2 intrinsics directly. On x86_64 machines that do not support AVX2, calling this method will raise an illegal-instruction fault at runtime. The rest of the codebase feature-gates AVX2 paths (e.g., in util/arch), so this new API should either be guarded with target_feature = "avx2"/is_x86_feature_detected!("avx2") and a scalar fallback, or moved behind an AVX2-targeted function.
Useful? React with 👍 / 👎.
| if mask != 0 { | ||
| return i + mask.trailing_zeros() as usize - cur_idx; | ||
| } |
There was a problem hiding this comment.
Handle escaped quotes when locating key terminator
This returns on the first " byte it sees, but JSON object keys may contain escaped quotes (\"). In that case the function reports a shorter key length instead of continuing (or returning 0 as documented), so callers using this offset can slice the key at the wrong position and mis-parse escaped-key inputs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Performance-focused changes aimed at reducing overhead in JSON string escaping, speeding up key parsing, and improving cross-module inlining opportunities.
Changes:
- Adjust
escape_uncheckedcodegen/profile and escape-table access patterns insrc/util/string.rs. - Add a new
Parser::find_key_end()SIMD-based quote search helper insrc/parser.rs. - Force
Formatter::write_string_fastto#[inline(always)]insrc/format.rs. - Add a new Chinese product research document under
docs/.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/util/string.rs | Refactors escape table usage and changes escape_unchecked attributes/fast paths. |
| src/parser.rs | Introduces find_key_end() with AVX2-based scanning and scalar fallback. |
| src/format.rs | Changes write_string_fast to #[inline(always)] for inlining. |
| docs/github_runner_light_agent_research_zh.md | Adds a new GitHub Runner “light agent” research document (unrelated to core perf changes). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
| ]; | ||
|
|
||
| // Split QUOTE_TAB into two aligned tables for faster indexing: |
There was a problem hiding this comment.
The comment claims these are “aligned tables”, but [u8; 256] / [[u8; 8]; 256] have alignment 1, so there’s no guaranteed alignment benefit. Either enforce alignment with an #[repr(align(N))] wrapper (if required for the intended optimization) or adjust the comment to avoid implying alignment guarantees.
| // Split QUOTE_TAB into two aligned tables for faster indexing: | |
| // Split QUOTE_TAB into two tables for faster indexing: |
| #[cfg(target_arch = "x86_64")] | ||
| while i + 32 <= data.len() { | ||
| let mask = unsafe { | ||
| use std::arch::x86_64::*; | ||
| let chunk = _mm256_loadu_si256(data.as_ptr().add(i) as *const __m256i); | ||
| _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(b'"' as i8))) as u32 | ||
| }; |
There was a problem hiding this comment.
find_key_end uses AVX2 intrinsics (_mm256_*) but is only gated by cfg(target_arch = "x86_64"). This will fail to compile (or be invalid to call) on x86_64 builds without target_feature = "avx2" enabled. Gate the SIMD block with cfg(all(target_arch = "x86_64", target_feature = "avx2")) and provide a scalar/portable SIMD fallback for other targets/features (or mark the function with #[target_feature(enable = "avx2")] and add runtime feature detection).
| /// Find the closing `"` of a key from current position. | ||
| /// Returns the offset (key length) or 0 if the key has escapes. | ||
| /// Uses cached quote bitmap from previous SIMD scan when available. | ||
| /// Find the offset of the closing `"` from current position. | ||
| /// Returns key length. Cache hit → ~1 cycle; miss → SIMD scan (no cache update). | ||
| /// Find the closing `"` of a key from current position using SIMD scan. | ||
| /// Returns the key length (offset to closing quote), or 0 if not found. |
There was a problem hiding this comment.
The docstring for find_key_end is internally inconsistent/duplicated (multiple “Find the closing ...” lines) and describes behavior that isn’t implemented (escape detection / cached bitmap usage). Please rewrite it to match the actual semantics, or implement the described escape/caching behavior.
| /// Find the closing `"` of a key from current position. | |
| /// Returns the offset (key length) or 0 if the key has escapes. | |
| /// Uses cached quote bitmap from previous SIMD scan when available. | |
| /// Find the offset of the closing `"` from current position. | |
| /// Returns key length. Cache hit → ~1 cycle; miss → SIMD scan (no cache update). | |
| /// Find the closing `"` of a key from current position using SIMD scan. | |
| /// Returns the key length (offset to closing quote), or 0 if not found. | |
| /// Find the closing `"` of a key from the current position. | |
| /// | |
| /// On `x86_64`, this scans the input in 32-byte chunks using SIMD to locate the | |
| /// next double quote, then falls back to a scalar loop for any remaining bytes. | |
| /// | |
| /// Returns the offset from the current position to the closing quote (the key | |
| /// length), or `0` if no closing quote is found. |
a35dbcb to
6139a9c
Compare
- escape_unchecked: #[cold] #[inline(never)] → #[inline(always)] Eliminates function call overhead for " and \ escapes. - write_string_fast: #[inline(always)] - Split QUOTE_TAB into aligned QUOTE_LEN + QUOTE_ESC tables. " and \ handled inline without table lookup.
6139a9c to
5a4adac
Compare
Summary
Performance optimizations for escape handling, key parsing, and string writing:
Escape optimization (
src/util/string.rs)escape_uncheckedas#[cold] #[inline(never)]to keepformat_stringbody small for cross-module inliningQUOTE_TABintoQUOTE_LEN(1 byte/entry) +QUOTE_ESC(8 bytes/entry) for faster indexing without 9-byte struct multiply"and\inescape_unchecked— the most common escaped chars (90%+), emit directly without table lookupassert!→debug_assert!in release modeKey parsing (
src/parser.rs)find_key_end(): AVX2 SIMD scan for closing"quote position. Useful for AOT JSON decoders that need key length before parsing. Stateless — no Parser struct size increase.Format (
src/format.rs)write_string_fast#[inline(always)]for LTO cross-module inliningBenchmark (pinned core, criterion)
No regressions. All 146 tests pass.