Skip to content

opt: escape cold path + SIMD find_key_end + write_string_fast inline#223

Merged
liuq19 merged 1 commit intocloudwego:mainfrom
liuq19:opt/rog-perf-improvements
Mar 31, 2026
Merged

opt: escape cold path + SIMD find_key_end + write_string_fast inline#223
liuq19 merged 1 commit intocloudwego:mainfrom
liuq19:opt/rog-perf-improvements

Conversation

@liuq19
Copy link
Copy Markdown
Collaborator

@liuq19 liuq19 commented Mar 30, 2026

Summary

Performance optimizations for escape handling, key parsing, and string writing:

Escape optimization (src/util/string.rs)

  • Mark escape_unchecked as #[cold] #[inline(never)] to keep format_string body small for cross-module inlining
  • Split QUOTE_TAB into QUOTE_LEN (1 byte/entry) + QUOTE_ESC (8 bytes/entry) for faster indexing without 9-byte struct multiply
  • Fast path for " and \ in escape_unchecked — the most common escaped chars (90%+), emit directly without table lookup
  • assert!debug_assert! in release mode

Key parsing (src/parser.rs)

  • Add find_key_end(): AVX2 SIMD scan for closing " quote position. Useful for AOT JSON decoders that need key length before parsing. Stateless — no Parser struct size increase.

Format (src/format.rs)

  • write_string_fast #[inline(always)] for LTO cross-module inlining

Benchmark (pinned core, criterion)

Benchmark Change
twitter/serde_json::to_string -16.7%
citm_catalog/simd_json::from_slice -18.5%
citm_catalog/serde_json::from_slice -7.1%
twitter/simd_json::from_slice -4.5%
canada/sonic_rs::to_string -2.0%
canada/serde_json::to_string -2.1%
citm_catalog/sonic_rs::from_slice_unchecked -1.8%
twitter/sonic_rs::from_slice_unchecked +1.0% (neutral)
canada/sonic_rs::from_slice +0.3% (neutral)

No regressions. All 146 tests pass.

Copilot AI review requested due to automatic review settings March 30, 2026 13:57
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.64%. Comparing base (e339451) to head (5a4adac).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
+ Coverage   70.60%   70.64%   +0.03%     
==========================================
  Files          42       42              
  Lines        9550     9556       +6     
==========================================
+ Hits         6743     6751       +8     
+ Misses       2807     2805       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@liuq19 liuq19 force-pushed the opt/rog-perf-improvements branch from f198950 to 71cd48b Compare March 30, 2026 14:00
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1989505af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/parser.rs Outdated
Comment on lines +1280 to +1285
#[cfg(target_arch = "x86_64")]
while i + 32 <= data.len() {
let mask = unsafe {
use std::arch::x86_64::*;
let chunk = _mm256_loadu_si256(data.as_ptr().add(i) as *const __m256i);
_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(b'"' as i8))) as u32
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate AVX2 key scan behind CPU feature checks

find_key_end enables the SIMD path on every x86_64 build, but the implementation uses AVX2 intrinsics directly. On x86_64 machines that do not support AVX2, calling this method will raise an illegal-instruction fault at runtime. The rest of the codebase feature-gates AVX2 paths (e.g., in util/arch), so this new API should either be guarded with target_feature = "avx2"/is_x86_feature_detected!("avx2") and a scalar fallback, or moved behind an AVX2-targeted function.

Useful? React with 👍 / 👎.

Comment thread src/parser.rs Outdated
Comment on lines +1287 to +1289
if mask != 0 {
return i + mask.trailing_zeros() as usize - cur_idx;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle escaped quotes when locating key terminator

This returns on the first " byte it sees, but JSON object keys may contain escaped quotes (\"). In that case the function reports a shorter key length instead of continuing (or returning 0 as documented), so callers using this offset can slice the key at the wrong position and mis-parse escaped-key inputs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Performance-focused changes aimed at reducing overhead in JSON string escaping, speeding up key parsing, and improving cross-module inlining opportunities.

Changes:

  • Adjust escape_unchecked codegen/profile and escape-table access patterns in src/util/string.rs.
  • Add a new Parser::find_key_end() SIMD-based quote search helper in src/parser.rs.
  • Force Formatter::write_string_fast to #[inline(always)] in src/format.rs.
  • Add a new Chinese product research document under docs/.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/util/string.rs Refactors escape table usage and changes escape_unchecked attributes/fast paths.
src/parser.rs Introduces find_key_end() with AVX2-based scanning and scalar fallback.
src/format.rs Changes write_string_fast to #[inline(always)] for inlining.
docs/github_runner_light_agent_research_zh.md Adds a new GitHub Runner “light agent” research document (unrelated to core perf changes).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/util/string.rs
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
];

// Split QUOTE_TAB into two aligned tables for faster indexing:
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claims these are “aligned tables”, but [u8; 256] / [[u8; 8]; 256] have alignment 1, so there’s no guaranteed alignment benefit. Either enforce alignment with an #[repr(align(N))] wrapper (if required for the intended optimization) or adjust the comment to avoid implying alignment guarantees.

Suggested change
// Split QUOTE_TAB into two aligned tables for faster indexing:
// Split QUOTE_TAB into two tables for faster indexing:

Copilot uses AI. Check for mistakes.
Comment thread src/parser.rs Outdated
Comment on lines +1280 to +1286
#[cfg(target_arch = "x86_64")]
while i + 32 <= data.len() {
let mask = unsafe {
use std::arch::x86_64::*;
let chunk = _mm256_loadu_si256(data.as_ptr().add(i) as *const __m256i);
_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, _mm256_set1_epi8(b'"' as i8))) as u32
};
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_key_end uses AVX2 intrinsics (_mm256_*) but is only gated by cfg(target_arch = "x86_64"). This will fail to compile (or be invalid to call) on x86_64 builds without target_feature = "avx2" enabled. Gate the SIMD block with cfg(all(target_arch = "x86_64", target_feature = "avx2")) and provide a scalar/portable SIMD fallback for other targets/features (or mark the function with #[target_feature(enable = "avx2")] and add runtime feature detection).

Copilot uses AI. Check for mistakes.
Comment thread src/parser.rs Outdated
Comment on lines +1268 to +1274
/// Find the closing `"` of a key from current position.
/// Returns the offset (key length) or 0 if the key has escapes.
/// Uses cached quote bitmap from previous SIMD scan when available.
/// Find the offset of the closing `"` from current position.
/// Returns key length. Cache hit → ~1 cycle; miss → SIMD scan (no cache update).
/// Find the closing `"` of a key from current position using SIMD scan.
/// Returns the key length (offset to closing quote), or 0 if not found.
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for find_key_end is internally inconsistent/duplicated (multiple “Find the closing ...” lines) and describes behavior that isn’t implemented (escape detection / cached bitmap usage). Please rewrite it to match the actual semantics, or implement the described escape/caching behavior.

Suggested change
/// Find the closing `"` of a key from current position.
/// Returns the offset (key length) or 0 if the key has escapes.
/// Uses cached quote bitmap from previous SIMD scan when available.
/// Find the offset of the closing `"` from current position.
/// Returns key length. Cache hit → ~1 cycle; miss → SIMD scan (no cache update).
/// Find the closing `"` of a key from current position using SIMD scan.
/// Returns the key length (offset to closing quote), or 0 if not found.
/// Find the closing `"` of a key from the current position.
///
/// On `x86_64`, this scans the input in 32-byte chunks using SIMD to locate the
/// next double quote, then falls back to a scalar loop for any remaining bytes.
///
/// Returns the offset from the current position to the closing quote (the key
/// length), or `0` if no closing quote is found.

Copilot uses AI. Check for mistakes.
Comment thread src/parser.rs Outdated
Comment thread src/parser.rs Outdated
@liuq19 liuq19 force-pushed the opt/rog-perf-improvements branch 2 times, most recently from a35dbcb to 6139a9c Compare March 31, 2026 03:33
- escape_unchecked: #[cold] #[inline(never)] → #[inline(always)]
  Eliminates function call overhead for " and \ escapes.

- write_string_fast: #[inline(always)]

- Split QUOTE_TAB into aligned QUOTE_LEN + QUOTE_ESC tables.
  " and \ handled inline without table lookup.
@liuq19 liuq19 enabled auto-merge (rebase) March 31, 2026 12:15
@liuq19 liuq19 disabled auto-merge March 31, 2026 12:15
@liuq19 liuq19 force-pushed the opt/rog-perf-improvements branch from 6139a9c to 5a4adac Compare March 31, 2026 12:29
@liuq19 liuq19 merged commit 6e707d8 into cloudwego:main Mar 31, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants