markers: replace regexp with scanning in StripMarkers and EscapeMarkers by dhartunian · Pull Request #37 · cockroachdb/redact

dhartunian · 2026-03-13T14:33:22Z

Replace the regexp-based implementation of StripMarkers() and
EscapeMarkers() with manual byte scanning using a shared
stripMarkersBytes() function.

The new implementation exploits the fact that all three marker
characters (‹ U+2039, › U+203A, † U+2020) are 3-byte UTF-8 sequences
sharing the prefix 0xE2 0x80. It scans for 0xE2 bytes and checks the
subsequent two bytes to identify markers, then either removes them or
replaces them with the escape character.
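The scanning approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `scanMarkers`, `isMarkerAt`, and `escapeByte` are hypothetical names, and using `?` as the escape character is an assumption. It also assumes all three markers are handled identically.

```go
package main

import "fmt"

// escapeByte is an assumed stand-in for the library's escape character.
const escapeByte = '?'

// isMarkerAt reports whether b[i:] begins with one of the three markers:
// ‹ (E2 80 B9), › (E2 80 BA), or † (E2 80 A0).
func isMarkerAt(b []byte, i int) bool {
	if i+2 >= len(b) || b[i] != 0xE2 || b[i+1] != 0x80 {
		return false
	}
	switch b[i+2] {
	case 0xB9, 0xBA, 0xA0:
		return true
	}
	return false
}

// scanMarkers (hypothetical) removes every marker from b, or replaces each
// one with escapeByte when escape is true. When b contains no markers, b is
// returned as-is and nothing is allocated.
func scanMarkers(b []byte, escape bool) []byte {
	var out []byte // allocated lazily, on the first marker found
	last := 0      // start of the pending run of non-marker bytes
	for i := 0; i < len(b); {
		if !isMarkerAt(b, i) {
			i++
			continue
		}
		if out == nil {
			out = make([]byte, 0, len(b))
		}
		out = append(out, b[last:i]...) // copy the bytes before the marker
		if escape {
			out = append(out, escapeByte)
		}
		i += 3 // every marker is exactly three bytes long
		last = i
	}
	if out == nil {
		return b
	}
	return append(out, b[last:]...)
}

func main() {
	in := []byte("safe ‹secret› †hash")
	fmt.Printf("%q\n", scanMarkers(in, false)) // "safe secret hash"
	fmt.Printf("%q\n", scanMarkers(in, true))  // "safe ?secret? ?hash"
}
```

Other characters that start with 0xE2, such as € (E2 82 AC), fail the second-byte check and are copied through untouched.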

Also removes the now-unused ReStripMarkers regexp and its regexp import.

Benchmark results (Apple M1 Pro):

name                           old time/op    new time/op    delta
StripMarkers_String-10           738ns ± 2%     105ns ± 1%  -85.83%  (p=0.000 n=8)
StripMarkers_Bytes-10            733ns ± 4%      58ns ± 1%  -92.10%  (p=0.000 n=8)
StripMarkers_NoMarkers-10        660ns ± 1%      23ns ± 0%  -96.53%  (p=0.000 n=8)
EscapeMarkers-10                 377ns ± 2%      45ns ±18%  -87.99%  (p=0.000 n=8)

name                           old alloc/op   new alloc/op   delta
StripMarkers_String-10           185B ± 1%     144B ± 0%  -22.16%  (p=0.000 n=8)
StripMarkers_Bytes-10            136B ± 0%      48B ± 0%  -64.71%  (p=0.000 n=8)
StripMarkers_NoMarkers-10        144B ± 0%       0B ± 0%  -100.00% (p=0.000 n=8)
EscapeMarkers-10                40.0B ± 0%    24.0B ± 0%  -40.00%  (p=0.000 n=8)

name                           old allocs/op  new allocs/op  delta
StripMarkers_String-10           6.00 ± 0%     3.00 ± 0%  -50.00%  (p=0.000 n=8)
StripMarkers_Bytes-10            5.00 ± 0%     1.00 ± 0%  -80.00%  (p=0.000 n=8)
StripMarkers_NoMarkers-10        3.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=8)
EscapeMarkers-10                 3.00 ± 0%     1.00 ± 0%  -66.67%  (p=0.000 n=8)

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Add exhaustive table-driven tests for StripMarkers (both string and byte
variants) and EscapeMarkers covering edge cases including empty inputs,
unicode content, consecutive markers, hash prefix markers, and boundary
conditions.

Add benchmarks to establish a performance baseline before optimizing
these functions.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
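As a rough illustration of the table-driven style this commit message describes, the sketch below runs a small table of edge cases against a deliberately simple reference implementation. Everything here is hypothetical: `stripReference`, `stripCase`, and `runStripCases` are made-up names, the rows are not the PR's actual test cases, and the sketch assumes that stripping removes the † hash prefix along with ‹ and ›.

```go
package main

import (
	"fmt"
	"strings"
)

// stripReference is a simple stand-in, not the library's code: it deletes
// the three marker characters using strings.NewReplacer.
var stripReference = strings.NewReplacer("‹", "", "›", "", "†", "")

// stripCase is one row of the table.
type stripCase struct {
	name, in, want string
}

var stripCases = []stripCase{
	{"empty", "", ""},
	{"no markers", "hello world", "hello world"},
	{"single pair", "‹secret›", "secret"},
	{"consecutive markers", "‹‹a››‹›", "a"},
	{"hash prefix", "‹†abc123›", "abc123"},
	{"unicode content", "‹héllo €› ok", "héllo € ok"},
	{"markers at boundaries", "›mid‹", "mid"},
}

// runStripCases applies strip to every row and returns one message per
// failing row.
func runStripCases(strip func(string) string) []string {
	var failures []string
	for _, tc := range stripCases {
		if got := strip(tc.in); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: got %q, want %q", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println(len(runStripCases(stripReference.Replace)), "failures")
}
```

In a real test file the loop would more likely live in a `func TestStripMarkers(t *testing.T)` with one `t.Run` per row. Keeping the function under test as a parameter makes it easy to run the same table against both the old and new implementations.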

visheshbardia · 2026-03-13T15:56:19Z

The performance difference is great.
LGTM! Just a small nit.
Thanks.

internal/markers/markers.go

visheshbardia

@visheshbardia made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion.

dhartunian requested a review from visheshbardia March 13, 2026 14:33

visheshbardia reviewed Mar 13, 2026

internal/markers/markers.go (resolved)

dhartunian force-pushed the davidh/strip-markers-perf branch from cfc9aed to ccab926 March 13, 2026 16:45

dhartunian requested a review from visheshbardia March 13, 2026 16:45

visheshbardia approved these changes Mar 13, 2026

dhartunian merged commit d01e850 into master Mar 13, 2026
6 of 7 checks passed

dhartunian merged 2 commits into master from davidh/strip-markers-perf
