@Aaalibaba42
Contributor

What does this PR do?

Migrate from the regex crate to regex-lite.

Motivation

The regex crate is very fast, but it takes up a lot of space in the binaries. regex-lite may introduce a performance regression, but it should produce noticeably smaller binaries.

How to test the change?

I believe there is a benchmark for the size of the artifacts (and for performance). We can use its results to evaluate whether this change is worth it.

@codecov-commenter

codecov-commenter commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.96%. Comparing base (a7c8765) to head (c23f2e6).
⚠️ Report is 35 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   71.65%   71.96%   +0.31%     
==========================================
  Files         354      368      +14     
  Lines       56063    58118    +2055     
==========================================
+ Hits        40172    41826    +1654     
- Misses      15891    16292     +401     
Components Coverage Δ
datadog-crashtracker 50.67% <ø> (+1.39%) ⬆️
datadog-crashtracker-ffi 5.86% <ø> (-0.07%) ⬇️
datadog-alloc 98.73% <ø> (ø)
data-pipeline 87.86% <ø> (-2.45%) ⬇️
data-pipeline-ffi 88.19% <ø> (ø)
ddcommon 84.31% <ø> (+0.01%) ⬆️
ddcommon-ffi 73.84% <ø> (ø)
ddtelemetry 59.98% <ø> (-0.04%) ⬇️
ddtelemetry-ffi 21.24% <ø> (ø)
dogstatsd-client 83.26% <ø> (ø)
datadog-ipc 82.39% <ø> (ø)
datadog-profiling 76.90% <ø> (ø)
datadog-profiling-ffi 62.12% <ø> (ø)
datadog-sidecar 36.92% <ø> (-0.17%) ⬇️
datdog-sidecar-ffi 12.12% <ø> (+0.75%) ⬆️
spawn-worker 55.18% <ø> (-0.17%) ⬇️
tinybytes 92.44% <ø> (+0.21%) ⬆️
datadog-trace-normalization 98.24% <ø> (+<0.01%) ⬆️
datadog-trace-obfuscation 94.17% <100.00%> (+<0.01%) ⬆️
datadog-trace-protobuf 59.65% <ø> (-17.45%) ⬇️
datadog-trace-utils 90.34% <ø> (+0.59%) ⬆️
datadog-tracer-flare 62.42% <ø> (+7.89%) ⬆️
datadog-log 75.57% <ø> (-0.75%) ⬇️

@pr-commenter

pr-commenter bot commented Sep 19, 2025

Benchmarks

Comparison

Benchmark execution time: 2025-10-27 23:16:12

Comparing candidate commit c23f2e6 in PR branch jwiriath/regex-to-regex-lite with baseline commit c4cdaa2 in branch main.

Found 1 performance improvements and 14 performance regressions! Performance is the same for 38 metrics, 2 unstable metrics.

scenario:concentrator/add_spans_to_concentrator

  • 🟩 execution_time [-2.314ms; -2.309ms] or [-21.615%; -21.572%]

scenario:credit_card/is_card_number/37828224631000521389798

  • 🟥 execution_time [+6.996µs; +7.050µs] or [+15.317%; +15.435%]
  • 🟥 throughput [-2930468.603op/s; -2905621.768op/s] or [-13.385%; -13.271%]

scenario:credit_card/is_card_number_no_luhn/ 378282246310005

  • 🟥 execution_time [+4.586µs; +4.663µs] or [+8.454%; +8.595%]
  • 🟥 throughput [-1459274.288op/s; -1436316.665op/s] or [-7.916%; -7.792%]

scenario:credit_card/is_card_number_no_luhn/378282246310005

  • 🟥 execution_time [+4.974µs; +5.038µs] or [+9.829%; +9.955%]
  • 🟥 throughput [-1790199.319op/s; -1767479.986op/s] or [-9.059%; -8.944%]

scenario:credit_card/is_card_number_no_luhn/37828224631000521389798

  • 🟥 execution_time [+6.970µs; +7.024µs] or [+15.248%; +15.367%]
  • 🟥 throughput [-2917134.991op/s; -2892169.101op/s] or [-13.334%; -13.219%]

scenario:ip_address/quantize_peer_ip_address_benchmark

  • 🟥 execution_time [+3.175µs; +3.187µs] or [+62.840%; +63.068%]

scenario:normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...

  • 🟥 execution_time [+37.767µs; +38.109µs] or [+7.599%; +7.668%]
  • 🟥 throughput [-143328.353op/s; -142058.081op/s] or [-7.123%; -7.060%]

scenario:normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters

  • 🟥 execution_time [+21.631µs; +21.696µs] or [+12.812%; +12.850%]
  • 🟥 throughput [-674548.827op/s; -672554.558op/s] or [-11.389%; -11.355%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+11.218µs; +11.225µs] or [+468.980%; +469.301%]
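
The baseline table is omitted further down, but a baseline value can be backed out from the candidate mean and the reported delta. A minimal sketch, assuming the percentages are relative to the baseline mean (the report does not state this); both function names are hypothetical:

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
fn relative_change_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

/// Recovers the baseline from a candidate value and the reported absolute delta.
fn baseline_from_delta(candidate: f64, delta: f64) -> f64 {
    candidate - delta
}
```

For tags/replace_trace_tags, a candidate mean of 13.614µs and a delta of roughly +11.22µs give a baseline near 2.39µs, which is consistent with the reported change of about +469%.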

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 3.897µs 3.915µs ± 0.003µs 3.915µs ± 0.002µs 3.916µs 3.919µs 3.920µs 3.923µs 0.22% -1.028 6.505 0.07% 0.000µs 1 200
credit_card/is_card_number/ throughput 254885115.148op/s 255455365.436op/s ± 189725.267op/s 255442091.278op/s ± 104009.253op/s 255548995.142op/s 255731224.550op/s 255793341.566op/s 256639840.186op/s 0.47% 1.044 6.609 0.07% 13415.602op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 76.853µs 78.226µs ± 0.750µs 78.136µs ± 0.456µs 78.595µs 79.625µs 80.340µs 81.215µs 3.94% 0.837 1.217 0.96% 0.053µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 12312947.805op/s 12784635.136op/s ± 121581.184op/s 12798176.262op/s ± 74250.071op/s 12865991.130op/s 12977583.107op/s 12999925.053op/s 13011836.603op/s 1.67% -0.766 1.026 0.95% 8597.088op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 70.652µs 72.391µs ± 0.713µs 72.319µs ± 0.500µs 72.867µs 73.571µs 74.169µs 74.590µs 3.14% 0.434 -0.061 0.98% 0.050µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 13406647.440op/s 13815114.339op/s ± 135533.879op/s 13827641.367op/s ± 95300.206op/s 13915666.606op/s 14010910.369op/s 14069742.388op/s 14153954.498op/s 2.36% -0.383 -0.122 0.98% 9583.692op/s 1 200
credit_card/is_card_number/37828224631 execution_time 3.895µs 3.914µs ± 0.003µs 3.914µs ± 0.002µs 3.916µs 3.918µs 3.920µs 3.923µs 0.22% -1.324 11.375 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 254938345.331op/s 255488200.849op/s ± 172507.044op/s 255494877.328op/s ± 100113.616op/s 255594378.225op/s 255696515.561op/s 255774810.627op/s 256707113.862op/s 0.47% 1.348 11.549 0.07% 12198.090op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 67.212µs 68.934µs ± 0.708µs 68.948µs ± 0.463µs 69.399µs 70.202µs 70.750µs 70.879µs 2.80% 0.228 -0.095 1.03% 0.050µs 1 200
credit_card/is_card_number/378282246310005 throughput 14108487.115op/s 14508120.431op/s ± 148802.795op/s 14503629.906op/s ± 97417.465op/s 14601147.453op/s 14747635.195op/s 14834354.823op/s 14878394.095op/s 2.58% -0.171 -0.132 1.02% 10521.947op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 52.497µs 52.698µs ± 0.081µs 52.694µs ± 0.051µs 52.750µs 52.839µs 52.871µs 52.902µs 0.39% 0.124 -0.293 0.15% 0.006µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 18902987.975op/s 18975927.849op/s ± 29051.149op/s 18977635.807op/s ± 18414.926op/s 18994562.333op/s 19026268.243op/s 19037913.433op/s 19048883.244op/s 0.38% -0.116 -0.294 0.15% 2054.226op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 6.428µs 6.440µs ± 0.010µs 6.439µs ± 0.003µs 6.442µs 6.455µs 6.476µs 6.513µs 1.15% 3.670 18.527 0.16% 0.001µs 1 200
credit_card/is_card_number/x371413321323331 throughput 153545462.549op/s 155276683.205op/s ± 240551.830op/s 155309516.676op/s ± 82245.821op/s 155391865.238op/s 155512624.577op/s 155561400.404op/s 155575753.929op/s 0.17% -3.638 18.223 0.15% 17009.583op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 3.898µs 3.915µs ± 0.003µs 3.916µs ± 0.002µs 3.917µs 3.920µs 3.925µs 3.934µs 0.47% 0.376 6.366 0.09% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 254211301.230op/s 255396813.443op/s ± 224817.252op/s 255394139.136op/s ± 128742.423op/s 255544388.083op/s 255702050.657op/s 255759487.700op/s 256547689.454op/s 0.45% -0.354 6.357 0.09% 15896.980op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 64.931µs 65.145µs ± 0.146µs 65.107µs ± 0.077µs 65.206µs 65.431µs 65.668µs 65.742µs 0.98% 1.619 3.116 0.22% 0.010µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 15210947.694op/s 15350371.573op/s ± 34349.805op/s 15359331.562op/s ± 18140.442op/s 15374002.562op/s 15386630.404op/s 15394332.286op/s 15400885.551op/s 0.27% -1.603 3.043 0.22% 2428.898op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 58.427µs 58.872µs ± 0.244µs 58.829µs ± 0.151µs 58.993µs 59.342µs 59.621µs 59.732µs 1.54% 0.974 0.983 0.41% 0.017µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 16741344.553op/s 16986236.262op/s ± 69996.854op/s 16998361.358op/s ± 43536.607op/s 17035294.169op/s 17077296.884op/s 17086918.920op/s 17115393.696op/s 0.69% -0.949 0.914 0.41% 4949.525op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 3.897µs 3.915µs ± 0.004µs 3.915µs ± 0.002µs 3.917µs 3.921µs 3.924µs 3.938µs 0.58% 0.935 10.274 0.09% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 253945101.306op/s 255404353.352op/s ± 232437.353op/s 255430499.444op/s ± 105495.267op/s 255515086.216op/s 255702725.417op/s 255772717.684op/s 256606279.929op/s 0.46% -0.904 10.198 0.09% 16435.803op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 55.267µs 55.612µs ± 0.168µs 55.571µs ± 0.080µs 55.680µs 55.917µs 56.113µs 56.568µs 1.79% 1.777 5.505 0.30% 0.012µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 17677934.549op/s 17982040.474op/s ± 53891.786op/s 17995021.677op/s ± 26067.503op/s 18016058.615op/s 18043185.734op/s 18055728.034op/s 18093849.177op/s 0.55% -1.738 5.259 0.30% 3810.725op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 52.474µs 52.705µs ± 0.088µs 52.703µs ± 0.058µs 52.758µs 52.858µs 52.896µs 52.992µs 0.55% 0.193 0.194 0.17% 0.006µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 18870924.762op/s 18973579.913op/s ± 31769.244op/s 18974102.249op/s ± 20900.545op/s 18995794.332op/s 19020175.272op/s 19051052.236op/s 19057043.448op/s 0.44% -0.182 0.188 0.17% 2246.425op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 6.427µs 6.436µs ± 0.005µs 6.436µs ± 0.003µs 6.438µs 6.445µs 6.449µs 6.462µs 0.40% 1.452 4.809 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 154745288.160op/s 155366333.421op/s ± 110057.542op/s 155370285.434op/s ± 61717.098op/s 155441048.095op/s 155506787.288op/s 155535744.610op/s 155583432.867op/s 0.14% -1.442 4.749 0.07% 7782.243op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [3.914µs; 3.915µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/ throughput [255429071.339op/s; 255481659.534op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [78.122µs; 78.330µs] or [-0.133%; +0.133%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [12767785.153op/s; 12801485.119op/s] or [-0.132%; +0.132%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [72.293µs; 72.490µs] or [-0.137%; +0.137%] None None None
credit_card/is_card_number/ 378282246310005 throughput [13796330.647op/s; 13833898.031op/s] or [-0.136%; +0.136%] None None None
credit_card/is_card_number/37828224631 execution_time [3.914µs; 3.914µs] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/37828224631 throughput [255464293.031op/s; 255512108.666op/s] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/378282246310005 execution_time [68.836µs; 69.032µs] or [-0.142%; +0.142%] None None None
credit_card/is_card_number/378282246310005 throughput [14487497.795op/s; 14528743.068op/s] or [-0.142%; +0.142%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [52.687µs; 52.710µs] or [-0.021%; +0.021%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [18971901.639op/s; 18979954.059op/s] or [-0.021%; +0.021%] None None None
credit_card/is_card_number/x371413321323331 execution_time [6.439µs; 6.442µs] or [-0.022%; +0.022%] None None None
credit_card/is_card_number/x371413321323331 throughput [155243345.035op/s; 155310021.375op/s] or [-0.021%; +0.021%] None None None
credit_card/is_card_number_no_luhn/ execution_time [3.915µs; 3.916µs] or [-0.012%; +0.012%] None None None
credit_card/is_card_number_no_luhn/ throughput [255365655.934op/s; 255427970.952op/s] or [-0.012%; +0.012%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [65.125µs; 65.166µs] or [-0.031%; +0.031%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [15345611.020op/s; 15355132.126op/s] or [-0.031%; +0.031%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [58.838µs; 58.906µs] or [-0.057%; +0.057%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [16976535.372op/s; 16995937.153op/s] or [-0.057%; +0.057%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [3.915µs; 3.916µs] or [-0.013%; +0.013%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [255372139.770op/s; 255436566.934op/s] or [-0.013%; +0.013%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [55.588µs; 55.635µs] or [-0.042%; +0.042%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [17974571.591op/s; 17989509.357op/s] or [-0.042%; +0.042%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [52.693µs; 52.717µs] or [-0.023%; +0.023%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [18969177.001op/s; 18977982.825op/s] or [-0.023%; +0.023%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [6.436µs; 6.437µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [155351080.504op/s; 155381586.338op/s] or [-0.010%; +0.010%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 8.376ms 8.394ms ± 0.011ms 8.393ms ± 0.008ms 8.400ms 8.410ms 8.421ms 8.454ms 0.73% 1.388 5.214 0.13% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [8.392ms; 8.395ms] or [-0.018%; +0.018%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 185.515µs 186.019µs ± 0.263µs 185.986µs ± 0.215µs 186.226µs 186.447µs 186.605µs 186.715µs 0.39% 0.235 -0.805 0.14% 0.019µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 5355766.138op/s 5375816.766op/s ± 7593.599op/s 5376738.308op/s ± 6201.857op/s 5381825.736op/s 5386724.139op/s 5389095.353op/s 5390385.896op/s 0.25% -0.230 -0.809 0.14% 536.949op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 17.930µs 18.060µs ± 0.060µs 18.060µs ± 0.034µs 18.094µs 18.133µs 18.163µs 18.467µs 2.26% 2.084 12.984 0.33% 0.004µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 54150992.078op/s 55370328.478op/s ± 183985.259op/s 55372402.678op/s ± 103614.534op/s 55468733.743op/s 55648357.415op/s 55727378.804op/s 55773624.700op/s 0.72% -1.979 12.232 0.33% 13009.722op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 10.371µs 10.478µs ± 0.049µs 10.473µs ± 0.034µs 10.515µs 10.561µs 10.582µs 10.595µs 1.17% 0.186 -0.585 0.46% 0.003µs 1 200
normalization/normalize_name/normalize_name/good throughput 94383327.437op/s 95442283.442op/s ± 442761.156op/s 95484390.183op/s ± 314079.388op/s 95766446.885op/s 96110193.051op/s 96333652.281op/s 96424491.526op/s 0.98% -0.167 -0.593 0.46% 31307.942op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [185.982µs; 186.055µs] or [-0.020%; +0.020%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [5374764.366op/s; 5376869.166op/s] or [-0.020%; +0.020%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [18.052µs; 18.069µs] or [-0.046%; +0.046%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [55344829.891op/s; 55395827.066op/s] or [-0.046%; +0.046%] None None None
normalization/normalize_name/normalize_name/good execution_time [10.471µs; 10.485µs] or [-0.064%; +0.064%] None None None
normalization/normalize_name/normalize_name/good throughput [95380921.004op/s; 95503645.880op/s] or [-0.064%; +0.064%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching serializing traces from their internal representation to msgpack execution_time 15.015ms 15.066ms ± 0.030ms 15.059ms ± 0.011ms 15.073ms 15.121ms 15.176ms 15.264ms 1.36% 2.894 12.156 0.20% 0.002ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching serializing traces from their internal representation to msgpack execution_time [15.062ms; 15.070ms] or [-0.028%; +0.028%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 533.879µs 534.943µs ± 0.928µs 534.758µs ± 0.341µs 535.102µs 536.720µs 539.166µs 540.104µs 1.00% 2.748 10.302 0.17% 0.066µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1851494.864op/s 1869362.608op/s ± 3229.055op/s 1870006.396op/s ± 1191.207op/s 1871161.917op/s 1872478.890op/s 1873076.830op/s 1873084.427op/s 0.16% -2.723 10.129 0.17% 228.329op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 380.168µs 380.812µs ± 0.292µs 380.809µs ± 0.215µs 381.032µs 381.322µs 381.525µs 381.648µs 0.22% 0.323 -0.206 0.08% 0.021µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2620216.084op/s 2625966.568op/s ± 2014.234op/s 2625985.808op/s ± 1482.287op/s 2627422.119op/s 2628814.842op/s 2629786.014op/s 2630416.628op/s 0.17% -0.319 -0.210 0.08% 142.428op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 190.121µs 190.498µs ± 0.179µs 190.488µs ± 0.118µs 190.602µs 190.813µs 190.946µs 191.063µs 0.30% 0.490 0.116 0.09% 0.013µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5233870.118op/s 5249398.235op/s ± 4923.930op/s 5249684.767op/s ± 3240.031op/s 5253029.333op/s 5256328.427op/s 5258464.501op/s 5259795.459op/s 0.19% -0.485 0.108 0.09% 348.174op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 36.833µs 37.015µs ± 0.118µs 36.998µs ± 0.100µs 37.120µs 37.212µs 37.279µs 37.287µs 0.78% 0.307 -1.166 0.32% 0.008µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 26819167.817op/s 27016443.599op/s ± 86053.559op/s 27028268.924op/s ± 73401.879op/s 27095940.749op/s 27123960.258op/s 27141486.086op/s 27149853.135op/s 0.45% -0.300 -1.176 0.32% 6084.905op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 45.980µs 46.115µs ± 0.119µs 46.105µs ± 0.041µs 46.141µs 46.233µs 46.274µs 47.550µs 3.13% 8.923 104.994 0.26% 0.008µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 21030558.102op/s 21685209.321op/s ± 54599.360op/s 21689545.734op/s ± 19483.284op/s 21710554.049op/s 21730913.368op/s 21741206.886op/s 21748770.288op/s 0.27% -8.713 101.598 0.25% 3860.758op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [534.815µs; 535.072µs] or [-0.024%; +0.024%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1868915.092op/s; 1869810.125op/s] or [-0.024%; +0.024%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [380.772µs; 380.853µs] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2625687.415op/s; 2626245.722op/s] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [190.473µs; 190.523µs] or [-0.013%; +0.013%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5248715.826op/s; 5250080.644op/s] or [-0.013%; +0.013%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [36.999µs; 37.031µs] or [-0.044%; +0.044%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [27004517.403op/s; 27028369.795op/s] or [-0.044%; +0.044%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [46.098µs; 46.131µs] or [-0.036%; +0.036%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [21677642.375op/s; 21692776.267op/s] or [-0.035%; +0.035%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
receiver_entry_point/report/2597 execution_time 6.290ms 6.571ms ± 0.097ms 6.601ms ± 0.021ms 6.622ms 6.669ms 6.686ms 6.730ms 1.96% -1.479 0.988 1.47% 0.007ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
receiver_entry_point/report/2597 execution_time [6.557ms; 6.584ms] or [-0.204%; +0.204%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 17.752µs 25.827µs ± 10.231µs 18.052µs ± 0.181µs 35.592µs 45.278µs 55.631µs 67.804µs 275.61% 1.061 0.747 39.52% 0.723µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [24.409µs; 27.245µs] or [-5.490%; +5.490%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
ip_address/quantize_peer_ip_address_benchmark execution_time 8.182µs 8.234µs ± 0.023µs 8.239µs ± 0.014µs 8.250µs 8.263µs 8.269µs 8.273µs 0.42% -0.685 -0.346 0.27% 0.002µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark execution_time [8.231µs; 8.237µs] or [-0.038%; +0.038%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 241.376ns 251.362ns ± 12.132ns 245.473ns ± 2.426ns 254.884ns 281.403ns 284.902ns 286.787ns 16.83% 1.571 1.180 4.81% 0.858ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [249.681ns; 253.044ns] or [-0.669%; +0.669%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 13.550µs 13.614µs ± 0.017µs 13.617µs ± 0.011µs 13.625µs 13.636µs 13.642µs 13.694µs 0.56% -0.086 1.939 0.13% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [13.611µs; 13.616µs] or [-0.017%; +0.017%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 161.654µs 162.167µs ± 0.244µs 162.143µs ± 0.124µs 162.258µs 162.637µs 162.865µs 163.143µs 0.62% 1.023 1.889 0.15% 0.017µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [162.133µs; 162.201µs] or [-0.021%; +0.021%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.212µs 3.246µs ± 1.483µs 2.995µs ± 0.028µs 3.029µs 3.651µs 14.524µs 15.514µs 417.95% 7.298 54.628 45.58% 0.105µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [3.040µs; 3.451µs] or [-6.333%; +6.333%] None None None

Group 13

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 84.670µs 85.086µs ± 0.156µs 85.075µs ± 0.072µs 85.148µs 85.236µs 85.482µs 86.541µs 1.72% 4.476 38.399 0.18% 0.011µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [85.065µs; 85.108µs] or [-0.025%; +0.025%] None None None

Group 14

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 62.330ms 63.082ms ± 2.415ms 62.783ms ± 0.155ms 62.926ms 63.205ms 82.177ms 83.004ms 32.21% 7.846 60.318 3.82% 0.171ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [62.748ms; 63.417ms] or [-0.531%; +0.531%] None None None

Group 15

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz c23f2e6 1761606161 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 32.972µs 33.451µs ± 0.864µs 33.056µs ± 0.036µs 33.145µs 35.264µs 35.300µs 36.996µs 11.92% 1.811 1.700 2.58% 0.061µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [33.332µs; 33.571µs] or [-0.358%; +0.358%] None None None

Baseline

Omitted due to size.

@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 87e6cfa to b9ee66d Compare September 22, 2025 08:21
@Aaalibaba42
Contributor Author

tools:

  • Regex used to find typedefs/defines at the beginning of C files (like #define MY_VAR 35) -> Not trivial, but I think it is doable without regex and without being too expensive
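
To give an idea of what the regex-free version could look like for the #define case, here is a minimal sketch. It only handles simple object-like defines; the function name is hypothetical and this is not the code currently in tools:

```rust
/// Splits a line such as `#define MY_VAR 35` into its name and value.
/// Returns `None` for anything that is not a simple object-like `#define`.
fn parse_define(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim_start().strip_prefix("#define")?;
    // Require whitespace after the keyword so `#defineX` is rejected.
    if !rest.starts_with(|c: char| c.is_ascii_whitespace()) {
        return None;
    }
    let rest = rest.trim_start();
    // The macro name ends at the first character that cannot be part of an identifier.
    let name_end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    let (name, value) = rest.split_at(name_end);
    if name.is_empty() || name.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Function-like macros (`#define F(x) ...`) are out of scope for this sketch.
    if value.starts_with('(') {
        return None;
    }
    Some((name, value.trim()))
}
```

Typedefs would need a similar but separate helper, which is where most of the "not trivial" part comes from.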

datadog-trace-obfuscation:

  • ip_address.rs: Just separates the protocol from the address. We could maybe do without it, but it would be hard to beat the regex crates on performance
  • replacer.rs: Used to run the clients' regexes for obfuscation, so we can't do without it
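
For the ip_address.rs case, a hand-written split might look like the sketch below. It assumes the protocol is a run of ASCII letters followed by `://`, which may not match the exact pattern in the crate; the function name is hypothetical:

```rust
/// Splits `input` into an optional `scheme://` prefix and the remainder.
fn split_protocol(input: &str) -> (Option<&str>, &str) {
    match input.find("://") {
        // Only treat the prefix as a protocol if it is a non-empty run of ASCII letters.
        Some(idx) if idx > 0 && input[..idx].bytes().all(|b| b.is_ascii_alphabetic()) => {
            let end = idx + "://".len();
            (Some(&input[..end]), &input[end..])
        }
        _ => (None, input),
    }
}
```

Whether this would actually beat either regex crate would have to be measured with the existing ip_address benchmark.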

ddcommon:

  • azure_app_services.rs: Parses and extracts the Azure "resource group". Given the regex pattern, we should be able to do it without regexes relatively quickly
  • entity_id/unix/mod.rs: Only used for testing in this file. With the current configuration, I don't even think it ends up in the release crate
  • entity_id/unix/container_id.rs: Used to match the cgroup to identify the running container ID. This would be a bit harder to do without
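
For the azure_app_services.rs case, here is a rough sketch of the regex-free extraction. The input shape `<subscription>+<resource-group>-<region>webspace[-Linux]` is an assumption on my part, and the function name is hypothetical, so this would need checking against the real pattern and its tests:

```rust
/// Extracts the resource group from a string shaped like
/// `<subscription>+<resource-group>-<region>webspace[-Linux]` (assumed format).
fn extract_resource_group(owner_name: &str) -> Option<&str> {
    // Everything after the first '+' holds "<resource-group>-<region>webspace...".
    let (_subscription, rest) = owner_name.split_once('+')?;
    // Drop an optional "-Linux" suffix, then require the "webspace" marker.
    let rest = rest.strip_suffix("-Linux").unwrap_or(rest);
    let rest = rest.strip_suffix("webspace")?;
    // The resource group may itself contain '-', so split on the last one.
    let (group, region) = rest.rsplit_once('-')?;
    if group.is_empty() || region.is_empty() {
        return None;
    }
    Some(group)
}
```

Only `str` methods from the standard library are used, so this would not pull in any regex dependency.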

datadog-live-debugger:

  • expr_eval.rs: I don't have the full context here, but when a condition is checked on strings, it can take the form of a regex match. So we can't do without regexes if the pattern is not known beforehand.
  • redacted_names.rs: Most of the file uses the regex_automata crate; the regex crate is only used once, to escape regular expression meta characters. I don't know whether we could do without it
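
For redacted_names.rs, escaping meta characters is simple enough to write by hand. A minimal sketch follows; the character set is my understanding of what the regex crate's escape covers and should be treated as an assumption, and the function names are hypothetical:

```rust
/// Returns `text` with regex meta characters prefixed by a backslash.
fn escape_pattern(text: &str) -> String {
    let mut escaped = String::with_capacity(text.len());
    for c in text.chars() {
        if is_meta_character(c) {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    escaped
}

/// Characters assumed to carry special meaning in a regex pattern.
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
            | '#' | '&' | '-' | '~'
    )
}
```

If the escaped string is then fed to regex_automata, the character set should be checked against the syntax that crate accepts.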

data-pipeline:

  • CAN'T MIGRATE: testing is done with httpmock, which implements traits from the regex crate that are not the same as those of the regex-lite crate (I think; I did not fully investigate, but the compilation messages led me to believe this)
  • src/telemetry/mod.rs: Only used for testing. Same as above, I don't believe this would even be present in the final release binary

@Aaalibaba42 Aaalibaba42 marked this pull request as ready for review September 22, 2025 12:11
@Aaalibaba42 Aaalibaba42 requested review from a team as code owners September 22, 2025 12:11
@Aaalibaba42
Contributor Author

The size benchmark job (https://gitlab.ddbuild.io/DataDog/apm-reliability/libddprof-build/-/jobs/1140399351) failed, so I'm pasting the results here:

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 70.98 MB | 67.98 MB | -4.22% (-2.99 MB) 💪 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.19 MB | 6.69 MB | -6.94% (-511.95 KB) 💪 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 9.25 MB | 8.65 MB | -6.49% (-615.48 KB) 💪 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 83.24 MB | 79.85 MB | -4.07% (-3.39 MB) 💪 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 18.39 MB | 16.83 MB | -8.46% (-1.55 MB) 💪 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 65.01 KB | 65.01 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 124.93 MB | 120.41 MB | -3.61% (-4.51 MB) 💪 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 653.09 MB | 641.35 MB | -1.79% (-11.73 MB) 💪 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 5.89 MB | 5.36 MB | -8.88% (-536.00 KB) 💪 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 65.01 KB | 65.01 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 17.36 MB | 16.24 MB | -6.43% (-1.11 MB) 💪 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 32.22 MB | 30.14 MB | -6.44% (-2.07 MB) 💪 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 15.67 MB | 14.73 MB | -6.05% (-971.50 KB) 💪 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 66.01 KB | 66.01 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 127.26 MB | 124.42 MB | -2.22% (-2.83 MB) 💪 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 643.25 MB | 634.37 MB | -1.38% (-8.87 MB) 💪 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 4.49 MB | 4.10 MB | -8.80% (-405.50 KB) 💪 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 66.01 KB | 66.01 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 18.50 MB | 17.62 MB | -4.77% (-904.00 KB) 💪 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 30.26 MB | 28.72 MB | -5.08% (-1.53 MB) 💪 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 63.63 MB | 60.13 MB | -5.49% (-3.49 MB) 💪 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.50 MB | 7.93 MB | -6.79% (-591.96 KB) 💪 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 78.03 MB | 74.42 MB | -4.62% (-3.61 MB) 💪 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 9.84 MB | 9.21 MB | -6.46% (-652.00 KB) 💪 |


@paullegranddc paullegranddc left a comment

One note about specifying the dependency, but otherwise LGTM


[dependencies]
regex = "1"
regex-lite = "^0.1"

"^0.1" is equivalent to "0.1", so you don't really need the caret:
https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#caret-requirements

Also, you should probably add it as a workspace dependency, with the minimum version constraint set to the latest release available ("0.1.7" right now)
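
Something along these lines, as a sketch of the usual Cargo workspace layout (the split between the root manifest and the member crate manifest is assumed here, I have not checked how this repository organizes its workspace dependencies):

```toml
# Root Cargo.toml (workspace manifest)
[workspace.dependencies]
regex-lite = "0.1.7"

# Cargo.toml of a member crate that needs the dependency
[dependencies]
regex-lite = { workspace = true }
```

That way the version is declared once and every crate that opts in inherits it.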

@Aaalibaba42 Aaalibaba42 Sep 23, 2025

What about the instances where using a whole regex engine (even the lite one) is likely superfluous, as described in this comment: #1232 (comment)

Is it worth exploring? Would it be a separate PR?

@Aaalibaba42
Aaalibaba42 commented Sep 23, 2025

There is also the potential pitfall of Unicode support: One of the corners cut by regex-lite to be smol was to sacrifice a little correctness, notably around Unicode support. In the instances where this PR changes regex to regex-lite:

  • are there instances where Unicode could be used?
  • and if so, would the corners cut by regex-lite lead to bugs?
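
To make the concern concrete without pulling in either crate, here is a std-only sketch of the kind of divergence I have in mind. As I understand the regex-lite documentation, classes such as `\w` are ASCII-only there, while the regex crate treats them as Unicode-aware by default. The helpers below are hypothetical and only imitate the two behaviours with `char` methods; they are not how either engine is implemented.

```rust
/// ASCII-only word character test, imitating an ASCII-only `\w`.
fn is_word_ascii(ch: char) -> bool {
    ch.is_ascii_alphanumeric() || ch == '_'
}

/// Unicode-aware approximation of `\w` (not the exact Unicode definition).
fn is_word_unicode(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '_'
}

/// Returns the leading run of word characters in `input`, roughly what a
/// pattern like `^\w+` would capture under the given classification.
fn leading_word(input: &str, is_word: fn(char) -> bool) -> &str {
    let end = input
        .char_indices()
        .find(|&(_, ch)| !is_word(ch))
        .map_or(input.len(), |(index, _)| index);
    &input[..end]
}
```

On an input such as `"caf\u{e9}-service"`, the ASCII-only variant stops at `caf`, while the Unicode-aware one returns the whole first word including the accented letter. If any of the migrated patterns can see non-ASCII input (service names, tags, paths), this is the kind of silent difference to look for.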

@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch 2 times, most recently from aee876a to 8d550b8 Compare October 27, 2025 18:56
@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 8d550b8 to 42aaebb Compare October 27, 2025 18:59
@Aaalibaba42

Please check out this subproject (especially the readme) for information about the change

@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 71fad1e to 6447fee Compare October 27, 2025 23:00
@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 6447fee to c23f2e6 Compare October 27, 2025 23:02