
perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead#344

Merged
erraggy merged 1 commit into main from perf/all-optimizations on Feb 22, 2026

Conversation

erraggy (Owner) commented Feb 22, 2026

Summary

  • Replace fmt.Sprintf with string concatenation for JSON path construction across validator, converter, and fixer
  • Replace regexp.MatchString/ReplaceAllString with strings.HasPrefix + string slicing in converter ref rewriting
  • Replace fmt.Sprintf with strconv.Itoa/strconv.FormatFloat for numeric formatting in validator and schemautil
  • Pre-allocate maps with capacity hints in validator ref building
  • Thread schema nesting depth as explicit counter instead of counting dots in path strings
  • Refactor fixer ref collection from slice-return to pointer-accumulator pattern
  • Use clear() to reuse map allocations in SchemaHasher
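The first bullet is the largest win. As a minimal sketch of the pattern (the helper name `indexedPath` is hypothetical; the real validator helpers are not shown in this PR), `fmt.Sprintf` routes every argument through reflection and verb parsing, while concatenation plus `strconv.Itoa` compiles down to a few appends:

```go
package main

import (
	"fmt"
	"strconv"
)

// Before: reflect-based formatting allocates and parses the verb string
// on every call in the validation hot loop.
func indexedPathSprintf(base string, i int) string {
	return fmt.Sprintf("%s.parameters[%d]", base, i)
}

// After: plain concatenation plus strconv.Itoa avoids fmt's reflection
// machinery while producing an identical path string.
func indexedPathConcat(base string, i int) string {
	return base + ".parameters[" + strconv.Itoa(i) + "]"
}

func main() {
	fmt.Println(indexedPathSprintf("paths./pets.get", 2))
	fmt.Println(indexedPathConcat("paths./pets.get", 2))
}
```

Both forms produce the same string, so the swap is purely an allocation/CPU optimization with no semantic change.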

Benchmark Results

benchstat, 10 iterations, Apple M4:

Validator (primary target — hot path optimization)

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| ValidateParsed/Small | -24.0% | -17.7% | -55.4% |
| ValidateParsed/Medium | -28.4% | -22.7% | -53.6% |
| ValidateParsed/Large | -29.4% | -23.5% | -56.9% |

SchemaUtil (hasher)

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Hash/Simple | -22.9% | -82.8% | -28.6% |
| Hash/ComplexObject | -13.7% | -40.4% | -9.5% |
| GroupByHash/1000 | -13.7% | -56.2% | -14.3% |
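The large memory win in the hasher comes mostly from reusing the visited map between hashes via `clear()` (Go 1.21+) instead of allocating a fresh map. A minimal sketch, with `hasher` and its `visited` field as hypothetical stand-ins for the real SchemaHasher internals:

```go
package main

import "fmt"

// hasher is a hypothetical stand-in for SchemaHasher; only the
// map-reuse pattern is illustrated here.
type hasher struct {
	visited map[string]bool
}

// reset prepares the visited set for the next hash operation.
// Before: visited = make(map[string]bool) reallocated backing storage
// on every call. After: clear() empties the map but keeps its buckets.
func (h *hasher) reset() {
	if h.visited == nil {
		h.visited = make(map[string]bool)
		return
	}
	clear(h.visited) // Go 1.21+: removes all entries, retains capacity
}

func main() {
	h := &hasher{}
	h.reset()
	h.visited["#/components/schemas/Pet"] = true
	h.reset()
	fmt.Println(len(h.visited)) // 0: emptied without a new allocation
}
```

Because the map's buckets survive `clear()`, steady-state hashing stops paying the allocation and GC cost of a fresh map per schema.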

Converter

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Parsed/OAS2→OAS3/Medium | -9.0% | -1.6% | -10.3% |
| Parsed/OAS3→OAS2/Medium | -5.8% | -0.7% | -2.5% |

Fixer

Allocation pattern improved (pointer-accumulator), but no measurable time/memory impact — ref collection is a small fraction of overall fix cost.
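The pointer-accumulator refactor described above can be sketched as follows; `node` and the function names are hypothetical, since the fixer's real types are not shown in this PR:

```go
package main

import "fmt"

// node is a hypothetical document node carrying an optional $ref.
type node struct {
	ref      string
	children []*node
}

// Before: every recursive call returns a fresh slice that the caller
// then appends, allocating at each level of the tree.
func collectRefsReturn(n *node) []string {
	var refs []string
	if n.ref != "" {
		refs = append(refs, n.ref)
	}
	for _, c := range n.children {
		refs = append(refs, collectRefsReturn(c)...)
	}
	return refs
}

// After: a single accumulator is threaded down by pointer, so the
// whole walk shares one growing slice and avoids the append-spread.
func collectRefs(n *node, refs *[]string) {
	if n.ref != "" {
		*refs = append(*refs, n.ref)
	}
	for _, c := range n.children {
		collectRefs(c, refs)
	}
}

func main() {
	root := &node{children: []*node{{ref: "#/a"}, {ref: "#/b"}}}
	var refs []string
	collectRefs(root, &refs)
	fmt.Println(refs)
}
```

Both walks visit nodes in the same order and produce the same refs, which is consistent with the PR's observation that the pattern changed but measurable cost did not.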

Optimization Techniques

| Technique | Where | Impact |
| --- | --- | --- |
| fmt.Sprintf → string concat | validator, converter | Eliminates reflect-based formatting in hot loops |
| regexp → strings.HasPrefix | converter/ref_rewrite | Removes regex engine overhead for simple prefix matching |
| fmt.Sprintf → strconv | schemautil/hash, validator | Avoids %v/%d format parsing for known types |
| Map pre-allocation | validator/refs | Reduces rehashing for known-size component maps |
| clear(map) vs make(map) | schemautil/hash | Reuses backing array instead of reallocating |
| Explicit depth counter | validator/schema | Removes strings.Count(path, ".") per recursion level |
| *[]string accumulator | fixer/prune | Avoids intermediate slice allocations and append-spread |
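The regexp → strings.HasPrefix row can be sketched as below. The mapping table is a hypothetical example; the real `refMapping` entries in `converter/ref_rewrite.go` are not shown in this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// refMappings is an illustrative prefix table; the converter's actual
// refMapping struct and entries may differ.
var refMappings = []struct{ from, to string }{
	{"#/definitions/", "#/components/schemas/"},
	{"#/responses/", "#/components/responses/"},
	{"#/parameters/", "#/components/parameters/"},
}

// rewriteRef replaces a leading OAS2 prefix with its OAS3 equivalent
// using strings.HasPrefix and slicing; since $ref prefixes are fixed
// literals, no regex engine is needed.
func rewriteRef(ref string) string {
	for _, m := range refMappings {
		if strings.HasPrefix(ref, m.from) {
			return m.to + ref[len(m.from):]
		}
	}
	return ref // unknown prefixes pass through unchanged
}

func main() {
	fmt.Println(rewriteRef("#/definitions/Pet"))
	fmt.Println(rewriteRef("#/other/Thing"))
}
```

A `regexp.ReplaceAllString` call doing the same job would compile and execute an NFA per ref; for fixed-literal prefixes, `HasPrefix` plus a slice is both faster and easier to audit.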

Test plan

  • make check passes (8498 tests)
  • All benchmarks run with 10 iterations for statistical significance
  • All benchstat deltas report p=0.000, confirming the results are not statistical noise
  • Code review: no bugs, no semantic changes (3 independent reviewers)
  • Schema depth guard reviewed: more permissive but semantically correct

🤖 Generated with Claude Code

perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead

Replace fmt.Sprintf with string concatenation for JSON path construction,
replace regexp with strings.HasPrefix for ref prefix matching, replace
fmt.Sprintf with strconv for numeric formatting, pre-allocate maps with
capacity hints, thread schema depth as explicit counter, refactor fixer
ref collection to pointer-accumulator pattern, and use map clear() to
reuse allocations in SchemaHasher.

Benchmark results (benchstat, 10 iterations, Apple M4):

  validator/ValidateParsed/Large:  -29% time, -23% memory, -57% allocs
  validator/ValidateParsed/Medium: -28% time, -23% memory, -54% allocs
  validator/ValidateParsed/Small:  -24% time, -18% memory, -55% allocs
  schemautil/Hash/Simple:          -23% time, -83% memory, -29% allocs
  schemautil/GroupByHash/1000:     -14% time, -56% memory, -14% allocs
  converter/Parsed/OAS2→3/Medium:   -9% time
  converter/Parsed/OAS3→2/Medium:   -6% time

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai Bot commented Feb 22, 2026

📝 Walkthrough

The pull request refactors the codebase to optimize string construction and simplify control flow. Changes include replacing fmt.Sprintf calls with string concatenation and strconv functions for numeric formatting, refactoring internal functions to use pointer-based slice mutations instead of returns, simplifying regex-based ref rewriting with prefix matching, replacing regex-based parameter extraction with manual parsing, and adding depth tracking parameters to validation methods.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Ref Rewriting & Parameter Parsing Simplification**<br>`converter/ref_rewrite.go`, `validator/path_validation.go` | Replaced regex-based approaches with simpler direct matching: introduced a refMapping struct and prefix-based slices for ref rewriting; replaced regex parameter extraction with a manual loop-based parser scanning for `{` and `}` pairs. |
| **String Construction Optimization**<br>`validator/helpers.go`, `validator/oas2.go`, `validator/oas3.go`, `validator/refs.go`, `validator/schema.go` | Systematically replaced fmt.Sprintf calls with direct string concatenation and strconv.Itoa for numeric index formatting across error path construction and validation helpers. |
| **Control Flow Refactoring**<br>`fixer/prune.go` | Refactored internal ref collection functions to use pointer-based slice mutations (`refs *[]string`) instead of returning new slices, propagating changes through recursive call chains. |
| **Hash Construction & Formatting**<br>`internal/schemautil/hash.go` | Replaced fmt-based numeric formatting with strconv.FormatFloat and strconv.Itoa for constraint values; replaced hash visited-map reallocation with the non-allocating clear(). |
| **Validation Depth Tracking**<br>`validator/schema.go` | Added a `depth int` parameter to validateSchema, validateSchemaWithVisited, and validateNestedSchemas to track nesting depth, replacing path-based depth calculations. |
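The depth-tracking change in the last cohort can be sketched as follows; `maxDepth` and the function names are hypothetical stand-ins for the validator's real guard:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxDepth = 100 // illustrative limit; the real guard value is not shown

// Before: each recursion re-derived its depth by scanning the path
// string for separators, an O(len(path)) cost per level.
func depthFromPath(path string) int {
	return strings.Count(path, ".")
}

// After: depth rides along as an explicit int, incremented per level,
// so the guard check is O(1) regardless of path length.
func validateSchema(path string, depth int) error {
	if depth > maxDepth {
		return errors.New("schema nesting too deep at " + path)
	}
	// ... validate this level, then recurse with depth+1 ...
	return nil
}

func main() {
	fmt.Println(depthFromPath("components.schemas.Pet.properties.tag"))
	fmt.Println(validateSchema("components.schemas.Pet", 0))
}
```

Counting dots also miscounts when path segments themselves contain dots (e.g. a path template like `paths./v1.2/pets`), which is presumably why the PR notes the new guard is "more permissive but semantically correct".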

Possibly Related PRs

  • PR #300 — Directly related to the ref rewriting changes in converter/ref_rewrite.go, modifying the same refMapping-based ref rewriting functionality.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead' directly and specifically summarizes the main changes: replacing fmt.Sprintf/regexp with string concatenation/HasPrefix to optimize allocations and remove regex overhead. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a detailed summary of optimizations, benchmark results, techniques used, and test plan, all covering the actual changes made across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |


codecov Bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 72.77487% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (eecc3ce) to head (4644bf7).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| validator/refs.go | 62.71% | 22 Missing ⚠️ |
| internal/schemautil/hash.go | 25.00% | 9 Missing ⚠️ |
| fixer/prune.go | 82.50% | 7 Missing ⚠️ |
| validator/schema.go | 64.28% | 5 Missing ⚠️ |
| validator/oas2.go | 75.00% | 4 Missing ⚠️ |
| validator/oas3.go | 88.88% | 3 Missing ⚠️ |
| validator/helpers.go | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
+ Coverage   84.72%   84.78%   +0.05%
==========================================
  Files         193      193
  Lines       27273    27249      -24
==========================================
- Hits        23107    23102       -5
+ Misses       2847     2828      -19
  Partials     1319     1319
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| converter/ref_rewrite.go | `58.24% <100.00%> (ø)` |
| validator/path_validation.go | `92.98% <100.00%> (+1.14%)` ⬆️ |
| validator/helpers.go | `61.53% <50.00%> (ø)` |
| validator/oas3.go | `86.09% <88.88%> (+0.15%)` ⬆️ |
| validator/oas2.go | `82.79% <75.00%> (-0.08%)` ⬇️ |
| validator/schema.go | `63.01% <64.28%> (+1.57%)` ⬆️ |
| fixer/prune.go | `75.00% <82.50%> (-0.45%)` ⬇️ |
| internal/schemautil/hash.go | `46.87% <25.00%> (ø)` |
| validator/refs.go | `58.99% <62.71%> (+2.18%)` ⬆️ |

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
validator/refs.go (1)

47-92: ⚠️ Potential issue | 🟠 Major

Guard map capacity sums against int overflow.
The summed len(...) values can overflow int and produce a negative capacity, which will panic in make(map, ...) for extreme inputs. Please sum in uint64 and clamp to math.MaxInt (or drop the capacity hint).

🛡️ Proposed fix with overflow guard
```diff
@@
-import (
-	"fmt"
-	"strconv"
-	"strings"
+import (
+	"fmt"
+	"math"
+	"strconv"
+	"strings"
@@
 func buildOAS2ValidRefs(doc *parser.OAS2Document) map[string]bool {
-	capacity := len(doc.Definitions) + len(doc.Parameters) + len(doc.Responses) + len(doc.SecurityDefinitions)
-	validRefs := make(map[string]bool, capacity)
+	var cap64 uint64
+	cap64 += uint64(len(doc.Definitions))
+	cap64 += uint64(len(doc.Parameters))
+	cap64 += uint64(len(doc.Responses))
+	cap64 += uint64(len(doc.SecurityDefinitions))
+	if cap64 > math.MaxInt {
+		cap64 = math.MaxInt
+	}
+	validRefs := make(map[string]bool, int(cap64))
@@
 func buildOAS3ValidRefs(doc *parser.OAS3Document) map[string]bool {
 	if doc.Components == nil {
 		return make(map[string]bool)
 	}

-	capacity := len(doc.Components.Schemas) +
-		len(doc.Components.Responses) +
-		len(doc.Components.Parameters) +
-		len(doc.Components.Examples) +
-		len(doc.Components.RequestBodies) +
-		len(doc.Components.Headers) +
-		len(doc.Components.SecuritySchemes) +
-		len(doc.Components.Links) +
-		len(doc.Components.Callbacks) +
-		len(doc.Components.PathItems)
-	validRefs := make(map[string]bool, capacity)
+	var cap64 uint64
+	cap64 += uint64(len(doc.Components.Schemas))
+	cap64 += uint64(len(doc.Components.Responses))
+	cap64 += uint64(len(doc.Components.Parameters))
+	cap64 += uint64(len(doc.Components.Examples))
+	cap64 += uint64(len(doc.Components.RequestBodies))
+	cap64 += uint64(len(doc.Components.Headers))
+	cap64 += uint64(len(doc.Components.SecuritySchemes))
+	cap64 += uint64(len(doc.Components.Links))
+	cap64 += uint64(len(doc.Components.Callbacks))
+	cap64 += uint64(len(doc.Components.PathItems))
+	if cap64 > math.MaxInt {
+		cap64 = math.MaxInt
+	}
+	validRefs := make(map[string]bool, int(cap64))
```
As per coding guidelines: Handle size computation overflow (CWE-190) by converting to uint64, checking sum against math.MaxInt, then converting back to int before using as slice capacity.


erraggy (Owner, Author) commented Feb 22, 2026

Re: map capacity overflow guard — Rejected.

len() returns int, which is 64-bit on all platforms Go targets for non-embedded use. To overflow int64 by summing 10 map lengths, each map would need ~922 quadrillion entries — the process would OOM long before reaching this. Even on 32-bit, you'd need >2 billion total component entries across an OAS document, which would require hundreds of gigabytes of RAM just for the parsed document.

The math.MaxInt guard adds complexity to protect against a scenario that physically cannot occur.

@erraggy erraggy merged commit bdf626c into main Feb 22, 2026
9 checks passed
@erraggy erraggy deleted the perf/all-optimizations branch February 22, 2026 21:27