
perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead#344

Merged
erraggy merged 1 commit into main from perf/all-optimizations on Feb 22, 2026

Conversation

erraggy (Owner) commented Feb 22, 2026

Summary

  • Replace fmt.Sprintf with string concatenation for JSON path construction across validator, converter, and fixer
  • Replace regexp.MatchString/ReplaceAllString with strings.HasPrefix + string slicing in converter ref rewriting
  • Replace fmt.Sprintf with strconv.Itoa/strconv.FormatFloat for numeric formatting in validator and schemautil
  • Pre-allocate maps with capacity hints in validator ref building
  • Thread schema nesting depth as explicit counter instead of counting dots in path strings
  • Refactor fixer ref collection from slice-return to pointer-accumulator pattern
  • Use clear() to reuse map allocations in SchemaHasher
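The first bullet is the largest win. As a minimal sketch of the pattern (the helper name `indexedPath` is hypothetical; the real validator helpers are not shown in this PR), `fmt.Sprintf` routes every argument through reflection and verb parsing, while concatenation plus `strconv.Itoa` compiles down to a few appends:

```go
package main

import (
	"fmt"
	"strconv"
)

// Before: reflect-based formatting allocates and parses the verb string
// on every call in the validation hot loop.
func indexedPathSprintf(base string, i int) string {
	return fmt.Sprintf("%s.parameters[%d]", base, i)
}

// After: plain concatenation plus strconv.Itoa avoids fmt's reflection
// machinery while producing an identical path string.
func indexedPathConcat(base string, i int) string {
	return base + ".parameters[" + strconv.Itoa(i) + "]"
}

func main() {
	fmt.Println(indexedPathSprintf("paths./pets.get", 2))
	fmt.Println(indexedPathConcat("paths./pets.get", 2))
}
```

Both forms produce the same string, so the swap is purely an allocation/CPU optimization with no semantic change.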

Benchmark Results

benchstat, 10 iterations, Apple M4:

Validator (primary target — hot path optimization)

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| ValidateParsed/Small | -24.0% | -17.7% | -55.4% |
| ValidateParsed/Medium | -28.4% | -22.7% | -53.6% |
| ValidateParsed/Large | -29.4% | -23.5% | -56.9% |

SchemaUtil (hasher)

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Hash/Simple | -22.9% | -82.8% | -28.6% |
| Hash/ComplexObject | -13.7% | -40.4% | -9.5% |
| GroupByHash/1000 | -13.7% | -56.2% | -14.3% |
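The large memory win in the hasher comes mostly from reusing the visited map between hashes via `clear()` (Go 1.21+) instead of allocating a fresh map. A minimal sketch, with `hasher` and its `visited` field as hypothetical stand-ins for the real SchemaHasher internals:

```go
package main

import "fmt"

// hasher is a hypothetical stand-in for SchemaHasher; only the
// map-reuse pattern is illustrated here.
type hasher struct {
	visited map[string]bool
}

// reset prepares the visited set for the next hash operation.
// Before: visited = make(map[string]bool) reallocated backing storage
// on every call. After: clear() empties the map but keeps its buckets.
func (h *hasher) reset() {
	if h.visited == nil {
		h.visited = make(map[string]bool)
		return
	}
	clear(h.visited) // Go 1.21+: removes all entries, retains capacity
}

func main() {
	h := &hasher{}
	h.reset()
	h.visited["#/components/schemas/Pet"] = true
	h.reset()
	fmt.Println(len(h.visited)) // 0: emptied without a new allocation
}
```

Because the map's buckets survive `clear()`, steady-state hashing stops paying the allocation and GC cost of a fresh map per schema.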

Converter

| Benchmark | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Parsed/OAS2→OAS3/Medium | -9.0% | -1.6% | -10.3% |
| Parsed/OAS3→OAS2/Medium | -5.8% | -0.7% | -2.5% |

Fixer

Allocation pattern improved (pointer-accumulator), but no measurable time/memory impact — ref collection is a small fraction of overall fix cost.
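The pointer-accumulator refactor described above can be sketched as follows; `node` and the function names are hypothetical, since the fixer's real types are not shown in this PR:

```go
package main

import "fmt"

// node is a hypothetical document node carrying an optional $ref.
type node struct {
	ref      string
	children []*node
}

// Before: every recursive call returns a fresh slice that the caller
// then appends, allocating at each level of the tree.
func collectRefsReturn(n *node) []string {
	var refs []string
	if n.ref != "" {
		refs = append(refs, n.ref)
	}
	for _, c := range n.children {
		refs = append(refs, collectRefsReturn(c)...)
	}
	return refs
}

// After: a single accumulator is threaded down by pointer, so the
// whole walk shares one growing slice and avoids the append-spread.
func collectRefs(n *node, refs *[]string) {
	if n.ref != "" {
		*refs = append(*refs, n.ref)
	}
	for _, c := range n.children {
		collectRefs(c, refs)
	}
}

func main() {
	root := &node{children: []*node{{ref: "#/a"}, {ref: "#/b"}}}
	var refs []string
	collectRefs(root, &refs)
	fmt.Println(refs)
}
```

Both walks visit nodes in the same order and produce the same refs, which is consistent with the PR's observation that the pattern changed but measurable cost did not.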

Optimization Techniques

| Technique | Where | Impact |
| --- | --- | --- |
| fmt.Sprintf → string concat | validator, converter | Eliminates reflect-based formatting in hot loops |
| regexp → strings.HasPrefix | converter/ref_rewrite | Removes regex engine overhead for simple prefix matching |
| fmt.Sprintf → strconv | schemautil/hash, validator | Avoids %v/%d format parsing for known types |
| Map pre-allocation | validator/refs | Reduces rehashing for known-size component maps |
| clear(map) vs make(map) | schemautil/hash | Reuses backing array instead of reallocating |
| Explicit depth counter | validator/schema | Removes strings.Count(path, ".") per recursion level |
| *[]string accumulator | fixer/prune | Avoids intermediate slice allocations and append-spread |
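The regexp → strings.HasPrefix row can be sketched as below. The mapping table is a hypothetical example; the real `refMapping` entries in `converter/ref_rewrite.go` are not shown in this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// refMappings is an illustrative prefix table; the converter's actual
// refMapping struct and entries may differ.
var refMappings = []struct{ from, to string }{
	{"#/definitions/", "#/components/schemas/"},
	{"#/responses/", "#/components/responses/"},
	{"#/parameters/", "#/components/parameters/"},
}

// rewriteRef replaces a leading OAS2 prefix with its OAS3 equivalent
// using strings.HasPrefix and slicing; since $ref prefixes are fixed
// literals, no regex engine is needed.
func rewriteRef(ref string) string {
	for _, m := range refMappings {
		if strings.HasPrefix(ref, m.from) {
			return m.to + ref[len(m.from):]
		}
	}
	return ref // unknown prefixes pass through unchanged
}

func main() {
	fmt.Println(rewriteRef("#/definitions/Pet"))
	fmt.Println(rewriteRef("#/other/Thing"))
}
```

A `regexp.ReplaceAllString` call doing the same job would compile and execute an NFA per ref; for fixed-literal prefixes, `HasPrefix` plus a slice is both faster and easier to audit.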

Test plan

  • make check passes (8498 tests)
  • All benchmarks run with 10 iterations for statistical significance
  • All benchstat deltas report p=0.000, confirming the results are not statistical noise
  • Code review: no bugs, no semantic changes (3 independent reviewers)
  • Schema depth guard reviewed: more permissive but semantically correct

🤖 Generated with Claude Code

perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead

Replace fmt.Sprintf with string concatenation for JSON path construction,
replace regexp with strings.HasPrefix for ref prefix matching, replace
fmt.Sprintf with strconv for numeric formatting, pre-allocate maps with
capacity hints, thread schema depth as explicit counter, refactor fixer
ref collection to pointer-accumulator pattern, and use map clear() to
reuse allocations in SchemaHasher.

Benchmark results (benchstat, 10 iterations, Apple M4):

  validator/ValidateParsed/Large:  -29% time, -23% memory, -57% allocs
  validator/ValidateParsed/Medium: -28% time, -23% memory, -54% allocs
  validator/ValidateParsed/Small:  -24% time, -18% memory, -55% allocs
  schemautil/Hash/Simple:          -23% time, -83% memory, -29% allocs
  schemautil/GroupByHash/1000:     -14% time, -56% memory, -14% allocs
  converter/Parsed/OAS2→3/Medium:   -9% time
  converter/Parsed/OAS3→2/Medium:   -6% time

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai Bot commented Feb 22, 2026

📝 Walkthrough

The pull request refactors the codebase to optimize string construction and simplify control flow. Changes include replacing fmt.Sprintf calls with string concatenation and strconv functions for numeric formatting, refactoring internal functions to use pointer-based slice mutations instead of returns, simplifying regex-based ref rewriting with prefix matching, replacing regex-based parameter extraction with manual parsing, and adding depth tracking parameters to validation methods.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Ref Rewriting & Parameter Parsing Simplification**<br>`converter/ref_rewrite.go`, `validator/path_validation.go` | Replaced regex-based approaches with simpler direct matching: introduced a refMapping struct and prefix-based slices for ref rewriting; replaced regex parameter extraction with a manual loop-based parser scanning for `{` and `}` pairs. |
| **String Construction Optimization**<br>`validator/helpers.go`, `validator/oas2.go`, `validator/oas3.go`, `validator/refs.go`, `validator/schema.go` | Systematically replaced fmt.Sprintf calls with direct string concatenation and strconv.Itoa for numeric index formatting across error path construction and validation helpers. |
| **Control Flow Refactoring**<br>`fixer/prune.go` | Refactored internal ref collection functions to use pointer-based slice mutations (`refs *[]string`) instead of returning new slices, propagating changes through recursive call chains. |
| **Hash Construction & Formatting**<br>`internal/schemautil/hash.go` | Replaced fmt-based numeric formatting with strconv.FormatFloat and strconv.Itoa for constraint values; replaced hash visited-map reallocation with the non-allocating clear(). |
| **Validation Depth Tracking**<br>`validator/schema.go` | Added a `depth int` parameter to validateSchema, validateSchemaWithVisited, and validateNestedSchemas to track nesting depth, replacing path-based depth calculations. |
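The depth-tracking change in the last cohort can be sketched as follows; `maxDepth` and the function names are hypothetical stand-ins for the validator's real guard:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxDepth = 100 // illustrative limit; the real guard value is not shown

// Before: each recursion re-derived its depth by scanning the path
// string for separators, an O(len(path)) cost per level.
func depthFromPath(path string) int {
	return strings.Count(path, ".")
}

// After: depth rides along as an explicit int, incremented per level,
// so the guard check is O(1) regardless of path length.
func validateSchema(path string, depth int) error {
	if depth > maxDepth {
		return errors.New("schema nesting too deep at " + path)
	}
	// ... validate this level, then recurse with depth+1 ...
	return nil
}

func main() {
	fmt.Println(depthFromPath("components.schemas.Pet.properties.tag"))
	fmt.Println(validateSchema("components.schemas.Pet", 0))
}
```

Counting dots also miscounts when path segments themselves contain dots (e.g. a path template like `paths./v1.2/pets`), which is presumably why the PR notes the new guard is "more permissive but semantically correct".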

Possibly Related PRs

  • PR #300 — Directly related to the ref rewriting changes in converter/ref_rewrite.go, modifying the same refMapping-based ref rewriting functionality.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'perf(validator,converter,schemautil,fixer): reduce allocations and eliminate regex overhead' directly and specifically summarizes the main changes: replacing fmt.Sprintf/regexp with string concatenation/HasPrefix to optimize allocations and remove regex overhead. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a detailed summary of optimizations, benchmark results, techniques used, and test plan, all covering the actual changes made across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |


codecov Bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 72.77487% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (eecc3ce) to head (4644bf7).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| validator/refs.go | 62.71% | 22 Missing ⚠️ |
| internal/schemautil/hash.go | 25.00% | 9 Missing ⚠️ |
| fixer/prune.go | 82.50% | 7 Missing ⚠️ |
| validator/schema.go | 64.28% | 5 Missing ⚠️ |
| validator/oas2.go | 75.00% | 4 Missing ⚠️ |
| validator/oas3.go | 88.88% | 3 Missing ⚠️ |
| validator/helpers.go | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
+ Coverage   84.72%   84.78%   +0.05%
==========================================
  Files         193      193
  Lines       27273    27249      -24
==========================================
- Hits        23107    23102       -5
+ Misses       2847     2828      -19
  Partials     1319     1319
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| converter/ref_rewrite.go | `58.24% <100.00%> (ø)` |
| validator/path_validation.go | `92.98% <100.00%> (+1.14%)` ⬆️ |
| validator/helpers.go | `61.53% <50.00%> (ø)` |
| validator/oas3.go | `86.09% <88.88%> (+0.15%)` ⬆️ |
| validator/oas2.go | `82.79% <75.00%> (-0.08%)` ⬇️ |
| validator/schema.go | `63.01% <64.28%> (+1.57%)` ⬆️ |
| fixer/prune.go | `75.00% <82.50%> (-0.45%)` ⬇️ |
| internal/schemautil/hash.go | `46.87% <25.00%> (ø)` |
| validator/refs.go | `58.99% <62.71%> (+2.18%)` ⬆️ |

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
validator/refs.go (1)

47-92: ⚠️ Potential issue | 🟠 Major

Guard map capacity sums against int overflow.
The summed len(...) values can overflow int and produce a negative capacity, which will panic in make(map, ...) for extreme inputs. Please sum in uint64 and clamp to math.MaxInt (or drop the capacity hint).

🛡️ Proposed fix with overflow guard
```diff
@@
-import (
-	"fmt"
-	"strconv"
-	"strings"
+import (
+	"fmt"
+	"math"
+	"strconv"
+	"strings"
@@
 func buildOAS2ValidRefs(doc *parser.OAS2Document) map[string]bool {
-	capacity := len(doc.Definitions) + len(doc.Parameters) + len(doc.Responses) + len(doc.SecurityDefinitions)
-	validRefs := make(map[string]bool, capacity)
+	var cap64 uint64
+	cap64 += uint64(len(doc.Definitions))
+	cap64 += uint64(len(doc.Parameters))
+	cap64 += uint64(len(doc.Responses))
+	cap64 += uint64(len(doc.SecurityDefinitions))
+	if cap64 > math.MaxInt {
+		cap64 = math.MaxInt
+	}
+	validRefs := make(map[string]bool, int(cap64))
@@
 func buildOAS3ValidRefs(doc *parser.OAS3Document) map[string]bool {
 	if doc.Components == nil {
 		return make(map[string]bool)
 	}

-	capacity := len(doc.Components.Schemas) +
-		len(doc.Components.Responses) +
-		len(doc.Components.Parameters) +
-		len(doc.Components.Examples) +
-		len(doc.Components.RequestBodies) +
-		len(doc.Components.Headers) +
-		len(doc.Components.SecuritySchemes) +
-		len(doc.Components.Links) +
-		len(doc.Components.Callbacks) +
-		len(doc.Components.PathItems)
-	validRefs := make(map[string]bool, capacity)
+	var cap64 uint64
+	cap64 += uint64(len(doc.Components.Schemas))
+	cap64 += uint64(len(doc.Components.Responses))
+	cap64 += uint64(len(doc.Components.Parameters))
+	cap64 += uint64(len(doc.Components.Examples))
+	cap64 += uint64(len(doc.Components.RequestBodies))
+	cap64 += uint64(len(doc.Components.Headers))
+	cap64 += uint64(len(doc.Components.SecuritySchemes))
+	cap64 += uint64(len(doc.Components.Links))
+	cap64 += uint64(len(doc.Components.Callbacks))
+	cap64 += uint64(len(doc.Components.PathItems))
+	if cap64 > math.MaxInt {
+		cap64 = math.MaxInt
+	}
+	validRefs := make(map[string]bool, int(cap64))
```
As per coding guidelines: Handle size computation overflow (CWE-190) by converting to uint64, checking sum against math.MaxInt, then converting back to int before using as slice capacity.


erraggy (Owner, Author) commented Feb 22, 2026

Re: map capacity overflow guard — Rejected.

len() returns int, which is 64-bit on all platforms Go targets for non-embedded use. To overflow int64 by summing 10 map lengths, each map would need ~922 quadrillion entries — the process would OOM long before reaching this. Even on 32-bit, you'd need >2 billion total component entries across an OAS document, which would require hundreds of gigabytes of RAM just for the parsed document.

The math.MaxInt guard adds complexity to protect against a scenario that physically cannot occur.

@erraggy erraggy merged commit bdf626c into main Feb 22, 2026
9 checks passed
@erraggy erraggy deleted the perf/all-optimizations branch February 22, 2026 21:27