
[TT-16767][Global] [Implementation] Centralised Error Overrides Infrastructure #7867

Merged
MFCaballero merged 32 commits into master from TT-16767 on Apr 20, 2026


@MFCaballero (Contributor) commented Mar 9, 2026

Description

This PR introduces a centralized Error Overrides feature, allowing for the customization of error responses at the gateway level. This functionality enables users to standardize error formats, hide internal error details, and provide branded or localized messages for both gateway-generated errors (e.g., auth failures, rate limits) and upstream service errors (e.g., 4xx/5xx responses).

The implementation is optimized for performance, featuring a near-zero overhead path when no overrides are configured. All matching rules, including regular expressions and inline templates, are pre-compiled at gateway startup to ensure minimal latency during error handling.

Error Override Performance Benchmarks

Executive Summary

The error override feature adds negligible performance overhead to the error handling path:

  • No overrides configured: ~5 ns/op overhead (fast path with map length check)
  • Direct message override: ~5.8 μs/op vs ~11.1 μs/op baseline (~48% less time than the default template)
  • With template execution: ~7-11 μs/op (comparable to, or faster than, default template rendering)

Optimization: The code checks if overrides exist before entering the override path, ensuring minimal impact on existing deployments.

Detailed Results

1. ApplyOverride - Matching Performance

Testing the core matching logic that determines if an override should be applied:

BenchmarkApplyOverride/no_overrides_configured                        4.05 ns/op       0 B/op       0 allocs/op
BenchmarkApplyOverride/exact_code_match                               55.0 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/pattern_match_4xx                              66.2 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/regex_pattern_match                           219.8 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/regex_pattern_non-match                       169.8 ns/op       0 B/op       0 allocs/op
BenchmarkApplyOverride/JSON_body_field_match                         226.9 ns/op      48 B/op       2 allocs/op
BenchmarkApplyOverride/multiple_rules_-_first_match                  235.5 ns/op      32 B/op       1 allocs/op

Key Findings:

  • No configuration: ~4 ns overhead - very low
  • Exact code match: ~55 ns (O(1) hash map lookup)
  • Pattern match (4xx/5xx): ~66 ns (prefix calculation + map lookup)
  • Regex matching: ~220 ns (compiled regex)
  • JSON path matching: ~227 ns (gjson library)
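The exact-code and 4xx/5xx pattern timings above both come down to map lookups. A minimal sketch of such a two-level index, assuming exact-code rules take precedence over class patterns; the `index`, `rule`, and `lookup` names are hypothetical and only loosely modeled on the `ByExactCode`/`ByPrefix` fields of `CompiledErrorOverrides` listed in the API changes:

```go
package main

import "fmt"

// rule is a hypothetical stand-in for an override rule; only the
// replacement status code is kept for illustration.
type rule struct {
	StatusCode int
}

// index holds pre-built maps keyed by exact status code (e.g. 500)
// and by status-code class prefix (e.g. 4 for "4xx").
type index struct {
	byExactCode map[int][]rule
	byPrefix    map[int][]rule
}

// lookup returns the candidate rules for a status code: exact-code rules
// first (one map access), then the class prefix code/100 (one more access).
func (ix *index) lookup(code int) []rule {
	if rules, ok := ix.byExactCode[code]; ok {
		return rules
	}
	return ix.byPrefix[code/100]
}

func main() {
	ix := &index{
		byExactCode: map[int][]rule{500: {{StatusCode: 503}}},
		byPrefix:    map[int][]rule{4: {{StatusCode: 400}}},
	}
	fmt.Println(ix.lookup(500), ix.lookup(404), ix.lookup(502)) // prints: [{503}] [{400}] []
}
```

Keying pattern rules by `code/100` keeps them as cheap as exact rules (a constant-time map access each), which is consistent with the close ~55 ns and ~66 ns figures.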

2. Flag-Based Matching (Error Classification)

Flag matching uses the error classification system for semantic matching - matching by error type rather than text patterns:

BenchmarkApplyOverride/flag_match_-_exact_match                       92.9 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/flag_match_-_no_classification                 26.8 ns/op       0 B/op       0 allocs/op
BenchmarkApplyOverride/flag_match_-_fallback_to_regex                193.9 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/multiple_flag_rules_-_first_match             104.7 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/multiple_flag_rules_-_catch_all               110.7 ns/op      32 B/op       1 allocs/op

Flag vs Regex Performance Comparison:

BenchmarkApplyOverride/flag_vs_regex_-_flag                           75.1 ns/op      32 B/op       1 allocs/op
BenchmarkApplyOverride/flag_vs_regex_-_regex                         173.2 ns/op       0 B/op       0 allocs/op

Key Findings:

  • Flag matching is ~2.3x faster than regex (75 ns vs 173 ns)
  • No classification in context: 26.8 ns (very fast early exit)
  • Multiple flag rules: ~105-111 ns (efficient even with multiple rules)
  • Fallback to regex: ~194 ns (checks flag first, then regex)
  • Flag matching is a simple string comparison, whereas regex matching executes a compiled pattern against the body

3. WriteOverrideResponse vs WriteTemplateErrorResponse

Direct comparison of error response writing:

Response Method                                           Time         Memory    Allocs
--------------------------------------------------------------------------------------
BenchmarkWriteOverrideResponse/direct_message             5.8 μs      7312 B    31 allocs
BenchmarkWriteOverrideResponse/inline_template_JSON       9.5 μs      7745 B    49 allocs
BenchmarkWriteOverrideResponse/inline_template_XML        7.4 μs      7513 B    38 allocs
BenchmarkWriteOverrideResponse/file_template_JSON        10.2 μs      8001 B    51 allocs
BenchmarkWriteOverrideResponse/file_template_XML          7.7 μs      7769 B    40 allocs
BenchmarkWriteOverrideResponse/with_custom_headers       11.1 μs      8537 B    67 allocs

BenchmarkWriteTemplateErrorResponse/default_JSON         11.1 μs      7532 B    41 allocs

Key Findings:

  • Direct message writing: 5.8 μs - ~48% less time than the default template (11.1 μs)
    • Best performance when no templating needed
    • Writes JSON/XML response directly
  • Inline template: 7.4-9.5 μs - Also faster than default
    • Allows dynamic {{.StatusCode}} and {{.Message}} substitution
    • Very efficient
  • File template: 7.7-10.2 μs - Within acceptable range
    • Reuses existing template infrastructure
    • Performance varies by template complexity
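The split between direct writes and template execution can be sketched as below. Detecting a template by the presence of `{{` is an assumption made for illustration, as are the `compileBody`/`render` names. `text/template` is used for simplicity, whereas the API changes indicate `html/template` is selected for non-XML responses, which would HTML-escape `{{.Message}}`:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// templateData carries the two variables inline templates can use.
type templateData struct {
	StatusCode int
	Message    string
}

// compiledBody is a hypothetical pre-compiled override body: either a
// literal written as-is, or a template parsed once at load time.
type compiledBody struct {
	literal string
	tmpl    *template.Template
}

// compileBody parses the body only when it contains template actions,
// so plain bodies skip template execution entirely at request time.
func compileBody(body string) (*compiledBody, error) {
	if !strings.Contains(body, "{{") {
		return &compiledBody{literal: body}, nil
	}
	t, err := template.New("override").Parse(body)
	if err != nil {
		return nil, err
	}
	return &compiledBody{tmpl: t}, nil
}

// render returns the literal directly or executes the pre-parsed template.
func (c *compiledBody) render(data templateData) (string, error) {
	if c.tmpl == nil {
		return c.literal, nil
	}
	var sb strings.Builder
	if err := c.tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := templateData{StatusCode: 503, Message: "Service unavailable"}
	direct, _ := compileBody(`{"error":"Service unavailable"}`)
	inline, _ := compileBody(`{"code":{{.StatusCode}},"error":"{{.Message}}"}`)
	out1, _ := direct.render(d)
	out2, _ := inline.render(d)
	fmt.Println(out1) // {"error":"Service unavailable"}
	fmt.Println(out2) // {"code":503,"error":"Service unavailable"}
}
```

Parsing at load time is what keeps the per-request cost down to a single `Execute` call for templated bodies and a plain string write for literal ones.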

4. Compilation Performance

One-time cost during gateway startup or API reload:

BenchmarkCompileErrorOverrides/single_exact_code          0.76 μs      520 B     6 allocs
BenchmarkCompileErrorOverrides/multiple_exact_codes       1.67 μs     1000 B    14 allocs
BenchmarkCompileErrorOverrides/with_regex_patterns        0.76 μs      624 B     6 allocs
BenchmarkCompileErrorOverrides/with_inline_templates     12.23 μs     8240 B    99 allocs
BenchmarkCompileErrorOverrides/mixed_exact_and_patterns   2.01 μs     1374 B    17 allocs

Key Findings:

  • Simple configurations compile in ~0.8-2 μs
  • Regex compilation adds minimal overhead
  • Template compilation is more expensive (~12 μs) but happens only at startup
  • All costs are one-time during configuration load
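The compile-once step, together with the behavior described in the `CompileErrorOverrides` doc comment in the API changes (invalid rules logged as warnings and skipped, nil returned when nothing compiles), can be sketched like this. The key parsing, the type names, and the lowercase-only `xx` suffix are assumptions of this sketch, not the gateway's code:

```go
package main

import (
	"fmt"
	"log"
	"regexp"
	"strconv"
	"strings"
)

// rawRule is a hypothetical configured rule with an optional
// message_pattern, still in string form.
type rawRule struct {
	MessagePattern string
}

// compiledRule holds the regex compiled once at load time (nil if unused).
type compiledRule struct {
	pattern *regexp.Regexp
}

type compiled struct {
	byExactCode map[int][]compiledRule
	byPrefix    map[int][]compiledRule
}

// parseKey accepts exact codes such as "500" and class patterns such as "4xx".
func parseKey(key string) (exact, prefix int, ok bool) {
	if len(key) != 3 || key[0] < '1' || key[0] > '5' {
		return 0, 0, false
	}
	if strings.HasSuffix(key, "xx") {
		return 0, int(key[0] - '0'), true
	}
	code, err := strconv.Atoi(key)
	if err != nil {
		return 0, 0, false
	}
	return code, 0, true
}

// compile runs once per config load: invalid keys or regexes are logged
// and skipped, and nil is returned when no rule survives.
func compile(overrides map[string][]rawRule) *compiled {
	c := &compiled{
		byExactCode: map[int][]compiledRule{},
		byPrefix:    map[int][]compiledRule{},
	}
	count := 0
	for key, rules := range overrides {
		exact, prefix, ok := parseKey(key)
		if !ok {
			log.Printf("warning: skipping overrides with invalid key %q", key)
			continue
		}
		for _, r := range rules {
			var cr compiledRule
			if r.MessagePattern != "" {
				re, err := regexp.Compile(r.MessagePattern)
				if err != nil {
					log.Printf("warning: skipping %q rule: %v", key, err)
					continue
				}
				cr.pattern = re
			}
			if exact != 0 {
				c.byExactCode[exact] = append(c.byExactCode[exact], cr)
			} else {
				c.byPrefix[prefix] = append(c.byPrefix[prefix], cr)
			}
			count++
		}
	}
	if count == 0 {
		return nil
	}
	return c
}

func main() {
	c := compile(map[string][]rawRule{
		"500": {{}},
		"4xx": {{MessagePattern: `(?i)not found`}},
		"5xx": {{MessagePattern: `(`}}, // invalid regex: logged and skipped
	})
	fmt.Println(len(c.byExactCode[500]), len(c.byPrefix[4]), len(c.byPrefix[5])) // 1 1 0
}
```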

Fast Path Optimization

Testing the entry point tryWriteOverride with empty vs configured overrides:

BenchmarkTryWriteOverride/empty_config_-_fast_path               5.14 ns/op       0 B/op       0 allocs/op
BenchmarkTryWriteOverride/config_exists_-_match_with_direct_body 2450 ns/op    1824 B/op      22 allocs/op

Key Findings:

  • The optimization checks len(e.Spec.GlobalConfig.ErrorOverrides) == 0 before proceeding
  • When the map is empty (no overrides configured), the function returns immediately, without any atomic loads
  • Fast path overhead: ~5 ns with zero allocations
  • This is 476x faster than processing a match (~2.5 μs)
  • This is 43x faster than a regex match (~220 ns)
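A sketch of the fast path, assuming the compiled index is published through a `sync/atomic` pointer so hot reloads can swap it (the PR only states that the empty-map check avoids atomic loads). `gateway`, `compiledOverrides`, and `tryWriteOverride` here are hypothetical simplifications:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// compiledOverrides is a hypothetical placeholder for the indexed rules.
type compiledOverrides struct {
	byExactCode map[int]string // status code -> replacement body (simplified)
}

// gateway holds the raw config map plus an atomically swappable compiled
// index, so a reload can publish a new index without locking readers.
type gateway struct {
	errorOverrides map[string]string // raw config as loaded
	compiled       atomic.Pointer[compiledOverrides]
}

// tryWriteOverride returns the override body and true on a match.
func (g *gateway) tryWriteOverride(code int) (string, bool) {
	// Fast path: with nothing configured, a length check is all we pay:
	// no atomic load and no allocation.
	if len(g.errorOverrides) == 0 {
		return "", false
	}
	c := g.compiled.Load()
	if c == nil {
		return "", false // configured but not compiled yet
	}
	body, ok := c.byExactCode[code]
	return body, ok
}

func main() {
	g := &gateway{}
	_, ok := g.tryWriteOverride(500)
	fmt.Println("match with empty config:", ok)

	g.errorOverrides = map[string]string{"500": `{"error":"unavailable"}`}
	g.compiled.Store(&compiledOverrides{byExactCode: map[int]string{500: `{"error":"unavailable"}`}})
	body, ok := g.tryWriteOverride(500)
	fmt.Println("match after compile:", ok, body)
}
```

Because `len` of a nil or empty map is a constant-time read, deployments without overrides pay for one comparison and never touch the atomic pointer.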

Performance Impact Analysis

Hot Path (Every Error Response)

When no overrides are configured (most common case):

  • Overhead: ~5 ns per error (fast path with map length check)
  • Memory: 0 bytes allocated
  • Impact: Virtually zero - immeasurable in real-world operations

When overrides are configured and match:

  • Best case (direct message): ~48% less time than the default template
  • Typical case (inline template): ~14-33% less time than the default
  • Worst case: ~10-11 μs (file template JSON at 10.2 μs, custom headers at 11.1 μs), comparable to the default
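The relative figures above are derived from the per-operation times in the WriteOverrideResponse table, taking the default JSON template (11.1 μs) as the baseline, i.e. time saved = (t_default - t_override) / t_default:

$$
\frac{11.1 - 5.8}{11.1} \approx 0.48,\qquad
\frac{11.1 - 9.5}{11.1} \approx 0.14,\qquad
\frac{11.1 - 7.4}{11.1} \approx 0.33,\qquad
\frac{11.1}{5.8} \approx 1.9
$$

In other words, the direct-message path takes about 48% less time per response, which is roughly 1.9x the default template's throughput.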

Cold Path (Gateway Startup)

Compilation happens once, at gateway initialization and on each API configuration reload:

  • Overhead: ~0.8-12 μs depending on complexity
  • Impact: Negligible - happens infrequently

Memory Allocation Analysis

Memory allocations per error response:

Operation                        Allocations    Bytes
------------------------------------------------------
No override check                0 allocs       0 B
Exact code match                 1 alloc        32 B
Pattern match (4xx/5xx)          1 alloc        32 B
Flag match (exact)               1 alloc        32 B
Flag match (no classification)   0 allocs       0 B
Regex pattern match              1 alloc        32 B
Direct message write             31 allocs      7312 B
Inline template execution        49 allocs      7745 B
Default template                 41 allocs      7532 B

Key Findings:

  • Zero allocations when no overrides configured
  • Zero allocations for flag match when no classification in context
  • Minimal allocation (32B) for matching logic
  • Direct message writing uses similar memory to default template
  • No memory leaks or unbounded allocations

Scalability Considerations

Large Body Handling

BenchmarkApplyOverride/large_body_truncation              1.28 μs      538 B     6 allocs
  • Large bodies (>4KB) are truncated before pattern matching
  • Prevents performance degradation with large error responses
  • Adds ~1.3 μs overhead only when bodies exceed 4KB

Multiple Rules

BenchmarkApplyOverride/multiple_rules_-_first_match       236 ns       32 B/op   1 allocs
  • Rules are evaluated in order with first-match semantics, so the worst case is O(n) in the number of rules for a status code
  • In practice, most errors match within 1-2 rules
  • No performance degradation with reasonable rule counts
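First-match evaluation over the rules for one status code can be sketched as a linear scan that stops at the first hit. The `override` type, `firstMatch`, and the predicate-per-rule design are illustrative assumptions; the flag values are the ones named elsewhere in this PR:

```go
package main

import "fmt"

// override is a hypothetical rule: a predicate plus the response code.
type override struct {
	name  string
	match func(flag string) bool
	code  int
}

// firstMatch walks the rules in configured order and stops at the first
// hit, so cost grows linearly with the number of rules checked.
// It also returns how many rules were evaluated.
func firstMatch(rules []override, flag string) (*override, int) {
	for i := range rules {
		if rules[i].match(flag) {
			return &rules[i], i + 1
		}
	}
	return nil, len(rules)
}

func main() {
	rules := []override{
		{name: "rate-limit", match: func(f string) bool { return f == "RLT" }, code: 429},
		{name: "auth", match: func(f string) bool { return f == "AMF" || f == "AKI" }, code: 401},
		{name: "catch-all", match: func(string) bool { return true }, code: 500},
	}
	for _, flag := range []string{"RLT", "AKI", "BIV"} {
		r, checked := firstMatch(rules, flag)
		fmt.Printf("%s -> %s (%d), %d rule(s) checked\n", flag, r.name, r.code, checked)
	}
}
```

Putting the most frequent error types first keeps the typical scan to the 1-2 rules mentioned above.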

Conclusions

Virtually zero overhead when no overrides are configured: ~5 ns (fast path with map length check) - immeasurable in production

Optimized check path: Early exit when no overrides configured prevents significant overhead

Status code matching is fast: Both exact (~55 ns) and pattern (~66 ns) matching are very efficient

Flag matching is ~2.3x faster than regex: 75 ns vs 173 ns for semantic error matching

Faster for simple overrides: Direct message writing takes ~48% less time than the default template

Acceptable overhead for advanced features: Template execution adds reasonable overhead for the flexibility gained

Efficient matching: O(1) lookups for exact codes, fast flag comparison, pre-compiled regex

No memory leaks: Bounded allocations, pre-compiled patterns

Recommendations

For best performance:

  1. Use flag-based matching when possible (~2.3x faster than regex)
  2. Use direct message writing (no template variables) when possible
  3. Keep regex patterns simple
  4. Pre-compile overrides at startup (already implemented)

For optimal flexibility:

  1. Use flag matching for semantic error types (e.g., RLT for rate limiting)
  2. Use inline templates with {{.StatusCode}} and {{.Message}}
  3. Use file templates for complex responses
  4. Use JSON body field matching for structured upstream error responses
  5. Use regex patterns as fallback when flag classification isn't available
  6. Both exact and pattern (4xx/5xx) status code matching are efficient (~55-66 ns)
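JSON body field matching (`body_field`/`body_value`) is done with gjson in the PR. To stay dependency-free, the sketch below substitutes `encoding/json` with a simplified dotted-path walk; it supports none of gjson's array, wildcard, or modifier syntax, and `fieldEquals` is a hypothetical name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fieldEquals reports whether the value at a dotted path such as
// "error.code", rendered as a string, equals want. Non-JSON bodies,
// missing keys, and non-scalar values never match.
func fieldEquals(body []byte, path, want string) bool {
	var cur any
	if err := json.Unmarshal(body, &cur); err != nil {
		return false // not JSON: a body_field rule cannot match
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return false // path goes through a non-object value
		}
		if cur, ok = obj[key]; !ok {
			return false // key missing
		}
	}
	switch v := cur.(type) {
	case string:
		return v == want
	case float64, bool:
		return fmt.Sprint(v) == want // e.g. 402 -> "402", true -> "true"
	default:
		return false
	}
}

func main() {
	body := []byte(`{"error":{"code":"INSUFFICIENT_FUNDS","status":402}}`)
	fmt.Println(fieldEquals(body, "error.code", "INSUFFICIENT_FUNDS")) // true
	fmt.Println(fieldEquals(body, "error.status", "402"))              // true
}
```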

Test Environment:

  • Machine: Apple M1 Pro
  • Go Version: 1.25.1
  • OS: macOS (darwin/arm64)
  • Benchmark Duration: 3 seconds per benchmark

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

Ticket Details

TT-16767
Status Ready for Testing
Summary [1a] Implement Centralised Error Overrides Infrastructure

Generated at: 2026-03-17 16:11:43

probelabs bot (Contributor) commented Mar 9, 2026

This PR introduces a centralized Error Overrides feature, enabling the customization of error responses at the gateway level. This allows users to standardize error formats, hide internal details, and provide custom messages for both gateway-generated errors (e.g., auth failures, rate limits) and upstream service errors (e.g., 4xx/5xx). The implementation is optimized for performance, featuring a near-zero overhead path when no overrides are configured. All matching rules, including regular expressions and inline templates, are pre-compiled at gateway startup to ensure minimal latency.

Files Changed Analysis

This is a significant feature addition, reflected in the 30 files changed with 2,398 additions and only 16 deletions. The changes are well-organized:

  • Core Feature Definition (apidef/): The new data structures (ErrorOverridesMap, ErrorOverride, ErrorMatcher, ErrorResponse) are defined in the new file apidef/error_overrides.go and integrated into the main apidef/api_definitions.go.
  • OpenAPI (OAS) Integration (apidef/oas/): The feature is exposed through the x-tyk-api-gateway OAS extension. This includes new models in apidef/oas/error_overrides.go, corresponding tests, and updates to the JSON schemas (apidef/oas/schema/).
  • Testing (ci/tests/error-overrides/): A comprehensive new integration test suite has been added. It includes a Taskfile.yml for orchestration and numerous API definitions (apps/*.json) designed to trigger a wide range of error conditions, ensuring the override mechanism is robust.

Architecture & Impact Assessment

  • What this PR accomplishes: It implements a centralized, performant, and highly configurable system for intercepting and customizing error responses at the gateway and API levels.

  • Key technical changes introduced:

    • New configuration fields (error_overrides and error_overrides_disabled) are added to the APIDefinition struct.
    • A rule-matching engine is introduced, supporting status codes (e.g., 500, 4xx), error classification flags (e.g., RLT for rate limiting), regex patterns against the response body, and JSON path matching (body_field, body_value).
    • A pre-compilation step at gateway startup or API reload processes and indexes override rules for efficient runtime lookups.
  • Affected system components:

    • API Definition: The core APIDefinition spec is extended to support error override rules.
    • Configuration Loading: The gateway's startup and hot-reload processes will now include the compilation of these rules.
    • Error Handling Middleware: The central error handling logic will be modified to apply these overrides (inferred, as the middleware itself is not in this diff).
    • OAS Engine: The OpenAPI import/export functionality is updated to handle the new x-tyk-api-gateway.errorOverrides extension.
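To make the new `error_overrides` fields concrete, the sketch below mirrors the classic API definition structs from the API changes (JSON tags copied from the diff, `Flag` simplified to a string) and parses an illustrative config combining a flag rule and a 5xx regex rule with a custom header. The rule values and the `apiDefinition`/`parse` names are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the classic API definition types (simplified).
type ErrorMatcher struct {
	Flag           string `json:"flag,omitempty"` // errors.ResponseFlag in the diff
	MessagePattern string `json:"message_pattern,omitempty"`
	BodyField      string `json:"body_field,omitempty"`
	BodyValue      string `json:"body_value,omitempty"`
}

type ErrorResponse struct {
	StatusCode int               `json:"status_code"`
	Body       string            `json:"body,omitempty"`
	Message    string            `json:"message,omitempty"`
	Template   string            `json:"template,omitempty"`
	Headers    map[string]string `json:"headers,omitempty"`
}

type ErrorOverride struct {
	Match    *ErrorMatcher `json:"match,omitempty"`
	Response ErrorResponse `json:"response"`
}

type ErrorOverridesMap map[string][]ErrorOverride

type apiDefinition struct {
	ErrorOverrides         ErrorOverridesMap `json:"error_overrides"`
	ErrorOverridesDisabled bool              `json:"error_overrides_disabled"`
}

// sampleConfig is illustrative only.
const sampleConfig = `{
  "error_overrides_disabled": false,
  "error_overrides": {
    "429": [{
      "match": {"flag": "RLT"},
      "response": {"status_code": 429, "message": "Too many requests, slow down"}
    }],
    "5xx": [{
      "match": {"message_pattern": "(?i)timeout"},
      "response": {
        "status_code": 503,
        "body": "{\"error\": \"Service temporarily unavailable\"}",
        "headers": {"Retry-After": "30"}
      }
    }]
  }
}`

func parse(raw string) (*apiDefinition, error) {
	var api apiDefinition
	if err := json.Unmarshal([]byte(raw), &api); err != nil {
		return nil, err
	}
	return &api, nil
}

func main() {
	api, err := parse(sampleConfig)
	if err != nil {
		panic(err)
	}
	for key, rules := range api.ErrorOverrides { // map order is random
		for _, r := range rules {
			fmt.Printf("%s -> %d\n", key, r.Response.StatusCode)
		}
	}
}
```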

Error Handling Flow with Overrides

sequenceDiagram
    participant Client
    participant Gateway
    participant ErrorHandler
    participant OverrideEngine

    Client->>Gateway: Makes a request
    Gateway-->>ErrorHandler: Request fails (e.g., auth error, upstream 5xx)
    ErrorHandler->>OverrideEngine: tryWriteOverride(code, message, body)
    alt Rule matches
        OverrideEngine-->>ErrorHandler: Return OverrideResult{new_code, new_body, ...}
        ErrorHandler->>ErrorHandler: writeOverrideResponse()
        ErrorHandler-->>Client: HTTP Response with custom body/code/headers
    else No matching override
        OverrideEngine-->>ErrorHandler: Return nil (no match)
        ErrorHandler->>ErrorHandler: writeTemplateErrorResponse() (default behavior)
        ErrorHandler-->>Client: Default error template response
    end

Scope Discovery & Context Expansion

  • The changes in this PR focus on the data structures and configuration schemas (apidef, apidef/oas). The runtime logic that applies these overrides (likely a middleware) is not included here but is a necessary counterpart to this infrastructure.
  • The feature is designed to be configurable at both the global (gateway-wide) and API-specific levels, offering flexible application of rules.
  • The extensive test files in ci/tests/error-overrides/apps/ demonstrate the feature's broad applicability to a wide range of errors, including authentication failures (AMF, AKI), rate limiting (RLT), body validation (BIV), and various upstream failures.
  • A key design choice is the use of flag matching, which leverages Tyk's internal error classification system (errors.ResponseFlag). This allows for more robust, semantic matching of error types rather than relying on fragile string matching of error messages.
Metadata
  • Review Effort: 4 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-04-20T15:49:30.966Z | Triggered by: pr_updated | Commit: 0e739a5


github-actions bot (Contributor) commented Mar 9, 2026

API Changes

--- prev.txt	2026-04-20 15:11:20.276889259 +0000
+++ current.txt	2026-04-20 15:11:14.634751057 +0000
@@ -310,6 +310,10 @@
 	// SecurityRequirements stores all OAS security requirements (auto-populated from OpenAPI description import)
 	// When len(SecurityRequirements) > 1, OR logic is automatically applied
 	SecurityRequirements [][]string `json:"security_requirements,omitempty" bson:"security_requirements,omitempty"`
+
+	// ErrorOverrides contains the configurations for error response customization.
+	ErrorOverrides         ErrorOverridesMap `bson:"error_overrides" json:"error_overrides"`
+	ErrorOverridesDisabled bool              `bson:"error_overrides_disabled" json:"error_overrides_disabled" `
 }
     APIDefinition represents the configuration for a single proxied API and it's
     versions.
@@ -523,6 +527,72 @@
 	Headers map[string]string    `bson:"headers" json:"headers"`
 }
 
+type ErrorMatcher struct {
+	// Flag matches against the error classification flag from the request context.
+	Flag errors.ResponseFlag `bson:"flag,omitempty" json:"flag,omitempty"`
+
+	// MessagePattern is a regex pattern to match against the response body.
+	MessagePattern string `bson:"message_pattern,omitempty" json:"message_pattern,omitempty"`
+
+	// BodyField is a JSON path (gjson syntax) to extract a value from the response body.
+	BodyField string `bson:"body_field,omitempty" json:"body_field,omitempty"`
+
+	// BodyValue is the expected value at BodyField for the match to succeed.
+	BodyValue string `bson:"body_value,omitempty" json:"body_value,omitempty"`
+
+	// CompiledPattern is the pre-compiled regex for MessagePattern.
+	CompiledPattern *regexp.Regexp `bson:"-" json:"-" ignored:"true"`
+}
+    ErrorMatcher defines additional matching criteria for error overrides.
+
+func (m *ErrorMatcher) Compile() error
+    Compile compiles the MessagePattern regex if present. Should be called after
+    unmarshaling from JSON or YAML.
+
+type ErrorOverride struct {
+	// Match contains optional additional matching criteria.
+	Match *ErrorMatcher `bson:"match,omitempty" json:"match,omitempty"`
+
+	// Response defines the response to return when matched.
+	Response ErrorResponse `bson:"response" json:"response"`
+
+	// Has unexported fields.
+}
+    ErrorOverride combines an optional matcher with its response.
+
+func (e *ErrorOverride) GetCompiledTemplate(isXML bool) any
+    GetCompiledTemplate returns the pre-compiled template for the given content
+    type. Returns nil if no inline Body template was compiled (e.g., using file
+    template).
+
+func (e *ErrorOverride) HasCompiledTemplate() bool
+    HasCompiledTemplate returns true if this override has a pre-compiled inline
+    Body template.
+
+func (e *ErrorOverride) SetCompiledTemplates(textTmpl, htmlTmpl any)
+    SetCompiledTemplates stores the pre-compiled templates for inline Body.
+
+type ErrorOverridesMap map[string][]ErrorOverride
+    ErrorOverridesMap maps status codes to their override rules.
+
+type ErrorResponse struct {
+	// StatusCode is the HTTP status code to return.
+	StatusCode int `bson:"status_code" json:"status_code"`
+
+	// Body is the HTTP response body (literal or inline template).
+	Body string `bson:"body,omitempty" json:"body,omitempty"`
+
+	// Message is the semantic error message passed to templates as {{.Message}}.
+	Message string `bson:"message,omitempty" json:"message,omitempty"`
+
+	// Template references an error template file in the templates/ directory.
+	Template string `bson:"template,omitempty" json:"template,omitempty"`
+
+	// Headers are HTTP headers to include in the response.
+	Headers map[string]string `bson:"headers,omitempty" json:"headers,omitempty"`
+}
+    ErrorResponse defines the override response for error overrides.
+
 type EventHandlerMetaConfig struct {
 	Events map[TykEvent][]EventHandlerTriggerConfig `bson:"events" json:"events"`
 }
@@ -3003,6 +3073,77 @@
 func (et *EnforceTimeout) Fill(meta apidef.HardTimeoutMeta)
     Fill fills *EnforceTimeout from apidef.HardTimeoutMeta.
 
+type ErrorMatcher struct {
+	// Flag matches against the error classification flag from the request context.
+	Flag errors.ResponseFlag `bson:"flag,omitempty" json:"flag,omitempty"`
+
+	// MessagePattern is a regex pattern to match against the response body.
+	MessagePattern string `bson:"messagePattern,omitempty" json:"messagePattern,omitempty"`
+
+	// BodyField is a JSON path (gjson syntax) to extract a value from the response body.
+	BodyField string `bson:"bodyField,omitempty" json:"bodyField,omitempty"`
+
+	// BodyValue is the expected value at BodyField for the match to succeed.
+	BodyValue string `bson:"bodyValue,omitempty" json:"bodyValue,omitempty"`
+}
+    ErrorMatcher defines additional matching criteria for error overrides.
+
+func (em *ErrorMatcher) ExtractTo(api *apidef.ErrorMatcher)
+
+type ErrorOverride struct {
+	// Match contains optional additional matching criteria.
+	Match *ErrorMatcher `bson:"match,omitempty" json:"match,omitempty"`
+
+	// Response defines the response to return when matched.
+	Response ErrorResponse `bson:"response" json:"response"`
+}
+    ErrorOverride combines an optional matcher with its response.
+
+func (eo *ErrorOverride) ExtractTo(api *apidef.ErrorOverride)
+
+func (eo *ErrorOverride) Fill(api apidef.ErrorOverride)
+
+type ErrorOverrides struct {
+	// Enabled determines if error overrides are active for this API.
+	// Maps to Tyk classic API definition: `error_overrides_disabled`
+	Enabled bool `bson:"enabled" json:"enabled"`
+
+	// Value contains the map of status codes to their override rules.
+	Value ErrorOverridesMap `bson:"value,omitempty" json:"value,omitempty"`
+}
+    ErrorOverrides defines the OAS extension configuration for error overrides.
+
+func (e *ErrorOverrides) ExtractTo(api *apidef.APIDefinition)
+
+func (e *ErrorOverrides) Fill(api apidef.APIDefinition)
+
+type ErrorOverridesMap map[string][]ErrorOverride
+    ErrorOverridesMap maps status codes to their override rules.
+
+func (e *ErrorOverridesMap) ExtractTo(api *apidef.APIDefinition)
+
+func (e *ErrorOverridesMap) Fill(api apidef.APIDefinition)
+
+type ErrorResponse struct {
+	// StatusCode is the HTTP status code to return.
+	StatusCode int `bson:"statusCode" json:"statusCode"`
+
+	// Body is the HTTP response body (literal or inline template).
+	Body string `bson:"body,omitempty" json:"body,omitempty"`
+
+	// Message is the semantic error message passed to templates as {{.Message}}.
+	Message string `bson:"message,omitempty" json:"message,omitempty"`
+
+	// Template references an error template file in the templates/ directory.
+	Template string `bson:"template,omitempty" json:"template,omitempty"`
+
+	// Headers are HTTP headers to include in the response.
+	Headers map[string]string `bson:"headers,omitempty" json:"headers,omitempty"`
+}
+    ErrorResponse defines the override response for error overrides.
+
+func (er ErrorResponse) ExtractTo(api *apidef.ErrorResponse)
+
 type EventHandler struct {
 	// Enabled enables the event handler.
 	//
@@ -5576,6 +5717,8 @@
 	Server Server `bson:"server" json:"server"` // required
 	// Middleware contains the configurations related to the Tyk middleware.
 	Middleware *Middleware `bson:"middleware,omitempty" json:"middleware,omitempty"`
+	// ErrorOverrides contains the configurations for error response customization.
+	ErrorOverrides *ErrorOverrides `bson:"errorOverrides,omitempty" json:"errorOverrides,omitempty"`
 }
     XTykAPIGateway contains custom Tyk API extensions for the OpenAPI
     definition. The values for the extensions are stored inside the OpenAPI
@@ -6829,6 +6972,25 @@
 	// ```
 	OverrideMessages map[string]TykError `bson:"override_messages" json:"override_messages"`
 
+	// ErrorOverrides allows you to customize the error responses that the Gateway will return to API clients.
+	// This configuration will be used to override both Gateway-generated errors (e.g. authentication failures, rate limits, validation errors)
+	// and errors returned by the upstream service (4xx/5xx responses from backend APIs).
+	// Rules are organized by HTTP status code and can include additional matching criteria.
+	// These rules will be superseded by any overrides configured in the API definition
+	//
+	// Sample Override Setting
+	// ```
+	// "error_overrides": {
+	//   "500": [{
+	//     "response": {
+	//       "status_code": 503,
+	//       "body": "{\"error\": \"Service temporarily unavailable\"}"
+	//     }
+	//   }]
+	// }
+	// ```
+	ErrorOverrides apidef.ErrorOverridesMap `json:"error_overrides,omitempty"`
+
 	// Cloud flag shows the Gateway runs in Tyk Cloud.
 	Cloud bool `json:"cloud"`
 
@@ -9704,6 +9866,12 @@
     APIError is generic error object returned if there is something wrong with
     the request
 
+type APIErrorWithContext struct {
+	Message    htmltemplate.HTML
+	StatusCode int
+}
+    APIErrorWithContext provides context for error override templates.
+
 type APISpec struct {
 	*apidef.APIDefinition
 	OAS oas.OAS
@@ -9770,6 +9938,7 @@
 	// all primitives on every JSON-RPC request that doesn't match a VEM.
 	// This is a convenience flag that combines ToolsAllowListEnabled, ResourcesAllowListEnabled, and PromptsAllowListEnabled.
 	MCPAllowListEnabled bool
+
 	// Has unexported fields.
 }
     APISpec represents a path specification for an API, to avoid enumerating
@@ -9806,6 +9975,10 @@
 
 func (s *APISpec) FireEvent(name apidef.TykEvent, meta interface{})
 
+func (a *APISpec) GetCompiledErrorOverrides() *CompiledErrorOverrides
+    GetCompiledErrorOverrides returns the compiled error overrides for O(1)
+    lookup.
+
 func (a *APISpec) GetPRMConfig() *oas.ProtectedResourceMetadata
     GetPRMConfig returns the Protected Resource Metadata configuration if the
     API is an OAS API Definition (OAS API, MCP Proxy, Stream API) with PRM
@@ -9832,6 +10005,9 @@
 
 func (a *APISpec) SanitizeProxyPaths(r *http.Request)
 
+func (a *APISpec) SetCompiledErrorOverrides(compiled *CompiledErrorOverrides)
+    SetCompiledErrorOverrides stores the compiled error overrides.
+
 func (a *APISpec) StopSessionManagerPool()
 
 func (a *APISpec) StripListenPath(reqPath string) string
@@ -10195,6 +10371,23 @@
     ObjectPostProcess does CoProcessObject post-processing (adding/removing
     headers or params, etc.).
 
+type CompiledErrorOverrides struct {
+	// ByExactCode maps exact status codes to their override rules.
+	ByExactCode map[int][]*apidef.ErrorOverride
+
+	// ByPrefix maps status code prefixes to pattern rules.
+	ByPrefix map[int][]*apidef.ErrorOverride
+}
+    CompiledErrorOverrides provides lookup for error overrides by status code.
+
+func CompileErrorOverrides(overrides apidef.ErrorOverridesMap) *CompiledErrorOverrides
+    CompileErrorOverrides compiles all regex patterns, pre-compiles inline
+    message templates, and builds an indexed lookup structure for O(1) status
+    code matching. Called during config load (gateway-level) or API load
+    (API-level). Compilation failures are logged as warnings and those rules
+    are skipped. Returns nil if no overrides are provided or all rules failed to
+    compile.
+
 type ComplexityFailReason int
 
 const (
@@ -10372,10 +10565,58 @@
     most middleware will invoke the ErrorHandler if something is wrong with the
     request and halt the request processing through the chain
 
+func (e *ErrorHandler) ExecuteErrorTemplate(w http.ResponseWriter, tmpl TemplateExecutor, data any, errCode int) *http.Response
+    ExecuteErrorTemplate executes a template and captures output for analytics.
+    Uses io.MultiWriter to write to both the response and a buffer for
+    recording.
+
 func (e *ErrorHandler) HandleError(w http.ResponseWriter, r *http.Request, errMsg string, errCode int, writeResponse bool)
     HandleError is the actual error handler and will store the error details in
     analytics if analytics processing is enabled.
 
+func (e *ErrorHandler) SetErrorResponseHeaders(w http.ResponseWriter, contentType string) http.Header
+    SetErrorResponseHeaders sets common error response headers on both the
+    ResponseWriter and returns a copy for analytics recording.
+
+type ErrorOverrides struct {
+	Spec *APISpec
+	Gw   *Gateway
+}
+    ErrorOverrides provides centralized error override logic for both
+    Tyk-generated errors (via HandleError) and upstream error responses (via
+    response middleware).
+
+func NewErrorOverrides(spec *APISpec, gw *Gateway) *ErrorOverrides
+    NewErrorOverrides creates a new ErrorOverrides instance.
+
+func (o *ErrorOverrides) ApplyOverride(r *http.Request, statusCode int, body []byte) *OverrideResult
+    ApplyOverride attempts to match and apply an override for the given error.
+    Uses O(1) lookup by status code, then checks additional matching criteria.
+    Returns nil if no override matches.
+
+func (o *ErrorOverrides) ApplyUpstreamOverride(statusCode int, readBody func() []byte) *OverrideResult
+    ApplyUpstreamOverride applies overrides for upstream 4xx/5xx responses.
+    Uses lazy body reading via closure.
+
+type ErrorResponseContext struct {
+	// ContentType is the Content-Type header value to use in the response.
+	ContentType string
+
+	// TemplateExtension is the file extension for template lookup ("json" or "xml").
+	TemplateExtension string
+
+	// IsXML indicates whether XML content type was detected.
+	// When true, text/template is used; otherwise html/template is used.
+	IsXML bool
+}
+    ErrorResponseContext holds content-type detection results for error
+    responses. Used to determine template extension and template engine
+    selection.
+
+func DetectErrorResponseContext(r *http.Request) *ErrorResponseContext
+    DetectErrorResponseContext extracts content type info from the request.
+    Follows the same pattern as writeTemplateErrorResponse for consistency.
+
 type EventCurcuitBreakerMeta struct {
 	EventMetaDefault
 	Path         string
@@ -10641,6 +10882,10 @@
 
 func (gw *Gateway) GetCoProcessGrpcServerTargetURL() (*url.URL, error)
 
+func (gw *Gateway) GetCompiledErrorOverrides() *CompiledErrorOverrides
+    GetCompiledErrorOverrides returns the compiled error overrides for O(1)
+    lookup.
+
 func (gw *Gateway) GetConfig() config.Config
 
 func (gw *Gateway) GetLoadedAPIIDs() []model.LoadedAPIInfo
@@ -10710,6 +10955,9 @@
 
 func (gw *Gateway) SetCheckerHostList()
 
+func (gw *Gateway) SetCompiledErrorOverrides(compiled *CompiledErrorOverrides)
+    SetCompiledErrorOverrides stores the compiled error overrides.
+
 func (gw *Gateway) SetConfig(conf config.Config, skipReload ...bool)
 
 func (gw *Gateway) SetNodeID(nodeID string)
@@ -11775,6 +12023,40 @@
 
 func (k *OrganizationMonitor) SetOrgSentinel(orgChan chan bool, orgId string)
 
+type OverrideResult struct {
+	// StatusCode is the HTTP status code to return.
+	StatusCode int
+
+	// Headers are additional HTTP headers to include.
+	Headers map[string]string
+
+	// OriginalCode is the original error status code before override.
+	OriginalCode int
+
+	// Has unexported fields.
+}
+    OverrideResult contains the result of applying an error override. Holds
+    context needed for response writing including the matched rule.
+
+func (r *OverrideResult) GetBody() string
+    GetBody returns the response body.
+
+func (r *OverrideResult) GetMessageForTemplate() string
+    GetMessageForTemplate returns the semantic message for {{.Message}} in
+    templates.
+
+func (r *OverrideResult) GetTemplateExecutor(gw *Gateway, errCtx *ErrorResponseContext) TemplateExecutor
+    GetTemplateExecutor returns the template to execute, or nil if body should
+    be written directly.
+
+func (r *OverrideResult) ShouldUseDefaultTemplate() bool
+    ShouldUseDefaultTemplate returns true when only Message is set (no Body,
+    no Template).
+
+func (r *OverrideResult) ShouldWriteDirectly() bool
+    ShouldWriteDirectly returns true if body should be written as-is (no
+    template variables).
+
 type PRMMiddleware struct {
 	*BaseMiddleware
 }
@@ -12341,6 +12623,29 @@
 
 func (m *ResponseCacheMiddleware) Name() string
 
+type ResponseErrorOverrideMiddleware struct {
+	BaseTykResponseHandler
+}
+    ResponseErrorOverrideMiddleware intercepts upstream 4xx/5xx responses and
+    applies configured error overrides before they reach the client.
+
+func (r *ResponseErrorOverrideMiddleware) Base() *BaseTykResponseHandler
+
+func (r *ResponseErrorOverrideMiddleware) Enabled() bool
+
+func (r *ResponseErrorOverrideMiddleware) HandleError(_ http.ResponseWriter, _ *http.Request)
+
+func (r *ResponseErrorOverrideMiddleware) HandleResponse(
+	_ http.ResponseWriter,
+	res *http.Response,
+	req *http.Request,
+	_ *user.SessionState,
+) error
+
+func (r *ResponseErrorOverrideMiddleware) Init(_ any, spec *APISpec) error
+
+func (r *ResponseErrorOverrideMiddleware) Name() string
+
 type ResponseGoPluginMiddleware struct {
 	BaseTykResponseHandler
 	Path       string // path to .so file

@MFCaballero MFCaballero changed the title init [TT-16767][Global] [Implementation] Centralised Error Overrides Infrastructure Mar 9, 2026
@probelabs
Contributor

probelabs bot commented Mar 9, 2026

Security Issues (2)

Severity Location Issue
🟠 Error apidef/error_overrides.go:70-78
The `MessagePattern` field in `ErrorMatcher` is compiled using Go's standard `regexp` package, which is known to be vulnerable to Regular Expression Denial of Service (ReDoS) attacks. A malicious or poorly crafted regex pattern in an API definition could lead to excessive CPU consumption and service degradation when matching against response bodies. The `regexp.Compile` call on line 72 uses this vulnerable engine.
💡 SuggestionTo mitigate the risk of ReDoS, replace the standard `regexp` package with a library that uses a linear-time regex engine, such as Google's RE2. The project already has a `github.com/TykTechnologies/tyk/regexp` package that wraps `regexp`, which could be modified to wrap `github.com/wasilibs/go-re2` or a similar safe alternative. This would provide protection against catastrophic backtracking without requiring changes to the rest of the code that uses the `regexp` wrapper.
🟡 Warning apidef/error_overrides.go:91
The `Body` field in `ErrorResponse` can be used as an inline template. If this template were to include user-controllable data, it could be vulnerable to Cross-Site Scripting (XSS) or other injection attacks. While the current implementation appears to only use safe, system-generated values like `{{.StatusCode}}` and `{{.Message}}`, this creates a potential risk if the feature is extended in the future to include data from the original request or response body in the template context. The use of `html/template` for non-XML content provides good protection, but `text/template` for XML content offers no automatic escaping.
💡 SuggestionExplicitly document that only system-generated, trusted variables should be used in inline templates. If there is ever a need to include data from the original request or response, ensure it is rigorously sanitized and escaped before being added to the template context. For XML templates using `text/template`, perform manual XML escaping on any dynamic data to prevent content injection.

✅ Architecture Check Passed

No architecture issues found – changes LGTM.

✅ Performance Check Passed

No performance issues found – changes LGTM.

Quality Issues (2)

Severity Location Issue
🟠 Error apidef/error_overrides.go:35-41
The template selection logic uses `html/template` for non-XML responses, which is incorrect for JSON APIs. `html/template` performs context-aware escaping (e.g., escaping `&` to `&amp;` and `"` to `&#34;`) to prevent XSS in HTML. When generating JSON, this will result in corrupted data for API clients if template variables contain special characters that are valid in JSON strings but have significance in HTML. JSON responses with templated values should be generated using `text/template` to ensure raw string values are produced, and ideally with a mechanism to properly escape for JSON context.
💡 SuggestionThe logic should be changed to use `text/template` for both XML and JSON responses when using inline body templates. The distinction between `compiledBodyTmpl` and `compiledBodyTmplHTML` seems incorrect for this purpose, as `html/template` is not suitable for generating JSON. A single `compiledBodyTmpl` of type `*text/template.Template` should be used for all text-based content types, including JSON and XML. If HTML responses are a specific requirement, content-type detection should be more specific than a simple boolean `isXML`.
🟡 Warning apidef/api_definitions.go:1611
The `DummyAPI` function includes a sample `ErrorOverride` that matches on status code "400" and error flag "RLT" (RateLimitExceeded). Rate limit errors in Tyk typically result in a 429 status code. This configuration is confusing and unrealistic, which might mislead developers or cause issues in tests that rely on `DummyAPI` for a baseline configuration.
💡 SuggestionUpdate the dummy data to reflect a more realistic scenario. For an override matching the "RLT" flag, the status code key should be "429" instead of "400" to align with standard HTTP practices for rate limiting.

Powered by Visor from Probelabs

Last updated: 2026-04-20T15:49:11.453Z | Triggered by: pr_updated | Commit: 0e739a5


Contributor

@andyo-tyk andyo-tyk left a comment

Proposed some tweaks to the wording

Comment thread config/config.go Outdated
Comment thread config/config.go Outdated
Comment thread config/config.go Outdated
Comment thread config/config.go
MFCaballero and others added 2 commits March 12, 2026 09:41
Co-authored-by: andyo-tyk <99968932+andyo-tyk@users.noreply.github.com>
Co-authored-by: andyo-tyk <99968932+andyo-tyk@users.noreply.github.com>
@MFCaballero MFCaballero requested a review from andyo-tyk March 12, 2026 08:58
Contributor

@edsonmichaque edsonmichaque left a comment

LGTM

MFCaballero and others added 4 commits March 12, 2026 16:51
…mport cycling (#7893)

<!-- Provide a general summary of your changes in the Title above -->

## Description

<!-- Describe your changes in detail -->

## Related Issue

<!-- This project only accepts pull requests related to open issues. -->
<!-- If suggesting a new feature or change, please discuss it in an
issue first. -->
<!-- If fixing a bug, there should be an issue describing it with steps
to reproduce. -->
<!-- OSS: Please link to the issue here. Tyk: please create/link the
JIRA ticket. -->

## Motivation and Context

<!-- Why is this change required? What problem does it solve? -->

## How This Has Been Tested

<!-- Please describe in detail how you tested your changes -->
<!-- Include details of your testing environment, and the tests -->
<!-- you ran to see how your change affects other areas of the code,
etc. -->
<!-- This information is helpful for reviewers and QA. -->

## Screenshots (if appropriate)

## Types of changes

<!-- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Refactoring or add test (improvements in base code or adds test
coverage to functionality)

## Checklist

<!-- Go over all the following points, and put an `x` in all the boxes
that apply -->
<!-- If there are no documentation updates required, mark the item as
checked. -->
<!-- Raise up any additional concerns not covered by the checklist. -->

- [ ] I ensured that the documentation is up to date
- [ ] I explained why this PR updates go.mod in detail with reasoning
why it's required
- [ ] I would like a code coverage CI quality gate exception and have
explained why

Co-authored-by: Vlad Zabolotnyi <vlad.z@tyk.io>
<!-- Provide a general summary of your changes in the Title above -->

## Description

<!-- Describe your changes in detail -->
This PR adds integration tests for the error overrides feature. Depends
on #7867
The CI tests were run against a branch based on master and are passing.
Base infrastructure taken from @tbuchaillot:
https://github.com/tbuchaillot/test-access-logs






<!---TykTechnologies/jira-linter starts here-->

### Ticket Details

<details>
<summary>
<a href="https://tyktech.atlassian.net/browse/TT-16775" title="TT-16775"
target="_blank">TT-16775</a>
</summary>

|         |    |
|---------|----|
| Status  | Merge |
| Summary | [1b] Testing Centralised ErrorOverrides Infrastructure |

Generated at: 2026-03-17 16:18:07

</details>

<!---TykTechnologies/jira-linter ends here-->
@MFCaballero MFCaballero requested a review from a team as a code owner March 31, 2026 09:27
MFCaballero and others added 4 commits March 31, 2026 11:28
<!-- Provide a general summary of your changes in the Title above -->

## Description

<!-- Describe your changes in detail -->
Upstream error override implementation.
# Error Override Middleware Performance Benchmarks

## Executive Summary

The upstream error override middleware adds **negligible performance
overhead**:

- **Success responses**: ~0.82 ns/op overhead (sub-nanosecond status
code check)
- **No overrides configured**: ~0.84 ns/op overhead (fast path with map
length check)
- **Error responses**: ~0.83 ns/op (all fast paths are essentially
identical)
- **Error with no match**: ~19 ns/op (fast rejection)
- **Error with exact match**: ~77 ns/op (includes map lookup and result
creation)
- **Error with body inspection**: ~292 ns/op (includes JSON parsing)

**Optimization:** The middleware checks `statusCode >= 400` and
`len(ErrorOverrides) > 0` before any processing, so success responses and
deployments without overrides pay only the sub-nanosecond cost of this check.

## Detailed Results

### 1. Fast Path Performance

Testing the critical fast path checks that protect the hot path
(averaged over 10 runs):

```
BenchmarkShouldProcessResponse/fast_path_success                  0.82 ns/op       0 B/op       0 allocs/op
BenchmarkShouldProcessResponse/fast_path_no_config                0.84 ns/op       0 B/op       0 allocs/op
BenchmarkShouldProcessResponse/error_response                     0.83 ns/op       0 B/op       0 allocs/op
```

**Key Findings:**
- **All three paths are essentially identical**: 0.82-0.84 ns -
differences are measurement noise
- **Sub-nanosecond overhead**: Effectively immeasurable in production
- **Zero allocations** on all paths
- **CPU-optimized**: Branch prediction and L1 cache make the check
nearly free
- The function performs two integer comparisons (status >= 400, len > 0)
with short-circuit evaluation
- Modern CPUs execute this in a fraction of a nanosecond

**Note on Variance**: At sub-nanosecond scales, individual measurements
can vary by ±0.2 ns due to CPU scheduling, cache effects, and branch
prediction. Statistical averages over multiple runs show all paths
perform identically.
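The fast-path guard described above amounts to two short-circuited integer comparisons. A minimal sketch follows. `shouldProcessResponse` mirrors the benchmark name, but its signature, the `overrideRule` type and the map keyed by status pattern are assumptions.

```go
package main

import "fmt"

// overrideRule is a hypothetical stand-in for a configured error override.
type overrideRule struct{ Body string }

// shouldProcessResponse performs the two cheap checks before any other work:
// successes (< 400) and empty configurations return false immediately.
// len of a nil map is 0, so an unset configuration is handled too.
func shouldProcessResponse(statusCode int, overrides map[string][]overrideRule) bool {
	return statusCode >= 400 && len(overrides) > 0
}

func main() {
	cfg := map[string][]overrideRule{"5xx": {{Body: `{"error":"unavailable"}`}}}
	fmt.Println(shouldProcessResponse(200, cfg)) // false: success response skipped
	fmt.Println(shouldProcessResponse(503, nil)) // false: no overrides configured
	fmt.Println(shouldProcessResponse(503, cfg)) // true: error with overrides
}
```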

### 2. Lazy Body Reader

Testing the lazy body reading mechanism that defers I/O until needed:

```
BenchmarkLazyBodyReader/no_read                                    7.8 ns/op       0 B/op       0 allocs/op
BenchmarkLazyBodyReader/small_body                               660.0 ns/op     600 B/op       4 allocs/op
BenchmarkLazyBodyReader/large_body                              9907  ns/op   17496 B/op      10 allocs/op
BenchmarkLazyBodyReader/cached_read                                6.4 ns/op       0 B/op       0 allocs/op
BenchmarkLazyBodyReader/restore_body                             863.6 ns/op     720 B/op       8 allocs/op
```

**Key Findings:**
- Creating the reader without reading is cheap (7.8 ns, 0 allocs)
- Body only read when rule requires inspection
- Cached reads enable multiple rule checks without re-reading (6.4 ns)
- Large bodies respect maxBodySizeForMatching limit (16 KB)
- Restore mechanism preserves full body with minimal overhead

### 3. ApplyUpstreamOverride - Core Matching

Testing the core matching logic for upstream error responses:

```
BenchmarkApplyUpstreamOverride/no_match                           18.9 ns/op       0 B/op       0 allocs/op
BenchmarkApplyUpstreamOverride/exact_match_no_body                77.0 ns/op      32 B/op       1 allocs/op
BenchmarkApplyUpstreamOverride/pattern_match_5xx                  78.2 ns/op      32 B/op       1 allocs/op
BenchmarkApplyUpstreamOverride/URS_flag                           84.4 ns/op      32 B/op       1 allocs/op
BenchmarkApplyUpstreamOverride/body_field_match                  292.4 ns/op      48 B/op       2 allocs/op
BenchmarkApplyUpstreamOverride/message_pattern_match             261.2 ns/op      32 B/op       1 allocs/op
BenchmarkApplyUpstreamOverride/multiple_rules_first_match         68.7 ns/op      32 B/op       1 allocs/op
BenchmarkApplyUpstreamOverride/multiple_rules_last_match          92.6 ns/op      32 B/op       1 allocs/op
```

**Key Findings:**
- **No match**: 18.9 ns with zero allocations (fast rejection)
- **Exact status code match** (e.g., "503"): 77.0 ns (O(1) hash map
lookup)
- **Pattern match** (e.g., "5xx"): 78.2 ns (prefix calculation + map
lookup)
- **URS flag matching**: 84.4 ns (integer range check for 500-599)
- **Body field matching**: 292.4 ns (JSON path extraction adds overhead)
- **Regex pattern matching**: 261.2 ns (pre-compiled patterns)
- **Multiple rules**: ~24 ns difference between matching the first and
the last rule (68.7 ns vs 92.6 ns)

### 4. CompiledErrorOverrides - Direct Map Access

Testing the optimized compiled structure with direct map lookups:

```
BenchmarkCompiledErrorOverrides/exact_code_lookup                  5.9 ns/op       0 B/op       0 allocs/op
BenchmarkCompiledErrorOverrides/prefix_lookup                      6.4 ns/op       0 B/op       0 allocs/op
BenchmarkCompiledErrorOverrides/no_match                           6.9 ns/op       0 B/op       0 allocs/op
```

**Key Findings:**
- O(1) exact match check: 5.9 ns (direct map lookup)
- Prefix check: 6.4 ns (includes prefix calculation from status code)
- No match: 6.9 ns (checks both exact and prefix maps)
- Zero allocations enable efficient early rejection
- Compiled structure eliminates runtime parsing overhead
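One way to get the exact and prefix lookups measured above is to split configuration keys into two maps at startup. The sketch assumes keys are either three-digit codes such as `"503"` or class patterns such as `"5xx"`. It also assumes that an exact code takes precedence over its class. `compiledOverrides`, `compile` and `lookup` are hypothetical names, not the gateway's `CompiledErrorOverrides` API.

```go
package main

import (
	"fmt"
	"strconv"
)

// compiledOverrides splits configured keys into exact codes and status
// classes once, so request-time matching is at most two map lookups.
type compiledOverrides struct {
	exact map[int]string // e.g. 503 -> override name
	class map[int]string // e.g. "5xx" stored under 5
}

// compile runs once at startup; unrecognised keys are skipped in this sketch.
func compile(keys map[string]string) compiledOverrides {
	c := compiledOverrides{exact: map[int]string{}, class: map[int]string{}}
	for key, name := range keys {
		if len(key) == 3 && key[1:] == "xx" && key[0] >= '1' && key[0] <= '5' {
			c.class[int(key[0]-'0')] = name
			continue
		}
		if code, err := strconv.Atoi(key); err == nil {
			c.exact[code] = name
		}
	}
	return c
}

// lookup tries the exact code first, then the class computed as code/100.
func (c compiledOverrides) lookup(code int) (string, bool) {
	if name, ok := c.exact[code]; ok {
		return name, true
	}
	name, ok := c.class[code/100]
	return name, ok
}

func main() {
	c := compile(map[string]string{"503": "maintenance", "5xx": "generic-5xx", "429": "rate-limit"})
	for _, code := range []int{503, 502, 429, 404} {
		name, ok := c.lookup(code)
		fmt.Println(code, name, ok)
	}
}
```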

### 5. MatchesUpstreamCriteria

Testing different matching criteria types:

```
BenchmarkMatchesUpstreamCriteria/no_criteria                       5.6 ns/op       0 B/op       0 allocs/op
BenchmarkMatchesUpstreamCriteria/URS_flag                          6.0 ns/op       0 B/op       0 allocs/op
BenchmarkMatchesUpstreamCriteria/body_field_small_JSON           193.6 ns/op      16 B/op       1 allocs/op
BenchmarkMatchesUpstreamCriteria/body_field_large_JSON          1576  ns/op       8 B/op       1 allocs/op
BenchmarkMatchesUpstreamCriteria/message_pattern_simple          175.4 ns/op       0 B/op       0 allocs/op
BenchmarkMatchesUpstreamCriteria/message_pattern_complex         205.1 ns/op       0 B/op       0 allocs/op
```

**Key Findings:**
- **No criteria** (match all): 5.6 ns
- **URS flag** (5xx check): 6.0 ns - simplest semantic matching
- **Small JSON body field**: 193.6 ns - gjson path extraction
- **Large JSON body field**: 1576 ns - deeper nesting increases overhead
- **Simple regex**: 175.4 ns - pre-compiled pattern
- **Complex regex**: 205.1 ns - alternation and capture groups
- Zero allocations for flag and regex matching

### 6. Rule Matching Scalability

Performance with varying rule counts (worst case: matching last rule):

```
BenchmarkFindMatchingRuleGeneric/10_rules                         60.3 ns/op       0 B/op       0 allocs/op
BenchmarkFindMatchingRuleGeneric/50_rules                        329.1 ns/op       0 B/op       0 allocs/op
BenchmarkFindMatchingRuleGeneric/100_rules                       565.6 ns/op       0 B/op       0 allocs/op
```

**Key Findings:**
- Linear scaling: ~5.4 ns per rule
- Zero allocations regardless of rule count
- 10 rules (typical): 60.3 ns
- 50 rules (large): 329.1 ns
- 100 rules (extreme): 565.6 ns
- First-match semantics: place frequent rules first
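First-match semantics and their linear cost can be sketched as a simple ordered scan. The `rule` type and `findFirstMatch` (loosely named after the benchmark) are hypothetical.

```go
package main

import "fmt"

// rule is a hypothetical override rule with a precompiled predicate.
type rule struct {
	name    string
	matches func(status int) bool
}

// findFirstMatch returns the first rule that matches. Cost grows linearly with
// the index of the winning rule, which is why frequent rules should come first.
func findFirstMatch(rules []rule, status int) (string, bool) {
	for _, r := range rules {
		if r.matches(status) {
			return r.name, true
		}
	}
	return "", false
}

func main() {
	rules := []rule{
		{"rate-limit", func(s int) bool { return s == 429 }},
		{"server-error", func(s int) bool { return s >= 500 && s <= 599 }},
		{"catch-all", func(s int) bool { return s >= 400 }},
	}
	for _, s := range []int{429, 503, 404, 200} {
		name, ok := findFirstMatch(rules, s)
		fmt.Println(s, name, ok)
	}
}
```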

### 7. End-to-End Middleware Performance

Full middleware execution (includes HTTP response handling overhead):

```
BenchmarkHandleResponse/no_override_passthrough                 1310  ns/op     672 B/op       9 allocs/op
BenchmarkHandleResponse/success_response_skip                    207.4 ns/op     136 B/op       4 allocs/op
BenchmarkHandleResponse/exact_match_status_only                 1727  ns/op    1216 B/op      18 allocs/op
BenchmarkHandleResponse/exact_match_with_body                   3531  ns/op    1232 B/op      19 allocs/op
BenchmarkHandleResponse/pattern_match_small_body                2595  ns/op    1776 B/op      21 allocs/op
BenchmarkHandleResponse/pattern_match_large_body               13394  ns/op   29171 B/op      21 allocs/op
```

**Note:** These times include benchmark setup overhead (HTTP response
object creation). The middleware's own cost on the skip path (success
responses) is shown in the fast path benchmarks (0.82 ns).

**Key Findings:**
- Success response: 207.4 ns total (middleware: 0.82 ns, rest: test
harness)
- Error passthrough: 1310 ns
- Override application: ~1.7-3.5 μs for status/body changes
- Large body: 13.4 μs (dominated by I/O)

### 8. Real-World Scenarios

Production workload simulations:

```
BenchmarkRealWorld/high_traffic_no_override                      329.7 ns/op     125 B/op       4 allocs/op
  (99% success, 1% errors without matching rules)

BenchmarkRealWorld/high_traffic_with_override                    270.5 ns/op     141 B/op       4 allocs/op
  (98% success, 2% errors with matching overrides)

BenchmarkRealWorld/complex_ruleset                              1660  ns/op    1201 B/op      18 allocs/op
  (10 error codes, multiple rules, mixed traffic)
```

**Key Findings:**
- Typical API (1% errors, no override): 329.7 ns average
- With error overrides (2% errors): 270.5 ns average
- Complex configuration: 1.66 μs

**Production Impact Analysis:**

For a typical API handling **10,000 req/sec** (CPU percentages are relative to one core):

**Scenario 1: No Overrides Configured**
```
10,000 req/sec × 0.84 ns = 8.4 μs/sec = 0.0008% CPU
```

**Scenario 2: 99% Success, 1% Errors**
```
Success:  9,900 req/sec × 0.82 ns = 8.1 μs/sec
Errors:     100 req/sec × 270 ns = 27 μs/sec
Total:    35.1 μs/sec = 0.004% CPU
```

**Scenario 3: High Error Rate (10% errors)**
```
Success:  9,000 req/sec × 0.82 ns = 7.4 μs/sec
Errors:   1,000 req/sec × 270 ns = 270 μs/sec
Total:    277 μs/sec = 0.028% CPU
```
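The scenario arithmetic above can be reproduced with a small helper. It uses the same per-request figures: 0.82-0.84 ns for the fast path and ~270 ns per error. `cpuPercent` is a hypothetical helper, and its result is the share of one CPU core per second.

```go
package main

import "fmt"

// cpuPercent converts request rates (req/s) and per-request costs (ns) into
// the percentage of one CPU core spent per second.
func cpuPercent(successRate, errorRate, successNs, errorNs float64) float64 {
	nsPerSecond := successRate*successNs + errorRate*errorNs
	return nsPerSecond / 1e9 * 100
}

func main() {
	fmt.Printf("%.5f%%\n", cpuPercent(10000, 0, 0.84, 0))    // scenario 1: 0.00084%
	fmt.Printf("%.5f%%\n", cpuPercent(9900, 100, 0.82, 270))  // scenario 2: 0.00351%
	fmt.Printf("%.5f%%\n", cpuPercent(9000, 1000, 0.82, 270)) // scenario 3: 0.02774%
}
```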

## Performance Impact Analysis

### Hot Path (Every Request)

When **no overrides are configured** (common case):
- Overhead: **0.84 ns** per request
- Memory: 0 bytes allocated
- **Impact: Negligible** - effectively immeasurable in production

When **overrides are configured**:
- Success responses: **0.82 ns** (status code check only)
- Error with exact match: **77 ns** (O(1) map lookup)
- Error with pattern match: **78 ns** (prefix + map lookup)
- Error with body inspection: **292 ns** (includes JSON parsing)

### Cold Path (Gateway Startup)

Compilation overhead (one-time at startup):
- Tested via `CompileErrorOverrides` function
- Simple rules: ~1-2 μs
- With regex patterns: ~5-10 μs
- **Impact: Negligible** - happens once

## Memory Allocation Analysis

Memory allocations per operation:

```
Operation                      Allocations    Bytes
----------------------------------------------------
Success response check         0 allocs       0 B
No config check                0 allocs       0 B
No matching rule               0 allocs       0 B
Exact code match               1 alloc       32 B
Pattern match (5xx)            1 alloc       32 B
URS flag match                 1 alloc       32 B
Body field match               2 allocs      48 B
Regex pattern match            1 alloc       32 B
Body read (small)              4 allocs     600 B
Body read (large)             10 allocs  17,496 B
```

**Key Findings:**
- Zero allocations on all fast/rejection paths
- A single 32 B allocation for most matched overrides (body field matches use 2 allocations, 48 B)
- Lazy body reading prevents unnecessary allocations
- Per-operation allocations are fixed, with body reads bounded by `maxBodySizeForMatching`

## Scalability Considerations

### Rule Count Impact
- 10 rules: 60.3 ns (typical configuration)
- 50 rules: 329.1 ns (large configuration)
- 100 rules: 565.6 ns (extreme configuration)
- Linear scaling: 5.4 ns per additional rule

### Body Size Impact
- Small bodies (< 1 KB): ~660 ns read time
- Large bodies (16 KB): ~9.9 μs read time
- Respects `maxBodySizeForMatching` limit
- Lazy reading: only when rule requires it

## Conclusions

**Virtually zero overhead when disabled**: 0.84 ns (map length check) -
immeasurable in production

**Sub-nanosecond fast paths**: All paths (success, no-config, error)
have identical overhead (~0.83 ns)

**Efficient matching**: O(1) status code lookups (exact: 77 ns, pattern:
78 ns)

**URS flag is fastest semantic matching**: 6.0 ns for simple 5xx range
check

**Body inspection adds reasonable overhead**: 194-1576 ns depending on
JSON complexity

**Scalable**: Linear performance up to 100+ rules with 5.4 ns per rule

**Memory efficient**: Zero allocations on fast paths, single allocation
(32B) for matches

**Production ready**: < 0.03% CPU impact even with a 10% error rate, using
the ~270 ns per-error cost assumed in the scenarios above

## Recommendations

For **best performance**:
1. Use URS flag for 5xx matching (6.0 ns - fastest semantic match)
2. Use exact status codes for specific errors (77.0 ns)
3. Use pattern matching (5xx) for broad categories (78.2 ns)
4. Place frequently matched rules first (saves ~5 ns per rule skipped)
5. Minimize body inspection when possible (adds ~200-300 ns)

For **body inspection** (when you need to match on response content):

**Use regex patterns when:**
- Large JSON responses or deeply nested structures
- Body size > 1 KB or nesting depth > 2-3 levels
- Performance is critical (regex: 175-205 ns in these benchmarks; Go's `regexp` runs in time linear in the input length, so very large bodies still cost more)
- Matching error messages or text patterns
- Example: `"message_pattern": "database.*unavailable"`

**Use body field matching when:**
- Small JSON responses (< 1 KB) with shallow structure
- Need precise field extraction (e.g., `error.code == "TIMEOUT"`)
- Fields are at root or 1-2 levels deep
- Performance: 194 ns for small/shallow JSON, but degrades to 1576 ns
for large/nested JSON
- Example: `"body_field": "error.code", "body_value": "TIMEOUT"`

**Performance comparison:**
```
Small JSON (< 1 KB, shallow):
  - Body field: 193.6 ns  ≈  Regex: 175-205 ns  (comparable)

Large/nested JSON:
  - Body field: 1576 ns   vs  Regex: 175-205 ns  (regex ~8x faster on the benchmarked inputs)
```

For **optimal flexibility**:
1. Use URS flag for semantic upstream error matching
2. Use regex patterns for most body matching (consistent performance)
3. Use body field matching only for small JSON with shallow fields
4. Use status code + criteria combinations for precise matching
5. Both exact and pattern (4xx/5xx) matching are very efficient
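The combinations above rely on rules being parsed, and their regular expressions compiled, once at load time. Below is a minimal sketch using the JSON keys from the examples earlier in this section (`message_pattern`, `body_field`, `body_value`). The `criteria` struct and `compileCriteria` are hypothetical, and Tyk's actual `ErrorMatcher` schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// criteria is a hypothetical rule shape using the example JSON keys.
type criteria struct {
	MessagePattern string `json:"message_pattern,omitempty"`
	BodyField      string `json:"body_field,omitempty"`
	BodyValue      string `json:"body_value,omitempty"`

	compiled *regexp.Regexp // unexported, so ignored by encoding/json
}

// compileCriteria parses a rule and pre-compiles its pattern once, so an
// invalid regex is rejected before any traffic is served.
func compileCriteria(raw []byte) (*criteria, error) {
	var c criteria
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	if c.MessagePattern != "" {
		re, err := regexp.Compile(c.MessagePattern)
		if err != nil {
			return nil, fmt.Errorf("invalid message_pattern: %w", err)
		}
		c.compiled = re
	}
	return &c, nil
}

func main() {
	c, err := compileCriteria([]byte(`{"message_pattern": "database.*unavailable"}`))
	fmt.Println(err == nil && c.compiled.MatchString("database is unavailable"))
	_, err = compileCriteria([]byte(`{"message_pattern": "("}`))
	fmt.Println(err != nil) // the invalid pattern fails at load time
}
```

Failing at load time keeps an invalid pattern from surfacing as a per-request error.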

---

**Test Environment:**
- Machine: Apple M1 Pro
- Go Version: 1.25.1
- OS: macOS (darwin/arm64)
- Benchmark Duration: 1s per benchmark (with 10 runs for fast paths)
- Total Benchmarks: 39
- Run Date: 2026-03-16



<!---TykTechnologies/jira-linter starts here-->

### Ticket Details

<details>
<summary>
<a href="https://tyktech.atlassian.net/browse/TT-16772" title="TT-16772"
target="_blank">TT-16772</a>
</summary>

|         |    |
|---------|----|
| Status  | In Code Review |
| Summary | [2] Implement Upstream Error Response Overrides |

Generated at: 2026-03-17 15:57:40

</details>

<!---TykTechnologies/jira-linter ends here-->

---------

Co-authored-by: Vlad Zabolotnyi <109525963+vladzabolotnyi@users.noreply.github.com>
Co-authored-by: Vlad Zabolotnyi <vlad.z@tyk.io>
Co-authored-by: Leonid Bugaev <leonsbox@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: andrei-tyk <97896463+andrei-tyk@users.noreply.github.com>
Co-authored-by: Laurentiu <6229829+lghiur@users.noreply.github.com>
@MFCaballero MFCaballero requested a review from a team as a code owner April 7, 2026 09:44
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Swagger Changes

dyff between swagger-prev.yml
         and swagger-current.yml
returned two differences

components.schemas
  + three map entries added:
    ErrorMatcher:
    ErrorOverride:
    ErrorResponse:

components.schemas.APIDefinition.properties
  + two map entries added:
    error_overrides:
    error_overrides_disabled:


@bojank93 bojank93 left a comment

LGTM.

@MFCaballero can you please merge the PR?
Thank you in advance.

@github-actions
Contributor

🚨 Jira Linter Failed

Commit: 0e739a5
Failed at: 2026-04-20 15:48:08 UTC

The Jira linter failed to validate your PR. Please check the error details below:

failed to get Jira issue: failed to fetch Jira issue TT-16767: Issue does not exist or you do not have permission to see it.: request failed. Please analyze the request body for more details. Status code: 404

Next Steps

  • Ensure your branch name contains a valid Jira ticket ID (e.g., ABC-123)
  • Verify your PR title matches the branch's Jira ticket ID
  • Check that the Jira ticket exists and is accessible

This comment will be automatically deleted once the linter passes.

@sonarqubecloud

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
91.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@MFCaballero MFCaballero enabled auto-merge (squash) April 20, 2026 16:13
@MFCaballero MFCaballero merged commit a921174 into master Apr 20, 2026
43 of 60 checks passed
@MFCaballero MFCaballero deleted the TT-16767 branch April 20, 2026 17:06

Labels

None yet

Projects

None yet


6 participants