Skip to content

Conversation

@LaurenceJJones
Copy link
Member

@LaurenceJJones LaurenceJJones commented Sep 25, 2025

Add AppSec (WAF) Integration Support

This PR adds integration with CrowdSec's AppSec engine, enabling Web Application Firewall (WAF) functionality alongside existing remediation mechanisms (ban, captcha).

Feature Overview

The bouncer now forwards HTTP requests to the CrowdSec AppSec engine for additional validation. AppSec can override decisions made by LAPI, providing more granular WAF-based protection. The integration supports both global and per-host AppSec configuration.

Key Changes

AppSec Component (internal/appsec/root.go)

  • New AppSecClient with optimized HTTP transport (keep-alive, connection pooling)
  • ValidateRequest() forwards requests to AppSec engine with full request context
  • Supports per-host appsec_url and api_key configuration (with global fallback)
  • Processes AppSec responses and maps status codes to remediation actions
  • 5-second timeout with context support to prevent HAProxy connection resets

We should really look to change the underlying spop implementation since appsec increases the processing time, we hit timeout quite easily

SPOA Handler Integration (pkg/spoa/root.go)

  • Integrated AppSec validation into existing remediation flow
  • Reuses parsed HTTP data (URL, method, headers, body) from captcha/ban handling
  • Extracts User-Agent from HAProxy headers (req.hdrs) instead of separate SPOE message
  • Only sends requests to AppSec if not already banned/captcha'd (unless always_send: true)
  • Handles HTTP request line parsing in headers (normalizes \r\n, skips request line)

Configuration (pkg/cfg/config.go)

  • Added global appsec_url configuration option
  • AppSec shares the same api_key as LAPI by default (can be overridden per-host)

Infrastructure

  • Vagrantfile: Complete testing environment with CrowdSec, AppSec collections, HAProxy, and Nginx
  • Docker Compose: Added AppSec collections and acquisition configuration
  • HAProxy configs: Added http-buffer-request option for body capture

Bug Fixes & Improvements

  • Fixed User-Agent extraction from HAProxy headers (handles \r\n line endings correctly)
  • Increased HAProxy SPOA timeout processing to 6s to accommodate AppSec validation
  • Improved header parsing to skip HTTP request line when present
  • Optimized connection reuse via keep-alive and response body draining
  • Simplified header copying using maps.Clone()
  • Cleaned up excessive debug logging

Configuration Example

# Global AppSec URL (optional)
appsec_url: http://127.0.0.1:7422

hosts:
  - host: "*"
    appsec:
      always_send: false  # Only validate if not already banned/captcha'd
      # url: http://custom-appsec:7422  # Optional per-host override
      # api_key: custom-key  # Optional per-host override

Testing

The Vagrantfile provides a complete test environment with:

  • CrowdSec with AppSec collections installed
  • HAProxy with SPOA configured
  • Nginx backend server
  • Automated provisioning for end-to-end testing

@LaurenceJJones LaurenceJJones linked an issue Sep 30, 2025 that may be closed by this pull request
@LaurenceJJones LaurenceJJones added this to the 0.3.0 milestone Oct 10, 2025
- Add AppSec validation to SPOA request handler
- Integrate AppSec component with global URL/API key config
- Add AppSec collections and acquisition setup for docker-compose
- Update HAProxy configs with http-buffer-request option
- Add Vagrantfile for testing environment
- Update bouncer config to support AppSec per-host configuration
- Fix readHeaders to properly handle CRLF line endings from HAProxy req.hdrs
- Extract User-Agent from parsed headers instead of separate SPOE message arg
- Only parse HTTP data when necessary (not for Allow/Ban unless AppSec always_send)
- Move debug logging in createAppSecRequest to show actual headers being sent
- Remove excessive logging from parseHTTPData function
- Clean up verbose debug logs for cookie clearing and redirects
- Check io.Copy return value (errcheck)
- Remove unused logger parameter from parseHTTPData (unparam)
- Remove duplicate User-Agent logging from debug log
- Access req fields directly in debug log instead of reading from headers
Track metrics in SPOA handler where we already have the IP as netip.Addr
type, avoiding unnecessary string-to-IP parsing. This is more efficient
and cleaner than parsing the IP from string in the AppSec component.
@LaurenceJJones LaurenceJJones changed the title WIP: AppSec integration implementation feat: AppSec integration Nov 22, 2025
Copilot finished reviewing on behalf of LaurenceJJones November 23, 2025 07:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive AppSec (Web Application Firewall) integration to the CrowdSec SPOA bouncer, enabling additional layer of protection beyond IP-based remediation. The implementation follows a fail-open pattern, optimizes for performance with connection pooling, and supports both global and per-host AppSec configuration.

Key Changes

  • AppSec Client Implementation: New HTTP client with optimized connection pooling (keep-alive, 5s timeout) that validates requests against CrowdSec's AppSec engine via custom X-Crowdsec-Appsec-* headers
  • SPOA Handler Integration: AppSec validation integrated into remediation flow with conditional execution (skips if already banned/captcha'd unless always_send: true), reuses parsed HTTP data from captcha handling
  • Infrastructure Setup: Complete test environments added via Vagrantfile and Docker Compose with AppSec collections, acquisition configs, and HAProxy body buffering enabled

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
internal/appsec/root.go New AppSec client with request validation, response processing, and HTTP transport configuration with connection pooling
pkg/spoa/root.go AppSec integration into HTTP request handling flow with data reuse, improved header parsing with HTTP request line detection, and metrics tracking
pkg/host/root.go Global AppSec configuration management with per-host override support
pkg/cfg/config.go Global appsec_url and api_key configuration fields
cmd/root.go Global AppSec configuration initialization in host manager
docker-compose.yaml, docker-compose.proxy-test.yaml AppSec collections installation and acquisition configuration mounting
docker/crowdsec/acquisitions/appsec.yaml AppSec component acquisition config listening on port 7422
docker/crowdsec/README.md Documentation for AppSec collections and manual installation
config/haproxy.cfg, config/haproxy-upstreamproxy.cfg Added http-buffer-request option and removed non-existent SPOA server
Vagrantfile Complete test environment provisioning with CrowdSec, AppSec, HAProxy, and Nginx

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +857 to +870
// Skip the HTTP request line if present (first line that looks like "METHOD PATH HTTP/VERSION")
// HAProxy's req.hdrs may include the request line at the beginning
if i == 0 {
// Check if this looks like an HTTP request line (starts with HTTP method)
parts := strings.Fields(header)
if len(parts) >= 3 {
// Check if third part looks like HTTP version (e.g., "HTTP/1.1")
// Format: "METHOD PATH HTTP/VERSION"
if strings.HasPrefix(parts[2], "HTTP/") {
// This is the request line, skip it
continue
}
}
}
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP request line detection logic may incorrectly skip valid headers. The current check only verifies if the third field starts with "HTTP/", but this could match legitimate headers like "X-Custom: value HTTP/1.1 info". Consider adding additional validation to ensure this is truly an HTTP request line by:

  1. Verifying the first field matches a known HTTP method (GET, POST, PUT, DELETE, HEAD, OPTIONS, PATCH, TRACE, CONNECT)
  2. Checking that parts[2] exactly matches the pattern "HTTP/[0-9].[0-9]" (e.g., using strings.HasPrefix and length validation)

Example:

if len(parts) >= 3 && strings.HasPrefix(parts[2], "HTTP/") && len(parts[2]) == 8 {
    method := parts[0]
    if method == "GET" || method == "POST" || method == "PUT" || method == "DELETE" || 
       method == "HEAD" || method == "OPTIONS" || method == "PATCH" || method == "TRACE" || method == "CONNECT" {
        continue // This is the request line
    }
}

Copilot uses AI. Check for mistakes.

a.Client = &AppSecClient{
HTTPClient: &http.Client{
Timeout: 5 * time.Second,
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential timeout conflict: the HTTP client has a 5-second timeout configured (line 72), but the context passed to the request also has a 5-second timeout (line 390 in pkg/spoa/root.go). When both timeouts are set to the same value, it's unclear which will trigger first, and this could lead to inconsistent error messages or behavior. Consider either:

  1. Removing the HTTP client timeout and relying solely on the context timeout for consistency
  2. Setting the HTTP client timeout slightly higher than the context timeout (e.g., 6s) to ensure the context timeout is the primary mechanism

The context timeout is generally preferred for request-level control as it provides better cancellation semantics.

Suggested change
Timeout: 5 * time.Second,

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +143
// Determine HTTP method based on whether there's a body
if len(req.Body) > 0 {
httpReq, err = http.NewRequest(http.MethodPost, a.Client.URL, bytes.NewReader(req.Body))
} else {
httpReq, err = http.NewRequest(http.MethodGet, a.Client.URL, http.NoBody)
}
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP method used for the AppSec request is determined solely by whether the original request has a body (POST if body exists, GET otherwise). This ignores the actual HTTP method of the original request (req.Method). For example, a PUT or DELETE request with a body would be sent to AppSec as POST, and a POST request without a body would be sent as GET. This could lead to incorrect WAF analysis.

Consider using the original request's method instead:

method := http.MethodGet
if req.Method != "" {
    method = req.Method
}

var body io.Reader = http.NoBody
if len(req.Body) > 0 {
    body = bytes.NewReader(req.Body)
}

httpReq, err = http.NewRequest(method, a.Client.URL, body)

This ensures the AppSec engine receives the correct HTTP method for accurate request analysis.

Suggested change
// Determine HTTP method based on whether there's a body
if len(req.Body) > 0 {
httpReq, err = http.NewRequest(http.MethodPost, a.Client.URL, bytes.NewReader(req.Body))
} else {
httpReq, err = http.NewRequest(http.MethodGet, a.Client.URL, http.NoBody)
}
// Use the original HTTP method if present, default to GET
method := http.MethodGet
if req.Method != "" {
method = req.Method
}
var body io.Reader = http.NoBody
if len(req.Body) > 0 {
body = bytes.NewReader(req.Body)
}
httpReq, err = http.NewRequest(method, a.Client.URL, body)

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +166
// Copy original headers
if req.Headers != nil {
httpReq.Header = maps.Clone(req.Headers)
}

// Now override with our trusted CrowdSec headers
httpReq.Header.Set("X-Crowdsec-Appsec-Ip", req.RemoteIP)
httpReq.Header.Set("X-Crowdsec-Appsec-Uri", req.URL)
httpReq.Header.Set("X-Crowdsec-Appsec-Host", req.Host)
httpReq.Header.Set("X-Crowdsec-Appsec-Verb", req.Method)

// Ensure we have a valid API key
if a.Client.APIKey == "" {
a.logger.Error("AppSec API key is empty")
return nil, fmt.Errorf("AppSec API key is not configured")
}

httpReq.Header.Set("X-Crowdsec-Appsec-Api-Key", a.Client.APIKey)
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The AppSec API key check is performed after cloning potentially untrusted headers from the original request. If an attacker sends a request with a malicious X-Crowdsec-Appsec-Api-Key header, it will be included in the cloned headers (line 151) and then overwritten (line 166). While the final result is secure, this creates unnecessary processing.

Consider validating the API key earlier, before cloning headers:

// Ensure we have a valid API key first
if a.Client.APIKey == "" {
    a.logger.Error("AppSec API key is empty")
    return nil, fmt.Errorf("AppSec API key is not configured")
}

var httpReq *http.Request
var err error
// ... create request ...

// Copy original headers
if req.Headers != nil {
    httpReq.Header = maps.Clone(req.Headers)
}

This is a minor optimization that fails fast when misconfigured.

Copilot uses AI. Check for mistakes.
backend crowdsec-spoa
mode tcp
balance roundrobin
server s2 spoa:9000
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The removal of server s3 spoa:9001 is mentioned in the PR description as cleanup since port 9001 doesn't exist. However, this change appears in both haproxy.cfg and haproxy-upstreamproxy.cfg. While this is likely intentional cleanup, consider verifying that no other configuration references this server or that load balancing behavior doesn't unexpectedly change (though with balance roundrobin and only one server remaining, all traffic will go to s2).

Copilot uses AI. Check for mistakes.
}

// If remediation is ban/captcha we dont need to create a request to send to appsec unless always send is on
if r > remediation.Unknown && !matchedHost.AppSec.AlwaysSend {
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The condition r > remediation.Unknown will be true for Captcha and Ban remediation types. Based on the remediation type constants (Allow=0, Unknown=1, Captcha=2, Ban=3), this means when the initial remediation is Captcha, AppSec validation is skipped unless always_send is true. However, if a Captcha remediation becomes Allow after handleCaptchaRemediation (lines 327, 662), the condition r > remediation.Unknown would be false (since Allow=0), and AppSec validation would proceed. This appears to be intentional based on the comment, but the logic could be clearer. Consider using r >= remediation.Captcha to make the intent more explicit that you're checking for restrictive remediations (Captcha/Ban).

Suggested change
if r > remediation.Unknown && !matchedHost.AppSec.AlwaysSend {
if r >= remediation.Captcha && !matchedHost.AppSec.AlwaysSend {

Copilot uses AI. Check for mistakes.
kv := strings.SplitN(header, ":", 2)
if len(kv) != 2 {
return nil, fmt.Errorf("invalid header: %q", header)
// Skip lines without colon (might be continuation or malformed)
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The error handling was changed to silently continue when a header line doesn't contain a colon, but this could hide malformed headers. The original behavior of returning an error for invalid headers was more defensive. Consider at least logging a debug message when skipping malformed header lines to aid in debugging issues with HAProxy configuration:

if len(kv) != 2 {
    s.logger.WithField("header", header).Debug("Skipping malformed header line without colon separator")
    continue
}

Note: This comment assumes access to a logger, but since this is a standalone function, you may need to pass a logger parameter or use a package-level logger.

Suggested change
// Skip lines without colon (might be continuation or malformed)
// Skip lines without colon (might be continuation or malformed)
log.WithField("header", header).Debug("Skipping malformed header line without colon separator")

Copilot uses AI. Check for mistakes.
if version != nil {
appSecReq.Version = *version
}
if httpData.Body != nil {
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The body is being dereferenced and assigned directly to a []byte field, but httpData.Body is of type *[]byte. When httpData.Body is not nil but points to an empty slice, the current code assigns the slice correctly. However, the conditional check at line 384 (if httpData.Body != nil) should verify the actual slice length before including it in the AppSec request to avoid sending empty bodies unnecessarily. Consider:

if httpData.Body != nil && len(*httpData.Body) > 0 {
    appSecReq.Body = *httpData.Body
}

This ensures only non-empty bodies are sent to AppSec, potentially improving performance by avoiding unnecessary data transfer.

Suggested change
if httpData.Body != nil {
if httpData.Body != nil && len(*httpData.Body) > 0 {

Copilot uses AI. Check for mistakes.
appsec_config: crowdsecurity/appsec-default
labels:
type: appsec
listen_addr: 0.0.0.0:4241
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Port mismatch: The Vagrantfile configures AppSec to listen on port 4241 (line 110), but the Docker Compose files configure it to listen on port 7422 (docker-compose.yaml line 61, docker-compose.proxy-test.yaml line 69). This inconsistency could cause confusion when switching between test environments. Consider using the same port (7422) across all test environments for consistency, or document why different ports are used in different environments.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Will this replace the HAProxy bouncer?

2 participants