
[feature] Network Security Controls for Scrapling MCP #293

@ICO-Project-team

Have you searched if there is an existing feature request for this?

  • I have searched the existing requests

Feature description


Title: Add Network Security Controls (allowed-origins, blocked-origins, allowed-hosts, isolated mode) to Scrapling MCP Server


Description:

Scrapling MCP is a powerful web scraping tool with excellent built-in features such as prompt-injection protection, ad blocking, and Cloudflare bypass. However, it currently lacks the network security controls that are essential for production deployments, especially when integrated with AI platforms like Open WebUI, Claude Desktop, or other MCP-compatible clients.

Without these controls, the browser instance can access any URL, including internal networks, cloud metadata endpoints, and sensitive infrastructure. This creates significant risks in enterprise environments, multi-tenant deployments, or scenarios where the MCP server is exposed over HTTP.


Security Risks Without Network Controls

  1. Internal Network Access: Through prompt injection, an attacker could instruct the scraper to access internal services at 10.x.x.x, 192.168.x.x, or 172.16-31.x.x
  2. Cloud Metadata Theft: Access to 169.254.169.254 can leak AWS/GCP/Azure credentials and IAM roles
  3. SSRF Attacks: Server-Side Request Forgery by scraping internal dashboards, APIs, or configuration endpoints
  4. Unauthorized Service Discovery: Scanning and scraping internal management interfaces

Proposed Features

1. --allowed-origins (Whitelist Mode)

Restrict the browser to only access explicitly permitted domains or patterns.

scrapling mcp --allowed-origins "https://example.com,https://docs.example.com"

Behavior:

  • Only requests to matching origins are allowed
  • All other requests are aborted automatically
  • Supports glob patterns: https://*.example.com

Reference Implementation: Microsoft Playwright MCP already implements this via the --allowed-origins flag.
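
A minimal sketch of the allowlist check, assuming patterns are fnmatch-style globs compared against the request's origin (scheme://host[:port], default ports omitted). The function names are hypothetical, not existing Scrapling code. In a Playwright-driven browser, a predicate like this could be called from a page.route("**/*", handler) callback that calls route.abort() for disallowed URLs and route.continue_() otherwise.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Ports browsers omit from an origin when they are the scheme default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_patterns(flag_value: str) -> list:
    """Split a comma-separated flag value into trimmed, lowercased patterns."""
    return [p.strip().lower() for p in flag_value.split(",") if p.strip()]


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port], dropping path, query and credentials."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if ":" in host:  # IPv6 literal: restore the brackets used in URL form
        host = f"[{host}]"
    origin = f"{scheme}://{host}"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        origin += f":{parts.port}"
    return origin


def is_origin_allowed(url: str, allowed: list) -> bool:
    """Allowlist mode: True only if the request's origin matches some pattern."""
    origin = origin_of(url)
    return any(fnmatchcase(origin, pattern) for pattern in allowed)


allowed = parse_patterns("https://example.com, https://*.example.com")
print(is_origin_allowed("https://docs.example.com/guide", allowed))
print(is_origin_allowed("http://169.254.169.254/latest/meta-data/", allowed))
```

Note that https://*.example.com does not match the bare https://example.com, which is why both patterns are listed, and that a lookalike such as https://example.com.evil.net is rejected because the glob must match the whole origin.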


2. --blocked-origins (Blacklist Mode)

Block specific origins while allowing all others.

scrapling mcp --blocked-origins "10.*,192.168.*,172.16.*,169.254.*,*.internal,*.local"

Behavior:

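  • Requests to matching origins are blocked
  • Useful when you need broad access but want to exclude dangerous ranges
  • Note that a glob such as 172.16.* covers only 172.16.0.0/16; the full private block 172.16.0.0/12 spans 172.16.* through 172.31.*, so CIDR support (as in the default list below) is more reliable than string globs for IP ranges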

Common default blocked patterns should include:

10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
169.254.0.0/16  (cloud metadata)
127.0.0.0/8
localhost
*.internal
*.local
*.lan
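
A sketch of how these defaults could be enforced, assuming the CIDR blocks and hostname globs above are checked against the request URL's host. It resolves hostnames and tests every returned address, because a string pattern alone would not catch a public-looking hostname that resolves to 10.x.x.x. Names are hypothetical. A check at this layer can still be raced by DNS rebinding (the browser may resolve again when it connects), which is one reason the proxy option below is a useful complement.

```python
import ipaddress
import socket
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Defaults from the proposed list, expressed as CIDR blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",   # 172.16.x.x through 172.31.x.x
        "192.168.0.0/16",
        "169.254.0.0/16",  # link-local, includes cloud metadata
        "127.0.0.0/8",
    )
]
BLOCKED_HOST_PATTERNS = ["localhost", "*.internal", "*.local", "*.lan"]


def host_addresses(host: str) -> list:
    """IP literal -> [that address]; hostname -> every address DNS returns."""
    try:
        return [ipaddress.ip_address(host)]
    except ValueError:
        pass
    try:
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return []
    # sockaddr[0] is the address string; drop any IPv6 zone suffix like %eth0
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def is_url_blocked(url: str) -> bool:
    """Blocklist mode: True if the host matches a blocked pattern or any
    address it resolves to falls inside a blocked network."""
    host = urlsplit(url).hostname or ""
    if not host:
        return True  # nothing to check: fail closed
    if any(fnmatchcase(host, pattern) for pattern in BLOCKED_HOST_PATTERNS):
        return True
    addresses = host_addresses(host)
    if not addresses:
        return True  # unresolvable: fail closed rather than let the browser try
    for addr in addresses:
        # Unwrap IPv4-mapped IPv6 (::ffff:10.0.0.1) so it hits the IPv4 ranges.
        if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
            addr = addr.ipv4_mapped
        if any(addr in net for net in BLOCKED_NETWORKS):
            return True
    return False
```

For example, is_url_blocked("http://172.20.1.5/") returns True even though a 172.16.* glob would have let it through.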

3. --allowed-hosts (Server-Level Host Verification)

Restrict which clients can connect to the MCP server (especially important for HTTP mode).

scrapling mcp --http --host 0.0.0.0 --port 8000 \
  --allowed-hosts "localhost,127.0.0.1,10.0.0.0/8"

Behavior:

  • Validates the Host header or client IP against the whitelist
  • Prevents unauthorized remote connections to the MCP server
  • Provides DNS rebinding protection by rejecting requests whose Host header names an unlisted host

Currently, HTTP mode binds to 0.0.0.0 by default with no client verification.
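
A sketch of the --allowed-hosts matching, assuming entries are either hostnames or IPs/CIDR blocks as in the example above. The same check can be applied to the Host header (after stripping the port, to stop DNS rebinding) and to the connecting peer's IP (to stop unauthorized remote clients); whether a server requires one or both is a policy choice the proposal leaves open. Function names are hypothetical. If the HTTP transport happens to be built on Starlette, its TrustedHostMiddleware already covers the Host-header half.

```python
import ipaddress


def parse_allowed_hosts(flag_value: str):
    """Split e.g. "localhost,127.0.0.1,10.0.0.0/8" into names and networks."""
    names, networks = set(), []
    for entry in (e.strip() for e in flag_value.split(",")):
        if not entry:
            continue
        try:
            # strict=False lets a bare IP become a /32 (or /128) network
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            names.add(entry.lower())
    return names, networks


def strip_port(host_header: str) -> str:
    """'localhost:8000' -> 'localhost'; '[::1]:8000' -> '::1'."""
    host = host_header.strip().lower()
    if host.startswith("["):
        end = host.find("]")
        return host[1:end] if end != -1 else ""
    return host.split(":", 1)[0]


def is_allowed(value: str, names: set, networks: list) -> bool:
    """True if value (a hostname or an IP) is on the allowed-hosts list."""
    if value in names:
        return True
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # a hostname that is not listed
    return any(addr in net for net in networks)
```

A request arriving with "Host: attacker.example:8000" fails this check even if DNS currently points attacker.example at the server, which is the essence of the rebinding defense.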


4. --isolated Mode

Run the browser with a fresh, ephemeral profile that is discarded on exit.

scrapling mcp --isolated

Behavior:

  • No persistent cookies, cache, or storage between sessions
  • Prevents data leakage between different user sessions
  • Mitigates cookie/session hijacking if the MCP server is compromised
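
A sketch of the isolated profile lifecycle. With Playwright, browser.new_context() already gives an in-memory context that is discarded on close; if a persistent profile directory is needed (for example for browser_type.launch_persistent_context(user_data_dir, ...)), pointing it at a throwaway directory like the one below gives the same property. The helper name and prefix are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_profile(prefix: str = "scrapling-mcp-") -> Iterator[Path]:
    """Create a fresh profile directory per session and delete it on exit,
    even if the session raises."""
    profile_dir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield profile_dir
    finally:
        # ignore_errors: a browser may still hold lock files briefly
        shutil.rmtree(profile_dir, ignore_errors=True)


with ephemeral_profile() as profile:
    # A browser would write cookies, cache and storage here.
    (profile / "Cookies").write_text("session=abc")
print(profile.exists())
```

Because every session gets its own directory, one user's cookies are never visible to the next session.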

5. Optional: --proxy-server and --proxy-bypass

Support routing browser traffic through an upstream proxy for centralized filtering.

scrapling mcp --proxy-server "http://squid:3128" \
  --proxy-bypass "localhost,<local>"

This allows organizations to enforce network policies at the proxy layer rather than relying solely on application-level controls.
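
A sketch of translating the two proposed flags into the dictionary Playwright's proxy launch option accepts (keys server, bypass, username, password). It moves any user:pass@ credentials out of the server URL. Requiring an explicit scheme and limiting it to http, https and socks5 are assumptions of this sketch; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def build_proxy_settings(server: str, bypass: Optional[str] = None) -> dict:
    """Turn --proxy-server / --proxy-bypass values into a proxy settings dict."""
    parts = urlsplit(server)
    if parts.scheme not in SUPPORTED_SCHEMES or not parts.hostname:
        raise ValueError(f"unsupported proxy server: {server!r}")
    # Keep host[:port] but move any user:pass@ out of the server string.
    host_port = parts.netloc.rsplit("@", 1)[-1]
    settings = {"server": f"{parts.scheme}://{host_port}"}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    if bypass:
        # Normalize "localhost, <local>" to "localhost,<local>"
        entries = [e.strip() for e in bypass.split(",") if e.strip()]
        if entries:
            settings["bypass"] = ",".join(entries)
    return settings


print(build_proxy_settings("http://squid:3128", "localhost,<local>"))
```

The resulting dict could then be passed as proxy=... when launching the browser, so all traffic, including redirects the application-level checks never see, passes through the organization's proxy.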


Suggested Configuration File Support

In addition to CLI flags, support a configuration file for easier deployment:

{
  "network": {
    "allowedOrigins": [
      "https://example.com",
      "https://www.example.com"
    ],
    "blockedOrigins": [
      "10.*",
      "192.168.*",
      "172.16.*",
      "169.254.*",
      "*.internal",
      "*.local"
    ]
  },
  "server": {
    "allowedHosts": ["localhost", "127.0.0.1"]
  },
  "browser": {
    "isolated": true,
    "proxy": {
      "server": "http://proxy:3128",
      "bypass": "<local>"
    }
  }
}

Used via:

scrapling mcp --config mcp-security.json
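
A sketch of loading the suggested file, assuming the camelCase keys shown above and a precedence rule (an assumption, not part of the proposal) where CLI flags override file values. The class and function names are hypothetical; the file would be read with Path("mcp-security.json").read_text() and passed to load_config.

```python
import json
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class SecurityConfig:
    allowed_origins: List[str] = field(default_factory=list)
    blocked_origins: List[str] = field(default_factory=list)
    allowed_hosts: List[str] = field(default_factory=list)
    isolated: bool = False
    proxy_server: Optional[str] = None
    proxy_bypass: Optional[str] = None


def load_config(text: str) -> SecurityConfig:
    """Map the nested JSON layout onto one flat config object."""
    raw = json.loads(text)
    network = raw.get("network", {})
    browser = raw.get("browser", {})
    proxy = browser.get("proxy", {})
    return SecurityConfig(
        allowed_origins=list(network.get("allowedOrigins", [])),
        blocked_origins=list(network.get("blockedOrigins", [])),
        allowed_hosts=list(raw.get("server", {}).get("allowedHosts", [])),
        isolated=bool(browser.get("isolated", False)),
        proxy_server=proxy.get("server"),
        proxy_bypass=proxy.get("bypass"),
    )


def apply_cli_overrides(config: SecurityConfig, **flags) -> SecurityConfig:
    """CLI flags win over the file; None means the flag was not given."""
    return replace(config, **{k: v for k, v in flags.items() if v is not None})
```

Keeping the file and the flags on one config object means every enforcement point reads a single source of truth.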

Why This Matters

Scenario | Risk Level | Impact
-- | -- | --
Prompt injection attack | High | Attacker redirects scraper to internal services
Multi-tenant AI platform | High | One tenant accesses another's internal network
Cloud metadata exposure | Critical | IAM credentials leaked, full account compromise
Enterprise deployment | Medium | Policy violations, data exfiltration

Happy to discuss implementation approaches or contribute to a PR if helpful. This would significantly improve the security posture of Scrapling MCP for production and enterprise deployments.

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
