Enterprise-grade web search for AI, with compliance boundaries built in.
Palena is a single-binary MCP server that turns the noisy public web into LLM-ready context (searched, scraped, de-PII'd, reranked, and hash-chained for audit) so your agents can browse without tripping every compliance officer on the floor.
Built for fintech, healthtech, and govtech teams who cannot send raw third-party HTML directly to a foundation model, but still need their AI to have fresh, cited, reproducible knowledge from the live internet.
Palena: Hawaiian for boundary, limit, border.
| Problem | What most "web search" tools do | What Palena does |
|---|---|---|
| Raw pages leak PII into prompts | Ship HTML as-is | Presidio audit/redact/block before the LLM sees it |
| Bot-protected sites return nothing | Fail silently | Tiered L0→L1→L2 escalation (readability → headless → stealth+proxy) |
| "We scraped this" has no receipts | No trace | Three-stage SHA-256 hash chain per document |
| Results are ranked by a generic search engine | Trust the top 10 | Pluggable reranker (KServe GPU, FlashRank CPU, RankLLM, or none) |
| Compliance only gets shown a demo | Hope for the best | Domain allow/blocklists, robots.txt enforcement, per-domain rate limits, OTel traces on every span |
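
The tiered escalation in the table can be pictured as a loop over fetchers, from cheapest to most expensive. The following is a minimal sketch, not Palena's actual Go implementation: the `Fetcher` type, `scrape_with_escalation`, and the `min_chars` threshold are hypothetical names chosen for illustration, and the real escalation criteria are not documented in this README.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ScrapeResult:
    level: int    # index of the tier that produced the content (0, 1 or 2)
    content: str


# Hypothetical fetcher shape: returns extracted text, or None when the page
# came back empty or blocked.
Fetcher = Callable[[str], Optional[str]]


def scrape_with_escalation(
    url: str,
    tiers: Sequence[Fetcher],
    min_chars: int = 200,  # assumed "looks like real content" threshold
) -> Optional[ScrapeResult]:
    """Try each tier in order and stop at the first one that yields enough text."""
    for level, fetch in enumerate(tiers):
        try:
            text = fetch(url)
        except Exception:
            text = None  # a failing tier means "escalate", not a hard error
        if text and len(text) >= min_chars:
            return ScrapeResult(level=level, content=text)
    return None  # every tier failed or returned too little text
```

With stub fetchers for L0, L1 and L2, the returned `level` plays the same role as the `scraper_level` field in the tool output.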
MCP tools/call web_search
│
▼
┌────────────────────────────────────────────────────────────┐
│  1. SEARCH   SearXNG metasearch                            │
│              query expansion · engine routing · dedup      │
├────────────────────────────────────────────────────────────┤
│  2. POLICY   domain allow/block · robots.txt · rate limit  │
├────────────────────────────────────────────────────────────┤
│  3. SCRAPE   L0  HTTP + readability          (fast path)   │
│              L1  Playwright headless         (needs JS)    │
│              L2  Playwright + stealth/proxy  (anti-bot)    │
├────────────────────────────────────────────────────────────┤
│  4. PII      Microsoft Presidio · audit / redact / block   │
├────────────────────────────────────────────────────────────┤
│  5. RERANK   FlashRank · KServe cross-encoder · RankLLM    │
├────────────────────────────────────────────────────────────┤
│  6. FORMAT   markdown · citations · SHA-256 provenance     │
└────────────────────────────────────────────────────────────┘
│
▼
MCP tool_result
Every stage is an OpenTelemetry span. Every document gets three hashes (raw HTML → extracted markdown → final delivered content). The audit trail is ready to ship to ClickHouse out of the box.
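
To make the three-hash idea concrete, here is a minimal sketch of one way such a chain could be built and re-verified. It assumes that each later stage hashes its own content together with the previous stage's hash; this README does not say how Palena links the stages, so treat `build_chain`, `verify_chain`, and the `Provenance` record as illustrative names only.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Provenance:
    raw_hash: str        # SHA-256 of the raw HTML bytes
    extracted_hash: str  # covers raw_hash + extracted markdown
    final_hash: str      # covers extracted_hash + delivered content


def build_chain(raw_html: bytes, markdown: str, delivered: str) -> Provenance:
    raw_hash = sha256_hex(raw_html)
    # Assumption: chaining is done by prefixing the previous hash to the
    # next stage's content before hashing.
    extracted_hash = sha256_hex(raw_hash.encode() + b"\n" + markdown.encode("utf-8"))
    final_hash = sha256_hex(extracted_hash.encode() + b"\n" + delivered.encode("utf-8"))
    return Provenance(raw_hash, extracted_hash, final_hash)


def verify_chain(p: Provenance, raw_html: bytes, markdown: str, delivered: str) -> bool:
    """Recompute the chain from the stored artifacts and compare all three hashes."""
    return build_chain(raw_html, markdown, delivered) == p
```

Under this scheme, changing any stage's content changes its hash and every hash after it, which is what lets an auditor detect tampering from the final hash alone.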
Real output from the MCP server answering "latest AI hot topics 2026" (category: news):
{
  "query": "latest AI hot topics 2026",
  "result_count": 3,
  "reranker_used": "flashrank",
  "pii_mode": "audit",
  "pii_checked": true,
  "search_engines": ["google news", "duckduckgo", "bing news"],
  "total_duration_ms": 4414,
  "results": [
    { "title": "OpenClaw Exposes the Real Cybersecurity Risks of Agentic AI",
      "score": 0.984, "scraper_level": 0,
      "content_hash": "c3b4d690…" },
    { "title": "Comprehensive AI Conference List for 2026: Dates, Locations, and Keynotes",
      "score": 0.973, "scraper_level": 0,
      "content_hash": "7af1205e…" },
    { "title": "Can AI infrastructure costs be a value driver? - Oracle AI World 2026",
      "score": 0.002, "scraper_level": 0,
      "content_hash": "9e48c311…" }
  ]
}

Note how FlashRank pushes the substantive pieces to the top (0.98 / 0.97) and demotes the vendor-conference writeup that happens to contain the right keywords (0.002), a signal the stock SearXNG ranking does not surface.
docker compose -f deploy/docker-compose.yml up --build

Brings up Palena + SearXNG + Presidio Analyzer + Presidio Anonymizer + Playwright + FlashRank.

docker compose -f deploy/docker-compose.minimal.yml up --build

Palena + SearXNG only. ~200 MB image, no browser, no PII, no reranking.

# Sidecar health
curl http://localhost:8080/health
# → {"status":"ok","sidecars":{"searxng":"ok","presidio-analyzer":"ok",...}}
# List the exposed tool over MCP Streamable HTTP
SESS=$(curl -sS -X POST http://localhost:8080/mcp \
-H 'content-type: application/json' \
-H 'accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"curl","version":"0.1"}}}' \
-D - -o /dev/null | awk '/Mcp-Session-Id/ {print $2}' | tr -d '\r')
curl -sS -X POST http://localhost:8080/mcp \
-H 'content-type: application/json' \
-H 'accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESS" \
-d '{"jsonrpc":"2.0","method":"notifications/initialized"}'
curl -sS -X POST http://localhost:8080/mcp \
-H 'content-type: application/json' \
-H 'accept: application/json, text/event-stream' \
-H "Mcp-Session-Id: $SESS" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'

# librechat.yaml
mcpServers:
  palena:
    type: streamableHttp # or: sse
    url: http://palena:8080/mcp

{
  "mcpServers": {
    "palena": { "url": "http://localhost:8080/sse" }
  }
}

Any MCP-compatible client works: connect to /sse (legacy event stream) or /mcp (Streamable HTTP).

{
  "name": "web_search",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "The search query" },
      "category": { "type": "string", "description": "general | news | code | science (default general)" },
      "language": { "type": "string", "description": "ISO code: en, de, fr… (default en)" },
      "timeRange": { "type": "string", "description": "day | week | month | year" },
      "maxResults": { "type": "integer", "description": "1–20 (default 5)" }
    },
    "required": ["query"]
  }
}

The response is formatted markdown with numbered results, relevance scores, source citations, and structured metadata (PII status, reranker, engines hit, total latency).
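
As a rough illustration of how a client might call this tool, the sketch below builds the JSON-RPC `tools/call` body for `web_search` and checks the arguments against the limits stated in the schema before anything is sent. The helper name `build_web_search_call` and the choice to validate client-side are assumptions for this example; the server may apply its own validation.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_CATEGORIES = {"general", "news", "code", "science"}
ALLOWED_TIME_RANGES = {"day", "week", "month", "year"}


def build_web_search_call(
    query: str,
    request_id: int = 1,
    category: Optional[str] = None,
    language: Optional[str] = None,
    time_range: Optional[str] = None,
    max_results: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a JSON-RPC 2.0 tools/call request body for the web_search tool."""
    if not query.strip():
        raise ValueError("query is required")
    args: Dict[str, Any] = {"query": query}
    if category is not None:
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        args["category"] = category
    if language is not None:
        args["language"] = language
    if time_range is not None:
        if time_range not in ALLOWED_TIME_RANGES:
            raise ValueError(f"unknown timeRange: {time_range}")
        args["timeRange"] = time_range
    if max_results is not None:
        if not 1 <= max_results <= 20:
            raise ValueError("maxResults must be between 1 and 20")
        args["maxResults"] = max_results
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "web_search", "arguments": args},
    }


if __name__ == "__main__":
    body = build_web_search_call("latest AI hot topics 2026", category="news", max_results=3)
    print(json.dumps(body, indent=2))
```

The printed body can be sent with `POST /mcp` after the initialize handshake, using the same `Mcp-Session-Id` header as in the curl example.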
Start from the annotated example:
cp config/palena.example.yaml config/palena.yaml

| Section | Controls | Key env override |
|---|---|---|
| search | SearXNG URL, engines per category, default language | PALENA_SEARCH_SEARXNG_URL |
| scraper | Concurrency, timeouts, Playwright WS endpoint, proxy pool | PALENA_SCRAPER_PLAYWRIGHT_ENDPOINT |
| pii | Mode (audit / redact / block), Presidio URLs, entities | PALENA_PII_MODE |
| reranker | Provider (kserve / flashrank / rankllm / none) | PALENA_RERANKER_PROVIDER |
| policy | Domain allow/blocklists, robots.txt cache, rate limits | PALENA_POLICY_DOMAIN_MODE |
| provenance | Hash chain, ClickHouse export | PALENA_PROVENANCE_ENABLED |
| otel | Trace + metric exporters | PALENA_OTEL_ENABLED |
Full annotated reference: config/palena.example.yaml · deep-dive: Configuration guide.
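
The override names in the table suggest a `PALENA_<SECTION>_<KEY>` pattern. The sketch below shows how such overrides could be layered on top of a parsed YAML config. It is an assumption-heavy illustration: the YAML key names (`searxng_url`, `mode`), the `apply_env_overrides` helper, and the rule that everything after the section name maps to a single lowercase key are all inferred from the variable names, not taken from Palena's source.

```python
import copy
from typing import Any, Dict, Mapping

# Section names as listed in the configuration table.
SECTIONS = ("search", "scraper", "pii", "reranker", "policy", "provenance", "otel")


def apply_env_overrides(
    config: Mapping[str, Any],
    environ: Mapping[str, str],
    prefix: str = "PALENA_",
) -> Dict[str, Any]:
    """Return a copy of config with PALENA_<SECTION>_<KEY> variables applied."""
    merged = copy.deepcopy(dict(config))
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # PALENA_PII_MODE -> section "pii", key "mode" (assumed mapping)
        section, _, key = name[len(prefix):].lower().partition("_")
        if section in SECTIONS and key:
            merged.setdefault(section, {})[key] = value  # values stay strings here
    return merged
```

For example, calling `apply_env_overrides(cfg, os.environ)` with `PALENA_PII_MODE=redact` set would replace `cfg["pii"]["mode"]` in the returned copy while leaving the original mapping untouched.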
Palena is a Go orchestrator. Every external capability runs as its own container and is optional except SearXNG.
| Sidecar | Image | Protocol | Required | Purpose |
|---|---|---|---|---|
| SearXNG | searxng/searxng | HTTP JSON | Yes | Metasearch aggregation |
| Presidio Analyzer | mcr.microsoft.com/presidio-analyzer | HTTP JSON | No | PII entity detection |
| Presidio Anonymizer | mcr.microsoft.com/presidio-anonymizer | HTTP JSON | No | PII masking / replacement |
| Playwright | mcr.microsoft.com/playwright | Playwright WS | No | JS-rendered page extraction (L1/L2) |
| FlashRank | Flask + ONNX (bundled) | HTTP JSON | No | CPU cross-encoder reranking |
| KServe | Your own InferenceService | HTTP JSON | No | GPU cross-encoder reranking |
Missing sidecars trigger graceful degradation, not failure: L1/L2 scraping is disabled, PII is reported as "not checked," and the reranker falls back to search-engine order.
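
The degradation rules above can be read as a pure function from sidecar health to effective capabilities. The sketch below assumes a status map shaped like the `sidecars` object returned by `/health`; the keys `searxng` and `presidio-analyzer` appear in that sample output, while `playwright` and `flashrank` are guessed key names, and `effective_capabilities` is a hypothetical helper rather than part of Palena.

```python
from typing import Dict, Mapping


def effective_capabilities(
    sidecars: Mapping[str, str],
    configured_reranker: str = "flashrank",
) -> Dict[str, object]:
    """Derive what the pipeline can still do from per-sidecar health statuses."""

    def up(name: str) -> bool:
        return sidecars.get(name) == "ok"

    if not up("searxng"):
        # SearXNG is the only required sidecar, so this is a hard failure.
        raise RuntimeError("SearXNG is required and not reachable")

    return {
        # Without Playwright only the L0 HTTP + readability path remains.
        "max_scraper_level": 2 if up("playwright") else 0,
        # Without Presidio the result is reported as not checked.
        "pii_checked": up("presidio-analyzer"),
        # Without a reachable reranker, results keep search-engine order.
        "reranker_used": configured_reranker if up(configured_reranker) else "none",
    }
```

Run against the minimal profile (`{"searxng": "ok"}`), this yields scraper level 0, `pii_checked` false, and reranker `none`, matching the behaviour described above.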
| Profile | Components | Footprint |
|---|---|---|
| Minimal | Palena + SearXNG | ~200 MB |
| Standard | + Presidio + Playwright + FlashRank | ~2–3 GB |
| Enterprise | + KServe GPU reranker (e.g. mxbai-rerank) | Variable (GPU) |
# Standard
helm install palena deploy/helm/palena/
# Minimal: disable all optional sidecars
helm install palena deploy/helm/palena/ \
  --set presidio.enabled=false \
  --set playwright.enabled=false \
  --set flashrank.enabled=false
# With your own GPU reranker
helm install palena deploy/helm/palena/ \
  --set reranker.provider=kserve \
  --set reranker.endpoint=http://mxbai-rerank.kserve.svc:8080

| Method | Path | Description |
|---|---|---|
| GET | /sse | MCP SSE transport (legacy event stream) |
| POST | /mcp | MCP Streamable HTTP transport |
| GET | /health | Sidecar reachability probe |
| GET | /metrics | Prometheus metrics (when OTel metrics enabled) |
palena-websearch-mcp/
├── cmd/palena/        # Binary entry point, config + server wiring
├── internal/
│   ├── config/        # YAML parsing, validation, env overrides
│   ├── search/        # SearXNG client, query expansion, dedup
│   ├── scraper/       # L0 readability · L1/L2 Playwright · stealth · proxy pool
│   ├── pii/           # Presidio client, policy modes, PII-free audit records
│   ├── reranker/      # Pluggable interface · KServe · FlashRank · RankLLM · no-op
│   ├── policy/        # Domain filter, robots.txt, per-domain rate limit
│   ├── output/        # Markdown formatting, provenance hash chain
│   ├── transport/     # MCP SSE + Streamable HTTP, tool handler
│   └── otel/          # Tracing + metrics setup
├── config/            # Default + annotated example YAML
├── deploy/            # Dockerfile, Compose (full + minimal), Helm chart
└── docs/              # User documentation
Full user documentation lives in docs/.
| Topic | Where to read |
|---|---|
| Start here | Getting Started · Concepts · FAQ |
| Using the server | Tool Reference · Integrations · Configuration |
| Pipeline stages | Scraping · PII & Compliance · Reranking · Provenance |
| Running in production | Deployment · Observability |
- Core: Go 1.26+, single static binary
- Search: SearXNG (self-hosted metasearch)
- Extraction: go-shiori/go-readability (L0), playwright-community/playwright-go against Microsoft's official Playwright image (L1/L2)
- PII: Microsoft Presidio Analyzer + Anonymizer
- Reranking: Mixedbread mxbai-rerank via KServe · FlashRank ONNX/CPU · RankLLM against any inference endpoint
- Observability: OpenTelemetry traces + Prometheus-compatible metrics
- Transport: MCP SSE, MCP Streamable HTTP
- Config: YAML + environment variable overrides
- Deploy: Docker Compose (dev), Helm chart (Kubernetes / OpenShift)
Copyright © 2026 bitkaio LLC.
Licensed under the Apache License, Version 2.0. You may use, modify, and redistribute Palena under the terms of that license. For commercial support, custom reranker models, or production SLAs, contact bitkaio LLC.