## [0.9.0] - 2026-04-18
Initial public release of the Palena MCP Server.
### Added

- MCP server with SSE (`/sse`) and Streamable HTTP (`/mcp`) transports, plus a standalone REST API (`/api/v1/search`)
- `web_search` MCP tool with input parameters: `query`, `category` (general/news/code/science), `language`, `timeRange`, `maxResults`
- SearXNG search integration with category-based engine routing, query expansion, and URL deduplication
- Tiered content extraction via Playwright
  - L0: plain HTTP GET with go-readability for server-rendered pages
  - L1: headless Chromium via Playwright for JavaScript-rendered pages
  - L2: stealth mode with `navigator.webdriver` override, viewport/UA randomization, and proxy rotation for bot-protected pages
  - Automatic escalation: L0 -> L1 (if content detection flags JS rendering) -> L2 (if bot-blocked)
  - Graceful degradation when the Playwright sidecar is unavailable
- PII detection and redaction via Microsoft Presidio
  - Three modes: audit (detect and log), redact (detect and anonymize), block (reject high-density PII documents)
  - Configurable entity types, anonymization strategies, and density thresholds
  - Audit records that never contain actual PII values
  - Graceful degradation when Presidio is unavailable
- Prompt-injection defense via a Hugging Face Text Embeddings Inference (TEI) sidecar serving `deepset/deberta-v3-base-injection`
  - Three modes: audit (detect and log), annotate (wrap suspicious chunks in `<untrusted-content>` markers), block (drop documents containing any over-threshold chunk)
  - Per-paragraph chunked scoring catches short malicious paragraphs hidden inside otherwise legitimate pages
  - Pluggable model: swap `deepset/deberta-v3-base-injection` for any Hugging Face `SequenceClassification` model (e.g. a fine-tuned successor on the same `microsoft/deberta-v3-base` backbone) by changing `injection.predictURL` and the sidecar `--model-id` only
  - Configurable injection-label name (`injection.injectionLabel`) so fine-tuned models with different label conventions work without code changes
  - Audit records that never contain chunk text: only counts, max/mean scores, and over-threshold counts
  - Graceful degradation when the TEI sidecar is unreachable
  - Documentation: `docs/prompt-injection.md`
- Pluggable reranker subsystem
  - KServe provider for GPU cross-encoder models (mxbai-rerank)
  - FlashRank provider for CPU ONNX models with Flask sidecar
  - RankLLM provider for LLM-as-reranker via any inference endpoint
  - Noop provider to skip reranking and preserve search engine order
- Domain policy with allow/blocklists and robots.txt enforcement, evaluated before scraping
- Content provenance
  - Three-stage SHA-256 hash chain: raw HTML, extracted markdown, final content
  - Structured provenance records emitted via `slog`
  - Optional batched ClickHouse export for audit trail storage
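A minimal sketch of a three-stage hash chain. Whether the actual implementation chains each stage's hash into the next (as here) or hashes each artifact independently is not stated in these notes, and the field names are illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Provenance holds a three-stage hash chain: each stage hash covers the
// previous stage's hash plus the new content, so tampering with an
// earlier stage invalidates every later hash.
type Provenance struct {
	RawHash      string // SHA-256 over raw HTML
	MarkdownHash string // SHA-256 over raw hash + extracted markdown
	FinalHash    string // SHA-256 over markdown hash + final content
}

// stageHash hashes the previous stage's hex digest followed by this
// stage's content.
func stageHash(prev string, content []byte) string {
	h := sha256.New()
	h.Write([]byte(prev))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// Chain computes the full three-stage chain for one document.
func Chain(rawHTML, markdown, final []byte) Provenance {
	var p Provenance
	p.RawHash = stageHash("", rawHTML)
	p.MarkdownHash = stageHash(p.RawHash, markdown)
	p.FinalHash = stageHash(p.MarkdownHash, final)
	return p
}
```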
- OpenTelemetry instrumentation
  - Distributed tracing with spans for each pipeline stage (search, scrape, PII, injection, rerank, pipeline)
  - Prometheus-compatible metrics: counters (requests, errors, PII entities) and histograms (duration, content length)
  - Configurable exporters: OTLP gRPC, stdout, Prometheus, or disabled
- Proxy pool with round-robin rotation and cooldown-on-failure for L2 extraction
- YAML configuration with environment variable overrides (`PALENA_*` pattern) and built-in defaults
- Health endpoint (`/health`) with sidecar reachability checks
- Docker deployment
  - Multi-stage Dockerfile producing a runtime image that bundles the Playwright driver subprocess
  - Full-stack Docker Compose with all sidecars (SearXNG, Presidio, Playwright, injection-guard, FlashRank)
  - Minimal Docker Compose with Palena + SearXNG only
  - FlashRank sidecar Dockerfile and Flask server
  - Pre-configured SearXNG settings with JSON format enabled
- Helm chart for Kubernetes/OpenShift with per-sidecar toggles (presidio, playwright, injection-guard, flashrank), ConfigMap-based configuration, and health probes
- Annotated example configuration (`config/palena.example.yaml`) documenting every option
- Subsystem documentation covering architecture, search, scraper, PII, prompt-injection, reranker, MCP transport, configuration, and provenance
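The `PALENA_*` override pattern maps environment variables onto config keys. A sketch under the assumption that one underscore separates each key level; the exact mapping rules may differ, so consult the configuration documentation:

```go
package main

import "strings"

// envToKey converts a PALENA_* environment variable name into a dotted
// config key, assuming one underscore per nesting level
// (e.g. PALENA_INJECTION_MODE -> injection.mode). This convention is an
// assumption for illustration, not the verified mapping.
func envToKey(env string) (string, bool) {
	rest, ok := strings.CutPrefix(env, "PALENA_")
	if !ok || rest == "" {
		return "", false
	}
	return strings.ToLower(strings.ReplaceAll(rest, "_", ".")), true
}
```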
### Known issues

- Injection-guard throughput on long pages is limited by an upstream TEI bug. The released TEI v1.9 Docker image has a DeBERTa-v2/v3 batching defect: multi-input forward passes fail with `broadcast_mul` shape mismatches. Palena works around it by serializing classifier calls (one HTTP request per chunk), which keeps the classifier correct but means a 70-chunk document spends roughly a minute inside the injection stage. Upstream fix: huggingface/text-embeddings-inference#846, expected in TEI v1.10.0. When the image is bumped, raise `predictConcurrency` in `internal/injection/tei.go` to restore parallelism.
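Serializing classifier calls amounts to running them through a concurrency limiter with capacity 1; raising the capacity later restores parallelism without restructuring the call path. A generic sketch, not the actual code in `internal/injection/tei.go`:

```go
package main

// limiter caps how many classifier calls run concurrently. With
// capacity 1 it serializes requests (the current workaround for the TEI
// batching bug); a larger capacity restores parallelism once the
// upstream fix lands.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// Do runs f while holding one slot, blocking if the limiter is full.
func (l limiter) Do(f func()) {
	l <- struct{}{}        // acquire slot
	defer func() { <-l }() // release slot
	f()
}
```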
### Container image

`ghcr.io/palenaai/palena-websearch-mcp:0.9.0`

- Digest: `sha256:3b1ab427a11c525be937335a374bf7f16dbecc0b4f683974c7c74f879fd4d417`
### Supply-chain artifacts

- CycloneDX SBOM: `sbom.cdx.json` (attached + cosign attestation)
- SPDX SBOM: `sbom.spdx.json` (attached)
- Trivy HIGH/CRITICAL scan report: `trivy-report.json` (attached)
- Image signature: Sigstore keyless (cosign, GitHub OIDC)
- SLSA provenance: Level 3 (attached via slsa-github-generator)
Verify the image signature:

```shell
cosign verify \
  --certificate-identity-regexp 'https://github.com/PalenaAI/palena-websearch-mcp/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  ghcr.io/palenaai/palena-websearch-mcp@sha256:3b1ab427a11c525be937335a374bf7f16dbecc0b4f683974c7c74f879fd4d417
```