- Global Session Cap: New
server.max_sessionsconfig (default: 1000). Returns HTTP 503 withCapacityPayloadwhen the total concurrent session count across all IPs exceeds the cap. Logs a warning at 80% utilization. - Two-Tier TTL: New
server.session_unused_ttl_minutesconfig (default: 15). Sessions that never invoke a tool are reaped at the shorter TTL; sessions that have invoked a tool use the existingsession_ttl_minutes(default: 30). "Used" is tracked at the tool-invocation level, not the protocol-message level. - Lazy Workspace Allocation: Workspace directories are no longer created eagerly at session init. Bash tools already create them lazily per-operation, so sessions that never invoke bash tools incur zero filesystem I/O.
- Observability: Session close logs show count as percentage of cap. Reaper tick logs show used/unused breakdown.
- MCP OAuth 2.1: Full OAuth ceremonial flow so claude.ai (and other MCP clients) can authenticate against the public Pathfinder server. Anonymous OAuth — RFC-compliant endpoints (
/.well-known/oauth-protected-resource,/.well-known/oauth-authorization-server,/register,/authorize,/token) with auto-approve and JWT issuance. PKCE S256 required. - Dynamic Client Registration (RFC 7591): claude.ai and other MCP clients auto-register; no manual client provisioning
- HS256 JWT: Hand-rolled with
node:crypto— no new dependencies - Opportunistic bearer auth on
/mcp: Valid JWT accepted, invalid rejected, missing allowed (backward compatible) - Per-endpoint rate limiting: 10/min
/register, 30/min/authorizeand/tokenper IP - Required env var:
MCP_JWT_SECRET(32+ bytes) — production throws on startup if missing
- Analytics Filter UX: Tool type pills (All / Search / Explore) and source sub-filters for drilling into query data without losing the big picture
- Tool Counts Endpoint: New
/api/analytics/tool-countsfor query volume by tool type - Filterable API: All analytics endpoints accept
tool_typeandsourcequery params - Dev Mode Auth Bypass: Analytics dashboard and API skip token auth when
NODE_ENV=development - Backfill-Safe Stats: Avg result count and latency stats exclude reconstructed log data (
result_count=-1) - Always-On Logging: Query logging fires unconditionally —
analytics.enabledonly gates dashboard access - CI Fix: Slack webhook curl in all workflows wrapped in if-guards to prevent failures when
SLACK_WEBHOOKis unset
- Analytics Dashboard: Built-in query analytics with embedded Chart.js dashboard at
/analytics. Track query volume, latency (avg + p95), empty result rates, top queries, and queries by source - Query Logging: Fire-and-forget instrumentation in search and knowledge tool handlers. Logs query text, result count, top similarity score, latency, source, and session ID
- Analytics REST API: Three endpoints (
/api/analytics/summary,/queries,/empty-queries) with Bearer token authentication via config orANALYTICS_TOKENenv var - Analytics Config: New
analyticssection in pathfinder.yaml withenabled,log_queries,token, andretention_daysfields. Fully optional and backwards compatible - Auto-Cleanup: Old query_log rows automatically pruned during nightly reindex cycle based on
retention_days(default 90) - PGlite Compatible: p95 latency computed in application code (not SQL
percentile_cont) for PGlite compatibility
- Auto-Generate from URL:
pathfinder init --from <url>crawls a docs site and generates a working pathfinder.yaml - Site Crawler: Sitemap-first discovery with robots.txt and recursive link following fallbacks, rate limiting, SPA detection
- Config Generator: Auto-detects source type, derives base URL, generates complete YAML config
- PDF/DOCX Ingestion: New
documentsource type for indexing PDF and DOCX files with page-break and section-aware chunking - Content Extractors: Optional peer dependencies
pdf-parse(PDF) andmammoth(DOCX), dynamically imported only when configured - Scanned PDF Detection: Warns when a PDF produces very little text, indicating a scanned document
- Hybrid Search: Combine vector similarity and full-text keyword search using Reciprocal Rank Fusion (RRF). Three search modes:
vector(default, unchanged),keyword(tsvector-based full-text),hybrid(both + RRF merge) - Keyword Search Upgrade: Replaced ILIKE with PostgreSQL tsvector/tsquery for proper full-text search with ranking via ts_rank
- search_mode Config: New
search_modefield on search tools — set to'vector','keyword', or'hybrid'(defaults to'vector'for backwards compatibility) - tsvector Schema Migration: Adds
tsvtsvector column with GIN index for fast full-text search, with PGlite-safe trigger fallback
- Local Embedding Support: Drop the OpenAI dependency — use Ollama or local transformers.js for embeddings. Three providers: openai (existing), ollama (HTTP API), local (@xenova/transformers, zero external deps)
- EmbeddingProvider Interface: Abstracted embedding generation behind a provider interface with factory. Pluggable architecture for future providers
- Conditional API Keys: OPENAI_API_KEY only required when embedding.provider is "openai" — Ollama and local providers need no API keys
- Dimension Mismatch Detection: Startup check warns when configured dimensions don't match existing vector index
- Format all source files with prettier
- Make Slack webhook notifications non-fatal in CI
- Move CopilotKit deployment configs to deploy/ directory
- Rename fixture configs from mcp-docs to pathfinder
- Remove internal operations runbook from repo
- Clean up stale mcp-docs references in comments and scripts
- Add Additional Use Grant — explicitly permits hosting Pathfinder for your own or your organization's documentation
- Add LICENSING.md with plain-English license summary
- Notion Data Provider: Index Notion pages and database entries as searchable markdown documents. Recursive block-to-markdown conversion, database property serialization as YAML frontmatter, configurable page depth, self-throttled API client (340ms/req)
- Deleted Page Detection: Two-pass incremental acquire detects pages that were deleted, archived, or had integration access revoked — removes stale chunks automatically
- Improve test coverage from 53% to 75% lines across the project (1044 → 1698 tests)
- Add comprehensive tests for markdown, code, and raw-text chunkers, file provider, config, validation, search/knowledge tools, webhooks, schema, url-derivation, embeddings
- Update homepage hero text and docs site styling
- Add
getIndexedItemIdsquery for provider-level deletion detection
- Data Provider Abstraction: Refactored SourceIndexer into DataProvider interface + IndexingPipeline, enabling API-based sources alongside file-based sources
- Slack Data Provider: Index Slack threads as searchable Q&A knowledge via LLM distillation (gpt-4o-mini). Configurable channels, confidence threshold, emoji-triggered reindexing
- Discord Data Provider: Index Discord text channels (LLM distillation) and forum channels (direct Q&A extraction at confidence 1.0). Forum posts are inherently Q&A-shaped — no LLM needed
- FAQ Endpoint (
/faq.txt): Serves Q&A pairs from all FAQ-category sources, filtered by confidence at query time. Advertised via Link header - Knowledge MCP Tool: New
knowledgetool type with browse mode (full FAQ listing) and search mode (vector search scoped to FAQ sources) pathfinder validateCLI Command: Validates YAML config, checks env vars, probes source connectivity. Exit code 1 on errors- Q&A Chunker: Renamed from slack-specific to source-agnostic, registered for both slack and discord source types
- Fix bash filesystem empty after fresh deploy when DB already has current index state
- Fix pathfinder-docs Railway service missing deployment trigger
- Always refresh bash instances after startup index check completes
- Rename "Migrate" to "Switch" in docs nav (matches aimock)
- Update docs: config reference (Slack, Discord, Knowledge tool, jump links), usage (FAQ endpoint, validate command), deploy (new env vars), README
- Add HTML source type for indexing static HTML documentation sites (cheerio-based parser)
- Content container auto-detection (main, article, [role="main"], .content, #content)
- Heading-boundary chunking with headingPath tracking (h1-h3)
- Code block preservation, list formatting, table formatting in HTML extraction
- Add pathfinder-docs.yaml for dogfooding Pathfinder on its own documentation
- Generalize smoke test script for any Pathfinder instance
- Add mobile hamburger nav menu to all docs pages
- Simplify README to match aimock style, add npm metadata (repository, homepage, keywords)
- Fix Dockerfile to copy pathfinder.yaml for production deploy
- Fix schema migration: remove version index from generateSchema (was failing on existing databases)
- Rename project from mcp-docs to Pathfinder
- Add agentic retrieval: session state, vector grep, virtual files, related command, telemetry
- Add configurable bash tool options (grep_strategy, workspace, virtual_files)
- Add Pathfinder landing page and documentation site
- Add Mintlify migration tutorial
- Add GitHub Actions for Pages deployment, releases, and Docker publishing
- Add versioning infrastructure with CHANGELOG
- Semantic search over documentation and code via pgvector + OpenAI embeddings
- Bash tool filesystem exploration via just-bash virtual filesystem
- Feedback collection tools with YAML-defined schemas
- Config-driven via pathfinder.yaml
- Webhook-triggered reindexing from GitHub push events
- Nightly auto-reindex on configurable schedule
- Docker deployment support
- Elastic License 2.0