Implement improvements and fixes #89

Merged
alex-feel merged 27 commits into main from alex-feel-dev
Mar 18, 2026

Conversation

@alex-feel
Owner

No description provided.

…mmary support

Replace unreliable grep-based model existence checks with ollama show in both the Ollama sidecar entrypoint and all 6 Ollama healthchecks.
The grep approach matched partial model names and missed tag variants.

Add missing summary-openai build extra, SUMMARY_PROVIDER, and SUMMARY_MODEL to all 3 OpenAI docker-compose files so the container image includes the summary dependency and configures it at runtime.

Add Helm initContainer for auto-pulling Ollama models on startup with a new autoPull toggle in values.yaml.

Expand the FATAL CONFIGURATION ERROR message in docker-entrypoint.sh to list all possible exit code 78 causes including summary model, provider packages, API keys, and PostgreSQL issues.
…all tools

All four context tools (store_context, update_context, store_context_batch, update_context_batch) now use asyncio.gather(*tasks, return_exceptions=True) for parallel embedding and summary generation, with proper error inspection after gather completes.
Batch tools reuse generate_embeddings_with_timeout and generate_summary_with_timeout wrappers from context.py, eliminating ~250 lines of duplicated generation code.
Generation failures are now handled gracefully instead of crashing the request.
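
A minimal sketch of the gather-then-inspect pattern described above. The `fake_embedding` and `fake_summary` coroutines, the `with_timeout` helper, and the timeout values are hypothetical stand-ins, not the project's `generate_embeddings_with_timeout` or `generate_summary_with_timeout` wrappers.

```python
import asyncio


async def with_timeout(coro, timeout_s: float):
    """Bound a generation call so a slow provider cannot hang the request."""
    return await asyncio.wait_for(coro, timeout=timeout_s)


async def fake_embedding(text: str) -> list:
    # Hypothetical stand-in for a real embedding provider call.
    await asyncio.sleep(0)
    return [float(len(text))]


async def fake_summary(text: str) -> str:
    # Hypothetical stand-in that always fails, to exercise the error path.
    await asyncio.sleep(0)
    raise RuntimeError("summary provider unavailable")


async def generate_all(text: str) -> dict:
    tasks = [
        with_timeout(fake_embedding(text), 30.0),  # illustrative timeouts
        with_timeout(fake_summary(text), 30.0),
    ]
    # return_exceptions=True hands failures back as values instead of raising,
    # so one failed task does not discard the other task's result.
    embedding, summary = await asyncio.gather(*tasks, return_exceptions=True)

    errors = []
    if isinstance(embedding, BaseException):
        errors.append(f"embedding: {embedding}")
        embedding = None
    if isinstance(summary, BaseException):
        errors.append(f"summary: {summary}")
        summary = None
    return {"embedding": embedding, "summary": summary, "errors": errors}


result = asyncio.run(generate_all("hello"))
```

Here the embedding result survives even though the summary task raised, which is the behavior the commit message calls graceful handling.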
Increase EMBEDDING_TIMEOUT_S from 30s to 60s and SUMMARY_TIMEOUT_S from 30s to 120s to better accommodate slower hardware and larger models under typical production conditions.
Update settings, documentation, Helm values, and tests to reflect the new defaults.
The smaller qwen3:0.6b model provides acceptable summary quality while running significantly faster on local hardware with minimal resource requirements (2GB RAM vs 4GB for qwen3:1.7b).
This improves the out-of-the-box experience for local deployments.
Remove redundant prefixes from initialization log messages to match the established embedding provider log format convention.
Replace tenacity's built-in before_sleep_log with a custom _make_before_sleep_log factory in both embedding and summary retry modules.
The built-in callback relies on retry_state.fn which is always None when using the AsyncRetrying iteration pattern, causing log messages to display <unknown> instead of the operation name.
The custom callback uses the explicitly provided operation_name, producing clear retry log messages like "Retrying embedding in 1.5 seconds as it raised ConnectionError: ...".

Remove the dead __qualname__ workaround that attempted to fix this but never worked with the iteration pattern.
Replace obsolete __qualname__ tests with new tests that verify actual retry log output contains the expected operation name.
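
A rough sketch of what such a factory could look like. It is not the project's `_make_before_sleep_log`; it only assumes the callback receives an object exposing `outcome` (a future) and `next_action.sleep`, as tenacity's retry state does. The demo uses a standard-library `Future` and `SimpleNamespace` as a stand-in for that state so it runs without tenacity installed.

```python
import logging
from concurrent.futures import Future
from types import SimpleNamespace


def make_before_sleep_log(logger: logging.Logger, level: int, operation_name: str):
    """Build a before-sleep callback that logs an explicit operation name."""

    def log_it(retry_state) -> None:
        outcome = retry_state.outcome
        exc = outcome.exception() if outcome is not None else None
        if exc is not None:
            verb, value = "raised", f"{type(exc).__name__}: {exc}"
        else:
            verb, value = "returned", outcome.result() if outcome else None
        logger.log(
            level,
            "Retrying %s in %s seconds as it %s %s.",
            operation_name,  # never derived from retry_state.fn
            retry_state.next_action.sleep,
            verb,
            value,
        )

    return log_it


class ListHandler(logging.Handler):
    """Collect formatted messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("retry-demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Stand-in retry state: a failed attempt followed by a 1.5 second sleep.
outcome = Future()
outcome.set_exception(ConnectionError("refused"))
state = SimpleNamespace(outcome=outcome, next_action=SimpleNamespace(sleep=1.5))

make_before_sleep_log(logger, logging.WARNING, "embedding")(state)
```

Because the operation name is captured by the closure, the message does not depend on whether the retry loop wraps a function or iterates attempts.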
…TRUNCATION_LENGTH to 300

The previous defaults (300/150) left a blind spot where entries between 150 and 300 characters were truncated but never summarized, reducing search result informativeness.
The new defaults (500/300) ensure truncated previews adequately represent content up to 500 characters, reserving LLM summary generation for entries where a separate distillation adds genuine value.
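
To illustrate the blind spot, here is a small sketch. It assumes the first number in each pair is the minimum content length for summary generation and the second is the truncation length, and the boundary comparisons are guesses; the function name and return labels are hypothetical.

```python
def preview_mode(content_length: int, summary_min: int = 500, truncation: int = 300) -> str:
    """Classify how an entry of the given length would appear in search results."""
    if content_length <= truncation:
        return "full text"
    if content_length < summary_min:
        return "truncated preview"  # cut off, and too short to get a summary
    return "summary"


# A 200-character entry under the previous and the new defaults.
old_mode = preview_mode(200, summary_min=300, truncation=150)
new_mode = preview_mode(200)
```

Under the previous defaults the 200-character entry is cut off with no summary to compensate; under the new ones it is shown whole.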
Prevent silent context truncation during summary generation by mirroring the embedding provider's context validation pattern.

Key changes:
- Add summary context limits registry (app/summary/context_limits.py) with model specs for Ollama, OpenAI, and Anthropic models
- Add SUMMARY_OLLAMA_NUM_CTX (default 32768) and SUMMARY_OLLAMA_TRUNCATE (default false) settings for explicit context window control
- Add pre-validation in OllamaSummaryProvider that accounts for prompt overhead and output token reservation before calling the API
- Extract shared OLLAMA_HOST into OllamaSettings class to eliminate cross-feature dependency between embedding and summary settings
- Rename OLLAMA_NUM_CTX/OLLAMA_TRUNCATE to EMBEDDING_OLLAMA_NUM_CTX/EMBEDDING_OLLAMA_TRUNCATE for unambiguous per-feature configuration
- Correct upper bound for context length validators from 131072 to 2097152 to support models with extended context windows
- Refactor dependency check dispatch to uniform interface where all provider check functions accept the same (settings, ollama_host) signature
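
The pre-validation idea can be sketched as follows. This is an illustration under stated assumptions, not the `OllamaSummaryProvider` code: the four-characters-per-token estimate, the overhead and reservation figures, and every name here are hypothetical.

```python
class ContextOverflowError(ValueError):
    """Raised when the input cannot fit into the configured context window."""


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); a real provider would
    # use a model-specific estimate.
    return max(1, (len(text) + 3) // 4)


def validate_summary_input(
    text: str,
    num_ctx: int = 32768,
    prompt_overhead_tokens: int = 512,
    output_reserve_tokens: int = 1024,
    truncate: bool = False,
) -> str:
    """Return text that fits the window, or raise instead of truncating silently."""
    budget = num_ctx - prompt_overhead_tokens - output_reserve_tokens
    if budget <= 0:
        raise ContextOverflowError("num_ctx is too small for prompt and output")
    if estimate_tokens(text) <= budget:
        return text
    if truncate:
        return text[: budget * 4]  # explicit, opt-in truncation
    raise ContextOverflowError(
        f"input needs ~{estimate_tokens(text)} tokens but only {budget} are available"
    )


short_ok = validate_summary_input("a" * 100)
truncated = validate_summary_input("a" * 4000, num_ctx=2048, truncate=True)
```

The point of checking before the API call is that the failure is explicit: with truncation disabled, an oversized input raises rather than being cut off by the model server.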
Reduce first-request latency by pre-loading Ollama models into memory at server startup and keeping them loaded indefinitely.

Add prewarm_ollama_models() to app/startup that sends lightweight warm-up requests to Ollama for embedding (/api/embed) and summary (/api/chat with num_ctx) models during lifespan initialization.
Warm-up requests are deduplicated when the same model serves both roles.
Failures are logged as warnings without blocking startup.

Set OLLAMA_KEEP_ALIVE=-1 in all Docker Compose files and Helm deployment template so models remain in memory after loading.

Add commented-out NVIDIA GPU passthrough sections to all Docker Compose files and Helm values for easy enablement.
Replace full text_content comparisons with hash-based checks in all deduplication code paths to reduce network overhead, especially for PostgreSQL deployments where full text was transferred over the wire.

Key changes:
- Add content_hash column via idempotent migration (SQLite and PostgreSQL)
- Add composite index (thread_id, source, content_hash) for fast lookups
- Compute SHA-256 hash in store, update, and pre-check methods
- NULL fallback ensures backward compatibility for pre-migration rows
- Hash is internal only (excluded from API responses)
- 25 dedicated tests covering all code paths and edge cases
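
A compact sketch of hash-based deduplication with a NULL fallback, using an in-memory SQLite database. The table name, index name, and the exact duplicate rule (any matching row in the same thread and source) are assumptions made for illustration; the repository's real queries may differ.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE context_entries ("
    "id INTEGER PRIMARY KEY, thread_id TEXT NOT NULL, source TEXT NOT NULL, "
    "text_content TEXT NOT NULL, content_hash TEXT)"
)
# IF NOT EXISTS keeps the statement idempotent across repeated startups.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_thread_source_hash "
    "ON context_entries (thread_id, source, content_hash)"
)


def is_duplicate(conn: sqlite3.Connection, thread_id: str, source: str, text: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM context_entries WHERE thread_id = ? AND source = ? "
        # Rows written before the migration have no hash, so compare their text.
        "AND (content_hash = ? OR (content_hash IS NULL AND text_content = ?)) "
        "LIMIT 1",
        (thread_id, source, content_hash(text), text),
    ).fetchone()
    return row is not None


def store(conn: sqlite3.Connection, thread_id: str, source: str, text: str) -> bool:
    """Insert the entry unless it is a duplicate; return True when inserted."""
    if is_duplicate(conn, thread_id, source, text):
        return False
    conn.execute(
        "INSERT INTO context_entries (thread_id, source, text_content, content_hash) "
        "VALUES (?, ?, ?, ?)",
        (thread_id, source, text, content_hash(text)),
    )
    return True


# A pre-migration row without a hash.
conn.execute(
    "INSERT INTO context_entries (thread_id, source, text_content, content_hash) "
    "VALUES ('t1', 'agent', 'old report', NULL)"
)
first = store(conn, "t1", "user", "hello")
second = store(conn, "t1", "user", "hello")
legacy = store(conn, "t1", "agent", "old report")
```

For hashed rows only the 64-character digest is compared, so the full text never needs to be read back for the check.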
Remove _initialize_schema() from PostgreSQLBackend to consolidate all schema initialization into init_database() as the single source of truth.
The redundant dual-path execution contributed to server crashes when the first path failed and prevented reaching the migration step.

Add transaction-level advisory lock (pg_advisory_xact_lock) to both PostgreSQL paths in init_database() for multi-pod safety, replacing the session-level lock that required explicit unlock in a finally block.
Add per-statement debug/error logging for schema execution diagnostics.
Delete store_contexts_batch() and update_contexts_batch() from ContextRepository along with their TestBatchHashDedup test class.
Both methods have zero callers since MCP batch tools delegate to single-entry methods (store_with_deduplication, update_context_entry).
…are FastMCP middleware

The `deserialize_json_param` function was called at individual tool call sites to work around MCP clients serializing list/dict parameters as JSON strings.
This approach required manual maintenance per tool and missed parameters outside `store_context`.

Replace with `JsonStringDeserializerMiddleware` that uses the FastMCP 3.1.x Middleware API to intercept all tool calls.
At startup, it builds a static schema map from each tool's JSON Schema to identify parameters expecting array/object types.
At runtime, only those parameters are candidates for deserialization, leaving string parameters untouched.
It also handles double-encoded values, where a JSON string itself wraps a JSON-encoded array or object.

Delete `deserialize_json_param` from `app/startup/validation.py`, its 3 call sites in `app/tools/context.py`, its re-export from `app/server.py`, and the associated test files `test_json_parameter_handling.py` and `test_json_string_handling.py`.
Add 55 unit tests and 4 integration tests for the new middleware.
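
The core of the approach, stripped of the FastMCP Middleware plumbing, might look like the sketch below. The function names, the `anyOf` handling, and the two-level decoding limit are assumptions for illustration and are not taken from `JsonStringDeserializerMiddleware`.

```python
import json
from typing import Any, Dict, Set


def expects_container(prop: Dict[str, Any]) -> bool:
    """True when a JSON Schema property accepts an array or an object."""
    types = prop.get("type")
    if isinstance(types, str):
        types = [types]
    if types and any(t in ("array", "object") for t in types):
        return True
    # Optional parameters are commonly expressed as anyOf: [<type>, null].
    return any(expects_container(sub) for sub in prop.get("anyOf", []))


def build_schema_map(tool_schemas: Dict[str, Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each tool name to the parameters that may need deserialization."""
    return {
        tool: {name for name, prop in schema.get("properties", {}).items()
               if expects_container(prop)}
        for tool, schema in tool_schemas.items()
    }


def deserialize_arguments(tool: str, arguments: Dict[str, Any],
                          schema_map: Dict[str, Set[str]], max_depth: int = 2) -> Dict[str, Any]:
    result = dict(arguments)
    for name in schema_map.get(tool, set()):
        if name not in result:
            continue
        value = result[name]
        # Decode up to max_depth times to unwrap double-encoded strings.
        for _ in range(max_depth):
            if not isinstance(value, str):
                break
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                break
        if isinstance(value, (list, dict)):
            result[name] = value  # only replace when decoding produced a container
    return result


schemas = {
    "store_context": {
        "properties": {
            "text": {"type": "string"},
            "tags": {"anyOf": [{"type": "array"}, {"type": "null"}]},
            "metadata": {"type": "object"},
        }
    }
}
schema_map = build_schema_map(schemas)
fixed = deserialize_arguments(
    "store_context",
    {
        "text": '["looks", "like", "json"]',       # string parameter: left alone
        "tags": json.dumps(json.dumps(["a", "b"])),  # double-encoded list
        "metadata": '{"k": 1}',
    },
    schema_map,
)
```

Restricting candidates to schema-declared array and object parameters is what keeps a string parameter containing JSON-looking text from being converted by accident.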
…ters in batch tool messages

Both store_context_batch and update_context_batch reported "summaries generated" and "embeddings generated" based on whether providers were configured, not whether generation actually occurred.
This caused misleading messages when all entries had short text (below SUMMARY_MIN_CONTENT_LENGTH) or when update_context_batch entries had no text changes.

Track actual generation counts during the per-entry processing loop and use those counters in the response message construction.
Standardize update_context_batch verb from "summaries generated" to "summaries regenerated" matching the single-tool convention.
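
A simplified sketch of counting actual generation instead of checking provider configuration. The message wording, the function name, and the length threshold are hypothetical, and real generation calls are replaced by counter increments.

```python
from typing import List


def build_batch_message(entries: List[str], embedding_enabled: bool,
                        summary_enabled: bool, summary_min_length: int = 500) -> str:
    stored = 0
    embeddings_generated = 0
    summaries_generated = 0
    for text in entries:
        stored += 1
        if embedding_enabled:
            embeddings_generated += 1  # counted only where generation runs
        if summary_enabled and len(text) >= summary_min_length:
            summaries_generated += 1   # short entries are skipped, not counted

    parts = [f"Stored {stored} entries"]
    if embeddings_generated:
        parts.append(f"{embeddings_generated} embeddings generated")
    if summaries_generated:
        parts.append(f"{summaries_generated} summaries generated")
    return ", ".join(parts)


short_only = build_batch_message(["short", "also short"], True, True)
mixed = build_batch_message(["short", "x" * 600], True, True)
```

With only short entries, the message no longer mentions summaries even though a summary provider is configured.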
…to 240s

EMBEDDING_TIMEOUT_S raised from 60s to 240s and SUMMARY_TIMEOUT_S from 120s to 240s.
Updated across settings source of truth, documentation, Helm values, and tests.
The middleware registration step (step 22) in lifespan calls mcp.list_tools(run_middleware=False) which requires an awaitable.
Add AsyncMock for list_tools to both test_lifespan_initializes_and_shuts_down_summary_provider and test_lifespan_summary_disabled to match the pattern used in test_server_initialization.py.
…k reranking

Move FlashRank model loading from lazy initialization (first rerank call) to eager loading during initialize(), eliminating cold-start latency on the first search request.
Add asyncio.Lock with double-checked locking to _ensure_ranker() to prevent concurrent coroutines from triggering duplicate model downloads.
The existing _ensure_ranker() call in rerank() now serves as a defense-in-depth safety net.
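
The locking pattern can be sketched as below. `LazyRanker` and `_load_model` are hypothetical stand-ins for the FlashRank provider and its model download; only the double-checked `asyncio.Lock` structure is the point.

```python
import asyncio


class LazyRanker:
    def __init__(self) -> None:
        self._ranker = None
        self._lock = asyncio.Lock()
        self.load_count = 0

    async def _load_model(self) -> object:
        # Stand-in for an expensive model download or load.
        self.load_count += 1
        await asyncio.sleep(0.01)
        return object()

    async def _ensure_ranker(self) -> object:
        if self._ranker is not None:      # first check: lock-free fast path
            return self._ranker
        async with self._lock:
            if self._ranker is None:      # second check: another coroutine may have won
                self._ranker = await self._load_model()
        return self._ranker

    async def initialize(self) -> None:
        await self._ensure_ranker()       # eager load at startup

    async def rerank(self, docs: list) -> list:
        await self._ensure_ranker()       # safety net if initialize() was skipped
        return sorted(docs)


async def main() -> int:
    provider = LazyRanker()
    # Five concurrent callers race to load the model.
    await asyncio.gather(*(provider._ensure_ranker() for _ in range(5)))
    return provider.load_count


load_count = asyncio.run(main())
```

Without the second check inside the lock, every coroutine that was already waiting on the lock would load the model again once it acquired it.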
…rt retries

Classify exceptions in PostgreSQLBackend.initialize() as DependencyError (exit 69, retryable) or ConfigurationError (exit 78, non-retryable) to enable proper Docker/Kubernetes restart policy behavior.
DependencyError: OSError, ConnectionDoesNotExistError, InterfaceError, TooManyConnectionsError.
ConfigurationError: InvalidPasswordError, InvalidCatalogNameError.
Unknown exceptions default to DependencyError.
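
A sketch of the classification rule. The three database exception classes below are local stand-ins named after the asyncpg exceptions listed above, so the example runs without asyncpg; the `classify` function name is hypothetical.

```python
class DependencyError(Exception):
    exit_code = 69  # retryable: restart policies should try again


class ConfigurationError(Exception):
    exit_code = 78  # non-retryable: restarting cannot fix it


# Stand-ins for the asyncpg exception types (assumption: the real code
# imports these from asyncpg).
class InvalidPasswordError(Exception):
    pass


class InvalidCatalogNameError(Exception):
    pass


class TooManyConnectionsError(Exception):
    pass


NON_RETRYABLE = (InvalidPasswordError, InvalidCatalogNameError)


def classify(exc: BaseException) -> Exception:
    """Wrap a backend initialization failure in the matching error type."""
    if isinstance(exc, NON_RETRYABLE):
        return ConfigurationError(str(exc))
    # Known transient errors and anything unrecognized are treated as retryable.
    return DependencyError(str(exc))


codes = [
    classify(OSError("connection refused")).exit_code,
    classify(TooManyConnectionsError("pool exhausted")).exit_code,
    classify(InvalidPasswordError("bad password")).exit_code,
    classify(ValueError("unexpected")).exit_code,
]
```

Defaulting unknown exceptions to the retryable type errs on the side of recovery, which is why a bounded restart policy matters alongside it.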

Limit Docker Compose restart policy from unbounded on-failure to on-failure:5 across all 9 compose files to prevent infinite restart loops on non-recoverable errors.
…and summary generation

More retries improve resilience against transient provider failures (network timeouts, rate limits, temporary service unavailability) without requiring users to override defaults via environment variables.
…ariables

Transform hardcoded user-tuneable variables in all 9 Docker Compose files to ${VAR:-default} syntax, enabling .env overrides without editing compose files.
Configurable variables: LOG_LEVEL, EMBEDDING_MODEL, EMBEDDING_DIM, EMBEDDING_PROVIDER, SUMMARY_MODEL, SUMMARY_PROVIDER, and PostgreSQL credentials (POSTGRESQL_USER, POSTGRESQL_PASSWORD, POSTGRESQL_DATABASE).
Ollama sidecar model names use the same interpolation to stay in sync.

Also adds SUMMARY_PROVIDER to 6 Ollama compose files where it was absent.
Updates PostgreSQL healthcheck to use $${POSTGRES_USER}/$${POSTGRES_DB} runtime expansion.
Updates 6 .env example files with model/provider configuration sections.
Updates CLAUDE.md Docker-Compose Environment Variable Policy.
Summary prompts are now dynamically selected based on the source field of the context entry being summarized.
The model sees instructions tailored to the specific source type and never sees instructions for the other source type.

User messages get a prompt focused on capturing intent, requirements, constraints, and directives.
Agent reports get a prompt focused on key findings, decisions, deliverables, and technical specifics.

resolve_summary_prompt() now accepts source instead of SummarySettings and is called per-summarization-call rather than at provider init time.
check_entry_exists() returns tuple[bool, str | None] to propagate the source field through the pipeline.
All three summary providers, all tool call sites, and all tests are updated accordingly.
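
A minimal sketch of per-source prompt selection. The prompt texts, the `"user"` and `"agent"` source values, and the `custom_prompt` parameter are assumptions for illustration; the real `resolve_summary_prompt()` signature may differ.

```python
from typing import Optional

# Hypothetical prompt texts; the project's actual prompts are not shown here.
USER_PROMPT = (
    "Summarize the user's message, capturing intent, requirements, "
    "constraints, and directives."
)
AGENT_PROMPT = (
    "Summarize the agent's report, capturing key findings, decisions, "
    "deliverables, and technical specifics."
)
DEFAULT_PROMPTS = {"user": USER_PROMPT, "agent": AGENT_PROMPT}


def resolve_summary_prompt(source: str, custom_prompt: Optional[str] = None) -> str:
    """Pick the prompt for one summarization call based on the entry's source."""
    if custom_prompt:
        return custom_prompt  # a custom prompt overrides both defaults
    try:
        return DEFAULT_PROMPTS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


user_prompt = resolve_summary_prompt("user")
agent_prompt = resolve_summary_prompt("agent")
custom = resolve_summary_prompt("agent", custom_prompt="Summarize in one sentence.")
```

Resolving the prompt per call rather than at provider initialization is what lets one provider instance serve both source types.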
Add --no-editable flag to uv sync so the package is installed into site-packages instead of as an editable .pth reference, eliminating the double /app/app/ path in containers.
Remove the now-unnecessary COPY of application code into the runtime stage and clean up dead PROJECT_ROOT/BASE_DIR constants from settings.
Move all environment variable documentation from README.md into a dedicated docs/environment-variables.md reference covering all 128 variables from app/settings.py with types, defaults, constraints, and descriptions organized by settings category.
Replace the README.md environment configuration section with a brief description and link to the new reference, matching the pattern used by other README sections.
Add maintenance notes in CLAUDE.md requiring docs/environment-variables.md to be updated alongside server.json when settings change.
Update SUMMARY_PROMPT descriptions in environment-variables.md and summary-generation.md to reflect that a custom prompt overrides both source-specific default prompts introduced in commit 5c80bea.
Missing Ollama models are now automatically downloaded during server startup dependency checks, eliminating the need for manual ollama pull commands before first run.

Add OLLAMA_AUTO_PULL (default: true) and OLLAMA_PULL_TIMEOUT_S (default: 900s, range: 30-3600) settings to OllamaSettings for controlling the auto-pull behavior.

Extract shared _check_ollama_model() helper from duplicated embedding and summary dependency check functions, reducing ~140 lines of code duplication while adding the auto-pull logic with post-pull verification.
Direct agents to discover and apply user-authored Skills that relate to context storage, retrieval, and preservation.
Agents check Skill descriptions for relevance and read ambiguous ones in full before applying them to extend the default server instructions.
@github-actions

Coverage report


File / Lines missing
  app
  server.py 228, 279, 333, 373, 511, 541-542
  settings.py
  app/backends
  postgresql_backend.py 338, 532-534
  app/embeddings
  retry.py 150, 155
  app/embeddings/providers
  langchain_ollama.py
  app/middleware
  __init__.py
  json_string_deserializer.py
  app/migrations
  __init__.py
  content_hash.py 58-70
  dependencies.py 71-72, 85
  app/repositories
  context_repository.py 183-224, 234-250, 308-328, 918-928, 962-963, 991-1000, 1246
  app/reranking/providers
  flashrank.py 70-71
  app/startup
  __init__.py 210-218, 238-252, 267-275, 290-305
  validation.py
  app/summary
  base.py
  context_limits.py
  instructions.py
  retry.py 140, 145
  app/summary/providers
  langchain_anthropic.py
  langchain_ollama.py
  langchain_openai.py
  app/tools
  batch.py 97, 579, 712, 870-871
  context.py
Project Total  

This report was generated by python-coverage-comment-action

@alex-feel alex-feel merged commit 2fd4876 into main Mar 18, 2026
6 checks passed
@alex-feel alex-feel deleted the alex-feel-dev branch March 18, 2026 23:21