Sarah-Salah
diff --git a/.claude/settings.json b/.claude/settings.json
Lines changed: 13 additions & 0 deletions
diff --git a/.cursor/rules/graph-nodes.mdc b/.cursor/rules/graph-nodes.mdc
Lines changed: 18 additions & 59 deletions
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
Lines changed: 3 additions & 3 deletions
diff --git a/.dockerignore b/.dockerignore
Lines changed: 41 additions & 18 deletions
diff --git a/.env.example b/.env.example
Lines changed: 92 additions & 7 deletions
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml
Lines changed: 1 addition & 0 deletions
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
Lines changed: 2 additions & 2 deletions
@@ -0,0 +1,13 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(uv run *)",
+      "Bash(python *)",
+      "Bash(python3 *)",
+      "Bash(python3*)",
+      "PowerShell(uv run *)",
+      "PowerShell(python *)",
+      "PowerShell(python3 *)"
+    ]
+  }
+}
@@ -1,70 +1,29 @@
 ---
-description: LangGraph pipeline architecture and node development
+description: Investigation pipeline architecture and stage development
 globs:
-  - app/graph_pipeline.py
-  - app/routing.py
-  - app/state.py
-  - app/nodes/**
+    - app/pipeline/**
+    - app/agent/**
+    - app/state/**
+    - app/delivery/**
 ---
 
-# Graph & Node Development
+# Investigation pipeline
 
-## Pipeline Architecture
+## Coordinator
 
-The agent is a LangGraph `StateGraph` over `AgentState` (a `TypedDict` in `app/state.py`).
+`app/pipeline/pipeline.py` runs **resolve integrations → extract alert → investigation agent → deliver**.
 
-### Investigation Flow
-```
-inject_auth → extract_alert → resolve_integrations → plan_actions → investigate → diagnose
-                                                          ↑                          │
-                                                          └── (loop if recommendations) ──┘
-                                                                                     │
-                                                                                  publish → END
-```
+`app/pipeline/runners.py` exposes `run_investigation`, `run_chat`, and async streaming helpers.
 
-### Chat Flow
-```
-inject_auth → router → chat_agent ⇄ tool_executor → END
-                     → general → END
-```
+## Key packages
 
-## Key Files
+- `app/agent/context.py`, `extract.py`, `investigation.py`, `chat.py`
+- `app/delivery/` for publishing and integrations
+- `app/state/agent_state.py` for `AgentState` / `InvestigationState`
 
-- `app/graph_pipeline.py` — `build_graph()` wires nodes and edges
-- `app/routing.py` — conditional edge functions (`route_by_mode`, `route_after_extract`, etc.)
-- `app/state.py` — `AgentState` TypedDict + `AgentStateModel` Pydantic validator
-- `app/nodes/__init__.py` — barrel exports for all node functions
+## Conventions
 
-## Writing a Node
-
-1. Create a subpackage under `app/nodes/` (e.g., `app/nodes/my_step/`)
-2. Implement the node function in `node.py`
-3. Re-export from the subpackage `__init__.py`
-4. Register in `app/nodes/__init__.py` and wire into `graph_pipeline.py`
-
-### Node function pattern:
-
-```python
-from langsmith import traceable
-from app.output import get_tracker
-from app.state import InvestigationState
-
-@traceable(name="node_my_step")
-def node_my_step(state: InvestigationState) -> dict:
-    tracker = get_tracker()
-    tracker.start("my_step", "Doing something")
-    # ... read from state, do work ...
-    tracker.complete("my_step", fields_updated=["evidence"], message="Done")
-    return {"field": value}  # partial state update dict
-```
-
-## Rules
-
-- Nodes receive full state, return a **partial dict** of fields to update
-- Always edit `routing.py` alongside `graph_pipeline.py` when adding conditional edges
-- New state keys go in `AgentState` (TypedDict) AND `AgentStateModel` (Pydantic) in `state.py`
-- Use `@traceable(name="node_xxx")` for LangSmith tracing on all node functions
-- Use `get_tracker()` for CLI progress output
-- Use `InvestigateInput.from_state(state)` in investigation nodes to extract typed inputs
-- The investigation loop is capped at 5 total iterations (see `should_continue_investigation`)
-- `InvestigationState` is an alias for `AgentState`
+- Prefer `@traceable` from `app.utils.tracing` on externally visible orchestration helpers.
+- Stages read full state and return **partial dict** updates.
+- Use `get_tracker()` for CLI progress when appropriate.
+- Keep new persisted keys in the state `TypedDict` and any matching validators.
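The conventions above describe stages that read the full state and return partial dict updates, which the coordinator in `app/pipeline/pipeline.py` applies in order. A minimal sketch of that contract, assuming a toy state: the keys, the stage bodies, and `run_stages` are hypothetical stand-ins rather than the project's code, and `@traceable` and `get_tracker()` are left out.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical slice of the pipeline state; the real one lives in app/state/."""

    raw_alert: dict
    alert_name: str
    evidence: list[str]
    report: str


# A stage reads the full state and returns only the keys it updates.
Stage = Callable[[AgentState], dict]


def extract_alert(state: AgentState) -> dict:
    alert = state.get("raw_alert", {})
    return {"alert_name": alert.get("title", "unknown alert")}


def investigate(state: AgentState) -> dict:
    return {"evidence": [f"checked recent deploys for {state['alert_name']}"]}


def deliver(state: AgentState) -> dict:
    return {"report": f"{state['alert_name']}: {len(state['evidence'])} finding(s)"}


def run_stages(initial: AgentState, stages: list[Stage]) -> AgentState:
    """Run stages in order, merging each partial update into a copy of the state."""
    state: AgentState = {**initial}  # copy so the caller's dict is not mutated
    for stage in stages:
        state.update(stage(state))
    return state


final = run_stages({"raw_alert": {"title": "HighErrorRate"}}, [extract_alert, investigate, deliver])
print(final["report"])  # HighErrorRate: 1 finding(s)
```

Returning partial dicts keeps each stage's ownership of state keys explicit, and the coordinator stays a simple merge loop.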
@@ -20,11 +20,11 @@
     "PYTHONUTF8": "1"
   },
   "forwardPorts": [
-    2024
+    8000
   ],
   "portsAttributes": {
-    "2024": {
-      "label": "LangGraph dev server",
+    "8000": {
+      "label": "OpenSRE health app",
       "onAutoForward": "notify"
     }
   },
 
@@ -1,19 +1,42 @@
-.git
-.cursor
+# Ignore node_modules and other dependency directories
+node_modules
+bower_components
+vendor
+
+# Ignore logs and temporary files
+*.log
+*.tmp
+*.swp
+
+# Ignore .env files and other environment files
 .env
-.DS_Store
-**/.DS_Store
-**/__pycache__
-**/.pytest_cache
-**/.mypy_cache
-**/.ruff_cache
-**/.venv
-**/.venv-devcontainer
-**/node_modules
-**/cdk.out
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/certifi
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/charset_normalizer
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/idna
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/requests
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/urllib3
-tests/test_case_upstream_lambda/pipeline_code/api_ingester/*.dist-info
+.env.*
+*.local
+
+# Ignore git-related files
+.git
+.gitignore
+
+# Ignore Docker-related files and configs
+.dockerignore
+docker-compose.yml
+
+# Ignore build and cache directories
+dist
+build
+.cache
+**/__pycache__
+
+# Ignore IDE and editor configurations
+.vscode
+.idea
+*.sublime-project
+*.sublime-workspace
+**/.DS_Store
+
+# Ignore test and coverage files
+coverage
+*.coverage
+*.test.js
+*.spec.js
+tests
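Docker matches `.dockerignore` patterns relative to the build-context root, treats `#` as a comment only in the first column, and lets a later `!` line re-include a path. The sketch below approximates those rules in Python to sanity-check entries like the ones in this file. It is not Docker's matcher (that is Go code), the helper names `_pattern_to_regex` and `is_ignored` are hypothetical, and character classes and backslash escapes are not handled.

```python
from __future__ import annotations

import re


def _pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate one .dockerignore pattern into a compiled regex (approximation).

    `*` and `?` never cross `/`, a `**/` segment matches zero or more
    directories, and a trailing `**` matches everything below.
    """
    pattern = pattern.strip("/")  # leading/trailing slashes don't change the match
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_ignored(path: str, lines: list[str]) -> bool:
    """Return True if `path` (relative to the build context root) is excluded."""
    segments = path.strip("/").split("/")
    # A pattern can match the path itself or any of its parent directories.
    candidates = ["/".join(segments[: n + 1]) for n in range(len(segments))]
    ignored = False
    for line in lines:
        # Only a '#' in the first column starts a comment; there are no inline comments.
        if line.startswith("#") or not line.strip():
            continue
        pattern = line.strip()
        negate = pattern.startswith("!")
        regex = _pattern_to_regex(pattern[1:] if negate else pattern)
        if any(regex.fullmatch(c) for c in candidates):
            ignored = not negate  # the last matching line wins
    return ignored
```

Under these rules an inline comment such as `.DS_Store  # macOS-specific` is part of the pattern and never matches `.DS_Store`, and a bare `__pycache__` matches only at the context root, so nested packages need `**/__pycache__`. Note also that `.env.*` matches `.env.example`.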
@@ -6,20 +6,62 @@
 #   3. Configure one integration with `opensre integrations setup <service>`.
 #   4. Verify with `opensre health` and `opensre integrations verify <service>`.
 #
-# `~/.tracer/integrations.json` is the preferred local integration store.
+# `~/.config/opensre/integrations.json` is the preferred local integration store.
 # The env vars below are still supported as fallback/direct configuration.
 
 # --- Most important ---------------------------------------------------------
 
 # Provider used for LLM calls. Common values: anthropic, openai, openrouter,
-# gemini, nvidia, codex.
+# gemini, nvidia, minimax, bedrock, ollama, codex, claude-code, opencode, kimi,
+# copilot.
 LLM_PROVIDER=anthropic
 
 # Codex CLI works for `opensre investigate` after `codex login`.
 # Leave CODEX_MODEL empty to use the CLI's currently configured model.
 CODEX_MODEL=
 CODEX_BIN=
 
+# Claude Code CLI works for `opensre investigate` after `claude login` or setting ANTHROPIC_API_KEY.
+# Install: npm i -g @anthropic-ai/claude-code
+# Leave CLAUDE_CODE_MODEL empty to use the CLI's currently configured model.
+CLAUDE_CODE_MODEL=
+CLAUDE_CODE_BIN=
+
+# Gemini CLI works for `opensre investigate` after `gemini` auth setup.
+# Install: npm i -g @google/gemini-cli
+# Leave GEMINI_CLI_MODEL empty to use the CLI's configured default model.
+GEMINI_CLI_MODEL=
+GEMINI_CLI_BIN=
+# OpenCode CLI works for `opensre investigate` after `opencode auth login`.
+# Leave OPENCODE_MODEL empty to use the CLI's currently configured model.
+OPENCODE_MODEL=
+OPENCODE_BIN=
+
+# Cursor Agent CLI
+# Leave CURSOR_MODEL empty to use the CLI's currently configured model.
+CURSOR_MODEL=
+CURSOR_BIN=
+
+# Kimi Code CLI works for `opensre investigate` after `kimi login`.
+# Leave KIMI_MODEL empty to use the CLI's currently configured model.
+KIMI_MODEL=
+KIMI_BIN=
+KIMI_API_KEY=
+# KIMI_SHARE_DIR=~/.kimi
+
+# GitHub Copilot CLI works for `opensre investigate` after running `copilot`
+# and authenticating with the interactive `/login` slash command.
+# Install: npm i -g @github/copilot
+# Leave COPILOT_MODEL empty to use the CLI's currently configured model.
+COPILOT_MODEL=
+COPILOT_BIN=
+# Optional auth fallbacks (only used when no stored Copilot CLI login exists):
+# COPILOT_GITHUB_TOKEN=
+# GH_TOKEN=
+# GITHUB_TOKEN=
+# Optional config dir override (default: ~/.copilot):
+# COPILOT_HOME=
+
 # Set the key for the provider you choose above.
 ANTHROPIC_API_KEY=
 ANTHROPIC_REASONING_MODEL=
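Each CLI provider block above pairs a `<PREFIX>_MODEL` that may be left empty (use the CLI's configured model) with a `<PREFIX>_BIN` override. A minimal sketch of resolving those two variables, assuming blank values mean "unset" and assuming default binary names inferred from the login commands in the comments; `resolve_cli_settings`, `CliSettings`, and `DEFAULT_BINARIES` are hypothetical, not OpenSRE's code.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical table: env-var prefix -> default binary name, inferred from
# the `codex login`, `claude login`, `kimi login`, ... comments.
DEFAULT_BINARIES = {
    "CODEX": "codex",
    "CLAUDE_CODE": "claude",
    "GEMINI_CLI": "gemini",
    "OPENCODE": "opencode",
    "KIMI": "kimi",
    "COPILOT": "copilot",
}


@dataclass(frozen=True)
class CliSettings:
    binary: str
    model: str | None  # None means "let the CLI use its own configured model"


def resolve_cli_settings(prefix: str, env: Mapping[str, str]) -> CliSettings:
    """Resolve <PREFIX>_BIN and <PREFIX>_MODEL, treating blank values as unset."""
    binary = env.get(f"{prefix}_BIN", "").strip() or DEFAULT_BINARIES[prefix]
    model = env.get(f"{prefix}_MODEL", "").strip() or None
    return CliSettings(binary=binary, model=model)
```

Passing the environment as a mapping (for example `os.environ`) keeps the lookup testable without touching the real process environment.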
@@ -34,6 +76,11 @@ OPENROUTER_API_KEY=
 OPENROUTER_REASONING_MODEL=
 OPENROUTER_TOOLCALL_MODEL=
 
+# Requesty is an OpenAI-compatible LLM gateway with fallback routing and caching.
+REQUESTY_API_KEY=
+REQUESTY_REASONING_MODEL=
+REQUESTY_TOOLCALL_MODEL=
+
 # Gemini uses the OpenAI-compatible endpoint in this project.
 GEMINI_API_KEY=
 GEMINI_REASONING_MODEL=
@@ -44,6 +91,11 @@ NVIDIA_API_KEY=
 NVIDIA_REASONING_MODEL=
 NVIDIA_TOOLCALL_MODEL=
 
+# Amazon Bedrock — set `LLM_PROVIDER=bedrock` above. Uses the same AWS credential
+# chain as the AWS integration block below (region, keys, or IAM role). No LLM API key.
+BEDROCK_REASONING_MODEL=
+BEDROCK_TOOLCALL_MODEL=
+
 # --- First integrations to set up ------------------------------------------
 
 # For a first real RCA run, one of Grafana / Datadog / Honeycomb / Coralogix
@@ -80,6 +132,16 @@ ARGOCD_VERIFY_SSL=true
 # ARGOCD_INSTANCES='[{"name":"prod","base_url":"https://argocd.example.com","bearer_token":"***","project":"default"}]'
 ARGOCD_INSTANCES=
 
+# Helm 3 (read-only CLI — list/status/history/get values/get manifest)
+# Requires OSRE_HELM_INTEGRATION=1 (or true/yes) to activate from env.
+OSRE_HELM_INTEGRATION=
+HELM_PATH=helm
+HELM_KUBE_CONTEXT=
+HELM_KUBECONFIG=
+HELM_NAMESPACE=
+# Optional: cap manifest size from helm get manifest (integer, min 1024; default 600000).
+# HELM_MANIFEST_MAX_CHARS=
+
 # Datadog
 DD_API_KEY=
 DD_APP_KEY=
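The Helm block is gated on `OSRE_HELM_INTEGRATION` (`1`, `true`, or `yes`) and caps `helm get manifest` output via `HELM_MANIFEST_MAX_CHARS` (integer, minimum 1024, default 600000). A minimal sketch of parsing those settings, assuming the gate is case-insensitive, that non-integer caps fall back to the default, and that caps below 1024 are raised to 1024; the file does not say which, and `load_helm_settings`, `HelmSettings`, and `truncate_manifest` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes"}
DEFAULT_MANIFEST_MAX_CHARS = 600_000
MIN_MANIFEST_MAX_CHARS = 1024


@dataclass(frozen=True)
class HelmSettings:
    helm_path: str
    kube_context: str | None
    kubeconfig: str | None
    namespace: str | None
    manifest_max_chars: int


def load_helm_settings(env: Mapping[str, str]) -> HelmSettings | None:
    """Return Helm settings, or None unless OSRE_HELM_INTEGRATION is truthy."""
    # Assumption: the gate is case-insensitive and ignores surrounding whitespace.
    if env.get("OSRE_HELM_INTEGRATION", "").strip().lower() not in TRUTHY:
        return None
    raw_cap = env.get("HELM_MANIFEST_MAX_CHARS", "").strip()
    try:
        cap = int(raw_cap) if raw_cap else DEFAULT_MANIFEST_MAX_CHARS
    except ValueError:
        cap = DEFAULT_MANIFEST_MAX_CHARS  # assumption: non-integers fall back to the default
    cap = max(cap, MIN_MANIFEST_MAX_CHARS)  # assumption: small values are raised to the minimum
    return HelmSettings(
        helm_path=env.get("HELM_PATH", "").strip() or "helm",
        kube_context=env.get("HELM_KUBE_CONTEXT", "").strip() or None,
        kubeconfig=env.get("HELM_KUBECONFIG", "").strip() or None,
        namespace=env.get("HELM_NAMESPACE", "").strip() or None,
        manifest_max_chars=cap,
    )


def truncate_manifest(manifest: str, settings: HelmSettings) -> str:
    """Cap `helm get manifest` output at the configured number of characters."""
    return manifest[: settings.manifest_max_chars]
```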
@@ -109,6 +171,12 @@ CORALOGIX_SUBSYSTEM_NAME=
 # CORALOGIX_INSTANCES='[{"name":"prod","api_key":"...","base_url":"https://api.coralogix.com"}]'
 CORALOGIX_INSTANCES=
 
+# SigNoz (Query API — logs, metrics, traces)
+# Local Docker stack: http://localhost:8080 (see infra/scripts/signoz/)
+# API key: Settings → Service Accounts → Keys
+SIGNOZ_URL=
+SIGNOZ_API_KEY=
+
 # AWS
 AWS_REGION=us-east-1
 AWS_ROLE_ARN=
@@ -134,6 +202,17 @@ GITHUB_MCP_TOOLSETS=repos,issues,pull_requests,actions,search
 # OPENSRE_GITHUB_MCP_REPO_PROBE_LIMIT=
 
 # Sentry
+# Runtime error monitoring for OpenSRE itself uses the project Sentry DSN constant.
+# Optional: override for operator-side DSN rotation without rebuilding.
+# OPENSRE_SENTRY_DSN=
+SENTRY_ERROR_SAMPLE_RATE=1.0
+SENTRY_TRACES_SAMPLE_RATE=1.0
+OPENSRE_SENTRY_DISABLED=0
+# Tag value attached to Sentry events to identify how this process is deployed.
+# Common values: railway, ec2, vercel, local. Defaults to "local" when unset.
+# OPENSRE_DEPLOYMENT_METHOD=local
+
+# Sentry investigation integration
 SENTRY_URL=https://sentry.io
 SENTRY_ORG_SLUG=
 SENTRY_PROJECT_SLUG=
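The runtime-monitoring variables above map naturally onto `sentry_sdk.init()` options: `sample_rate` for error events and `traces_sample_rate` for transactions. A minimal sketch of building those options, assuming `OPENSRE_SENTRY_DISABLED` uses the same `1`/`true`/`yes` spelling as the Helm gate and that unset rates default to `1.0` as in this file; `build_sentry_options` and `deployment_method` are hypothetical helpers, not OpenSRE's code.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def _is_truthy(value: str | None) -> bool:
    # Assumption: the same 1/true/yes spelling the Helm gate uses.
    return (value or "").strip().lower() in {"1", "true", "yes"}


def build_sentry_options(env: Mapping[str, str], project_dsn: str) -> dict[str, Any] | None:
    """Keyword arguments for sentry_sdk.init(), or None when Sentry is disabled."""
    if _is_truthy(env.get("OPENSRE_SENTRY_DISABLED")):
        return None
    return {
        # OPENSRE_SENTRY_DSN overrides the built-in project DSN when set.
        "dsn": env.get("OPENSRE_SENTRY_DSN") or project_dsn,
        # sample_rate applies to error events; traces_sample_rate to transactions.
        "sample_rate": float(env.get("SENTRY_ERROR_SAMPLE_RATE") or 1.0),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE") or 1.0),
    }


def deployment_method(env: Mapping[str, str]) -> str:
    """Deployment tag value; falls back to "local" as the file documents."""
    return env.get("OPENSRE_DEPLOYMENT_METHOD") or "local"
```

With `sentry-sdk` installed, the result would be passed as `sentry_sdk.init(**options)` and the tag attached with `sentry_sdk.set_tag(...)`; the tag key OpenSRE actually uses is not stated here.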
@@ -240,6 +319,10 @@ JIRA_PROJECT_KEY=
 OPSGENIE_API_KEY=
 OPSGENIE_REGION=us
 
+# incident.io
+INCIDENT_IO_API_KEY=
+INCIDENT_IO_BASE_URL=
+
 # Vercel
 VERCEL_API_TOKEN=
 VERCEL_TEAM_ID=
@@ -273,6 +356,12 @@ DISCORD_DEFAULT_CHANNEL_ID=
 TELEGRAM_BOT_TOKEN=
 TELEGRAM_DEFAULT_CHAT_ID=
 
+# WhatsApp (Twilio)
+TWILIO_ACCOUNT_SID=
+TWILIO_AUTH_TOKEN=
+TWILIO_WHATSAPP_FROM=
+WHATSAPP_DEFAULT_TO=
+
 # --- Web app / hosted runtime only -----------------------------------------
 
 # Required only when using the Tracer web app / hosted integration path.
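For the WhatsApp (Twilio) block added just above, Twilio addresses WhatsApp senders and recipients as `whatsapp:` followed by an E.164 number. If `TWILIO_WHATSAPP_FROM` and `WHATSAPP_DEFAULT_TO` hold plain numbers (an assumption; the file does not say which form it expects), a normalizing helper might look like this sketch; `to_whatsapp_address` is hypothetical.

```python
import re

# E.164: a '+', a non-zero leading digit, and at most 15 digits in total.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def to_whatsapp_address(value: str) -> str:
    """Return 'whatsapp:+<number>', accepting input with or without the prefix."""
    number = value.strip().removeprefix("whatsapp:")
    if not _E164.fullmatch(number):
        raise ValueError(f"expected an E.164 number like +15551234567, got {value!r}")
    return f"whatsapp:{number}"
```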
@@ -284,14 +373,10 @@ OPENSRE_API_KEY=
 
 # --- Deployment / runtime ---------------------------------------------------
 
-# Required for deployed OpenSRE / LangGraph services.
+# Required for hosted OpenSRE runtimes that need persistent storage.
 DATABASE_URI=
 REDIS_URI=
 
-# Optional LangSmith integration
-LANGSMITH_API_KEY=
-LANGSMITH_DEPLOYMENT_NAME=open-sre-agent
-
 ENV=development
 
 # Reversible masking before external LLM calls. Off by default.
 
@@ -3,6 +3,7 @@ description: Suggest a new feature or capability
 title: "[FEATURE] "
 labels:
   - enhancement
+  - pending triage
 body:
   - type: textarea
     id: problem
 
@@ -2,10 +2,10 @@
 
 version: 2
 updates:
-  - package-ecosystem: "pip"
+  - package-ecosystem: "uv"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
     open-pull-requests-limit: 10
 
   - package-ecosystem: "npm"