ci: replace kimi-k2-thinking with kimi-k2.6 in integration test defaults (#3102)

xingyaoww · openhands-agent · web-flow · commit b2122dd1657e · 2026-05-07T12:11:04.000-04:00
Co-authored-by: openhands <openhands@all-hands.dev>
diff --git a/.github/workflows/integration-runner.yml b/.github/workflows/integration-runner.yml
@@ -50,7 +50,7 @@ on:
 env:
     N_PROCESSES: 4 # Global configuration for number of parallel processes for evaluation
     # Default models for scheduled/label-triggered runs (subset of models from resolve_model_config.py)
-    DEFAULT_MODEL_IDS: claude-sonnet-4-6,deepseek-v3.2-reasoner,kimi-k2-thinking,gemini-3.1-pro
+    DEFAULT_MODEL_IDS: claude-sonnet-4-6,deepseek-v4-flash,kimi-k2.6,gemini-3.1-pro
 
 jobs:
     setup-matrix:
diff --git a/tests/integration/README.md b/tests/integration/README.md
@@ -72,7 +72,7 @@ Defined in `.github/workflows/integration-runner.yml`, this workflow runs integr
 2. **Manual Trigger**: Via workflow dispatch with a required reason
 3. **Scheduled Runs**: Daily at 10:30 PM UTC (cron: `30 22 * * *`)
 
-**Test Coverage:** Runs across 6 LLM models (Claude Sonnet 4.5, GPT-5.1 Codex Max, Deepseek, Kimi K2, Gemini 3.1 Pro, Devstral 2512)
+**Test Coverage:** Runs across 4 LLM models (Claude Sonnet 4.6, DeepSeek V4 Flash, Kimi K2.6, Gemini 3.1 Pro)
 
 ### Condenser Tests Workflow