| 1 | +# LLM Research Profile Mainline Integration Design |
| 2 | + |
| 3 | +Date: 2026-05-13 |
| 4 | + |
| 5 | +## Goal |
| 6 | + |
| 7 | +Wire the LLM benchmark results into the production research pipeline so that rule discovery, rule promotion, and cross-platform verification use the best-performing tested provider order by default. |
| 8 | + |
| 9 | +This change is configuration orchestration only. It must not enable live trading, place orders, or commit API keys. |
| 10 | + |
| 11 | +## Current Context |
| 12 | + |
| 13 | +The project already supports OpenAI-compatible provider variables: |
| 14 | + |
| 15 | +- `OPENAI_MODEL`, `OPENAI_BASE_URL`, `OPENAI_API_MODE`, `OPENAI_API_KEY` |
| 16 | +- `OPENAI_SECONDARY_*` |
| 17 | +- `OPENAI_BACKUP_*` |
| 18 | +- `OPENAI_FALLBACK_*` |
| 19 | + |
| 20 | +The following mainline scripts consume those variables: |
| 21 | + |
| 22 | +- `scripts/refresh_discovery_watchlist.sh` |
| 23 | +- `scripts/run_rule_promotion_once.sh` |
| 24 | +- `scripts/run_cross_platform_scan_once.sh` |
| 25 | + |
| 26 | +The benchmark summary is in `reports/experiment-llm-complex-recognition-consolidated-summary-2026-05-13.md`. |
| 27 | + |
| 28 | +## Selected Profile |
| 29 | + |
| 30 | +Default profile name: `balanced` |
| 31 | + |
| 32 | +Provider order: |
| 33 | + |
| 34 | +| Role | Provider result | Base URL | API mode | Reason | |
| 35 | +|---|---|---|---|---| |
| 36 | +| primary | `windhub/deepseek-v3-2-251201` | `https://windhub.cc/v1` | `messages` | Best balance of strict recall and latency among stable candidates | |
| 37 | +| secondary | `secondary/gemini-3.1-pro-preview` | `https://api.xn--chy-js0fk50c.top/v1` | `chat` | Formal CLI smoke passed; lower semantic strength but fast | |
| 38 | +| backup | `elysiver/longcat-flash-chat` | `https://elysiver.h-e.top/v1` | `chat` | Best stable elysiver result, good recall and moderate latency | |
| 39 | +| fallback | `gpt-5.4` on the original responses endpoint | `https://api.wwcloud.app` | `responses` | More expensive high-capability last resort | |
| 40 | + |
| 41 | +Optional profile: `semantic` |
| 42 | + |
| 43 | +- primary becomes `windhub/doubao-seed-1-8-251228` in `messages` mode |
| 44 | +- other roles remain the same |
| 45 | +- use for manual high-confidence research runs, not routine background loops |
| 46 | + |
| 47 | +## Proposed Implementation |
| 48 | + |
| 49 | +Add a small shell profile loader: |
| 50 | + |
| 51 | +- `scripts/load_llm_research_profile.sh` |
| 52 | + |
| 53 | +Responsibilities: |
| 54 | + |
| 55 | +- Export the provider variables above for a selected profile. |
| 56 | +- Default to `LLM_RESEARCH_PROFILE=balanced`. |
| 57 | +- Never define API keys directly. |
| 58 | +- Only assign provider roles whose matching key variables are already present. |
| 59 | +- Preserve explicit user overrides unless `LLM_RESEARCH_PROFILE_FORCE=1`. |
| 60 | +- Print a sanitized provider summary when `LLM_RESEARCH_PROFILE_VERBOSE=1`. |
| 61 | + |
| 62 | +Key mapping: |
| 63 | + |
| 64 | +| Role | Required key variable | |
| 65 | +|---|---| |
| 66 | +| primary/windhub | `OPENAI_API_KEY` | |
| 67 | +| secondary/new middle provider | `OPENAI_SECONDARY_API_KEY` | |
| 68 | +| backup/elysiver | `OPENAI_BACKUP_API_KEY` | |
| 69 | +| fallback/original responses endpoint | `OPENAI_FALLBACK_API_KEY` | |
| 70 | + |
| 71 | +The loader should set models, modes, and base URLs but leave key values untouched. |
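
The responsibilities and key mapping above can be sketched roughly as follows. This is a minimal illustration, not the final loader: the helper names (`set_default`, `apply_role`, `load_balanced_profile`) are hypothetical, the `*_MODEL`, `*_BASE_URL`, and `*_API_MODE` suffixes for the secondary, backup, and fallback roles are assumed to mirror the primary variables, and the model IDs are assumed to be the table's provider results with the provider label stripped.

```shell
# Hypothetical sketch of the core of scripts/load_llm_research_profile.sh.
# Only the balanced profile is shown; helper names are illustrative.

# set_default NAME VALUE
# Export NAME=VALUE when NAME is empty or unset, or always when
# LLM_RESEARCH_PROFILE_FORCE=1.
set_default() {
  eval "_llm_current=\${$1:-}"
  if [ "${LLM_RESEARCH_PROFILE_FORCE:-0}" = "1" ] || [ -z "$_llm_current" ]; then
    export "$1=$2"
  fi
  unset _llm_current
}

# apply_role PREFIX MODEL BASE_URL API_MODE
# Configure one provider role, but only if PREFIX_API_KEY is already set.
# The key is checked for presence only; it is never assigned or printed.
apply_role() {
  eval "_llm_key=\${${1}_API_KEY:-}"
  if [ -z "$_llm_key" ]; then
    unset _llm_key
    return 0   # missing key: skip this role without failing
  fi
  unset _llm_key
  set_default "${1}_MODEL" "$2"
  set_default "${1}_BASE_URL" "$3"
  set_default "${1}_API_MODE" "$4"
}

# Assumed model IDs: table values with the provider label removed.
load_balanced_profile() {
  apply_role OPENAI "deepseek-v3-2-251201" "https://windhub.cc/v1" "messages"
  apply_role OPENAI_SECONDARY "gemini-3.1-pro-preview" "https://api.xn--chy-js0fk50c.top/v1" "chat"
  apply_role OPENAI_BACKUP "longcat-flash-chat" "https://elysiver.h-e.top/v1" "chat"
  apply_role OPENAI_FALLBACK "gpt-5.4" "https://api.wwcloud.app" "responses"
}
```

In this sketch an empty value is treated the same as an unset one, so a blank `OPENAI_MODEL=` in `.env.local` would still be filled from the profile. Whether that is the desired behavior is a design choice left to the implementation.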
| 72 | + |
| 73 | +## Mainline Integration Points |
| 74 | + |
| 75 | +Source the loader after `.env.local` is loaded and before provider variables are read in: |
| 76 | + |
| 77 | +- `scripts/refresh_discovery_watchlist.sh` |
| 78 | +- `scripts/run_rule_promotion_once.sh` |
| 79 | +- `scripts/run_cross_platform_scan_once.sh` |
| 80 | + |
| 81 | +`scripts/background_manager.sh` does not need direct model logic. It already delegates to those scripts. |
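
The sourcing order matters because the loader decides which roles to configure from the keys that `.env.local` provides. A minimal sketch of that order is shown below. The helper name `load_env_then_profile` is hypothetical, and the way the existing scripts actually load `.env.local` is an assumption here (a plain `set -a` source).

```shell
# load_env_then_profile ROOT_DIR
# Hypothetical helper illustrating the order a mainline script would follow.
load_env_then_profile() {
  _llm_root="$1"

  # 1. Secrets and base endpoint values first.
  if [ -f "$_llm_root/.env.local" ]; then
    set -a                      # export everything the env file assigns
    . "$_llm_root/.env.local"
    set +a
  fi

  # 2. Profile defaults second, so the loader can see which keys exist.
  if [ -f "$_llm_root/scripts/load_llm_research_profile.sh" ]; then
    . "$_llm_root/scripts/load_llm_research_profile.sh"
  fi

  unset _llm_root
  # 3. Only after this point should the script read OPENAI_* variables.
}
```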
| 82 | + |
| 83 | +## Behavior |
| 84 | + |
| 85 | +Default run: |
| 86 | + |
| 87 | +1. `.env.local` loads secrets and base endpoint values. |
| 88 | +2. `load_llm_research_profile.sh` fills missing model/mode/base-url values from the benchmark profile. |
| 89 | +3. Existing provider health checks and retry/fallback behavior remain responsible for runtime failures. |
| 90 | +4. The scripts continue to emit their existing log lines, such as `discover_provider label=...` and `rule_promotion_provider label=...`. |
| 91 | + |
| 92 | +Override examples: |
| 93 | + |
| 94 | +- Set `OPENAI_MODEL=...` to override only the primary model. |
| 95 | +- Set `LLM_RESEARCH_PROFILE=semantic` to use the slower, higher-recall primary. |
| 96 | +- Set `LLM_RESEARCH_PROFILE_FORCE=1` to replace all profile-managed model/mode/base-url values. |
| 97 | +- Set `LLM_RESEARCH_PROFILE=off` to disable the loader. |
| 98 | + |
| 99 | +## Error Handling |
| 100 | + |
| 101 | +- Missing key for a role should not hard-fail profile loading; it should skip that role. |
| 102 | +- Existing script behavior decides whether no usable provider is fatal. |
| 103 | +- Unsupported profile names should fail early with a clear message. |
| 104 | +- The loader must not echo keys or full secret-bearing environment values. |
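
Two of the rules above, failing early on unknown profile names and keeping secrets out of output, can be sketched like this. The function names, the exit code `2`, and the `llm_profile role=...` line format are all hypothetical and are not taken from the existing scripts.

```shell
# validate_profile NAME
# Return 0 for a supported profile name, otherwise print a message to stderr
# and return non-zero so the caller can stop early.
validate_profile() {
  case "$1" in
    balanced|semantic|off)
      return 0 ;;
    *)
      echo "load_llm_research_profile: unsupported profile '$1' (expected balanced, semantic, or off)" >&2
      return 2 ;;
  esac
}

# print_role_summary LABEL PREFIX
# Print model, mode, and base URL for one role. For the key, report only
# whether it is present; the key value itself is never written out.
print_role_summary() {
  eval "_llm_model=\${${2}_MODEL:-unset}"
  eval "_llm_mode=\${${2}_API_MODE:-unset}"
  eval "_llm_url=\${${2}_BASE_URL:-unset}"
  eval "_llm_key=\${${2}_API_KEY:-}"
  if [ -n "$_llm_key" ]; then _llm_key=present; else _llm_key=missing; fi
  echo "llm_profile role=$1 model=$_llm_model mode=$_llm_mode base_url=$_llm_url key=$_llm_key"
  unset _llm_model _llm_mode _llm_url _llm_key
}
```

A verbose run might then call `print_role_summary primary OPENAI` once per role when `LLM_RESEARCH_PROFILE_VERBOSE=1`.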
| 105 | + |
| 106 | +## Testing |
| 107 | + |
| 108 | +Add shell-focused tests that run the loader in a clean environment and assert: |
| 109 | + |
| 110 | +- `balanced` exports the expected provider order. |
| 111 | +- explicit user overrides are preserved by default. |
| 112 | +- `LLM_RESEARCH_PROFILE_FORCE=1` replaces explicit values. |
| 113 | +- `LLM_RESEARCH_PROFILE=off` makes no changes. |
| 114 | +- verbose-mode output contains no API keys. |
| 115 | + |
| 116 | +Where practical, add integration-level tests for the three mainline scripts, either by sourcing them in a controlled shell or by extracting profile application into a testable function. If sourcing the full scripts is too heavy, test the loader plus one small wrapper contract: the scripts can source it without requiring any keys. |
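
One way to get the clean environment these tests need is `env -i`, which starts the child shell with nothing but the variables passed explicitly. The helper below is a sketch under assumptions: `run_clean` is a hypothetical name, it runs the loader under `sh` (swap in `bash` if the loader ends up using bash-only features), and it filters `*_API_KEY` lines so test output never carries key values.

```shell
# run_clean LOADER_PATH [VAR=VALUE...]
# Source the loader in an otherwise empty environment, seeded only with the
# given VAR=VALUE pairs, and print the resulting OPENAI_* and LLM_RESEARCH_*
# variables (keys excluded), sorted for stable comparison.
run_clean() {
  _llm_loader="$1"
  shift
  env -i PATH="$PATH" "$@" sh -c '
    . "$1"
    env | grep -E "^(OPENAI|LLM_RESEARCH)_" | grep -v "_API_KEY=" | sort
  ' sh "$_llm_loader"
}
```

An assertion for the default profile could then look like `run_clean scripts/load_llm_research_profile.sh OPENAI_API_KEY=dummy | grep -q '^OPENAI_API_MODE=messages$'`, using a dummy key rather than a real one.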
| 117 | + |
| 118 | +## Non-Goals |
| 119 | + |
| 120 | +- Do not change LLM prompts in this task. |
| 121 | +- Do not re-run the full expensive benchmark suite. |
| 122 | +- Do not turn on live execution. |
| 123 | +- Do not write secrets into tracked files. |
| 124 | +- Do not replace provider health checks; the loader only sets defaults. |
| 125 | + |
| 126 | +## Success Criteria |
| 127 | + |
| 128 | +- Mainline LLM paths use the benchmark-derived provider order without manual `.env.local` model edits. |
| 129 | +- User overrides still work. |
| 130 | +- The original responses endpoint remains available as the fallback. |
| 131 | +- Tests cover the profile selection and override behavior. |
| 132 | +- Full test suite passes after implementation. |