Commit 3de6788

Add LLM research profile design
1 parent 416a24a commit 3de6788

1 file changed

Lines changed: 132 additions & 0 deletions

@@ -0,0 +1,132 @@
# LLM Research Profile Mainline Integration Design

Date: 2026-05-13

## Goal

Wire the LLM benchmark results into the production research pipeline so rule discovery, rule promotion, and cross-platform verification use the best tested provider order by default.

This change is configuration orchestration only. It must not enable live trading, place orders, or commit API keys.

## Current Context

The project already supports OpenAI-compatible provider variables:

- `OPENAI_MODEL`, `OPENAI_BASE_URL`, `OPENAI_API_MODE`, `OPENAI_API_KEY`
- `OPENAI_SECONDARY_*`
- `OPENAI_BACKUP_*`
- `OPENAI_FALLBACK_*`

The following mainline scripts consume those variables:

- `scripts/refresh_discovery_watchlist.sh`
- `scripts/run_rule_promotion_once.sh`
- `scripts/run_cross_platform_scan_once.sh`

The benchmark summary is in `reports/experiment-llm-complex-recognition-consolidated-summary-2026-05-13.md`.

## Selected Profile

Default profile name: `balanced`

Provider order:

| Role | Provider result | Base URL | API mode | Reason |
|---|---|---|---|---|
| primary | `windhub/deepseek-v3-2-251201` | `https://windhub.cc/v1` | `messages` | Best balance of strict recall and latency among stable candidates |
| secondary | `secondary/gemini-3.1-pro-preview` | `https://api.xn--chy-js0fk50c.top/v1` | `chat` | Formal CLI smoke passed; lower semantic strength but fast |
| backup | `elysiver/longcat-flash-chat` | `https://elysiver.h-e.top/v1` | `chat` | Best stable elysiver result, good recall and moderate latency |
| fallback | `gpt-5.4` on the original responses endpoint | `https://api.wwcloud.app` | `responses` | More expensive high-capability last resort |

Optional profile: `semantic`

- primary becomes `windhub/doubao-seed-1-8-251228/messages`
- other roles remain the same
- use for manual high-confidence research runs, not routine background loops

## Proposed Implementation

Add a small shell profile loader:

- `scripts/load_llm_research_profile.sh`

Responsibilities:

- Export the provider variables above for a selected profile.
- Default to `LLM_RESEARCH_PROFILE=balanced`.
- Never define API keys directly.
- Only assign provider roles whose matching key variables are already present.
- Preserve explicit user overrides unless `LLM_RESEARCH_PROFILE_FORCE=1`.
- Print a sanitized provider summary when `LLM_RESEARCH_PROFILE_VERBOSE=1`.

Key mapping:

| Role | Required key variable |
|---|---|
| primary/windhub | `OPENAI_API_KEY` |
| secondary/new middle provider | `OPENAI_SECONDARY_API_KEY` |
| backup/elysiver | `OPENAI_BACKUP_API_KEY` |
| fallback/original responses endpoint | `OPENAI_FALLBACK_API_KEY` |

The loader should set models, modes, and base URLs but leave key values untouched.

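The responsibilities and key mapping above could be sketched roughly as follows. This is a minimal illustration, not the actual loader: the helper names `_llm_profile_set`, `_llm_profile_role`, and `apply_balanced_profile` are hypothetical, the `_MODEL`, `_BASE_URL`, and `_API_MODE` suffixes on the secondary, backup, and fallback prefixes are assumed to mirror the primary variables, and each model ID is assumed to be the part of the benchmark label after the provider name.

```bash
#!/usr/bin/env bash
# Sketch only: core of a possible scripts/load_llm_research_profile.sh (balanced profile).
# Assumptions: every role prefix accepts _MODEL, _BASE_URL, and _API_MODE suffixes,
# and the model ID is the part of the benchmark label after the provider name.

# Export NAME=VALUE unless NAME already has a value and FORCE is not enabled.
_llm_profile_set() {
  local name="$1" value="$2"
  if [ "${LLM_RESEARCH_PROFILE_FORCE:-0}" = "1" ] || [ -z "${!name:-}" ]; then
    export "$name=$value"
  fi
}

# Configure one role, but only when its key variable is already present.
# The key value itself is only tested for presence; it is never read out or changed.
_llm_profile_role() {
  local prefix="$1" model="$2" base_url="$3" mode="$4"
  local key_var="${prefix}_API_KEY"
  [ -n "${!key_var:-}" ] || return 0   # missing key: skip the role, do not fail
  _llm_profile_set "${prefix}_MODEL" "$model"
  _llm_profile_set "${prefix}_BASE_URL" "$base_url"
  _llm_profile_set "${prefix}_API_MODE" "$mode"
}

# Provider order taken from the "Selected Profile" table.
apply_balanced_profile() {
  _llm_profile_role OPENAI           "deepseek-v3-2-251201"   "https://windhub.cc/v1"               "messages"
  _llm_profile_role OPENAI_SECONDARY "gemini-3.1-pro-preview" "https://api.xn--chy-js0fk50c.top/v1" "chat"
  _llm_profile_role OPENAI_BACKUP    "longcat-flash-chat"     "https://elysiver.h-e.top/v1"         "chat"
  _llm_profile_role OPENAI_FALLBACK  "gpt-5.4"                "https://api.wwcloud.app"             "responses"
}
```

In this sketch an empty value is treated the same as an unset one, so a blank `OPENAI_MODEL=` in `.env.local` would be filled from the profile. Whether that is the desired reading of "explicit user override" is a design choice the real loader would need to settle.
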
## Mainline Integration Points

Source the loader after `.env.local` is loaded and before provider variables are read in:

- `scripts/refresh_discovery_watchlist.sh`
- `scripts/run_rule_promotion_once.sh`
- `scripts/run_cross_platform_scan_once.sh`

`scripts/background_manager.sh` does not need direct model logic. It already delegates to those scripts.

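The ordering requirement can be illustrated with a short sketch. How the three scripts currently load `.env.local` is not shown in this design, so the `source_if_present` and `load_llm_env` helpers below are hypothetical; the only point being made is the sequence: secrets first, profile defaults second, provider variables read last.

```bash
#!/usr/bin/env bash
# Hypothetical helper: source a file only when it exists, so a missing
# .env.local or loader does not abort the calling script.
source_if_present() {
  local file="$1"
  if [ -f "$file" ]; then
    # shellcheck disable=SC1090
    . "$file"
  fi
}

# Hypothetical entry point a mainline script could call near its top.
# $1 is assumed to be the repository root.
load_llm_env() {
  local root="$1"
  source_if_present "$root/.env.local"                            # 1. secrets and base endpoints
  source_if_present "$root/scripts/load_llm_research_profile.sh"  # 2. benchmark profile defaults
  # 3. Only after this point should the script read OPENAI_MODEL and friends.
}
```

Because the files are sourced inside a function, plain assignments and `export` statements in them still affect the calling shell, but anything they declare with `local` or `declare` would stay scoped to the function.
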
## Behavior

Default run:

1. `.env.local` loads secrets and base endpoint values.
2. `load_llm_research_profile.sh` fills missing model/mode/base-url values from the benchmark profile.
3. Existing provider health checks and retry/fallback behavior remain responsible for runtime failures.
4. The scripts continue to write current logs such as `discover_provider label=...` and `rule_promotion_provider label=...`.

Override examples:

- Set `OPENAI_MODEL=...` to override only the primary model.
- Set `LLM_RESEARCH_PROFILE=semantic` for the slow high-recall primary.
- Set `LLM_RESEARCH_PROFILE_FORCE=1` to replace all profile-managed model/mode/base-url values.
- Set `LLM_RESEARCH_PROFILE=off` to disable the loader.

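One way to picture how the profile name selects the primary model is a single `case` dispatch. The function name `select_primary_model` and the `PROFILE_PRIMARY_MODEL` variable are hypothetical, and the model IDs are assumed to be the benchmark labels with the provider prefix and mode suffix removed.

```bash
#!/usr/bin/env bash
# Sketch: map LLM_RESEARCH_PROFILE to a primary model ID.
# Sets PROFILE_PRIMARY_MODEL (hypothetical variable); an empty value means "loader off".
# Returns 2 for an unsupported profile name so callers can stop early.
select_primary_model() {
  local profile="${LLM_RESEARCH_PROFILE:-balanced}"
  case "$profile" in
    balanced) PROFILE_PRIMARY_MODEL="deepseek-v3-2-251201" ;;
    semantic) PROFILE_PRIMARY_MODEL="doubao-seed-1-8-251228" ;;
    off)      PROFILE_PRIMARY_MODEL="" ;;
    *)
      printf 'unsupported LLM_RESEARCH_PROFILE: %s (expected balanced, semantic, or off)\n' \
        "$profile" >&2
      return 2
      ;;
  esac
}
```

Only the primary differs between `balanced` and `semantic`, so the other three roles would not need their own dispatch in this design.
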
## Error Handling

- Missing key for a role should not hard-fail profile loading; it should skip that role.
- Existing script behavior decides whether no usable provider is fatal.
- Unsupported profile names should fail early with a clear message.
- The loader must not echo keys or full secret-bearing environment values.

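A sanitized summary that satisfies the "must not echo keys" rule might look like the sketch below. The function name and the `llm_profile role=...` line format are made up for illustration and are not existing log lines; the key is reported only as `present` or `missing`.

```bash
#!/usr/bin/env bash
# Sketch: print one line per role without ever expanding a key value into the output.
# Assumes the same _MODEL/_BASE_URL/_API_MODE/_API_KEY suffixes for every role prefix.
print_provider_summary() {
  [ "${LLM_RESEARCH_PROFILE_VERBOSE:-0}" = "1" ] || return 0
  local prefix model_var url_var mode_var key_var key_state
  for prefix in OPENAI OPENAI_SECONDARY OPENAI_BACKUP OPENAI_FALLBACK; do
    model_var="${prefix}_MODEL"
    url_var="${prefix}_BASE_URL"
    mode_var="${prefix}_API_MODE"
    key_var="${prefix}_API_KEY"
    # Report only whether a key exists, never its content.
    if [ -n "${!key_var:-}" ]; then
      key_state="present"
    else
      key_state="missing"
    fi
    printf 'llm_profile role=%s model=%s base_url=%s mode=%s key=%s\n' \
      "$prefix" "${!model_var:-unset}" "${!url_var:-unset}" "${!mode_var:-unset}" "$key_state"
  done
}
```

If a deployment ever embeds credentials in a base URL, the `base_url` field would also need redaction; this sketch assumes the URLs are the public ones listed in the profile table.
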
## Testing

Add shell-focused tests that run the loader in a clean environment and assert:

- `balanced` exports the expected provider order.
- explicit user overrides are preserved by default.
- `LLM_RESEARCH_PROFILE_FORCE=1` replaces explicit values.
- `LLM_RESEARCH_PROFILE=off` makes no changes.
- no output contains API keys when verbose mode is enabled.

Add integration-level tests for the three mainline scripts if practical by sourcing them in a controlled shell or by extracting profile application into a testable function. If full script sourcing is too heavy, test the loader plus one small wrapper contract: scripts can source it without requiring keys.

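A clean-environment test could be built from two small helpers, sketched here under the assumption that the tests are plain bash rather than a framework such as Bats. `run_clean`, `assert_eq`, and the two test functions are hypothetical names; each test takes the loader path as an argument so it can be pointed at the real file or at a stub.

```bash
#!/usr/bin/env bash
# Run a snippet in a fresh bash whose environment contains only PATH,
# so variables from the developer's shell cannot leak into the test.
run_clean() {
  env -i PATH="$PATH" bash -c "$1"
}

# Compare two strings and report the result; returns non-zero on mismatch.
assert_eq() {
  local expected="$1" actual="$2" label="$3"
  if [ "$expected" != "$actual" ]; then
    printf 'FAIL %s: expected [%s], got [%s]\n' "$label" "$expected" "$actual" >&2
    return 1
  fi
  printf 'ok %s\n' "$label"
}

# LLM_RESEARCH_PROFILE=off should leave OPENAI_MODEL unset.
test_off_makes_no_changes() {
  local loader="$1" actual
  actual="$(run_clean "LLM_RESEARCH_PROFILE=off; . '$loader'; printf '%s' \"\${OPENAI_MODEL:-}\"")"
  assert_eq "" "$actual" "off leaves OPENAI_MODEL unset"
}

# Wrapper contract: sourcing the loader with no keys present must succeed.
test_source_without_keys() {
  local loader="$1" actual
  actual="$(run_clean ". '$loader' && printf 'sourced'")"
  assert_eq "sourced" "$actual" "loader sources without keys"
}
```

Passing only `PATH` through `env -i` keeps `bash` resolvable while still discarding every provider variable, which is what lets the `off` and missing-key cases be tested deterministically.
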
## Non-Goals

- Do not change LLM prompts in this task.
- Do not re-run the full expensive benchmark suite.
- Do not turn on live execution.
- Do not write secrets into tracked files.
- Do not replace provider health checks; the loader only sets defaults.

## Success Criteria

- Mainline LLM paths use the benchmark-derived provider order without manual `.env.local` model edits.
- User overrides still work.
- Fallback remains available for the original responses endpoint.
- Tests cover the profile selection and override behavior.
- Full test suite passes after implementation.
