This repository was archived by the owner on Jan 19, 2026. It is now read-only.

Commit 34defdf

Began new approach with agentic classifier, initial progress made to make frontend compatible but needs rework to make UX actually good and mobile friendly
1 parent 1bb5b3c commit 34defdf

61 files changed

Lines changed: 11242 additions & 206 deletions

Lines changed: 283 additions & 0 deletions

<!-- 5ddf541b-8e90-4e85-9152-c52f39be9149 010e2c40-4a44-4364-afad-04889d79cdc1 -->
# Agentic Correction System with Human Feedback Loop

## Phase 1: Classification-First Correction Workflow

### 1.1 Create Gap Classification Schema

**File**: `lyrics_transcriber/correction/agentic/models/schemas.py`

Add new Pydantic models for gap classification:

- `GapCategory` enum: `PUNCTUATION_ONLY`, `SOUND_ALIKE`, `BACKGROUND_VOCALS`, `EXTRA_WORDS`, `REPEATED_SECTION`, `COMPLEX_MULTI_ERROR`, `AMBIGUOUS`, `NO_ERROR`
- `GapClassification` model with fields:
  - `gap_id`: str
  - `category`: GapCategory
  - `confidence`: float (0-1)
  - `reasoning`: str
  - `suggested_handler`: Optional[str]
- Update `CorrectionProposal` to include:
  - `gap_category`: Optional[GapCategory]
  - `requires_human_review`: bool
  - `artist`: Optional[str]
  - `title`: Optional[str]
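
Below is a minimal sketch of the two classification types. It uses stdlib `dataclasses` and `Enum` as a stand-in for the Pydantic models named above, so the `confidence` range check lives in `__post_init__`. That validation, and the acceptance of raw category strings from parsed LLM JSON, are assumptions. The plan does not specify either.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GapCategory(str, Enum):
    """Gap categories from section 1.1; the str mixin serializes cleanly to JSON."""

    PUNCTUATION_ONLY = "PUNCTUATION_ONLY"
    SOUND_ALIKE = "SOUND_ALIKE"
    BACKGROUND_VOCALS = "BACKGROUND_VOCALS"
    EXTRA_WORDS = "EXTRA_WORDS"
    REPEATED_SECTION = "REPEATED_SECTION"
    COMPLEX_MULTI_ERROR = "COMPLEX_MULTI_ERROR"
    AMBIGUOUS = "AMBIGUOUS"
    NO_ERROR = "NO_ERROR"


@dataclass
class GapClassification:
    gap_id: str
    category: GapCategory
    confidence: float  # classifier confidence in [0, 1]
    reasoning: str
    suggested_handler: Optional[str] = None

    def __post_init__(self) -> None:
        # Coerce raw strings (e.g. from parsed LLM JSON) into the enum;
        # an unknown category raises ValueError.
        self.category = GapCategory(self.category)
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
```

With Pydantic, the same range check would typically be a field constraint rather than hand-written code.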

### 1.2 Build Classification Prompt

**File**: `lyrics_transcriber/correction/agentic/prompts/classifier.py` (new)

Create a prompt template for gap classification:

- Include: gap text, preceding/following context, and reference lyrics from all sources
- Include: artist name and song title (from metadata)
- Ask the LLM to categorize the gap and explain its reasoning
- Provide examples from `gaps_review.yaml` for few-shot learning
- Request structured JSON output matching the `GapClassification` schema
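
Here is a sketch of a prompt builder that covers the inputs listed above. The function name, its parameters and the prompt wording are all hypothetical. It assumes the few-shot examples from `gaps_review.yaml` have already been loaded into plain dicts, and it leaves the LLM call and JSON parsing to the caller.

```python
import json
from typing import Dict, List, Optional

# Category names from section 1.1, repeated so the sketch is self-contained.
CATEGORIES = [
    "PUNCTUATION_ONLY", "SOUND_ALIKE", "BACKGROUND_VOCALS", "EXTRA_WORDS",
    "REPEATED_SECTION", "COMPLEX_MULTI_ERROR", "AMBIGUOUS", "NO_ERROR",
]


def build_classification_prompt(
    gap_id: str,
    gap_text: str,
    preceding: str,
    following: str,
    references: Dict[str, str],
    artist: Optional[str] = None,
    title: Optional[str] = None,
    examples: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Assemble a classification prompt that asks for JSON-only output."""
    lines = [
        "Classify the following gap between transcribed and reference lyrics.",
        f"Artist: {artist or 'unknown'}",
        f"Title: {title or 'unknown'}",
        "",
        f"Preceding context: {preceding}",
        f'Gap ({gap_id}): "{gap_text}"',
        f"Following context: {following}",
        "",
        "Reference lyrics by source:",
    ]
    # Sorted so the prompt is deterministic regardless of dict insertion order.
    for source, text in sorted(references.items()):
        lines.append(f"- {source}: {text}")
    if examples:
        lines += ["", "Labelled examples:"]
        lines += [json.dumps(example) for example in examples]
    lines += [
        "",
        "Allowed categories: " + ", ".join(CATEGORIES),
        "Reply with a single JSON object with keys gap_id, category, "
        "confidence (0-1), reasoning and suggested_handler (string or null).",
    ]
    return "\n".join(lines)
```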

### 1.3 Implement Category-Specific Handlers

**File**: `lyrics_transcriber/correction/agentic/handlers/` (new directory)

Create a handler class for each category:

- `PunctuationHandler`: Returns NO_ACTION if only punctuation differs
- `SoundAlikeHandler`: Uses reference lyrics to propose REPLACE actions
- `BackgroundVocalsHandler`: Detects parenthesized words and proposes DELETE
- `ExtraWordsHandler`: Detects common filler words ("And", "But") and proposes DELETE
- `RepeatedSectionHandler`: Flags for human review with context about chorus/verse structure
- `ComplexMultiErrorHandler`: Breaks the gap into smaller sub-gaps or flags it for review
- `AmbiguousHandler`: Always flags for human review
- `NoErrorHandler`: Returns NO_ACTION when any reference source matches

Each handler returns a list of `CorrectionProposal` objects.
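
The sketch below implements three of the handlers to show the shape of the interface. `Gap`, `Proposal` and `flag()` are hypothetical stand-ins; the real `CorrectionProposal` and gap objects will carry more fields. Treating case differences as "punctuation only" is also an assumption. Any handler that cannot act confidently falls back to FLAG instead of guessing, so a misclassification never becomes a silent edit.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Gap:
    """Hypothetical gap shape: (word_id, text) pairs plus reference text per source."""

    gap_id: str
    words: List[Tuple[str, str]]
    references: Dict[str, str] = field(default_factory=dict)

    @property
    def text(self) -> str:
        return " ".join(text for _, text in self.words)

    @property
    def word_ids(self) -> List[str]:
        return [word_id for word_id, _ in self.words]


@dataclass
class Proposal:
    """Simplified stand-in for CorrectionProposal."""

    action: str  # NO_ACTION, REPLACE, DELETE, FLAG, ...
    word_ids: List[str]
    reason: str
    requires_human_review: bool = False


def normalize(text: str) -> str:
    """Lower-case, strip ASCII punctuation and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def flag(gap: Gap, reason: str) -> List[Proposal]:
    return [Proposal("FLAG", gap.word_ids, reason, requires_human_review=True)]


class PunctuationHandler:
    def handle(self, gap: Gap) -> List[Proposal]:
        target = normalize(gap.text)
        if any(normalize(ref) == target for ref in gap.references.values()):
            return [Proposal("NO_ACTION", [], "Only punctuation/case differs from a reference")]
        # The classifier was wrong about this gap; escalate rather than guess.
        return flag(gap, "Differs from every reference by more than punctuation")


class BackgroundVocalsHandler:
    def handle(self, gap: Gap) -> List[Proposal]:
        to_delete, inside = [], False
        for word_id, text in gap.words:
            if text.startswith("("):
                inside = True
            if inside:
                to_delete.append(word_id)
            # Ignore trailing punctuation such as "yeah),".
            if text.rstrip(",.!?;:").endswith(")"):
                inside = False
        if not to_delete:
            return flag(gap, "No parenthesized words found")
        return [Proposal("DELETE", to_delete, "Parenthesized background vocals")]


class AmbiguousHandler:
    def handle(self, gap: Gap) -> List[Proposal]:
        return flag(gap, "Ambiguous gap; needs a human decision")
```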

### 1.4 Update AgenticCorrector Workflow

**File**: `lyrics_transcriber/correction/agentic/agent.py`

Modify the `propose()` method to use a two-step process (classify, then handle):

1. Call the classifier to categorize the gap
2. Route to the appropriate handler based on the category
3. Collect proposals from the handler
4. Add metadata: artist, title, session_id
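
The routing logic can be sketched with plain callables and dicts. Several choices here are assumptions made for illustration: the constructor arguments, the dict-based proposal shape, and the FLAG fallback for an unrecognized category. The real `propose()` signature may differ.

```python
from typing import Any, Callable, Dict, List, Optional

Proposal = Dict[str, Any]
Classifier = Callable[[Any], Dict[str, Any]]  # gap -> classification dict
Handler = Callable[[Any], List[Proposal]]  # gap -> proposals


class AgenticCorrector:
    def __init__(
        self,
        classify: Classifier,
        handlers: Dict[str, Handler],
        artist: Optional[str] = None,
        title: Optional[str] = None,
        session_id: Optional[str] = None,
    ) -> None:
        self.classify = classify
        self.handlers = handlers
        self.metadata = {"artist": artist, "title": title, "session_id": session_id}

    def propose(self, gap: Any) -> List[Proposal]:
        # 1. Ask the classifier which category the gap belongs to.
        category = str(self.classify(gap).get("category", "AMBIGUOUS"))
        # 2. Route to the matching handler; unknown categories go to human review.
        handler = self.handlers.get(category, self._flag_for_review)
        # 3. Collect the handler's proposals.
        proposals = handler(gap)
        # 4. Attach the category and song/session metadata to every proposal.
        for proposal in proposals:
            proposal.update(gap_category=category, **self.metadata)
        return proposals

    @staticmethod
    def _flag_for_review(gap: Any) -> List[Proposal]:
        return [{"action": "FLAG", "requires_human_review": True}]
```

Because the classifier is injected as a callable, tests can pass a stub in place of a live LLM.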

### 1.5 Update LyricsCorrector Integration

**File**: `lyrics_transcriber/correction/corrector.py`

In the `_process_corrections()` method:

- Pass artist and title from metadata to `AgenticCorrector`
- Handle the new FLAG action type by marking those proposals for human review
- Store gap classification data in `correction_steps` for later analysis
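
Inside `_process_corrections()`, FLAG handling comes down to triaging proposals before any of them are applied. The helper below is hypothetical, and so is the shape of each `correction_steps` entry. NO_ACTION proposals are logged but are neither applied nor flagged.

```python
from typing import Any, Dict, List, Tuple

Proposal = Dict[str, Any]


def triage_proposals(
    proposals: List[Proposal], correction_steps: List[Dict[str, Any]]
) -> Tuple[List[Proposal], List[Proposal]]:
    """Split proposals into (to_apply, needs_review) and log each one."""
    to_apply: List[Proposal] = []
    needs_review: List[Proposal] = []
    for proposal in proposals:
        action = proposal.get("action")
        flagged = action == "FLAG" or bool(proposal.get("requires_human_review"))
        if flagged:
            needs_review.append(proposal)
        elif action != "NO_ACTION":
            to_apply.append(proposal)
        # Keep the classification alongside the outcome for later analysis.
        correction_steps.append(
            {
                "gap_id": proposal.get("gap_id"),
                "gap_category": proposal.get("gap_category"),
                "action": action,
                "flagged": flagged,
            }
        )
    return to_apply, needs_review
```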

## Phase 2: Human Feedback Collection System

### 2.1 Define Correction Annotation Schema

**File**: `lyrics_transcriber/correction/feedback/schemas.py` (new)

Create Pydantic models:

- `CorrectionAnnotationType` enum: the gap categories plus `MANUAL_EDIT`
- `CorrectionAnnotation` model:
  - `annotation_id`: str (UUID)
  - `audio_hash`: str
  - `gap_id`: Optional[str]
  - `annotation_type`: CorrectionAnnotationType
  - `original_text`: str
  - `corrected_text`: str
  - `action_taken`: str (NO_ACTION, REPLACE, DELETE, INSERT, MERGE, SPLIT, FLAG)
  - `confidence`: float (1-5 scale; the human's rating, distinct from the classifier's 0-1 confidence)
  - `reasoning`: str (required human explanation)
  - `word_ids_affected`: List[str]
  - `agentic_proposal`: Optional[Dict] (what the AI suggested)
  - `agentic_category`: Optional[GapCategory]
  - `reference_sources_consulted`: List[str]
  - `timestamp`: datetime
  - `artist`: str
  - `title`: str
  - `session_id`: str
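
This stdlib sketch of the annotation record again uses a dataclass in place of the Pydantic model and keeps categories as plain strings for brevity. The validation (1-5 range, non-empty reasoning) and the `to_json_dict()` helper are assumptions. `annotation_id` and `timestamp` are generated by default, so callers only supply what the reviewer actually entered.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class CorrectionAnnotation:
    # Required fields first (dataclass fields with defaults must come last).
    audio_hash: str
    annotation_type: str  # a gap category name or "MANUAL_EDIT"
    original_text: str
    corrected_text: str
    action_taken: str  # NO_ACTION, REPLACE, DELETE, INSERT, MERGE, SPLIT, FLAG
    confidence: float  # human rating on a 1-5 scale
    reasoning: str
    artist: str
    title: str
    session_id: str
    gap_id: Optional[str] = None
    word_ids_affected: List[str] = field(default_factory=list)
    agentic_proposal: Optional[Dict[str, Any]] = None
    agentic_category: Optional[str] = None
    reference_sources_consulted: List[str] = field(default_factory=list)
    annotation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not 1 <= self.confidence <= 5:
            raise ValueError(f"confidence must be between 1 and 5, got {self.confidence}")
        if not self.reasoning.strip():
            raise ValueError("reasoning is required")

    def to_json_dict(self) -> Dict[str, Any]:
        """Return a JSON-serializable dict (datetime becomes an ISO 8601 string)."""
        data = asdict(self)
        data["timestamp"] = self.timestamp.isoformat()
        return data
```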

### 2.2 Create Feedback Storage Backend

**File**: `lyrics_transcriber/correction/feedback/store.py` (new)

Implement a `FeedbackStore` class:

- Stores annotations in a JSONL file in the cache directory: `correction_annotations.jsonl`
- Each line is one annotation (JSONL makes appending cheap)
- Methods:
  - `save_annotation(annotation: CorrectionAnnotation)`
  - `get_annotations_by_song(audio_hash: str)`
  - `get_annotations_by_category(category: str)`
  - `export_to_training_data()` (for future fine-tuning)
  - `get_statistics()` (aggregations for analysis)
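
Here is a minimal sketch of `FeedbackStore` over a JSONL file. It accepts plain dicts (such as the output of a `to_json_dict()`-style serializer) instead of `CorrectionAnnotation` instances and omits `export_to_training_data()`. The statistics keys are assumptions. Every read rescans the whole file, which keeps the code simple; an index only becomes worthwhile if that scan grows slow.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterator, List, Union


class FeedbackStore:
    def __init__(self, cache_dir: Union[str, Path]) -> None:
        self.path = Path(cache_dir) / "correction_annotations.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def save_annotation(self, annotation: Dict[str, Any]) -> None:
        # Append mode: one JSON object per line; existing lines are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(annotation, ensure_ascii=False) + "\n")

    def _iter_annotations(self) -> Iterator[Dict[str, Any]]:
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # tolerate blank lines
                    yield json.loads(line)

    def get_annotations_by_song(self, audio_hash: str) -> List[Dict[str, Any]]:
        return [a for a in self._iter_annotations() if a.get("audio_hash") == audio_hash]

    def get_annotations_by_category(self, category: str) -> List[Dict[str, Any]]:
        return [a for a in self._iter_annotations() if a.get("annotation_type") == category]

    def get_statistics(self) -> Dict[str, Any]:
        annotations = list(self._iter_annotations())
        return {
            "total": len(annotations),
            "by_category": dict(Counter(a.get("annotation_type") for a in annotations)),
            "songs": len({a.get("audio_hash") for a in annotations}),
        }
```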

### 2.3 Update Backend API Endpoints

**File**: `lyrics_transcriber/review/server.py`

Add new endpoints:

- `POST /api/v1/annotations` - Save a correction annotation
- `GET /api/v1/annotations/{audio_hash}` - Get annotations for a song
- `GET /api/v1/annotations/stats` - Get aggregated statistics

Update existing endpoint:

- `POST /api/v1/submit` - Also save annotations when corrections are submitted
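
One thing to watch: `/api/v1/annotations/stats` also matches the `{audio_hash}` pattern. Some frameworks try routes in declaration order; FastAPI is one, and its docs flag the same issue for `/users/me` vs `/users/{user_id}`. In such a framework, `stats` reaches the per-song handler unless the literal route is declared first. The framework-free sketch below shows the ordering with a hypothetical route table and handler names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical ordered route table: (method, regex, handler name).
# The literal /stats route is listed before the parameterized one on purpose.
ROUTES: List[Tuple[str, str, str]] = [
    ("POST", r"/api/v1/annotations", "save_annotation"),
    ("GET", r"/api/v1/annotations/stats", "get_annotation_stats"),
    ("GET", r"/api/v1/annotations/(?P<audio_hash>[^/]+)", "get_song_annotations"),
    ("POST", r"/api/v1/submit", "submit_corrections"),
]


def resolve(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (handler name, path params) for the first matching route, else None."""
    for route_method, pattern, name in ROUTES:
        if route_method != method:
            continue
        match = re.fullmatch(pattern, path)
        if match:
            return name, match.groupdict()
    return None
```

If the two GET entries were swapped, `resolve("GET", "/api/v1/annotations/stats")` would return the per-song handler with `audio_hash="stats"`.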

### 2.4 Create UI Annotation Modal Component

**File**: `lyrics_transcriber/frontend/src/components/CorrectionAnnotationModal.tsx` (new)

Build a modal that appears when the user makes corrections:

- Triggered when the user edits or deletes a word, merges/splits words, etc.
- Form fields:
  - Annotation type (dropdown with categories)
  - Confidence slider (1-5)
  - Reasoning text area (required, min 10 chars)
- Display: what the agentic AI suggested (if applicable)
- Display: reference lyrics context
- "Save & Continue" and "Skip" buttons
- Store annotations locally until final submission

### 2.5 Integrate Annotation Collection into Edit Workflow

**Files**:

- `lyrics_transcriber/frontend/src/components/EditModal.tsx`
- `lyrics_transcriber/frontend/src/components/EditWordList.tsx`

Wrap edit actions to capture annotations:

- After the user confirms a word edit, show the annotation modal
- Store the annotation in React state
- Submit all annotations when the user clicks "Finish Review"
- Add a settings toggle: "Enable correction annotations" (default: true)

### 2.6 Update Frontend Types and API Client

**Files**:

- `lyrics_transcriber/frontend/src/types.ts` - Add `CorrectionAnnotation` interface
- `lyrics_transcriber/frontend/src/api.ts` - Add `submitAnnotations()` method

## Phase 3: Continuous Improvement Infrastructure

### 3.1 Create Analysis Scripts

**File**: `scripts/analyze_annotations.py` (new)

Script to analyze collected annotations:

- Load all annotations from the JSONL file
- Generate reports:
  - Most common error categories
  - Agentic AI accuracy by category
  - Frequently misheard words/phrases
  - Cases where reference lyrics were wrong
- Output a Markdown report to `CORRECTION_ANALYSIS.md`
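
This sketch of the report builder assumes annotations are dicts using the field names from section 2.1. It defines "accuracy" as agreement between `agentic_category` and the human-chosen `annotation_type`, computed per predicted category (i.e. precision), which treats the human label as ground truth. The "reference lyrics were wrong" report is omitted because the schema has no field that records that directly.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def build_report(annotations: List[Dict[str, Any]]) -> str:
    """Render a Markdown summary of human correction annotations."""
    categories = Counter(a.get("annotation_type") for a in annotations)

    # agreement[category] = [AI matched the human label, AI predicted this category]
    agreement: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for a in annotations:
        predicted = a.get("agentic_category")
        if predicted is None:
            continue
        agreement[predicted][1] += 1
        if predicted == a.get("annotation_type"):
            agreement[predicted][0] += 1

    # Count (original -> corrected) pairs where the text actually changed.
    changes = Counter(
        (a.get("original_text"), a.get("corrected_text"))
        for a in annotations
        if a.get("original_text") != a.get("corrected_text")
    )

    lines = ["# Correction Analysis", "", "## Most common error categories", ""]
    lines += [f"- {category}: {count}" for category, count in categories.most_common()]
    lines += ["", "## Agentic AI accuracy by category", ""]
    for category, (hits, total) in sorted(agreement.items()):
        lines.append(f"- {category}: {hits}/{total} ({hits / total:.0%})")
    lines += ["", "## Frequently misheard phrases", ""]
    for (original, corrected), count in changes.most_common(10):
        lines.append(f'- "{original}" -> "{corrected}": {count}')
    return "\n".join(lines) + "\n"
```

The script's entry point could then write the result with `Path("CORRECTION_ANALYSIS.md").write_text(build_report(annotations), encoding="utf-8")`.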

### 3.2 Build Few-Shot Example Generator

**File**: `scripts/generate_few_shot_examples.py` (new)

Script to convert annotations into few-shot examples:

- Select high-confidence annotations (rating 4-5)
- Format them as prompt examples for the classifier
- Output to `lyrics_transcriber/correction/agentic/prompts/examples.yaml`
- The output can then be loaded by the classifier prompt builder
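
Here is a sketch of the selection step. It keeps annotations rated 4 or higher, drops `MANUAL_EDIT` (the classifier cannot output it), and groups the rest by category with the strongest examples first. The output keys are hypothetical. The script could then write the file with PyYAML, e.g. `yaml.safe_dump(examples, f, sort_keys=False)`; that dependency is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, List


def annotations_to_examples(
    annotations: List[Dict[str, Any]], min_confidence: float = 4
) -> Dict[str, List[Dict[str, Any]]]:
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for a in annotations:
        category = a.get("annotation_type")
        # Only confident human labels for categories the classifier can emit.
        if category in (None, "MANUAL_EDIT") or a.get("confidence", 0) < min_confidence:
            continue
        grouped[category].append(
            {
                "gap_text": a["original_text"],
                "corrected_text": a["corrected_text"],
                "reasoning": a["reasoning"],
                "confidence": a["confidence"],
            }
        )
    # Highest-rated first; sorted() is stable, so ties keep their file order.
    return {
        category: sorted(examples, key=lambda e: e["confidence"], reverse=True)
        for category, examples in sorted(grouped.items())
    }
```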

### 3.3 Update Classifier with Examples

**File**: `lyrics_transcriber/correction/agentic/prompts/classifier.py`

Modify to:

- Load examples from `examples.yaml`
- Include the top N examples for each category in the prompt
- Update dynamically as more annotations are collected
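
Once `examples.yaml` has been parsed into a category-to-list mapping (e.g. with PyYAML's `yaml.safe_load`), adding the top N examples per category is a small formatting step. The function name and line format are hypothetical. Rebuilding this section for every prompt, rather than caching it at import time, is one way to pick up newly generated examples.

```python
from typing import Any, Dict, List


def format_examples_section(
    examples: Dict[str, List[Dict[str, Any]]], top_n: int = 3
) -> str:
    """Render up to top_n examples per category (each list assumed best-first)."""
    lines = ["Examples of previously reviewed gaps:"]
    for category in sorted(examples):
        for example in examples[category][:top_n]:
            lines.append(
                f'- Gap: "{example["gap_text"]}" | Correct: "{example["corrected_text"]}"'
                f" | Category: {category} | Why: {example['reasoning']}"
            )
    return "\n".join(lines)
```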

### 3.4 Add Feedback Loop Documentation

**File**: `HUMAN_FEEDBACK_LOOP.md` (new)

Document the full feedback loop process:

- How to use annotation collection in the UI
- How to run the analysis scripts
- How to regenerate few-shot examples
- How to evaluate improvement over time
- Future: path to fine-tuning a custom model with RLHF

## Phase 4: Testing and Validation

### 4.1 Create Unit Tests

**File**: `tests/unit/correction/test_classifier.py` (new)

Test the gap classifier with examples from `gaps_review.yaml`:

- Verify correct categorization for each gap type
- Test edge cases (ambiguous gaps, no reference match)
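
The classifier's verdict comes from an LLM, so the categorization tests above will either call the model or replay recorded replies. The deterministic part, parsing and validating the reply, can be tested without network access. The sketch below does this with `unittest`; `parse_classification()` is a hypothetical helper, not an existing function.

```python
import json
import unittest

VALID_CATEGORIES = {
    "PUNCTUATION_ONLY", "SOUND_ALIKE", "BACKGROUND_VOCALS", "EXTRA_WORDS",
    "REPEATED_SECTION", "COMPLEX_MULTI_ERROR", "AMBIGUOUS", "NO_ERROR",
}


def parse_classification(raw: str) -> dict:
    """Parse and validate the classifier's JSON reply; raise ValueError if invalid."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("category") not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {data.get('category')!r}")
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        raise ValueError(f"confidence must be a number in [0, 1], got {confidence!r}")
    return data


class ParseClassificationTests(unittest.TestCase):
    def test_accepts_valid_reply(self):
        raw = json.dumps({"gap_id": "g1", "category": "SOUND_ALIKE", "confidence": 0.8, "reasoning": "r"})
        self.assertEqual(parse_classification(raw)["category"], "SOUND_ALIKE")

    def test_rejects_unknown_category(self):
        raw = json.dumps({"category": "TYPO", "confidence": 0.5})
        with self.assertRaises(ValueError):
            parse_classification(raw)

    def test_rejects_non_json(self):
        with self.assertRaises(ValueError):
            parse_classification("Sure! The category is SOUND_ALIKE.")
```

pytest collects `unittest.TestCase` subclasses as well, so this class can sit alongside pytest-style tests in the same file.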

### 4.2 Create Integration Tests

**File**: `tests/integration/test_agentic_workflow.py` (update)

Test the full classification → correction flow:

- Use the Time Bomb song as a fixture
- Verify gaps are correctly classified
- Verify the appropriate handlers are invoked
- Verify FLAG actions are generated for ambiguous cases

### 4.3 Create Feedback System Tests

**File**: `tests/unit/correction/test_feedback_store.py` (new)

Test annotation storage:

- Save and retrieve annotations
- JSONL format correctness
- Statistics generation

## Implementation Order

1. Phase 1.1-1.3: Classification infrastructure (schemas, prompts, handlers)
2. Phase 1.4-1.5: Integrate into existing workflow
3. Phase 2.1-2.3: Backend feedback storage
4. Phase 2.4-2.6: UI annotation collection
5. Phase 3.1-3.3: Analysis and improvement tools
6. Phase 4: Testing
7. Phase 3.4: Documentation

## Future Enhancements (Out of Scope)

- Fine-tune a small LLM (e.g., Llama 3.1 8B) using collected annotations
- Implement an RLHF workflow with human preference rankings
- A/B testing framework for comparing classifier versions
- Active learning: prioritize flagging gaps where the model is most uncertain

### To-dos

- [ ] Create gap classification schemas and update CorrectionProposal model
- [ ] Build classification prompt template with few-shot examples from gaps_review.yaml
- [ ] Implement category-specific handler classes for each gap type
- [ ] Update AgenticCorrector to use two-step classification workflow
- [ ] Update LyricsCorrector to pass metadata and handle FLAG actions
- [ ] Define CorrectionAnnotation schema and related types
- [ ] Implement FeedbackStore with JSONL storage
- [ ] Add annotation API endpoints to review server
- [ ] Create CorrectionAnnotationModal component
- [ ] Integrate annotation collection into edit workflow
- [ ] Create annotation analysis script
- [ ] Build few-shot example generator from annotations
- [ ] Update classifier to load dynamic few-shot examples
- [ ] Write comprehensive tests for all new components
- [ ] Document the human feedback loop and improvement process
