# product.yaml (318 lines, 277 loc, 13.1 KB)
# reps.gg — Adaptive DSA Learning System
# Product Specification

product:
  name: reps.gg
  tagline: Adaptive DSA learning through spaced repetition and pattern mastery
  components:
    - Chrome extension (lives on LeetCode)
    - Web dashboard (mastery skill tree, levels, stats)
# ─── Content Pipeline ────────────────────────────────────────────────
content_pipeline:
  taxonomy:
    status: done
    description: 18 top-level topics, ~60 sub-patterns, each with an importance score
    file: taxonomy.yaml
  prerequisite_graph:
    status: done
    description: >
      Hard and soft dependencies between sub-patterns.
      The base tier (arrays/hashing, stack, linked list, bit manipulation, math/geometry)
      has no prerequisites. Everything else branches from there.
    file: prerequisites.yaml
  problem_data:
    status: done
    description: >
      3860 problems pulled from the LeetCode GraphQL API with descriptions and solutions.
      Zerotrac ratings mapped by problem ID (2417 matched).
      HTML cleaned to plain text (content_clean, solution_clean).
      NeetCode 250 list matched and stored separately.
    files:
      - data/core/problems.json
      - data/core/nc250.json
      - data/core/nc150.json
  llm_tagging:
    status: done
    description: Two-pass process. All 3860 problems tagged.
    pass_1:
      model: claude-opus-4-6
      batch_size: 25
      description: >
        All problems run through the LLM in batches with anchors
        (Zerotrac-rated problems for difficulty calibration, NeetCode 250 as an importance reference).
      output_per_problem:
        - primary_topic: exact topic name from the taxonomy
        - primary_subtopic: name + weight (single tag)
        - secondary_subtopics: list of topic + name + weight (can be empty; weights sum to 1.0 with the primary)
        - difficulty: integer on the Zerotrac scale (real Zerotrac elo used when available, LLM estimate otherwise)
        - importance: float 0-1 (how transferable the core technique is)
        - interview_plausibility: float 0-1 (independent of importance)
        - company_plausibility: quant / faang / mid / startup (float 0-1 each)
    pass_2:
      model: claude-haiku-4-5-20251001
      description: >
        Groups pass 1 results by sub-pattern and sends them back for relative comparison.
        Conservative flagging — only misclassifications and egregious score errors (>0.25 delta).
        393 flags produced, select fixes applied.
    validation:
      deterministic: >
        validate_tags.py checks weight ordering, weight sums, invalid subtopics, and topic mismatches.
        fix_tags.py auto-corrected 47 structural issues.
    files:
      - scripts/pipeline/tag_problems.py
      - scripts/pipeline/corrective_pass.py
      - scripts/pipeline/validate_tags.py
      - scripts/pipeline/fix_tags.py
      - tagging_schema.yaml
  embeddings:
    status: done
    description: >
      Tags-only embeddings (topic + primary subtopic + secondary subtopics).
      No description, solution, or numeric scores — purely semantic.
      Importance/difficulty are handled as post-retrieval filtering.
    model: text-embedding-3-large
    vector_count: 3860
    file_size: 253.7 MB
    files:
      - scripts/pipeline/generate_embeddings.py
      - scripts/testing/test_embeddings.py
      - data/core/embeddings.json  # gitignored; regenerate with generate_embeddings.py
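A minimal sketch of how the tags-only embedding text might be assembled. `build_tag_text` is a hypothetical helper, not the real `generate_embeddings.py` code; the field names follow the tagging schema above.

```python
# Sketch: build the tags-only string that gets embedded. Deliberately no
# description, solution text, or numeric scores — importance/difficulty are
# applied as post-retrieval filters instead of being baked into the vector.

def build_tag_text(problem: dict) -> str:
    """Concatenate topic + subtopic tags into a purely semantic string."""
    parts = [problem["primary_topic"], problem["primary_subtopic"]["name"]]
    for sec in problem.get("secondary_subtopics", []):
        parts.append(f'{sec["topic"]}: {sec["name"]}')
    return " | ".join(parts)

problem = {
    "primary_topic": "graphs",
    "primary_subtopic": {"name": "topological sort", "weight": 0.7},
    "secondary_subtopics": [{"topic": "graphs", "name": "bfs", "weight": 0.3}],
}
print(build_tag_text(problem))  # graphs | topological sort | graphs: bfs
```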
# ─── User Onboarding ────────────────────────────────────────────────
onboarding:
  steps:
    - Sign up on website
    - Connect LeetCode profile:
        methods:
          - Username (GraphQL pull)
          - Chrome extension DOM reading
          - Copy-paste fallback
        outcome: Cross-reference solved problems against the tagged database for initial mastery estimates
    - Self-assessment questionnaire on top-level patterns (covers gaps LeetCode history doesn't)
    - Pick goal:
        options:
          - Interview prep with timeline
          - Building foundations
          - General improvement
    - See mastery dashboard (mostly empty skill tree), install Chrome extension, start solving
  calibration: >
    The system calibrates quickly from real usage.
    Initial estimates get overwritten within days.
# ─── Two Modes ───────────────────────────────────────────────────────
modes:
  build_patterns:
    driven_by: LLM
    description: >
      Adaptive learning mode. The LLM receives the user's mastery object, recent history,
      prerequisite graph, taxonomy, current mode, active filters, and user goal/timeline.
    llm_context:
      - mastery_object: every sub-pattern with score, problems-solved count, and last-attempted timestamp
      - recent_history: last 10-15 problems across all sub-patterns, with results
      - prerequisite_graph: hard/soft dependencies
      - taxonomy: with importance scores
      - current_mode: and any active topic filters
      - user_goal: and timeline, if set
    three_states:
      filling_gaps:
        description: >
          Default state for most users. The user has many sub-patterns at low mastery.
          Focus on building foundations. Prioritize high-importance sub-patterns whose
          prerequisites are met. Serve problems at an appropriate difficulty for current
          mastery. Not pushing, just building competency.
      pushing:
        description: >
          The user has few gaps and most high-importance sub-patterns at solid mastery.
          Push harder — harder problems, challenging variants, niche edge cases,
          pushing toward higher tiers. Also triggers when the user has an active topic
          filter (filtering = asking to be pushed in that area).
      spaced_repetition:
        description: >
          Background layer throughout the other two states. Periodically mix in problems
          from sub-patterns with stale last-attempted timestamps. Review problems are
          at the user's current mastery level, not harder. Checking retention, not
          pushing growth. If they struggle, mastery drops and the system naturally
          increases focus on that sub-pattern.
    variety: >
      Throughout all states: introduce variety, don't repeat the same sub-pattern too many
      times in a row, and be somewhat random among similarly good choices.
    output: >
      The LLM outputs a shortlist of ~10 problem profiles. Each gets matched against the
      problem database by sub-pattern, difficulty range, and generalizability. Filter out
      seen/done problems and prerequisite-gated problems, then pick from the candidates with
      randomness. The queue refreshes when mastery changes significantly or the queue runs low.
    matching:
      primary: sub-pattern filter + optional embedding similarity for fuzzy cross-subtopic matching
      secondary: difficulty range, generalizability
      exclude: seen problems, prerequisite-gated problems, discarded problems
    optional_topic_filter: The user can scope to a specific area and the LLM works within that constraint
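The matching step above can be sketched as a filter-then-sample over the tagged database. This is a hypothetical illustration; the function and field names are not the real pipeline's.

```python
import random

# Sketch: match one LLM-produced problem profile against the problem database.
# Filters mirror the spec: sub-pattern, difficulty band, and an exclusion set
# covering seen, done, discarded, and prerequisite-gated problems.

def match_profile(profile, problems, excluded_ids, rng=random):
    """Filter by sub-pattern and difficulty band, then pick with randomness."""
    lo, hi = profile["difficulty_range"]
    candidates = [
        p for p in problems
        if p["id"] not in excluded_ids                    # seen/done/discarded/gated
        and p["primary_subtopic"] == profile["subtopic"]  # sub-pattern filter
        and lo <= p["difficulty"] <= hi                   # difficulty band
    ]
    return rng.choice(candidates) if candidates else None

problems = [
    {"id": 1, "primary_subtopic": "two pointers", "difficulty": 1500},
    {"id": 2, "primary_subtopic": "two pointers", "difficulty": 2100},
    {"id": 3, "primary_subtopic": "bfs", "difficulty": 1500},
]
profile = {"subtopic": "two pointers", "difficulty_range": (1400, 1800)}
print(match_profile(profile, problems, excluded_ids={2})["id"])  # 1
```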
  interview_practice:
    driven_by: Formula (no LLM)
    description: >
      Simulates real interview randomness. No adaptation to weaknesses.
    flow:
      - User selects a company type (quant, FAANG, mid-level, startup, all)
      - Filter unseen problems to the difficulty band for that company type
      - Weight by interview plausibility and company-type plausibility
      - Pull the top candidates, then pick randomly
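The flow above reduces to a weighted random draw. A minimal sketch, assuming the plausibility fields from the tagging pass; the difficulty band and function names are illustrative, not the real formula.

```python
import random

# Sketch of the formula-driven pick (no LLM): filter unseen problems to the
# company's difficulty band, weight by the two plausibility scores, sample.

def pick_interview_problem(problems, company, band, rng=random):
    lo, hi = band
    pool = [p for p in problems if not p["seen"] and lo <= p["difficulty"] <= hi]
    if not pool:
        return None
    weights = [p["interview_plausibility"] * p["company_plausibility"][company]
               for p in pool]
    return rng.choices(pool, weights=weights, k=1)[0]  # weighted random pick

problems = [
    {"id": 1, "seen": False, "difficulty": 1500,
     "interview_plausibility": 0.9, "company_plausibility": {"faang": 0.8}},
    {"id": 2, "seen": True, "difficulty": 1500,
     "interview_plausibility": 0.9, "company_plausibility": {"faang": 0.8}},
]
print(pick_interview_problem(problems, "faang", (1200, 1800))["id"])  # 1
```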
    premium_additions:
      - Timer
      - AI follow-up questions
  shared: Both modes update mastery from results
# ─── Chrome Extension ────────────────────────────────────────────────
chrome_extension:
  always_active: true
  description: Always active on LeetCode, regardless of mode
  post_solve_report:
    trigger: After any problem submission
    questions:
      - Did you solve it?
      - Did you use hints?
      - Did you look at the solution?
      - Rate perceived difficulty
    interaction: Four taps
  build_patterns_mode:
    next_problem: >
      Presents the next recommended problem with a link to it on LeetCode.
      The recommendation is precomputed during the progress-animation screen (no perceived wait).
    progress_animation:
      description: >
        Post-solve dopamine loop — XP gains/losses per sub-pattern, tier promotions/demotions, level movement.
    skip:
      available: true
      daily_limit: Past a certain number of daily skips, requires premium
      feedback: Skip data feeds back into the LLM context for the next recommendation
  shaky_solves:
    description: >
      Problems self-reported as shaky (used hints or looked at the solution) get flagged for
      re-serving after a ~30-day cooldown. Solved clean on re-serve: permanently marked done.
      Shaky twice: discarded (never re-served again).
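The shaky-solve lifecycle above is a small state machine. A minimal sketch: the 30-day cooldown comes from the spec, while the state and function names are illustrative.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=30)  # ~30-day cooldown before a shaky problem returns

def update_state(state: str, shaky: bool) -> str:
    """Advance a problem's lifecycle after an attempt.

    active -> shaky on a first shaky solve; shaky -> done on a clean
    re-serve, or shaky -> discarded on a second shaky solve.
    """
    if state == "active":
        return "shaky" if shaky else "done"
    if state == "shaky":
        return "discarded" if shaky else "done"  # shaky twice: never re-served
    return state  # done and discarded are terminal

def eligible_for_reserve(state: str, flagged_at: datetime, now: datetime) -> bool:
    """A shaky problem only comes back once the cooldown has elapsed."""
    return state == "shaky" and now - flagged_at >= COOLDOWN
```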
# ─── Mastery Model ───────────────────────────────────────────────────
mastery_model:
  type: Running score (additive/subtractive); no averaging, no decay, no recency weighting
  implementation: lib/mastery.py
  subtopic_mastery:
    range: 0-100
    per: sub-pattern
    mechanism: Each attempt adds to or subtracts from the running score
  formula:
    mastery_change: quality_score × perceived_diff_mult × difficulty_multiplier × generalizability_factor
    quality_score:
      solved_clean: 4.0
      solved_with_hints: 2.0
      solved_after_solution: 0.75
      struggled: -2.0  # struggled even after seeing the solution — subtracts from mastery
    difficulty_multiplier:
      description: >
        Continuous function of problem elo vs the user's expected elo for their mastery level.
        Higher mastery requires harder problems for meaningful gains.
        For unsolved problems the scale is inverted: failing hard problems penalizes less
        than failing easy ones.
      expected_elo: 800 + (mastery / 100) * 1700
      multiplier: clamp(1.0 + 0.5 * (problem_elo - expected_elo) / 500, 0.3, 2.5)
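The two formulas above transcribe directly. A quick numeric check at mastery 50, where the expected elo is 800 + 0.5 × 1700 = 1650; the function names are illustrative.

```python
def expected_elo(mastery: float) -> float:
    """Map mastery 0-100 onto an expected problem elo of 800-2500."""
    return 800 + (mastery / 100) * 1700

def difficulty_multiplier(problem_elo: float, mastery: float) -> float:
    """Continuous multiplier, clamped to [0.3, 2.5] per the spec."""
    raw = 1.0 + 0.5 * (problem_elo - expected_elo(mastery)) / 500
    return max(0.3, min(2.5, raw))

print(difficulty_multiplier(1650, 50))  # 1.0: problem exactly at expected elo
print(difficulty_multiplier(2150, 50))  # 1.5: 500 elo above expectation
```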
    generalizability_factor:
      formula: importance + (1 - importance) * (current_mastery / 100)
      description: >
        At low mastery, niche problems (low importance) give little gain.
        At high mastery, the gap shrinks and niche problems give meaningful gains.
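The generalizability formula above, checked at both extremes for a niche problem (importance 0.2):

```python
def generalizability_factor(importance: float, current_mastery: float) -> float:
    """Interpolate from importance (at mastery 0) toward 1.0 (at mastery 100)."""
    return importance + (1 - importance) * (current_mastery / 100)

print(generalizability_factor(0.2, 0))    # 0.2: niche problem at low mastery, little gain
print(generalizability_factor(0.2, 100))  # 1.0: at high mastery the gap closes
```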
    secondary_subtopics:
      description: >
        The primary subtopic gets the full mastery change.
        Each secondary subtopic gets: mastery_change × weight × 0.5
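Distributing one attempt's mastery change across subtopics per the rule above; a sketch with example subtopic names and weights, not the real `lib/mastery.py` code.

```python
def distribute_change(mastery_change, primary, secondaries):
    """Primary gets the full change; each secondary gets change * weight * 0.5."""
    deltas = {primary: mastery_change}
    for sub in secondaries:
        deltas[sub["name"]] = mastery_change * sub["weight"] * 0.5
    return deltas

print(distribute_change(4.0, "binary search", [{"name": "two pointers", "weight": 0.3}]))
# {'binary search': 4.0, 'two pointers': 0.6}
```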
  tiers:
    per_subtopic:
      display: Tier name (can go up and down)
      ranges:
        - { range: "0-19", tier: Bronze }
        - { range: "20-39", tier: Silver }
        - { range: "40-59", tier: Gold }
        - { range: "60-79", tier: Platinum }
        - { range: "80-100", tier: Diamond }
    per_topic:
      display: Level (e.g., Level 1-50)
      calculation: Importance-weighted average of subtopic scores within the topic
    overall:
      display: Level (e.g., Level 1-100)
      calculation: Importance-weighted average of topic scores
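The tier bands and the importance-weighted aggregation above can be sketched as follows; the `bisect` lookup and the function names are illustrative.

```python
from bisect import bisect_right

TIERS = ["Bronze", "Silver", "Gold", "Platinum", "Diamond"]
THRESHOLDS = [20, 40, 60, 80]  # lower bounds of Silver through Diamond

def tier_for(score: float) -> str:
    """Map a 0-100 subtopic score onto its tier name."""
    return TIERS[bisect_right(THRESHOLDS, score)]

def weighted_level(scores: dict, importances: dict) -> float:
    """Importance-weighted average, used for both per-topic and overall levels."""
    total = sum(importances[k] for k in scores)
    return sum(scores[k] * importances[k] for k in scores) / total

print(tier_for(19))  # Bronze
print(tier_for(80))  # Diamond
print(weighted_level({"dp": 40, "graphs": 80}, {"dp": 1.0, "graphs": 1.0}))  # 60.0
```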
  user_facing:
    display: XP gains/losses after every attempt, tier promotions/demotions
    individual_ranks: Every subtopic has its own tier
    topic_ranks: Cumulative level per topic
    overall_rank: Cumulative level across all topics
  manual_override: >
    Users can manually override mastery levels at any time.
    An override sets a starting point that real performance adjusts from.
# ─── Overall Rank ────────────────────────────────────────────────────
overall_rank:
  description: >
    Overall mastery across all subtopics, displayed as a level.
    Importance-weighted aggregation of topic scores.
    Separate from per-subtopic tiers.
  estimated_contest_elo:
    status: deferred
    description: May add an estimated contest elo later, based on the mastery profile
# ─── Dashboard ───────────────────────────────────────────────────────
dashboard:
  skill_tree:
    description: >
      Visualization of all sub-patterns with tier badges.
      Shows nodes that are unlocked, locked (behind prerequisites), and close to unlocking.
      Tiers can go up and down.
  overall_level: Overall mastery level, prominently displayed
  topic_levels: Per-topic level breakdown
  stats:
    - Problems solved
    - Quality scores
    - Strongest and weakest patterns
    - Total time practiced
  social:
    friends_list: true
    visibility: Opt-in level and tier sharing
# ─── Monetization ────────────────────────────────────────────────────
monetization:
  free_tier:
    limits: TBD (possibly a daily problem cap or skip limit)
  premium:
    pricing: TBD (subscription)
    features:
      - Company-specific interview practice
      - AI interview simulation with timer and follow-ups
      - Unlimited skips