feat(ts-sdk): Math Standards Alignment Evaluator #91
Pull request overview
Adds a new TypeScript SDK evaluator (MathStandardsAlignmentEvaluator) that determines CCSS math standards alignment by retrieving learning components from the Learning Commons Knowledge Graph and running LC-level LLM judgments, with optional coarse filtering for cost control.
Changes:
- Introduces Knowledge Graph access layer (API + JSON repositories, cached client) for resolving standards, learning components, and grade-level standard lists.
- Adds Math standards alignment evaluator with single-standard, per-item, question-bank (M×N), and grade-based evaluation entrypoints.
- Adds unit + integration tests and colocated prompts/schemas/config for the evaluator.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdks/typescript/src/evaluators/math/standards-alignment.ts | Implements the new evaluator, including coarse filter + question bank modes. |
| sdks/typescript/src/knowledge-graph/repository.ts | Adds KG API + JSON repositories for standards/LC retrieval. |
| sdks/typescript/src/knowledge-graph/client.ts | Adds promise-cached KG client with concurrency limiting and eviction-on-rejection. |
| sdks/typescript/src/knowledge-graph/types.ts | Defines KG domain types + parseGradeFromStandard. |
| sdks/typescript/src/knowledge-graph/index.ts | Re-exports KG client/repositories/types from a single entrypoint. |
| sdks/typescript/src/schemas/math/standards-alignment.ts | Adds Zod schemas for detail LC eval + coarse filter outputs. |
| sdks/typescript/src/prompts/math/standards-alignment/index.ts | Wires prompt templates/config/schema into TS for runtime prompt construction. |
| sdks/typescript/src/errors.ts | Introduces KnowledgeGraphError. |
| sdks/typescript/src/evaluators/base.ts | Adds platformApiKey to base config surface area. |
| sdks/typescript/src/evaluators/index.ts | Exports the new evaluator + types from evaluators barrel. |
| sdks/typescript/src/index.ts | Exposes the evaluator + KG utilities/errors from the SDK root. |
| sdks/typescript/tests/unit/evaluators/math-standards-alignment.test.ts | Unit coverage for evaluator behavior across modes. |
| sdks/typescript/tests/unit/knowledge-graph/repository.test.ts | Unit coverage for KG repositories and error mapping. |
| sdks/typescript/tests/unit/knowledge-graph/client.test.ts | Unit coverage for KG client caching/eviction and composition helpers. |
| sdks/typescript/tests/integration/math-standards-alignment.integration.test.ts | End-to-end integration test for the 3.MD.C.7 standard family. |
| sdks/typescript/.env.test.example | Documents required env var for KG integration tests. |
| evals/prompts/math/standards-alignment/config.json | Evaluator/prompt configuration (model, temperature, steps). |
| evals/prompts/math/standards-alignment/input_schema.json | Input schema for question/grade/statementCode. |
| evals/prompts/math/standards-alignment/output_schema.json | Output schema for LC-level alignment results. |
| evals/prompts/math/standards-alignment/system.txt | System prompt defining alignment criteria. |
| evals/prompts/math/standards-alignment/user.txt | User prompt template for LC batch evaluation. |
| evals/prompts/math/standards-alignment/coarse-filter-user.txt | User prompt template for coarse relevance filtering. |
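
The "promise-cached ... eviction-on-rejection" behavior described for knowledge-graph/client.ts can be illustrated with a minimal sketch. This is not the code from the PR: `PromiseCache`, `Loader`, and the `size` accessor are hypothetical names, and concurrency limiting is left out to keep the example short.

```typescript
// Hypothetical sketch: class and method names are illustrative, not the SDK's API.
type Loader<T> = (key: string) => Promise<T>;

class PromiseCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly load: Loader<T>;

  constructor(load: Loader<T>) {
    this.load = load;
  }

  get(key: string): Promise<T> {
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached; // concurrent callers share one promise

    const pending: Promise<T> = this.load(key).catch((err: unknown) => {
      // Evict on rejection so a transient failure is not served from cache forever.
      if (this.entries.get(key) === pending) this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }

  // Number of cached (in-flight or settled) entries; exposed only for inspection.
  get size(): number {
    return this.entries.size;
  }
}
```

Caching the promise rather than the resolved value means two callers asking for the same key at the same time trigger a single request, which is presumably why the client is described as promise-cached.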
Evaluates whether assessment questions align to CCSS math standards by checking alignment at the learning-component level via the LC Knowledge Graph API (see PR #97 for the KG client layer).
Three evaluation methods:
- evaluate(question, grade, statementCode): single pair, primitive
- evaluateQuestionBank(questions[], statementCodes[]): M×N cross-product with optional coarse filter (useCoarseFilter: true)
- evaluateByGrade(questions[], grade): grade-level discovery wrapper
Model, temperature, supported grades, and max question length are all read from config.json / input_schema.json. platformApiKey doubles as partnerKey for telemetry when not set separately.
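
To make the M×N question-bank mode concrete, here is a rough sketch of how the cross-product and an optional coarse filter could fit together. The `Pair` and `CoarseFilter` types and both function names are assumptions made for illustration; the evaluator's real internals may differ.

```typescript
// Hypothetical sketch; types and the filter signature are assumptions.
interface Pair {
  question: string;
  statementCode: string;
}

// Returns true when the pair looks relevant enough for full LC evaluation.
type CoarseFilter = (pair: Pair) => Promise<boolean>;

// Build every (question, statementCode) combination: M questions × N codes.
function crossProduct(questions: string[], statementCodes: string[]): Pair[] {
  const pairs: Pair[] = [];
  for (const question of questions) {
    for (const statementCode of statementCodes) {
      pairs.push({ question, statementCode });
    }
  }
  return pairs;
}

// Without a filter every pair is kept; with one, only pairs it accepts survive.
async function selectPairs(
  questions: string[],
  statementCodes: string[],
  coarseFilter?: CoarseFilter,
): Promise<Pair[]> {
  const pairs = crossProduct(questions, statementCodes);
  if (coarseFilter === undefined) return pairs;
  const keep = await Promise.all(pairs.map((pair) => coarseFilter(pair)));
  return pairs.filter((_, index) => keep[index]);
}
```

The cost argument is visible in the shapes: the detailed evaluation runs once per surviving pair, so dropping clearly irrelevant pairs up front shrinks the expensive step.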
15582de to 0dc2581
czi-fsisenda left a comment
Nice! 🚀
Just one P0 and a few P1s and even P2s.
    // ---------------------------------------------------------------------------
    // Public types
    // ---------------------------------------------------------------------------
[P1] Moving these to a separate file will make this file less intimidating 😅
    // If partnerKey isn't set, fall back to platformApiKey — same LC platform key,
    // different API surfaces (KG vs telemetry).
[P0] We should have these set separately and intentionally even if they have the same value
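
For illustration, a strict variant of the key handling might look like the sketch below. `KeyConfig`, `ResolvedKeys`, and `resolveKeysStrict` are hypothetical names, and the assumption is that both keys become required rather than one silently standing in for the other.

```typescript
// Hypothetical sketch: requires both keys explicitly instead of falling back.
interface KeyConfig {
  platformApiKey?: string;
  partnerKey?: string;
}

interface ResolvedKeys {
  platformApiKey: string;
  partnerKey: string;
}

function resolveKeysStrict(config: KeyConfig): ResolvedKeys {
  const missing: string[] = [];
  if (!config.platformApiKey) missing.push('platformApiKey');
  if (!config.partnerKey) missing.push('partnerKey');
  if (missing.length > 0) {
    // Fail loudly so the caller sets each key on purpose, even if the values match.
    throw new Error(`Missing required key(s): ${missing.join(', ')}`);
  }
  return {
    platformApiKey: config.platformApiKey as string,
    partnerKey: config.partnerKey as string,
  };
}
```

Making `partnerKey` required would break callers that rely on the current fallback.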
      this.detailProvider = this.createConfiguredProvider(
        Provider.OpenAI,
        DETAIL_MODEL,
        config.openaiApiKey,
      );

      this.coarseProvider = this.createConfiguredProvider(
        Provider.OpenAI,
        config.coarseFilterModel ?? DETAIL_MODEL,
        config.openaiApiKey,
      );
    }
[P1] Does this mean this eval only works with OpenAI? No model override?
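
A possible shape for a provider/model override, sketched under assumptions: `ModelChoice`, `OverrideConfig`, `resolveModels`, and the placeholder default model name are all hypothetical and do not exist in the PR. The sketch keeps the snippet's existing behavior of the coarse stage defaulting to the detail model.

```typescript
// Hypothetical sketch of a provider/model override; none of these names are in the PR.
interface ModelChoice {
  provider: string; // e.g. 'openai'; other values are placeholders
  model: string;
}

interface OverrideConfig {
  detail?: Partial<ModelChoice>;
  coarse?: Partial<ModelChoice>;
}

// Placeholder default; the real default comes from config.json per the PR description.
const DEFAULT_DETAIL: ModelChoice = { provider: 'openai', model: 'default-detail-model' };

function resolveModels(config: OverrideConfig): { detail: ModelChoice; coarse: ModelChoice } {
  // Detail stage: defaults, then any caller override.
  const detail: ModelChoice = { ...DEFAULT_DETAIL, ...config.detail };
  // Coarse stage: inherits the resolved detail choice unless overridden.
  const coarse: ModelChoice = { ...detail, ...config.coarse };
  return { detail, coarse };
}
```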
    this.sendTelemetry({
      status: 'success',
      latencyMs,
      textLength: question.length,
      grade,
      provider: this.detailProvider.label,
      tokenUsage,
      metadata: { stage_details: stageDetails },
      inputText: question,
    }).catch(() => undefined);
[P1] This makes a good case for more flexible telemetry. We don't capture standards details like statement codes or learning components.
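
One way the `metadata` field could carry standards details, sketched under assumptions: the field names `statement_code` and `learning_components`, the `LearningComponentResult` shape, and `buildStandardsMetadata` are hypothetical and not part of the current telemetry payload.

```typescript
// Hypothetical sketch: extra standards details folded into telemetry metadata.
interface LearningComponentResult {
  id: string;
  aligned: boolean;
}

interface StandardsTelemetryMetadata {
  stage_details: unknown;
  statement_code: string;
  learning_components: { total: number; aligned: number; ids: string[] };
}

function buildStandardsMetadata(
  stageDetails: unknown,
  statementCode: string,
  results: LearningComponentResult[],
): StandardsTelemetryMetadata {
  return {
    stage_details: stageDetails,
    statement_code: statementCode,
    learning_components: {
      total: results.length,
      aligned: results.filter((r) => r.aligned).length,
      ids: results.map((r) => r.id),
    },
  };
}
```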
    // Fetch standard descriptions concurrently so the model has real content to
    // reason about rather than opaque codes like "3.MD.C.7.d".
    const infos = await Promise.all(
      statementCodes.map((code) => this.kgClient.getStandardInfo(code).catch(() => ({ uuid: '', description: undefined }))),
    );
[P2] We should cache these
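
A minimal sketch of what caching here could look like, assuming a plain `Map` keyed by statement code. `StandardInfo`, `FetchInfo`, and `getInfosCached` are illustrative names, and the fetcher stands in for `kgClient.getStandardInfo`. Failed lookups still produce the empty placeholder from the snippet, but the placeholder is deliberately not cached so a later call can retry.

```typescript
// Hypothetical sketch: dedupe codes and reuse results across calls.
interface StandardInfo {
  uuid: string;
  description?: string;
}

type FetchInfo = (code: string) => Promise<StandardInfo>;

// Same empty placeholder the snippet uses when a lookup fails.
const FALLBACK_INFO: StandardInfo = { uuid: '' };

async function getInfosCached(
  codes: string[],
  fetchInfo: FetchInfo,
  cache: Map<string, StandardInfo>,
): Promise<StandardInfo[]> {
  // Fetch each distinct, not-yet-cached code once.
  const toFetch = Array.from(new Set(codes)).filter((code) => !cache.has(code));
  await Promise.all(
    toFetch.map(async (code) => {
      try {
        cache.set(code, await fetchInfo(code));
      } catch {
        // Leave the code uncached so a later call can retry it.
      }
    }),
  );
  // Preserve the caller's order, including duplicates.
  return codes.map((code) => cache.get(code) ?? FALLBACK_INFO);
}
```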
      typeof q === 'string' ? { question: q, grade } : q,
    );

    const academicStandards = await this.kgClient.getStandardsByGrade(grade);
[P2] We should cache these
czi-fsisenda left a comment
Chatted offline.
P0 isn't blocking. We don't want to introduce breaking changes yet. We'll address when we transition into 1.x versions.
Summary
Adds `MathStandardsAlignmentEvaluator` to the TypeScript SDK. It evaluates whether assessment questions align to CCSS math standards by checking alignment at the learning-component level via the Learning Commons Knowledge Graph API.
Depends on PR #97 (`ahussain/kg-client`), which must merge first.
Three evaluation methods
- `evaluate(question, grade, statementCode)`
- `evaluateQuestionBank(questions[], statementCodes[])`
- `evaluateByGrade(questions[], grade)`
Key design decisions
- `useCoarseFilter` (default `false`): opt-in pre-filter that sends standard descriptions to a fast LLM call to skip clearly irrelevant pairs before full LC evaluation
- `grade` is explicit on every call and validated as a supported grade level (K–12). High school standards (HSA, HSF, HSG, etc.) are evaluated with grade `'9'`/`'10'`/`'11'`/`'12'` since that is how the KG API indexes them.
- Model, temperature, supported grades, and max question length are all read from `config.json` / `input_schema.json`
- `platformApiKey` doubles as `partnerKey` for telemetry when not set separately
Test plan
- `npm test -- --run tests/unit`
- `npx tsc --noEmit`: 0 errors
- `npm run lint`: 0 errors
- `RUN_INTEGRATION_TESTS=true OPENAI_API_KEY=<key> PLATFORM_API_KEY=<key> npm test -- --run tests/integration/math-standards-alignment.integration.test.ts`
🤖 Generated with Claude Code
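
As a rough illustration of the grade handling described under the design decisions, the sketch below validates an explicit grade and checks that a high school code is paired with a 9–12 grade. The hard-coded grade list, the `'K'` spelling, the `HS` prefix test, and all function names are assumptions; per the description, the SDK reads supported grades from `input_schema.json`.

```typescript
// Hypothetical sketch; the SDK reads supported grades from input_schema.json instead.
const SUPPORTED_GRADES: readonly string[] = [
  'K', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
];
const HIGH_SCHOOL_GRADES: readonly string[] = ['9', '10', '11', '12'];

function assertSupportedGrade(grade: string): string {
  if (SUPPORTED_GRADES.indexOf(grade) === -1) {
    throw new Error(`Unsupported grade: ${grade}`);
  }
  return grade;
}

// Assumes high school statement codes start with "HS" plus a domain letter (HSA, HSF, HSG, ...).
function isHighSchoolCode(statementCode: string): boolean {
  return /^HS[A-Z]/.test(statementCode);
}

// A high school code only makes sense with a 9-12 grade under this assumption.
function isGradeCompatible(statementCode: string, grade: string): boolean {
  assertSupportedGrade(grade);
  if (!isHighSchoolCode(statementCode)) return true;
  return HIGH_SCHOOL_GRADES.indexOf(grade) !== -1;
}
```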