feat(ts-sdk): Math Standards Alignment Evaluator by adnanrhussain · Pull Request #91 · learning-commons-org/evaluators

adnanrhussain · 2026-06-03T23:43:34Z

Summary

Adds MathStandardsAlignmentEvaluator to the TypeScript SDK — evaluates whether assessment questions align to CCSS math standards by checking alignment at the learning-component level via the Learning Commons Knowledge Graph API.

Depends on PR #97 (ahussain/kg-client), which must merge first.

Three evaluation methods

Method	Use case
`evaluate(question, grade, statementCode)`	Single question × single standard (primitive)
`evaluateQuestionBank(questions[], statementCodes[])`	True M×N cross-product — coverage / gap analysis
`evaluateByGrade(questions[], grade)`	Wraps question bank, fetches all CCSS math standards for a grade from KG
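
As a rough illustration of what the M×N cross-product and the gap analysis amount to, here is a minimal sketch. The `Pair` shape and the `crossProduct` and `uncoveredStandards` helpers are hypothetical names made up for this example; they are not taken from the SDK source.

```typescript
// Hypothetical shape for illustration; the SDK's real result types may differ.
interface Pair {
  questionIndex: number;
  statementCode: string;
}

// Enumerate every (question, standard) pair: M questions × N codes.
function crossProduct(questions: string[], statementCodes: string[]): Pair[] {
  const pairs: Pair[] = [];
  for (let questionIndex = 0; questionIndex < questions.length; questionIndex++) {
    for (const statementCode of statementCodes) {
      pairs.push({ questionIndex, statementCode });
    }
  }
  return pairs;
}

// Gap analysis: standards that no question was judged aligned to.
function uncoveredStandards(statementCodes: string[], alignedPairs: Pair[]): string[] {
  const covered = new Set(alignedPairs.map((p) => p.statementCode));
  return statementCodes.filter((code) => !covered.has(code));
}

const sampleQuestions = ['Find the area of a 3 by 4 rectangle.', 'What is 7 + 5?'];
const sampleCodes = ['3.MD.C.7.a', '3.MD.C.7.b', '3.MD.C.7.d'];
const samplePairs = crossProduct(sampleQuestions, sampleCodes);
console.log(samplePairs.length); // 6 (2 questions × 3 standards)
```

In this sketch each pair would then be handed to the single-pair primitive, and any standard left without an aligned question shows up as a coverage gap.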

Key design decisions

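Batched LLM calls: all learning components (LCs) for a standard are evaluated in a single structured-output call, so the number of LLM calls is O(M×N) rather than O(M×N×LC), where LC is the number of learning components per standard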
useCoarseFilter (default false): opt-in pre-filter that sends standard descriptions to a fast LLM call to skip clearly irrelevant pairs before full LC evaluation
grade is explicit on every call — validated as a supported grade level (K–12). High school standards (HSA, HSF, HSG, etc.) are evaluated with grade '9'/'10'/'11'/'12' since that is how the KG API indexes them.
Model, temperature, supported grades, and max question length all read from config.json / input_schema.json
platformApiKey doubles as partnerKey for telemetry when not set separately
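
The call-count claim for batching can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that every standard has the same number of learning components; `llmCallCount` is a hypothetical helper, not part of the SDK, and it ignores any extra calls made by the coarse filter.

```typescript
// Number of detail LLM calls for M questions × N standards.
// Assumption: each standard has `lcPerStandard` learning components.
function llmCallCount(
  m: number,
  n: number,
  lcPerStandard: number,
  batched: boolean,
): number {
  const pairCount = m * n;
  // Batched: one call per (question, standard) pair covers all of its LCs.
  // Unbatched: one call per (question, standard, LC) triple.
  return batched ? pairCount : pairCount * lcPerStandard;
}

console.log(llmCallCount(20, 8, 5, true)); // 160
console.log(llmCallCount(20, 8, 5, false)); // 800
```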

Test plan

244 unit tests pass (npm test -- --run tests/unit)
npx tsc --noEmit — 0 errors
npm run lint — 0 errors
Integration test validates alignment for full 3.MD.C.7 standard family with 0 coarse filter false negatives
Run integration: RUN_INTEGRATION_TESTS=true OPENAI_API_KEY=<key> PLATFORM_API_KEY=<key> npm test -- --run tests/integration/math-standards-alignment.integration.test.ts

🤖 Generated with Claude Code

codecov · 2026-06-03T23:45:37Z

Codecov Report

❌ Patch coverage is 93.50000% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...escript/src/evaluators/math/standards-alignment.ts	92.61%	13 Missing ⚠️

Copilot

Pull request overview

Adds a new TypeScript SDK evaluator (MathStandardsAlignmentEvaluator) that determines CCSS math standards alignment by retrieving learning components from the Learning Commons Knowledge Graph and running LC-level LLM judgments, with optional coarse filtering for cost control.

Changes:

Introduces Knowledge Graph access layer (API + JSON repositories, cached client) for resolving standards, learning components, and grade-level standard lists.
Adds Math standards alignment evaluator with single-standard, per-item, question-bank (M×N), and grade-based evaluation entrypoints.
Adds unit + integration tests and colocated prompts/schemas/config for the evaluator.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.

File	Description
sdks/typescript/src/evaluators/math/standards-alignment.ts	Implements the new evaluator, including coarse filter + question bank modes.
sdks/typescript/src/knowledge-graph/repository.ts	Adds KG API + JSON repositories for standards/LC retrieval.
sdks/typescript/src/knowledge-graph/client.ts	Adds promise-cached KG client with concurrency limiting and eviction-on-rejection.
sdks/typescript/src/knowledge-graph/types.ts	Defines KG domain types + `parseGradeFromStandard`.
sdks/typescript/src/knowledge-graph/index.ts	Re-exports KG client/repositories/types from a single entrypoint.
sdks/typescript/src/schemas/math/standards-alignment.ts	Adds Zod schemas for detail LC eval + coarse filter outputs.
sdks/typescript/src/prompts/math/standards-alignment/index.ts	Wires prompt templates/config/schema into TS for runtime prompt construction.
sdks/typescript/src/errors.ts	Introduces `KnowledgeGraphError`.
sdks/typescript/src/evaluators/base.ts	Adds `platformApiKey` to base config surface area.
sdks/typescript/src/evaluators/index.ts	Exports the new evaluator + types from evaluators barrel.
sdks/typescript/src/index.ts	Exposes the evaluator + KG utilities/errors from the SDK root.
sdks/typescript/tests/unit/evaluators/math-standards-alignment.test.ts	Unit coverage for evaluator behavior across modes.
sdks/typescript/tests/unit/knowledge-graph/repository.test.ts	Unit coverage for KG repositories and error mapping.
sdks/typescript/tests/unit/knowledge-graph/client.test.ts	Unit coverage for KG client caching/eviction and composition helpers.
sdks/typescript/tests/integration/math-standards-alignment.integration.test.ts	End-to-end integration test for the 3.MD.C.7 standard family.
sdks/typescript/.env.test.example	Documents required env var for KG integration tests.
evals/prompts/math/standards-alignment/config.json	Evaluator/prompt configuration (model, temperature, steps).
evals/prompts/math/standards-alignment/input_schema.json	Input schema for question/grade/statementCode.
evals/prompts/math/standards-alignment/output_schema.json	Output schema for LC-level alignment results.
evals/prompts/math/standards-alignment/system.txt	System prompt defining alignment criteria.
evals/prompts/math/standards-alignment/user.txt	User prompt template for LC batch evaluation.
evals/prompts/math/standards-alignment/coarse-filter-user.txt	User prompt template for coarse relevance filtering.

Copilot

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.

Evaluates whether assessment questions align to CCSS math standards by checking alignment at the learning-component level via the LC Knowledge Graph API (see PR #97 for the KG client layer).

Three evaluation methods:

- evaluate(question, grade, statementCode): single pair, primitive
- evaluateQuestionBank(questions[], statementCodes[]): M×N cross-product with optional coarse filter (useCoarseFilter: true)
- evaluateByGrade(questions[], grade): grade-level discovery wrapper

Model, temperature, supported grades, and max question length all read from config.json / input_schema.json. platformApiKey doubles as partnerKey for telemetry when not set separately.

czi-fsisenda

Nice! 🚀
Just one P0 and a few P1s and even P2s.

czi-fsisenda · 2026-06-10T19:08:10Z

+// ---------------------------------------------------------------------------
+// Public types
+// ---------------------------------------------------------------------------


[P1] Moving these to a separate file will make this file less intimidating 😅

czi-fsisenda · 2026-06-10T19:13:20Z

+    // If partnerKey isn't set, fall back to platformApiKey — same LC platform key,
+    // different API surfaces (KG vs telemetry).


[P0] We should have these set separately and intentionally even if they have the same value
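
To make the two options concrete, here is a minimal sketch contrasting the fallback described in the code comment with a stricter resolution that requires both keys. The `KeyConfig` shape and both function names are assumptions made for this example; only the field names `platformApiKey` and `partnerKey` come from the PR text.

```typescript
// Hypothetical config shape; only the two field names come from the PR.
interface KeyConfig {
  platformApiKey?: string;
  partnerKey?: string;
}

// Behaviour described in the PR: telemetry falls back to the platform key.
function resolvePartnerKeyWithFallback(config: KeyConfig): string | undefined {
  return config.partnerKey ?? config.platformApiKey;
}

// Stricter alternative: both keys must be set explicitly, even when equal.
function resolveKeysStrict(config: KeyConfig): {
  platformApiKey: string;
  partnerKey: string;
} {
  if (!config.platformApiKey) {
    throw new Error('platformApiKey is required');
  }
  if (!config.partnerKey) {
    throw new Error('partnerKey is required; set it explicitly even if it equals platformApiKey');
  }
  return { platformApiKey: config.platformApiKey, partnerKey: config.partnerKey };
}

console.log(resolvePartnerKeyWithFallback({ platformApiKey: 'lc-key' })); // lc-key
```

The strict variant would be a breaking change for callers that currently pass only `platformApiKey`, since they would start seeing a thrown error.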

czi-fsisenda · 2026-06-10T19:15:38Z

+    this.detailProvider = this.createConfiguredProvider(
+      Provider.OpenAI,
+      DETAIL_MODEL,
+      config.openaiApiKey,
+    );
+
+    this.coarseProvider = this.createConfiguredProvider(
+      Provider.OpenAI,
+      config.coarseFilterModel ?? DETAIL_MODEL,
+      config.openaiApiKey,
+    );
+  }


[P1] Does this mean this eval only works with OpenAI? No model override?

czi-fsisenda · 2026-06-10T19:32:21Z

+      this.sendTelemetry({
+        status: 'success',
+        latencyMs,
+        textLength: question.length,
+        grade,
+        provider: this.detailProvider.label,
+        tokenUsage,
+        metadata: { stage_details: stageDetails },
+        inputText: question,
+      }).catch(() => undefined);


[P1] This makes a good case for more flexible telemetry. We don't capture standards details like statement codes or learning components.
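
One possible direction, sketched under assumptions: carry standards detail in the existing `metadata` bag alongside `stage_details`. The `StandardTelemetryDetail` type, the `buildTelemetryMetadata` helper, and the `standards`, `statement_code`, and `learning_component_ids` keys are all hypothetical; the real telemetry schema may not accept them.

```typescript
// Hypothetical per-standard detail a caller might want in telemetry.
interface StandardTelemetryDetail {
  statementCode: string;
  learningComponentIds: string[];
}

// Merge standards detail into the metadata bag while keeping stage_details.
function buildTelemetryMetadata(
  stageDetails: unknown[],
  standards: StandardTelemetryDetail[],
): Record<string, unknown> {
  return {
    stage_details: stageDetails,
    standards: standards.map((s) => ({
      statement_code: s.statementCode,
      learning_component_ids: s.learningComponentIds,
    })),
  };
}

const exampleMetadata = buildTelemetryMetadata(
  [{ stage: 'detail' }],
  [{ statementCode: '3.MD.C.7.d', learningComponentIds: ['lc-1', 'lc-2'] }],
);
console.log(JSON.stringify(exampleMetadata));
```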

czi-fsisenda · 2026-06-11T01:42:45Z

+      // Fetch standard descriptions concurrently so the model has real content to
+      // reason about rather than opaque codes like "3.MD.C.7.d".
+      const infos = await Promise.all(
+        statementCodes.map((code) => this.kgClient.getStandardInfo(code).catch(() => ({ uuid: '', description: undefined }))),
+      );


[P2] We should cache these

czi-fsisenda · 2026-06-11T01:45:56Z

+      typeof q === 'string' ? { question: q, grade } : q,
+    );
+
+    const academicStandards = await this.kgClient.getStandardsByGrade(grade);


[P2] We should cache these
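
A minimal sketch of the kind of caching this comment asks for: a promise cache that evicts rejected entries so a failed lookup can be retried. `PromiseCache` and `fakeFetchStandards` are hypothetical names for this example; this is not the KG client's implementation, and the stand-in fetcher returns made-up data.

```typescript
// Caches in-flight and resolved promises by key. A rejected promise is
// evicted so the next call retries instead of replaying the failure.
class PromiseCache<V> {
  private readonly entries = new Map<string, Promise<V>>();

  get(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.entries.get(key);
    if (cached) return cached;

    const pending = load().catch((err: unknown) => {
      this.entries.delete(key); // evict on rejection
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }
}

// Stand-in for a KG lookup such as getStandardsByGrade(grade).
let fetchCount = 0;
async function fakeFetchStandards(grade: string): Promise<string[]> {
  fetchCount += 1;
  return [`${grade}.MD.C.7`];
}

const standardsCache = new PromiseCache<string[]>();

// Concurrent calls for the same grade share one underlying fetch.
const firstLookup = standardsCache.get('3', () => fakeFetchStandards('3'));
const secondLookup = standardsCache.get('3', () => fakeFetchStandards('3'));
console.log(firstLookup === secondLookup); // true
```

Caching the promise rather than the resolved value means concurrent callers for the same key wait on one request instead of each issuing their own.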

czi-fsisenda

Chatted offline.
P0 isn't blocking. We don't want to introduce breaking changes yet. We'll address it when we transition to 1.x versions.

adnanrhussain requested review from a team, aychi1, Copilot, czi-fsisenda and georgemelvin June 4, 2026 00:13

Copilot started reviewing on behalf of adnanrhussain June 4, 2026 00:13

Copilot AI reviewed Jun 4, 2026

adnanrhussain requested a review from Copilot June 8, 2026 22:26

Copilot started reviewing on behalf of adnanrhussain June 8, 2026 22:26

Copilot AI reviewed Jun 8, 2026

adnanrhussain mentioned this pull request Jun 10, 2026

feat(ts-sdk): Knowledge Graph client layer #97

Open

3 tasks

adnanrhussain changed the base branch from main to ahussain/kg-client June 10, 2026 03:39

adnanrhussain force-pushed the ahussain/math-standards-alignment branch from 15582de to 0dc2581 Compare June 10, 2026 05:04

czi-fsisenda requested changes Jun 11, 2026

czi-fsisenda approved these changes Jun 11, 2026
