feat(ts-sdk): Math Standards Alignment Evaluator#91

Open
adnanrhussain wants to merge 1 commit into ahussain/kg-client from ahussain/math-standards-alignment

@adnanrhussain adnanrhussain commented Jun 3, 2026

Summary

Adds MathStandardsAlignmentEvaluator to the TypeScript SDK — evaluates whether assessment questions align to CCSS math standards by checking alignment at the learning-component level via the Learning Commons Knowledge Graph API.

Depends on PR #97 (ahussain/kg-client) which must merge first.

Three evaluation methods

| Method | Use case |
| --- | --- |
| evaluate(question, grade, statementCode) | Single question × single standard (primitive) |
| evaluateQuestionBank(questions[], statementCodes[]) | True M×N cross-product for coverage / gap analysis |
| evaluateByGrade(questions[], grade) | Wraps the question bank method; fetches all CCSS math standards for a grade from the KG |
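
A rough sketch of how the question-bank mode could sit on top of the single-pair primitive, assuming it is a plain M×N cross-product. The `QuestionInput`, `PairResult`, and `EvaluatePair` shapes and both function names are hypothetical stand-ins for illustration, not the SDK's actual signatures.

```typescript
// Hypothetical shapes for illustration; the SDK's real types may differ.
interface QuestionInput {
  question: string;
  grade: string;
}

interface PairResult {
  question: string;
  statementCode: string;
  aligned: boolean;
}

// Stand-in for the single question x single standard primitive.
type EvaluatePair = (q: QuestionInput, statementCode: string) => Promise<PairResult>;

// M questions x N standards -> M*N single-pair evaluations.
async function evaluateCrossProduct(
  questions: QuestionInput[],
  statementCodes: string[],
  evaluatePair: EvaluatePair,
): Promise<PairResult[]> {
  const pending: Promise<PairResult>[] = [];
  for (const q of questions) {
    for (const code of statementCodes) {
      pending.push(evaluatePair(q, code));
    }
  }
  // Unbounded concurrency for brevity; a real client would likely limit it.
  return Promise.all(pending);
}

// Gap analysis: standards that no question in the bank aligns to.
function uncoveredStandards(results: PairResult[], statementCodes: string[]): string[] {
  const covered = new Set<string>();
  for (const r of results) {
    if (r.aligned) covered.add(r.statementCode);
  }
  return statementCodes.filter((code) => !covered.has(code));
}
```

Under this reading, the grade-based entry point would only differ in where `statementCodes` comes from (a KG lookup for the grade rather than a caller-supplied list).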

Key design decisions

  • Batched LLM calls: all learning components (LCs) for a standard are evaluated in a single structured-output call, so the number of LLM calls is O(M×N) rather than O(M×N×LC)
  • useCoarseFilter (default false): opt-in pre-filter that sends standard descriptions to a fast LLM call to skip clearly irrelevant pairs before full LC evaluation
  • grade is explicit on every call — validated as a supported grade level (K–12). High school standards (HSA, HSF, HSG, etc.) are evaluated with grade '9'/'10'/'11'/'12' since that is how the KG API indexes them.
  • Model, temperature, supported grades, and max question length all read from config.json / input_schema.json
  • platformApiKey doubles as partnerKey for telemetry when not set separately
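
To make the opt-in coarse filter concrete, here is a minimal sketch of a pre-filter stage that drops clearly irrelevant pairs before the detailed LC evaluation. The `Pair` and `RelevanceCheck` types and the `selectPairsForDetailEval` name are hypothetical, and the fail-open behaviour on errors is an assumption made for the sketch, not something the PR description states.

```typescript
// Hypothetical types for illustration only.
type Pair = { question: string; statementCode: string };

// Stand-in for the fast LLM relevance call.
type RelevanceCheck = (pair: Pair) => Promise<boolean>;

async function selectPairsForDetailEval(
  pairs: Pair[],
  isPlausiblyRelevant: RelevanceCheck,
  useCoarseFilter = false, // opt-in, matching the default described above
): Promise<Pair[]> {
  // Filter disabled: every pair goes on to the detailed evaluation.
  if (!useCoarseFilter) return pairs;

  const verdicts = await Promise.all(
    pairs.map((pair) =>
      // Assumption: fail open, so a filter error never hides a real alignment.
      isPlausiblyRelevant(pair).catch(() => true),
    ),
  );
  return pairs.filter((_, i) => verdicts[i]);
}
```

Failing open trades some cost for recall: a flaky filter call sends the pair to the detailed evaluation instead of silently producing a false negative.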

Test plan

  • 244 unit tests pass (npm test -- --run tests/unit)
  • npx tsc --noEmit — 0 errors
  • npm run lint — 0 errors
  • Integration test validates alignment for full 3.MD.C.7 standard family with 0 coarse filter false negatives
  • Run integration: RUN_INTEGRATION_TESTS=true OPENAI_API_KEY=<key> PLATFORM_API_KEY=<key> npm test -- --run tests/integration/math-standards-alignment.integration.test.ts

🤖 Generated with Claude Code

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 93.50000% with 13 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...escript/src/evaluators/math/standards-alignment.ts | 92.61% | 13 Missing ⚠️ |

Copilot AI left a comment

Pull request overview

Adds a new TypeScript SDK evaluator (MathStandardsAlignmentEvaluator) that determines CCSS math standards alignment by retrieving learning components from the Learning Commons Knowledge Graph and running LC-level LLM judgments, with optional coarse filtering for cost control.

Changes:

  • Introduces Knowledge Graph access layer (API + JSON repositories, cached client) for resolving standards, learning components, and grade-level standard lists.
  • Adds Math standards alignment evaluator with single-standard, per-item, question-bank (M×N), and grade-based evaluation entrypoints.
  • Adds unit + integration tests and colocated prompts/schemas/config for the evaluator.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| sdks/typescript/src/evaluators/math/standards-alignment.ts | Implements the new evaluator, including coarse filter + question bank modes. |
| sdks/typescript/src/knowledge-graph/repository.ts | Adds KG API + JSON repositories for standards/LC retrieval. |
| sdks/typescript/src/knowledge-graph/client.ts | Adds promise-cached KG client with concurrency limiting and eviction-on-rejection. |
| sdks/typescript/src/knowledge-graph/types.ts | Defines KG domain types + parseGradeFromStandard. |
| sdks/typescript/src/knowledge-graph/index.ts | Re-exports KG client/repositories/types from a single entrypoint. |
| sdks/typescript/src/schemas/math/standards-alignment.ts | Adds Zod schemas for detail LC eval + coarse filter outputs. |
| sdks/typescript/src/prompts/math/standards-alignment/index.ts | Wires prompt templates/config/schema into TS for runtime prompt construction. |
| sdks/typescript/src/errors.ts | Introduces KnowledgeGraphError. |
| sdks/typescript/src/evaluators/base.ts | Adds platformApiKey to base config surface area. |
| sdks/typescript/src/evaluators/index.ts | Exports the new evaluator + types from evaluators barrel. |
| sdks/typescript/src/index.ts | Exposes the evaluator + KG utilities/errors from the SDK root. |
| sdks/typescript/tests/unit/evaluators/math-standards-alignment.test.ts | Unit coverage for evaluator behavior across modes. |
| sdks/typescript/tests/unit/knowledge-graph/repository.test.ts | Unit coverage for KG repositories and error mapping. |
| sdks/typescript/tests/unit/knowledge-graph/client.test.ts | Unit coverage for KG client caching/eviction and composition helpers. |
| sdks/typescript/tests/integration/math-standards-alignment.integration.test.ts | End-to-end integration test for the 3.MD.C.7 standard family. |
| sdks/typescript/.env.test.example | Documents required env var for KG integration tests. |
| evals/prompts/math/standards-alignment/config.json | Evaluator/prompt configuration (model, temperature, steps). |
| evals/prompts/math/standards-alignment/input_schema.json | Input schema for question/grade/statementCode. |
| evals/prompts/math/standards-alignment/output_schema.json | Output schema for LC-level alignment results. |
| evals/prompts/math/standards-alignment/system.txt | System prompt defining alignment criteria. |
| evals/prompts/math/standards-alignment/user.txt | User prompt template for LC batch evaluation. |
| evals/prompts/math/standards-alignment/coarse-filter-user.txt | User prompt template for coarse relevance filtering. |
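
For readers unfamiliar with the "promise-cached ... eviction-on-rejection" pattern mentioned for the KG client, the following is a minimal generic sketch of the idea. The `PromiseCache` class is hypothetical and is not taken from `client.ts`; the concurrency limiting mentioned in the summary is left out to keep it short.

```typescript
// Caches in-flight and settled promises by key. A rejected promise is evicted
// so that a later call can retry instead of replaying the cached failure.
class PromiseCache<V> {
  private readonly entries = new Map<string, Promise<V>>();

  get(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    const pending = load();
    this.entries.set(key, pending);

    // This side branch only handles eviction; the caller still receives the
    // original promise and therefore still observes the rejection.
    pending.catch(() => {
      // Evict only if the entry is still the promise that failed.
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    return pending;
  }
}
```

Caching the promise rather than the resolved value means concurrent callers asking for the same key share one in-flight request, which is the property that matters when many question × standard pairs need the same standard's data.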

Comment thread sdks/typescript/src/errors.ts
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts Outdated
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts Outdated
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts Outdated
Comment thread sdks/typescript/tests/integration/math-standards-alignment.integration.test.ts Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.

Comment thread sdks/typescript/src/errors.ts
Comment thread sdks/typescript/src/knowledge-graph/repository.ts Outdated
Comment thread sdks/typescript/src/knowledge-graph/repository.ts Outdated
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts
Comment thread sdks/typescript/src/knowledge-graph/repository.ts Outdated
@adnanrhussain adnanrhussain changed the base branch from main to ahussain/kg-client June 10, 2026 03:39
Evaluates whether assessment questions align to CCSS math standards
by checking alignment at the learning-component level via the LC
Knowledge Graph API (see PR #97 for the KG client layer).

Three evaluation methods:
- evaluate(question, grade, statementCode) — single pair, primitive
- evaluateQuestionBank(questions[], statementCodes[]) — M×N cross-product
  with optional coarse filter (useCoarseFilter: true)
- evaluateByGrade(questions[], grade) — grade-level discovery wrapper

Model, temperature, supported grades, and max question length all
read from config.json / input_schema.json. platformApiKey doubles as
partnerKey for telemetry when not set separately.
@adnanrhussain adnanrhussain force-pushed the ahussain/math-standards-alignment branch from 15582de to 0dc2581 Compare June 10, 2026 05:04

@czi-fsisenda czi-fsisenda left a comment

Nice! 🚀
Just one P0 and a few P1s and even P2s.

Comment on lines +21 to +23
// ---------------------------------------------------------------------------
// Public types
// ---------------------------------------------------------------------------

[P1] Moving these to a separate file will make this file less intimidating 😅

Comment on lines +112 to +113
// If partnerKey isn't set, fall back to platformApiKey — same LC platform key,
// different API surfaces (KG vs telemetry).

[P0] We should have these set separately and intentionally even if they have the same value
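
To illustrate the difference under discussion, here is a small sketch contrasting the fallback described in the PR with a stricter resolution that requires both keys to be set. The `KeyConfig` shape and both function names are hypothetical; the real config surface may look different.

```typescript
// Hypothetical config shape for illustration only.
interface KeyConfig {
  platformApiKey?: string;
  partnerKey?: string;
}

// Behaviour described in the PR: partnerKey falls back to platformApiKey.
function resolveWithFallback(config: KeyConfig): KeyConfig {
  return {
    platformApiKey: config.platformApiKey,
    partnerKey: config.partnerKey ?? config.platformApiKey,
  };
}

// Stricter alternative: both keys must be provided explicitly,
// even when they happen to hold the same value.
function resolveStrict(config: KeyConfig): Required<KeyConfig> {
  const { platformApiKey, partnerKey } = config;
  if (!platformApiKey || !partnerKey) {
    throw new Error('platformApiKey and partnerKey must both be set explicitly');
  }
  return { platformApiKey, partnerKey };
}
```

The strict variant turns a missing `partnerKey` from a silent fallback into a constructor-time error, which is why it would be a breaking change for existing callers.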

Comment on lines +125 to +136
    this.detailProvider = this.createConfiguredProvider(
      Provider.OpenAI,
      DETAIL_MODEL,
      config.openaiApiKey,
    );

    this.coarseProvider = this.createConfiguredProvider(
      Provider.OpenAI,
      config.coarseFilterModel ?? DETAIL_MODEL,
      config.openaiApiKey,
    );
  }

[P1] Does this mean this eval only works with OpenAI? No model override?

Comment on lines +208 to +217
    this.sendTelemetry({
      status: 'success',
      latencyMs,
      textLength: question.length,
      grade,
      provider: this.detailProvider.label,
      tokenUsage,
      metadata: { stage_details: stageDetails },
      inputText: question,
    }).catch(() => undefined);

[P1] This makes a good case for more flexible telemetry. We don't capture standards details like statement codes or learning components.

Comment on lines +466 to +470
    // Fetch standard descriptions concurrently so the model has real content to
    // reason about rather than opaque codes like "3.MD.C.7.d".
    const infos = await Promise.all(
      statementCodes.map((code) =>
        this.kgClient
          .getStandardInfo(code)
          .catch(() => ({ uuid: '', description: undefined })),
      ),
    );

[P2] We should cache these

typeof q === 'string' ? { question: q, grade } : q,
);

const academicStandards = await this.kgClient.getStandardsByGrade(grade);

[P2] We should cache these

@czi-fsisenda czi-fsisenda left a comment

Chatted offline.
The P0 isn't blocking. We don't want to introduce breaking changes yet; we'll address it when we transition to 1.x versions.
