Public case study for the Urja Critic, a cross-provider taste auditor inside the Urja Visual API. The LLM that grades is, by policy, never from the same provider as the LLM that generated the artifact.
Part of a 4-product solo studio by Omkar Jaliparthi. Kriya (109+ endpoint commercial astronomy API · npm · PyPI · Go) · Urja Critic (you are here) · Netra (BFF-over-Kriya workbench · live) · Insights by Omkar (consumer SaaS · live).
The Urja Critic is the evaluator inside Urja, the Insights by Omkar Visual API. It scores generated artifacts (illustrations, glyphs, character renders, scene compositions) against weighted rubrics anchored to a public-domain (pre-1929) reference corpus. Its defining design decision: the evaluator and the generator always come from different LLM providers. The provider mapping is a one-line policy, judge.provider !== generator.provider, so same-family bias becomes an architectural impossibility rather than a runtime concern.
This is rare in 2026. Most LLM-as-judge pipelines run a single provider on both sides, inheriting documented self-preference bias. The mainstream eval harnesses (LangSmith, Braintrust, OpenAI Evals, Anthropic Evals) treat cross-provider grading as opt-in glue, not a structural guarantee. The Critic encodes the guarantee into the type system, ships it behind a metered API, and integrates into CI as a pass/fail gate. It is a policy layer on top of any harness — not a competitor to one.
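A minimal sketch of how a policy like judge.provider !== generator.provider could be expressed in TypeScript types, with a runtime guard for values that bypass the compiler. This is not the Critic's source: the provider names, `ModelRef`, `JudgeFor`, `pairJudge`, and `pickJudge` are all hypothetical and exist only for illustration.

```typescript
// Hypothetical provider names; the case study does not list the real ones.
type Provider = "alpha" | "beta" | "gamma";

interface ModelRef<P extends Provider = Provider> {
  provider: P;
  model: string;
}

// A judge for generator provider G may come from any provider except G.
type JudgeFor<G extends Provider> = ModelRef<Exclude<Provider, G>>;

interface CritiquePair<G extends Provider> {
  generator: ModelRef<G>;
  judge: JudgeFor<G>;
}

// Compile time: passing a judge from the generator's own provider is a type error.
// Run time: the guard covers untyped input such as parsed JSON.
function pairJudge<G extends Provider>(
  generator: ModelRef<G>,
  judge: JudgeFor<G>,
): CritiquePair<G> {
  const judgeProvider: Provider = judge.provider;
  const generatorProvider: Provider = generator.provider;
  if (judgeProvider === generatorProvider) {
    throw new Error("judge.provider must differ from generator.provider");
  }
  return { generator, judge };
}

// Picks the first judge in the pool that belongs to a different provider.
function pickJudge(generator: ModelRef, pool: ModelRef[]): ModelRef {
  const judge = pool.find((candidate) => candidate.provider !== generator.provider);
  if (!judge) {
    throw new Error(`no judge available outside provider "${generator.provider}"`);
  }
  return judge;
}
```

With these types, a call such as `pairJudge({ provider: "alpha", model: "gen" }, { provider: "alpha", model: "judge" })` fails to compile, which is one way a policy can become a structural guarantee instead of opt-in glue.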
| Field | Details |
|---|---|
| Product | Cross-provider taste auditor · POST /api/v1/critic/critique · live at urja.insightsbyomkar.com |
| Coverage | 4 rubric tracks (character-figure, character-part, illustrate-glyph, ambient-effect) · 2 reference sets at v1.0 (great-dane, canine-animated) · adding 1 rubric / 1 reference set per release wave |
| Architecture | Cross-provider by design · weighted-rubric · 0–10 anchored scales at 2 / 5 / 8 / 10 · severity-tagged failings with proposedTargetValue for auto-refine |
| Reference corpus | Public-domain pre-1929 line · text + landmark targets + image plates · explicit exclusion list of in-copyright modern authors |
| Integration | GitHub Actions (lucky-critique.yml) · PR-comment score table · pass/fail gate vs threshold · nightly regression sweep |
| Stack | Next.js 16 · App Router · TypeScript strict · scoped JWT auth · Upstash rate-limit · provider adapters behind a single interface |
| Status | v0.8 · Spring 2026 launch promo live · beta stability on the API surface · urja-client v0.2.0 published to npm |
| Parent business | Omkar's Holistic Services LLC (DBA Insights by Omkar) · formed May 2023 |
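The weighted-rubric scoring and pass/fail gate listed in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the production rubric runner: the severity levels, the envelope shape, the "a blocker always fails the gate" rule, and the weights in the usage note are invented for this example. Only the 0 to 10 scale, the threshold gate, and the `proposedTargetValue` field name come from the case study.

```typescript
// Assumed severity levels; the case study only says failings are severity-tagged.
type Severity = "minor" | "major" | "blocker";

interface Failing {
  axis: string;
  severity: Severity;
  note: string;
  // Field name taken from the case study; its type here is an assumption.
  proposedTargetValue?: number;
}

interface AxisScore {
  axis: string;
  weight: number; // illustrative, not a production weight
  score: number; // 0 to 10, with anchor descriptions at 2, 5, 8, and 10
}

interface ScoreEnvelope {
  overall: number;
  passed: boolean;
  failings: Failing[];
}

// Weighted mean of per-axis scores, normalised by the total weight.
function weightedScore(axes: AxisScore[]): number {
  const totalWeight = axes.reduce((sum, a) => sum + a.weight, 0);
  if (totalWeight <= 0) {
    throw new Error("rubric weights must sum to a positive number");
  }
  for (const a of axes) {
    if (a.score < 0 || a.score > 10) {
      throw new Error(`score out of range for axis "${a.axis}"`);
    }
  }
  const weighted = axes.reduce((sum, a) => sum + a.weight * a.score, 0);
  return weighted / totalWeight;
}

// Gate: pass when the overall score meets the threshold and no blocker exists.
// The blocker rule is an assumption made for this sketch.
function gate(axes: AxisScore[], failings: Failing[], threshold: number): ScoreEnvelope {
  const overall = weightedScore(axes);
  const hasBlocker = failings.some((f) => f.severity === "blocker");
  return { overall, passed: overall >= threshold && !hasBlocker, failings };
}
```

For example, axes weighted 5, 3, and 2 with scores 8, 5, and 10 give (40 + 15 + 20) / 10 = 7.5, which passes a threshold of 7 unless a blocker-severity failing is attached. A CI job could fail the build whenever `passed` is false.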
- Problem & users — why cross-provider eval, who needs it, what the incumbents miss
- Architecture — generator → output → opposite-provider judge → rubric runner → score envelope → CI
- Decision · cross-provider by design — the headline decision · why independence is structural, not optional
- Decision · rubric design — weighted axes, anchored scales, calibration, parser contract
- Decision · public-domain reference corpus — pre-1929 line · the copyright trap modern evaluators inherit
- Build vs buy — vs LangSmith, Braintrust, OpenAI Evals, Anthropic Evals, naive LLM-as-judge, human-only review
- CI integration — PR-comment scoring, pass/fail gates, the developer experience
- Outcomes & lessons — what worked, what didn't, what's next
- Worked example — end-to-end critique of a generated illustration · per-axis scores, evidence, cross-provider tag
- Live API · Urja · Pricing · Critic track docs
- urja-client · npm — type-safe URL builders + React ThemeProvider for Urja (v0.2.0)
- Kriya case study — sibling commercial astronomy API · 109+ endpoints from first principles
- Parent SaaS case study — the consumer product that dogfoods both Urja and Kriya
- Insights by Omkar — the consumer SaaS · Urja's first paying customer
This is a write-up. Not a release.
- No proprietary code. No source from lib/critic/. No route handlers. No prompt templates.
- No real rubric weights. The illustrative weights in this case study are representative, not the ones running in production.
- No engine internals. The cache-key composition, the parser's failure-recovery rules, and the provider-adapter implementations stay private.
- No customer list. Use cases discussed here are hypothetical, not commitments.
- No specific provider versions used internally beyond what's already on the public pricing page.
The architecture, the policy, the rubric grammar, and the build-vs-buy reasoning are public. The implementation is not. That separation is deliberate — what's defensible about the Critic is the policy, and the policy is what this case study is for.
Hiring for Senior PM, AI Platforms or Founding Platform PM (Series B+)?
admin@insightsbyomkar.com · LinkedIn · GitHub