confident-ai · theanuragg · Jun 24, 2026 · Jun 24, 2026 · Jun 25, 2026
diff --git a/typescript/README.md b/typescript/README.md
@@ -1,30 +1,161 @@
 # DeepEval for TypeScript
 
-> **Status:** Initial version shipping **June 5th**.
+DeepEval for TypeScript brings the full DeepEval workflow into the JavaScript and TypeScript ecosystem, including local LLM evaluation, 40+ metrics, synthetic data generation, prompt optimization, and full Confident AI platform integration.
 
-DeepEval for TypeScript brings the DeepEval workflow into the JavaScript and TypeScript ecosystem, starting with the Confident AI platform features teams already use to manage datasets, prompts, and evaluation reporting.
+## Feature Parity (June 2026)
 
-This package is designed for TypeScript teams that want first-class access to DeepEval workflows that integrate with Confident AI from the same language they use to build their applications.
+The TypeScript SDK now provides near-complete parity with the Python package:
 
-## What TypeScript Supports
+### Local Evaluation Models
+- 11 model providers: OpenAI, Azure, Anthropic, Gemini, Bedrock, DeepSeek, Grok, Kimi, Local, Ollama, AISDK
+- `ModelFactory` for auto-detecting providers from model name prefixes
+- `DeepEvalBaseEmbeddingModel` with OpenAI embedding support
+- All LLM classes extend the `DeepEvalBaseLLM` abstract class, which defines `generate<T>(prompt, schema?)`
 
-The initial TypeScript SDK focuses on the Confident AI API surface, including:
+### 40+ Metrics (Complete Parity)
+- **RAG**: Faithfulness, Hallucination, AnswerRelevancy, ContextualPrecision/Recall/Relevancy
+- **Safety**: Bias, Toxicity, PIILeakage, NonAdvice, Misuse, RoleViolation
+- **Agent**: TaskCompletion, ToolUse, ToolCorrectness, PlanAdherence, PlanQuality, StepEfficiency, GoalAccuracy, ArgumentCorrectness
+- **Quality**: Summarization, PromptAlignment, JsonCorrectness, ExactMatch, PatternMatch
+- **Conversational**: TurnRelevancy, TurnFaithfulness, ConversationCompleteness, KnowledgeRetention, RoleAdherence, TopicAdherence, ConversationalGEval
+- **Arena**: ArenaGEval with multi-contestant comparison
+- **MCP**: MCPUse, MCPTaskCompletion, MultiTurnMCPUse
+- **Multimodal**: ImageCoherence, ImageHelpfulness, ImageReference, TextToImage, ImageEditing
+- **General**: GEval with custom criteria and rubrics
 
-- Pushing and pulling datasets
-- Running and reporting evaluations through Confident AI
-- Reading/writing prompts and prompt versions
-- Other Confident AI platform interactions
+All metrics share prompt templates with Python: both SDKs load the same `templates.json`, which Python renders with Jinja2 and TypeScript renders with Nunjucks.
 
-Local execution features, such as LLM-as-a-judge metrics, NLP models, and fully local evaluation, currently remain in the Python package while we expand TypeScript support.
+### Unit-Test Workflow
+- `evaluate()` — run metrics over test cases with progress bars, reporting, and caching
+- `assertTest()` — call from Jest/Vitest tests; throws detailed `AssertionError` on failure
+- `deepeval test run` CLI command — runs Jest test files and posts results to Confident AI
+- `compare()` — arena-style comparison of contestant outputs
 
-## Roadmap
+### Synthetic Data Generation
+- `Synthesizer` class — generate goldens from documents, contexts, scratch, or existing goldens
+- Evolution types: Reasoning, MultiContext, Concretizing, Constrained, Comparative, Hypothetical, InBreadth
+- Configurable filtration and evolution distribution
+- Supports both single-turn and conversational goldens
 
-Our next milestone is to reach **80% feature parity** across the Confident AI integration surface by the **end of July**. This includes:
+### Prompt Optimization
+- `PromptOptimizer` — evolutionary prompt improvement using evaluation metrics
+- Configurable iterations, minibatch size, Pareto set, and patience-based early stopping
+- Automatic feedback generation and prompt rewriting
 
-- **Shared prompt templates** — one source of truth for prompt templates, consumed by both Python and TypeScript so the implementations stay aligned.
-- **TypeScript-native APIs** — equivalents for the relevant Python functions and classes, shaped to feel natural in TypeScript while staying familiar to DeepEval users.
-- **Dedicated TypeScript docs** — TypeScript examples and guides alongside the existing Python documentation.
+### Confident AI Integration
+- Full API client with multi-region support (US/EU/AU) and retry logic
+- Dataset CRUD, test run posting, experiment management
+- Prompt management with versioning, branching, and labels
+- Tracing with OpenTelemetry
+- Governance assessment
+
+## Quick Start
+
+### Installation
+
+```bash
+npm install deepeval
+```
+
+### Set up your model
+
+```typescript
+import { OpenAIModel } from "deepeval";
+
+const model = new OpenAIModel({ model: "gpt-4o" });
+```
+
+Or use the factory to auto-detect providers:
+
+```typescript
+import { ModelFactory } from "deepeval/models";
+
+const model = ModelFactory.createLLM({ model: "gpt-4o" });
+const local = ModelFactory.createLLM({
+  model: "my-model",
+  provider: "local",
+  baseURL: "http://localhost:8000/v1",
+});
+```
+
+### Run a metric
+
+```typescript
+import { FaithfulnessMetric, LLMTestCase } from "deepeval";
+
+const metric = new FaithfulnessMetric({ model: "gpt-4o" });
+const testCase = new LLMTestCase({
+  input: "What is the capital of France?",
+  actualOutput: "Paris is the capital of France.",
+  retrievalContext: ["Paris is the capital and largest city of France."],
+});
+
+await metric.measure(testCase);
+console.log(metric.score, metric.reason);
+```
+
+### Write eval tests (Jest/Vitest)
+
+```typescript
+import { assertTest, LLMTestCase, ExactMatchMetric } from "deepeval";
+
+test("response should exactly match expected", async () => {
+  await assertTest({
+    testCase: new LLMTestCase({
+      input: "What is 2+2?",
+      actualOutput: "4",
+      expectedOutput: "4",
+    }),
+    metrics: [new ExactMatchMetric({ threshold: 1 })],
+  });
+});
+```
+
+Run with: `npx deepeval test run`
+
+### Generate synthetic data
+
+```typescript
+import { Synthesizer, OpenAIModel } from "deepeval";
+
+const synth = new Synthesizer(new OpenAIModel());
+const goldens = await synth.generateGoldensFromContexts([
+  ["Paris is the capital of France."],
+]);
+```
+
+### Optimize a prompt
+
+```typescript
+import { PromptOptimizer, FaithfulnessMetric, OpenAIModel } from "deepeval";
+
+const model = new OpenAIModel({ model: "gpt-4o" });
+
+// `prompt` is the Prompt to optimize (see "deepeval/prompt") and
+// `goldens` are the goldens generated in the previous step.
+const optimizer = new PromptOptimizer({
+  modelCallback: async (prompt, golden) => {
+    const rendered = prompt.interpolate({ input: golden.input }) as string;
+    const { output } = await model.generate(rendered);
+    return output;
+  },
+  metrics: [new FaithfulnessMetric()],
+});
+
+const report = await optimizer.optimize(prompt, goldens);
+console.log("Score:", report.logs[0]?.before, "→", report.logs[0]?.after);
+```
+
+## Submodule Imports
+
+```typescript
+import { ... } from "deepeval/metrics";      // All metric classes
+import { ... } from "deepeval/models";       // Model classes + factory
+import { ... } from "deepeval/evaluate";     // evaluate, assertTest
+import { ... } from "deepeval/dataset";      // Dataset management
+import { ... } from "deepeval/prompt";       // Prompt management
+import { ... } from "deepeval/synthesizer";  // Synthetic data generation
+import { ... } from "deepeval/optimizer";    // Prompt optimization
+import { ... } from "deepeval/tracing";      // OpenTelemetry tracing
+import { ... } from "deepeval/confident";    // Confident AI client
+```
 
 ## Python vs TypeScript
 
-Python remains DeepEval's most complete implementation and the first place new local evaluation capabilities will land. TypeScript complements that foundation by making DeepEval workflows that integrate with Confident AI available to JavaScript and TypeScript teams, with a clear path toward broader feature coverage.
+The TypeScript SDK aims for full API parity with the Python package while feeling natural in TypeScript (strong typing, interfaces, generics, discriminated unions). Shared resources like metric templates are compiled from a single source of truth.
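
The README's Prompt Optimization section lists a Pareto set and patience-based early stopping without showing how the two interact. Below is a minimal, dependency-free sketch of one common way to combine them, assuming each candidate prompt carries one score per golden and that new candidates come from a caller-supplied `propose` function. `Candidate`, `paretoSet`, `optimize`, and `propose` are hypothetical names for illustration, not DeepEval's `PromptOptimizer` API or implementation.

```typescript
// Hypothetical sketch only: not DeepEval's PromptOptimizer implementation.

// A candidate prompt with one metric score per golden (higher is better).
interface Candidate {
  prompt: string;
  scores: number[];
}

// `a` dominates `b` when it scores at least as well on every golden
// and strictly better on at least one.
function dominates(a: Candidate, b: Candidate): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.scores.length; i++) {
    if (a.scores[i] < b.scores[i]) return false;
    if (a.scores[i] > b.scores[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// The Pareto set: candidates that no other candidate dominates.
function paretoSet(pool: Candidate[]): Candidate[] {
  return pool.filter((c) => !pool.some((other) => other !== c && dominates(other, c)));
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Runs at most `iterations` rounds and stops early once `patience`
// consecutive rounds fail to improve the best mean score.
function optimize(
  seed: Candidate,
  propose: (parent: Candidate, round: number) => Candidate,
  iterations: number,
  patience: number,
): { best: Candidate; rounds: number } {
  let pool: Candidate[] = [seed];
  let best = seed;
  let stale = 0;
  let rounds = 0;
  while (rounds < iterations && stale < patience) {
    // Rotate through the Pareto set so every non-dominated prompt gets mutated.
    const parent = pool[rounds % pool.length];
    const child = propose(parent, rounds);
    pool = paretoSet([...pool, child]);
    if (mean(child.scores) > mean(best.scores)) {
      best = child;
      stale = 0;
    } else {
      stale++;
    }
    rounds++;
  }
  return { best, rounds };
}

// Usage with canned scores: two improving rounds, then a plateau.
const canned: number[][] = [[0.5, 0.4], [0.6, 0.7], [0.1, 0.1], [0.2, 0.2], [0.3, 0.1]];
const result = optimize(
  { prompt: "v0", scores: [0.2, 0.4] },
  (_parent, round) => ({ prompt: `v${round + 1}`, scores: canned[round] ?? [0, 0] }),
  10,
  2,
);
console.log(result.best.prompt, result.rounds); // v2 4
```

Keeping a Pareto set rather than a single best prompt preserves candidates that win on different goldens, while the patience counter only tracks the best mean score, so the loop ends once `patience` consecutive rounds fail to beat it.
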
diff --git a/typescript/eslint.config.mts b/typescript/eslint.config.mts
@@ -6,8 +6,9 @@ import globals from "globals";
 import tseslint from "typescript-eslint";
 
 export default defineConfig([
+  { ignores: ["dist/**"] },
   {
-    ignores: ["dist/**", "test/**"],
+    ignores: ["test/**"],
     files: ["**/*.{js,mjs,cjs,ts,mts,cts}"],
     languageOptions: {
       globals: globals.node,

diff --git a/typescript/examples/evaluate/example-evaluate.ts b/typescript/examples/evaluate/example-evaluate.ts
@@ -1,17 +1,16 @@
-import { LLMTestCase, ToolCall } from "../../src/test-case";
-import { evaluate } from "../../src/confident/evaluate";
+import { LLMTestCase, ToolCall, evaluate, ExactMatchMetric } from "../../src";
 
 async function main() {
   const testCase1 = new LLMTestCase({
     input: "What is the capital of Germany?",
-    actualOutput: "Berlin is the capital of Germany.",
+    actualOutput: "Berlin",
     expectedOutput: "Berlin",
     context: ["Geography", "Europe"],
     retrievalContext: ["Germany is a country in Central Europe."],
   });
   const testCase2 = new LLMTestCase({
     input: "What is the formula for water?",
-    actualOutput: "The chemical formula for water is H2O.",
+    actualOutput: "H2O",
     expectedOutput: "H2O",
     context: ["Chemistry", "Molecules"],
     retrievalContext: [
@@ -27,25 +26,21 @@ async function main() {
   });
   const testCase3 = new LLMTestCase({
     input: "What is the chemical formula for water?",
-    actualOutput: "The chemical formula for water is H2O.",
+    actualOutput: "H2O",
     expectedOutput: "H2O",
     context: ["Chemistry"],
     retrievalContext: ["Water is composed of hydrogen and oxygen"],
     additionalMetadata: { source: "chemistry textbook" },
     comments: "Example with tool calls",
     toolsCalled: [toolCall],
   });
-  const testCases = [testCase1, testCase2, testCase3];
 
-  try {
-    const metricCollection = "New Collection";
-    await evaluate({
-      metricCollection,
-      llmTestCases: testCases,
-    });
-  } catch (error: any) {
-    console.error("Error evaluating test cases:", error);
-  }
+  const metric = new ExactMatchMetric({ threshold: 1 });
+  const result = await evaluate([testCase1, testCase2, testCase3], [metric], {
+    displayConfig: { showIndicator: true, printResults: true },
+  });
+
+  console.log(`Passed: ${result.testResults.filter((r) => r.success).length}/${result.testResults.length}`);
 }
 
 main().catch((error) => {

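The example now shortens each `actualOutput` to equal its `expectedOutput`, which matters because `ExactMatchMetric` is a strict string comparison. A minimal sketch of that scoring, assuming a score of 1 for an exact match after trimming surrounding whitespace, 0 otherwise, and a pass when the score reaches the threshold. The trimming and the `exactMatchScore`/`passes` helpers are assumptions for illustration, not DeepEval's source.

```typescript
// Hypothetical sketch; not DeepEval's ExactMatchMetric implementation.
interface ExactMatchCase {
  actualOutput: string;
  expectedOutput: string;
}

// Assumption: surrounding whitespace is ignored; everything else must match exactly.
function exactMatchScore(tc: ExactMatchCase): number {
  return tc.actualOutput.trim() === tc.expectedOutput.trim() ? 1 : 0;
}

// A test case passes when its score reaches the metric threshold.
function passes(tc: ExactMatchCase, threshold: number): boolean {
  return exactMatchScore(tc) >= threshold;
}

const sentence = { actualOutput: "Berlin is the capital of Germany.", expectedOutput: "Berlin" };
const bare = { actualOutput: "Berlin", expectedOutput: "Berlin" };
console.log(passes(sentence, 1), passes(bare, 1)); // false true
```

Under this model, the previous sentence-style outputs would all fail with `threshold: 1`, which is why the example pairs `ExactMatchMetric` with bare answers.
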
diff --git a/typescript/package-lock.json b/typescript/package-lock.json
diff --git a/typescript/package.json b/typescript/package.json
@@ -43,6 +43,36 @@
       "require": "./dist/openai/index.js",
       "types": "./dist/openai/index.d.ts"
     },
+    "./models": {
+      "import": "./dist/models/index.js",
+      "require": "./dist/models/index.js",
+      "types": "./dist/models/index.d.ts"
+    },
+    "./metrics": {
+      "import": "./dist/metrics/index.js",
+      "require": "./dist/metrics/index.js",
+      "types": "./dist/metrics/index.d.ts"
+    },
+    "./evaluate": {
+      "import": "./dist/evaluate/index.js",
+      "require": "./dist/evaluate/index.js",
+      "types": "./dist/evaluate/index.d.ts"
+    },
+    "./prompt": {
+      "import": "./dist/prompt/index.js",
+      "require": "./dist/prompt/index.js",
+      "types": "./dist/prompt/index.d.ts"
+    },
+    "./synthesizer": {
+      "import": "./dist/synthesizer/index.js",
+      "require": "./dist/synthesizer/index.js",
+      "types": "./dist/synthesizer/index.d.ts"
+    },
+    "./optimizer": {
+      "import": "./dist/optimizer/index.js",
+      "require": "./dist/optimizer/index.js",
+      "types": "./dist/optimizer/index.d.ts"
+    },
     "./integrations/ai-sdk": {
       "import": "./dist/integrations/ai-sdk/index.js",
       "require": "./dist/integrations/ai-sdk/index.js",
@@ -84,6 +114,24 @@
       "openai": [
         "dist/openai/index.d.ts"
       ],
+      "models": [
+        "dist/models/index.d.ts"
+      ],
+      "metrics": [
+        "dist/metrics/index.d.ts"
+      ],
+      "evaluate": [
+        "dist/evaluate/index.d.ts"
+      ],
+      "prompt": [
+        "dist/prompt/index.d.ts"
+      ],
+      "synthesizer": [
+        "dist/synthesizer/index.d.ts"
+      ],
+      "optimizer": [
+        "dist/optimizer/index.d.ts"
+      ],
       "integrations/ai-sdk": [
         "dist/integrations/ai-sdk/index.d.ts"
       ],
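
Each new subpath in `exports` lists `import`, `require`, and `types` conditions. Node checks conditions in object key order and takes the first one that is in its active set (`default` always matches). The following is a simplified, hypothetical model of that lookup for a single flat condition map; real resolution also handles nested maps, arrays, and `null` targets, and `resolveCondition` is not a Node or TypeScript API.

```typescript
// Simplified model of conditional-exports matching; not Node's or TypeScript's resolver.
type ConditionMap = Record<string, string>;

// Conditions are checked in key order; the first active one (or "default") wins.
function resolveCondition(target: ConditionMap, active: Set<string>): string | undefined {
  for (const [condition, path] of Object.entries(target)) {
    if (condition === "default" || active.has(condition)) return path;
  }
  return undefined;
}

// Mirrors the "./models" entry added in package.json.
const modelsEntry: ConditionMap = {
  import: "./dist/models/index.js",
  require: "./dist/models/index.js",
  types: "./dist/models/index.d.ts",
};

console.log(resolveCondition(modelsEntry, new Set(["node", "require"]))); // ./dist/models/index.js
console.log(resolveCondition(modelsEntry, new Set(["types", "import"]))); // ./dist/models/index.js
console.log(resolveCondition(modelsEntry, new Set(["types"]))); // ./dist/models/index.d.ts
```

Because `import` precedes `types`, a resolver with both conditions active stops at the `.js` file. TypeScript then looks for the sibling `index.d.ts`, so types still resolve for these entries, but the TypeScript documentation recommends listing `types` first.
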