
Diting - AI Evaluation Platform

Diting is a domain-driven AI application evaluation system built with TypeScript and Next.js. It provides comprehensive tools for managing evaluation datasets, configuring evaluation targets, and running evaluation experiments with detailed analytics.

Features

  • Domain-Driven Design: Clean architecture with clear domain boundaries and business logic
  • Evaluation Dataset Management: Support for creating, managing, and querying evaluation datasets with version control
  • Multiple Evaluation Targets: Support for Mock, HTTP, and Function-based AI applications
  • Flexible Evaluators: Built-in accuracy and semantic similarity evaluators with custom evaluator support
  • Experiment Management: Complete evaluation experiment workflow with progress tracking and FIFO-based experiment flow control
  • Type Safety: Written entirely in TypeScript with complete type definitions
  • Monorepo Architecture: Organized with packages for global types, service layer, and web components

Architecture

Core Components

1. Evaluation Dataset

  • Manages collections of evaluation data with user input, expected output, context, and metadata
  • Supports multiple file formats (JSONL, Excel) and HuggingFace dataset import/export
  • Automatic generation from prompts and corpora
  • Multi-version control and data querying capabilities
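A single dataset record can be sketched as below, using the field names that appear in the usage example later in this README (`user_input`, `expected_output`, `context`); the service package's actual type definitions may differ.

```typescript
// Sketch of one evaluation record; field names follow the usage
// example in this README, not necessarily the real exported types.
interface EvalDataItem {
  id: string;
  user_input: string;
  expected_output: string;
  context?: string;
  metadata?: Record<string, unknown>;
}

// With the JSONL format, each line of the file is one record.
const line =
  '{"id":"1","user_input":"What is the capital of France?","expected_output":"Paris"}';
const item: EvalDataItem = JSON.parse(line);
```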

2. Evaluation Targets (EvalTarget)

  • Represents AI applications to be evaluated
  • Supports three types: Mock, HTTP, and Function
  • Provides unified invocation interface
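The three target types and the unified invocation interface can be sketched as a discriminated union; the type and function names here are illustrative assumptions, not the package's real API.

```typescript
// Hypothetical sketch of the three target kinds behind one
// invoke function; names are illustrative, not the actual API.
type TargetConfig =
  | { type: 'mock'; config: { responses: { output: string }[] } }
  | { type: 'http'; config: { url: string; method?: 'GET' | 'POST' } }
  | { type: 'function'; config: { fn: (input: string) => Promise<string> } };

async function invokeTarget(cfg: TargetConfig, input: string): Promise<string> {
  switch (cfg.type) {
    case 'mock':
      // Mock targets replay canned responses, useful for testing.
      return cfg.config.responses[0]?.output ?? '';
    case 'http': {
      // HTTP targets forward the input to a remote AI application.
      const res = await fetch(cfg.config.url, {
        method: cfg.config.method ?? 'POST',
        body: JSON.stringify({ input }),
      });
      return (await res.json()).output;
    }
    case 'function':
      // Function targets call a local async function directly.
      return cfg.config.fn(input);
  }
}
```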

3. Evaluators

  • Implements evaluation logic
  • Built-in accuracy and semantic similarity evaluators
  • Support for custom evaluator extensions
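A custom evaluator plugs into the same contract as the built-in ones. The interface below is a minimal sketch of that idea, with an exact-match accuracy function and a custom case-insensitive variant; the real package interface may differ.

```typescript
// Minimal sketch of an evaluator contract; illustrative only.
interface EvalResult {
  name: string;
  score: number; // normalized to 0..1
}

type EvaluatorFn = (actual: string, expected: string) => EvalResult;

// Built-in-style accuracy: 1 if outputs match after trimming, else 0.
const exactMatch: EvaluatorFn = (actual, expected) => ({
  name: 'accuracy',
  score: actual.trim() === expected.trim() ? 1 : 0,
});

// A custom evaluator uses the same shape, e.g. case-insensitive match.
const caseInsensitiveMatch: EvaluatorFn = (actual, expected) => ({
  name: 'ci_accuracy',
  score: actual.toLowerCase() === expected.toLowerCase() ? 1 : 0,
});
```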

4. Evaluation Chain (EvalChain)

  • Coordinates the entire evaluation workflow
  • Implements standard evaluation steps
  • Progress tracking and result aggregation
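The standard steps the chain coordinates can be sketched as follows: invoke the target, run every evaluator on the output, report progress along the way, and aggregate the scores. Function and type names here are assumptions for illustration.

```typescript
// Hedged sketch of a chain run; not the actual EvalChain code.
type Item = { user_input: string; expected_output: string };
type Scorer = (actual: string, expected: string) => number;

async function runChain(
  target: (input: string) => Promise<string>,
  evaluators: Scorer[],
  item: Item,
  onProgress?: (step: string) => void,
): Promise<{ output: string; scores: number[]; mean: number }> {
  onProgress?.('invoking target');
  const output = await target(item.user_input);

  onProgress?.('running evaluators');
  const scores = evaluators.map((ev) => ev(output, item.expected_output));

  // Aggregate the per-evaluator scores into a single mean for reporting.
  const mean = scores.reduce((a, b) => a + b, 0) / Math.max(scores.length, 1);
  return { output, scores, mean };
}
```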

5. Evaluation Tasks (EvalTask)

  • Provides experiment flow control
  • Batch processing and retry mechanisms
  • Detailed statistics and reporting
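FIFO flow control with per-item retries can be sketched like this; the queue-processing helper below is an illustrative assumption, not the actual EvalTask implementation.

```typescript
// Illustrative FIFO queue with bounded retries per item.
async function processQueue<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  maxRetries = 2,
): Promise<{ results: R[]; failed: T[] }> {
  const queue = [...items]; // FIFO: items are taken in insertion order
  const results: R[] = [];
  const failed: T[] = [];

  while (queue.length > 0) {
    const item = queue.shift()!;
    let done = false;
    for (let attempt = 0; attempt <= maxRetries && !done; attempt++) {
      try {
        results.push(await worker(item));
        done = true;
      } catch {
        // Exhausted retries: record the item for the failure report.
        if (attempt === maxRetries) failed.push(item);
      }
    }
  }
  return { results, failed };
}
```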

6. Infrastructure

  • Data persistence for business entities
  • Hook/callback-based tracing for evaluation tasks
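Hook/callback-based tracing can be sketched as a small set of lifecycle callbacks around a task run; the hook names (`onStart`, `onResult`, `onError`) are illustrative assumptions.

```typescript
// Sketch of callback-based tracing around one evaluation task.
interface TraceHooks {
  onStart?: (taskId: string) => void;
  onResult?: (taskId: string, score: number) => void;
  onError?: (taskId: string, err: unknown) => void;
}

async function traced(
  taskId: string,
  run: () => Promise<number>,
  hooks: TraceHooks = {},
): Promise<number> {
  hooks.onStart?.(taskId);
  try {
    const score = await run();
    hooks.onResult?.(taskId, score);
    return score;
  } catch (err) {
    // Surface the failure to tracing, then rethrow for the caller.
    hooks.onError?.(taskId, err);
    throw err;
  }
}
```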

Project Structure

diting/
├── packages/
│   ├── global/          # Shared types, constants, and utilities
│   ├── service/         # Domain models and business logic
│   └── web/             # React components and UI utilities
├── projects/
│   └── app/             # Next.js main application
└── examples/            # Usage examples and demonstrations

Getting Started

Prerequisites

  • Node.js 18+
  • pnpm 8+

Installation

  1. Clone the repository:

     git clone <repository-url>
     cd diting

  2. Install dependencies:

     pnpm install

  3. Build packages:

     pnpm run build:packages

  4. Start the development server:

     pnpm run dev

The application will be available at http://localhost:3000.

Development Commands

  • pnpm run dev - Start development server
  • pnpm run build - Build all packages and the application
  • pnpm run typecheck - Run TypeScript type checking
  • pnpm run lint - Run ESLint

Usage Example

```typescript
import { 
  EvaluationDataset, 
  EvalTarget, 
  Evaluator, 
  EvalChain,
  EvalExperiment 
} from '@diting/service';
import { generateId } from '@diting/global';

// Create evaluation dataset
const dataset = new EvaluationDataset({
  name: 'QA Dataset',
  version: '1.0.0',
  data: [
    {
      id: generateId(),
      user_input: 'What is the capital of France?',
      expected_output: 'Paris',
      context: 'Geography question'
    }
  ],
  source_type: 'manual'
});

// Create evaluation target
const target = new EvalTarget({
  name: 'Mock AI',
  config: {
    type: 'mock',
    config: {
      responses: [{ output: 'Paris' }]
    }
  }
});

// Create evaluator
const evaluator = new Evaluator({
  name: 'Accuracy',
  config: { type: 'accuracy' }
});

// Run evaluation
const chain = new EvalChain(target, [evaluator]);
const result = await chain.execute(dataset.data[0]);

console.log('Evaluation result:', result);
```

API Endpoints

Datasets

  • GET /api/datasets - List all datasets
  • POST /api/datasets - Create new dataset
  • GET /api/datasets/:id - Get dataset by ID
  • PUT /api/datasets/:id - Update dataset
  • DELETE /api/datasets/:id - Delete dataset

Dashboard

  • GET /api/dashboard/stats - Get dashboard statistics
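A client for the endpoints above might look like the sketch below; the base URL (the dev server default) and the request/response shapes are assumptions.

```typescript
// Hedged client sketch for the dataset endpoints; response shapes
// and the base URL are assumptions, not documented API contracts.
const BASE = 'http://localhost:3000';

function datasetUrl(id?: string): string {
  return id ? `${BASE}/api/datasets/${id}` : `${BASE}/api/datasets`;
}

// Example: create a dataset via POST /api/datasets.
async function createDataset(body: { name: string; version: string }) {
  const res = await fetch(datasetUrl(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}
```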

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

[Add your license information here]
