# Diting

Diting is a domain-driven AI application evaluation system built with TypeScript and Next.js. It provides comprehensive tools for managing evaluation datasets, configuring evaluation targets, and running evaluation experiments with detailed analytics.
## Features

- Domain-Driven Design: Clean architecture with clear domain boundaries and well-separated business logic
- Evaluation Dataset Management: Create, manage, and query evaluation datasets with version control
- Multiple Evaluation Targets: Evaluate Mock, HTTP, and Function-based AI applications
- Flexible Evaluators: Built-in accuracy and semantic similarity evaluators, plus support for custom evaluators
- Experiment Management: Complete evaluation experiment workflow with progress tracking and FIFO-based flow control
- Type Safety: Written entirely in TypeScript with complete type definitions
- Monorepo Architecture: Organized into packages for global types, the service layer, and web components
## Core Concepts

### EvaluationDataset

- Manages collections of evaluation data with user input, expected output, context, and metadata
- Supports multiple file formats (JSONL, Excel) and HuggingFace dataset import/export
- Automatic generation from prompts and corpora
- Multi-version control and data querying
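As an illustration, a JSONL dataset file holds one record per line; the field names below mirror the dataset fields used elsewhere in this README, but the exact on-disk schema may differ:

```json
{"id": "case-001", "user_input": "What is the capital of France?", "expected_output": "Paris", "context": "Geography question"}
```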
### EvalTarget

- Represents the AI application under evaluation
- Supports three types: Mock, HTTP, and Function
- Provides a unified invocation interface
### Evaluator

- Implements the evaluation logic
- Built-in accuracy and semantic similarity evaluators
- Supports custom evaluator extensions
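A custom evaluator might look like the sketch below. The `EvalCase` and `EvalResult` shapes here are illustrative assumptions, not the actual `@diting/service` types:

```typescript
// Illustrative shapes -- the real types live in @diting/service and may differ.
interface EvalCase {
  user_input: string;
  expected_output: string;
  actual_output: string;
}

interface EvalResult {
  evaluator: string;
  score: number; // normalized to [0, 1]
}

// A custom evaluator is a named scoring function over a single case.
type EvaluatorFn = (c: EvalCase) => EvalResult;

// Example: exact-match accuracy, ignoring case and surrounding whitespace.
const exactMatch: EvaluatorFn = (c) => ({
  evaluator: 'exact-match',
  score:
    c.expected_output.trim().toLowerCase() ===
    c.actual_output.trim().toLowerCase()
      ? 1
      : 0,
});

const result = exactMatch({
  user_input: 'What is the capital of France?',
  expected_output: 'Paris',
  actual_output: ' paris ',
});
console.log(result.score); // 1
```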
### EvalChain

- Coordinates the entire evaluation workflow
- Implements the standard evaluation steps
- Tracks progress and aggregates results
### EvalExperiment

- Provides FIFO-based experiment flow control
- Batch processing and retry mechanisms
- Detailed statistics and reporting
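The FIFO flow control and retry behavior described above can be sketched as follows. This is a simplified illustration; `runFifo` is a hypothetical helper, not the actual `EvalExperiment` implementation:

```typescript
// Minimal FIFO runner with per-task retry -- an illustration of the
// flow-control idea, not the actual EvalExperiment implementation.
type Task<T> = () => Promise<T>;

async function runFifo<T>(tasks: Task<T>[], maxRetries = 2): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {       // strict FIFO: one task at a time, in order
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        results.push(await task());
        lastError = undefined;
        break;                      // task succeeded; move to the next one
      } catch (err) {
        lastError = err;            // retry until attempts are exhausted
      }
    }
    if (lastError !== undefined) throw lastError;
  }
  return results;
}
```

Batch processing could build on the same loop by slicing the task list into chunks and aggregating per-chunk statistics.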
### Repository & Tracing

- Data persistence for business entities
- Hook/callback-based tracing for evaluation tasks
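The hook/callback tracing pattern could be sketched like this. `TraceHooks` and `traced` are hypothetical names for illustration; the real callback signatures are defined by the service layer:

```typescript
// Hypothetical hook shape -- illustrates the hook/callback pattern only.
interface TraceHooks {
  onTaskStart?: (taskId: string) => void;
  onTaskEnd?: (taskId: string, durationMs: number) => void;
}

// Wraps an async task so the hooks fire around its execution,
// including when the task throws.
async function traced<T>(
  taskId: string,
  hooks: TraceHooks,
  fn: () => Promise<T>
): Promise<T> {
  hooks.onTaskStart?.(taskId);
  const start = Date.now();
  try {
    return await fn();
  } finally {
    hooks.onTaskEnd?.(taskId, Date.now() - start);
  }
}
```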
## Project Structure

```
diting/
├── packages/
│   ├── global/    # Shared types, constants, and utilities
│   ├── service/   # Domain models and business logic
│   └── web/       # React components and UI utilities
├── projects/
│   └── app/       # Next.js main application
└── examples/      # Usage examples and demonstrations
```
## Getting Started

### Prerequisites

- Node.js 18+
- pnpm 8+

### Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd diting
  ```

- Install dependencies:

  ```bash
  pnpm install
  ```

- Build packages:

  ```bash
  pnpm run build:packages
  ```

- Start the development server:

  ```bash
  pnpm run dev
  ```

The application will be available at http://localhost:3000.

### Scripts

- `pnpm run dev` - Start the development server
- `pnpm run build` - Build all packages and the application
- `pnpm run typecheck` - Run TypeScript type checking
- `pnpm run lint` - Run ESLint
## Quick Start

```typescript
import {
  EvaluationDataset,
  EvalTarget,
  Evaluator,
  EvalChain,
  EvalExperiment
} from '@diting/service';
import { generateId } from '@diting/global';

// Create evaluation dataset
const dataset = new EvaluationDataset({
  name: 'QA Dataset',
  version: '1.0.0',
  data: [
    {
      id: generateId(),
      user_input: 'What is the capital of France?',
      expected_output: 'Paris',
      context: 'Geography question'
    }
  ],
  source_type: 'manual'
});

// Create evaluation target
const target = new EvalTarget({
  name: 'Mock AI',
  config: {
    type: 'mock',
    config: {
      responses: [{ output: 'Paris' }]
    }
  }
});

// Create evaluator
const evaluator = new Evaluator({
  name: 'Accuracy',
  config: { type: 'accuracy' }
});

// Run evaluation
const chain = new EvalChain(target, [evaluator]);
const result = await chain.execute(dataset.data[0]);
console.log('Evaluation result:', result);
```

## API Endpoints

- `GET /api/datasets` - List all datasets
- `POST /api/datasets` - Create a new dataset
- `GET /api/datasets/:id` - Get a dataset by ID
- `PUT /api/datasets/:id` - Update a dataset
- `DELETE /api/datasets/:id` - Delete a dataset
- `GET /api/dashboard/stats` - Get dashboard statistics
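For example, a client could create a dataset through the REST API as sketched below. The request payload mirrors the dataset fields from the Quick Start, but the actual request/response schema is defined by the app's API routes, so treat the field names and the returned `id` as assumptions:

```typescript
// Hypothetical client helper -- payload and response shapes are assumed,
// not taken from the app's actual API route definitions.
async function createDataset(baseUrl: string): Promise<string> {
  const res = await fetch(`${baseUrl}/api/datasets`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'QA Dataset',
      version: '1.0.0',
      source_type: 'manual',
      data: [
        {
          user_input: 'What is the capital of France?',
          expected_output: 'Paris'
        }
      ]
    }),
  });
  if (!res.ok) throw new Error(`Create failed: ${res.status}`);
  const created = await res.json();
  return created.id; // assumes the API echoes back an id for the new dataset
}
```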
## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
## License

[Add your license information here]