A TypeScript library for AI-powered data extraction and schema validation with integrated web server capabilities.
SchemaLoom is a framework for extracting structured data from unstructured text using AI models. It combines LangChain and Google Gemini with schema validation through Zod, and it can be used either programmatically as a library or over HTTP as a web service.
- AI-Powered Extraction: Leverage Google Gemini models for intelligent data extraction
- Schema Validation: Comprehensive schema definition and validation using Zod
- Flexible Chunking: Configurable text chunking with overlap control
- Multiple Interfaces: Use as a library or deploy as a web service
- Type Safety: Full TypeScript support with comprehensive type definitions
- Extensible Architecture: Support for custom schemas and extraction logic
- Production Ready: Built-in error handling, validation, and monitoring
npm install schemaloom

import { SchemaLoomExtractor, GeminiProvider, EventListSchema } from 'schemaloom';
// Initialize extractor with custom configuration
const extractor = new SchemaLoomExtractor({
  chunkSize: 100000,
  chunkOverlap: 0,
  temperature: 0
});
// Configure AI provider
const provider = new GeminiProvider({
  model: "gemini-2.5-flash",
  temperature: 0
});
// Extract structured data
const result = await extractor.extract(
  "Your text content here...",
  EventListSchema,
  provider.getLLM()
);
console.log(result.data); // Extracted data
console.log(result.metadata); // Processing information

import { startServer } from 'schemaloom/server';
// Start server with custom configuration
startServer({
  port: 3000,
  host: 'localhost'
});

The primary extraction engine that handles text processing, chunking, and AI model interaction.
interface ExtractionOptions {
  chunkSize?: number; // Default: 100000
  chunkOverlap?: number; // Default: 0
  temperature?: number; // Default: 0
  model?: string; // Default: "gemini-2.5-flash"
}

- extract<T>(text: string, schema: z.ZodSchema<T>, provider: any): Promise<ExtractionResult<T>>
- extractBatch<T>(chunks: TextChunk[], schema: z.ZodSchema<T>, provider: any): Promise<ExtractionResult<T[]>>
- getOptions(): ExtractionOptions
- updateOptions(newOptions: Partial<ExtractionOptions>): void
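As an illustration of how `updateOptions` could apply a `Partial<ExtractionOptions>` on top of the documented defaults, here is a minimal sketch. `DEFAULT_OPTIONS` and `mergeOptions` are hypothetical names introduced for this example only; the library's own implementation is not shown in this README and may differ.

```typescript
// Mirrors the ExtractionOptions interface documented above.
interface ExtractionOptions {
  chunkSize?: number;
  chunkOverlap?: number;
  temperature?: number;
  model?: string;
}

// Hypothetical constant: the defaults listed in the interface comments.
const DEFAULT_OPTIONS: Required<ExtractionOptions> = {
  chunkSize: 100000,
  chunkOverlap: 0,
  temperature: 0,
  model: "gemini-2.5-flash",
};

// Hypothetical helper: returns a new object in which every field present
// in `updates` replaces the current value and every other field is kept.
function mergeOptions(
  current: Required<ExtractionOptions>,
  updates: Partial<ExtractionOptions>
): Required<ExtractionOptions> {
  return {
    chunkSize: updates.chunkSize ?? current.chunkSize,
    chunkOverlap: updates.chunkOverlap ?? current.chunkOverlap,
    temperature: updates.temperature ?? current.temperature,
    model: updates.model ?? current.model,
  };
}

// Only chunkSize and temperature change; the other defaults are preserved.
const merged = mergeOptions(DEFAULT_OPTIONS, { chunkSize: 50000, temperature: 0.1 });
console.log(merged);
```

Returning a fresh object rather than mutating the current one keeps the defaults reusable across extractor instances.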
Manages Google Gemini AI model interactions with configurable parameters.
interface GeminiOptions {
  model?: string; // Default: "gemini-2.5-flash"
  temperature?: number; // Default: 0
}

- getLLM(): ChatGoogleGenerativeAI
- updateConfig(options: Partial<GeminiOptions>): void
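Gemini access requires an API key, which this README supplies through the `GOOGLE_API_KEY` environment variable. A small guard can fail fast with a clear message before any provider is constructed. The sketch below is not part of SchemaLoom: `requireApiKey` is a hypothetical helper, and it takes the environment as a parameter so it can be exercised without touching `process.env`.

```typescript
// A plain map of environment variables; in Node.js, process.env fits this shape.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the trimmed key, or throws if it is
// missing, empty, or whitespace only.
function requireApiKey(env: Env, name: string = "GOOGLE_API_KEY"): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}: set it in the environment before creating a provider.`);
  }
  return value;
}

// Usage with an inline environment; a real application would pass process.env.
const apiKey = requireApiKey({ GOOGLE_API_KEY: "example-key" });
console.log(apiKey.length > 0);
```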
The library includes several pre-built schemas for common use cases:
const EventSchema = z.object({
  name: z.string(),
  date: z.string(),
  place: z.string()
});

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  category: z.string(),
  description: z.string().optional(),
  brand: z.string().optional(),
  sku: z.string().optional()
});

const ArticleSchema = z.object({
  title: z.string(),
  author: z.string().optional(),
  publishDate: z.string().optional(),
  content: z.string().optional(),
  tags: z.array(z.string()).optional(),
  summary: z.string().optional()
});

const ContactSchema = z.object({
  name: z.string(),
  email: z.string().optional(),
  phone: z.string().optional(),
  company: z.string().optional(),
  position: z.string().optional(),
  address: z.string().optional()
});

const InvoiceSchema = z.object({
  invoiceNumber: z.string(),
  date: z.string(),
  dueDate: z.string().optional(),
  customer: z.string(),
  items: z.array(z.object({
    description: z.string(),
    quantity: z.number(),
    unitPrice: z.number(),
    total: z.number()
  })),
  subtotal: z.number(),
  tax: z.number().optional(),
  total: z.number()
});

- GET /health: Health check endpoint
- GET /extract: Schema and parameter documentation
- POST /extract: Data extraction using predefined schemas
- POST /extract/custom: Data extraction using custom schema definitions
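A client calling `POST /extract` needs to assemble the query string from the schema name and the optional tuning parameters. The sketch below shows one way to do that, assuming the parameter names used in this README's request examples (`schema`, `chunkSize`, `chunkOverlap`, `temperature`). `buildExtractUrl` and `ExtractQuery` are hypothetical names for illustration, and the base URL simply reflects the default host and port of the server.

```typescript
// Hypothetical shape for the query parameters of POST /extract.
interface ExtractQuery {
  schema: string;
  chunkSize?: number;
  chunkOverlap?: number;
  temperature?: number;
}

// Hypothetical helper: builds the request URL, skipping parameters that
// were not provided and URL-encoding every value.
function buildExtractUrl(baseUrl: string, query: ExtractQuery): string {
  const pairs: Array<[string, string | number | undefined]> = [
    ["schema", query.schema],
    ["chunkSize", query.chunkSize],
    ["chunkOverlap", query.chunkOverlap],
    ["temperature", query.temperature],
  ];
  const queryString = pairs
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
  // Strip trailing slashes so the path is not doubled.
  return `${baseUrl.replace(/\/+$/, "")}/extract?${queryString}`;
}

const url = buildExtractUrl("http://localhost:3000", {
  schema: "product",
  chunkSize: 50000,
  chunkOverlap: 1000,
  temperature: 0.1,
});
console.log(url);
// http://localhost:3000/extract?schema=product&chunkSize=50000&chunkOverlap=1000&temperature=0.1
```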
POST /extract?schema=product&chunkSize=50000&chunkOverlap=1000&temperature=0.1
Content-Type: multipart/form-data
file: [your-file]

POST /extract/custom
Content-Type: application/json

{
  "content": "Your text content here...",
  "schemaDefinition": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "value": {"type": "number"},
      "active": {"type": "boolean"}
    }
  },
  "chunkSize": 50000,
  "temperature": 0
}

import { createPipeline } from 'schemaloom/server';
import { Hono } from 'hono';
// Create base pipeline
const baseApp = createPipeline();
// Extend with custom functionality
const customApp = new Hono();
// Add middleware before the routes: Hono runs handlers in registration
// order, so middleware registered after a route does not run for it
customApp.use('*', async (c, next) => {
  console.log(`${c.req.method} ${c.req.url}`);
  await next();
});
// Add custom routes
customApp.get('/status', (c) => c.json({ status: 'healthy' }));
// Mount extraction routes
customApp.route('/api', baseApp);

const chunks = [
  { content: "Text chunk 1", index: 0 },
  { content: "Text chunk 2", index: 1 }
];
const result = await extractor.extractBatch(
  chunks,
  CustomSchema,
  provider.getLLM()
);

import { SchemaRegistry } from 'schemaloom';
// Access predefined schemas
const productSchema = SchemaRegistry.product;
const articleSchema = SchemaRegistry.article;
// Use in extraction
const result = await extractor.extract(
  content,
  productSchema,
  provider.getLLM()
);

GOOGLE_API_KEY=your_google_api_key_here

interface ServerOptions {
  port?: number; // Default: 3000
  host?: string; // Default: 'localhost'
  cors?: boolean; // Default: false
}

The library provides comprehensive error handling with detailed error messages and metadata:
interface ExtractionResult<T> {
  data: T;
  chunks: number;
  processingTime: number;
  errors?: string[];
}

- Chunk Size: Larger chunks reduce API calls but may impact accuracy
- Chunk Overlap: Overlap preserves context between chunks
- Temperature: Lower values (0-0.3) for factual extraction, higher (0.7-1.0) for creative tasks
- Model Selection: Choose models based on accuracy vs. speed requirements
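To make the chunk size and overlap trade-off concrete, the sketch below splits text into fixed-size character windows where each window repeats the last `chunkOverlap` characters of the previous one. This is an assumption made for illustration: the README does not say how SchemaLoom splits text, and `chunkText` is a hypothetical function. The `TextChunk` shape follows the `{ content, index }` objects used in the batch example.

```typescript
// Shape taken from the batch-processing example in this README.
interface TextChunk {
  content: string;
  index: number;
}

// Hypothetical character-based splitter. Each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): TextChunk[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in the range [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push({ content: text.slice(start, start + chunkSize), index: chunks.length });
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
    start += step;
  }
  return chunks;
}

// With chunkSize 4 and overlap 1, neighbouring chunks share one character.
console.log(chunkText("abcdefghij", 4, 1).map((c) => c.content));
// [ 'abcd', 'defg', 'ghij' ]
```

Under this scheme the number of chunks, and therefore the number of model calls, grows as `chunkSize` shrinks or `chunkOverlap` grows, which is the cost side of the trade-off described in the list above.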
# Install dependencies
npm install
# Build library
npm run build
# Development mode with watch
npm run dev
# Start server
npm start

Full TypeScript support with comprehensive type definitions:
import type {
  ExtractionOptions,
  ExtractionResult,
  TextChunk,
  ExtractionProvider,
  ServerOptions
} from 'schemaloom';

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details.
For issues and questions, please use the GitHub issue tracker or refer to the documentation.