AI Platform AWS

Production-ready AI platform for AWS. Gateway, SDK, RAG, Agents, and more.

Overview

AI Platform AWS is a unified API platform that routes AI/ML requests across multiple providers. It provides a single interface to interact with AWS Bedrock, OpenAI, and other AI providers with built-in caching, rate limiting, cost tracking, and streaming support.

Architecture

graph TD
    App[Your Application] --> SDK["@ai-platform-aws/sdk"]
    SDK --> Gateway["AI Gateway (ECS Fargate)"]
    Gateway --> Bedrock["AWS Bedrock<br/>Claude 3 | Titan"]
    Gateway --> OpenAI["OpenAI API<br/>GPT-4o | DALL-E"]
    Gateway --> Azure["Azure OpenAI<br/>Copilot Studio"]
    Gateway --> Others["Other APIs<br/>Cohere | Gemini"]
    Gateway --> Cache["Redis Cache"]
    Gateway --> CostDB["MongoDB<br/>Cost Tracking | Prompts"]
    Agents["@ai-platform-aws/agents"] --> SDK
    Agents --> Tools["Tool Registry<br/>HTTP | DB | Search | Code"]
    Agents --> Memory["Memory<br/>Conversation | Persistent"]
    RAG["@ai-platform-aws/rag"] --> VectorDB["MongoDB Atlas<br/>Vector Search"]
    RAG --> SDK

Editable source: architecture-overview.drawio (open in draw.io)

Dashboard

Examples

Looking for runnable, self-contained examples? Check out ai-platform-aws-examples - 7 standalone projects covering gateway, RAG, agents, streaming, cost tracking, and full-stack deployment.

Quick Start

Using Docker Compose

# Clone the repository
git clone https://github.com/tysoncung/ai-platform-aws.git
cd ai-platform-aws

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start all services
docker-compose up -d

# Test health endpoint
curl http://localhost:3100/health

Local Development

# Prerequisites: Node.js 22+, pnpm 9+
pnpm install

# Generate OpenAPI types
pnpm nx run openapi:generate

# Build all packages
pnpm build

# Start Redis and MongoDB
docker-compose up -d redis mongodb

# Start the gateway in dev mode
pnpm dev

SDK Usage

import { AIGateway } from '@ai-platform-aws/sdk';

const gateway = new AIGateway({
  baseUrl: 'http://localhost:3100',
  apiKey: 'your-api-key', // optional
});

// Simple completion
const response = await gateway.complete({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  model: 'claude-3-haiku',
});

console.log(response.content);
console.log(`Cost: $${response.usage.estimatedCost}`);

// Streaming
for await (const chunk of gateway.stream({
  messages: [{ role: 'user', content: 'Write a story' }],
  model: 'gpt-4o',
})) {
  process.stdout.write(chunk);
}

// Embeddings
const { embeddings } = await gateway.embed({
  input: ['hello world', 'goodbye world'],
  model: 'text-embedding-3-small',
});

// Classification
const result = await gateway.classify({
  input: 'I love this product!',
  labels: ['positive', 'negative', 'neutral'],
});
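`gateway.embed` returns one vector per input string. A common next step is to compare those vectors with cosine similarity, for example to check whether 'hello world' and 'goodbye world' are semantically close. The sketch below assumes `embeddings` is a `number[][]`. That return type is an assumption, so check the SDK typings.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Similarity is undefined for a zero vector; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage with the embed() result above (assuming embeddings: number[][]):
// const score = cosineSimilarity(embeddings[0], embeddings[1]);
```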

API Reference

`POST /v1/complete`

Generate a text completion.

{
  "model": "claude-3-haiku",
  "messages": [{ "role": "user", "content": "Hello" }],
  "maxTokens": 1024,
  "temperature": 0.7,
  "stream": false,
  "systemPrompt": "You are a helpful assistant"
}
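Clients that do not use the SDK can call the endpoint over plain HTTP. The sketch below mirrors the JSON body above as a TypeScript type and builds the call without sending it. Two things are assumptions: which fields are optional (the OpenAPI spec in packages/openapi is authoritative), and that the optional gateway API key is sent as a Bearer token.

```typescript
// Request body for POST /v1/complete, mirrored from the JSON example.
// Optional markers are an assumption; check packages/openapi/openapi.yaml.
interface CompleteRequest {
  model: string;
  messages: { role: string; content: string }[];
  maxTokens?: number;
  temperature?: number;
  stream?: boolean;
  systemPrompt?: string;
}

interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a POST /v1/complete call.
function buildCompleteCall(baseUrl: string, req: CompleteRequest, apiKey?: string): HttpCall {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    // Assumption: the gateway accepts its API key as a Bearer token.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/complete`, // tolerate a trailing slash
    method: "POST",
    headers,
    body: JSON.stringify(req),
  };
}
```

To send it, spread the result into `fetch`: `const { url, ...init } = buildCompleteCall('http://localhost:3100', body); const res = await fetch(url, init);`.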

`POST /v1/embed`

Generate embeddings.

{
  "model": "titan-embed",
  "input": ["text to embed"]
}

`POST /v1/classify`

Classify text into categories.

{
  "model": "claude-3-haiku",
  "input": "Text to classify",
  "labels": ["positive", "negative", "neutral"]
}

`GET /health`

Health check endpoint.

Provider Configuration

AWS Bedrock

Supported models:

claude-3-sonnet - High capability
claude-3-haiku - Fast and cost-effective
titan-embed - Text embeddings

Configure via environment variables:

AWS_REGION=ap-southeast-2
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret

OpenAI

Supported models:

gpt-4o - High capability (multimodal)
gpt-4o-mini - Cost-effective
text-embedding-3-small - Compact embeddings
text-embedding-3-large - High-dimension embeddings

OPENAI_API_KEY=your-key

Azure OpenAI

Supported models:

azure-gpt-4o - GPT-4o via Azure deployment
azure-gpt-4o-mini - GPT-4o-mini via Azure deployment
azure-text-embedding-3-small - Compact embeddings via Azure
azure-text-embedding-3-large - High-dimension embeddings via Azure

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key

Google Gemini

Supported models:

gemini-2.0-flash - Fast and efficient
gemini-1.5-pro - High capability
text-embedding-004 - Text embeddings

GOOGLE_AI_API_KEY=your-key

Anthropic Direct

Supported models:

claude-3-5-sonnet - High capability
claude-3-haiku-direct - Fast and cost-effective
claude-3-opus - Most capable

ANTHROPIC_API_KEY=your-key

Cohere

Supported models:

command-r-plus - High capability
command-r - Cost-effective
embed-english-v3.0 - English embeddings
embed-multilingual-v3.0 - Multilingual embeddings

COHERE_API_KEY=your-key
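Taken together, the lists above map each model ID to exactly one provider and its environment variables. The sketch below shows how a request's `model` could be resolved to a provider and checked for missing configuration. It is a hypothetical routing table built from this README's lists, not the gateway's actual routing code. Only `AWS_REGION` is checked for Bedrock, on the assumption that credentials may come from the AWS default credential chain (for example an ECS task role).

```typescript
type Provider = "bedrock" | "openai" | "azure" | "gemini" | "anthropic" | "cohere";

interface ProviderSpec {
  models: string[];
  requiredEnv: string[];
}

// Hypothetical routing table; model IDs and env var names come from the lists above.
const PROVIDERS: Record<Provider, ProviderSpec> = {
  // Assumption: AWS credentials may come from the default chain, so only the region is required.
  bedrock: { models: ["claude-3-sonnet", "claude-3-haiku", "titan-embed"], requiredEnv: ["AWS_REGION"] },
  openai: {
    models: ["gpt-4o", "gpt-4o-mini", "text-embedding-3-small", "text-embedding-3-large"],
    requiredEnv: ["OPENAI_API_KEY"],
  },
  azure: {
    models: ["azure-gpt-4o", "azure-gpt-4o-mini", "azure-text-embedding-3-small", "azure-text-embedding-3-large"],
    requiredEnv: ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
  },
  gemini: { models: ["gemini-2.0-flash", "gemini-1.5-pro", "text-embedding-004"], requiredEnv: ["GOOGLE_AI_API_KEY"] },
  anthropic: {
    models: ["claude-3-5-sonnet", "claude-3-haiku-direct", "claude-3-opus"],
    requiredEnv: ["ANTHROPIC_API_KEY"],
  },
  cohere: {
    models: ["command-r-plus", "command-r", "embed-english-v3.0", "embed-multilingual-v3.0"],
    requiredEnv: ["COHERE_API_KEY"],
  },
};

// Returns the provider serving `model`, or throws if it is unknown or unconfigured.
function resolveProvider(model: string, env: Record<string, string | undefined>): Provider {
  for (const [name, spec] of Object.entries(PROVIDERS) as [Provider, ProviderSpec][]) {
    if (!spec.models.includes(model)) continue;
    const missing = spec.requiredEnv.filter((key) => !env[key]);
    if (missing.length > 0) {
      throw new Error(`Model "${model}" needs ${name}; missing environment variable(s): ${missing.join(", ")}`);
    }
    return name;
  }
  throw new Error(`Unknown model "${model}"`);
}
```

In Node, `resolveProvider('gpt-4o', process.env)` would fail fast at startup instead of on the first request.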

RAG Pipeline

flowchart LR
    subgraph Ingestion
        Docs[Documents] --> Chunk[Chunker] --> Embed1[Embeddings] --> Store[MongoDB Atlas<br/>Vector Store]
    end
    subgraph Query
        Question[User Question] --> Embed2[Embed Query] --> Search[Vector Search] --> TopK[Top K Chunks] --> Prompt[Augment Prompt] --> LLM[LLM] --> Answer[Answer]
    end
    Store -.-> Search

The @ai-platform-aws/rag package provides a ready-to-use RAG pipeline:

import { RAGPipeline } from '@ai-platform-aws/rag';

const rag = new RAGPipeline({
  gatewayUrl: 'http://localhost:3100',
  mongoUrl: 'mongodb://localhost:27017',
  database: 'my_app',
  collection: 'documents',
});

await rag.connect();

// Ingest documents
await rag.ingest('Your document text here...', { source: 'doc.pdf' });

// Query
const result = await rag.query('What is the main topic?');
console.log(result.answer);
console.log(`Sources: ${result.sources.length}`);
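On the query side, the Top K Chunks and Augment Prompt steps turn vector-search hits into the prompt sent to the LLM. The sketch below shows one way to do that. The `Hit` shape and the prompt wording are assumptions for illustration; `rag.query` handles this internally and its exact prompt is not documented here.

```typescript
// A retrieved chunk as a vector search might return it (field names are assumptions).
interface Hit {
  text: string;
  score: number; // higher = more similar
  source?: string;
}

// Keeps the k best-scoring hits and builds a prompt that numbers each source.
function buildAugmentedPrompt(question: string, hits: Hit[], k: number): string {
  const top = [...hits].sort((a, b) => b.score - a.score).slice(0, k); // copy, don't mutate input
  const context = top
    .map((hit, i) => `[${i + 1}] (${hit.source ?? "unknown"}) ${hit.text}`)
    .join("\n");
  return [
    "Answer the question using only the context below. Cite sources by number.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the chunks lets the model cite them, which is one way to produce something like `result.sources`.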

Deployment (AWS CDK)

cd infra
pnpm install

# Bootstrap CDK (first time)
pnpm cdk bootstrap

# Deploy (use `pnpm run`: plain `pnpm deploy` invokes pnpm's built-in deploy command)
pnpm run deploy

# Deploy with alarm email
pnpm cdk deploy --all --context alarmEmail=you@example.com

This deploys:

ECS Fargate - Auto-scaling container service (2-10 tasks)
Application Load Balancer - Public-facing HTTPS endpoint
ElastiCache Redis - Response caching
CloudWatch - Dashboards and alarms

Monorepo Management

This project uses Nx for monorepo management on top of pnpm workspaces, and OpenAPI for contract-first API design.

Nx

Nx provides build caching, task orchestration, and affected-based CI:

# Build all packages
pnpm build

# Only build/test affected packages (CI)
pnpm build:affected
pnpm test:affected

# Run a specific target
pnpm nx run gateway:typecheck

OpenAPI Contract-First

The API contract is defined in packages/openapi/openapi.yaml. TypeScript types are auto-generated from this spec and shared by the gateway and the SDK, so the spec is the single source of truth.

# Generate types from the OpenAPI spec
pnpm nx run openapi:generate

# Types are output to packages/openapi/generated/types.ts

The SDK uses openapi-fetch for fully typed API calls that match the spec exactly.

Agents Framework

The @ai-platform-aws/agents package provides a full agentic AI framework built on top of the gateway.

Architecture

The Gateway handles LLM calls; Agents handle orchestration, tool use, memory, and multi-step reasoning.

flowchart TD
    Task[User Task] --> Agent[Agent]
    Agent --> LLM[LLM - Think]
    LLM --> Decision{Tool Call or Final Answer?}
    Decision -->|Tool Call| Guardrails[Guardrails Check]
    Guardrails --> Approval{Needs Approval?}
    Approval -->|Yes| Human[Human Approval]
    Human -->|Approved| Execute[Execute Tool]
    Human -->|Rejected| Agent
    Approval -->|No| Execute
    Execute --> Observe[Observation]
    Observe --> Memory[Update Memory]
    Memory --> Agent
    Decision -->|Final Answer| Result[Return Result]

Editable source: agent-react-loop.drawio (open in draw.io)
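The loop in the diagram can be reduced to a few lines. In the sketch below, the LLM is abstracted as a `think` function that returns either a tool call or a final answer. Guardrails and human approval are collapsed into a single `approve` callback, and memory is a plain list of strings. All of these names are hypothetical and are not the `Agent` class's API.

```typescript
// One LLM turn: either a tool call or a final answer.
type Step =
  | { type: "tool"; tool: string; input: string }
  | { type: "final"; answer: string };

interface Tool {
  name: string;
  needsApproval: boolean;
  run(input: string): string;
}

// Think -> (guardrails/approval) -> execute -> observe -> update memory -> repeat.
function runReAct(
  task: string,
  think: (memory: string[]) => Step,
  tools: Tool[],
  approve: (tool: string, input: string) => boolean,
  maxIterations = 10,
): string {
  const memory: string[] = [`Task: ${task}`];
  for (let i = 0; i < maxIterations; i++) {
    const step = think(memory);
    if (step.type === "final") return step.answer; // "Final Answer" branch
    const { tool: toolName, input } = step;
    const tool = tools.find((t) => t.name === toolName);
    if (!tool) {
      memory.push(`Error: unknown tool ${toolName}`);
      continue;
    }
    if (tool.needsApproval && !approve(tool.name, input)) {
      memory.push(`Rejected: ${tool.name}(${input})`); // "Rejected" edge back to the agent
      continue;
    }
    memory.push(`Observation from ${tool.name}: ${tool.run(input)}`);
  }
  throw new Error(`No final answer after ${maxIterations} iterations`);
}
```

The `maxIterations` cap plays the same role as the `maxIterations: 10` option in the quick start. It stops a model that never converges on an answer.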

Multi-Agent Orchestration Patterns

graph LR
    subgraph Router
        T1[Task] --> R[Router] --> A1[Best Agent] --> R1[Result]
    end
    subgraph Pipeline
        T2[Task] --> P1[Agent A] --> P2[Agent B] --> P3[Agent C] --> R2[Result]
    end
    subgraph Parallel
        T3[Task] --> PA1[Agent A] & PA2[Agent B] & PA3[Agent C] --> Merge[Merge] --> R3[Result]
    end
    subgraph Supervisor
        T4[Task] --> S[Supervisor] --> W1[Worker A] & W2[Worker B] --> S --> R4[Result]
    end
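Three of these patterns can be expressed as plain function composition. In the sketch below, an agent is simplified to a synchronous `(input: string) => string`. Real agent runs are asynchronous, so the parallel case would use `Promise.all`. The supervisor pattern, which loops between a coordinator and its workers, is omitted. The function names are hypothetical.

```typescript
// Simplification: an agent is just a function from input to output.
type AgentFn = (input: string) => string;

interface RoutedAgent {
  canHandle: (task: string) => boolean;
  run: AgentFn;
}

// Router: send the task to the first agent that claims it.
function route(task: string, agents: RoutedAgent[]): string {
  const agent = agents.find((a) => a.canHandle(task));
  if (!agent) throw new Error(`No agent can handle: ${task}`);
  return agent.run(task);
}

// Pipeline: each agent's output becomes the next agent's input.
function pipeline(task: string, agents: AgentFn[]): string {
  return agents.reduce((acc, agent) => agent(acc), task);
}

// Parallel: every agent gets the same task; `merge` combines their outputs.
function parallel(task: string, agents: AgentFn[], merge: (outputs: string[]) => string): string {
  return merge(agents.map((agent) => agent(task)));
}
```

The researcher -> writer -> reviewer flow in the Agent Multi example is the pipeline shape: `pipeline(topic, [researcher, writer, reviewer])`.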

Quick Start

import { AIGateway } from '@ai-platform-aws/sdk';
import { Agent, calculatorTool, httpTool } from '@ai-platform-aws/agents';

const gateway = new AIGateway({ baseUrl: 'http://localhost:3100' });

const agent = new Agent(
  {
    name: 'assistant',
    description: 'A helpful research assistant',
    model: 'claude-3-haiku',
    tools: [calculatorTool, httpTool],
    maxIterations: 10,
  },
  gateway,
);

const result = await agent.run('What is 42 * 17 plus the square root of 256?');
console.log(result.output);

Features

ReAct Loop - Think -> Act -> Observe -> Repeat until done
Built-in Tools - HTTP, MongoDB, vector search, calculator, file system, sandboxed code execution
Memory - In-memory conversation history + MongoDB-backed long-term memory with vector search
Planner - LLM-powered task decomposition and re-planning on failure
Multi-Agent Orchestration - Route, pipeline, parallel, and supervisor patterns
Guardrails - Block destructive ops, PII detection, cost limits, domain allowlists
Human-in-the-Loop - Configurable approval for sensitive tool calls
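Two of the guardrails listed above, domain allowlists and PII detection, can be sketched as pre-execution checks on a tool call. The functions below are hypothetical and are not the package's guardrail API. The PII patterns are deliberately crude heuristics (an email address and a 13 to 16 digit card-like number). The host is extracted with a regex to keep the sketch dependency-free; in Node, `new URL(url).hostname` is the more robust choice.

```typescript
interface GuardrailResult {
  allowed: boolean;
  reason?: string;
}

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const CARD_LIKE = /\b(?:\d[ -]?){13,16}\b/; // heuristic; long order numbers will also match

// Allows http(s) calls only to allowlisted domains and their subdomains.
function checkHttpCall(url: string, allowedDomains: string[]): GuardrailResult {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  if (!match) return { allowed: false, reason: "not an http(s) URL" };
  const host = match[1].toLowerCase();
  // Fail closed on userinfo such as http://allowed.com@evil.com
  if (host.includes("@")) return { allowed: false, reason: "credentials in URL" };
  const ok = allowedDomains.some((d) => host === d || host.endsWith(`.${d}`));
  return ok ? { allowed: true } : { allowed: false, reason: `domain ${host} not allowlisted` };
}

// Blocks tool input that appears to contain personal data.
function checkPii(input: string): GuardrailResult {
  if (EMAIL.test(input)) return { allowed: false, reason: "possible email address" };
  if (CARD_LIKE.test(input)) return { allowed: false, reason: "possible card number" };
  return { allowed: true };
}
```

Matching on `.${d}` rather than a bare suffix stops `notexample.com` from passing an allowlist that contains `example.com`.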

See packages/agents/ for full documentation.

Examples

Ready-to-run examples demonstrating real-world usage patterns:

Example	Description
Bedrock Basic	Get started with AWS Bedrock - completions, streaming, embeddings, and vision
OpenAI External	Use OpenAI models with automatic fallback to Bedrock
BYOK Multi-Tenant	Let users bring their own API keys with per-tenant billing and rate limiting
RAG Pipeline	Full retrieval-augmented generation with MongoDB Atlas Vector Search
Agent Basic	Simple agent with calculator + HTTP tools using the ReAct pattern
Agent Multi	Multi-agent pipeline: researcher -> writer -> reviewer
Agent Auto-Tagger	Agent that auto-tags a product catalog using DB queries + LLM analysis

Each example is self-contained with its own README, dependencies, and .env.example.

Observability

The gateway includes built-in observability features:

OpenTelemetry Tracing

Distributed tracing for every LLM request with nested spans (request -> provider call -> cache check). Traces include provider, model, token counts, latency, and cost attributes.

Configure via OTEL_EXPORTER_OTLP_ENDPOINT env var
Compatible with AWS X-Ray, Jaeger, and any OTLP endpoint
Local development: Jaeger UI at http://localhost:16686

Structured Logging

JSON-structured logging via pino with request context:

Configure log level via LOG_LEVEL env var (debug, info, warn, error)
Every log includes request ID, provider, model, tokens, and cost

Metrics

Prometheus-compatible metrics at /metrics:

ai_gateway_requests_total - total requests (per provider/model)
ai_gateway_tokens_total - total tokens consumed
ai_gateway_cost_total - total cost in USD
ai_gateway_latency_seconds - request latency histogram
ai_gateway_errors_total - error count
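Because these metrics use the Prometheus text format, a quick total can be computed without a Prometheus server. The sketch below sums one metric across all of its label sets. The `provider`/`model` label names in the usage note are an assumption based on "per provider/model". The parser also ignores edge cases such as `}` inside quoted label values and special values like `+Inf`.

```typescript
// Sums every sample of `metric` across label sets in Prometheus text format.
function sumMetric(exposition: string, metric: string): number {
  let total = 0;
  for (const rawLine of exposition.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blank lines and HELP/TYPE comments
    // Sample line: name{label="value",...} value [timestamp]
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(line);
    if (match && match[1] === metric) total += Number(match[3]); // exact name match only
  }
  return total;
}
```

Usage, assuming the gateway serves /metrics on port 3100: `const text = await (await fetch('http://localhost:3100/metrics')).text(); const spend = sumMetric(text, 'ai_gateway_cost_total');`.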

Admin Dashboard

A React-based dashboard for monitoring and managing the gateway.

Overview - Request counts, token usage, cost, error rates, charts
Cost Analytics - Cost breakdowns by provider/model, projections
Agent Runs - View agent execution history with step-by-step details
Prompts - Manage prompt templates with versioning
Settings - System status, provider health, cache status

Running the Dashboard

# With Docker Compose
docker-compose up -d dashboard
# Dashboard at http://localhost:3200

# Local development
cd packages/dashboard
pnpm dev

Admin API

All admin endpoints require an `Authorization: Bearer <ADMIN_API_KEY>` header.

Endpoint	Method	Description
`/admin/stats`	GET	Overview stats (requests, tokens, cost, errors)
`/admin/costs`	GET	Cost analytics with filtering
`/admin/agent-runs`	GET	List agent runs
`/admin/agent-runs/:id`	GET	Single agent run with steps
`/admin/prompts`	GET	List prompt templates
`/admin/prompts/:id`	PUT	Update a prompt template
`/admin/health`	GET	Detailed system health
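Calling these endpoints from a script means filling path parameters such as `:id` and attaching the admin key. The helper below is a hypothetical sketch of that. It assumes the admin routes are served by the gateway itself (they live under packages/gateway/src/routes/admin), so the base URL is the gateway's.

```typescript
// Builds the URL and auth header for an admin endpoint, filling `:param` placeholders.
function adminRequest(
  baseUrl: string,
  pathTemplate: string,
  params: Record<string, string>,
  adminApiKey: string,
): { url: string; headers: Record<string, string> } {
  const path = pathTemplate.replace(/:([A-Za-z_]\w*)/g, (_match: string, name: string) => {
    const value = params[name];
    if (value === undefined) throw new Error(`Missing path parameter: ${name}`);
    return encodeURIComponent(value); // escape IDs containing spaces or slashes
  });
  return {
    url: `${baseUrl.replace(/\/+$/, "")}${path}`,
    headers: { Authorization: `Bearer ${adminApiKey}` },
  };
}
```

For example, `adminRequest('http://localhost:3100', '/admin/agent-runs/:id', { id: runId }, key)` produces the URL and headers to pass to `fetch`.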

Project Structure

ai-platform-aws/
  packages/
    openapi/            # OpenAPI spec & generated types
    gateway/            # Fastify AI Gateway service
      src/
        observability/  # Tracing, logging, metrics
        routes/admin/   # Admin API endpoints
    dashboard/          # React admin dashboard (Vite + TailwindCSS)
    sdk/                # TypeScript client SDK (openapi-fetch)
    rag/                # RAG pipeline utilities
    agents/             # Agentic AI framework (ReAct, tools, memory, orchestration)
  examples/
    bedrock-basic/      # AWS Bedrock usage examples
    openai-external/    # OpenAI with fallback
    byok-multi-tenant/  # Bring Your Own Key multi-tenant
    rag-pipeline/       # Full RAG with MongoDB Atlas
    agent-basic/        # Simple agent example
    agent-multi/        # Multi-agent pipeline
    agent-auto-tagger/  # Auto-tagging with DB + LLM
  infra/                # AWS CDK infrastructure
  nx.json               # Nx configuration
  docker-compose.yml    # Local development
  .github/workflows/    # CI/CD (nx affected)

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT
