An open-source template demonstrating how to charge for AI inference on a pay-per-token basis using the x402 protocol.
Features · How It Works · Running locally
- Accurate Token Metering: Extract actual token usage from AI responses using Vercel AI SDK
- Flexible Pricing: Set your own price per token (e.g., $0.000001 per token)
- Real-time Cost Display: Users see the exact cost of each AI response in the UI
- Asynchronous Settlement: Payment is settled after streaming completes for optimal UX
- Pre-verification: Verify signed payment data before processing requests
- Maximum Amount Protection: Set a max token limit to cap potential costs
- Post-inference Settlement: Charge only for actual tokens used
- Onchain Payments: Paid in the token and chain of your choice with gasless transactions
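Together, the per-token price and the max token cap bound the most a user can ever be charged. A quick sketch of that arithmetic, using the default values from `lib/constants.ts` (USDC uses 6 decimals):

```typescript
// Default pricing constants from lib/constants.ts (USDC uses 6 decimals).
const PRICE_PER_INFERENCE_TOKEN_WEI = 1;         // 0.000001 USDC per token
const MAX_INFERENCE_TOKENS_PER_CALL = 1_000_000; // 1M-token cap

// The maximum amount the user authorizes up front, in USDC wei...
const maxChargeWei = PRICE_PER_INFERENCE_TOKEN_WEI * MAX_INFERENCE_TOKENS_PER_CALL;
// ...and in human-readable USDC.
const maxChargeUsdc = maxChargeWei / 10 ** 6;

console.log(maxChargeWei, maxChargeUsdc); // 1000000 1
```

So with the defaults, a single call can never settle for more than 1 USDC, regardless of how many tokens the model streams.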
This template demonstrates a complete pay-per-token flow:
- Payment Verification (`verifyPayment`)
  - User signs a payment authorization with a maximum amount
  - Server verifies the signature before processing the request
  - Prevents unauthorized inference calls
- AI Inference (`streamText`)
  - Process the chat request and stream the AI response to the user
  - Non-blocking payment flow ensures optimal UX
  - Extract token usage via the `onFinish` callback
- Asynchronous Settlement (`settlePayment`)
  - Calculate the final price: `PRICE_PER_INFERENCE_TOKEN_WEI × totalTokens`
  - Settle the payment on-chain after streaming completes
  - Only charge for actual tokens consumed
- Cost Display
  - Stream token metadata to the frontend via `messageMetadata`
  - Display a cost card below each AI response
  - Full transparency for users
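The steps above can be sketched end-to-end. This is a simplified mock, not the template's actual code: `verifyPayment` and `settlePayment` below are stand-in stubs that only mimic the call order, and every type and parameter shape here is an assumption for illustration.

```typescript
// Stand-in stubs — the real functions come from the thirdweb x402 SDK.
type PaymentAuth = { signature: string; maxAmountWei: number };

async function verifyPayment(auth: PaymentAuth): Promise<boolean> {
  // Real code verifies the signed authorization with the facilitator.
  return auth.signature.length > 0;
}

async function settlePayment(amountWei: number): Promise<void> {
  // Real code settles the payment on-chain via the facilitator.
  console.log(`settled ${amountWei} USDC wei`);
}

const PRICE_PER_INFERENCE_TOKEN_WEI = 1;

// runInference stands in for streamText; it resolves to the total tokens used.
async function handleChat(auth: PaymentAuth, runInference: () => Promise<number>) {
  // 1. Verify the signed authorization before doing any inference.
  if (!(await verifyPayment(auth))) throw new Error("402 Payment Required");

  // 2. Process the request and collect actual token usage.
  const totalTokens = await runInference();

  // 3. Settle only for the tokens actually consumed.
  await settlePayment(PRICE_PER_INFERENCE_TOKEN_WEI * totalTokens);

  // 4. Token metadata would be streamed to the frontend for the cost display.
  return totalTokens;
}
```

For instance, `handleChat({ signature: "0xsig", maxAmountWei: 1_000_000 }, async () => 1234)` verifies, "runs" inference, and settles 1234 USDC wei.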
Backend - Token Extraction & Payment Settlement (`app/api/chat/route.ts`):

```typescript
const stream = streamText({
  // ... model config
  onFinish: async (event) => {
    const totalTokens = event.totalUsage.totalTokens;
    const finalPrice = PRICE_PER_INFERENCE_TOKEN_WEI * totalTokens;
    await settlePayment({
      facilitator: twFacilitator,
      network: arbitrum,
      price: { amount: finalPrice.toString(), asset: usdcAsset },
      // ... other params
    });
  },
});
```

Frontend - Cost Display (`components/messages.tsx`):

```typescript
const totalTokens = metadata?.totalTokens;
const costInUsdc = (PRICE_PER_INFERENCE_TOKEN_WEI * totalTokens) / 10 ** 6;
```

- Next.js App Router for server-side rendering and performance
- Vercel AI SDK for LLM API and streaming
- thirdweb x402 for HTTP micropayments and payment infrastructure
You will need the following API keys and environment variables:
- AI Provider API Keys: Anthropic, Fireworks, or Groq (depending on which model you want to use)
- thirdweb Credentials: For x402 payment infrastructure
- Get your secret key from thirdweb dashboard
- Client ID for frontend wallet connection
- Clone the repository

```bash
git clone <repository-url>
cd x402-ai-inference
```

- Install dependencies

```bash
pnpm install
```

- Set up environment variables
Create a `.env.local` file in the root directory:

```bash
# AI Provider API Keys
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENAI_API_KEY=your_openai_api_key

# thirdweb Configuration
THIRDWEB_SECRET_KEY=your_thirdweb_secret_key
THIRDWEB_SERVER_WALLET_ADDRESS=your_server_wallet_address
NEXT_PUBLIC_THIRDWEB_CLIENT_ID=your_thirdweb_client_id
```

Important: Never commit your `.env.local` file. It contains secrets that would allow others to access your AI provider and thirdweb accounts.
- Configure pricing (Optional)

Edit `lib/constants.ts` to adjust your pricing:

```typescript
export const PRICE_PER_INFERENCE_TOKEN_WEI = 1; // 0.000001 USDC per token
export const MAX_INFERENCE_TOKENS_PER_CALL = 1000000; // 1M tokens max
```

You can also change the chain and token used for the payment in that file.
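A worked example under the default constants above (the 1,234-token response is a hypothetical figure, not output from the template):

```typescript
const PRICE_PER_INFERENCE_TOKEN_WEI = 1; // default from lib/constants.ts
const totalTokens = 1234;                // hypothetical token usage for one response

const finalPriceWei = PRICE_PER_INFERENCE_TOKEN_WEI * totalTokens; // amount settled on-chain
const costInUsdc = finalPriceWei / 10 ** 6;                        // USDC has 6 decimals

console.log(finalPriceWei, costInUsdc); // 1234 0.001234
```

Doubling `PRICE_PER_INFERENCE_TOKEN_WEI` to 2 would double both figures; the math is a straight multiply plus a 6-decimal conversion.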
- Start the development server

```bash
pnpm dev
```

Your app should now be running on localhost:3000.
- Connect a wallet with USDC on Arbitrum
- Send a chat message to trigger an AI inference
- The app will:
- Verify your payment signature
- Stream the AI response
- Settle payment based on actual tokens used
- Display the cost below the response
This project is open source and available under the MIT License.