A tool that forwards responses through an Ollama-like server, letting you pipe responses from LM Studio or OpenRouter to any Ollama-compatible endpoint.

PseudoLlama is a simple Express server that mimics the Ollama API. It serves content from a text file as responses to API requests, making it useful for testing applications that integrate with Ollama.
- Simulates Ollama API endpoints (`/api/chat`, `/api/generate`, etc.)
- Also supports OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, etc.)
- Web UI for editing the content and testing the server
- Supports both streaming and non-streaming responses
- Comprehensive logging of all model communications (requests and responses)
Install dependencies:

```bash
npm install
```
Start the server:

```bash
npm start
```
The server runs on port 12345; this port is fixed for testing purposes.
IMPORTANT: When connecting to this server from other tools, you must specify port 12345 in your configuration.
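For example, a quick smoke test with curl (a sketch; the model name is an arbitrary placeholder, since PseudoLlama serves the configured file content regardless of which model is requested):

```bash
# Non-streaming chat request in Ollama's request shape; the model name
# is a placeholder, as PseudoLlama returns the configured file content.
curl http://localhost:12345/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pseudollama",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```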
Ollama API endpoints:

- `POST /api/chat` - Chat completions
- `POST /api/generate` - Text generation
- `POST /api/embeddings` - Generate embeddings
- `GET /api/tags` - List available models
- `POST /api/pull` - Simulate model pulling
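Streaming also works on these routes; here is a sketch of a streaming `/api/generate` call, assuming the server mirrors Ollama's newline-delimited JSON chunk format for streamed responses:

```bash
# With "stream": true, the response should arrive as a sequence of JSON
# chunks rather than a single JSON object.
curl http://localhost:12345/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pseudollama",
    "prompt": "Hello",
    "stream": true
  }'
```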
OpenAI-compatible endpoints:

- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
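These routes accept the standard OpenAI request shape, so existing OpenAI client code should be able to point at the server unchanged; for instance:

```bash
# OpenAI-style chat completion against the same server; the model name
# is again an arbitrary placeholder.
curl http://localhost:12345/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pseudollama",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```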
Server management endpoints:

- `GET /api/server/status` - Check server status
- `POST /api/server/toggle` - Enable/disable the server
- `GET /api/content` - Get the current content
- `POST /api/content` - Update the content
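As a sketch, these endpoints let you script the server from the command line; note that the `{"content": ...}` request body for updating content is an assumption, not documented above:

```bash
# Check whether the server is currently enabled.
curl http://localhost:12345/api/server/status

# Replace the canned content returned by the API endpoints.
# The {"content": ...} body shape is a guess at the expected payload.
curl http://localhost:12345/api/content \
  -H "Content-Type: application/json" \
  -d '{"content": "New canned response text"}'
```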
Access the web UI by navigating to http://localhost:12345 in your browser. The UI allows you to:
- View and edit the content that will be returned by the API
- Test the API by sending a request to the server
- Enable/disable the server
PseudoLlama includes comprehensive logging of all model communications:
Basic request and response information is logged to the console when the server is running.
Complete model communications (including full request and response bodies) are logged to `logs/model_communications.log`. This is particularly useful for:
- Debugging applications that integrate with language models
- Analyzing the exact data sent to and received from models
- Understanding the structure of streaming responses
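Because the log is written to a fixed path, ordinary shell tools also work for quick inspection alongside the bundled viewer:

```bash
# Follow new entries as they are appended.
tail -f logs/model_communications.log

# Find entries that mention a particular endpoint.
grep "/v1/chat" logs/model_communications.log
```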
A log viewer utility is included to help analyze the logs:
```bash
# View all logs
node view-logs.js

# Show only the last 10 log entries
node view-logs.js --limit=10

# Filter logs by model
node view-logs.js --model=openrouter

# Filter logs by endpoint
node view-logs.js --endpoint=/v1/chat

# Show only requests
node view-logs.js --requests

# Show only responses
node view-logs.js --responses

# Watch for new log entries in real-time
node view-logs.js --tail

# Show help
node view-logs.js --help
```
The log files are automatically rotated when they reach 10MB to prevent excessive disk usage.