These components and examples will be deployed:
- LiteLLM for AI gateway
- vLLM for deploying and serving LLM models, with 2 models deployed:
  - Qwen3-30B-A3B-Instruct-2507-FP8 - 8-bit quantization (FP8) on a single g6e EC2 instance; fast model (~75 tokens/sec); MoE model with non-thinking mode
  - Qwen3-32B-FP8 - 8-bit quantization (FP8) on a single g6e EC2 instance; slow model (~15 tokens/sec); dense model with both thinking and non-thinking modes
- Langfuse for observability
- Open WebUI for the GUI app
- Qdrant for vector database
- Text Embeddings Inference (TEI) for deploying and serving embedding models, with 1 model deployed:
  - Qwen3-Embedding-4B - 16-bit precision (BF16) on a single r7i EC2 instance
- Calculator MCP server built with FastMCP 2.0
- Calculator Agent built with Strands Agents
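Once deployed, the stack can be exercised directly through the gateway's OpenAI-compatible API. A hypothetical sketch (replace `<DOMAIN>` with your domain; the model alias must match your LiteLLM configuration, and the API key comes from `LITELLM_API_KEY` in `.env.local`):

```shell
# Hypothetical sketch: call a vLLM-served model through the LiteLLM gateway.
# The hostname and model alias are assumptions; adjust to your deployment.
curl -s "https://litellm.<DOMAIN>/v1/chat/completions" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-30B-A3B-Instruct-2507-FP8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```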
Access Open WebUI at openwebui.<DOMAIN> and then:
- Agent pipe functions are automatically registered when agents are installed (e.g., `./cli strands-agents calculator-agent install`). The `Strands Agents - Calculator Agent` function will appear in Open WebUI automatically.
- Optionally, add Time Token Tracker from the Open WebUI Functions marketplace
- Change the Open WebUI RAG embedding model to use the deployed Qwen3-Embedding model (check `LITELLM_API_KEY` in `.env.local` for the API key)
- Use Open WebUI to interact with LLM models, document RAG, and AI agents:
  - Check Chat Features Overview on how to start using the chat features
  - Check Tutorial: Configuring RAG with Open WebUI Documentation on how to start using the document RAG feature
  - Select and explore `Strands Agents - Calculator Agent` (code available at `examples/strands-agents/calculator-agent`); this is a basic agent with memory, so you can continue the calculation and/or reset the calculator
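Before pointing Open WebUI's RAG at the deployed embedding model, it can be useful to verify the embedding path through the gateway. A hypothetical sketch (hostname and model alias are assumptions; LiteLLM exposes an OpenAI-compatible `/v1/embeddings` route):

```shell
# Hypothetical sketch: request embeddings from the TEI-served model via LiteLLM.
# Uses LITELLM_API_KEY from .env.local; alias must match your LiteLLM config.
curl -s "https://litellm.<DOMAIN>/v1/embeddings" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Embedding-4B",
    "input": "What is retrieval-augmented generation?"
  }'
```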
Access the LiteLLM dashboard at litellm.<DOMAIN>/ui (check `LITELLM_UI_USERNAME` and `LITELLM_UI_PASSWORD` in `.env.local` for the username and password):
- Check LiteLLM Proxy Server (LLM Gateway) to explore some of the features
- LiteLLM Proxy receives the requests from Open WebUI and the AI agents
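Because the proxy fronts every client, it can also issue scoped virtual keys so that Open WebUI and each agent authenticate separately. A hypothetical sketch using LiteLLM's key-generation endpoint (the `LITELLM_MASTER_KEY` variable name and model alias are assumptions for this deployment):

```shell
# Hypothetical sketch: issue a time-limited virtual key from the LiteLLM proxy,
# restricted to one model. Requires the proxy's master key.
curl -s "https://litellm.<DOMAIN>/key/generate" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["Qwen3-30B-A3B-Instruct-2507-FP8"],
    "duration": "30d"
  }'
```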
Access the Langfuse dashboard at langfuse.<DOMAIN> (check `LANGFUSE_USERNAME` and `LANGFUSE_PASSWORD` in `.env.local` for the email and password):
- Check Feature Overview to explore some of the features
- LiteLLM Proxy logging integration with Langfuse is already configured
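For reference, LiteLLM's Langfuse integration is typically enabled via a success callback in the proxy config, with credentials supplied through the environment. A sketch of what the already-configured setup commonly looks like (the exact file layout in this deployment may differ):

```yaml
# Sketch: LiteLLM proxy config fragment enabling Langfuse logging.
litellm_settings:
  success_callback: ["langfuse"]
# Langfuse credentials are read from the environment:
#   LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
```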
