This repository contains a Next.js demo app of a customer service use case with a human in the loop (HITL), built on top of the Responses API. It leverages the built-in file search tool and implements two views of a chat interface: one for the customer, and one for the human agent.
This demo illustrates an example flow where a human agent is assisted by an AI agent to answer customer questions, while staying in control of sensitive actions.
Features:
- Multi-turn conversation handling
- File search tool
- Vector store creation & file upload for use with the file search
- Knowledge base display
- Function calling
- Streaming suggested responses
- Suggested actions to execute tool calls
- Auto-execution of tool calls for non-sensitive actions
- Optional auto reply mode to automatically send suggested messages
- Filters out irrelevant questions and jailbreaking attempts
- Idle sessions auto-close after 4 minutes, sending unsaved messages and marking the session as ended
- Works with either the `openai` or `ollama` provider. The built-in tools operate the same with both.
Feel free to customize this demo to suit your specific use case.
- `app/` for Next.js routes & API handlers
- `components/` for UI components
- `stores/` for state management (e.g., `useConversationStore`)
- `scripts/` for maintenance tasks like `cleanupSessions.ts`
- `prisma/` for schema & migrations, etc.
Copy `.env.ai`, adjust credentials or port mappings, and add your `AGENT_TOKENS` mapping. This file is used when running the container.
The easiest way to run the entire stack with the baked-in port mappings is:
```bash
docker compose -f docker-compose.agent.yml up --build
```

The steps below show how to build and run each container manually.

- Build the image:

  ```bash
  docker build -t ai-agent .
  ```

- Create a network and start PostgreSQL and Redis on free ports:

  ```bash
  docker network create support-net
  docker run --rm -d --name demo-db --network support-net -e POSTGRES_PASSWORD=postgres -p 5433:5432 postgres
  docker run --rm -d --name demo-redis --network support-net -p 6380:6379 redis
  ```

- Run the AI agent:

  ```bash
  docker run --rm -p 3001:3001 --network support-net --env-file .env.ai \
    -e DATABASE_URL=postgresql://postgres:postgres@demo-db:5432/support_agent_demo \
    -e REDIS_URL=redis://demo-redis:6379 \
    ai-agent
  ```
- `P1001: Can't reach database server` – verify container names/ports and ensure the `support-net` network exists.
- `P1000: Authentication failed` – check the database credentials in `.env.ai`.
- `network support-net not found` – create the network first: `docker network create support-net`.
- `port already allocated` – choose unused host ports or stop conflicting services.
- Install dependencies:

  ```bash
  npm install
  ```

- Configure environment:

  Create a `.env` file with your database connection string, Redis URL, session retention setting, and (optionally) Chatwoot credentials:

  ```bash
  DATABASE_URL="postgresql://<user>:<password>@localhost:5432/<dbname>"
  REDIS_URL=redis://localhost:6379
  SESSION_RETENTION_DAYS=30 # How many days to retain ended sessions
  CHATWOOT_URL=https://e245e7cb03bc.ngrok-free.app
  CHATWOOT_APP_TOKEN=<chatwoot-app-token>
  CHATWOOT_BOT_TOKEN=<bot access token>
  AGENT_TOKENS='{"1":"agent-1-secret","2":"agent-2-secret"}'
  ```
When running this service inside Docker, copy the same `AGENT_TOKENS` mapping into `.env.ai`.
When running this service inside Docker, `CHATWOOT_URL` must be a fully
qualified address reachable from the container (for example, a public
ngrok or Cloudflare Tunnel URL) so that the agent can resolve the
Chatwoot instance.
If Redis runs on a dynamically mapped port (e.g. `docker port` or `docker compose port`), first determine the host port and then point `REDIS_URL` to it:
```bash
docker compose port redis 6379
# or: docker port <container_name> 6379
# Suppose it prints 0.0.0.0:49153
REDIS_URL=redis://localhost:49153
```
`SESSION_RETENTION_DAYS` controls how long ended sessions are kept before cleanup (defaults to 30 days).
Session messages cached in Redis are retained indefinitely until the session is cleaned up.
The Chatwoot variables configure the webhook endpoints to send automated replies and handle status changes in your Chatwoot instance.
Create two webhooks in Chatwoot:
- `http://ai-agent:3001/api/chatwoot-webhook` subscribed only to `message_created`.
- `http://ai-agent:3001/api/chatwoot-status-webhook` subscribed to `conversation_status_changed`, `conversation_updated`, and `message_created` (system events).
Release retries for these endpoints are controlled by `RELEASE_MAX_ATTEMPTS`, `RELEASE_RETRY_BASE_MS`, and `CHATWOOT_TIMEOUT_MS`.
**Attachment handling.** Incoming Chatwoot messages may include attachments on `message.attachments` or inside `content_attributes`.
The webhook normalizes these entries before building the provider request:
- If the configured model supports vision (e.g. `gpt-4o`, `gpt-4.1`, or any model listed via `CHATWOOT_WEBHOOK_MODEL` that includes the substrings `4o`, `4.1`, `o1`, `o3`, or `omni`), image attachments are forwarded to the provider as `input_image` parts. URLs or embedded data URIs are passed through so the model can inspect the image content directly.
- For other models, the webhook appends a textual note that lists each attachment, its MIME type when available, and its download URL or indicates when only base64 data is present. This ensures agents are aware of additional context even when the model cannot open the file itself.
- When an inbound message only includes image attachments (no textual content), the webhook skips the relevance guardrail so customers do not receive unnecessary clarification prompts. After deploying, monitor your logs for the `Skipping relevance guardrail for image-only attachment message` entry to confirm the behavior is active.
- Whenever an image is present, the webhook now routes the attachment through a lightweight image-understanding pass (default model `CHATWOOT_IMAGE_MODEL` or `gpt-4.1-mini`). The analysis produces catalog-ready descriptors, suggested queries, and top matches from the knowledge base so the assistant can contrast alternatives without asking the customer for more detail. The generated summary is appended to the user turn and a developer message shares the ranked matches with the model.
- Tune the helper with:
- `CHATWOOT_IMAGE_MODEL` to select the OpenAI vision model used for analysis.
- `CHATWOOT_IMAGE_SEARCH_PROVIDER` to override the provider used when searching the knowledge base (defaults to the webhook provider).
- `CHATWOOT_IMAGE_KB_LIMIT` to cap the number of knowledge base snippets injected into the prompt (defaults to 3).
- Image-related logs now include an `Image insight queries` entry showing the normalized search terms used to interrogate the knowledge base. Track this log to validate catalog coverage and iterate on tagging when matches are sparse.
You can override the attachment detection model by setting `CHATWOOT_WEBHOOK_MODEL`; otherwise the default falls back to the global `MODEL` configured for the provider.
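For orientation, here is a minimal sketch of the attachment normalization described above. The helper names (`isVisionModel`, `normalizeAttachments`) and the attachment shape are illustrative assumptions, not the repository's actual implementation:

```typescript
// Illustrative sketch only: names and shapes are assumptions, not the repo's actual code.
type ChatwootAttachment = { data_url?: string; file_type?: string };

// Assumed heuristic mirroring the substring check described above.
function isVisionModel(model: string): boolean {
  return ["4o", "4.1", "o1", "o3", "omni"].some((tag) => model.includes(tag));
}

// Turn Chatwoot attachments into provider input parts or a textual note.
function normalizeAttachments(model: string, attachments: ChatwootAttachment[]) {
  if (isVisionModel(model)) {
    // Vision-capable models receive image parts directly.
    return attachments
      .filter((a) => a.file_type?.startsWith("image"))
      .map((a) => ({ type: "input_image" as const, image_url: a.data_url ?? "" }));
  }
  // Other models get a plain-text summary of each attachment instead.
  const note = attachments
    .map((a) => `Attachment (${a.file_type ?? "unknown type"}): ${a.data_url ?? "base64 data only"}`)
    .join("\n");
  return [{ type: "input_text" as const, text: note }];
}
```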
The `AGENT_TOKENS` environment variable supplies per-agent access tokens in JSON form, e.g. `{ "1": "secret" }`. Store these secrets outside of source control (such as a `.env` file or your hosting platform's secret manager). To rotate a token, update the JSON with the new value and redeploy or restart the service so it reads the updated mapping.
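As a quick illustration (assumed, not the repo's exact code), the mapping can be parsed and checked like this:

```typescript
// Sketch: parse the AGENT_TOKENS JSON and verify a token for a given agent id.
const agentTokens: Record<string, string> = JSON.parse(process.env.AGENT_TOKENS ?? "{}");

function isValidAgentToken(agentId: string, token: string): boolean {
  return agentTokens[agentId] === token;
}
```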
#### Release retry settings
- `RELEASE_MAX_ATTEMPTS` (default `5`) – number of retry attempts when releasing an agent fails.
- `RELEASE_RETRY_BASE_MS` (default `1000`) – base delay for exponential backoff.
- `CHATWOOT_TIMEOUT_MS` (default `15000`) – API request timeout before retries.
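A minimal sketch of how these settings could drive a retry loop; the function name and call shape are assumptions for illustration, not the repository's actual release code:

```typescript
// Sketch only: illustrates the retry/backoff semantics of the settings above.
const MAX_ATTEMPTS = Number(process.env.RELEASE_MAX_ATTEMPTS ?? 5);
const RETRY_BASE_MS = Number(process.env.RELEASE_RETRY_BASE_MS ?? 1000);
const TIMEOUT_MS = Number(process.env.CHATWOOT_TIMEOUT_MS ?? 15000);

async function releaseWithRetry(release: () => Promise<void>): Promise<void> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      // Fail the Chatwoot API call if it exceeds the configured timeout.
      await Promise.race([
        release(),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("Chatwoot request timed out")), TIMEOUT_MS)
        ),
      ]);
      return;
    } catch (err) {
      if (attempt === MAX_ATTEMPTS) throw err;
      // Exponential backoff: base * 2^(attempt - 1).
      await new Promise((r) => setTimeout(r, RETRY_BASE_MS * 2 ** (attempt - 1)));
    }
  }
}
```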
A standalone Docker setup is available to test the service in isolation:
```bash
docker compose -f docker-compose.agent.yml up --build
```

This starts the AI agent on http://localhost:3001 along with PostgreSQL, Redis, and Ollama.
- Run database migrations:

  ```bash
  npx prisma generate
  npx prisma migrate deploy
  ```

- Start Redis and the app:

  Start a Redis server (e.g. `redis-server` or `docker run -p 6379:6379 redis`) and then run:

  ```bash
  npm run dev
  ```

- Ticket IDs in chats:

  When a customer does not share an email, the `create_ticket` tool generates a ticket ID in the format `#<index>/<date>` (for example `#1/2024-05-01`). The ID is stored with the chat session so the conversation can be resumed later using that ticket number.
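For illustration, a ticket ID in that format could be produced like this (a sketch, not the actual `create_ticket` implementation):

```typescript
// Sketch: build a ticket ID like "#1/2024-05-01" from a running index and today's date.
function makeTicketId(index: number, date = new Date()): string {
  const isoDate = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `#${index}/${isoDate}`;
}

// Example: makeTicketId(1, new Date("2024-05-01")) === "#1/2024-05-01"
```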
Releases are triggered only when a conversation moves from `open` to either `pending` or `resolved` and still carries the agent-assigned label. The `chatwoot-status-webhook` checks for this combination before calling `releaseAgent`. Inside `releaseAgent`, the label is removed, which generates additional webhook events to handle follow-up processing.
Retry behavior for this release process is governed by the `RELEASE_MAX_ATTEMPTS`, `RELEASE_RETRY_BASE_MS`, and `CHATWOOT_TIMEOUT_MS` environment variables.
- Set up the OpenAI API:
  - If you're new to the OpenAI API, sign up for an account.
  - Follow the Quickstart to retrieve your API key.

- Clone the repository:

  ```bash
  git clone https://github.com/openai/openai-support-agent-demo.git
  ```

- Set the OpenAI API key (2 options):

  - Set the `OPENAI_API_KEY` environment variable globally in your system.
  - Set the `OPENAI_API_KEY` environment variable in the project: create a `.env` file at the root of the project and add the following line (see `.env.example` for reference):

    ```bash
    OPENAI_API_KEY=<your_api_key>
    ```
Note: File search uses the OpenAI vector store when the provider is set to `openai`. When using the `ollama` provider, search falls back to a local vector store built from the knowledge base using Ollama's `embeddings` endpoint. You can keep both stores initialized and switch providers at any time.
- Choose your provider (optional):

  The assistant can run using the `openai` API, the `ollama` package, or Ollama's OpenAI compatible endpoint. You can switch providers from the dropdown next to Auto reply in the agent view. When selecting either `ollama` or `ollama-openai`, make sure you have an Ollama server running locally (e.g. by executing `ollama serve`). The built-in tools work the same with all providers.

When using the `ollama` provider, you need a local Ollama server running.
- Start the server:

  ```bash
  ollama serve
  ```

- Begin with a lightweight model such as `llama3`:

  ```bash
  ollama run llama3
  ```

  The command downloads the model if needed. Use `ollama run <model>` or `ollama pull <model>` to get other models.

- After installing multiple models, switch between them using the Model dropdown in the agent view. Ensure the provider dropdown is set to `ollama`.

- (Optional) Configure the default model and context size by adding the following variables to your `.env` file (see `.env.example`):

  ```bash
  OLLAMA_MODEL=llama3.2
  OLLAMA_NUM_CTX=32768
  OLLAMA_HOST=http://localhost:11434
  ```

  To enable the OpenAI compatible endpoint, also set:

  ```bash
  OLLAMA_OPENAI_BASE_URL=http://localhost:11434/v1
  OLLAMA_OPENAI_API_KEY=ollama
  ```
- Install dependencies:

  Run in the project root:

  ```bash
  npm install
  ```

  You must run this command before executing `npm run lint` or `npm run dev`.

- Run the app:

  ```bash
  npm run dev
  ```

  The app will be available at http://localhost:3000.
Initialize the vector store:
Visit
/init_vswhere you can create both OpenAI and Ollama vector stores.- Initialize OpenAI vector store: click the OpenAI button. Copy
the returned vector store ID and paste it into
config/constants.tsasVECTOR_STORE_ID.
- Initialize OpenAI vector store: click the OpenAI button. Copy
the returned vector store ID and paste it into
- Initialize Ollama vector store: click the Ollama button to
generate embeddings locally. This stores the embeddings in the
data/local_vector_store.jsonfile. - Rebuild Ollama vector store: click the Rebuild Ollama button
on the same page to regenerate embeddings after updating the knowledge base
(this calls
/api/local_vector_store/init?force=true).
The OpenAI vector store will be used when the provider is set to `openai`, while the local store powers file search when using the `ollama` provider. Both stores can exist side by side.
By default, searches return up to 10 results. Local search applies a cosine similarity threshold of 0.3. Provide a `limit` option to control the number of results, a `threshold` option to adjust the cutoff, or set `topKOnly: true` to ignore the threshold and rely solely on the highest-scoring `limit` matches.
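The snippet below sketches how those options might interact, assuming a simple in-memory store and a hypothetical `searchLocalStore` helper; the real implementation lives behind the file search tool and may differ:

```typescript
// Sketch: cosine-similarity search over a local store with limit/threshold/topKOnly options.
type Entry = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchLocalStore(
  query: number[],
  store: Entry[],
  { limit = 10, threshold = 0.3, topKOnly = false } = {}
): Entry[] {
  const scored = store
    .map((entry) => ({ entry, score: cosine(query, entry.embedding) }))
    .sort((a, b) => b.score - a.score);
  // topKOnly ignores the threshold and returns the highest-scoring `limit` matches.
  const kept = topKOnly ? scored : scored.filter((s) => s.score >= threshold);
  return kept.slice(0, limit).map((s) => s.entry);
}
```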
To try out the demo, you can ask questions that will trigger a file search.
Example questions:
- What is the return policy?
- How do I return a product?
- How can I cancel an order?
- What does your company do?
- Do you sell sensors?
When an answer is generated, it will be displayed as a suggested response for the customer support representative. In the agent view, you can edit the message or send it as is. You can toggle Auto reply in the agent view to automatically send the suggested response.
You can also click on the "Relevant articles" to see the corresponding articles in the knowledge base or FAQ.
You can then continue the conversation as the user.
You can ask for help to trigger actions.
Example questions:
- Help me cancel order ORD1001 => Should suggest the `cancel_order` action
- Help me reset my password => Should suggest the `reset_password` action
- Give me a list of my past orders => Should trigger the execution of `get_order_history`
- Ask as the user "How can I cancel my order?"
- Confirm the suggested response
- Ask as the user "Help me cancel order ORD1001"
- Confirm the suggested response
- Confirm the suggested action to cancel the order
- Confirm the suggested response
Note that the functions that are executed are just placeholders and are not actually modifying any data, so the actions will not have any effect. For example, calling `cancel_order` won't change the status of the order.
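For example, a placeholder handler in `config/functions.ts` might look roughly like this (an illustrative sketch, not the file's exact contents):

```typescript
// Sketch: a placeholder that acknowledges the cancellation without mutating any data.
export async function cancel_order({ order_id }: { order_id: string }) {
  // No real side effects: the demo only returns a canned confirmation.
  return { success: true, message: `Order ${order_id} has been cancelled (demo only).` };
}
```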
To customize this demo, you can:
- Edit prompts, initial message and model in `config/constants.ts`
- Edit available functions in `config/tools-list.ts`
- Edit functions logic in `config/functions.ts`
- (optional) Edit the demo data in `config/demoData.ts`
You can also customize the endpoints in the /api folder to call your own backend or external services.
If you want to use this code repository as a starting point for your own project in production, please note that this demo is not production-ready and that you would need to implement safety measures such as input guardrails, user authentication, etc.
This demo can store customer profiles and chat sessions in a local PostgreSQL database using Prisma.
- Add your connection string to `.env`:

  ```bash
  DATABASE_URL="postgresql://<user>:<password>@localhost:5432/<dbname>"
  ```

- Run the migrations to create the tables (including the latest schema updates):

  ```bash
  npx prisma migrate deploy
  ```

- Start a Redis instance (for example, run `redis-server` locally or use Docker: `docker run -p 6379:6379 redis`).

- Configure Redis by setting `REDIS_URL` in your `.env` file:

  ```bash
  REDIS_URL=redis://localhost:6379
  ```
The new API endpoints under `/api/users` and `/api/sessions/start` allow the agent to create or retrieve customer records, manage chat sessions, and store conversation history using this database and Redis. During each turn, `/api/turn_response` persists messages via `saveSessionMessages`, so a separate `/api/sessions/[session_id]/save` call is no longer required.
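As an example of how a client might start a session, a sketch is shown below; the request payload field is an assumption for illustration, so check the route handler for the exact fields:

```typescript
// Sketch: start (or resume) a chat session for a customer via the sessions API.
async function startSession(userId: string) {
  const res = await fetch("/api/sessions/start", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId }), // field name assumed for illustration
  });
  if (!res.ok) throw new Error(`Failed to start session: ${res.status}`);
  return res.json();
}
```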
- Sessions automatically end after 4 minutes of inactivity.
- Run `npm run cleanup:sessions` to remove ended sessions older than the number of days specified in `SESSION_RETENTION_DAYS`.
- By default, sessions are retained for 30 days. Change the value of `SESSION_RETENTION_DAYS` in your `.env` file to adjust the retention period.
Sessions automatically summarize and prune older messages once the number of unsummarized messages exceeds the `MAX_UNSUMMARIZED_MESSAGES` limit (default 50). This keeps active sessions lightweight while preserving a running summary of prior context.
Set the `MAX_UNSUMMARIZED_MESSAGES` environment variable to control how many recent messages are kept verbatim.
To prune existing sessions to the current limit, run:

```bash
npm run prune:sessions
```

The pruning script also updates each user's `longSummary` retroactively. After a session's message log is reduced and its summary is refreshed, that summary is itself summarized and appended to the user's existing long-term summary so historical context is preserved.
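Conceptually, the prune step works along these lines (a simplified sketch with hypothetical helpers, not the actual script):

```typescript
// Sketch: keep the most recent messages verbatim and fold the rest into a running summary.
const MAX_UNSUMMARIZED = Number(process.env.MAX_UNSUMMARIZED_MESSAGES ?? 50);

async function pruneSession(
  messages: { role: string; content: string }[],
  runningSummary: string,
  summarize: (text: string) => Promise<string> // hypothetical LLM summarization helper
) {
  if (messages.length <= MAX_UNSUMMARIZED) return { messages, summary: runningSummary };
  const older = messages.slice(0, messages.length - MAX_UNSUMMARIZED);
  const recent = messages.slice(-MAX_UNSUMMARIZED);
  // Summarize the pruned messages and append to the existing long-term summary.
  const delta = await summarize(older.map((m) => `${m.role}: ${m.content}`).join("\n"));
  return { messages: recent, summary: `${runningSummary}\n${delta}`.trim() };
}
```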
You are welcome to open issues or submit PRs to improve this app; however, please note that we may not review all suggestions.
This project is licensed under the MIT License. See the LICENSE file for details.
