Customer Support Agent with Human in the Loop Demo

This repository contains a NextJS demo app of a customer service use case with a human in the loop (HITL), built on top of the Responses API. It leverages the built-in file search tool and implements two views of a chat interface: one for the customer and one for the human agent.

This demo illustrates an example flow in which a human agent is assisted by an AI agent to answer customer questions while staying in control of sensitive actions.

Features:

  • Multi-turn conversation handling
  • File search tool
  • Vector store creation & file upload for use with the file search
  • Knowledge base display
  • Function calling
  • Streaming suggested responses
  • Suggested actions to execute tool calls
  • Auto-execution of tool calls for non-sensitive actions
  • Optional auto reply mode to automatically send suggested messages
  • Filters out irrelevant questions and jailbreaking attempts
  • Idle sessions auto-close after 4 minutes, sending unsaved messages and marking the session as ended
  • Works with either the openai or ollama provider; the built-in tools operate the same with both.

Feel free to customize this demo to suit your specific use case.

Project structure

  • app/ for Next.js routes & API handlers
  • components/ for UI components
  • stores/ for state management (e.g., useConversationStore)
  • scripts/ for maintenance tasks like cleanupSessions.ts
  • prisma/ for schema & migrations, etc.

Docker Quickstart

Copy .env.ai, adjust credentials or port mappings, and add your AGENT_TOKENS mapping. This file is used when running the container.

The easiest way to run the entire stack with the baked-in port mappings is:

docker compose -f docker-compose.agent.yml up --build

The steps below show how to build and run each container manually.

  1. Build the image

    docker build -t ai-agent .
  2. Create a network and start PostgreSQL and Redis on free ports

    docker network create support-net
    docker run --rm -d --name demo-db --network support-net -e POSTGRES_PASSWORD=postgres -p 5433:5432 postgres
    docker run --rm -d --name demo-redis --network support-net -p 6380:6379 redis
  3. Run the AI agent

    docker run --rm -p 3001:3001 --network support-net --env-file .env.ai \
      -e DATABASE_URL=postgresql://postgres:postgres@demo-db:5432/support_agent_demo \
      -e REDIS_URL=redis://demo-redis:6379 ai-agent

Troubleshooting

  • P1001: Can't reach database server – verify container names/ports and ensure the support-net network exists.
  • P1000: Authentication failed – check the database credentials in .env.ai.
  • network support-net not found – create the network first: docker network create support-net.
  • port already allocated – choose unused host ports or stop conflicting services.

Getting Started

  1. Install dependencies:

    npm install
  2. Configure environment:

    Create a .env file with your database connection string, Redis URL, session retention setting, and (optionally) Chatwoot credentials:

    DATABASE_URL="postgresql://<user>:<password>@localhost:5432/<dbname>"
    REDIS_URL=redis://localhost:6379
    SESSION_RETENTION_DAYS=30 # How many days to retain ended sessions
    CHATWOOT_URL=https://e245e7cb03bc.ngrok-free.app
    CHATWOOT_APP_TOKEN=<chatwoot-app-token>
    CHATWOOT_BOT_TOKEN=<bot access token>
    AGENT_TOKENS='{"1":"agent-1-secret","2":"agent-2-secret"}'

When running this service inside Docker, copy the same `AGENT_TOKENS` mapping into `.env.ai`, and make sure `CHATWOOT_URL` is a fully qualified address reachable from the container (for example, a public ngrok or Cloudflare Tunnel URL) so that the agent can resolve the Chatwoot instance.

If Redis runs on a dynamically mapped port (e.g. `docker port` or `docker compose port`), first determine the host port and then point `REDIS_URL` to it:

```bash
docker compose port redis 6379
# or: docker port <container_name> 6379
# Suppose it prints 0.0.0.0:49153
REDIS_URL=redis://localhost:49153
```
SESSION_RETENTION_DAYS controls how long ended sessions are kept before cleanup (defaults to 30 days). Session messages cached in Redis are retained until the session is cleaned up. The Chatwoot variables configure the webhook endpoints that send automated replies and handle status changes in your Chatwoot instance. Create two webhooks in Chatwoot:

  • http://ai-agent:3001/api/chatwoot-webhook, subscribed only to message_created.
  • http://ai-agent:3001/api/chatwoot-status-webhook, subscribed to conversation_status_changed, conversation_updated, and message_created (system events).

Release retries for these endpoints are controlled by `RELEASE_MAX_ATTEMPTS`, `RELEASE_RETRY_BASE_MS`, and `CHATWOOT_TIMEOUT_MS`.

**Attachment handling.** Incoming Chatwoot messages may include attachments on `message.attachments` or inside `content_attributes`.
The webhook normalizes these entries before building the provider request:

- If the configured model supports vision (e.g. `gpt-4o`, `gpt-4.1`, or any model listed via `CHATWOOT_WEBHOOK_MODEL` that includes the substrings `4o`, `4.1`, `o1`, `o3`, or `omni`), image attachments are forwarded to the provider as `input_image` parts. URLs or embedded data URIs are passed through so the model can inspect the image content directly.
- For other models, the webhook appends a textual note that lists each attachment, its MIME type when available, and its download URL or indicates when only base64 data is present. This ensures agents are aware of additional context even when the model cannot open the file itself.
- When an inbound message only includes image attachments (no textual content), the webhook skips the relevance guardrail so customers do not receive unnecessary clarification prompts. After deploying, monitor your logs for the `Skipping relevance guardrail for image-only attachment message` entry to confirm the behavior is active.
- Whenever an image is present, the webhook now routes the attachment through a lightweight image-understanding pass (default model `CHATWOOT_IMAGE_MODEL` or `gpt-4.1-mini`). The analysis produces catalog-ready descriptors, suggested queries, and top matches from the knowledge base so the assistant can contrast alternatives without asking the customer for more detail. The generated summary is appended to the user turn and a developer message shares the ranked matches with the model.
- Tune the helper with:
  - `CHATWOOT_IMAGE_MODEL` to select the OpenAI vision model used for analysis.
  - `CHATWOOT_IMAGE_SEARCH_PROVIDER` to override the provider used when searching the knowledge base (defaults to the webhook provider).
  - `CHATWOOT_IMAGE_KB_LIMIT` to cap the number of knowledge base snippets injected into the prompt (defaults to 3).
- Image-related logs now include an `Image insight queries` entry showing the normalized search terms used to interrogate the knowledge base. Track this log to validate catalog coverage and iterate on tagging when matches are sparse.

You can override the attachment detection model by setting `CHATWOOT_WEBHOOK_MODEL`; otherwise the default falls back to the global `MODEL` configured for the provider.
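
For illustration, here is a minimal TypeScript sketch of the vision check and attachment handling described above. The type, field, and function names are hypothetical and do not mirror the repository's actual implementation.

```typescript
// Hypothetical sketch: decide whether image attachments go to the model as
// input_image parts or as a textual note, using the substring check above.
type ChatwootAttachment = {
  file_type?: string; // e.g. "image"
  content_type?: string; // MIME type, when Chatwoot provides it
  data_url?: string; // download URL or embedded data URI
};

const VISION_HINTS = ["4o", "4.1", "o1", "o3", "omni"];

function modelSupportsVision(model: string): boolean {
  return VISION_HINTS.some((hint) => model.includes(hint));
}

function toProviderParts(model: string, attachments: ChatwootAttachment[]) {
  if (modelSupportsVision(model)) {
    // Vision-capable models receive image attachments directly.
    return attachments
      .filter((a) => a.file_type === "image" && a.data_url)
      .map((a) => ({ type: "input_image", image_url: a.data_url! }));
  }
  // Other models get a textual note listing each attachment instead.
  const note = attachments
    .map((a) => `- ${a.content_type ?? "unknown type"}: ${a.data_url ?? "base64 data only"}`)
    .join("\n");
  return [{ type: "input_text", text: `Attachments (not opened by the model):\n${note}` }];
}
```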

The `AGENT_TOKENS` environment variable supplies per-agent access tokens in JSON form, e.g. `{ "1": "secret" }`. Store these secrets outside of source control (such as a `.env` file or your hosting platform's secret manager). To rotate a token, update the JSON with the new value and redeploy or restart the service so it reads the updated mapping.
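
As a rough sketch of how such a mapping could be parsed and checked (the helper names below are illustrative, not the repository's actual code):

```typescript
// Illustrative only: parse the AGENT_TOKENS JSON and verify a presented token.
function loadAgentTokens(): Record<string, string> {
  const raw = process.env.AGENT_TOKENS ?? "{}";
  try {
    return JSON.parse(raw) as Record<string, string>;
  } catch {
    throw new Error('AGENT_TOKENS must be valid JSON, e.g. {"1":"secret"}');
  }
}

function isValidAgentToken(agentId: string, token: string): boolean {
  const tokens = loadAgentTokens();
  return tokens[agentId] !== undefined && tokens[agentId] === token;
}
```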

#### Release retry settings
  • RELEASE_MAX_ATTEMPTS (default 5) – number of retry attempts when releasing an agent fails.
  • RELEASE_RETRY_BASE_MS (default 1000) – base delay for exponential backoff.
  • CHATWOOT_TIMEOUT_MS (default 15000) – API request timeout before retries.
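
As a hedged illustration, the three settings could combine into an exponential backoff loop along these lines (the structure is a sketch, not the repository's actual retry code):

```typescript
// Illustrative retry loop built from the three settings described above.
const MAX_ATTEMPTS = Number(process.env.RELEASE_MAX_ATTEMPTS ?? 5);
const BASE_MS = Number(process.env.RELEASE_RETRY_BASE_MS ?? 1000);
const TIMEOUT_MS = Number(process.env.CHATWOOT_TIMEOUT_MS ?? 15000);

async function withRetries<T>(fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
    try {
      return await fn(controller.signal); // e.g. a Chatwoot API request
    } catch (err) {
      if (attempt === MAX_ATTEMPTS) throw err;
      // Exponential backoff: base * 2^(attempt - 1)
      await new Promise((resolve) => setTimeout(resolve, BASE_MS * 2 ** (attempt - 1)));
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("unreachable");
}
```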

A standalone Docker setup is available to test the service in isolation:

docker compose -f docker-compose.agent.yml up --build

This starts the AI agent on http://localhost:3001 along with PostgreSQL, Redis, and Ollama.

  3. Run database migrations:

    npx prisma generate
    npx prisma migrate deploy
  4. Start Redis and the app:

    Start a Redis server (e.g. redis-server or docker run -p 6379:6379 redis) and then run:

    npm run dev
  5. Ticket IDs in chats:

    When a customer does not share an email, the create_ticket tool generates a ticket ID in the format #<index>/<date> (for example #1/2024-05-01). The ID is stored with the chat session so the conversation can be resumed later using that ticket number.
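
As a hedged sketch of that ticket ID format (how the real create_ticket tool tracks the running index may differ):

```typescript
// Illustrative only: build a ticket ID like "#1/2024-05-01" from a running
// index and today's date.
function buildTicketId(index: number, date: Date = new Date()): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `#${index}/${day}`;
}

// buildTicketId(1, new Date("2024-05-01")) === "#1/2024-05-01"
```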

Agent release workflow

Releases are triggered only when a conversation moves from open to either pending or resolved and still carries the agent-assigned label. The chatwoot-status-webhook checks for this combination before calling releaseAgent. Inside releaseAgent, the label is removed, which generates additional webhook events to handle follow-up processing.

Retry behavior for this release process is governed by the RELEASE_MAX_ATTEMPTS, RELEASE_RETRY_BASE_MS, and CHATWOOT_TIMEOUT_MS environment variables.
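
For illustration, a minimal sketch of the status/label check described above, with hypothetical field names for the webhook payload:

```typescript
// Illustrative only: release the agent when a conversation leaves "open" for
// "pending" or "resolved" while still carrying the agent-assigned label.
type StatusChangePayload = {
  previous_status?: string;
  status?: string;
  labels?: string[];
};

function shouldReleaseAgent(payload: StatusChangePayload, agentLabel: string): boolean {
  const leftOpen = payload.previous_status === "open";
  const movedToPendingOrResolved =
    payload.status === "pending" || payload.status === "resolved";
  const stillLabeled = (payload.labels ?? []).includes(agentLabel);
  return leftOpen && movedToPendingOrResolved && stillLabeled;
}
```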

How to use

  1. Set up the OpenAI API:

    If you're new to the OpenAI API, sign up for an account on the OpenAI platform.

  2. Clone the Repository:

    git clone https://github.com/openai/openai-support-agent-demo.git
  3. Set the OpenAI API key:

    Two options:

    • Set the OPENAI_API_KEY environment variable globally in your system
    • Set the OPENAI_API_KEY environment variable in the project: Create a .env file at the root of the project and add the following line (see .env.example for reference):
    OPENAI_API_KEY=<your_api_key>

    Note: File search uses the OpenAI vector store when the provider is set to openai. When using the ollama provider, search falls back to a local vector store built from the knowledge base using Ollama's embeddings endpoint. You can keep both stores initialized and switch providers at any time.

  4. Choose your provider (optional):

    The assistant can run using the openai API, the ollama package, or Ollama's OpenAI compatible endpoint. You can switch providers from the dropdown next to Auto reply in the agent view. When selecting either ollama or ollama-openai, make sure you have an Ollama server running locally (e.g. by executing ollama serve). The built-in tools work the same with all providers.

Running Ollama

When using the ollama provider you need a local server running.

  1. Start the server:

    ollama serve
  2. Begin with a lightweight model such as llama3:

    ollama run llama3

    The command downloads the model if needed. Use ollama run <model> or ollama pull <model> to get other models.

  3. After installing multiple models, switch between them using the Model dropdown in the agent view. Ensure the provider dropdown is set to ollama.

  4. (Optional) Configure the default model and context size by adding the following variables to your .env file (see .env.example):

    OLLAMA_MODEL=llama3.2
    OLLAMA_NUM_CTX=32768
    OLLAMA_HOST=http://localhost:11434

    To enable the OpenAI compatible endpoint, also set:

    OLLAMA_OPENAI_BASE_URL=http://localhost:11434/v1
    OLLAMA_OPENAI_API_KEY=ollama
  5. Install dependencies:

    Run in the project root:

    npm install

    You must run this command before executing npm run lint or npm run dev.

  6. Run the app:

    npm run dev

    The app will be available at http://localhost:3000.

  7. Initialize the vector store:

    Visit /init_vs where you can create both OpenAI and Ollama vector stores.

    • Initialize OpenAI vector store: click the OpenAI button. Copy the returned vector store ID and paste it into config/constants.ts as VECTOR_STORE_ID.
    • Initialize Ollama vector store: click the Ollama button to generate embeddings locally. This stores the embeddings in the data/local_vector_store.json file.
    • Rebuild Ollama vector store: click the Rebuild Ollama button on the same page to regenerate embeddings after updating the knowledge base (this calls /api/local_vector_store/init?force=true).

The OpenAI vector store will be used when the provider is set to openai while the local store powers file search when using the ollama provider. Both stores can exist side by side.

By default, searches return up to 10 results. Local search applies a cosine similarity threshold of 0.3. Provide a limit option to control the number of results, a threshold option to adjust the cutoff, or set topKOnly: true to ignore the threshold and rely solely on the highest scoring limit matches.
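
For example, here is a hedged usage sketch of those options against a hypothetical searchLocalVectorStore helper (the actual function name and signature in the codebase may differ):

```typescript
// Illustrative call shapes for the search options described above.
// searchLocalVectorStore is a hypothetical stand-in for the local file search helper.
declare function searchLocalVectorStore(
  query: string,
  options?: { limit?: number; threshold?: number; topKOnly?: boolean }
): Promise<{ text: string; score: number }[]>;

// Default behavior: up to 10 results above the 0.3 cosine similarity threshold.
await searchLocalVectorStore("return policy");

// Fewer results with a stricter cutoff.
await searchLocalVectorStore("return policy", { limit: 5, threshold: 0.5 });

// Ignore the threshold and keep only the highest scoring matches.
await searchLocalVectorStore("return policy", { limit: 3, topKOnly: true });
```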

Demo Flow

To try out the demo, you can ask questions that will trigger a file search.

Example questions:

  • What is the return policy?
  • How do I return a product?
  • How can I cancel an order?
  • What does your company do?
  • Do you sell sensors?

When an answer is generated, it will be displayed as a suggested response for the customer support representative. In the agent view, you can edit the message or send it as is. You can toggle Auto reply in the agent view to automatically send the suggested response.

You can also click on "Relevant articles" to see the corresponding articles in the knowledge base or FAQ.

You can then continue the conversation as the user.

You can ask for help to trigger actions.

Example questions:

  • Help me cancel order ORD1001 => Should suggest the cancel_order action
  • Help me reset my password => Should suggest the reset_password action
  • Give me a list of my past orders => Should trigger the execution of get_order_history

End-to-end demo flow

  1. Ask as the user "How can I cancel my order?"
  2. Confirm the suggested response
  3. Ask as the user "Help me cancel order ORD1001"
  4. Confirm the suggested response
  5. Confirm the suggested action to cancel the order
  6. Confirm the suggested response

Limitations

Note that the functions that are executed are just placeholders and are not actually modifying any data, so the actions will not have any effect. For example, calling cancel_order won't change the status of the order.

Customization

To customize this demo you can:

  • Edit prompts, initial message and model in config/constants.ts
  • Edit available functions in config/tools-list.ts
  • Edit functions logic in config/functions.ts
  • (optional) Edit the demo data in config/demoData.ts

You can also customize the endpoints in the /api folder to call your own backend or external services.

If you want to use this code repository as a starting point for your own project in production, please note that this demo is not production-ready and that you would need to implement safety measures such as input guardrails, user authentication, etc.

Local database

This demo can store customer profiles and chat sessions in a local PostgreSQL database using Prisma.

  1. Add your connection string to .env:

    DATABASE_URL="postgresql://<user>:<password>@localhost:5432/<dbname>"
  2. Run the migrations to create the tables (including the latest schema updates):

    npx prisma migrate deploy
  3. Start a Redis instance (for example, run redis-server locally or use Docker docker run -p 6379:6379 redis).

  4. Configure Redis by setting REDIS_URL in your .env file:

    REDIS_URL=redis://localhost:6379

The new API endpoints under /api/users and /api/sessions/start allow the agent to create or retrieve customer records, manage chat sessions, and store conversation history using this database and Redis. During each turn, /api/turn_response persists messages via saveSessionMessages, so a separate /api/sessions/[session_id]/save call is no longer required.
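
For illustration, the kind of client-side calls these endpoints support might look like the following; the request and response fields shown are assumptions for the sketch, not the documented contract:

```typescript
// Illustrative only: create (or look up) a customer, then start a chat session.
async function startSupportSession(email: string) {
  const userRes = await fetch("/api/users", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email }),
  });
  const user = await userRes.json();

  const sessionRes = await fetch("/api/sessions/start", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId: user.id }),
  });
  return sessionRes.json(); // e.g. { sessionId: "..." }
}
```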

Session lifecycle & cleanup

  • Sessions automatically end after 4 minutes of inactivity.
  • Run npm run cleanup:sessions to remove ended sessions older than the number of days specified in SESSION_RETENTION_DAYS.
  • By default, sessions are retained for 30 days. Change the value of SESSION_RETENTION_DAYS in your .env file to adjust the retention period.
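
As a hedged sketch of the retention query that cleanupSessions.ts might run (the Prisma model and field names here, session and endedAt, are assumptions rather than the actual schema):

```typescript
// Illustrative only: delete ended sessions older than SESSION_RETENTION_DAYS.
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();
const retentionDays = Number(process.env.SESSION_RETENTION_DAYS ?? 30);
const cutoff = new Date(Date.now() - retentionDays * 24 * 60 * 60 * 1000);

// "session" and "endedAt" are placeholder names for this sketch.
await prisma.session.deleteMany({
  where: { endedAt: { lt: cutoff } },
});
```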

Session summarization & pruning

Sessions automatically summarize and prune older messages once the number of unsummarized messages exceeds the MAX_UNSUMMARIZED_MESSAGES limit (default 50). This keeps active sessions lightweight while preserving a running summary of prior context.

Set the MAX_UNSUMMARIZED_MESSAGES environment variable to control how many recent messages are kept verbatim.

To prune existing sessions to the current limit, run:

npm run prune:sessions

The pruning script also updates each user's longSummary retroactively. After a session's message log is reduced and its summary is refreshed, that summary is itself summarized and appended to the user's existing long-term summary so historical context is preserved.
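
A minimal sketch of the summarize-and-prune trigger described above, with a hypothetical summarize helper standing in for the model call:

```typescript
// Illustrative only: once unsummarized messages exceed the limit, fold the
// oldest ones into the running summary and keep only the most recent verbatim.
const MAX_UNSUMMARIZED = Number(process.env.MAX_UNSUMMARIZED_MESSAGES ?? 50);

type Message = { role: string; content: string };

async function maybePrune(
  summary: string,
  messages: Message[],
  summarize: (current: string, older: Message[]) => Promise<string>
): Promise<{ summary: string; messages: Message[] }> {
  if (messages.length <= MAX_UNSUMMARIZED) return { summary, messages };
  const older = messages.slice(0, messages.length - MAX_UNSUMMARIZED);
  const recent = messages.slice(-MAX_UNSUMMARIZED);
  return { summary: await summarize(summary, older), messages: recent };
}
```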

Contributing

You are welcome to open issues or submit PRs to improve this app; however, please note that we may not review all suggestions.

License

This project is licensed under the MIT License. See the LICENSE file for details.
