
Kaist-ICLab/active-listening-voice-agent-woz


Wizard of Oz Interface for "Fostering Self-Exploration for Mental Health: User Experiences with Content-Based Active Listening Voice Agents"

This repository contains the source code for the Wizard of Oz (WoZ) web interface used in our research paper, which is currently under submission.

The project investigates how content-based active listening, specifically the summarization and paraphrasing of a user's free-form speech, can foster self-exploration in a mental health context. This interface enabled a human "wizard" (the researcher) to simulate an active listening voice agent in real time.

About The Project

Our study explores how a voice agent can move beyond generic replies ('um-hmm', 'that's good') to provide empathetic responses based on the content of what a user says.

To achieve this in a controlled lab setting, we used this WoZ application. It allowed a researcher to:

  • Generate summarization and paraphrasing responses by typing them in real-time.
  • Select pre-written conversational prompts from a script.
  • Trigger audio playback for guided meditation.

This tool was central to answering our research questions on how users perceive and are affected by content-based empathetic responses from a voice agent.

Tech Stack

  • Next.js (React) for the web interface and the API route
  • Node.js
  • Naver CLOVA Voice API for text-to-speech

Getting Started

Follow these instructions to get a local copy up and running.

Prerequisites

  • Node.js (v18 or later)
  • npm or yarn
  • API credentials for Naver CLOVA Voice (Client ID and Client Secret)

Installation & Setup

  1. Clone the repository and install NPM packages:

    git clone https://github.com/Kaist-ICLab/active-listening-voice-agent-woz.git
    cd active-listening-voice-agent-woz
    npm install
  2. Set up environment variables: Create a new file named .env.local in the root of the project and add your Naver CLOVA API credentials.

    .env.local

    CLOVA_CLIENT_ID=your_client_id_here
    CLOVA_CLIENT_SECRET=your_client_secret_here
    
  3. Run the development server:

    npm run dev

    Open http://localhost:3000 with your browser to see the interface.

How It Works

The application logic is contained primarily in src/app/page.js.

  1. UI Components:

    • MessageSelector: Renders the tabs and buttons for the pre-defined scripts loaded from src/app/assets/scenario.json, along with a free-text input where the wizard typed the custom summarization and paraphrasing responses discussed in the paper.
    • Queue: Displays the list of messages that are waiting to be spoken by the agent.
  2. State Management:

    • A React state array msgs holds the queue of messages (as strings) to be played.
  3. Audio Playback:

    • An HTML <audio> element is used to play the generated speech or pre-recorded sound files.
    • When the wizard selects a message, it is added to the msgs queue.
    • The onEnded event of the audio player triggers the next message in the queue to be played, ensuring sequential playback.
  4. Text-to-Speech (TTS) Flow:

    • When a text message needs to be spoken, the clovaGenerate function makes a POST request to our local API endpoint (/tts).
    • The backend route at /tts/route.js receives the request, attaches the secret API keys, and forwards the request to the official Naver CLOVA API.
    • It returns the audio data (as an MP3 blob), which the frontend then plays in the <audio> element.
  5. Sound File Playback:

    • Scripts in scenario.json can reference sound files using a special format, e.g., #(meditation1.wav).
    • The parseMsg function detects this pattern and plays the corresponding file from the public/ directory.
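Putting steps 2, 3, and 5 together, the queue and playback logic can be sketched framework-free. This is an illustrative approximation, not the actual code in src/app/page.js: the real app keeps msgs in a React state array and advances on the <audio> element's onEnded event, while the MessageQueue class and its play callback here are invented for clarity. parseMsg's behavior follows the #(file.wav) convention described above.

```javascript
// Detect the special sound-file pattern, e.g. "#(meditation1.wav)".
// Sound files are served from the public/ directory.
function parseMsg(msg) {
  const match = msg.match(/^#\((.+)\)$/);
  return match
    ? { type: "sound", file: "/" + match[1] }
    : { type: "tts", text: msg };
}

// Illustrative stand-in for the React state + <audio> wiring.
class MessageQueue {
  constructor(play) {
    this.msgs = [];       // mirrors the React `msgs` state array
    this.playing = false;
    this.play = play;     // callback that plays one parsed message
  }
  enqueue(msg) {
    this.msgs.push(msg);
    if (!this.playing) this.next(); // start playback if idle
  }
  // In the real app, this runs on the <audio> element's onEnded event,
  // which is what guarantees sequential playback.
  next() {
    const msg = this.msgs.shift();
    if (msg === undefined) {
      this.playing = false;
      return;
    }
    this.playing = true;
    this.play(parseMsg(msg));
  }
}
```

Enqueuing a custom paraphrase followed by "#(meditation1.wav)" would play the TTS response first, then the meditation audio once the agent finishes speaking.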
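The TTS proxy in step 4 can be sketched as a Next.js App Router handler. This is an assumption-laden sketch, not the project's actual route.js: the CLOVA endpoint URL, the X-NCP-APIGW-API-KEY-ID / X-NCP-APIGW-API-KEY header names, and the speaker/format values reflect Naver's public CLOVA Voice documentation and illustrative defaults, and buildClovaRequest is a helper invented here to keep the request construction testable. In the actual route file, buildClovaRequest and POST would be exported.

```javascript
// Assumed CLOVA Premium TTS endpoint; verify against Naver's current docs.
const CLOVA_URL = "https://naveropenapi.apigw.ntruss.com/tts-premium/v1/tts";

// Build the proxied request as a plain object (pure, hence easy to test).
function buildClovaRequest(text, clientId, clientSecret) {
  return {
    url: CLOVA_URL,
    method: "POST",
    headers: {
      "X-NCP-APIGW-API-KEY-ID": clientId,
      "X-NCP-APIGW-API-KEY": clientSecret,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    // speaker and format are illustrative defaults, not the study's settings
    body: new URLSearchParams({ speaker: "nara", text, format: "mp3" }).toString(),
  };
}

// Route handler: attach the secret keys server-side and forward to CLOVA,
// so the credentials in .env.local never reach the browser.
async function POST(request) {
  const { text } = await request.json();
  const req = buildClovaRequest(
    text,
    process.env.CLOVA_CLIENT_ID,
    process.env.CLOVA_CLIENT_SECRET
  );
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  // Return the MP3 bytes; the frontend plays them in the <audio> element.
  return new Response(await res.arrayBuffer(), {
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```

Keeping the key-attaching logic in a server route is what lets the frontend call /tts without ever seeing the CLOVA credentials.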

Citing Our Work

If you use this code or our findings in your own research, please cite our paper:

@inproceedings{Anonymous2025Fostering,
  author = {Anonymous Author(s)},
  title = {Fostering Self-Exploration for Mental Health: User Experiences with Content-Based Active Listening Voice Agents},
  year = {2025},
  booktitle = {},
  publisher = {},
  address = {},
  pages = {}
}
