# Wizard of Oz Interface for "Fostering Self-Exploration for Mental Health: User Experiences with Content-Based Active Listening Voice Agents"

This repository contains the source code for the Wizard of Oz (WoZ) web interface used in our research paper, currently in submission.
## Overview

The project investigates how content-based active listening (specifically, summarizing and paraphrasing a user's free-form speech) can foster self-exploration in a mental health context. This interface allowed a human "wizard" (the researcher) to simulate an active listening voice agent in real time.

Our study explores how a voice agent can move beyond generic backchannel replies ("um-hmm", "that's good") to provide empathetic responses grounded in the content of what a user says.
To achieve this in a controlled lab setting, we used this WoZ application. It allowed a researcher to:
- Generate summarization and paraphrasing responses by typing them in real-time.
- Select pre-written conversational prompts from a script.
- Trigger audio playback for guided meditation.
This tool was central to answering our research questions on how users perceive and are affected by content-based empathetic responses from a voice agent.
## Tech Stack

- Framework: Next.js
- UI Library: React
- TTS Service: Naver CLOVA Voice (via a backend API route)
## Getting Started

Follow these instructions to get a local copy up and running.

### Prerequisites
- Node.js (v18 or later)
- npm or yarn
- API credentials for Naver CLOVA Voice (Client ID and Client Secret)
### Installation

1. Install NPM packages:

   ```bash
   npm install
   ```
2. Set up environment variables: create a new file named `.env.local` in the root of the project and add your Naver CLOVA API credentials:

   ```
   CLOVA_CLIENT_ID=your_client_id_here
   CLOVA_CLIENT_SECRET=your_client_secret_here
   ```

3. Run the development server:

   ```bash
   npm run dev
   ```

   Open http://localhost:3000 in your browser to see the interface.
## How It Works

The application logic is contained primarily in `src/app/page.js`.
- **UI Components:**
  - `MessageSelector`: renders the tabs and buttons for pre-defined scripts, which are loaded from `src/app/assets/scenario.json`. It also includes a text input where the wizard typed the custom summarization and paraphrasing responses discussed in the paper.
  - `Queue`: displays the list of messages waiting to be spoken by the agent.
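For illustration, a scenario file consistent with the conventions described here might look like the sketch below. The key names and structure are hypothetical; the actual schema of `scenario.json` is not shown in this repository description. The `#(...)` entry uses the sound-file reference format covered under Sound File Playback.

```json
{
  "tabs": [
    {
      "name": "Opening",
      "messages": [
        "Hello, I'm here to listen. How are you feeling today?",
        "Take your time. There's no rush."
      ]
    },
    {
      "name": "Meditation",
      "messages": [
        "Let's begin a short guided meditation.",
        "#(meditation1.wav)"
      ]
    }
  ]
}
```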
- **State Management:**
  - A React state array `msgs` holds the queue of messages (as strings) to be played.
Audio Playback:
- An HTML
<audio>element is used to play the generated speech or pre-recorded sound files. - When the wizard selects a message, it is added to the
msgsqueue. - The
onEndedevent of the audio player triggers the next message in the queue to be played, ensuring sequential playback.
- An HTML
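The enqueue-then-chain behavior above can be sketched outside React as a small state machine. The names here (`createPlaybackQueue`, `enqueue`, `pending`) are illustrative, not the repository's actual helpers; in the real app, `msgs` is React state and `playNext` runs in the `<audio>` element's `onEnded` handler.

```javascript
// Minimal sketch of sequential queue playback (illustrative names).
function createPlaybackQueue(play) {
  const msgs = [];      // messages waiting to be spoken
  let playing = false;  // whether the audio element is currently busy

  function playNext() {
    if (msgs.length === 0) {
      playing = false;  // queue drained; go idle
      return;
    }
    playing = true;
    play(msgs.shift()); // hand the next message to the audio player
  }

  return {
    // Wizard selects a message: enqueue it, start playback if idle.
    enqueue(msg) {
      msgs.push(msg);
      if (!playing) playNext();
    },
    // Wire this to <audio onEnded={...}> to chain playback.
    onEnded: playNext,
    get pending() {
      return msgs.length;
    },
  };
}
```

Keeping a single queue plus an `onEnded` callback guarantees messages never overlap, which matters when the wizard queues several responses faster than the agent can speak them.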
- **Text-to-Speech (TTS) Flow:**
  - When a text message needs to be spoken, the `clovaGenerate` function makes a `POST` request to our local API endpoint (`/tts`).
  - The backend route at `/tts/route.js` receives the request, attaches the secret API keys, and forwards the request to the official Naver CLOVA API.
  - It returns the audio data (as an MP3 blob), which the frontend then plays in the `<audio>` element.
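As a hedged sketch, the proxy route might look like the following. The endpoint URL, the `X-NCP-APIGW-API-KEY-ID` / `X-NCP-APIGW-API-KEY` header names, and the `speaker`/`format` body fields are assumptions based on the public CLOVA Voice Premium API, not code taken from this repository; verify them against Naver's current documentation.

```javascript
// Sketch of a src/app/tts/route.js route handler (Next.js App Router).
// CLOVA endpoint, headers, and body fields are assumptions -- check
// the official CLOVA Voice docs before relying on them.
const CLOVA_URL = "https://naveropenapi.apigw.ntruss.com/tts-premium/v1/tts";

// Build the outgoing request server-side so the secret keys never
// reach the client.
function buildClovaRequest(text, env) {
  return {
    url: CLOVA_URL,
    options: {
      method: "POST",
      headers: {
        "X-NCP-APIGW-API-KEY-ID": env.CLOVA_CLIENT_ID,
        "X-NCP-APIGW-API-KEY": env.CLOVA_CLIENT_SECRET,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      // speaker and format values are illustrative placeholders
      body: new URLSearchParams({ speaker: "nara", format: "mp3", text }).toString(),
    },
  };
}

// In route.js this handler would be exported: `export { POST }`.
async function POST(request) {
  const { text } = await request.json();
  const { url, options } = buildClovaRequest(text, process.env);
  const upstream = await fetch(url, options);
  // Relay the MP3 bytes back to the frontend's <audio> element.
  return new Response(upstream.body, {
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```

Routing the call through the backend is what keeps `CLOVA_CLIENT_ID` and `CLOVA_CLIENT_SECRET` out of the browser bundle.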
- **Sound File Playback:**
  - Scripts in `scenario.json` can reference sound files using a special format, e.g., `#(meditation1.wav)`.
  - The `parseMsg` function detects this pattern and plays the corresponding file from the `public/` directory.
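The `#(...)` convention can be detected with a small helper like this sketch. The function name `parseMsg` comes from the description above, but the regex and the returned shape are an assumed reconstruction, not the repository's actual implementation.

```javascript
// Sketch of how parseMsg might distinguish sound-file references
// (e.g. "#(meditation1.wav)") from plain text destined for TTS.
// The regex and return shape are assumptions.
const SOUND_REF = /^#\((.+)\)$/;

function parseMsg(msg) {
  const match = msg.match(SOUND_REF);
  if (match) {
    // Sound file: serve directly from the public/ directory.
    return { type: "sound", src: "/" + match[1] };
  }
  // Plain text: forward to the /tts endpoint for speech synthesis.
  return { type: "tts", text: msg };
}
```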
## Citation

If you use this code or our findings in your own research, please cite our paper:
```bibtex
@inproceedings{Anonymous2025Fostering,
  author    = {Anonymous Author(s)},
  title     = {Fostering Self-Exploration for Mental Health: User Experiences with Content-Based Active Listening Voice Agents},
  year      = {2025},
  booktitle = {},
  publisher = {},
  address   = {},
  pages     = {}
}
```