Commit 777af55 by Mohtashim: Add Voice Assistant README (2 files changed, +203 -1 lines)
# EVAT Voice Assistant – Trimester 3

**Author:** Mohtashim Misbah
**Date:** Week 5

---

## Table of Contents
- [1. Project Overview](#1-project-overview)
- [2. Voice Interaction Architecture](#2-voice-interaction-architecture)
- [3. In-Scope Functionality](#3-in-scope-functionality)
- [4. Out-of-Scope Functionality](#4-out-of-scope-functionality)
- [5. Deliverables](#5-deliverables)
- [6. Justification for Scope](#6-justification-for-scope)

---
## 1. Project Overview

In this trimester, I am working on building the EVAT Voice Assistant. The goal of this system is to help users access EV-related information more easily by asking simple, natural-language questions. Instead of navigating dashboards or searching through menus, users should be able to ask things like *“Is the charger busy?”* or *“How much would a 150 km EV trip cost?”* and get a quick answer.

To keep the scope realistic, I decided to make the backend fully **text-based**, while the frontend team can optionally add voice input and output. This way, the backend stays focused on the core intelligence of the assistant: intent detection, entity extraction, connecting to the EVAT models, and generating responses. This structure also mirrors how real voice assistants like Alexa or Google Assistant separate speech handling from their backend systems.

This project is meant to create a strong foundation that future trimesters can expand into a full multi-turn conversational assistant.

---
28+
## 2. Voice Interaction Architecture
29+
30+
Even though this is called a “Voice Assistant,” the backend won’t process any audio. Instead, the workflow is simple and clean:
31+
32+
1. **User speaks into the mobile or web app.**
33+
The app records the audio.
34+
35+
2. **Frontend converts the speech to text.**
36+
They can use tools like:
37+
- Web Speech API
38+
- Whisper
39+
- Google Speech-to-Text
40+
- Any STT the app team chooses
41+
42+
3. **The backend receives the text version of the query.**
43+
Example: “How busy is the Burwood charger?”
44+
45+
Example request structure (placeholder):
46+
47+
```json
48+
{
49+
"query": "<user text query>"
50+
}
51+
```
52+
53+
4. **I process the text on the backend:**
54+
- Detect the user’s intent
55+
- Extract important information (like location or distance)
56+
- Pass it to the correct EVAT use case
57+
- Generate a clear and helpful response
58+
59+
5. **The frontend displays the text response** or turns it back into audio if needed.
60+
61+
This architecture keeps things modular and avoids overcomplicating the backend. It also makes the system easier to test and scale later.
62+
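Steps 3–5 above could be sketched as a single framework-agnostic handler. Everything here is illustrative: the function name, the keyword checks, and the reply strings are assumptions, not the real implementation, and in the finished system this logic would sit behind the `/voice/query` HTTP route.

```python
# Minimal sketch of the text-in/text-out backend contract described above.
# All names and reply strings are illustrative assumptions.

def handle_voice_query(payload: dict) -> dict:
    """Process one text query: detect intent, dispatch, and reply."""
    query = (payload.get("query") or "").strip()
    if not query:
        return {"response": "Sorry, I didn't catch that. Please try again."}

    q = query.lower()
    # Placeholder intent detection; a real classifier would go here.
    if "busy" in q or "congest" in q:
        response = "Charger congestion lookup is not wired up in this sketch."
    elif "cost" in q or "trip" in q:
        response = "Trip cost comparison is not wired up in this sketch."
    else:
        response = "I can answer charger congestion and trip cost questions."
    return {"response": response}
```

Because the handler takes and returns plain dicts, it can be tested without any web framework and then mounted on whichever server the team chooses.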
---

## 3. In-Scope Functionality

### 3.1 Intent Classification
I will build a simple intent classifier that can recognise the main types of questions users might ask. For now, I am focusing on three core intents:

- **Congestion Status Query**
  Example: “Is the Burwood charger busy right now?”

- **Trip Cost Comparison**
  Example: “How much would a 150 km EV trip cost?”

- **Help / Unsupported Query**
  Example: “What can you do?”

These cover the most important and realistic use cases for this trimester.

---
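A first pass at these three intents could be a plain keyword matcher. The intent labels and trigger words below are assumptions for illustration; a trained classifier could replace this later without changing the interface.

```python
# Hedged sketch of a rule-based classifier for the three intents above.
# Labels and keywords are illustrative assumptions.

CONGESTION = "congestion_status"
TRIP_COST = "trip_cost_comparison"
HELP = "help_unsupported"

def classify_intent(query: str) -> str:
    """Map a user query to one of the three supported intents."""
    q = query.lower()
    if any(w in q for w in ("busy", "congestion", "wait", "queue")):
        return CONGESTION
    if any(w in q for w in ("cost", "cheaper", "price", "petrol")):
        return TRIP_COST
    return HELP  # anything unrecognised falls back to help/unsupported
```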
### 3.2 Entity Extraction
The system will pull out key information from user queries, such as:

- Location names
- Distances
- Any values the models need to run

This helps the assistant generate more accurate and personalised responses.

---
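One lightweight way to pull these values out is a regular expression for distances plus a lookup against a known location list. The patterns and the hypothetical `KNOWN_LOCATIONS` set below are placeholders, not the final design.

```python
import re

# Illustrative sketch of simple entity extraction for the items above.
# Patterns and the location list are assumptions.

KNOWN_LOCATIONS = {"burwood", "geelong", "box hill"}  # hypothetical list

def extract_entities(query: str) -> dict:
    """Pull distances (in km) and known location names out of a query."""
    entities = {}
    m = re.search(r"(\d+(?:\.\d+)?)\s*km\b", query, re.IGNORECASE)
    if m:
        entities["distance_km"] = float(m.group(1))
    q = query.lower()
    for loc in KNOWN_LOCATIONS:
        if loc in q:
            entities["location"] = loc.title()
            break
    return entities
```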
### 3.3 Integration with EVAT Use Cases
I will integrate the Voice Assistant with two EVAT models that are already well-developed:

#### 1. Congestion Prediction
- Provides charger busyness
- Gives estimated wait times
- Helps users plan ahead

#### 2. EV vs Petrol Trip Cost Comparison
- Calculates EV trip cost
- Calculates petrol cost
- Helps users compare both options quickly

These use cases are mature enough to work reliably with natural-language inputs.

---
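The hand-off from a classified intent to the matching use case could be a small dispatch table. The two handlers below are stubs standing in for the real congestion-prediction and trip-cost models; every name here is an assumption for illustration.

```python
# Sketch of dispatching a classified intent to the matching EVAT use case.
# Handlers are stubs; the real model calls would replace the comments.

def handle_congestion(entities: dict) -> str:
    loc = entities.get("location", "that charger")
    # A real handler would call the congestion-prediction model here.
    return f"Checking how busy {loc} is..."

def handle_trip_cost(entities: dict) -> str:
    km = entities.get("distance_km", 0)
    # A real handler would call the EV-vs-petrol cost model here.
    return f"Comparing EV and petrol costs for a {km} km trip..."

def handle_help(entities: dict) -> str:
    return "I can report charger congestion and compare EV vs petrol trip costs."

HANDLERS = {
    "congestion_status": handle_congestion,
    "trip_cost_comparison": handle_trip_cost,
    "help_unsupported": handle_help,
}

def dispatch(intent: str, entities: dict) -> str:
    """Route an intent to its handler, falling back to help."""
    return HANDLERS.get(intent, handle_help)(entities)
```

Keeping the mapping in one table means a future trimester can add a use case by registering one more handler.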
### 3.4 Response Generation
The assistant will return short, easy-to-understand answers.
Examples:

- “The Burwood charger is moderately busy with a 5-minute wait.”
- “A 150 km trip would cost about $18 for EV and around $24 for petrol.”

My goal is to keep the responses simple and practical.

---
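Responses in this style can come from simple templates, which keeps the wording consistent across queries. The template strings below just reproduce the two example sentences with placeholders; the field names are assumptions.

```python
# Template-based response generation matching the examples above.
# Templates and field names are illustrative assumptions.

CONGESTION_TEMPLATE = "The {location} charger is {level} with a {wait}-minute wait."
COST_TEMPLATE = (
    "A {km:g} km trip would cost about ${ev:.0f} for EV "
    "and around ${petrol:.0f} for petrol."
)

def render_congestion(location: str, level: str, wait: int) -> str:
    return CONGESTION_TEMPLATE.format(location=location, level=level, wait=wait)

def render_cost(km: float, ev: float, petrol: float) -> str:
    return COST_TEMPLATE.format(km=km, ev=ev, petrol=petrol)
```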
### 3.5 Logging
To meet HD-level requirements, I will also implement logging for:

- Inputs
- Detected intents
- Extracted entities
- System responses

This will help evaluate the performance in Week 8 and identify areas for improvement.

---
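One way to capture all four logged items is an append-only JSON Lines file, so the Week 8 evaluation can replay every interaction. The field names and file path below are assumptions, not the final schema.

```python
import json
import time

# Hedged sketch of interaction logging as JSON Lines.
# Field names and the log path are illustrative assumptions.

LOG_PATH = "voice_assistant_log.jsonl"  # hypothetical location

def build_log_record(query: str, intent: str, entities: dict,
                     response: str) -> dict:
    """Collect the four logged items (plus a timestamp) into one record."""
    return {
        "timestamp": time.time(),
        "query": query,
        "intent": intent,
        "entities": entities,
        "response": response,
    }

def log_interaction(query: str, intent: str, entities: dict, response: str,
                    path: str = LOG_PATH) -> dict:
    """Append one interaction as a single JSON line and return the record."""
    record = build_log_record(query, intent, entities, response)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```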
## 4. Out-of-Scope Functionality

### 4.1 Additional EVAT Use Cases
At this stage, I am **not** integrating with the other EVAT use cases because their models aren’t fully ready or they require more complex logic. Examples include:

- Environmental impact
- Gamification
- Charger rental
- Usage insights
- Weather-based routing
- Reliability scoring
- Site suitability
- Demand forecasting

These can be added in future trimesters.

---

### 4.2 Audio Processing
The backend will not:
- Process audio
- Handle speech recognition
- Convert text to speech

All of this is handled by the frontend team.

---

### 4.3 Multi-Turn Conversations
The assistant will not remember previous queries or support follow-up questions.
Each query is treated independently.

---

### 4.4 UI Development
UI work is outside my scope.
The frontend team will manage:
- Input boxes
- Voice buttons
- Displaying responses

---
## 5. Deliverables

By Week 10, I plan to deliver:

- A working Voice Assistant backend
- Intent classifier + entity extraction
- Integrated handlers for congestion and cost comparison
- A `/voice/query` production API
- Logging + evaluation results
- Full documentation (architecture, mapping, API details)
- A short demo for the mentor/panel

---
## 6. Justification for Scope

I chose this scope because it is achievable within the trimester, avoids unnecessary complexity, and focuses on delivering real value. This setup:

- Fits the 6-week timeline
- Uses EVAT models that are already stable
- Minimises dependencies on other teams
- Keeps the backend clean and realistic
- Sets up a strong base for future expansion
- Meets HD expectations by including logging, evaluation, and clear architecture

Overall, this scope gives the data team something functional and meaningful while keeping the workload manageable and focused.

Use_Cases/Voice Assistant/code: this file was deleted (1 line removed).