| 1 | +# EVAT Voice Assistant – Trimester 3 |
| 2 | + |
| 3 | +**Author:** Mohtashim Misbah |
| 4 | +**Date:** Week 5 |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## Table of Contents |
| 9 | +- [1. Project Overview](#1-project-overview) |
| 10 | +- [2. Voice Interaction Architecture](#2-voice-interaction-architecture) |
| 11 | +- [3. In-Scope Functionality](#3-in-scope-functionality) |
| 12 | +- [4. Out-of-Scope Functionality](#4-out-of-scope-functionality) |
| 13 | +- [5. Deliverables](#5-deliverables) |
| 14 | +- [6. Justification for Scope](#6-justification-for-scope) |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## 1. Project Overview |
| 19 | + |
| 20 | +This trimester, I am building the EVAT Voice Assistant. The system aims to help users get EV-related information by asking simple, natural-language questions. Instead of navigating dashboards or searching through menus, users should be able to ask things like *“Is the charger busy?”* or *“How much would a 150 km EV trip cost?”* and get a quick answer. |
| 21 | + |
| 22 | +To keep the scope realistic, I decided to make the backend fully **text-based**, while the frontend team can optionally add voice input and output. This way, the backend stays focused on the core intelligence of the assistant: intent detection, entity extraction, connecting to the EVAT models, and generating responses. Separating speech recognition from language understanding also mirrors a common design for commercial voice assistants such as Alexa and Google Assistant, where intent handling operates on transcribed text rather than raw audio. |
| 23 | + |
| 24 | +This project is meant to create a strong foundation that future trimesters can expand into a full multi-turn conversational assistant. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## 2. Voice Interaction Architecture |
| 29 | + |
| 30 | +Even though this is called a “Voice Assistant,” the backend won’t process any audio. Instead, each query follows this workflow: |
| 31 | + |
| 32 | +1. **User speaks into the mobile or web app.** |
| 33 | + The app records the audio. |
| 34 | + |
| 35 | +2. **Frontend converts the speech to text.** |
| 36 | + The frontend team can use tools such as: |
| 37 | + - Web Speech API |
| 38 | + - Whisper |
| 39 | + - Google Speech-to-Text |
| 40 | + - Any STT the app team chooses |
| 41 | + |
| 42 | +3. **The backend receives the text version of the query.** |
| 43 | + Example: “How busy is the Burwood charger?” |
| 44 | + |
| 45 | + Example request structure (placeholder): |
| 46 | + |
| 47 | +```json |
| 48 | +{ |
| 49 | + "query": "<user text query>" |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +4. **The backend processes the text:** |
| 54 | + - Detect the user’s intent |
| 55 | + - Extract important information (like location or distance) |
| 56 | + - Pass it to the correct EVAT use case |
| 57 | + - Generate a clear and helpful response |
| 58 | + |
| 59 | +5. **The frontend displays the text response** or turns it back into audio if needed. |
| 60 | + |
| 61 | +This architecture keeps things modular and avoids overcomplicating the backend. It also makes the system easier to test and scale later. |
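
The backend steps above could be wired together roughly as follows. This is a minimal sketch, not the project's implementation: `handle_query`, `VoiceResponse`, the intent labels, and the response fields are hypothetical names, and every step is a placeholder for the real classifier, extractor, and EVAT model calls.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceResponse:
    """Hypothetical response shape returned to the frontend."""
    intent: str
    entities: dict = field(default_factory=dict)
    response: str = ""


def handle_query(payload: dict) -> VoiceResponse:
    """Run one text query through the four backend steps (all stubbed)."""
    query = str(payload.get("query", "")).strip()
    if not query:
        return VoiceResponse(intent="help", response="Please ask a question.")

    # Step 1: detect the intent (placeholder rule, not a real classifier).
    intent = "congestion_status" if "busy" in query.lower() else "help"

    # Step 2: extract entities (left empty in this sketch).
    entities: dict = {}

    # Step 3: call the matching EVAT use case (stubbed out here).
    # Step 4: generate the response text.
    if intent == "congestion_status":
        text = "Congestion lookup would run here."
    else:
        text = "I can answer charger congestion and trip cost questions."
    return VoiceResponse(intent=intent, entities=entities, response=text)
```

Because the function takes the parsed request body and returns a plain object, it can be tested without any audio or HTTP layer, which is the main benefit of the text-only design.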
| 62 | + |
| 63 | +--- |
| 64 | + |
| 65 | +## 3. In-Scope Functionality |
| 66 | + |
| 67 | +### 3.1 Intent Classification |
| 68 | +I will build a simple intent classifier that can recognise the main types of questions users might ask. For now, I am focusing on three core intents: |
| 69 | + |
| 70 | +- **Congestion Status Query** |
| 71 | + Example: “Is the Burwood charger busy right now?” |
| 72 | + |
| 73 | +- **Trip Cost Comparison** |
| 74 | + Example: “How much would a 150 km EV trip cost?” |
| 75 | + |
| 76 | +- **Help / Unsupported Query** |
| 77 | + Example: “What can you do?” |
| 78 | + |
| 79 | +These cover the most important and realistic use cases for this trimester. |
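
As an illustration of the three intents, a keyword-count classifier might look like the sketch below. It assumes simple keyword matching, which is only one possible approach; the intent labels and keyword lists are hypothetical and would need tuning against real queries.

```python
import re

# Hypothetical keyword lists; a real classifier would be tuned on logged queries.
INTENT_KEYWORDS = {
    "congestion_status": ("busy", "congestion", "wait", "queue", "available"),
    "trip_cost": ("cost", "price", "cheaper", "petrol", "how much"),
}


def classify_intent(query: str) -> str:
    """Return the intent with the most keyword hits, or 'help' if none match."""
    text = query.lower()
    scores = {
        intent: sum(
            1 for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", text)
        )
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    # Anything without a keyword hit falls through to the help intent.
    return best_intent if best_score > 0 else "help"
```

Routing every unmatched query to `help` means the assistant always has a defined answer, even for questions outside the supported use cases.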
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +### 3.2 Entity Extraction |
| 84 | +The system will pull out key information from user queries, such as: |
| 85 | + |
| 86 | +- Location names |
| 87 | +- Distances |
| 88 | +- Any values the models need to run |
| 89 | + |
| 90 | +These values become the inputs to the EVAT models, so accurate extraction determines whether the response answers the question the user actually asked. |
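
A rough sketch of extracting a distance and a location with regular expressions is shown below. The `KNOWN_LOCATIONS` list and the entity keys are assumptions made for illustration; a real location list would come from EVAT charger data.

```python
import re

# Hypothetical list of known sites; the real list would come from EVAT data.
KNOWN_LOCATIONS = ("Burwood", "Geelong", "Melbourne")

# Matches values such as "150 km", "12.5km", or "80 kilometres".
DISTANCE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:km|kilometres|kilometers)\b", re.IGNORECASE
)


def extract_entities(query: str) -> dict:
    """Extract a distance in km and a known location name, if present."""
    entities = {}
    match = DISTANCE_PATTERN.search(query)
    if match:
        entities["distance_km"] = float(match.group(1))
    for name in KNOWN_LOCATIONS:
        if re.search(rf"\b{re.escape(name)}\b", query, re.IGNORECASE):
            entities["location"] = name
            break  # keep the first known location found
    return entities
```

For example, `extract_entities("How much would a 150 km EV trip cost?")` returns `{"distance_km": 150.0}`, and a query with no recognisable values returns an empty dictionary.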
| 91 | + |
| 92 | +--- |
| 93 | + |
| 94 | +### 3.3 Integration with EVAT Use Cases |
| 95 | +I will integrate the Voice Assistant with two EVAT models that are already well-developed: |
| 96 | + |
| 97 | +#### 1. Congestion Prediction |
| 98 | +- Provides charger busyness |
| 99 | +- Gives estimated wait times |
| 100 | +- Helps users plan ahead |
| 101 | + |
| 102 | +#### 2. EV vs Petrol Trip Cost Comparison |
| 103 | +- Calculates EV trip cost |
| 104 | +- Calculates petrol cost |
| 105 | +- Helps users compare both options quickly |
| 106 | + |
| 107 | +Both models are mature enough to return reliable results once the assistant has converted a natural-language query into structured inputs. |
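
One way to connect intents to the two use cases is a small dispatch table, sketched below. The handlers are stand-ins that return fixed or made-up values, not the real EVAT models: the per-km rates are invented purely so that a 150 km trip reproduces the illustrative $18 and $24 figures used in this document, and all function names and result keys are hypothetical.

```python
from typing import Callable, Dict


def congestion_handler(entities: dict) -> dict:
    """Stand-in for the Congestion Prediction model (returns fixed values)."""
    return {
        "location": entities.get("location"),
        "busyness": "moderately busy",
        "wait_minutes": 5,
    }


def trip_cost_handler(entities: dict) -> dict:
    """Stand-in for the trip cost model, using made-up per-km rates."""
    distance = entities.get("distance_km", 0.0)
    return {
        "distance_km": distance,
        "ev_cost": round(distance * 0.12, 2),      # assumed EV rate per km
        "petrol_cost": round(distance * 0.16, 2),  # assumed petrol rate per km
    }


HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "congestion_status": congestion_handler,
    "trip_cost": trip_cost_handler,
}


def dispatch(intent: str, entities: dict) -> dict:
    """Route an intent to its handler; unknown intents get an empty result."""
    handler = HANDLERS.get(intent)
    return handler(entities) if handler else {}
```

With this layout, adding a future use case would only require writing a new handler and registering it in `HANDLERS`.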
| 108 | + |
| 109 | +--- |
| 110 | + |
| 111 | +### 3.4 Response Generation |
| 112 | +The assistant will return short, easy-to-understand answers. |
| 113 | +Examples: |
| 114 | + |
| 115 | +- “The Burwood charger is moderately busy with a 5-minute wait.” |
| 116 | +- “A 150 km trip would cost about $18 for EV and around $24 for petrol.” |
| 117 | + |
| 118 | +My goal is to keep the responses simple and practical. |
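
Responses in this style could be produced with plain string templates, as in the sketch below. The function name, result keys, and fallback wording are assumptions for illustration only.

```python
HELP_TEXT = "I can answer questions about charger congestion and EV trip costs."


def format_response(intent: str, result: dict) -> str:
    """Turn a handler result into a one-sentence reply, or fall back to help."""
    congestion_keys = ("location", "busyness", "wait_minutes")
    cost_keys = ("distance_km", "ev_cost", "petrol_cost")

    if intent == "congestion_status" and all(k in result for k in congestion_keys):
        return (
            f"The {result['location']} charger is {result['busyness']} "
            f"with a {result['wait_minutes']}-minute wait."
        )
    if intent == "trip_cost" and all(k in result for k in cost_keys):
        return (
            f"A {result['distance_km']:g} km trip would cost about "
            f"${result['ev_cost']:.0f} for EV and around "
            f"${result['petrol_cost']:.0f} for petrol."
        )
    # Unknown intents or incomplete results get the help message.
    return HELP_TEXT
```

Falling back to the help text whenever a field is missing avoids returning a half-filled sentence to the user.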
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +### 3.5 Logging |
| 123 | +To meet HD-level requirements, I will also implement logging for: |
| 124 | + |
| 125 | +- Inputs |
| 126 | +- Detected intents |
| 127 | +- Extracted entities |
| 128 | +- System responses |
| 129 | + |
| 130 | +These logs will support the Week 8 performance evaluation and help identify areas for improvement. |
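
The four logged fields could be captured as one JSON line per query, as sketched below using Python's standard `logging` and `json` modules. The logger name, field names, and added timestamp are assumptions, not a fixed format.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("evat.voice")  # hypothetical logger name


def log_interaction(query: str, intent: str, entities: dict, response: str) -> str:
    """Log one query as a JSON line and return that line for inspection."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": query,
        "intent": intent,
        "entities": entities,
        "response": response,
    }
    line = json.dumps(record, ensure_ascii=False)
    logger.info(line)
    return line
```

Keeping each record as a single JSON line makes it straightforward to load the log into a table later and count, for example, how often each intent was detected.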
| 131 | + |
| 132 | +--- |
| 133 | + |
| 134 | +## 4. Out-of-Scope Functionality |
| 135 | + |
| 136 | +### 4.1 Additional EVAT Use Cases |
| 137 | +At this stage, I am **not** integrating with the other EVAT use cases because their models aren’t fully ready or they require more complex logic. Examples include: |
| 138 | + |
| 139 | +- Environmental impact |
| 140 | +- Gamification |
| 141 | +- Charger rental |
| 142 | +- Usage insights |
| 143 | +- Weather-based routing |
| 144 | +- Reliability scoring |
| 145 | +- Site suitability |
| 146 | +- Demand forecasting |
| 147 | + |
| 148 | +These can be added in future trimesters. |
| 149 | + |
| 150 | +--- |
| 151 | + |
| 152 | +### 4.2 Audio Processing |
| 153 | +The backend will not: |
| 154 | +- Process audio |
| 155 | +- Handle speech recognition |
| 156 | +- Convert text to speech |
| 157 | + |
| 158 | +All of this is handled by the frontend team. |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +### 4.3 Multi-Turn Conversations |
| 163 | +The assistant will not remember previous queries or support follow-up questions. |
| 164 | +Each query is treated independently. |
| 165 | + |
| 166 | +--- |
| 167 | + |
| 168 | +### 4.4 UI Development |
| 169 | +UI work is outside my scope. |
| 170 | +The frontend team will manage: |
| 171 | +- Input boxes |
| 172 | +- Voice buttons |
| 173 | +- Response display |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## 5. Deliverables |
| 178 | + |
| 179 | +By Week 10, I plan to deliver: |
| 180 | + |
| 181 | +- A working Voice Assistant backend |
| 182 | +- Intent classifier + entity extraction |
| 183 | +- Integrated handlers for congestion and cost comparison |
| 184 | +- A `/voice/query` production API |
| 185 | +- Logging + evaluation results |
| 186 | +- Full documentation (architecture, mapping, API details) |
| 187 | +- A short demo for the mentor/panel |
| 188 | + |
| 189 | +--- |
| 190 | + |
| 191 | +## 6. Justification for Scope |
| 192 | + |
| 193 | +I chose this scope because it is achievable within the trimester, avoids unnecessary complexity, and focuses on delivering real value. This setup: |
| 194 | + |
| 195 | +- Fits the 6-week timeline |
| 196 | +- Uses EVAT models that are already stable |
| 197 | +- Minimises dependencies on other teams |
| 198 | +- Keeps the backend clean and realistic |
| 199 | +- Sets up a strong base for future expansion |
| 200 | +- Meets HD expectations by including logging, evaluation, and clear architecture |
| 201 | + |
| 202 | +Overall, this scope gives the data team something functional and meaningful while keeping the workload manageable and focused. |
| 203 | + |