
Commit b09d502 (merge pull request #22 from jvachier/jv/REAMDE_2: "Updating README."; 2 parents a79d9e5 + ee9fbf8)

File tree: 2 files changed (+80, −13 lines)

README.md

Lines changed: 80 additions & 13 deletions
@@ -19,31 +19,85 @@ This repository provides a comprehensive solution for real-time **speech-to-text
*Figure: High-level workflow of the application, including speech-to-text, sentiment analysis, and translation.*

---
## Key Highlights

**From-Scratch Implementation**: Complete Transformer architecture built from the ground up, demonstrating deep understanding of attention mechanisms, positional encodings, and encoder-decoder architectures.

**End-to-End Pipeline**: A single application integrating speech recognition, sentiment classification, and neural machine translation.

**Research-Grade Code**: Clean, well-documented implementation suitable for educational purposes and research experimentation.

**Hyperparameter Optimization**: Automated tuning with Optuna for both the sentiment and translation models.

---
## Architecture

### Translation Transformer Model

The English-to-French translation system implements a **Transformer architecture built from scratch**. Rather than using pre-trained models or high-level APIs, this implementation provides full control over each component, from multi-head attention mechanisms to positional encodings.

![Transformer Architecture](docs/images/translation_transformer.jpeg)

*Figure: Detailed architecture of the Transformer model, showing the encoder-decoder structure with multi-head attention mechanisms.*
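The scaled dot-product attention at the heart of each head can be sketched in a few lines of NumPy. This is illustrative only; the repository's implementation is in TensorFlow, and the shapes here are simplified to a single head per batch element:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, q_len, k_len)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)          # block masked positions
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Tiny example: one batch element, two queries attending over three keys
Q = np.random.rand(1, 2, 4)
K = np.random.rand(1, 3, 4)
V = np.random.rand(1, 3, 4)
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention simply runs several such heads on learned linear projections of Q, K, and V and concatenates the results.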
### Sentiment Analysis Model

The sentiment classifier uses a Bidirectional LSTM architecture:
- Embedding layer for word representations
- Bidirectional LSTM layers for capturing context from both directions
- Dense layers with dropout for classification
- Binary output (positive/negative sentiment)
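As a sketch of that stack in Keras (the layer sizes below are illustrative defaults, not the Optuna-tuned values used in the repository):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_sentiment_model(vocab_size=20000, embed_dim=128, lstm_units=64):
    # Embedding -> Bidirectional LSTM -> dense + dropout -> sigmoid,
    # matching the bullet list above
    return tf.keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # binary positive/negative
    ])

model = build_sentiment_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```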
---

## Features

### Speech Processing
- **Real-time Speech-to-Text**: Audio capture and transcription using the Vosk library
- **English Language Support**: Optimized for a US English accent (vosk-model-en-us-0.22)
- **Downloadable Transcripts**: Export recognized text as `.txt` files
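Under the hood, Vosk transcription is a feed-and-poll loop over raw audio frames. A minimal offline sketch (the file name `speech.wav` is a placeholder for a 16-bit mono PCM recording, and the model path assumes the unpacked `vosk-model-en-us-0.22` directory; the app captures live microphone audio instead):

```python
import json
import wave

from vosk import Model, KaldiRecognizer

model = Model("vosk-model-en-us-0.22")        # path to the unpacked model
wf = wave.open("speech.wav", "rb")            # 16-bit mono PCM expected
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if not data:
        break
    if rec.AcceptWaveform(data):              # a full utterance was decoded
        print(json.loads(rec.Result())["text"])

print(json.loads(rec.FinalResult())["text"])  # flush the final partial result
```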
### Sentiment Analysis
- **Bidirectional LSTM Architecture**: Deep learning model with embedding and recurrent layers
- **TensorFlow Text Processing**: Efficient tokenization and vectorization with `TextVectorization`
- **Binary Classification**: Positive/negative sentiment prediction
- **Hyperparameter Optimization**: Automated tuning with Optuna
- **Alternative Architectures**: Optional BERT-based models for comparison
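Conceptually, `TextVectorization` maps raw text to fixed-length integer sequences. A dependency-free sketch of that mapping (the vocabulary size, padding scheme, and sequence length here are illustrative, not the layer's exact behavior):

```python
def build_vocab(corpus, max_tokens=100):
    # index 0 reserved for padding, 1 for out-of-vocabulary tokens
    vocab = {"": 0, "[UNK]": 1}
    for text in corpus:
        for token in text.lower().split():
            if token not in vocab and len(vocab) < max_tokens:
                vocab[token] = len(vocab)
    return vocab

def vectorize(text, vocab, seq_len=8):
    # look up each token, falling back to the [UNK] index
    ids = [vocab.get(tok, 1) for tok in text.lower().split()]
    return (ids + [0] * seq_len)[:seq_len]   # pad or truncate to seq_len

corpus = ["the movie was great", "the plot was terrible"]
vocab = build_vocab(corpus)
print(vectorize("the movie was terrible", vocab))
```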
### English-to-French Translation
- **From-Scratch Transformer Implementation**: Full encoder-decoder architecture built without pre-trained models
- **Custom Multi-Head Attention**: Manually implemented attention mechanisms with configurable heads
- **Positional Encoding**: Hand-crafted sinusoidal position embeddings
- **BLEU Score Evaluation**: Translation quality metrics for model assessment
- **Flexible Architecture**: Easily configurable dimensions, layers, and attention heads
- **Model Persistence**: Save and load trained models for inference
- **Real-time Integration**: Seamless connection with the speech-to-text pipeline
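The sinusoidal position embeddings can be generated directly from the standard formula, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A NumPy sketch (not the repository's exact code):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # even columns get sin, odd columns get cos, at the same frequency pair
    pos = np.arange(seq_len)[:, None]    # (seq_len, 1)
    i = np.arange(d_model)[None, :]      # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = positional_encoding(seq_len=50, d_model=16)
```

Because each position gets a unique pattern of frequencies, the otherwise order-blind attention layers can recover token order.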
### Interactive Web Application
- **Dash Framework**: Responsive web interface for real-time interaction
- **Live Processing**: Instant speech recognition, sentiment analysis, and translation
- **Visual Feedback**: Clear display of recognized text, sentiment, and translations
- **Export Functionality**: Download transcripts for offline use

---

## Performance

Current model performance on test datasets:

| Model | Metric | Score |
|-------|--------|-------|
| Sentiment Analysis (BiLSTM) | Test Accuracy | 95.00% |
| Translation (Transformer) | Test Accuracy | 67.26% |
| Translation (Transformer) | BLEU Score | 0.52 |
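BLEU combines clipped n-gram precision with a brevity penalty. A minimal single-reference sketch of the metric (real evaluations typically use smoothed, corpus-level BLEU from a standard toolkit; this is not the repository's evaluation code):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU, single reference, uniform n-gram weights."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        if overlap == 0:
            return 0.0                                      # no smoothing here
        log_precisions.append(math.log(overlap / sum(cand_ngrams.values())))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))       # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

score = bleu("le chat est sur le tapis", "le chat est sur le tapis")
```

A perfect match scores 1.0, so 0.52 indicates substantial but imperfect n-gram overlap with the reference translations.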
**Note on Model Status**: These models were **built from scratch as educational implementations** of the underlying architectures. The Transformer implementation provides a complete, working example of the attention mechanism without relying on pre-trained models or high-level abstractions. While they demonstrate a solid understanding of these architectures, they are not optimized for production deployment. For production use, consider:
- Training on larger datasets (millions of examples)
- Increasing model capacity (more layers, larger dimensions)
- Extended training duration with learning rate scheduling
- Ensemble methods and model distillation
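For the learning-rate-scheduling suggestion, a common starting point is the warmup-then-inverse-square-root schedule from the original Transformer paper. A sketch (the `d_model=512` and `warmup_steps=4000` defaults below are the paper's, not settings used in this repository):

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5):
    # linear warmup, then decay proportional to 1/sqrt(step)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The learning rate peaks exactly at the end of warmup
peak = transformer_lr(4000)
```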
---
@@ -240,3 +294,16 @@ Sentiment_Analysis/
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

---

## Citation

If you use this project in your research or work, please cite:

```bibtex
@software{sentiment_translation_2025,
  author = {Vachier, Jeremy},
  title  = {Sentiment Analysis and Translation},
  year   = {2025},
  url    = {https://github.com/jvachier/Sentiment_Analysis}
}
```