Hearly is a cloud-native platform designed to simplify and automate the transcription and summarization of audio content.
Developed as part of the Cloud Systems course of the Master's Degree in Computer Science at the University of Catania, Hearly transforms unstructured audio into accessible, searchable, and structured information.
With just a few clicks, users can upload audio files in common formats (MP3, WAV, etc.) and receive:
- ✅ Automatic transcription powered by AWS Transcribe
- 🧠 Smart summarization using OpenAI GPT-4o via Azure
Each user has access to a personal dashboard with analytics such as:
- Detected language
- Audio duration
- Upload frequency
- And more...
Hearly supports two deployment modes to fit different scalability needs:
- Frontend and backend hosted on Amazon EC2
- Direct integration with AWS services for processing and storage
- Services are containerized and deployed via Kubernetes
- Greater scalability and flexibility
- Still integrated with core AWS services
| Service | Purpose |
|---|---|
| Amazon S3 | Stores audio files, transcriptions, and summaries |
| AWS Transcribe | Performs speech-to-text transcription |
| Azure OpenAI GPT-4o | Generates contextual and concise summaries |
| Amazon DynamoDB | Stores metadata and user information |
| AWS Cognito | Manages secure user registration and authentication |
| AWS Lambda | Serverless functions to automate and orchestrate transcription workflows |
| Amazon EventBridge / SNS | Event-driven architecture for triggering Lambda functions |
Hearly uses two main AWS Lambda functions to automate its processing pipeline:
-
lambda-audio-transcribe
Triggered when the user clicks the "Transcribe" button in the web interface. It starts a transcription job using AWS Transcribe. -
transcribe-status-updater
Periodically checks the status of transcription jobs and updates the system when transcriptions are complete.
These functions are available in two variants:
- Using Amazon EventBridge (default in the current deployment)
- Using Amazon SNS (alternative version also implemented)
Processing Flow:
- The user registers or logs in through a secure flow managed by AWS Cognito
- The authenticated user uploads an audio file via the web app
- The file is stored in Amazon S3
- When the user presses "Transcribe", the
lambda-audio-transcribefunction is invoked - AWS Transcribe processes the audio and starts a transcription job
transcribe-status-updatermonitors the transcription job status and updates the system- Once the transcription is complete, it is summarized using GPT-4o on Azure OpenAI
- All results and metadata are saved in DynamoDB
- The user's dashboard displays transcription, summary, and detailed statistics
- 🎙️ Upload MP3, WAV, and other common audio formats
- ✍️ Get accurate transcriptions in minutes
- 🧾 Receive short, meaningful summaries
- 📈 Monitor activity through the dashboard
- 🔐 Register and login securely via AWS Cognito
Hearly.mov
This project is developed for academic purposes and is distributed under the MIT License.
Note: Due to infrastructure costs, the platform may not be publicly available at all times.
For demo access or more information, feel free to contact the authors.
