Authors: Dan Vicente (dan.vicente.ihanus@gmail.com), Alexander Gutell (alex.gutell@gmail.com)
This project was developed to gain experience with Dockerized applications and the analysis of real-time streaming time-series data. We built an end-to-end application that:
- Collects real-time traffic data from Trafikverket's API through a Kafka producer
- Processes streaming data using Apache Kafka
- Performs real-time analysis and feature engineering with PySpark
- Forecasts traffic conditions five minutes into the future using:
  - an autoregressive Gated Recurrent Unit (GRU) network built with PyTorch
  - an autoregressive moving-average model as a baseline for comparison
- Visualizes both current traffic conditions and predictions in an interactive dashboard
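To illustrate the moving-average baseline idea, here is a pure-Python sketch (not the project's actual model code; the window size and five-step horizon are illustrative assumptions):

```python
def ma_forecast(history, window=3, horizon=5):
    """Autoregressively roll a moving-average forecast `horizon` steps ahead.

    Each step predicts the mean of the last `window` values, then feeds
    that prediction back in as if it were an observation.
    """
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        pred = sum(values[-window:]) / window
        forecasts.append(pred)
        values.append(pred)
    return forecasts

# A flat series forecasts itself:
# ma_forecast([5.0, 5.0, 5.0]) -> [5.0, 5.0, 5.0, 5.0, 5.0]
```

The GRU model plays the same autoregressive role, feeding each prediction back as input for the next step, but learns the mapping from data instead of averaging.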
The application follows a microservices architecture with several components:

- Data Producer: Python service that queries Trafikverket's API and sends data to Kafka
- Kafka Broker: Message queue that ensures reliable data delivery between services
- Data Consumer: Processes and transforms incoming data streams
- ML Model Service: GRU model for traffic prediction
- Visualization Service: Streamlit dashboard for real-time monitoring
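The producer's request to Trafikverket can be sketched as follows. The object type, schema version, and `CountyNo` filter are illustrative assumptions based on Trafikverket's documented XML request format; check their API documentation for the exact query the producer should use.

```python
import urllib.request


def build_query(api_key: str, region_id: int, object_type: str = "TrafficFlow") -> str:
    """Build an XML request body for Trafikverket's data API.

    Object type, schema version, and filter field are assumptions for
    illustration -- adjust to the schema your producer actually queries.
    """
    return (
        "<REQUEST>"
        f'<LOGIN authenticationkey="{api_key}"/>'
        f'<QUERY objecttype="{object_type}" schemaversion="1.4">'
        "<FILTER>"
        f'<EQ name="CountyNo" value="{region_id}"/>'
        "</FILTER>"
        "</QUERY>"
        "</REQUEST>"
    )


def fetch(api_key: str, region_id: int) -> bytes:
    """POST the query to Trafikverket's endpoint (network call; not run here)."""
    req = urllib.request.Request(
        "https://api.trafikinfo.trafikverket.se/v2/data.json",
        data=build_query(api_key, region_id).encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In the real producer, the fetched records are serialized and published to the Kafka topic configured via `KAFKA_TOPIC`.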
All components are containerized with Docker and orchestrated with Docker Compose, allowing for easy scaling and deployment. The data flows through the system as follows: Trafikverket API → Producer → Kafka → Consumer → PySpark Processing → ML Prediction → Dashboard
This architecture enables real-time data processing with minimal latency while maintaining system reliability.
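The consumer stage of this flow can be sketched in plain Python. The message fields (`site_id`, `timestamp`, `speed`) and the 5-minute tumbling window are illustrative assumptions; the real service performs this aggregation with PySpark:

```python
import json
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling windows


def window_averages(messages):
    """Group JSON traffic messages into 5-minute windows per site and
    average their speed readings -- a plain-Python stand-in for the
    PySpark windowed aggregation."""
    sums = defaultdict(lambda: [0.0, 0])
    for raw in messages:
        m = json.loads(raw)
        window_start = (m["timestamp"] // WINDOW_SECONDS) * WINDOW_SECONDS
        key = (m["site_id"], window_start)
        entry = sums[key]
        entry[0] += m["speed"]
        entry[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}
```

Each windowed average becomes one point in the time series that the forecasting models consume.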
First, clone this repository to your local machine. Make sure you have Docker and Docker Compose installed on your system before proceeding.
This application requires a `.env` file with API credentials and configuration parameters. Create one before starting:
- Create the `.env` file:

  ```shell
  touch .env
  ```

- Add the required parameters to the file:

  ```shell
  # Trafikverket API key
  echo "TRAFIKVERKET_API_KEY=your_api_key_here" >> .env

  # Kafka config
  echo "KAFKA_BOOTSTRAP_SERVERS=broker:29092" >> .env
  echo "KAFKA_TOPIC=traffic_data" >> .env

  # Region and samples config
  echo "REGION_ID=4" >> .env
  echo "N_SAMPLES=100000" >> .env
  ```

Note: Replace `your_api_key_here` with your actual Trafikverket API key. You can get an API key free of charge by registering at Trafikverket's API Portal.
- Install Docker and Docker Compose
- Create the `.env` file as described above
- Run the application:

  ```shell
  docker-compose up --build
  ```

- Go to http://localhost:8501/ in your browser
- Wait about 20 minutes: the model collects data during the first 20 minutes before it starts monitoring and forecasting
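The five services are wired together in a `docker-compose.yml` roughly like the following sketch (service names, base image, and port mappings here are assumptions; consult the repository's actual compose file):

```yaml
services:
  broker:
    image: apache/kafka:latest
    ports:
      - "9092:9092"
  producer:
    build: ./producer
    env_file: .env
    depends_on:
      - broker
  consumer:
    build: ./consumer
    env_file: .env
    depends_on:
      - broker
  model:
    build: ./models
    depends_on:
      - consumer
  visualization:
    build: ./visualization
    ports:
      - "8501:8501"
    depends_on:
      - model
```

`depends_on` controls startup order only; each service should still retry its Kafka connection until the broker is ready.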
- `producer/`: code for collecting data from the Trafikverket API
- `consumer/`: code for processing the Kafka stream
- `models/`: the models for traffic prediction
- `visualization/`: the Streamlit dashboard



