The Smart Learning Platform aims to streamline course discovery by aggregating learning resources from multiple providers, enriching search with AI-generated keywords, and providing real-time course recommendations. Designed with scalability in mind, the platform connects learners with educational content providers.
Key Technologies & Choices:
- Oracle XE Database: Chosen to meet academic requirements, on the professor's recommendation, for relational database management
- Streamlit: Rapid web interface development for Python-based data applications
- Gemini 1.5 Flash: Efficient keyword generation and natural language processing
- Spark MLlib & Kafka: Real-time processing and scalable machine learning pipelines
- Docker: Containerization for portable and reproducible environments
- Selenium: Web scraping
Goal: Match courses with instructors/organizations
- Processed the initial dataset (courses.csv), containing:
  - Course ID, Category, Subcategory, Title
- Developed a matching algorithm to associate courses with providers, using:
  - Web scraping (e.g., Coursera, Google Maps)
  - Manual verification for data accuracy
- Output: an enriched dataset with provider information
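The matching algorithm itself is not described here. As one possible approach, the sketch below fuzzy-matches a course title against scraped (title, provider) pairs using Python's standard `difflib.SequenceMatcher` and flags weak matches for the manual verification step. The function names, the 0.8 threshold and the input format are illustrative assumptions, not the project's actual code.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and replace punctuation/whitespace runs with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_course_to_provider(course_title, scraped, threshold=0.8):
    """Return (provider, score, needs_review) for the best fuzzy title match.

    `scraped` is a list of (title, provider) pairs, e.g. collected by a scraper.
    A best score below `threshold` flags the match for manual verification.
    """
    target = normalize(course_title)
    best_provider, best_score = None, 0.0
    for title, provider in scraped:
        # ratio() returns a similarity in [0, 1]; 1.0 means identical strings
        score = SequenceMatcher(None, target, normalize(title)).ratio()
        if score > best_score:
            best_provider, best_score = provider, score
    return best_provider, best_score, best_score < threshold
```

Keeping a `needs_review` flag rather than discarding low-scoring matches fits the workflow above, where uncertain pairs still go through manual verification.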
Components:
- Oracle XE Database:
  - Designed from a UML relational model
  - Tables: Courses, Providers, Categories, Keywords, Users
  - Implemented data quality checks:
    - Null value validation
    - Category consistency checks
    - Provider verification
- AI-Powered Search:
  - Integrated Gemini 1.5 Flash for keyword generation
  - Search features:
    - Category filtering
    - Keyword-based search
    - Provider-based filtering
- Streamlit Interface:
  - Course browsing dashboard
  - Advanced search functionality
  - Admin panel for data management
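A minimal sketch of the data quality checks and search filters listed above, working on in-memory rows instead of the Oracle tables. The field names, the `VALID_CATEGORIES` mapping and the per-course `keywords` list (standing in for Gemini-generated keywords) are hypothetical; in the project these checks and filters would run against the Courses, Providers, Categories and Keywords tables.

```python
from typing import Iterable, Optional

# Hypothetical category -> subcategories mapping (stands in for the Categories table)
VALID_CATEGORIES = {
    "Data Science": {"Machine Learning", "Statistics"},
    "Development": {"Web Development", "Python"},
}
REQUIRED_FIELDS = ("course_id", "category", "subcategory", "title")


def quality_issues(course: dict, known_providers: set) -> list:
    """Return the data quality problems found in one course row."""
    issues = []
    # Null value validation: required fields must be present and non-empty
    for field in REQUIRED_FIELDS:
        if course.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Category consistency: the subcategory must belong to the row's category
    subcategories = VALID_CATEGORIES.get(course.get("category"), set())
    if course.get("subcategory") and course["subcategory"] not in subcategories:
        issues.append("inconsistent category/subcategory")
    # Provider verification: the provider must be a known (verified) provider
    if course.get("provider") not in known_providers:
        issues.append("unknown provider")
    return issues


def search(courses: Iterable[dict], category: Optional[str] = None,
           keyword: Optional[str] = None, provider: Optional[str] = None) -> list:
    """Apply the optional category, provider and keyword filters together."""
    results = []
    for course in courses:
        if category and course.get("category") != category:
            continue
        if provider and course.get("provider") != provider:
            continue
        if keyword:
            needle = keyword.lower()
            # Search the title plus any generated keywords attached to the course
            texts = [(course.get("title") or "").lower()]
            texts += [k.lower() for k in course.get("keywords", [])]
            if not any(needle in text for text in texts):
                continue
        results.append(course)
    return results
```

Matching against generated keywords as well as the title is what lets a query like "regression" find a course titled "Intro to Statistics".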
Scalable Architecture:
- Content-Based Filtering:
  - Cosine similarity for course recommendations
  - TF-IDF vectorization using Spark MLlib
- Real-Time Pipeline:
  - Kafka for streaming user interactions
  - Spark Streaming for real-time processing
- Containerization: the whole architecture is Dockerized
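To illustrate the content-based filtering step, here is a plain-Python sketch of TF-IDF vectorization followed by cosine similarity. It uses the same smoothed IDF formula as Spark MLlib's `IDF` estimator, log((m + 1) / (df + 1)), but indexes raw terms instead of `HashingTF`'s hashed features, so scores can differ slightly from a Spark pipeline if hash collisions occur. In Spark, the corresponding stages would be `Tokenizer`, `HashingTF` and `IDF` from `pyspark.ml.feature`. The `recommend` helper and the example course texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (term -> weight) with smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    m = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(docs, index, top_k=3):
    """Return indices of the top_k documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]
```

For example, with `docs = ["python data analysis", "python machine learning", "watercolor painting basics", "machine learning with python"]`, `recommend(docs, 1, top_k=1)` returns `[3]`: the two machine learning courses share the rarer terms "machine" and "learning", while "python" appears in three documents and therefore carries little weight.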
Project structure:
```
├── Application
│   ├── app.py
│   ├── models.py
│   └── tools.py
├── Data
│   ├── CSVs
│   │   └── *.csv                  # all CSV files
│   └── DDL
│       └── *.sql                  # DDL for all tables
├── Scripts
│   ├── Data Insertion
│   │   ├── data_aliementation.ipynb
│   │   ├── power_database.py
│   │   └── power_database2.py
│   ├── Data Transformation
│   │   ├── Data_transformation.ipynb
│   │   └── Data_transformation_2.ipynb
│   ├── Scraping
│   │   ├── GoogleMaps_scraping.ipynb
│   │   └── web_scraper_coursera.py
│   ├── Test scripts
│   │   ├── test-kafka.py
│   │   └── test-spark.py
│   └── quality_augmentation.py    # keyword generation and data quality checks
└── config
    ├── Dockerfile
    ├── docker-compose.yml
    ├── notebooks
    ├── oracle
    │   └── instantclient-basic-linux.x64-23.7.0.25.01.zip
    ├── spark_scripts
    │   ├── consumer.py
    │   ├── courses.csv
    │   ├── model_studio.ipynb
    │   └── note.txt
    └── startup.sh                 # run on the Spark master to start JupyterLab
```
1. Clone the repository:
```bash
git clone https://github.com/yChaaby/Smart-Learning-Platfome.git
cd Smart-Learning-Platfome
```
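2. Configure the database: ensure Oracle XE is running, then update the database credentials in the connection function in `./Application/tools.py`:
```python
import cx_Oracle

def connect_to_oracle():
    try:
        # Load the Oracle Instant Client libraries (adjust lib_dir to your install path)
        cx_Oracle.init_oracle_client(lib_dir="/Users/youssefchaabi/instantclient_19_16")
    except cx_Oracle.Error as exc:
        # Raised e.g. if the client is already initialized or the libraries cannot be found
        print(f"Oracle client initialization failed: {exc}")
    # Replace the values below with your own connection parameters
    dsn = cx_Oracle.makedsn("localhost", 1521, service_name="XEPDB1")
    conn = cx_Oracle.connect(user="your_user", password="your_password", dsn=dsn)
    return conn
```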
3. Start the big data services:
```bash
cd config
docker-compose up -d --build

# On the Kafka node: create the "interaction" topic
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic interaction

# On the Spark master node: start JupyterLab
/startup.sh

# On the Spark master node: run the Spark Streaming consumer
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 consumer.py
```
4. Launch the application:
```bash
streamlit run Application/app.py
```
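The message format that `consumer.py` expects on the `interaction` topic is not documented here. The sketch below assumes a simple JSON event (user ID, course ID, action, timestamp) and shows how such an event could be serialized for a Kafka producer and parsed back on the consumer side. The `Interaction` schema and the action names are hypothetical, and the commented-out lines assume the `kafka-python` package as the client library.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_ACTIONS = {"view", "click", "enroll"}  # hypothetical action types


@dataclass
class Interaction:
    """Hypothetical schema for one user interaction on the "interaction" topic."""
    user_id: int
    course_id: int
    action: str
    timestamp: float


def encode(event: Interaction) -> bytes:
    """Serialize an event to UTF-8 JSON bytes, the form a Kafka producer sends."""
    if event.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {event.action}")
    return json.dumps(asdict(event)).encode("utf-8")


def decode(payload: bytes) -> Interaction:
    """Parse a message value back into an Interaction, as a consumer would."""
    return Interaction(**json.loads(payload.decode("utf-8")))


# Sending with kafka-python (not run here; requires a reachable broker):
# import time
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9094")
# producer.send("interaction", encode(Interaction(42, 7, "view", time.time())))
# producer.flush()
```

Validating the action on the producer side keeps malformed events out of the stream, so the Spark consumer can rely on a fixed set of fields.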
Access Points
Web UI: http://localhost:8501
Spark Master: http://localhost:8080
Kafka Server: localhost:9094
Oracle Server: localhost:1521
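- Allow 2-3 minutes for the containers to initialize.
- Make sure no other services are using the required ports (8501/8080/9000).
- The Oracle credentials must match your database instance.
- Make sure Docker, Streamlit, the Oracle client libraries (Instant Client) and the Gemini API client are installed, and that you have a Gemini API key.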
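Before running `docker-compose up`, one quick way to spot port conflicts is to attempt a TCP connection on each port. This sketch uses only Python's standard `socket` module; the port list is taken from the Access Points above and is an assumption to adjust for your setup.

```python
import socket

# Ports from the Access Points list; edit to match your configuration
PORTS = {"Streamlit": 8501, "Spark Master UI": 8080, "Kafka": 9094, "Oracle": 1521}


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in PORTS.items():
        status = "in use" if is_port_in_use(port) else "free"
        print(f"{name:16} {port:5}  {status}")
```

Run before startup, any "in use" entry points to a conflict; run after startup, "in use" confirms that the corresponding service is listening.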
This project demonstrates a complete pipeline for educational resource aggregation and intelligent recommendation, combining:
- AI-enhanced data enrichment (Gemini keywords)
- Enterprise-grade database management (Oracle XE)
- Modern big data processing (Spark/Kafka)
Special thanks to our valuable contributors:
🙏 Your contributions were essential in establishing the project's foundation!