
Commit 03a0da5

Merge branch 'master' into patch-17
2 parents f124d48 + 4f56f8b commit 03a0da5

File tree

6 files changed (+230 −23 lines changed)

subjects/ai/matrix-factorization/README.md

Lines changed: 48 additions & 14 deletions
@@ -19,31 +19,69 @@ The goal of this project is to understand and apply advanced matrix factorization

1. **Download the [MovieLens Dataset](https://grouplens.org/datasets/movielens/1m/)** (ratings, users, and movies).
2. Preprocess the dataset to remove null values and prepare it for matrix factorization.
3. Create a user-item interaction matrix from the data.
4. Split the data into training and testing sets using a fixed `random_state = 42`.
5. Normalize the user–item interaction matrix and save it under `processed/user_item_matrix.csv`.
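Steps 3–5 can be sketched as follows (a minimal sketch on toy data; the column names and mean-centering as the normalization scheme are assumptions — adapt them to your loader):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the MovieLens ratings table (userId/movieId/rating
# column names are assumed here).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 20, 30],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.0, 1.0],
}).dropna()

# Step 4: reproducible split.
train, test = train_test_split(ratings, test_size=0.2, random_state=42)

# Step 3: user-item interaction matrix (unrated cells filled with 0).
matrix = train.pivot_table(index="userId", columns="movieId",
                           values="rating", fill_value=0)

# Step 5: mean-center each user's row, then persist.
normalized = matrix.sub(matrix.mean(axis=1), axis=0)
# normalized.to_csv("processed/user_item_matrix.csv")
```

Mean-centering is one common choice; min-max scaling per user is another, as long as the same scheme is undone when predictions are produced.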

#### Singular Value Decomposition (SVD) Model

1. Implement the SVD algorithm using the **scipy.sparse.linalg.svds** function for matrix factorization.
2. Train the SVD model on the MovieLens dataset to generate predicted ratings for all users.
3. Compute RMSE on the test set and append the value to `reports/model_metrics.json`.
4. Save the full predicted rating matrix as `reports/svd_predictions.npy`.
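The factorization step might look like this (a sketch on random toy data standing in for the real normalized matrix; `k` is an assumed hyperparameter to tune):

```python
import numpy as np
from scipy.sparse.linalg import svds

# Toy mean-centered user-item matrix standing in for the real one.
rng = np.random.default_rng(42)
R = rng.normal(size=(20, 15))

k = 5  # number of latent factors (tune on validation data)
U, sigma, Vt = svds(R, k=k)          # truncated SVD of the interaction matrix
preds = U @ np.diag(sigma) @ Vt      # rank-k predicted rating matrix

# Reconstruction RMSE; in the project, compute RMSE on held-out test ratings
# (and undo the normalization) before appending it to model_metrics.json.
rmse = float(np.sqrt(np.mean((R - preds) ** 2)))
# np.save("reports/svd_predictions.npy", preds)
```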

#### Probabilistic Matrix Factorization (PMF) Model

1. Implement the PMF algorithm.
2. Train the PMF model and visualize the model's convergence (e.g., plot Mean Squared Error over iterations).
3. During training, log the Mean Squared Error (MSE) at each iteration/epoch.
4. Generate and save a convergence plot (`MSE vs. iteration`) as `reports/pmf_convergence.png`.
5. Save the learned latent factor matrices (`U` and `V`) under `reports/pmf_factors/`.
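One way to sketch PMF is full-batch gradient descent on the MAP objective over observed entries only; the learning rate, regularization, and toy data below are illustrative assumptions:

```python
import numpy as np

def train_pmf(R, mask, k=5, lr=0.01, reg=0.05, epochs=100, seed=42):
    """Minimal PMF: factor R ≈ U @ V.T on observed entries (mask == 1)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], k))
    V = 0.1 * rng.standard_normal((R.shape[1], k))
    mse_history = []
    for _ in range(epochs):
        E = mask * (R - U @ V.T)        # error on observed ratings only
        U += lr * (E @ V - reg * U)     # gradient steps on both factor matrices
        V += lr * (E.T @ U - reg * V)
        mse_history.append(float((E ** 2).sum() / mask.sum()))
    return U, V, mse_history

# Toy low-rank ratings with roughly half of the entries observed.
rng = np.random.default_rng(0)
R_true = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
mask = (rng.random((20, 15)) < 0.5).astype(float)
U, V, mse_history = train_pmf(R_true, mask)
# Plot mse_history with matplotlib and save it as reports/pmf_convergence.png;
# save U and V under reports/pmf_factors/.
```

Logging `mse_history` per epoch gives you both the convergence plot and a stopping criterion for free.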

#### Model Comparison and Evaluation

1. Compare the performance of SVD and PMF using evaluation metrics such as **Mean Squared Error (MSE)**.
2. Provide visual comparisons between the models using **matplotlib** to plot predicted vs. actual ratings.
3. Save consolidated evaluation results as JSON in `reports/model_metrics.json`.

   Example format:

   ```json
   {
     "SVD_RMSE": 0.91,
     "PMF_RMSE": 0.85,
     "PMF_vs_SVD_improvement_%": 6.6
   }
   ```

4. Generate and save comparison plots:
   - Predicted vs Actual ratings: `reports/predicted_vs_actual.png`
   - RMSE comparison (bar chart): `reports/rmse_comparison.png`
5. Minimum expected performance:
   - SVD RMSE ≤ 0.90
   - PMF RMSE ≤ 0.85
   - PMF improvement ≥ 5% over SVD
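The improvement field is derived from the two RMSE values; a sketch using the numbers from the example format (the RMSE values here are stand-ins for your computed test-set metrics):

```python
import json

svd_rmse, pmf_rmse = 0.91, 0.85  # stand-ins for the computed test-set RMSEs

metrics = {
    "SVD_RMSE": svd_rmse,
    "PMF_RMSE": pmf_rmse,
    # Relative improvement of PMF over SVD, in percent.
    "PMF_vs_SVD_improvement_%": round(100 * (svd_rmse - pmf_rmse) / svd_rmse, 1),
}
# with open("reports/model_metrics.json", "w") as f:
#     json.dump(metrics, f, indent=2)
```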

#### Recommendation Generation

1. Implement a function that generates movie recommendations for a user based on the predicted ratings from both the SVD and PMF models.
2. Display top-rated movies for users and compare recommendations from both models.
3. Implement in `utils/recommendation.py`:

   ```python
   def generate_recommendations(user_id, model, top_n=10):
       ...
   ```

4. Save the top-10 recommendations for each evaluated user in `reports/user_<id>_recommendations.csv`.
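A sketch of what the required function might look like. It assumes `model` carries a full predicted-rating matrix plus lookup tables; the dict layout below is purely illustrative — adapt it to however your SVD/PMF classes expose predictions:

```python
import numpy as np
import pandas as pd

def generate_recommendations(user_id, model, top_n=10):
    """Return the top-n unseen movies for `user_id`, best first."""
    row = model["predictions"][model["user_index"][user_id]]
    scores = pd.Series(row, index=model["movie_ids"])
    # Exclude movies the user has already rated.
    already_rated = model["rated"].get(user_id, set())
    scores = scores.drop(labels=list(already_rated), errors="ignore")
    return scores.sort_values(ascending=False).head(top_n)

# Toy usage: user 7 has already rated movie 20, so it is filtered out.
model = {
    "predictions": np.array([[3.2, 4.8, 1.5]]),
    "user_index": {7: 0},
    "movie_ids": [10, 20, 30],
    "rated": {7: {20}},
}
top = generate_recommendations(7, model, top_n=2)
# top.to_csv("reports/user_7_recommendations.csv")
```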

#### Analysis and Visualization

1. Provide visualizations comparing SVD and PMF predictions for the same user.
2. Offer insights into how the models differ in recommending movies for specific users based on their ratings history.
3. Save the following plots under `reports/`:
   - `user_comparison.png` — SVD vs PMF predictions for a selected user
   - `top_recommendations.png` — Histogram (or bar chart) of top recommended movies

#### Streamlit Dashboard

@@ -52,6 +90,7 @@ The goal of this project is to understand and apply advanced matrix factorization

   - Movie recommendations from both the **SVD** and **PMF** models.
   - Visual comparison of the SVD vs. PMF predictions for the user.
2. Ensure real-time interaction, with recommendations and visualizations updating dynamically based on user input.
3. The app must run successfully via: `streamlit run app.py`

### Project Repository Structure

@@ -72,6 +111,15 @@ matrix-factorization-project/
│   ├── matrix_creation.py
│   ├── recommendation.py

├── reports/
│   ├── model_metrics.json
│   ├── pmf_convergence.png
│   ├── rmse_comparison.png
│   ├── predicted_vs_actual.png
│   ├── user_comparison.png
│   ├── top_recommendations.png
│   └── user_<id>_recommendations.csv

├── app.py
├── requirements.txt
├── Movie_Recommender_System.ipynb
@@ -85,20 +133,6 @@ matrix-factorization-project/
- **Movie_Recommender_System.ipynb**: A notebook for initial experiments, data exploration, and visualization of the model training and recommendations.
- **README.md**: Project documentation with an overview of the recommender system, instructions for setup and running the dashboard, and additional resources.

-### Timeline (1-2 weeks)
-
-**Week 1:**
-
-- **Days 1-2:** Load and preprocess the dataset, create user-item interaction matrix.
-- **Days 3-4:** Implement and train the SVD model.
-- **Days 5-7:** Implement and train the PMF model, visualize MSE vs. iterations for PMF.
-
-**Week 2:**
-
-- **Days 1-2:** Compare SVD and PMF models, evaluate using MSE.
-- **Days 3-4:** Implement recommendation generation for both models.
-- **Days 5-7:** Build the Streamlit dashboard, create visualizations, and finalize the project.
-
### Tips

Remember, a great recommender system needs to understand both the users and the content. Keep in mind the trade-off between model complexity and interpretability. Here are some additional considerations:

subjects/ai/matrix-factorization/audit/README.md

Lines changed: 57 additions & 0 deletions
@@ -8,6 +8,14 @@

###### Is there a `requirements.txt` or `environment.yml` file listing all necessary libraries and their versions?

###### Do the core files exist: `app.py`, `models/svd_model.py`, `models/pmf_model.py`, and `utils/recommendation.py`?

###### Do the main dependencies import without error?

```bash
python -c "import numpy, pandas, scipy, streamlit, matplotlib"
```

##### Data Processing and Exploratory Data Analysis

###### Is there an exploratory data analysis notebook describing insights from the MovieLens dataset?
@@ -16,6 +24,10 @@

###### Has a user-item interaction matrix been created from the data?

###### Was a reproducible split used (e.g., `random_state = 42`)?

###### Does the normalized user–item matrix exist at `processed/user_item_matrix.csv`?

##### Matrix Factorization Models

###### Has the Singular Value Decomposition (SVD) model been implemented using scipy.sparse.linalg.svds?
@@ -24,6 +36,12 @@

###### Have both models been trained on the MovieLens dataset?

###### Is the SVD predicted rating matrix saved as `reports/svd_predictions.npy`?

###### Does the PMF implementation save a convergence plot (`reports/pmf_convergence.png`)?

###### Are the learned factor matrices (`U`, `V`) saved (e.g., under `reports/pmf_factors/`)?

##### Model Evaluation

###### Is the Root Mean Square Error (RMSE) calculated for both models on a test set?
@@ -36,12 +54,38 @@

###### Is there a justification for when to stop training based on the learning curves?

###### Does `reports/model_metrics.json` exist with the following fields?

```json
{
  "SVD_RMSE": ...,
  "PMF_RMSE": ...,
  "PMF_vs_SVD_improvement_%": ...
}
```

###### Are the following thresholds met?

- SVD RMSE ≤ 0.90
- PMF RMSE ≤ 0.85
- PMF improvement ≥ 5%

###### Are the plots `reports/rmse_comparison.png` and `reports/predicted_vs_actual.png` saved?

##### Recommendation Generation

###### Is there a function that generates movie recommendations for a user based on both SVD and PMF models?

###### Does the recommendation system return the top 10 movie recommendations for a given user?

###### Does `utils/recommendation.py` expose the following function?

```python
def generate_recommendations(user_id, model, top_n=10):
    ...
```

###### Are user-level outputs saved as `reports/user_<id>_recommendations.csv`?

##### Model Interpretability

###### Is there an analysis of the key latent factors that drive recommendations (global interpretability)?
@@ -58,12 +102,25 @@

###### For the 2 users from the training set, is there an analysis of why the recommendations were accurate for one and less accurate for the other?

###### Are the required visuals present in `reports/` with proper titles and labeled axes?

- `pmf_convergence.png`
- `rmse_comparison.png`
- `predicted_vs_actual.png`
- `user_comparison.png`

##### Streamlit Dashboard

###### Has a Streamlit dashboard been implemented?

###### Does the dashboard take a user ID as input and return recommendations and required visualizations?

###### Does `streamlit run app.py` launch the dashboard successfully?

###### Does the dashboard update recommendations dynamically on user ID input?

###### Does it handle invalid user IDs gracefully (error shown, no crash)?

##### Additional Considerations

###### Is the code well-documented, and does it follow these good coding practices:

subjects/ai/vision-track/README.md

Lines changed: 78 additions & 1 deletion
@@ -84,6 +84,74 @@ The primary goal of **VisionTrack** is to develop practical skills in building a

- Evaluate the app's performance with multi-stream support using metrics like **precision**, **recall**, and **F1-score**.
- Display performance analysis within the app to inform users of the detection and tracking accuracy.

#### Validation

To ensure project completeness and audit validation, include the following:

1. **Model Artifacts**
   - Save all trained and optimized YOLO model weights in:

     ```
     models/checkpoints/
     ├── best.pt
     ├── best_quantized.onnx
     └── config.yaml
     ```

   - Include logs or configuration files documenting training and optimization steps.

2. **Evaluation Metrics**
   - Generate and save a report file: `reports/performance_metrics.json`
   - Example format:

     ```json
     {
       "detection_precision": 0.92,
       "detection_recall": 0.9,
       "f1_score": 0.91,
       "average_fps_per_stream": 18.5,
       "average_latency_ms": 85.0
     }
     ```

   - Minimum passing thresholds:
     - Precision ≥ 0.85
     - Recall ≥ 0.80
     - F1-score ≥ 0.85
     - Average FPS ≥ 15 (for 720p video)

3. **Real-Time App Test**
   - The app must run using:

     ```
     streamlit run app.py
     ```

   - The app should:
     - Display real-time detection overlays and FPS/latency counters.
     - Allow toggling of detection and tracking features per stream.
     - Handle missing or broken video sources gracefully.

4. **ROI Counting Validation**
   - Demonstrate ROI-based counting of people entering/exiting the region.
   - Save examples in:

     ```
     reports/demo_results/
     ├── roi_counting_example.png
     └── multi_stream_demo.mp4
     ```
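Entry/exit counting against an ROI reduces to an inside-test on consecutive track positions. A minimal sketch with a rectangular ROI (real ROIs may be polygons, e.g. tested with `cv2.pointPolygonTest`; coordinates below are illustrative):

```python
def inside(point, box):
    """True if (x, y) lies within the axis-aligned ROI box (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def crossing(prev_point, curr_point, box):
    """+1 when a track enters the ROI, -1 when it exits, 0 otherwise."""
    return int(inside(curr_point, box)) - int(inside(prev_point, box))

# Toy track: a person walks into the ROI between two frames.
roi = (100, 100, 400, 300)
entries = exits = 0
delta = crossing((50, 200), (150, 200), roi)
if delta > 0:
    entries += 1
elif delta < 0:
    exits += 1
```

Applying `crossing` per tracked ID per frame pair keeps the counters stable even when detections flicker, as long as the tracker IDs are consistent.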

5. **GPU and Fallback Test**
   - Check for CUDA availability in your code:

     ```python
     import torch
     print("Using CUDA:", torch.cuda.is_available())
     ```

   - The app must still run on CPU if CUDA is unavailable (with lower FPS).

6. **Error Handling**
   - The app must not crash on missing files or failed streams.
   - Log errors to:

     ```
     logs/app_errors.log
     ```

### Project Repository Structure

```
@@ -97,15 +165,24 @@ vision-track/
├── models/
│   ├── yolo_person_detection.py
│   ├── __init__.py
│   └── checkpoints/
│       ├── best.pt
│       ├── best_quantized.onnx
│       └── config.yaml

├── utils/
│   ├── data_loader.py
│   ├── preprocessing.py
│   ├── multi_stream_tracking_helpers.py
│   ├── counting_logic.py
│   ├── VisionTrack_Analysis.ipynb
│   └── __init__.py

├── reports/demo_results/
│   ├── roi_counting_example.png
│   └── multi_stream_demo.mp4

├── app.py
├── README.md              # Project overview and setup instructions
└── requirements.txt       # List of dependencies
```

subjects/ai/vision-track/audit/README.md

Lines changed: 39 additions & 0 deletions
@@ -8,6 +8,8 @@

###### Is a `requirements.txt` file included with all dependencies and specific library versions required to run the project?

###### Do the main dependencies import without error? `python -c "import torch, supervision, cv2, streamlit"`

##### Data Processing and Exploratory Data Analysis

###### Does the Jupyter notebook (`VisionTrack_Analysis.ipynb`) include EDA showcasing data distribution, object detection samples, and preprocessing methods?
@@ -16,6 +18,9 @@

###### Does data preprocessing include resizing and normalization, ensuring compatibility with YOLO model input formats?

- Validate YOLO-compatible annotations (`.txt` files with class, x, y, w, h).
- Confirm frames are resized and normalized properly before inference.

##### Model Implementation

###### Is the YOLO model implemented for person detection with configuration options for detection thresholds and class-specific tuning?
@@ -32,6 +37,8 @@

###### Does the project include logic for tracking and counting entries and exits within specified regions of interest (ROIs)?

###### Are trained weights saved in `models/checkpoints/best.pt`?

##### Streamlit App Development

###### Is the **Streamlit** app implemented to display video feeds with overlaid detection, tracking, and counting information?
@@ -56,6 +63,38 @@

###### Are evaluation metrics presented, showcasing precision, recall, and F1-score to assess the effectiveness of detection and tracking?

###### Check the following:

- The metrics file `reports/performance_metrics.json` exists.
- The JSON includes:

  ```json
  {
    "detection_precision": ...,
    "detection_recall": ...,
    "f1_score": ...,
    "average_fps_per_stream": ...,
    "average_latency_ms": ...
  }
  ```

- Minimum thresholds are met:
  - Precision ≥ 0.85
  - Recall ≥ 0.80
  - F1 ≥ 0.85
  - FPS ≥ 15 (720p)
- Metrics are visible in the Streamlit dashboard (FPS and latency shown live).

##### Additional Considerations

###### Is the codebase documented with comments and explanations for readability and maintainability?

subjects/guess-it-1/README.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Each of the numbers will be your standard input and the purpose of your program

This range should have a space separating the lower limit from the upper one like in the example:

```console
->$ ./your_program
+$ ./your_program
189 --> the standard input
120 200 --> the range for the next input, in this case for the number 113
113 --> the standard input
