# Technical Roadmap: `Production-Ready MLOps Workflow`

This document outlines the strategic technical progression from a basic ML pipeline to a comprehensive **Machine Learning Engineering** system, focusing on robustness, observability, and 2026 AI trends.

## Tech Stack 2026

| Category | Tool / Technology | Purpose | Implementation |
| --- | --- | --- | --- |
| **Backend API** | **FastAPI** | High-performance asynchronous inference and OpenAPI docs. | Phase 1 |
| **Frontend UI** | **Streamlit** | Interactive dashboard for prediction visualization. | Phase 1 |
| **ML Lifecycle** | **MLflow** | Experiment tracking, model registry, and lineage. | Phase 2 |
| **Validation** | **Pydantic v2** | Strict type validation and statistical range checks. | Phases 1-2 |
| **Observability** | **Evidently AI** | Data drift monitoring and production model quality. | Phase 3 |
| **Trustworthy AI** | **SHAP / LIME** | Model explainability (XAI) to mitigate overfitting. | Phase 3 |
| **Agentic AI** | **PydanticAI / LangChain** | AI agent to interpret simulated medical results. | Phase 4 |
| **Tooling (DX)** | **uv / Ruff / MyPy** | Ultra-fast Rust-based dev stack and strict typing. | **All Phases** |
| **DevOps** | **Docker / GH Actions** | Containerization and CI/CD pipelines with quality gates. | **All Phases** |

---

## 🟢 Phase 1: API Modernization & Robustness

**Goal:** Rebuild the technical foundation on modern 2026 standards and ensure the model never processes invalid data.

### Backend & API
* **FastAPI Migration:** Replace Flask to gain automatic validation, asynchronous support, and interactive documentation (`/docs`).
* **Data Guardrails:** Implement `Pydantic` validators that verify inputs fall within the statistical ranges of the original dataset (active overfitting mitigation); see the sketch below.
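
A minimal sketch of the guardrail pattern, assuming two illustrative features (`mean_radius`, `mean_texture`) with hypothetical bounds; the real ranges should be computed from the training set:

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Tumor Classifier API")

class PatientFeatures(BaseModel):
    # Bounds are illustrative; derive the real min/max from the training data.
    mean_radius: float = Field(ge=6.0, le=30.0)
    mean_texture: float = Field(ge=9.0, le=40.0)

@app.post("/predict")
async def predict(features: PatientFeatures) -> dict[str, str]:
    # FastAPI rejects out-of-range payloads with a 422 before this body runs,
    # so the model never sees inputs outside the known distribution.
    label = "Malignant" if features.mean_radius > 15.2 else "Benign"  # placeholder rule
    return {"prediction": label}
```

Out-of-range requests never reach the model; the 15.2 threshold is only a stand-in for the real inference call.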

### Tooling & Quality
* **Strict Typing:** Configure `MyPy` in strict mode so the data flow between preprocessing and inference is fully type-safe (a minimal typed interface follows this list).
* **Fast Linting:** Keep `Ruff` in place to enforce PEP 8 standards and maintain code quality.
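
As a sketch of what strict typing buys at the preprocessing/inference boundary, assuming a hypothetical two-feature layout; `mypy --strict` verifies the dictionary contract and array dtypes end to end:

```python
import numpy as np
import numpy.typing as npt

FEATURE_ORDER: tuple[str, ...] = ("mean_radius", "mean_texture")  # assumed feature set

def preprocess(raw: dict[str, float]) -> npt.NDArray[np.float64]:
    """Arrange raw features into the column order the model expects."""
    return np.array([raw[name] for name in FEATURE_ORDER], dtype=np.float64)

def predict_label(x: npt.NDArray[np.float64]) -> str:
    """Placeholder decision rule standing in for the real model call."""
    return "Malignant" if float(x[0]) > 15.2 else "Benign"
```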

---

## 🟡 Phase 2: Lifecycle Management (MLOps Core)

**Goal:** Transition from ad-hoc training scripts to professional experiment management.

### ML Lifecycle
* **Experiment Tracking (MLflow):** Record every model run, comparing training vs. validation metrics to visually identify overfitting; see the tracking sketch below.
* **Model Registry:** Implement a promotion workflow where only models that pass robustness tests are served by the API.
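
A minimal tracking sketch, using scikit-learn's built-in breast-cancer data as a stand-in for the project's `data.csv`; the experiment name and hyperparameters are assumptions:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("breast-cancer-classifier")  # assumed experiment name

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
    mlflow.log_param("n_estimators", 200)
    # Logging both metrics side by side makes the train/validation gap
    # (the overfitting signal) visible in the MLflow UI.
    mlflow.log_metric("train_accuracy", model.score(X_train, y_train))
    mlflow.log_metric("val_accuracy", model.score(X_val, y_val))
    mlflow.sklearn.log_model(model, "model")
    # Registry promotion (requires a database-backed tracking server)
    # should happen only after the run passes the robustness suite.
```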

### Data Engineering
* **DVC (Data Version Control):** Version the `data.csv` dataset so every prediction can be traced back to the exact data used for training (a retrieval sketch follows).
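
A sketch of consuming a pinned dataset revision through DVC's Python API; the tag name is hypothetical:

```python
import dvc.api

# Read the exact data.csv revision a given model was trained on.
# "v1.0-train" is a placeholder Git tag pointing at that training commit.
csv_text = dvc.api.read("data.csv", rev="v1.0-train")
print(csv_text[:200])
```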

---

## 🟠 Phase 3: Observability & Explainability (XAI)

**Goal:** Demonstrate that the system can detect failures and explain its decisions.

### Monitoring
* **Drift Detection (Evidently AI):** Create a pipeline that compares real-time input data with the training data; if a "patient" differs significantly from the known distribution, the system raises a "Low Confidence" alert (see the drift-check sketch below).
* **Structured Logging:** Emit JSON-format logs for compatibility with modern observability stacks (a stdlib-only formatter follows).
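
A drift-check sketch assuming Evidently's pre-0.5 `Report` API (preset names and result layout have shifted across releases, so verify against the installed version); the file paths are placeholders:

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference_df = pd.read_csv("data/train_reference.csv")   # placeholder paths
current_df = pd.read_csv("data/inference_window.csv")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# Look for the dataset-level drift flag in the serialized results;
# the exact key layout may differ in newer Evidently releases.
drifted = any(
    m["result"].get("dataset_drift", False)
    for m in report.as_dict()["metrics"]
    if isinstance(m["result"], dict)
)
if drifted:
    print("Low Confidence: inputs have drifted from the training distribution")
```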
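
And a stdlib-only JSON formatter as a minimal structured-logging sketch (a production stack would typically reach for `structlog` or `python-json-logger` instead):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("inference")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("prediction served")  # machine-parseable by any log aggregator
```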

### Trustworthy AI
* **Interpretability (SHAP):** Add a layer that returns *why* "Malignant" was predicted (e.g., "Mean Radius > 15.2"). This turns an overfitted model into an auditable tool; see the SHAP sketch below.
* **Robustness Tests:** A `Pytest` suite that injects random noise into inputs to verify the model's predictions stay consistent (example test below).
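
A SHAP sketch using a binary gradient-boosted model, which yields one SHAP value per feature and keeps the output shape simple; with multi-output models such as random forests, the return shape varies by SHAP version:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=42).fit(X, y)

# TreeExplainer is exact and fast for tree ensembles.
explainer = shap.TreeExplainer(model)
values = explainer.shap_values(X.iloc[:1])[0]

# Rank features by their contribution to this single prediction.
top = sorted(zip(X.columns, values), key=lambda p: abs(p[1]), reverse=True)[:3]
for feature, contribution in top:
    print(f"{feature}: {contribution:+.3f}")
```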
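
And a noise-injection test sketch; the 1% noise level and the trained-on-the-spot model are illustrative stand-ins for the project's real artifacts:

```python
import numpy as np
import pytest
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

@pytest.fixture(scope="module")
def model_and_samples():
    X, y = load_breast_cancer(return_X_y=True)
    return GradientBoostingClassifier(random_state=42).fit(X, y), X[:5]

def test_prediction_stable_under_noise(model_and_samples):
    """1% multiplicative Gaussian noise should not flip any predicted class."""
    model, samples = model_and_samples
    rng = np.random.default_rng(seed=0)
    baseline = model.predict(samples)
    for _ in range(10):
        noisy = samples * (1 + rng.normal(0.0, 0.01, size=samples.shape))
        assert (model.predict(noisy) == baseline).all()
```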

---

## 🔵 Phase 4: Connected Intelligence (2026 Trend)

**Goal:** Integrate the predictive model into a generative-AI and full-automation workflow.

### Agentic Workflows
* **AI-Agent Reporting:** Implement an LLM agent that takes the technical output (prediction + SHAP values) and generates a narrative summary for the user in Streamlit; see the agent sketch below.
* **Automated Retraining Loop:** Simulate a workflow where, upon detecting severe drift, the system automatically triggers a GitHub Actions job to retrain the model (dispatch sketch below).
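
A reporting-agent sketch assuming PydanticAI's `Agent` interface (recent releases expose the result as `.output`; older ones used `.data`) with an OpenAI model name as a placeholder:

```python
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",  # placeholder model; any supported provider works
    system_prompt=(
        "You are a reporting assistant. Summarize a tumor-classifier "
        "prediction and its SHAP drivers in plain language for a dashboard. "
        "Do not give medical advice."
    ),
)

result = agent.run_sync(
    "Prediction: Malignant (p=0.91). Top SHAP drivers: mean radius +0.42, "
    "worst concave points +0.31, mean texture +0.12."
)
print(result.output)  # narrative summary to render in Streamlit
```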
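
And a retraining-trigger sketch using GitHub's `workflow_dispatch` REST endpoint; the repository coordinates and `retrain.yml` workflow file are placeholders, and the target workflow must declare `on: workflow_dispatch`:

```python
import os
import requests

def trigger_retraining(owner: str = "your-org", repo: str = "your-repo") -> None:
    """Fire the retraining workflow when severe drift is detected."""
    response = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}"
        "/actions/workflows/retrain.yml/dispatches",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"ref": "main"},
        timeout=10,
    )
    response.raise_for_status()  # GitHub answers 204 No Content on success
```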

### Infrastructure & Deployment
* **Cloud-Native Deployment:** Final deployment on scalable infrastructure (Kubernetes or serverless containers) with integrated health monitoring (a minimal health endpoint follows this list).
    * **Container Orchestration:** Use `Docker Compose` for local multi-service orchestration (API + UI) and prepare for production-grade scaling.
    * **AWS Ecosystem (Optional):** Leverage **AWS App Runner** or **ECS (Elastic Container Service)** for managed container execution, ensuring high availability and auto-scaling without manual server management.
    * **Infrastructure as Code (IaC):** Define environment configuration in code (e.g., `render.yaml` or Docker specifications) to guarantee parity between development and production.
    * **Automated CI/CD:** Establish a "Quality Gate" pipeline where code is deployed only if all tests (Pytest), linting (Ruff), and type checks (MyPy) pass, enabling a reliable, zero-downtime deployment flow.
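
A minimal health-endpoint sketch that the orchestrator's checks can poll; extending it with model-readiness details is a natural next step:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health() -> dict[str, str]:
    # Kubernetes, ECS, and App Runner poll this route to decide whether
    # the container should keep receiving traffic.
    return {"status": "ok"}
```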