Commit 23aaa7b

Revise README for clarity and updated information
Updated personal introduction, technical skills, research interests, and project details in README.
1 parent a924bdf commit 23aaa7b

1 file changed: +66 -77 lines changed


README.md

Lines changed: 66 additions & 77 deletions
@@ -1,36 +1,34 @@
-## Hi there, I’m Diogo Ribeiro 👋
-**Senior Data Scientist • Mathematician • based between the United Kingdom and Portugal**
+## Hi there, I’m Diogo Ribeiro
+
+Senior Data Scientist • Mathematician • based between the United Kingdom and Portugal
 
 > “Knowledge is knowing a tomato is a fruit; wisdom is not putting it in a fruit salad.”
 > — Miles Kington
 
-<p align="center">
-<a href="https://user-badge.committers.top/portugal_private/DiogoRibeiro7">
-<img src="https://user-badge.committers.top/portugal_private/DiogoRibeiro7.svg" alt="committers.top badge"/>
-</a>
-</p>
+[![Committers Top](https://user-badge.committers.top/portugal/diogoribeiro7.svg)](https://user-badge.committers.top/portugal/diogoribeiro7)
 
-I build production systems that turn messy data into decisions. Two decades across logistics, health, and engineering taught me the value of lean models, clean code, and reproducible pipelines. Lately I’ve been shipping NLP and statistical modelling that helps teams reason about text and time series in real time.
+I build production systems that turn messy data into decisions.
+Across logistics, health and engineering I’ve focused on lean models, clean code, and reproducible pipelines.
+Lately I’ve been shipping **sensor analytics, survival data tools, drift / anomaly detection**, and **LLM-powered reporting** that help teams reason about time series and text in real time.
 
-<p align="center">
-<img src="data_has_a_better_idea.png"
-alt="Poster with the phrase 'Data has a better idea'"
-title="Data has a better idea"
-width="75%" />
-</p>
+![Data has a better idea](data_has_a_better_idea.png)
 
 ---
 
 ## 🧠 Areas of Expertise
 
 - **Machine Learning**
 Supervised & unsupervised learning, anomaly detection, time-series forecasting, optimisation.
+
 - **Graph & Network Analysis**
-Social/interaction networks, graph theory, dynamic metrics, community structure.
+Social / interaction networks, graph theory, dynamic metrics, community structure.
+
 - **Big Data Analytics**
 Pattern discovery in marketing, logistics, and urban systems (structured & unstructured data).
+
 - **Mathematical Modelling**
 Differential equations, statistical inference, numerical methods for complex systems.
+
 - **Sustainability & Urban Systems**
 Energy optimisation, smart environments, traffic prediction.
 
@@ -39,105 +37,96 @@ I build production systems that turn messy data into decisions. Two decades acro
 ## 🛠️ Technical Skills
 
 - **Programming** — Python (typed, NumPy-first), SQL, R, TypeScript, Bash/Zsh, C, Fortran
-- **ML / Data** — NumPy, Pandas, Polars, FireDucks; scikit-learn, XGBoost/LightGBM; PyTorch, TensorFlow; Statsmodels
-_Focus:_ time series, anomaly detection, GLMs/IRLS, robust statistics
-- **Data Eng & Streaming** — Apache Kafka, Flink, Spark, Databricks; Arrow/Parquet; Apache Iceberg (lakehouse)
+- **ML / Data** — NumPy, Pandas, Polars, FireDucks; scikit-learn, XGBoost / LightGBM; PyTorch, TensorFlow; statsmodels
+Focus: **time series**, **anomaly detection**, **GLMs / IRLS**, **robust statistics**, **survival / event-time data**
+- **Data Eng & Streaming** — Apache Kafka, Flink, Spark, Databricks; Arrow / Parquet; Apache Iceberg (lakehouse)
 - **Cloud & Storage** — AWS S3, DynamoDB; PostgreSQL, MySQL, SQLite; MongoDB, InfluxDB
 - **DevEx & CI/CD** — Docker; GitHub Actions, Jenkins; Poetry; pre-commit (ruff, mypy, pytest-cov); semantic versioning
 - **Testing & Quality** — pytest, coverage, property-based tests (hypothesis); static typing; security linting (bandit)
 
 ---
 
-## 🔭 Research Interests
+## 🔍 Research Interests
 
-- **Health Data Science** — real-time analytics from wearables/sensors, personalised baselines, clinical interpretability
-- **Graph Theory & Social Networks** — interaction graphs, diffusion/contagion models, community & role discovery
+- **Health Data Science** — real-time analytics from wearables / sensors, personalised baselines, clinical interpretability
+- **Graph Theory & Social Networks** — interaction graphs, diffusion / contagion models, community & role discovery
 - **Big Data & Marketing Analytics** — uplift modelling, sequence-aware attribution, lifetime value with drift control
 - **Sustainability & Energy Systems** — demand forecasting, optimisation under constraints, carbon-aware scheduling
 - **Smart Environments & Sensor Networks** — multimodal fusion (RSSI + activations), localisation, reliability modelling
 - **Behavioural & Labour Economics** — micro-behavioural patterns, incentive effects, heterogeneity and fairness
-- **Inequality & Sustainable Development** — distributional metrics, policy simulation, causal and counterfactual analysis
+- **Inequality & Sustainable Development** — distributional metrics, policy simulation, causal and counterfactual analysis
 
-> **Now:** real-time anomaly detection; Bayesian filtering/HMMs for indoor localisation; robust regression & GLMs (IRLS); LLM-assisted reporting with audit trails; **abx-next** (modern A/B experimentation utilities).
+> Now: real-time anomaly detection; Bayesian filtering / HMMs for indoor localisation; robust regression & GLMs (IRLS); LLM-assisted reporting with audit trails; survival data generators and drift-aware evaluation.
 
 ---
 
-## 📌 Pinned Projects
+## 📌 Pinned Projects (selection)
 
-- **abx-next** — A/B experimentation utilities: CUPED/CUPAC hooks, triggered analysis, SRM guardrails, switchback helpers, and power simulations.
-👉 [repo](https://github.com/DiogoRibeiro7/abx-next)
+- **abx-next** — A/B experimentation utilities: CUPED / CUPAC hooks, triggered analysis, SRM guardrails, switchback helpers, and power simulations.
+- **genSurvPy** — Survival-data generators (AFT / CPHM, censored data), reproducible simulations, and validation utilities.
+- **smart-todo-action** — GitHub Action that extracts TODOs, groups by semantic labels / tags / metadata, and opens issues / changelogs.
+- **navier-stokes-solvers** — CFD solvers for the 2D / 3D Navier–Stokes equations (finite-difference & spectral variants), with buildable CLI targets and basic tests.
+- **heavytails** — Utilities for heavy-tailed modelling and inference (tail index estimation, Pareto-like fits, EVT diagnostics).
 
-- **genSurvPy** — Survival-data generators (AFT/CPHM, censored data), reproducible simulations, and validation utilities.
-👉 [repo](https://github.com/DiogoRibeiro7/genSurvPy)
+*(I also work on outlier detection, volatility, genetic algorithms, and drift libraries in other repos — some still private.)*
 
-- **smart-todo-action** — GitHub Action that extracts TODOs, groups by semantic labels/tags/metadata, and opens issues/changelogs.
-👉 [repo](https://github.com/DiogoRibeiro7/smart-todo-action)
+---
 
-- **navier-stokes-solvers** — CFD solvers for the 2D/3D Navier–Stokes equations (finite-difference & spectral variants), with buildable CLI targets and basic tests.
-👉 [repo](https://github.com/DiogoRibeiro7/navier-stokes-solvers)
+## 🎓 Publications / Teaching
 
-- **heavytails** — Utilities for heavy-tailed modelling and inference (tail index estimation, Pareto-like fits, EVT diagnostics).
-👉 [repo](https://github.com/DiogoRibeiro7/heavytails)
+### Teaching @ESMAD
 
----
+- **Introduction to Logic & Set Theory** (First Semester, 15 weeks)
+Logic (prop / FO), sets, induction, differential & integral calculus; notes + LaTeX.
 
-## 📚 Publications / Teaching
+- **Linear Algebra** (Second Semester, 15 weeks)
+Vector spaces and linear maps; matrices and determinants; eigenvalues / eigenvectors, diagonalisation; orthogonality, projections, Gram–Schmidt; least squares; SVD and PCA; numerical stability & conditioning; applications to optimisation and data science.
+Syllabus: link · Slides (Beamer): link
 
-### Teaching @ESMAD
-- **Introduction to Logic & Set Theory (First Semester, 15 weeks)** — Logic (prop/FO), sets, induction, **differential & integral calculus**; notes + LaTeX.
-- **Linear Algebra (Second Semester, 15 weeks)** — Vector spaces and linear maps; matrices and determinants; eigenvalues/eigenvectors, diagonalisation; orthogonality, projections, Gram–Schmidt; least squares; **SVD and PCA**; numerical stability & conditioning; applications to optimisation and data science.
-Syllabus: _link_ · Slides (Beamer): _link_
-- **NLP & LLM mini-workshops** — Prompt design, evals, lightweight retrieval, and report generation with structured → narrative transforms.
+- **NLP & LLM mini-workshops**
+Prompt design, evals, lightweight retrieval, and report generation with structured → narrative transforms.
 
 ### Seminars & Workshops
-- **Data Science Seminars** — End-to-end ML pipelines, feature engineering for time series, evaluation under drift, MLOps (CI/CD, data/versioning), and reproducible research practices.
-Slides: _link_ · Notebooks: _link_
-- **Sensors & Dashboards** — IoT data ingestion (MQTT/Kafka), time-series storage (InfluxDB/Parquet), streaming analytics (Flink), and dashboards (Grafana/Plotly/Dash) with alerting & anomaly detection.
-Slides: _link_ · Demo repo: _link_
-- **Applications of Matrices to Computational Graphics** — Linear transforms in 2D/3D, homogeneous coordinates, rotations (Euler vs. quaternions), camera models & projections, shading basics; **SVD/PCA** for geometry processing.
-Slides: _link_ · Code samples: _link_
+
+- **Data Science Seminars**
+End-to-end ML pipelines, feature engineering for time series, evaluation under drift, MLOps (CI/CD, data / versioning), and reproducible research practices.
+Slides: link · Notebooks: link
+
+- **Sensors & Dashboards**
+IoT data ingestion (MQTT / Kafka), time-series storage (InfluxDB / Parquet), streaming analytics (Flink), and dashboards (Grafana / Plotly / Dash) with alerting & anomaly detection.
+Slides: link · Demo repo: link
+
+- **Applications of Matrices to Computational Graphics**
+Linear transforms in 2D / 3D, homogeneous coordinates, rotations (Euler vs. quaternions), camera models & projections, shading basics; SVD / PCA for geometry processing.
+Slides: link · Code samples: link
 
 ### Selected Writings / Demos
-- **Streaming analytics with Iceberg + Flink + DynamoDB** — Architecture notes and example pipelines.
-- **Robust regression with IRLS** — ψ-functions, influence diagnostics, and uncertainty reporting.
-- **Time-series anomaly detection** — EWMA variants, adaptive σ, and change-point alerts for sensors.
+
+- Streaming analytics with Iceberg + Flink + DynamoDB — architecture notes and example pipelines.
+- Robust regression with IRLS — ψ-functions, influence diagnostics, and uncertainty reporting.
+- Time-series anomaly detection — EWMA variants, adaptive σ, and change-point alerts for sensors.
 
 ---
 
-## 🌟 Highlights
+## Highlights
 
 - Interdisciplinary approach spanning computer science, mathematics, economics, and natural sciences.
-- Practical projects in **IoT**, automation, and environmental monitoring (Raspberry Pi + sensors).
-- Ongoing work in ML for time series, anomaly detection, and robust statistical modelling.
+- Practical projects in IoT, automation, and environmental monitoring (Raspberry Pi + sensors).
+- Ongoing work in ML for time series, anomaly detection, survival analysis, and robust statistical modelling.
 
 ---
 
-## 📊 GitHub Stats
+## 📊 GitHub Stats & Trophies
+
+[![trophy](https://github-profile-trophy.vercel.app/?username=DiogoRibeiro7&theme=onedark&margin-w=5&margin-h=5)](https://github.com/ryo-ma/github-profile-trophy)
 
-<div align="center">
-<a href="https://github.com/ryo-ma/github-profile-trophy">
-<img src="https://stable-github-profile-trophy.vercel.app/?username=DiogoRibeiro7&column=3&no-frame=true&theme=algolia" alt="Trophy" />
-</a>
-</div>
+![GitHub Metrics](./github-metrics.svg)
 
 ---
 
-## 📈 Let’s Connect and Collaborate
-
-Thanks for visiting! I’m keen to partner with data enthusiasts, researchers, and product teams. Browse my projects or get in touch—happy to explore ideas and build useful things together.
-
-<div align="center">
-<a href="https://medium.com/@neverforget-1975">
-<img src="https://img.shields.io/badge/Medium-12100E?style=for-the-badge&logo=medium&logoColor=white" alt="Medium" />
-</a>
-<a href="https://dev.to/diogoribeiro7">
-<img src="https://img.shields.io/badge/dev.to-0A0A0A?style=for-the-badge&logo=dev.to&logoColor=white" alt="Dev.to" />
-</a>
-<a href="https://www.linkedin.com/in/diogo-ribeiro-9094604a/">
-<img src="https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" />
-</a>
-<a href="mailto:[email protected]">
-<img src="https://img.shields.io/badge/Gmail-D14836?logo=gmail&logoColor=white" alt="Email">
-</a>
-</div>
+## 🤝 Let’s Connect and Collaborate
+
+Thanks for visiting! I’m keen to partner with data enthusiasts, researchers, and product teams.
+Browse my projects or get in touch — happy to explore ideas and build useful things together.
 
+[Medium](https://medium.com) · [Dev.to](https://dev.to) · [LinkedIn](https://www.linkedin.com/in/diogo-ribeiro-9094604a) · ✉️
