Commit f6f2bc2

Revise README for better presentation and content
Updated README to enhance clarity and formatting.
1 parent 99ee348 commit f6f2bc2

1 file changed

+77
-66
lines changed

README.md

Lines changed: 77 additions & 66 deletions
@@ -1,34 +1,36 @@
-## Hi there, I’m Diogo Ribeiro
-
-Senior Data Scientist • Mathematician • based between the United Kingdom and Portugal
+## Hi there, I’m Diogo Ribeiro 👋
+**Senior Data Scientist • Mathematician • based between the United Kingdom and Portugal**

 > “Knowledge is knowing a tomato is a fruit; wisdom is not putting it in a fruit salad.”
 > — Miles Kington

-[![Committers Top](https://user-badge.committers.top/portugal/diogoribeiro7.svg)](https://user-badge.committers.top/portugal/diogoribeiro7)
+<p align="center">
+<a href="https://user-badge.committers.top/portugal_private/DiogoRibeiro7">
+<img src="https://user-badge.committers.top/portugal_private/DiogoRibeiro7.svg" alt="committers.top badge"/>
+</a>
+</p>

-I build production systems that turn messy data into decisions.
-Across logistics, health and engineering I’ve focused on lean models, clean code, and reproducible pipelines.
-Lately I’ve been shipping **sensor analytics, survival data tools, drift / anomaly detection**, and **LLM-powered reporting** that help teams reason about time series and text in real time.
+I build production systems that turn messy data into decisions. Two decades across logistics, health, and engineering taught me the value of lean models, clean code, and reproducible pipelines. Lately I’ve been shipping NLP and statistical modelling that helps teams reason about text and time series in real time.

-![Data has a better idea](data_has_a_better_idea.png)
+<p align="center">
+<img src="data_has_a_better_idea.png"
+alt="Poster with the phrase 'Data has a better idea'"
+title="Data has a better idea"
+width="75%" />
+</p>

 ---

 ## 🧠 Areas of Expertise

 - **Machine Learning**
 Supervised & unsupervised learning, anomaly detection, time-series forecasting, optimisation.
-
 - **Graph & Network Analysis**
-Social / interaction networks, graph theory, dynamic metrics, community structure.
-
+Social/interaction networks, graph theory, dynamic metrics, community structure.
 - **Big Data Analytics**
 Pattern discovery in marketing, logistics, and urban systems (structured & unstructured data).
-
 - **Mathematical Modelling**
 Differential equations, statistical inference, numerical methods for complex systems.
-
 - **Sustainability & Urban Systems**
 Energy optimisation, smart environments, traffic prediction.

@@ -37,96 +39,105 @@ Lately I’ve been shipping **sensor analytics, survival data tools, drift / ano
 ## 🛠️ Technical Skills

 - **Programming** — Python (typed, NumPy-first), SQL, R, TypeScript, Bash/Zsh, C, Fortran
-- **ML / Data** — NumPy, Pandas, Polars, FireDucks; scikit-learn, XGBoost / LightGBM; PyTorch, TensorFlow; statsmodels
-Focus: **time series**, **anomaly detection**, **GLMs / IRLS**, **robust statistics**, **survival / event-time data**
-- **Data Eng & Streaming** — Apache Kafka, Flink, Spark, Databricks; Arrow / Parquet; Apache Iceberg (lakehouse)
+- **ML / Data** — NumPy, Pandas, Polars, FireDucks; scikit-learn, XGBoost/LightGBM; PyTorch, TensorFlow; Statsmodels
+_Focus:_ time series, anomaly detection, GLMs/IRLS, robust statistics
+- **Data Eng & Streaming** — Apache Kafka, Flink, Spark, Databricks; Arrow/Parquet; Apache Iceberg (lakehouse)
 - **Cloud & Storage** — AWS S3, DynamoDB; PostgreSQL, MySQL, SQLite; MongoDB, InfluxDB
 - **DevEx & CI/CD** — Docker; GitHub Actions, Jenkins; Poetry; pre-commit (ruff, mypy, pytest-cov); semantic versioning
 - **Testing & Quality** — pytest, coverage, property-based tests (hypothesis); static typing; security linting (bandit)

 ---

-## 🔍 Research Interests
+## 🔭 Research Interests

-- **Health Data Science** — real-time analytics from wearables / sensors, personalised baselines, clinical interpretability
-- **Graph Theory & Social Networks** — interaction graphs, diffusion / contagion models, community & role discovery
+- **Health Data Science** — real-time analytics from wearables/sensors, personalised baselines, clinical interpretability
+- **Graph Theory & Social Networks** — interaction graphs, diffusion/contagion models, community & role discovery
 - **Big Data & Marketing Analytics** — uplift modelling, sequence-aware attribution, lifetime value with drift control
 - **Sustainability & Energy Systems** — demand forecasting, optimisation under constraints, carbon-aware scheduling
 - **Smart Environments & Sensor Networks** — multimodal fusion (RSSI + activations), localisation, reliability modelling
 - **Behavioural & Labour Economics** — micro-behavioural patterns, incentive effects, heterogeneity and fairness
-- **Inequality & Sustainable Development** — distributional metrics, policy simulation, causal and counterfactual analysis
+- **Inequality & Sustainable Development** — distributional metrics, policy simulation, causal and counterfactual analysis

-> Now: real-time anomaly detection; Bayesian filtering / HMMs for indoor localisation; robust regression & GLMs (IRLS); LLM-assisted reporting with audit trails; survival data generators and drift-aware evaluation.
+> **Now:** real-time anomaly detection; Bayesian filtering/HMMs for indoor localisation; robust regression & GLMs (IRLS); LLM-assisted reporting with audit trails; **abx-next** (modern A/B experimentation utilities).

 ---

-## 📌 Pinned Projects (selection)
+## 📌 Pinned Projects

-- **abx-next** — A/B experimentation utilities: CUPED / CUPAC hooks, triggered analysis, SRM guardrails, switchback helpers, and power simulations.
-- **genSurvPy** — Survival-data generators (AFT / CPHM, censored data), reproducible simulations, and validation utilities.
-- **smart-todo-action** — GitHub Action that extracts TODOs, groups by semantic labels / tags / metadata, and opens issues / changelogs.
-- **navier-stokes-solvers** — CFD solvers for the 2D / 3D Navier–Stokes equations (finite-difference & spectral variants), with buildable CLI targets and basic tests.
-- **heavytails** — Utilities for heavy-tailed modelling and inference (tail index estimation, Pareto-like fits, EVT diagnostics).
+- **abx-next** — A/B experimentation utilities: CUPED/CUPAC hooks, triggered analysis, SRM guardrails, switchback helpers, and power simulations.
+👉 [repo](https://github.com/DiogoRibeiro7/abx-next)

-*(I also work on outlier detection, volatility, genetic algorithms, and drift libraries in other repos — some still private.)*
+- **genSurvPy** — Survival-data generators (AFT/CPHM, censored data), reproducible simulations, and validation utilities.
+👉 [repo](https://github.com/DiogoRibeiro7/genSurvPy)

----
+- **smart-todo-action** — GitHub Action that extracts TODOs, groups by semantic labels/tags/metadata, and opens issues/changelogs.
+👉 [repo](https://github.com/DiogoRibeiro7/smart-todo-action)

-## 🎓 Publications / Teaching
+- **navier-stokes-solvers** — CFD solvers for the 2D/3D Navier–Stokes equations (finite-difference & spectral variants), with buildable CLI targets and basic tests.
+👉 [repo](https://github.com/DiogoRibeiro7/navier-stokes-solvers)

-### Teaching @ESMAD
+- **heavytails** — Utilities for heavy-tailed modelling and inference (tail index estimation, Pareto-like fits, EVT diagnostics).
+👉 [repo](https://github.com/DiogoRibeiro7/heavytails)

-- **Introduction to Logic & Set Theory** (First Semester, 15 weeks)
-Logic (prop / FO), sets, induction, differential & integral calculus; notes + LaTeX.
+---

-- **Linear Algebra** (Second Semester, 15 weeks)
-Vector spaces and linear maps; matrices and determinants; eigenvalues / eigenvectors, diagonalisation; orthogonality, projections, Gram–Schmidt; least squares; SVD and PCA; numerical stability & conditioning; applications to optimisation and data science.
-Syllabus: link · Slides (Beamer): link
+## 📚 Publications / Teaching

-- **NLP & LLM mini-workshops**
-Prompt design, evals, lightweight retrieval, and report generation with structured → narrative transforms.
+### Teaching @ESMAD
+- **Introduction to Logic & Set Theory (First Semester, 15 weeks)** — Logic (prop/FO), sets, induction, **differential & integral calculus**; notes + LaTeX.
+- **Linear Algebra (Second Semester, 15 weeks)** — Vector spaces and linear maps; matrices and determinants; eigenvalues/eigenvectors, diagonalisation; orthogonality, projections, Gram–Schmidt; least squares; **SVD and PCA**; numerical stability & conditioning; applications to optimisation and data science.
+Syllabus: _link_ · Slides (Beamer): _link_
+- **NLP & LLM mini-workshops** — Prompt design, evals, lightweight retrieval, and report generation with structured → narrative transforms.

 ### Seminars & Workshops
-
-- **Data Science Seminars**
-End-to-end ML pipelines, feature engineering for time series, evaluation under drift, MLOps (CI/CD, data / versioning), and reproducible research practices.
-Slides: link · Notebooks: link
-
-- **Sensors & Dashboards**
-IoT data ingestion (MQTT / Kafka), time-series storage (InfluxDB / Parquet), streaming analytics (Flink), and dashboards (Grafana / Plotly / Dash) with alerting & anomaly detection.
-Slides: link · Demo repo: link
-
-- **Applications of Matrices to Computational Graphics**
-Linear transforms in 2D / 3D, homogeneous coordinates, rotations (Euler vs. quaternions), camera models & projections, shading basics; SVD / PCA for geometry processing.
-Slides: link · Code samples: link
+- **Data Science Seminars** — End-to-end ML pipelines, feature engineering for time series, evaluation under drift, MLOps (CI/CD, data/versioning), and reproducible research practices.
+Slides: _link_ · Notebooks: _link_
+- **Sensors & Dashboards** — IoT data ingestion (MQTT/Kafka), time-series storage (InfluxDB/Parquet), streaming analytics (Flink), and dashboards (Grafana/Plotly/Dash) with alerting & anomaly detection.
+Slides: _link_ · Demo repo: _link_
+- **Applications of Matrices to Computational Graphics** — Linear transforms in 2D/3D, homogeneous coordinates, rotations (Euler vs. quaternions), camera models & projections, shading basics; **SVD/PCA** for geometry processing.
+Slides: _link_ · Code samples: _link_

 ### Selected Writings / Demos
-
-- Streaming analytics with Iceberg + Flink + DynamoDB — architecture notes and example pipelines.
-- Robust regression with IRLS — ψ-functions, influence diagnostics, and uncertainty reporting.
-- Time-series anomaly detection — EWMA variants, adaptive σ, and change-point alerts for sensors.
+- **Streaming analytics with Iceberg + Flink + DynamoDB** — Architecture notes and example pipelines.
+- **Robust regression with IRLS** — ψ-functions, influence diagnostics, and uncertainty reporting.
+- **Time-series anomaly detection** — EWMA variants, adaptive σ, and change-point alerts for sensors.

 ---

-## Highlights
+## 🌟 Highlights

 - Interdisciplinary approach spanning computer science, mathematics, economics, and natural sciences.
-- Practical projects in IoT, automation, and environmental monitoring (Raspberry Pi + sensors).
-- Ongoing work in ML for time series, anomaly detection, survival analysis, and robust statistical modelling.
+- Practical projects in **IoT**, automation, and environmental monitoring (Raspberry Pi + sensors).
+- Ongoing work in ML for time series, anomaly detection, and robust statistical modelling.

 ---

-## 📊 GitHub Stats & Trophies
-
-[![trophy](https://github-profile-trophy.vercel.app/?username=DiogoRibeiro7&theme=onedark&margin-w=5&margin-h=5)](https://github.com/ryo-ma/github-profile-trophy)
+## 📊 GitHub Stats

-![GitHub Metrics](./github-metrics.svg)
+<div align="center">
+<a href="https://github.com/ryo-ma/github-profile-trophy">
+<img src="https://stable-github-profile-trophy.vercel.app/?username=DiogoRibeiro7&column=3&no-frame=true&theme=algolia" alt="Trophy" />
+</a>
+</div>

 ---

-## 🤝 Let’s Connect and Collaborate
-
-Thanks for visiting! I’m keen to partner with data enthusiasts, researchers, and product teams.
-Browse my projects or get in touch — happy to explore ideas and build useful things together.
+## 📈 Let’s Connect and Collaborate
+
+Thanks for visiting! I’m keen to partner with data enthusiasts, researchers, and product teams. Browse my projects or get in touch—happy to explore ideas and build useful things together.
+
+<div align="center">
+<a href="https://medium.com/@neverforget-1975">
+<img src="https://img.shields.io/badge/Medium-12100E?style=for-the-badge&logo=medium&logoColor=white" alt="Medium" />
+</a>
+<a href="https://dev.to/diogoribeiro7">
+<img src="https://img.shields.io/badge/dev.to-0A0A0A?style=for-the-badge&logo=dev.to&logoColor=white" alt="Dev.to" />
+</a>
+<a href="https://www.linkedin.com/in/diogo-ribeiro-9094604a/">
+<img src="https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" />
+</a>
+<a href="mailto:[email protected]">
+<img src="https://img.shields.io/badge/Gmail-D14836?logo=gmail&logoColor=white" alt="Email">
+</a>
+</div>

-[Medium](https://medium.com) · [Dev.to](https://dev.to) · [LinkedIn](https://www.linkedin.com/in/diogo-ribeiro-9094604a) · ✉️