I'm a data scientist and software developer with a background in computer science (BSc, MSc) and a PhD in bioinformatics. I'm interested in decoding disease biology using computational approaches.
Below are some of the projects I have built to learn, prototype ideas, and explore different approaches in machine learning, bioinformatics, and software development.
-
Machine and deep learning end-to-end projects with APIs:
- Gene type prediction from DNA sequence using a Transformer encoder - ONNX Runtime inference, FastAPI + BentoML serving, Docker Compose, deployed on AWS EKS (Kubernetes) 🔗 Link
- Molecular solubility in water prediction, served via a Flask API deployed on AWS Elastic Beanstalk 🔗 Link
- Immune cell classifier trained on H&E-stained images using Xception + MLP, exported to TFLite and deployed via Docker and AWS Lambda 🔗 Link
- Diffusion-based generative modeling and inpainting of H&E-stained blood cell images, deployed via Streamlit and AWS Batch 🔗 Link
-
Machine and deep learning playground 🔗 Link:
- Classifying endometriosis using single-cell RNA-seq from menstrual effluent via generative modeling and transfer learning 🔗 Link
- Survival analysis - Multiple Myeloma data challenge
- R package for the computational reconstruction of transcription regulatory networks from high-throughput data 🔗 Link
- Autoencoders and single-cell RNA-seq data imputation
- CNNs and image classification
- VAEs to mitigate batch effects in scRNA-seq using federated learning
- VAE, transformer, and semi-supervised NMF for cell type deconvolution
- GNNs for spatial transcriptomics
- Bayesian state space models for forecasting
- Bayesian A/B testing with a beta-binomial model for user, email, and page view results
- Retrieval-Augmented Generation (RAG) applied in bioinformatics
- LLM-powered SPARQL bioinformatics assistant
- Optimizing XGBoost hyperparameters with Bayesian optimization using Hyperopt and explaining model predictions with SHAP/LIME 🔗 Link
-
Implementations of computational biology algorithms 🔗 Link:
- Simulated annealing and replica exchange Monte Carlo for protein folding
- Felsenstein's tree-pruning algorithm for computing the likelihood of evolutionary trees
- De Bruijn graph with an Eulerian walk-finding algorithm for genome assembly
-
Data engineering 🔗 Link:
- Data pipelines in DuckDB
- Data wrangling in Polars and Narwhals
-
Web-based:
- Lovable-built web app for real-time collaborative coding interviews deployed on Google Cloud Run 🔗 Link
- Sudoku game implemented in JavaScript and jQuery 🔗 Link
- Minesweeper game implemented in Java using the Swing and AWT libraries 🔗 Link
- Django-based server for Multiple Sequence Alignment visualization 🔗 Link
- Mobile application using Django, manifesto app, and localStorage 🔗 Link
- Interactive tool in HTML + Pyodide for finding a career match 🔗 Link
- ✉️ Email: [email protected]
- 🌐 My website

