katwre/README.md

About Me

I'm a data scientist and software developer with a background in computer science (BSc, MSc) and a PhD in bioinformatics. I'm interested in decoding disease biology using computational approaches.


Below are some of the projects I have built to learn, prototype ideas, and explore different approaches in machine learning, bioinformatics, and software development.

  • Machine and deep learning end-to-end projects with APIs:

    • Gene type prediction from DNA sequence using a Transformer encoder - ONNX Runtime inference, FastAPI + BentoML serving, Docker Compose, deployed on AWS EKS (Kubernetes) 🔗 Link
    • Predicting Molecular Solubility in Water via a Flask API deployed on AWS Elastic Beanstalk 🔗 Link
    • Immune cell classifier trained on H&E-stained images using Xception + MLP, exported to TFLite and deployed via Docker and AWS Lambda 🔗 Link
    • Diffusion-based generative modeling and inpainting of H&E-stained blood cell images, deployed via Streamlit and AWS Batch 🔗 Link
  • Machine and deep learning playground 🔗 Link:

    • Classifying endometriosis using single-cell RNA-seq from menstrual effluent via generative modeling and transfer learning 🔗 Link
    • Survival analysis - Multiple Myeloma data challenge
    • R package for the computational reconstruction of transcription regulatory networks from high-throughput data 🔗 Link
    • Autoencoders and single-cell RNA-seq data imputation
    • CNNs and image classification
    • VAEs to mitigate batch effects in scRNA-seq using federated learning
    • VAE, transformer, and semi-supervised NMF for cell type deconvolution
    • GNNs for spatial transcriptomics
    • Bayesian state space models for forecasting
    • Bayesian A/B Testing with a beta-binomial model for user / email / page view results
    • Retrieval-Augmented Generation (RAG) applied in bioinformatics
    • LLM-powered SPARQL bioinformatics assistant
    • Optimizing XGBoost hyperparameters with Bayesian optimization using Hyperopt and explaining model predictions with SHAP/LIME 🔗 Link
  • Implementations of computational biology algorithms 🔗 Link:

    • simulated annealing and replica exchange Monte Carlo for protein folding
    • Felsenstein's tree-pruning for computing likelihood of evolutionary trees
    • de Bruijn graph with an Eulerian walk-finding algorithm for genome assembly
  • Data engineering 🔗 Link:

    • data pipelines in DuckDB
    • data wrangling in Polars and Narwhals
  • Web-based:

    • Lovable-built web app for real-time collaborative coding interviews deployed on Google Cloud Run 🔗 Link
    • Sudoku game implemented in JavaScript and jQuery 🔗 Link
    • Minesweeper game implemented in Java using the Swing and AWT libraries 🔗 Link
    • Django-based server for Multiple Sequence Alignment visualization 🔗 Link
    • Mobile application using Django, manifesto app, and localStorage 🔗 Link
    • Interactive tool in HTML + Pyodide for finding a career match 🔗 Link
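
As a small illustration of the "Bayesian A/B Testing with a beta-binomial model" item in the playground list above, here is a minimal sketch of the general technique. It is not taken from the linked repository: the function names, the uniform Beta(1, 1) prior, and the example counts are all assumptions chosen for illustration. It updates a Beta prior with observed conversions and estimates, by Monte Carlo sampling, the probability that variant B has a higher conversion rate than variant A.

```python
import random


def posterior_params(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (alpha, beta) of the Beta posterior for a conversion rate.

    Assumes a Beta(prior_a, prior_b) prior and a binomial likelihood,
    so the posterior is Beta(prior_a + successes, prior_b + failures).
    """
    return prior_a + successes, prior_b + (trials - successes)


def prob_b_beats_a(a_succ, a_trials, b_succ, b_trials, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under the two posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    a_alpha, a_beta = posterior_params(a_succ, a_trials)
    b_alpha, b_beta = posterior_params(b_succ, b_trials)
    wins = sum(
        rng.betavariate(b_alpha, b_beta) > rng.betavariate(a_alpha, a_beta)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 40/1000 conversions for A, 70/1000 for B.
    print(prob_b_beats_a(40, 1000, 70, 1000))
```

Sampling from the two posteriors keeps the sketch dependency-free; a closed-form or numerical-integration approach would work equally well for two variants.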

Connect with me via:

Pinned

  1. ML-projects Public

    ML/DL projects exploring neural architectures (incl. autoencoders, CNNs, VAEs, transformers, GNNs) applied to real bio/clinical datasets

    Jupyter Notebook 1

  2. bioinformatics-projects Public

    Bioinformatics projects including protein folding simulations, genome assembly, DNA motif-based regulatory region discovery, phylogenetic algorithms, and a Galaxy plug-in for mRNA annotation

    Python 3

  3. motifActivity Public

    motifActivity: An R package for the computational reconstruction of transcription regulatory networks from high-throughput data

    R 2

  4. Minesweeper Public

    Minesweeper implemented in Java

    Java 2

  5. Solubility-api Public

    Predicting Molecular Solubility in Water via a Flask API deployed on AWS Elastic Beanstalk

    Jupyter Notebook

  6. Immune-cell-classifier-api Public

    Immune cell classifier using Xception + MLP, exported to TFLite and deployed as an AWS Lambda container image.

    Jupyter Notebook