SimonYip22/README.md

Simon Yip 葉詠倫

MBBS and Applied Machine Learning Engineer building end-to-end clinical ML systems across time-series modelling and clinical NLP using large-scale medical datasets.

Developed pipelines with a core emphasis on clinical transparency and model interpretability, using PyTorch, scikit-learn, and Hugging Face, deployed on Google Cloud Run and versioned via GitHub Actions.

Currently contributing to R&D workflows on GKE-based cloud infrastructure for a production-scale clinical NLP system at RadNomics.

Featured Projects

Clinical Entity Extraction-Validation System

Python | PyTorch | Hugging Face | scikit-learn | pandas | FastAPI | Docker | Google Cloud Run | GitHub Actions

  • Cloud-deployed hybrid clinical NLP system combining rule-based extraction and transformer validation to generate structured entity outputs from ICU progress notes for downstream analysis and ML workflows
  • Implemented regex-based schemas for recall-focused extraction of 3 clinical entity types; fine-tuned a BioClinicalBERT classifier on 1000+ manually annotated entities as a precision-oriented validation layer
  • Extracted 780,000+ structured entities from a filtered adult ICU corpus of 160,000+ notes (30,000+ stays)
  • Transformer validation achieved +45.9% precision and −83.3% false positives relative to the rule-only baseline
  • Deployed inference pipeline as stateless, containerised API on Google Cloud Run; versioned via GitHub Actions
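
The extract-then-validate pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the entity types, regex patterns, `Candidate` class, and `toy_validator` are all hypothetical, and the toy negation check only stands in for the fine-tuned BioClinicalBERT classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    entity_type: str
    text: str
    start: int
    end: int


# Hypothetical recall-focused patterns; the real schemas are not shown in this README.
PATTERNS = {
    "symptom": re.compile(r"\b(chest pain|shortness of breath|nausea)\b", re.IGNORECASE),
    "medication": re.compile(r"\b(heparin|norepinephrine|propofol)\b", re.IGNORECASE),
}


def extract_candidates(note: str) -> list[Candidate]:
    """Stage 1: broad regex matching that favours recall over precision."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            found.append(Candidate(entity_type, match.group(0), match.start(), match.end()))
    return sorted(found, key=lambda c: c.start)


def validate(note: str, candidates: list[Candidate],
             is_valid: Callable[[str, Candidate], bool]) -> list[Candidate]:
    """Stage 2: keep only candidates that the validator accepts."""
    return [c for c in candidates if is_valid(note, c)]


def toy_validator(note: str, candidate: Candidate) -> bool:
    """Stand-in for a learned classifier: reject candidates preceded by a negation cue."""
    window = note[max(0, candidate.start - 20):candidate.start].lower()
    return re.search(r"\b(no|denies|without)\b", window) is None


note = "Pt denies chest pain. Started heparin overnight."
candidates = extract_candidates(note)
validated = validate(note, candidates, toy_validator)
print([c.text for c in candidates])
print([c.text for c in validated])
```

Passing the validator in as a callable keeps the rule-based stage independent of the model, so a transformer-backed classifier could replace the toy function without changing the extraction code.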

Access Live API | View Repository | https://doi.org/10.5281/zenodo.20018309

Time-Series ICU Patient Deterioration Predictor

Python | PyTorch | LightGBM | scikit-learn | pandas | NumPy

  • Dual-architecture ICU early warning system combining Temporal CNN (TCN) and LightGBM to predict NEWS2-derived deterioration outcomes across 3 clinical risk dimensions
  • Clinically validated data preprocessing included CO2 retainer logic, GCS mapping, and supplemental O2 protocols
  • Engineered 171 timestamp-level features (8 vital parameters; 96-hour windows) and 40 aggregated patient-level features from 70,000+ time-series observations across 140 ICU stays
  • TCN achieved +9.3% AUC improvement for acute-event detection; LightGBM achieved −68% Brier score and −48% RMSE for prolonged risk exposure
  • Implemented SHAP and saliency mapping for clinician-interpretable feature insights
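
For readers unfamiliar with the metrics quoted above, the sketch below shows how a Brier score, an RMSE, and a percentage change against a baseline are typically computed. It assumes the reported percentages are relative changes against some baseline model, which this README does not spell out; the function names and toy numbers are illustrative only and are not project results.

```python
from math import sqrt


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def rmse(preds, targets):
    """Root mean squared error between continuous predictions and targets."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))


def relative_change_pct(baseline, new):
    """Percentage change of `new` relative to `baseline` (negative means a reduction)."""
    return 100.0 * (new - baseline) / baseline


# Toy data only: four binary outcomes with two sets of predicted probabilities.
outcomes = [1, 0, 0, 1]
baseline_probs = [0.6, 0.5, 0.4, 0.5]
model_probs = [0.9, 0.2, 0.1, 0.8]

baseline_brier = brier_score(baseline_probs, outcomes)
model_brier = brier_score(model_probs, outcomes)
print(f"Brier change: {relative_change_pct(baseline_brier, model_brier):.1f}%")
```

Lower is better for both Brier score and RMSE, so a negative relative change indicates an improvement over the baseline.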

View Repository | https://doi.org/10.5281/zenodo.18487174

Experience

Applied Machine Learning Engineer @ RadNomics Ltd

  • Contributed to R&D workflows for a production clinical NLP system within GCP/GKE cloud infrastructure
  • Processed 2.3M+ radiology reports involving medical data cleaning, preprocessing, and feature engineering
  • Implemented a report augmentation pipeline, generating 17M+ report pairs for downstream LLM modelling and evaluation workflows
  • Used containerised remote development environments, Kubernetes pods, and Git-based collaboration under senior engineering supervision

Technical Stack

  • Machine Learning: PyTorch, scikit-learn, LightGBM, Hugging Face Transformers, NLP
  • Cloud / Infra: Google Cloud Platform (GKE, Cloud Run), Kubernetes, Docker
  • Software Engineering: Python (OOP), FastAPI, Git/GitHub, CI/CD (GitHub Actions)
  • Data: Pandas, NumPy, SQL

Education

  • MSc, Computer Science with Artificial Intelligence @ City St George’s, University of London
    • Relevant Modules: Machine Learning, Artificial Intelligence, Cloud Computing, Software Engineering, Databases, Big Data Analytics and Visualisation
    • Final Project: Evaluating Retrieval-Augmented Generation for Radiology Impression Drafting Using Clinically Significant Error Metrics
  • MBBS, Medicine @ Norwich Medical School, University of East Anglia

Archives

Clinical Experience

MBBS Medical Student @ Norwich Medical School, University of East Anglia

Audit & Research Assistant @ Norfolk and Norwich University Hospital

Research

  • Lacertus syndrome and its surgical management using WALANT - our first 12 cases (Research Poster)
  • Giant trichoblastic carcinoma initially misdiagnosed as basal cell carcinoma (Case Report)

Audit Cycles

  • Head and Neck Surgery, Integrated Care Pathway Surgical Proforma Audit
  • Plastic Surgery, Free Flap Surgical Outcomes Audit

Healthcare Data Skills

  • Clinical Informatics: EHR Systems (ICE, SystmOne, MediViewer, EPMA), HL7-FHIR, NEWS2, GDPR
  • Clinical Research: Audit Methodology, Literature Review, Critical Appraisal, Manuscript Preparation

Pinned

  1. Clinical-Entity-Extraction-Validation-System

    Precision-first clinical NLP extraction-validation system converting ICU progress notes to structured entity outputs using rule-based regex extraction and BioClinicalBERT encoder validation

  2. Time-Series-ICU-Patient-Deterioration-Predictor

    Early ICU deterioration detection system combining LightGBM and Temporal CNN (TCN) for multi-dimensional clinical risk modeling