mdabrarfaiyaj/README.md

Hi, I'm Md Abrar Faiyaj 👋

Graduate Student in Biotechnology | Computational Biologist & Bioinformatics Analyst
Building reproducible pipelines and interactive tools for genomics and clinical data analysis

ORCID Email LinkedIn


🧬 About Me

I am a passionate computational biologist who combines a strong grounding in biology with data science and programming skills. In the last two months I completed three major projects, including an end-to-end reproducible RNA-Seq pipeline built from raw FASTQ files and an analysis of clinical patient data.

I specialize in turning complex biological and clinical data into actionable insights through interactive dashboards, reproducible workflows, and clear scientific communication. My work spans infectious disease genomics, immunology, and healthcare data analysis.

Currently open to research positions, internships, and full-time roles in Bioinformatics, Computational Biology, and Healthcare Data Analysis.


🛠️ Technical Skills

Bioinformatics & Genomics

  • Advanced R & Bioconductor (DESeq2, apeglm, org.Mm.eg.db, AnnotationDbi, pheatmap)
  • RNA-Seq analysis (raw FASTQ → alignment → quantification → differential expression)
  • Interactive dashboard development (Shiny + shinydashboard)
  • Genomic data retrieval (GEO, SRA, NCBI)

Data Science & Programming

  • Python (Pandas, NumPy, Matplotlib, Seaborn)
  • SQL & Database querying (SQLite)
  • Data visualization (ggplot2, plotly, Seaborn)
  • Reproducible research (renv, session_info, set.seed, RDS objects)
  • Tools: Galaxy, STAR, featureCounts, Git/GitHub, Posit Cloud, WSL

🌟 Featured Projects

1. RNA-Seq DESeq2 Pipeline + Shiny Dashboard — V2 (True Raw Counts)
→ Repository
Rebuilt the entire analysis from raw SRA FASTQ files (HISAT2 alignment + featureCounts) for a high-impact Immunity (2020) paper on IL-10 receptor-deficient microglia. Differential expression was run with DESeq2, using apeglm for log fold change shrinkage.
Key Achievements:

  • Identified 1,597 DEGs with clean mutant/control separation (PC1 = 79.7% variance)
  • Fully reproducible pipeline with three layers of reproducibility
  • Deployed interactive Shiny dashboard (volcano plots, PCA, heatmaps, searchable table, CSV download) on Posit Cloud
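For readers unfamiliar with how a DEG count like the one above is typically derived, here is a minimal Python sketch of the usual thresholding step applied to a DESeq2-style results table. It is an illustration only, not the pipeline's actual code (the pipeline itself is written in R). The cutoffs (`padj < 0.05`, `|log2FC| >= 1`), the `GeneResult` type, and the gene names are all assumptions made for this example; the repository may use different thresholds.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    """One row of a DESeq2-style results table (hypothetical structure)."""
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # DESeq2 reports NA for some genes; modeled here as None


def classify(result: GeneResult,
             padj_cutoff: float = 0.05,   # assumed cutoff, not taken from the repository
             lfc_cutoff: float = 1.0) -> str:  # assumed cutoff, not taken from the repository
    """Label a gene as 'up', 'down', or 'not significant'."""
    # Genes with a missing or too-large adjusted p-value are never called DEGs.
    if result.padj is None or result.padj >= padj_cutoff:
        return "not significant"
    if result.log2_fold_change >= lfc_cutoff:
        return "up"
    if result.log2_fold_change <= -lfc_cutoff:
        return "down"
    # Statistically significant, but the effect size is below the cutoff.
    return "not significant"


# Toy values for illustration only; these are NOT results from the study.
toy_results = [
    GeneResult("geneA", 2.3, 0.001),
    GeneResult("geneB", -1.8, 0.01),
    GeneResult("geneC", 0.4, 0.0001),
    GeneResult("geneD", 3.0, None),
    GeneResult("geneE", 1.5, 0.2),
]

labels = {r.gene: classify(r) for r in toy_results}
degs = [gene for gene, label in labels.items() if label != "not significant"]

for gene, label in labels.items():
    print(f"{gene}: {label}")
print(f"DEGs: {len(degs)}")
```

The same labels can drive the coloring of a volcano plot, which is why the classification is kept separate from the counting.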

2. RNA-Seq DESeq2 Pipeline + Shiny Dashboard — V1
→ Same Repository
Initial reproduction of Figure 3E from Shemer et al., Immunity 2020, despite missing raw counts on GEO. Built and publicly deployed a production-grade interactive dashboard.
Impact:

  • Reproduced key findings (1,563 DEGs) and validated core biological markers
  • Generated 6,300+ LinkedIn impressions and engagement from researchers at Illumina, Novartis, Pfizer, and GSK
  • Transparently documented GEO data availability issues affecting the bioinformatics community

The V1 → V2 progression demonstrates my commitment to iterative scientific improvement and methodological rigor.

3. Patient Data Analysis — Heart Disease Clinical Dataset
→ Repository
Exploratory analysis of 1,025 patient records to identify risk factors for heart disease diagnosis.
Key Achievements:

  • Combined Python + SQL workflow (8 standalone SQL queries on SQLite)
  • Built a six-panel diagnostic visualization dashboard
  • Identified maximum heart rate as the strongest differentiator and challenged assumptions around cholesterol levels
  • Produced a fully documented, readable Jupyter Notebook with interpretation
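The combined Python + SQL pattern mentioned above can be sketched as follows. This is a hedged illustration rather than the notebook's actual code: the table name `patients`, the column names, and the toy rows are hypothetical and are not drawn from the real dataset. It shows a single grouped query run against an in-memory SQLite database through Python's standard `sqlite3` module.

```python
import sqlite3

# Toy rows for illustration only; NOT values from the real dataset.
# Hypothetical columns: max_heart_rate, cholesterol, has_disease (0 or 1).
TOY_ROWS = [
    (172, 210, 1),
    (165, 250, 1),
    (158, 230, 1),
    (130, 240, 0),
    (125, 260, 0),
    (140, 220, 0),
]


def mean_by_diagnosis(rows):
    """Return (has_disease, n, mean max heart rate, mean cholesterol) per group."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    try:
        conn.execute(
            "CREATE TABLE patients ("
            "max_heart_rate INTEGER, cholesterol INTEGER, has_disease INTEGER)"
        )
        conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)
        query = """
            SELECT has_disease,
                   COUNT(*)            AS n,
                   AVG(max_heart_rate) AS mean_max_hr,
                   AVG(cholesterol)    AS mean_chol
            FROM patients
            GROUP BY has_disease
            ORDER BY has_disease
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


summary = mean_by_diagnosis(TOY_ROWS)
for has_disease, n, mean_max_hr, mean_chol in summary:
    print(f"has_disease={has_disease} n={n} "
          f"mean_max_hr={mean_max_hr:.1f} mean_chol={mean_chol:.1f}")
```

Keeping each query standalone, as the project does, makes it easy to rerun or review one question at a time without executing the whole notebook.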

Previous Work

  • Bangladesh Dengue Genomic Surveillance (Shiny Dashboard)
  • Yeast Genome Composition Analysis
  • Palmer Penguins Reproducible Data Science

Explore all repositories


📚 Research & Career Interests

  • RNA-Seq & Multi-omics Data Analysis
  • Infectious Disease Genomics & Variant Surveillance
  • Clinical & Healthcare Data Analysis
  • Development of Reproducible Bioinformatics Tools
  • Computational Biology & Public Health

📫 Let's Connect

Open to collaborations, thesis/internship opportunities, or full-time roles. Happy to discuss how my recent RNA-Seq pipelines or clinical data experience can add value to your team.


⭐️ Thank you for visiting! My recent projects reflect continuous growth in delivering production-quality, reproducible bioinformatics and data analysis work. Feel free to explore the repositories and reach out anytime.

Last updated: May 2026

Pinned

  1. patient-data-analysis (Public)

    Exploratory data analysis of 1,025 clinical patient records to identify risk factors associated with heart disease using Python, SQL, and data visualization.

    Jupyter Notebook

  2. rna-seq-shiny-pipeline (Public)

    RNA-Seq Differential Expression Analysis with DESeq2 + Interactive Shiny Dashboard

    R 2 1

  3. bangladeshi-dengue-strain-tracker (Public)

    Interactive R Shiny dashboard analyzing 13 real DENV-2 sequences from the 2023 Bangladesh dengue outbreak (Dhaka & Chattogram isolates). Includes QC filtering, motif detection, modern visualization…

    R 1

  4. dengue-variant-tracker-prototype (Public)

    Prototype version: general dengue motif and variant tracker using reference strains (NC_001477.1, Dengue Virus 1, complete genome; NC_001474.2, Dengue Virus 2, complete genome)

    R 1

  5. 01-penguins-biological-data-analysis (Public)

    Morphometric analysis of Palmer Penguins demonstrating data cleaning, statistical testing, and visualization in R

    R