Prokash21/sncRNA-UQ-Australia

🧬 Cell-Surface glycoRNA-Based Cancer Detection

📌 Project Summary

This project develops chip-based cancer detection using cell-surface glycoRNA markers derived from cancer cell lines and from extracellular vesicles (EVs).

  • 📡 Data Source: Raw ncRNA sequencing data from CD Genomics, USA
  • 🧪 Institute: Australian Institute for Bioengineering and Nanotechnology (AIBN), University of Queensland (UQ)
  • 🔍 Objective: Discover and validate glycoRNA markers for cancer detection using RNA-seq, bioinformatics pipelines, and machine learning models

🔬 Workflow Overview

🔄 RNA-Seq Processing Pipeline

# Step 1: Quality Control
fastqc *.fastq.gz

# Step 2: Trimming
trimmomatic PE input_1.fastq.gz input_2.fastq.gz ...

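# Step 3: Alignment (gzipped FASTQ needs --readFilesCommand; write a coordinate-sorted BAM)
STAR --runThreadN 8 --genomeDir path/to/genome \
     --readFilesIn input_1.fastq.gz input_2.fastq.gz \
     --readFilesCommand zcat \
     --outSAMtype BAM SortedByCoordinate

# Step 4: Counting (-p marks the input as paired-end; Subread >= 2.0.2 also needs --countReadPairs to count fragments)
featureCounts -p -a annotation.gtf -o counts.txt Aligned.sortedByCoord.out.bam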
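
Downstream analysis starts from the featureCounts table. As a minimal sketch, assuming the usual tab-separated layout (one `#` comment line, then a header whose first six columns are Geneid, Chr, Start, End, Strand, Length), the counts could be loaded in Python roughly like this. The function name `read_featurecounts` and the example rows are hypothetical and are not part of this repository.

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts...]}) from a featureCounts table."""
    # Skip the leading '#' comment line that records the command used.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Columns 0-5 are Geneid, Chr, Start, End, Strand, Length; samples follow.
    samples = header[6:]
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Tiny made-up table for illustration only (not project data).
example = (
    "# Program:featureCounts (comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n"
    "geneX\tchr1\t100\t200\t+\t101\t12\t30\n"
    "geneY\tchr2\t500\t650\t-\t151\t0\t7\n"
)
samples, counts = read_featurecounts(io.StringIO(example))
print(samples)
print(counts)
```

For a real run, the same function would be called on an open file handle for `counts.txt` instead of the in-memory example.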

📦 R Packages

library(DESeq2)
library(edgeR)
library(ggplot2)
library(pheatmap)
library(clusterProfiler)
library(org.Hs.eg.db)
library(ncRNAtools)
library(RNAcentral)

📚 Project Chapters

📁 CHAPTER 01: Cancer Cell Line Analysis

Comparative Analysis (ADMSC & BMMSC vs Cancer Cell Lines)

🔬 Cancer Cell Lines:

| Cancer Type | Cell Line(s) |
|---|---|
| Cervical | HeLa |
| Breast | MCF7 (Epithelial), MDA-MB-231 (Mesenchymal) |
| Lung | A549, H1975 |
| Controls | ADMSC, BMMSC |
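
The design contrasts each control line with each cancer line. A small sketch of how that comparison grid could be enumerated is shown here; the variable names are illustrative and do not come from the project scripts.

```python
from itertools import product

controls = ["ADMSC", "BMMSC"]
cancer_lines = {
    "Cervical": ["HeLa"],
    "Breast": ["MCF7", "MDA-MB-231"],
    "Lung": ["A549", "H1975"],
}

# Flatten the per-cancer-type lists into one list of cell lines.
all_lines = [line for lines in cancer_lines.values() for line in lines]

# Every control is contrasted against every cancer cell line.
comparisons = list(product(controls, all_lines))

for control, line in comparisons:
    print(f"{control} vs {line}")
```

Two controls and five cancer lines give ten control-versus-cancer contrasts.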

📊 Analysis Performed:

  1. Volcano Plots

    • Annotated with Y-RNAs and U-RNAs
    • Comparisons:
      • ADMSC vs HeLa
      • ADMSC vs MCF7
      • ADMSC vs MDA-MB-231
      • BMMSC vs HeLa
      • BMMSC vs MCF7
      • BMMSC vs MDA-MB-231
      • ADMSC vs A549
      • ADMSC vs H1975
      • BMMSC vs A549
      • BMMSC vs H1975
  2. Heatmaps

    • All Y-RNAs and U-RNAs across all cell lines
    • Metrics: Log2 Fold Change, Z-score Expression
  3. Correlation Plots

    • Expression correlation across Y-RNA signatures
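
The two heatmap metrics can be illustrated with a short sketch. This is a simplified calculation under stated assumptions (a pseudocount of 1 and the sample standard deviation); DESeq2 and edgeR estimate fold changes with their own statistical models, so the functions below are only a conceptual illustration, and their names are hypothetical.

```python
import math
import statistics


def log2_fold_change(case_mean, control_mean, pseudocount=1.0):
    """log2((case + c) / (control + c)); the pseudocount avoids division by zero."""
    return math.log2((case_mean + pseudocount) / (control_mean + pseudocount))


def row_zscores(values):
    """Center and scale one transcript's expression values across samples."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        # A constant row carries no relative signal; report zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Made-up normalized expression for one hypothetical transcript.
expression = [2.0, 4.0, 6.0]
print(log2_fold_change(case_mean=7.0, control_mean=1.0))  # log2(8 / 2) = 2.0
print(row_zscores(expression))  # [-1.0, 0.0, 1.0]
```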

📁 CHAPTER 02: Extracellular Vesicles (EVs)

🔬 Focus: Identification of glycoRNA markers in EVs secreted by cancer cell lines.

  • Extracted EV-specific differentially expressed genes (DEGs)
  • Annotated with ncRNAtools and RNAcentral
  • Identified diagnostic glycoRNAs shared across multiple cancer types
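
Finding markers shared across cancer types amounts to counting how many per-cancer marker sets each candidate appears in. The sketch below assumes the EV-specific markers are already available as one set per cancer type; the function name, the `min_types` threshold, and the marker identifiers are placeholders, not project results.

```python
from collections import Counter


def shared_markers(markers_by_cancer, min_types=2):
    """Return markers reported for at least `min_types` cancer types, sorted."""
    tally = Counter(
        marker
        for markers in markers_by_cancer.values()
        for marker in set(markers)  # count each marker once per cancer type
    )
    return sorted(m for m, n in tally.items() if n >= min_types)


# Placeholder identifiers for illustration only.
ev_markers = {
    "cervical": {"marker_A", "marker_B"},
    "breast": {"marker_B", "marker_C"},
    "lung": {"marker_B", "marker_C", "marker_D"},
}
print(shared_markers(ev_markers))               # ['marker_B', 'marker_C']
print(shared_markers(ev_markers, min_types=3))  # ['marker_B']
```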

📌 EVs are highlighted as a promising source of non-invasive biomarkers.


📁 CHAPTER 03: Metastasis & EMT Transition

🧬 Epithelial–Mesenchymal Transition (EMT)
This chapter focuses on the progression from primary tumor to metastatic state in breast cancer.

🧫 Cell Lines:

  • MCF7 – Epithelial breast cancer cell line
  • MDA-MB-231 – Mesenchymal/metastatic breast cancer cell line
  • TM6 – Treated MCF7 (6-day treatment, induced EMT)

🧪 Comparison:

  1. Y-RNA and U-RNA in:

    • TM6 vs ADMSC
    • TM6 vs BMMSC
  2. Marker Signature Comparisons:

    • MDA-MB-231 vs MCF7
    • TM6 vs MCF7
    • TM6 vs MDA-MB-231
  3. Correlation Analysis:

    • TM6 vs MDA-MB-231 to validate metastatic similarity
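
The similarity check between two expression profiles can be sketched as a Pearson correlation over matched transcripts. This assumes both profiles are already normalized and ordered identically; the helper name `pearson` and the numbers are made up for illustration, and the project may use a different correlation measure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Made-up expression values for four matched transcripts.
tm6_profile = [1.0, 2.0, 3.0, 4.0]
mda_profile = [2.0, 4.0, 6.0, 8.0]
print(round(pearson(tm6_profile, mda_profile), 6))  # 1.0 (perfectly proportional)
```

A coefficient near 1 would indicate that the two profiles rise and fall together across transcripts; it does not by itself establish biological equivalence.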

🚨 EMT is a critical step in cancer metastasis; the glycoRNAs identified here are candidate indicators of metastasis.


🤖 Machine Learning Pipeline

🧠 Goals:

  • Identify cancer-type specific markers
  • Detect common markers across all cancer types
  • Distinguish metastatic (EMT) from non-metastatic profiles

📈 Framework:

  • Models: Random Forest, SVM, Logistic Regression
  • Features: DE glycoRNAs (Y-RNA, U-RNA)
  • Training: Stratified 5-Fold Cross-Validation
  • Evaluation Metrics: AUROC, AUPRC, Accuracy, F1-score
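
Stratified cross-validation keeps the class proportions roughly equal in every fold. The following is a standard-library sketch of the idea only; in practice a library implementation such as scikit-learn's `StratifiedKFold` would normally be used, and the function name `stratified_folds` and the labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, balancing classes across folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin so each fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Made-up labels: 10 cancer samples and 5 controls.
labels = ["cancer"] * 10 + ["control"] * 5
folds = stratified_folds(labels, k=5)

for fold_number, held_out in enumerate(folds, start=1):
    train = [i for fold in folds if fold is not held_out for i in fold]
    print(f"fold {fold_number}: train={len(train)} test={len(held_out)}")
```

With these labels every held-out fold contains two cancer samples and one control, mirroring the 2:1 ratio of the full set.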

🧬 Output:

  • Ranked glycoRNA markers per cancer type
  • Heatmaps, PR/ROC Curves, Confusion Matrices
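
The numbers behind ROC curves and confusion matrices can be sketched without any dependencies. The example below computes AUROC as the fraction of positive/negative pairs ranked correctly, then derives confusion-matrix counts and F1 at an assumed threshold of 0.5. All function names, scores, and the threshold are illustrative assumptions, not outputs of the project's models.

```python
def auroc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def f1_score(counts):
    """Harmonic mean of precision and recall; 0.0 when there are no positives at all."""
    denominator = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denominator if denominator else 0.0


# Made-up labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
print(round(auroc(y_true, scores), 4))  # 8 of 9 pairs ranked correctly
print(counts)
print(round(f1_score(counts), 4))
```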

🧪 qRT-PCR Validation

  • Designed primers for top-ranked markers (Y-RNA, U-RNA)
  • Validated expression in patient tissue samples
  • Correlated with model-predicted marker rankings
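
One way to compare qRT-PCR results with model rankings is a Spearman rank correlation. The README does not state which statistic was used, so this is only a sketch of one reasonable choice; the helper names, importance scores, and fold-change values are all hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over any run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for four markers, listed in the same order in both lists.
model_importance = [0.40, 0.30, 0.20, 0.10]
qpcr_log2_fold_change = [3.1, 2.0, 2.4, 0.5]
print(round(spearman(model_importance, qpcr_log2_fold_change), 3))  # 0.8
```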

📎 Tools & Technologies

| Category | Tools/Packages |
|---|---|
| Sequencing QC | FastQC, Trimmomatic |
| Alignment | STAR |
| Quantification | featureCounts |
| DE Analysis | DESeq2, edgeR |
| ML Modeling | scikit-learn, caret (R), custom pipelines |
| Annotation | ncRNAtools, RNAcentral |
| Visualization | ggplot2, pheatmap, EnhancedVolcano |

📂 Folder Structure

sncRNA-UQ-Australia/
│
├── data/                  # FASTQ, BAM, and Count files
├── results/
│   ├── chapter1/          # Cell-line analysis outputs
│   ├── chapter2/          # EV-specific markers
│   ├── chapter3/          # EMT and metastasis analysis
│
├── figure/
│   └── git_readme/        # Logos (AIBN, CD Genomics, UQ)
│
├── scripts/               # All preprocessing, analysis, ML scripts
├── qPCR_primers/          # Primer sequences for validation
├── README.md              # ← You are here

👨‍🔬 Acknowledgements

This project was developed as a collaboration between:

  • Australian Institute for Bioengineering and Nanotechnology (AIBN)
  • University of Queensland (UQ)

🔗 Contact

For questions or collaborations, please contact:
📧 [[email protected]]
