This project focuses on the development of chip-based cancer detection via surface glycoRNA markers derived from both cancer cell lines and extracellular vesicles (EVs).
- 📡 Data Source: Raw ncRNA sequencing data from CD Genomics, USA
- 🧪 Institute: Australian Institute for Bioengineering and Nanotechnology (AIBN), University of Queensland (UQ)
- 🔍 Objective: Discover and validate glycoRNA markers for cancer detection using RNA-seq, bioinformatics pipelines, and machine learning models
# Step 1: Quality Control
fastqc *.fastq.gz
# Step 2: Trimming
trimmomatic PE input_1.fastq.gz input_2.fastq.gz ...
# Step 3: Alignment
STAR --runThreadN 8 --genomeDir path/to/genome --readFilesIn input_1.fastq.gz input_2.fastq.gz --readFilesCommand zcat
# Step 4: Counting
featureCounts -a annotation.gtf -o counts.txt aligned_reads.bam

library(DESeq2)
library(edgeR)
library(ggplot2)
library(pheatmap)
library(clusterProfiler)
library(org.Hs.eg.db)
library(ncRNAtools)
library(RNAcentral)

Comparative Analysis (ADMSC & BMMSC vs Cancer Cell Lines)
| Cancer Type | Cell Line(s) |
|---|---|
| Cervical | HeLa |
| Breast | MCF7 (Epithelial), MDA-MB-231 (Mesenchymal) |
| Lung | A549, H1975 |
| Controls | ADMSC, BMMSC |
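
Pairing each control with each cancer cell line in the table gives the set of control-vs-cancer contrasts for the comparative analysis. The following is a minimal sketch of that enumeration; the dictionary layout and the `build_contrasts` helper are illustrative assumptions, not part of the project scripts.

```python
from itertools import product

# Cell lines copied from the table above; the data structure itself is an assumption.
CANCER_LINES = {
    "Cervical": ["HeLa"],
    "Breast": ["MCF7", "MDA-MB-231"],
    "Lung": ["A549", "H1975"],
}
CONTROLS = ["ADMSC", "BMMSC"]


def build_contrasts(controls, cancer_lines):
    """Return (control, cancer_line, cancer_type) for every control/cancer pairing."""
    contrasts = []
    for cancer_type, lines in cancer_lines.items():
        for control, line in product(controls, lines):
            contrasts.append((control, line, cancer_type))
    return contrasts


contrasts = build_contrasts(CONTROLS, CANCER_LINES)
for control, line, cancer_type in contrasts:
    print(f"{control} vs {line} ({cancer_type})")
```

With two controls and five cancer cell lines this yields ten contrasts.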
- Volcano Plots
  - Annotated with Y-RNAs and U-RNAs
  - Comparisons:
    - ADMSC vs HeLa
    - ADMSC vs MCF7
    - ADMSC vs MDA-MB-231
    - BMMSC vs HeLa
    - BMMSC vs MCF7
    - BMMSC vs MDA-MB-231
    - ADMSC vs A549
    - ADMSC vs H1975
    - BMMSC vs A549
    - BMMSC vs H1975
- Heatmaps
  - All Y-RNAs and U-RNAs across all cell lines
  - Metrics: Log2 Fold Change, Z-score Expression
- Correlation Plots
  - Expression correlation across Y-RNA signatures
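
The two heatmap metrics and the volcano-plot categories can be illustrated with a few lines of Python. This is a simplified sketch, not the project's method: in the actual workflow DESeq2 or edgeR would estimate fold changes and adjusted p-values. The pseudocount and the cutoffs (`lfc_cutoff=1.0`, `padj_cutoff=0.05`) are assumed values chosen for illustration.

```python
import math
from statistics import mean, stdev


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """log2 ratio of mean expression (case over control); the pseudocount avoids log(0)."""
    return math.log2((mean(case_values) + pseudocount) / (mean(control_values) + pseudocount))


def zscore_row(values):
    """Center one RNA's expression values on 0 and scale by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # requires at least two values
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def classify(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a feature for a volcano plot: 'up', 'down', or 'ns' (not significant)."""
    if padj < padj_cutoff and log2fc >= lfc_cutoff:
        return "up"
    if padj < padj_cutoff and log2fc <= -lfc_cutoff:
        return "down"
    return "ns"


# Made-up numbers, for illustration only.
print(log2_fold_change([7.0, 7.0], [1.0, 1.0]))
print(zscore_row([1.0, 2.0, 3.0]))
print(classify(2.0, 0.01))
```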
🔬 Focus: Identification of glycoRNA markers in EVs secreted by cancer cell lines.
- Extracted EV-specific DEGs
- Annotated with `ncRNAtools` and `RNAcentral`
- Identified diagnostic glycoRNAs shared across multiple cancer types
📌 EVs highlighted as a promising source of non-invasive biomarkers
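
Finding glycoRNAs shared across cancer types amounts to counting how many per-cancer marker sets each RNA appears in. A minimal sketch follows; the `shared_markers` helper and the marker names are hypothetical placeholders, not results from this project.

```python
from collections import Counter


def shared_markers(markers_by_cancer, min_cancer_types=2):
    """Return markers present in at least `min_cancer_types` of the input sets, sorted by name."""
    counts = Counter()
    for markers in markers_by_cancer.values():
        counts.update(set(markers))  # set() guards against duplicates within one cancer type
    return sorted(m for m, n in counts.items() if n >= min_cancer_types)


# Placeholder identifiers for illustration only.
ev_markers = {
    "Cervical": {"marker_A", "marker_B", "marker_C"},
    "Breast": {"marker_B", "marker_C", "marker_D"},
    "Lung": {"marker_C", "marker_E"},
}

print(shared_markers(ev_markers))                      # in two or more cancer types
print(shared_markers(ev_markers, min_cancer_types=3))  # in all three cancer types
```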
🧬 Epithelial–Mesenchymal Transition (EMT)
Focuses on progression from primary tumor to metastatic state in breast cancer.
- MCF7 – Epithelial breast cancer cell line
- MDA-MB-231 – Mesenchymal/metastatic breast cancer cell line
- TM6 – Treated MCF7 (6-day treatment, induced EMT)
- Y-RNA and U-RNA in:
  - TM6 vs ADMSC
  - TM6 vs BMMSC
- Marker Signature Comparisons:
  - MDA-MB-231 vs MCF7
  - TM6 vs MCF7
  - TM6 vs MDA-MB-231
- Correlation Analysis:
  - TM6 vs MDA-MB-231 to validate metastatic similarity
🚨 EMT is a critical transition step in cancer metastasis, and glycoRNAs here serve as potential metastasis indicators.
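
The TM6 vs MDA-MB-231 correlation can be sketched as a Pearson correlation between two marker profiles, restricted to the markers both profiles contain. The profile format (`{marker: value}`) and the helper names are assumptions made for illustration; the values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def profile_correlation(profile_a, profile_b):
    """Correlate two {marker: value} profiles over the markers they share."""
    shared = sorted(set(profile_a) & set(profile_b))
    return pearson([profile_a[m] for m in shared], [profile_b[m] for m in shared])


# Invented log2 fold changes, for illustration only.
tm6 = {"m1": 1.0, "m2": 2.0, "m3": 3.0}
mda = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "m4": 0.0}
print(profile_correlation(tm6, mda))
```

A coefficient near 1 would indicate that the treated MCF7 profile moves in the same direction as the mesenchymal line across the shared markers.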
- Identify cancer-type specific markers
- Detect common markers across all cancer types
- Distinguish metastatic (EMT) from non-metastatic profiles
- Models: Random Forest, SVM, Logistic Regression
- Features: DE glycoRNAs (Y-RNA, U-RNA)
- Training: Stratified 5-Fold Cross-Validation
- Evaluation Metrics: AUROC, AUPRC, Accuracy, F1-score
- Ranked glycoRNA markers per cancer type
- Heatmaps, PR/ROC Curves, Confusion Matrices
- Designed primers for top-ranked markers (Y-RNA, U-RNA)
- Validated expression in patient tissue samples
- Correlated with model-predicted marker rankings
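
The stratified 5-fold cross-validation and AUROC metric listed above can be illustrated without any dependencies. This is a standard-library sketch of the two ideas, not the project's pipeline: in practice scikit-learn's `StratifiedKFold` and `roc_auc_score` cover the same ground. The function names, the seed, and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: 10 controls (0) and 5 cancer samples (1), for illustration only.
labels = [0] * 10 + [1] * 5
folds = stratified_folds(labels)
for i, fold in enumerate(folds):
    print(f"fold {i}: {sorted(fold)}")

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Each fold serves once as the held-out test set while the remaining four are used for training; metrics such as AUROC are then averaged over the five folds.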
| Category | Tools/Packages |
|---|---|
| Sequencing QC | FastQC, Trimmomatic |
| Alignment | STAR |
| Quantification | featureCounts |
| DE Analysis | DESeq2, edgeR |
| ML Modeling | scikit-learn, caret (R), custom pipelines |
| Annotation | ncRNAtools, RNAcentral |
| Visualization | ggplot2, pheatmap, EnhancedVolcano |
2022_MPXV_Project/
│
├── data/                 # FASTQ, BAM, and count files
├── results/
│   ├── chapter1/         # Cell-line analysis outputs
│   ├── chapter2/         # EV-specific markers
│   └── chapter3/         # EMT and metastasis analysis
│
├── figure/
│   └── git_readme/       # Logos (AIBN, CD Genomics, UQ)
│
├── scripts/              # All preprocessing, analysis, ML scripts
├── qPCR_primers/         # Primer sequences for validation
└── README.md             # ← You are here
This project was developed in collaboration between:
- Australian Institute for Bioengineering and Nanotechnology (AIBN)
- University of Queensland (UQ)
For questions or collaborations, please contact:
📧 [[email protected]]


