GIMP: Genomic Imprinting Methylation Patterns

GIMP (Genomic Imprinting Methylation Patterns) is an R package designed for the comprehensive analysis of Imprinting Control Regions (ICRs) from methylation array data. It provides a complete pipeline for extracting imprinted CpGs (iCpGs), computing coverage, and analyzing ICRs at both probe and sample-specific levels.

🆕 NEW in v0.2.0: GIMP now includes a Shiny app that directly processes raw IDAT files, CSV/Excel files, or GEO datasets, so the whole imprinting analysis can be run in one place.

Features

Core Capabilities

  • Triple Input Support: Process preprocessed methylation data, raw IDAT files, or GEO datasets
  • Automated GEO Integration: Direct download and processing of GEO methylation datasets
  • Specialized ICR Analysis: Focus on imprinting control regions with curated coordinates
  • Interactive Visualizations: Comprehensive Shiny app with plotly integration
  • Multiple Array Support: 450k, EPIC v1, and EPIC v2 arrays
  • Quality Control: Automatic QC for IDAT data with detailed reporting
  • Flexible Analysis: Beta values, delta-beta, and defect matrix visualizations

Unique Imprinting Features

  • ICR-specific coordinates: Based on Joshi et al. 2016
  • Defect matrix analysis: SD-based detection of imprinting disorders
  • Specialized heatmaps: Designed for imprinting pattern visualization
  • Interactive region explorer: Detailed methylation profiles across ICRs
  • Clinical interpretation: Tools for analyzing imprinting disorders
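
To make the idea of SD-based defect detection concrete, here is a minimal base-R sketch. It is an illustration under stated assumptions, not GIMP's actual implementation: the helper `defect_matrix()`, the toy beta values, and the default threshold of 3 are all hypothetical. It flags a value as a defect when it lies more than `sd_threshold` control standard deviations away from the control mean of the same ICR.

```r
# Toy ICR-level beta matrix: rows are ICRs, columns are samples (values are made up).
beta <- matrix(
  c(0.50, 0.52, 0.48, 0.50, 0.20,
    0.45, 0.55, 0.50, 0.50, 0.51),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ICR_A", "ICR_B"),
                  c("ctrl1", "ctrl2", "ctrl3", "ctrl4", "case1"))
)
groups <- c("Control", "Control", "Control", "Control", "Case")

# Hypothetical helper (not a GIMP function): flag values that deviate from the
# control mean by more than `sd_threshold` control standard deviations.
defect_matrix <- function(beta, groups, sd_threshold = 3) {
  ctrl <- beta[, groups == "Control", drop = FALSE]
  ctrl_mean <- rowMeans(ctrl)
  ctrl_sd <- apply(ctrl, 1, sd)
  # Vectors recycle down the columns, so each row uses its own mean and SD.
  z <- (beta - ctrl_mean) / ctrl_sd
  abs(z) > sd_threshold
}

defects <- defect_matrix(beta, groups)
print(defects)
```

In this toy data only `case1` at `ICR_A` is flagged, because its beta value of 0.20 is far outside the narrow control range around 0.50.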

Installation

Standard Installation

Option 1: From r-universe (Recommended)

# Install from r-universe (pre-compiled binaries)
install.packages("GIMP", repos = c("https://saadat-abu.r-universe.dev", "https://cloud.r-project.org"))

# Load the package
library(GIMP)

Option 2: From GitHub (Latest Development)

# Install devtools if you don't have it
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install GIMP from GitHub
devtools::install_github("SAADAT-Abu/GIMP")

# Load the package
library(GIMP)

For IDAT Processing (Additional Requirements)

# Install Bioconductor packages for IDAT support
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install required annotation packages
BiocManager::install(c(
  "minfi",
  "IlluminaHumanMethylation450kanno.ilmn12.hg19",
  "IlluminaHumanMethylationEPICanno.ilm10b4.hg19",
  "IlluminaHumanMethylationEPICv2anno.20a1.hg38"
))

# Test IDAT functionality
test_idat_functionality()

Quick Start

Option 1: Using the Shiny App (Recommended for Beginners)

library(GIMP)

# Launch interactive app
GIMP_app()

# For large IDAT files, increase upload limit
GIMP_app(max_upload_size_mb = 1000)  # 1GB limit

Application Interface

Figure 1: Main interface of the GIMP Shiny App

Figure 2: Data upload and processing interface

Figure 3: ICR analysis and visualization

Figure 4: Interactive results and heatmap visualization

Option 2: Using GEO Datasets (NEW!)

library(GIMP)

# Method 1: Auto-process a GEO dataset (with auto-detection)
geo_data <- process_geo_dataset(
  geo_id = "GSE68777",
  normalize_method = "quantile",
  n_cores = 4
)

# Method 2: Validate first, then process
validation <- validate_geo_dataset("GSE68777")
if (validation$valid && validation$has_idats) {
  geo_data <- process_geo_dataset("GSE68777")
}

# Method 3: User-guided processing with custom group mappings
pheno_preview <- get_geo_phenotype_data("GSE68777")
head(pheno_preview$pheno_data)  # Review available columns

# Define your group mappings
group_mappings <- list(
  "tumor" = "Case",
  "normal" = "Control", 
  "adjacent_normal" = "Control",
  "unknown" = "Exclude"
)

geo_data <- process_geo_with_mappings(
  geo_id = "GSE68777",
  group_column = "characteristics_ch1.1",
  group_mappings = group_mappings,
  max_samples = 50
)

# Continue with standard GIMP analysis
ICRcpg <- make_cpgs(Bmatrix = geo_data$beta_matrix, bedmeth = "v1")
df_ICR <- make_ICRs(Bmatrix = geo_data$beta_matrix, bedmeth = "v1")
ICRs_heatmap(df_ICR, sampleInfo = geo_data$sample_groups, plot_type = "beta")
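
The group mappings in Method 3 translate raw phenotype labels into analysis groups. How `process_geo_with_mappings()` applies them internally is not shown here, so the following is only a base-R sketch of the idea. The helper `map_groups()` and the example labels are hypothetical, and treating unmapped labels as `NA` is an assumption made for this illustration.

```r
# Raw phenotype labels as they might appear in a GEO characteristics column (made up).
raw_labels <- c("tumor", "normal", "adjacent_normal", "unknown", "tumor", "metastasis")

group_mappings <- list(
  "tumor" = "Case",
  "normal" = "Control",
  "adjacent_normal" = "Control",
  "unknown" = "Exclude"
)

# Hypothetical helper (not part of GIMP): translate raw labels into groups.
# Labels without a mapping become NA instead of being silently dropped.
map_groups <- function(labels, mappings) {
  vapply(labels, function(x) {
    if (x %in% names(mappings)) mappings[[x]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

groups <- map_groups(raw_labels, group_mappings)

# Keep only samples that were mapped to a real analysis group.
keep <- !is.na(groups) & groups != "Exclude"
print(table(groups[keep]))
```

Checking `groups` before processing makes it easy to spot labels, such as "metastasis" above, that the mapping does not cover.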

Option 3: Command Line Usage

For Preprocessed Data

library(GIMP)

# Load your beta value matrix
df <- readRDS("methylation_data.rds")

# Standard GIMP workflow
ICRcpg <- make_cpgs(Bmatrix = df, bedmeth = "v1")
df_ICR <- make_ICRs(Bmatrix = df, bedmeth = "v1")

# Define sample groups: one label per sample, in the column order of df
sampleInfo <- c(rep("Control", 10), rep("Case", 10))

# Create heatmap
ICRs_heatmap(df_ICR, sampleInfo = sampleInfo, plot_type = "beta")

# Differential analysis
dmps <- iDMPs(data = ICRcpg, sampleInfo = sampleInfo)

For Raw IDAT Files

library(GIMP)

# Process IDAT files from ZIP archive
idat_data <- read_idat_zip(
  zip_file = "methylation_data.zip",
  array_type = "EPIC",
  normalize_method = "quantile"
)

# Alternative: the same call with parallel processing (recommended for large datasets)
idat_data <- read_idat_zip(
  zip_file = "methylation_data.zip",
  array_type = "EPIC",
  normalize_method = "quantile",
  n_cores = 4  # Use 4 CPU cores for faster processing
)

# Extract processed data
beta_matrix <- idat_data$beta_matrix
sample_info <- idat_data$sample_info

# Continue with standard GIMP workflow
ICRcpg <- make_cpgs(Bmatrix = beta_matrix, bedmeth = "v1")
# ... rest of analysis

IDAT File Processing Guide

Required File Structure

Your ZIP file must contain:

methylation_data.zip
├── SampleID1_ChipID_Position_Red.idat
├── SampleID1_ChipID_Position_Grn.idat
├── SampleID2_ChipID_Position_Red.idat
├── SampleID2_ChipID_Position_Grn.idat
├── ...
└── samplesheet.csv

Sample Sheet Requirements

Required Columns

  • Sample_Name: Unique identifier for each sample
  • Sentrix_ID: Chip/slide identifier (e.g., "200123456789")
  • Sentrix_Position: Array position on chip (e.g., "R01C01")

Optional Columns

  • Sample_Group: For automatic group assignment ("Control", "Case")
  • Sample_Plate: Plate information
  • Sample_Well: Well position

Sample Sheet Example

Sample_Name,Sentrix_ID,Sentrix_Position,Sample_Group
Control_01,200123456789,R01C01,Control
Control_02,200123456789,R02C01,Control
Control_03,200123456789,R03C01,Control
Case_01,200123456790,R01C01,Case
Case_02,200123456790,R02C01,Case
Case_03,200123456790,R03C01,Case
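
Before zipping the files, it can help to check the sample sheet against the requirements above. The sketch below uses only base R. `check_sample_sheet()` is a hypothetical helper, not a GIMP function, and the uniqueness checks are assumptions about what a sensible sheet looks like rather than rules taken from the package.

```r
# Read a sheet from inline text; in practice use read.csv("samplesheet.csv", ...).
# colClasses = "character" keeps the 12-digit Sentrix_ID as text.
sheet <- read.csv(text = "Sample_Name,Sentrix_ID,Sentrix_Position,Sample_Group
Control_01,200123456789,R01C01,Control
Case_01,200123456790,R01C01,Case", colClasses = "character")

# Hypothetical helper (not a GIMP function): return a character vector of problems.
check_sample_sheet <- function(sheet) {
  required <- c("Sample_Name", "Sentrix_ID", "Sentrix_Position")
  missing_cols <- setdiff(required, names(sheet))
  if (length(missing_cols) > 0) {
    return(paste("Missing required column:", missing_cols))
  }
  problems <- character(0)
  if (anyDuplicated(sheet$Sample_Name) > 0) {
    problems <- c(problems, "Sample_Name values are not unique")
  }
  # Each chip position can hold only one sample.
  if (anyDuplicated(paste(sheet$Sentrix_ID, sheet$Sentrix_Position)) > 0) {
    problems <- c(problems, "Duplicated Sentrix_ID/Sentrix_Position pair")
  }
  problems
}

print(check_sample_sheet(sheet))
```

An empty result means none of these checks found a problem; it does not guarantee that the IDAT files themselves are present or correctly named.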

Creating a Sample Sheet Template

# Generate template with your sample information
template <- create_sample_sheet_template(
  sample_names = c("Sample_1", "Sample_2", "Sample_3", "Sample_4"),
  sentrix_ids = c("200123456789", "200123456789", "200123456790", "200123456790"),
  sentrix_positions = c("R01C01", "R02C01", "R01C01", "R02C01"),
  groups = c("Control", "Control", "Case", "Case")
)

# Save template
write.csv(template, "samplesheet.csv", row.names = FALSE)

Common IDAT Issues and Solutions

Issue: "Missing IDAT files for samples"

Solution: Check that IDAT file names match your sample sheet

# Diagnose your ZIP file structure
diagnose_idat_structure("your_file.zip")

# Generate sample sheet from actual IDAT files
new_sheet <- generate_samplesheet_from_idats("your_file.zip")

Issue: "Maximum upload size exceeded"

Solution: Increase upload limit

# In Shiny app
GIMP_app(max_upload_size_mb = 1000)  # 1GB

# Or in R console before launching
options(shiny.maxRequestSize = 1000*1024^2)
GIMP_app()
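
Shiny's `shiny.maxRequestSize` option is expressed in bytes, which is why the call above multiplies by `1024^2`. A small sketch of that conversion follows; `mb_to_bytes()` is a hypothetical convenience helper, not part of GIMP or Shiny.

```r
# Hypothetical helper: convert megabytes to the byte count Shiny expects.
mb_to_bytes <- function(mb) mb * 1024^2

# Equivalent to options(shiny.maxRequestSize = 1000*1024^2)
options(shiny.maxRequestSize = mb_to_bytes(1000))

# Confirm the value that is now in effect for this R session.
print(getOption("shiny.maxRequestSize"))
```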

Issue: File size too large

Solutions:

  1. Compress your ZIP file with maximum compression
  2. Remove unnecessary files from the ZIP
  3. Process in smaller batches
  4. Use command line instead of Shiny app
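
For the "smaller batches" option, one way to plan the split is to divide the sample sheet into groups of a fixed size and build one ZIP per group. This is a base-R sketch only: the sheet, the batch size of 4, and the variable names are made up for illustration, and each resulting ZIP would still need its own IDAT files and sample sheet as described above.

```r
# Toy sample sheet with ten samples (names are made up).
sample_sheet <- data.frame(
  Sample_Name = sprintf("Sample_%02d", 1:10),
  stringsAsFactors = FALSE
)
batch_size <- 4  # arbitrary value chosen for this example

# Assign consecutive rows to batches 1, 2, 3, ... and split the sheet accordingly.
batch_id <- ceiling(seq_len(nrow(sample_sheet)) / batch_size)
batches <- split(sample_sheet, batch_id)

# Number of samples per batch.
print(vapply(batches, nrow, integer(1)))
```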

Supported Array Types

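| Array Type | GIMP Parameter   | Genome Build | Typical File Size          |
|------------|------------------|--------------|----------------------------|
| 450k       | "450k"           | hg19         | 50-150MB (10-50 samples)   |
| EPIC v1    | "EPIC" or "v1"   | hg19         | 100-500MB (10-50 samples)  |
| EPIC v2    | "EPICv2" or "v2" | hg38         | 200-800MB (10-50 samples)  |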

Analysis Workflows

Standard Workflow

  1. Data Upload
    • Upload processed data (CSV/RDS) OR raw IDAT files (ZIP)
    • Assign sample groups (Control vs Case)

  2. CpG Coverage Analysis
    • Visualize probe coverage across ICRs
    • Assess data quality and completeness

  3. ICR Heatmap Analysis
    • Generate methylation heatmaps
    • Choose from beta, delta-beta, or defect matrix views

  4. Differential Methylation
    • Identify significantly different positions
    • Generate volcano plots and summary statistics

  5. Region Explorer
    • Detailed visualization of specific ICRs
    • Interactive plots with DMP highlighting
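
The delta-beta view mentioned in step 3 can be pictured with a short base-R sketch. It assumes that delta-beta means each sample's beta value minus the mean of the control samples for the same ICR; that definition, the toy values, and the variable names are assumptions for illustration and may differ from what `ICRs_heatmap()` computes.

```r
# Toy ICR-level beta matrix: rows are ICRs, columns are samples (values are made up).
beta <- matrix(
  c(0.50, 0.54, 0.80,
    0.48, 0.52, 0.20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ICR_A", "ICR_B"), c("ctrl1", "ctrl2", "case1"))
)
groups <- c("Control", "Control", "Case")

# Mean methylation of the controls, one value per ICR.
ctrl_mean <- rowMeans(beta[, groups == "Control", drop = FALSE])

# Subtract the control mean from every sample, row by row.
delta_beta <- sweep(beta, 1, ctrl_mean, "-")
print(round(delta_beta, 2))
```

Positive values indicate a gain of methylation relative to controls and negative values a loss, which is the contrast a delta-beta heatmap is meant to highlight.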

Analysis Functions

Core Functions

  • make_cpgs(): Extract ICR CpG sites
  • make_ICRs(): Create ICR-level methylation matrix
  • ICRs_heatmap(): Generate ICR heatmaps
  • iDMPs(): Identify differentially methylated positions
  • plot_line_region(): Visualize specific ICR regions

IDAT Functions

  • read_idat_zip(): Process IDAT files from ZIP
  • diagnose_idat_structure(): Analyze ZIP file contents
  • generate_samplesheet_from_idats(): Auto-generate sample sheets
  • test_idat_functionality(): Verify IDAT processing setup

GEO Integration Functions (NEW!)

  • validate_geo_dataset(): Check if GEO dataset has IDAT files
  • process_geo_dataset(): Auto-process GEO dataset with group detection
  • process_geo_with_mappings(): Process GEO dataset with custom group mappings
  • get_geo_phenotype_data(): Preview GEO phenotypic data for group selection
  • diagnose_geo_dataset(): Detailed analysis of GEO dataset structure

Utility Functions

  • plot_cpgs_coverage(): Visualize CpG coverage
  • create_sample_sheet_template(): Generate sample sheet templates
  • GIMP_app(): Launch Shiny application

Advanced Usage

Custom Analysis Parameters

# Advanced heatmap with defect detection
ICRs_heatmap(
  df_ICR = icr_data,
  sampleInfo = sample_groups,
  plot_type = "defect",
  sd_threshold = 2.5,  # More sensitive detection
  order_by = "meth"    # Cluster by methylation patterns
)

# Stringent DMP detection
dmps <- iDMPs(
  data = ICRcpg,
  sampleInfo = sample_groups,
  pValueCutoff = 0.01  # More stringent threshold
)

Working with Large Datasets

# For large IDAT files
idat_data <- read_idat_zip(
  zip_file = "large_dataset.zip",
  normalize_method = "funnorm",  # Faster normalization
  detection_pval = 0.05,         # Less stringent QC
  remove_failed_samples = TRUE
)

# Batch processing
process_batch <- function(zip_files) {
  results <- list()
  for (zip_file in zip_files) {
    results[[basename(zip_file)]] <- read_idat_zip(zip_file)
  }
  return(results)
}
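
When many archives are processed in a loop, one corrupt file would otherwise stop the whole batch. The sketch below shows one possible way to guard against that with `tryCatch()`. `process_batch_safely()` and `fake_reader()` are hypothetical names; the reader is passed in as an argument so the example runs without GIMP installed, and in real use it could be `read_idat_zip`.

```r
# Hypothetical wrapper: `reader` is any function that takes a file path.
# Failures are reported as warnings and stored as NULL instead of aborting the loop.
process_batch_safely <- function(files, reader) {
  results <- lapply(files, function(f) {
    tryCatch(reader(f), error = function(e) {
      warning("Skipping ", f, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  names(results) <- basename(files)
  results
}

# Demonstration with a stand-in reader that fails for one file.
fake_reader <- function(path) {
  if (grepl("bad", path)) stop("corrupt archive")
  list(source = path)
}

out <- suppressWarnings(
  process_batch_safely(c("data/a.zip", "data/bad.zip"), fake_reader)
)

# Names of the archives that could not be processed.
failed <- names(out)[vapply(out, is.null, logical(1))]
print(failed)
```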

Troubleshooting

Installation Issues

# Check GIMP installation
library(GIMP)

# Test IDAT functionality
test_idat_functionality()

# Fix common issues
fix_minfi_installation()

# Check system requirements
check_minfi_functions()

Common Error Solutions

"minfi functions not found"

# Reinstall minfi and dependencies
BiocManager::install("minfi", force = TRUE)
.rs.restartR()  # Restart R session (RStudio only; otherwise restart R manually)

"No ICRs found"

  • Check that your array type matches your data
  • Verify probe IDs are in the correct format
  • Try different bedmeth parameters
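
A quick look at the row names of the beta matrix often explains a "No ICRs found" error. The following base-R sketch is not a GIMP function. `check_probe_ids()` is a hypothetical helper, and the remark about suffixes reflects how EPIC v2 probe names typically look (for example `cg00000108_TC21`), which is stated here as an assumption rather than something the package documents.

```r
# Hypothetical helper: summarise what the probe IDs (row names) look like.
check_probe_ids <- function(ids) {
  c(
    cg_like  = mean(grepl("^cg[0-9]{8}", ids)),   # fraction that start like cg########
    suffixed = mean(grepl("^cg[0-9]{8}_", ids))   # fraction carrying an extra suffix
  )
}

# Two valid-looking IDs (one with a suffix) and two row numbers that are not probe IDs.
id_summary <- check_probe_ids(c("cg00000029", "cg00000108_TC21", "1", "2"))
print(id_summary)
```

A low `cg_like` fraction suggests the probe IDs are stored in a column rather than as row names, while a high `suffixed` fraction suggests suffixed IDs, in which case a different `bedmeth` setting may be needed.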

"Dimension mismatch errors"

  • Check that sample information matches data dimensions
  • Verify no missing values in critical columns
  • Use diagnose_idat_structure() for IDAT files
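
The first two checks can be automated before calling any analysis function. This is a minimal base-R sketch; `check_dimensions()` is a hypothetical helper, and it assumes samples are stored in the columns of the beta matrix, as in the workflows above.

```r
# Hypothetical helper: stop early if group labels and data do not line up.
check_dimensions <- function(beta, sample_info) {
  stopifnot(is.matrix(beta) || is.data.frame(beta))
  if (length(sample_info) != ncol(beta)) {
    stop(sprintf("sampleInfo has %d labels but the data has %d samples",
                 length(sample_info), ncol(beta)), call. = FALSE)
  }
  if (anyNA(sample_info)) {
    stop("sampleInfo contains missing values", call. = FALSE)
  }
  invisible(TRUE)
}

# Toy matrix with 3 probes and 4 samples (values are made up).
beta <- matrix(0.5, nrow = 3, ncol = 4)
ok <- check_dimensions(beta, c("Control", "Control", "Case", "Case"))
print(ok)
```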

"GEO dataset not suitable"

# Diagnose GEO dataset issues
diag <- diagnose_geo_dataset("GSE12345")
print(diag$summary)

# Check if dataset has IDAT files
validation <- validate_geo_dataset("GSE12345")
if (!validation$has_idats) {
  message("Dataset contains only processed data, not raw IDAT files")
}

"GEO group detection failed"

# Preview phenotypic data
pheno_preview <- get_geo_phenotype_data("GSE12345")
head(pheno_preview$pheno_data)

# Use custom mappings
group_mappings <- list("case" = "Case", "control" = "Control")
geo_data <- process_geo_with_mappings("GSE12345", "group_column", group_mappings)

Data Sources

ICR Coordinates

GIMP uses curated ICR coordinates from:

  • Joshi et al. (2016): "Detailed annotation of human Imprinting Control Regions" (DOI)
  • Coordinates available for both hg19 and hg38 genome builds

Compatible Data Sources

  • Illumina methylation arrays: 450k, EPIC v1, EPIC v2
  • GEO datasets: Automated download and processing of public datasets with IDAT files
  • Preprocessed data: From other methylation analysis pipelines (CSV, RDS, Excel)
  • Clinical samples: Hospital/research institution data
  • Raw IDAT files: ZIP archives from array service providers

Contributing

We welcome contributions! Please:

  1. Report bugs: Use GitHub issues with detailed error messages
  2. Suggest features: Describe your use case and proposed functionality
  3. Submit code: Follow R package development best practices
  4. Share datasets: Help us test with diverse methylation data

Last Updated: August 2025
Version: 0.2.0
Contact: francesco.cecerengs@gmail.com (GitHub: https://github.com/ngsFC/GIMP)
Contact: saadatabu1996@gmail.com (GitHub: https://github.com/SAADAT-Abu)

Support

Getting Help

  • GitHub Issues: For bug reports and feature requests
  • Documentation: ?function_name for detailed help
  • Vignettes: Comprehensive usage examples

Acknowledgments

The GIMP package is maintained by Abu Saadat, with contributions from Francesco Cecere. We gratefully acknowledge:

  • Bioconductor community for methylation analysis infrastructure
  • minfi developers for IDAT processing capabilities
  • GEOquery developers for seamless GEO data access
  • shinyepico for inspiration on user-friendly methylation analysis

License

GIMP is released under the MIT License. See LICENSE file for details.


Keywords: DNA methylation, genomic imprinting, IDAT processing, Illumina arrays, ICR analysis, R package, Bioconductor, shiny application
