GIMP (Genomic Imprinting Methylation Patterns) is an R package designed for the comprehensive analysis of Imprinting Control Regions (ICRs) from methylation array data. It provides a complete pipeline for extracting imprinted CpGs (iCpGs), computing coverage, and analyzing ICRs at both probe and sample-specific levels.
🆕 NEW in v0.2.0: GIMP now includes a Shiny app that directly processes raw IDAT files, CSV/Excel files, or GEO datasets, making it a one-stop solution for genomic imprinting analysis.
- Triple Input Support: Process preprocessed methylation data, raw IDAT files, or GEO datasets
- Automated GEO Integration: Direct download and processing of GEO methylation datasets
- Specialized ICR Analysis: Focus on imprinting control regions with curated coordinates
- Interactive Visualizations: Comprehensive Shiny app with plotly integration
- Multiple Array Support: 450k, EPIC v1, and EPIC v2 arrays
- Quality Control: Automatic QC for IDAT data with detailed reporting
- Flexible Analysis: Beta values, delta-beta, and defect matrix visualizations
- ICR-specific coordinates: Based on Joshi et al. 2016
- Defect matrix analysis: SD-based detection of imprinting disorders
- Specialized heatmaps: Designed for imprinting pattern visualization
- Interactive region explorer: Detailed methylation profiles across ICRs
- Clinical interpretation: Tools for analyzing imprinting disorders
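
To make the idea of SD-based defect detection concrete, here is a minimal base R sketch. It is not GIMP's internal implementation: the toy beta values, the `flag_defects()` helper, and the default threshold of 3 SD are assumptions chosen for illustration. Each sample is compared against the control mean of the same ICR, and values lying more than a chosen number of control standard deviations away are flagged.

```r
# Hypothetical sketch of SD-based defect calling; not GIMP's internal code.
# Toy ICR-level beta matrix: 3 ICRs (rows) x 8 samples (columns).
beta <- rbind(
  ICR_A = c(0.48, 0.50, 0.52, 0.49, 0.51, 0.50, 0.49, 0.51),
  ICR_B = c(0.47, 0.51, 0.50, 0.52, 0.50, 0.49, 0.10, 0.51),
  ICR_C = c(0.52, 0.49, 0.50, 0.48, 0.51, 0.50, 0.52, 0.49)
)
colnames(beta) <- paste0("S", 1:8)
groups <- c(rep("Control", 5), rep("Case", 3))

flag_defects <- function(beta, groups, sd_threshold = 3) {
  ctrl <- beta[, groups == "Control", drop = FALSE]
  ctrl_mean <- rowMeans(ctrl)
  ctrl_sd <- apply(ctrl, 1, sd)
  # Distance of every sample from the control mean, in control SD units.
  # The per-ICR vectors are recycled down the rows of the matrix.
  z <- (beta - ctrl_mean) / ctrl_sd
  abs(z) > sd_threshold
}

defects <- flag_defects(beta, groups)
# List the flagged ICR/sample pairs (row and column indices).
which(defects, arr.ind = TRUE)
```

In this toy data only sample `S7` at `ICR_B` (beta 0.10 against a control mean of 0.50) is flagged. Lowering `sd_threshold` makes the call more sensitive at the cost of more false positives.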
Option 1: From r-universe (Recommended)
# Install from r-universe (pre-compiled binaries)
install.packages("GIMP", repos = c("https://saadat-abu.r-universe.dev", "https://cloud.r-project.org"))
# Load the package
library(GIMP)

Option 2: From GitHub (Latest Development)
# Install devtools if you don't have it
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install GIMP from GitHub
devtools::install_github("SAADAT-Abu/GIMP")
# Load the package
library(GIMP)

# Install Bioconductor packages for IDAT support
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# Install required annotation packages
BiocManager::install(c(
  "minfi",
  "IlluminaHumanMethylation450kanno.ilmn12.hg19",
  "IlluminaHumanMethylationEPICanno.ilm10b4.hg19",
  "IlluminaHumanMethylationEPICv2anno.20a1.hg38"
))
# Test IDAT functionality
test_idat_functionality()

library(GIMP)
# Launch interactive app
GIMP_app()
# For large IDAT files, increase upload limit
GIMP_app(max_upload_size_mb = 1000) # 1 GB limit

library(GIMP)
# Method 1: Auto-process a GEO dataset (with auto-detection)
geo_data <- process_geo_dataset(
  geo_id = "GSE68777",
  normalize_method = "quantile",
  n_cores = 4
)
# Method 2: Validate first, then process
validation <- validate_geo_dataset("GSE68777")
if (validation$valid && validation$has_idats) {
  geo_data <- process_geo_dataset("GSE68777")
}
# Method 3: User-guided processing with custom group mappings
pheno_preview <- get_geo_phenotype_data("GSE68777")
head(pheno_preview$pheno_data) # Review available columns
# Define your group mappings
group_mappings <- list(
  "tumor" = "Case",
  "normal" = "Control",
  "adjacent_normal" = "Control",
  "unknown" = "Exclude"
)
geo_data <- process_geo_with_mappings(
  geo_id = "GSE68777",
  group_column = "characteristics_ch1.1",
  group_mappings = group_mappings,
  max_samples = 50
)
# Continue with standard GIMP analysis
ICRcpg <- make_cpgs(Bmatrix = geo_data$beta_matrix, bedmeth = "v1")
df_ICR <- make_ICRs(Bmatrix = geo_data$beta_matrix, bedmeth = "v1")
ICRs_heatmap(df_ICR, sampleInfo = geo_data$sample_groups, plot_type = "beta")

library(GIMP)
# Load your beta value matrix
df <- readRDS("methylation_data.rds")
# Standard GIMP workflow
ICRcpg <- make_cpgs(Bmatrix = df, bedmeth = "v1")
df_ICR <- make_ICRs(Bmatrix = df, bedmeth = "v1")
# Generate sample information
sampleInfo <- c(rep("Control", 10), rep("Case", 10))
# Create heatmap
ICRs_heatmap(df_ICR, sampleInfo = sampleInfo, plot_type = "beta")
# Differential analysis
dmps <- iDMPs(data = ICRcpg, sampleInfo = sampleInfo)

library(GIMP)
# Process IDAT files from ZIP archive
idat_data <- read_idat_zip(
  zip_file = "methylation_data.zip",
  array_type = "EPIC",
  normalize_method = "quantile"
)
# Same call with parallel processing enabled (recommended for large datasets)
idat_data <- read_idat_zip(
  zip_file = "methylation_data.zip",
  array_type = "EPIC",
  normalize_method = "quantile",
  n_cores = 4 # Use 4 CPU cores for faster processing
)
# Extract processed data
beta_matrix <- idat_data$beta_matrix
sample_info <- idat_data$sample_info
# Continue with standard GIMP workflow
ICRcpg <- make_cpgs(Bmatrix = beta_matrix, bedmeth = "v1")
# ... rest of analysis

Your ZIP file must contain:
methylation_data.zip
├── SampleID1_ChipID_Position_Red.idat
├── SampleID1_ChipID_Position_Grn.idat
├── SampleID2_ChipID_Position_Red.idat
├── SampleID2_ChipID_Position_Grn.idat
├── ...
└── samplesheet.csv
- Sample_Name: Unique identifier for each sample
- Sentrix_ID: Chip/slide identifier (e.g., "200123456789")
- Sentrix_Position: Array position on chip (e.g., "R01C01")
- Sample_Group: For automatic group assignment ("Control", "Case")
- Sample_Plate: Plate information
- Sample_Well: Well position
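
Before zipping, the sample sheet can be sanity-checked in base R. The sketch below is illustrative only: `check_sample_sheet()` is a hypothetical helper rather than a GIMP function, and it assumes that `Sample_Name`, `Sentrix_ID`, and `Sentrix_Position` are the mandatory columns and that positions follow the `R01C01` pattern shown above. Reading every column as character keeps long Sentrix IDs from being converted to numbers.

```r
# Hypothetical helper; the choice of required columns is an assumption.
required_cols <- c("Sample_Name", "Sentrix_ID", "Sentrix_Position")

check_sample_sheet <- function(sheet, required = required_cols) {
  problems <- character(0)
  missing_cols <- setdiff(required, names(sheet))
  if (length(missing_cols) > 0) {
    # Without the required columns the remaining checks cannot run.
    return(paste("Missing column(s):", paste(missing_cols, collapse = ", ")))
  }
  if (anyDuplicated(sheet$Sample_Name) > 0) {
    problems <- c(problems, "Sample_Name values are not unique")
  }
  if (anyNA(sheet[required])) {
    problems <- c(problems, "Missing values in required columns")
  }
  # Positions are expected to look like R01C01 (row, then column).
  bad_pos <- !grepl("^R[0-9]{2}C[0-9]{2}$", sheet$Sentrix_Position)
  if (any(bad_pos)) {
    problems <- c(problems, "Sentrix_Position values not in RxxCyy format")
  }
  problems
}

# colClasses = "character" keeps Sentrix_ID as text instead of a number.
sheet <- read.csv(
  text = "Sample_Name,Sentrix_ID,Sentrix_Position,Sample_Group
Control_01,200123456789,R01C01,Control
Case_01,200123456790,R01C01,Case",
  colClasses = "character"
)

check_sample_sheet(sheet) # an empty character vector means no problems found
```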
Sample_Name,Sentrix_ID,Sentrix_Position,Sample_Group
Control_01,200123456789,R01C01,Control
Control_02,200123456789,R02C01,Control
Control_03,200123456789,R03C01,Control
Case_01,200123456790,R01C01,Case
Case_02,200123456790,R02C01,Case
Case_03,200123456790,R03C01,Case

# Generate template with your sample information
template <- create_sample_sheet_template(
  sample_names = c("Sample_1", "Sample_2", "Sample_3", "Sample_4"),
  sentrix_ids = c("200123456789", "200123456789", "200123456790", "200123456790"),
  sentrix_positions = c("R01C01", "R02C01", "R01C01", "R02C01"),
  groups = c("Control", "Control", "Case", "Case")
)
# Save template
write.csv(template, "samplesheet.csv", row.names = FALSE)

Solution: Check that IDAT file names match your sample sheet
# Diagnose your ZIP file structure
diagnose_idat_structure("your_file.zip")
# Generate sample sheet from actual IDAT files
new_sheet <- generate_samplesheet_from_idats("your_file.zip")

Solution: Increase upload limit
# In Shiny app
GIMP_app(max_upload_size_mb = 1000) # 1GB
# Or in R console before launching
options(shiny.maxRequestSize = 1000*1024^2)
GIMP_app()

Solutions:
- Compress your ZIP file with maximum compression
- Remove unnecessary files from the ZIP
- Process in smaller batches
- Use command line instead of Shiny app
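
One way to process in smaller batches is to split the list of inputs into fixed-size chunks and handle one chunk at a time. The sketch below uses only base R; `split_into_batches()` and the file names are hypothetical, and each resulting batch could then be passed to whatever processing function you use.

```r
# Hypothetical helper: split a vector of inputs into chunks of batch_size.
split_into_batches <- function(x, batch_size) {
  stopifnot(batch_size >= 1)
  # Elements 1..batch_size get index 1, the next batch_size get index 2, etc.
  split(x, ceiling(seq_along(x) / batch_size))
}

zip_files <- sprintf("batch_input_%02d.zip", 1:5) # placeholder file names
batches <- split_into_batches(zip_files, batch_size = 2)
lengths(batches) # number of files in each batch
```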
| Array Type | GIMP Parameter | Genome Build | Typical File Size |
|---|---|---|---|
| 450k | "450k" | hg19 | 50-150 MB (10-50 samples) |
| EPIC v1 | "EPIC" or "v1" | hg19 | 100-500 MB (10-50 samples) |
| EPIC v2 | "EPICv2" or "v2" | hg38 | 200-800 MB (10-50 samples) |
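
The table can also be encoded as a small lookup, which is handy when a script needs to confirm the genome build implied by an array parameter. This is a sketch built only from the table above; `array_info` and `genome_build_for()` are hypothetical names, not part of GIMP.

```r
# Lookup built from the table above; names are illustrative, not GIMP API.
array_info <- data.frame(
  array = c("450k", "EPIC v1", "EPIC v1", "EPIC v2", "EPIC v2"),
  parameter = c("450k", "EPIC", "v1", "EPICv2", "v2"),
  genome_build = c("hg19", "hg19", "hg19", "hg38", "hg38"),
  stringsAsFactors = FALSE
)

genome_build_for <- function(parameter) {
  idx <- match(parameter, array_info$parameter)
  if (anyNA(idx)) {
    stop("Unknown array parameter: ", paste(parameter[is.na(idx)], collapse = ", "))
  }
  array_info$genome_build[idx]
}

genome_build_for(c("EPIC", "v2"))
```

Keeping the mapping in one place makes it harder to mix hg19 and hg38 coordinates when combining datasets from different array generations.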
- Data Upload
  - Upload processed data (CSV/RDS) or raw IDAT files (ZIP)
  - Assign sample groups (Control vs Case)
- CpG Coverage Analysis
  - Visualize probe coverage across ICRs
  - Assess data quality and completeness
- ICR Heatmap Analysis
  - Generate methylation heatmaps
  - Choose from beta, delta-beta, or defect matrix views
- Differential Methylation
  - Identify significantly different positions
  - Generate volcano plots and summary statistics
- Region Explorer
  - Detailed visualization of specific ICRs
  - Interactive plots with DMP highlighting
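
As an illustration of what the coverage step measures, the base R sketch below counts how many probes fall inside each region. The coordinates, probe IDs, and the `count_probes()` helper are made up for the example; GIMP uses its own curated ICR coordinates and array annotation.

```r
# Toy regions and probes; all coordinates here are invented for illustration.
icrs <- data.frame(
  name = c("ICR_1", "ICR_2"),
  chr = c("chr1", "chr2"),
  start = c(1000, 5000),
  end = c(2000, 6000),
  stringsAsFactors = FALSE
)
probes <- data.frame(
  id = paste0("cg", 1:6),
  chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  pos = c(1000, 1500, 2500, 5500, 7000, 1500),
  stringsAsFactors = FALSE
)

# Count probes whose position lies within each region (bounds inclusive).
count_probes <- function(icrs, probes) {
  vapply(seq_len(nrow(icrs)), function(i) {
    sum(probes$chr == icrs$chr[i] &
          probes$pos >= icrs$start[i] &
          probes$pos <= icrs$end[i])
  }, integer(1))
}

icrs$n_probes <- count_probes(icrs, probes)
icrs
```

Regions with very few covered probes give less reliable ICR-level averages, which is why coverage is worth inspecting before interpreting the heatmaps.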
- `make_cpgs()`: Extract ICR CpG sites
- `make_ICRs()`: Create ICR-level methylation matrix
- `ICRs_heatmap()`: Generate ICR heatmaps
- `iDMPs()`: Identify differentially methylated positions
- `plot_line_region()`: Visualize specific ICR regions

- `read_idat_zip()`: Process IDAT files from ZIP
- `diagnose_idat_structure()`: Analyze ZIP file contents
- `generate_samplesheet_from_idats()`: Auto-generate sample sheets
- `test_idat_functionality()`: Verify IDAT processing setup

- `validate_geo_dataset()`: Check if GEO dataset has IDAT files
- `process_geo_dataset()`: Auto-process GEO dataset with group detection
- `process_geo_with_mappings()`: Process GEO dataset with custom group mappings
- `get_geo_phenotype_data()`: Preview GEO phenotypic data for group selection
- `diagnose_geo_dataset()`: Detailed analysis of GEO dataset structure

- `plot_cpgs_coverage()`: Visualize CpG coverage
- `create_sample_sheet_template()`: Generate sample sheet templates
- `GIMP_app()`: Launch Shiny application
# Advanced heatmap with defect detection
ICRs_heatmap(
  df_ICR = icr_data,
  sampleInfo = sample_groups,
  plot_type = "defect",
  sd_threshold = 2.5, # More sensitive detection
  order_by = "meth" # Cluster by methylation patterns
)
# Stringent DMP detection
dmps <- iDMPs(
  data = ICRcpg,
  sampleInfo = sample_groups,
  pValueCutoff = 0.01 # More stringent threshold
)

# For large IDAT files
idat_data <- read_idat_zip(
  zip_file = "large_dataset.zip",
  normalize_method = "funnorm", # Faster normalization
  detection_pval = 0.05, # Less stringent QC
  remove_failed_samples = TRUE
)
# Batch processing
process_batch <- function(zip_files) {
  results <- list()
  for (zip_file in zip_files) {
    results[[basename(zip_file)]] <- read_idat_zip(zip_file)
  }
  return(results)
}

# Check GIMP installation
library(GIMP)
# Test IDAT functionality
test_idat_functionality()
# Fix common issues
fix_minfi_installation()
# Check system requirements
check_minfi_functions()

# Reinstall minfi and dependencies
BiocManager::install("minfi", force = TRUE)
.rs.restartR() # Restart R session (RStudio only)

- Check that your array type matches your data
- Verify probe IDs are in the correct format
- Try different `bedmeth` parameters
- Check that sample information matches data dimensions
- Verify no missing values in critical columns
- Use `diagnose_idat_structure()` for IDAT files
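
The first two checks above can be automated with a few lines of base R before calling any plotting function. `check_sample_info()` below is a hypothetical helper, not part of GIMP; it assumes samples are in the columns of the beta matrix and that `sampleInfo` is a plain vector of group labels, as in the examples in this document.

```r
# Hypothetical pre-flight check; not a GIMP function.
check_sample_info <- function(Bmatrix, sampleInfo) {
  issues <- character(0)
  # One group label is expected per sample (column) of the matrix.
  if (length(sampleInfo) != ncol(Bmatrix)) {
    issues <- c(issues, sprintf(
      "sampleInfo has %d entries but the matrix has %d columns",
      length(sampleInfo), ncol(Bmatrix)
    ))
  }
  if (anyNA(sampleInfo)) {
    issues <- c(issues, "sampleInfo contains missing values")
  }
  issues
}

# Toy matrix with 4 samples and matching group labels.
toy_matrix <- matrix(0.5, nrow = 2, ncol = 4)
toy_groups <- c("Control", "Control", "Case", "Case")
check_sample_info(toy_matrix, toy_groups) # empty vector means no issues found
```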
# Diagnose GEO dataset issues
diag <- diagnose_geo_dataset("GSE12345")
print(diag$summary)
# Check if dataset has IDAT files
validation <- validate_geo_dataset("GSE12345")
if (!validation$has_idats) {
  message("Dataset contains only processed data, not raw IDAT files")
}

# Preview phenotypic data
pheno_preview <- get_geo_phenotype_data("GSE12345")
head(pheno_preview$pheno_data)
# Use custom mappings
group_mappings <- list("case" = "Case", "control" = "Control")
geo_data <- process_geo_with_mappings("GSE12345", "group_column", group_mappings)

GIMP uses curated ICR coordinates from:
- Joshi et al. (2016): "Detailed annotation of human Imprinting Control Regions" (DOI)
- Coordinates available for both hg19 and hg38 genome builds
- Illumina methylation arrays: 450k, EPIC v1, EPIC v2
- GEO datasets: Automated download and processing of public datasets with IDAT files
- Preprocessed data: From other methylation analysis pipelines (CSV, RDS, Excel)
- Clinical samples: Hospital/research institution data
- Raw IDAT files: ZIP archives from array service providers
We welcome contributions! Please:
- Report bugs: Use GitHub issues with detailed error messages
- Suggest features: Describe your use case and proposed functionality
- Submit code: Follow R package development best practices
- Share datasets: Help us test with diverse methylation data
**Last Updated**: August 2025
**Version**: 0.2.0
**Contact**: francesco.cecerengs@gmail.com
**GitHub**: [https://github.com/ngsFC/GIMP](https://github.com/ngsFC/GIMP)
**Contact**: saadatabu1996@gmail.com
**GitHub**: [https://github.com/SAADAT-Abu](https://github.com/SAADAT-Abu)
- GitHub Issues: For bug reports and feature requests
- Documentation: `?function_name` for detailed help
- Vignettes: Comprehensive usage examples
The GIMP package is maintained by Abu Saadat, with contributions from Francesco Cecere. We gratefully acknowledge:
- Bioconductor community for methylation analysis infrastructure
- minfi developers for IDAT processing capabilities
- GEOquery developers for seamless GEO data access
- shinyepico for inspiration on user-friendly methylation analysis
GIMP is released under the MIT License. See LICENSE file for details.
Keywords: DNA methylation, genomic imprinting, IDAT processing, Illumina arrays, ICR analysis, R package, Bioconductor, shiny application




