# Troubleshooting Guide

This document provides solutions for common issues encountered when running the neoantigen pipeline.

## Installation Issues

### Conda Environment Creation Fails

**Problem**: Error when creating conda environments.

**Solution**:
- Ensure you have a stable internet connection
- Update conda: `conda update -n base conda`
- Check for conflicting channels in your `.condarc` file
- Try installing dependencies one by one to identify problematic packages
- Use `conda clean --all` to clear package caches before retrying

### Missing Dependencies After Installation

**Problem**: The dependency check script reports missing tools despite installation.

**Solution**:
- Ensure you've activated the right conda environment: `conda activate environment_name`
- Check if the tool is in your PATH: `which tool_name`
- Verify paths in config.sh match your actual installation locations
- Some tools may require manual installation steps after conda installation
- Reinstall problematic tools manually outside of conda if necessary
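
The PATH check above can be run for several tools at once. The following is a minimal sketch, not the pipeline's own dependency check script; the function name `check_tools` and the tool list in the comment are illustrative assumptions.

```shell
#!/bin/sh
# Report which of the given command names cannot be found on PATH.
# Returns 0 if every tool is present, 1 if any is missing.
# (check_tools is a hypothetical helper, not part of the pipeline.)
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            printf 'OK       %s -> %s\n' "$ct_tool" "$(command -v "$ct_tool")"
        else
            printf 'MISSING  %s\n' "$ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Example call (illustrative tool list; adjust to what your config.sh expects):
# check_tools trim_galore fastqc bwa samtools
```

Run it after `conda activate environment_name` so the result reflects the environment the pipeline will actually use.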

## Data Preprocessing Issues

### TrimGalore or FastQC Fails

**Problem**: Error during trimming or quality control.

**Solution**:
- Verify TrimGalore/FastQC installation: `which trim_galore`, `which fastqc`
- Check input FASTQ file integrity: `gzip -t your_file.fastq.gz`
- Ensure you have sufficient disk space for output files
- Check if input file paths in your commands are correct
- Try running with verbose flags for more detailed error messages
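
The integrity check can be extended slightly to also catch truncated files. This sketch assumes standard four-line FASTQ records; `check_fastq_gz` is a hypothetical helper name.

```shell
#!/bin/sh
# Check that a gzipped FASTQ decompresses cleanly and that its line count
# is a multiple of 4 (assumes each record spans exactly four lines).
check_fastq_gz() {
    cf_file=$1
    if [ ! -r "$cf_file" ]; then
        echo "unreadable: $cf_file" >&2
        return 1
    fi
    if ! gzip -t "$cf_file" 2>/dev/null; then
        echo "corrupt gzip: $cf_file" >&2
        return 1
    fi
    # tr strips the padding some wc implementations add around the count.
    cf_lines=$(gzip -cd "$cf_file" | wc -l | tr -d ' ')
    if [ $((cf_lines % 4)) -ne 0 ]; then
        echo "incomplete record ($cf_lines lines): $cf_file" >&2
        return 1
    fi
    echo "ok: $cf_file ($((cf_lines / 4)) records)"
}

# Example call:
# check_fastq_gz your_file.fastq.gz
```

For paired-end data, running this on both files and comparing the record counts is a quick way to spot a mate file that was cut short during transfer.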

### Out of Memory During Preprocessing

**Problem**: Process killed due to insufficient memory.

**Solution**:
- Reduce number of threads used for processing
- Process samples one at a time rather than in parallel
- Use a machine with more RAM or configure your cluster job to request more memory
- Split large FASTQ files into smaller chunks and process separately

## Alignment Issues

### BWA Alignment Errors

**Problem**: BWA fails with error messages.

**Solution**:
- Verify reference genome path and index files exist: `ls -la /path/to/reference.*`
- Ensure the index files match your reference genome version
- If index files are missing or were built from a different FASTA, rebuild them: `bwa index reference.fa`
- Examine error messages carefully for clues (e.g., specific read errors)
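
To see at a glance which index files are absent, something like the following may help. It is a sketch that assumes the classic `bwa index` output set (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); other aligners such as bwa-mem2 write different files. `check_bwa_index` is a hypothetical helper name.

```shell
#!/bin/sh
# List any BWA index files that are missing or empty next to a reference FASTA.
# Assumes the classic `bwa index` outputs: .amb .ann .bwt .pac .sa
check_bwa_index() {
    cb_ref=$1
    cb_missing=0
    for cb_ext in amb ann bwt pac sa; do
        if [ ! -s "$cb_ref.$cb_ext" ]; then
            echo "missing or empty: $cb_ref.$cb_ext"
            cb_missing=1
        fi
    done
    return "$cb_missing"
}

# Example call:
# check_bwa_index /path/to/reference.fa || bwa index /path/to/reference.fa
```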

### Low Alignment Rate

**Problem**: Very few reads aligning to the reference.

**Solution**:
- Check that you're using the correct reference genome version
- Verify sample species matches reference genome species
- Run FastQC to inspect read quality and contamination
- Look for adapter contamination or poor quality bases
- Consider using more sensitive alignment parameters (for example, a shorter `bwa mem -k` minimum seed length or a lower `-T` output score threshold)

### Samtools/Picard Errors

**Problem**: Errors during SAM/BAM processing.

**Solution**:
- Verify input BAM/SAM files exist and aren't corrupted
- Check if you have the latest samtools/picard versions
- Ensure you have write permissions to output directories
- Try with increased memory allocation for Picard: `java -Xmx8g -jar picard.jar`
- Check for unsorted input when tools expect sorted files
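
Permission and disk-space problems often surface as confusing samtools or Picard errors, so it can be worth checking the output directory before rerunning. A minimal sketch, where `check_outdir` and the free-space threshold are assumptions to adapt:

```shell
#!/bin/sh
# Verify that an output directory is writable and has at least min_kb
# kilobytes free. (check_outdir is a hypothetical helper.)
check_outdir() {
    co_dir=$1
    co_min_kb=${2:-0}
    if [ ! -d "$co_dir" ] || [ ! -w "$co_dir" ]; then
        echo "not a writable directory: $co_dir" >&2
        return 1
    fi
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    co_free_kb=$(df -Pk "$co_dir" | awk 'NR == 2 { print $4 }')
    case $co_free_kb in
        ''|*[!0-9]*)
            echo "could not determine free space for: $co_dir" >&2
            return 1
            ;;
    esac
    if [ "$co_free_kb" -lt "$co_min_kb" ]; then
        echo "only ${co_free_kb} KB free in $co_dir (need ${co_min_kb} KB)" >&2
        return 1
    fi
    echo "ok: $co_dir (${co_free_kb} KB free)"
}

# Example call: require roughly 50 GB free before sorting a large BAM.
# check_outdir /path/to/output 50000000
```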

## Variant Calling Issues

### Mutect2 Fails

**Problem**: GATK Mutect2 exits with errors.

**Solution**:
- Check if input BAM files are properly sorted and indexed
- Verify that read groups in BAM files match those in your command
- Increase Java heap size: `--java-options "-Xmx16g"`
- Ensure reference genome is properly formatted and indexed
- Check for problematic regions that might cause errors (e.g., excessive depth)
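
A missing BAM index is one of the easier causes to rule out. The sketch below only checks that an index file exists under one of the common names; it does not verify sort order or that the index is current. `has_bam_index` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if a non-empty index sits next to the BAM under one of the usual
# names: sample.bam.bai, sample.bai, or sample.bam.csi.
has_bam_index() {
    hb_bam=$1
    for hb_idx in "$hb_bam.bai" "${hb_bam%.bam}.bai" "$hb_bam.csi"; do
        if [ -s "$hb_idx" ]; then
            echo "index found: $hb_idx"
            return 0
        fi
    done
    echo "no index found for: $hb_bam" >&2
    return 1
}

# Example call:
# has_bam_index tumor.bam || samtools index tumor.bam
```

Sort order can be read from the BAM header with `samtools view -H tumor.bam | grep '^@HD'`; a coordinate-sorted file reports `SO:coordinate`.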

### No Variants Detected

**Problem**: Empty or nearly empty VCF files after variant calling.

**Solution**:
- Verify tumor purity (low purity can affect variant detection)
- Check coverage depth in your BAM files: `samtools depth`
- Ensure tumor and normal samples are correctly specified
- Try less stringent filtering parameters
- Manually examine BAM files in IGV to verify variants are present
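
Before adjusting filters, it helps to confirm how many records each VCF actually holds at every stage (raw calls, filtered calls). A small sketch, with `count_vcf_records` as a hypothetical helper that works on plain or gzip/bgzip-compressed files:

```shell
#!/bin/sh
# Count variant records (non-header lines) in a VCF.
# Files ending in .gz are decompressed on the fly; bgzip output is
# gzip-compatible, so gzip -cd can read it.
count_vcf_records() {
    cv_file=$1
    case $cv_file in
        *.gz) gzip -cd "$cv_file" ;;
        *)    cat "$cv_file" ;;
    esac | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Example call (file names are placeholders):
# count_vcf_records raw.vcf.gz
# count_vcf_records filtered.vcf.gz
```

A raw count near zero points at the inputs or sample assignment, while a healthy raw count that collapses after filtering points at the filter settings.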

### VCF Processing Errors

**Problem**: Errors during VCF filtering or manipulation.

**Solution**:
- Check that the input VCF parses cleanly, for example `bcftools view input.vcf > /dev/null` (parse errors are reported on stderr)
- Ensure VCF is sorted correctly for the reference genome
- Normalize variants (left-align indels, split multiallelic sites) with `bcftools norm`
- For compressed VCFs, make sure the file is bgzip-compressed and indexed: `tabix -p vcf input.vcf.gz`

## Annotation Issues

### VEP Fails

**Problem**: Error during VEP annotation.

**Solution**:
- Verify VEP cache is properly installed and accessible
- Check VEP version compatibility with cache version
- Ensure you have all required plugins installed
- Try offline mode (`--offline` with a local cache) if online database access is problematic
- Check VCF format compatibility with VEP
- Increase memory allocation for VEP

### Missing or Incomplete Annotations

**Problem**: Annotations are incomplete or missing key information.

**Solution**:
- Verify you're using the `--everything` flag with VEP for comprehensive annotation
- Check if the correct plugins are specified and installed
- Ensure reference genome version matches your annotation database version
- Try updating to the latest VEP cache
- Check if variants fall in regions covered by the annotation database

## Neoantigen Prediction Issues

### pVACseq Fails

**Problem**: pVACseq errors out during prediction.

**Solution**:
- Verify VEP annotation format is correct and includes all required fields
- Check HLA format (should be HLA-A*02:01, not A*02:01 or HLA-A02:01)
- Ensure IEDB tools are properly installed and accessible
- Check Python version compatibility with pVACtools
- Verify all necessary prediction algorithms are installed
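
Malformed allele names are easy to catch before launching a long run. The sketch below validates a comma-separated list against the `HLA-A*02:01` pattern described above. It is deliberately limited to class I genes A, B, and C, which is a simplifying assumption; `check_hla_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Print every allele in a comma-separated list that does not look like
# HLA-A*02:01 (class I genes A, B, C only; a simplification).
# Returns 0 if all alleles match, 1 if any malformed allele was printed.
check_hla_list() {
    if printf '%s\n' "$1" | tr ',' '\n' |
        grep -Ev '^HLA-[ABC]\*[0-9]{2,3}:[0-9]{2,3}$'; then
        return 1   # grep -v selected (and printed) at least one bad entry
    fi
    return 0
}

# Example call:
# check_hla_list 'HLA-A*02:01,HLA-B*07:02,HLA-C*07:01'
```

Quote the allele list when passing it on the command line; an unquoted `*` may be expanded by the shell.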

### No Neoantigens Predicted

**Problem**: Pipeline completes but no neoantigens are identified.

**Solution**:
- Check if somatic variants were detected (prerequisite for neoantigens)
- Verify the binding threshold isn't too stringent (try relaxing it from 500 nM to 1000 nM)
- Ensure mutations lead to protein changes (missense, frameshift, etc.)
- Check HLA types are correct and supported by the prediction algorithms
- Look for errors in intermediate files in the pVACseq output directory

## HLA Typing Issues

### OptiType Fails

**Problem**: OptiType fails to complete HLA typing.

**Solution**:
- Check if input FASTQ files exist and have sufficient coverage
- Verify razers3 is properly installed and working
- Ensure you have the OptiType reference data available
- Try with different coverage or enumeration parameters
- Check Python version compatibility (older OptiType releases require Python 2.7)

### Inconsistent HLA Types

**Problem**: Different HLA typing tools give different results.

**Solution**:
- Some disagreement is expected, since different tools have different sensitivities
- Use high-coverage data for more accurate typing
- Consider running multiple HLA typing tools and taking a consensus
- Focus on tools with highest accuracy for your data type (WGS, WES, RNA-seq)

## Configuration and System Issues

### Path and Environment Issues

**Problem**: Tools not found despite being installed.

**Solution**:
- Check PATH environment variable: `echo $PATH`
- Ensure conda environment is activated
- Verify config.sh has correct paths to all tools and resources
- Use absolute paths instead of relative paths in commands
- Check that scripts and executables have execute permission (`ls -l script.sh`) and add it if missing: `chmod +x script.sh`

### Resource Limitations

**Problem**: Jobs fail due to memory or time limits.

**Solution**:
- For cluster jobs, increase the requested resources (PBS/Torque example: `-l mem=32gb,walltime=48:00:00`; other schedulers use different flags)
- Monitor resource usage to identify bottlenecks: `top`, `htop`, or cluster monitoring
- Split large jobs into smaller subtasks
- Consider using SSD storage for temporary files to reduce I/O bottlenecks
- Reduce thread count for memory-intensive steps

## Getting Help

If you encounter issues not covered in this guide:

1. **Check Logs**: Examine log files in the `logs/` directory for specific error messages
2. **Documentation**: Refer to the original tool documentation for specific error codes
3. **Search Online**: Many bioinformatics errors have been encountered by others
4. **GitHub Issues**: Open an issue on our GitHub repository with:
   - Description of the problem
   - Relevant log snippets
   - Your config.sh (with sensitive information removed)
   - Software versions: `conda list` output
   - System information: OS version, CPU, RAM
5. **Contact Developers**: For urgent issues, contact the pipeline maintainers directly
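
To collect log snippets for an issue report, a quick scan of the `logs/` directory can surface the relevant lines. This is a rough sketch: `scan_logs` is a hypothetical helper, and the search terms are assumptions that will not match every tool's error wording.

```shell
#!/bin/sh
# Print the last N error-like lines found in any file under a log directory.
# Output format is file:line_number:text.
scan_logs() {
    sl_dir=${1:-logs}
    sl_max=${2:-20}
    # Passing /dev/null as an extra file makes grep always print file names.
    find "$sl_dir" -type f \
        -exec grep -niE 'error|exception|killed|no such file' /dev/null {} + |
        tail -n "$sl_max"
}

# Example call: show up to 40 matching lines from logs/
# scan_logs logs 40
```

Review the output before pasting it into a public issue, since log lines can contain sample names or file paths.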