Neoantigen_detection_pipeline/docs/troubleshooting.md at master · iichelhadi/Neoantigen_detection_pipeline

# Troubleshooting Guide

This document provides solutions for common issues encountered when running the neoantigen pipeline.

## Installation Issues

### Conda Environment Creation Fails

**Problem**: Error when creating conda environments.

**Solution**:
- Ensure you have a stable internet connection
- Update conda: `conda update -n base conda`
- Check for conflicting channels in your `.condarc` file
- Try installing dependencies one by one to identify problematic packages
- Use `conda clean --all` to clear package caches before retrying

### Missing Dependencies After Installation

**Problem**: The dependency check script reports missing tools despite installation.

**Solution**:
- Ensure you've activated the right conda environment: `conda activate environment_name`
- Check if the tool is in your PATH: `which tool_name`
- Verify paths in config.sh match your actual installation locations
- Some tools may require manual installation steps after conda installation
- Reinstall problematic tools manually outside of conda if necessary

## Data Preprocessing Issues

### TrimGalore or FastQC Fails

**Problem**: Error during trimming or quality control.

**Solution**:
- Verify TrimGalore/FastQC installation: `which trim_galore`, `which fastqc`
- Check input FASTQ file integrity: `gzip -t your_file.fastq.gz`
- Ensure you have sufficient disk space for output files
- Check if input file paths in your commands are correct
- Try running with verbose flags for more detailed error messages

### Out of Memory During Preprocessing

**Problem**: Process killed due to insufficient memory.

**Solution**:
- Reduce number of threads used for processing
- Process samples one at a time rather than in parallel
- Use a machine with more RAM or configure your cluster job to request more memory
- Split large FASTQ files into smaller chunks and process separately

## Alignment Issues

### BWA Alignment Errors

**Problem**: BWA fails with error messages.

**Solution**:
- Verify reference genome path and index files exist: `ls -la /path/to/reference.*`
- Ensure the index files match your reference genome version
- Check if the reference genome was properly indexed: `bwa index reference.fa`
- Examine error messages carefully for clues (e.g., specific read errors)

### Low Alignment Rate

**Problem**: Very few reads aligning to the reference.

**Solution**:
- Check that you're using the correct reference genome version
- Verify sample species matches reference genome species
- Run FastQC to inspect read quality and contamination
- Look for adapter contamination or poor quality bases
- Consider using more sensitive alignment parameters

### Samtools/Picard Errors

**Problem**: Errors during SAM/BAM processing.

**Solution**:
- Verify input BAM/SAM files exist and aren't corrupted
- Check if you have the latest samtools/picard versions
- Ensure you have write permissions to output directories
- Try with increased memory allocation for Picard: `java -Xmx8g -jar picard.jar`
- Check for unsorted input when tools expect sorted files

## Variant Calling Issues

### Mutect2 Fails

**Problem**: GATK Mutect2 exits with errors.

**Solution**:
- Check if input BAM files are properly sorted and indexed
- Verify that read groups in BAM files match those in your command
- Increase Java heap size: `--java-options "-Xmx16g"`
- Ensure reference genome is properly formatted and indexed
- Check for problematic regions that might cause errors (e.g., excessive depth)

### No Variants Detected

**Problem**: Empty or nearly empty VCF files after variant calling.

**Solution**:
- Verify tumor purity (low purity can affect variant detection)
- Check coverage depth in your BAM files: `samtools depth`
- Ensure tumor and normal samples are correctly specified
- Try less stringent filtering parameters
- Manually examine BAM files in IGV to verify variants are present

### VCF Processing Errors

**Problem**: Errors during VCF filtering or manipulation.

**Solution**:
- Check if input VCF is properly formatted: `bcftools check input.vcf`
- Ensure VCF is sorted correctly for the reference genome
- Try bcftools to normalize variants: `bcftools norm`
- If compressed, ensure index files exist: `tabix -p vcf input.vcf.gz`

## Annotation Issues

### VEP Fails

**Problem**: Error during VEP annotation.

**Solution**:
- Verify VEP cache is properly installed and accessible
- Check VEP version compatibility with cache version
- Ensure you have all required plugins installed
- Try offline mode if online database access is problematic
- Check VCF format compatibility with VEP
- Increase memory allocation for VEP

### Missing or Incomplete Annotations

**Problem**: Annotations are incomplete or missing key information.

**Solution**:
- Verify you're using the `--everything` flag with VEP for comprehensive annotation
- Check if the correct plugins are specified and installed
- Ensure reference genome version matches your annotation database version
- Try updating to the latest VEP cache
- Check if variants fall in regions covered by the annotation database

## Neoantigen Prediction Issues

### pVACseq Fails

**Problem**: pVACseq errors out during prediction.

**Solution**:
- Verify VEP annotation format is correct and includes all required fields
- Check HLA format (should be HLA-A*02:01, not A*02:01 or HLA-A02:01)
- Ensure IEDB tools are properly installed and accessible
- Check Python version compatibility with pVACtools
- Verify all necessary prediction algorithms are installed

### No Neoantigens Predicted

**Problem**: Pipeline completes but no neoantigens are identified.

**Solution**:
- Check if somatic variants were detected (prerequisite for neoantigens)
- Verify binding threshold isn't too stringent (try increasing from 500nM to 1000nM)
- Ensure mutations lead to protein changes (missense, frameshift, etc.)
- Check HLA types are correct and supported by the prediction algorithms
- Look for errors in intermediate files in the pVACseq output directory

## HLA Typing Issues

### OptiType Fails

**Problem**: OptiType fails to complete HLA typing.

**Solution**:
- Check if input FASTQ files exist and have sufficient coverage
- Verify razers3 is properly installed and working
- Ensure you have the OptiType reference data available
- Try with different coverage or enumeration parameters
- Check Python version compatibility (OptiType typically requires Python 2.7)

### Inconsistent HLA Types

**Problem**: Different HLA typing tools give different results.

**Solution**:
- This is normal - different tools have different sensitivities
- Use high-coverage data for more accurate typing
- Consider running multiple HLA typing tools and taking a consensus
- Focus on tools with highest accuracy for your data type (WGS, WES, RNA-seq)

## Configuration and System Issues

### Path and Environment Issues

**Problem**: Tools not found despite being installed.

**Solution**:
- Check PATH environment variable: `echo $PATH`
- Ensure conda environment is activated
- Verify config.sh has correct paths to all tools and resources
- Use absolute paths instead of relative paths in commands
- Check file permissions for executables: `chmod +x script.sh`

### Resource Limitations

**Problem**: Jobs fail due to memory or time limits.

**Solution**:
- For cluster jobs, increase requested resources: `-l mem=32gb,walltime=48:00:00`
- Monitor resource usage to identify bottlenecks: `top`, `htop`, or cluster monitoring
- Split large jobs into smaller subtasks
- Consider using SSD storage for temporary files to reduce I/O bottlenecks
- Reduce thread count for memory-intensive steps

## Getting Help

If you encounter issues not covered in this guide:

1. **Check Logs**: Examine log files in the `logs/` directory for specific error messages
2. **Documentation**: Refer to the original tool documentation for specific error codes
3. **Search Online**: Many bioinformatics errors have been encountered by others
4. **GitHub Issues**: Open an issue on our GitHub repository with:
   - Description of the problem
   - Relevant log snippets
   - Your config.sh (with sensitive information removed)
   - Software versions: `conda list` output
   - System information: OS version, CPU, RAM
5. **Contact Developers**: For urgent issues, contact the pipeline maintainers directly
Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

troubleshooting.md

Latest commit

History

troubleshooting.md

File metadata and controls