ABC

Allele-specific Binding from ChIP-Seq

Description

ABC identifies potential allele-specific binding events at known heterozygous positions within the aligned reads of a ChIP-Seq experiment.

ABC requires at a minimum two (2) files:

A sorted BAM/SAM file of a ChIP-Seq experiment
A file containing the position, strand and allele information of heterozygous single nucleotide variants (SNVs) (SNPs and/or mutations)

ABC calls allele-specific binding by identifying a bias in the distribution of the SNV alleles while attempting to control for potential false positives. If you have genomic sequence data you can use the allele ratio in the DNA as the expected frequency to control for chromosome copy number.

Special notes before using ABC

Care should be taken in the selection of SNVs. Homozygous SNVs or SNVs within duplications will appear to have strong allele-specific effects. It is recommended that you filter SNPs prior to running ABC (i.e., those mapping to a motif). In addition, ABC does not filter duplicated reads, therefore the user may wish to remove duplicated reads prior to running ABC.

Installation

Software environment

Perl and R should be installed on the system; instructions regarding their installation can be found from http://www.perl.org/get.html and http://cran.r-project.org, respectively.

Additionally the perl module Statistics::R should be installed using the following command:

cpan Statistics::R

Further information on how to install Perl modules can be found here: http://www.cpan.org/modules/INSTALL.html.

Download ABC

ABC can be downloaded from this repo, using the git clone command.

Usage

perl ABC.pl --align-file <input.sam> --snv-file <snv filename> --out <output filename>

Parameters

Parameter	Description
`--align-file`	Specify the ChIP-Seq BAM/SAM file. Note: the BAM/SAM file must be sorted. If a BAM file is supplied ABC will create a temporary SAM file.
`--snv-file`	A tab-delimited text file containing the list of heterozygous SNPs. Important: the CHR column should match the build used for the alignments (ie. chr# for hg19 or just the # for b37). It is the responsibility of the user to verify the quality of the SNVs (i.e., do they pass Hardy-Weinberg equilibrium, etc...).

The format of the SNV file is as follows (Do not include the header):

SNV_ID	CHR	POSITION	STRAND REF_ALLELE	OBSERVED_ALLELES ALLELE_RATIO_DNA
rs111	chr1	1111	+	A	A/C	0.5
SNV1	chr10	10234	-	C	C/G	0.4

Options

Option	Description
`--bg`	Specify a `.bedgraph` file capturing the ChIP-Seq signal (shifted read pileups) of the BAM/SAM file. This can be useful to prioritize SNPs with low coverage, since they may fall within centre of the +ve and -ve strand peaks. This phenomenon is caused by short reads and is not necessary for longer reads.
`--out`	Specify the output file prefix (default `ABC`; i.e., ABC will create two output files ABC.dist and ABC.align)
`--min-reads`	The minimum number of reads covering the a SNVs (default: 25; ie. ABC will report only those SNPs/mutations with # reads overlapping them).
`--d`	Divide chromosomes into segments until reaching d lines for faster retrieval (default: 2000). A large BAM/SAM file can take a long time to process. You may try to increase the value of `d`. However, very large numbers will not necessarily increase speed. Consider splitting the alignment file by chromosome.
`--mw-thres`	P-value threshold for the Mann-Whitney U test used to test a bias in the read position between the SNV alleles. (default: 0.05). To report all SNVs set this parameter to 0.
`--f-thres`	P-value threshold for the Fisher's exact test used to test for a bias between the strand distribution of the SNV alleles. (default: 0.05). To report all SNVs set this parameter to 0.
`--min-mapq`	Set the minimum allowed MAPQ score for each read (default: 0).
`--verbose`	Print progress and SNV results summary to screen.
`--help`	Prints command line options.

Visualizing SNV Results

Create a figure of the distribution of reads containing a SNV of interest

This step requires that the ABC has finished running. Once ABC is finished a figure can be generated by specifying the output file prefix used in the initial run and the SNV ID as follows:

Usage: perl ABC.pl --out --visualize-snv

Output

A table of the results can be found in the output file with the .dist extension.

Column Name	Description
`SNV`	SNV Identifier
`CHR`	Chromosome
`BP`	Position on chromosome
`REF`	Reference allele
`OBS`	Observed alleles
`A1`	Allele 1
`N_A1`	Number of reads containing `A1`
`F_A1`	Frequency of `A1`
`N_A1_POS`	Number of reads aligned to the +ve strand containing `A1`
`N_A1_NEG`	Number of reads aligned to the -ve strand containing `A1`
`A2`	Allele 2
`N_A2`	Number of reads containing `A2`
`F_A2`	Frequency of `A2`
`N_A2_POS`	Number of reads aligned to the +ve strand containing `A2`
`N_A2_NEG`	Number of reads aligned to the -ve strand containing `A2`
`N_TOTAL`	Total number of reads overlapping a SNV, including: `N_ERRORS`, `N_OMITTED`, `MISSING_N`
`N_ERRORS`	Number of reads containing a nucleotide other than `A1` or `A2`
`N_OMITTED`	Number of reads where the SNV falls within a clipped or deleted region
`MISSING_N`	Number of reads containing an ambigous nucleotide `N`
`MAX`	The maximum value of from the `.bedgraph` file (if specified) within the 2x read length window centered on the SNV
`P_BINOM`	P-value used to call allele-specific binding
`P_MANN_WHIT`	P-value used to call a bias in read position between alleles
`P_FISHER`	P-value used to call a bias in the strand distribution between alleles
`P_CHISQ`	Same as `P_FISHER`. Greater number of reads required
`P_STRAND`	A binomial p-value of the differences in strand abundance
`A1_Position`	Position of A1 within reads
`A1_Strand`	Strand of reads containing A1
`A2_Position`	Position of A2 within reads
`A2_strand`	Strand of reads containing A2

The alignments separated by the alleles of each SNV can be viewed in the output file with the .align extension.

Examples

To test ABC, users should download the test SAM file (766MB compressed using gzip, MD5sum bbbac4164553231c77622e5bd8b40fa0). The file is a sorted SAM file of the aligned ChIP-Seq reads for the FOXA1 in MCF7 cells.

Decompress ERR022033.sorted.sam.gz using gunzip.
Download FOXA1 ChIP-Seq data, GSM631471.
Convert ERR022033.sra to a .fastq file with the sratoolkit using fastq-dump ERR022033.sra.
Align the resulting ERR022033.fastq file to the reference genome using Bowtie2 (Langmead and Salzberg, 2012) and converted to a .bam file with samtools (Li, et al., 2009).

bowtie2 –x /path to index/hg19 –U ERR022033.fastq | samtools view –bS - > ERR022033.bam

Sort the .bam file with samtools and converted back to a .sam file.

samtools sort ERR022033.bam ERR022033.sorted
samtools view -h -o ERR022033.sorted.sam ERR022033.sorted.bam

Create a tab-delimited text file containing the SNV information. In this case study, we have created ABC_SNPs.txt, available in this repo, which contains the following line once decompressed using gunzip.

rs4784227	chr16	52599188	+	C	C/T	0.5

Run ABC using the following command in the terminal.

perl ABC.pl --snv-file ABC_SNPs.txt -–align-file ERR022033.sorted.sam --out ABC_SNPs

Visualize the results (limited to one SNP at a time) with the following command.

perl ABC.pl --out ABC_SNPs --visualize-snv rs4784227

ABC will output a .pdf file (rs4784227.ABC.SNPs.align.pdf) of the read distributions for the specified SNV

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
examples		examples
ABC.pl		ABC.pl
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ABC

Description

Special notes before using ABC

Installation

Software environment

Download ABC

Usage

Parameters

Options

Visualizing SNV Results

Create a figure of the distribution of reads containing a SNV of interest

Output

Examples

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ABC

Description

Special notes before using ABC

Installation

Software environment

Download ABC

Usage

Parameters

Options

Visualizing SNV Results

Create a figure of the distribution of reads containing a SNV of interest

Output

Examples

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages