Query regarding identifying gene level CNVs, autobin and bintest

Hello,

I am using CNVkit 0.9.9 to call copy number variants across a tumor-normal cohort.
The tumor and matched normal libraries were prepared using hybrid capture. I am currently running CNVkit on 8 tumor-normal pairs and would like to scale up to ~200 pairs. My overall goal is to obtain a list of gene-level copy number variants.

I have used the `batch` command as suggested in the documentation and have a few questions based on what I have observed so far.

I used the following commands to generate the antitarget BED file and then run `batch` on all of the tumor and normal BAM files:
`cnvkit.py antitarget my_target.bed -g data/access-5kb-mappable.hg19.bed -o my_antitargets.bed`
`cnvkit.py batch *tumor.bam -n *normal.bam -t my_target.bed -a my_antitargets.bed -f hg19.fa -m hybrid --segment-method cbs -d output_dir`

As per the CNVkit docs, `cnvkit.py call` assigns absolute copy numbers to the segments obtained during segmentation. To identify gene-level CNVs, would you suggest that I compare the breakpoints predicted by `cnvkit.py breaks` with the call.cns file, or that I use the bintest.cns file instead? My understanding is that bintest.cns also reports log2 copy ratios, but at the level of individual bins rather than segments. Is that correct? Please correct me if I am wrong. On the same note, could you please elaborate on the purpose of bintest.cns?
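To make the gene-level question concrete, here is a minimal Python sketch of the kind of per-gene summary I have in mind. It groups bin-level rows by the `gene` column and reports the number of bins and the weighted mean log2 ratio per gene. The rows are copied from the bintest.cns excerpt quoted further down in this post; the function name `summarize_by_gene` and the weighted-mean choice are my own assumptions, not a description of how CNVkit aggregates bins internally.

```python
from collections import defaultdict

# (gene, log2, weight) taken from the bintest.cns excerpt in this post.
BINTEST_ROWS = [
    ("TNFRSF14", -0.681199, 0.977831),
    ("TNFRSF14", -0.836096, 0.966052),
    ("TNFRSF14", -0.527241, 0.977826),
    ("TNFRSF14", -1.18558, 0.967556),
    ("ARID1A", 0.542985, 0.978304),
    ("MCL1", 0.833722, 0.97338),
]


def summarize_by_gene(rows):
    """Return {gene: (n_bins, weighted_mean_log2)} for bin-level rows.

    Hypothetical helper for illustration only.
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # n_bins, sum(w * log2), sum(w)
    for gene, log2, weight in rows:
        entry = acc[gene]
        entry[0] += 1
        entry[1] += weight * log2
        entry[2] += weight
    return {gene: (n, wl / w) for gene, (n, wl, w) in acc.items()}


gene_summary = summarize_by_gene(BINTEST_ROWS)
for gene, (n_bins, mean_log2) in gene_summary.items():
    print(f"{gene}\t{n_bins} bins\tweighted mean log2 = {mean_log2:.3f}")
```

If bintest.cns only lists bins that pass a significance filter, a summary like this would ignore the remaining bins of each gene, which is part of why I am unsure whether it is the right input for gene-level calls.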

I have another question about `cnvkit.py autobin`. To my understanding, autobin identifies suitable bin sizes based on the target panel (my_target.bed) used for sequencing. It suggested a target bin size of 45 bp. Do you know why the suggested bin size is so small? Note that the panel we used for sequencing has 793 baits. The autobin output was:

Detected file format: bed
Detected file format: bed
Estimated read length 151.0
Wrote /tmp/tmpyo1joxc8.bed with 100 regions
Limiting est. bin size 885876 to given max. 500000
Splitting large targets
Wrote output_dir/bait.target.bed with 4285 regions
Wrote output_dir/bait.antitarget.bed with 6157 regions
                Depth       Bin size
Target:         2225.647    45
Antitarget:     0.113       500000
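One possible reading of these numbers, which I would like to have confirmed: if autobin aims for a fixed number of read bases per bin and divides that by the average depth, then a value of 100,000 bases per bin (which I believe is the default of `--bp-per-bin`, but that is an assumption on my part) reproduces both lines. The sketch below is my own arithmetic, not CNVkit code; the function name is hypothetical, the target limits of 20 and 20,000 bp are placeholders I assumed, and only the antitarget maximum of 500,000 comes from the log above.

```python
def estimate_bin_size(depth, min_size, max_size, bp_per_bin=100_000):
    """Bin size giving roughly `bp_per_bin` read bases per bin, clamped to limits.

    Assumed model: bin_size = bp_per_bin / average_depth.
    """
    raw = round(bp_per_bin / depth)
    return max(min_size, min(max_size, raw))


# Depths as printed in the autobin table (already rounded by the tool).
target_size = estimate_bin_size(2225.647, min_size=20, max_size=20_000)
antitarget_size = estimate_bin_size(0.113, min_size=500, max_size=500_000)

print("Target bin size:", target_size)
print("Antitarget bin size:", antitarget_size)
```

Because the printed antitarget depth is rounded, the unclamped estimate from this sketch differs slightly from the 885876 reported in the log, but it is clamped to the same maximum. If this reading is right, the small target bin size would simply reflect the high on-target depth rather than the size of the baits.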

I compared this against the target.bed generated by `cnvkit.py batch`. That file had bin sizes similar to those in my_target.bed, except for targets larger than 360 bp. I also compared it with the bin sizes in bintest.cns, where the bins were ~260 bp and panel tiles larger than 400 bp were split into smaller bins of ~260 bp. An excerpt from bintest.cns:

chromosome	start	end	gene	depth	log2	weight	p_bintest
chr1	2491217	2491455	TNFRSF14	875.437	-0.681199	0.977831	6.49944e-05
chr1	2492049	2492169	TNFRSF14	856.3	-0.836096	0.966052	7.53761e-05
chr1	2493065	2493299	TNFRSF14	1028.92	-0.527241	0.977826	0.00260868
chr1	2494246	2494365	TNFRSF14	1381.59	-1.18558	0.967556	3.7711e-09
chr1	27101401	27101613	ARID1A	2683.78	0.542985	0.978304	0.00166497
chr1	150551303	150551543	MCL1	3119.3	0.833722	0.97338	6.58256e-06
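The bin widths in this excerpt can be checked directly from the `start` and `end` columns. The short sketch below assumes BED-style half-open coordinates, so that width = end - start; the variable and function names are my own.

```python
# (gene, start, end) copied from the bintest.cns excerpt above.
BINS = [
    ("TNFRSF14", 2491217, 2491455),
    ("TNFRSF14", 2492049, 2492169),
    ("TNFRSF14", 2493065, 2493299),
    ("TNFRSF14", 2494246, 2494365),
    ("ARID1A", 27101401, 27101613),
    ("MCL1", 150551303, 150551543),
]


def bin_widths(bins):
    """Width of each bin in bp, assuming half-open [start, end) coordinates."""
    return [end - start for _gene, start, end in bins]


widths = bin_widths(BINS)
for (gene, start, end), width in zip(BINS, widths):
    print(f"{gene}\t{start}-{end}\t{width} bp")
print("min:", min(widths), "max:", max(widths))
```

For the six rows shown, the widths fall between 119 and 240 bp, far above the 45 bp suggested by autobin.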

Could you please explain how I should interpret the autobin results, and why they differ so much from the other BED files that were generated?

Thanks in advance
Regards
Lavanya

Query regarding identifying gene level CNVs, autobin and bintest #734

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Query regarding identifying gene level CNVs, autobin and bintest #734

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions