Hello,
I am using CNVkit 0.9.9 to call copy number variants across a tumor-normal cohort.
The tumor and matched normal libraries were prepared using hybrid capture. I am currently running CNVkit on 8 tumor-normal pairs and would like to scale up to ~200 tumor-normal pairs. My overall goal is to obtain a list of copy number variants at the gene level.
I have used the batch command as suggested in the documentation and have a few questions based on what I have observed so far.
I used the following commands to generate the antitarget BED file and then ran the batch command on all of the tumor and normal BAM files.
cnvkit.py antitarget my_target.bed -g data/access-5kb-mappable.hg19.bed -o my_antitargets.bed
cnvkit.py batch *tumor.bam -n *normal.bam -t my_target.bed -a my_antitargets.bed -f hg19.fa -m hybrid --segment-method cbs -d output_dir
As per the CNVkit docs, cnvkit.py call generates absolute copy numbers for the segments obtained during segmentation. To identify gene-level CNVs, would you suggest comparing the breakpoints predicted by cnvkit.py breaks with the call.cns file, or using the bintest.cns file instead? To my understanding, bintest.cns also reports log2 copy ratios, but at the level of individual bins rather than segments. Is my understanding correct? Please correct me if I am wrong. On the same note, could you please elaborate on the purpose of "bintest.cns"?
I have another question about the autobin script. To my understanding, autobin helps to identify suitable bin sizes based on the target panel (my_target.bed) used for sequencing. Running cnvkit.py autobin produced the output below, which suggests a target bin size of 45 bp. Do you know why the suggested bin size is so small? Note that the panel we used for sequencing has 793 baits.
Detected file format: bed
Detected file format: bed
Estimated read length 151.0
Wrote /tmp/tmpyo1joxc8.bed with 100 regions
Limiting est. bin size 885876 to given max. 500000
Splitting large targets
Wrote output_dir/bait.target.bed with 4285 regions
Wrote output_dir/bait.antitarget.bed with 6157 regions
            Depth      Bin size
Target:     2225.647   45
Antitarget: 0.113      500000
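For what it is worth, the reported numbers look like a fixed amount of sequence per bin divided by the average depth. The sketch below only checks that arithmetic. It assumes a budget of 100,000 bp per bin (my guess, not something shown in the log), and `BP_PER_BIN` and `estimated_bin_size` are hypothetical names of my own. The 500,000 cap is taken from the "Limiting est. bin size" line above.

```python
# Assumption: autobin targets a fixed amount of sequence per bin.
# 100,000 bp is assumed here; it is not printed anywhere in the log.
BP_PER_BIN = 100_000


def estimated_bin_size(depth, max_size=None):
    """Hypothetical helper: bin size = sequence budget / average depth."""
    raw = BP_PER_BIN / depth
    if max_size is not None:
        raw = min(raw, max_size)  # cap, as in "Limiting est. bin size ..."
    return int(round(raw))


# Depths copied from the autobin output above.
target_bin = estimated_bin_size(2225.647)
antitarget_bin = estimated_bin_size(0.113, max_size=500_000)

print("Target bin size:", target_bin)
print("Antitarget bin size:", antitarget_bin)
```

Under that assumption the target value comes out at 45 and the antitarget value hits the 500,000 cap, which matches the table. The uncapped antitarget estimate differs slightly from the 885876 in the log, presumably because the printed depth is rounded.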
I compared the autobin output against the target.bed generated by cnvkit.py batch. That BED file had bin sizes similar to those in my_target.bed, except for targets larger than 360 bp. I also compared it with the bin sizes in bintest.cns, where the bins were ~260 bp, and tiles larger than 400 bp in the panel were split into smaller bins of ~260 bp.
chromosome start end gene depth log2 weight p_bintest
chr1 2491217 2491455 TNFRSF14 875.437 -0.681199 0.977831 6.49944e-05
chr1 2492049 2492169 TNFRSF14 856.3 -0.836096 0.966052 7.53761e-05
chr1 2493065 2493299 TNFRSF14 1028.92 -0.527241 0.977826 0.00260868
chr1 2494246 2494365 TNFRSF14 1381.59 -1.18558 0.967556 3.7711e-09
chr1 27101401 27101613 ARID1A 2683.78 0.542985 0.978304 0.00166497
chr1 150551303 150551543 MCL1 3119.3 0.833722 0.97338 6.58256e-06
Hence, could you please explain how I should interpret the autobin results, and why they differ so much from the other BED files that were generated?
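In case it helps, this is how I checked the bin widths for the bintest.cns rows shown above. It is a minimal sketch: the coordinates are copied from the excerpt, and `rows` and `widths` are just illustrative names.

```python
# (chromosome, start, end, gene) copied from the bintest.cns excerpt above.
rows = [
    ("chr1", 2491217, 2491455, "TNFRSF14"),
    ("chr1", 2492049, 2492169, "TNFRSF14"),
    ("chr1", 2493065, 2493299, "TNFRSF14"),
    ("chr1", 2494246, 2494365, "TNFRSF14"),
    ("chr1", 27101401, 27101613, "ARID1A"),
    ("chr1", 150551303, 150551543, "MCL1"),
]

# Bin width is simply end - start for each row.
widths = [end - start for _, start, end, _ in rows]

for (chrom, start, end, gene), width in zip(rows, widths):
    print(f"{chrom}:{start}-{end}\t{gene}\t{width} bp")

print("min/max width:", min(widths), max(widths))
```

All of these bins are far wider than the 45 bp suggested by autobin, which is the discrepancy I am asking about.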
Thanks in advance
Regards
Lavanya