Release list

v0.9.14 Latest

etal released this 01 Jul 18:46

v0.9.14

888b0da

Version 0.9.14

CNVkit now has a purity subcommand for estimating tumor purity and ploidy directly. This new command's output is compatible with the existing call command where purity and ploidy are used for absolute copy number determination. You can also continue to use purity and ploidy values from other tools like PureCN, or histopathology visual estimates.
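As a rough illustration of how purity enters absolute copy number determination, the sketch below inverts a simple two-component mixture of tumor and normal cells. The function name `tumor_copies` and the diploid-normal default are assumptions made for this example; it is not necessarily the exact calculation that call performs.

```python
def tumor_copies(log2_ratio, purity, ref_copies=2, normal_copies=2):
    """Estimate tumor-cell copy number from a bin or segment log2 ratio.

    Hypothetical helper, not part of CNVkit's API. Assumes the observed
    signal is a purity-weighted mix of tumor cells and normal cells.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies per cell across the whole sample
    observed = ref_copies * 2 ** log2_ratio
    # Subtract the normal-cell contribution, then rescale by tumor fraction
    return (observed - (1 - purity) * normal_copies) / purity
```

For example, a log2 ratio of -1.0 in a pure sample corresponds to a single copy, while a log2 ratio of 0.0 stays at two copies regardless of purity.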

The non-default segmentation methods by hidden Markov model (segment -m hmm, hmm-tumor, hmm-germline) and Haar wavelet (-m haar) are both greatly improved. Simple benchmarking on the test files in this repo now shows roughly 90% concordance between the three segmentation methods.

The HMM methods are now implemented in pure Python with NumPy/SciPy, eliminating the heavy dependencies pomegranate and pytorch. The installation size and Docker image are consequently smaller, too. The somatic methods (hmm, hmm-tumor) now simultaneously estimate purity and ploidy by grid search over the marginal likelihood on autosomal arms. The emission model for variant b-allele frequency was also upgraded from a Gaussian approximation to a beta-binomial on raw allele counts, improving accuracy when jointly segmenting on coverage (BAM/CRAM) and a given VCF.
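To make the change in emission model concrete, here is a minimal sketch of a beta-binomial log-probability on raw allele counts, written with only the standard library. The parameterization by an expected BAF and a concentration value is an assumption for illustration; the release notes do not state how CNVkit parameterizes its emission distribution.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of k alt reads out of n under a beta-binomial."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


# Illustrative values: expected BAF of 0.5 with a concentration of 50
expected_baf, concentration = 0.5, 50.0
alpha = expected_baf * concentration
beta = (1 - expected_baf) * concentration

# The probabilities over all possible alt counts at depth 30 sum to 1
total = sum(exp(betabinom_logpmf(k, 30, alpha, beta)) for k in range(31))
```

Unlike a Gaussian on the allele fraction, this distribution accounts for read depth directly, so a site with 3 of 6 alt reads carries less weight than one with 50 of 100.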

The Haar method was further optimized for performance, and two crucial bugs were fixed that previously resulted in over-segmentation. This very fast segmentation method is now recommended for WGS.

This release also includes substantial overhauls of sex-chromosome inference and RNA-based copy number estimation, and a large body of numerical-robustness, packaging, data, and testing improvements.

New features

purity (new command):

New cnvkit.py purity subcommand estimates tumor purity and ploidy from a
.cnr/.cns pair, using the same grid-search likelihood model as the somatic
HMM methods. Purity and ploidy inputs are validated via argparse type
validators.

segment:

HMM segmentation rewritten in pure NumPy/SciPy, including Baum-Welch/Viterbi machinery
and the emission distributions; pomegranate and torch are no longer required. BAF
emission is now a beta-binomial on allele counts.
(#1003)
New fast Haar-based breakpoint detector unions depth and BAF breakpoints to
recover copy-neutral LOH.
BIC-based segment-merging filter for adjacent segments.
Warn when the input has no sample_id.
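The idea behind a BIC-based merge of adjacent segments can be sketched as follows. This is a generic Gaussian BIC comparison under assumed equal variance, with the hypothetical function name `should_merge`; the actual criterion and parameter counting used by CNVkit's filter may differ.

```python
from math import log


def mean(values):
    return sum(values) / len(values)


def rss(values, center):
    """Residual sum of squares around a given center."""
    return sum((v - center) ** 2 for v in values)


def should_merge(left, right, eps=1e-12):
    """Return True if one shared mean explains both segments at least as well by BIC.

    left, right: lists of bin-level log2 ratios for two adjacent segments.
    """
    both = left + right
    n = len(both)
    rss_merged = rss(both, mean(both))
    rss_split = rss(left, mean(left)) + rss(right, mean(right))
    # Gaussian BIC up to a constant: n*ln(RSS/n) + k*ln(n), k = number of means
    bic_merged = n * log(rss_merged / n + eps) + 1 * log(n)
    bic_split = n * log(rss_split / n + eps) + 2 * log(n)
    return bic_merged <= bic_split
```

Two segments with nearly equal means are merged because the extra parameter is not worth its penalty, while a clear level shift keeps the breakpoint.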

batch:

Allow a sample to serve as its own reference, and reuse shared coverage across
samples to avoid redundant computation. (#48)
Accept --sample-sex and propagate sex arguments through to call. (#500,
#635)
Expose the --bias-smoother option. (#1028)
Use autobin to choose target and antitarget bin sizes in hybrid mode. (#302)
Allow --fasta together with --reference for CRAM support. (#869)

fix:

Add an -r/--reference flag and clearer error messages when the reference
is missing or malformed. (#894)
Make the antitarget coverage optional, enabling fix for WGS samples without
an antitarget file. (#894)
Add an opt-in LOESS bias smoother as an alternative to the rolling median.
(#1028)

scatter:

Surface LOH and somatic SNV evidence as colored overlays. (#290)
Label only the genes requested with -g, not co-binned neighbors. (#458)
Floor the genome-wide y-axis so deep deletions no longer distort the plot.
(#385)
Genome-agnostic chromosome handling; warn on empty chromosome selections.

diagram:

Add a --gene filter, directional thresholds, and graceful handling of
reversed intervals. (#248)

export vcf:

Emit allele-specific copy number and BAF to represent LOH evidence. (#892)

call / export / import-theta:

Accept non-integer ploidy as input. (#953)

import-rna:

Add --normalize-method size-factors (DESeq2-style median-of-ratios with
leave-one-out).
Add --min-sample-fraction for sparse/single-cell cohorts. (#448)
New cnv_gene_info.py script builds gene-info tables for any genome; load
every gene from the bundled hg38 table.
Wire --diploid-parx-genome through and make it effective.
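For readers unfamiliar with median-of-ratios normalization, the following sketch computes per-sample size factors from a small count matrix. It shows only the standard method; the leave-one-out refinement mentioned above is omitted, and the function name and input layout are assumptions for this example rather than CNVkit's implementation.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """counts: list of per-gene rows, each a list of per-sample read counts."""
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample (log is undefined at 0)
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of each gene's geometric mean across samples
    log_geomeans = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - lg for row, lg in zip(usable, log_geomeans)]
        factors.append(exp(median(log_ratios)))
    return factors


# Second sample was sequenced about twice as deeply as the first
counts = [[10, 20], [100, 200], [0, 5], [30, 60]]
factors = size_factors(counts)
```

Dividing each sample's counts by its size factor puts the samples on a common scale before log2 ratios are computed.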

coverage:

Route bedGraph chromosome-name matching through the shared prefix detector.
Explicitly exclude duplicate reads in the bedcov coverage path. (#689)

reference:

Replace k-means/MCL clustering with k-medoids (PAM) on correlation distance;
add hierarchical clustering for batch-effect detection.

coverage / long chromosomes:

Auto-select CSI indexing for genomes with long chromosomes. (#817)

Bug fixes

Segmentation and numerical robustness:

Fix numerous NaN-propagation bugs across segmentation, weighting, and metrics.
(#436, #900, #908, #1036, #1043)
Fix segment crash on NaN log2 bins via the in-memory path. (#881)
Fix CBS segmentation crash on bins with missing chromosome/start. (#868)
Handle empty .cnr input and inputs where fewer than two bins survive
coverage filters, instead of silently producing no .cnr. (#891)
Defend the Savitzky-Golay path against non-finite log2 and lstsq
LinAlgError. (#508)
Fix the Haar segmenter's FDR calculation and make its weights NaN-safe.

Variants and VCF:

Fix variant re-segmentation crash / mis-slice on unsorted VCF. (#893, #1004)
Harden the VCF reader against missing GT, ./. no-calls, and unparseable
headers; do not count no-call . as a distinct allele.
Clamp purity-rescaled BAF to [0, 1]. (#601)
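The reason clamping is needed can be seen in a small sketch of purity rescaling. The linear mixture formula and the function name `rescale_baf` are assumptions for illustration: with noisy observations or an overestimated normal fraction, the inferred tumor BAF can fall outside the valid range.

```python
def rescale_baf(observed_baf, purity, normal_baf=0.5):
    """Infer tumor-cell BAF from a mixed sample, clamped to [0, 1]."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Remove the normal-cell contribution, then rescale by tumor fraction
    tumor_baf = (observed_baf - normal_baf * (1 - purity)) / purity
    # Without this clamp, noisy inputs can yield values below 0 or above 1
    return min(1.0, max(0.0, tumor_baf))
```

At 50% purity, an observed BAF of 0.2 would rescale to -0.1 without the clamp; with it, the result is 0.0.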

Intervals and chromosome names:

Fix start > end segments produced by squashing unsorted .cns. (#677)
Harden coverage and autobin against chromosome-name mismatches.
Fix cross-chromosome antitarget subtraction. (#471)
Normalize singleton NaN gene/accession names to - in merge/flatten/squash.

Other:

Fix multiple bugs in guess_baits.py. (#542)
Run serially when only one CPU is usable. (#1103)
Surface R subprocess stderr in call_quiet errors.

Compatibility

Python 3.10 support removed; 3.11 is now the baseline. Python 3.14 is
tested in CI.
Codebase modernized to the 3.11 baseline: PEP 604 unions, PEP 634 match/case,
PEP 618 zip(strict=True), removeprefix/removesuffix, and dict |=.
bioframe adopted for genomic interval arithmetic, replacing bespoke
implementations. (#226, #227, #982)
pysam is now a soft import, so CNVkit can run in place from source without it
installed. (#924)
Minimum biopython raised to 1.87 to address CVE-2025-68463.
Minimum dependency versions raised to align with Ubuntu 26.04 LTS (Resolute).
New skgenome.chromnames and skgenome.genomebuild modules add genome-aware
chromosome handling, including Roman-numeral chromosomes; sex-chromosome
detection is now genome-aware.

Sex-chromosome inference

Resolve sex calls by a representation-invariant ratio-of-residuals statistic
combined with an AND-gate across chrX and chrY evidence, replacing the prior
approach. (#785, #954)
Add a VCF chrX SNP-heterozygosity confirmer (binomial test) to sex inference.
(#341)
Show male chrX gains rather than muting them; reconcile target/antitarget sex
by chrX confidence. (#846, #883)
Default to female when chromosomal sex is indeterminate; stop the alarmist
warning on assemblies without sex chromosomes and report "Unknown" instead.
(#360, #669)
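One way a binomial test on chrX heterozygosity could work is sketched below. The null hypothesis (a single-X sample in which heterozygous calls arise only from errors), the 2% error rate, and the 0.01 cutoff are all assumptions chosen for illustration; the release notes do not specify the parameters CNVkit uses.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative numbers: 40 chrX SNPs outside the PARs, 12 called heterozygous,
# and an assumed 2% rate of spurious heterozygous calls in a single-X sample.
p_value = binom_upper_tail(12, 40, 0.02)

# A tiny p-value means the heterozygous calls are hard to explain with one X
single_x_plausible = p_value >= 0.01
```

Here the observed heterozygosity is far more than error alone would produce, so the variant evidence would support two copies of chrX.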

Packaging and infrastructure

HMM rewrite removes the pomegranate/torch dependency chain (see above),
greatly reducing install and image size.
Full mypy type-checking adopted: 387 errors reduced to 0 across the codebase,
and mypy added to CI (targeted at Python 3.12 for NumPy 2.5 stubs).
Linting moved to ruff 0.15 with formatting; migrated security scanning from
safety to pip-audit.
Added Hypothesis property-based tests and pytest-xdist parallel execution;
reorganized the test suite (split the monolithic command tests into focused
classes; marked slow tests). (#1038, #1042)
Docker: split a user-facing "Running CNVkit with Docker" doc from the developer
guide; tag the latest image on release tags rather than on master pushes;
parameterized the CNVkit version in the Dockerfile.
Removed obsolete in-repo Galaxy and WDL wrappers; Galaxy users are pointed to
the IUC tool suite.
Removed the defunct fused-lasso (cghFLasso) segmentation method.
Added FUNDING.yml, CONTRIBUTING.md, and a development section in the README.
CI: bumped GitHub Actions to Node 24 runtimes; trimmed the test matrix.

New contributor

@haoyu-haoyu made their first contribution in #1046.

Full Changelog: v0.9.13...v0.9.14

Contributors

haoyu-haoyu

v0.9.13

etal released this 03 Feb 01:23

v0.9.13

9447d16

Version 0.9.13

Thanks to Wei Gu Lab at Stanford for sponsoring development of this release!

A significant practical improvement to support clinical research is the bedGraph
(.bed.gz) input option to the "batch" and "coverage" commands. With no other change to
the workflow, you can now precalculate the per-base coverage profile of each BAM file,
effectively stripping PHI genomic sequence information from the raw data before you or
a collaborator run copy number analysis with CNVkit.

This approach not only reduces HIPAA/IRB/legal risk, but also greatly reduces the size
of the raw data files that need to be stored for CNV calling, and streamlines reanalysis
of samples using different bin sizes and/or excluded genomic regions.

In steps:

Scan each BAM file for per-base coverage depth with e.g. bedtools genomecov -bg or
mosdepth. Output is .bed.gz (a.k.a. bedGraph).
Use the sample .bed.gz file as input to CNVkit's batch and coverage commands, the
same as you would use BAMs. It does not meaningfully affect the rest of the CNVkit
pipeline whether BAM or .bed.gz was used as the original sample input.
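To show why per-base coverage intervals are sufficient for binned copy number analysis, the sketch below averages bedGraph depth over one bin. The function name and the in-memory tuple format are assumptions for this example; it does not reproduce CNVkit's own bedGraph reader.

```python
def mean_depth(bedgraph_rows, chrom, start, end):
    """Average per-base depth over [start, end) from bedGraph intervals.

    bedgraph_rows: iterable of (chrom, start, end, depth) tuples, 0-based
    half-open. Positions not covered by any row count as depth 0.
    """
    if end <= start:
        raise ValueError("bin must have positive length")
    total = 0.0
    for row_chrom, row_start, row_end, depth in bedgraph_rows:
        if row_chrom != chrom:
            continue
        # Number of bases this interval shares with the bin
        overlap = min(end, row_end) - max(start, row_start)
        if overlap > 0:
            total += overlap * depth
    return total / (end - start)


rows = [("chr1", 0, 100, 10.0), ("chr1", 100, 150, 30.0), ("chr2", 0, 50, 5.0)]
depth = mean_depth(rows, "chr1", 50, 150)  # 50 bases at 10x and 50 bases at 30x
```

Because the bin boundaries are applied after the fact, the same coverage file can be reused with different bin sizes or excluded regions.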

This release also includes major improvements to HMM segmentation performance,
packaging, testing, and general infrastructure, and fixes bugs in import-rna and
handling of genomic intervals.

New features

coverage:

Accept bedGraph (.bed.gz) files as input in place of BAM. This enables a
privacy-preserving workflow: extract per-base coverage from sensitive BAM
files once (e.g. with bedtools genomecov -bg), then share only the non-PHI
coverage data for downstream CNV analysis and collaboration. Format is
auto-detected from the file extension. (#984, #985)
Expose the samtools bedcov max-depth option (-d) via a new max_depth
parameter, and correctly parse the extra output column that bedcov emits
when -d is used. Default behavior is unchanged. (#973, #974; thanks
@tobias-beers)

genemetrics:

Implement the same summary statistics as segmetrics: confidence interval (ci),
prediction interval (pi), mean, median, mode, t-test (p_ttest), stdev, MAD, MSE, IQR,
midweight bivariance (bivar). These stats are useful for filtering to reduce
false-positive calls and for building ensemble callers. (#278, #987)
Enable the --smooth-bootstrap option for both genemetrics and segmetrics to give
more accurate CI estimates at genes or segments with a small number of bins (default
10 and below).
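Two of the robust statistics listed above can be sketched with the standard library. These are textbook definitions with hypothetical helper names; CNVkit's implementations may differ in details such as scaling or weighting by bin weight.

```python
from statistics import fmean, median, quantiles


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def iqr(values):
    """Interquartile range using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Bin-level log2 ratios for one gene, with a single outlier bin
log2_values = [-0.2, -0.1, 0.0, 0.1, 0.9]
gene_mean = fmean(log2_values)      # pulled upward by the outlier
gene_median = median(log2_values)   # unaffected by the outlier
```

Comparing the mean against the median, MAD, and IQR is one way to flag genes whose apparent gain or loss rests on a few noisy bins.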

segment:

HMM segmentation now uses the pomegranate 1.x API. The minimum pomegranate
version is raised to 1.0.0. (#910)

Bug fixes

batch:

Show the sample name in error messages when a sample fails, instead of
silently swallowing exceptions. Previously, errors during parallel
processing were suppressed, making failures difficult to diagnose. (#971,
#979)
Generate target_bed in output_dir when --output-dir is given.
(Thanks @pontushojer)

segment:

Handle empty .cnr input cleanly instead of crashing. (#970)

import-rna:

Fix several long-standing bugs. Gene ID mismatches between the counts file
and the gene resource are now detected and reported. NaN values from
zero-count genes are replaced with NULL_LOG_COVERAGE instead of
propagating through downstream steps. A new test suite covers these paths.
(#499, #596, #706, #940, #944, #981)

fix / CNA.by_gene:

Fix an indexing bug where iloc/loc confusion caused incorrect slicing
when bin coordinates contained duplicates. (#773, #951, #979)

sniff_region_format:

Fix the known-extensions mapping so that file format detection no longer
always mismatches. (#956; thanks @dlaehnemann)

Compatibility

Minimum Python version raised to 3.10. Python 3.14 is now tested in CI.
argparse.FileType usage removed (pending deprecation in 3.14),
itertools.pairwise and |-union type syntax adopted throughout.
NumPy 2.x compatibility. Removed np.asfarray, np.float_, and
np.string_ usage. (Thanks @mr-c and @suhas-r)
Pandas 3.0 compatibility. Eliminated chained assignment and addressed
FutureWarning messages.
Minimum dependency versions raised to match Ubuntu 25.04 Plucky. Notably:
matplotlib >= 3.9.0, pyfaidx >= 0.8.0, reportlab >= 3.6.13 (security
fix).
Python 3.8 and 3.9 support removed.

Packaging and infrastructure

Scripts shipped with CNVkit (beyond cnvkit.py) are once again installed by
pip install. Argument parsing and invocation setup standardized across
scripts. (#957; thanks @dlaehnemann)
Conda recipe updated with build.run_exports version pin per Bioconda
linting requirements. (#877, #880)
Replaced flake8 with ruff for linting; added ruff formatting.
Added type annotations across most of the codebase.
Added devcontainer configuration for local development and testing.
Added .dockerignore; parameterized CNVkit version in Dockerfile.
Sphinx/ReadTheDocs configuration updated; docstrings converted to NumPy
format throughout the core pipeline for better API docs.
CI: integration tests via test/Makefile, Codecov upload, security
scanning (safety + bandit), tox caching.

New contributors

@tobias-beers made their first contribution in #974
@dlaehnemann made their first contribution in #956
@mr-c and @suhas-r contributed NumPy 2.x compatibility fixes in #934 and #945
@pontushojer fixed batch output directory handling in #940

Full Changelog: v0.9.12...v0.9.13

Contributors

mr-c, dlaehnemann, and 3 other contributors

Version 0.9.12

etal released this 17 Nov 16:28

v0.9.12

dd834b0

This is a bugfix release that addresses installation problems, an occasional issue with segmentation, and a command option that had been non-functional.

Fixes

Re-enable the coverage -q/--min-mapq option. It had stopped working at some point due to a type coercion issue. (#912; thanks @rach-kennedy)
Prevent CBS segmentation failures due to nulls in input .cnr. It's not clear what causes nulls to appear in .cnr files, but when they do, segmentation fails; this happened silently in batch mode and could be difficult for users to triage. (#914, #436, #582, maybe #760, #896, #901 and nf-core/sarek#1625).
Raise max pomegranate dependency version from <=0.14.9 to <1.0.0 to avoid conflicts during installation. (#911, #890)

New Contributors

@rach-kennedy made their first contribution in #912

Full Changelog: v0.9.11...v0.9.12

Contributors

rach-kennedy

v0.9.11

etal released this 13 Apr 16:43

v0.9.11

450726e

Version 0.9.11

New features

Most commands include a new option, --diploid-parx-genome, to treat the
pseudoautosomal regions (PAR1/2) of human chromosome X as autosomal, i.e. diploid
regardless of sample sex. The value it takes is a human reference genome ID such as
"grch38". This feature should help reduce false calls on sex chromosomes in human
samples. (Thanks @rollf; #789)
The fix command takes a new option --smoothing-window-fraction to allow manual
tuning of the smoothing window used in GC and other automatic bias corrections.
(Thanks @kkchau; #859)
hg38 refFlat and genome accessibility data files are now included in the source tree.
(Thanks @berguner; #822, #837)

Bug fixes

The Docker image once again includes the additional scripts beyond cnvkit.py.
User-specified sample sex with -x now works properly. (Thanks @28rietd and @ccoo22;
#843, #851)
User-specified smoothing window size now applies in HMM segmentation. (Thanks
@zhuying412; #833, #835)
An error in export vcf has been fixed. (Thanks @pwwang; #818)

Other updates

Dependency versions are updated to match Ubuntu 23.04 Lunar, more or less.
Automated testing is done on Python version 3.8 through 3.12 -- these are the
"supported" versions.
Small documentation fixes.

New Contributors

@pwwang made their first contribution in #818
@berguner made their first contribution in #837
@zhuying412 made their first contribution in #835
@kkchau made their first contribution in #844
@28rietd made their first contribution in #851

Full Changelog: v0.9.10...v0.9.11

Contributors

pwwang, zhuying412, and 5 other contributors

Version 0.9.10

etal released this 24 Feb 18:40

v0.9.10

8d477b0

This long-awaited release includes major plotting enhancements in the heatmap, scatter, and diagram commands, as well as a new export gistic command, thanks to joint work by @tetedange13 and @tskir (see below).

There are also significant infrastructure improvements including bug fixes, modernized packaging, and build/test automation.

New features

diagram:

New options --no-gene-labels to not display gene labels on the plot, and -c / --chromosome to plot a single chromosome (#628, #629, #634; thanks @tetedange13)

heatmap:

New CLI options (#35, #625, #632, #652; thanks @tetedange13 and @tskir):

--vertical: Transpose the plot, displaying the genome axis vertically instead of horizontally
--delimit-samples: Add a delimiting line between each sample row (or column, with --vertical)
--title: Set the plot title

scatter:

New option --fig-size: Set the output image dimensions (#600, #641; thanks @tetedange13 and @tskir)
Show triangles at the bottom of the plot to indicate where segments are hidden below the plotted region by automatic pruning at 'ymin=-5'. Also log a warning when this happens. (#385, #643, #645; thanks @tetedange13, @tskir, and @micknudsen)

export gistic:

New export command to generate an unsegmented "markers" file for use with GISTIC. GISTIC also takes a second input file with corresponding segments in SEG format, which CNVkit can generate with export seg. (#622, #623, #776; thanks @tetedange13, @tskir, @BioComSoftware)

API and CLI changes

Running cnvkit.py without any arguments will now display the full help text instead of an error message.
Supporting scripts (aside from cnvkit.py) are no longer installed automatically. They are still available in the source tree.

Documentation

Clarified bintest usage, provided an example, and explained outputs. (#646; thanks @tetedange13 and @tskir)

Bugfixes

Fixed several errors and warnings due to outdated usage of dependencies, e.g. pandas, pysam.
Fixed the Dockerfile and Docker image to install R packages properly for CNVkit to use internally. (#765; thanks @28rietd)
Made the Makefile example/test workflow more portable across environments. (#661, #666, #695, #699; thanks @tetedange13)
batch: Apply --drop-low-coverage option in the segmetrics step. (#694)
bintest: Include 'probes' column in .cns output so that it is valid .cns (closes #693)
fix: Condense the error message when coordinate set contains duplicate values. (#637, #638; thanks @tskir)
fix: Choose a smoothing window fraction based on the data size to help correct biases better at the extremes of the GC range, where previously some residual GC bias could still be present after correction. (#379)
BED inputs: Handle UCSC BED 'browser' header line, as used in Agilent BED files with a 2-line header. (closes #696, #618)

Internal

Modernized the packaging configuration with pyproject.toml, leaving a stub setup.py for legacy setuptools compatibility. (#790)
Set up automated testing through GitHub Actions (GHA) to verify Python versions 3.7 through 3.10 using pytest and tox. The latter makes local testing with multiple Python versions more reliable, too. (#792, #793, #794)
Updated minimum dependency versions to roughly match Ubuntu 22.04 LTS packages; these are used in CI, too.
Applied black and pylint to reformat the codebase consistently and replace deprecated calls to libraries. (#795)
Remove joblib pinning (#589, #770; thanks @DavidCain and @risicle)
Remove networkx pinning (#606, #771; thanks @DavidCain)
Make the extreme-GC filters more easily configurable via params.py (#738, #752, #753, #764; thanks @tetedange13 and @tsivaarumugam)

Contributors

risicle, DavidCain, and 6 other contributors

Version 0.9.9

etal released this 01 Jun 19:43

v0.9.9

aff02f9

This release contains a new script and, more importantly, a volley of bug fixes by @tskir, a new CNVkit collaborator.

New script

genome_instability_index.py

For each given sample (.cnr or .cns, ideally .call.cns), this script reports two values, the number of non-neutral segments and the fraction of the total sequencing-accessible genome that they cover. Together, these values have been described as the Genome Instability Index (G2I) by Bonnet et al. (2012). These numbers are not difficult to calculate directly from .cns files, but they are frequently requested, so here you go.

Bug fixes by @tskir

Installation:

Set NetworkX minimum version to work with pomegranate on Python 3.9. (#614, #606; thanks @auberginekenobi)

genemetrics, diagram, scatter:

Fix an error in iterating over chromosomes during gene-wise operations or gene selection. (#580, #573, #576, #579; thanks @diushiguzhi @eriktoo @hrkemp @drmrgd @HYan-lei)

access:

Fix an error when all chromosomes listed in the exclusion BED file appear only once. (#581, #574; thanks @dajana17)

autobin:

Allow specifying explicit output filenames via -o/--output. If this option is not used, the behavior is the same as before. Some pipeline frameworks such as Snakemake require output filenames to be explicit in wrapped commands. (#608, #607; thanks @enes-ak)
Fix median-size file selection. (#613, #611; thanks @michaelsykes)

coverage:

Fix a potential crash with the -c option; generally make the -c option's results more stable. This changes the results you'd get with coverage -c compared to previous CNVkit versions, but in any case -c isn't recommended for production use, only for algorithm exploration. (#598, #593; thanks @joys8998)

genemetrics:

Rename column n_bins to probes in output, for compatibility with 'call' and 'export' commands. (#586, #585; thanks @eriktoo)

scatter:

Avoid losing short segments in rasterized PNG output, depending on DPI settings. (#615, #604; thanks @jimmy200340)
Allow NCBI-style chromosome names that contain a ".", e.g. "NC_039902.1". (#603, #602; thanks @amora197)

segment:

Fix an IndexError during smoothing when the signal is shorter than a window, e.g. on chrY where the chromosome contains few bins. (#590, #587; thanks @tetedange13)

Improvements from other contributors

scripts/guess_baits.py: Fix a copy-paste error on script launch. (#588; thanks @sssimonyang)
Documentation: Link to the Debian package alongside other packages. (#562; thanks @mr-c)

Version 0.9.8

etal released this 01 Jun 19:39

v0.9.8

b218280

Continuing a focus on stability and compatibility with other software:

Support for reading CRAM files with an optional user-provided local FASTA
file for the reference genome sequence. (#555; thanks @johnegarza)
Call Rscript subprocess with safer flags for the R environment. Previously,
--vanilla ignored R environments with the library path in a non-default
location specified in the user's .Rprofile. Now, --no-restore and
--no-environ ensure a clean environment but still respect the user's
.Rprofile settings beyond that. (#491; thanks @pablo-gar)
Compatibility with the latest release of pandas. (#502, #523)

This release also fixes some regressions reported since the release of CNVkit
0.9.7 (which introduced a number of new performance optimizations).

scatter: A bug when plotting a region of a chromosome. (#536, #457; thanks @tskir)
scatter: An IndexError when plotting entire chromosomes, e.g. chr7. (#541,
#461, #535; thanks @tskir)
fix: A bug that occurred after automatic bias corrections, introducing
NaN-valued rows in place of rejected bins, leading to a downstream crash in
CBS segmentation. (#551, #436, #547; thanks @johnegarza)

Version 0.9.7

etal released this 05 Jun 13:57

v0.9.7

7472df6

Stable release with only minor changes from the previous beta release 0.9.7.b1.

New contributions:

Cram support: Look for and use .cram + .crai alignment and index file pairs, in addition to .bam + .bai. (#495, #434; thanks @sridhar0605)
Update Docker file to use Python 3 apt packages and pip3 (#493; thanks @keiranmraine)
Documentation fix (#496; thanks @rollf)

Version v0.9.7 beta

etal released this 30 Nov 23:04

v0.9.7.b1

9486712

This release contains several major enhancements particularly relevant to germline analysis. If used in production pipelines, further evaluation and benchmarking would be wise. Highlights:

Control sample clustering: To make better use of larger reference sample pools, reference --cluster will correlate the given normal samples' bin-wise coverage depths to extract clusters to be used as reference profiles. The reference .cnn file produced this way will then contain the log2 and spread summary statistics for each cluster, in addition to the global summary stats. Given this "clustered reference" profile, fix --cluster will then correlate each test sample to each clustered log2 profile in the reference to choose the most relevant control pool for normalization. The batch option --cluster will perform both these steps. Nod to Gambin lab and the authors of ExomeDepth, CoNVaDING, CLAMMS, and others for inspiration. (#308)

Calculation of bin weights has changed. This will change your segmentation results, hopefully for the better. Details below. (#429)

The batch pipeline now performs some segmentation post-processing automatically: calculating and filtering segmentation calls by 50% confidence intervals of the segment mean log2 ratios, in order to reduce false positives, followed by separate bin-level testing to detect small (e.g. exon-size) CNVs that were not caught by segmentation. The bin- and segment-level results are returned as separate .cns files; deciding whether and how to combine or use these results together is left as an exercise for the user.
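The confidence-interval filter can be pictured with a small sketch: a segment whose interval spans zero is not distinguishable from neutral. The dictionary keys and the choice to reset such segments to a log2 of 0.0 are assumptions for illustration; the release notes do not describe exactly how batch handles filtered segments.

```python
def filter_by_ci(segments):
    """Mark segments whose confidence interval spans zero as neutral.

    segments: list of dicts with 'log2', 'ci_lo', and 'ci_hi' keys
    (hypothetical field names for this example).
    """
    filtered = []
    for seg in segments:
        spans_zero = seg["ci_lo"] <= 0.0 <= seg["ci_hi"]
        # Copy the segment so the caller's input is left unchanged
        filtered.append({**seg, "log2": 0.0 if spans_zero else seg["log2"]})
    return filtered


segments = [
    {"log2": 0.15, "ci_lo": -0.05, "ci_hi": 0.30},   # interval includes zero
    {"log2": -0.80, "ci_lo": -0.95, "ci_hi": -0.60},  # confidently a loss
]
result = filter_by_ci(segments)
```

A 50% interval is deliberately narrow, so this filter removes only the weakest calls while keeping most real events.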

We've dropped Python 2.7 support. Python version 3.5 or later is now required.

This is a beta release. Please let me know how it works for you via the Issues page. If this release contains any issues that are blocking your work, try installing one of the previous stable versions 0.9.6 or 0.9.5:

conda install cnvkit=0.9.6

Dependencies

Remove all Python 2.7 compatibility shims.
Raise minimum pandas version from 0.20.1 to 0.23.3.
Add scikit-learn (dependency of pomegranate, for HMM segmentation). Remove the older hmmlearn implementation.

Commands

batch:

Post-process segments with segmetrics (50% CI), call (filter by CI, but don't call integer copy number), and bintest.
Return bintest result as a separate, independent .cns output.
Add option '--segment-method', equivalent to segment -m.
Rename option '--method' to '--seq-method' (but '--method' still accepted for now).
Add option --cluster, passed to reference and fix if given. (#308)

bintest:

New command superseding cnv_ztest.py script.
Report p-value as a column p_bintest (previously ztest) in the .cns output.
Fix probabilities for positive log2 values, i.e. gains, which previously always had p-value = 1.0. (#429)

fix:

Change calculation of bin weights to be more consistent with 1-var meaning, with more emphasis on reference spread. It is now simpler, more consistent with import-rna, and particularly improves the accuracy of bintest. (#429)
Squeeze the range of reference-free weights
Drop bins with gc outside [.3, .7]. CLAMMS paper shows these bins carry no useful signal.
With --cluster and a clustered reference input, calculate the test sample's Pearson correlation versus each cluster's log2, and take the best one for normalization.
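Selecting a control pool by correlation can be sketched as follows. The function names and the dictionary of cluster profiles are assumptions for this example, and the Pearson correlation is written out by hand to keep the block self-contained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_cluster(sample_log2, cluster_profiles):
    """Return the name of the cluster profile most correlated with the sample."""
    return max(
        cluster_profiles,
        key=lambda name: pearson(sample_log2, cluster_profiles[name]),
    )


sample = [0.1, -0.2, 0.3, 0.0]
clusters = {
    "cluster_a": [0.2, -0.4, 0.6, 0.0],   # same shape as the sample
    "cluster_b": [-0.1, 0.2, -0.3, 0.0],  # opposite shape
}
chosen = best_cluster(sample, clusters)
```

The sample is then normalized against the chosen cluster's log2 profile rather than the global one.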

reference:

With --cluster, do k-means clustering of the sample bin-level read depth correlation matrix, per Kusmirek et al. 2018. Parameter k defaults to the cube root of the number of samples. Only clusters of at least 4 samples are kept for emitting summary statistics in the reference profile.

segment:

hmm: Fix pomegranate-based implementation. Use iterative Savitzky-Golay smoothing with a narrow bandwidth.
Use HMM for post-TCN segmentation on VCF allele freqs
Add parameter for smoothing before CBS (thanks @EwaMarek)

segmetrics:

Add 'ttest' option for 1-sample t-test p-value.
Implement & expose --smooth-bootstrap option. For smoothing, KDE bandwidth is based on each bin's weight as a proxy for the SD of its log2 ratio values. To reduce the risk of over-smoothing on larger sample sizes, we use a loose interpretation of Silverman's Rule to reduce the bandwidth as the number of bins in a segment increases (k^(-1/4)).
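A smoothed bootstrap along these lines might look like the sketch below. The per-bin SD values are passed in directly as a stand-in for the weight-derived proxy, the mean is used as the summary statistic, and all names are hypothetical; this illustrates the technique rather than CNVkit's implementation.

```python
import random
from statistics import fmean


def bandwidth(sd, k):
    """Shrink the kernel width as the number of bins k grows (k^(-1/4))."""
    return sd * k ** -0.25


def smoothed_bootstrap_ci(values, sds, alpha=0.5, n_boot=1000, seed=0):
    """Bootstrap CI of the mean, jittering each resampled value with Gaussian noise."""
    rng = random.Random(seed)
    k = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample bin indices with replacement
        picks = [rng.randrange(k) for _ in range(k)]
        # Add kernel noise so small segments do not yield degenerate intervals
        sample = [rng.gauss(values[i], bandwidth(sds[i], k)) for i in picks]
        estimates.append(fmean(sample))
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# A four-bin segment: too few bins for a plain bootstrap to be informative
ci_lo, ci_hi = smoothed_bootstrap_ci([0.5, 0.6, 0.4, 0.55], [0.1, 0.1, 0.2, 0.1])
```

With only a handful of bins, a plain bootstrap can draw from very few distinct values; the added noise spreads the resampled estimates over a continuous range.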

API

do_heatmap: Add 'ax' parameter (thanks @fbrundu)
CNA.residuals(): speed; keep index intact in returned pd.Series
smoothing: Linearly roll-off weights in mirrored wings. Affects CNA.smoothed() / savgol, but not rolling median bias correction.
Rename CNA.smoothed() to CNA.smooth_log2(), since it returns the smoothed log2 values, not a new/altered CNA.

Bug fixes

batch: Fix argparse formatting issue (#466)
import-rna: Fix a regression in reading 2-column per-gene counts (-f counts).
reference: Fix sex inference/usage when creating haploid-x reference (#459; thanks @duartemolha)
scatter: Use a safe matplotlib backend on OS X to avoid crash
VariantArray: Fix/streamline indexing of variants by bin/segment

Version 0.9.6

etal released this 21 Mar 21:10

v0.9.6

1c8d69d

Essential maintenance and bug fixes, for the most part. Some key dependencies have changed, though this should be generally painless for you, and one or two regressions introduced by recent optimizations have been fixed.

This will be the last CNVkit version to run on Python 2.7. The next major release of pandas (0.25.0) will remove support for Python 2.7, and once that happens it will become increasingly difficult to install future versions of CNVkit on Python 2.7 -- so we're not going to try.

The segmentation method flasso depends on the R package cghFLasso, which is unmaintained and has been removed from CRAN. For now, segment -m flasso is still supported if you already have cghFLasso installed. But given the above, flasso will be removed from the next CNVkit version in favor of the HMM-based methods.

Dependencies

Raised minimum pandas version from 0.18.1 to 0.20.1, and support up to 0.24.2, resolving some warnings and an error in pandas 0.22+. (#413; thanks @chapmanb)
The soft dependency on hmmlearn is replaced with an explicit dependency on pomegranate for the HMM-based segmentation methods. This dependency will now be pulled in automatically when installing via pip or conda.
The R package cghFLasso has been removed from CRAN, and therefore is no longer a dependency of CNVkit and will not be installed automatically through the standard conda installation method. (#419)

Commands

antitarget:

Be more specific in removing noncanonical chromosomes (e.g. alternate contigs, mitochondria) from the binned regions. This avoids skipping chromosomes of interest in some non-human genomes with non-numeric contig names, like yeast. (#388; credit for regexes to @brentp)

coverage:

With --count-reads, use query aligned length to handle soft-clipped reads properly. Now the results with and without this option should be similar. (#411; thanks @desnar)

segment:

For -m flasso, partition array by chromosome to avoid edge effects. (#409, #412; thanks @giladmishne)
Removed the deprecated option --rlibpath; use --rscript-path instead.
HMM implementations have changed, and results may be different now. Note that the HMM methods are still provisional. A stable, supported version of these methods will be provided in the next CNVkit release.

Python API

do_scatter now returns a figure (#408; thanks @jeremy9959)

Bug fixes

scatter: Whole chromosomes can once again be specified with -c. (In the previous release, a chromosome without coordinates would cause an IndexError.) (#393)
import-rna: Option --max-log2 can now be specified by users. (Previously, only the default value of +3.0 worked.)
VCF I/O (skgenome.tabio): Support GATK 4's VCF files that contain records with empty ALT alleles, substituting zero if ALT AD is missing. (#391; thanks @chapmanb)
Due to a certain versioning-dependent interaction between numpy, pandas, cython, and conda (details here), CNVkit may have printed spurious RuntimeWarning messages which could be safely ignored. The current release attempts to silence these messages if they occur. (#390).

Releases: etal/cnvkit