Releases: ml-struct-bio/cryodrgn
v3.5.2: Faster backprojection and downsampling
In this patch release we have introduced batched data processing to voxel-based backprojection to improve runtimes up to 10x, especially on larger datasets, as well as adding I/O threading to downsampling for up to 2x faster runtimes. We have also fixed a small bug in backprojection to make calculation of backprojected volume voxel values more efficient.
The default batch sizes used in our primary reconstruction commands have been doubled from 8 to 16, as larger batch sizes generally result in improved runtimes here as well, and current GPU architectures are generally capable of handling the increased memory requirement. Consider using even larger batch sizes if possible on your own hardware!
v3.5.1: --force-ntilts; fixing tilt-series indexing in interactive filtering; class label scatterplots
This is a patch release meant to address some issues found in #450, as well as to add miscellaneous new functionalities:
- added --force-ntilts to cryodrgn backproject_voxel to only include particles with at least x available tilts, and to save the corresponding filtering .pkl indices to the output folder (#402)
- tilt-series with an uneven number of tilts across particles triggered an error when used as inputs to the interactive filtering command cryodrgn filter; this has been fixed and tilt-series dataset loading in filtering has been made more robust
- not all landscape analysis outputs (especially the numberings assigned to recovered states) were 1-indexed to match the new indexing switch from 0-indexing; we have also updated the labels in plots such as outdir/analyze.<epoch>/kmeans20/umap_hex.png to be 1-indexed to match the volume file labels
- added scatterplots to cryodrgn_utils plot_classes, which are now the default set of plots produced; older KDE density plots are available through plot_classes ... --plot-types kde (or --plot-types kde scatter to produce both)
- field names given to data block columns in .star files output by commands such as downsample now include RELION-style numberings:
      loop_
      _rlnMagnification
      _rlnDefocusU
      _rlnDefocusV
      _rlnDefocusAngle
      _rlnImageName
  becomes
      loop_
      _rlnMagnification #1
      _rlnDefocusU #2
      _rlnDefocusV #3
      _rlnDefocusAngle #4
      _rlnImageName #5
- improved logging in landscape analysis tools
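The RELION-style column numbering described in the list above can be illustrated with a minimal sketch. The helper name numbered_star_header is hypothetical and is not part of cryoDRGN; it only shows how each field name in a loop_ block receives a 1-based "#i" suffix.

```python
def numbered_star_header(columns):
    """Build the header lines of a .star data block with RELION-style numbering.

    Hypothetical helper for illustration only; not cryoDRGN's own code.
    """
    lines = ["loop_"]
    # RELION numbers the columns starting from 1, in the order they appear
    for i, name in enumerate(columns, start=1):
        lines.append(f"{name} #{i}")
    return lines


header = numbered_star_header(
    [
        "_rlnMagnification",
        "_rlnDefocusU",
        "_rlnDefocusV",
        "_rlnDefocusAngle",
        "_rlnImageName",
    ]
)
print("\n".join(header))
```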
Let us know if you have any questions or suggestions!
v3.5.0: 1-indexing, autodecoders, new landscape analyses, unified parse_star, automatic analyses
This release introduces a number of new features and tools into cryoDRGN:
1-indexing of output epochs and volumes
CryoDRGN outputs were previously 0-indexed, meaning that the first output epoch was labelled 0 (e.g. outs/weights.0.pkl) and the final epoch was labelled num_epochs-1. Many users found this unintuitive (not least the developers, but see also #151), and having the final epoch be labelled e.g. 49 when using 50 epochs was often confusing for new users. Output epochs are now 1-indexed, so that the final epoch is labelled num_epochs.
This also applies to volumes generated by tools such as cryodrgn analyze, which were previously analyze.11/kmeans20/vol_000.mrc, vol_001.mrc, ..., vol_019.mrc, but are now labelled vol_001.mrc, vol_002.mrc, ..., vol_020.mrc. We also updated the use of beta_schedule for the new epoch numbering scheme.
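As a rough illustration of the relabelling, the sketch below maps the old 0-indexed k-means volume labels onto the new 1-indexed file names. The function one_indexed_volume_name is a hypothetical helper written for this example, not a cryoDRGN utility.

```python
def one_indexed_volume_name(zero_indexed_label: int, width: int = 3) -> str:
    """Return the new-style file name for a volume that used to carry a 0-indexed label.

    Hypothetical helper for illustration only.
    """
    return f"vol_{zero_indexed_label + 1:0{width}d}.mrc"


# Twenty k-means volumes: old labels run 000..019, new labels run 001..020
old_names = [f"vol_{i:03d}.mrc" for i in range(20)]
new_names = [one_indexed_volume_name(i) for i in range(20)]
print(old_names[0], "->", new_names[0])
print(old_names[-1], "->", new_names[-1])
```

Scripts that parse output file names from earlier versions would need the same shift of one when run against v3.5.0 outputs.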
Landscape analysis
We have enhanced our landscape analysis tool analyze_landscape_full using Leiden clustering techniques developed in the work of @koheisanno.
Reconstruction using an autodecoder (beta)
We have added a new command cryodrgn train_dec, currently in beta development, for training a volume reconstruction autodecoder model in which z-latent-space embeddings are generated through a separate Adam optimizer. This omits the need for an encoder and greatly improves runtimes over cryodrgn train_vae!
Parsing RELION5 tomo files into 2D format
cryoDRGN now includes a new command cryodrgn_utils parse_relion for turning 3D particle stacks output by RELION v5 into the 2D format supported as cryoDRGN input:
cryodrgn_utils parse_relion -t tomograms.star -p particles.star --tilt-dim 4096 4096 -o particles_2d_noctf.star
Unified parse_star
@ryanfeathers contributed cryodrgn parse_star, which merges the functionalities of cryodrgn parse_star_ctf and cryodrgn parse_star_poses and has now been added to the list of commands.
Running analyze automatically at the end of training
The reconstruction commands cryodrgn train_vae and cryodrgn abinit_het now automatically also run cryodrgn analyze on the final epoch once training has been completed. This behavior can be deactivated using the --no-analysis flag to each of these commands!
Other improvements and bug fixes
- support for low-pass filtering and cropping of volumes in evaluation for more manageable runtimes and memory usage
- fixing np.round bug in .star -> .star downsampling
- fixing saving of inverse selection for interactive filtering with ET dataset outputs (#444)
- simplifying user interface for volume/image generation utilities eval_vol and eval_images
- simplifying how output is saved for graph_traversal
- deprecating support for .pkl configuration files from previous cryoDRGN versions as well as the view_config utility command
v3.4.4: Support for Python v3.12, fixing batch iteration, analyzing convergence
This is a patch release to fix some issues discovered in batch iteration, such as that in some cases our reconstruction engines would run an extra batch in model pretraining due to while global_it < args.pretrain: instead of while global_it <= args.pretrain:. We have also fixed package dependency issues with the interactive filtering widget used in our analysis Jupyter notebook (#430, #431).
In addition to this, v3.4.4 includes some minor new features. We are introducing the --shuffle-seed argument to reconstruction commands such as train_vae and abinit_het to allow for more fine-grained control of the random number generator used by the particle batch sampler as opposed to the generator used by packages such as numpy and PyTorch for statistical operations. This allows for better reproducibility with past versions of cryoDRGN in which the statistical generator — but not the batch generator — could be controlled by the user. We have also now added support for Python v3.12 (and are no longer supporting v3.9, though it should still work for now in most cases) by fixing incompatibilities in our package with newer versions of packages such as matplotlib necessary for v3.12 (see e.g. updated use of cbar.draw_all()).
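The idea of separating the batch-sampling generator from the statistical generator can be sketched with NumPy alone. This is a minimal illustration under assumed names (make_generators, epoch_batches) and is not how cryoDRGN wires --shuffle-seed internally.

```python
import numpy as np


def make_generators(seed: int, shuffle_seed: int):
    """Create two independent generators (hypothetical helper).

    stats_rng stands in for the generator used for statistical operations,
    shuffle_rng for the one that decides the order of particle batches.
    """
    stats_rng = np.random.default_rng(seed)
    shuffle_rng = np.random.default_rng(shuffle_seed)
    return stats_rng, shuffle_rng


def epoch_batches(n_particles: int, batch_size: int, shuffle_rng):
    """Shuffle particle indices and split them into consecutive batches."""
    order = shuffle_rng.permutation(n_particles)
    return [order[i : i + batch_size] for i in range(0, n_particles, batch_size)]


# Same shuffle seed but different statistical seeds: the batch order is identical
_, shuffle_a = make_generators(seed=0, shuffle_seed=42)
_, shuffle_b = make_generators(seed=1, shuffle_seed=42)
batches_a = epoch_batches(10, 4, shuffle_a)
batches_b = epoch_batches(10, 4, shuffle_b)
```

Keeping the two generators apart means that changing one seed does not disturb the sequence drawn from the other.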
At this point we are removing support for Sphinx docs, which were cumbersome to maintain and update, and completing our migration to GitBook, where all the detailed documentation for cryoDRGN now resides!
Finally, we have added an alpha version of the analyze_convergence command, a tool we are still in the process of developing, for studying various properties of the reconstruction models. There will be more updates to this tool as we continue researching the properties of cryoDRGN when applied to various datasets.
Please let us know if you have any questions or concerns by posting to this pull request thread or creating an issue on our GitHub board.
v3.4.3: Making movies, improving filtering interface, and fixes to landscape analysis
This is a minor release in which we are introducing a new utility for making volume movies using model analysis results, as well as making some fixes and improvements to existing features:
New visualizations
There is a new command cryodrgn_utils make_movies that automatically searches through the output folders created by commands such as cryodrgn analyze and cryodrgn analyze_landscape and produces .mp4 movies of reconstructed volumes using ChimeraX (which must be installed separately). For example, if volumes corresponding to k-means clusters were produced by cryodrgn analyze ... --ksample 50, make_movies will add movie.mp4 under analyze.<epoch>/kmeans50/ with an animation across the fifty k-means volumes:
movie.mp4
See cryodrgn_utils make_movies -h for more details! We have also added some new types of plots (scree plots and grid plots of PCA components) to the landscape analysis Jupyter notebooks.
Improving interactive filtering
Thanks to some help and feedback from the folks at Doppio (see #425, #426) we improved the interface for the interactive particle filtering command cryodrgn filter by adding buttons for choosing to save the selection (or not) rather than requiring an additional query step through the command-line:
Addressing known issues
v3.4.2: AMP for ab-initio reconstruction; faster landscape analysis and pose parsing
In this patch release we have drastically improved the runtimes of several existing features, as well as addressed some known issues and bugs:
Improving Runtimes
- extended the use of mixed precision training (as implemented in torch.cuda.amp), already the default for train_nn and train_vae, to the ab-initio reconstruction commands abinit_homo and abinit_het, resulting in observed speedups of 2-4x
- vectorized rotation matrix computation in parse_pose_star for a ~100x speedup of this step and a 2x speedup of the command as a whole (#143)
- returned volume evaluation in analyze_landscape_full to the GPU resulting in 10x speedup (#405)
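To give a sense of what vectorizing a rotation matrix computation means, here is a minimal NumPy sketch that builds rotation matrices from Euler angles once with a Python loop and once with batched array operations. The ZYZ composition used here is an assumption chosen for illustration; it is not claimed to match the angle conventions or the code used in parse_pose_star.

```python
import numpy as np


def rot_z(t):
    """Rotation by angle t (radians) about the z axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(t):
    """Rotation by angle t (radians) about the y axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rotations_loop(euler):
    """One matrix per particle, built in a Python loop (slow for large stacks)."""
    return np.stack([rot_z(a) @ rot_y(b) @ rot_z(g) for a, b, g in euler])


def rotations_vectorized(euler):
    """Same result, built for all particles at once with batched matmul."""
    n = euler.shape[0]

    def batch_z(t):
        m = np.zeros((n, 3, 3))
        m[:, 0, 0], m[:, 0, 1] = np.cos(t), -np.sin(t)
        m[:, 1, 0], m[:, 1, 1] = np.sin(t), np.cos(t)
        m[:, 2, 2] = 1.0
        return m

    def batch_y(t):
        m = np.zeros((n, 3, 3))
        m[:, 0, 0], m[:, 0, 2] = np.cos(t), np.sin(t)
        m[:, 1, 1] = 1.0
        m[:, 2, 0], m[:, 2, 2] = -np.sin(t), np.cos(t)
        return m

    # The @ operator multiplies the trailing 3x3 matrices for every particle
    return batch_z(euler[:, 0]) @ batch_y(euler[:, 1]) @ batch_z(euler[:, 2])


rng = np.random.default_rng(0)
euler = rng.uniform(-np.pi, np.pi, size=(1000, 3))
rots = rotations_vectorized(euler)
```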
Fixing Known Issues
- incorrect batch processing causing out-of-memory issues when using chunked output in downsample (#412)
- error when using --flip in analyze_landscape_full (#409)
- parse_mrc bug in landscape analysis notebook (#413)
Please let us know if you have any feedback or comments!
v3.4.1: Support for float16-formatted input
This is a patch release to address some minor issues and improve compatibility of cryoDRGN with the default output number format used by the most recent versions of RELION:
- adding support for np.float16 format input .mrcs files, which are now cast to np.float32 as necessary for Fourier transform operations (#404)
- models.PositionalDecoder.eval_volume() now keeps volumes on the GPU
- better progress log messages in backproject_voxel; improved control over logging using --log-interval to match other reconstruction commands:
(INFO) (lattice.py) (03-Oct-24 10:52:54) Using circular lattice with radius=150
(INFO) (backproject_voxel.py) (03-Oct-24 10:52:55) fimage 0 — 0.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:54:02) fimage 200 — 4.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:55:10) fimage 400 — 8.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:56:18) fimage 600 — 12.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:57:26) fimage 800 — 16.0% done
- filter_cs to replace write_cs, which is now considered deprecated with a suitable warning message, and fixing issues with filtering .cs files produced by the most recent cryoSPARC versions (#150)
- using 0.5 * 0.143-threshold of the “No Mask” FSC curve to start applying phase-randomization correction to the “Tight Mask” FSC curve instead of 0.75 * 0.143-threshold of the “Tight Mask” FSC curve when the tight mask curve never crosses the 0.143 threshold (previously defaulted to the Nyquist limit):
(Figure: side-by-side FSC plots produced by v3.4.0 and v3.4.1.)
- fixing bug with relative output paths given to cryodrgn downsample
- addressing grid_sample warning messages concerning unspecified align_corners argument
- extending analyze_landscape to accept non-binary masks, ensuring compatibility with e.g. cryodrgn_utils gen_mask
- harmonizing use of datadir in .cs files with use for .star files
- better error and log messages for mask operations, filter_pkl
- fixing do_pose_sgd error for interactive filtering
- using virtual environments for release GitHub workflow actions release and beta_release, getting rid of unnecessary wheel upgrading
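The np.float16 handling mentioned at the top of this list can be sketched as follows. The helper to_fft_ready is a hypothetical name used only for this example; it upcasts half-precision image stacks to np.float32 before a Fourier transform and leaves other inputs untouched.

```python
import numpy as np


def to_fft_ready(images: np.ndarray) -> np.ndarray:
    """Upcast half-precision images to float32; return other dtypes unchanged.

    Hypothetical helper for illustration only; not cryoDRGN's own code.
    """
    if images.dtype == np.float16:
        return images.astype(np.float32)
    return images


rng = np.random.default_rng(0)
# A small synthetic stack standing in for float16-formatted .mrcs data
stack16 = rng.standard_normal((4, 8, 8)).astype(np.float16)
stack32 = to_fft_ready(stack16)

# 2D Fourier transform of every image in the stack, with the origin centered
fourier = np.fft.fftshift(np.fft.fft2(stack32), axes=(-2, -1))
```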
v3.4.0: Plotting class labels, RELION 3.1 support, and phase-randomization for FSCs
In this minor release we are adding several new features and commands, as well as expanding a few existing ones and introducing some key refactorings to the codebase to make these changes easier to implement.
New features
- full support for RELION 3.1 .star files with optics values stored in a separate grouped table before or after the main table (#241, #40, #10)
  - refactored Starfile class now has properties .apix and .resolution that return particle-wise optics values for commonly used parameters, as well as methods .get_optics_values() and .set_optics_values() for any parameter
    - these methods automatically use the optics table if available
  - cryodrgn parse_ctf_star can now load all particle-wise optics values from the .star file itself instead of the current behavior of relying upon user input for parameters such as A/px, resolution, voltage, spherical aberration, etc., or just taking the first value found in the file
- backproject_voxel now computes FSC threshold values corrected for mask overfitting using high resolution phase randomization as done in cryoSPARC, as well as showing FSC curves and threshold values for various types of masks
- cryodrgn_utils plot_classes for creating plots of cryoDRGN results colored by a given set of particle class labels
  - for now, only creates 2D kernel density plots of the latent space embeddings clustered using UMAP and PCA, but more plots will be added in the future:
        $ cryodrgn_utils plot_classes 002_train-vae_dim.256 9 --labels published_labels_major.pkl --palette viridis --svg
        analyze.9/umap_kde_classes.png
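For readers unfamiliar with FSC threshold values, the sketch below shows one simple way to read a resolution estimate off an FSC curve: take the first spatial frequency at which the curve drops below 0.143. The function name, the synthetic curve, and the convention that frequencies are given in 1/px are all assumptions made for illustration; this does not reproduce the phase-randomization correction or the code in backproject_voxel.

```python
import numpy as np


def fsc_threshold_resolution(freqs, fsc, apix=1.0, threshold=0.143):
    """Resolution (in Angstrom) at the first frequency where FSC < threshold.

    freqs are assumed to be spatial frequencies in 1/px. Returns None if the
    curve never drops below the threshold. Hypothetical helper for illustration.
    """
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.flatnonzero((fsc < threshold) & (freqs > 0))
    if below.size == 0:
        return None
    return apix / freqs[below[0]]


# Synthetic, smoothly decaying FSC curve sampled up to the Nyquist frequency
freqs = np.linspace(0.02, 0.5, 25)
fsc = 1.0 / (1.0 + np.exp((freqs - 0.25) / 0.02))

resolution = fsc_threshold_resolution(freqs, fsc, apix=1.5)
never_crosses = fsc_threshold_resolution(freqs, np.ones_like(freqs), apix=1.5)
```

A curve that never crosses the threshold is reported as None here, which leaves the choice of fallback to the caller.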
Improvements to existing features
- backproject_voxel also now creates a new directory using -o/--outdir into which it places output files, instead of naming all files after the output reconstructed volume -o/--outfile
  - files within this directory will always have the same names across runs:
    - backproject.mrc the full reconstructed volume
    - half_map_a.mrc, half_map_b.mrc reconstructed half-maps using an odd/even particle split
    - fsc-vals.txt all five FSC curves in space-delimited format
    - fsc-plot.png a plot of these five FSC curves as shown above
- downsample can now downsample each of the individual files in a stack referenced by a .star or .txt file, returning a new .star file or .txt file referencing the new downsampled stack
  - used by specifying a .star or .txt file as -o/--outfile when using a .star or .txt file as input:
        cryodrgn downsample my_particle_stack.star -D 128 -o particles.128.star --datadir folder_with_subtilts/ --outdir my_new_datadir/
- cryodrgn_utils fsc can now take three volumes as input, in which case the first volume will be used to generate masks to produce cryoSPARC-style FSC curve plots including phase randomization for the “tight” mask (see New features above)
- cryodrgn_utils plot_fsc is now more flexible with the types of input files it can accept for plotting, including .txt files with the new type of cryoSPARC-style FSC curve output from backproject_voxel
- cryodrgn filter --force for less interactivity after the selection has been made
- filter_mrcs prints both original and new number of particles; generates output file name automatically if not given
- cryodrgn abinit_het saves configs alongside model weights in weights.pkl for easier access and output checkpoint identification
Addressing bugs and other issues
- better axis labels for FSC plotting, passing Apix values from backproject_voxel (#385)
- cryodrgn filter doesn’t show particle indices in hover text anymore, as this proved visually distracting; we now show these indices in a text box in the corner of the plot
- cryodrgn filter saves chosen indices as a np.array instead of Python standard list to prevent type issues in downstream analyses
- commands_utils.translate_mrcs was not working (was assuming particles.images() returned a numpy array instead of a torch Tensor); this has been fixed and tests added for translations of image stacks
- going back to listing modules to be included in the cryodrgn and cryodrgn_utils command line interfaces explicitly, as Python will sometimes install older modules into the corresponding folders which confuses automated scanning for command modules
- fixing parsing of 8bit and 16bit .mrc files produced using e.g. --outmode=int8 in EMAN2 (#113)
- adding support and continuous integration testing for Python 3.11
Refactoring classes that parse input files
There were some updates we wanted to make to the ImageSource class and its children, which were introduced in a refactoring of the processes used to load and parse input datasets in v3.0.0. We also sought to simplify and clean up the code in the methods used to parse .star file and .mrcs file data in cryodrgn.starfile and cryodrgn.mrc respectively.
- the code for the ImageSource base class and its children classes in cryodrgn.source has been cleaned up to improve code style, remove redundancies, and support the Starfile and mrcfile refactorings described below
  - more consistent and sensible parsing of filenames with datadir for _MRCDataFrameSource classes such as TxtFileSource and StarfileSource (#386)
    - all of this logic is now contained in a new method _MRCDataFrameSource.parse_filename which is applied in __init__:
      - If the filename by itself points to a file that exists, use filename.
      - Otherwise, if os.path.join(datadir, newname) exists, use that.
      - Finally, try os.path.join(datadir, os.path.basename(newname)).
      - If that doesn’t exist, throw an error!
  - adding ImageSource.orig_n attribute which is often useful for accessing the original number of particles in the stack before filtering was applied
  - adding ImageSource.write_mrc(), to avoid having to use MRCFile.write() for ImageSource objects; MRCFile.write() use case for arrays has been replaced by mrcfile.write_mrc (see below)
    - see use in a refactored cryodrgn downsample for batch writing to .mrc output
  - adding MRCFileSource.write(), a wrapper for mrcfile.write_mrc()
  - adding MRCFileSource.apix property for convenient access to header metadata
  - getting rid of ArraySource, whose behavior can be subsumed into ImageSource with lazy=False
  - improving error messages in ImageSource.from_file(), ._convert_to_ndarray(), images()
  - ImageSource.lazy is now a property, not an attribute, and is dynamically dependent on whether self.data has actually been loaded or not
  - adding _MRCDataFrameSource.sources convenience iterator property
  - StarfileSource now inherits directly from the Starfile class (as well as _MRCDataFrameSource) for better access to .star utilities than using a Starfile object as an attribute (.df in the old v3.3.3 class)
- .star file methods have been refactored to establish three clear ways of accessing and manipulating .star data for different levels of features, with RELION3.1 operations now implemented in Starfile class methods:
  - cryodrgn.starfile.parse_star and write_star to get and perform simple operations on the main data table and/or the optics table
    e.g. in filter_star:
        stardf, data_optics = parse_star(args.input)
        ...
        write_star(args.o, data=filtered_df, data_optics=new_optics)
  - cryodrgn.starfile.Starfile for access to .star file utilities like generating optics values for each particle in the main data table using parameters saved in the optics table
    e.g. in parse_ctf_star:
        stardata = Starfile(args.star)
        logger.info(f"{len(stardata)} particles")
        apix = stardata.apix
        resolution = stardata.resolution
        ...
        ctf_params[:, i + 2] = (
            stardata.get_optics_values(header)
            if header not in overrides
            else overrides[header]
        )
  - cryodrgn.source.StarfileSource for access to .star file utilities along with access to the images themselves using ImageSource methods like .images()
  - see our more detailed write-up for more information: Starfile Refactor
- for .mrc files, we removed MRCFile as there are no analogues presently for the kinds of methods supported by Starfile; the operations on the image array requiring data from the image header are presently contained within MRCFileSource, reflecting the fact that .mrcs files are the image data themselves and not pointers to other files containing the data
  - MRCFile, which consisted solely of static parse and write methods, has been replaced by the old names of these methods (parse_mrc and write_mrc)
    - MRCFile.write(out_mrc, vol) → write_mrc(out_mrc, vol)
    - in the case of when vol is an ImageSource object, we now do ImageSource.write_mrc()
  - in general, parse_mrc and write_mrc are for using the entire image stack as an array, while MRCFileSource is for accessing batches of images as tensors
  - mrc module is now named mrcfile for better verbosity and to match starfile module which ...
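The filename lookup order described above for _MRCDataFrameSource.parse_filename can be sketched as a standalone function. The name resolve_image_path and its signature are hypothetical; the sketch only follows the steps listed in these notes and is not the method's actual implementation.

```python
import os
import tempfile
from typing import Optional


def resolve_image_path(filename: str, datadir: Optional[str] = None) -> str:
    """Locate an image stack following the lookup order listed in the notes.

    Hypothetical standalone helper for illustration only.
    """
    # 1. The filename by itself points to an existing file
    if os.path.exists(filename):
        return filename
    if datadir is not None:
        # 2. The filename relative to datadir
        candidate = os.path.join(datadir, filename)
        if os.path.exists(candidate):
            return candidate
        # 3. Only the base name of the file, placed inside datadir
        candidate = os.path.join(datadir, os.path.basename(filename))
        if os.path.exists(candidate):
            return candidate
    # 4. Nothing matched
    raise FileNotFoundError(f"Cannot find {filename} (datadir={datadir})")


# Small demonstration: the stack has moved into a new data directory
with tempfile.TemporaryDirectory() as datadir:
    open(os.path.join(datadir, "stack.mrcs"), "wb").close()
    resolved = resolve_image_path("old/location/stack.mrcs", datadir)
    found_in_datadir = resolved == os.path.join(datadir, "stack.mrcs")
```

Trying the base name last lets a .star file keep working after its image stacks have been copied into a single flat folder.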
v3.3.3: RELION3.1 .star filtering, interactive tilt series filtering, and fixes to backprojection
This patch release fixes several outstanding issues:
- the --ntilts argument to backproject_voxel did not do anything, and all tilts were always used; this flag now behaves as expected #379
- cryodrgn_utils filter_star now includes the (filtered) input optics table in the output if present in the input #370
- cryodrgn filter now accepts experiment outputs using tilt series particles #335
- fixing a numerical rounding bug showing up in transformations to poses used by backproject_voxel #380
We have also done more work to consolidate and expand our CI testing suite, with all of the pytest tests under tests/ now using new data loading fixtures that allow for tests to be run in parallel using pytest-xdist. Datasets used in testing have also been moved from testing/data/ to tests/data/ to reflect that the old tests run using command-line under the former are now deprecated and are being replaced and rewritten as pytest tests in the latter folder.
Finally, we removed some remaining vestiges of the old way of handling large datasets difficult to fit into memory via cryodrgn preprocess (#348) as well as improving the docstrings for several modules.


