Fifth development release
We are pleased to announce the release of Thunder 0.5.0. This release introduces several new features, including a new framework for image registration algorithms, performance improvements for core data conversions, improved EC2 deployment, and many bug fixes. This release requires Spark 1.1.0 or later, and is compatible with the most recent Spark release, 1.3.0.
Major features
- A new image registration API inside the new thunder.imgprocessing package. See the tutorial.
- Significant performance improvements to the Images to Series conversion, including a Blocks object as an intermediate stage. The inverse conversion, from Series back to Images, is now supported.
- Support for TIFF image files as an input format has been expanded and made more robust. Multiple image volumes can now be read from a single input file via the nplanes argument in the loading functions, and files can be read from nested directory trees using the recursive=True flag.
- New methods for working with multi-level indexing on Series objects, including selectByIndex and seriesStatByIndex; see the tutorial.
- Convenient new getter methods for extracting individual records or small sets of records using bracket notation, as in Series[(x,y,z)] or Images[k].
- A new serializable decorator to make it easy to save/load small objects (e.g. models) to JSON, including handling of numpy arrays. See saving/loading of RegistrationModel for an example.
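As an illustration of the nplanes argument mentioned above, a multi-page file holding several volumes back to back can be thought of as being cut into consecutive groups of nplanes pages. The following is a minimal pure-Python sketch of that grouping, not Thunder's implementation. The function name split_into_volumes is hypothetical, and raising an error when the page count is not a multiple of nplanes is an assumption.

```python
def split_into_volumes(pages, nplanes):
    """Group a flat list of 2D pages into volumes of `nplanes` pages each.

    Hypothetical helper for illustration only; not part of Thunder.
    """
    if nplanes <= 0:
        raise ValueError("nplanes must be positive")
    if len(pages) % nplanes != 0:
        # Assumption: a partial trailing volume is treated as an error.
        raise ValueError("page count %d is not a multiple of nplanes=%d"
                         % (len(pages), nplanes))
    # Consecutive, non-overlapping slices of length nplanes.
    return [pages[i:i + nplanes] for i in range(0, len(pages), nplanes)]


# Six pages, labelled by name here instead of holding pixel data.
pages = ["p0", "p1", "p2", "p3", "p4", "p5"]
volumes = split_into_volumes(pages, nplanes=3)
print(len(volumes))
print(volumes[1])
```

With six pages and nplanes=3, the sketch yields two volumes of three pages each, in file order.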
Minor features
- Parameter files can be loaded from a file with a simple JSON schema (useful for working with covariates), using ThunderContext.loadParams.
- A new method ThunderContext.setAWSCredentials handles AWS credential settings in managed cluster environments (where it may not be possible to modify system config files).
- An Images object can be saved to a collection of binary files using Images.saveAsBinaryImages.
- Data objects now have a consistent __repr__ method, displaying uniform and informative results when these objects are printed.
- Images and Series objects now each offer a meanByRegions() method, which calculates a mean over one or more regions specified either by a set of indices or a mask image.
- TimeSeries has a new convolve() method.
- The thunder and thunder-submit executables have been modified to better expose the options available in the underlying pyspark and spark-submit Spark executable scripts.
- An improved and streamlined Colorize with new colorization options.
- Load data hosted by the Open Connectome Project with the loadImagesOCP method.
- New example data sets available, both for local testing and on S3
- New tutorials: regression, image registration, multi-level indexing
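To make the two ways of specifying regions for meanByRegions() concrete, here is a small pure-Python sketch of the underlying computation on a single 2D image. It is not Thunder's implementation or signature: the function names are hypothetical, images are plain nested lists, and treating mask label 0 as background is an assumption.

```python
def mean_by_index_regions(image, regions):
    """Mean of `image` over each region, given as a list of (row, col) pairs."""
    return [sum(image[r][c] for r, c in region) / float(len(region))
            for region in regions]


def mean_by_mask(image, mask):
    """Mean of `image` for each nonzero label in `mask` (same shape as `image`).

    Assumption: label 0 marks background and is skipped.
    """
    totals, counts = {}, {}
    for image_row, mask_row in zip(image, mask):
        for value, label in zip(image_row, mask_row):
            if label == 0:
                continue
            totals[label] = totals.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


image = [[1.0, 2.0],
         [3.0, 4.0]]
mask = [[1, 1],
        [2, 0]]

# Region 1 is the top row, region 2 is the bottom-left pixel.
print(mean_by_index_regions(image, [[(0, 0), (0, 1)], [(1, 0)]]))
print(mean_by_mask(image, mask))
```

Both calls describe the same two regions, once as explicit index lists and once as an integer-labelled mask, and produce the same means.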
Transition guide
- Some keyword parameters have been changed for consistency with the Thunder style guide naming conventions. Examples are the inputformat, startidx, and stopidx parameters on the ThunderContext loading methods, which are now inputFormat, startIdx, and stopIdx, respectively. We expect minimal future changes in existing method and parameter names.
- The Series methods normalize() and detrend() have been moved to TimeSeries objects, which can be created by the Series.toTimeSeries() method.
- The default file extension for the binary stack format is now bin instead of stack. If you need to load files with the stack extension, you can use the ext='stack' keyword argument of loadImages.
- export is now a method on the ThunderContext instead of a standalone function, and now supports exporting to S3.
- The loadImagesAsSeries and convertImagesToSeries methods on ThunderContext now default to shuffle=True, making use of a revised execution path that should improve performance.
- The method for loading example data has been renamed from loadExampleEC2 to loadExampleS3.
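For scripts that build keyword arguments programmatically, the renames listed above can be applied mechanically. The sketch below is one possible migration helper and is not part of Thunder: the function update_keywords and the mapping name are hypothetical, and only the three renamed keywords named in this guide are covered.

```python
# Old keyword -> new keyword, as listed in the transition guide above.
RENAMED_KEYWORDS = {
    "inputformat": "inputFormat",
    "startidx": "startIdx",
    "stopidx": "stopIdx",
}


def update_keywords(kwargs):
    """Return a copy of `kwargs` with pre-0.5.0 keyword names replaced.

    Hypothetical helper for migrating caller code; not part of Thunder.
    """
    updated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KEYWORDS.get(name, name)
        if new_name in updated:
            # Both the old and the new spelling were supplied.
            raise TypeError("keyword '%s' given twice" % new_name)
        updated[new_name] = value
    return updated


# Keyword arguments written against the old naming scheme.
old_call = {"inputformat": "stack", "startidx": 0, "stopidx": 10}
print(update_keywords(old_call))
```

Keywords that were not renamed pass through unchanged, so the helper can be applied to any argument dictionary before it is forwarded to a loading method.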
Deployment and development
- Anaconda is now the default Python installation on EC2 deployments, as well as on our Travis server for testing.
- EC2 scripts and unit tests provide quieter and prettier status outputs.
- Egg files are now included with official releases, so that a pip install of thunder-python can immediately be deployed on a cluster without cloning the repo and building an egg.
Contributions:
- Andrew Osheroff (data getter improvements)
- Ben Poole (optimized window normalization, image registration)
- Jascha Swisher (images to series conversion, serializable class, tif handling, get and meanBy methods, bug fixes)
- Jason Wittenbach (new series indexing functionality, regression and indexing tutorials, bug fixes)
- Jeremy Freeman (image registration, EC2 deployment, exporting, colorizing, bug fixes)
- Kunal Lillaney (loading from OCP)
- Michael Broxton (serializable class, new series statistics, improved EC2 deployment)
- Noah Young (improved EC2 deployment)
- Tom Sainsbury (image filtering, PNG saving options)
- Uri Laserson (submit scripts, Hadoop versioning)
Roadmap
Moving forward we will do a code freeze and cut a release every three months. The next will be June 30th.
For 0.6.0 we will focus on the following components:
- A source extraction / segmentation API
- New capabilities for regression and GLM model fitting
- New image registration algorithms (including volumetric methods)
- Latent factor and network models
- Improved performance on single-core workflows
- Bug fixes and performance improvements throughout
If you are interested in contributing, let us know! Check out the existing issues or join us in the chatroom.