Third development release
This update adds new functionality for loading data, alongside changes to the API for loading, and a variety of smaller bug fixes.
API changes
- All data loading is performed through the new Thunder Context, a thin wrapper for a Spark Context. This context is automatically created when starting thunder, and has methods for loading data from different input sources. `tsc.loadText` behaves identically to the `load` from previous versions.
- Example data sets can now be loaded from `tsc.makeExample`, `tsc.loadExample`, and `tsc.loadExampleEC2`.
- Output of the `pack` operation now preserves xy definition, but outputs will be transposed relative to previous versions.
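
Code that relied on the earlier orientation of `pack` output may need a transpose. The sketch below is only an illustration: it assumes the output is a 2D NumPy array and that the change is a plain axis swap, and it uses a made-up stand-in array rather than real `pack` output.

```python
import numpy as np

# Stand-in for a 2D array returned by pack in this release
# (hypothetical values; real output would come from thunder).
packed_new = np.arange(6).reshape(2, 3)

# Assumption: the earlier orientation is the plain transpose of the new one.
packed_old_style = packed_new.T

print(packed_new.shape)        # (2, 3)
print(packed_old_style.shape)  # (3, 2)
```

For outputs with more than two dimensions the axis order would need to be checked against the actual data before transposing.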
New features
- Include design matrix with example data set on EC2
- Faster `nmf` implementation by changing update equation order (#15)
- Support for loading local MAT files into RDDs through `tsc.loadMatLocal`
- Preliminary support for loading binary files from HDFS using `tsc.loadBinary` (depends on features currently only available in Spark's master branch)