============================== Release Notes: v0.95 ==============================
Support for new training algorithms:
 - Generative Adversarial Networks (GAN)

Support for new network structures:
 - Variational Autoencoders
 - GAN
 - CycleGAN
 - Combined Autoencoders with CycleGAN
 - Deep Recurrent Attention Model (DRAM), Ba et al. (2015)
 - Video Recurrent Attention Model (VRAM)

Support for new layers:
 - Optimized Top-K accuracy (CPU, GPU)
 - Crop (CPU, GPU)
 - Sort (CPU, GPU), both ascending and descending order
 - Absolute value (CPU, GPU)
 - Mean squared error (CPU, GPU)
 - Top-K categorical accuracy (CPU, GPU)
 - Cross-entropy (CPU, GPU)
 - Stop gradient (CPU, GPU)

Performance optimizations:
 - Use pinned memory for CPU activation matrices (see the sketch after this list)
 - Non-blocking GPU computation of objective functions and metrics
 - Refactored weight matrices and weight initialization
 - Manage GPU workspace buffers with a memory pool
 - Slice and concatenation layers emit matrix views when possible
 - More fine-grained asynchronous calls when using the Aluminum library
 - Minimized GPU stream synchronization events per call
 - Improved / minimized synchronization events when using a single GPU
 - Fixed GPU workspace size
 - GPU implementation of the Adagrad optimizer
 - GPU model-parallel softmax
 - Optimized local CUDA kernel implementations
 - Support for distributed matrices with arbitrary alignment

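The pinned-memory item follows the standard CUDA pattern: host buffers allocated
with cudaMallocHost can be copied to the GPU asynchronously and overlapped with
other work. The following is only a minimal sketch of that pattern, not LBANN's
actual allocator code; buffer names and sizes are illustrative.

    #include <cuda_runtime.h>

    int main() {
      const size_t n = 1 << 20;          // illustrative activation matrix size
      float* host_acts = nullptr;
      float* dev_acts = nullptr;
      cudaStream_t stream;
      cudaStreamCreate(&stream);

      // Page-locked (pinned) host memory: required for truly asynchronous
      // host-to-device copies, and faster than pageable transfers.
      cudaMallocHost(&host_acts, n * sizeof(float));
      cudaMalloc(&dev_acts, n * sizeof(float));

      for (size_t i = 0; i < n; ++i) { host_acts[i] = 1.0f; }

      // The copy returns immediately; work issued on the same stream is
      // ordered after it, so the transfer can overlap with CPU-side work.
      cudaMemcpyAsync(dev_acts, host_acts, n * sizeof(float),
                      cudaMemcpyHostToDevice, stream);
      cudaStreamSynchronize(stream);

      cudaFree(dev_acts);
      cudaFreeHost(host_acts);
      cudaStreamDestroy(stream);
      return 0;
    }
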
Model portability & usability:
 - Keras to LBANN prototext conversion tool

Internals features:
 - Support for multiple objective functions and metrics per network with arbitrary placement
 - Objective functions represented as layers
 - Metrics represented as layers
 - Introduced evaluation layer construct
 - Ability to freeze specific layers for pre-training / fine-tuning
 - Refactored tensor setup in setup, forward prop, and back prop
 - Layers store matrices in private smart pointers
 - Model automatically inserts evaluation layers where needed
 - Copy layer activations between models
 - Annotated GPU profiling output with training phases
 - Fixed initialization of the Comm object and Grid objects when using multiple models
 - General code cleanup, refactoring, and various bug fixes
 - All layers overwrite error signal matrices
 - NCCL backend is now implemented via the Aluminum library
 - MPI calls are routed through the LBANN Comm object into Hydrogen or Aluminum
 - Provide a runtime statistics summary from every rank
 - Reworked LBANN to use Hydrogen to manage GPU memory
 - GPU allocations now go through a CUB memory pool
 - Fixed Spack build interaction with the Hydrogen library

I/O & data readers:
 - Support for Conduit objects with HDF5 formatting
 - In-memory and locally offloaded data store
 - The data store can hold the entire training set in memory (or in node-local storage)
 - The data store shuffles data samples between epochs and presents samples to the input layer
 - Updated synthetic data reader
 - Modified data readers to handle bad samples in JAG Conduit data
 - Reworked the I/O layers (input and target) so that the input layer produces both the
   sample and the label / response when necessary
 - The target layer is being deprecated
 - Updated the image data reader to use cv::imdecode to accelerate image load times
 - Allow users to specify an array of data sources for the independent/dependent
   variables via prototext

============================== Release Notes: v0.94 ==============================
Support for new training algorithms:
 - Back-Propagation Through Time (BPTT)
   -- Recurrent Neural Networks (RNN)
   -- Long Short-Term Memory (LSTM)
 - Generative Adversarial Networks (GAN)
 - Variational autoencoders
 - Convolutional autoencoders
 - Fine-tuning of pretrained networks
   -- Flexible weight freezing
 - Context-prediction network (Siamese network)
 - Livermore Tournament Fast Batch learning (LTFB)
 - Variable mini-batch sizes

Support for new network structures:
 - Directed Acyclic Graph (DAG) networks
 - Residual networks
 - Modular and composable objective functions
 - Multiple metrics
 - Shared weight matrices
 - (BETA) New evaluation layer that can be attached to any point of the DAG
 - Motifs (compound, reused network patterns)

Support for new layers:
 - Learning:
   -- Deconvolution
 - Metrics:
   -- Top-K categorical accuracy, Pearson correlation, mean absolute deviation
 - Loss functions:
   -- Cross entropy with uncertainty, geometric negative log-likelihood
   -- Poisson negative log-likelihood, Polya negative log-likelihood
 - Optimizers:
   -- Hypergradient Adam
 - Transform layers:
   -- Concatenation, Noise, Unpooling, Pooling, Reshape, Slice, Split, Sum
 - Regularizers:
   -- Batch normalization, SELU dropout, local response normalization (LRN)
 - Activations:
   -- Leaky ReLU, smooth ReLU, ELU, scaled ELU, softplus, atan,
      bent identity, exponential

Performance optimizations:
 - GPU acceleration for most layers
 - NCCL 2.x
 - Optimized communication patterns
 - Asynchronous weight updates
 - Asynchronous metric and objective function updates
 - Batch normalization (global and local)
 - L2 normalization
 - Adaptive quantization (inter-model)

Model portability & usability:
 - Portable checkpoint / recovery
 - Distributed checkpoint / recovery
 - Network visualization
 - Export LBANN to TensorFlow format

Internals features:
 - Gradient checking
 - Network representation using tensor dimensions
 - Bamboo continuous integration (CI)
 - Improved data processing pipeline

New data readers:
 - NumPy
 - CSV
 - Methods for merging multiple features and samples across files
 - CANDLE Pilot 2
 - CANDLE Pilot 1 Combo
 - ICF JAG

Integration with Hydrogen, an optimized distributed dense linear algebra
library. Hydrogen is a fork of the Elemental library. It provides distributed
matrices with elemental and block distributions, optimized BLAS and LAPACK
routines, and distributed and local matrix management.

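For readers unfamiliar with Hydrogen, the sketch below illustrates the kind of
distributed dense linear algebra it exposes through the Elemental-style El::
API it inherits. It is a minimal, illustrative example, not a verbatim excerpt
of LBANN's usage.

    #include <El.hpp>
    #include <iostream>

    int main(int argc, char* argv[]) {
      El::Environment env(argc, argv);      // initializes MPI and Elemental
      El::Grid grid(El::mpi::COMM_WORLD);   // 2-D process grid

      const El::Int n = 1024;
      El::DistMatrix<double> A(grid), B(grid), C(grid);
      El::Gaussian(A, n, n);                // random distributed matrices
      El::Gaussian(B, n, n);
      El::Zeros(C, n, n);

      // Distributed GEMM: C = 1.0 * A * B + 0.0 * C
      El::Gemm(El::NORMAL, El::NORMAL, 1.0, A, B, 0.0, C);

      if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        std::cout << "||C||_F = " << El::FrobeniusNorm(C) << std::endl;
      return 0;
    }
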
Integration with Aluminum, an optimized all-reduce communication library.
Aluminum provides custom reduction patterns, customized CUDA reduction kernels,
and asynchronous communication operators. It uses MPI, MPI with GPUDirect, or
NCCL as back-end libraries. Aluminum enables us to effectively use non-blocking
all-reduces during backprop and optimization.

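The payoff of non-blocking all-reduces is that gradient aggregation can overlap
with other work. The sketch below shows the pattern with plain MPI
(MPI_Iallreduce); it only illustrates the idea and is neither Aluminum's own API
nor LBANN's backprop code. The placeholder helpers are hypothetical.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);

      std::vector<float> grad(1 << 20, 1.0f);   // local gradients for one layer
      std::vector<float> summed(grad.size());
      MPI_Request req;

      // Start the all-reduce without blocking...
      MPI_Iallreduce(grad.data(), summed.data(),
                     static_cast<int>(grad.size()), MPI_FLOAT,
                     MPI_SUM, MPI_COMM_WORLD, &req);

      // ...keep computing (e.g., backprop for earlier layers) while the
      // reduction progresses in the background.
      // do_more_backprop();  // hypothetical overlapped work

      // Block only when the summed gradients are actually needed.
      MPI_Wait(&req, MPI_STATUS_IGNORE);
      // apply_optimizer_step(summed);  // hypothetical

      MPI_Finalize();
      return 0;
    }
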
Additionally, we have added support for an online, distributed data store. When
enabled, LBANN ingests the entire training data set in a distributed fashion
across all ranks. Each data store then serves its portion of a mini-batch,
dynamically moving data to the ranks that need it (based on the mini-batch data
distribution).

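Conceptually, each rank owns a shard of the sample indices and serves whatever
part of the current mini-batch falls in its shard. The toy, single-process
sketch below shows only that bookkeeping; the real data store performs the
actual movement over MPI, and the ownership rule and names here are
illustrative.

    #include <cstdio>
    #include <vector>

    // Toy ownership rule: sample i lives on rank (i % num_ranks).
    int owner_of(int sample, int num_ranks) { return sample % num_ranks; }

    int main() {
      const int num_ranks = 4;
      // A shuffled mini-batch of global sample indices (e.g., after the
      // per-epoch shuffle performed by the data store).
      const std::vector<int> mini_batch = {12, 3, 7, 18, 1, 10, 5, 16};

      // Work out which owning rank must ship each sample to the consumer
      // rank that will run forward/backward prop on it.
      for (std::size_t pos = 0; pos < mini_batch.size(); ++pos) {
        const int sample = mini_batch[pos];
        const int consumer = static_cast<int>(pos) % num_ranks;
        std::printf("sample %2d: owned by rank %d, needed on rank %d\n",
                    sample, owner_of(sample, num_ranks), consumer);
      }
      return 0;
    }
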
============================== Release Notes: v0.93 ==============================
This release contains a major refactoring / overhaul of the code base.
Key highlights include:
- Moving the layer design toward smaller, simpler layers that each have a
  single compute behavior. Specifically, linear combinations of the inputs,
  non-linear activations, and regularizers now exist as their own layers.
- Layers now have a template parameter that specifies the data layout for
  the distributed matrices (see the sketch after this list).
- The prototext interface for specifying neural network models and data
  readers is nearly fully functional.
- The code now adheres to the internal coding style outlined in
  README_coding_style.txt.
- Dead code has been eliminated and the layer file hierarchy has been
  cleaned up.

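The data-layout template parameter lets a layer's compute path be specialized
at compile time for how its distributed matrices are laid out. Below is a
stripped-down sketch of that pattern with hypothetical class and enum names,
not LBANN's actual declarations.

    #include <iostream>

    // Hypothetical layout tags, one per matrix distribution strategy.
    enum class data_layout { DATA_PARALLEL, MODEL_PARALLEL };

    // A layer is specialized on its layout at compile time, so the forward
    // pass can pick the matching distributed-matrix types and kernels.
    template <data_layout Layout>
    class fully_connected_layer {
    public:
      void forward_prop() {
        if constexpr (Layout == data_layout::DATA_PARALLEL) {
          std::cout << "FC forward on sample-distributed matrices\n";
        } else {
          std::cout << "FC forward on 2-D block-cyclic matrices\n";
        }
      }
    };

    int main() {
      fully_connected_layer<data_layout::DATA_PARALLEL> fc_dp;
      fully_connected_layer<data_layout::MODEL_PARALLEL> fc_mp;
      fc_dp.forward_prop();
      fc_mp.forward_prop();
      return 0;
    }
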
============================== Release Notes: v0.92 ==============================
New features include (but are not limited to):
 - Full support for convolutional and pooling layers
 - GPU acceleration of local Elemental GEMM operations
 - Improved network and data reader support
   -- AlexNet
   -- VGG
   -- CIFAR-10
 - Added a suite of regularizers, objective functions, and metrics, including:
   -- Batch normalization
   -- Dropout
   -- L2
 - Dramatically improved the performance of inter-model communication
 - Added a suite of image preprocessing routines

============================== Release Notes: v0.91 ==============================
Incorporates a number of changes throughout the LBANN code base. In
particular, there is a new build system that tries to have LBANN download all
of its dependencies into its build tree and compile them locally. Additional
improvements include optimizations in the data-parallel, multiple-model
training framework, support for convolutional layers, and general bug fixes.

============================== Release Notes: v0.90 ==============================
Initial release of the LBANN toolkit.