
Commit ffecbef

Merge branch 'release-v0.95'
============================== Release Notes: v0.95 ==============================
Support for new training algorithms:
- Generative Adversarial Networks (GAN)

Support for new network structures:
- Variational Autoencoders
- GAN
- CycleGAN
- Combined Autoencoders with CycleGAN
- Deep Recurrent Attention Model (DRAM), Ba et al. (2015)
- Video Recurrent Attention Model (VRAM)

Support for new layers:
- Optimized Top-K accuracy (CPU, GPU)
- Crop (CPU, GPU)
- Sort (CPU, GPU), both ascending and descending order
- Absolute value (CPU, GPU)
- Mean-squared (CPU, GPU)
- Top-K categorical accuracy (CPU, GPU)
- Cross-entropy (CPU, GPU)
- Stop gradient (CPU, GPU)

Performance optimizations:
- Use pinned memory for CPU activation matrices
- Non-blocking GPU computation of objective functions and metrics
- Refactored weight matrices and weight initialization
- Manage GPU workspace buffers with a memory pool
- Slice and concatenation layers emit matrix views where possible
- Use more fine-grained asynchronous calls when using the Aluminum library
- Minimized GPU stream synchronization events per call
- Improved / minimized synchronization events when using a single GPU
- Fixed GPU workspace size
- GPU implementation of the Adagrad optimizer
- GPU model-parallel softmax
- Optimized local CUDA kernel implementations
- Support for distributed matrices with arbitrary alignment

Model portability & usability:
- Keras to LBANN prototext conversion tool

Internal features:
- Support for multiple objective functions and metrics per network with arbitrary placement
- Objective functions represented as layers
- Metrics represented as layers
- Introduced evaluation layer construct
- Ability to freeze specific layers for pre-training / fine-tuning
- Refactored tensor setup in setup, forward prop, and back prop
- Layers store matrices in private smart pointers
- Model automatically inserts evaluation layers where needed
- Copy layer activations between models
- Annotated GPU profiling output with training phases
- Fixed initialization of the Comm object and Grid objects when using multiple models
- General code cleanup, refactoring, and various bug fixes
- All layers overwrite error signal matrices
- NCCL backend is now implemented via the Aluminum library
- MPI calls are routed through the LBANN Comm object into Hydrogen or Aluminum
- Provide a runtime statistics summary from every rank
- Reworked LBANN to use Hydrogen to manage GPU memory
- GPU allocations now go through a CUB memory pool
- Fixed Spack build interaction with the Hydrogen library

I/O & data readers:
- Support for Conduit objects with HDF5 formatting
- In-memory and locally offloaded data store
- Data store can hold the entire training set in memory (or node-local storage)
- Data store will shuffle data samples between epochs and present samples to the input layer
- Updated synthetic data reader
- Modified data readers to handle bad samples in JAG Conduit data
- Reworked the I/O layers (input and target) so that the input layer produces both the
  sample and label / response if necessary
- Target layer is being deprecated
- Updated image data reader to use cv::imdecode to accelerate image load times
- Allow users to specify an array of data sources for the independent/dependent
  variables via prototext
2 parents ca499c9 + 46cf7ba commit ffecbef

File tree: 559 files changed (+39,363 / -19,206 lines changed)


CMakeLists.txt

Lines changed: 42 additions & 45 deletions
@@ -54,7 +54,7 @@ if (GIT_REPO)
   set(${UPPER_PROJECT_NAME}_VERSION ${GIT_VERSION}
     CACHE STRING "LBANN's version string")
 else ()
-  set(${UPPER_PROJECT_NAME}_VERSION v0.94
+  set(${UPPER_PROJECT_NAME}_VERSION v0.95
     CACHE STRING "LBANN's version string")
 endif (GIT_REPO)
@@ -72,8 +72,6 @@ option(${UPPER_PROJECT_NAME}_WARNINGS_AS_ERRORS
 
 option(${UPPER_PROJECT_NAME}_WITH_CUDA "Include Nvidia CUDA" OFF)
 
-option(${UPPER_PROJECT_NAME}_WITH_NCCL "Include Nvidia NCCL2" OFF)
-
 option(${UPPER_PROJECT_NAME}_WITH_CUDNN "Include Nvidia cuDNN" ON)
 
 option(${UPPER_PROJECT_NAME}_WITH_CNPY "Include cnpy" ON)
@@ -89,7 +87,7 @@ option(${UPPER_PROJECT_NAME}_WITH_NVPROF
 option(${UPPER_PROJECT_NAME}_WITH_TOPO_AWARE
   "Enable topology-aware profiling (HWLOC)" ON)
 
-option(${UPPER_PROJECT_NAME}_WITH_ALUMINUM
+option(${UPPER_PROJECT_NAME}_WITH_ALUMINUM
   "Enable Aluminum all-reduce library" OFF)
 
 option(${UPPER_PROJECT_NAME}_WITH_CONDUIT
@@ -134,14 +132,10 @@ endif ()
 set(LBANN_TOPO_AWARE ${${UPPER_PROJECT_NAME}_WITH_TOPO_AWARE})
 
 # Enable parallel random matrix generation, if possible
-if (${UPPER_PROJECT_NAME}_SEQUENTIAL_INITIALIZATION)
-  set(LBANN_SEQUENTIAL_CONSISTENCY TRUE)
-  set(LBANN_PROCDET_DROPOUT TRUE)
-  set(LBANN_PARALLEL_RANDOM_MATRICES FALSE)
+if (${UPPER_PROJECT_NAME}_DETERMINISTIC)
+  set(LBANN_DETERMINISTIC TRUE)
 else()
-  set(LBANN_SEQUENTIAL_CONSISTENCY FALSE)
-  set(LBANN_PROCDET_DROPOUT FALSE)
-  set(LBANN_PARALLEL_RANDOM_MATRICES TRUE)
+  set(LBANN_DETERMINISTIC FALSE)
 endif ()
 
 #
@@ -170,6 +164,29 @@ include(SetupElemental)
 find_package(OpenCV REQUIRED)
 set(LBANN_HAS_OPENCV ${OpenCV_FOUND})
 
+if (LBANN_WITH_ALUMINUM)
+  find_package(Aluminum)
+  set(LBANN_HAS_ALUMINUM ${Aluminum_FOUND})
+  if (NOT LBANN_HAS_ALUMINUM)
+    message(FATAL_ERROR
+      "Requested LBANN_WITH_ALUMINUM but Aluminum not found. "
+      "Aluminum is now disabled. "
+      "Try specifying ALUMINUM_DIR as the root of an ALUMINUM install. "
+      "Alternatively, build with LBANN_WITH_ALUMINUM=OFF.")
+    set(LBANN_WITH_ALUMINUM OFF)
+  endif(NOT LBANN_HAS_ALUMINUM)
+
+  if (AL_HAS_CUDA AND NOT LBANN_WITH_CUDA)
+    message(WARNING
+      "Aluminum has CUDA but LBANN is configured with LBANN_WITH_CUDA=OFF")
+  endif ()
+
+  option(LBANN_BUILT_WITH_SPECTRUM "LBANN was built with Spectrum MPI" OFF)
+  if (LBANN_BUILT_WITH_SPECTRUM)
+    set(LBANN_ALUMINUM_MPI_PASSTHROUGH ON)
+  endif (LBANN_BUILT_WITH_SPECTRUM)
+endif (LBANN_WITH_ALUMINUM)
+
 # Setup some additional CUDA-y things
 if (LBANN_HAS_CUDA)
   if (NOT LBANN_WITH_CUDNN)
@@ -184,18 +201,12 @@ if (LBANN_HAS_CUDA)
 
   set(LBANN_HAS_CUDNN ${CUDNN_FOUND})
 
-  if (LBANN_WITH_NCCL)
-    find_package(NCCL 2.0.0 REQUIRED)
-    set(LBANN_HAS_NCCL2 ${NCCL_FOUND})
-    if (NOT LBANN_HAS_NCCL2)
-      message(FATAL_ERROR
-        "Requested LBANN_WITH_NCCL but NCCL not found. "
-        "NCCL is now disabled. "
-        "Try specifying NCCL_DIR as the root of a NCCL install. "
-        "Alternatively, build with LBANN_WITH_NCCL=OFF.")
-      set(LBANN_WITH_NCCL OFF)
-    endif (NOT LBANN_HAS_NCCL2)
-  endif (LBANN_WITH_NCCL)
+  if (LBANN_HAS_ALUMINUM AND AL_HAS_NCCL)
+    set(LBANN_HAS_NCCL2 TRUE)
+  else ()
+    set(LBANN_HAS_NCCL2 FALSE)
+  endif ()
+
 endif (LBANN_HAS_CUDA)
 
 # This shouldn't be here, but is ok for now. This will occasionally be
@@ -219,6 +230,10 @@ if (LBANN_WITH_VTUNE)
   include(SetupVTune)
 endif ()
 
+if (LBANN_WITH_NVPROF)
+  set(LBANN_NVPROF TRUE)
+endif ()
+
 if (LBANN_WITH_CNPY)
   find_package(CNPY)
   set(LBANN_HAS_CNPY ${CNPY_FOUND})
@@ -246,23 +261,6 @@ if (LBANN_TOPO_AWARE)
   endif (NOT HWLOC_FOUND)
 endif (LBANN_TOPO_AWARE)
 
-if (LBANN_WITH_ALUMINUM)
-  find_package(ALUMINUM)
-  set(LBANN_HAS_ALUMINUM ${ALUMINUM_FOUND})
-  if (NOT LBANN_HAS_ALUMINUM)
-    message(FATAL_ERROR
-      "Requested LBANN_WITH_ALUMINUM but Aluminum not found. "
-      "Aluminum is now disabled. "
-      "Try specifying ALUMINUM_DIR as the root of an ALUMINUM install. "
-      "Alternatively, build with LBANN_WITH_ALUMINUM=OFF.")
-    set(LBANN_WITH_ALUMINUM OFF)
-  endif(NOT LBANN_HAS_ALUMINUM)
-  option(LBANN_BUILT_WITH_SPECTRUM "LBANN was built with Spectrum MPI" OFF)
-  if (LBANN_BUILT_WITH_SPECTRUM)
-    set(LBANN_ALUMINUM_MPI_PASSTHROUGH ON)
-  endif (LBANN_BUILT_WITH_SPECTRUM)
-endif (LBANN_WITH_ALUMINUM)
-
 if (LBANN_WITH_CONDUIT)
   find_package(CONDUIT)
   set(LBANN_HAS_CONDUIT ${CONDUIT_FOUND})
@@ -276,7 +274,7 @@ if (LBANN_WITH_CONDUIT)
 endif (LBANN_WITH_CONDUIT)
 
 # Handle the documentation
-add_subdirectory(doc)
+add_subdirectory(docs)
 
 ################################################################
 # Build LBANN
@@ -322,7 +320,7 @@ if (LBANN_TOPO_AWARE)
 endif ()
 
 if (LBANN_HAS_ALUMINUM)
-  target_link_libraries(lbann PUBLIC ALUMINUM::ALUMINUM)
+  target_link_libraries(lbann PUBLIC ${Aluminum_LIBRARIES})
 endif ()
 
 if (LBANN_HAS_CONDUIT)
@@ -334,8 +332,7 @@ endif ()
 if (LBANN_HAS_CUDA)
   target_link_libraries(lbann PUBLIC ${CUDA_LIBRARIES})
   target_link_libraries(lbann PUBLIC cuda::toolkit)
-  if (WITH_NVPROF)
-    add_definitions(-DLBANN_NVPROF)
+  if (LBANN_WITH_NVPROF)
     target_link_libraries(lbann PUBLIC ${NVTX_LIBRARIES})
   endif ()
   target_link_libraries(lbann PUBLIC ${cuBLAS_LIBRARIES})
@@ -360,7 +357,6 @@ target_link_libraries(lbann PUBLIC ${DL_LIBRARY})
 # Add the rest of the things
 add_subdirectory(model_zoo)
 add_subdirectory(model_zoo/tests)
-add_subdirectory(model_zoo/historical)
 add_subdirectory(tests)
 
 ################################################################
@@ -429,6 +425,7 @@ message(" LBANN_HAS_PROTOBUF: ${LBANN_HAS_PROTOBUF}")
 message(" LBANN_HAS_CNPY: ${LBANN_HAS_CNPY}")
 message(" LBANN_HAS_TBINF: ${LBANN_HAS_TBINF}")
 message(" LBANN_HAS_VTUNE: ${LBANN_HAS_VTUNE}")
+message(" LBANN_NVPROF: ${LBANN_NVPROF}")
 message(" LBANN_HAS_DOXYGEN: ${LBANN_HAS_DOXYGEN}")
 message(" LBANN_HAS_LBANN_PROTO:${LBANN_HAS_LBANN_PROTO}")
 message(" LBANN_HAS_ALUMINUM: ${LBANN_HAS_ALUMINUM}")

ReleaseNotes.txt

Lines changed: 204 additions & 0 deletions
============================== Release Notes: v0.95 ==============================
Support for new training algorithms:
- Generative Adversarial Networks (GAN)

Support for new network structures:
- Variational Autoencoders
- GAN
- CycleGAN
- Combined Autoencoders with CycleGAN
- Deep Recurrent Attention Model (DRAM), Ba et al. (2015)
- Video Recurrent Attention Model (VRAM)

Support for new layers:
- Optimized Top-K accuracy (CPU, GPU)
- Crop (CPU, GPU)
- Sort (CPU, GPU), both ascending and descending order
- Absolute value (CPU, GPU)
- Mean-squared (CPU, GPU)
- Top-K categorical accuracy (CPU, GPU)
- Cross-entropy (CPU, GPU)
- Stop gradient (CPU, GPU)

Performance optimizations:
- Use pinned memory for CPU activation matrices
- Non-blocking GPU computation of objective functions and metrics
- Refactored weight matrices and weight initialization
- Manage GPU workspace buffers with a memory pool
- Slice and concatenation layers emit matrix views where possible
- Use more fine-grained asynchronous calls when using the Aluminum library
- Minimized GPU stream synchronization events per call
- Improved / minimized synchronization events when using a single GPU
- Fixed GPU workspace size
- GPU implementation of the Adagrad optimizer
- GPU model-parallel softmax
- Optimized local CUDA kernel implementations
- Support for distributed matrices with arbitrary alignment

Model portability & usability:
- Keras to LBANN prototext conversion tool

Internal features:
- Support for multiple objective functions and metrics per network with arbitrary placement
- Objective functions represented as layers
- Metrics represented as layers
- Introduced evaluation layer construct
- Ability to freeze specific layers for pre-training / fine-tuning
- Refactored tensor setup in setup, forward prop, and back prop
- Layers store matrices in private smart pointers
- Model automatically inserts evaluation layers where needed
- Copy layer activations between models
- Annotated GPU profiling output with training phases
- Fixed initialization of the Comm object and Grid objects when using multiple models
- General code cleanup, refactoring, and various bug fixes
- All layers overwrite error signal matrices
- NCCL backend is now implemented via the Aluminum library
- MPI calls are routed through the LBANN Comm object into Hydrogen or Aluminum
- Provide a runtime statistics summary from every rank
- Reworked LBANN to use Hydrogen to manage GPU memory
- GPU allocations now go through a CUB memory pool
- Fixed Spack build interaction with the Hydrogen library

I/O & data readers:
- Support for Conduit objects with HDF5 formatting
- In-memory and locally offloaded data store
- Data store can hold the entire training set in memory (or node-local storage)
- Data store will shuffle data samples between epochs and present samples to the input layer
- Updated synthetic data reader
- Modified data readers to handle bad samples in JAG Conduit data
- Reworked the I/O layers (input and target) so that the input layer produces both the
  sample and label / response if necessary
- Target layer is being deprecated
- Updated image data reader to use cv::imdecode to accelerate image load times
- Allow users to specify an array of data sources for the independent/dependent
  variables via prototext

============================== Release Notes: v0.94 ==============================
Support for new training algorithms:
- Back-Propagation Through Time (BPTT)
-- Recurrent Neural Networks (RNN)
-- Long Short-Term Memories (LSTM)
- Generative Adversarial Networks (GAN)
- Variational autoencoders
- Convolutional autoencoders
- Fine-tuning of pretrained networks
-- Flexible weight freezing
- Context-prediction network (Siamese network)
- Livermore Tournament Fast Batch learning (LTFB)
- Variable mini-batch sizes

Support for new network structures:
- Directed Acyclic Graph (DAG) networks
- Residual networks
- Modular and composable objective functions
- Multiple metrics
- Shared weight matrices
- (BETA) New evaluation layer that can attach to any point of the DAG
- Motifs (compound, reused network patterns)

Support for new layers:
- Learning:
-- Deconvolution
- Metrics:
-- Top-K categorical accuracy, Pearson correlation, Mean absolute deviation
- Loss functions:
-- Cross Entropy with Uncertainty, Geometric negative log likelihood
-- Poisson negative log likelihood, Polya negative log likelihood
- Optimizers:
-- Hypergradient Adam
- Transform layers:
-- Concatenation, Noise, Unpooling, Pooling, Reshape, Slice, Split, Sum
- Regularizers:
-- Batch normalization, SELU dropout, Local Response Normalization (LRN)
- Activations:
-- Leaky ReLU, Smooth ReLU, ELU, Scaled ELU, Softplus, Atan,
-- Bent Identity, Exponential

Performance optimizations:
- GPU acceleration for most layers
- NCCL 2.x
- Optimized communication patterns
- Asynchronous weight updates
- Asynchronous metric and objective function updates
- Batch normalization (global and local)
- L2 normalization
- Adaptive Quantization (inter-model)

Model portability & usability:
- Portable checkpoint / recovery
- Distributed checkpoint / recovery
- Network visualization
- Export LBANN to TensorFlow format

Internal features:
- Gradient checking
- Network representation using tensor dimensions
- Bamboo continuous integration (CI)
- Improved data processing pipeline

New data readers:
- NumPy
- CSV
- Methods for merging multiple features and samples across files
- CANDLE Pilot 2
- CANDLE Pilot 1 Combo
- ICF JAG

Integration with Hydrogen, an optimized, distributed, dense linear algebra
library. Hydrogen is a fork of the Elemental library and optimizes distributed
matrices with elemental and block distributions, BLAS, LAPACK, and distributed
and local matrix management.

Integration with Aluminum, an optimized all-reduce communication library.
Aluminum provides custom reduction patterns, customized CUDA reduction kernels,
and asynchronous communication operators. It uses MPI, MPI with GPUDirect, or
NCCL as back-end libraries. Aluminum enables us to make effective use of
non-blocking all-reduces during backprop / optimization.

Additionally, we have added support for an online, distributed data store. When
enabled, LBANN is able to ingest the entire training data set in a distributed
fashion across all ranks. Each data store is then able to serve its portion of
a mini-batch, dynamically moving data to the necessary ranks in the model (based
on the mini-batch data distribution).

============================== Release Notes: v0.93 ==============================
This release contains a major refactoring / overhaul of the code base.
Key highlights include:
- Moving the layer design toward smaller, simpler layers that have a single
  compute behavior per layer. Specifically, linear combination of the
  inputs, non-linear activations, and regularizers now exist as their
  own layers.
- Layers now have a template parameter that specifies the data layout
  for the distributed matrices.
- The prototext interface for specifying neural network models and data
  readers is nearly fully functional.
- Code now adheres to the internal coding style as outlined in
  README_coding_style.txt.
- Dead code has been eliminated and the layer file hierarchy has been
  cleaned up.

============================== Release Notes: v0.92 ==============================
New features include (but are not limited to):
- Full support for convolutional and pooling layers
- GPU acceleration of local Elemental GEMM operations
- Improved network and data reader support
-- AlexNet
-- VGG
-- CIFAR-10
- Added a suite of regularizers, objective functions, and metrics, including:
-- Batch normalization
-- Dropout
-- L2
- Dramatically improved the performance of inter-model communication
- Added a suite of image preprocessing routines

============================== Release Notes: v0.91 ==============================
Incorporates a number of changes throughout the LBANN code base. In
particular, there is a new build system that has LBANN download all of its
dependencies into its build tree and compile them locally. Additional
improvements include optimizations in the data-parallel, multiple-model
training framework, support for convolutional layers, and general bug fixes.

============================== Release Notes: v0.90 ==============================
Initial release of the LBANN toolkit.
