Commit 018018b

Merge branch 'release-v0.99'
============================== Release Notes: v0.99 ==============================
Support for new training algorithms:
- Improvements to LTFB infrastructure (including transfer of SGD and Adam hyperparameters)

Support for new network structures:
- Support for Wide ResNets

Support for new layers:

Python front-end:
- Python front-end for generating neural network architectures (lbann namespace), including layers, objective functions, callbacks, metrics, and optimizers.
- Python interface for launching jobs (SLURM or LSF) on HPC systems
- Support for running LBANN experiments and capturing experimental output
- Network templates for AlexNet, LeNet, arbitrary ResNet models, and Wide ResNet models
- Python scripts for LeNet, AlexNet, and (Wide) ResNets in model zoo.

Performance optimizations:
- GPU implementation of RMSprop optimizer.
- cuDNN convolution algorithms are determined by empirically measuring performance rather than using heuristics.
- Avoid setting up unused bias weights.
- Perform gradient accumulations in-place when possible.

Model portability & usability:

Internal features:
- Weight gradient allreduces are in-place rather than on a staging buffer.
- Fully connected and convolution layers only create bias weights when needed.
- Optimizer exposes gradient buffers so they can be updated in-place.
- Added callback support to explicitly save the model
- Min-max metric for reporting on multiple LTFB trainers
- Cleanup of Hydrogen interface to match Hydrogen v1.2.0
- Added type-erased matrix class for internal refactoring
- Make CUB always log performance-critical events

I/O & data readers:
- Python data reader that interacts with an embedded Python session.
- Optimized data store to provide preload option
- Extended data store to operate with Cosmoflow-numpy data reader

Build system:
- Added documentation for how users can use Spack to install LBANN either directly or via environments.
- Conduit is a required dependency.
- Provided Spack environment for installing LBANN as a user
- Improved documentation on lbann.readthedocs.io
- CMake installs a module file in the installation directory that sets up PATH and PYTHONPATH variables appropriately

Bug fixes:
- Models can now be copied or set up multiple times.
- Fixed incorrect weight initialization with multiple trainers.
- Updated I/O random number generators to be C++ thread safe (rather than OpenMP)
- Added an I/O random number generator for preprocessing that is independent of the data sequence RNG.
- Fixed initialization order of RNGs and multiple models / trainers.
- General fixes for I/O and LTFB interaction.

Retired features:
- "Zero" layer (hack for early GAN implementation).
- Removed data reader specific implementations of data store (in favor of Conduit-based data store)
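The cuDNN item in the notes above replaces heuristic algorithm selection with empirical measurement: time every candidate on the real workload and keep the fastest. A minimal Python sketch of that pattern, assuming each candidate can be invoked as a zero-argument callable; `pick_fastest`, `time_once`, and the candidate names are hypothetical and do not correspond to LBANN or cuDNN APIs:

```python
import time
from typing import Callable, Dict


def time_once(run: Callable[[], None]) -> float:
    """Wall-clock seconds for a single call of `run`."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def pick_fastest(candidates: Dict[str, Callable[[], None]], trials: int = 3) -> str:
    """Return the name of the candidate with the lowest best-of-`trials` time.

    Hypothetical helper: speed is measured, not predicted by a heuristic.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Best-of-N damps one-off noise such as cache misses or scheduler hiccups.
    timings = {name: min(time_once(run) for _ in range(trials))
               for name, run in candidates.items()}
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Stand-ins for convolution algorithms with different costs.
    algorithms = {
        "slow_algo": lambda: time.sleep(0.02),
        "fast_algo": lambda: None,
    }
    print(pick_fastest(algorithms))
```

On a GPU, a faithful benchmark would also warm up and synchronize the device before reading the clock, and would typically cache the winner per layer configuration so the search runs only once. cuDNN itself provides `cudnnFindConvolutionForwardAlgorithm` for this kind of measurement; this commit page does not show which routine LBANN calls.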
2 parents 321c436 + a8e0635 commit 018018b

850 files changed (+25225, -18065 lines)

.readthedocs.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# .readthedocs.yml
+
+build:
+  image: latest
+
+python:
+  version: 3.7

CMakeLists.txt

Lines changed: 157 additions & 19 deletions
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.8)
+cmake_minimum_required(VERSION 3.12)
 
 project(LBANN CXX)
 
@@ -48,8 +48,8 @@ endif ()
 #
 
 set(LBANN_VERSION_MAJOR 0)
-set(LBANN_VERSION_MINOR 98)
-set(LBANN_VERSION_PATCH 1)
+set(LBANN_VERSION_MINOR 99)
+set(LBANN_VERSION_PATCH 0)
 
 set(LBANN_VERSION "${LBANN_VERSION_MAJOR}.${LBANN_VERSION_MINOR}.${LBANN_VERSION_PATCH}")
 
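`LBANN_VERSION` joins the three components with dots, so this hunk moves the project from 0.98.1 to 0.99.0. Scripts that compare such version strings should compare them as numeric tuples rather than as text. A short Python illustration; the `parse_version` helper is hypothetical and not part of LBANN:

```python
def parse_version(text):
    """Split a "MAJOR.MINOR.PATCH" string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


old, new = parse_version("0.98.1"), parse_version("0.99.0")
# Tuples compare element by element, so the minor bump outweighs the patch reset.
print(new > old)

# Text comparison breaks once a component gains a digit:
# "0.100.0" sorts before "0.99.0" as a string, but not as a tuple.
print("0.100.0" < "0.99.0", parse_version("0.100.0") > parse_version("0.99.0"))
```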
@@ -100,7 +100,7 @@ option(LBANN_WITH_ALUMINUM "Enable Aluminum all-reduce library" OFF)
 
 option(LBANN_WITH_CNPY "Include cnpy" ON)
 
-option(LBANN_WITH_CONDUIT "Enable Conduit library" OFF)
+option(LBANN_WITH_CONDUIT "Enable Conduit library" ON)
 
 option(LBANN_WITH_CUDNN "Include Nvidia cuDNN" ON)
 
@@ -110,12 +110,17 @@ option(LBANN_WITH_HWLOC
 option(LBANN_WITH_NVPROF
 "Enable NVTX-based instrumentation for nvprof" OFF)
 
-option(LBANN_WITH_TBINF "Include Tensorboard interface" ON)
+option(LBANN_WITH_PYTHON
+"Install Python frontend and enable embedded Python" ON)
 
+option(LBANN_WITH_TBINF "Include Tensorboard interface" ON)
 
 option(LBANN_WITH_VTUNE
 "Link the Intel VTune profiling library" OFF)
 
+option(LBANN_WITH_UNIT_TESTING
+"Enable the unit testing framework (requires Catch2)" OFF)
+
 # Enable parallel random matrix generation, if possible
 option(LBANN_DETERMINISTIC
 "Use deterministic algorithms as much as possible." OFF)
@@ -167,12 +172,12 @@ set(LBANN_HAS_CEREAL ${CEREAL_FOUND})
 # The imported target is just called "cereal". Super.
 
 # Setup the linear algebra library
-find_package(Hydrogen 1.1.0 NO_MODULE QUIET
+find_package(Hydrogen 1.2.0 NO_MODULE QUIET
 HINTS ${Hydrogen_DIR} ${HYDROGEN_DIR} $ENV{Hydrogen_DIR} $ENV{HYDROGEN_DIR}
 PATH_SUFFIXES lib/cmake/hydrogen
 NO_DEFAULT_PATH)
 if (NOT Hydrogen_FOUND)
-find_package(Hydrogen 1.1.0 NO_MODULE QUIET REQUIRED)
+find_package(Hydrogen 1.2.0 NO_MODULE QUIET REQUIRED)
 endif ()
 message(STATUS "Found Hydrogen: ${Hydrogen_DIR}")
 set(LBANN_HAS_HYDROGEN ${Hydrogen_FOUND})
@@ -209,13 +214,13 @@ endif ()
 if (LBANN_WITH_ALUMINUM)
 # Aluminum may have already been found by Hydrogen
 if (NOT Aluminum_FOUND)
-find_package(Aluminum NO_MODULE QUIET
+find_package(Aluminum 0.2.0 NO_MODULE QUIET
 HINTS ${Aluminum_DIR} ${ALUMINUM_DIR} ${AL_DIR}
 $ENV{Aluminum_DIR} $ENV{ALUMINUM_DIR} $ENV{AL_DIR}
 PATH_SUFFIXES lib64/cmake/aluminum lib/cmake/aluminum
 NO_DEFAULT_PATH)
 if (NOT Aluminum_FOUND)
-find_package(Aluminum NO_MODULE QUIET)
+find_package(Aluminum 0.2.0 NO_MODULE QUIET)
 endif ()
 endif ()
 set(LBANN_HAS_ALUMINUM ${Aluminum_FOUND})
@@ -287,6 +292,29 @@ if (LBANN_WITH_TBINF)
 add_subdirectory(external/TBinf)
 endif ()
 
+# Find Python
+# Note: This uses the Python module in cmake/modules, not the module
+# that comes included with CMake. See the file for a discussion of the
+# differences.
+if (LBANN_WITH_PYTHON)
+find_package(Python REQUIRED)
+set(LBANN_HAS_PYTHON "${Python_FOUND}")
+if (NOT Python_VERSION_MAJOR EQUAL 3)
+set(LBANN_HAS_PYTHON FALSE)
+message(FATAL_ERROR "Python 2 is not supported.")
+endif ()
+
+# Setup the installation stuff
+set(PYTHON_INSTALL_PREFIX "${CMAKE_INSTALL_PREFIX}"
+CACHE PATH "The prefix for the python installation")
+
+set(CMAKE_INSTALL_PYTHONDIR
+"lib/python${Python_VERSION_MAJOR}.${Python_VERSION_MINOR}/site-packages"
+CACHE PATH
+"Relative path from PYTHON_INSTALL_PREFIX to the python package install")
+
+endif (LBANN_WITH_PYTHON)
+
 if (LBANN_WITH_VTUNE)
 find_package(VTune MODULE)
 
@@ -305,7 +333,7 @@ if (LBANN_WITH_VTUNE)
 endif (VTune_FOUND)
 endif (LBANN_WITH_VTUNE)
 
-if (LBANN_WITH_NVPROF)
+if (LBANN_WITH_CUDA AND LBANN_WITH_NVPROF)
 set(LBANN_NVPROF TRUE)
 endif ()
 
@@ -336,15 +364,15 @@ if (LBANN_WITH_CONDUIT)
 message(STATUS "Found HDF5: ${HDF5_DIR}")
 endif ()
 
-find_package(CONDUIT CONFIG QUIET
-HINTS ${CONDUIT_DIR} $ENV{CONDUIT_DIR}
+find_package(Conduit CONFIG QUIET
+HINTS ${Conduit_DIR} $ENV{Conduit_DIR} ${CONDUIT_DIR} $ENV{CONDUIT_DIR}
 PATH_SUFFIXES lib64/cmake lib/cmake
 NO_DEFAULT_PATH)
-if (NOT CONDUIT_FOUND)
-find_package(CONDUIT CONFIG QUIET REQUIRED
+if (NOT Conduit_FOUND)
+find_package(Conduit CONFIG QUIET REQUIRED
 PATH_SUFFIXES lib64/cmake lib/cmake)
 endif ()
-message(STATUS "Found CONDUIT: ${CONDUIT_DIR}")
+message(STATUS "Found CONDUIT: ${Conduit_DIR}")
 
 # Ugh. I don't like that this requires intimate knowledge of
 # specific targets that CONDUIT exports. It should support
@@ -402,9 +430,28 @@ if (LBANN_WITH_CONDUIT)
 "${_conduit_interface_link_libs}")
 
 set(CONDUIT_LIBRARIES conduit::conduit)
-set(LBANN_HAS_CONDUIT ${CONDUIT_FOUND})
+set(LBANN_HAS_CONDUIT ${Conduit_FOUND})
 endif (LBANN_WITH_CONDUIT)
 
+if (LBANN_WITH_UNIT_TESTING)
+find_package(Catch2 2.0.0 CONFIG QUIET
+HINTS ${CATCH2_DIR} $ENV{CATCH2_DIR} ${CATCH_DIR} $ENV{CATCH_DIR}
+PATH_SUFFIXES lib64/cmake/Catch2 lib/cmake/Catch2
+NO_DEFAULT_PATH)
+if (NOT Catch2_FOUND)
+find_package(Catch2 2.0.0 CONFIG QUIET REQUIRED)
+endif ()
+message(STATUS "Found Catch2: ${Catch2_DIR}")
+
+# Now that Catch2 has been found, start adding the unit tests
+include(CTest)
+include(Catch)
+add_subdirectory(src/utils/unit_test)
+
+# Add this one last
+add_subdirectory(unit_test)
+endif (LBANN_WITH_UNIT_TESTING)
+
 # Handle the documentation
 add_subdirectory(docs)
 
@@ -430,6 +477,10 @@ target_include_directories(lbann PUBLIC
 $<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
 $<INSTALL_INTERFACE:${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}>)
 
+if (LBANN_HAS_PYTHON)
+target_include_directories(lbann PUBLIC ${Python_INCLUDE_DIRS})
+endif ()
+
 # Use the IMPORTED targets when possible.
 target_link_libraries(lbann PUBLIC LbannProto)
 target_link_libraries(lbann PUBLIC cereal)
@@ -460,6 +511,10 @@ if (LBANN_HAS_VTUNE)
 target_link_libraries(lbann PUBLIC ${VTUNE_STATIC_LIB})
 endif ()
 
+if (LBANN_HAS_PYTHON)
+target_link_libraries(lbann PUBLIC ${Python_LIBRARIES})
+endif ()
+
 if (TARGET LBANN_CXX_FLAGS_werror)
 target_link_libraries(lbann PUBLIC LBANN_CXX_FLAGS_werror)
 endif ()
@@ -516,8 +571,8 @@ export(EXPORT LBANNTargets NAMESPACE LBANN:: FILE LBANNTargets.cmake)
 
 # Write the configure file for the install tree
 set(INCLUDE_INSTALL_DIRS include)
-set(LIB_INSTALL_DIR lib)
-set(CMAKE_INSTALL_DIR lib/cmake/lbann)
+set(LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR})
+set(CMAKE_INSTALL_DIR ${LIB_INSTALL_DIR}/cmake/lbann)
 set(EXTRA_CMAKE_MODULE_DIR)
 configure_package_config_file(cmake/configure_files/LBANNConfig.cmake.in
 "${CMAKE_BINARY_DIR}/LBANNConfig.cmake.install"
@@ -559,6 +614,64 @@ install(
 FILES "${PROJECT_BINARY_DIR}/lbann_config.hpp"
 DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}")
 
+# Install Python frontend
+# Note (tym): Python best practices are to put setup.py at the package
+# root and setuptools only accepts relative paths. However, we need to
+# insert a config file containing install-specific file paths and make
+# sure setup.py can pick it up. I see three approaches for the build
+# process:
+# 1) Inject the config file into a known location in the source
+# directory so that setup.py can pick it up.
+# 2) Copy the Python source tree into the build directory and insert
+# setup.py and the config file.
+# 3) Create setup.py and the config file in the build directory and
+# pass the source directory as a relative path.
+# We go for option 3 since it's simple and lightweight, but it runs
+# counter to the intent of setuptools. If we learn about any nicer
+# approaches, we should use them.
+if (LBANN_HAS_PYTHON)
+
+# Construct config file
+# NOTE (trb): python_config.ini is installed by setup.py
+set(_PYTHON_CONFIG_INI ${CMAKE_BINARY_DIR}/python_config.ini)
+set(_LBANN_PB2_PY ${PYTHON_INSTALL_PREFIX}/${CMAKE_INSTALL_PYTHONDIR}/lbann_pb2.py)
+set(_LBANN_EXE ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/lbann)
+configure_file(
+"${CMAKE_SOURCE_DIR}/cmake/configure_files/python_config.ini.in"
+"${_PYTHON_CONFIG_INI}"
+@ONLY)
+
+# Construct setup.py
+set(_SETUP_PY ${CMAKE_BINARY_DIR}/setup.py)
+set(_LBANN_PYTHON_DIR "${CMAKE_SOURCE_DIR}/python")
+configure_file(
+"${CMAKE_SOURCE_DIR}/cmake/configure_files/setup.py.in"
+"${_SETUP_PY}"
+@ONLY)
+
+# Install Python package with setuptools
+set(_PY_INSTALL_DIR "${PYTHON_INSTALL_PREFIX}/${CMAKE_INSTALL_PYTHONDIR}")
+set(_SETUP_PY_ARGS
+"${_SETUP_PY_ARGS} --root ${_PY_INSTALL_DIR} --install-lib . --install-data .")
+install(CODE
+"execute_process(COMMAND ${Python_EXECUTABLE} ${_SETUP_PY} install ${_SETUP_PY_ARGS})")
+
+set(_PY_INSTALL_MSG
+"
+\n**********************************************************************
+
+A Python package has been installed to ${_PY_INSTALL_DIR}. To use
+this package, be sure to add this directory to your PYTHONPATH, e.g.:
+
+export PYTHONPATH=${_PY_INSTALL_DIR}:\\$\{PYTHONPATH\}
+
+**********************************************************************\n
+")
+install(CODE
+"execute_process(COMMAND ${CMAKE_COMMAND} -E echo \"${_PY_INSTALL_MSG}\")")
+
+endif (LBANN_HAS_PYTHON)
+
 # Install contributor list, license, readme
 install(
 FILES "${PROJECT_SOURCE_DIR}/CONTRIBUTORS"
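Taken together, the Python hunks require a Python 3 interpreter, install the package under `${PYTHON_INSTALL_PREFIX}/lib/pythonX.Y/site-packages`, and print a reminder to prepend that directory to `PYTHONPATH`. A minimal Python sketch of the same path logic, e.g. for a launch script that must locate the installed package; the helper names and the `/opt/lbann` prefix are hypothetical, and the real directory is whatever CMake was configured with:

```python
import os
import sys
from pathlib import PurePosixPath


def python_install_dir(prefix, version=sys.version_info):
    """Build <prefix>/lib/pythonX.Y/site-packages, as CMAKE_INSTALL_PYTHONDIR does."""
    major, minor = version[0], version[1]
    if major != 3:
        # The CMake configuration stops with FATAL_ERROR in this case.
        raise RuntimeError("Python 2 is not supported.")
    return str(PurePosixPath(prefix, "lib", f"python{major}.{minor}", "site-packages"))


def prepend_pythonpath(install_dir, env=None):
    """Return a copy of `env` with install_dir first on PYTHONPATH.

    Like `export PYTHONPATH=<dir>:$PYTHONPATH`, except that no empty
    entry is left behind when PYTHONPATH was previously unset.
    """
    new_env = dict(os.environ if env is None else env)
    old_entries = [p for p in new_env.get("PYTHONPATH", "").split(os.pathsep) if p]
    new_env["PYTHONPATH"] = os.pathsep.join([install_dir] + old_entries)
    return new_env


if __name__ == "__main__":
    site_dir = python_install_dir("/opt/lbann", (3, 7))
    env = prepend_pythonpath(site_dir, {"PYTHONPATH": "/home/user/py"})
    print(site_dir)
    print(env["PYTHONPATH"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting a process that imports the installed package.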
@@ -583,8 +696,10 @@ macro(append_str_tf STRING_VAR)
 math(EXPR _num_spaces "${_max_length} - ${_var_length}")
 lbann_get_space_string(_spaces ${_num_spaces})
 if (${var})
+set(${var} "TRUE")
 string(APPEND ${STRING_VAR} " ${var}:" "${_spaces}" "TRUE\n")
 else ()
+set(${var} "FALSE")
 string(APPEND ${STRING_VAR} " ${var}:" "${_spaces}" "FALSE\n")
 endif ()
 endforeach()
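Because a CMake macro executes in its caller's scope, the two added `set` calls rewrite every listed flag to the literal `TRUE` or `FALSE`, not only the printed summary line. A simplified Python model of the macro, under stated assumptions: `cmake_truthy` approximates CMake's `if(<variable>)` rule (undefined, or a false constant such as `OFF`, `0`, or a `-NOTFOUND` suffix, is false), `width` stands in for `_max_length`, which is computed outside this hunk, and the single-space indent follows the `string(APPEND ...)` call as shown. The names are otherwise hypothetical:

```python
# Simplified list of CMake "false constants" (compared case-insensitively).
FALSE_CONSTANTS = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def cmake_truthy(value):
    """Approximate if(<variable>): undefined (None) or a false constant is false."""
    if value is None:
        return False
    upper = value.upper()
    return not (upper in FALSE_CONSTANTS or upper.endswith("-NOTFOUND"))


def append_str_tf(summary, flags, width, indent=" "):
    """Append one aligned 'NAME: TRUE|FALSE' line per flag to `summary`.

    Like the macro, each flag is first rewritten in `flags` to the literal
    "TRUE" or "FALSE" (the macro does this in its caller's scope).
    """
    for name in flags:
        flags[name] = "TRUE" if cmake_truthy(flags[name]) else "FALSE"
        spaces = " " * max(width - len(name), 0)  # mirrors _max_length - _var_length
        summary += f"{indent}{name}:{spaces}{flags[name]}\n"
    return summary


if __name__ == "__main__":
    flags = {"LBANN_HAS_CONDUIT": "1", "LBANN_HAS_PYTHON": "OFF"}
    print(append_str_tf("", flags, width=20), end="")
    print(flags)
```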
@@ -632,10 +747,33 @@ append_str_tf(_str
 LBANN_HAS_DOXYGEN
 LBANN_HAS_LBANN_PROTO
 LBANN_HAS_ALUMINUM
-LBANN_HAS_CONDUIT)
+LBANN_HAS_CONDUIT
+LBANN_HAS_PYTHON)
 string(APPEND _str
 "\n== End LBANN Configuration Summary ==\n")
 
 # Output to stdout
 execute_process(COMMAND ${CMAKE_COMMAND} -E echo "${_str}")
 set(_str)
+
+#
+# Write a basic modulefile
+#
+set(LBANN_MODULEFILE_NAME "lbann-${LBANN_VERSION}.lua"
+CACHE STRING
+"The name of the LBANN modulefile to install. Must end in .lua.")
+
+if (NOT (LBANN_MODULEFILE_NAME MATCHES ".+\.lua"))
+message(WARNING
+"LBANN_MODULEFILE_NAME must have extension \".lua\". Appending.")
+set(LBANN_MODULEFILE_NAME "${LBANN_MODULEFILE_NAME}.lua"
+CACHE STRING "" FORCE)
+endif ()
+
+configure_file(
+"${CMAKE_SOURCE_DIR}/cmake/configure_files/lbann_module.lua.in"
+"${CMAKE_BINARY_DIR}/lbann_module.lua.install"
+@ONLY)
+install(FILES "${CMAKE_BINARY_DIR}/lbann_module.lua.install"
+RENAME "${LBANN_MODULEFILE_NAME}"
+DESTINATION "${CMAKE_INSTALL_SYSCONFDIR}/modulefiles")
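The modulefile block appends `.lua` when `LBANN_MODULEFILE_NAME` does not match `.+\.lua`. CMake's `MATCHES` searches for the pattern anywhere in the string, and its quoted-argument escape rules turn `\.` into a plain `.` before the regex engine sees it, so a name such as `lbann.lua.bak` passes the check. A Python sketch contrasting that unanchored test with an anchored suffix check; `ensure_lua_suffix` is a hypothetical helper, not part of LBANN:

```python
import re

# What the CMake test effectively evaluates: an unanchored search for ".+.lua".
CMAKE_STYLE = re.compile(r".+.lua")
# Intended check: a non-empty stem followed by a literal ".lua" at the very end.
ANCHORED = re.compile(r".+\.lua")


def ensure_lua_suffix(name: str) -> str:
    """Append ".lua" unless `name` already is a non-empty stem plus ".lua"."""
    return name if ANCHORED.fullmatch(name) else name + ".lua"


if __name__ == "__main__":
    for candidate in ["lbann-0.99.0.lua", "lbann-0.99.0", "lbann.lua.bak"]:
        # Columns: name, CMake-style match result, name after the anchored fix-up.
        print(candidate, bool(CMAKE_STYLE.search(candidate)), ensure_lua_suffix(candidate))
```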

LICENSE

Lines changed: 3 additions & 4 deletions
@@ -1,5 +1,5 @@
-Copyright (c) 2014-2016, Lawrence Livermore National Security, LLC.
-Produced at the Lawrence Livermore National Laboratory.
+Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC.
+Produced at the Lawrence Livermore National Laboratory.
 Written by the LBANN Research Team (B. Van Essen, et al.) listed in
 the CONTRIBUTORS file. <[email protected]>
 
@@ -8,7 +8,7 @@ All rights reserved.
 
 This file is part of LBANN: Livermore Big Artificial Neural Network
 Toolkit. For details, see http://software.llnl.gov/LBANN or
-https://github.com/LLNL/LBANN.
+https://github.com/LLNL/LBANN.
 
 Licensed under the Apache License, Version 2.0 (the "Licensee"); you
 may not use this file except in compliance with the License. You may
@@ -21,4 +21,3 @@ distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
 implied. See the License for the specific language governing
 permissions and limitations under the license.
-
README.md

Lines changed: 15 additions & 3 deletions
@@ -21,9 +21,17 @@ methods.
 
 
 ## Building LBANN
-A few options for building LBANN are documented
-[here](docs/BuildingLBANN.md#top).
+The preferred method for LBANN users to install LBANN is to use
+[Spack](https://github.com/llnl/spack). After some system
+configuration, this should be as straightforward as
 
+```bash
+spack install lbann
+```
+
+More detailed instructions for building and installing LBANN are
+available at the [main LBANN
+documentation](https://lbann.readthedocs.io/en/latest/index.html).
 
 ## Running LBANN
 The basic template for running LBANN is
@@ -42,8 +50,12 @@ optimized for the case in which one assigns one GPU per MPI
 the MPI launcher.
 
 More details about running LBANN are documented
-[here](docs/RunningLBANN.md#top).
+[here](https://lbann.readthedocs.io/en/latest/running_lbann.html).
+
+## Publications
 
+A list of publications, presentations and posters are shown
+[here](https://lbann.readthedocs.io/en/latest/publications.html).
 
 ## Reporting issues
 Issues, questions, and bugs can be raised on the [Github issue
