3rd EasyBuild hackathon meeting minutes day 1
boegel edited this page Mar 15, 2013
(Monday Mar. 11th 2013, 10am-6pm)
The first day of the 3rd EasyBuild hackathon consisted of presentations, discussions and initial hands-on experience with EasyBuild for attendees new to the tool. These notes were mainly taken by Kenneth and Jens, with contributions by Fotis.
- Kenneth Hoste (HPC-UGent, EasyBuild developer and release manager)
- Jens Timmerman (HPC-UGent, EasyBuild developer)
- Fotis Georgatos (University of Luxembourg, HPC sysadmin and active contributor)
- Jens Wiegand (The Cyprus Institute, LinkSCEEM project manager)
- Thekla Loizou (The Cyprus Institute, HPC user support)
- George Tsouloupas (The Cyprus Institute, HPC sysadmin and user support)
- Stelios Erotokritou (The Cyprus Institute, HPC user support/PRACE)
- Mohamed Gafaar (Bibliotheca Alexandrina, HPC sysadmin/user support)
- Dina Mahmoud Ibrahim (Cairo University, HPC sysadmin/user support)
- Alan O'Cais (Jülich Supercomputing Centre, HPC user support & LinkSCEEM)
- Alexander Schnurpfeil (Jülich Supercomputing Centre, HPC user support)
- Nicolas ?? (??, HPC user (OpenFOAM)) FIXME
- George Fanourgakis (The Cyprus Institute, HPC user, molecular dynamics)
- Demetris Charalambous (Cyprus Meteorological Service, HPC user support?, weather forecasting (WRF, ...)) (or OpenFOAM??) FIXME
- Ioanna Kalvari (University of Cyprus (bioinformatics), HPC user)
- Ioannis Kirmitzoglou (University of Cyprus (bioinformatics), HPC user)
- Adam DeConinck (NVIDIA Corporation, HPC sysadmin) [remote via Skype]
- [10am-10.10am] presentation on LinkSCEEM project
- [10.10am-10.15am] introduction round: who's who?
- [10.15am-12pm] presentation on EasyBuild: Building Software With Ease
- [1.30pm - 1.45pm] presentation by Jülich Supercomputing Centre (JSC) on current activities and plans with EasyBuild
- [1.45pm - 2.30pm] presentation by Cyprus Institute on current activities and plans with EasyBuild
- [2.30pm - 5pm] discussions + initial hands-on experience with EasyBuild
- [5pm - 6pm] presentation by NVIDIA: Introduction to the CUDA Toolkit for Building Applications
- [6pm - 8pm] aftermath: discussions w.r.t. CUDA support in EasyBuild
- goal of LinkSCEEM project: establish HPC ecosystem in Eastern Mediterranean
- resources, training, expertise, connectivity, ...
- online training content is very important!
- with support from NCSA, Jülich, ..
- www.isgtw.org
- CSC2013 PRACE conference in Cyprus
- Alan's subject: performance analysis and optimisation of community codes
- `--download-only` command line option is missing [feature request]; can be done indirectly now:
- using `--stop fetch`, but this will fail after the first failed download
- using `--regtest`, and 'breaking' `--job` so no jobs are submitted for the builds
- EasyBuild bootstrap script [question]
- why not include a fixed Python 2.7 version in the bootstrap procedure, so we have full control over the Python version being used
- add a Python module as a dependency for EasyBuild
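A hedged sketch of the indirect download-only workaround mentioned above, using the `--stop` option to halt right after the fetch step (the easyconfig filename here is just an example):

```shell
# Fetch sources only: run EasyBuild up to and including the 'fetch' step,
# so nothing gets configured or built. The easyconfig name is illustrative.
eb WRF-3.4.1-goolf-1.4.10.eb --robot --stop fetch
```

As noted, this aborts entirely if any single download fails, which is why a real `--download-only` option was requested.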
- why does something have to be part of a toolchain, and not just a dependency? [question]
- can toolchain be extended dynamically?
- e.g. include dependencies in toolchain as well?
- `ictce` => `ictcee` (`e` for extended)?
- creating yet another toolchain means that the whole stack of dependencies needs to be rebuilt, which is a pain [question]
- and results in further explosion of the set of available modules
- create big fat toolchains and filter stuff out in `toolchainopts` with the `filter` option?
- are existing modules reused if they're needed? [answer: yes]
- supporting alternative module naming schemes
- basically just an alternative view on the existing modules (?)
- flat or hierarchical
- hierarchical can be top-down (compiler->libraries->apps) or vice versa (software on top, which is what the users care about)
- setting up mirrors for sources
- `--try-amend` source URL should be supported via the EasyBuild configuration file
- currently already supported via the `EASYBUILD_TRY_AMEND` env var
- Trilinos should be in bold on the slide with supported software
- can `build_in_install_dir` be specified in an easyconfig file?
- `ictce/3.2.2.u3` toolchain sources are no longer available, so use another toolchain in examples (WRF)
- chroot into installation prefix, as an alternative to loading modules
- customization of modules, sed through existing modules and change what it needed
- can EasyBuild cope with existing modules (e.g., OpenMPI), or do they need to be rebuilt?
- will not work out of the box, because those modules will be missing things like EBROOT, EBVERSION
- rebuilding them is the best idea, and will enable you to roll out software again after reinstall of system with different OS
- adding support for BlueGene Q system
- perfect timeframe since it's quite new
- specific characteristic: crosscompilation, IBM XLC compiler, ...
- on BlueGene systems, running of tests will need to be skipped or done differently (remotely)
- skipping can be done with e.g. `--try-amend=skipsteps=test,test_cases`
- `OpenFOAM` is a pain because of large differences in system characteristics
- error/warning log parser now spits out lots of false positives
- regular expression used needs to be documented well
- need to enhance regex to reduce amount of false positives
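The test-skipping approach mentioned above (for systems like BlueGene/Q where tests cannot run locally) could look like this; the easyconfig name is illustrative:

```shell
# Skip the test and test_cases steps without editing the easyconfig itself
eb WRF-3.4.1-goolf-1.4.10.eb --try-amend=skipsteps=test,test_cases
```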
- `bbcp` can never work if the required ports are not open
- add a test case for this?
- support a way of spitting out a warning about this at the end of the installation
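A minimal sketch of such a pre-flight warning; the host and port below are placeholders, since bbcp's data ports are site-configurable:

```shell
# Warn up front if the (site-specific) bbcp port cannot be reached,
# instead of letting the transfer fail obscurely later on
nc -z -w 5 mirror.example.org 5031 \
  || echo "WARNING: bbcp port 5031 on mirror.example.org is not reachable" >&2
```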
- configuration file (and .eb files) will not be executed anymore => needs to be followed up (w/ Stelios)
- more stuff needs to be shoved into toolchains: Python (George T.), zlib, ...
- document where to put source files
- supercomputing training portal: http://linksceem.eu/ls2/component/content/article/198
- big fat training cluster via VMs w/ terminal emulation in web browser
- rely on EasyBuild to get software installed on this
- should function well in a low-bandwidth environment (important for LinkSCEEM)
- documentation portal w.r.t. supercomputers, just collect links to useful documentation around the web
- try and get a toolchain working for BlueGene Q systems
- categories of software used in PRACE
- strong commitment from Cyprus Institute to EasyBuild
- was really useful to quickly set up a software stack
- GPU software, even though not all apps were there
- useful for conformity across LinkSCEEM institutions
- also for setting up post-processing nodes (w/ different OS)
- robot should not depend on filename, but on contents of easyconfig
- need support for multiple source paths (and dependencies? => Jens T.)
- `--download-only` (from mirrors)
- document jail tool (and add it to bootstrap)
- goolf: OpenMPI 1.5, OpenBLAS, FFTW w/ `--enable-avx`
- custom variables in module files (see `modextravars`)
- CUDA support: in toolchain or not?
- different CUDA versions, dependency chain, ...
- PGI toolchain is also important for CI
- OFED vs no-OFED: easy way to get rid of `-no-OFED` version suffix? via EB config file?
- user environment: multiple source paths, custom version suffix for 'tagging' your own builds
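The `modextravars` easyconfig parameter mentioned above takes a dict of custom environment variables that end up in the generated module file; a minimal sketch (the variable name and path are purely illustrative):

```shell
# Append a modextravars definition to an (example) easyconfig file;
# each key/value pair becomes an environment variable set by the module
cat >> example.eb << 'EOF'
modextravars = {'WRF_DATA_DIR': '/shared/data/wrf'}
EOF
grep modextravars example.eb
```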
- FFTW single/double precision
- separate module (and thus separate toolchains) vs 'fat' build
- support both single/double requires running configure/make/make install twice
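The double-pass FFTW build can be sketched as follows (the prefix is an example; `--enable-single` is the FFTW 3.x configure flag for the single-precision library):

```shell
# Pass 1: default double-precision build (libfftw3)
./configure --prefix=$HOME/software/FFTW/3.3.3 --enable-shared
make && make install
# Pass 2: single-precision build into the same prefix (libfftw3f alongside libfftw3)
make distclean
./configure --prefix=$HOME/software/FFTW/3.3.3 --enable-shared --enable-single
make && make install
```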
- local climate group requirements: Ferret, ... (see George F.)
- Python as a part of the toolchain?
- managing multiple EB versions
- need a guidelines and best practices for EasyBuild
- what should sites customize? what should be left untouched (e.g. because it'll break in the future with new EB versions)
- how to override default use of `/tmp` (set `$TMPDIR`, which will be picked up by Python)
- explosion of available modules
- we need a good way to handle that
- in combination with flexible module naming scheme?
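The `$TMPDIR` override mentioned above can be verified directly, since Python's `tempfile` module honours it (the path is an example):

```shell
# Point temporary build directories away from /tmp;
# Python (and thus EasyBuild) will pick the new location up
export TMPDIR=/var/tmp/eb-build-tmp
mkdir -p "$TMPDIR"
# tempfile.gettempdir() should now report the new location
python3 -c 'import tempfile; print(tempfile.gettempdir())'
```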
- figure out exact build options that were used
- document querying log for e.g. configure options
- or provide tools for it (via `eb`)
- issue of OS dependencies: portable way of specifying them
- making sure we catch all dependencies (jail tool provided by HashDist)
- `goalf`/`ictce` versioning schemes need to be documented
- e.g. add `--enable-avx` to FFTW but keep toolchain version the same?
- sidenote: `--enable-avx` with FFTW apparently is bad for GROMACS, up to 20% perf loss
- will need to bump ATLAS anyway (because of Sandy Bridge support), hence `goalf` as well
- keeping versions fixed while tweaking builds is not a good idea
- OpenMPI version bump to v1.5
- breaks ABI
- part of new goalf (v2.x?)
- GROMACS on top of new `goalf` (`goolf`)
- (Fotis took over for the remainder)
- PRACE production environment
- mainly set of environment variables missing
- EasyBuild enables you to set up a PRACE environment in user space
- support for custom site-specific environment variables
- HPC-BIOS pitch: standardization policy, see [URL] FIXME
- provide "standard" working environment for e.g. climate science
- [Alan] documentation for building CUDA applications provided by NVIDIA is very useful and hard to come by!
- NVIDIA CUDA with OpenMPI: K20 + Mellanox IB
- 'drop-in' libraries: cuBLAS
- actually a misnomer, since it provides different function names
- CUDA (C)
- compilers + tools
- `nvcc` should always be used, unlike e.g. `mpicc`, which is optional (you can handle linking with the MPI libs yourself)
- runtime API (`libcudart.so`)
- IDE, visual profiler
- collection of libraries (CUBLAS, CUFFT, Thrust, ...)
- quite similar to e.g. MPI
- usually there's a configure option like `--enable-cuda`, but no standard
- similar for installation prefix for CUDA, e.g. `CUDA_HOME` (quite similar across apps)
- some apps ship their own runtime
- be careful with setting `LD_LIBRARY_PATH` in a CUDA context
- `nvcc` treats C code like C++ (!)
- `-use_fast_math` mostly targets single-precision stuff
- can be broken up into parts
- `-Xptxas=-v` can always be set for getting debug info
- most MPI implementations support CUDA (except for Intel MPI)
- things may break for older versions
- significant performance gain if communication layer supports DMA to GPU memory (GPU Direct, requires IB QDR/DFR and Mellanox)
- test examples: `matrixMul`, `simpleMPI`
- CMake 2.8.7+ for good CUDA support
- object code for correct device architecture should be used for best performance
- make sure PTX is not being used, which is JIT-compiled and thus leads to startup performance loss
- default is to target oldest PTX/object code: `--gencode` defaults to `compute_10,code=sm_10`
- usually something is set in the application Makefile: `OPENCCFLAGS=gencode...`, `PTXARGS=--arch,...`
- can the CUDA toolkit be queried for which options are optimal for the current device architecture?
- possible options for `gencode` depend on CUDA compiler version
- build for "everything under the sun"
- will fail on older `nvcc` systems
- larger binary
- OpenACC: very similar to OpenMP (only PGI, Cray, CAPS compilers for now)
- for PGI:
- `-acc` -> use OpenACC
- `-Minfo=accel` -> compile for accelerator (not CPU, which is also possible)
- `-ta=nvidia` (vs AMD or Intel accelerators)
- default architectures targeted are current GPU + major versions (1.0, 2.0, 3.0), but can be tuned via command line options
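Combining the PGI flags above into a single compile line, as a sketch (the source filename is illustrative):

```shell
# Build an OpenACC-annotated C source for NVIDIA GPUs with the PGI compiler,
# with compiler feedback on which loops were accelerated
pgcc -acc -ta=nvidia -Minfo=accel -o saxpy saxpy.c
```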
- CUDA compiler commands depend on the compiler being used, e.g. `pgfortran` (PGI) for CUDA Fortran
- `nvcc` is actually an LLVM frontend
- developer.nvidia.com/llvm
- http://docs.nvidia.com
- http://developer.nvidia.com/nvidia-registered-developer-program to get GPUs
- adeconinck [at] nvidia.com for questions
- follow-up conf. call
- packaging of CUDA toolkit (`redhat`, `fedora`, `ubuntu`, ...)
- reason to not have a monolithic install is OS-specific stuff like paths for files, driver, etc.
- "let user provide CUDA toolkit installation instead of through EasyBuild" (not a good idea)
- use `-silent` to install only the toolkit (not the driver)
- OS-independent install package is being looked into
- can notes for NAMD be provided as well?
- CUDA vs Xeon Phi build process
- Xeon Phi is via magic options in Intel compilers (cfr. PGI)
- different modes: native mode (all on Xeon Phi), offload mode (host + Phi as accelerator), OpenACC (via pragmas in code), x86-only (run x86 binary on Phi)
- open questions
- standard variables for CUDA compilers (e.g. `CUDA_CC`)
- Intel compilers + CUDA?
- which `gencode` options should be set for a CUDA-enabled toolchain?
- performance issues with 'fat' binaries (multiple device architectures) due to instruction cache bottleneck?
- can a compiled binary be queried for which options were used? (George T.)
- environment modules may be the Tcl-only version, which only provides `modulecmd.tcl`
- EasyBuild needs to be able to handle that
- `modulecmd` may not be in the path, but hardcoded in `module`
- make bootstrap script work offline too, i.e. add option to supply it the required source tarballs
- goolf v1.5.10
- GCC 4.7.2
- OpenMPI 1.6.3 (1.7rc8 not production ready + requires GCC > v4.8!)
- OpenBLAS 0.2.6
- LAPACK 3.4.2
- FFTW 3.3.3 (single/double)
- ScaLAPACK 2.0.2
- problems with EasyBuild bootstrap script during training exercises
- use `modulecmd help` instead of `-H` (the latter doesn't work with Tcl environment modules?)
- warn about installing as root
- `lib` vs `lib64`
- offline mode for bootstrap script required, e.g. login nodes can not go online (Mohamed)
- but worker nodes can :)