TCLB was part of the LUMI Pilot Program (https://lumi-supercomputer.eu/lumis-second-pilot-phase-in-full-swing/), which is now ending.
Apart from performance results, there are some issues that might be worth considering. LUMI is a brand-new HPE/Cray machine with AMD Instinct MI250X 128GB HBM2e cards.
- Performance: not there yet. While scalability up to 1024 GPUs works nicely, per-GPU performance is unsatisfactory. Some profiling data is available and we could gather more; this is a subject for another talk/issue.
- rinside: HPE/Cray ship R statically linked, which prevents building Rcpp/RInside against it. HPC centers will resist installing non-HPE versions of R, so users will need to build it themselves (see the first sketch after this list). This makes life harder, especially on Cray :/
- GPU-to-core/process allocation has to be done in a specific order. This can be handled by a shell wrapper script and the ROCR_VISIBLE_DEVICES variable (see the second sketch after this list), but TCLB then requires gpu_oversubscribe="True" in the XML to run. A configure flag to disable that check would be nice: imagine copying an XML from another system and having the job fail because of this after 2 days in the queue.
- More generic in-situ processing, e.g. via ADIOS, would be nice. I could work on that, but let me know if you are interested.
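For the R issue, a minimal sketch of building a user-local R with a shared libR (which Rcpp/RInside require) could look like the following. The R version, prefix, and any compiler module are placeholders, not LUMI-specific instructions:

```bash
# Sketch: user-local R build with shared libR, needed by Rcpp/RInside.
# Version, prefix and toolchain are assumptions - adjust to the actual environment.
wget https://cran.r-project.org/src/base/R-4/R-4.3.2.tar.gz
tar xf R-4.3.2.tar.gz && cd R-4.3.2
./configure --prefix=$HOME/opt/R --enable-R-shlib   # --enable-R-shlib is the key flag
make -j 16 && make install
export PATH=$HOME/opt/R/bin:$PATH
Rscript -e 'install.packages(c("Rcpp","RInside"), repos="https://cloud.r-project.org")'
```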
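For the GPU binding, a sketch of the kind of per-rank wrapper I mean (the 1:1 local-rank-to-GCD mapping is illustrative and may need reordering for the node topology):

```bash
#!/bin/bash
# select_gpu.sh - sketch of a per-rank GPU binding wrapper for Slurm.
# Each rank sees only the GCD matching its local rank id.
export ROCR_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"
```

It would be used as something like `srun --ntasks-per-node=8 ./select_gpu.sh ./CLB/<model>/main case.xml`, which is exactly the setup that currently forces gpu_oversubscribe="True" in the XML.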
As for results, I ran a 0.8e9-node lattice dissolution simulation for AGU :D That covers around half of the 12 cm experimental core at 30 µm resolution.
I still have a few days left - if you want to check something, we can do it.