
Hotstart overwrites previous output #56

Open
@saltynexus

Description

Bug description

During hotstart, GPUSPH fails to write output to the specified directory.

Summary

I'm currently running GPUSPH on a cluster that uses SLURM scheduling. The scheduler is configured to give priority to certain users, and in one instance my job was killed during execution. I therefore attempted to resume the job using a hotstart file. GPUSPH read the hotstart file successfully and the simulation carried on as expected.

After the job finished, I checked my output directory and noticed that no output had been generated following the hotstart. The only output present was from the initial simulation, before the job was killed.

This is the command I executed for the initial job submission:

./GPUSPH --deltap 0.005 --dir /home/user/nfs_fs02/high_res

This is the command I executed to resume after the job was killed:

./GPUSPH --deltap 0.005 --dir /home/user/nfs_fs02/high_res --resume /home/user/nfs_fs02/high_res/data/hot_00082.bin

The simulation is a modified version of the "WaveTank" example test case provided with the GPUSPH source code downloaded from here (GitHub master branch). The only change I made was to remove the slope from the experiment. I've run it in the past and it works as intended, so I'm 99.9% sure the issue has nothing to do with the specific application.

I suspect the bug is related to my specifying a non-default output directory. Somewhere in the hotstart procedure, GPUSPH fails to properly recognize that output is requested and where it should be written.
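As a quick check (this assumes the VTK output also lands under the data/ subdirectory of the --dir path, next to the hot files, which may not be exactly how the writers lay things out), the following commands should show whether anything was actually written, or overwritten, after the resume:

find /home/user/nfs_fs02/high_res/data -type f -newer /home/user/nfs_fs02/high_res/data/hot_00082.bin | sort

ls -lt /home/user/nfs_fs02/high_res/data | head -n 20

The first command lists every file modified after the hot start file I resumed from (an empty result would mean no new output at all, while low-numbered frames showing up there would mean the resumed run restarted its file index and overwrote the earlier ones); the second simply sorts the directory by modification time for a quick look.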

Details

Here is my error log:

WARNING: dt 5e-05 will be used only for the first iteration because adaptive dt is enabled
Successfully restored hot start file 1 / 1
HotFile( version=1, pc=17086279, bc=1)
Restarting from t=8.20035, iteration=139290, dt=5.90909e-05
WARNING: simulation has rigid bodies and/or moving boundaries, resume will not give identical results

and here is my output log:

 * No devices specified, falling back to default (0)...
GPUSPH version v5.0+custom
Release version without fastmath for compute capability 7.5
Chrono : enabled
HDF5   : enabled
MPI    : disabled
Catalyst : disabled
Compiled for problem "MY_WaveTank"
[Network] rank 0 (1/1), host 
 tot devs = 1 (1 * 1)

paddle_amplitude (radians): 0.218669
Info stream: GPUSPH-776718
Initializing...
Water level not set, autocomputed: 0.4525
Max particle speed not set, autocomputed from max fall: 2.97136
Expected maximum shear rate: 3076.92 1/s
dt = 5e-05 (CFL conditions from soundspeed: 6.5e-05, from gravity 0.00514816, from viscosity 5.28125)
Using computed max neib list size 128
Using computed neib bound pos 127
Artificial viscosity epsilon is not set, using default value: 4.225000e-07
Problem calling set grid params
Influence radius / neighbor search radius / expected cell side	: 0.013 / 0.013 / 0.013
Autocomputed SPS Smagorinsky factor 3.6e-07 from C_s = 0.12, ∆p = 0.005
Autocomputed SPS isotropic factor 1.1e-07 from C_i = 0.0066, ∆p = 0.005
 - World origin: 0 , 0 , 0
 - World size:   12 x 1.2 x 1
 - Cell size:    0.0130011 x 0.0130435 x 0.0131579
 - Grid size:    923 x 92 x 76 (6,453,616 cells)
 - Cell linearization: y,z,x
 - Dp:   0.005
 - R0:   0.005
Generating problem particles...
Hot starting from /home/user/nfs_fs02/high_res/data/hot_00082.bin...
VTKWriter will write every 0.1 (simulated) seconds
HotStart checkpoints every 0.1 (simulated) seconds
	will keep the last 8 checkpoints
Allocating shared host buffers...
Numbodies : 1
Numforcesbodies : 0
numOpenBoundaries : 0
  allocated 1.27 GiB on host for 17,086,280 particles (17,086,279 active)
read buffer header: Position
read buffer header: Velocity
read buffer header: Info
read buffer header: Hash
Restoring body #0 ...
RB First/Last Index:
Preparing the problem...
Body: 0
	 Cg grid pos: 13 46 25
	 Cg pos: -0.00144029 -0.00652174 0.00613915
 - device at index 0 has 17,086,279 particles assigned and offset 0
Integrator predictor/corrector instantiated.
Starting workers...
number of forces rigid bodies particles = 0
thread 0x2b93acd3c700 device idx 0: CUDA device 0/1, PCI device 0000:1b:00.0: GeForce RTX 2080 Ti
Device idx 0: free memory 10821 MiB, total memory 10989 MiB
Estimated memory consumption: 400B/particle
Device idx 0 (CUDA: 0) allocated 0 B on host, 6.1 GiB on device
  assigned particles: 17,086,279; allocated: 17,086,280
GPUSPH: initialized
Performing first write...
Letting threads upload the subdomains...
Thread 0 uploading 17086279 Position items (260.72 MiB) on device 0 from position 0
Thread 0 uploading 17086279 Velocity items (260.72 MiB) on device 0 from position 0
Thread 0 uploading 17086279 Info items (130.36 MiB) on device 0 from position 0
Thread 0 uploading 17086279 Hash items (65.18 MiB) on device 0 from position 0
Entering the main simulation cycle
Simulation time t=8.200351e+00s, iteration=139,290, dt=5.909090e-05s, 17,086,279 parts (0, cum. 0 MIPPS), maxneibs 83+0
Simulation time t=8.300006e+00s, iteration=140,977, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 91+0
Simulation time t=8.400047e+00s, iteration=142,670, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 91+0
Simulation time t=8.500029e+00s, iteration=144,362, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 91+0
Simulation time t=8.600003e+00s, iteration=146,054, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 92+0
Simulation time t=8.700042e+00s, iteration=147,747, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=8.800022e+00s, iteration=149,439, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=8.900055e+00s, iteration=151,134, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.000036e+00s, iteration=152,826, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.100010e+00s, iteration=154,518, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.200050e+00s, iteration=156,211, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.300029e+00s, iteration=157,903, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.400006e+00s, iteration=159,595, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 96+0
Simulation time t=9.500047e+00s, iteration=161,288, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Simulation time t=9.600018e+00s, iteration=162,980, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Simulation time t=9.700022e+00s, iteration=164,674, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Simulation time t=9.800039e+00s, iteration=166,367, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Simulation time t=9.900003e+00s, iteration=168,059, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Simulation time t=1.000004e+01s, iteration=169,752, dt=5.909090e-05s, 17,086,279 parts (14, cum. 14 MIPPS), maxneibs 97+0
Elapsed time of simulation cycle: 3.7e+04s
Peak particle speed was ~2.30357 m/s at 9.50005 s -> can set maximum vel 2.5 for this problem
Simulation end, cleaning up...
Deallocating...

The "git_branch.txt" output is

v5.0+custom
* master ec5e7b1 [origin/master] Further do generation fixes

The "make_show.txt" output is

GPUSPH version:  v5.0+custom
Platform:        Linux
Architecture:    x86_64
Current dir:     /home/user/gpusph
This Makefile:   /home/user/gpusph/Makefile
Used Makefiles:   Makefile Makefile.conf Makefile.local dep/command_type.d dep/HDF5SphReader.d dep/pugixml.d dep/simframework.d dep/GPUWorker.d dep/Synchronizer.d dep/VTUReader.d dep/ProblemCore.d dep/main.d dep/Writer.d dep/base64.d dep/ParticleSystem.d dep/Options.d dep/GPUSPH.d dep/vector_print.d dep/Reader.d dep/Integrator.d dep/buffer_traits.d dep/debugflags.d dep/predcorr_alloc_policy.d dep/XYZReader.d dep/cuda/cudautil.d dep/geometries/Cube.d dep/geometries/Torus.d dep/geometries/STLMesh.d dep/geometries/Cylinder.d dep/geometries/Point.d dep/geometries/TopoCube.d dep/geometries/Object.d dep/geometries/Vector.d dep/geometries/Disk.d dep/geometries/EulerParameters.d dep/geometries/Cone.d dep/geometries/Sphere.d dep/geometries/Rect.d dep/geometries/Plane.d dep/integrators/RepackingIntegrator.d dep/integrators/PredictorCorrectorIntegrator.d dep/problem_api/ProblemAPI_1.d dep/writers/UDPWriter.d dep/writers/CustomTextWriter.d dep/writers/CallbackWriter.d dep/writers/CommonWriter.d dep/writers/HotFile.d dep/writers/VTKWriter.d dep/writers/VTKLegacyWriter.d dep/writers/TextWriter.d dep/writers/HotWriter.d dep/NetworkManager.d dep/problems/BuoyancyTest.d dep/problems/ProblemExample.d dep/problems/WaveTank.d dep/problems/user/MY_WaveTank.d dep/BuoyancyTest.gen.d dep/ProblemExample.gen.d dep/WaveTank.gen.d dep/MY_WaveTank.gen.d
Problem:         
Linearization:   yzx
Snapshot file:   ./GPUSPH-v5.0+custom-2019-06-13.tgz
Last problem:    MY_WaveTank
Sources dir:     src src/adaptors src/cuda src/geometries src/integrators src/problem_api src/problems src/writers
Options dir:     options
Objects dir:     build build/adaptors build/cuda build/geometries build/integrators build/problem_api build/problems build/problems/user build/writers
Scripts dir:     scripts
Docs dir:        docs
Doxygen conf:    
Verbose:         
Debug:           0
CXX:             g++
CXX version:     g++ (GCC) 6.3.0
MPICXX:          g++
nvcc:            /opt/apps/software/system/CUDA/10.1.105/bin/nvcc -ccbin=g++
nvcc version:    10.1
LINKER:          /opt/apps/software/system/CUDA/10.1.105/bin/nvcc -ccbin=g++
Compute cap.:    75
Fastmath:        0
USE_MPI:         0
USE_HDF5:        1
USE_CHRONO:      1
default paths:   /home/user/gpusph/as /home/user/gpusph/it /home/user/gpusph/is /home/user/gpusph/a /home/user/gpusph/non-system /home/user/gpusph/directory /home/user/gpusph/that /home/user/gpusph/duplicates /home/user/gpusph/a /home/user/gpusph/system /home/user/gpusph/directory /home/user/gpusph/as /home/user/gpusph/it /home/user/gpusph/is /home/user/gpusph/a /home/user/gpusph/non-system /home/user/gpusph/directory /home/user/gpusph/that /home/user/gpusph/duplicates /home/user/gpusph/a /home/user/gpusph/system /home/user/gpusph/directory /opt/apps/software/data/HDF5/1.10.5-iimpi-2018.4.274/include /opt/apps/software/tools/Szip/2.1.1-GCCcore-6.3.0/include /opt/apps/software/lib/zlib/1.2.11/include /opt/apps/software/mpi/impi/2018.4.274-iccifort-2018.5.274-GCC-6.3.0-2.26/include64 /opt/apps/software/lib/libfabric/1.7.1/include /opt/apps/software/compiler/ifort/2018.5.274-GCC-6.3.0-2.26/include /opt/apps/software/compiler/icc/2018.5.274-GCC-6.3.0-2.26/compilers_and_libraries_2018.5.274/linux/tbb/include /opt/apps/software/tools/binutils/2.26-GCCcore-6.3.0/include /opt/apps/software/mpi/OpenMPI/3.1.2-GCC-8.2.0-2.31.1/include /opt/apps/software/system/hwloc/1.11.11-GCCcore-8.2.0/include /opt/apps/software/system/libpciaccess/0.14-GCCcore-8.2.0/include /opt/apps/software/lib/libxml2/2.9.8-GCCcore-8.2.0/include/libxml2 /opt/apps/software/lib/libxml2/2.9.8-GCCcore-8.2.0/include /opt/apps/software/tools/XZ/5.2.4-GCCcore-8.2.0/include /opt/apps/software/tools/numactl/2.0.12-GCCcore-8.2.0/include /opt/apps/software/system/CUDA/10.1.105/nvvm/include /opt/apps/software/system/CUDA/10.1.105/extras/CUPTI/include /opt/apps/software/system/CUDA/10.1.105/include /opt/apps/software/devel/ncurses/6.1-GCCcore-7.3.0/include /opt/apps/software/math/Eigen/3.3.7/include /opt/apps/software/compiler/GCCcore/6.3.0/include/c++/6.3.0 /opt/apps/software/compiler/GCCcore/6.3.0/include/c++/6.3.0/x86_64-pc-linux-gnu /opt/apps/software/compiler/GCCcore/6.3.0/include/c++/6.3.0/backward /opt/apps/software/compiler/GCCcore/6.3.0/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include /opt/apps/software/compiler/GCCcore/6.3.0/include /opt/apps/software/compiler/GCCcore/6.3.0/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include-fixed /usr/include
INCPATH:          -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions -isystem /home/user/chrono/include -isystem /home/user/chrono/include -isystem /home/user/chrono/include/chrono -isystem /home/user/chrono/include/chrono/collision/bullet
LIBPATH:          -L/usr/local/lib -L/opt/apps/software/system/CUDA/10.1.105/lib64 -L/home/user/chrono/lib
LIBS:             -lcudart -L/opt/apps/software/data/HDF5/1.10.5-iimpi-2018.4.274/lib -lhdf5 -lsz -lz   -lpthread -lrt -lChronoEngine
LDFLAGS:          --linker-options -rpath,/home/user/chrono/lib  -L/usr/local/lib -L/opt/apps/software/system/CUDA/10.1.105/lib64 -L/home/user/chrono/lib -arch=sm_75
CPPFLAGS:          -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions -isystem /home/user/chrono/include -isystem /home/user/chrono/include -isystem /home/user/chrono/include/chrono -isystem /home/user/chrono/include/chrono/collision/bullet -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D_GLIBCXX_USE_C99_MATH -DUSE_HDF5=1 -I/opt/apps/software/data/HDF5/1.10.5-iimpi-2018.4.274/include   -D__COMPUTE__=75
CXXFLAGS:         -m64 -std=c++11   -O3
CUFLAGS:          -arch=sm_75 --generate-line-info -std=c++11 --compiler-options -m64,-O3


The "summary.txt" output is

Simulation parameters:
 deltap = 0.005
 sfactor = 1.3
 slength = 0.0065
 kerneltype: 3 (Wendland)
 kernelradius = 2
 influenceRadius = 0.013
 SPH formulation: 1 (F1)
 multi-fluid support: disabled
 Rheology: Newtonian
	Turbulence model: Sub-particle scale
	Computational viscosity type: Kinematic
	Viscous model operator: Morris 1997
	Viscous averaging operator: Harmonic
	(constant viscosity optimizations)
 periodicity: 0 (none)
 initial dt = 5e-05
 simulation end time = 10
 neib list construction every 10 iterations
 Shepard filter every 20 iterations
 adaptive time stepping enabled
    safety factor for adaptive time step = 0.2
 internal energy computation disabled
 XSPH correction disabled
 moving bodies disabled
 open boundaries disabled
 water depth computation disabled
 time-dependent gravity disabled
 geometric boundaries: 
   DEM: disabled
   planes: enabled, 6 defined

Physical parameters:
 gravity = (0, 0, -9.81) [9.81] fixed
 numFluids = 1
 rho0[ 0 ] = 1000
 B[ 0 ] = 57142.9
 gamma[ 0 ] = 7
 sscoeff[ 0 ] = 20
 sspowercoeff[ 0 ] = 3
 sound speed[ 0 ] = 2.00601e+10
 partsurf = 0
 Lennard-Jones boundary parameters:
	r0 = 0.005
	d = 22.0725
	p1 = 12
	p2 = 6
Newtonian rheology with Sub-particle scale turbulence model. Parameters:
	Smagfactor = 3.6e-07
	kSPSfactor = 1.1e-07
	kinematicvisc[ 0 ] = 1e-06 (m^2/s)
	visc_consistency[ 0 ] = 0.001 (Pa^n s)
	visccoeff[ 0 ] = 1e-06 (m^2/s)

Comman-line options:
 problem: MY_WaveTank
 dem: 
 dir: /home/user/nfs_fs02/high_res
 deltap: 0.005
 tend: nan
 dt: nan
 hosts: 0
 saving enabled
 GPUDirect disabled
 striping disabled
 async network transfers disabled
 Other options:
