Summary of Problem
Description:
This was originally reported by a user in a more interesting and complicated program. In that program, the error usually occurred after running for 1-3 hours. This greatly simplified version takes much longer to hit the failure because it creates far fewer bigints, but it still fails eventually.
The problem seems to:
- be limited to ARM Macs
  - The same program and settings on Linux did not seem to fail.
- only occur when both CHPL_TARGET_MEM=cstdlib and CHPL_TASKS=fifo are set
  - Using qthreads+cstdlib or jemalloc+fifo does not exhibit the behavior.
The problem also did not seem to occur when compiling with ASAN.
I believe the problem is independent of the method of initializing bigints - I tried replacing all mpz_init_set*(...); calls in BigInteger.chpl's initializers with separate mpz_init(...); mpz_set*(...); calls, but the problem still occurred (with the user's program). I also tried commenting out our call to mp_set_memory_functions so that we were relying more purely on GMP's memory handling, and that did not make the problem go away (with the user's program).
Output:
bigintBug(55415,0x16fe0c000) malloc: *** error for object 0x600001e14010: pointer being freed was not allocated
bigintBug(55415,0x16fe14000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe04000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe30000) malloc: *** error for object 0x600001e14000: pointer being freed was not allocated
bigintBug(55415,0x16fe30000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe0c000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe2c000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe04000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe14000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe2c000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe18000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe18000) malloc: *** set a breakpoint in malloc_error_break to debug
Is this issue currently blocking your progress?
No, other settings can be used, as far as I am aware.
Steps to Reproduce
Source Code:
use BigInteger;

config const n = 22;
config const numTasks = here.numPUs();

proc foo(n: int) {
  coforall worker in 0..numTasks-1 {
    while true {
      // someBigint is created and destroyed on every iteration
      var someBigint: bigint = n + worker;
    }
  }
}

proc main() {
  foo(n);
}
Compile command:
chpl --fast bigintBug.chpl
Note that I use --fast to make the problem appear more quickly. The problem still occurs without --fast.
Execution command:
./bigintBug
Associated Future Test(s):
I do not intend to file a future test, given how long it takes to exhibit the failure behavior (multiple hours for the user's program, and up to multiple days sometimes with this simplified version).
Configuration Information
- Output of chpl --version: chpl 2.4.0 pre-release and chpl 2.5.0 pre-release. Has been consistently reproducible since mid-February, but is likely an older issue.
- Output of $CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang
CHPL_TARGET_ARCH: arm64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: fifo *
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_TARGET_MEM: cstdlib *
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: none
CHPL_RE2: bundled
CHPL_LLVM: none *
CHPL_AUX_FILESYS: none
- Back-end compiler and version, e.g. gcc --version or clang --version: Apple clang version 16.0.0 (clang-1600.0.26.3)
- (For Cray systems only) Output of module list: N/A
lydia-duncan commented on Mar 28, 2025
My next step in investigating this problem is to see if I can make a pure C program that reproduces the issue.