[Bug]: Bigints eventually run into heap corruption errors when large numbers of them are created and destroyed in some configurations #27014

Open
@lydia-duncan

Summary of Problem

Description:
This was originally reported by a user in a larger, more complicated program, where the error usually occurred after running for 1-3 hours. This greatly simplified version takes much longer to hit the failure because it creates far fewer bigints, but it does still fail eventually.

The problem seems to:

  • be limited to ARM Macs
    • The same program and settings on Linux did not seem to fail.
  • only occur when both CHPL_TARGET_MEM=cstdlib and CHPL_TASKS=fifo
    • Using qthreads+cstdlib or jemalloc+fifo does not exhibit the behavior

The problem also did not seem to occur when compiling with ASAN.

I believe the problem is independent of the method of initializing bigints - I tried replacing all mpz_init_set*(...); calls in BigInteger.chpl's initializers with mpz_init(...); mpz_set*(...);, but the problem still occurred (with the user's program). I also tried commenting out our call to mp_set_memory_functions so that we were relying more purely on GMP's memory handling, and that did not cause the problem to go away (with the user's program).

Output:

bigintBug(55415,0x16fe0c000) malloc: *** error for object 0x600001e14010: pointer being freed was not allocated
bigintBug(55415,0x16fe14000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe04000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe30000) malloc: *** error for object 0x600001e14000: pointer being freed was not allocated
bigintBug(55415,0x16fe30000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe0c000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe2c000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe04000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe14000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe2c000) malloc: *** set a breakpoint in malloc_error_break to debug
bigintBug(55415,0x16fe18000) malloc: Heap corruption detected, free list is damaged at 0x600001e13ff0
*** Incorrect guard value: 0
bigintBug(55415,0x16fe18000) malloc: *** set a breakpoint in malloc_error_break to debug

Is this issue currently blocking your progress?
No, other settings can be used as far as I am aware.

Steps to Reproduce

Source Code:

use BigInteger;

config const n = 22;
config const numTasks = here.numPUs();

proc foo(n: int) {
  coforall worker in 0..numTasks-1 {
    while true {
      var someBigint: bigint = n + worker;
    }
  }
}

proc main() {
  foo(n);
}

Compile command:
chpl --fast bigintBug.chpl

Note that I use --fast to make the problem appear more quickly. The problem still occurs without --fast.

Execution command:
./bigintBug

Associated Future Test(s):
I do not intend to file a future test, given how long it takes to exhibit the failure behavior (multiple hours for the user's program, and up to multiple days sometimes with this simplified version).

Configuration Information

  • Output of chpl --version: chpl 2.4.0 pre-release and chpl 2.5.0 pre-release. Has been consistently reproducible since mid-February, but is likely an older issue.
  • Output of $CHPL_HOME/util/printchplenv --anonymize:
    CHPL_TARGET_PLATFORM: darwin
    CHPL_TARGET_COMPILER: clang
    CHPL_TARGET_ARCH: arm64
    CHPL_TARGET_CPU: native
    CHPL_LOCALE_MODEL: flat
    CHPL_COMM: none
    CHPL_TASKS: fifo *
    CHPL_LAUNCHER: none
    CHPL_TIMERS: generic
    CHPL_UNWIND: none
    CHPL_TARGET_MEM: cstdlib *
    CHPL_ATOMICS: cstdlib
    CHPL_GMP: bundled
    CHPL_HWLOC: none
    CHPL_RE2: bundled
    CHPL_LLVM: none *
    CHPL_AUX_FILESYS: none
  • Back-end compiler and version, e.g. gcc --version or clang --version: Apple clang version 16.0.0 (clang-1600.0.26.3)
  • (For Cray systems only) Output of module list: N/A

Activity

lydia-duncan commented on Mar 28, 2025

My next step in investigating this problem is to see if I can make a pure C program that reproduces the issue.

Issue #27014 · chapel-lang/chapel