We are using go-tree-sitter as a library to parse a stream of source files, and we've noticed that the process's memory appears to grow over time, particularly when it encounters certain files. We sometimes see the process's memory (RSS) jump to 700MiB, and these surges never appear to be reclaimed by the operating system.
Also, this memory does not show up in pprof, meaning it is not tracked by the Go runtime, which makes me suspect that there is a memory leak somewhere in the CGO interactions.
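One way to confirm that the growth happens outside the Go runtime is to compare the kernel's RSS figure with what the runtime itself reports. The sketch below is a minimal, Linux-only illustration (it assumes /proc is mounted, and the helper name rssKiB is hypothetical): it reads VmRSS from /proc/self/status and compares it with runtime.MemStats.Sys, which counts only memory the Go runtime obtained from the OS. The comparison is rough, since Sys includes reserved but non-resident address space and RSS also includes the binary and shared libraries, but a gap that keeps widening suggests memory held by C code.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// rssKiB returns the resident set size of the current process in KiB,
// read from the VmRSS line of /proc/self/status (Linux only).
func rssKiB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		// Expected format: "VmRSS:	   12345 kB"
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return 0, fmt.Errorf("unexpected VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found in /proc/self/status")
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	rss, err := rssKiB()
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading RSS:", err)
		os.Exit(1)
	}

	// ms.Sys covers memory the Go runtime got from the OS (heap, stacks,
	// runtime metadata). Memory malloc'd by C code via cgo is not included.
	goSysKiB := int64(ms.Sys / 1024)
	fmt.Printf("RSS: %d KiB, Go runtime Sys: %d KiB, difference: %d KiB\n",
		rss, goSysKiB, rss-goSysKiB)
}
```

Calling this periodically (for example, after each parse) shows whether RSS grows while the Go runtime's own numbers stay flat.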
I've attached a small program and test files that reproduce the issue. After unzipping the attached archive it should be simple to run.
leak-reproducer.zip
The program parses an input file N times (N is set with the -N flag). It seems that memory is not always reclaimed when a parsed Tree is closed, but the behavior is not consistent: the leak only seems to manifest with certain (bigger?) files.
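A rough sketch of the kind of measurement loop involved, under stated assumptions: the function names residentBytes and sampleRSS are hypothetical, RSS is read from the second field of /proc/self/statm (Linux only), and the workload is a placeholder. In the actual reproducer the workload would presumably call Parse on a go-tree-sitter Parser and then Close the returned Tree; that part is left out so the sketch builds with the standard library alone.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// residentBytes returns the current RSS in bytes: resident pages (second
// field of /proc/self/statm) multiplied by the page size. Linux only.
func residentBytes() (int64, error) {
	data, err := os.ReadFile("/proc/self/statm")
	if err != nil {
		return 0, err
	}
	fields := strings.Fields(string(data))
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected /proc/self/statm contents: %q", data)
	}
	pages, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return 0, err
	}
	return pages * int64(os.Getpagesize()), nil
}

// sampleRSS calls work n times and records the RSS after each call.
func sampleRSS(n int, work func()) ([]int64, error) {
	samples := make([]int64, 0, n)
	for i := 0; i < n; i++ {
		work()
		rss, err := residentBytes()
		if err != nil {
			return nil, err
		}
		samples = append(samples, rss)
	}
	return samples, nil
}

func main() {
	src := []byte("package main\n") // stand-in for the input file contents
	samples, err := sampleRSS(5, func() {
		// Placeholder workload: replace with the parse-then-Close call under test.
		_ = len(src)
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i, s := range samples {
		fmt.Printf("iteration %d: RSS %d MiB\n", i+1, s>>20)
	}
}
```

Recording one sample per iteration makes it easy to see whether RSS returns to a baseline after each Close or ratchets upward.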
For example, parsing a small file 100000 times only shows a memory use (RSS) of around 15-20MiB on my system, with no obvious growth over time.
```sh
make
./leak-reproducer -N=100000 --wait testdata/small.go
```
However, parsing a larger file 100 times makes memory quickly grow to around 200-300MiB. The growth is not linear; instead, memory seems to make sudden jumps.
```sh
make
./leak-reproducer -N=100 --wait testdata/large.go
```
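To pin down when the sudden jumps happen, a small helper can scan a series of RSS samples (for instance, one per iteration) and report the iterations where RSS rose by at least a threshold since the previous sample. The helper name rssJumps is hypothetical, and the sample values in main are made up for illustration; they are not measurements from the reproducer.

```go
package main

import "fmt"

// rssJumps returns the indices i (i >= 1) at which samples[i] exceeds
// samples[i-1] by at least minDelta bytes, separating sudden jumps from
// gradual growth in a series of RSS samples.
func rssJumps(samples []int64, minDelta int64) []int {
	var idx []int
	for i := 1; i < len(samples); i++ {
		if samples[i]-samples[i-1] >= minDelta {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	const mib = 1 << 20
	// Illustrative, made-up samples (not real measurements).
	samples := []int64{20 * mib, 21 * mib, 21 * mib, 150 * mib, 151 * mib, 280 * mib}
	for _, i := range rssJumps(samples, 64*mib) {
		fmt.Printf("jump before sample %d: +%d MiB\n", i, (samples[i]-samples[i-1])/mib)
	}
}
```

Correlating the reported iterations with the memleak output could help tell whether each jump coincides with a new thread or allocator arena being created.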
Profiling
This example uses the memleak tool from https://github.com/iovisor/bcc, installed on Ubuntu as memleak-bpfcc.
To run and profile the program, use something like:
```sh
make
sudo memleak-bpfcc -c './leak-reproducer -N=30 --wait-start --wait testdata/large.go'
# 1. Wait for memleak-bpfcc to start outputting "Top 10 stacks with outstanding allocations:"
#    before pressing RETURN (and starting the run).
# 2. Wait for all iterations (-N) to complete.
# 3. When you see "Press RETURN to terminate ..." the program is
#    done and all memory should be reclaimed.
# 4. Inspect the "Top 10 stacks with outstanding allocations:" output
#    before pressing RETURN to terminate.
```
The output I see is similar to the following.
I must admit that I'm not well-versed in profiling CGO code, but I'm hoping that this helps (1) identify whether there actually is a memory leak and (2) if so, pinpoint where memory is being leaked.
[12:46:04] Top 10 stacks with outstanding allocations:
262144 bytes in 1 allocations from stack
0x00000000008eea0a x_cgo_mmap+0x45 [leak-reproducer]
0x000000000049ec38 runtime.callCgoMmap.abi0+0x38 [leak-reproducer]
0x000000000041d35f runtime.mmap.func1+0x3f [leak-reproducer]
0x000000000041d29a runtime.mmap+0x5a [leak-reproducer]
0x000000000042fe33 runtime.sysAllocOS+0x33 [leak-reproducer]
0x000000000042faaa runtime.sysAlloc+0x4a [leak-reproducer]
0x0000000000429f5f runtime.persistentalloc1+0xff [leak-reproducer]
0x0000000000429e48 runtime.persistentalloc.func1+0x28 [leak-reproducer]
0x0000000000429e05 runtime.persistentalloc+0x45 [leak-reproducer]
0x0000000000433777 runtime.(*fixalloc).alloc+0x77 [leak-reproducer]
0x00000000004478c9 runtime.(*mheap).allocMSpanLocked+0xa9 [leak-reproducer]
0x0000000000447c12 runtime.(*mheap).allocSpan+0x232 [leak-reproducer]
0x000000000044743f runtime.(*mheap).allocManual+0x3f [leak-reproducer]
0x0000000000477c25 runtime.stackalloc+0x125 [leak-reproducer]
0x000000000046751f runtime.malg.func1+0x1f [leak-reproducer]
0x00000000004674a5 runtime.malg+0x65 [leak-reproducer]
0x000000000045855d runtime.mpreinit+0x1d [leak-reproducer]
0x000000000045fee5 runtime.mcommoninit+0xc5 [leak-reproducer]
0x0000000000462232 runtime.allocm+0xb2 [leak-reproducer]
0x0000000000462cba runtime.newm+0x3a [leak-reproducer]
0x000000000046330b runtime.startm+0x16b [leak-reproducer]
0x0000000000495fc7 runtime.wakep+0x87 [leak-reproducer]
0x000000000046547c runtime.resetspinning+0x3c [leak-reproducer]
0x00000000004658a5 runtime.schedule+0x145 [leak-reproducer]
0x0000000000465c19 runtime.park_m+0x1d9 [leak-reproducer]
0x000000000049b595 runtime.mcall+0x55 [leak-reproducer]
524288 bytes in 2 allocations from stack
0x00000000008eea0a x_cgo_mmap+0x45 [leak-reproducer]
0x000000000049ec38 runtime.callCgoMmap.abi0+0x38 [leak-reproducer]
0x000000000041d35f runtime.mmap.func1+0x3f [leak-reproducer]
0x000000000041d29a runtime.mmap+0x5a [leak-reproducer]
0x000000000042fe33 runtime.sysAllocOS+0x33 [leak-reproducer]
0x000000000042faaa runtime.sysAlloc+0x4a [leak-reproducer]
0x0000000000429f5f runtime.persistentalloc1+0xff [leak-reproducer]
0x0000000000429e48 runtime.persistentalloc.func1+0x28 [leak-reproducer]
0x000000000049b60a runtime.systemstack.abi0+0x4a [leak-reproducer]
0x0000000000429e05 runtime.persistentalloc+0x45 [leak-reproducer]
0x0000000000454b49 runtime.(*spanSetBlockAlloc).alloc+0x49 [leak-reproducer]
0x00000000004545f3 runtime.(*spanSet).push+0x153 [leak-reproducer]
0x00000000004446de runtime.(*sweepLocked).sweep+0x75e [leak-reproducer]
0x0000000000443cc5 runtime.sweepone+0xc5 [leak-reproducer]
0x0000000000443a77 runtime.bgsweep+0xb7 [leak-reproducer]
0x0000000000433a77 runtime.gcenable.gowrap1+0x17 [leak-reproducer]
0x000000000049d1e1 runtime.goexit.abi0+0x1 [leak-reproducer]
786432 bytes in 3 allocations from stack
0x00000000008eea0a x_cgo_mmap+0x45 [leak-reproducer]
0x000000000049ec38 runtime.callCgoMmap.abi0+0x38 [leak-reproducer]
0x000000000041d35f runtime.mmap.func1+0x3f [leak-reproducer]
0x000000000041d29a runtime.mmap+0x5a [leak-reproducer]
0x000000000042fe33 runtime.sysAllocOS+0x33 [leak-reproducer]
0x000000000042faaa runtime.sysAlloc+0x4a [leak-reproducer]
0x0000000000429f5f runtime.persistentalloc1+0xff [leak-reproducer]
0x0000000000429e48 runtime.persistentalloc.func1+0x28 [leak-reproducer]
0x0000000000429e05 runtime.persistentalloc+0x45 [leak-reproducer]
0x0000000000433777 runtime.(*fixalloc).alloc+0x77 [leak-reproducer]
0x00000000004478c9 runtime.(*mheap).allocMSpanLocked+0xa9 [leak-reproducer]
0x0000000000447c12 runtime.(*mheap).allocSpan+0x232 [leak-reproducer]
0x000000000044743f runtime.(*mheap).allocManual+0x3f [leak-reproducer]
0x0000000000445e4d runtime.getempty.func1+0x2d [leak-reproducer]
0x0000000000445d3d runtime.getempty+0xfd [leak-reproducer]
0x0000000000445474 runtime.(*gcWork).init+0x14 [leak-reproducer]
0x000000000044558c runtime.(*gcWork).putObj+0xcc [leak-reproducer]
0x000000000043b3e5 runtime.greyobject+0x1e5 [leak-reproducer]
0x000000000043aeab runtime.scanblock+0x14b [leak-reproducer]
0x0000000000438948 runtime.markrootBlock+0x88 [leak-reproducer]
0x000000000043871c runtime.markroot+0x3fc [leak-reproducer]
0x000000000043a9c6 runtime.gcDrain+0x4a6 [leak-reproducer]
0x000000000043a3c5 runtime.gcDrainMarkWorkerDedicated+0x25 [leak-reproducer]
0x000000000043638c runtime.gcBgMarkWorker.func2+0x8c [leak-reproducer]
0x000000000049b60a runtime.systemstack.abi0+0x4a [leak-reproducer]
0x000000000043619f runtime.gcBgMarkWorker+0x1bf [leak-reproducer]
0x0000000000435f97 runtime.gcBgMarkStartWorkers.gowrap1+0x17 [leak-reproducer]
0x000000000049d1e1 runtime.goexit.abi0+0x1 [leak-reproducer]
8388608 bytes in 2 allocations from stack
0x00000000008eea0a x_cgo_mmap+0x45 [leak-reproducer]
0x000000000049ec38 runtime.callCgoMmap.abi0+0x38 [leak-reproducer]
0x000000000041d35f runtime.mmap.func1+0x3f [leak-reproducer]
0x000000000041d29a runtime.mmap+0x5a [leak-reproducer]
0x000000000043045d runtime.sysMapOS+0x3d [leak-reproducer]
0x000000000042fda5 runtime.sysMap+0x45 [leak-reproducer]
0x00000000004486a5 runtime.(*mheap).grow+0x325 [leak-reproducer]
0x0000000000447b9e runtime.(*mheap).allocSpan+0x1be [leak-reproducer]
0x00000000004473df runtime.(*mheap).alloc.func1+0x5f [leak-reproducer]
0x000000000049b60a runtime.systemstack.abi0+0x4a [leak-reproducer]
0x0000000000447337 runtime.(*mheap).alloc+0x57 [leak-reproducer]
0x000000000042cec8 runtime.(*mcache).allocLarge+0x88 [leak-reproducer]
0x000000000042972d runtime.mallocgcLarge+0x6d [leak-reproducer]
0x0000000000493dd6 runtime.mallocgc+0x116 [leak-reproducer]
0x0000000000498106 runtime.makeslice+0x86 [leak-reproducer]
0x0000000000517c7a os.readFileContents+0xba [leak-reproducer]
0x0000000000517936 os.ReadFile+0x1f6 [leak-reproducer]
0x00000000007c3a49 main.parseFile+0x69 [leak-reproducer]
0x00000000007c385e main.main+0x3be [leak-reproducer]
0x000000000045e587 runtime.main+0x267 [leak-reproducer]
0x000000000049d1e1 runtime.goexit.abi0+0x1 [leak-reproducer]
8392704 bytes in 1 allocations from stack
0x000070441249d537 __pthread_create_2_1+0x977 [libc.so.6]
0x00000000008ee733 _cgo_try_pthread_create+0x3d [leak-reproducer]
0x00000000008ee927 _cgo_sys_thread_start+0xac [leak-reproducer]
0x00000000008eee53 x_cgo_thread_start+0x7a [leak-reproducer]
0x000000000049ceca runtime.asmcgocall.abi0+0xaa [leak-reproducer]
0x0000000000462e73 runtime.newm1+0x93 [leak-reproducer]
0x0000000000462d7c runtime.newm+0xfc [leak-reproducer]
0x000000000046166a runtime.startTheWorldWithSema+0x12a [leak-reproducer]
0x0000000000434557 runtime.gcStart.func4+0x37 [leak-reproducer]
0x000000000049b60a runtime.systemstack.abi0+0x4a [leak-reproducer]
0x00000000004343df runtime.gcStart+0x53f [leak-reproducer]
0x0000000000429814 runtime.mallocgcLarge+0x154 [leak-reproducer]
0x0000000000493dd6 runtime.mallocgc+0x116 [leak-reproducer]
0x000000000047b397 runtime.slicebytetostring+0x77 [leak-reproducer]
0x00000000007c14d9 github.com/tree-sitter/go-tree-sitter.readUTF8+0x159 [leak-reproducer]
0x00000000007c2d2c _cgoexp_ee3fac1ee088_readUTF8+0x6c [leak-reproducer]
0x000000000041dc08 runtime.cgocallbackg1+0x2a8 [leak-reproducer]
0x000000000041d87e runtime.cgocallbackg+0x11e [leak-reproducer]
0x000000000049f629 runtime.cgocallbackg.abi0+0x29 [leak-reproducer]
0x000000000049cfac runtime.cgocallback.abi0+0xcc [leak-reproducer]
0x00000000005c90e1 crosscall2+0x41 [leak-reproducer]
0x00007ffd05d93430 [unknown]
0x00000000008bf21e readUTF8+0xaa [leak-reproducer]
0x00000000008c73da ts_lexer__get_chunk+0x51 [leak-reproducer]
0x00000000008c80a7 ts_lexer_start+0x89 [leak-reproducer]
0x00000000008cda78 ts_parser__lex+0x551 [leak-reproducer]
0x00000000008d16d8 ts_parser__advance+0x144 [leak-reproducer]
0x00000000008d351b ts_parser_parse+0x616 [leak-reproducer]
0x00000000008d3956 ts_parser_parse_with_options+0x78 [leak-reproducer]
0x00000000008c18f9 _cgo_ee3fac1ee088_Cfunc_ts_parser_parse_with_options+0x78 [leak-reproducer]
0x000000000049ce84 runtime.asmcgocall.abi0+0x64 [leak-reproducer]
0x000000000049315f runtime.cgocall+0x7f [leak-reproducer]
0x00000000007bfd0a github.com/tree-sitter/go-tree-sitter._Cfunc_ts_parser_parse_with_options.abi0+0x4a [leak-reproducer]
0x00000000007c1d48 github.com/tree-sitter/go-tree-sitter.(*Parser).ParseWithOptions.func2+0x248 [leak-reproducer]
0x00000000007c1a4e github.com/tree-sitter/go-tree-sitter.(*Parser).ParseWithOptions+0x38e [leak-reproducer]
0x00000000007c1232 github.com/tree-sitter/go-tree-sitter.(*Parser).Parse+0xb2 [leak-reproducer]
0x00000000007c3e6a main.parseFile+0x48a [leak-reproducer]
0x00000000007c385e main.main+0x3be [leak-reproducer]
0x000000000045e587 runtime.main+0x267 [leak-reproducer]
0x000000000049d1e1 runtime.goexit.abi0+0x1 [leak-reproducer]
8392704 bytes in 1 allocations from stack
0x000070441249d537 __pthread_create_2_1+0x977 [libc.so.6]
0x00000000008ee733 _cgo_try_pthread_create+0x3d [leak-reproducer]
0x00000000008ee927 _cgo_sys_thread_start+0xac [leak-reproducer]
0x00000000008eee53 x_cgo_thread_start+0x7a [leak-reproducer]
0x000000000049ceca runtime.asmcgocall.abi0+0xaa [leak-reproducer]
0x0000000000462e73 runtime.newm1+0x93 [leak-reproducer]
0x0000000000462d7c runtime.newm+0xfc [leak-reproducer]
0x000000000046330b runtime.startm+0x16b [leak-reproducer]
0x0000000000495fc7 runtime.wakep+0x87 [leak-reproducer]
0x000000000046174a runtime.startTheWorldWithSema+0x20a [leak-reproducer]
0x0000000000434557 runtime.gcStart.func4+0x37 [leak-reproducer]
0x000000000049b60a runtime.systemstack.abi0+0x4a [leak-reproducer]
0x00000000004343df runtime.gcStart+0x53f [leak-reproducer]
0x0000000000429814 runtime.mallocgcLarge+0x154 [leak-reproducer]
0x0000000000493dd6 runtime.mallocgc+0x116 [leak-reproducer]
0x000000000047b397 runtime.slicebytetostring+0x77 [leak-reproducer]
0x00000000007c14d9 github.com/tree-sitter/go-tree-sitter.readUTF8+0x159 [leak-reproducer]
0x00000000007c2d2c _cgoexp_ee3fac1ee088_readUTF8+0x6c [leak-reproducer]
0x000000000041dc08 runtime.cgocallbackg1+0x2a8 [leak-reproducer]
0x000000000041d87e runtime.cgocallbackg+0x11e [leak-reproducer]
0x000000000049f629 runtime.cgocallbackg.abi0+0x29 [leak-reproducer]
0x000000000049cfac runtime.cgocallback.abi0+0xcc [leak-reproducer]
0x00000000005c90e1 crosscall2+0x41 [leak-reproducer]
0x00007ffd05d93430 [unknown]
0x00000000008bf21e readUTF8+0xaa [leak-reproducer]
0x00000000008c73da ts_lexer__get_chunk+0x51 [leak-reproducer]
0x00000000008c80a7 ts_lexer_start+0x89 [leak-reproducer]
0x00000000008cda78 ts_parser__lex+0x551 [leak-reproducer]
0x00000000008d16d8 ts_parser__advance+0x144 [leak-reproducer]
0x00000000008d351b ts_parser_parse+0x616 [leak-reproducer]
0x00000000008d3956 ts_parser_parse_with_options+0x78 [leak-reproducer]
0x00000000008c18f9 _cgo_ee3fac1ee088_Cfunc_ts_parser_parse_with_options+0x78 [leak-reproducer]
0x000000000049ce84 runtime.asmcgocall.abi0+0x64 [leak-reproducer]
0x000000000049315f runtime.cgocall+0x7f [leak-reproducer]
0x00000000007bfd0a github.com/tree-sitter/go-tree-sitter._Cfunc_ts_parser_parse_with_options.abi0+0x4a [leak-reproducer]
0x00000000007c1d48 github.com/tree-sitter/go-tree-sitter.(*Parser).ParseWithOptions.func2+0x248 [leak-reproducer]
0x00000000007c1a4e github.com/tree-sitter/go-tree-sitter.(*Parser).ParseWithOptions+0x38e [leak-reproducer]
0x00000000007c1232 github.com/tree-sitter/go-tree-sitter.(*Parser).Parse+0xb2 [leak-reproducer]
0x00000000007c3e6a main.parseFile+0x48a [leak-reproducer]
0x00000000007c385e main.main+0x3be [leak-reproducer]
0x000000000045e587 runtime.main+0x267 [leak-reproducer]
0x000000000049d1e1 runtime.goexit.abi0+0x1 [leak-reproducer]
8392704 bytes in 1 allocations from stack
0x000070441249d537 __pthread_create_2_1+0x977 [libc.so.6]
0x00000000008ee733 _cgo_try_pthread_create+0x3d [leak-reproducer]
0x00000000008ee927 _cgo_sys_thread_start+0xac [leak-reproducer]
0x00000000008eee53 x_cgo_thread_start+0x7a [leak-reproducer]
0x000000000049ceca runtime.asmcgocall.abi0+0xaa [leak-reproducer]
0x0000000000462e73 runtime.newm1+0x93 [leak-reproducer]
0x0000000000462d7c runtime.newm+0xfc [leak-reproducer]
0x000000000046330b runtime.startm+0x16b [leak-reproducer]
0x0000000000463425 runtime.handoffp+0x45 [leak-reproducer]
0x000000000046372a runtime.stoplockedm+0x6a [leak-reproducer]
0x000000000046579a runtime.schedule+0x3a [leak-reproducer]
0x0000000000465fb3 runtime.preemptPark+0xf3 [leak-reproducer]
0x000000000047932b runtime.newstack+0x3cb [leak-reproducer]
0x000000000049b71d runtime.morestack.abi0+0x7d [leak-reproducer]
16785408 bytes in 2 allocations from stack
0x000070441249d537 __pthread_create_2_1+0x977 [libc.so.6]
0x00000000008ee733 _cgo_try_pthread_create+0x3d [leak-reproducer]
0x00000000008ee927 _cgo_sys_thread_start+0xac [leak-reproducer]
0x00000000008eee53 x_cgo_thread_start+0x7a [leak-reproducer]
0x000000000049ceca runtime.asmcgocall.abi0+0xaa [leak-reproducer]
0x0000000000462e73 runtime.newm1+0x93 [leak-reproducer]
0x0000000000462d7c runtime.newm+0xfc [leak-reproducer]
0x000000000046330b runtime.startm+0x16b [leak-reproducer]
0x0000000000495fc7 runtime.wakep+0x87 [leak-reproducer]
0x000000000046547c runtime.resetspinning+0x3c [leak-reproducer]
0x00000000004658a5 runtime.schedule+0x145 [leak-reproducer]
0x0000000000465c19 runtime.park_m+0x1d9 [leak-reproducer]
0x000000000049b595 runtime.mcall+0x55 [leak-reproducer]
134217728 bytes in 2 allocations from stack
0x00007044124aa034 alloc_new_heap+0x84 [libc.so.6]
0x00007044124aa599 arena_get2.part.0+0x299 [libc.so.6]
0x00007044124aceb9 tcache_init.part.0+0xa9 [libc.so.6]
0x00007044124ade52 __libc_free+0x102 [libc.so.6]
0x00000000008ee9a4 threadentry+0x37 [leak-reproducer]
0x000070441249caa4 start_thread+0x384 [libc.so.6]
0x0000704412529c6c __GI___clone3+0x2c [libc.so.6]
268435456 bytes in 2 allocations from stack
0x00007044124aa0dc alloc_new_heap+0x12c [libc.so.6]
0x00007044124aa599 arena_get2.part.0+0x299 [libc.so.6]
0x00007044124aceb9 tcache_init.part.0+0xa9 [libc.so.6]
0x00007044124ade52 __libc_free+0x102 [libc.so.6]
0x00000000008ee9a4 threadentry+0x37 [leak-reproducer]
0x000070441249caa4 start_thread+0x384 [libc.so.6]
0x0000704412529c6c __GI___clone3+0x2c [libc.so.6]