
@c-dilks
Member

c-dilks commented Nov 29, 2023

resolve #151

  • epics_xy
    • MYA payload is too large; downsample it using the l option of myquery
    • Fits are unstable, so use histogram statistics instead of a fit
  • ftof_tdcadc_p1a, ftof_tdcadc_p1b, and ftof_tdcadc_p2
    • problem: the memory-resident histograms have too high a resolution
      • out of memory occurs after 419 files * 6 sectors * 3 timelines = 7542 histograms, each with ~16,000 bins
      • increasing the memory allocation pool from 1 GB to 2 GB makes RG-C work
      • the default number of threads is 8, so this amounts to a 16 GB allocation, which may be too much
      • we can't really reduce the histogram range or binning, since the bin width must remain as is and the range is set so that outliers are captured (e.g., early RG-A sp18)
    • solution:
      • the new function HistoUtil.zoomHisto truncates the "empty" parts of a histogram, keeping only the region with filled bins; for a test histogram, the zoomed histogram has around 45% of the original bins
      • if large datasets still use too much memory, run those timelines with the new option set $TIMELINE_JAVA_OPTS_HIGHMEM, which increases the memory pool allocation (defined in this PR, but not yet used)
  • FTOF ftof_tdcadc_p2: now that these timelines run, we see that the fits are bad
    • stabilized by fixing the initial RMS (as p1a and p1b do) and reducing the fit range
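A minimal Java sketch of the ideas in the list above, assuming a plain array-backed histogram rather than the histogram classes the timelines actually use. zoom keeps only the span between the first and last non-empty bins (same bin width, narrower range), and meanAndRms computes count-weighted statistics of the kind used in place of a fit. ZoomSketch, Histo, zoom, and meanAndRms are hypothetical names, not the HistoUtil.zoomHisto implementation. The main method also redoes the memory arithmetic under the assumption of 8 bytes per bin and no object overhead.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical sketch: histogram zooming, histogram statistics, and a memory
 * estimate. Not the HistoUtil.zoomHisto code; all names are illustrative.
 */
public class ZoomSketch {

    /** Minimal fixed-width 1D histogram: bin contents plus the axis range. */
    record Histo(double[] counts, double xMin, double xMax) {
        double binWidth() { return (xMax - xMin) / counts.length; }
        double binCenter(int i) { return xMin + (i + 0.5) * binWidth(); }
    }

    /** Keep only the bins from the first to the last non-empty bin; bin width is unchanged. */
    static Histo zoom(Histo h) {
        int first = 0;
        int last = h.counts().length - 1;
        while (first <= last && h.counts()[first] == 0) first++;
        while (last >= first && h.counts()[last] == 0) last--;
        if (first > last) return h; // completely empty: nothing to zoom in on
        double w = h.binWidth();
        return new Histo(Arrays.copyOfRange(h.counts(), first, last + 1),
                h.xMin() + first * w, h.xMin() + (last + 1) * w);
    }

    /** Count-weighted mean and RMS of the bin centers, used instead of a fit. */
    static double[] meanAndRms(Histo h) {
        double sum = 0, sumX = 0, sumX2 = 0;
        for (int i = 0; i < h.counts().length; i++) {
            double x = h.binCenter(i), n = h.counts()[i];
            sum += n;
            sumX += n * x;
            sumX2 += n * x * x;
        }
        if (sum == 0) return new double[] {0, 0};
        double mean = sumX / sum;
        return new double[] {mean, Math.sqrt(Math.max(0, sumX2 / sum - mean * mean))};
    }

    public static void main(String[] args) {
        // Histogram count from the PR description: 419 files * 6 sectors * 3 timelines.
        long nHistos = 419L * 6 * 3;
        // Assumption: ~16,000 bins per histogram, 8 bytes per bin, no object overhead.
        double gb = nHistos * 16_000L * 8 / 1e9;
        System.out.printf(Locale.ROOT, "%d histograms, ~%.2f GB of bin contents%n", nHistos, gb);

        // Toy histogram: 20 bins on [0, 20) with only bins 5..8 filled.
        double[] counts = new double[20];
        counts[5] = 1; counts[6] = 4; counts[7] = 4; counts[8] = 1;
        Histo z = zoom(new Histo(counts, 0, 20));
        double[] stats = meanAndRms(z);
        System.out.printf(Locale.ROOT, "zoomed: %d bins on [%.1f, %.1f), mean %.2f, rms %.3f%n",
                z.counts().length, z.xMin(), z.xMax(), stats[0], stats[1]);
    }
}
```

Under these assumptions the bin contents alone come to about 0.97 GB (7542 histograms), which is in the same range as the original 1 GB pool; storing bins as 4-byte floats would halve that. In the toy example, the zoomed histogram keeps 4 of the 20 bins without changing the mean or RMS, since empty bins carry no weight.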

@c-dilks
Member Author

c-dilks commented Dec 6, 2023

FTOF memory is fixed; however, P2 fits are often bad.

c-dilks changed the title from "fix: memory leaks for FTOF and EPICS" to "fix: OutOfMemoryError for FTOF and EPICS" on Dec 6, 2023
@c-dilks
Member Author

c-dilks commented Dec 6, 2023

> FTOF memory is fixed; however, P2 fits are often bad.

fixed

c-dilks marked this pull request as ready for review December 6, 2023 16:18
c-dilks enabled auto-merge (squash) December 6, 2023 16:18
c-dilks merged commit 222d60d into main Dec 6, 2023
c-dilks deleted the fix-leak branch December 6, 2023 16:27
@Sangbaek
Collaborator

Sangbaek commented May 2, 2025

I had a similar issue and solved it by removing the maximum memory request (98d14b4).
Is this a bad idea?

@c-dilks
Member Author

c-dilks commented May 3, 2025

Step 1 runs on Slurm nodes, so we have to keep the maximum heap allocation below the memory request size; otherwise, Slurm will kill jobs that go over the limit.

Step 2 runs on the interactive node (though #290 can run it on Slurm), which we share with many other users, so it's wise to keep an upper limit and not hog resources.

I don't know if there's a smarter way; maybe one could improve the garbage collection. We also use -Xmx in practically every coatjava/bin script that wraps a java command.
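As an illustration of the Step 1 constraint, here is a hedged Java sketch of a start-up check that compares the JVM's heap limit (Runtime.getRuntime().maxMemory(), which reflects -Xmx when it is set) against the job's Slurm memory request. HeapCheck, fitsInRequest, and the 80% headroom factor are hypothetical; reading SLURM_MEM_PER_NODE assumes the job was submitted with --mem, which exports the request in megabytes. The headroom is there because the JVM also uses non-heap memory that counts against the Slurm limit.

```java
/**
 * Hypothetical sketch: warn if the JVM maximum heap leaves too little headroom
 * below the Slurm memory request. Names and the headroom factor are assumptions.
 */
public class HeapCheck {

    /** True if maxHeapBytes fits within the given fraction of requestBytes. */
    static boolean fitsInRequest(long maxHeapBytes, long requestBytes, double headroom) {
        return maxHeapBytes <= (long) (requestBytes * headroom);
    }

    public static void main(String[] args) {
        // maxMemory() reports the heap limit the JVM will attempt to use (reflects -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Assumption: the job was submitted with --mem, so Slurm exports SLURM_MEM_PER_NODE in MB.
        String memMb = System.getenv("SLURM_MEM_PER_NODE");
        if (memMb == null) {
            System.out.println("not in a Slurm job with --mem; max heap = " + (maxHeap >> 20) + " MB");
            return;
        }
        long requestBytes = Long.parseLong(memMb.trim()) * 1024L * 1024L;
        // Keep ~20% of the request free for non-heap memory (metaspace, thread stacks, native buffers).
        if (!fitsInRequest(maxHeap, requestBytes, 0.8)) {
            System.err.printf("warning: max heap %d MB is too close to the Slurm request of %s MB%n",
                    maxHeap >> 20, memMb.trim());
        }
    }
}
```

For example, java -Xmx1536m HeapCheck inside a job submitted with --mem=2G should stay silent, since 1536 MB is below 80% of 2048 MB; the same check with -Xmx2g would print the warning.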

@Sangbaek
Collaborator

Thanks. I reverted my memory change (c02841d).

Indeed, 1.5 GB was enough to run the monitoring and timeline steps. I had to optimize the code.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

OutOfMemoryError for some EPICS and FTOF timelines

3 participants