Skip to content

Conversation

@jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Oct 21, 2025

Contributes to #6231

valgrind errors can be "suppressed" (treated as false positives") via a "suppression file". This proposes running valgrind with option --gen-suppressions=all, so that alongside each error it finds it'll also print configuration that could be added to such a file to suppress it.

Notes for Reviewers

Opening this from my fork to also test #7035 (comment)

How I tested this

After merging #7072, triggered a run like this:

gh workflow run \
    --repo Microsoft/LightGBM \
    r_valgrind.yml \
    -f pr-branch='jameslamb:ci/valgrind-suppressions' \
    -f pr-number=7068

That worked as expected: #7068 (comment)

And I see the valgrind suppression stuff in the logs, like this:

==6069== 368 bytes in 1 blocks are possibly lost in loss record 176 of 2,718
==6069==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==6069==    by 0x40147D9: calloc (rtld-malloc.h:44)
==6069==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==6069==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==6069==    by 0x4DAD7B4: allocate_stack (allocatestack.c:430)
==6069==    by 0x4DAD7B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==6069==    by 0x75B8328: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==6069==    by 0x157F2A34: std::thread::thread<LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&)::{lambda()#1}, , void>(LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&)::{lambda()#1}&&) (std_thread.h:163)
==6069==    by 0x157F1E4D: LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&) (pipeline_reader.h:56)
==6069==    by 0x157F5737: LightGBM::TextReader<int>::ReadAllAndProcess(std::function<void (int, char const*, unsigned long)> const&) (text_reader.h:103)
==6069==    by 0x157F3315: LightGBM::TextReader<int>::ReadAllLines() (text_reader.h:162)
==6069==    by 0x157E897F: LightGBM::DatasetLoader::LoadTextDataToMemory[abi:cxx11](char const*, LightGBM::Metadata const&, int, int, int*, std::vector<int, std::allocator<int> >*) (dataset_loader.cpp:976)
==6069==    by 0x157E3E90: LightGBM::DatasetLoader::LoadFromFile(char const*, int, int) (dataset_loader.cpp:238)
==6069==    by 0x15D1B673: LightGBM::DatasetLoader::LoadFromFile(char const*) (dataset_loader.h:26)
==6069==    by 0x15D079A3: LGBM_DatasetCreateFromFile (c_api.cpp:1051)
==6069== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:calloc
   fun:calloc
   fun:allocate_dtv
   fun:_dl_allocate_tls
   fun:allocate_stack
   fun:pthread_create@@GLIBC_2.34
   fun:_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE
   fun:_ZNSt6threadC1IZN8LightGBM14PipelineReader4ReadEPKciRKSt8functionIFmS4_mEEEUlvE_JEvEEOT_DpOT0_
   fun:_ZN8LightGBM14PipelineReader4ReadEPKciRKSt8functionIFmS2_mEE
   fun:_ZN8LightGBM10TextReaderIiE17ReadAllAndProcessERKSt8functionIFviPKcmEE
   fun:_ZN8LightGBM10TextReaderIiE12ReadAllLinesEv
   fun:_ZN8LightGBM13DatasetLoader20LoadTextDataToMemoryB5cxx11EPKcRKNS_8MetadataEiiPiPSt6vectorIiSaIiEE
   fun:_ZN8LightGBM13DatasetLoader12LoadFromFileEPKcii
   fun:_ZN8LightGBM13DatasetLoader12LoadFromFileEPKc
   fun:LGBM_DatasetCreateFromFile
}

@jameslamb
Copy link
Collaborator Author

Tried triggering it like this:

image

Which produced this run: https://github.com/microsoft/LightGBM/actions/runs/18674203178

Let's see how it goes.

@github-actions

This comment was marked as resolved.

@StrikerRUS
Copy link
Collaborator

Seems that something is wrong with Optional check:

Run # ref: https://docs.github.com/en/actions/reference/workflows-and-actions/contexts#github-context
checking status for branch 'ci/valgrind-suppressions'
Searching for latest run of 'r_valgrind.yml' on branch 'ci/valgrind-suppressions'
No runs of 'r_valgrind.yml' found on branch 'ci/valgrind-suppressions'

https://github.com/microsoft/LightGBM/actions/runs/18674187612/job/53240748669?pr=7068

I guess that the check was run before R valgrind workflow.

This ordering problem of workflows was the reason for the following code in previous ChatOps implementation:

all-optional-checks-successful:
timeout-minutes: 120

while True:
status = get_status(get_runs(trigger_phrase))
if status != "in-progress":
break
sleep(60)

@jameslamb
Copy link
Collaborator Author

I don't think it's an ordering problem. Polling like that should never be necessary because the valgrind workflow imperatively re-triggers the optional_checks workflow after it's done testing.

- name: Run tests with valgrind
shell: bash
run: ./.ci/test-r-package-valgrind.sh

- name: Rerun workflow-indicator
if: ${{ always() }}
run: |
bash $GITHUB_WORKSPACE/.ci/rerun-workflow.sh \
"optional_checks.yml" \
"${{ inputs.pr-number }}" \
"${{ inputs.pr-branch }}" \
|| true

I think that the optional-checks workflow was never just re-run.

The earliest log message for the Optional Checks run here is from 2025-10-21T05:46:12.3329454Z (build logs).

The valgrind run started about 1 minute later, at 2025-10-21T05:47:06.2482264Z (build logs).

So there is probably an issue with re-triggering the optional_checks workflow.

Thankfully with this new pattern, we can test that without needing a merge to master. I'll put up a separate LightGBM (non-fork) branch for testing the changes.

@jameslamb
Copy link
Collaborator Author

I think there's a secondary problem here... for runs from forks, the workflow run shows up as "on" master.

Screenshot 2025-10-21 at 11 53 07 PM

ref: https://github.com/microsoft/LightGBM/actions/workflows/r_valgrind.yml

That property comes from the "use workflow from:" dropdown here:

Screenshot 2025-10-21 at 11 55 00 PM

I hadn't noticed this in #7035 because there I was using a microsoft/LightGBM branch, so the "use workflow from:" branch always matched the PR branch.

But that won't work for forks, where we'll want "use workflow from:" to be a LightGBM branch but the actual run is happening on a branch from a fork. I'm not sure if the previous ChatOps implementation also had this limitation, there aren't any runs of this workflow on PRs from forks still visible in the history.

I'll try to find a fix :/

@StrikerRUS
Copy link
Collaborator

I remember that our complex old mechanism was exactly the consequence of forks proceedings...

In addition, now optional workflows will be triggered by ordinary comments in PRs, not a review ones. This change is required due to secret values limitation described above. GitHub emits runs triggered by review comments associated with a fork PR made from, and runs "belong" to upstream repo in case of issue comment.

#3740 (comment)

@github-actions
Copy link
Contributor

Workflow R valgrind tests has been triggered! 🚀

https://github.com/microsoft/LightGBM/actions/runs/18799045915

Status: success ✔️

@github-actions

This comment was marked as resolved.

@microsoft microsoft deleted a comment from github-actions bot Nov 10, 2025
@github-actions

This comment was marked as resolved.

```shell
gh workflow run \
--repo microsoft/LightGBM \
--ref ci/fix-rerun-workflow \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a mistake when I included it in #7072, sorry.

--ref is the CLI equivalent of the "use workflow from:" dropdown in the GitHub Actions UI (see screenshot in #7068 (comment)).

It controls which branch to source r_valgrind.yml from.

In most cases, that should just be the default branch in the repo. The --ref ci/fix-rerun-workflow here was left over from my testing on #7072.

@github-actions
Copy link
Contributor

Workflow R valgrind tests has been triggered! 🚀

https://github.com/microsoft/LightGBM/actions/runs/19257757316

Status: success ✔️

@jameslamb jameslamb changed the title WIP: [ci] generate valgrind suppressions in logs [ci] generate valgrind suppressions in logs Nov 11, 2025
@jameslamb
Copy link
Collaborator Author

Ok this now seems to be working. Sorry this has required so much reviewing effort, but I think it's worth it to now be able to test changes to workflows like this without needing a merge to master.

@jameslamb jameslamb marked this pull request as ready for review November 11, 2025 18:09
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@StrikerRUS StrikerRUS merged commit ddb3c0e into microsoft:master Nov 12, 2025
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants