burn-dataset: Catch import.py unsuccessful exits #3236

Merged: 1 commit into tracel-ai:main on May 30, 2025

Conversation

@drozdziak1 (Contributor) commented on May 27, 2025

Pull Request Template

Checklist

  • Confirmed that cargo run-checks command has been executed.
    • Note: some of these failed for me, but after looking at their code, I don't think the failures are related to this change.
      failures:
          tests::tensor::display::tests::test_display_2d_bool_tensor
          tests::tensor::display::tests::test_display_2d_float_tensor
          tests::tensor::display::tests::test_display_2d_int_tensor
          tests::tensor::display::tests::test_display_3d_tensor
          tests::tensor::display::tests::test_display_4d_tensor
          tests::tensor::display::tests::test_display_precision
          tests::tensor::display::tests::test_display_tensor_summarize_1
          tests::tensor::display::tests::test_display_tensor_summarize_2
          tests::tensor::display::tests::test_display_tensor_summarize_3
          types::tests::should_support_dual_byte_bridge
      
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

n/a (searched for "exit", "status", "sqlite")

Changes

Problem

I was chasing an unrelated issue that manifested as SqliteDataset being unable to open the database file created by importer.py. After running the script manually with the args that Burn used, it turned out to be a segfault. Looking at downloader.rs, I noticed that only the wait() result is validated, not the ExitStatus it carries. As a result, Ok was returned despite the importer.py segfault, which later caused problems in SqliteDataset.
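
To illustrate the failure mode, here is a minimal, self-contained sketch (names like run_importer are simplified stand-ins for illustration, not the actual downloader.rs code): wait() returns io::Result<ExitStatus>, and that Result is Ok whenever the child could be reaped at all, even if it died from a signal, so propagating only the io::Error misses the crash.

use std::process::Command;

// Stand-in for burn-dataset's error type; the real ImporterError lives in downloader.rs.
#[derive(Debug)]
enum ImporterError {
    Unknown(String),
}

fn run_importer() -> Result<(), ImporterError> {
    let mut handle = Command::new("python3")
        .arg("importer.py") // hypothetical invocation, for illustration only
        .spawn()
        .map_err(|err| ImporterError::Unknown(err.to_string()))?;

    // BUG: `?` only propagates the io::Error from wait() itself (e.g. the child
    // could not be reaped). A segfaulting child still yields Ok(ExitStatus),
    // and that status is silently dropped here.
    handle
        .wait()
        .map_err(|err| ImporterError::Unknown(err.to_string()))?;

    Ok(()) // reached even when the script died with SIGSEGV
}

fn main() {
    println!("{:?}", run_importer());
}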

Solution

Check for !ExitStatus::success() and return ImporterError::Unknown, as the existing code does.

Note: a cleaner solution might be something like handle.wait()?.exit_ok().map_err(...)?, but exit_ok() is not stable yet.
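
On stable Rust, the check therefore has roughly this shape (continuing the sketch above; the shape of the fix, not the exact diff):

// Replacement for the bare `handle.wait().map_err(...)?` above:
let status = handle
    .wait()
    .map_err(|err| ImporterError::Unknown(err.to_string()))?;

if !status.success() {
    // ExitStatus's Display includes the exit code, or on Unix the terminating
    // signal, e.g. "signal: 11 (SIGSEGV) (core dumped)".
    return Err(ImporterError::Unknown(status.to_string()));
}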

Testing

Before

$ cargo run --example text-generation
[...]
Creating SQL from Arrow format:   0%|                                                                                                                                | 0/560 [00:00<?, ?ba/s]
thread 'main' panicked at examples/text-generation/src/data/dataset.rs:40:14:
called `Result::unwrap()` on an `Err` value: SqliteDataset(ConnectionPool(Error(Some("unable to open database file: /home/drozdziak1/.cache/burn-dataset/dbpedia_14.db"))))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

After

$ cargo run --example text-generation
[...]
Creating SQL from Arrow format:   0%|                                                                                                                                | 0/560 [00:00<?, ?ba/s]
thread 'main' panicked at examples/text-generation/src/data/dataset.rs:40:14:
called `Result::unwrap()` on an `Err` value: Unknown("signal: 11 (SIGSEGV) (core dumped)")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
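
For reference, the "signal: 11 (SIGSEGV) (core dumped)" text comes straight from ExitStatus's Display impl; on Unix the raw signal is also queryable. A standalone sketch, independent of the PR's diff, where the sh incantation is just a stand-in that mimics the importer.py crash without needing pyarrow:

use std::os::unix::process::ExitStatusExt; // Unix-only, for .signal()
use std::process::Command;

fn main() -> std::io::Result<()> {
    // `sh -c 'kill -SEGV $$'` makes the child deliver SIGSEGV to itself.
    let status = Command::new("sh").args(["-c", "kill -SEGV $$"]).status()?;

    assert!(!status.success());
    println!("{status}");              // e.g. "signal: 11 (SIGSEGV)"
    println!("{:?}", status.signal()); // Some(11), i.e. SIGSEGV on Linux

    Ok(())
}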

codecov bot commented on May 30, 2025

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 82.12%. Comparing base (1d2423a) to head (97cd462).
Report is 6 commits behind head on main.

Files with missing lines                                 Patch %   Lines
.../burn-dataset/src/source/huggingface/downloader.rs    0.00%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3236      +/-   ##
==========================================
- Coverage   82.12%   82.12%   -0.01%     
==========================================
  Files         966      966              
  Lines      123257   123261       +4     
==========================================
+ Hits       101229   101232       +3     
- Misses      22028    22029       +1     

@laggui (Member) left a comment

LGTM!

Did you manage to find what the root segfault issue was?

@laggui merged commit 3bfe672 into tracel-ai:main on May 30, 2025
11 checks passed
@drozdziak1 (Contributor, Author) commented:

Yes and no: the general cause seems to be a disagreement between NixOS and the pyarrow native lib pulled in by Burn's venv, but the details remain unclear even when looking at the corefile.

My Nix shell provides a couple of things relevant to this crash, mostly libc, libstdc++, and Python (3.12). Full ldd output:

$ ldd ~/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_python.so.2000
        linux-vdso.so.1 (0x00007ffd867f9000)
        libarrow_substrait.so.2000 => /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_substrait.so.2000 (0x00007f6e00a00000)
        libarrow_dataset.so.2000 => /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_dataset.so.2000 (0x00007f6e00833000)
        libarrow_acero.so.2000 => /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_acero.so.2000 (0x00007f6e006b4000)
        libparquet.so.2000 => /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libparquet.so.2000 (0x00007f6dffe00000)
        libarrow.so.2000 => /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow.so.2000 (0x00007f6dfc800000)
        libdl.so.2 => /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libdl.so.2 (0x00007f6e00def000)
        librt.so.1 => /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/librt.so.1 (0x00007f6e00dea000)
        libstdc++.so.6 => /nix/store/w47x7y9xjgq7n3171ckg722ddwf2pi3v-gcc-13.3.0-lib/lib/libstdc++.so.6 (0x00007f6dfc400000)
        libm.so.6 => /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libm.so.6 (0x00007f6dffd19000)
        libgcc_s.so.1 => /nix/store/w47x7y9xjgq7n3171ckg722ddwf2pi3v-gcc-13.3.0-lib/lib/libgcc_s.so.1 (0x00007f6e00dc3000)
        libc.so.6 => /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libc.so.6 (0x00007f6dfc208000)
        libpthread.so.0 => /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libpthread.so.0 (0x00007f6e00dbe000)
        /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib64/ld-linux-x86-64.so.2 (0x00007f6e00ff0000)

Coredump excerpt

I stared at the dumped core with cgdb; here's some disassembly leading up to the crash:

48│    0x00007f60920abeac <+156>:   mov    0x153b55(%rip),%rax        # 0x7f60921ffa08 <arrow_ARRAY_API>
49│    0x00007f60920abeb3 <+163>:   mov    $0x7,%ecx
50│    0x00007f60920abeb8 <+168>:   mov    $0x1,%esi
51├──> 0x00007f60920abebd <+173>:   mov    0x10(%rax),%rd

and the thread's backtrace:

#0  0x00007f60920abebd in arrow::py::(anonymous namespace)::PandasWriter::EnsurePlacementAllocated() ()
   from /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_python.so.2000
#1  0x00007f60920b0ef4 in arrow::py::(anonymous namespace)::PandasWriter::Write(std::shared_ptr<arrow::ChunkedArray>, long, long) ()
   from /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_python.so.2000
#2  0x00007f60920ad037 in arrow::py::(anonymous namespace)::ConsolidatedBlockCreator::WriteTableToBlocks()::{lambda(int)#1}::operator()(int) const ()
   from /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_python.so.2000
#3  0x00007f60920bdb54 in arrow::internal::FnOnce<void ()>::FnImpl<std::_Bind<arrow::detail::ContinueFuture (arrow::Future<arrow::internal::Empty>, arrow::py::(anonymous namespace)::Consoli
datedBlockCreator::WriteTableToBlocks()::{lambda(int)#1}, int)> >::invoke() () from /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow_python.so.2000   
#4  0x00007f608f13a70c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::{lambda()#1}> > >::_M_run() ()
   from /home/drozdziak1/.cache/burn-dataset/venv/lib/python3.12/site-packages/pyarrow/libarrow.so.2000
#5  0x00007f608d6e86d3 in execute_native_thread_routine () from /nix/store/w47x7y9xjgq7n3171ckg722ddwf2pi3v-gcc-13.3.0-lib/lib/libstdc++.so.6
#6  0x00007f6093098af3 in start_thread () from /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libc.so.6
#7  0x00007f6093117f4c in __clone3 () from /nix/store/p9kdj55g5l39nbrxpjyz5wc1m0s7rzsx-glibc-2.40-66/lib/libc.so.6
