Skip to content

Conversation

@roeap
Copy link
Collaborator

@roeap roeap commented Oct 3, 2025

Description

This is another attempt at making our CI a bit more pleasant to work with. The basic idea is - more smaller jobs as well as avoiding some redundant work.

we moved integration tests into their own job and streamlined the build job.

  • build/check: fmt, clippy (with --test), docs, on linux only
  • build/build: previous buid and check commands but without --test, all os
  • build/test: unit tests on all os

In integration we now run a job for every crate that has integration tests. azure, aws, gcp, hdfs, lakefs

@roeap roeap requested a review from rtyler as a code owner October 3, 2025 10:49
@codecov
Copy link

codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.22%. Comparing base (c5eb4c2) to head (c4dadb6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3806      +/-   ##
==========================================
- Coverage   76.07%   74.22%   -1.86%     
==========================================
  Files         145      145              
  Lines       45313    39500    -5813     
  Branches    45313    39500    -5813     
==========================================
- Hits        34474    29317    -5157     
+ Misses       9148     8771     -377     
+ Partials     1691     1412     -279     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@abhiaagarwal
Copy link
Contributor

I read this article the other day which might be helpful to avoid CI running out of space: https://kobzol.github.io/rust/2025/09/22/reducing-binary-size-of-rust-programs-with-debuginfo.html. You can compress debuginfo to keep tracebacks with RUSTFLAGS="-Clink-arg=-Wl,--compress-debug-sections=zlib"

@roeap
Copy link
Collaborator Author

roeap commented Oct 3, 2025

@abhiaagarwal - awesome, thanks! Will take a look.

@rtyler @fvaleye @ion-elgreco - not sure how far we would wnat to go (if we follow this approach at all), but for now I tried to get the failures done and reduce runtime ...

In teh current config we run only linux tests on PRs and windows / osx only on main. WIndows beings by far the biggest offender. I think here caches are kicking in more effectively and most builds are done after about 2mins. The full unit tests still take about 5 mins (10+ mins on windows).

In the many CI runs I observed, i never saw an out of memory exception happening though!

Thoughts?

@roeap roeap requested review from fvaleye and ion-elgreco October 3, 2025 13:21
roeap added 2 commits October 3, 2025 15:23
Signed-off-by: Robert Pack <[email protected]>
Copy link
Collaborator Author

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving some comments in the hopes of making this somehow reviewable.

Comment on lines -19 to -21
- name: checkout
uses: actions/checkout@v4

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to checkout the code before we can use local actions. so this must have been unused anyhow.


- uses: Swatinem/rust-cache@v2
- name: Setup Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actions-rs/toolchain seems unmaintained albeit still widely used. This action is maintained and has the additional benefit of also setting up Swatinem/rust-cache. My assumtion beeing that they spend much more time thinging about how than I am hoping to invest 😆.

Comment on lines -13 to -16
RUSTFLAGS: -C debuginfo=line-tables-only
# Disable incremental builds by cargo for CI which should save disk space
# and hopefully avoid final link "No space left on device"
CARGO_INCREMENTAL: 0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately removed most our rust flags etc. to see how well we perform out of the box. SO far we seem to be doing well.

One question came up along the way. Could it also be that CARGO_INCREMENTAL: 0 hurts reusability of cached data? not sure what cargo / rustc are writing out in the different modes, but if its is fewer larger files vs. more smaller files (which are in total more) it feel intuitive that this might be the case.

Just speculating though :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental cargo builds don't work with sccache. So it's the right decision.

Also, I think incremental builds theoretically are non-reproducible, so it's probably correct to not run it in CI. This job automatically does it https://github.com/actions-rust-lang/setup-rust-toolchain/blob/063a3b947b5c5bf7d5f87076c3e5e9784b776aa8/action.yml#L118

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abhiaagarwal - if I read the linked code correctly, this setting is also applied internally,, so we don't need to re-set this here. Or should we also be explicitly setting this?

Copy link
Contributor

@abhi-airspace-intelligence abhi-airspace-intelligence Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job already sets CARGO_INCREMENTAL=0, so you don't need to be explicitly setting it.

Comment on lines +133 to +137
--exclude deltalake \
--exclude deltalake-azure \
--exclude deltalake-aws \
--exclude deltalake-hdfs \
--exclude deltalake-lakefs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping that we would not build these packages, but looking at the logs it seems we still do. This was an attempt to get build time on windows down, but seems to not have much of an impact. SO we ended at just removing them.

Comment on lines -86 to -92
typos:
name: Spell Check
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Check spelling
uses: crate-ci/typos@v1
Copy link
Collaborator Author

@roeap roeap Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is on me ... This file triggers on the pull request target, which is why we checking the errors on the main branch and seeing failures if we changed files that had spelling errors on main, even if we fixed the errors in the PR.

Now this is in a dedicated workflow that triggers on pull request.

run: |
gmake setup-dat
cargo test \
--package deltalake-aws \
Copy link
Collaborator Author

@roeap roeap Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i looked through our crates and it seems that we are using a native-tls flag only in the AWS crate. We should make this a matrix if there are ever any more or I missed some here.

Comment on lines -46 to -47
- name: Check Rust
run: make check-rust
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our rust checks already check the python crate as well, so no need to repeat that check. Alternative is to also install clippy in case someone feels this check adds value.

@ion-elgreco
Copy link
Collaborator

@abhiaagarwal - awesome, thanks! Will take a look.

@rtyler @fvaleye @ion-elgreco - not sure how far we would wnat to go (if we follow this approach at all), but for now I tried to get the failures done and reduce runtime ...

In teh current config we run only linux tests on PRs and windows / osx only on main. WIndows beings by far the biggest offender. I think here caches are kicking in more effectively and most builds are done after about 2mins. The full unit tests still take about 5 mins (10+ mins on windows).

In the many CI runs I observed, i never saw an out of memory exception happening though!

Thoughts?

I'm fine with not running windows and Mac in PRS

@ion-elgreco ion-elgreco enabled auto-merge (squash) October 5, 2025 06:42
@roeap roeap disabled auto-merge October 5, 2025 07:53
@roeap roeap merged commit b8e2170 into delta-io:main Oct 5, 2025
39 of 40 checks passed
@roeap roeap deleted the ci/split-integration branch October 5, 2025 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants