Skip to content

feat(py): add custom progress reporter callbacks to installer#2187

Open
ritankarsaha wants to merge 1 commit intoconda:mainfrom
ritankarsaha:feat/py-installer-progress-callbacks
Open

feat(py): add custom progress reporter callbacks to installer#2187
ritankarsaha wants to merge 1 commit intoconda:mainfrom
ritankarsaha:feat/py-installer-progress-callbacks

Conversation

@ritankarsaha
Copy link
Contributor

Description

Fixes thr TODO - Accept functions to report progress in py-rattler/src/installer.rs.

  • PyReporter (Rust) — a new struct wrapping a PyObject that implements the full rattler::install::Reporter trait. Each of the 16 lifecycle methods acquires the GIL and calls the corresponding method on the Python object. Methods that return usize opaque tokens extract the Python return value (defaulting to 0).
  • InstallerReporter (Python) — a base class with all 16 callbacks as documented, no-op methods. Users subclass it and override only what they need.
  • install() updated — accepts a new optional reporter: InstallerReporter | None = None kwarg. When provided it is used instead of the built-in IndicatifReporter; show_progress is ignored (backwards-compatible).
  • Exports — InstallerReporter is re-exported from rattler.install and the top-level rattler package.

Usage example

  from rattler import install, InstallerReporter

  class MyReporter(InstallerReporter):
      def on_transaction_start(self, total_operations: int) -> None:
          print(f"Installing {total_operations} packages…")

      def on_download_progress(self, download_idx, progress, total):
          pct = f"{progress}/{total}" if total else str(progress)
          print(f"  [{download_idx}] {pct} bytes")

      def on_transaction_complete(self) -> None:
          print("Done!")

  await install(records, target_prefix="/my/env", reporter=MyReporter())

Fixes #2186

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added sufficient tests to cover my changes.

@baszalmstra
Copy link
Collaborator

Did you meaure the performance overhead?

@ritankarsaha
Copy link
Contributor Author

Did you meaure the performance overhead?

Nope, didn't benchmark this. I think your concern is that on_download_progress fires once per downloaded chunk
and each call acquires the GIL via Python::with_gil. On large environments with concurrent downloads this could stall Tokio worker threads while waiting for the GIL, degrading overall throughput ?

I guess the straightforward mitigation would be to throttle on_download_progress at the Rust level (e.g. skip the
GIL acquisition if fewer than 100ms have elapsed since the last call for a given download index), which would reduce GIL pressure.

What do you think @baszalmstra ? Is this approach correct? Should I add it here or do in a follow up PR or try something else?

@baszalmstra

@baszalmstra
Copy link
Collaborator

That is indeed my concern, but I might be wrong so first measuring the impact seems like the logical choice

@ritankarsaha
Copy link
Contributor Author

That is indeed my concern, but I might be wrong so first measuring the impact seems like the logical choice

got it, on it

@ritankarsaha
Copy link
Contributor Author

That is indeed my concern, but I might be wrong so first measuring the impact seems like the logical choice

Hello @baszalmstra -

Here are the results:

Phase A warm-cache (linking only)

  • Baseline: 109.6 ms mean | Reporter: 110.1 ms mean
  • Overhead: +0.4% (+0.5 ms) — negligible

Phase B cold-cache (download + linking)

  • Baseline: 415.9 ms mean | Reporter: 344.4 ms mean
  • Overhead: −17.2% (−71.4 ms) — the reporter is actually faster, which is noise from network variance

Key observations:

  1. on_download_progress fired only 10 times for this small package (~500 KB). That's very low call volume.
  2. The Phase B difference is within network jitter noise (stdev of ~30 ms vs 71 ms difference) — not a
    meaningful signal.
  3. The GIL acquisition cost is not a problem for this workload. With only 10 download-progress calls, there's
    nothing to throttle.

According to me, the overhead is essentially zero for this package size. For very large packages that trigger hundreds or thousands of on_download_progress calls, it might be worth revisiting, but the current data shows no measurable GIL penalty.

@baszalmstra
Copy link
Collaborator

Can you try with a larger package? :D rust-src or pytorch is pretty big.

@ritankarsaha
Copy link
Contributor Author

Can you try with a larger package? :D rust-src or pytorch is pretty big.

Ahh sure, trying it out !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python installer doesn't support custom progress callbacks

2 participants