Avoid short reads and seeking of inputs #329

tamird · 2025-12-03T18:09:29Z

Use .display rather than Debug
Use mmap to avoid reading entire input files

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/linker.rs

alessandrod · 2025-12-03T23:35:29Z

I'm meh on this change, virtually all inputs are rlibs so we're not mmapping anyway, and not clear what the point of mmapping here even is we're just going through a worse abstraction (mmap is top 3 worst unix abstractions)

tamird · 2025-12-04T01:16:31Z

> I'm meh on this change, virtually all inputs are rlibs so we're not mmapping anyway, and not clear what the point of mmapping here even is we're just going through a worse abstraction (mmap is top 3 worst unix abstractions)

I don't understand your objection. This change allows us to stop worrying about how much of the file we need to read to guess its type.

BretasArthur1 · 2025-12-04T01:33:47Z

> This change allows us to stop worrying about how much of the file we need to read to guess its type.

You mean by using mmap we can treat the file as a contiguous byte slice, so we can check the header or signature without manually allocating or reading into a buffer, right?

tamird · 2025-12-04T01:43:09Z

> > This change allows us to stop worrying about how much of the file we need to read to guess its type.

> You mean by using mmap we can treat the file as a contiguous byte slice, so we can check the header or signature without manually allocating or reading into a buffer, right?

Yes
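
To make the "contiguous byte slice" point concrete, here is a minimal sketch of sniffing an input's type from its leading bytes. It is not taken from this PR: the `InputKind` enum and `detect_kind` function are hypothetical names, and the only facts assumed are the well-known magic numbers for `ar` archives (which rlibs are), ELF objects, and raw LLVM bitcode. The function does not care whether the slice came from an mmap or from a buffer.

```rust
/// Input kinds a linker might distinguish (hypothetical names for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum InputKind {
    Archive,
    Elf,
    Bitcode,
    Unknown,
}

/// Guess the kind of an input from its leading magic bytes.
///
/// Works on any `&[u8]`, regardless of how the bytes were obtained.
pub fn detect_kind(bytes: &[u8]) -> InputKind {
    // `ar` archive signature; rlibs are `ar` archives.
    const AR_MAGIC: &[u8] = b"!<arch>\n";
    // ELF object signature.
    const ELF_MAGIC: &[u8] = b"\x7fELF";
    // Raw LLVM bitcode signature.
    const BITCODE_MAGIC: &[u8] = b"BC\xC0\xDE";

    // `starts_with` returns false for inputs shorter than the magic,
    // so no explicit length check is needed.
    if bytes.starts_with(AR_MAGIC) {
        InputKind::Archive
    } else if bytes.starts_with(ELF_MAGIC) {
        InputKind::Elf
    } else if bytes.starts_with(BITCODE_MAGIC) {
        InputKind::Bitcode
    } else {
        InputKind::Unknown
    }
}
```

Because the check is a prefix comparison on a slice, there is no need to decide up front how many bytes to read before guessing the type.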

BretasArthur1 · 2025-12-04T02:21:15Z

And what's the trade off with this approach?

tamird · 2025-12-04T02:24:48Z

It's a few syscalls, but it's constant. ChatGPT can give a more complete answer.

alessandrod · 2025-12-04T02:26:47Z

> > I'm meh on this change, virtually all inputs are rlibs so we're not mmapping anyway, and not clear what the point of mmapping here even is we're just going through a worse abstraction (mmap is top 3 worst unix abstractions)

> I don't understand your objection. This change allows us to stop worrying about how much of the file we need to read to guess its type.

But who is worried and why? We can just read_to_end, we're already reading everything for rlibs which are 99% of the inputs anyway. Mmap just adds an extra layer of (bad) abstraction and an extra dependency for no actual material gain imo.

tamird · 2025-12-04T02:41:33Z

> > > I'm meh on this change, virtually all inputs are rlibs so we're not mmapping anyway, and not clear what the point of mmapping here even is we're just going through a worse abstraction (mmap is top 3 worst unix abstractions)

> > I don't understand your objection. This change allows us to stop worrying about how much of the file we need to read to guess its type.

> But who is worried and why? We can just read_to_end, we're already reading everything for rlibs which are 99% of the inputs anyway. Mmap just adds an extra layer of (bad) abstraction and an extra dependency for no actual material gain imo.

I don't understand this use of the word "already". The current implementation reads twice for rlibs, this gets that down to once. I also don't understand the complaints about mmap, we're pretty insulated from whatever it is you dislike and just get a byte slice.
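
For readers following along, the "reads twice" versus "reads once" distinction can be sketched roughly as below. This is a hypothetical illustration, not the linker's actual code: `sniff_then_reread` and `read_once` are made-up names, and the 8-byte `ar` magic stands in for whatever header check the real implementation performs.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Short-read pattern: read a small header, seek back, then read everything.
/// The input must be seekable, and the header bytes are read twice.
pub fn sniff_then_reread<R: Read + Seek>(mut input: R) -> io::Result<(bool, Vec<u8>)> {
    let mut magic = [0u8; 8];
    // `read_exact` fails with `UnexpectedEof` on inputs shorter than 8 bytes,
    // which has to be handled as "not an archive" rather than as an error.
    let is_archive = match input.read_exact(&mut magic) {
        Ok(()) => &magic == b"!<arch>\n",
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => false,
        Err(e) => return Err(e),
    };
    // Rewind so the full read starts at the beginning again.
    input.seek(SeekFrom::Start(0))?;
    let mut contents = Vec::new();
    input.read_to_end(&mut contents)?;
    Ok((is_archive, contents))
}

/// Single-read pattern: read everything once, then inspect the slice.
/// No `Seek` bound and no short-read special case.
pub fn read_once<R: Read>(mut input: R) -> io::Result<(bool, Vec<u8>)> {
    let mut contents = Vec::new();
    input.read_to_end(&mut contents)?;
    let is_archive = contents.starts_with(b"!<arch>\n");
    Ok((is_archive, contents))
}
```

Both functions return the same result for the same input; the second simply drops the `Seek` requirement and the short-input edge case.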

The real payoff is in the simplicity we get in #323 as a result.

alessandrod · 2025-12-04T03:05:39Z

> The real payoff is in the simplicity we get in #323 as a result.

Is it tho? Isn't it a lot simpler to do read_to_end and then just pass the slice around? No new dependencies, no mmap? BPF files are tiny, we don't need the demand paging mmap gives you. We can read the whole thing in one line and call it a day.
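
As a rough sketch of the "read the whole thing in one line" alternative, using only the standard library: `load_input` and `has_magic` are hypothetical helper names chosen for illustration, not functions from this repository.

```rust
use std::{fs, io, path::Path};

/// Read an entire input file into memory in one call.
/// `fs::read` opens the file and reads it to the end.
pub fn load_input(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

/// Downstream code only needs a borrowed slice of the loaded bytes.
pub fn has_magic(bytes: &[u8], magic: &[u8]) -> bool {
    bytes.starts_with(magic)
}
```

A caller would hold the `Vec<u8>` returned by `load_input` and pass `&bytes` to whatever needs to inspect or parse it, with no extra dependency involved.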

tamird · 2025-12-04T03:38:53Z

> > The real payoff is in the simplicity we get in #323 as a result.

> Is it tho? Isn't it a lot simpler to do read_to_end and then just pass the slice around? No new dependencies, no mmap? BPF files are tiny, we don't need the demand paging mmap gives you. We can read the whole thing in one line and call it a day.

Seems strictly worse. What am I missing?

alessandrod · 2025-12-04T03:52:15Z

> Seems strictly worse. What am I missing?

The goal is to pass a slice around. How is reading into a buffer and passing the slice around worse and more complex than adding a dependency and doing a syscall to ask linux to do magic to let us pass a slice around?

alessandrod · 2025-12-04T03:54:34Z

Or I guess, another point of view:

When I see mmap I think that it's being used to demand page a file, not to pass &[u8]. Passing &[u8] is a type system thing. Mmap is a system level thing.

Mmap makes no sense for the kind of inputs we use. The min granularity of the page cache is 4k anyway, and it'll actually prefetch, so given our inputs, we're reading everything from disk anyway, just with more steps (even leaving aside that most inputs are archives so we read them anyway).

So just conceptually, it makes no sense to me to be doing this. If you want to pass a slice, read a buffer and pass a slice?

tamird · 2025-12-04T14:27:21Z

I split out the use of mmap into its own commit. In my mind using mmap frees us from having to decide whether input files are big or not - we just ask the kernel to deal with it, which it already has to since all reads go through the page cache.

If you want me to remove the tail commit, I can.

alessandrod · 2025-12-04T22:16:09Z

yes pls remove mmap - death to unnecessary dependencies! Otherwise looks good

tamird · 2025-12-04T22:44:25Z

Done. I'll save the branch for when we discover someone is sending us huge inputs :)

BretasArthur1 · 2025-12-05T01:28:32Z

So, this one will not affect #323 anymore, right? Should we wrap up the last changes there and get it ready to merge?

Use .display rather than Debug

25ede93

tamird mentioned this pull request Dec 3, 2025

feat: add support for LLVM IR files in the linker #323

Open

chatgpt-codex-connector bot reviewed Dec 3, 2025

tamird force-pushed the mmap-simpler branch 2 times, most recently from 30c0e42 to 4140f32 Compare December 3, 2025 18:18

tamird assigned vadorovsky Dec 3, 2025

Read entire input into memory

ac0f305

Avoid short reads and seeking.

tamird force-pushed the mmap-simpler branch from 4140f32 to d83b98e Compare December 4, 2025 14:26

tamird changed the title ~~Use mmap to avoid reading entire input files~~ Avoid short reads and seeking of inputs Dec 4, 2025

tamird force-pushed the mmap-simpler branch from d83b98e to ac0f305 Compare December 4, 2025 22:44

tamird merged commit ac0f305 into main Dec 5, 2025
78 of 79 checks passed

tamird deleted the mmap-simpler branch December 5, 2025 14:19
