Skip to content

Conversation

@BretasArthur1
Copy link

@BretasArthur1 BretasArthur1 commented Nov 20, 2025

Context

Recently I was testing generating IR with different languages and using LLVM to generate the bitcode, then using sbpf-linker to generate a .so file

Problem

In the process we need this extra step to generate the .bc with LLVM. I started a thread on X about adding .ll files as parameters for the sbpf-linker and then and alessandro mention the possibility of adding it for bpf-linker and here we are!


This change is Reviewable

@alessandrod alessandrod self-requested a review November 20, 2025 10:41
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

@tamird reviewed 4 of 5 files at r1, all commit messages.
Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod)


-- commits line 2 at r1:
it would be great to include your motivation here


src/llvm/mod.rs line 144 at r1 (raw file):

    linked
}
#[must_use]

newline between functions please


src/llvm/mod.rs line 148 at r1 (raw file):

    context: &'ctx LLVMContext,
    module: &mut LLVMModule<'ctx>,
    buffer: &CStr,

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]


src/llvm/mod.rs line 176 at r1 (raw file):

        linked = unsafe { LLVMLinkModules2(module.as_mut_ptr(), temp_module) } == 0;
    } else {
        if !error_msg.is_null() {

we should return the error message.


src/llvm/mod.rs line 179 at r1 (raw file):

            unsafe { LLVMDisposeMessage(error_msg) };
        }
        if !temp_module.is_null() {

this should be impossible, no?


Cargo.toml line 43 at r1 (raw file):

rustc-build-sysroot = { workspace = true }
which = { version = "8.0.0", default-features = false, features = ["real-sys", "regex"] }
tempfile = "3.13"

could you keep this alphabetical please and use { version = ... } for consistency?


src/linker.rs line 514 at r1 (raw file):

    // buffer used to perform file type detection
    let mut buf = [0u8; 1024];

could you add a comment here, or maybe make this more general so that this arbitrary size isn't needed? e.g. i wonder if BufRead would help?


src/linker.rs line 600 at r1 (raw file):

        }
        InputType::Ir => {
            data.push(0); // force push null terminator

I think you can just use CString::new(data) which will internally append the nul byte.


src/linker.rs line 914 at r1 (raw file):

        Some(position) => &data[position..],
        None => return false,
    };

data.trim_ascii_start?

Code quote:

    // Trim whitespace from the start of the data
    let trimmed = match data.iter().position(|b| !b.is_ascii_whitespace()) {
        Some(position) => &data[position..],
        None => return false,
    };

src/linker.rs line 923 at r1 (raw file):

        || trimmed.starts_with(b"target ")
        || trimmed.starts_with(b"define")
        || trimmed.starts_with(b"!llvm")

might be easier to read as [.....].iter().any(|prefix| trimmed.starts_with(prefix))

Code quote:

    trimmed.starts_with(b"; ModuleID")
        || trimmed.starts_with(b"target triple")
        || trimmed.starts_with(b"target datalayout")
        || trimmed.starts_with(b"source_filename")
        || trimmed.starts_with(b"target ")
        || trimmed.starts_with(b"define")
        || trimmed.starts_with(b"!llvm")

@alessandrod
Copy link
Collaborator

alessandrod commented Nov 20, 2025

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]

because deep inside parse IR, LLVM hardcodes a RequiresTerminator=true, so you get a debug assertion if you set this to false and don't null terminate

image

@tamird
Copy link
Member

tamird commented Nov 20, 2025

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]

because deep inside parse IR, LLVM hardcodes a RequiresTerminator=true, so you get a debug assertion if you set this to false and don't null terminate

image

Sounds like a great comment to leave.

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 514 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

could you add a comment here, or maybe make this more general so that this arbitrary size isn't needed? e.g. i wonder if BufRead would help?

What you suggest as a comment for this one?

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 10 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 514 at r1 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

What you suggest as a comment for this one?

I would avoid the arbitrary-size buffer if possible - in the case of IR you could have unbounded whitespace before the thing you look for in is_llvm_ir.

@BretasArthur1 BretasArthur1 requested a review from tamird November 21, 2025 17:37
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed 1 of 4 files at r2, all commit messages.
Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 915 at r2 (raw file):

fn is_llvm_ir(data: &[u8]) -> bool {
    let trimmed = data.trim_ascii_start();
    if trimmed.is_empty() {

this check is not needed, right?


src/llvm/mod.rs line 152 at r2 (raw file):

    let buffer_name = c"ir_buffer";
    let buffer = buffer.to_bytes();
    let mem_buffer = unsafe {

don't you need to unsafe { LLVMDisposeMemoryBuffer(buffer) }; like the function above?


src/llvm/mod.rs line 157 at r2 (raw file):

            buffer.len(),
            buffer_name.as_ptr(),
            1, // LLVM internally sets RequiresTerminator=true

I am now confused by this. Can we just set it to 0 and then we wouldn't need to create a C-string?

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 915 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this check is not needed, right?

Now that we switched to the approach of no hardcoded buffer I need to remove this, sorry


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

don't you need to unsafe { LLVMDisposeMemoryBuffer(buffer) }; like the function above?

On some tests I was doing this but, if we drop the buffer here it can't hold the internal reference to it and perform the checks, but this was with the previous approach, I need to check now!


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I am now confused by this. Can we just set it to 0 and then we wouldn't need to create a C-string?

So because of the llvm ir parser we need to set this to one because of that hardcoded value setting null termination to true even if the buffer don't have it... This was a issue alessandro found it with llvm in debug mode, but maybe he can answer this better
cc @alessandrod

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

So because of the llvm ir parser we need to set this to one because of that hardcoded value setting null termination to true even if the buffer don't have it... This was a issue alessandro found it with llvm in debug mode, but maybe he can answer this better
cc @alessandrod

Now that I am looking at the screenshot he posted, I think that was just because of this 1. If you change it to 0 I think you do not need a null terminator.

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 11 unresolved discussions (waiting on @alessandrod and @tamird)


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Now that I am looking at the screenshot he posted, I think that was just because of this 1. If you change it to 0 I think you do not need a null terminator.

That print was testing with 0

@BretasArthur1
Copy link
Author

Done, removed the unnecessary check and also tested with unsafe { LLVMDisposeMemoryBuffer(buffer) }; and the tests panic. cc @tamird

@BretasArthur1 BretasArthur1 requested a review from tamird November 21, 2025 22:32
@BretasArthur1
Copy link
Author

hey @tamird gm! any blockers here ?

@alessandrod
Copy link
Collaborator

@tamird wrt the memory buffer thing, I think we actually need to change all the other instances to be null terminated

The RequiresNullTerminator field isn't stored anywhere, it's only used to do a check when the buffer is created https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/Support/MemoryBuffer.cpp#L48

Then this is why we get the assertion in the screenshot:

LLVMParseIRInContext => parseIR => parseAssembly => parseAssemblyInto

https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/AsmParser/Parser.cpp#L30

std::unique_ptr<MemoryBuffer> Buf = MemoryBuffer::getMemBuffer(F);

https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/include/llvm/Support/MemoryBuffer.h#L138

so RequiresNullTerminator=true and we hit the assertion if we don't null terminate our buffer

I think in the other instances we're lucky and we're not hitting the assertion by accident

src/linker.rs Outdated
let mut buf = BufReader::new(input);

// Peek at the buffer to determine file type
let preview = buf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still an arbitrary size buffer (the size of the internal BufReader
buffer which is 4096 by default)

You don't need BufReader here. What you want to do is pass input to
detect_input_type instead of passing &[u8]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, just saw on the implementation

"BufReader can improve the speed of programs that make small and repeated read calls to the same file or network socket. It does not help when reading very large amounts at once, or reading just one or a few times. It also provides no advantage when reading from a source that is already in memory, like a Vec"

) -> Result<bool, String> {
let buffer_name = c"ir_buffer";
let buffer = buffer.to_bytes();
let mem_buffer = unsafe {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're leaking this you need to call LLVMDisposeMemoryBuffer before returning

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Corrupting IR content
let invalid_content = valid_content
.replace("define", "defXne")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to corrupt 3 things, pick one :D

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed 1 of 3 files at r4, all commit messages.
Reviewable status: 3 of 5 files reviewed, 16 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

On some tests I was doing this but, if we drop the buffer here it can't hold the internal reference to it and perform the checks, but this was with the previous approach, I need to check now!

as @alessandrod said below, you are leaking here


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

That print was testing with 0

Could you please add a citation? i.e. please link to the relevant LLVM code.


src/linker.rs line 885 at r5 (raw file):

}

fn detect_input_type(reader: &mut (impl Read + Seek)) -> Option<InputType> {

i think you don't need the parenthesis around Read + Seek


src/linker.rs line 887 at r5 (raw file):

fn detect_input_type(reader: &mut (impl Read + Seek)) -> Option<InputType> {
    let mut header = [0u8; 8];
    let bytes_read = reader.read(&mut header).ok()?;

please don't use result.ok -- this discards information. why is it ok to throw away this error?


src/linker.rs line 898 at r5 (raw file):

        b"\xcf\xfa\xed\xfe" => Some(InputType::MachO),
        _ => {
            reader.rewind().ok()?;

why is it ok to toss this error, and why is it ok to rewind the reader only here? in other words, the contract of this function is unclear; what will be the state of the reader when it returns?


src/linker.rs line 924 at r5 (raw file):

    loop {
        let bytes_read = reader.read(&mut byte).ok()?;

reading one byte at a time is wildly inefficient

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 5 files reviewed, 16 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 898 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why is it ok to toss this error, and why is it ok to rewind the reader only here? in other words, the contract of this function is unclear; what will be the state of the reader when it returns?

You're right, I will refactor and self contain this logic inside of is_llvm_ir since it's for the specific funtion.


src/linker.rs line 924 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

reading one byte at a time is wildly inefficient

Working on a improvement


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

as @alessandrod said below, you are leaking here

Actually I'm not sure. If I add the unsafe { LLVMDisposeMemoryBuffer(mem_buffer) }; on the success path before returning Ok(linked) the test panic's with a SIGSEGV error, probably meaning that we force the drop of the memory that it will perform the assertion. And since LLVMParseIRInContext takes ownership of the buffer, it should resolve the drop by itself.

ref: https://github.com/llvm/llvm-project/blob/00276b67d36a665119a6a7b39dbba69f45c44e58/llvm/lib/IRReader/IRReader.cpp#L122

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 5 files reviewed, 16 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Actually I'm not sure. If I add the unsafe { LLVMDisposeMemoryBuffer(mem_buffer) }; on the success path before returning Ok(linked) the test panic's with a SIGSEGV error, probably meaning that we force the drop of the memory that it will perform the assertion. And since LLVMParseIRInContext takes ownership of the buffer, it should resolve the drop by itself.

ref: https://github.com/llvm/llvm-project/blob/00276b67d36a665119a6a7b39dbba69f45c44e58/llvm/lib/IRReader/IRReader.cpp#L122

I think you are right which is utterly crazy because LLVMParseBitCodeInContext2 does not take ownership though LLVMParseBitCodeInContext does!

Please add a comment on the call to LLVMParseIRInContext to explain this madness.

@BretasArthur1 BretasArthur1 requested a review from tamird December 1, 2025 23:15
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BretasArthur1 please address all the comments before requesting re-review. There are many outstanding comments.

@tamird reviewed all commit messages.
Reviewable status: 2 of 5 files reviewed, 17 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 598 at r6 (raw file):

            let data = CString::new(data).unwrap();
            if !llvm::link_ir_buffer(context, module, &data)
                .map_err(|_| LinkerError::LinkModuleError(path.to_owned()))?

why throw away the error here?

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about it @tamird, besides the most recent one, which one I missed?

Reviewable status: 2 of 5 files reviewed, 17 unresolved discussions (waiting on @alessandrod and @tamird)


-- commits line 2 at r1:

Previously, tamird (Tamir Duberstein) wrote…

it would be great to include your motivation here

Done.


Cargo.toml line 43 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

could you keep this alphabetical please and use { version = ... } for consistency?

Done.


src/linker.rs line 600 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think you can just use CString::new(data) which will internally append the nul byte.

Done.


src/linker.rs line 914 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

data.trim_ascii_start?

Done.


src/linker.rs line 923 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

might be easier to read as [.....].iter().any(|prefix| trimmed.starts_with(prefix))

Done.


src/linker.rs line 885 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i think you don't need the parenthesis around Read + Seek

You mean, creating a generic with bounds?


src/linker.rs line 887 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

please don't use result.ok -- this discards information. why is it ok to throw away this error?

Done.


src/linker.rs line 924 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Working on a improvement

Done.


src/llvm/mod.rs line 144 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

newline between functions please

Done.


src/llvm/mod.rs line 148 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why does this need a c-string? LLVMCreateMemoryBufferWithMemoryRange takes a pointer and a length, i think this can be &[u8]

Done.


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Could you please add a citation? i.e. please link to the relevant LLVM code.

Done.

) -> Result<bool, String> {
let buffer_name = c"ir_buffer";
let buffer = buffer.to_bytes();
let mem_buffer = unsafe {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Corrupting IR content
let invalid_content = valid_content
.replace("define", "defXne")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 12 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


-- commits line 2 at r1:

Previously, BretasArthur1 (Arthur Bretas) wrote…

Done.

Not done. Please squash to a single commit and include why, not just what.


src/linker.rs line 885 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

You mean, creating a generic with bounds?

I mean you can write &mut impl Read + Seek, I think?


src/linker.rs line 887 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Done.

huh? not done.


src/linker.rs line 924 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Done.

Why not use a BufReader here? Seems like you've reinvented it here. Reading a byte at a time from a BufReader is fine because it buffers internally. This function can be much simpler, I think.


src/llvm/mod.rs line 148 at r1 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Done.

The comment explaining that LLVM has a hard-coded RequiresNullTerminator=1 (along with a citation) should be here on the function to explain the API.


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Done.

This comment is written pretty strangely and also is anchored in the wrong place - it doesn't explain this 1, it explains why this function takes a CStr.


src/linker.rs line 927 at r6 (raw file):

        .rewind()
        .inspect_err(|e| error!("failed to rewind reader for LLVM IR detection: {}", e))
        .ok()?;

same problem here, this is discarding the error. please avoid random error prints

@BretasArthur1
Copy link
Author

idk why, but my reviewable wasn't showing some comments, sorry abt it

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 12 unresolved discussions (waiting on @alessandrod and @tamird)


-- commits line 2 at r1:

Previously, tamird (Tamir Duberstein) wrote…

Not done. Please squash to a single commit and include why, not just what.

Ok, will be doing before the merge


src/linker.rs line 885 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I mean you can write &mut impl Read + Seek, I think?

we actually do need... without the parenthesis we hit ambiguous + in a type


src/linker.rs line 887 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

huh? not done.

fixing it


src/linker.rs line 924 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Why not use a BufReader here? Seems like you've reinvented it here. Reading a byte at a time from a BufReader is fine because it buffers internally. This function can be much simpler, I think.

BufReader haves that internal size limit, I was trying to avoid it. I opt to a approach were we read all the file until we found a non-whitespace character... I can use the read_to_end that may cause some overhead, but I can also use the BufReader here, but maybe it would fail if we just check the 4096 and we have more then that of whitespaces... wdyt?


src/linker.rs line 598 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why throw away the error here?

fixing it, I'll create a new error type returning the path and the error message.


src/linker.rs line 927 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

same problem here, this is discarding the error. please avoid random error prints

I will need to change both signatures to unwind the error... I'll work on it


src/llvm/mod.rs line 148 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

The comment explaining that LLVM has a hard-coded RequiresNullTerminator=1 (along with a citation) should be here on the function to explain the API.

now, done


src/llvm/mod.rs line 152 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think you are right which is utterly crazy because LLVMParseBitCodeInContext2 does not take ownership though LLVMParseBitCodeInContext does!

Please add a comment on the call to LLVMParseIRInContext to explain this madness.

Done.


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This comment is written pretty strangely and also is anchored in the wrong place - it doesn't explain this 1, it explains why this function takes a CStr.

Oh ok, addressing the fix

@BretasArthur1 BretasArthur1 requested a review from tamird December 2, 2025 04:16
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is red.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed all commit messages.
@tamird dismissed @alessandrod from a discussion.
Reviewable status: 2 of 5 files reviewed, 8 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


-- commits line 2 at r1:

Previously, BretasArthur1 (Arthur Bretas) wrote…

Ok, will be doing before the merge

Do it now?


src/linker.rs line 924 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

BufReader haves that internal size limit, I was trying to avoid it. I opt to a approach were we read all the file until we found a non-whitespace character... I can use the read_to_end that may cause some overhead, but I can also use the BufReader here, but maybe it would fail if we just check the 4096 and we have more then that of whitespaces... wdyt?

What internal limit? It's not a limit, it will read more data as needed. Either I've not understood you or you've not understood BufReader.


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

Oh ok, addressing the fix

Please remove this comment and set this to zero. There is absolutely no reason to pass 1 here.


src/linker.rs line 48 at r7 (raw file):

    /// Linking an IR module failed.
    #[error("failure linking IR module `{0}`: {1}")]
    LinkIrModuleError(PathBuf, String),

This is a failure to parse IR, not to link it.

I am getting the feeling that this work is very rushed. Please, take the necessary time to review your own work.


src/linker.rs line 928 at r7 (raw file):

    ];

    const MAX_PREFIX_LEN: usize = 17; // "target datalayout"

just make the string the constant and use .len()


src/llvm/mod.rs line 149 at r7 (raw file):

/// The buffer must be null-terminated (hence `&CStr`), because LLVM's IR parser
/// requires `RequiresNullTerminator=1` when creating the memory buffer.
/// See: https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/Support/MemoryBuffer.cpp#L48

This is irrelevant? The only thing that matters is the link below to the header which has a default parameter = true and the actual call to that function which doesn't pass that parameter (you need to find that call and cite it here)

Code quote:

/// requires `RequiresNullTerminator=1` when creating the memory buffer.
/// See: https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/Support/MemoryBuffer.cpp#L48

@BretasArthur1
Copy link
Author

src/linker.rs line 48 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This is a failure to parse IR, not to link it.

I am getting the feeling that this work is very rushed. Please, take the necessary time to review your own work.

Ok, you're right. Let me took the much as needed to review this, sorry about the back and forth

@BretasArthur1
Copy link
Author

src/linker.rs line 924 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

What internal limit? It's not a limit, it will read more data as needed. Either I've not understood you or you've not understood BufReader.

I'll use it. but what I was talking don't apply here since we already have the buffer and can simply use BufReader to look into it

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 8 unresolved discussions (waiting on @alessandrod and @tamird)


src/linker.rs line 928 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

just make the string the constant and use .len()

I'll remove this logic


src/llvm/mod.rs line 157 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Please remove this comment and set this to zero. There is absolutely no reason to pass 1 here.

Done.


src/llvm/mod.rs line 149 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This is irrelevant? The only thing that matters is the link below to the header which has a default parameter = true and the actual call to that function which doesn't pass that parameter (you need to find that call and cite it here)

Done.

@BretasArthur1 BretasArthur1 requested a review from tamird December 3, 2025 15:43
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed all commit messages.
Reviewable status: 2 of 5 files reviewed, 9 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/linker.rs line 924 at r5 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

I'll use it. but what I was talking don't apply here since we already have the buffer and can simply use BufReader to look into it

I sent #329 to just make this whole thing simpler.


-- commits line 5 at r8:
s/it/their/
s/a extra/an extra/
s/on`/through `/


-- commits line 7 at r8:
s/Rust/rustc/


-- commits line 17 at r8:
drop this, it's just repeating code and useless without looking at the code.

Code quote:

  Changes:
  - Add InputType::Ir variant for .ll files
  - Implement is_llvm_ir() to detect IR files by common prefixes
    (; ModuleID, source_filename, target datalayout, etc.)
  - Add link_ir_buffer() to parse and link IR modules using
    LLVMParseIRInContext
  - Add IRParseError for better error handling when IR parsing fails
  - Update detect_input_type() to handle IR detection as fallback

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 9 unresolved discussions (waiting on @alessandrod and @tamird)


-- commits line 17 at r8:

Previously, tamird (Tamir Duberstein) wrote…

drop this, it's just repeating code and useless without looking at the code.

Done with the fixes on the PR message


src/linker.rs line 924 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I sent #329 to just make this whole thing simpler.

Thanks, I'll take a look and leave my review, once it's merged I can rebase and use it here.

@BretasArthur1
Copy link
Author

BretasArthur1 commented Dec 4, 2025

Should we wait #329 to resolve this one ? cc @tamird @alessandrod

@tamird
Copy link
Member

tamird commented Dec 5, 2025

Please rebase and fix broken CI. Now that we are reading the entire input into memory you shouldn't need to mess with buffered readers and do manual seeking.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed 1 of 2 files at r8, 2 of 3 files at r10, 1 of 1 files at r11, all commit messages.
Reviewable status: 4 of 5 files reviewed, 8 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


tests/ir_file_test.rs line 1 at r11 (raw file):

#![expect(unused_crate_dependencies, reason = "used in lib/bin")]

i know we mostly have integration tests, but the linker is now a library -- can this be a unit test?


src/llvm/mod.rs line 152 at r11 (raw file):

/// https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/include/llvm/Support/MemoryBuffer.h#L134
/// Called by `LLVMParseIRInContext` via `parseIR`:
/// https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/lib/IRReader/IRReader.cpp#L122

what is this link to? where's the getMemBuffer call?


src/llvm/mod.rs line 155 at r11 (raw file):

///
/// Without the null terminator, LLVM hits an assertion in debug builds:
/// https://github.com/llvm/llvm-project/blob/bde90624185ea2cead0a8d7231536e2625d78798/llvm/include/llvm/Support/MemoryBuffer.h#L138

wrong link? this is the same link as the first one


src/linker.rs line 545 at r11 (raw file):

    module: &mut LLVMModule<'ctx>,
    path: &Path,
    data: &[u8],

can you just pass &mut Vec<u8> here? in the InputType::Ir arm below you can then write

let data = CString::new(mem::take(data)).expect();
...

and you can get rid of the unreachable while you're at it

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 8 unresolved discussions (waiting on @alessandrod and @tamird)


-- commits line 5 at r8:

Previously, tamird (Tamir Duberstein) wrote…

s/it/their/
s/a extra/an extra/
s/on`/through `/

Done.


-- commits line 7 at r8:

Previously, tamird (Tamir Duberstein) wrote…

s/Rust/rustc/

Done.


src/linker.rs line 545 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you just pass &mut Vec<u8> here? in the InputType::Ir arm below you can then write

let data = CString::new(mem::take(data)).expect();
...

and you can get rid of the unreachable while you're at it

Ok


src/llvm/mod.rs line 152 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

what is this link to? where's the getMemBuffer call?

updated


src/llvm/mod.rs line 155 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

wrong link? this is the same link as the first one

deleted

@BretasArthur1 BretasArthur1 requested a review from tamird December 6, 2025 02:58
@BretasArthur1
Copy link
Author

I'm not sure why CI is red, I'll investigate tmrw

This enables bpf-linker to recieve LLVM IR text files directly, to emit
their respective bitcode without needing an extra pass through llvm-as to convert
them to bitcode first. This is useful for linking IR files generated
by any language frontend (not just rustc), making the linker more flexible
for mixed-language BPF projects.
@BretasArthur1
Copy link
Author

Ok, now it's good to review @tamird

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed 1 of 3 files at r12, all commit messages.
Reviewable status: 3 of 5 files reviewed, 5 unresolved discussions (waiting on @alessandrod and @BretasArthur1)


src/llvm/mod.rs line 152 at r11 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

updated

OK so there are deeper reasons here than this -- it turns out that LLVM parsers rely on null termination for a performance optimization. The type safety of these interfaces is utter dog shit. Can you please document this fact that took me a day to chase down?


src/linker.rs line 524 at r13 (raw file):

            ty => {
                info!("linking file {} type {}", path.display(), ty);
                match link_data(context, &mut module, &path, &mut input.to_vec(), ty) {

ah, we don't have a vector in hand in the LinkerInput::Buffer case. Instead of this, please push back on my suggestion. In this case it seems that we need to use a Cow<_, [T]>

@BretasArthur1
Copy link
Author

src/linker.rs line 524 at r13 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ah, we don't have a vector in hand in the LinkerInput::Buffer case. Instead of this, please push back on my suggestion. In this case it seems that we need to use a Cow<_, [T]>

yep, I had to force the vec

@BretasArthur1
Copy link
Author

src/llvm/mod.rs line 152 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

OK so there are deeper reasons here than this -- it turns out that LLVM parsers rely on null termination for a performance optimization. The type safety of these interfaces is utter dog shit. Can you please document this fact that took me a day to chase down?

wow, ok, sure

Copy link
Author

@BretasArthur1 BretasArthur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 5 unresolved discussions (waiting on @alessandrod and @tamird)


src/llvm/mod.rs line 152 at r11 (raw file):

Previously, BretasArthur1 (Arthur Bretas) wrote…

wow, ok, sure

see if this is descriptive enough

@BretasArthur1 BretasArthur1 requested a review from tamird December 10, 2025 03:10
@BretasArthur1
Copy link
Author

hey @tamird @alessandrod any other additions/requests here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants