Read debug_names section from DWARF 5. #807
Adds sample implementation to dwarfdump that prints the same information as llvm-dwarfdump for MachO binaries.
Commits 83cda2a to ca83ea3
src/read/names.rs
Outdated
#[derive(Debug)]
pub struct DebugNamesUnit<R: Reader> {
    header: DebugNamesHeader<R>,
    content: R,
I haven't looked at much of the PR yet, but for some early feedback, this probably should have an R for each part of the unit. Then things like name_table_reader can simply return that R instead of needing to do size calculations. Once that is done, there's probably no need to store all the sizes from the header.
Ok, I didn't understand this pattern on my first reading of the code. If I understand correctly, I should store pre-sliced Reader instances for each section in DebugNamesUnit<R>, similar to the pattern in CFI (CommonInformationEntry, FrameDescriptionEntry). For example:
pub struct DebugNamesUnit<R: Reader> {
    // Metadata
    version: u16,
    format: Format,
    augmentation_string: R,
    // Counts for iteration
    comp_unit_count: u32,
    local_type_unit_count: u32,
    foreign_type_unit_count: u32,
    bucket_count: u32,
    name_count: u32,
    abbrev_table_size: u32,
    // Pre-sliced readers for each section
    comp_unit_list: R,
    local_type_unit_list: R,
    foreign_type_unit_list: R,
    hash_table_data: R,
    name_table_data: R,
    abbreviation_table: R,
    entry_pool: R,
}
Basically yes, but you should be able to avoid a lot of the counts. Probably the main one you need to keep is bucket_count, since that is needed for converting a hash to a bucket index. For the others, we only need to look entries up by index.
Related to this, I see a lot of functions for reading these tables and storing them as Vecs. We should avoid that for simple arrays. The abbreviation table will need to be parsed into another data structure, but the rest should be kept as R instead of parsing them into a Vec.
I haven't looked at this code closely, so I may be missing something, but the goal is to be zero copy where possible. These tables can potentially be quite large, and the user will only need to access a few entries at a time.
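To illustrate the pattern being described, here is a minimal standalone sketch, using a plain byte slice in place of gimli's `R: Reader`. The hash is the Bernstein (djb2) function that DWARF 5 specifies for .debug_names hashing (section 6.1.1.4.5); the fixed-stride table access shows how entries can be sliced on demand rather than parsed into a Vec up front. Names and signatures here are illustrative, not gimli's API.

```rust
// Sketch only: a byte slice stands in for the pre-sliced reader `R`.

// DWARF 5 .debug_names hash (Bernstein/djb2, section 6.1.1.4.5).
// The bucket for a name is this hash modulo bucket_count.
fn debug_names_hash(name: &[u8]) -> u32 {
    name.iter()
        .fold(5381u32, |h, &c| h.wrapping_mul(33).wrapping_add(u32::from(c)))
}

// Fixed-stride tables (hashes, string offsets, CU offsets) can be indexed
// directly: compute the byte range for one entry and slice it lazily.
fn entry_at(table: &[u8], index: usize, entry_size: usize) -> Option<&[u8]> {
    let start = index.checked_mul(entry_size)?;
    let end = start.checked_add(entry_size)?;
    table.get(start..end)
}
```

With this shape, a lookup touches only the few entries it needs, which is the zero-copy behaviour described above.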
Parts of this are clearly AI generated (though I don't necessarily believe all of it is). And a lot of it does actually look reasonable to me. I don't know what policy this project has on using AI tools.
Still, this PR requires greater scrutiny than I could give it in a half-hour review.
    object::Architecture::I386 => "Mach-O i386",
    _ => "Mach-O",
},
object::BinaryFormat::Pe => "PE",
Is PE with DWARF even a thing? I thought Microsoft used their own debug info format?
Yes, the MinGW toolchain uses this. GDB doesn't support pdb files.
There's some support for adding DWARF sections to PE/COFF executables in clang-cl and lldb has basic support for DWARF on Windows. It's an unusual combination for sure. I'll remove it.
It's fine to have PE. This is better than printing "Unknown" for PE.
This file format line is unrelated to .debug_names. If we're going to add this, it should be at the start of the output for all sections (in main or dump_file), not here.
/// Contains the absolute offset of the parent entry in the entry pool
/// if this entry has a parent relationship. None indicates either
/// no parent (top-level entry) or that parent information is not indexed.
pub parent_info: Option<u64>,
I don't see a list of what attributes are allowed / required / expected for entries in the entry pool.
From 6.1.1.4.8:
Each index entry in the series begins with an abbreviation code, and is followed
by the attributes described by the abbreviation declaration for that code. The last
index entry in the series is followed by a terminating entry whose abbreviation
code is 0.
So I'm a bit confused by the ones you have added here. Would it be better to expose a more generic way to iterate over the attributes, just like is done for .debug_info?
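A rough sketch of the entry-series shape quoted from 6.1.1.4.8, under the simplifying assumption that entries carry no attribute bytes (a real reader would parse the attributes described by each abbreviation declaration before reading the next code). The helper names are illustrative, not gimli's API.

```rust
// Minimal ULEB128 decoder: little-endian base-128 with a continuation bit.
fn read_uleb128(data: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = *data.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

// Walk a series of index entries: each begins with a ULEB128 abbreviation
// code, and a code of 0 terminates the series (DWARF 5, 6.1.1.4.8).
fn abbrev_codes(entry_pool: &[u8]) -> Vec<u64> {
    let (mut pos, mut codes) = (0usize, Vec::new());
    while let Some(code) = read_uleb128(entry_pool, &mut pos) {
        if code == 0 {
            break; // terminating entry
        }
        codes.push(code);
        // Assumption for this sketch: zero attribute bytes per entry.
        // Real parsing would consume the attributes for `code` here.
    }
    codes
}
```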
Yes, I think we should start with a generic way to obtain the attributes (and that is what dwarfdump should use), and add helpers on top of that to find particular attributes.
No formal policy yet. I haven't noticed any prior AI contributions. I'd expect the following at least:

Not done in this case, obviously.
I think for the most part this was done, but perhaps not for the code documentation comments. (Or perhaps AI was only used for the comments?) The issues in the code itself could plausibly have been human errors that we all make. The obvious AI-generated parts were all in the documentation, which made me scrutinise it more closely, cross-check more references, and find more issues.
I don't think this PR necessarily feels low quality to me. It just feels like the AI wasn't fully checked. I would say it needs a bit of work, plus checking by someone more familiar with gimli's internals than me. For example: there were some confusing constructs in the unit tests, but I have not compared them to other unit tests in gimli to see how they work. I think the right approach at the moment is a nuanced one.
Just to be clear (since my previous comment wasn't): I wasn't making a judgement on this PR, just saying what the policy would be. I still haven't done a review.
Apologies for the confusion. I used AI tooling to extract/summarise parts of the DWARF spec and add them as code comments. They're too wordy (or inaccurate) for the style of this library; however, I do find the links back to specific sections in the DWARF spec useful. The Rust errors are all my own, unfortunately. Thank you for the review comments, I'll continue working on this.
I've reviewed the dwarfdump changes only.
writeln!(
    w,
    "Name Index @ 0x{:x} {{",
    unit_index * header.length().into_u64()
Add a header.offset() method instead.
writeln!(
    w,
    "  Format: {}",
    match header.format() {
        gimli::Format::Dwarf32 => "DWARF32",
        gimli::Format::Dwarf64 => "DWARF64",
    }
)?;
Suggested change:

writeln!(w, " Format: {:?}", header.format())?;
This is what we already use elsewhere. Matching llvm-dwarfdump exactly may be useful, but I'd prefer that to be done separately.
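For context on why the two styles differ: `{:?}` on a derived `Debug` enum prints the bare variant name, so the suggestion would print "Dwarf32" rather than llvm-dwarfdump's "DWARF32". A standalone illustration with a mock enum (not gimli's actual type):

```rust
// Mock enum standing in for gimli::Format, only to show what `{:?}` prints.
#[derive(Debug)]
enum Format {
    Dwarf32,
    Dwarf64,
}

// Derived Debug formatting yields the variant name verbatim.
fn format_label(f: &Format) -> String {
    format!("{:?}", f)
}
```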
if header.augmentation_string_size() > 0 {
    let mut aug_reader = header.augmentation_string().clone();
    let mut aug_string = Vec::new();
    while !aug_reader.is_empty() {
        match aug_reader.read_u8() {
            Ok(b) => aug_string.push(b),
            Err(_) => break,
        }
    }
    if let Ok(aug_str) = String::from_utf8(aug_string) {
        writeln!(w, "  Augmentation: '{}'", aug_str)?;
    }
}
Suggested change:

if let Some(s) = header.augmentation_string() {
    writeln!(w, "  Augmentation: '{}'", s.to_string_lossy()?)?;
}
Change augmentation_string to return an Option and delete augmentation_string_size from the API.
loop {
    // Read abbreviation code
    let abbrev_code = entry_reader.read_uleb128()?;
This is parsing code. That's gimli's job, we shouldn't be doing it in dwarfdump.
HashMap::new();

for (name_idx, hash, string_offset, entry_offset) in names {
    let string_content = resolve_string_name(dwarf, string_offset);
Suggested change:

let Ok(string_content) = dwarf.debug_str.get_str(gimli::DebugStrOffset(string_offset)) else {
    // TODO print warning
    continue;
};
get_str already handles null termination, and we shouldn't proceed with an invalid string.
// Group entries by resolved string content to handle multiple entries per name
// Following DWARF 5 spec: "If two different symbol names produce the same hash value,
// that hash value will occur twice in the hashes array"
I don't understand why we need to do this. The hash table is grouped by name, and its entries point back to the name, so we already have this grouping. Why are we building up these maps instead of simply iterating over the buckets as we display?
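A rough sketch of iterating the buckets directly, under my reading of DWARF 5 section 6.1.1.4.5: buckets[i] holds the 1-based index of the first hash in bucket i (0 meaning an empty bucket), and the hashes for one bucket are stored contiguously, so names can be grouped without building any maps. This is illustrative, with plain slices standing in for the pre-sliced readers.

```rust
// Return the 1-based name indices belonging to one bucket, by scanning the
// contiguous run of hashes whose bucket (hash % bucket_count) matches.
// Assumes `buckets` and `hashes` follow the DWARF 5 .debug_names layout.
fn names_in_bucket(buckets: &[u32], hashes: &[u32], bucket: usize) -> Vec<usize> {
    let bucket_count = buckets.len() as u32;
    let first = buckets[bucket];
    if first == 0 {
        return Vec::new(); // empty bucket
    }
    let mut out = Vec::new();
    let mut i = (first - 1) as usize; // convert 1-based index to 0-based
    while i < hashes.len() && hashes[i] % bucket_count == bucket as u32 {
        out.push(i + 1); // name indices are 1-based in the spec
        i += 1;
    }
    out
}
```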
I assume you're still working on the previous review comments. I resolved the ones that look like they've been addressed. Let me know when you're ready for further review or need guidance.
Fix for #469. This adds support for reading the debug_names section in DWARF 5 as documented in the DWARF 5 spec "6.1 Accelerated Access".
I also added a sample implementation to dwarfdump that prints the same information as llvm-dwarfdump for Mach-O binaries. The section structure is the same for ELF binaries.
This is my first significant contribution to a Rust library, so the API might not be ideal and I might have unidiomatic Rust code. I would appreciate specific suggestions to make it better.
It needs further testing on Linux ELF and some of the corner cases.