Issue #1761: handle new bin_attribute format for module sections #1773


Open. Wants to merge 22 commits into base: develop.

Conversation

@Abyss-W4tcher (Contributor) commented Apr 15, 2025

Hi,

This PR suggests a way to handle the removal of the module_sect_attr structure in favor of bin_attribute.

The nsections member of module's sect_attrs was also removed in 6.14-rc1:

torvalds/linux@d8959b9

The kernel's way to count the number of sections is to iterate over the NULL-terminated array of pointers pointed to by module.sect_attrs.grp.bin_attrs. Using module.sect_attrs.grp.attrs was proven to dereference a NULL pointer, but I think this wasn't spotted before because:

  • get_sections() wasn't called anywhere (and still isn't)
  • nsections would have still been accessible anyway
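The NULL-terminated walk described above can be sketched in plain Python. This is an illustrative model only: the memory reader, function names, and guard value below are invented for the example and are not the Volatility 3 API.

```python
# Toy model of counting entries in a NULL-terminated array of pointers,
# mirroring how the kernel walks module.sect_attrs.grp.bin_attrs after
# the removal of nsections in 6.14-rc1. `read_pointer(addr)` stands in
# for reading a pointer-sized value from memory and returns None when
# the address is unreadable (similar in spirit to is_readable() failing).

def count_null_terminated_pointers(read_pointer, base, ptr_size, guard=100):
    """Count consecutive non-NULL, readable pointers starting at `base`.

    `guard` is an arbitrary bound against smeared memory; it is not a
    value defined anywhere in the kernel.
    """
    count = 0
    addr = base
    while count < guard:
        value = read_pointer(addr)
        if not value:  # NULL terminator or unreadable entry: stop here
            break
        count += 1
        addr += ptr_size
    return count


# Fake 64-bit memory: three valid pointers followed by a NULL terminator.
fake_memory = {0x1000: 0xAAAA, 0x1008: 0xBBBB, 0x1010: 0xCCCC, 0x1018: 0x0}

print(count_null_terminated_pointers(fake_memory.get, 0x1000, 8))  # 3
```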

On another note, it seems that module_extract and class module have duplicated code, which could be unified directly in class module and called via my_mod.get_sections(), for example.


Tested on an Ubuntu 25.04 sample:

Linux version 6.14.0-11-generic (buildd@lcy02-amd64-043) (x86_64-linux-gnu-gcc-14 (Ubuntu 14.2.0-19ubuntu1) 14.2.0, GNU ld (GNU Binutils for Ubuntu) 2.44) #11-Ubuntu SMP PREEMPT_DYNAMIC Mon Mar 17 12:39:41 UTC 2025 (Ubuntu 6.14.0-11.11-generic 6.14.0-rc7)

Fixes #1761.

@ikelos (Member) left a comment

Looks good, thanks. A couple of hard-coded values that I'd like to check can't be programmatically determined, and a question about the code duplication, but that's not a show-stopper...

subtype=self._context.symbol_space.get_type(
    symbol_table_name + constants.BANG + "pointer"
),
count=25,
count=50,
Member

I'm always a little nervous about random hard coded numbers. Is this defined anywhere we could find it, is 50 enough, does that number come from source code or how was 50 decided upon?

Contributor Author

I don't have the answer to this question; for now I just duplicated code from modules.py, given the context.

As this is a NULL-terminated array of pointers, it would be optimal to iterate over it that way, or even extend the current utilities for it, e.g. array_of_pointers(null_terminated=True).

Member

I think we have a number of string functions that do null terminate, but they should all always have a maximum value (otherwise we could just keep running forever). I'm just trying to find out what the appropriate maximum value should be and why (if anyone knows)

Member

Any progress on where this value came from? Ideally we'd avoid this the same way as we avoid the other one (or at least comment where the size of the value comes from).

Contributor Author

I made a change that makes it a NULL-terminated iteration (matching the structure's design). It is safer to iterate a NULL-terminated array of pointers this way: if the loop encounters a value that cannot be dereferenced, it breaks immediately.

@ikelos (Member) commented Apr 15, 2025

Ok, so... how should we proceed? Should we just stick with the constants, open a ticket to note down where they are, or try and find out where they should come from and fix them before merging this?

@Abyss-W4tcher (Contributor Author)

Maybe this refinement can happen in the consolidation PR, as I think it'll contextualize better. Given the current schedule I think it'll also leave us more time to fix it more consistently?

@ikelos (Member) commented Apr 15, 2025

Well, there's no rush? This is going into develop, so if it needs more time, then by all means take it. Just lemme know when you're happy for it to go in...

@Abyss-W4tcher (Contributor Author) commented Apr 16, 2025

I didn't want to push too many changes into this one, given @atcuno's request to include it in 2.26.1. I think the goal was to make the mandatory changes for 6.14-rc1+ analysis without an early crash, since that kernel is already distributed in Ubuntu 25.04 for example.

Happy to make the other changes we talked about in another PR, if that is OK with you, while the discussion in #1761 is still open?

@ikelos (Member) commented Apr 16, 2025

2.26.1 isn't the release branch (that's 2.26.0)? If you meant to target the release branch, please change the PR. There's a fair number of changes in this though, so I'd be happier just to put it in develop?

@ikelos (Member) commented Apr 16, 2025

Again, I don't see support for only-just published kernels as needing to be rushed into this version rather than waiting for the next one in 3-4 months. It doesn't feel critical (most of volatility works fine for the majority of images), and we're only two weeks from release (meaning if this messed things up in some way, we'd be rushing to get it fixed). I'm open to reasons why this has to be in this version, but they'd have to be pretty good reasons...

@ikelos (Member) commented May 6, 2025

So... I'm not sure I'm keen on us just looping forever, and doing all that maths manually isn't a good plan. I'd almost sooner you set a ridiculously large count than reconstruct everything the array is supposed to do just so it can be unbounded? All I really wanted was a rationale for why that number had been picked? We can even just say "we think this is as many as there will ever be", but we need to say why it was chosen in case someone discovers a case where it's too small and wants to know why we stop there?

@Abyss-W4tcher (Contributor Author)

I see, but using a while loop is intended by design for this NULL-terminated array. Using a for loop and constructing an array with an arbitrary size beforehand does not really make sense either.

Still, I can add a guard value (while ... and i < GUARD_VALUE) for extreme scenarios (millions of unrelated valid pointers appended at the end of this array). I think it is more coherent this way than the other way around?

@ikelos (Member) commented May 7, 2025

If you're implementing a guard statement, it has the same effect as the array max size? The original while loop was just fine, it only needed a comment to say where the guard size came from. Reimplementing loops is a good way to slip up (maybe not this time, but in general) and create a hard to spot corner case where something goes wrong. Using existing machinery to do it is far better (and also means if we develop ways of doing the same task but using less memory, we only have to implement the fix in one place rather than every plugin that decides to rewrite looping methods we already have).

The old method also stopped not just on NULL, but on any unreachable pointers. I'm not sure if that's better or worse, but it's certainly far less code to have to reason about?

@Abyss-W4tcher (Contributor Author) commented May 7, 2025

This method also stops on unreachable pointers (which of course includes NULL) through <ptr>.is_readable(). Here is how the kernel does it since nsections was removed:

[Screenshot of kernel source: the loop iterates over the array while a valid pointer is found]

As you can see, it iterates over the array while a valid pointer is found. The existing machinery sometimes does not fit exactly what we want to do, which is the case here. Using a while loop also prevents instantiating an array the size of the safety guard we decided to use, even if we end up iterating over only 20% of it. The code size is actually equivalent as well (we only do one context.object call)?

We already use this kind of mechanism, for example here:

while current_bucket_ptr.is_readable():
    yield current_bucket_ptr.dereference().cast("ftrace_func_entry")
    current_bucket_ptr = current_bucket_ptr.next

because we cannot always come up with a maximum iteration value. In the ftrace case this could be leveraged by rootkits to hide tracing entries past our safeguard.

In summary, my goal is simply to align more closely with the kernel code, while adding our own safeguards (e.g. against smearing) 👍

edit: Here is the guard value we could use:

        # We chose 100 as an arbitrary guard value against smearing,
        # as it is highly unlikely that any module will ever have
        # this many sections.
        while entry.is_readable() and idx < 100:

@ikelos (Member) commented May 7, 2025

The code you've suggested to swap to is still several lines long and bogged down in object construction for something that already appears to be a pointer?

If you truly think there shouldn't be a guard statement, then I'll go along with it (the is_readable is a reasonable check). Could we check whether simply doing a cast would be easier than all the potentially error-prone type finding/maths (the repeated type finding will be extremely inefficient and could be done once before the loop at a bare minimum)?

It really doesn't feel like it needs as much code as has been used to achieve what I think you're trying to achieve and I'd prefer it be readable rather than mired in object creation code without good cause...

@ikelos (Member) commented May 7, 2025

The existing machinery sometimes does not fit exactly what we want to do, which is the case here.

I don't understand how it doesn't fit here. Could you clarify why this deviates from simply iterating over an array (which, incidentally, isn't what ftrace does; ftrace follows an attribute of each item)?

@Abyss-W4tcher (Contributor Author) commented May 7, 2025

The existing machinery sometimes does not fit exactly what we want to do, which is the case here.

I don't understand how it doesn't fit here. Could you clarify why this deviates from simply iterating over an array (which, incidentally, isn't what ftrace does; ftrace follows an attribute of each item)?

We are still iterating over an array; the "issue" here is that we are not supposed to know the size of this array beforehand. The size we set is more of a "smearing guard" value, which is not exactly the same as the array size. As discussed, there is no place in the kernel where "50" is declared as this exact array size.

I'm happy with setting a guard size, even if this does not cover the case where obscure rootkits set up hundreds of sections just to mess with us and store their payloads in sections skipped because of the guard size.

Could we check whether simply doing cast would be better easier than all the potentially error-prone type find/maths (the repeated type finding will be extremely inefficient and could be done once before the loop at a bare minimum)?

entry.vol.offset + ptr_size is only an integer, not a vol object, so I don't think we can use cast on it?

True, I can assign the pointer size before the loop 👍. I think the pointer arithmetic is still performed the same by Vol3 under the hood?

By the way, I checked the Ubuntu 6.14.0 sample I have and some modules have 38 sections, which would already have made the previous constant of 25 miss a lot of them. 100 should be good, but that's part of the dangers of using a guard value :/.

@ikelos (Member) commented May 7, 2025

The only reason we're doing the maths to figure out the next one is because... it isn't in an array? That's precisely what arrays are there to do! If it's just a question of what the guard value should be, then we can choose a ridiculously big one and say we expect the invalid pointer to happen before the guard is hit. As long as we document it so it's not a random 25 (and someone has to scrabble through the code to figure out whether that's accurate or an invented value in the middle of a dense codebase) then it'll be fine! ;P

It probably is what we do under the hood, but the point is to have it done by the same code so that if it needs changing/fixing/making more efficient, it can be done once for everything? Given the old code seemed to work, and the only issue I had was where the value 25/50 came from, it seems like it's easier to understand as long as we document why 25/50 was chosen. That's literally all we need to do for this. It doesn't sound like all the extra object-creation complexity changes the outcome in any way...

@Abyss-W4tcher (Contributor Author)

If we use a ridiculously big value, then the context.object("array") will still instantiate a very large array in memory even if 99.9% of the time it will iterate over no more than 5% of it.

I don't think that an identical outcome always justifies keeping the core code identical; currently we are not iterating over this array the right way.

I suggested a more accurate way to do this, which might be helpful for other use cases and is quite modular. Tell me what you think about this design, which in my opinion achieves an even better level of abstraction and readability than the previous implementation. The resources in the docstring also help a lot to see why it is done that way.

@Abyss-W4tcher (Contributor Author)

I know it might sound overkill when we could indeed just put one comment and keep the old way, but if we can achieve greater precision and coherency with the kernel I think it is worth the effort.

@ikelos (Member) commented May 7, 2025

Well, OK, whatever changes you needed to achieve greater precision and coherency, go with them, but please use the array mechanism rather than manually constructing the objects? There should be functionally no difference between a guarded version of your object construction and a vast array, except that the code acting on it will be much shorter/easier to read and understand, and all the code we use for handling arrays would live in one place. I don't see how your solution achieves better precision or coherency, but I feel those differences can be applied to the array creation method, rather than a strange bespoke guarded manual object iterator thing...

I feel like I'm missing something causing you to keep pushing for this technique, rather than the other one, so please try to explain it to me? From my perspective, the code feels functionally identical (and indeed, was the original version you submitted, so you seemed happy with it too just a few days ago)?

@Abyss-W4tcher (Contributor Author) commented May 7, 2025

Yeah, so originally, as the output was the same, I didn't realize it wasn't the correct way of iterating.

When we iterate over arrays, we know their start and end beforehand. The end is most of the time directly stored in the symbols' type definition or in another kernel symbol.

However, the sizes of NULL-terminated arrays cannot be predicted by the kernel at compile time. Indeed, the kernel cannot predict how many sections an arbitrary module loaded by a user will have, for example (at least for > 6.14.0).

Instead, we have to check that the current member/pointer is not NULL, and only then move to the next entry and repeat the process. The already existing array mechanisms would not really apply here, which is why I added this new dynamically_sized_array_of_pointers helper.

Please note that the "guard mechanism" is only a sanity mechanism inherent to memory forensics to account for smearing, but not something that we should conceptually see as a part of the original kernel iteration mechanism.

You can also notice that the original code (check the PR diff) does a while loop right after the context.object("array") call, which is exactly trying to find this NULL entry. All in all, I think this new version actually makes it easier to read.

Hopefully that makes sense; sorry if it seems a bit far-fetched, but we don't want to confuse arrays that have predictable sizes with ones that don't.
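A helper in the spirit of the dynamically_sized_array_of_pointers idea can be sketched as a generator. This is a hedged model only: the reader callable and names below are invented for illustration, and the real helper would operate on framework objects instead.

```python
# Sketch of yielding entries from a NULL-terminated array of pointers
# without knowing its size beforehand. The walk stops at the NULL
# terminator (or an unreadable entry, modeled here as a falsy read),
# with `guard` as a sanity bound against smearing rather than a size
# the kernel defines anywhere.

def dynamically_sized_pointers(read_pointer, base, ptr_size, guard=100):
    """Yield pointer values until the terminator or the guard is hit."""
    for idx in range(guard):
        value = read_pointer(base + idx * ptr_size)
        if not value:
            return
        yield value


mem = {0x4000: 0xDEAD, 0x4008: 0xBEEF, 0x4010: 0x0}
entries = list(dynamically_sized_pointers(mem.get, 0x4000, 8))
print(len(entries))  # 2: the NULL terminator stops the walk
```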

@ikelos (Member) commented May 7, 2025

So... I think I get what you're saying, but I fundamentally disagree that:

        while arr[idx] and arr[idx].is_readable():
            idx = idx + 1

is more difficult to read than

        while entry.is_readable():
            idx += 1
            entry = self._context.object(
                symbol_table_name + constants.BANG + "pointer",
                layer_name=self.vol.layer_name,
                offset=entry.vol.offset
                + self._context.symbol_space.get_type(
                    symbol_table_name + constants.BANG + "pointer"
                ).size,
            )

?

I understand we don't know the end of the array, and I'm fine with us using a ridiculously large number just as a safety guard, as long as we comment it (so we don't end up in a pages-long discussion about the number 50 again in the future). ;P I think the way that we deal with strings is to set a maximum limit, then read up to the first null and discard the rest. Doing the same with an array seems like it would make sense too (and not confuse people). If you'd prefer, we could add an additional method to the utility class, called something like null_terminated_array, that you hand a starting value and it spits out entries until it hits null. BUT, it should do this using the existing array machinery we've already got, rather than being identical (duplicated) code that does the maths itself. I'm open to whether it takes an array and just increases the count to a huge arbitrary number (parameterized but defaulting to something enormous) or whether it takes in a start address and a type and builds the array itself.

Would that satisfy your concerns about people a) misunderstanding it and b) thinking it looks complicated? As long as we're using the existing code for traversing what is an array (be it bounded or unbounded), then I'm fine with it living off in a shared location so it can be reused; we just really, really, really shouldn't be reinventing the wheel with 9 lines of duplicate custom code to calculate array members simply because they may become rubbish after a null value...
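The oversize-then-trim approach proposed here can be modeled with a minimal stand-in for the framework's array object. This is a hedged sketch under invented names (LazyPointerArray, trim_to_null); the real implementation would reuse Volatility 3's existing array machinery rather than this toy class.

```python
class LazyPointerArray:
    """Toy stand-in for a framework array: nothing is read until indexed."""

    def __init__(self, read_pointer, base, ptr_size, count):
        self._read = read_pointer
        self._base = base
        self._ptr_size = ptr_size
        self.count = count  # mimics the mutable count discussed above

    def __getitem__(self, idx):
        if idx >= self.count:
            raise IndexError(idx)
        return self._read(self._base + idx * self._ptr_size)


def trim_to_null(array, guard=4096):
    """Walk `array` in order until a NULL (falsy) entry or the guard,
    then shrink `array.count` to the number of valid leading entries."""
    idx = 0
    while idx < min(array.count, guard):
        if not array[idx]:
            break
        idx += 1
    array.count = idx
    return array


memory = {0x2000: 0x1111, 0x2008: 0x2222, 0x2010: 0x0}
# An enormous initial count is harmless because reads are lazy.
arr = LazyPointerArray(memory.get, 0x2000, 8, count=10_000_000)
trim_to_null(arr)
print(arr.count)  # 2
```

Because nothing is read at construction time, the enormous initial count costs no memory; only the entries actually indexed during the walk are read.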

@Abyss-W4tcher (Contributor Author)

Oh, I think there was a misunderstanding, as I pushed this helper and other commits this afternoon.

Please check the latest diff and tell me if it fits the framework better?

@Abyss-W4tcher (Contributor Author)

Based on that, as I said earlier, even in the new helper I don't think we can rely on existing Vol3 array mechanisms for unbounded arrays? The current ones always expect an end value?

@ikelos (Member) commented May 7, 2025

You're right, I hadn't seen it, but that's not using an array type at all, and it forces the target type to be a pointer but also doesn't allow the user to specify the subtype it points to.

I really think we can rely on the array type. Simply create or be handed the array, then use the count setter to make it a huge size, run through it until you find the null entry (or hit the guard) and then set the count to that and return the array. It will not read entries you don't ask it to, so as long as you go through the indices in order (and stop if there's an address exception) there should be no problem at all.

Can you see a reason why it wouldn't work, or shouldn't be used? So far you seem to suggest you don't think it's right to (although it's now hidden in a helper function) and... I think that's about it? Is there another technical reason that using the array code wouldn't work properly?

@ikelos (Member) commented May 7, 2025

If we're handed the array, it should already be primed with the correct subtype, so that would be all we'd need. We could also optionally accept a guard value if people know a specific maximum, but it's not necessary; we could just choose an exceedingly high one (and log if it ever gets hit). We probably want to allow InvalidAddressExceptions through to the caller to deal with. The other option would be taking a start offset and a subtype and constructing an array, but in either instance the caller would have the chance to specify the type of the array (and if it's a pointer, the pointer's subtype if they wanted). That should make the function much more applicable to all circumstances.

@ikelos (Member) commented May 7, 2025

If we wanted we could instead yield as the items are traversed. I haven't thought about which might be better?

@Abyss-W4tcher (Contributor Author)

Yeah, so using utility.array_of_pointers could work, because I realized I was perhaps wrongly thinking it returned a list and not an iterator.

In the second case then yes it wouldn't go past the NULL entry, but in the first case it would construct an array the size of our large value before we can iterate over it.

I can't check right now, but does your last comment imply that it isn't an iterator?

@ikelos (Member) commented May 7, 2025

In the second case then yes it wouldn't go past the NULL entry, but in the first case it would construct an array the size of our large value before we can iterate over it.

You can construct an array of any size and (just like a structure of any size) it won't actually try to read anything until it's requested. Here you can see __getitem__ when the array is indexed. Until it's indexed, nothing is constructed and no extra memory is used. You can have a 10000000-entry array and have it take up no more space than a 1-entry array until you access its elements...

I hadn't been intending it to be an iterator; I thought it would return the array, allowing the user to pick and choose the entries they wanted (in case it was a long list and they knew they only needed the 200th entry)? Since we have to iterate once to find the maximum, it may not be more efficient in this case, so that's a design decision I'm open to deciding. We just have to figure out what we think the user will want most...
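The lazy-access behaviour described above (nothing is read until __getitem__ is invoked) can be demonstrated with a small self-contained model; the class names here are invented for the example and are not the framework's actual types.

```python
class CountingReader:
    """Wraps a backing dict of addresses and counts actual reads."""

    def __init__(self, backing):
        self.backing = backing
        self.reads = 0

    def __call__(self, addr):
        self.reads += 1
        return self.backing.get(addr)


class HugeLazyArray:
    """An array whose construction reads nothing, however big its count."""

    def __init__(self, read, base, ptr_size, count=10_000_000):
        self.read = read
        self.base = base
        self.ptr_size = ptr_size
        self.count = count

    def __getitem__(self, idx):
        # Only here does an actual read happen, one entry at a time.
        return self.read(self.base + idx * self.ptr_size)


reader = CountingReader({0x3000 + 8 * i: i + 1 for i in range(5)})
arr = HugeLazyArray(reader, 0x3000, 8)
print(reader.reads)  # 0: construction touched no memory
print(arr[3])        # 4
print(reader.reads)  # 1: only the requested entry was read
```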

@@ -10,12 +10,14 @@
Dict,
)

import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules

Check notice (Code scanning / CodeQL): Cyclic import. Import of module volatility3.framework.symbols.linux.utilities.modules begins an import cycle.
Contributor Author

Argh, unfortunately the module address space helper is in this file

Contributor Author

Please do not close this conversation yet. I'll re-open it if CodeQL resolves it.

@Abyss-W4tcher (Contributor Author)

Thanks for the details; I think it basically is the same as an iterator in the end. I initially thought that a thousand Pointer objects would be created, but if we iterate over an Array object it does not consume extra memory.

I made adjustments to the new helper, and also added sanity checks in the ModuleExtract class, as it wasn't validating section addresses. You can notice that I use array_of_pointers twice inside dynamically_sized_array_of_pointers to fully leverage the Array object even in the return value. A bit of code duplication, but it avoids any live re-sizing of the first array.

@ikelos (Member) commented May 8, 2025

Ok, so as I see it, there's two ways to (eventually) split this:

  • Extraction is a feature built on top of the general module functionality, therefore the module functionality doesn't do dumping itself and callers that want it have to do it themselves, or...
  • Extraction is a core part of the Module functionality and it should all live in the same class/file

Roughly happy with either. I do get a little concerned about features that try to be all-singing-and-dancing, but since it makes sense to do the dumping in the middle of the generation, the best move might be to deprecate and shift the module_extract code into the module_display_plugin code. Having said that, module_display_plugin definitely feels like it's overreaching by doing more than just displaying... :S

@Abyss-W4tcher (Contributor Author)

So in the end, I think you can merge #1801 and I'll handle potential conflicts here (there shouldn't be any), as well as a new design to fix the circular import?

Successfully merging this pull request may close these issues.

Bleeding Edge Kernel versions drop module_sect_attr
3 participants