Questions regarding base64 encoding styles and modifiers #98

L015H4CK · 2023-09-27T10:58:32Z

Hello everyone,

Over the past months I have been working a lot with Sigma rules that detect Base64-encoded content on Windows. Recently, I noticed some inconsistencies between the modifiers used in the rules' selection sections and the Sigma specification.

I understand that the utf16, utf16be and utf16le modifiers (see specification) are no longer supported by the current version of the pySigma converter (see comparison here). The wide modifier (which basically is utf16le) is still supported.

An example: The rule proc_creation_win_powershell_base64_frombase64string.yml detects Base64 encoding of the string ::FromBase64String with different offsets.

detection:
    selection:
        - CommandLine|base64offset|contains: '::FromBase64String'
        # UTF-16 LE
        - CommandLine|contains:
            - 'OgA6AEYAcgBvAG0AQgBhAHMAZQA2ADQAUwB0AHIAaQBuAGcA'
            - 'oAOgBGAHIAbwBtAEIAYQBzAGUANgA0AFMAdAByAGkAbgBnA'
            - '6ADoARgByAG8AbQBCAGEAcwBlADYANABTAHQAcgBpAG4AZw'
    condition: selection
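
To make the three hard-coded strings easier to follow, here is a minimal sketch of how offset variants like these can be derived. It is only an illustration, not the converter's actual code: the function name `base64_offset_variants` and its `encoding` parameter are my own, and the slice offsets assume the usual approach of trimming the Base64 characters that depend on neighbouring bytes.

```python
from base64 import b64encode

# Slice bounds that drop the Base64 characters influenced by the filler bytes
# placed in front of the value or by whatever bytes would follow it.
START_OFFSETS = (0, 2, 3)
END_OFFSETS = (None, -3, -2)


def base64_offset_variants(value: str, encoding: str = "utf-8") -> list[str]:
    """Return the three Base64 substrings for byte alignments 0, 1 and 2."""
    raw = value.encode(encoding)
    variants = []
    for shift in range(3):
        # Prepend `shift` filler bytes to move the value to the next alignment.
        encoded = b64encode(b" " * shift + raw).decode("ascii")
        end = END_OFFSETS[(len(raw) + shift) % 3]
        variants.append(encoded[START_OFFSETS[shift]:end])
    return variants


utf8_variants = base64_offset_variants("::FromBase64String")
wide_variants = base64_offset_variants("::FromBase64String", "utf-16-le")
print(utf8_variants)
print(wide_variants)
```

With `utf-16-le` the sketch yields exactly the three strings listed under `# UTF-16 LE` in the rule, while the UTF-8 variants share nothing with them.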

Now to my actual questions with the above rule as an example case:

1. Why does the rule include the clear-text string with the base64offset modifier as well as the encoded strings? This might result in redundancy after converting the rules.
2. Shouldn't the first part (CommandLine|base64offset|contains) also contain the wide modifier to define the UTF-16 LE encoding? When converting the rules on a Linux machine, the resulting encoded strings might be wrong because they are encoded in UTF-8. Using the `wide` modifier ensures that the clear-text strings are encoded in the right format.
3. Is UTF-8 defined as the default when using the base64offset modifier, or does the resulting encoding depend on the OS running the converter? (I have not had the time to check this myself.) Would it make sense to define UTF-16 LE as the default encoding, since most rules are for Windows?
4. Some rules only use the clear-text strings (example), others only use the encoded strings (example) and others use both (see above). Is this state of the rules intentional, or is it just a result of the aforementioned inconsistency with the Base64 encoding modifiers?
   - Using both (clear-text and encoded) might lead to rules where not every clear-text string has an encoded counterpart (example), because some clear-text strings were added in a later rule update.

And as final question:

What is the actual desired representation of Base64-encoded strings in Sigma rules? Currently, the Sigma rules do not give a clear answer to this. Options are:
- Clear-text with wide|base64offset modifiers
- Clear-text with only base64offset modifier
- Base64-encoded strings only
- Both clear-text and Base64-encoded strings

If more information or examples are needed I will gladly supply those.

Thanks for your answers and input in advance!

Best regards,
L015

Additional info: I first noticed the problem when rules with the base64offset modifier did not match event logs that they should have matched. After investigating, I found that the converter did not use the correct encoding, so the search strings differed from the actual encoded strings (UTF-8 vs. UTF-16 LE). At first I thought it was a problem with my converter, but after re-reading the specification and investigating several rules I noticed the above-mentioned inconsistency.
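
A small sketch of why the UTF-8 search strings cannot match: the payload below is made up for illustration, and it assumes the blob was produced the way PowerShell's -EncodedCommand expects it, i.e. Base64 over UTF-16 LE bytes. Decoding the blob shows that only the UTF-16 LE byte sequence of the needle is present in it.

```python
from base64 import b64decode, b64encode

# Hypothetical payload, encoded as Base64 over UTF-16 LE bytes.
script = "[Convert]::FromBase64String('aGVsbG8=')"
logged_blob = b64encode(script.encode("utf-16-le")).decode("ascii")

# Look at the raw bytes hidden behind the Base64 blob.
raw = b64decode(logged_blob)
needle = "::FromBase64String"

# UTF-16 LE interleaves a zero byte after every ASCII character, so the
# contiguous UTF-8 byte sequence never occurs in the decoded blob.
found_as_utf8 = needle.encode("utf-8") in raw
found_as_utf16le = needle.encode("utf-16-le") in raw
print(found_as_utf8, found_as_utf16le)  # False True
```

Since the underlying bytes differ, any Base64 search string derived from the UTF-8 bytes is bound to miss such an event, whichever offset is used.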

Edit: Whoops, sorry for opening this in the wrong place! Thanks for moving!

Replies: 0 comments