String length expressed as byte or character count for bencode #92

@trantor

Hello.

First of all, thanks a lot for the tool.
However, I am running into problems with bencode-encoded data.
It's a problem I've come across time and again, and hopefully one you can address.
From what I've seen, faq interprets the string length as the number of bytes in the encoded string, which is arguably the correct reading.
However, the original specs of the format, if we can call them that, were less than crystal clear about what string length means, so there are many implementations around that interpret it as the character count: in Unicode terms, the number of code points in the string.
Could you add a variant of the bencode format supported by faq that uses this character-count interpretation of string length? It would make my life a lot easier when dealing with these sorts of systems.

Just as a reference, faq encodes (arguably correctly) the JSON { "a": "à" } as the bencode d1:a2:àe, since à takes two bytes in UTF-8, while the variant format would encode it as d1:a1:àe, since à is a single code point. Both assume UTF-8 encoded strings.
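
To make the difference concrete, here is a minimal Python sketch of the two interpretations. It is not faq's actual code: `bencode_str`, `bencode_dict` and the `count` parameter are hypothetical names used only for illustration, and it handles only dictionaries whose keys and values are strings.

```python
def bencode_str(s: str, count: str = "bytes") -> bytes:
    """Encode one string as a bencode string (hypothetical helper).

    count="bytes": the prefix is the UTF-8 byte length (faq's current reading).
    count="chars": the prefix is the number of Unicode code points (the variant).
    """
    data = s.encode("utf-8")
    if count == "bytes":
        n = len(data)   # number of bytes after UTF-8 encoding
    elif count == "chars":
        n = len(s)      # number of code points in the Python str
    else:
        raise ValueError("count must be 'bytes' or 'chars'")
    return str(n).encode("ascii") + b":" + data


def bencode_dict(d: dict, count: str = "bytes") -> bytes:
    """Encode a flat dict of str -> str as a bencode dictionary."""
    # Bencode dictionaries list their keys sorted as raw byte strings.
    items = sorted(d.items(), key=lambda kv: kv[0].encode("utf-8"))
    body = b"".join(
        bencode_str(k, count) + bencode_str(v, count) for k, v in items
    )
    return b"d" + body + b"e"


# "\u00e0" is the precomposed character à (one code point, two UTF-8 bytes).
print(bencode_dict({"a": "\u00e0"}))           # b'd1:a2:\xc3\xa0e'
print(bencode_dict({"a": "\u00e0"}, "chars"))  # b'd1:a1:\xc3\xa0e'
```

The two outputs differ only in the length prefix of the value, which is why a decoder expecting one convention misreads data produced under the other as soon as a string contains non-ASCII characters.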

Thanks in advance.

Labels: exploratory (Research and opinions are needed)
