
Clarify bit-to-byte ordering in bitstring serialization #211

@oriolcanades

Summary

The specification does not explicitly define the bit ordering within bytes when serializing the bitstring prior to GZIP compression. This ambiguity is causing interoperability issues between independent implementations.

Current specification text

Section 3.3 (Bitstring Generation Algorithm) states:

"Let bitstring be a list of bits with a minimum size of 16KB, where each bit is initialized to 0 (zero)."

The algorithm describes the bitstring as an abstract list of bits, but does not specify how this list should be packed into bytes before compression.

The ambiguity

When mapping bit index n to a byte array, there are two possible conventions:

| Convention | Byte index | Bit position within byte | Example: bit 0 set |
| --- | --- | --- | --- |
| MSB-first | floor(n / 8) | 7 - (n mod 8) | 0b10000000 (128) |
| LSB-first | floor(n / 8) | n mod 8 | 0b00000001 (1) |
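
To make the two conventions concrete, here is a minimal Python sketch of both mappings. The helper names `bit_location` and `set_bit` are hypothetical and chosen only for illustration; they do not come from the specification or from any existing implementation.

```python
def bit_location(n: int, msb_first: bool) -> tuple[int, int]:
    """Return (byte index, bit position) for bit index n.

    Bit position 7 is the most significant bit of the byte.
    """
    byte_index = n // 8  # floor(n / 8) in both conventions
    position = 7 - (n % 8) if msb_first else n % 8
    return byte_index, position


def set_bit(buf: bytearray, n: int, msb_first: bool) -> None:
    """Set bit index n in buf using the chosen convention."""
    byte_index, position = bit_location(n, msb_first)
    buf[byte_index] |= 1 << position


msb = bytearray(1)
lsb = bytearray(1)
set_bit(msb, 0, msb_first=True)
set_bit(lsb, 0, msb_first=False)

print(f"{msb[0]:#010b}", msb[0])  # 0b10000000 128
print(f"{lsb[0]:#010b}", lsb[0])  # 0b00000001 1
```

Both conventions agree on the byte index and differ only in the position within the byte, which is why the mismatch is easy to overlook.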

Real-world impact

We encountered this issue when integrating two independently developed implementations: an issuer that encodes the bitstring LSB-first and a validator that decodes it MSB-first. Status verification failed even though both implementations follow the specification as written.
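
The failure mode can be reproduced with a small sketch. It assumes a 16KB list and an arbitrary status index of 10; the function names and the index are hypothetical and are not taken from either implementation.

```python
def write_lsb_first(buf: bytearray, n: int) -> None:
    """Issuer side (assumed): store bit n at position n mod 8."""
    buf[n // 8] |= 1 << (n % 8)


def read_msb_first(buf: bytes, n: int) -> bool:
    """Validator side (assumed): read bit n from position 7 - (n mod 8)."""
    return bool(buf[n // 8] & (1 << (7 - (n % 8))))


status_list = bytearray(16 * 1024)  # 16KB, all bits initialized to 0
flagged_index = 10                  # arbitrary example index

write_lsb_first(status_list, flagged_index)

# The validator does not see the bit at the index the issuer set...
seen_at_index = read_msb_first(status_list, flagged_index)

# ...but sees it at the mirrored index within the same byte.
mirrored_index = 8 * (flagged_index // 8) + 7 - (flagged_index % 8)
seen_at_mirror = read_msb_first(status_list, mirrored_index)

print(seen_at_index)                    # False
print(mirrored_index, seen_at_mirror)   # 13 True
```

The status is not lost but lands on a different index within the same byte, so the validator can report a wrong status for two credentials instead of failing outright.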

Ecosystem observation

Digital Bazaar's implementation uses MSB-first ordering, which appears to be the prevailing convention in the ecosystem.

Proposal

Add normative language to Section 3.3 specifying the bit-to-byte mapping. Suggested addition:

"When serializing the bitstring to a byte array, implementations MUST use most-significant-bit-first (MSB-first) ordering. Specifically, bit index n MUST be stored in byte floor(n / 8) at bit position 7 - (n mod 8), where bit position 7 represents the most significant bit of the byte."

Questions for the working group

  1. Is MSB-first the intended convention?
  2. Should this be explicitly documented in the specification?
  3. Would it be useful to include a test vector (e.g., "a bitstring with only bit 0 set compresses to X") to help implementers verify correctness?
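
As a starting point for question 3, the sketch below builds a candidate test vector under the MSB-first convention proposed above, using Python's standard `gzip` module. The helper `pack_msb_first` is hypothetical. Compressed bytes can differ between GZIP implementations and settings (for example, header fields and compression level), so this sketch checks the decompressed bytes rather than asserting a fixed compressed value.

```python
import gzip


def pack_msb_first(set_indices, size_bytes=16 * 1024):
    """Pack the given bit indices into a byte string, MSB-first.

    size_bytes defaults to 16KB, the minimum size quoted from Section 3.3.
    """
    buf = bytearray(size_bytes)
    for n in set_indices:
        buf[n // 8] |= 1 << (7 - (n % 8))
    return bytes(buf)


raw = pack_msb_first([0])            # bitstring with only bit 0 set
compressed = gzip.compress(raw)      # exact bytes may vary by implementation
restored = gzip.decompress(compressed)

print(len(restored))                 # 16384
print(hex(restored[0]))              # 0x80
print(restored == raw)               # True
```

An LSB-first implementation would produce `0x01` as the first decompressed byte, so comparing that single byte is enough to tell the two conventions apart.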

Thank you for your consideration.
