| 1 | +# binary |
| 2 | + |
| 3 | +A binary serialization package for Solana wire formats. Vendored from |
| 4 | +`github.com/gagliardetto/binary` with substantial performance work on |
| 5 | +top (see [BENCH.md](BENCH.md) for the headline numbers). |
| 6 | + |
| 7 | +Handles three Solana-relevant encodings: |
| 8 | + |
| 9 | +| Constant | Used for | Length prefix | |
| 10 | +| -------------------- | ------------------------------------------------------- | ---------------- | |
| 11 | +| `EncodingBin` | bincode-style (fluxd / legacy Solana tooling) | uvarint | |
| 12 | +| `EncodingBorsh` | Anchor programs, SPL state accounts, general Solana IDL | u32 LE | |
| 13 | +| `EncodingCompactU16` | Solana transaction / message length prefixes | 1-3 byte compact | |
| 14 | + |
| 15 | +Unless you know you need bincode, you almost always want `Borsh` for |
| 16 | +program/account state and `CompactU16` for raw transaction/message |
| 17 | +parsing. |
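To make the "Length prefix" column concrete, the sketch below encodes the length 300 in each of the three prefix styles using only the standard library. The helper names (`uvarintPrefix`, `u32LEPrefix`, `compactU16`) are illustrative assumptions and are not part of this package's API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintPrefix encodes n as a LEB128-style unsigned varint (bincode-style prefix).
func uvarintPrefix(n uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutUvarint(buf, n)]
}

// u32LEPrefix encodes n as a fixed 4-byte little-endian integer (borsh prefix).
func u32LEPrefix(n uint32) []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, n)
	return buf
}

// compactU16 encodes n in 1-3 bytes: 7 payload bits per byte, with the high
// bit set on every byte except the last (Solana compact-u16).
func compactU16(n uint16) []byte {
	out := make([]byte, 0, 3)
	for n >= 0x80 {
		out = append(out, byte(n&0x7f)|0x80)
		n >>= 7
	}
	return append(out, byte(n))
}

func main() {
	fmt.Printf("uvarint     % x\n", uvarintPrefix(300)) // ac 02
	fmt.Printf("u32 LE      % x\n", u32LEPrefix(300))   // 2c 01 00 00
	fmt.Printf("compact-u16 % x\n", compactU16(300))    // ac 02
}
```

For values that fit in a u16, compact-u16 and uvarint produce the same bytes; compact-u16 simply never grows past three bytes.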
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Quick start |
| 22 | + |
| 23 | +Most callers only need the top-level marshal helpers. They work with |
| 24 | +plain Go types and struct tags. |
| 25 | + |
| 26 | +```go |
| 27 | +import bin "github.com/gagliardetto/solana-go/binary" |
| 28 | + |
| 29 | +type Foo struct { |
| 30 | +	A uint64 |
| 31 | +	B string |
| 32 | +	C []byte |
| 33 | +} |
| 34 | + |
| 35 | +// Encode. |
| 36 | +wire, err := bin.MarshalBorsh(&Foo{A: 7, B: "hi", C: []byte{1, 2, 3}}) |
| 37 | + |
| 38 | +// Decode. |
| 39 | +var out Foo |
| 40 | +err = bin.UnmarshalBorsh(&out, wire) |
| 41 | +``` |
| 42 | + |
| 43 | +The three encoding variants share the same signature: |
| 44 | + |
| 45 | +```go |
| 46 | +bin.MarshalBin(v any) ([]byte, error) // uvarint lengths |
| 47 | +bin.MarshalBorsh(v any) ([]byte, error) // u32 LE lengths |
| 48 | +bin.MarshalCompactU16(v any) ([]byte, error) // compact-u16 lengths |
| 49 | + |
| 50 | +bin.UnmarshalBin(v any, b []byte) error |
| 51 | +bin.UnmarshalBorsh(v any, b []byte) error |
| 52 | +bin.UnmarshalCompactU16(v any, b []byte) error |
| 53 | +``` |
| 54 | + |
| 55 | +These go through a reflection-driven encoder with a pooled internal |
| 56 | +buffer. One allocation per call for the returned slice; zero per-field |
| 57 | +allocations on the encode path. |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## Struct tags |
| 62 | + |
| 63 | +Fields control their wire layout through the `bin` struct tag. All tags |
| 64 | +are space-separated tokens inside a single `bin:"..."` string. |
| 65 | + |
| 66 | +| Tag | Effect | |
| 67 | +| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | |
| 68 | +| `sizeof=<fieldName>` | This field's decoded value is the length of the named slice/array field later in the struct. Used when a separate count field precedes a body. | |
| 69 | +| `big` | Encode/decode multi-byte integers as big-endian. | |
| 70 | +| `little` | Force little-endian (default already for borsh and compact-u16). | |
| 71 | +| `optional` or `option` | Field is preceded by a 1-byte "present" flag (Rust `Option<T>`). | |
| 72 | +| `coption` | Field is preceded by a 4-byte "present" flag (Solana C-style Option). | |
| 73 | +| `binary_extension` | Anchor "binary extension" sentinel: present-if-remaining-bytes. | |
| 74 | +| `skip` or `-` | Skip this field on both encode and decode. | |
| 75 | +| `enum` | Struct is the tagged-union body of a borsh enum. | |
| 76 | + |
| 77 | +Also recognized for compatibility with upstream borsh libs: |
| 78 | + |
| 79 | +| Tag | Effect | |
| 80 | +| ------------------- | ----------------------- | |
| 81 | +| `borsh_skip:"true"` | Alias for `bin:"skip"`. | |
| 82 | +| `borsh_enum:"true"` | Alias for `bin:"enum"`. | |
| 83 | + |
| 84 | +### Example |
| 85 | + |
| 86 | +```go |
| 87 | +type Instruction struct { |
| 88 | +	Discriminator uint8 |
| 89 | +	NumAccounts   uint8 `bin:"sizeof=AccountIdx"` |
| 90 | +	AccountIdx    []uint8 |
| 91 | +	Data          []byte  // length prefix depends on the encoding (uvarint for bincode) |
| 92 | +	Tip           *uint64 `bin:"optional"` |
| 93 | +} |
| 94 | +``` |
| 95 | + |
| 96 | +--- |
| 97 | + |
| 98 | +## Custom types (preferred for hot paths) |
| 99 | + |
| 100 | +Types that can serialize themselves can implement the marshaler |
| 101 | +interfaces. This is the mechanism `solana.PublicKey`, `solana.Signature`, |
| 102 | +and most program types use internally. |
| 103 | + |
| 104 | +```go |
| 105 | +type BinaryMarshaler interface { |
| 106 | +	MarshalWithEncoder(encoder *Encoder) error |
| 107 | +} |
| 108 | + |
| 109 | +type BinaryUnmarshaler interface { |
| 110 | +	UnmarshalWithDecoder(decoder *Decoder) error |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +When both are implemented, the reflection encoder/decoder detects the |
| 115 | +type and dispatches to the custom method -- no per-field reflection cost. |
| 116 | +This is the fastest generic path and what program packages should |
| 117 | +implement. |
| 118 | + |
| 119 | +```go |
| 120 | +func (p *Pubkey) MarshalWithEncoder(e *bin.Encoder) error { |
| 121 | +	return e.WriteBytes(p[:], false) |
| 122 | +} |
| 123 | + |
| 124 | +func (p *Pubkey) UnmarshalWithDecoder(d *bin.Decoder) error { |
| 125 | +	_, err := d.Read(p[:]) |
| 126 | +	return err |
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +--- |
| 131 | + |
| 132 | +## Faster paths |
| 133 | + |
| 134 | +The package exposes four zero-allocation escape hatches for |
| 135 | +latency-sensitive code. Each one is progressively more unsafe in |
| 136 | +exchange for more speed. Use the least unsafe option that fits your constraints. |
| 137 | + |
| 138 | +### 1. Pre-sized buffer encode -- `MarshalXxxInto` |
| 139 | + |
| 140 | +If the caller already knows the wire size, the encoder can write |
| 141 | +straight into the caller's buffer. Zero allocations, roughly 20-30% |
| 142 | +faster than `Marshal*`. |
| 143 | + |
| 144 | +```go |
| 145 | +size, _ := bin.BorshByteCount(&foo) // or compute statically |
| 146 | +buf := make([]byte, size) |
| 147 | + |
| 148 | +n, err := bin.MarshalBorshInto(&foo, buf) |
| 149 | +wire := buf[:n] |
| 150 | + |
| 151 | +// io.ErrShortBuffer is returned if buf is too small. buf is never |
| 152 | +// reallocated -- this is the zero-alloc guarantee. |
| 153 | +``` |
| 154 | + |
| 155 | +For repeat encodes into the same destination, keep an `*Encoder` and |
| 156 | +re-target it with `ResetInto`: |
| 157 | + |
| 158 | +```go |
| 159 | +enc := bin.NewBorshEncoderInto(nil) |
| 160 | +for _, msg := range messages { |
| 161 | +	enc.ResetInto(scratch) |
| 162 | +	_ = enc.Encode(msg) |
| 163 | +	send(enc.Bytes()) |
| 164 | +} |
| 165 | +``` |
| 166 | + |
| 167 | +### 2. Hand-rolled encoders -- `Cursor` |
| 168 | + |
| 169 | +When you write the encoding logic yourself (e.g. hot-path program |
| 170 | +instructions), `Cursor` skips the error-returning Encoder primitives |
| 171 | +and does one memory poke per call. 6-12x faster than the Encoder for |
| 172 | +primitive-heavy code; matches hand-written `binary.LittleEndian.Put*` |
| 173 | +calls in generated assembly. |
| 174 | + |
| 175 | +```go |
| 176 | +buf := make([]byte, txHeaderSize) |
| 177 | +wire := bin.NewCursor(buf). |
| 178 | +	WriteU8(hdr.NumReqSigs). |
| 179 | +	WriteU8(hdr.NumROSigned). |
| 180 | +	WriteU8(hdr.NumROUnsigned). |
| 181 | +	WriteLenCompactU16(len(accounts)). |
| 182 | +	// ... |
| 183 | +	Written() |
| 184 | +``` |
| 185 | + |
| 186 | +Methods return `*Cursor`, so writes chain. A `Cursor` owns nothing -- |
| 187 | +out-of-bounds writes panic (standard Go slice bounds panic). Pre-size |
| 188 | +correctly, or use `bin.NewXxxEncoderInto` for error-returning |
| 189 | +writes. |
| 190 | + |
| 191 | +Available methods: `WriteU8/16/32/64LE/BE`, `WriteI*`, `WriteBool`, |
| 192 | +`WriteF32/64LE/BE`, `WriteBytes`, `WriteZero`, `Skip`, `WriteUvarint`, |
| 193 | +`WriteVarint`, `WriteLenBin`, `WriteLenBorsh`, `WriteLenCompactU16`. |
| 194 | +Back-patching via `SetPos(n)` / `Pos()` / `Written()`. |
| 195 | + |
| 196 | +### 3. In-place mutation -- `ViewAs` |
| 197 | + |
| 198 | +When the task is "patch field X in a pre-built wire buffer and send |
| 199 | +it," do not decode, mutate, and re-encode. Reinterpret the buffer as a |
| 200 | +typed pointer and write through it. |
| 201 | + |
| 202 | +```go |
| 203 | +// Patch a 32-byte recent blockhash at a known offset. |
| 204 | +v, err := bin.ViewAs[bin.Blockhash](wire[blockhashOffset:]) |
| 205 | +if err != nil { return err } |
| 206 | +*v = newBlockhash |
| 207 | + |
| 208 | +// Mutations are visible in the original wire[] -- no copy. |
| 209 | +send(wire) |
| 210 | +``` |
| 211 | + |
| 212 | +`ViewAs[T]` returns `*T` aliasing the byte slice. `ViewSliceAs[T](buf, n)` |
| 213 | +returns a `[]T` alias. |
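What "aliasing the byte slice" means can be shown with plain Go. The sketch below uses the language's slice-to-array-pointer conversion (Go 1.17+) on a fixed 32-byte type. `viewHash32` is a hypothetical helper written for this illustration; it is not the package's `ViewAs`, and it makes no claim about how `ViewAs` is implemented.

```go
package main

import "fmt"

// viewHash32 returns a *[32]byte that points at the first 32 bytes of buf.
// No bytes are copied: writes through the pointer land in buf itself.
// Hypothetical helper for illustration only.
func viewHash32(buf []byte) (*[32]byte, error) {
	if len(buf) < 32 {
		return nil, fmt.Errorf("need 32 bytes, have %d", len(buf))
	}
	return (*[32]byte)(buf[:32]), nil
}

func main() {
	wire := make([]byte, 40) // pretend the hash lives at offset 8

	v, err := viewHash32(wire[8:])
	if err != nil {
		panic(err)
	}

	var newHash [32]byte
	newHash[0] = 0xAB
	*v = newHash // patches wire[8:40] in place

	fmt.Println(wire[8] == 0xAB)
}
```

Because the pointer and the slice share memory, the usual aliasing caveats apply: the view is only valid while the underlying buffer is, and concurrent writers need external synchronization.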
| 214 | + |
| 215 | +**Constraints.** `T` must be a plain-old-data (POD) type with no |
| 216 | +compiler-inserted padding. Run `bin.AssertPOD[T]()` once at program |
| 217 | +start (e.g. from `init()`) to catch layout violations before they |
| 218 | +silently mis-patch the wire: |
| 219 | + |
| 220 | +```go |
| 221 | +func init() { |
| 222 | +	bin.MustAssertPOD[Blockhash]() |
| 223 | +} |
| 224 | +``` |
| 225 | + |
| 226 | +Safe shapes: fixed-size byte arrays (`[32]byte`, `[64]byte`), structs |
| 227 | +of same-size integer fields, homogeneous fixed-size arrays. Unsafe: |
| 228 | +mixed-width packed structs like `{uint8; uint64}` where Go adds 7 |
| 229 | +bytes of padding but the wire format is tight. `AssertPOD` flags these. |
| 230 | +For packed formats, use `Cursor` with byte-level offsets instead. |
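The padding problem is easy to observe with `unsafe.Sizeof` and `unsafe.Offsetof`. The sketch below compares a mixed-width struct with a same-width one. The type names and the `inMemorySizes` helper are illustrative assumptions, not part of this package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mixedWidth occupies 1 + 8 = 9 bytes on a tightly packed wire format,
// but Go inserts padding after Tag so that Value is properly aligned.
type mixedWidth struct {
	Tag   uint8
	Value uint64
}

// sameWidth has no padding: both fields share the same size and alignment.
type sameWidth struct {
	A uint64
	B uint64
}

// inMemorySizes reports the in-memory size of each struct.
func inMemorySizes() (mixed, uniform uintptr) {
	return unsafe.Sizeof(mixedWidth{}), unsafe.Sizeof(sameWidth{})
}

func main() {
	mixed, uniform := inMemorySizes()
	var m mixedWidth
	fmt.Println("mixedWidth size:", mixed, "Value offset:", unsafe.Offsetof(m.Value))
	fmt.Println("sameWidth size: ", uniform)
}
```

On common 64-bit targets `mixedWidth` is 16 bytes with `Value` at offset 8, so reinterpreting a 9-byte packed wire record as `*mixedWidth` would read `Value` from the wrong bytes. The exact padding is platform-dependent, which is one more reason to check the layout at startup rather than assume it.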
| 231 | + |
| 232 | +~730x faster than a decode-then-encode round trip on a single-field |
| 233 | +patch. |
| 234 | + |
| 235 | +### 4. Generic memcpy marshal -- `MarshalPOD[T]` / `UnmarshalPOD[T]` |
| 236 | + |
| 237 | +For types that satisfy the same POD constraints as `ViewAs`, the whole |
| 238 | +struct can be marshaled or unmarshaled as a single `memcpy` rather than |
| 239 | +field-by-field through reflection. Typed via Go generics -- no reflection |
| 240 | +walk, no `any` boxing, no interface dispatch. The compiler often inlines |
| 241 | +small fixed sizes into register moves. |
| 242 | + |
| 243 | +```go |
| 244 | +var key bin.Pubkey |
| 245 | +for i := range key { |
| 246 | +	key[i] = byte(i) |
| 247 | +} |
| 248 | + |
| 249 | +// Marshal into a pre-sized buffer (zero-alloc, detached copy of *v). |
| 250 | +dst := make([]byte, 32) |
| 251 | +n, err := bin.MarshalPOD(&key, dst) |
| 252 | + |
| 253 | +// Or alloc-and-return for ergonomics: |
| 254 | +wire := bin.MarshalPODAlloc(&key) |
| 255 | + |
| 256 | +// Unmarshal: copy wire bytes into *v (detached from src). |
| 257 | +var decoded bin.Pubkey |
| 258 | +err = bin.UnmarshalPOD(&decoded, wire) |
| 259 | +``` |
| 260 | + |
| 261 | +Unlike `ViewAs` (which aliases the source buffer), `MarshalPOD` and |
| 262 | +`UnmarshalPOD` produce *detached* copies -- mutating one side after the |
| 263 | +call does not affect the other. Use `ViewAs` when you want live alias |
| 264 | +access; use `MarshalPOD` / `UnmarshalPOD` when you want a clean separation |
| 265 | +between wire bytes and your owned struct. |
| 266 | + |
| 267 | +Same POD constraints as `ViewAs`: run `AssertPOD[T]()` once at program |
| 268 | +start to verify. Unsafe for padded structs, heterogeneous packed wire |
| 269 | +formats, or big-endian hosts. |
| 270 | + |
| 271 | +Benchmarks: |
| 272 | + |
| 273 | +| Operation | Reflective path | MarshalPOD path | |
| 274 | +| ------------------------ | --------------: | --------------: | |
| 275 | +| Marshal 32-byte Pubkey | 58 ns | ~0.25 ns (inlined to register moves) | |
| 276 | +| Marshal 64-byte struct | 118 ns | ~0.25 ns | |
| 277 | +| Unmarshal 64-byte struct | 60 ns | ~0.76 ns | |
| 278 | + |
| 279 | +See [BENCH.md](BENCH.md) for the full table. |
| 280 | + |
| 281 | +--- |
| 282 | + |
| 283 | +## Bounding untrusted input |
| 284 | + |
| 285 | +When decoding data from the network (RPC, websocket subscriptions, |
| 286 | +untrusted block data) a malicious length prefix can trigger arbitrarily |
| 287 | +large allocations. The decoder has two opt-in caps: |
| 288 | + |
| 289 | +```go |
| 290 | +dec := bin.NewBorshDecoder(payload). |
| 291 | +	SetMaxSliceLen(256). // reject slice prefixes > 256 elements |
| 292 | +	SetMaxMapLen(64)     // reject map prefixes > 64 entries |
| 293 | + |
| 294 | +if err := dec.Decode(&v); err != nil { |
| 295 | +	// errors.Is(err, bin.ErrSliceLenTooLarge) == true on cap violation |
| 296 | +} |
| 297 | +``` |
| 298 | + |
| 299 | +Default (no caller-set cap) preserves historical behavior. Internally |
| 300 | +the decoder also enforces `len * minElementWireSize <= Remaining()` so |
| 301 | +a malicious `[]Pubkey` prefix claiming 1000 entries is rejected when |
| 302 | +only 100 wire bytes remain, even with no explicit cap set. |
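The `len * minElementWireSize <= Remaining()` rule can be sketched in a few lines. This is a standalone illustration of the check as described above, not the decoder's actual code. `checkSliceLen` and `errLenExceedsInput` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errLenExceedsInput is a hypothetical sentinel for this sketch.
var errLenExceedsInput = errors.New("length prefix exceeds remaining input")

// checkSliceLen rejects a claimed element count that could not possibly be
// backed by the bytes still left in the input.
func checkSliceLen(claimed, minElemSize, remaining int) error {
	if claimed < 0 || minElemSize <= 0 || remaining < 0 {
		return errors.New("invalid arguments")
	}
	// Divide rather than multiply so a huge claimed value cannot overflow.
	if claimed > remaining/minElemSize {
		return errLenExceedsInput
	}
	return nil
}

func main() {
	// A []Pubkey prefix claiming 1000 entries with only 100 bytes left.
	fmt.Println(checkSliceLen(1000, 32, 100))
	// Three 32-byte entries fit in 100 bytes.
	fmt.Println(checkSliceLen(3, 32, 100))
}
```

The check runs before any allocation, so the cost of a hostile prefix is bounded by the size of the payload actually received.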
| 303 | + |
| 304 | +--- |
| 305 | + |
| 306 | +## Picking a path |
| 307 | + |
| 308 | +``` |
| 309 | +                 unknown size? |
| 310 | +                       | |
| 311 | +            +----------+----------+ |
| 312 | +            |                     | |
| 313 | +           yes                    no |
| 314 | +            |                     | |
| 315 | +    Marshal/Unmarshal       MarshalXxxInto |
| 316 | +   (simplest, 1 alloc)       (zero alloc) |
| 317 | +                                  | |
| 318 | +                        know the exact field |
| 319 | +                       layout + want no error |
| 320 | +                              returns? |
| 321 | +                                  | |
| 322 | +                            +-----+-----+ |
| 323 | +                            |           | |
| 324 | +                            no         yes |
| 325 | +                            |           | |
| 326 | +                       (stay with     Cursor |
| 327 | +                        Marshal)   (6-12x faster, |
| 328 | +                                   panics on OOB) |
| 329 | +``` |
| 330 | + |
| 331 | +Separate decision for mutation: |
| 332 | + |
| 333 | +``` |
| 334 | +need to patch |
| 335 | +bytes in place? |
| 336 | +      | |
| 337 | +      |---> whole struct fits a POD shape?  -> ViewAs[T] |
| 338 | +      |                                        (+ AssertPOD[T] in init) |
| 339 | +      | |
| 340 | +      |---> mixed-width packed wire?        -> Cursor at known offsets |
| 341 | +      |                                        + SetPos() for back-patch |
| 342 | +      | |
| 343 | +      +---> decoding and re-encoding        -> Marshal/Unmarshal round trip |
| 344 | +            is acceptable                      (~100-200 ns overhead) |
| 345 | +``` |
| 346 | + |
| 347 | +--- |
| 348 | + |
| 349 | +## Thread safety |
| 350 | + |
| 351 | +`Encoder`, `Decoder`, and `Cursor` are not safe for concurrent use. |
| 352 | +The top-level `Marshal*` / `Unmarshal*` helpers are safe to call from |
| 353 | +multiple goroutines because they acquire their own pooled |
| 354 | +Encoder/Decoder for each call. |
| 355 | + |
| 356 | +--- |
| 357 | + |
| 358 | +## Reference |
| 359 | + |
| 360 | +- Package docs: `go doc github.com/gagliardetto/solana-go/binary` |
| 361 | +- Benchmarks: [BENCH.md](BENCH.md) |
| 362 | +- Upstream (before vendoring): [github.com/gagliardetto/binary](https://github.com/gagliardetto/binary) |