wordvec

A thin and small vector that can fit data into a single usize.

Memory layout

WordVec<T, N> has different layouts for small (inlined) and large (heap) lengths. Inlined layout is used when the length is less than or equal to N, while heap layout is used when the length exceeds N. The type is a union of either inlined layout or heap layout.

Inlined layout

WordVec can store up to N (generic const) items on the stack, where N <= 127:

┌──────────┬────────────────────────┐
│ 1|len<<1 │   Elements (T × len)   │
└──────────┴────────────────────────┘

The length is stored in the 7 more significant bits of the first byte. The least significant bit is always set.

If T has an alignment greater than 1, there would be align_of::<T>() - 1 padding bytes between the length and the first element.

Heap layout

In the heap layout, WordVec is effectively a thin pointer to the following structure on the heap:

┌────────┬──────────┬────────────────────────┐
│ length │ capacity │   Elements (T × cap)   │
└────────┴──────────┴────────────────────────┘

Since length and capacity are usizes, the thin pointer is always a multiple of align_of::<usize>(). Thus, the least significant bit of the thin pointer is always 0, which distinguishes it from the inlined layout.

When to use

Although the technical limit is N <= 127, it is not meaningful to set N such that align_of::<T>() + N * size_of::<T>() exceeds 24; WordVec has no advantage over SmallVec if it cannot pack into a smaller struct.

Thin vectors are significantly (several times) slower than conventional vectors since reading the length and capacity usually involves accessing memory out of active cache. Thus, heap layout is supposed to be the cold path. In other words, WordVec is basically "length should never exceed N, but behavior is still correct when it exceeds".

Since the length encoding in the inlined layout is indirect (involves a bitshift), raw inlined access also tends to be slower in WordVec compared to SmallVec, as a tradeoff of reduced memory footprint of each vector alone. This may get handy in scenarios with a large array of small vectors, e.g. as an ECS component.

Platform requirements

Targets violating the following requirements will lead to compile error:

Little-endian only
align_of::<usize>() must be at least 2 bytes.

Benchmarks

Full criterion benchmark report generated from GitHub CI can be found on GitHub pages. Note that GitHub CI runners are subject to many uncontrolled noise sources and may not be very reliable. You may reproduce the benchmarks yourself by running cargo bench --bench criterion, or check the valgrind-based analysis instead.

The benchmarks compare std::vec,

The general observation is that WordVec performance is comparable to SmallVec, but:

is slower with operations on a single small vector (presumably due to bitshifting the length byte)
is sometimes slower with operations on large vectors due to thinness (reading/writing length/capacity from heap)
is sometimes faster with operations on many small vectors due to more efficient memory (fewer RAM accesses)

Vec feature parity

WordVec is a new project to experiment on new semantics. Currently only the basic features required to produce meaningful benchmarks are implemented, but all features from std::vec shall be implemented in this library eventually. Pull requests are welcome to align WordVec functionality with std::vec or smallvec; I do not have bandwidth to implement all those functions but I am happy to review such contributions.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
benches		benches
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

wordvec

Memory layout

Inlined layout

Heap layout

When to use

Platform requirements

Benchmarks

Vec feature parity

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

SOF3/wordvec

Folders and files

Latest commit

History

Repository files navigation

wordvec

Memory layout

Inlined layout

Heap layout

When to use

Platform requirements

Benchmarks

Vec feature parity

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages