(Entry is entirely public, so any changes would have to be a breaking change)
this is a follow-up to:
just the size_of::<rc_zip::parse::Entry>() alone is 144 bytes, which is pretty large when there are very, very many of them. there are quite a few ways it can be pruned to be a bit smaller, so let's have a look at things (by my best guess)
struct Entry {
// beeg
name: String, // <---\
comment: String, // <-x- 24 bytes each
// medium
modified: DateTime<Utc>, // <-------\
created: Option<DateTime<Utc>>, // <-x
accessed: Option<DateTime<Utc>>, // <-x- 12 bytes each
uid: Option<u32>, // <--x- 8 bytes each (probably bc alignment? 🥴)
gid: Option<u32>, // <-/
// smol
method: Method, // <---------\
reader_version: Version, // <-x- 2 bytes each (by itself 4, but variants are stored in niches)
// fixed, no realistic gains to be had
compressed_size: u64, // <--\
uncompressed_size: u64, // <-x- 8 bytes each
header_offset: u64, // <----/
mode: Mode, // <--x- 4 bytes each
crc32: u32, // <-/
flags: u16, // 2 bytes
}
in short that's
- beeg: 48 bytes (!!)
- medium: 52 bytes
- smol: 4 bytes
- fixed: 34 bytes
which comes out to a total of 138. if you add in whatever bit i missed and round up for alignment then it looks like things make sense
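as a quick sanity check on the arithmetic, here is a small sketch that sums the per-group estimates and rounds up for alignment. the helper names are made up for illustration, and the 8-byte alignment is an assumption (it is what the u64 fields would require on a 64-bit target). under that assumption, padding alone would already take 138 up to 144:

```rust
use std::mem::size_of;

/// Round `size` up to the next multiple of `align` (must be a power of two).
pub fn round_up(size: usize, align: usize) -> usize {
    (size + align - 1) & !(align - 1)
}

/// Sum of the per-group estimates from the breakdown above.
pub fn estimated_entry_size() -> usize {
    let beeg = 2 * 24; // name + comment as `String`
    let medium = 3 * 12 + 2 * 8; // three timestamps + uid/gid
    let smol = 2 * 2; // method + reader_version
    let fixed = 3 * 8 + 2 * 4 + 2; // sizes/offset + mode/crc32 + flags
    beeg + medium + smol + fixed
}

/// Sizes of the std types involved, measured on the current target:
/// (`String`, `Option<u32>`, `u64`).
pub fn std_sizes() -> (usize, usize, usize) {
    (size_of::<String>(), size_of::<Option<u32>>(), size_of::<u64>())
}
```

calling round_up(estimated_entry_size(), 8) gives 144, which matches the reported size_of, so there may not be a missed field at all.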
to get the simple ones out of the way:
- fixed - what can ya do 🤷
- medium - all of types using DateTime. it looks like internally this is 2 u32s and one NonZeroI32 which provides the niche for Option. the uid and gid can potentially be packed down a bit
- smol - same deal as uid and gid although the best you can expect are some modest gains
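to make the niche and packing points a bit more concrete, here is a rough sketch. DateTimeLike is only a stand-in with the shape described above (it is not chrono's real definition), and Ids is a hypothetical packed form of uid/gid that keeps presence bits in one u8 instead of paying for two Option tags:

```rust
use std::num::NonZeroI32;

/// Stand-in with the layout guessed above, NOT chrono's actual type.
/// The `NonZeroI32` gives `Option<DateTimeLike>` a free "None" value.
#[allow(dead_code)]
pub struct DateTimeLike {
    date: NonZeroI32,
    secs: u32,
    frac: u32,
}

/// Hypothetical packed uid/gid: two raw values plus presence bits.
pub struct Ids {
    uid: u32,
    gid: u32,
    present: u8, // bit 0: uid is set, bit 1: gid is set
}

impl Ids {
    pub fn new(uid: Option<u32>, gid: Option<u32>) -> Self {
        let present = (uid.is_some() as u8) | ((gid.is_some() as u8) << 1);
        Ids { uid: uid.unwrap_or(0), gid: gid.unwrap_or(0), present }
    }

    pub fn uid(&self) -> Option<u32> {
        if (self.present & 0b01) != 0 { Some(self.uid) } else { None }
    }

    pub fn gid(&self) -> Option<u32> {
        if (self.present & 0b10) != 0 { Some(self.gid) } else { None }
    }
}
```

on its own Ids is 12 bytes against 16 for two Option<u32>s. uid 0 is a valid value (root), so a NonZeroU32 niche is not an option here, which is why the sketch uses explicit presence bits.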
beeg
sooo that leaves the two Strings for name and comment taking up 48 bytes. both of those could have the internals hidden away in a new-type that exposes a &str to give more freedom to change things down the line. there are a lot of options
Method 1: Box<str>
who needs the capacity anyways. it would drop 8 bytes per field and works for both the name and comment
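a minimal sketch of what that could look like. EntryName is a hypothetical name, not something that exists in rc_zip, and the field is private so the representation can be swapped later without another breaking change:

```rust
use std::mem::size_of;

/// Hypothetical new-type around the name; only `&str` is exposed.
pub struct EntryName(Box<str>);

impl EntryName {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl From<String> for EntryName {
    fn from(s: String) -> Self {
        // Drops the capacity word. This may reallocate to shrink the
        // buffer if the String's capacity is larger than its length.
        EntryName(s.into_boxed_str())
    }
}

/// Bytes saved per field by dropping the capacity (one machine word).
pub fn saved_bytes_per_field() -> usize {
    size_of::<String>() - size_of::<Box<str>>()
}
```

on a 64-bit target saved_bytes_per_field() is 8, so 16 bytes across name and comment.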
Method 2: comment is almost always empty
beyond a Box<str> in theory the length could be stored in the start of the allocation. with that you can get away with a single pointer where null is empty and non-null can be used to fetch the length and construct the &str. that would shave off 16 bytes in total
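here is a rough sketch of that single-pointer idea, just to show it is doable. ThinStr is a made-up name, and this is an illustration with unsafe code under stated assumptions (length stored as a usize header at the start of the allocation), not a vetted implementation:

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

const HEADER: usize = size_of::<usize>();

/// Hypothetical one-word string: `None` means empty, otherwise the
/// allocation holds a `usize` length followed by that many UTF-8 bytes.
pub struct ThinStr(Option<NonNull<u8>>);

impl ThinStr {
    fn layout(len: usize) -> Layout {
        Layout::from_size_align(HEADER + len, align_of::<usize>()).expect("string too long")
    }

    pub fn new(s: &str) -> Self {
        if s.is_empty() {
            return ThinStr(None); // common case for comments: no allocation
        }
        let layout = Self::layout(s.len());
        unsafe {
            let ptr = match NonNull::new(alloc(layout)) {
                Some(p) => p,
                None => handle_alloc_error(layout),
            };
            // Length header first (aligned for usize), then the bytes.
            ptr.as_ptr().cast::<usize>().write(s.len());
            std::ptr::copy_nonoverlapping(s.as_ptr(), ptr.as_ptr().add(HEADER), s.len());
            ThinStr(Some(ptr))
        }
    }

    pub fn as_str(&self) -> &str {
        match self.0 {
            None => "",
            Some(ptr) => unsafe {
                let len = ptr.as_ptr().cast::<usize>().read();
                let bytes = std::slice::from_raw_parts(ptr.as_ptr().add(HEADER), len);
                // The bytes were copied from a valid `&str` in `new`.
                std::str::from_utf8_unchecked(bytes)
            },
        }
    }
}

impl Drop for ThinStr {
    fn drop(&mut self) {
        if let Some(ptr) = self.0 {
            unsafe {
                let len = ptr.as_ptr().cast::<usize>().read();
                dealloc(ptr.as_ptr(), Self::layout(len));
            }
        }
    }
}
```

the trade-off is that a non-empty comment pays one extra word on the heap and an extra read to get the length, which seems fine if comments really are almost always empty. a real version would also need to decide on Send/Sync and Clone.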
Method 3: name is like... never empty
but it's often very short. it could be a good candidate for small-string-optimization which could allow avoiding extra allocations on pretty common Entrys. a lot of rust crates take advantage of invalid UTF-8 to store >16 bytes inline, but we would probably want one that can be created from some kind of inline Vec<u8> which can't take advantage of that trick (and #148 exploits being able to convert the Vec<u8> form directly to the String form, so something similar would be ideal)
considering that this would likely involve pulling in some third-party crates it can be feature-gated off to be something simple like Box<str> when disabled
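for reference, a hand-rolled sketch of the small-string idea without any third-party crate. SmallName, Repr and INLINE_CAP are all hypothetical, the inline capacity of 22 is an arbitrary pick, and it builds from a byte slice so the caller could hand it bytes from an inline buffer. it does not use the invalid-UTF-8 trick, it just keeps an explicit length byte:

```rust
use std::str::Utf8Error;

const INLINE_CAP: usize = 22; // arbitrary choice for this sketch

enum Repr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

/// Hypothetical name type: short names live inline, long ones on the heap.
pub struct SmallName(Repr);

impl SmallName {
    pub fn from_utf8(bytes: &[u8]) -> Result<Self, Utf8Error> {
        // Validate once up front; both representations rely on it.
        let s = std::str::from_utf8(bytes)?;
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            Ok(SmallName(Repr::Inline { len: s.len() as u8, buf }))
        } else {
            Ok(SmallName(Repr::Heap(Box::from(s))))
        }
    }

    pub fn as_str(&self) -> &str {
        match &self.0 {
            Repr::Inline { len, buf } => {
                // SAFETY: the first `len` bytes were copied from a validated &str.
                unsafe { std::str::from_utf8_unchecked(&buf[..*len as usize]) }
            }
            Repr::Heap(s) => &**s,
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self.0, Repr::Inline { .. })
    }
}
```

with the feature disabled, the same public surface (from_utf8 / as_str) could sit on top of a plain Box<str>, so callers would not notice the difference.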