Currently we have something like the following. Min sizes assume empty storage, averages are a best guess of the common case:
// We count the data behind Arcs as free because we have to store that information anyway

// Ignoring string: 24 local, 56ish remote
// String: 24 local, about 64 remote
type WordList = HashMap<String, Vec<Meta>>;

// 40 local
pub struct Meta {
    stem: Arc<str>, // 16 local
    source: Source, // 24 local
}

// 24 local
pub enum Source {
    Affix(Arc<AfxRule>, usize),  // 16 local, 32 pointee
    Dict(Box<[Arc<MorphInfo>]>), // 16 local, 24 pointee
    Personal(Box<PersonalMeta>), // 8 local, 40 pointee
    Raw,
}

// 40 local, extra meta in personal is uncommon
pub struct PersonalMeta {
    friend: Option<Arc<str>>,   // 16 local
    morph: Vec<Arc<MorphInfo>>, // 24 local
}

// 24 local, ~8 pointee
pub enum MorphInfo {
    Stem(MorphStr), /* ... */
}

// 32 local
pub struct AfxRule {
    kind: RuleType,
    can_combine: bool,
    patterns: Vec<AfxRulePattern>,
}

// 88 local
pub struct AfxRulePattern {
    affix: Box<str>,
    condition: Option<ReWrapper>,
    strip: Option<Arc<str>>,
    morph_info: Vec<Arc<MorphInfo>>,
}

That's really not terrible at ~80 bytes of metadata per entry, but I think we can simplify things, even setting the storage savings aside.
// Ignoring string: 24 local, 32ish remote
// String: 24 local, about 64 remote
type WordList = HashMap<String, Vec<Meta>>;

// 16 local, 16 remote max
struct Meta(MetaInner);

// 16 local
enum MetaInner {
    DictStem(Arc<str>),
    DictMorph(Arc<MorphInfo>),
    PersonalStem(Arc<str>),
    PersonalFriend(Arc<str>),
    AfxRule(Box<AfxMeta>),
    Raw,
}

// 16 local
struct AfxMeta {
    rule: Arc<AfxRule>,
    pat_idx: usize,
}

This would mean more entries in a single vector rather than multiple entries spread across multiple vectors, which is probably a good thing: there are fewer nested allocations to manage and fewer pointers to chase per lookup. A flat structure rather than a nested one will probably make the CPU a bit happier too.
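To make the "more entries in a single vector" point concrete, here is a minimal sketch of how one nested entry might expand into several flat ones. The mapping is an assumption, not something settled in this issue: `OldMeta`, `OldSource`, and `flatten` are hypothetical names, `MorphInfo` is reduced to a single `Stem(String)` variant, and the affix case is left out.

```rust
use std::sync::Arc;

// Simplified stand-in: the real `MorphInfo` has more variants and uses `MorphStr`.
#[derive(Debug, PartialEq)]
enum MorphInfo {
    Stem(String),
}

// Hypothetical cut-down version of the current nested layout.
struct OldMeta {
    stem: Arc<str>,
    source: OldSource,
}

#[allow(dead_code)]
enum OldSource {
    Dict(Box<[Arc<MorphInfo>]>),
    Personal { friend: Option<Arc<str>> },
    Raw,
}

// Flat layout from the proposal, without the affix variant.
#[derive(Debug, PartialEq)]
enum MetaInner {
    DictStem(Arc<str>),
    DictMorph(Arc<MorphInfo>),
    PersonalStem(Arc<str>),
    PersonalFriend(Arc<str>),
    Raw,
}

// Assumed mapping: one flat entry for the stem, plus one per extra piece of metadata.
fn flatten(old: OldMeta) -> Vec<MetaInner> {
    match old.source {
        OldSource::Dict(morphs) => {
            let mut out = vec![MetaInner::DictStem(old.stem)];
            out.extend(morphs.into_vec().into_iter().map(MetaInner::DictMorph));
            out
        }
        OldSource::Personal { friend } => {
            let mut out = vec![MetaInner::PersonalStem(old.stem)];
            // `Option` is iterable, so a missing friend simply adds nothing.
            out.extend(friend.map(MetaInner::PersonalFriend));
            out
        }
        OldSource::Raw => vec![MetaInner::Raw],
    }
}

fn main() {
    let morph = Arc::new(MorphInfo::Stem("walk".to_string()));
    let old = OldMeta {
        stem: Arc::from("walk"),
        source: OldSource::Dict(vec![morph].into_boxed_slice()),
    };
    for entry in flatten(old) {
        println!("{entry:?}");
    }
}
```

Under this mapping a dictionary word with one morph tag becomes two entries in the same `Vec`, with no per-entry `Box<[...]>` allocation.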
I would like to profile all of this with Valgrind before actually making the change, to get a good idea of how much we would save.
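Before a full heap profile, the "local" numbers can be sanity-checked cheaply with `std::mem::size_of`. The sketch below uses hypothetical stand-ins for `MorphInfo` and `AfxRule`, since only their pointer sizes matter here. Enum layout is not guaranteed by Rust, and because three variants each hold a full-width `Arc<str>` fat pointer, the discriminant may not fit in a niche, so the enum could come out larger than the 16 bytes estimated above. It seems worth printing the real value rather than assuming it.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical stand-ins; the real types carry more data, but they only
// appear behind pointers here, so their size does not affect `MetaInner`.
#[allow(dead_code)]
enum MorphInfo {
    Stem(Box<str>),
}

#[allow(dead_code)]
struct AfxRule {
    can_combine: bool,
}

#[allow(dead_code)]
struct AfxMeta {
    rule: Arc<AfxRule>,
    pat_idx: usize,
}

#[allow(dead_code)]
enum MetaInner {
    DictStem(Arc<str>),
    DictMorph(Arc<MorphInfo>),
    PersonalStem(Arc<str>),
    PersonalFriend(Arc<str>),
    AfxRule(Box<AfxMeta>),
    Raw,
}

fn main() {
    // Sizes are in bytes and depend on the target's pointer width.
    println!("Arc<str>:       {}", size_of::<Arc<str>>());
    println!("Arc<MorphInfo>: {}", size_of::<Arc<MorphInfo>>());
    println!("Box<AfxMeta>:   {}", size_of::<Box<AfxMeta>>());
    println!("AfxMeta:        {}", size_of::<AfxMeta>());
    println!("MetaInner:      {}", size_of::<MetaInner>());
}
```

If `MetaInner` turns out larger than hoped, one option to explore would be collapsing the three `Arc<str>` variants behind a small kind tag, but that is a separate design question from this check.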