Description
The SharingTables structure is used to compress the terms and types that appear in a theory's signature. The compression ensures that repeated terms and types are only stored once.
This compression is done within a theory's signature, but there's no attempt to handle cases where a term or type is repeated from an ancestor theory. I think that quite a bit of space and time could be saved here.
I came across this while investigating the high memory consumption of some CakeML tasks. I discovered that, after loading various theories, a PolyML.shareCommonData could compress PolyML's heap by 80%, sometimes close to 90%. The effect was substantial (> 70%) even if shareCommonData was limited to the conclusion terms of the theorems fetched by DB.listDB ().
This indicates that there's a fair amount of room to improve the sharing, at least for CakeML's use case. I suspect that other HOL theories would benefit from a more global sharing approach, and the .dat files might shrink quite a bit, although the export process would slow down as more of the world would have to be put in a canonisation table.