Description
I tested Memvid with three text files totaling 94 KB.

After processing, the outputs were:
- docs.mp4 (video file)
- docs_index.json (JSON index)
Together, these files are roughly 2 MB, which is about 20× larger than the original text.
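For anyone who wants to reproduce the measurement, here is a minimal sketch of how the inflation ratio can be computed from file sizes. The helper names `total_size` and `inflation_ratio` are hypothetical and not part of Memvid; the output file names come from the run above, while the input file names in the usage comment are placeholders.

```python
import os


def total_size(paths):
    """Return the combined size in bytes of all files in `paths`."""
    return sum(os.path.getsize(p) for p in paths)


def inflation_ratio(input_paths, output_paths):
    """Return output size divided by input size (values above 1 mean growth)."""
    return total_size(output_paths) / total_size(input_paths)


# Usage (input names are placeholders for the three source text files):
# ratio = inflation_ratio(
#     ["a.txt", "b.txt", "c.txt"],
#     ["docs.mp4", "docs_index.json"],
# )
# print(f"outputs are {ratio:.1f}x the size of the inputs")
```

With the figures reported here (about 2 MB of output for 94 KB of input), the ratio comes out a little over 21, which matches the "about 20×" estimate.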
See also issue #49
Why the JSON Index Exists
The JSON file is required because video compression codecs are typically lossy, which risks corrupting the information embedded in QR codes. The JSON index serves as a fallback to preserve exact chunk data.
See issue #39 for a deeper analysis.
Problem with Current Approach
Encoding via a video compression pipeline introduces:
- Significant processing overhead
- Substantial storage inflation compared to traditional compression
Also discussed in issue #63
Strong Recommendation
Instead of relying on a video-based workflow, just use any general-purpose LZ compressor (e.g., gzip, zstd, LZMA).
This would:
- Eliminate the need for lossy video encoding and JSON fallback
- Achieve far better compression ratios
- Reduce both processing complexity and storage footprint
In short: an LZ compressor directly on the text or embeddings would be vastly more efficient than the current pipeline.
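To illustrate the recommendation, here is a minimal sketch using only compressors from the Python standard library (gzip, zlib, and lzma; zstd is left out because it needs a third-party package on most Python versions). The helper `compress_all` and the `DECOMPRESSORS` table are hypothetical names for this example, and the repetitive sample string is made up, so the sizes it prints say nothing about real documents. The point is only that the round trip is lossless and needs no side index.

```python
import gzip
import lzma
import zlib

# Map each compressor name to its matching decompression function.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "zlib": zlib.decompress,
    "lzma": lzma.decompress,
}


def compress_all(text):
    """Compress `text` (as UTF-8) with each stdlib compressor."""
    raw = text.encode("utf-8")
    return {
        "gzip": gzip.compress(raw),
        "zlib": zlib.compress(raw, 9),  # 9 = highest compression level
        "lzma": lzma.compress(raw),
    }


if __name__ == "__main__":
    # Made-up, highly repetitive sample; real text compresses less well.
    sample = "Memvid stores text chunks for later retrieval. " * 500
    original_size = len(sample.encode("utf-8"))
    for name, blob in compress_all(sample).items():
        restored = DECOMPRESSORS[name](blob).decode("utf-8")
        assert restored == sample  # lossless: exact text comes back
        print(f"{name}: {original_size} -> {len(blob)} bytes")
```

Because decompression returns the exact original bytes, there is nothing for a JSON fallback to repair.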