
Very poor compression performance #85

@andybbruno

I tested Memvid with three text files totaling 94 KB.

After processing, the outputs were:

  • docs.mp4 (video file)
  • docs_index.json (JSON index)

Together, these files take up roughly 2 MB, about 20× the size of the original text.
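
As a quick check of that ratio, here is a minimal sketch. It assumes "2 MB" means 2 × 1024 KB, and the helper name `inflation_factor` is only for illustration.

```python
def inflation_factor(original_bytes: int, output_bytes: int) -> float:
    """Return how many times the output size exceeds the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return output_bytes / original_bytes


original = 94 * 1024        # three text files, 94 KB in total
output = 2 * 1024 * 1024    # docs.mp4 + docs_index.json, roughly 2 MB (binary units assumed)

print(f"{inflation_factor(original, output):.1f}x")  # prints "21.8x"
```

With decimal units (2,000,000 bytes) the figure comes out slightly lower, but still above 20×.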

See also issue #49.


Why the JSON Index Exists

The JSON file is required because video codecs are typically lossy, so compression can corrupt the data embedded in the QR codes. The JSON index serves as a fallback that preserves the exact chunk data.

See issue #39 for a deeper analysis.


Problem with Current Approach

Encoding via a video compression pipeline introduces:

  • Significant processing overhead
  • Substantial storage inflation compared to traditional compression

This is also discussed in issue #63.


Strong Recommendation

Instead of relying on a video-based workflow, use a general-purpose LZ-family compressor (e.g., gzip, zstd, or LZMA).

This would:

  • Eliminate the need for lossy video encoding and JSON fallback
  • Achieve far better compression ratios
  • Reduce both processing complexity and storage footprint

In short: an LZ compressor directly on the text or embeddings would be vastly more efficient than the current pipeline.
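
To make the comparison concrete, here is a minimal sketch using only Python's standard library. It compresses a byte string with zlib (DEFLATE, the algorithm behind gzip) and with LZMA, checks that each round trip returns the exact input, and reports the sizes. The function name `compare_lz` and the sample text are assumptions for illustration, not part of Memvid. zstd is omitted because it is not in the standard library of most Python versions.

```python
import lzma
import zlib


def compare_lz(data: bytes) -> dict[str, int]:
    """Compress data with two stdlib LZ-family codecs; return sizes in bytes."""
    results = {"original": len(data)}
    codecs = {
        "zlib (DEFLATE, as in gzip)": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "lzma (xz)": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
    }
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Lossless round trip: the exact bytes come back, so no fallback copy is needed.
        assert decompress(packed) == data
        results[name] = len(packed)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the real documents, truncated to 94 KB.
    sentence = "Memvid stores text chunks as QR codes inside video frames. "
    sample = (sentence * 2000).encode("utf-8")[: 94 * 1024]
    for name, size in compare_lz(sample).items():
        print(f"{name}: {size} bytes")
```

The sample here is highly repetitive, so it compresses far better than real prose would; to get representative numbers, run `compare_lz` on the actual files.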
