@danielsimonjr

Problem

On Windows systems using cp1252 encoding, memvid fails with UnicodeEncodeError when printing debug messages containing emoji characters:

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f41b' in position 0: character maps to <undefined>

This prevents video encoding from completing successfully on Windows.
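The failure can be reproduced on any platform by printing to a stream that uses the cp1252 codec. The following is a minimal sketch, not code from memvid; the helper name `print_to_cp1252` and the sample messages are made up for illustration.

```python
import io


def print_to_cp1252(message: str) -> bytes:
    """Print `message` to an in-memory cp1252 stream and return the bytes written.

    This stands in for a Windows console configured with the cp1252 code page.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
    print(message, file=stream)
    stream.flush()
    return raw.getvalue()


# Plain ASCII output encodes without trouble.
print(print_to_cp1252("Encoding frame 1 -> 2"))

# An emoji has no cp1252 representation, so print() raises.
try:
    print_to_cp1252("\U0001f41b Encoding frame 1")
except UnicodeEncodeError as exc:
    print(f"failed as on Windows: {exc.reason}")
```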

Solution

Removed all emoji characters (🐛, 🎬, 🎥) and Unicode arrows (→) from print statements in:

  • encoder.py (4 locations)
  • docker_manager.py (6 locations)

Replaced with ASCII-safe alternatives:

  • 🐛 → (removed)
  • 🎬, 🎥 → (removed)
  • → replaced with ->
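This PR edits the string literals directly. For comparison, the same mapping could be applied at runtime with a small helper. The sketch below is hypothetical (`ascii_safe` and `REPLACEMENTS` are not part of memvid) and assumes that dropping any remaining non-ASCII character is acceptable for debug output.

```python
# Hypothetical mapping: characters with a sensible ASCII stand-in.
REPLACEMENTS = {"\u2192": "->"}  # Unicode right arrow


def ascii_safe(message: str) -> str:
    """Return `message` with known symbols replaced and other non-ASCII dropped."""
    for symbol, replacement in REPLACEMENTS.items():
        message = message.replace(symbol, replacement)
    # Anything still outside ASCII (emoji, for example) is removed.
    cleaned = message.encode("ascii", errors="ignore").decode("ascii")
    # Removing a leading emoji leaves a stray space behind; trim it.
    return cleaned.strip()


print(ascii_safe("\U0001f41b Debug: chunk 1 \u2192 frame 1"))
```

Editing the literals, as the PR does, keeps the debug messages readable in the source and avoids a runtime pass over every string.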

Testing

  • ✅ Tested on Windows 11 with Python 3.14
  • ✅ Video encoding now completes without Unicode errors
  • ✅ Added test_unicode_fix.py to verify fix
  • ✅ All functionality preserved, only print output changed
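The contents of test_unicode_fix.py are not shown in this conversation. As an illustration only, a regression check in the same spirit could scan source text for characters that cp1252 cannot encode. The function name `find_unencodable` is an assumption, and the scan covers every character in the text, not just print statements.

```python
def find_unencodable(source: str, encoding: str = "cp1252"):
    """Return (line_number, character) pairs that `encoding` cannot represent."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                hits.append((lineno, ch))
    return hits


sample = 'print("ok -> done")\nprint("\U0001f3ac start")'
print(find_unencodable(sample))
```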

Impact

This is a non-breaking change that only affects debug output. Makes memvid fully compatible with Windows console encoding.

danielsimonjr and others added 2 commits November 7, 2025 10:29
Changes:
- Remove numpy <2.0.0 constraint
- Bump version to 0.1.4
- Update repository URL to danielsimonjr fork
- Tested with numpy 2.3.4 (Python 3.14)

Test Results:
✓ MemvidEncoder initializes successfully
✓ Numpy array operations work correctly
✓ No deprecated numpy type aliases found

This resolves compatibility issues with Python 3.14, which
requires numpy>=2.0.0 for optimal performance.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
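
The commit message does not say how the "no deprecated numpy type aliases" check was run. One way to do it is a plain text scan, sketched below. It assumes the code imports numpy as `np`, and the alias list is a partial, hand-picked subset of names removed in NumPy 1.24 and 2.0, not an exhaustive one.

```python
import re

# Partial list: np.int, np.float, np.complex, np.object, np.str (removed in 1.24)
# and np.float_, np.complex_, np.string_, np.unicode_, np.NaN, np.Inf (removed in 2.0).
DEPRECATED = re.compile(
    r"\bnp\.(int|float|complex|object|str|float_|complex_|string_|unicode_|NaN|Inf)\b"
)


def find_deprecated_aliases(source: str):
    """Return (line_number, alias) pairs for removed numpy aliases in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in DEPRECATED.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = "x = np.float(1)\ny = np.float64(2)\nz = np.NaN"
print(find_deprecated_aliases(sample))
```

The trailing word boundary keeps valid names such as `np.float64` and `np.str_` from matching.
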
Removed emoji characters (🐛, 🎬, 🎥) and Unicode arrows (→) from print
statements in encoder.py and docker_manager.py that caused UnicodeEncodeError
on Windows systems using cp1252 encoding.

Changes:
- encoder.py: Replaced emoji and arrow characters with ASCII equivalents
- docker_manager.py: Replaced emoji and arrow characters with ASCII equivalents
- Added test_unicode_fix.py to verify the fix works on Windows

This resolves the issue where memvid archival would fail on Windows with:
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f41b'

Tested on Windows 11 with Python 3.14.
Copilot AI review requested due to automatic review settings November 8, 2025 11:16

Copilot AI left a comment

Pull Request Overview

This PR addresses Unicode encoding issues on Windows by removing emoji characters from print statements and adding a test to verify the fix. The version is bumped to 0.1.4 and the repository URL is updated.

Key changes:

  • Removes emoji characters (🐛, 🎬, 🎥) and the Unicode arrow (→) from debug print statements in the encoder and Docker manager
  • Adds a test file to verify Unicode encoding works without errors on Windows
  • Updates numpy dependency to remove upper version constraint

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test_unicode_fix.py | New test file to verify Unicode encoding works without errors |
| setup.py | Version bump to 0.1.4, repository URL update, and numpy dependency relaxation |
| memvid/encoder.py | Removes Unicode emoji characters from print statements |
| memvid/docker_manager.py | Removes Unicode emoji characters from print statements |

  long_description=long_description,
  long_description_content_type="text/markdown",
- url="https://github.com/olow304/memvid",
+ url="https://github.com/danielsimonjr/memvid",

Copilot AI Nov 8, 2025

The repository URL has been changed from 'olow304' to 'danielsimonjr', but other references to 'olow304' remain throughout the codebase (in README.md and CONTRIBUTING.md). This inconsistency could cause confusion. Ensure all repository references are updated to maintain consistency across the project.
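
One way to locate the remaining references mentioned in this comment is a short scan of the working tree. This is a sketch under assumptions: the function name `find_references` and the default list of file suffixes are illustrative and not taken from the project.

```python
from pathlib import Path


def find_references(root, needle, suffixes=(".md", ".py", ".toml", ".cfg")):
    """Return (relative_path, line_number) pairs for lines containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


if __name__ == "__main__":
    # Scan the current directory for the old repository owner.
    for rel_path, lineno in find_references(".", "olow304"):
        print(f"{rel_path}:{lineno}")
```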

danielsimonjr and others added 4 commits November 9, 2025 11:55
Performance optimization:
- Changed VIDEO_CODEC from 'h265' to 'mp4v'
- MP4V uses OpenCV directly (no FFmpeg subprocess)
- Results in faster encoding, especially for small files

Note: Parallel QR generation was attempted but failed due to
ThreadPoolExecutor import context issues with relative imports.

Test results: 2.28s for 2 chunks (acceptable performance)

🤖 Generated with Claude Code
Changes:
- Added generate_qr_frame_worker() to utils.py for multiprocessing
- Added enable_parallel and parallel_threshold parameters to build_video()
- Default threshold: 200 chunks (only enables for very large files)
- Default mode: Serial (fast, no overhead)

Usage for large files (500+ chunks):
encoder.build_video(video_path, index_path,
                   enable_parallel=True,
                   parallel_threshold=200)

WARNING: Windows multiprocessing has ~120s startup overhead.
Only beneficial for extremely large files (500+ chunks).

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
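
The warning above implies a break-even point below which the fixed startup cost outweighs the parallel gain. The sketch below uses a simple model that is an assumption, not a measurement from this PR: serial time is `n * t`, and parallel time is `startup + n * t / workers`. Only the 120 s startup figure comes from the commit message; the per-chunk time and worker count are illustrative values.

```python
def break_even_chunks(startup_s: float, per_chunk_s: float, workers: int) -> float:
    """Chunk count above which parallel encoding beats serial under the simple model.

    Solves n * t = startup + n * t / workers for n.
    """
    if workers < 2:
        raise ValueError("parallel mode needs at least 2 workers")
    if per_chunk_s <= 0:
        raise ValueError("per_chunk_s must be positive")
    return startup_s / (per_chunk_s * (1 - 1 / workers))


# 120 s startup (from the commit message), hypothetical 0.5 s per chunk, 4 workers.
print(break_even_chunks(120, 0.5, 4))
```

Under this model, a faster per-chunk time or fewer workers pushes the break-even point higher, which is consistent with enabling parallel mode only for very large inputs.
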
Changes:
- Added _should_use_parallel() method for smart detection
- Default enable_parallel='auto' (was False)
- Auto threshold: 500 chunks (overcomes 120s Windows overhead)
- Manual overrides still work: True (force on), False (force off)

Auto detection logic:
- < 500 chunks: Serial mode (fast, no overhead)
- >= 500 chunks: Parallel mode (worth the startup cost)

Usage:
# Auto mode (default - smart detection)
encoder.build_video(video_path, index_path)

# Force parallel for 300+ chunks
encoder.build_video(video_path, index_path,
                   enable_parallel=True, parallel_threshold=300)

# Force serial (disable parallel completely)
encoder.build_video(video_path, index_path, enable_parallel=False)

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
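
The body of `_should_use_parallel()` is not shown in this conversation. The decision rules listed in the commit message could be sketched roughly as follows. The standalone function, its signature, and the reading that `enable_parallel=True` still respects an explicit `parallel_threshold` are assumptions based on the usage examples above.

```python
def should_use_parallel(enable_parallel, n_chunks, parallel_threshold=None,
                        auto_threshold=500):
    """Decide between serial and parallel encoding.

    enable_parallel: False (force serial), True (force parallel, subject to
    parallel_threshold if one is given), or "auto" (parallel at 500+ chunks).
    """
    if enable_parallel is False:
        return False
    if enable_parallel is True:
        # Assumption: an explicit threshold still gates forced parallel mode.
        return parallel_threshold is None or n_chunks >= parallel_threshold
    if enable_parallel == "auto":
        return n_chunks >= auto_threshold
    raise ValueError(f"unsupported enable_parallel value: {enable_parallel!r}")


print(should_use_parallel("auto", 499))   # below the auto threshold
print(should_use_parallel("auto", 500))   # at the auto threshold
print(should_use_parallel(True, 350, parallel_threshold=300))
```
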
- Created CHANGELOG.md documenting MP4V codec optimization and parallel processing features
- Removed all test artifacts (bench_*, test_*, mem.mp4) from root directory
- Performance improvements documented:
  - MP4V codec: 1.84x faster than H.265/HEVC
  - Automatic parallel processing for large datasets (500+ chunks)
  - Parallel speedup: 1.69x for 8000+ chunks (40.8% less encoding time)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
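
The 1.69x and 40.8% figures above describe the same result in two ways: a speedup factor `s` corresponds to a run-time reduction of `1 - 1/s`. The helper below is a small illustration of that conversion; its name is made up for this note.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of run time saved for a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


# 1.69x parallel speedup quoted in the changelog summary.
print(f"{time_saved_fraction(1.69):.1%}")
```

By the same conversion, the 1.84x MP4V figure would correspond to roughly 46% less encoding time.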