
Commit 4cf9e06

zire and claude committed
Fix filename too long error for CJK titles in conversions
_safe_filename() truncated at 100 characters, but CJK chars are 3 bytes each in UTF-8, producing ~300-byte filenames that exceed Linux's 255-byte limit. Now truncates by UTF-8 byte length (220 bytes) with safe multi-byte boundary handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
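The byte arithmetic behind the failure can be checked directly. This is a minimal sketch, assuming a hypothetical title made of 100 copies of one CJK character and the example suffix quoted in the commit's docstring; neither value is taken from the repository's tests.

```python
# Hypothetical title: 100 copies of a single CJK character (3 bytes each in UTF-8).
title = "漢" * 100
suffix = "_20260329_223621.epub"  # example suffix from the commit's docstring

# Old behaviour: truncate by character count, then append the suffix.
old_name = title[:100] + suffix

char_count = len(old_name)                  # counts characters
byte_count = len(old_name.encode("utf-8"))  # counts bytes, which is what the filesystem limits

print(char_count, byte_count, byte_count > 255)
```

The character count stays well under 255, but the encoded name does not, which is why the old character-based limit did not protect against the error.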
1 parent 711d022 commit 4cf9e06

1 file changed

Lines changed: 13 additions & 5 deletions


backend/converter.py

@@ -790,11 +790,19 @@ def _download_image(self, url: str, referer: str = "") -> bytes | None:
         return None
 
     def _safe_filename(self, title: str) -> str:
-        """Convert title to safe filename."""
+        """Convert title to safe filename.
+
+        Truncates by byte length (not character count) to stay under the
+        255-byte Linux filename limit. Leaves room for timestamp suffix
+        and extension (e.g. '_20260329_223621.epub' = 21 bytes).
+        """
         # Remove or replace unsafe characters
-        safe = re.sub(r'[<>:"/\\|?*]', "", title)
+        safe = re.sub(r'[<>:"/\\|?*\n\r]', "", title)
         safe = safe.strip()
-        # Limit length
-        if len(safe) > 100:
-            safe = safe[:100]
+        # Truncate to fit within 255-byte filename limit
+        # Reserve 35 bytes for timestamp suffix + extension
+        max_bytes = 220
+        encoded = safe.encode("utf-8")
+        if len(encoded) > max_bytes:
+            safe = encoded[:max_bytes].decode("utf-8", errors="ignore").rstrip()
         return safe or "article"
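
The truncation step can be exercised outside the class. The following is a standalone sketch, not the repository's code: `truncate_utf8` is a hypothetical helper name, and its 220-byte default simply mirrors `max_bytes` in the diff.

```python
def truncate_utf8(text: str, max_bytes: int = 220) -> str:
    """Truncate text so that its UTF-8 encoding fits within max_bytes.

    Hypothetical standalone helper mirroring the approach in the diff.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing the byte string may cut a multi-byte character in half;
    # errors="ignore" drops the incomplete trailing bytes instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore").rstrip()


cjk = truncate_utf8("漢" * 100)         # 300 bytes before truncation
ascii_title = truncate_utf8("a" * 300)  # 300 bytes before truncation
short = truncate_utf8("short title")    # already within the limit
```

With 3-byte characters the 220-byte cut lands one byte into the 74th character, so the decoded result keeps 73 whole characters (219 bytes) rather than ending in a broken sequence.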
