fix-encoding

Fix UTF-8 encoding issues (mojibake) in markdown documentation files.

Features

  • Automatic mojibake repair using ftfy
  • Unicode to ASCII conversion for symbols, emoji, and smart quotes
  • Cross-platform - works on macOS, Linux, and Windows
  • Zero configuration - just run it
  • Safe - use --dry-run to preview changes

Use Case: Claude.ai Artifact Downloads

This tool was created to solve a specific problem with Claude.ai artifacts.

The Problem:

When downloading markdown/text artifacts from Claude.ai using Firefox, UTF-8 characters often get double-encoded, resulting in mojibake (garbled text).

  1. Claude.ai generates UTF-8 text with special characters
  2. Firefox downloads with incorrect encoding handling
  3. File contains corrupted characters like âœ" instead of ✓

This tool fixes these issues automatically, and it also works for UTF-8 corruption from other sources.
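The corruption described above can be reproduced with the standard library alone. The sketch below assumes the wrong decoder is Windows-1252 (the source does not say which codec is actually applied during the download), and the function name `corrupt` is hypothetical.

```python
def corrupt(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate mojibake: encode as UTF-8, then misread the bytes with a single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


original = "café → done"
garbled = corrupt(original)
print(garbled)  # cafÃ© â†’ done

# Each non-ASCII character turns into two or three wrong characters,
# because every UTF-8 byte is decoded as a character of its own.
print(len(original), len(garbled))
```

Saving the garbled string as UTF-8 again is what makes the damage permanent: the file is now valid UTF-8, but it encodes the wrong characters.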

Examples

Before (corrupted):

âœ" Task complete
canâ€™t find file
Price: $50 â€" $100
â†' Next step

After (fixed):

[done] Task complete
can't find file
Price: $50 -- $100
-> Next step

Common Corruption Patterns

| Corrupted | Original | Cause |
|---|---|---|
| âœ" | ✓ (checkmark) | UTF-8 read as Latin-1 |
| â€™ | ’ (smart quote) | Smart quote corruption |
| â€" | — (em dash) | Em dash corruption |
| â†' | → (arrow) | Arrow corruption |
| Ã© | é (accented e) | Accent corruption |
| âœ… | ✅ (emoji) | Emoji corruption |
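Every row above is the same failure (UTF-8 bytes decoded with a single-byte codec), so one round trip can undo it: re-encode the garbled text with the codec that misread it, then decode the bytes as UTF-8. This is a minimal sketch, assuming Windows-1252 was the misreading codec. `repair` is a hypothetical helper, not the tool's code. ftfy does a much more careful version of this, with heuristics to avoid damaging text that was never corrupted.

```python
def repair(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo one layer of mojibake; return the input unchanged if it does not round-trip."""
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular misreading; leave it alone.
        return text


print(repair("Ã©"))           # é
print(repair("plain ASCII"))  # plain ASCII
```

The sketch treats the whole string as one unit, so a file that mixes corrupted and correctly encoded non-ASCII characters would be returned unchanged. Handling that case segment by segment is one reason to rely on a dedicated library.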

Installation

Quick Start (recommended)

No installation required. Just run with uv:

uv run fix-encoding.py

From Source

git clone https://github.com/firdaus-aziz/fix-encoding.git
cd fix-encoding
pip install -e .

From PyPI (coming soon)

pip install fix-encoding

Usage

# Fix all markdown files in current directory
uv run fix-encoding.py

# Fix specific directory
uv run fix-encoding.py ./docs

# Preview changes without modifying files
uv run fix-encoding.py --dry-run

# Verbose output
uv run fix-encoding.py -v
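The invocations above imply a small command-line interface: an optional directory, a `--dry-run` flag, and `-v`. The sketch below shows one way to wire up such an interface with `argparse` and `pathlib`. It is not the tool's actual source: the function names, the recursive `*.md` search, the `--verbose` long form, and the pluggable `fix` callable are all assumptions. The default `fix` is a no-op placeholder. In practice a real fixer such as ftfy's `fix_text` would be passed in.

```python
import argparse
from pathlib import Path
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fix mojibake in markdown files.")
    parser.add_argument("directory", nargs="?", default=".", help="directory to scan")
    parser.add_argument("--dry-run", action="store_true", help="report changes without writing")
    parser.add_argument("-v", "--verbose", action="store_true", help="also list unchanged files")
    return parser


def process(root: Path, fix: Callable[[str], str], dry_run: bool, verbose: bool = False) -> List[Path]:
    """Apply fix() to every *.md file under root; return the files that changed (or would change)."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        before = path.read_text(encoding="utf-8")
        after = fix(before)
        if after == before:
            if verbose:
                print(f"unchanged: {path}")
            continue
        changed.append(path)
        print(f"{'would fix' if dry_run else 'fixed'}: {path}")
        if not dry_run:
            # Only touch the file when a real run was requested.
            path.write_text(after, encoding="utf-8")
    return changed


def main(argv: Optional[List[str]] = None, fix: Callable[[str], str] = lambda s: s) -> int:
    args = build_parser().parse_args(argv)
    process(Path(args.directory), fix, args.dry_run, args.verbose)
    return 0
```

Keeping the write behind a single `dry_run` check means the preview and the real run go through exactly the same detection logic, so the preview reports what a real run would change.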

What it fixes

Mojibake (encoding corruption)

| Corrupted | Fixed |
|---|---|
| â€™ | ' |
| â€" | - |
| Ã© | é |
| âœ… | Yes |

Unicode symbols to ASCII

| Unicode | ASCII |
|---|---|
| ✅ ✓ | Yes [done] |
| ❌ | No |
| → ← ↑ ↓ | -> <- ^ v |
| • | * |
| — – | -- - |
| ‘ ’ “ ” | ' ' " " |
| ≥ ≤ | >= <= |
| × | x |
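A symbol-to-ASCII table like the one above maps naturally onto `str.translate`, which accepts a dictionary from single characters to replacement strings of any length. The sketch below is an assumption about how such a conversion could be written, not the tool's source. The mapping is an illustrative subset, and `ASCII_MAP` and `to_ascii` are hypothetical names.

```python
ASCII_MAP = str.maketrans({
    "\u2713": "[done]",  # check mark
    "\u2705": "Yes",     # check mark emoji
    "\u2192": "->",      # right arrow
    "\u2190": "<-",      # left arrow
    "\u2014": "--",      # em dash
    "\u2018": "'",       # left single smart quote
    "\u2019": "'",       # right single smart quote
    "\u201c": '"',       # left double smart quote
    "\u201d": '"',       # right double smart quote
    "\u2265": ">=",      # greater-than or equal
    "\u2264": "<=",      # less-than or equal
    "\u00d7": "x",       # multiplication sign
})


def to_ascii(text: str) -> str:
    """Replace each mapped Unicode symbol with its ASCII stand-in."""
    return text.translate(ASCII_MAP)


print(to_ascii("\u2713 Task complete"))    # [done] Task complete
print(to_ascii("Price: $50 \u2014 $100"))  # Price: $50 -- $100
```

Characters that are not in the mapping, such as accented letters, pass through untouched, which matches the example where é stays é after repair.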

Requirements

  • Python 3.9+
  • ftfy (automatically installed when using uv run)
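`uv run` can install a script's dependencies on the fly when the script declares them in an inline metadata block (PEP 723). The source does not show the script header, so the following is only an assumption about what the top of `fix-encoding.py` might contain to produce the behavior described above:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["ftfy"]
# ///
```

With a header like this, `uv run fix-encoding.py` creates a temporary environment containing ftfy before the script executes, so no separate `pip install` step is needed.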

License

MIT
