Open
Labels: bug (Something isn't working)
Description
This was on a Linux system, and the "A~-" was an "Ö".
- Fix `Ã.` problem above
- Fix `LookupError: unknown encoding: EUC-TW` problem

For plain text files it would be best to

- Review CLI
  - `cli.py` (esp. `process_dir`)
  - `ocrd_cli.py` - any plain text files supported here?
  - `cli_line_dirs.py`
  - `cli_summarize.py`?
- add `--plain-encoding` option so users have the chance to give it manually
- Fall back to detecting
  - while warning about the auto detecting
What about the BOM now?
- Do we have a test that checks if files with BOM are read correctly?
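
If there is no such test yet, something along these lines might do. It is a sketch only: `read_text_file` is a hypothetical stand-in for the project's real reader, and it assumes the reader should accept UTF-8 with or without a BOM (Python's `utf-8-sig` codec does that on decoding).

```python
import codecs
import tempfile
from pathlib import Path


def read_text_file(path: Path) -> str:
    """Hypothetical reader: 'utf-8-sig' drops a leading BOM if one is present."""
    return Path(path).read_text(encoding="utf-8-sig")


def test_bom_is_stripped():
    with tempfile.TemporaryDirectory() as tmp:
        with_bom = Path(tmp) / "with_bom.txt"
        without_bom = Path(tmp) / "without_bom.txt"
        with_bom.write_bytes(codecs.BOM_UTF8 + "Ö".encode("utf-8"))
        without_bom.write_bytes("Ö".encode("utf-8"))

        # Both files must yield the same text, without a leading U+FEFF.
        assert read_text_file(with_bom) == "Ö"
        assert read_text_file(without_bom) == "Ö"
```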
Later
- Autodetect over all files
- falling back to UTF-8 if the detected charset is way out there/unknown like `EUC-TW`
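
The UTF-8 fallback could be a small guard around whatever name the detector returns. A minimal sketch, assuming the detector hands back an encoding name as a string (or nothing at all); `usable_encoding` is a hypothetical helper, not existing code:

```python
import codecs


def usable_encoding(detected, fallback="utf-8"):
    """Return `detected` if Python has a codec for it, else `fallback`."""
    if not detected:
        # Detector gave no answer at all.
        return fallback
    try:
        codecs.lookup(detected)
    except LookupError:
        # e.g. "EUC-TW", which Python has no codec for
        return fallback
    return detected
```

This only catches charsets Python cannot decode at all. A detected charset that exists but is implausible for the data would need a separate check.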
