MJcdi is a bare-bones Node.js CLI tool that converts Bangla text inside PDFs between Bijoy ANSI encoding and Unicode, using PDF text extraction followed by greedy substitution.
⚠️ Status: Beta. Still in active development.
Currently designed as a bare-minimum CLI that requires a manual Node.js installation.
- Convert digital Bijoy (ANSI) text in PDFs to Unicode
- Supports reverse*: Unicode → Bijoy (*riddled with bugs and won't be worked on, 'cause MJcdi.)
- Greedy substitution with post-processing reordering
  ⚠️ Known issue: Repha (র্) is misplaced in tri-, tetra- and penta-consonantal conjuncts, e.g.:
  - tri: বজ্র্য instead of বর্জ্য
  - tetra: আক্ষ্র্য instead of আর্ক্ষ্য
  - penta: কাৎর্স্ন্য/কাত্স্ন্র্য instead of কার্ৎস্ন্য/কার্ত্স্ন্য
- PDF input → .txt output
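
The "greedy substitution with post-processing reordering" idea can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `TABLE` below is a tiny hand-picked subset of Bijoy-to-Unicode pairs, and the function names `greedySubstitute`, `reorderPreBaseVowels` and `ansiToUnicode` are hypothetical. The sketch tries the longest keys first, then moves pre-base vowel signs (which Bijoy places before the consonant) to after the consonant, as Unicode expects.

```javascript
// Illustrative subset of a Bijoy -> Unicode table.
// ASSUMPTION: these pairs are for demonstration only, not the tool's real mapping.
const TABLE = {
  Av: "\u0986", // আ
  A: "\u0985",  // অ
  K: "\u0995",  // ক
  e: "\u09AC",  // ব
  g: "\u09AE",  // ম
  j: "\u09B2",  // ল
  s: "\u0982",  // ং
  v: "\u09BE",  // া
  w: "\u09BF",  // ি (typed before the consonant in Bijoy)
};

// Longest keys first, so that "Av" is matched before "A".
const KEYS = Object.keys(TABLE).sort((a, b) => b.length - a.length);

// Walk the input left to right, always consuming the longest matching key.
function greedySubstitute(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const key = KEYS.find((k) => text.startsWith(k, i));
    if (key) {
      out += TABLE[key];
      i += key.length;
    } else {
      out += text[i]; // unknown character: pass through unchanged
      i += 1;
    }
  }
  return out;
}

// Move a pre-base vowel sign (ি, ে, ৈ) to after the single consonant that follows it.
// This simple rule ignores conjuncts, so it is only a starting point.
function reorderPreBaseVowels(text) {
  return text.replace(/([\u09BF\u09C7\u09C8])([\u0995-\u09B9])/g, "$2$1");
}

function ansiToUnicode(text) {
  return reorderPreBaseVowels(greedySubstitute(text));
}
```

For example, `ansiToUnicode("Avwg")` first yields আ + ি + ম and then reorders the vowel sign to give আমি. Because the reordering step here only looks at one consonant, it would need extra rules for conjuncts, which is the kind of case where the repha issue above shows up.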
- Node.js (v14+ recommended)
- npm (comes bundled with Node.js)
- Clone or download this repository:

      git clone https://github.com/syzarn/mjcdi_beta
      cd mjcdi_beta

- Install dependencies:

      npm install
    node mjcdi.js ansi2uni "input.pdf" "output.txt"
    node mjcdi.js uni2ansi "input.pdf" "output.txt"

- The output file name is optional.
- If it is not provided, the output is saved automatically as:
  input.pdf → input.txt
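
The default naming rule can be sketched with Node's built-in `path` module. This is an assumption-laden sketch rather than the tool's actual code: the helper name `defaultOutputPath` is hypothetical, and it assumes the output is written to the same directory as the input.

```javascript
const path = require("path");

// Derive the default output path by swapping the input's extension for ".txt".
// ASSUMPTION: the output goes in the same directory as the input file.
function defaultOutputPath(inputPath) {
  const { dir, name } = path.parse(inputPath);
  return path.join(dir, `${name}.txt`);
}

// Use the explicit output argument when given, otherwise fall back to the default.
function resolveOutputPath(inputPath, outputPath) {
  return outputPath || defaultOutputPath(inputPath);
}
```

With this sketch, `resolveOutputPath("input.pdf")` gives `input.txt`, while `resolveOutputPath("input.pdf", "output.txt")` keeps the explicit name.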
- Fix repha (র্) related errors
- Add .txt and .docx input support
- .exe build
- GUI wrapper
This project is licensed under the GNU General Public License v3.0. You are free to use, modify, and distribute this tool under the terms of that license. [Note that this is not affiliated with, endorsed by (!), or sponsored by (!!) MJ.]