@VikParuchuri Hi, thanks for sharing your great work. I wonder whether "pdftext" can also extract words, like ["pymupdf"](https://pymupdf.readthedocs.io/en/latest/app1.html#words) does. There seems to be a slight difference between a "span" and "words".