Commit c29ac33

docs: add info and links to similar libraries and dependencies of the project
1 parent c3419d4 commit c29ac33

1 file changed: +27 -6 lines changed

README.md

@@ -2,12 +2,37 @@

 Yet another library to extract text from MS Office (`docx`, `pptx`, `xlsx`) and PDF (`pdf`) files.

-## How this is different from other text extraction tools
+## Similar libraries
+
+There are other great libraries that do the same job and have inspired this project, such as:
+
+- [`any-text`](https://github.com/abhinaba-ghosh/any-text)
+- [`officeparser`](https://github.com/harshankur/officeParser)
+- [`textract`](https://www.npmjs.com/package/textract)
+
+### How this is different from other text extraction tools

 - Parses file based on mime type, not file extension
 - Does not spawn a child process to use a tool installed on the device
 - Reads and returns text from file if it is a simple text file

+## Libraries used
+
+This module uses some amazing existing libraries that perform better than the ones that originally existed in this module, and are therefore used instead:
+
+- [`pdf-parse`](https://www.npmjs.com/package/pdf-parse), for parsing PDF files
+- [`xlsx`](https://www.npmjs.com/package/xlsx), for parsing MS Excel files
+
+A big thank you to the contributors of these projects.
+
+This module also uses:
+
+- [`xml2js`](https://www.npmjs.com/package/xml2js) - to convert the MS Office XML files into JSON
+- [`js-yaml`](https://www.npmjs.com/package/js-yaml) - to convert JSON into YAML
+- [`file-type`](https://www.npmjs.com/package/file-type) - to detect the mime type of files
+- [`decompress`](https://www.npmjs.com/package/decompress) - to unzip files
+- [`read-chunk`](https://www.npmjs.com/package/read-chunk) - to read chunks of data from large files
+
 ## Installation

 To use this in an npm project, simply type in:
@@ -16,11 +41,7 @@ To use this in an npm project, simply type in:
 npm install office-text-extractor
 ```

-**Notes:**
-
-- No support for browser environments yet. If you want to add support, please feel free to [open a pull request](https://github.com/gamemaker1/office-text-extractor/pulls).
-- To parse PDFs, this module uses the amazing `pdf-parse` npm package.
-- To parse Excel files, this module uses the amazing `xlsx` npm package.
+**There is no support for browser environments yet. If you want to add support, please feel free to [open a pull request](https://github.com/gamemaker1/office-text-extractor/pulls).**

 ## Usage
