-|PyPI version| |Build Status| |Coverage Status| |BCH compliance|
+`PyPI version <https://badge.fury.io/py/tika-app>`__ `Build
+Status <https://travis-ci.org/fedelemantuano/tika-app-python>`__
+`Coverage
+Status <https://coveralls.io/github/fedelemantuano/tika-app-python?branch=master>`__
+`BCH compliance <https://bettercodehub.com/>`__
 
 tika-app-python
 ===============
@@ -7,7 +11,10 @@ Overview
 --------
 
 tika-app-python is a wrapper for `Apache Tika
-App <https://tika.apache.org/>`__.
+App <https://tika.apache.org/>`__. With this library you can analyze: -
+file on disk - payload in base64 - file object (like standard input)
+
+To use file object function you should use Apache Tika version >= 1.17.
 
 Apache 2 Open Source License
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -31,21 +38,21 @@ Clone repository
 
 ::
 
- git clone https://github.com/fedelemantuano/tika-app-python.git
+ git clone https://github.com/fedelemantuano/tika-app-python.git
 
 and install tika-app-python with ``setup.py``:
 
 ::
 
- cd tika-app-python
+ cd tika-app-python
 
- python setup.py install
+ python setup.py install
 
 or use ``pip``:
 
 ::
 
- pip install tika-app
+ pip install tika-app
 
 Usage in a project
 ------------------
@@ -54,43 +61,52 @@ Import ``TikaApp`` class:
 
 ::
 
- from tikapp import TikaApp
+ from tikapp import TikaApp
 
- tika_client = TikaApp(file_jar="/opt/tika/tika-app-1.15.jar")
+ tika_client = TikaApp(file_jar="/opt/tika/tika-app-1.18.jar")
 
 For get **content type**:
 
 ::
 
- tika_client.detect_content_type("your_file")
+ tika_client.detect_content_type("your_file")
 
 For detect **language**:
 
 ::
 
- tika_client.detect_language("your_file")
+ tika_client.detect_language("your_file")
 
 For detect **all metadata and content**:
 
 ::
 
- tika_client.extract_all_content("your_file")
+ tika_client.extract_all_content("your_file")
 
 For detect **only content**:
 
 ::
 
- tika_client.extract_only_content("your_file")
+ tika_client.extract_only_content("your_file")
 
-If you want to use payload in base64, you can use the same methods with
+You can analyze payload in base64 with the same methods, but passing
 ``payload`` argument:
 
 ::
 
- tika_client.detect_content_type(payload="base64_payload")
- tika_client.detect_language(payload="base64_payload")
- tika_client.extract_all_content(payload="base64_payload")
- tika_client.extract_only_content(payload="base64_payload")
+ tika_client.detect_content_type(payload="base64_payload")
+ tika_client.detect_language(payload="base64_payload")
+ tika_client.extract_all_content(payload="base64_payload")
+ tika_client.extract_only_content(payload="base64_payload")
+
+or you can analyze file object (like standard input) with the same
+methods, but passing ``objectInput`` argument:
+
+::
+
+ tika_client.detect_language(objectInput="objectInput")
+ tika_client.extract_all_content(objectInput="objectInput")
+ tika_client.extract_only_content(objectInput="objectInput")
 
 Usage from command-line
 -----------------------
@@ -107,29 +123,36 @@ These are all swithes:
 
 ::
 
- usage: tikapp [-h] (-f FILE | -p PAYLOAD) [-j JAR] [-d] [-t] [-l] [-a]
- [-v]
+ usage: tikapp [-h] (-f FILE | -p PAYLOAD | -k) [-j JAR] [-d] [-t] [-l]
+ [-a] [-v]
 
- Wrapper for Apache Tika App.
+ Wrapper for Apache Tika App.
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -f FILE, --file FILE File to submit (default: None)
+ -p PAYLOAD, --payload PAYLOAD
+ Base64 payload to submit (default: None)
+ -k, --stdin Enable parsing from stdin (default: False)
+ -j JAR, --jar JAR Apache Tika app JAR (default: None)
+ -d, --detect Detect document type (default: False)
+ -t, --text Output plain text content (default: False)
+ -l, --language Output only language (default: False)
+ -a, --all Output metadata and content from all embedded files
+ (default: False)
+ -v, --version show program's version number and exit
+
+Example from file on disk:
+
+.. code:: shell
 
- optional arguments:
- -h, --help show this help message and exit
- -f FILE, --file FILE File to submit (default: None)
- -p PAYLOAD, --payload PAYLOAD
- Base64 payload to submit (default: None)
- -j JAR, --jar JAR Apache Tika app JAR (default: None)
- -d, --detect Detect document type (default: False)
- -t, --text Output plain text content (default: False)
- -l, --language Output only language (default: False)
- -a, --all Output metadata and content from all embedded files
- (default: False)
- -v, --version show program's version number and exit
+ $ tikapp -f example_file -a
 
-Example:
+Example from standard input
 
 .. code:: shell
 
- $ tikapp -f example_file -a
+ $ tikapp -a -k < example_file
 
 Performance tests
 -----------------
@@ -140,25 +163,16 @@ folder:
 
 ::
 
- (Python 2)
- tika_content_type() 0.704840 sec
- tika_detect_language() 1.592066 sec
- magic_content_type() 0.000215 sec
- tika_extract_all_content() 0.816366 sec
- tika_extract_only_content() 0.788667 sec
-
- (Python 3)
- tika_content_type() 0.698357 sec
- tika_detect_language() 1.593452 sec
- magic_content_type() 0.000226 sec
- tika_extract_all_content() 0.785915 sec
- tika_extract_only_content() 0.766517 sec
-
-.. |PyPI version| image:: https://badge.fury.io/py/tika-app.svg
-   :target: https://badge.fury.io/py/tika-app
-.. |Build Status| image:: https://travis-ci.org/fedelemantuano/tika-app-python.svg?branch=master
-   :target: https://travis-ci.org/fedelemantuano/tika-app-python
-.. |Coverage Status| image:: https://coveralls.io/repos/github/fedelemantuano/tika-app-python/badge.svg?branch=master
-   :target: https://coveralls.io/github/fedelemantuano/tika-app-python?branch=master
-.. |BCH compliance| image:: https://bettercodehub.com/edge/badge/fedelemantuano/tika-app-python?branch=develop
-   :target: https://bettercodehub.com/
+ (Python 2)
+ tika_content_type() 0.704840 sec
+ tika_detect_language() 1.592066 sec
+ magic_content_type() 0.000215 sec
+ tika_extract_all_content() 0.816366 sec
+ tika_extract_only_content() 0.788667 sec
+
+ (Python 3)
+ tika_content_type() 0.698357 sec
+ tika_detect_language() 1.593452 sec
+ magic_content_type() 0.000226 sec
+ tika_extract_all_content() 0.785915 sec
+ tika_extract_only_content() 0.766517 sec
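
Note on the ``payload`` argument shown in the usage hunk: the README passes a placeholder string (``"base64_payload"``) and does not show how to build one. The following is a minimal sketch, assuming the argument accepts base64 text produced by the standard library. The helper names ``to_base64_payload`` and ``detect_content_type_from_bytes`` are hypothetical and not part of tika-app-python; only ``TikaApp(file_jar=...)`` and ``detect_content_type(payload=...)`` come from the README.

```python
import base64


def to_base64_payload(data):
    """Return raw bytes encoded as base64 text (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def detect_content_type_from_bytes(data, jar_path):
    """Hypothetical helper: submit in-memory bytes to Tika as a base64 payload.

    Assumes the ``payload`` argument accepts a base64 string, as the README's
    placeholder suggests; this is not confirmed by the diff itself.
    """
    # Imported lazily so the encoding helper works without tika-app installed.
    from tikapp import TikaApp

    tika_client = TikaApp(file_jar=jar_path)
    return tika_client.detect_content_type(payload=to_base64_payload(data))


if __name__ == "__main__":
    # Only the encoding step runs here; calling Tika needs Java and the JAR.
    print(to_base64_payload(b"hello"))  # prints aGVsbG8=
```

Keeping the encoding step separate from the Tika call makes it possible to check the payload without a Java runtime or a Tika JAR on the machine.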