docs/source/bricks.rst (+3 -2)
@@ -30,6 +30,7 @@ In cases where ``libmagic`` is not available, filetype detection will fall back
As shown in the examples below, the ``partition`` function accepts both filenames and file-like objects as input.
``partition`` also has some optional kwargs.
For example, if you set ``include_page_breaks=True``, the output will include ``PageBreak`` elements if the filetype supports it.
+Additionally, you can bypass the filetype detection logic with the optional ``content_type`` argument, which may be specified together with either ``filename`` or the file-like object ``file``.
You can find a full listing of optional kwargs in the documentation below.
.. code:: python
@@ -38,7 +39,7 @@ You can find a full listing of optional kwargs in the documentation below.
elements = partition(filename=filename, content_type="application/pdf")
print("\n\n".join([str(el) for el in elements][:10]))
@@ -57,7 +58,7 @@ The ``unstructured`` library also includes partitioning bricks targeted at speci
The ``partition`` brick uses these document-specific partitioning bricks under the hood.
There are a few reasons you may want to use a document-specific partitioning brick instead of ``partition``:
-* If you already know the document type, filetype detection is unnecessary. Using the document-specific brick directly will make your program run faster.
+* If you already know the document type, filetype detection is unnecessary. Using the document-specific brick directly, or passing in the ``content_type``, will make your program run faster.
* Fewer dependencies. You don't need to install ``libmagic`` for filetype detection if you're only using document-specific bricks.
* Additional features. The API for ``partition`` is the least common denominator for all document types. Certain document-specific bricks include extra features that you may want to take advantage of. For example, ``partition_html`` allows you to pass in a URL so you don't have to store the ``.html`` file locally. See the documentation below to learn about the options available in each partitioning brick.
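
To illustrate why passing ``content_type`` skips work, here is a minimal, self-contained sketch of the dispatch idea described above. It is not the ``unstructured`` implementation: ``sketch_partition``, ``detect_content_type``, ``BRICKS``, and the ``fake_partition_*`` functions are hypothetical names, and the standard-library ``mimetypes`` module stands in for ``libmagic``-based detection.

```python
import mimetypes
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for document-specific bricks. The real bricks
# return element objects; these return strings to keep the sketch runnable.
def fake_partition_pdf(filename: str) -> List[str]:
    return [f"pdf elements from {filename}"]


def fake_partition_html(filename: str) -> List[str]:
    return [f"html elements from {filename}"]


# Assumed mapping from MIME type to the brick that handles it.
BRICKS: Dict[str, Callable[[str], List[str]]] = {
    "application/pdf": fake_partition_pdf,
    "text/html": fake_partition_html,
}


def detect_content_type(filename: str) -> Optional[str]:
    # Stand-in for real filetype detection: guess from the file extension.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def sketch_partition(filename: str, content_type: Optional[str] = None) -> List[str]:
    # Detection only runs when the caller did not supply a content type.
    if content_type is None:
        content_type = detect_content_type(filename)
    brick = BRICKS.get(content_type) if content_type else None
    if brick is None:
        raise ValueError(f"No brick available for content type {content_type!r}")
    return brick(filename)
```

In this sketch, ``sketch_partition("download", content_type="application/pdf")`` succeeds even though the name has no extension to detect from, while ``sketch_partition("download")`` raises ``ValueError`` because detection finds nothing.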