feat(latex): add optional Tectonic TikZ rendering#3369

Merged
dolfim-ibm merged 16 commits into
docling-project:mainfrom
adityasasidhar:main
May 15, 2026

Conversation

@adityasasidhar

@adityasasidhar adityasasidhar commented Apr 27, 2026

Contributor

Resolves #3302

Description

Added an optional TikZ rendering path for the LaTeX backend using Tectonic, with configurable flags and safe fallbacks.

Here's how it works:

  1. A user passes a LaTeX source containing a tikzpicture environment.
  2. The user enables the TikZ engine with --tikz-engine. (Tectonic automatically downloads only the packages the document requires.)
  3. The LaTeX backend detects tikzpicture blocks and captures them atomically.
  4. A Tectonic render task is scheduled asynchronously while the rest of the document continues processing.
  5. The renderer builds a standalone LaTeX document from:
    • the extracted TikZ source
    • the extracted LaTeX preamble
  6. If the source document is file-backed, explicitly referenced local TikZ dependencies are staged into the temporary render directory:
    • \input
    • \include
    • \includegraphics
  7. Tectonic compiles the staged standalone document.
  8. If compilation succeeds, the generated PDF is rasterized into an image and attached to the PictureItem.
  9. If compilation fails, times out, produces no PDF, or rasterization fails, the system falls back to storing the original TikZ source as PictureMeta.code.
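
The success and fallback branches in steps 7 to 9 can be sketched as follows. This is a minimal illustration and not the PR's implementation: `TikzOutcome`, `render_or_fallback`, and the two callables are hypothetical names standing in for the Tectonic compile step and the PDF rasterizer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TikzOutcome:
    """Hypothetical result holder: exactly one of the two fields is set."""

    image: Optional[bytes] = None  # rasterized diagram on success
    code: Optional[str] = None  # original TikZ source on fallback


def render_or_fallback(
    tikz_source: str,
    compile_to_pdf: Callable[[str], Optional[bytes]],
    rasterize: Callable[[bytes], bytes],
) -> TikzOutcome:
    """Return an image if every stage succeeds, otherwise keep the raw source."""
    try:
        pdf = compile_to_pdf(tikz_source)  # may raise on error or timeout
        if not pdf:  # the compiler ran but produced no PDF
            return TikzOutcome(code=tikz_source)
        return TikzOutcome(image=rasterize(pdf))  # rasterization may raise too
    except Exception:
        # Any failure keeps the diagram's source so nothing is lost.
        return TikzOutcome(code=tikz_source)
```

The point of the single `try` block is that all four failure modes listed in step 9 collapse into the same outcome: the caller always gets either an image or the untouched source.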

Configuration added

  1. --tikz-engine / -T: Enables optional TikZ rendering with Tectonic.
  2. --no-tikz-engine-download: Disables automatic Tectonic download if no binary is present.
  3. --tikz-engine-timeout: Sets the timeout for rendering a single TikZ diagram.
  4. --tikz-shell-escape: Explicitly enables shell escape during Tectonic compilation. This is optional and remains disabled by default.

I tested this on tens of files, but running it against a larger corpus would help catch edge cases that may have slipped through.

Further, installing Tectonic through the default curl script could pose a security risk. Instead, we could pin an appropriate version and host it in the docling Hugging Face repo, so the binary is fetched from a known, trusted source.

Currently we use this:

curl --proto '=https' --tlsv1.2 -fsSL https://drop-sh.fullyjustified.net | sh

It could be a curl request to Hugging Face instead. I have not yet added installation support for Windows; that should be addressed in a follow-up commit.

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

This commit introduces a high-performance, asynchronous pipeline for rendering
TikZ diagrams into images during LaTeX document conversion.

Key Changes:
- Tectonic Integration (`TectonicEngine`): Compiles `tikzpicture` environments
  into PDFs using Tectonic, auto-downloading the binary if missing. Rasterizes
  the PDF to 300 DPI images.
- Asynchronous Processing: Utilizes a dynamic `ThreadPoolExecutor` (scaled to
  `os.cpu_count() - 1`) to render multiple diagrams concurrently without
  blocking the main document conversion pipeline.
- Preamble Extraction: Dynamically parses the main document's preamble and
  injects it into standalone diagrams to ensure compatibility with complex
  libraries (e.g., `pgfgantt`, `tikz-cd`, `tkz-euclide`).
- Graceful Fallbacks: If Tectonic compilation fails due to LaTeX syntax errors
  or incompatible packages, the engine gracefully falls back to preserving the
  raw TikZ source code as a `CodeMetaField` to prevent data loss.
- CLI Support: Added `--tikz-engine tectonic` option to enable the backend
  configuration.
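
The asynchronous processing bullet above can be illustrated with a small sketch. It assumes only what the commit message states (a `ThreadPoolExecutor` sized to `os.cpu_count() - 1`); the helper names `make_render_pool`, `schedule_renders`, and `collect` are hypothetical, and the wait timeout in `collect` is a simplification rather than the PR's per-diagram compile timeout.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


def make_render_pool() -> ThreadPoolExecutor:
    """Leave one core for the main conversion thread, never fewer than 1 worker."""
    workers = max(1, (os.cpu_count() or 1) - 1)
    return ThreadPoolExecutor(max_workers=workers)


def schedule_renders(
    pool: ThreadPoolExecutor,
    diagrams: Dict[str, str],
    render: Callable[[str], bytes],
) -> Dict[str, Future]:
    """Submit every diagram and return at once with the pending futures."""
    return {name: pool.submit(render, source) for name, source in diagrams.items()}


def collect(futures: Dict[str, Future], timeout: float) -> Dict[str, Optional[bytes]]:
    """Gather results; a failed or overdue render is reported as None."""
    results: Dict[str, Optional[bytes]] = {}
    for name, future in futures.items():
        try:
            results[name] = future.result(timeout=timeout)
        except Exception:
            results[name] = None  # caller falls back to the raw TikZ source
    return results
```

The `max(1, ...)` guard matters on single-core machines, where `os.cpu_count() - 1` would otherwise be zero and `ThreadPoolExecutor` would reject it.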

Resolves pre-commit hooks (MyPy, Ruff linter/formatter).

Signed-off-by: Aditya Sasidhar <arctic@arctic>
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
…ndency staging

  Add opt-in TikZ image rendering for the LaTeX backend using Tectonic,
  while preserving stable fallback behavior when rendering fails.

  What this changes:
  - add optional `tikz_engine="tectonic"` backend support for TikZ diagrams
  - render `tikzpicture` environments asynchronously during LaTeX parsing
  - preserve raw TikZ code as `PictureMeta.code` whenever rendering fails,
    times out, or rasterization cannot complete
  - add Tectonic engine options for:
    - automatic binary download
    - per-diagram timeout
    - shell escape control
  - make shell escape explicit opt-in via CLI/backend config
  - sanitize known pdfTeX-only assignment lines in preambles for better
    Tectonic/XeTeX compatibility
  - restore file-backed relative TikZ compatibility by staging only explicit
    local dependencies (`\input`, `\include`, `\includegraphics`) into the
    temp render directory
  - block dependency path traversal and avoid ambient source-directory search
  - rasterize generated PDFs with locking and crop whitespace from output
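
  As a rough sketch of the preamble handling described above: the code below extracts a preamble, drops pdfTeX-only assignment lines, and wraps a TikZ block in a standalone document. It is an assumption that "pdfTeX-only assignment lines" means lines such as `\pdfoutput=1`; the PR's actual list and its function names may differ, and `extract_preamble`, `sanitize_preamble`, and `build_standalone` are hypothetical.

```python
import re

# Assumption: lines like \pdfoutput=1 or \pdfminorversion=5, which the
# XeTeX-based Tectonic engine does not define.
_PDFTEX_ASSIGNMENT = re.compile(r"^\s*\\pdf[A-Za-z]+\s*=\s*-?\d+\s*(%.*)?$")
_DOCCLASS = re.compile(r"\\documentclass(\[[^\]]*\])?\{[^}]*\}")
_BEGIN_DOCUMENT = r"\begin{document}"


def extract_preamble(latex: str) -> str:
    # Everything between the documentclass declaration and the document body.
    start = _DOCCLASS.search(latex)
    end = latex.find(_BEGIN_DOCUMENT)
    if start is None or end == -1 or end < start.end():
        return ""
    return latex[start.end():end].strip()


def sanitize_preamble(preamble: str) -> str:
    # Drop only whole lines that are pdfTeX-style integer assignments.
    kept = [ln for ln in preamble.splitlines() if not _PDFTEX_ASSIGNMENT.match(ln)]
    return "\n".join(kept)


def build_standalone(preamble: str, tikz_block: str) -> str:
    # The standalone class with the tikz option crops the page to the picture.
    return "\n".join(
        [
            r"\documentclass[tikz,border=2pt]{standalone}",
            sanitize_preamble(preamble),
            _BEGIN_DOCUMENT,
            tikz_block,
            r"\end{document}",
        ]
    )
```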

  CLI / config updates:
  - add `--tikz-engine` / `-T`
  - add `--no-tikz-engine-download`
  - add `--tikz-engine-timeout`
  - add `--tikz-shell-escape`

  Tests:
  - add focused Tectonic engine tests for download behavior, timeout,
    preamble sanitization, shell escape toggling, dependency staging,
    and path traversal blocking
  - add backend tests for TikZ fallback behavior and file-backed source-root
    handling
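
  The dependency staging and path traversal blocking covered by these tests could look roughly like the sketch below. It is illustrative only: `find_dependencies` and `stage_dependencies` are hypothetical names, and the sketch copies exact file names without resolving the implicit `.tex` extension that `\input` allows.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Matches \input{...}, \include{...} and \includegraphics[...]{...}.
_DEPENDENCY = re.compile(
    r"\\(?:input|include|includegraphics)(?:\[[^\]]*\])?\{([^}]+)\}"
)


def find_dependencies(tikz_source: str) -> List[str]:
    """Return the explicitly referenced paths, in order of appearance."""
    return [ref.strip() for ref in _DEPENDENCY.findall(tikz_source)]


def stage_dependencies(
    tikz_source: str, source_root: Path, render_dir: Path
) -> List[Path]:
    """Copy referenced local files into render_dir, skipping unsafe paths."""
    root = source_root.resolve()
    staged: List[Path] = []
    for ref in find_dependencies(tikz_source):
        candidate = (root / ref).resolve()
        # Block traversal: the resolved file must live under the source root.
        if root not in candidate.parents:
            continue
        # Only explicit, existing files are staged; nothing is searched for.
        if not candidate.is_file():
            continue
        target = render_dir / candidate.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(candidate, target)
        staged.append(target)
    return staged
```

  Resolving the candidate before the containment check is what defeats `../` sequences and absolute paths: both end up outside `root` and are skipped.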

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
@github-actions

github-actions Bot commented Apr 27, 2026

Contributor

DCO Check Passed

Thanks @adityasasidhar, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Apr 27, 2026

Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@adityasasidhar adityasasidhar changed the title feat(latex): add optional Tectonic TikZ rendering feat(latex): add optional Tectonic TikZ rendering Apr 27, 2026
@dosubot

dosubot Bot commented Apr 27, 2026


Documentation Updates

1 document(s) were updated by changes in this PR:

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
View Changes
@@ -247,11 +247,15 @@
 - **Pipeline/Backend**: `SimplePipeline` + `LatexDocumentBackend`
 - **Key Options** (`LatexBackendOptions`):
     - `parse_timeout` (default: 30.0 seconds): Maximum time allowed for parsing a LaTeX document. Set to `None` to disable the timeout. This prevents `pylatexenc` from spinning indefinitely when parsing legacy arXiv documents with complex or malformed macroscopic environments. If parsing exceeds this timeout, the conversion will fall back to raw text extraction rather than structured parsing. A warning will be logged when a timeout occurs.
+    - `tikz_engine` (Optional[Literal["tectonic"]]): The engine to use for rendering TikZ diagrams into images. Set to `'tectonic'` to enable asynchronous image generation. Defaults to `None`.
+    - `tikz_engine_timeout` (float, default: 60.0): The timeout in seconds for rendering a single TikZ diagram.
+    - `tikz_engine_allow_shell_escape` (bool, default: False): Allow Tectonic TikZ rendering to enable shell escape during compilation. Disabled by default for safer rendering of untrusted LaTeX.
 - **Processing**:
     - Parses LaTeX source using `pylatexenc` to extract structured content (sections, equations, tables, etc.)
     - Pre-processes custom macros (e.g., `\be`/`\ee` shortcuts for equations)
     - Timeout enforcement runs parsing in a daemon thread to allow graceful fallback on timeout
-- **Notes**: The `parse_timeout` option is particularly useful for processing legacy arXiv documents that may contain complex or malformed macro environments. To configure the timeout:
+    - **TikZ Rendering**: When `tikz_engine` is set to `'tectonic'`, the backend detects `tikzpicture` environments and renders them asynchronously into images. When Tectonic compilation succeeds, the TikZ diagram is rasterized and stored as an image. When compilation fails, times out, produces no PDF, or rasterization fails, Docling preserves the original TikZ source as fallback code metadata.
+- **Notes**: The `parse_timeout` option is particularly useful for processing legacy arXiv documents that may contain complex or malformed macro environments. CLI flags are available for TikZ rendering: `--tikz-engine` / `-T`, `--tikz-engine-timeout`, and `--tikz-shell-escape`. To configure the timeout:
 
 ```python
 from docling.datamodel.backend_options import LatexBackendOptions


Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
@codecov

codecov Bot commented May 4, 2026


Codecov Report

❌ Patch coverage is 76.17021% with 56 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling/backend/latex/engines/tectonic.py 70.94% 52 Missing ⚠️
docling/backend/latex/handlers/environments.py 87.09% 4 Missing ⚠️


Comment thread docling/backend/latex/handlers/environments.py Outdated
Comment thread docling/backend/latex/engines/tectonic.py Outdated
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
@adityasasidhar adityasasidhar requested a review from dolfim-ibm May 7, 2026 14:49
PeterStaar-IBM
PeterStaar-IBM previously approved these changes May 12, 2026

@PeterStaar-IBM PeterStaar-IBM left a comment

Member

lgtm!

@adityasasidhar

Contributor Author

@PeterStaar-IBM

I merged the main branch; please re-run the CI tests.

@PeterStaar-IBM

Member

@adityasasidhar It seems to fail here:

  Files exceeding 1000 line limit:
    FAIL: docling/cli/main.py has 1030 lines

I think adding these very specific TikZ commands to the main docling CLI might be overkill. I wonder whether it would be better to have a dedicated docling-latex CLI, where we can add more specific options. I know it is nice to have, but the current CLI is already very heavy.

@cau-git @dolfim-ibm feel free to pitch in

@dolfim-ibm

Member

If I get it right, we are talking about 3 arguments. We indeed don't need every option in the CLI; we could provide clean examples for the LaTeX options instead.
On the other hand, if we agree on exposing them in the CLI, I would prefix them with latex_ or tex_. I'm pretty sure that 90% of the Docling users don't know what TikZ is 😉

@adityasasidhar

Contributor Author

hey @PeterStaar-IBM @dolfim-ibm

I agree with you; exceeding the 1000-line limit on cli/main.py certainly adds overhead.

It is also true that these are very specific LaTeX commands; most people using docling won't know what TikZ is.

Specifically, those are:

  1. --tikz-engine or -T:  Enables TikZ rendering with Tectonic
  2. --tikz-shell-escape: Enables shell escape during Tectonic rendering
  3. --tikz-engine-timeout: Sets the TikZ render timeout for each diagram

I think skipping them in the CLI would probably be the better choice:

  1. Most people are not looking for such niche features.
  2. Those who are can probably just use the Python API.
  3. People who want these features in the CLI could benefit from a dedicated docling-latex, which could be introduced if there is demand for it (it shouldn't be a lot of work, an easy couple of commits).

However, I'm happy to go in either direction.

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
@adityasasidhar

Contributor Author

@PeterStaar-IBM @dolfim-ibm

Just pushed the latest changes

Apologies for the delay.

The latest changes include:

  1. remove the LaTeX-specific flags, keeping cli/main.py under the 1000-line limit
  2. expose the options only through the Python API
  3. pass all the linters and checkers
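
For readers who want to see the shape of the Python-only options: the sketch below mirrors the three TikZ fields named in the docs diff earlier in this thread (`tikz_engine`, `tikz_engine_timeout`, `tikz_engine_allow_shell_escape`). `LatexBackendOptionsSketch` is a stand-in dataclass so the snippet runs without docling installed; it is not the real `LatexBackendOptions`, and the defaults are copied from that diff.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class LatexBackendOptionsSketch:
    """Stand-in for illustration only; not the real docling class."""

    # None keeps TikZ rendering off; "tectonic" turns it on.
    tikz_engine: Optional[Literal["tectonic"]] = None
    # Seconds allowed for rendering a single TikZ diagram.
    tikz_engine_timeout: float = 60.0
    # Shell escape stays off unless explicitly requested.
    tikz_engine_allow_shell_escape: bool = False


# Default behaviour: no rendering, TikZ source is kept as code.
default = LatexBackendOptionsSketch()

# Opt in to Tectonic with a tighter per-diagram timeout.
enabled = LatexBackendOptionsSketch(tikz_engine="tectonic", tikz_engine_timeout=30.0)
```

With the real class, the same keyword arguments would presumably be passed to `LatexBackendOptions(...)` from `docling.datamodel.backend_options`; how those options are wired into a converter is not shown in this thread.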

@PeterStaar-IBM

Member

all good, let's let the CI do its thing now!

@adityasasidhar

Contributor Author

yessss lets go

@PeterStaar-IBM PeterStaar-IBM left a comment

Member

lgtm!

@PeterStaar-IBM

Member

@dolfim-ibm I think this looks good to merge now

@dolfim-ibm dolfim-ibm merged commit eceedc2 into docling-project:main May 15, 2026
26 checks passed

Development

Successfully merging this pull request may close these issues.

Add an optional latex compiler engine for tikz block rendering

3 participants