Description
Bug Report
Describe the bug
This is not necessarily an IPyPublish bug, but a limitation in the LaTeX listings package (which provides the lstlisting environment) causes PDF conversion to fail if Unicode characters are used within an lstlisting environment. I stumbled upon this using the %timeit IPython magic in a code cell, as the output of %timeit includes Unicode characters (the plus-minus sign, Greek characters for second-prefixes, etc.).
To Reproduce
Steps to reproduce the behavior:
- Create a file called example.Rmd with the following contents:
\```{python}
%timeit a = 2 + 2
\```
- Run nbpublish -f latex_ipypublish_all.exec -pdf example.Rmd
Minimal Notebook Example
Same build instructions as above (with a different filename, of course). Note that this issue occurs downstream in the build process (at the LaTeX -> PDF step), so it does not depend on whether the input file is .Rmd, .ipynb, etc.
Expected Behaviour
Currently, the conversion fails with errors from pdflatex. The desired behavior is a successful build with unicode characters properly represented in lstlisting environments.
Runtime Information
(please complete the following information)
- IPyPublish: 0.10.10
- Python: 3.8.1
- OS: Arch Linux (5.5.2-arch1-1)
- Pandoc: 2.8
- (optional for pdf issues) texlive: 3.14159265
- (optional for pdf issues) latexmk: 4.65
Additional context
The .log file provided by pdflatex is not particularly helpful as it makes it seem as though the problem is with the utf8x or ucs packages/options. After some digging, I was able to trace the problem back to a limitation with lstlisting. A simple procedure for confirming this:
- Open the converted/timeit.tex file generated by the nbpublish process
- Navigate to the lstlisting environment around the output from the code cell
- Comment out the lstlisting environment
- Build with pdflatex: pdflatex timeit.tex
The build will complete without errors and the output from the code cell will be properly rendered, albeit in plain LaTeX.
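To find the affected environments without reading the .tex file by hand, the check can be scripted. The following is a minimal sketch, not part of IPyPublish: the function name find_non_ascii_in_listings is hypothetical, and it assumes the lstlisting options contain no "]" character and that environments are not nested.

```python
import re

# Matches each lstlisting environment and captures its body.
# Assumption: the optional [...] argument contains no "]" character.
LSTLISTING_RE = re.compile(
    r"\\begin\{lstlisting\}(?:\[[^\]]*\])?(.*?)\\end\{lstlisting\}",
    re.DOTALL,
)


def find_non_ascii_in_listings(tex_source):
    """Return (listing_index, character) pairs for every non-ASCII
    character found inside an lstlisting body (hypothetical helper)."""
    hits = []
    for index, match in enumerate(LSTLISTING_RE.finditer(tex_source)):
        for char in match.group(1):
            if ord(char) > 127:
                hits.append((index, char))
    return hits
```

Reading converted/timeit.tex with encoding="utf-8" and passing its contents to this function should list the plus-minus signs from the %timeit output, if the diagnosis above is right.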
Proposed solution
The limitations of lstlisting with respect to Unicode input are documented, and there is a proposed solution in section 2.5 of the documentation. It involves adding an escapeinside= parameter to the lstlisting environment, which hands the text between the chosen delimiters back to LaTeX for processing. For example, here is the original lstlisting in timeit.tex as generated by the build process:
\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt]
11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
\end{lstlisting}
Here is the modified version, which includes escapeinside and fixes the issue:
\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt,escapeinside={*(}{)*}]
*(11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each) )*
\end{lstlisting}
Note that the characters that define the escaped section (*( and )* in my example) are configurable and could be specified for the entire document with \lstset.
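As a starting point for discussion, here is a rough sketch of how the escaping might be applied to cell output before it is written into the lstlisting body. It is an assumption about one possible approach, not existing IPyPublish code: escape_non_ascii is a hypothetical helper, and it has not been tested against a real build. Unlike the hand-edited example above, it wraps only the runs of non-ASCII characters rather than the whole line, so that ASCII characters with special meaning in LaTeX (such as % or _) stay inside the verbatim part of the listing.

```python
import re

# One or more consecutive characters outside the 7-bit ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def escape_non_ascii(listing_body, open_delim="*(", close_delim=")*"):
    """Wrap each run of non-ASCII characters in escapeinside delimiters.

    The defaults mirror the escapeinside={*(}{)*} setting used in the
    example above; both delimiters are configurable (hypothetical helper).
    """
    # A function is used as the replacement so that the delimiters are
    # inserted literally, without backslash or group interpretation.
    return NON_ASCII_RUN.sub(
        lambda m: f"{open_delim}{m.group(0)}{close_delim}", listing_body
    )
```

The delimiters passed here would have to match whatever escapeinside value ends up in the lstlisting options or in \lstset, and they should be chosen so that they are unlikely to occur in ordinary cell output.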
If the proposed solution sounds workable to you, I'm happy to attempt to implement it. Some discussion would be required to hammer out details (e.g. appropriate escape characters). I wanted to create an issue first to see if there were any additional insights/ideas.
Logging
- nbpublish log for minimal example: timeit.nbpub.log
- Latexmk log for minimal example: timeit.log