Commit 4f13470

markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter. There are several PRs for this feature already. However, I think simply adding `--citeproc` is the cleanest way to enable this feature, with the option to flesh it out later, e.g., in gohugoio#7529.

Some PRs and issues attempt adding more config options to Hugo which indirectly configure pandoc, but I think simply configuring Pandoc via Pandoc itself is simpler, as it is already possible with two YAML blocks -- one for Hugo, and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., gohugoio#4800 attempts to use `nocite`, which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:
    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which set the path to a custom CSL file and create HTML links between the references and the bibliography.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension gohugoio#6101
  Bundles multiple requests, this PR tackles citation parsing.
- WIP: Bibliography with Pandoc gohugoio#4800
  Passes the frontmatter to Pandoc and still uses `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc gohugoio#7529
  That PR is much more extensive and might eventually supersede this PR, but I think --bibliography and --citeproc should be independent options (--bibliography should be optional and citeproc can always be specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. gohugoio#8610
  Similar to gohugoio#7529, gohugoio#8610 adds a new config option to Hugo. I think passing --citeproc and letting the users decide on the metadata they want to pass to pandoc is better, albeit uglier.
1 parent 15463f8 commit 4f13470

4 files changed: +227 -11 lines changed

Diff for: docs/content/en/content-management/formats.md

+57-5
@@ -47,21 +47,21 @@ Hugo passes reasonable default arguments to these external helpers by default:
 
 - `asciidoctor`: `--no-header-footer -`
 - `rst2html`: `--leave-comments --initial-header-level=2`
-- `pandoc`: `--mathjax`
+- `pandoc`: `--mathjax` and, for pandoc >= 2.11, `--citeproc`
 
 {{% warning "Performance of External Helpers" %}}
 Because additional formats are external commands, generation performance will rely heavily on the performance of the external tool you are using. As this feature is still in its infancy, feedback is welcome.
 {{% /warning %}}
 
 ### External Helper AsciiDoc
 
-[AsciiDoc](https://github.com/asciidoc/asciidoc) implementation EOLs in Jan 2020 and is no longer supported.
-AsciiDoc development is being continued under [Asciidoctor](https://github.com/asciidoctor). The format AsciiDoc
+[AsciiDoc](https://github.com/asciidoc/asciidoc) implementation EOLs in Jan 2020 and is no longer supported.
+AsciiDoc development is being continued under [Asciidoctor](https://github.com/asciidoctor). The format AsciiDoc
 remains of course. Please continue with the implementation Asciidoctor.
 
 ### External Helper Asciidoctor
 
-The Asciidoctor community offers a wide set of tools for the AsciiDoc format that can be installed additionally to Hugo.
+The Asciidoctor community offers a wide set of tools for the AsciiDoc format that can be installed additionally to Hugo.
 [See the Asciidoctor docs for installation instructions](https://asciidoctor.org/docs/install-toolchain/). Make sure that also all
 optional extensions like `asciidoctor-diagram` or `asciidoctor-html5s` are installed if required.
 
@@ -109,13 +109,65 @@ Example of how to set extensions and attributes:
 my-attribute-name = "my value"
 ```
 
-In a complex Asciidoctor environment it is sometimes helpful to debug the exact call to your external helper with all
+In a complex Asciidoctor environment it is sometimes helpful to debug the exact call to your external helper with all
 parameters. Run Hugo with `-v`. You will get an output like
 
 ```
 INFO 2019/12/22 09:08:48 Rendering book-as-pdf.adoc with C:\Ruby26-x64\bin\asciidoctor.bat using asciidoc args [--no-header-footer -r asciidoctor-html5s -b html5s -r asciidoctor-diagram --base-dir D:\prototypes\hugo_asciidoc_ddd\docs -a outdir=D:\prototypes\hugo_asciidoc_ddd\build -] ...
 ```
 
+### External Helper Pandoc
+
+[Pandoc](https://pandoc.org) is a universal document converter and can be used to convert markdown files.
+In Hugo, Pandoc can be used for LaTeX-style math (the `--mathjax` command line option is provided):
+
+```
+---
+title: Math document
+---
+
+Some inline math: $a^2 + b^2 = c^2$.
+```
+
+This will render in your HTML as:
+
+```
+<p>Some inline math: <span class="math inline">\(a^2 + b^2 = c^2\)</span></p>
+```
+You will have to [add MathJax](https://www.mathjax.org/#gettingstarted) to your template to properly render the math.
+
+For **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
+One way is to employ [BibTeX files](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX) to cite:
+
+```
+---
+title: Citation document
+---
+---
+bibliography: assets/bibliography.bib
+...
+This is a citation: @Doe2022
+```
+
+Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** meta data block, denoted with `---` and `...` to Pandoc.
+Thus, all Pandoc settings should go there.
+
+You can also add all elements from a bibliography file (without citing them explicitly) using:
+
+```
+---
+title: My Publications
+---
+---
+bibliography: assets/bibliography.bib
+nocite: |
+  @*
+...
+```
+
+It is also possible to provide a custom [CSL style](https://citationstyles.org/authors/) by passing `csl: path-to-style.csl` as a Pandoc option.
+
+
 ## Learn Markdown
 
 Markdown syntax is simple enough to learn in a single sitting. The following are excellent resources to get you up and running:
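
The documentation added in this diff says that Hugo keeps its own YAML block and that only the second block, delimited by `---` and `...`, reaches Pandoc. The following is a minimal sketch of that split, not Hugo's actual front matter parser. `splitFirstYAMLBlock` is a hypothetical helper introduced only for illustration, and it assumes YAML front matter delimited by `---` lines.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFirstYAMLBlock is a hypothetical helper (not part of Hugo).
// It returns the body of the first "---"-delimited block and everything
// after it. ok is false when src does not start with such a block.
func splitFirstYAMLBlock(src string) (front, rest string, ok bool) {
	lines := strings.Split(src, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", src, false
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			front = strings.Join(lines[1:i], "\n")
			rest = strings.Join(lines[i+1:], "\n")
			return front, rest, true
		}
	}
	return "", src, false
}

func main() {
	doc := "---\n" +
		"title: Citation document\n" +
		"---\n" +
		"---\n" +
		"bibliography: assets/bibliography.bib\n" +
		"...\n" +
		"This is a citation: @Doe2022\n"

	front, rest, _ := splitFirstYAMLBlock(doc)
	fmt.Printf("Hugo front matter:\n%s\n\nPassed on to Pandoc:\n%s", front, rest)
}
```

In this sketch the remainder still begins with `---`, so Pandoc would read `bibliography` from what is, for Pandoc, the first metadata block of its input.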

Diff for: markup/pandoc/convert.go

+54-2
@@ -15,12 +15,16 @@
 package pandoc
 
 import (
+	"bytes"
+	"fmt"
+	"strings"
+	"sync"
+
 	"github.com/gohugoio/hugo/common/hexec"
 	"github.com/gohugoio/hugo/htesting"
 	"github.com/gohugoio/hugo/identity"
-	"github.com/gohugoio/hugo/markup/internal"
-
 	"github.com/gohugoio/hugo/markup/converter"
+	"github.com/gohugoio/hugo/markup/internal"
 )
 
 // Provider is the package entry point.
@@ -65,6 +69,9 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
 		return src, nil
 	}
 	args := []string{"--mathjax"}
+	if supportsCitations(c.cfg) {
+		args = append(args[:], "--citeproc")
+	}
 	return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
 }
 
@@ -77,6 +84,51 @@ func getPandocBinaryName() string {
 	return ""
 }
 
+var versionOnce sync.Once
+var pandocVersion string
+
+// getPandocVersion parses the pandoc version output
+func getPandocVersion(cfg converter.ProviderConfig) (string, error) {
+	var err error
+
+	versionOnce.Do(func() {
+		argsv := []any{"--version"}
+
+		var out bytes.Buffer
+		argsv = append(argsv, hexec.WithStdout(&out))
+
+		cmd, err := cfg.Exec.New(pandocBinary, argsv...)
+		if err != nil {
+			pandocVersion = ""
+			return
+		}
+
+		err = cmd.Run()
+		if err != nil {
+			cfg.Logger.Errorf("%s --version: %v", pandocBinary, err)
+		}
+
+		outbytes := bytes.Replace(out.Bytes(), []byte("\r"), []byte(""), -1)
+		output := strings.Split(string(outbytes), "\n")[0]
+		pandocVersion = strings.Split(output, " ")[1]
+	})
+
+	return pandocVersion, err
+}
+
+// SupportsCitations returns true for pandoc versions >= 2.11, which include citeproc
+func supportsCitations(cfg converter.ProviderConfig) bool {
+	pandocVersion, err := getPandocVersion(cfg)
+	supportsCitations := pandocVersion >= "2.11" && err == nil
+	if htesting.SupportsAll() {
+		if !supportsCitations {
+			panic(fmt.Sprintf("pandoc %s does not support citations", pandocVersion))
+		}
+		return true
+	}
+	return supportsCitations
+}
+
 // Supports returns whether Pandoc is installed on this computer.
 func Supports() bool {
 	hasBin := getPandocBinaryName() != ""
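
Two details of the version check in this hunk are worth noting. `pandocVersion >= "2.11"` compares strings lexicographically, so `"2.9"` passes the check and `"10.0"` fails it. Separately, `strings.Split(output, " ")[1]` panics if the first output line contains no space, for example when the command produced no output. The following is a minimal sketch of a numeric comparison, assuming the version is made of dot-separated integers and the first line looks like `pandoc 2.11.4`. `parseVersionLine`, `splitVersion` and `versionAtLeast` are hypothetical names and are not part of this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersionLine extracts the second whitespace-separated token from the
// first line of the given output (assumed shape: "pandoc 2.11.4").
// It returns an error instead of panicking on unexpected output.
func parseVersionLine(output string) (string, error) {
	output = strings.ReplaceAll(output, "\r", "")
	firstLine := strings.SplitN(output, "\n", 2)[0]
	fields := strings.Fields(firstLine)
	if len(fields) < 2 {
		return "", fmt.Errorf("unexpected version output: %q", firstLine)
	}
	return fields[1], nil
}

// splitVersion converts "2.11.4" into []int{2, 11, 4}.
func splitVersion(v string) ([]int, error) {
	parts := strings.Split(v, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid component %q in version %q", p, v)
		}
		nums[i] = n
	}
	return nums, nil
}

// versionAtLeast reports whether version >= minimum, comparing each
// component numerically and treating missing components as 0.
func versionAtLeast(version, minimum string) (bool, error) {
	v, err := splitVersion(version)
	if err != nil {
		return false, err
	}
	m, err := splitVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(v) || i < len(m); i++ {
		var a, b int
		if i < len(v) {
			a = v[i]
		}
		if i < len(m) {
			b = m[i]
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"2.9", "2.11", "2.19.2", "10.0"} {
		numeric, err := versionAtLeast(v, "2.11")
		fmt.Printf("%-7s numeric=%v err=%v lexicographic=%v\n", v, numeric, err, v >= "2.11")
	}
}
```

With the numeric comparison, `2.9` is rejected and `10.0` is accepted, which is the opposite of what the string comparison yields for those two inputs.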

Diff for: markup/pandoc/convert_test.go

+110-4
@@ -25,18 +25,124 @@ import (
 	qt "github.com/frankban/quicktest"
 )
 
-func TestConvert(t *testing.T) {
+func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
 	if !Supports() {
 		t.Skip("pandoc not installed")
 	}
 	c := qt.New(t)
 	sc := security.DefaultConfig
 	sc.Exec.Allow = security.NewWhitelist("pandoc")
-	p, err := Provider.New(converter.ProviderConfig{Exec: hexec.New(sc), Logger: loggers.NewErrorLogger()})
+	cfg := converter.ProviderConfig{Exec: hexec.New(sc), Logger: loggers.NewErrorLogger()}
+	p, err := Provider.New(cfg)
 	c.Assert(err, qt.IsNil)
 	conv, err := p.New(converter.DocumentContext{})
 	c.Assert(err, qt.IsNil)
-	b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	return c, conv, cfg
+}
+
+func TestConvert(t *testing.T) {
+	c, conv, _ := setupTestConverter(t)
+	output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
 	c.Assert(err, qt.IsNil)
-	c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
+	c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
+}
+
+func runCiteprocTest(t *testing.T, content string, expected string) {
+	c, conv, cfg := setupTestConverter(t)
+	if !supportsCitations(cfg) {
+		t.Skip("pandoc does not support citations")
+	}
+	output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
+	c.Assert(err, qt.IsNil)
+	c.Assert(string(output.Bytes()), qt.Equals, expected)
+}
+
+func TestGetPandocVersionCallTwice(t *testing.T) {
+	c, _, cfg := setupTestConverter(t)
+
+	version1, err1 := getPandocVersion(cfg)
+	version2, err2 := getPandocVersion(cfg)
+	c.Assert(version1, qt.Equals, version2)
+	c.Assert(err1, qt.IsNil)
+	c.Assert(err2, qt.IsNil)
+}
+
+func TestCiteprocWithHugoMeta(t *testing.T) {
+	content := `
+---
+title: Test
+published: 2022-05-30
+---
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithPandocMeta(t *testing.T) {
+	content := `
+---
+---
+---
+...
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithBibliography(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithExplicitCitation(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+@Doe2022
+`
+	expected := `<p><span class="citation" data-cites="Doe2022">Doe and Mustermann
+(2022)</span></p>
+<div id="refs" class="references csl-bib-body hanging-indent"
+role="doc-bibliography">
+<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
+Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
+Tests.”</span> <em>Hugo Websites</em>.
+</div>
+</div>
+`
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithNocite(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+nocite: |
+  @*
+...
+`
+	expected := `<div id="refs" class="references csl-bib-body hanging-indent"
+role="doc-bibliography">
+<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
+Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
+Tests.”</span> <em>Hugo Websites</em>.
+</div>
+</div>
+`
+	runCiteprocTest(t, content, expected)
 }

Diff for: markup/pandoc/testdata/bibliography.bib

+6
@@ -0,0 +1,6 @@
+@article{Doe2022,
+author = "Jane Doe and Max Mustermann",
+title = "A Treatise on Hugo Tests",
+journal = "Hugo Websites",
+year = "2022",
+}
