Commit d7aaa2a

markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter. There are several PRs for this feature already. However, I think simply adding `--citeproc` is the cleanest way to enable this feature, with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt adding more config options to Hugo which indirectly configure pandoc, but I think simply configuring Pandoc via Pandoc itself is simpler, as it is already possible with two YAML blocks -- one for Hugo, and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., #4800 attempts to use `nocite`, which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:
    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which set the path to a custom CSL file and create HTML links between the references and the bibliography.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension #6101
  Bundles multiple requests, this PR tackles citation parsing.
- WIP: Bibliography with Pandoc #4800
  Passes the frontmatter to Pandoc and still uses `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc #7529
  That PR is much more extensive and might eventually supersede this PR, but I think --bibliography and --citeproc should be independent options (--bibliography should be optional and citeproc can always be specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. #8610
  Similar to #7529, #8610 adds a new config option to Hugo. I think passing --citeproc and letting the users decide on the metadata they want to pass to pandoc is better, albeit uglier.
1 parent 9c2f8ec commit d7aaa2a

5 files changed

+269
-4
lines changed

Diff for: docs/content/en/content-management/bibliography.md

+50
@@ -0,0 +1,50 @@
+---
+title: Bibliographies in Markdown
+linkTitle: Bibliography
+description: Include citations and a bibliography in Markdown using LaTeX markup.
+categories: [content management]
+keywords: [latex,pandoc,citation,reference,bibliography]
+menu:
+  docs:
+    parent: content-management
+    weight: 320
+weight: 320
+toc: true
+---
+
+{{< new-in 0.144.0 />}}
+
+## Citations and Bibliographies
+
+[Pandoc](https://pandoc.org) is a universal document converter and can be used to convert markdown files.
+
+With **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
+One way is to employ [BibTeX files](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX) to cite:
+
+```
+---
+title: Citation document
+---
+---
+bibliography: assets/bibliography.bib
+...
+This is a citation: @Doe2022
+```
+
+Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** meta data block, denoted with `---` and `...` to Pandoc.
+Thus, all Pandoc-specific settings should go there.
+
+You can also add all elements from a bibliography file (without citing them explicitly) using:
+
+```
+---
+title: My Publications
+---
+---
+bibliography: assets/bibliography.bib
+nocite: |
+  @*
+...
+```
+
+It is also possible to provide a custom [CSL style](https://citationstyles.org/authors/) by passing `csl: path-to-style.csl` as a Pandoc option.

Diff for: docs/content/en/content-management/formats.md

+6
@@ -111,6 +111,12 @@ Hugo passes these CLI flags when calling the Pandoc executable:
 --mathjax
 ```
 
+If your Pandoc has version 2.11 or later, it also passes this CLI flag:
+
+```text
+--citeproc
+```
+
 [Pandoc]: https://pandoc.org/
 
 ### reStructuredText

Diff for: markup/pandoc/convert.go

+71-1
@@ -15,10 +15,14 @@
 package pandoc
 
 import (
+	"bytes"
+	"strconv"
+	"strings"
+	"sync"
+
 	"github.com/gohugoio/hugo/common/hexec"
 	"github.com/gohugoio/hugo/htesting"
 	"github.com/gohugoio/hugo/identity"
-
 	"github.com/gohugoio/hugo/markup/converter"
 	"github.com/gohugoio/hugo/markup/internal"
 )
@@ -64,6 +68,9 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
 		return src, nil
 	}
 	args := []string{"--mathjax"}
+	if supportsCitations(c.cfg) {
+		args = append(args[:], "--citeproc")
+	}
 	return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
 }
 
@@ -76,6 +83,69 @@ func getPandocBinaryName() string {
 	return ""
 }
 
+type pandocVersion struct {
+	major, minor int64
+}
+
+func (left pandocVersion) greaterThanOrEqual(right pandocVersion) bool {
+	return left.major > right.major || (left.major == right.major && left.minor >= right.minor)
+}
+
+var versionOnce sync.Once
+var foundPandocVersion pandocVersion
+
+// getPandocVersion parses the pandoc version output
+func getPandocVersion(cfg converter.ProviderConfig) (pandocVersion, error) {
+	var err error
+
+	versionOnce.Do(func() {
+		argsv := []any{"--version"}
+
+		var out bytes.Buffer
+		argsv = append(argsv, hexec.WithStdout(&out))
+
+		cmd, err := cfg.Exec.New(pandocBinary, argsv...)
+		if err != nil {
+			cfg.Logger.Errorf("Could not call pandoc: %v", err)
+			foundPandocVersion = pandocVersion{0, 0}
+			return
+		}
+
+		err = cmd.Run()
+		if err != nil {
+			cfg.Logger.Errorf("%s --version: %v", pandocBinary, err)
+			foundPandocVersion = pandocVersion{0, 0}
+			return
+		}
+
+		outbytes := bytes.Replace(out.Bytes(), []byte("\r"), []byte(""), -1)
+		output := strings.Split(string(outbytes), "\n")[0]
+		// Split, e.g., "pandoc 2.5" into 2 and 5 and convert them to integers
+		versionStrings := strings.Split(strings.Split(output, " ")[1], ".")
+		majorVersion, err := strconv.ParseInt(versionStrings[0], 10, 64)
+		if err != nil {
+			println(err)
+		}
+		minorVersion, err := strconv.ParseInt(versionStrings[1], 10, 64)
+		if err != nil {
+			println(err)
+		}
+		foundPandocVersion = pandocVersion{majorVersion, minorVersion}
+	})
+
+	return foundPandocVersion, err
+}
+
+// SupportsCitations returns true for pandoc versions >= 2.11, which include citeproc
+func supportsCitations(cfg converter.ProviderConfig) bool {
+	if Supports() {
+		foundPandocVersion, err := getPandocVersion(cfg)
+		supportsCitations := foundPandocVersion.greaterThanOrEqual(pandocVersion{2, 11}) && err == nil
+		return supportsCitations
+	}
+	return false
+}
+
 // Supports returns whether Pandoc is installed on this computer.
 func Supports() bool {
 	hasBin := getPandocBinaryName() != ""

Diff for: markup/pandoc/convert_test.go

+136-3
@@ -25,7 +25,7 @@ import (
 	qt "github.com/frankban/quicktest"
 )
 
-func TestConvert(t *testing.T) {
+func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
 	if !Supports() {
 		t.Skip("pandoc not installed")
 	}
@@ -38,7 +38,140 @@ func TestConvert(t *testing.T) {
 	c.Assert(err, qt.IsNil)
 	conv, err := p.New(converter.DocumentContext{})
 	c.Assert(err, qt.IsNil)
-	b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	return c, conv, cfg
+}
+
+func TestConvert(t *testing.T) {
+	c, conv, _ := setupTestConverter(t)
+	output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	c.Assert(err, qt.IsNil)
+	c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
+}
+
+func runCiteprocTest(t *testing.T, content string, expected string) {
+	c, conv, cfg := setupTestConverter(t)
+	if !supportsCitations(cfg) {
+		t.Skip("pandoc does not support citations")
+	}
+	output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
 	c.Assert(err, qt.IsNil)
-	c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
+	c.Assert(string(output.Bytes()), qt.Equals, expected)
+}
+
+func TestGetPandocVersionCallTwice(t *testing.T) {
+	c, _, cfg := setupTestConverter(t)
+
+	version1, err1 := getPandocVersion(cfg)
+	version2, err2 := getPandocVersion(cfg)
+	c.Assert(version1, qt.Equals, version2)
+	c.Assert(err1, qt.IsNil)
+	c.Assert(err2, qt.IsNil)
+}
+
+func TestPandocVersionEquality(t *testing.T) {
+	c := qt.New(t)
+	v1 := pandocVersion{1, 0}
+	v2 := pandocVersion{2, 0}
+	v3 := pandocVersion{2, 2}
+	v4 := pandocVersion{1, 2}
+	v5 := pandocVersion{2, 11}
+
+	// 1 >= 1 -> true
+	c.Assert(v1.greaterThanOrEqual(v1), qt.IsTrue)
+
+	// 1 >= 2 -> false, 2 >= 1 -> tru
+	c.Assert(v1.greaterThanOrEqual(v2), qt.IsFalse)
+	c.Assert(v2.greaterThanOrEqual(v1), qt.IsTrue)
+
+	// 2.0 >= 2.2 -> false, 2.2 >= 2.0 -> true
+	c.Assert(v2.greaterThanOrEqual(v3), qt.IsFalse)
+	c.Assert(v3.greaterThanOrEqual(v2), qt.IsTrue)
+
+	// 2.2 >= 1.2 -> true, 1.2 >= 2.2 -> false
+	c.Assert(v3.greaterThanOrEqual(v4), qt.IsTrue)
+	c.Assert(v4.greaterThanOrEqual(v3), qt.IsFalse)
+
+	// 2.11 >= 2.2 -> true, 2.2 >= 2.11 -> false
+	c.Assert(v5.greaterThanOrEqual(v3), qt.IsTrue)
+	c.Assert(v3.greaterThanOrEqual(v5), qt.IsFalse)
+}
+
+func TestCiteprocWithHugoMeta(t *testing.T) {
+	content := `
+---
+title: Test
+published: 2022-05-30
+---
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithPandocMeta(t *testing.T) {
+	content := `
+---
+---
+---
+...
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithBibliography(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+testContent
+`
+	expected := "<p>testContent</p>\n"
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithExplicitCitation(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+@Doe2022
+`
+	expected := `<p><span class="citation" data-cites="Doe2022">Doe and Mustermann
+(2022)</span></p>
+<div id="refs" class="references csl-bib-body hanging-indent"
+role="doc-bibliography">
+<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
+Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
+Tests.”</span> <em>Hugo Websites</em>.
+</div>
+</div>
+`
+	runCiteprocTest(t, content, expected)
+}
+
+func TestCiteprocWithNocite(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+nocite: |
+  @*
+...
+`
+	expected := `<div id="refs" class="references csl-bib-body hanging-indent"
+role="doc-bibliography">
+<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
+Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
+Tests.”</span> <em>Hugo Websites</em>.
+</div>
+</div>
+`
+	runCiteprocTest(t, content, expected)
 }

Diff for: markup/pandoc/testdata/bibliography.bib

+6
@@ -0,0 +1,6 @@
+@article{Doe2022,
+author = "Jane Doe and Max Mustermann",
+title = "A Treatise on Hugo Tests",
+journal = "Hugo Websites",
+year = "2022",
+}
