Commit b70e799

markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter. There are several PRs for this feature already. However, I think simply adding `--citeproc` is the cleanest way to enable it, with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt to add more config options to Hugo that indirectly configure Pandoc, but I think configuring Pandoc via Pandoc itself is simpler, as it is already possible with two YAML blocks, one for Hugo and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., #4800 attempts to use `nocite`, which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:
    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which set the path to a custom CSL file and create HTML links between the references and the bibliography.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension #6101
  Bundles multiple requests; this PR tackles citation parsing.
- WIP: Bibliography with Pandoc #4800
  Passes the front matter to Pandoc and still uses `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc #7529
  That PR is much more extensive and might eventually supersede this PR, but I think --bibliography and --citeproc should be independent options (--bibliography should be optional and citeproc can always be specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. #8610
  Similar to #7529, #8610 adds a new config option to Hugo. I think passing --citeproc and letting users decide on the metadata they want to pass to Pandoc is better, albeit uglier.
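
The two-block layout from the commit message can be illustrated with a small sketch. It is not Hugo's front matter parser: `splitFrontMatter` is a hypothetical helper that only handles `---`-delimited YAML with Unix line endings. It shows the idea that the first block is consumed before conversion, while the second block (closed by `...`) stays in the text handed to Pandoc.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper, not Hugo's actual parser.
// It removes the first "---"-delimited YAML block and returns that block
// together with the remainder of the document.
func splitFrontMatter(doc string) (frontMatter, rest string, ok bool) {
	const delim = "---\n"
	if !strings.HasPrefix(doc, delim) {
		return "", doc, false
	}
	body := doc[len(delim):]
	// An empty first block is closed immediately by a second delimiter.
	if strings.HasPrefix(body, delim) {
		return "", body[len(delim):], true
	}
	end := strings.Index(body, "\n"+delim)
	if end < 0 {
		return "", doc, false
	}
	return body[:end+1], body[end+1+len(delim):], true
}

func main() {
	doc := "---\ntitle: This is the Hugo YAML block\n---\n" +
		"---\nbibliography: assets/pandoc-yaml-block-bibliography.bib\n...\n" +
		"Document content with @citation!\n"
	hugoMeta, forPandoc, _ := splitFrontMatter(doc)
	fmt.Printf("Hugo reads:\n%s\nPandoc receives:\n%s", hugoMeta, forPandoc)
}
```

In this sketch the remainder still starts with `---`, which is why Pandoc can treat it as its own metadata block.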
1 parent 9c2f8ec commit b70e799

5 files changed

+266
-5
lines changed


Diff for: docs/content/en/content-management/bibliography.md

+50
@@ -0,0 +1,50 @@
+---
+title: Bibliographies in Markdown
+linkTitle: Bibliography
+description: Include citations and a bibliography in Markdown using LaTeX markup.
+categories: [content management]
+keywords: [latex,pandoc,citation,reference,bibliography]
+menu:
+  docs:
+    parent: content-management
+    weight: 320
+weight: 320
+toc: true
+---
+
+{{< new-in 0.144.0 />}}
+
+## Citations and Bibliographies
+
+[Pandoc](https://pandoc.org) is a universal document converter and can be used to convert Markdown files.
+
+With **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
+One way is to employ [BibTeX files](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX) to cite:
+
+```
+---
+title: Citation document
+---
+---
+bibliography: assets/bibliography.bib
+...
+This is a citation: @Doe2022
+```
+
+Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** metadata block, denoted with `---` and `...`, to Pandoc.
+Thus, all Pandoc-specific settings should go there.
+
+You can also add all elements from a bibliography file (without citing them explicitly) using:
+
+```
+---
+title: My Publications
+---
+---
+bibliography: assets/bibliography.bib
+nocite: |
+  @*
+...
+```
+
+It is also possible to provide a custom [CSL style](https://citationstyles.org/authors/) by passing `csl: path-to-style.csl` as a Pandoc option.

Diff for: docs/content/en/content-management/formats.md

+6
@@ -111,6 +111,12 @@ Hugo passes these CLI flags when calling the Pandoc executable:
 --mathjax
 ```
 
+If your Pandoc version is 2.11 or later, Hugo also passes this CLI flag:
+
+```text
+--citeproc
+```
+
 [Pandoc]: https://pandoc.org/
 
 ### reStructuredText
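
The flag selection documented in this hunk can be sketched as a small pure function. `pandocArgs` is a hypothetical name used only for illustration (the commit builds the list inline in `getPandocContent`), and it assumes the version has already been parsed into integers.

```go
package main

import "fmt"

// pandocArgs is a hypothetical helper mirroring the documented behaviour:
// --mathjax is always passed, --citeproc only for Pandoc 2.11 or later.
func pandocArgs(major, minor int) []string {
	args := []string{"--mathjax"}
	if major > 2 || (major == 2 && minor >= 11) {
		args = append(args, "--citeproc")
	}
	return args
}

func main() {
	fmt.Println(pandocArgs(2, 5))  // [--mathjax]
	fmt.Println(pandocArgs(2, 11)) // [--mathjax --citeproc]
	fmt.Println(pandocArgs(3, 1))  // [--mathjax --citeproc]
}
```

Comparing major and minor as integers matters here: a string comparison would rank "2.5" above "2.11".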

Diff for: markup/pandoc/convert.go

+71-1
@@ -15,10 +15,14 @@
 package pandoc
 
 import (
+	"bytes"
+	"strconv"
+	"strings"
+	"sync"
+
 	"github.com/gohugoio/hugo/common/hexec"
 	"github.com/gohugoio/hugo/htesting"
 	"github.com/gohugoio/hugo/identity"
-
 	"github.com/gohugoio/hugo/markup/converter"
 	"github.com/gohugoio/hugo/markup/internal"
 )
@@ -64,6 +68,9 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
 		return src, nil
 	}
 	args := []string{"--mathjax"}
+	if supportsCitations(c.cfg) {
+		args = append(args, "--citeproc")
+	}
 	return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
 }
 
@@ -76,6 +83,69 @@ func getPandocBinaryName() string {
 	return ""
 }
 
+type pandocVersion struct {
+	major, minor int64
+}
+
+func (left pandocVersion) greaterThanOrEqual(right pandocVersion) bool {
+	return left.major > right.major || (left.major == right.major && left.minor >= right.minor)
+}
+
+var versionOnce sync.Once
+var foundPandocVersion pandocVersion
+var foundPandocVersionErr error
+
+// getPandocVersion runs "pandoc --version" once and caches both the parsed
+// version and any error, so repeated calls return the same result.
+func getPandocVersion(cfg converter.ProviderConfig) (pandocVersion, error) {
+	versionOnce.Do(func() {
+		var out bytes.Buffer
+		cmd, err := cfg.Exec.New(pandocBinary, "--version", hexec.WithStdout(&out))
+		if err != nil {
+			cfg.Logger.Errorf("Could not call pandoc: %v", err)
+			foundPandocVersionErr = err
+			return
+		}
+
+		err = cmd.Run()
+		if err != nil {
+			cfg.Logger.Errorf("%s --version: %v", pandocBinary, err)
+			foundPandocVersionErr = err
+			return
+		}
+
+		// The output starts with, e.g., "pandoc 2.5"; split it into "2" and "5".
+		var major, minor string
+		if fields := strings.Fields(out.String()); len(fields) > 1 {
+			major, minor, _ = strings.Cut(fields[1], ".")
+			minor, _, _ = strings.Cut(minor, ".")
+		}
+		majorVersion, err := strconv.ParseInt(major, 10, 64)
+		if err != nil {
+			foundPandocVersionErr = err
+			return
+		}
+		minorVersion, err := strconv.ParseInt(minor, 10, 64)
+		if err != nil {
+			foundPandocVersionErr = err
+			return
+		}
+		foundPandocVersion = pandocVersion{majorVersion, minorVersion}
+	})
+
+	return foundPandocVersion, foundPandocVersionErr
+}
+
+// supportsCitations returns true for pandoc versions >= 2.11, which include citeproc
+func supportsCitations(cfg converter.ProviderConfig) bool {
+	if Supports() {
+		version, err := getPandocVersion(cfg)
+		supported := err == nil && version.greaterThanOrEqual(pandocVersion{2, 11})
+		return supported
+	}
+	return false
+}
+
 // Supports returns whether Pandoc is installed on this computer.
 func Supports() bool {
 	hasBin := getPandocBinaryName() != ""
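
One way to make the version detection testable without a Pandoc binary is to move the string handling into a pure function. The sketch below is an assumption about how that could look, not part of this commit: `parsePandocVersion` is a hypothetical name, and the sample output line is illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePandocVersion is a hypothetical pure helper. It extracts the major and
// minor version from `pandoc --version` output, whose first line is expected
// to look like "pandoc 2.11.4".
func parsePandocVersion(output string) (major, minor int64, err error) {
	// Fields splits on any whitespace, so "\r\n" line endings are handled too.
	fields := strings.Fields(output)
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("unexpected pandoc version output: %q", output)
	}
	parts := strings.Split(fields[1], ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected pandoc version number: %q", fields[1])
	}
	if major, err = strconv.ParseInt(parts[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.ParseInt(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

func main() {
	major, minor, err := parsePandocVersion("pandoc 2.11.4\r\n(more output)\r\n")
	fmt.Println(major, minor, err) // 2 11 <nil>
}
```

Returning an error for malformed output, instead of indexing into the split result directly, avoids a panic when the binary prints something unexpected.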

Diff for: markup/pandoc/convert_test.go

+133-4
@@ -25,7 +25,7 @@ import (
 	qt "github.com/frankban/quicktest"
 )
 
-func TestConvert(t *testing.T) {
+func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
 	if !Supports() {
 		t.Skip("pandoc not installed")
 	}
@@ -34,11 +34,140 @@ func TestConvert(t *testing.T) {
 	var err error
 	sc.Exec.Allow, err = security.NewWhitelist("pandoc")
 	c.Assert(err, qt.IsNil)
-	p, err := Provider.New(converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()})
+	cfg := converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()}
+	p, err := Provider.New(cfg)
 	c.Assert(err, qt.IsNil)
 	conv, err := p.New(converter.DocumentContext{})
 	c.Assert(err, qt.IsNil)
-	b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	return c, conv, cfg
+}
+
+func TestConvert(t *testing.T) {
+	c, conv, _ := setupTestConverter(t)
+	output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
 	c.Assert(err, qt.IsNil)
-	c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
+	c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
+}
+
+func runCiteprocTest(t *testing.T, content string, expectContained []string, expectNotContained []string) {
+	c, conv, cfg := setupTestConverter(t)
+	if !supportsCitations(cfg) {
+		t.Skip("pandoc does not support citations")
+	}
+	output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
+	c.Assert(err, qt.IsNil)
+	for _, expected := range expectContained {
+		c.Assert(string(output.Bytes()), qt.Contains, expected)
+	}
+	for _, notExpected := range expectNotContained {
+		c.Assert(string(output.Bytes()), qt.Not(qt.Contains), notExpected)
+	}
+}
+
+func TestGetPandocVersionCallTwice(t *testing.T) {
+	c, _, cfg := setupTestConverter(t)
+
+	version1, err1 := getPandocVersion(cfg)
+	version2, err2 := getPandocVersion(cfg)
+	c.Assert(version1, qt.Equals, version2)
+	c.Assert(err1, qt.IsNil)
+	c.Assert(err2, qt.IsNil)
+}
+
+func TestPandocVersionEquality(t *testing.T) {
+	c := qt.New(t)
+	v1 := pandocVersion{1, 0}
+	v2 := pandocVersion{2, 0}
+	v2_2 := pandocVersion{2, 2}
+	v1_2 := pandocVersion{1, 2}
+	v2_11 := pandocVersion{2, 11}
+	v3_9 := pandocVersion{3, 9}
+	v1_15 := pandocVersion{1, 15}
+
+	c.Assert(v1.greaterThanOrEqual(v1), qt.IsTrue)
+
+	c.Assert(v1.greaterThanOrEqual(v2), qt.IsFalse)
+	c.Assert(v2.greaterThanOrEqual(v1), qt.IsTrue)
+
+	c.Assert(v2.greaterThanOrEqual(v2_2), qt.IsFalse)
+	c.Assert(v2_2.greaterThanOrEqual(v2), qt.IsTrue)
+
+	c.Assert(v2_2.greaterThanOrEqual(v1_2), qt.IsTrue)
+	c.Assert(v1_2.greaterThanOrEqual(v2_2), qt.IsFalse)
+
+	c.Assert(v2_11.greaterThanOrEqual(v2_2), qt.IsTrue)
+	c.Assert(v2_2.greaterThanOrEqual(v2_11), qt.IsFalse)
+
+	c.Assert(v3_9.greaterThanOrEqual(v2_11), qt.IsTrue)
+	c.Assert(v2_11.greaterThanOrEqual(v3_9), qt.IsFalse)
+
+	c.Assert(v2_11.greaterThanOrEqual(v1_15), qt.IsTrue)
+	c.Assert(v1_15.greaterThanOrEqual(v2_11), qt.IsFalse)
+}
+
+func TestCiteprocWithHugoMeta(t *testing.T) {
+	content := `
+---
+title: Test
+published: 2022-05-30
+---
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithPandocMeta(t *testing.T) {
+	content := `
+---
+---
+---
+...
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithBibliography(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithExplicitCitation(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+@Doe2022
+`
+	expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, []string{})
+}
+
+func TestCiteprocWithNocite(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+nocite: |
+  @*
+...
+`
+	expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, []string{})
 }

Diff for: markup/pandoc/testdata/bibliography.bib

+6
@@ -0,0 +1,6 @@
+@article{Doe2022,
+  author = "Jane Doe and Max Mustermann",
+  title = "A Treatise on Hugo Tests",
+  journal = "Hugo Websites",
+  year = "2022",
+}
