Commit 038007c

markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter. There are several PRs for this feature already. However, I think simply adding `--citeproc` is the cleanest way to enable this feature, with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt adding more config options to Hugo which indirectly configure pandoc, but I think simply configuring Pandoc via Pandoc itself is simpler, as it is already possible with two YAML blocks, one for Hugo and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., #4800 attempts to use `nocite`, which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:
    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which set the path to a custom CSL file and create HTML links between the references and the bibliography.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension #6101
  Bundles multiple requests; this PR tackles citation parsing.
- WIP: Bibliography with Pandoc #4800
  Passes the frontmatter to Pandoc and still uses `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc #7529
  That PR is much more extensive and might eventually supersede this PR, but I think --bibliography and --citeproc should be independent options (--bibliography should be optional and citeproc can always be specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. #8610
  Similar to #7529, #8610 adds a new config option to Hugo. I think passing --citeproc and letting the users decide on the metadata they want to pass to pandoc is better, albeit uglier.
1 parent 52561d5 commit 038007c

5 files changed: +211 -5 lines changed

Diff for: docs/content/en/content-management/bibliography.md

+50
@@ -0,0 +1,50 @@
+---
+title: Bibliographies in Markdown
+linkTitle: Bibliography
+description: Include citations and a bibliography in Markdown using LaTeX markup.
+categories: [content management]
+keywords: [latex,pandoc,citation,reference,bibliography]
+menu:
+  docs:
+    parent: content-management
+    weight: 320
+weight: 320
+toc: true
+---
+
+{{< new-in 0.144.0 />}}
+
+## Citations and Bibliographies
+
+[Pandoc](https://pandoc.org) is a universal document converter and can be used to convert markdown files.
+
+With **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
+One way is to employ [BibTeX files](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX) to cite:
+
+```
+---
+title: Citation document
+---
+---
+bibliography: assets/bibliography.bib
+...
+This is a citation: @Doe2022
+```
+
+Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** meta data block, denoted with `---` and `...` to Pandoc.
+Thus, all Pandoc-specific settings should go there.
+
+You can also add all elements from a bibliography file (without citing them explicitly) using:
+
+```
+---
+title: My Publications
+---
+---
+bibliography: assets/bibliography.bib
+nocite: |
+  @*
+...
+```
+
+It is also possible to provide a custom [CSL style](https://citationstyles.org/authors/) by passing `csl: path-to-style.csl` as a Pandoc option.
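
The two-block layout described in this file can be illustrated with a small Go sketch. It assumes, as the documentation above states, that the first `---` block is consumed as Hugo front matter and that everything after it reaches Pandoc. `splitFrontMatter` is a hypothetical helper written for this illustration; it is not Hugo's front matter parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper (not Hugo's parser). It treats the
// first block delimited by "---" lines as front matter and returns it together
// with the remainder of the document. If the document does not start with
// "---" or the block is never closed, the whole input is returned as rest.
func splitFrontMatter(doc string) (frontMatter, rest string) {
	lines := strings.Split(doc, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", doc
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			frontMatter = strings.Join(lines[1:i], "\n")
			rest = strings.Join(lines[i+1:], "\n")
			return frontMatter, rest
		}
	}
	// No closing delimiter found: treat everything as content.
	return "", doc
}

func main() {
	doc := "---\ntitle: Citation document\n---\n" +
		"---\nbibliography: assets/bibliography.bib\n...\n" +
		"This is a citation: @Doe2022\n"

	fm, rest := splitFrontMatter(doc)
	fmt.Printf("Front matter (kept by Hugo):\n%s\n\n", fm)
	fmt.Printf("Remainder (what Pandoc would receive):\n%s", rest)
}
```

In this sketch the remainder still begins with the second `---` block, which is why Pandoc-specific keys such as `bibliography` belong there.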

Diff for: docs/content/en/content-management/formats.md

+6
@@ -111,6 +111,12 @@ Hugo passes these CLI flags when calling the Pandoc executable:
 --mathjax
 ```
 
+If your Pandoc has version 2.11 or later, it also passes this CLI flag:
+
+```text
+--citeproc
+```
+
 [Pandoc]: https://pandoc.org/
 
 ### reStructuredText
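
As a rough illustration of the flag selection documented here, the sketch below builds the argument list from a boolean. `pandocArgs` is a hypothetical name for this example only; in the commit the decision is made inline in `getPandocContent`.

```go
package main

import "fmt"

// pandocArgs is a hypothetical helper: --mathjax is always passed, and
// --citeproc is appended only when the installed Pandoc supports it.
func pandocArgs(citeprocSupported bool) []string {
	args := []string{"--mathjax"}
	if citeprocSupported {
		args = append(args, "--citeproc")
	}
	return args
}

func main() {
	fmt.Println(pandocArgs(false))
	fmt.Println(pandocArgs(true))
}
```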

Diff for: markup/pandoc/convert.go

+47 -1
@@ -15,10 +15,12 @@
 package pandoc
 
 import (
+	"bytes"
+	"sync"
+
 	"github.com/gohugoio/hugo/common/hexec"
 	"github.com/gohugoio/hugo/htesting"
 	"github.com/gohugoio/hugo/identity"
-
 	"github.com/gohugoio/hugo/markup/converter"
 	"github.com/gohugoio/hugo/markup/internal"
 )
@@ -64,6 +66,9 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
 		return src, nil
 	}
 	args := []string{"--mathjax"}
+	if supportsCitations(c.cfg) {
+		args = append(args[:], "--citeproc")
+	}
 	return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
 }
 
@@ -76,6 +81,47 @@ func getPandocBinaryName() string {
 	return ""
 }
 
+var pandocSupportsCiteprocOnce sync.Once
+var pandocSupportsCiteproc bool
+
+// getPandocSupportsCiteproc runs a dump-args to determine if pandoc knows the --citeproc argument
+func getPandocSupportsCiteproc(cfg converter.ProviderConfig) (bool, error) {
+	var err error
+
+	pandocSupportsCiteprocOnce.Do(func() {
+		argsv := []any{"--dump-args", "--citeproc"}
+
+		var out bytes.Buffer
+		argsv = append(argsv, hexec.WithStdout(&out))
+
+		cmd, err := cfg.Exec.New(pandocBinary, argsv...)
+		if err != nil {
+			cfg.Logger.Errorf("Could not call pandoc: %v", err)
+			pandocSupportsCiteproc = false
+			return
+		}
+
+		err = cmd.Run()
+		if err != nil {
+			cfg.Logger.Errorf("%s --dump-args --citeproc: %v", pandocBinary, err)
+			pandocSupportsCiteproc = false
+			return
+		}
+		pandocSupportsCiteproc = true
+	})
+
+	return pandocSupportsCiteproc, err
+}
+
+// supportsCitations returns true if citeproc is available
+func supportsCitations(cfg converter.ProviderConfig) bool {
+	if Supports() {
+		supportsCiteproc, err := getPandocSupportsCiteproc(cfg)
+		return supportsCiteproc && err == nil
+	}
+	return false
+}
+
 // Supports returns whether Pandoc is installed on this computer.
 func Supports() bool {
 	hasBin := getPandocBinaryName() != ""
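
The run-once capability probe in this file can be sketched in isolation with the standard library. This is a minimal illustration, not the committed code: `onceProbe` and its `supported` method are hypothetical names, and `os/exec` stands in for Hugo's `hexec` wrapper. One difference is deliberate: in the committed closure, `cmd, err := ...` declares a new `err` inside the function literal, so the outer `err` returned to the caller stays nil. The sketch stores the error in a struct field so that it reaches the caller.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// onceProbe is a hypothetical type that caches the outcome of a single
// capability check so that the external command runs at most once.
type onceProbe struct {
	once sync.Once
	ok   bool
	err  error
}

// supported runs the given probe function on the first call only and returns
// the cached result on every later call.
func (p *onceProbe) supported(run func() error) (bool, error) {
	p.once.Do(func() {
		// Assign to the struct fields; using ":=" here would create a
		// variable local to the closure that the caller never sees.
		p.err = run()
		p.ok = p.err == nil
	})
	return p.ok, p.err
}

func main() {
	var probe onceProbe

	// Same arguments as the probe in the commit. If pandoc is not on PATH,
	// or rejects the flags, Run returns a non-nil error.
	run := func() error {
		return exec.Command("pandoc", "--dump-args", "--citeproc").Run()
	}

	ok, err := probe.supported(run)
	fmt.Println("citeproc supported:", ok, "error:", err)
}
```

Because the result is cached, a failed first probe is never retried within the same process; this matches the behavior of the `sync.Once` guard in the diff.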

Diff for: markup/pandoc/convert_test.go

+102 -4
@@ -25,7 +25,7 @@ import (
 	qt "github.com/frankban/quicktest"
 )
 
-func TestConvert(t *testing.T) {
+func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
 	if !Supports() {
 		t.Skip("pandoc not installed")
 	}
@@ -34,11 +34,109 @@ func TestConvert(t *testing.T) {
 	var err error
 	sc.Exec.Allow, err = security.NewWhitelist("pandoc")
 	c.Assert(err, qt.IsNil)
-	p, err := Provider.New(converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()})
+	cfg := converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()}
+	p, err := Provider.New(cfg)
 	c.Assert(err, qt.IsNil)
 	conv, err := p.New(converter.DocumentContext{})
 	c.Assert(err, qt.IsNil)
-	b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	return c, conv, cfg
+}
+
+func TestConvert(t *testing.T) {
+	c, conv, _ := setupTestConverter(t)
+	output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+	c.Assert(err, qt.IsNil)
+	c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
+}
+
+func runCiteprocTest(t *testing.T, content string, expectContained []string, expectNotContained []string) {
+	c, conv, cfg := setupTestConverter(t)
+	if !supportsCitations(cfg) {
+		t.Skip("pandoc does not support citations")
+	}
+	output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
 	c.Assert(err, qt.IsNil)
-	c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
+	for _, expected := range expectContained {
+		c.Assert(string(output.Bytes()), qt.Contains, expected)
+	}
+	for _, notExpected := range expectNotContained {
+		c.Assert(string(output.Bytes()), qt.Not(qt.Contains), notExpected)
+	}
+}
+
+func TestGetPandocSupportsCiteprocCallTwice(t *testing.T) {
+	c, _, cfg := setupTestConverter(t)
+
+	supports1, err1 := getPandocSupportsCiteproc(cfg)
+	supports2, err2 := getPandocSupportsCiteproc(cfg)
+	c.Assert(supports1, qt.Equals, supports2)
+	c.Assert(err1, qt.IsNil)
+	c.Assert(err2, qt.IsNil)
+}
+
+func TestCiteprocWithHugoMeta(t *testing.T) {
+	content := `
+---
+title: Test
+published: 2022-05-30
+---
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithPandocMeta(t *testing.T) {
+	content := `
+---
+---
+---
+...
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithBibliography(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+testContent
+`
+	expected := []string{"testContent"}
+	unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, unexpected)
+}
+
+func TestCiteprocWithExplicitCitation(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+...
+@Doe2022
+`
+	expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, []string{})
+}
+
+func TestCiteprocWithNocite(t *testing.T) {
+	content := `
+---
+---
+---
+bibliography: testdata/bibliography.bib
+nocite: |
+  @*
+...
+`
+	expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
+	runCiteprocTest(t, content, expected, []string{})
 }
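
The contains / does-not-contain pattern used by `runCiteprocTest` can be tried without Pandoc or quicktest. The following is a minimal sketch using only the standard library; `checkOutput` is a hypothetical helper for this illustration, and the sample strings are taken from the tests above.

```go
package main

import (
	"fmt"
	"strings"
)

// checkOutput is a hypothetical helper. It returns an error describing the
// first violated expectation: a wanted substring that is missing, or an
// unwanted substring that is present. It returns nil if all checks pass.
func checkOutput(output string, want, notWant []string) error {
	for _, s := range want {
		if !strings.Contains(output, s) {
			return fmt.Errorf("output does not contain %q", s)
		}
	}
	for _, s := range notWant {
		if strings.Contains(output, s) {
			return fmt.Errorf("output unexpectedly contains %q", s)
		}
	}
	return nil
}

func main() {
	bibTerms := []string{"Doe", "Mustermann", "2022", "Treatise"}
	plain := "<p>testContent</p>\n"

	// A document without citations should not leak bibliography entries.
	fmt.Println(checkOutput(plain, []string{"testContent"}, bibTerms))

	// Expecting bibliography terms in the same output reports the first miss.
	fmt.Println(checkOutput(plain, bibTerms, nil))
}
```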

Diff for: markup/pandoc/testdata/bibliography.bib

+6
@@ -0,0 +1,6 @@
+@article{Doe2022,
+author = "Jane Doe and Max Mustermann",
+title = "A Treatise on Hugo Tests",
+journal = "Hugo Websites",
+year = "2022",
+}
