Description
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
micromark-extension-frontmatter v1.0.0
Link to runnable example
https://codesandbox.io/s/hopeful-matan-nybskz?file=%2Fsrc%2Findex.js
Steps to reproduce
I have a workload that transforms Markdown documents, which may include frontmatter, into a syntax tree by calling parse() on the remark processor.
Parsing breaks when the YAML frontmatter uses Jekyll-style delimiters (three dashes at both the top and the bottom of the frontmatter). It works, however, with Pandoc-style delimiters (three dashes at the top, three dots at the bottom). In the failing case, using the plugin breaks the processor's ability to detect lists properly.
To reproduce, open the sandbox linked above, which is already set up. In it, switch the closing fence of the frontmatter between dashes and dots and compare the resulting syntax trees.
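To switch the closing fence without editing the document by hand, a small string helper can do it. This is only a sketch: swapClosingFence is a hypothetical name, not part of remark or micromark, and it assumes "\n" line endings and frontmatter that starts on the very first line.

```typescript
// Hypothetical helper (not part of remark or micromark): rewrites the closing
// fence of a leading YAML frontmatter block. Assumes "\n" line endings.
function swapClosingFence(markdown: string, newClose: string): string {
  const lines = markdown.split('\n')
  // Frontmatter must start on the very first line with three dashes.
  if (lines[0] !== '---') return markdown
  // Replace the first line after the opening fence that closes the block.
  for (let i = 1; i < lines.length; i++) {
    if (lines[i] === '---' || lines[i] === '...') {
      lines[i] = newClose
      return lines.join('\n')
    }
  }
  // No closing fence found: leave the document untouched.
  return markdown
}

// A shortened stand-in for the test document in this report.
const jekyll = ['---', 'title: Test', '---', '', '* one', '* two'].join('\n')
const pandoc = swapClosingFence(jekyll, '...')
console.log(pandoc)
```

Parsing both jekyll and pandoc with the same processor then isolates the closing fence as the only difference between the two runs.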
Processor setup
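// Imports were omitted in the original snippet; these are assumed from the package names.
import { remark } from 'remark'
import remarkFrontmatter from 'remark-frontmatter'
import remarkMath from 'remark-math'
import type { Root } from 'mdast'

export function md2ast (markdown: string): Root {
  return remark()
    .use(remarkFrontmatter, [
      // Either Pandoc-style frontmatters ...
      { type: 'yaml', fence: { open: '---', close: '...' } },
      // ... or Jekyll/Static site generators-style frontmatters.
      { type: 'yaml', fence: { open: '---', close: '---' } }
    ])
    .use(remarkMath)
    .parse(markdown)
}
Note that while I am being verbose in explicitly stating the delimiters, replacing the second definition with the plain string 'yaml' does not change the effect.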
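To check the output of md2ast without reading the whole tree, one could walk it and count the list nodes. The following is a minimal sketch: TreeNode and countNodes are hypothetical names introduced for illustration, and the two sample trees are heavily trimmed stand-ins for the trees shown in this report.

```typescript
// Minimal structural type covering only the fields used here (an assumption
// for illustration; real trees carry more fields such as value and position).
interface TreeNode {
  type: string
  children?: TreeNode[]
}

// Hypothetical helper: counts nodes of a given type anywhere in the tree.
function countNodes(node: TreeNode, type: string): number {
  let count = node.type === type ? 1 : 0
  for (const child of node.children ?? []) {
    count += countNodes(child, type)
  }
  return count
}

// Trimmed-down stand-in for the tree with a correctly detected list.
const withList: TreeNode = {
  type: 'root',
  children: [
    { type: 'yaml' },
    { type: 'list', children: [{ type: 'listItem', children: [{ type: 'paragraph' }] }] }
  ]
}

// Trimmed-down stand-in for the tree where the list became a paragraph.
const withoutList: TreeNode = {
  type: 'root',
  children: [{ type: 'yaml' }, { type: 'paragraph', children: [{ type: 'text' }] }]
}

console.log(countNodes(withList, 'list'))    // 1
console.log(countNodes(withoutList, 'list')) // 0
```

For the test document in this report, a correct parse should yield two list nodes (the outer list and the nested one), while the failing parse yields none.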
Test document
Run this document through the pipeline above. When the frontmatter ends with three dashes, the list is not detected correctly; when you replace them with three dots (Pandoc-style frontmatter), it is.
---
title: "The Devil is in the Details: Ethical Pitfalls in the Sociological use of NLP techniques"
date: 2022-12-05
id: 20221205151411
author: Hendrik Erz
---

# Export Link Removal

Export this file into any format to test out the corresponding LUA filter.

* History of NLP in Sociology (from Mosteller and Wallace to today)
* What methods are in use? (three types: Bayes/simple such as Logistic Regression; Machine Learning such as LDA/random forests; deep learning such as LSTM/BERT)
* What are they being used for?
    * This is a worngly written second-indended word
* Where do these methods come from?
* Are there already ethical notes around? Or don’t they care?
Expected behavior
Correct behavior with Pandoc-style frontmatter
If you replace the closing three dashes with three dots in the test document, it works as expected.
This is the correct syntax tree:
{
"type": "root",
"children": [
{
"type": "yaml",
"value": "title: \"The Devil is in the Details: Ethical Pitfalls in the Sociological use of NLP techniques\"\ndate: 2022-12-05\nid: 20221205151411\nauthor: Hendrik Erz",
"position": {
"start": { "line": 1, "column": 1, "offset": 0 },
"end": { "line": 6, "column": 4, "offset": 160 }
}
},
{
"type": "heading",
"depth": 1,
"children": [
{
"type": "text",
"value": "Export Link Removal",
"position": {
"start": { "line": 8, "column": 3, "offset": 164 },
"end": { "line": 8, "column": 22, "offset": 183 }
}
}
],
"position": {
"start": { "line": 8, "column": 1, "offset": 162 },
"end": { "line": 8, "column": 22, "offset": 183 }
}
},
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Export this file into any format to test out the corresponding LUA filter.",
"position": {
"start": { "line": 10, "column": 1, "offset": 185 },
"end": { "line": 10, "column": 75, "offset": 259 }
}
}
],
"position": {
"start": { "line": 10, "column": 1, "offset": 185 },
"end": { "line": 10, "column": 75, "offset": 259 }
}
},
{
"type": "list",
"ordered": false,
"start": null,
"spread": false,
"children": [
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "History of NLP in Sociology (from Mosteller and Wallace to today)",
"position": {
"start": { "line": 12, "column": 3, "offset": 263 },
"end": { "line": 12, "column": 68, "offset": 328 }
}
}
],
"position": {
"start": { "line": 12, "column": 3, "offset": 263 },
"end": { "line": 12, "column": 68, "offset": 328 }
}
}
],
"position": {
"start": { "line": 12, "column": 1, "offset": 261 },
"end": { "line": 12, "column": 68, "offset": 328 }
}
},
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "What methods are in use? (three types: Bayes/simple such as Logistic Regression; Machine Learning such as LDA/random forests; deep learning such as LSTM/BERT)",
"position": {
"start": { "line": 13, "column": 3, "offset": 331 },
"end": { "line": 13, "column": 161, "offset": 489 }
}
}
],
"position": {
"start": { "line": 13, "column": 3, "offset": 331 },
"end": { "line": 13, "column": 161, "offset": 489 }
}
}
],
"position": {
"start": { "line": 13, "column": 1, "offset": 329 },
"end": { "line": 13, "column": 161, "offset": 489 }
}
},
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "What are they being used for?",
"position": {
"start": { "line": 14, "column": 3, "offset": 492 },
"end": { "line": 14, "column": 32, "offset": 521 }
}
}
],
"position": {
"start": { "line": 14, "column": 3, "offset": 492 },
"end": { "line": 14, "column": 32, "offset": 521 }
}
},
{
"type": "list",
"ordered": false,
"start": null,
"spread": false,
"children": [
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "This is a worngly written second-indended word",
"position": {
"start": { "line": 15, "column": 7, "offset": 528 },
"end": { "line": 15, "column": 53, "offset": 574 }
}
}
],
"position": {
"start": { "line": 15, "column": 7, "offset": 528 },
"end": { "line": 15, "column": 53, "offset": 574 }
}
}
],
"position": {
"start": { "line": 15, "column": 5, "offset": 526 },
"end": { "line": 15, "column": 53, "offset": 574 }
}
}
],
"position": {
"start": { "line": 15, "column": 5, "offset": 526 },
"end": { "line": 15, "column": 53, "offset": 574 }
}
}
],
"position": {
"start": { "line": 14, "column": 1, "offset": 490 },
"end": { "line": 15, "column": 53, "offset": 574 }
}
},
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Where do these methods come from?",
"position": {
"start": { "line": 16, "column": 3, "offset": 577 },
"end": { "line": 16, "column": 36, "offset": 610 }
}
}
],
"position": {
"start": { "line": 16, "column": 3, "offset": 577 },
"end": { "line": 16, "column": 36, "offset": 610 }
}
}
],
"position": {
"start": { "line": 16, "column": 1, "offset": 575 },
"end": { "line": 16, "column": 36, "offset": 610 }
}
},
{
"type": "listItem",
"spread": false,
"checked": null,
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Are there already ethical notes around? Or don’t they care?",
"position": {
"start": { "line": 17, "column": 3, "offset": 613 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 17, "column": 3, "offset": 613 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 17, "column": 1, "offset": 611 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 12, "column": 1, "offset": 261 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 1, "column": 1, "offset": 0 },
"end": { "line": 18, "column": 1, "offset": 673 }
}
}
Actual behavior
Incorrect behavior with Jekyll-style frontmatter
If you run the test document above unchanged (with three dashes), it produces an incorrect syntax tree. Observe that the list is not detected: all list items end up as the text of a single paragraph.
{
"type": "root",
"children": [
{
"type": "yaml",
"value": "title: \"The Devil is in the Details: Ethical Pitfalls in the Sociological use of NLP techniques\"\ndate: 2022-12-05\nid: 20221205151411\nauthor: Hendrik Erz",
"position": {
"start": { "line": 1, "column": 1, "offset": 0 },
"end": { "line": 6, "column": 4, "offset": 160 }
}
},
{
"type": "heading",
"depth": 1,
"children": [
{
"type": "text",
"value": "Export Link Removal",
"position": {
"start": { "line": 8, "column": 3, "offset": 164 },
"end": { "line": 8, "column": 22, "offset": 183 }
}
}
],
"position": {
"start": { "line": 8, "column": 1, "offset": 162 },
"end": { "line": 8, "column": 22, "offset": 183 }
}
},
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Export this file into any format to test out the corresponding LUA filter.",
"position": {
"start": { "line": 10, "column": 1, "offset": 185 },
"end": { "line": 10, "column": 75, "offset": 259 }
}
}
],
"position": {
"start": { "line": 10, "column": 1, "offset": 185 },
"end": { "line": 10, "column": 75, "offset": 259 }
}
},
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "* History of NLP in Sociology (from Mosteller and Wallace to today)\n* What methods are in use? (three types: Bayes/simple such as Logistic Regression; Machine Learning such as LDA/random forests; deep learning such as LSTM/BERT)\n* What are they being used for?\n* This is a worngly written second-indended word\n* Where do these methods come from?\n* Are there already ethical notes around? Or don’t they care?",
"position": {
"start": { "line": 12, "column": 1, "offset": 261 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 12, "column": 1, "offset": 261 },
"end": { "line": 17, "column": 62, "offset": 672 }
}
}
],
"position": {
"start": { "line": 1, "column": 1, "offset": 0 },
"end": { "line": 18, "column": 1, "offset": 673 }
}
}
Runtime
Node v16, Other (please specify in steps to reproduce)
Package manager
yarn v1
OS
macOS
Build and bundle tools
Webpack