Symptom
For some articles whose wikitext contains a {{multiple issues}}
(or similar) template wrapping nested templates separated by
newlines, the WhoColor-extended HTML contains a MediaWiki preview
warning at the top:
Preview warning: Page using Template:Multiple issues with unknown parameter <templatestyles>...</templatestyles><table ...>
Example: en/Icaro at rev 1316552020 (token list captured
2026-05-24). Surfaced as 1/20 articles flagged in the Wiki Experts
course parity suite.
TL;DR
This is not a Rust port bug — our whocolor_wikitext.rs matches
the upstream Python WhoColor.parser.WikiMarkupParser exactly. Both
emit the same <span class="editor-token ...">}}</span> around the
outer }} of {{multiple issues|…}}. Production
(https://wikiwho-api.wmcloud.org) and our deployment
(https://wikiwho-rs.wmcloud.org) return byte-identical HTML modulo
trailing MW parser-cache metadata (server hostname, timestamps,
Lua/CPU timings, Render ID).
So we inherit this bug from upstream. Fixing it would mean
deliberately diverging from production — which we won't do without
explicit reason, per the project's parity-or-die quality bar.
Reproduction
scripts/icaro_compare_prod.py fetches both endpoints for en/Icaro
and confirms:
    prod html bytes: 68790
    ours html bytes: 68790
    prod has "Preview warning": True
    ours has "Preview warning": True
    prod has "unknown parameter": True
    ours has "unknown parameter": True
Total byte diff: 194 bytes, all in the trailing MW parser-cache
footer (server hostname, timestamps, Template:* timing values,
Render ID). The span insertion location and surrounding text are
identical.
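The "identical modulo footer" comparison can be sketched as below. This is a minimal illustration, not the contents of scripts/icaro_compare_prod.py: it assumes the volatile parser-cache metadata sits inside HTML comments, and the helper names and sample payloads are hypothetical.

```python
import re

# Assumption: the volatile metadata (hostname, timestamps, timings,
# Render ID) is emitted inside HTML comments, so stripping comments
# removes it. Adjust the pattern if the footer takes another form.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_volatile(html: str) -> str:
    """Drop HTML comments so only the stable markup is compared."""
    return _HTML_COMMENT.sub("", html)


def same_modulo_metadata(prod_html: str, ours_html: str) -> bool:
    """True when both payloads match once comments are removed."""
    return strip_volatile(prod_html) == strip_volatile(ours_html)


# Hypothetical payloads that differ only in the trailing comment.
prod = "<p>body</p><!-- Saved in parser cache by host-a at 20260524 -->"
ours = "<p>body</p><!-- Saved in parser cache by host-b at 20260525 -->"
print(same_modulo_metadata(prod, ours))
```

Comparing the stripped payloads, rather than counting differing bytes, keeps the check stable when the footer changes length between requests.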
Minimal Python repro of the upstream bug (using
WhoColor.parser.WikiMarkupParser directly):
    input:  "{{outer|{{inner-a|x=1}}\n{{inner-b|y=2}}}}"
    output: '{{outer|{{inner-a|x=1}}\n{{inner-b|y=2}}<span class="editor-token token-editor-1" id="token-17">}}</span>'
Without the newline between {{inner-a}} and {{inner-b}}, no bleed:
    input:  "{{Infobox |a = {{nest |[[L1]] |[[L2]]}}|b = end}}"
    output: '{{Infobox |a = {{nest |[[L1]] |[[L2]]}}|b = end}}'
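A quick way to spot the bleed in parser output is to look for an editor-token span whose entire content is a template closer. The sketch below does that with a regex over the two outputs quoted above; the function name is hypothetical, and the pattern assumes the attribute order shown in those outputs (class first).

```python
import re

# An editor-token span that wraps nothing but "}}".
BLED_CLOSER = re.compile(r'<span class="editor-token[^"]*"[^>]*>\}\}</span>')


def count_bled_closers(extended_wikitext: str) -> int:
    """Count spans that wrap a bare template closer."""
    return len(BLED_CLOSER.findall(extended_wikitext))


# The two parser outputs from the minimal repro.
newline_output = (
    '{{outer|{{inner-a|x=1}}\n{{inner-b|y=2}}'
    '<span class="editor-token token-editor-1" id="token-17">}}</span>'
)
no_newline_output = '{{Infobox |a = {{nest |[[L1]] |[[L2]]}}|b = end}}'

print(count_bled_closers(newline_output))     # 1
print(count_bled_closers(no_newline_output))  # 0
```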
Root cause (verified by tracing the Python parser)
In WhoColor/parser.py::WikiMarkupParser.__get_next_special_element:
    def __get_next_special_element(self):
        next_ = {}
        for special_markup in SPECIAL_MARKUPS:
            found_markup = self.__get_first_regex(special_markup['start_regex'])
            if found_markup is not None and \
               (not next_ or next_['start'] > found_markup['start']) and \
               found_markup['start'] not in self._jumped_elems:
                next_ = special_markup
                ...
__get_first_regex returns the first match of the regex from
_wiki_text_pos. If that one match is in _jumped_elems, the entire
markup type is dropped for this call — the regex is NOT re-searched
past the jumped position.
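The effect of that first-match-only lookup can be reproduced in isolation. The sketch below is a toy model, not the upstream code: MARKUPS is a hypothetical two-entry stand-in for SPECIAL_MARKUPS, and the function name and sample text are made up for illustration.

```python
import re

# Hypothetical stand-ins for two SPECIAL_MARKUPS entries; the real
# table has more entries and its regexes may differ.
MARKUPS = [
    {"type": "template", "start_regex": re.compile(r"\{\{")},
    {"type": "heading", "start_regex": re.compile(r"=+|;")},
]


def next_special_first_match_only(text, pos, jumped):
    """Model the current behavior: one search per markup type."""
    best = None
    for markup in MARKUPS:
        m = markup["start_regex"].search(text, pos)
        if m is None or m.start() in jumped:
            # The whole markup type is dropped; later matches of the
            # same regex are never considered.
            continue
        if best is None or m.start() < best[1]:
            best = (markup["type"], m.start())
    return best


text = "{{outer|{{inner|date=x}}\n{{other}}}}"
# The outer "{{" at 0 has already been entered, so it is in `jumped`.
print(next_special_first_match_only(text, 0, {0}))  # ('heading', 20)
```

The nested `{{` at position 8 is never reported because the template regex was discarded after its first hit at position 0, so the `=` at position 20 wins.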
Concretely on Icaro: when the parser has just descended into the
outer {{multiple issues at substituted position 54, _jumped_elems = {0, 43, 54}. The first __get_next_special_element call inside
that new frame at pos=54:
| markup         | first match from pos=54                  | in jumped? | becomes candidate?       |
|----------------|------------------------------------------|------------|--------------------------|
| {{ template    | pos=54 (the multi-issues {{ itself)      | yes        | no (entire type skipped) |
| (=+\|;) heading | pos=100 (the = inside \|date=April 2016) | no         | yes                      |
| (WIKICOLORLB)+ | pos=113 (newline after inner }})         | no         | yes (but later)          |
So next_special_elem becomes the = at pos=100 — deep inside
the inner {{more citations needed}} body. The parser never
notices {{more citations needed at pos=72 (the nested template
open) and never descends into it.
Cascading effect:
- Multi-issues frame iterates tokens 9-17 (multiple, issues, |, {{,
  more, citations, needed, |, date). For each,
  next_special.start=100 < token.end is false, so no descent.
- At token-18 = (end=101), descends into the heading-marker = at
  pos=100. Single markup, no_jump=True. Consumes the =. Returns at
  pos=101.
- Re-derives next_special from pos=101; now finds linebreak at
  pos=113.
- Continues. Writes tokens 19 (april), 20 (2016), 21 (}}, the inner
  }} of more-citations-needed). Cursor → 113.
- At token-22 {{ (end=126, the original-research open),
  special_elem_end.end=113 < token.end=126 triggers. The
  multi-issues frame returns at pos=113, treating the inner }} as
  its own end.
- Top-level frame resumes at pos=113. Descends into WIKICOLORLB
  (linebreak), then into {{original research…}} normally.
- After original-research exits at pos=161, top-level processes
  token-31 }} (the outer multi-issues close) as a regular token at
  top-level with add_spans=True, wrapping it in a
  <span class="editor-token …" id="token-31">…</span>. That's the
  bleed.
Why the no-newline (Curzon) case works: same find_next_special
returns a wrong markup at first (an = inside the inner template),
but after that markup's recursion exits at pos=14, the next
find_next_special_markup from pos=14 correctly finds {{ at
pos=15 (NOT in jumped_elems, NOT shadowed by an earlier {{
match). The parser then descends into the inner template normally.
The newline case fails because after the = recursion exits at
pos=101, the cursor has already passed the inner template's open
at pos=72, so it's lost forever.
To fix upstream
__get_next_special_element needs to find the first match that is not
in _jumped_elems, rather than taking the first match and rejecting
the whole markup type when that match is jumped. The minimal change
would be using re.finditer and skipping jumped positions:
    for special_markup in SPECIAL_MARKUPS:
        # Walk matches from the cursor; stop at the first un-jumped one.
        for m in special_markup['start_regex'].finditer(self.wiki_text, self._wiki_text_pos):
            if m.start() not in self._jumped_elems:
                found_markup = {'str': m.group(), 'start': m.start()}
                break
        else:
            # for/else: no un-jumped match for this markup type.
            continue
        if not next_ or next_['start'] > found_markup['start']:
            next_ = special_markup
            next_['start'] = found_markup['start']
            ...
This would make the Icaro case work, and shouldn't regress the
no-newline case since the existing first-match behavior is
preserved for any markup whose first match isn't jumped.
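The proposed lookup can be checked on a toy input before touching the real parser. The sketch below is self-contained and only models the selection step: MARKUPS is a hypothetical two-entry stand-in for SPECIAL_MARKUPS, and the function name and sample text are invented for illustration.

```python
import re

# Hypothetical stand-ins for two SPECIAL_MARKUPS entries.
MARKUPS = [
    {"type": "template", "start_regex": re.compile(r"\{\{")},
    {"type": "heading", "start_regex": re.compile(r"=+|;")},
]


def next_special_skip_jumped(text, pos, jumped):
    """Pick the earliest match per markup type that is not jumped."""
    best = None
    for markup in MARKUPS:
        # Pattern.finditer accepts a start position, like Pattern.search.
        m = next(
            (m for m in markup["start_regex"].finditer(text, pos)
             if m.start() not in jumped),
            None,
        )
        if m is None:
            continue
        if best is None or m.start() < best[1]:
            best = (markup["type"], m.start())
    return best


text = "{{outer|{{inner|date=x}}\n{{other}}}}"
# Outer "{{" at 0 is jumped; the nested "{{" at 8 is now found first.
print(next_special_skip_jumped(text, 0, {0}))    # ('template', 8)
# Nothing jumped: first-match behavior is unchanged.
print(next_special_skip_jumped(text, 0, set()))  # ('template', 0)
```

Running the real parity corpus against a patched upstream parser would still be needed to confirm that token IDs do not shift elsewhere.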
Why we're not fixing it here
Per CLAUDE.md:
parity-or-die. Diverging from upstream silently is exactly the
class of change the parity corpus is designed to catch — except
this case is a divergence we'd be introducing intentionally to
fix a bug that all consumers tolerate today. Not worth the risk of
shifting token IDs or breaking a consumer that relies on the
current behavior, unless an actual consumer surfaces a problem.
Right escalation if we ever want to fix it: contribute upstream to
wikimedia/WhoColor (or the
current canonical home) and let the fix flow back.
Reproduction artifacts
- /tmp/icaro.wt (8,100 bytes): captured wikitext
- /tmp/icaro_tokens.json (2,144 tokens): token list from algorithm
- scripts/icaro_trace.py: runs the Python upstream parser on the
  captured input and shows the bled span in the output
- scripts/icaro_compare_prod.py: fetches both production and our
  deployment to confirm byte-level parity
- scripts/icaro_run_python.py: minimal Curzon-vs-Icaro synthetic
  comparison
- scripts/icaro_trace_python.py: monkey-patched Python parser that
  logs every __parse_wiki_text / __get_next_special_element /
  __get_special_elem_end call in the multi-issues region
Follow-up notes
The current parity-suite (/tmp/whocolor_parity_suite.py) flags
Preview warning and unknown parameter whenever they appear in our
HTML, without checking whether prod's HTML contains them too. A
future tightening would flag only asymmetric warnings (present in
ours but not in prod's). That change is orthogonal to this issue.
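The asymmetric check could look roughly like the sketch below. It is an illustration only, not the suite's code: the function name is hypothetical, and the marker strings are the two named in this note.

```python
WARNING_MARKERS = ("Preview warning", "unknown parameter")


def asymmetric_warnings(ours_html, prod_html, markers=WARNING_MARKERS):
    """Return markers present in our HTML but absent from prod's."""
    return [m for m in markers if m in ours_html and m not in prod_html]


# Icaro-like case: both sides carry the warning, so nothing is flagged.
both = "<div>Preview warning: ... unknown parameter ...</div>"
print(asymmetric_warnings(both, both))  # []

# A real divergence: only our output carries the warning.
print(asymmetric_warnings(both, "<div>clean</div>"))
```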
The synthetic regression test
whocolor_wikitext::regression_tests::nested_template_inside_template_does_not_emit_spans
covers the no-newline Curzon-Ultimatum shape and continues to assert
that correctly. This issue is about the newline-separated variant
only.
crates/wikiwho-server/tests/icaro_repro.rs (gated #[ignore]) was
written assuming the bleed was a Rust port bug; its assertion
(span_count == 0) is wrong vs production behavior. Should be
deleted or rewritten to assert parity-with-production (1 span at
token-31) once we close this issue.