escape lone carriage return in _escape_char by alhudz · Pull Request #1462 · collective/icalendar

alhudz · 2026-06-16T06:15:02Z

Repro: add a SUMMARY (or any TEXT value) built from untrusted input that contains a lone \r, e.g. event.add("SUMMARY", "safe\rINJECTED:evil"), then call to_ical().
Expected: the carriage return is escaped to \n like other line breaks.
Actual: the raw \r survives into the content line (SUMMARY:safe\rINJECTED:evil), so a lenient consumer that treats a bare CR as a line break sees an injected property.
Cause: _escape_char escapes \r\n and \n but not a lone \r; the parameter escaper rfc_6868_escape already maps \r to ^n, the TEXT path did not.
Fix: escape a lone \r as \n after the existing replacements.
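A minimal sketch of the serialize-side idea, assuming a hypothetical `escape_text` helper that follows the TEXT rules discussed in this thread. It is not the body of icalendar's `_escape_char`. It also shows why the raw CR matters: Python's `str.splitlines`, standing in here for a lenient consumer, treats a bare `\r` as a line break.

```python
def escape_text(value: str) -> str:
    """Escape a TEXT value (hypothetical helper, not icalendar's API)."""
    # Escape backslashes first so the escapes added below are not doubled.
    value = value.replace("\\", "\\\\")
    value = value.replace(";", "\\;").replace(",", "\\,")
    # Normalize CRLF first, then any lone CR left over, to LF.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Emit each line break as the two-character escape: backslash + "n".
    return value.replace("\n", "\\n")


untrusted = "safe\rINJECTED:evil"

# Unescaped: a consumer that splits on any line boundary sees two lines.
raw_line = "SUMMARY:" + untrusted
print(raw_line.splitlines())  # ['SUMMARY:safe', 'INJECTED:evil']

# Escaped: the value stays on a single content line.
safe_line = "SUMMARY:" + escape_text(untrusted)
print(safe_line.splitlines())  # ['SUMMARY:safe\\nINJECTED:evil']
```

The CRLF replacement has to run before the lone-CR replacement; otherwise `\r\n` would become two line breaks.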

The only fixture that round-tripped a raw CR used non-standard \r\r\n line endings; corrected it to \r\n.

📚 Documentation preview 📚: https://icalendar--1462.org.readthedocs.build/en/1462/

read-the-docs-community · 2026-06-16T06:16:03Z

Documentation build overview

📚 icalendar | 🛠️ Build #33229923 | 📁 Comparing 3686e75 against latest (5ad270b)

🔍 Preview build

2 files changed

± 404.html
± _modules/icalendar/parser/string.html

stevepiercy

This looks good, with only minor tweaks. Would you please take care of them? Thank you!

stevepiercy · 2026-06-16T09:58:14Z

@@ -0,0 +1 @@
+Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz


icalendar uses American English spelling and style, per https://icalendar.readthedocs.io/en/stable/contribute/documentation/style-guide.html#spelling-and-grammar

Suggested change

Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz

Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialized content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz

Done, switched it to serialized.

stevepiercy · 2026-06-16T10:01:15Z

+    ],
+)
+def test_escape_char_escapes_lone_carriage_return(value, expected):
+    """A lone ``\\r`` must be escaped, not left raw in the content line."""


Use r whenever you escape characters in a docstring, and adjust content to minimize escaping.

Suggested change

"""A lone ``\\r`` must be escaped, not left raw in the content line."""

r"""A lone ``\r`` must be escaped, not left raw in the content line."""

Done, made it a raw r""" docstring so the backslash-r doesn't need doubling.
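The raw-docstring suggestion is purely cosmetic: both literals produce the same string, a backslash followed by `r`, never an actual carriage return. The function names below are illustrative only.

```python
def with_doubled_backslash():
    """A lone ``\\r`` must be escaped, not left raw in the content line."""


def with_raw_docstring():
    r"""A lone ``\r`` must be escaped, not left raw in the content line."""


# Both docstrings hold a backslash followed by "r", not a CR character.
print(with_doubled_backslash.__doc__ == with_raw_docstring.__doc__)  # True
print("\r" in with_raw_docstring.__doc__)  # False
```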

niccokunzmann · 2026-06-16T12:20:36Z

Does RFC 5545 mention escaping of these or is it making them invalid as input?
I think the 6868 refers to parameters only? Could you help me understand that in these contexts?

alhudz · 2026-06-16T15:22:45Z

RFC 5545 doesn't give a bare \r its own escape. The TEXT type (§3.3.11) defines ESCAPED-CHAR = "\\" / "\;" / "\," / "\N" / "\n", so the only way a line break appears inside a value is the \n/\N escape. A literal CR is a control character (CONTROL = %x00-08 / %x0A-1F / %x7F, and CR is %x0D), and TSAFE-CHAR is everything except those controls. So a raw \r in a TEXT value is invalid input rather than something with an escape form.

You're right that 6868 is parameters only (caret encoding for parameter values), so it doesn't govern TEXT. I only mentioned it as the precedent that the parameter path already drops the raw \r; the TEXT path didn't.

On the to_ical side this is a serialisation question, not validation: _escape_char already normalises \r\n and \n to the \n escape instead of rejecting, so escaping a lone \r the same way keeps that behaviour and guarantees we never emit a raw control char into the stream. The alternative would be to raise on the invalid input, but that's a much bigger behaviour change for a path that currently sanitises.
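For comparison, the "raise on invalid input" alternative could look roughly like the sketch below. `check_text_value` and `is_control` are hypothetical names; the control range comes from the `CONTROL` rule quoted above (`%x00-08 / %x0A-1F / %x7F`, every control except HTAB).

```python
def is_control(ch: str) -> bool:
    """True for RFC 5545 CONTROL: all C0 controls except HTAB, plus DEL."""
    code = ord(ch)
    return code <= 0x08 or 0x0A <= code <= 0x1F or code == 0x7F


def check_text_value(value: str) -> str:
    """Reject raw control characters instead of normalizing them (hypothetical)."""
    for index, ch in enumerate(value):
        if is_control(ch):
            raise ValueError(f"control character {ch!r} at index {index} in TEXT value")
    return value
```

A strict check like this also rejects a raw `\n`, which `_escape_char` currently converts to the `\n` escape. That is part of why raising would be the bigger behavior change.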

angatha · 2026-06-16T18:39:30Z

OK, you added another "nice to have" case that, instead of writing an invalid sequence, interprets '\r' as a newline, which is escaped as r'\n'. Sounds good to me.

vText('\n').to_ical().hex() # '5c6e'
vText('\r').to_ical().hex() # now also '5c6e' instead of '0d', which would be invalid for TEXT
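For reference, `5c6e` is the two bytes of a backslash and a lowercase `n`, while `0d` is a single raw CR. A quick check with only the standard library:

```python
escaped_newline = b"\\n"  # backslash + "n", the TEXT line-break escape
raw_cr = b"\r"            # a bare carriage return, a CONTROL character

print(escaped_newline.hex())  # 5c6e
print(raw_cr.hex())           # 0d
print(bytes.fromhex("5c6e").decode("ascii"))  # prints \n as two characters
```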

alhudz · 2026-06-17T07:13:49Z

Right, that's exactly it. A lone \r now normalises to the \n escape the same way \r\n and \n already did, so a TEXT value never carries a raw 0d into the stream.

stevepiercy · 2026-06-17T09:11:31Z

I want to get clarity on the RFC before we go further.

From RFC 5545, § 3.3.11, Text:

       TSAFE-CHAR = WSP / %x21 / %x23-2B / %x2D-39 / %x3C-5B /
                    %x5D-7E / NON-US-ASCII
          ; Any character except CONTROLs not needed by the current
          ; character set, DQUOTE, ";", ":", "\", ","

%x0D is not WSP (see below), it's not within those hexadecimal ranges, and it's not NON-US-ASCII.

It is defined as CONTROL, per RFC 5545, § 3.1, Content Lines.

     CONTROL       = %x00-08 / %x0A-1F / %x7F
     ; All the controls except HTAB

%x0D is not needed by the current character set, or any character set, AFAIK, which is consistent with the syntax for TSAFE-CHAR (text-safe character).

For a definition of WSP, see RFC 5234, Appendix B.1.

         WSP            =  SP / HTAB
                                ; white space

AFAICT, RFC 5545 and its subsequent updates don't mention how to treat a lone CR or its equivalents such as %x0D. Therefore I'd say that no treatment is required and it should be left as is.

Although it may seem reasonable to convert \r to \n, I'm afraid it wouldn't be compliant with the RFC. We could ask for clarity by writing the standards group.
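The TSAFE-CHAR ranges quoted above translate directly into a character test. In the sketch below, `is_tsafe_char` is a hypothetical name, and NON-US-ASCII is simplified to "any code point above U+007F", which glosses over the UTF-8 byte rules. It confirms that `%x0D` falls outside every alternative. Of the printable characters it excludes, `;`, `,` and `\` are reachable in TEXT only through ESCAPED-CHAR, while `:` and DQUOTE are added back directly by the TEXT rule.

```python
# TSAFE-CHAR alternatives from RFC 5545 section 3.3.11, as (low, high) code points.
TSAFE_RANGES = (
    (0x09, 0x09),  # HTAB (part of WSP)
    (0x20, 0x20),  # SP (part of WSP)
    (0x21, 0x21),
    (0x23, 0x2B),
    (0x2D, 0x39),
    (0x3C, 0x5B),
    (0x5D, 0x7E),
)


def is_tsafe_char(ch: str) -> bool:
    """Return True if ch matches TSAFE-CHAR (hypothetical helper)."""
    code = ord(ch)
    if code > 0x7F:  # NON-US-ASCII, simplified to "any non-ASCII code point"
        return True
    return any(low <= code <= high for low, high in TSAFE_RANGES)


# Printable ASCII characters that TSAFE-CHAR excludes.
excluded = [chr(c) for c in range(0x20, 0x7F) if not is_tsafe_char(chr(c))]
print(excluded)  # ['"', ',', ':', ';', '\\']
print(is_tsafe_char("\r"), is_tsafe_char("\t"))  # False True
```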

@alhudz did you use AI to generate any part of this issue, including its description and your subsequent comments? If so, then you must follow our Responsible AI use policy. Please let us know. Thank you!

angatha · 2026-06-17T09:26:10Z

What is critical is that this affects a round trip from iCalendar to a Python object and back to iCalendar. Invalid iCalendar containing '\r' without a following '\n' now becomes valid, but the same is already true for one with the sequence '\r\n'.
I see no problem in making the Python interface for creating vText instances forgiving, as it already is for '\r\n'.
However, what does the spec say about parsing invalid input?

stevepiercy · 2026-06-17T10:18:47Z

@angatha thanks for talking this through with me. I think I'm OK with replacing a lone \r with \n as long as we make the distinction between RFC specification and Python implementation. We blur that line (break). hahaha! For example:

icalendar/src/icalendar/parser/string.py

Lines 9 to 32 in 24b7db5

    
    def _escape_char(text: str | bytes) -> str:
        r"""Format value according to iCalendar TEXT escaping rules.

        Escapes special characters in text values according to :rfc:`5545#section-3.3.11`
        rules.

        The order of replacements matters to avoid double-escaping.

        Parameters:
            text: The text to escape.

        Returns:
            The escaped text with special characters escaped.

        Note:
            The replacement order is critical:

            1. ``\N`` -> ``\n`` (normalize newlines to lowercase)
            2. ``\`` -> ``\\`` (escape backslashes)
            3. ``;`` -> ``\;`` (escape semicolons)
            4. ``,`` -> ``\,`` (escape commas)
            5. ``\r\n`` -> ``\n`` (normalize line endings)
            6. ``"\n"`` -> ``r"\n"`` (transform a newline character to a literal, or raw,
               newline character)
        """

Step 5 is a Python implementation not explicitly specified in the RFC. However, in the Description section of RFC 5545, § 3.3.11, Text:

      An intentional formatted text line break MUST only be included in
      a "TEXT" property value by representing the line break with the
      character sequence of BACKSLASH, followed by a LATIN SMALL LETTER
      N or a LATIN CAPITAL LETTER N, that is "\n" or "\N".

I read that as meaning \r is implicitly "an intentional formatted text line break."

Here's the only reference I found for how to handle invalid property values, assuming that's what you mean by invalid input.

https://datatracker.ietf.org/doc/html/rfc7529.html#section-4.1

niccokunzmann · 2026-06-17T11:04:19Z

Thanks for all the research. We have several issues in that sense.

1. \r, \r\n and \n should all become \n in the result of vText.to_ical.
2. In parameters the intention seems to be the same; there we have RFC 6868. Does it mention \r?
3. vText to vText comparison is another matter. We can assume, though, that \r will never occur inside a vText and replace it accordingly, so it is never in the content; see point 1.

It is good not to forget these points; link this PR in the changelog. It may also be good to open an issue so the independent edits can be checked off.
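A sketch of why point 3 follows from point 1: if serialization maps every line-ending form to the `\n` escape, parsing can only ever produce LF, so a value that started with a CR compares equal to its LF form after a round trip. Both helpers are hypothetical and handle only backslashes and line breaks; decoding accepts both `\n` and `\N`, as RFC 5545 § 3.3.11 allows.

```python
import re


def escape_breaks(value: str) -> str:
    """Serialize side: escape backslashes, then every line-ending form."""
    value = value.replace("\\", "\\\\")
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\n", "\\n")


def unescape_breaks(value: str) -> str:
    r"""Parse side: decode \n or \N to LF and \\ to a single backslash."""
    return re.sub(r"\\([nN\\])", lambda m: "\n" if m.group(1) in "nN" else "\\", value)


for original in ("a\rb", "a\r\nb", "a\nb"):
    print(repr(unescape_breaks(escape_breaks(original))))  # 'a\nb' each time
```

The round trip is lossy by design: the original CR is not recoverable, which is acceptable only under the assumption in point 3.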

RFC 6868:

the character sequence ^n (U+005E, U+006E) is decoded into an
appropriate formatted line break according to the type of system
being used

As a result, we should test different outcomes based on the platform (new issue) and also that \r, \r\n and \n are converted to ^n.

That is something to keep track of.
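A sketch of the parameter-side behavior to test, using hypothetical helpers rather than icalendar's `rfc_6868_escape`. RFC 6868 encodes `^` as `^^`, a line break as `^n`, and DQUOTE as `^'`; other caret sequences are left as they are. Passing `newline` explicitly makes the platform-dependent decoding testable on any OS.

```python
import os
import re


def caret_escape(value: str) -> str:
    """Encode a parameter value per RFC 6868 (hypothetical helper)."""
    value = value.replace("^", "^^")  # escape the caret itself first
    value = value.replace('"', "^'")
    # Every line-ending form becomes ^n.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\n", "^n")


def caret_unescape(value: str, newline: str = os.linesep) -> str:
    """Decode ^n, ^^ and ^'; ^n becomes the platform line break by default."""
    mapping = {"n": newline, "^": "^", "'": '"'}
    return re.sub(r"\^([n^'])", lambda m: mapping[m.group(1)], value)


for value in ("a\rb", "a\r\nb", "a\nb"):
    print(caret_escape(value))  # a^nb each time
print(repr(caret_unescape("a^nb", newline="\r\n")))  # 'a\r\nb'
```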

alhudz · 2026-06-18T12:29:29Z

On the AI question: I lean on one a bit for research and to double-check my RFC reading, but the analysis and the patch here are mine.

On the change: I pushed a docstring note in _escape_char to keep the RFC vs implementation line clear (per @stevepiercy). Steps 5 to 7 are line-ending normalisation, an implementation convenience, since RFC 5545 only gives \n/\N an escape and never a bare \r. Also added the PR link to the news fragment.

@niccokunzmann on your points:

1. Done here. \r, \r\n and \n all serialise to the \n escape in vText.to_ical.
2. RFC 6868's encoding table only names the formatted line break (NL) as ^n; it doesn't single out a bare \r. But rfc_6868_escape already maps \r, \r\n and \n all to ^n, so the parameter path already does this. There's just no test pinning it, and ^n decodes back to os.linesep, so that round trip is platform dependent.
3. Agreed, vText to vText comparison is separate; with point 1 a raw \r never reaches the content anyway.

This PR only touches the serialise direction (to_ical), not parsing, so the invalid-input round trip @angatha raised is out of scope here.

Opened #1477 to track the parameter tests, the platform-dependent round trip, and point 3 so the independent edits can be checked off.

niccokunzmann · 2026-06-18T14:20:09Z

Thanks! Can you have a look at the failing fuzz tests? Please ask for help. It might mean that:

1. a new test needs adding to fix the fuzz error (I think how to do this is in the docs), or
2. the fuzz testing needs changing because the error message changed.

gaoflow · 2026-06-19T19:23:31Z

I traced the CIFuzz failure. The failing calendar is the base64 line immediately before "libFuzzer: fuzz target exited":

QkVHSU46VlRJTUVaT05FClRaSUQ6UwwMDQwMDAwLCkVORDpWVElNRVpPTkUK

That decodes to an invalid VTIMEZONE containing a bare \r inside the TZID line:

b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"

It reaches dateutil.tz.tzical, which raises ValueError: not enough values to unpack (expected 2, got 1). This looks like a valid fuzz rejection rather than a parser behavior change needed in this PR.
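The decoding is easy to reproduce with the standard library. `bytes.splitlines` honors a bare CR, so splitting the input that way leaves a fragment of the TZID line with no colon in it. This only illustrates the split; it makes no claim about how dateutil reads the lines internally.

```python
import base64

encoded = "QkVHSU46VlRJTUVaT05FClRaSUQ6UwwMDQwMDAwLCkVORDpWVElNRVpPTkUK"
decoded = base64.b64decode(encoded)

print(decoded)
# b'BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n'

# bytes.splitlines splits on \n, \r and \r\n only.
for line in decoded.splitlines():
    print(line, b":" in line)
```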

I do not have push access to the fork, but this minimal patch made the local fuzz regression pass:

diff --git a/src/icalendar/tests/fuzzed/__init__.py b/src/icalendar/tests/fuzzed/__init__.py
@@
     "Invalid month:",
     "must have exactly",  # vCard field count validation (ADR, N)
     "must have at least",  # vCard ORG minimum field validation
+    "not enough values to unpack",  # dateutil rejects malformed VTIMEZONE lines
 ]

diff --git a/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py b/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
@@
 def test_fuzz_v1(fuzz_v1_calendar_path):
@@
         should_walk=True,
     )
+
+
+def test_fuzz_v1_malformed_vtimezone_linebreak():
+    """Malformed VTIMEZONE content lines are valid fuzzing rejections."""
+    calendar = b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"
+    fuzz_v1_calendar(
+        icalendar.cal.calendar.Calendar.from_ical,
+        calendar,
+        multiple=True,
+        should_walk=True,
+    )

Verified locally on pr-1462:

PYTHONPATH=src python -m pytest -q src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# 3 passed in 0.06s

python -m ruff check src/icalendar/tests/fuzzed/__init__.py src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# All checks passed!

python -m ruff format --check src/icalendar/tests/fuzzed/__init__.py src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# 2 files already formatted

git diff --check
# passed

alhudz · 2026-06-20T16:44:00Z

Traced it, and it's exactly the "error message changed" case you flagged, not something unrelated.

The fuzz input is a garbage VTIMEZONE with a bare \r in the TZID:

BEGIN:VTIMEZONE
TZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b
END:VTIMEZONE

dateutil's tzical rejects it both before and after this PR, but the message differs:

before: ValueError: at least one component is needed (matched the allowlist via "component")
after: ValueError: not enough values to unpack (expected 2, got 1)

With the lone \r now escaped, it no longer splits the TZID line for dateutil, so it parses a bit further and trips on a different malformed line. Both are valid rejections of invalid input; the new substring just wasn't in the allowlist.

Followed the fuzz-testing docs: added fuzz_testcase_vtimezone_lone_cr.ics with the decoded input and added "not enough values to unpack" to _value_error_matches. src/icalendar/tests/fuzzed/ passes locally (3 cases).
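The allowlist amounts to a substring match on the error message. `ALLOWED_SUBSTRINGS` and `is_expected_rejection` below are hypothetical stand-ins for `_value_error_matches` and its list, not a copy of them; the entries are the ones mentioned in this thread.

```python
ALLOWED_SUBSTRINGS = (
    "component",
    "Invalid month:",
    "must have exactly",
    "must have at least",
    "not enough values to unpack",  # the entry this PR adds
)


def is_expected_rejection(error: ValueError) -> bool:
    """True if the message contains any allowlisted substring (hypothetical)."""
    message = str(error)
    return any(fragment in message for fragment in ALLOWED_SUBSTRINGS)


before = ValueError("at least one component is needed")
after = ValueError("not enough values to unpack (expected 2, got 1)")
print(is_expected_rejection(before), is_expected_rejection(after))  # True True
```

Substring matching keeps the allowlist robust to small wording changes, at the cost of occasionally accepting an unrelated error that happens to share a fragment such as "component".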

Thanks @gaoflow, that lines up with what I found.

escape lone carriage return in _escape_char

4a2315d

alhudz requested review from SashankBhamidi, angatha, niccokunzmann and stevepiercy as code owners June 16, 2026 06:15

stevepiercy requested changes Jun 16, 2026


use American spelling in news and raw docstring in test

14d8ca6

stevepiercy mentioned this pull request Jun 17, 2026

reject CR and LF in vUri, vCalAddress and vInline values #1468

Open

4 tasks

alhudz mentioned this pull request Jun 17, 2026

Consolidate the handling of lone \r and platform-dependent line breaks #1471

Open

11 tasks

note RFC vs implementation in _escape_char and link PR in news

33a5796

alhudz mentioned this pull request Jun 18, 2026

Track consistent CR/LF normalisation and tests for TEXT and parameter values #1477

Open

4 tasks

add fuzz test case for malformed VTIMEZONE content line

3686e75


5 participants
