
escape lone carriage return in _escape_char#1462

Open
alhudz wants to merge 4 commits into collective:main from alhudz:text-escape-lone-cr

Conversation

@alhudz

@alhudz alhudz commented Jun 16, 2026

Contributor

Repro: add a SUMMARY (or any TEXT value) built from untrusted input that contains a lone \r, e.g. event.add("SUMMARY", "safe\rINJECTED:evil"), then call to_ical().
Expected: the carriage return is escaped to \n like other line breaks.
Actual: the raw \r survives into the content line (SUMMARY:safe\rINJECTED:evil), so a lenient consumer that treats a bare CR as a line break sees an injected property.
Cause: _escape_char escapes \r\n and \n but not a lone \r; the parameter escaper rfc_6868_escape already maps \r to ^n, the TEXT path did not.
Fix: escape a lone \r as \n after the existing replacements.
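
The fix can be sketched outside the library as follows. This is a minimal illustration, assuming a hypothetical helper named `escape_text_sketch`; it is not the code of `_escape_char` and only mirrors the behavior described above.

```python
def escape_text_sketch(text: str) -> str:
    """Hypothetical stand-in for a TEXT escaper; not the library's code."""
    return (
        text.replace("\\", "\\\\")  # escape backslashes first to avoid double-escaping
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\r\n", "\\n")  # CRLF must be handled before the single-character cases
        .replace("\n", "\\n")
        .replace("\r", "\\n")  # the lone CR case this PR adds
    )


escaped = escape_text_sketch("safe\rINJECTED:evil")
assert "\r" not in escaped  # no raw control character survives
print(escaped)  # safe\nINJECTED:evil
```

Handling the lone `\r` last means a `\r\n` pair still collapses to a single `\n` escape instead of two.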

The only fixture that round-tripped a raw CR used non-standard \r\r\n line endings; corrected it to \r\n.


📚 Documentation preview 📚: https://icalendar--1462.org.readthedocs.build/en/1462/

@read-the-docs-community

read-the-docs-community Bot commented Jun 16, 2026


Documentation build overview

📚 icalendar | 🛠️ Build #33229923 | 📁 Comparing 3686e75 against latest (5ad270b)

  🔍 Preview build  

2 files changed
± 404.html
± _modules/icalendar/parser/string.html

@stevepiercy stevepiercy left a comment

Member


This looks good, with only a few minor tweaks. Would you please take care of them? Thank you!

@@ -0,0 +1 @@
Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz

Member


icalendar uses American English spelling and style, per https://icalendar.readthedocs.io/en/stable/contribute/documentation/style-guide.html#spelling-and-grammar

Suggested change
Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz
Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialized content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz

Contributor Author


Done, switched it to serialized.

Comment thread src/icalendar/tests/test_parsing.py Outdated
],
)
def test_escape_char_escapes_lone_carriage_return(value, expected):
"""A lone ``\\r`` must be escaped, not left raw in the content line."""

Member


Use r whenever you escape characters in a docstring, and adjust content to minimize escaping.

Suggested change
"""A lone ``\\r`` must be escaped, not left raw in the content line."""
r"""A lone ``\r`` must be escaped, not left raw in the content line."""

Contributor Author


Done, made it a raw r""" docstring so the backslash-r doesn't need doubling.

@niccokunzmann

Member

Does RFC 5545 mention escaping of these or is it making them invalid as input?
I think the 6868 refers to parameters only? Could you help me understand that in these contexts?

@alhudz

alhudz commented Jun 16, 2026

Contributor Author

RFC 5545 doesn't give a bare \r its own escape. The TEXT type (§3.3.11) defines ESCAPED-CHAR = "\\" / "\;" / "\," / "\N" / "\n", so the only way a line break appears inside a value is the \n/\N escape. A literal CR is a control character (CONTROL = %x00-08 / %x0A-1F / %x7F, and CR is %x0D), and TSAFE-CHAR is everything except those controls. So a raw \r in a TEXT value is invalid input rather than something with an escape form.

You're right that 6868 is parameters only (caret encoding for parameter values), so it doesn't govern TEXT. I only mentioned it as the precedent that the parameter path already drops the raw \r; the TEXT path didn't.

On the to_ical side this is a serialisation question, not validation: _escape_char already normalises \r\n and \n to the \n escape instead of rejecting, so escaping a lone \r the same way keeps that behaviour and guarantees we never emit a raw control char into the stream. The alternative would be to raise on the invalid input, but that's a much bigger behaviour change for a path that currently sanitises.

@angatha

angatha commented Jun 16, 2026

Collaborator

Ok, you added another "nice to have" case that, instead of writing an invalid sequence, interprets '\r' as a newline, which is escaped as r'\n'. Sounds good to me.

vText('\n').to_ical().hex() # '5c6e'
vText('\r').to_ical().hex() # now also '5c6e' instead of '0d', which would be invalid for TEXT

@alhudz

alhudz commented Jun 17, 2026

Contributor Author

Right, that's exactly it. A lone \r now normalises to the \n escape the same way \r\n and \n already did, so a TEXT value never carries a raw 0d into the stream.

@stevepiercy

stevepiercy commented Jun 17, 2026

Member

I want to get clarity on the RFC before we go further.

From RFC 5545, § 3.3.11, Text:

       TSAFE-CHAR = WSP / %x21 / %x23-2B / %x2D-39 / %x3C-5B /
                    %x5D-7E / NON-US-ASCII
          ; Any character except CONTROLs not needed by the current
          ; character set, DQUOTE, ";", ":", "\", ","

%x0D is not WSP (see below), it's not within those hexadecimal ranges, and it's not NON-US-ASCII.

It is defined as CONTROL, per RFC 5545, § 3.1, Content Lines.

     CONTROL       = %x00-08 / %x0A-1F / %x7F
     ; All the controls except HTAB

%x0D is not needed by the current character set, or any character set, AFAIK, which is consistent with the syntax for TSAFE-CHAR (text-safe character).

For a definition of WSP, see RFC 5234, Appendix B.1.

         WSP            =  SP / HTAB
                                ; white space

AFAICT, RFC 5545 and its subsequent updates don't mention how to treat a lone CR or its equivalents such as %x0D. Therefore I'd say that no treatment is required and it should be left as is.
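
The grammar quoted above can be turned into a quick membership check. This is a rough sketch, assuming hypothetical helper names `is_control` and `is_tsafe_char`, and approximating NON-US-ASCII as any code point at or above U+0080, which is looser than the RFC's UTF-8 based definition.

```python
def is_control(code_point: int) -> bool:
    """CONTROL = %x00-08 / %x0A-1F / %x7F (all the controls except HTAB)."""
    return (
        0x00 <= code_point <= 0x08
        or 0x0A <= code_point <= 0x1F
        or code_point == 0x7F
    )


def is_tsafe_char(char: str) -> bool:
    """TSAFE-CHAR per the ranges quoted from RFC 5545, section 3.3.11."""
    code_point = ord(char)
    return (
        char in " \t"  # WSP = SP / HTAB
        or code_point == 0x21
        or 0x23 <= code_point <= 0x2B
        or 0x2D <= code_point <= 0x39
        or 0x3C <= code_point <= 0x5B
        or 0x5D <= code_point <= 0x7E
        or code_point >= 0x80  # assumption: stands in for NON-US-ASCII
    )


assert is_control(0x0D)  # CR is a CONTROL
assert not is_tsafe_char("\r")  # and therefore not a text-safe character
```

Under this reading, `\t` passes because it is WSP, while `\r` fails every alternative in the TSAFE-CHAR rule.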

Although it may seem reasonable to convert \r to \n, I'm afraid it wouldn't be compliant with the RFC. We could ask for clarity by writing the standards group.

@alhudz did you use AI to generate any part of this issue, including its description and your subsequent comments? If so, then you must follow our Responsible AI use policy. Please let us know. Thank you!

@angatha

angatha commented Jun 17, 2026

Collaborator

Critical is that this influences a round trip from ical to Python object back to ical. Invalid ical containing '\r' without a following '\n' now becomes valid, but the same is true for one with the sequence '\r\n'.
I see no problem in making the Python interface for creating vText instances forgiving, as it is already for '\r\n'.
However, what does the spec say about parsing invalid input?

@stevepiercy

Member

@angatha thanks for talking this through with me. I think I'm OK with replacing a lone \r with \n as long as we make the distinction between RFC specification and Python implementation. We blur that line (break). hahaha! For example:

def _escape_char(text: str | bytes) -> str:
    r"""Format value according to iCalendar TEXT escaping rules.

    Escapes special characters in text values according to :rfc:`5545#section-3.3.11`
    rules.
    The order of replacements matters to avoid double-escaping.

    Parameters:
        text: The text to escape.

    Returns:
        The escaped text with special characters escaped.

    Note:
        The replacement order is critical:

        1. ``\N`` -> ``\n`` (normalize newlines to lowercase)
        2. ``\`` -> ``\\`` (escape backslashes)
        3. ``;`` -> ``\;`` (escape semicolons)
        4. ``,`` -> ``\,`` (escape commas)
        5. ``\r\n`` -> ``\n`` (normalize line endings)
        6. ``"\n"`` -> ``r"\n"`` (transform a newline character to a literal, or raw,
           newline character)
    """

Step 5 is a Python implementation not explicitly specified in the RFC. However, in the Description section of RFC 5545, § 3.3.11, Text:

      An intentional formatted text line break MUST only be included in
      a "TEXT" property value by representing the line break with the
      character sequence of BACKSLASH, followed by a LATIN SMALL LETTER
      N or a LATIN CAPITAL LETTER N, that is "\n" or "\N".

I read that as meaning \r is implicitly "an intentional formatted text line break."

Here's the only reference I found for how to handle invalid property values, assuming that's what you mean by invalid input.

https://datatracker.ietf.org/doc/html/rfc7529.html#section-4.1

@niccokunzmann

Member

Thanks for all the research. We have several issues in that sense.

  1. \r, \r\n and \n should all be \n in the result of vText.to_ical
  2. in the parameters, it seems that this is the same intention, we have the RFC 6868. Does that mention \r?
  3. vText to vText comparison is another matter. We can assume though that \r will never occur inside of vText and replace it accordingly, so it is never in the content, see point 1.

It is good not to forget these points. Please link this PR in the changelog. It may also be good to open an issue, so that the independent edits can be checked off.

RFC 6868:

the character sequence ^n (U+005E, U+006E) is decoded into an
appropriate formatted line break according to the type of system
being used

As a result, we should test different outcomes based on the platform (new issue) and also that \r, \r\n and \n are converted to ^n.

That is something to keep track of.

@alhudz

alhudz commented Jun 18, 2026

Contributor Author

On the AI question: I lean on one a bit for research and to double-check my RFC reading, but the analysis and the patch here are mine.

On the change: I pushed a docstring note in _escape_char to keep the RFC vs implementation line clear (per @stevepiercy). Steps 5 to 7 are line-ending normalisation, an implementation convenience, since RFC 5545 only gives \n/\N an escape and never a bare \r. Also added the PR link to the news fragment.

@niccokunzmann on your points:

  1. Done here. \r, \r\n and \n all serialise to the \n escape in vText.to_ical.
  2. RFC 6868's encoding table only names the formatted line break (NL) as ^n; it doesn't single out a bare \r. But rfc_6868_escape already maps \r, \r\n and \n all to ^n, so the parameter path already does this. There's just no test pinning it, and ^n decodes back to os.linesep, so that round trip is platform dependent.
  3. Agreed, vText to vText comparison is separate; with point 1 a raw \r never reaches the content anyway.
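
The parameter path in point 2 can be sketched like this. It is an illustration only, assuming hypothetical helpers `caret_encode` and `caret_decode` that follow the RFC 6868 caret sequences (`^^`, `^n`, `^'`); it is not the code of `rfc_6868_escape`.

```python
import os
import re


def caret_encode(value: str) -> str:
    """Hypothetical RFC 6868 style encoder for parameter values."""
    return (
        value.replace("^", "^^")  # escape the caret itself first
        .replace("\r\n", "^n")  # every line-break flavor maps to ^n
        .replace("\n", "^n")
        .replace("\r", "^n")
        .replace('"', "^'")
    )


def caret_decode(value: str) -> str:
    """Decode in a single pass so that ``^^n`` is not misread as ``^`` + ``^n``."""
    mapping = {"^": "^", "n": os.linesep, "'": '"'}
    return re.sub(r"\^([\^n'])", lambda match: mapping[match.group(1)], value)


assert caret_encode("a\rb") == caret_encode("a\r\nb") == caret_encode("a\nb") == "a^nb"
# The decoded line break depends on the platform, so the round trip is not exact.
assert caret_decode("a^nb") == "a" + os.linesep + "b"
```

Because decoding yields `os.linesep`, a value containing `\r` does not round-trip to itself on any platform, and one containing `\n` round-trips only where `os.linesep` is `"\n"`.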

This PR only touches the serialise direction (to_ical), not parsing, so the invalid-input round trip @angatha raised is out of scope here.

Opened #1477 to track the parameter tests, the platform-dependent round trip, and point 3 so the independent edits can be checked off.

@niccokunzmann

Member

Thanks! Can you have a look at the failing fuzz tests? Please ask for help - it might mean that

  • a new test needs adding to fix the fuzz error. I think how to do this is in the docs
  • or the fuzz testing needs changing because the error message changed.

@gaoflow

gaoflow commented Jun 19, 2026

Contributor

I traced the CIFuzz failure. The failing calendar is the base64 line immediately before libFuzzer: fuzz target exited:

QkVHSU46VlRJTUVaT05FClRaSUQ6UwwMDQwMDAwLCkVORDpWVElNRVpPTkUK

That decodes to an invalid VTIMEZONE containing a bare \r inside the TZID line:

b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"

It reaches dateutil.tz.tzical, which raises ValueError: not enough values to unpack (expected 2, got 1). This looks like a valid fuzz rejection rather than a parser behavior change needed in this PR.
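
The decoding step can be reproduced with the standard library alone. This short sketch only checks that the base64 line from the fuzz log corresponds to the bytes shown above; it does not call icalendar or dateutil.

```python
import base64

# The base64 line copied from the CIFuzz log above.
encoded = "QkVHSU46VlRJTUVaT05FClRaSUQ6UwwMDQwMDAwLCkVORDpWVElNRVpPTkUK"
decoded = base64.b64decode(encoded)

expected = b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"
assert decoded == expected

# The bare CR sits inside the TZID line, not at a line ending.
tzid_line = decoded.split(b"\n")[1]
assert tzid_line.startswith(b"TZID:") and b"\r" in tzid_line
```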

I do not have push access to the fork, but this minimal patch made the local fuzz regression pass:

diff --git a/src/icalendar/tests/fuzzed/__init__.py b/src/icalendar/tests/fuzzed/__init__.py
@@
     "Invalid month:",
     "must have exactly",  # vCard field count validation (ADR, N)
     "must have at least",  # vCard ORG minimum field validation
+    "not enough values to unpack",  # dateutil rejects malformed VTIMEZONE lines
 ]

diff --git a/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py b/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
@@
 def test_fuzz_v1(fuzz_v1_calendar_path):
@@
         should_walk=True,
     )
+
+
+def test_fuzz_v1_malformed_vtimezone_linebreak():
+    """Malformed VTIMEZONE content lines are valid fuzzing rejections."""
+    calendar = b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"
+    fuzz_v1_calendar(
+        icalendar.cal.calendar.Calendar.from_ical,
+        calendar,
+        multiple=True,
+        should_walk=True,
+    )

Verified locally on pr-1462:

PYTHONPATH=src python -m pytest -q src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# 3 passed in 0.06s

python -m ruff check src/icalendar/tests/fuzzed/__init__.py src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# All checks passed!

python -m ruff format --check src/icalendar/tests/fuzzed/__init__.py src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
# 2 files already formatted

git diff --check
# passed

@alhudz

alhudz commented Jun 20, 2026

Contributor Author

Traced it, and it's exactly the "error message changed" case you flagged, not something unrelated.

The fuzz input is a garbage VTIMEZONE with a bare \r in the TZID:

BEGIN:VTIMEZONE
TZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b
END:VTIMEZONE

dateutil's tzical rejects it both before and after this PR, but the message differs:

  • before: ValueError: at least one component is needed (matched the allowlist via "component")
  • after: ValueError: not enough values to unpack (expected 2, got 1)

With the lone \r now escaped, it no longer splits the TZID line for dateutil, so it parses a bit further and trips on a different malformed line. Both are valid rejections of invalid input; the new substring just wasn't in the allowlist.

Followed the fuzz-testing docs: added fuzz_testcase_vtimezone_lone_cr.ics with the decoded input and added "not enough values to unpack" to _value_error_matches. src/icalendar/tests/fuzzed/ passes locally (3 cases).
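
The allowlist mechanism described here amounts to substring matching on the error message. A minimal sketch, assuming a hypothetical list and helper (`ALLOWED_VALUE_ERROR_SUBSTRINGS`, `is_expected_rejection`) rather than the actual `_value_error_matches` contents:

```python
# Hypothetical allowlist; the real list in the test suite has more entries.
ALLOWED_VALUE_ERROR_SUBSTRINGS = [
    "component",  # matched the message raised before this PR
    "not enough values to unpack",  # matches the message raised after this PR
]


def is_expected_rejection(error: Exception) -> bool:
    """Treat a ValueError as a valid fuzz rejection if its message is allowlisted."""
    return isinstance(error, ValueError) and any(
        needle in str(error) for needle in ALLOWED_VALUE_ERROR_SUBSTRINGS
    )


before = ValueError("at least one component is needed")
after = ValueError("not enough values to unpack (expected 2, got 1)")
assert is_expected_rejection(before)
assert is_expected_rejection(after)
```

Matching on substrings keeps the allowlist short, at the cost that any change in where parsing fails can surface a message that is not yet listed, which is what happened here.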

Thanks @gaoflow, that lines up with what I found.



5 participants