escape lone carriage return in _escape_char#1462
Conversation
stevepiercy
left a comment
There was a problem hiding this comment.
This looks good, with only a minor tweaks. Would you please take care of them? Thank you!
| @@ -0,0 +1 @@ | |||
| Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz | |||
There was a problem hiding this comment.
icalendar uses American English spelling and style, per https://icalendar.readthedocs.io/en/stable/contribute/documentation/style-guide.html#spelling-and-grammar
| Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialised content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz | |
| Escape a lone ``\r`` as ``\n`` in :func:`icalendar.parser.string._escape_char`, used by :meth:`vText.to_ical <icalendar.prop.text.vText.to_ical>`. A carriage return not followed by a line feed was previously left raw in the serialized content line, so a ``SUMMARY`` or ``DESCRIPTION`` built from untrusted text could carry a control character into the iCalendar stream and split the line for lenient consumers. ``\r\n`` and ``\n`` were already escaped, and the parameter escaper already mapped ``\r`` to ``^n``. @alhudz |
There was a problem hiding this comment.
Done, switched it to serialized.
| ], | ||
| ) | ||
| def test_escape_char_escapes_lone_carriage_return(value, expected): | ||
| """A lone ``\\r`` must be escaped, not left raw in the content line.""" |
There was a problem hiding this comment.
Use r whenever you escape characters in a docstring, and adjust content to minimize escaping.
| """A lone ``\\r`` must be escaped, not left raw in the content line.""" | |
| r"""A lone ``\r`` must be escaped, not left raw in the content line.""" |
There was a problem hiding this comment.
Done, made it a raw r""" docstring so the backslash-r doesn't need doubling.
|
Does RFC 5545 mention escaping of these or is it making them invalid as input? |
|
You're right that On the |
|
Ok, you added another "nive to have" case that will instead of writing an invalid sequence interpret vText('\n').to_ical().hex() # '5c6e'
vText('\r').to_ical().hex() # now also '5c6e' instead of '0d', whcih would be invalid for TEXT |
|
Right, that's exactly it. A lone |
|
I want to get clarity on the RFC before we go further. From RFC 5545, § 3.3.11, Text:
It is defined as
For a definition of WSP, see RFC 5234, Appendix B.1. AFAICT, RFC 5545 and its subsequent updates don't mention how to treat a lone Although it may seem reasonable to convert @alhudz did you use AI to generate any part of this issue, including its description and your subsequent comments? If so, then you must follow our Responsible AI use policy. Please let us know. Thank you! |
|
Critical is that this influences a round trip from ical to python object back to python. Invalid ical containing |
|
@angatha thanks for talking this through with me. I think I'm OK with replacing a lone icalendar/src/icalendar/parser/string.py Lines 9 to 32 in 24b7db5 Step 5 is a Python implementation not explicitly specified in the RFC. However, in the Description section of RFC 5545, § 3.3.11, Text: I read that as meaning Here's the only reference I found for how to handle invalid property values, assuming that's what you mean by invalid input. https://datatracker.ietf.org/doc/html/rfc7529.html#section-4.1 |
|
Thanks for all the research. We have several issues in that sense.
It is good not to forget these points,linkk this PR in the changelog. Maybe good to open an issue, to allow the checkmarking of independent edits. RFC 6868:
As a result, we should test different outcomes based on the platform (new issue) and also that \r, \r\n and \n are converted to ^n. That is something to keep track of. |
|
On the AI question: I lean on one a bit for research and to double-check my RFC reading, but the analysis and the patch here are mine. On the change: I pushed a docstring note in @niccokunzmann on your points:
This PR only touches the serialise direction ( Opened #1477 to track the parameter tests, the platform-dependent round trip, and point 3 so the independent edits can be checked off. |
|
Thanks! Can you have a look at the failing fuzz tests? Please ask for help - it might mean that
|
|
I traced the CIFuzz failure. The failing calendar is the base64 line immediately before That decodes to an invalid b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"It reaches I do not have push access to the fork, but this minimal patch made the local fuzz regression pass: diff --git a/src/icalendar/tests/fuzzed/__init__.py b/src/icalendar/tests/fuzzed/__init__.py
@@
"Invalid month:",
"must have exactly", # vCard field count validation (ADR, N)
"must have at least", # vCard ORG minimum field validation
+ "not enough values to unpack", # dateutil rejects malformed VTIMEZONE lines
]
diff --git a/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py b/src/icalendar/tests/fuzzed/test_fuzzed_calendars.py
@@
def test_fuzz_v1(fuzz_v1_calendar_path):
@@
should_walk=True,
)
+
+
+def test_fuzz_v1_malformed_vtimezone_linebreak():
+ """Malformed VTIMEZONE content lines are valid fuzzing rejections."""
+ calendar = b"BEGIN:VTIMEZONE\nTZID:S\x0c\x0c\r\x0c\x0c\x0c\x0c\x0b\nEND:VTIMEZONE\n"
+ fuzz_v1_calendar(
+ icalendar.cal.calendar.Calendar.from_ical,
+ calendar,
+ multiple=True,
+ should_walk=True,
+ )Verified locally on |
|
Traced it, and it's exactly the "error message changed" case you flagged, not something unrelated. The fuzz input is a garbage
With the lone Followed the fuzz-testing docs: added Thanks @gaoflow, that lines up with what I found. |
Repro: add a
SUMMARY(or anyTEXTvalue) built from untrusted input that contains a lone\r, e.g.event.add("SUMMARY", "safe\rINJECTED:evil"), then callto_ical().Expected: the carriage return is escaped to
\nlike other line breaks.Actual: the raw
\rsurvives into the content line (SUMMARY:safe\rINJECTED:evil), so a lenient consumer that treats a bare CR as a line break sees an injected property.Cause:
_escape_charescapes\r\nand\nbut not a lone\r; the parameter escaperrfc_6868_escapealready maps\rto^n, theTEXTpath did not.Fix: escape a lone
\ras\nafter the existing replacements.The only fixture that round-tripped a raw CR used non-standard
\r\r\nline endings; corrected it to\r\n.📚 Documentation preview 📚: https://icalendar--1462.org.readthedocs.build/en/1462/