Unclear if BytesText may contain entities or not #915

@main--

As of #766 it seems that BytesText is not supposed to contain entity references any more, as those are instead returned as BytesRef events. However, the documentation on BytesText::decode still says:

quick-xml/src/events/mod.rs

Lines 583 to 584 in 655691c

/// This will allocate if the value contains any escape sequences or in
/// non-UTF-8 encoding.

implying that the function performs unescaping. This is not true, however: BytesText::decode has not unescaped since #766 and never allocates in UTF-8 mode.

It also seems that BytesText::new wasn't updated accordingly:

quick-xml/src/events/mod.rs

Lines 549 to 554 in 655691c

/// Creates a new `BytesText` from a string. The string is expected not to
/// be escaped.
#[inline]
pub fn new(content: &'a str) -> Self {
    Self::from_escaped(escape(content))
}

The problem can be demonstrated like this:

assert_eq!("A & B", BytesText::new("A & B").decode().unwrap());

The assertion fails because BytesText::new escapes the ampersand to &amp; and decode does not unescape it again. A possible workaround would be to use BytesText::from_escaped instead of new, but passing an unescaped ampersand to a function that explicitly calls for escaped data is arguably an API violation.
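To make the asymmetry concrete without depending on the crate, here is a minimal std-only sketch. `TextModel`, `escape_model` and `round_trip` are hypothetical stand-ins invented for illustration, not quick-xml's actual types or code. The model only assumes what is described above: `new` escapes its input, `from_escaped` stores it verbatim, and `decode` returns the stored content without unescaping.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an XML escape function: replaces the five
/// predefined entities and borrows the input when nothing needs escaping.
fn escape_model(raw: &str) -> Cow<'_, str> {
    if !raw.contains(|c: char| matches!(c, '&' | '<' | '>' | '"' | '\'')) {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}

/// Hypothetical model of a text event (not the real `BytesText`).
struct TextModel<'a> {
    content: Cow<'a, str>,
}

impl<'a> TextModel<'a> {
    /// Mirrors the quoted `new`: escapes, then stores the escaped form.
    fn new(content: &'a str) -> Self {
        Self::from_escaped(escape_model(content))
    }

    /// Stores the content verbatim; the caller promises it is already escaped.
    fn from_escaped(content: Cow<'a, str>) -> Self {
        Self { content }
    }

    /// Models the post-#766 behaviour described above: no unescaping.
    fn decode(&self) -> Cow<'_, str> {
        Cow::Borrowed(&*self.content)
    }
}

/// Builds a text event from `input` and decodes it straight back.
fn round_trip(input: &str) -> String {
    TextModel::new(input).decode().into_owned()
}
```

Under this model, `round_trip("A & B")` yields `A &amp; B` rather than the original string, while `TextModel::from_escaped(Cow::Borrowed("A & B")).decode()` returns `A & B` only because the "already escaped" contract was bypassed.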

I think one can reasonably expect that creating a BytesText from a string and then decoding it back into a string should always return exactly the original string. If it doesn't, then the documentation should at least clearly point out why this is not the case and how the user is expected to deal with this.