Conversation

@Mingun
Collaborator

@Mingun Mingun commented Jul 26, 2025

Because we support both XML and HTML parsing, and the rules for EOL normalization differ between them, this PR introduces two new methods for BytesText, BytesCData and BytesRef in addition to decode:

  • xml_content()
  • html_content()

XML rules: https://www.w3.org/TR/xml11/#sec-line-ends
HTML rules: https://infra.spec.whatwg.org/#normalize-newlines
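
As a rough illustration of how the two rule sets linked above differ, here is a minimal sketch written from the spec text, not taken from this PR. The names `normalize_xml_eols_sketch` and `normalize_html_eols_sketch` are hypothetical, and both operate on already-decoded `&str` and always allocate; the actual implementation may well differ.

```rust
/// Hypothetical sketch of XML 1.1 end-of-line handling:
/// `\r\n`, `\r\u{85}`, a lone `\r`, `\u{85}` and `\u{2028}` all become `\n`.
fn normalize_xml_eols_sketch(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(ch) = chars.next() {
        match ch {
            '\r' => {
                // Swallow the second character of a two-character line end.
                if matches!(chars.peek(), Some(&'\n') | Some(&'\u{85}')) {
                    chars.next();
                }
                out.push('\n');
            }
            '\u{85}' | '\u{2028}' => out.push('\n'),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical sketch of the HTML (Infra) "normalize newlines" algorithm:
/// replace every `\r\n` pair with `\n`, then every remaining `\r` with `\n`.
fn normalize_html_eols_sketch(text: &str) -> String {
    text.replace("\r\n", "\n").replace('\r', "\n")
}
```

The visible difference is that U+0085 and U+2028 are translated to `\n` under the XML 1.1 rules but pass through unchanged under the HTML rules.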

The new methods do not apply attribute value normalization; that is left for #379.

Closes #806 (when using xml_content())

…ne Handling" section of XML 1.1 spec

Also implement EOL normalization for HTML as described in "normalize newlines" section of HTML spec

https://www.w3.org/TR/xml11/#sec-line-ends
https://infra.spec.whatwg.org/#normalize-newlines
@Mingun Mingun added the enhancement and serde (Issues related to mapping from Rust types to XML) labels Jul 26, 2025
@Mingun Mingun requested a review from dralley July 26, 2025 21:45
src/encoding.rs Outdated
match bytes {
    Cow::Borrowed(bytes) => {
        let text = self.decode(bytes)?;
        match normalize_html_eols(&text) {
Collaborator

@dralley dralley Jul 29, 2025

If the normalization function is the only difference between html_content() and xml_content(), as appears to be the case, then you could avoid duplicating the function body by writing a single utility function and passing in the normalization function as an argument.

Collaborator Author

Good idea, will do.
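
The suggestion in this thread could look roughly like the following. This is only a sketch under stated assumptions, not the code that was merged: `content_with`, `xml_eols`, `html_eols` and the `*_sketch` wrappers are hypothetical names, plain UTF-8 decoding stands in for `self.decode`, and the input is a borrowed byte slice.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical XML 1.1 normalizer; borrows when nothing needs to change.
fn xml_eols(text: &str) -> Cow<'_, str> {
    let is_eol = |c: char| matches!(c, '\r' | '\u{85}' | '\u{2028}');
    if !text.contains(is_eol) {
        return Cow::Borrowed(text);
    }
    // Two-character line ends first, then the remaining single characters.
    Cow::Owned(
        text.replace("\r\n", "\n")
            .replace("\r\u{85}", "\n")
            .replace(is_eol, "\n"),
    )
}

/// Hypothetical HTML normalizer; borrows when nothing needs to change.
fn html_eols(text: &str) -> Cow<'_, str> {
    if !text.contains('\r') {
        return Cow::Borrowed(text);
    }
    Cow::Owned(text.replace("\r\n", "\n").replace('\r', "\n"))
}

/// Shared body: decode once, then apply whichever normalizer was passed in.
/// Assumption: UTF-8 decoding stands in for the crate's own `decode`.
fn content_with<'a, F>(bytes: &'a [u8], normalize: F) -> Result<Cow<'a, str>, Utf8Error>
where
    F: for<'t> Fn(&'t str) -> Cow<'t, str>,
{
    let text = std::str::from_utf8(bytes)?;
    Ok(normalize(text))
}

/// The two public entry points then differ only in the argument they pass.
fn xml_content_sketch(bytes: &[u8]) -> Result<Cow<'_, str>, Utf8Error> {
    content_with(bytes, xml_eols)
}

fn html_content_sketch(bytes: &[u8]) -> Result<Cow<'_, str>, Utf8Error> {
    content_with(bytes, html_eols)
}
```

Passing the normalizer as a generic parameter keeps the call statically dispatched, so the shared helper should not add overhead compared with two hand-written bodies.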

Mingun added 2 commits July 30, 2025 00:45
…not in attributes yet)

Use `xml_content` instead of `decode` in serde deserializer and tests
@Mingun Mingun force-pushed the eol-normalization branch from 085c142 to 38b44d4 Compare July 29, 2025 19:47
@Mingun Mingun merged commit 6cf8c9f into tafia:master Jul 29, 2025
7 checks passed
@Mingun Mingun deleted the eol-normalization branch July 29, 2025 19:57
@codecov-commenter

Codecov Report

❌ Patch coverage is 84.79532% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.52%. Comparing base (254fbd2) to head (38b44d4).
⚠️ Report is 58 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/events/mod.rs | 31.57% | 13 Missing ⚠️ |
| src/escape.rs | 93.70% | 8 Missing ⚠️ |
| benches/macrobenches.rs | 0.00% | 4 Missing ⚠️ |
| benches/microbenches.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #884      +/-   ##
==========================================
- Coverage   60.74%   55.52%   -5.22%     
==========================================
  Files          41       42       +1     
  Lines       16044    15511     -533     
==========================================
- Hits         9746     8613    -1133     
- Misses       6298     6898     +600     
| Flag | Coverage Δ |
|---|---|
| unittests | 55.52% <84.79%> (-5.22%) ⬇️ |


Development

Successfully merging this pull request may close these issues.

End-of-Line Handling seems not confront to xml specification

3 participants