
Support for non-UTF encodings in XML parser #332

@armanbilge


Very excited about the enhanced XML support in 1.4.0 :) I've been experimenting with it in http4s/http4s-scala-xml#25 and have been running into trouble with non-UTF encodings. FTR, I'm no expert in these things :)

For example, consider this request, whose Content-Type header carries no charset parameter:

Content-Type: application/xml

<?xml version="1.0" encoding="iso-8859-1"?><hello name="Günther"/>

as used in this test:
https://github.com/http4s/http4s-scala-xml/blob/1ca64f2ab7ef500d384d2ec5f8caf88df600e6a6/scala-xml/src/test/scala/org/http4s/scalaxml/ScalaXmlSuite.scala#L198-L209
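To make the problem concrete, here is a small Scala sketch that uses only the JDK (no fs2-data or http4s APIs; `Latin1Demo` is a hypothetical name). It shows what happens if the ISO-8859-1 bytes of this body are decoded as UTF-8: `ü` is the single byte 0xFC, which is not valid in UTF-8, so the JDK substitutes the replacement character U+FFFD. This only illustrates the encoding mismatch; it is not the code path exercised by the linked test.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical demo object, for illustration only.
object Latin1Demo {
  val body: String =
    """<?xml version="1.0" encoding="iso-8859-1"?><hello name="Günther"/>"""

  // What a sender honoring the declaration puts on the wire:
  // every char of this body fits in one ISO-8859-1 byte.
  val wireBytes: Array[Byte] = body.getBytes(StandardCharsets.ISO_8859_1)

  // Wrong: assume UTF-8 because no charset parameter was sent.
  // The byte 0xFC ('ü' in ISO-8859-1) is invalid UTF-8, so the JDK
  // replaces it with U+FFFD and "Günther" is lost.
  val decodedAsUtf8: String = new String(wireBytes, StandardCharsets.UTF_8)

  // Right: decode with the encoding named in the XML declaration.
  val decodedAsLatin1: String = new String(wireBytes, StandardCharsets.ISO_8859_1)
}
```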

Furthermore, RFC 7303 (section 8.3) specifies:

Since the charset parameter is not provided in the Content-Type header and there is no overriding BOM, conformant XML processors must treat the "iso-8859-1" encoding as authoritative. Conformant XML-unaware MIME processors should make no assumptions about the character encoding of the XML MIME entity.

https://datatracker.ietf.org/doc/html/rfc7303#section-8.3
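The precedence described in the quoted passage can be sketched in Scala with only the JDK. This is an illustration under stated assumptions, not fs2-data's API: `XmlCharsetSniffer` and its methods are hypothetical names; the order (BOM, then the `charset` parameter, then the XML declaration, then a UTF-8 default) is an assumed reading of RFC 7303 that is consistent with the "no overriding BOM" wording above; and the declaration sniffing only handles ASCII-compatible encodings (UTF-16 without a BOM is not covered). The point it illustrates is that only a short prefix of raw bytes has to be inspected before the rest can be decoded to chars.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

// Hypothetical helper for illustration only; not part of fs2-data or http4s.
object XmlCharsetSniffer {
  // Encoding pseudo-attribute of an XML declaration at the very start of the input.
  private val EncodingDecl =
    """^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""".r

  // Byte at index i as an unsigned value, or -1 past the end of the array.
  private def byteAt(bytes: Array[Byte], i: Int): Int =
    if (i < bytes.length) bytes(i) & 0xff else -1

  // Assumed step 1: a byte order mark, if present, wins over everything else.
  def fromBom(bytes: Array[Byte]): Option[Charset] =
    (byteAt(bytes, 0), byteAt(bytes, 1), byteAt(bytes, 2)) match {
      case (0xef, 0xbb, 0xbf) => Some(StandardCharsets.UTF_8)
      case (0xfe, 0xff, _)    => Some(StandardCharsets.UTF_16BE)
      case (0xff, 0xfe, _)    => Some(StandardCharsets.UTF_16LE)
      case _                  => None
    }

  // Assumed step 3: read the declared encoding from the raw bytes. Decoding the
  // prefix as ISO-8859-1 maps each byte to exactly one char, so this only works
  // for ASCII-compatible encodings.
  def fromDeclaration(bytes: Array[Byte]): Option[Charset] = {
    val prefix =
      new String(bytes, 0, math.min(bytes.length, 512), StandardCharsets.ISO_8859_1)
    EncodingDecl
      .findFirstMatchIn(prefix)
      .flatMap(m => Try(Charset.forName(m.group(1))).toOption)
  }

  // BOM, then Content-Type charset (step 2), then declaration, then UTF-8.
  def resolve(contentTypeCharset: Option[String], bytes: Array[Byte]): Charset =
    fromBom(bytes)
      .orElse(contentTypeCharset.flatMap(name => Try(Charset.forName(name)).toOption))
      .orElse(fromDeclaration(bytes))
      .getOrElse(StandardCharsets.UTF_8)
}
```

For the request above, `resolve(None, bytes)` would pick ISO-8859-1 from the declaration, after which the body could be decoded and handed to a character-based parser. This is one possible direction, not a settled design.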

I'm not sure whether there is a way to support this without an XML parser that operates directly on bytes instead of chars/strings 😕 Any thoughts? Thanks!
