Implement charset detection from the first 1024 bytes of the HTML #531

Open
@krichprollsch

Description

charset

In browser.zig, when the document is HTML, we should try to determine the charset from a meta tag in the first 1024 bytes of the document.

The meta element can be used, and the charset attribute is preferred [html5:0]. If there is no HTTP declaration or BOM, a meta element must be used [html5:14]. Any meta declaration must use an ascii-compatible encoding [html5:14] [html5:16]. The implication of this is that UTF-16 encoded pages must not use a meta declaration. Any meta declaration must fit in the first 1024 bytes of page [html5:12] [html5:23].
https://www.w3.org/International/articles/spec-summaries/encoding
https://www.w3.org/International/questions/qa-html-encoding-declarations
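
To make the rule above concrete, here is a minimal sketch in Python of scanning only the first 1024 bytes for a meta charset declaration. It is an illustration, not the browser.zig implementation: the names `sniff_meta_charset` and `PRESCAN_LIMIT` are hypothetical, and the regex matching is a deliberate simplification of the HTML prescan (it does not skip comments, and it does not handle a BOM).

```python
import re
from typing import Optional

# Hypothetical constant: only this many leading bytes are inspected.
PRESCAN_LIMIT = 1024

# Simplified patterns (assumption: good enough for a sketch, not spec-complete).
_META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
_CHARSET = re.compile(rb"""charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE)


def sniff_meta_charset(body: bytes) -> Optional[str]:
    """Return the lowercased charset named by a meta tag, or None if none is found."""
    head = body[:PRESCAN_LIMIT]
    # Only complete meta tags inside the window match, so a declaration
    # that does not fit in the first 1024 bytes is ignored.
    for tag in _META_TAG.finditer(head):
        match = _CHARSET.search(tag.group(0))
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

The same pattern covers both `<meta charset="utf-8">` and the older `<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">` form, because the declaration is ASCII-compatible and can be matched on raw bytes before any decoding.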

If we find no charset in a meta tag, we should fall back to mime.charset, and finally default to UTF-8.
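
The fallback order can be sketched as a small function. This is a Python illustration under assumptions: `resolve_charset` and its parameters are hypothetical names, with `meta_charset` standing for whatever the meta tag scan produced and `mime_charset` standing for the value of mime.charset.

```python
from typing import Optional

# Final fallback named in the issue.
DEFAULT_CHARSET = "utf-8"


def resolve_charset(meta_charset: Optional[str], mime_charset: Optional[str]) -> str:
    """Pick a charset: meta tag first, then mime.charset, then UTF-8."""
    for candidate in (meta_charset, mime_charset):
        # Treat None, empty, and whitespace-only values as "not found".
        name = (candidate or "").strip().lower()
        if name:
            return name
    return DEFAULT_CHARSET
```

For example, `resolve_charset(None, "Windows-1252")` returns `"windows-1252"`, and `resolve_charset(None, None)` returns `"utf-8"`.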
