wrong metadata charset tags #301

Open
@yuval-herman

Description

First, thank you for working on this library! It's a big help!

I was using your library to scrape some old sites with node-fetch and came across a strange issue.
While scraping one site in particular, I got this error from iconv:

Error: Encoding not recognized: 'visual' (searched as: 'visual')

This was caused by the following tag in one of the subframes of the page (which I also scrape):

<meta http-equiv="Content-Type" content="text/html; charset=visual">

A solution would be to check whether the charset in the meta tag's content attribute holds garbage data (that is, a value that is not a recognized encoding) before committing to it.
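The check suggested above could look roughly like the sketch below. It is not the library's code: `resolveCharset` and `charsetFromMetaContent` are hypothetical helper names, and the sketch assumes a runtime with a global `TextDecoder` (such as current Node.js), whose constructor throws a `RangeError` for labels it does not support. One caveat: the WHATWG Encoding Standard lists `visual` as a label for ISO-8859-8, so a decoder that follows that standard may accept this particular value rather than reject it.

```javascript
// Hypothetical helper, not part of fetch-charset-detection or iconv-lite.
// Returns the runtime's canonical name for a charset label, or the fallback
// when the label is missing or not recognized.
function resolveCharset(label, fallback = "utf-8") {
	if (typeof label !== "string" || label.trim() === "") return fallback
	try {
		// TextDecoder throws a RangeError for labels it does not support.
		return new TextDecoder(label.trim()).encoding
	} catch {
		return fallback
	}
}

// Hypothetical helper: extract the charset value from the content attribute
// of a <meta http-equiv="Content-Type"> tag. Returns null if none is present.
function charsetFromMetaContent(content) {
	const match = /charset\s*=\s*["']?([^"';\s]+)/i.exec(content)
	return match ? match[1] : null
}

const label = charsetFromMetaContent("text/html; charset=visual")
console.log(label) // "visual"
console.log(resolveCharset("no-such-charset")) // unknown label, falls back to "utf-8"
```

Validating the label before handing it to the converter turns an unrecognized value into a fallback instead of an exception. Whether UTF-8 is the right fallback for a given site is a separate judgment call.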

Sample code to reproduce:

import fetch from "node-fetch"
import convertBody from "fetch-charset-detection"

fetch(
	"https://www.gov.il/apps/elections/Elections-knesset-15/heb/banner.html"
).then((res) =>
	res
		.arrayBuffer()
		.then((buf) => convertBody(buf))
		.then(console.log)
)

Metadata

    Labels

    help wanted (Extra attention is needed)
