warc-indexer needs file.encoding="UTF8" #311

@tokee

@trym-b discovered that the warc-indexer needs the environment file encoding to be UTF-8, in order to produce Solr documents with ... UTF-8 encoding.

This can be achieved by setting

JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"

before calling the warc-indexer JAR. This is only a workaround: the real solution is to explicitly set UTF-8 as the output encoding in the Java code wherever output is written. On a larger scale, using Forbidden APIs Checker would guard against variations of the problem, but experience says that enabling that check for a large project is a daunting task.
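A minimal sketch of what "explicitly set UTF-8" could look like in Java. The class and method names are hypothetical and not taken from the warc-indexer code base; the point is only that the charset is passed explicitly instead of relying on the `file.encoding` default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the actual warc-indexer code.
final class ExplicitUtf8Output {

    // Writes text as UTF-8 regardless of the JVM's default charset.
    // new OutputStreamWriter(out) without a charset would use the default instead.
    public static void writeDocument(OutputStream out, String content) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(content);
        writer.flush(); // push buffered characters through to the stream
    }

    // Same idea for byte conversion: content.getBytes() without an argument
    // would depend on the default charset.
    public static byte[] toUtf8Bytes(String content) {
        return content.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Unicode escapes keep the source file itself independent of its encoding.
        writeDocument(buffer, "bl\u00e5b\u00e6rgr\u00f8d");
        System.out.println(buffer.size() + " bytes written");
    }
}
```

The charset-less variants mentioned in the comments are the kind of calls that Forbidden APIs Checker is meant to flag.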
