@trym-b discovered that the warc-indexer needs the JVM's default file encoding (`file.encoding`, normally derived from the system locale) to be UTF-8 in order to produce Solr documents with ... UTF-8 encoding.
This can be worked around by setting `JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"` before calling the warc-indexer JAR, but the real solution is to explicitly set UTF-8 as the output encoding in the Java code wherever characters are converted to bytes (or back). On a larger scale, the Forbidden APIs Checker would guard against variations of the problem by failing the build on calls that silently rely on the platform default charset, but experience says that enabling that check for a large project is a daunting task.
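A minimal sketch of what "explicitly set UTF-8 as the output encoding" looks like in plain Java. The class and method names (`Utf8OutputSketch`, `toUtf8Bytes`) are hypothetical and not taken from the warc-indexer code base; the point is only that the `Writer` is given `StandardCharsets.UTF_8` instead of falling back to the default charset. The test string uses Unicode escapes so the example does not itself depend on the encoding `javac` uses to read the source file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not code from the warc-indexer.
public class Utf8OutputSketch {

    // Serialises text to bytes with an explicit UTF-8 encoding, so the result
    // does not depend on the JVM's default charset (file.encoding).
    public static byte[] toUtf8Bytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Passing the charset is the essential part: new OutputStreamWriter(buffer)
        // without it would use the platform default instead.
        try (Writer writer = new OutputStreamWriter(buffer, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrow unchecked for convenience.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // "\u00C6r\u00F8" is "Ærø": two 2-byte characters plus one ASCII byte in UTF-8.
        byte[] bytes = toUtf8Bytes("\u00C6r\u00F8");
        System.out.println(bytes.length); // prints 5
    }
}
```

For simple cases, `text.getBytes(StandardCharsets.UTF_8)` does the same in one call. Note that JDK 18 and later use UTF-8 as the default charset (JEP 400), so the explicit charset matters most on older runtimes, although it also documents the intended encoding.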