Read files with different encodings #278

@Jolanrensen

As was brought to our attention in Kotlin/dataframe#1555, it appears that Deephaven CSV decodes files using whatever the JVM's default encoding is.

This can produce unexpected results when the file you are reading uses a different encoding. For instance, say your JVM's default encoding is US-ASCII (this can be reproduced by adding JAVA_TOOL_OPTIONS=-Dfile.encoding=US-ASCII to the environment variables) and you're trying to parse a UTF-8 CSV like this one, scratch.csv:

Image

whereas if we set the default encoding of the JVM to UTF-8:

Image
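To make the mismatch concrete without the screenshots, here is a minimal JDK-only sketch. It does not call Deephaven CSV at all; the class name `EncodingMismatchDemo`, the helper `decode`, and the sample text are made up for illustration. It decodes the same UTF-8 bytes once as UTF-8 and once as US-ASCII, which is roughly what happens when the decoding charset follows the JVM default instead of the file's actual encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Deephaven CSV.
final class EncodingMismatchDemo {

    // Decodes raw bytes with an explicitly chosen charset.
    // new String(byte[], Charset) replaces undecodable bytes with U+FFFD
    // instead of throwing.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" written with an escape so the source file's own encoding does not matter.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
        String asAscii = decode(utf8Bytes, StandardCharsets.US_ASCII);

        // The UTF-8 decode round-trips; the US-ASCII decode does not,
        // because the two bytes encoding 'é' are not valid US-ASCII.
        System.out.println("UTF-8 decode round-trips:    " + asUtf8.equals("caf\u00e9"));
        System.out.println("US-ASCII decode round-trips: " + asAscii.equals("caf\u00e9"));
    }
}
```

The non-ASCII character is the only part that changes, which is why the problem is easy to miss on files that happen to be plain ASCII.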

Would it be possible to add a setting to CsvSpecs that lets you read a file in any given encoding, without having to change the environment variable?
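For comparison, this is the kind of behaviour being asked for, sketched with plain JDK calls only. It is not a proposal for the actual CsvSpecs API; `ExplicitCharsetRead` and `readText` are hypothetical names. The point is just that the caller passes the charset, so the result no longer depends on `file.encoding`.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Deephaven CSV.
final class ExplicitCharsetRead {

    // Reads the whole file and decodes it with the caller-supplied charset,
    // ignoring the JVM default encoding entirely.
    static String readText(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 CSV to a temporary file (sample content is made up).
        Path file = Files.createTempFile("scratch", ".csv");
        try {
            Files.write(file, "name\ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8));

            String text = readText(file, StandardCharsets.UTF_8);
            System.out.println("Decoded correctly: " + text.equals("name\ncaf\u00e9\n"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```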
