Read files with different encodings

As reported in https://github.com/Kotlin/dataframe/issues/1555, Deephaven CSV appears to decode files using whatever the JVM's default encoding is.

This can produce unexpected results when the file's encoding differs from the JVM default. For instance, say your JVM is running with US-ASCII as its default encoding (this can be reproduced by adding `JAVA_TOOL_OPTIONS=-Dfile.encoding=US-ASCII` to the environment variables) and you're trying to parse a UTF-8 CSV like [scratch.csv](https://github.com/user-attachments/files/23477708/scratch.csv). The result looks like this:

<img width="1565" height="499" alt="Image" src="https://github.com/user-attachments/assets/c56c331e-0856-421b-afc6-e8b2e3a8a077" />

By contrast, with the default encoding of the JVM set to UTF-8:

<img width="1565" height="499" alt="Image" src="https://github.com/user-attachments/assets/fec87254-b905-46b0-a221-58f4b6233eae" />
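The kind of mismatch I mean can be reproduced with the Java standard library alone. This is only a minimal sketch, not Deephaven CSV's code path: the class name `EncodingMismatchDemo` and the sample text are made up for illustration, and I'm assuming the effect is equivalent to decoding UTF-8 bytes with the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Deephaven CSV.
class EncodingMismatchDemo {
    // Decodes raw bytes with the given charset.
    // Bytes that are invalid in that charset are replaced with U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Zoë" written with a Unicode escape so the source file encoding does not matter.
        byte[] utf8Bytes = "Zo\u00eb".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips the text.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));

        // Decoding as US-ASCII turns each of the two non-ASCII bytes into U+FFFD.
        System.out.println(decode(utf8Bytes, StandardCharsets.US_ASCII));

        // What the JVM considers its default charset right now.
        System.out.println(Charset.defaultCharset());
    }
}
```

Running this with and without `JAVA_TOOL_OPTIONS=-Dfile.encoding=US-ASCII` changes only the last printed line, since the first two pass the charset explicitly.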

Would it be possible to add a setting to `CsvSpecs` that allows reading a file in any given encoding, without having to change the environment variable?
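For comparison, this is roughly the behavior I have in mind, sketched with plain JDK classes. It is not a proposal for the actual `CsvSpecs` API: `ExplicitCharsetRead` and `readLines` are hypothetical names, and the only point is that the charset is passed in by the caller instead of being taken from `file.encoding`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Deephaven CSV.
class ExplicitCharsetRead {
    // Reads all lines from the stream, decoding with the caller-supplied charset
    // rather than the JVM default.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With something like this, the same bytes decode the same way regardless of how the JVM was started.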

Read files with different encodings #278

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Read files with different encodings #278

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions