7904021: Parsing group files using non-UTF-8 encoding fails #267

psoujany · 2025-06-05T04:35:14Z

We use jtreg to run OpenJDK tests for the JDK 11/17/21 releases on platforms whose default charset is not UTF-8. The latest jtreg code uses Files.newBufferedReader(path) to read group file data (TEST.GROUPS) from OpenJDK via GroupManager (https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/GroupManager.java#L102C44-L102C61).

That overload always returns a BufferedReader that decodes the file as UTF-8. We see discrepancies when using this version of jtreg on platforms where Charset.defaultCharset() is not UTF-8 (JDK 11 and JDK 17).

Hence, we propose passing Charset.defaultCharset() explicitly: Files.newBufferedReader(Path, Charset) instead of Files.newBufferedReader(Path), and Files.readString(Path, Charset) instead of Files.readString(Path), in the following jtreg files:
https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/GroupManager.java#L102C44-L102C61
https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/ExtraPropDefns.java#L309

We have also tested this fix on OpenJDK-supported platforms such as Linux, Windows, and macOS.
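
For illustration, here is a minimal sketch of the difference between the two overloads. It is not the jtreg code itself: the class name GroupFileCharsetSketch and the sample file content are hypothetical, and ISO 8859-1 stands in for a non-UTF-8 platform default so that the result does not depend on the machine running it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class GroupFileCharsetSketch {

    /** Reads the first line using an explicitly supplied charset. */
    static String firstLine(Path file, Charset cs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            return reader.readLine();
        }
    }

    /** Returns true if the one-argument overload (always UTF-8) rejects the file. */
    static boolean failsAsUtf8(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            reader.readLine();
            return false;
        } catch (MalformedInputException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("TEST", ".groups");
        try {
            // Hypothetical group entry encoded as ISO 8859-1; byte 0xE9 is not valid UTF-8 here.
            byte[] latin1 = "caf\u00e9 = test/Foo.java".getBytes(StandardCharsets.ISO_8859_1);
            Files.write(file, latin1);

            System.out.println("UTF-8 overload fails: " + failsAsUtf8(file));
            System.out.println("Explicit charset:     " + firstLine(file, StandardCharsets.ISO_8859_1));
            // The PR passes Charset.defaultCharset() rather than a fixed charset.
            System.out.println("Platform default:     " + Charset.defaultCharset());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With Charset.defaultCharset() in place of the fixed charset, the same call follows whatever encoding the platform reports.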

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

CODETOOLS-7904021: Parsing group files using non-UTF-8 encoding fails (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jtreg.git pull/267/head:pull/267
$ git checkout pull/267

Update a local copy of the PR:
$ git checkout pull/267
$ git pull https://git.openjdk.org/jtreg.git pull/267/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 267

View PR using the GUI difftool:
$ git pr show -t 267

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jtreg/pull/267.diff

Using Webrev

Link to Webrev Comment

Reviewers

Christian Stein (@sormuras - Reviewer)

bridgekeeper · 2025-06-05T04:35:20Z

👋 Welcome back psoujany! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-06-05T04:35:43Z

@psoujany This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

7904021: Parsing group files using non-UTF-8 encoding fails

Reviewed-by: cstein

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 6 new commits pushed to the master branch:

64d4a06: 7904055: Update jtreg to bundle JUnit 5.13.3
4c1ef35: 7904053: Update version to 8
0a8cbb5: 7902847: Class directory of a test case should be always used to compile a library
... and 3 more: https://git.openjdk.org/jtreg/compare/666453dc24ce74ec081664c7d048721ad2a4da81...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@sormuras) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

mlbridge · 2025-06-05T04:48:13Z

Webrevs

00: Full (6f38fe91)

sormuras · 2025-06-09T20:47:59Z

Looking at the former implementation, without NIO API usage, the files read by those methods were assumed to be encoded in ISO 8859-1. Defaulting to UTF-8 was a breaking change, although storing files in UTF-8 is a common pattern nowadays.

Using Charset.defaultCharset() as proposed in the pull request introduces yet another behaviour. Maybe better? Another solution would be to revert to use the original ISO 8859-1. 🤔

sormuras · 2025-06-10T10:07:56Z

Please update the title of this PR to read: Parsing group files using non-UTF-8 encoding fails

Also, please fix the PR body so that it does not appear empty.

psoujany · 2025-06-10T11:03:02Z

@sormuras I've updated the PR title and body. Thank you.

sormuras · 2025-06-16T16:23:51Z

The two places changed in this PR aren't the only places in which jtreg reads input from files. Not touching those other places might lead to unexpected or divergent behaviour.

With https://openjdk.org/jeps/400 UTF-8 is the default charset of the standard Java APIs. Yes, that relates to Java 18+, but did you try to store those group files in UTF-8 encoding in your local environment?

Did you try passing file.encoding as a system property to the jtreg runtime? For example: jtreg -J-Dfile.encoding=ISO-8859-1 ...
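
To check what a given JVM actually resolves, a small probe such as the following can help. This is only a sketch: the class name DefaultCharsetProbe is hypothetical, and how a JDK treats a user-supplied file.encoding value differs between releases (JEP 400 describes the behaviour for JDK 18 and later).

```java
import java.nio.charset.Charset;

class DefaultCharsetProbe {

    /** Reports the file.encoding property next to the charset the JVM resolved. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once without options and once with -Dfile.encoding=ISO-8859-1 shows whether the property reaches Charset.defaultCharset() on that JDK. Note that Files.newBufferedReader(Path) is documented to decode as UTF-8, so it does not consult either value.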

psoujany · 2025-06-18T11:21:36Z

In our testing, changing these two places resolved our issue; we will check for other places where the change is required.

Yes, we tried keeping TEST.groups in UTF-8, but that required the .java files to be in UTF-8 as well. With the .java files in UTF-8, we encountered javac compilation issues due to the encoding mismatch on non-UTF-8 platforms.

We also tried passing file.encoding to jtreg but still got a MalformedInputException. Hence, we used Charset.defaultCharset().

sormuras · 2025-07-08T06:27:20Z

Right, using Charset.defaultCharset() would also enable support of file.encoding.

And before the conversion to use NIO, the Properties.load(InputStream) method was used. It has:

Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java Language Specification.

Since jtreg 7, with CODETOOLS-7903091 included, Properties.load(Reader) is called instead, and it receives the UTF-8 decoding reader returned by Files.newBufferedReader(file).

Thus, all-in-all, your change resolves a regression. In the light of that, I'll approve and sponsor this pull request.
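
The regression described in this comment can be sketched as follows. This is an illustration under stated assumptions rather than the jtreg code: the class name PropertiesLoadSketch and the sample property are hypothetical, and an InputStreamReader with a default UTF-8 decoder is used as a stand-in for the reader that Files.newBufferedReader(Path) returns, since both report malformed input instead of replacing it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesLoadSketch {

    /** Old path: Properties.load(InputStream) treats every byte as one Latin1 character. */
    static String loadAsLatin1Stream(byte[] data, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(data));
        return props.getProperty(key);
    }

    /** New path: Properties.load(Reader) fed by a reader that rejects malformed UTF-8. */
    static String loadWithStrictUtf8Reader(byte[] data, String key) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8.newDecoder())) {
            props.load(reader);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical group definition stored as ISO 8859-1.
        byte[] latin1 = "tier1 = caf\u00e9/Test.java\n".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("InputStream: " + loadAsLatin1Stream(latin1, "tier1"));
        try {
            loadWithStrictUtf8Reader(latin1, "tier1");
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 reader failed: " + e);
        }
    }
}
```

The same bytes load through the stream-based method and fail through the UTF-8 reader, which matches the MalformedInputException mentioned in the original PR title.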

psoujany · 2025-07-08T08:02:15Z

/integrate

openjdk · 2025-07-08T08:03:08Z

@psoujany
Your change (at version 6f38fe9) is now ready to be sponsored by a Committer.

sormuras · 2025-07-08T08:05:10Z

/sponsor

openjdk · 2025-07-08T08:05:53Z

Going to push as commit 439cb91.
Since your change was applied there have been 6 commits pushed to the master branch:

64d4a06: 7904055: Update jtreg to bundle JUnit 5.13.3
4c1ef35: 7904053: Update version to 8
0a8cbb5: 7902847: Class directory of a test case should be always used to compile a library
... and 3 more: https://git.openjdk.org/jtreg/compare/666453dc24ce74ec081664c7d048721ad2a4da81...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-07-08T08:06:01Z

@sormuras @psoujany Pushed as commit 439cb91.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

psoujany · 2025-09-10T13:53:39Z

Hi @sormuras, I would like this change to be available in jtreg 7.3.1, which is the minimum jtreg version for JDK 11 and 17. Could you please help me get this fix included in the jtreg 7.3.1 binaries? Thank you.

psoujany added 2 commits May 7, 2025 18:32

Read InputStream in defaultCharset

667888d

Merge branch 'master' of github.com:psoujany/jtreg

6f38fe9

openjdk bot added the rfr Pull request is ready for review label Jun 5, 2025

psoujany changed the title ~~7904021: Parsing group files(TEST.GROUPS) on non-UTF-8 encoding platforms fails with java.nio.charset.MalformedInputException~~ 7904021: Parsing group files using non-UTF-8 encoding fails Jun 10, 2025

psoujany changed the title ~~7904021: Parsing group files using non-UTF-8 encoding fails~~ Parsing group files using non-UTF-8 encoding fails Jun 10, 2025

openjdk bot removed the rfr Pull request is ready for review label Jun 10, 2025

psoujany changed the title ~~Parsing group files using non-UTF-8 encoding fails~~ 7904021: Parsing group files using non-UTF-8 encoding fails Jun 10, 2025

openjdk bot added the rfr Pull request is ready for review label Jun 10, 2025

sormuras approved these changes Jul 8, 2025

openjdk bot added the ready Pull request is ready to be integrated label Jul 8, 2025

openjdk bot added the sponsor Pull request is ready to be sponsored label Jul 8, 2025

openjdk bot added the integrated Pull request has been integrated label Jul 8, 2025

openjdk bot closed this Jul 8, 2025

openjdk bot removed the ready, rfr, and sponsor labels Jul 8, 2025

psoujany mentioned this pull request Sep 8, 2025

Use jtreg_8_2(latest jtreg version) on z/OS across JDK11/17 releases adoptium/TKG#741

Merged