@timtebeek
Member

@timtebeek timtebeek commented Jan 10, 2026

What's changed?

  • Added rewrite-docker, with a DockerParser, Docker interface and nested types, as well as some initial recipes
  • Verified locally against 1492 docker images from these repositories:
bitnami/containers@main
CentOS/CentOS-Dockerfiles@master
chainguard-images/images@main
docker-library/official-images@master
docker/awesome-compose@master
dotnet/dotnet-docker@main
fedora-cloud/Fedora-Dockerfiles@master
GoogleContainerTools/distroless@main
jessfraz/dockerfiles@master
kubernetes/kubernetes@master
linuxserver/[email protected]
moby/moby@master
vimagick/dockerfiles@master

This leaves only one known type of issue, involving backticks and newlines in some 20 dotnet images, which I think will be rare in practice.

  • The lexer now maintains a field that tracks whether we are at the start of a logical line (that is, not inside a heredoc or a multiline continuation) before parsing keywords, which simplifies much of the logic we had for escapes, keywords, and state tracking.
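
To illustrate the start-of-logical-line idea, here is a minimal sketch in plain Java. It is not the ANTLR lexer from this PR: `LogicalLineSketch`, `instructions`, and the small keyword set are hypothetical names chosen for the example, and it only handles backslash continuations (no heredocs, comments, or backtick escapes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the rewrite-docker lexer: it only illustrates
// tracking a single "at start of logical line" flag.
class LogicalLineSketch {
    // Assumed subset of instruction keywords, for demonstration only.
    static final Set<String> KEYWORDS = Set.of("FROM", "RUN", "CMD", "USER", "SHELL");

    // Returns the words that would be treated as instruction keywords.
    static List<String> instructions(String dockerfile) {
        List<String> found = new ArrayList<>();
        boolean atLineStart = true; // false while inside a backslash continuation
        for (String line : dockerfile.split("\n", -1)) {
            String trimmed = line.trim();
            if (atLineStart && !trimmed.isEmpty()) {
                String first = trimmed.split("\\s+", 2)[0].toUpperCase();
                if (KEYWORDS.contains(first)) {
                    found.add(first);
                }
            }
            // The next physical line starts a new logical line only if this
            // one does not end with the continuation character.
            atLineStart = !trimmed.endsWith("\\");
        }
        return found;
    }
}
```

With this flag, a `USER` that appears on a continuation line of a `RUN` instruction is treated as plain shell text, while a `USER` at the start of the next logical line is still recognized as an instruction.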

What's your motivation?

We've had some use cases pop up recently that aim to modify Dockerfiles; while that's possible with text-based replacements, it would be better to have a structured parser and model to modify.

Anything in particular you'd like reviewers to focus on?

Are we confident enough now in the current tree classes and ANTLR based parser to roll this out?

Have you considered any alternatives or workarounds?

It's been proposed to write a manual parser, but it's not yet clear what the benefit would be.

Any additional context

There was an earlier attempt, which was reverted in 2eae35a over some concerns and because we were just before a release.

Before that, there was an externally developed manual parser with a different model.

timtebeek and others added 30 commits January 10, 2026 00:15
- Add closingBracketPrefix field to Docker.ExecForm to preserve
  trailing whitespace before closing bracket (e.g., "CMD [ "/bin/bash" ]")
- Update DockerParserVisitor to capture the closing bracket prefix
- Update DockerPrinter to print the closing bracket prefix
- Add LocalDockerParserTest for batch testing against docker-images
- Enhance LocalDockerParser with per-file error reporting and grouping

This fixes print idempotence issues for CMD, ENTRYPOINT instructions
that use JSON array form with trailing whitespace.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add hasEquals field to LabelPair to distinguish old format (key value)
  from new format (key=value)
- Add labelOldValue grammar rule to allow instruction keywords in
  old-style LABEL values
- Add closingBracketPrefix to Volume and Shell for JSON array whitespace
- Fix NPE when parsing malformed VOLUME/SHELL instructions

Parser success rate: 1123/1556 files (72%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes FROM --platform=$VAR image parsing, where the environment variable
was not being captured as part of the flag value.

Parser success rate: 1171/1556 files (75%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Fix parseFlagValue to properly parse environment variables in flag values
- Stop flag value parsing at whitespace boundaries to prevent greedy matching
- Organize DockerParserTest into separate test classes per instruction type:
  - FromTest, RunTest, CmdTest, EntrypointTest, LabelTest, EnvTest, ArgTest
  - CopyTest, AddTest, ExposeTest, VolumeTest, ShellTest, OnbuildTest
  - HealthcheckTest, UserTest, WorkdirTest, StopsignalTest, MaintainerTest
- Keep general integration tests in DockerParserTest
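
As a rough illustration of stopping a flag value at the first whitespace, here is a small standalone sketch. `FlagValueSketch` and `parseFlag` are hypothetical names and the regex is an assumption made for the example; the actual `parseFlagValue` in this PR works on lexer tokens and is not reproduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splits "--name=value rest" without letting the value
// greedily swallow the text that follows it.
class FlagValueSketch {
    // "--name" or "--name=value"; the value is a run of non-whitespace
    // characters, so "$VAR" style references stay part of the value.
    private static final Pattern FLAG =
            Pattern.compile("^--([A-Za-z][A-Za-z0-9-]*)(?:=(\\S*))?");

    // Returns {name, value, rest}, or null if the text does not start with a flag.
    // The value is null when the flag has no '='.
    static String[] parseFlag(String text) {
        Matcher m = FLAG.matcher(text);
        if (!m.find()) {
            return null;
        }
        String rest = text.substring(m.end()).trim();
        return new String[] {m.group(1), m.group(2), rest};
    }
}
```

For an input such as `--platform=$BUILDPLATFORM golang:1.22`, the value ends at the space, so the image reference is left for the rest of the instruction.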

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Exclude '<' from UNQUOTED_TEXT starting character class to prevent
  heredoc marker '<<' from being consumed as UNQUOTED_TEXT
- Add HEREDOC_CONTENT token for heredoc body content
- Add HP_UNQUOTED_TEXT rule in HEREDOC_PREAMBLE for destination paths
- Add defensive bounds checks in parseText and visitArgument to prevent
  StringIndexOutOfBoundsException with invalid token ranges
- Add LexerDebugTest and HeredocTest for heredoc parsing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add execForm field to Docker.Add and Docker.Copy classes
- Update DockerParser.g4 to support jsonArray in ADD/COPY instructions
- Update DockerParserVisitor to parse JSON arrays into ExecForm
- Update DockerPrinter to print exec form for ADD/COPY

This enables parsing Dockerfiles like:
  COPY ["src", "dest"]
  COPY --from=installer ["/dotnet", "/usr/share/dotnet"]

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Move stageName column to be the second column (after sourceFile) in
DockerBaseImages and DockerExposedPorts data tables for consistent ordering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Allow SHELL, USER, and AS keywords to appear in shell form text, so commands
like `useradd --shell /bin/false` parse correctly. Also allows keywords like
SHELL as ENV variable names.

Changes:
- Add shellFormText rule for shell form that allows safe keywords
- Add envSafeKeyword rule for ENV keys
- Add shellSafeKeyword rule for shell form text (SHELL, USER, AS only)
- Add tests for keyword handling in RUN, ENV, and HEALTHCHECK instructions

Parser now successfully parses 1541/1556 files (99% success rate).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Allow environment variables in VOLUME path and EXPOSE port specifications.
For example: VOLUME ${DATA_DIR} and EXPOSE ${PORT}.

Changes:
- Add ENV_VAR token to port and volumePath grammar rules
- Update DockerParserVisitor to handle ENV_VAR in convertPort and volumePath
- Add createEnvVar helper method for consistent environment variable parsing
- Add tests for environment variables in VOLUME and EXPOSE

Parser now successfully parses 1554/1556 files (99.87% success rate).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The heredoc lexer was incorrectly pushing both the heredoc marker (EOT)
and any subsequent text (bash) onto the identifier stack. When looking
for the terminator, it would compare against "bash" instead of "EOT".

This fix adds a flag to capture only the first identifier after <<.
Also adds a test for the heredoc with bash interpreter pattern.
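
A minimal sketch of the intended behavior, assuming a single heredoc per line: only the identifier directly after `<<` is used as the terminator, and trailing words such as an interpreter name are ignored. `HeredocMarkerSketch` and `terminator` are hypothetical names; the real fix lives in the lexer's identifier stack rather than in a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the lexer from this PR.
class HeredocMarkerSketch {
    // "<<" or "<<-", followed by an optionally quoted identifier.
    private static final Pattern MARKER =
            Pattern.compile("<<-?[\"']?([A-Za-z_][A-Za-z0-9_]*)[\"']?");

    // Returns the terminator of the first heredoc on the line, or null if none.
    static String terminator(String line) {
        Matcher m = MARKER.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

For `RUN <<EOT bash` this yields `EOT`, so the body is closed by a line containing `EOT` rather than `bash`.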

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The lexer was entering JSON mode on [ which broke shell test expressions
like `if [ ! -f file ]`. Now JSON arrays are parsed without mode switching,
allowing [ ] , to appear in shell form text.

Also added support for escaped characters (like \;) at the start of
unquoted text tokens, fixing `find -exec ... {} \;` patterns.

Tests added for shell test brackets and find -exec patterns.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Files using `LABEL maintainer "name"` were failing because the lexer
matched `maintainer` as MAINTAINER keyword, not UNQUOTED_TEXT.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Shell single-quoted strings are literal - no escape processing occurs.
Changed SINGLE_QUOTED_STRING to allow any character except ' and newlines,
fixing regex patterns like '^\(root\|app\):'.
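
The rule described above can be approximated with a Java regex, shown here only as an illustration; the actual change is to the SINGLE_QUOTED_STRING token in the ANTLR grammar, and `SingleQuotedSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of "any character except ' and newlines" between quotes.
class SingleQuotedSketch {
    // Backslashes get no special treatment: they are ordinary characters.
    private static final Pattern SINGLE_QUOTED = Pattern.compile("'[^'\\r\\n]*'");

    static boolean isSingleQuoted(String token) {
        return SINGLE_QUOTED.matcher(token).matches();
    }
}
```

Because the backslash is treated literally, a token like `'^\(root\|app\):'` is accepted as one string, while a token containing an embedded single quote is not.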

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds labelKeyWithKeyword rule that permits all instruction keywords
(RUN, COPY, ENV, etc.) as label keys when using key=value format.
The old-style space-separated format continues to only allow MAINTAINER
to avoid consuming the next instruction as a label key.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds COMMAND_SUBST token for $(command) and BACKTICK_SUBST for
`command` shell substitutions. These tokens are allowed in shell
form text, text elements, and other places where ENV_VAR appears.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds SPECIAL_VAR token for shell special variables like $!, $$, $?,
$#, $@, $*, and positional parameters $0-$9. These are now allowed
in shell form text, text elements, and other places where ENV_VAR
appears.
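
For illustration, the set of special variables listed above can be expressed as a small Java regex. This is a sketch under the assumption that a `$` followed by one of those characters is enough to identify the token; `SpecialVarSketch` and `find` are hypothetical names, and the grammar's SPECIAL_VAR rule may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the SPECIAL_VAR lexer rule itself.
class SpecialVarSketch {
    // '$' followed by one of ! $ ? # @ * or a single digit 0-9.
    private static final Pattern SPECIAL_VAR = Pattern.compile("\\$[!$?#@*0-9]");

    // Returns every special variable found in the given shell text, in order.
    static List<String> find(String shellText) {
        List<String> result = new ArrayList<>();
        Matcher m = SPECIAL_VAR.matcher(shellText);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

Named variables such as `$HOME` are deliberately not matched here, since those are covered by ENV_VAR.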

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Support backtick (`) as line continuation character alongside backslash
- Support inline line continuation inside double-quoted strings
- Allow any character after backslash in escape sequences (for Windows paths)
- Restrict BACKTICK_SUBST to not match backtick followed by whitespace/newline
- Update LocalDockerParser to accept output file path as second argument
- Add Windows-style Dockerfile tests

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@timtebeek timtebeek moved this from In Progress to Ready to Review in OpenRewrite Jan 12, 2026
@timtebeek timtebeek requested a review from jkschneider January 13, 2026 17:15
Member

@sambsnyd sambsnyd left a comment

Looks good to me

Comment on lines +967 to +968
String name;
boolean braced;
Member

Are any spaces allowed if there are braces? All of the examples show ${foo} and never ${ foo }, but do we know that Docker refuses to accept the latter? We might need more than a boolean if Docker does not reject the latter with a fatal error.
