fix sanitation for --strip-ansi #3725

Closed
curious-rabbit wants to merge 3 commits into sharkdp:master from curious-rabbit:strip-ansi

@curious-rabbit commented May 5, 2026

--strip-ansi=always left two classes of ANSI escape sequence in the output:

  • 8-bit C1 introducers (U+0090, U+0098, U+009B, U+009D, U+009E, U+009F). On terminals that interpret 8-bit C1 in UTF-8 (kitty, by default; older xterm and VTE configurations), these are the single-codepoint equivalents of ESC P, ESC X, ESC [, ESC ], ESC ^, and ESC _. They introduce DCS, SOS, CSI, OSC, PM, and APC sequences respectively. bat's parser treated them as text.
  • DCS / SOS / PM / APC bodies. Even when introduced by the 7-bit ESC P/X/^/_, the body up to the string terminator was emitted as text and survived strip_ansi.

This patch teaches EscapeSequenceOffsetsIterator to recognise the 8-bit introducers as their 7-bit equivalents, and to consume string-terminated bodies (via either form of introducer) as a single opaque Unknown segment. Both then drop out of strip_ansi along with the existing CSI/OSC handling.
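
To illustrate the idea, here is a minimal standalone sketch, not the code from this patch. It does not touch EscapeSequenceOffsetsIterator; the names `Kind`, `kind_7bit`, `kind_8bit`, `skip_csi`, `skip_string`, and `strip_escapes` are hypothetical and exist only for this example. It assumes an unterminated string body runs to the end of the input and that an unrecognised ESC is simply dropped, which is a simplification.

```rust
use std::iter::Peekable;

/// Kinds of sequence an introducer can start (hypothetical names).
#[derive(Clone, Copy)]
enum Kind {
    Csi,
    Osc,
    /// DCS, SOS, PM, APC: an opaque body up to the string terminator.
    StringBody,
}

/// Map the character after ESC (7-bit form) to a sequence kind.
fn kind_7bit(c: char) -> Option<Kind> {
    match c {
        '[' => Some(Kind::Csi),
        ']' => Some(Kind::Osc),
        'P' | 'X' | '^' | '_' => Some(Kind::StringBody),
        _ => None,
    }
}

/// Map a single C1 code point (8-bit form) to the same kinds.
fn kind_8bit(c: char) -> Option<Kind> {
    match c {
        '\u{9b}' => Some(Kind::Csi),
        '\u{9d}' => Some(Kind::Osc),
        '\u{90}' | '\u{98}' | '\u{9e}' | '\u{9f}' => Some(Kind::StringBody),
        _ => None,
    }
}

/// Skip CSI parameter/intermediate bytes, then one final byte.
fn skip_csi<I: Iterator<Item = char>>(chars: &mut Peekable<I>) {
    // Parameter and intermediate bytes occupy 0x20..=0x3F.
    while matches!(chars.peek(), Some(&c) if ('\u{20}'..='\u{3f}').contains(&c)) {
        chars.next();
    }
    // A final byte in 0x40..=0x7E ends the sequence.
    if matches!(chars.peek(), Some(&c) if ('\u{40}'..='\u{7e}').contains(&c)) {
        chars.next();
    }
}

/// Skip a string body up to ST (ESC \ or U+009C); OSC may also end with BEL.
fn skip_string<I: Iterator<Item = char>>(chars: &mut Peekable<I>, bel_terminates: bool) {
    while let Some(c) = chars.next() {
        match c {
            '\u{9c}' => return,
            '\u{07}' if bel_terminates => return,
            '\u{1b}' => {
                if chars.peek() == Some(&'\\') {
                    chars.next();
                    return;
                }
                // Simplification: any other ESC inside the body is skipped.
            }
            _ => {}
        }
    }
}

/// Remove escape sequences introduced in either 7-bit or 8-bit form.
fn strip_escapes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        let kind = if c == '\u{1b}' {
            let k = chars.peek().copied().and_then(kind_7bit);
            if k.is_none() {
                continue; // unrecognised escape: drop the ESC only
            }
            chars.next(); // consume the introducer's second character
            k
        } else {
            kind_8bit(c)
        };
        match kind {
            Some(Kind::Csi) => skip_csi(&mut chars),
            Some(Kind::Osc) => skip_string(&mut chars, true),
            Some(Kind::StringBody) => skip_string(&mut chars, false),
            None => out.push(c),
        }
    }
    out
}
```

With this sketch, `strip_escapes("a\u{9b}31mred\u{9b}0mb")` yields `"aredb"`, and both `"a\u{1b}Pq#0;2;0;0;0\u{1b}\\b"` and `"a\u{90}payload\u{9c}b"` yield `"ab"`, mirroring the three cases the unit test is described as covering.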

A small unit test in src/preprocessor.rs covers the three new cases (8-bit CSI, 7-bit DCS body, 8-bit DCS body with 8-bit ST).

Note:
--strip-ansi=always is bypassed entirely when bat selects SimplePrinter (i.e. when stdout is piped or --color=never is set), so bat --strip-ansi=always file | grep … returns the raw escape sequences.
The existing strip_ansi_does_not_affect_simple_printer test seems to lock this in deliberately. Is that still the intended scope, or should this be fixed? --strip-ansi=always implies that it always filters, but that's not true.

@keith-hall

Collaborator

> Note:
> --strip-ansi=always is bypassed entirely when bat selects SimplePrinter (i.e. when stdout is piped or --color=never is set), so bat --strip-ansi=always file | grep … returns the raw escape sequences.
> The existing strip_ansi_does_not_affect_simple_printer test seems to lock this in deliberately. Is that still the intended scope, or should this be fixed? --strip-ansi=always implies that it always filters, but that's not true.

I believe the intention is to leave the output unchanged when bat is piped in scripts (for example, when cat is aliased to bat and --strip-ansi comes from the config file rather than the command line).
bat could probably benefit from a refactor so that it always honors command-line arguments, since I think this is quite a common problem. But maybe it makes sense to leave it as it is for this PR. What do you think?

@curious-rabbit

Author

Closed in favor of #3729
