Skip to content

[fish] Fix for newlines and other characters #4334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bitraid
Copy link
Contributor

@bitraid bitraid commented Apr 2, 2025

These changes fix issues with newlines in names, and escaped newlines and other characters in command line (more details in commit messages). For the last fix, the __fzf_parse_commandline function is rewritten, which now takes advantage of commands/options present in newer fish versions, while maintaining compatibility with the oldest supported version, which is actually v3.1b. All fish versions were individually tested, and they all seem to work, even v3.1.1 despite the reported issues with eval. Dropping support for versions older than v3.2.0 would simplify the code a little, but I think it's better to keep it for now, because v3.1.2 is on Debian 11.

I also have some observations relating the display of file/dir names by fzf:

  • When the --read0 option is not set, the ␤ symbol is not shown for trailing newlines in file names, while it is shown for directory names (after 6fa8295, where the path separator is appended). I think it would be better to always show trailing newlines, so that file foo can be distinguished from foo\n.
  • Except for new line and carriage return, no symbol is inserted for other whitespace characters: bell (could have U+2407 ␇), backspace (could have U+2408 ␈), form feed (could have U+240C ␌), vertical tab (could have U+240B ␋).
  • Tabs (horizontal) are replaced by spaces, making foo\tbar undistinguished from foo bar (maybe it could be replaced by U+2409 ␉ instead, only for file/dir names).
  • The path separator that is appended for directories (6fa8295), is also applied to the root directory (causing it to be displayed as //): fzf --walker{=dir,-root=/}

CTRL-T/ALT-C now works correctly when selecting files or directories
that contain newlines in their names. When external commands defined by
$FZF_CTRL_T_COMMAND/$FZF_ALT_C_COMMAND are used (for example the fd
command with -0 switch), the --read0 option must also be set through
$FZF_CTRL_T_OPTS/$FZF_ALT_C_OPTS.
@junegunn
Copy link
Owner

junegunn commented Apr 9, 2025

Thank, I've been quite busy lately. Will review when I get some time.

junegunn added a commit that referenced this pull request Apr 11, 2025
junegunn added a commit that referenced this pull request Apr 11, 2025
#4334 (comment)

    # Should display foo␊
    echo -en "foo\n" | fzf --read0  --no-multi-line
@junegunn
Copy link
Owner

  • When the --read0 option is not set, the ␤ symbol is not shown for trailing newlines in file names

Fixed in 0edb5d5.

echo -en "foo\n" | fzf --read0  --no-multi-line

--read0 partly assumes the possibility that the user is handling multi-line entries, so it enables multi-line display mode, but that doesn't seem like a good decision for file paths, and you can override this with --no-multi-line.

  • The path separator that is appended for directories (6fa8295), is also applied to the root directory (causing it to be displayed as //): fzf --walker{=dir,-root=/}

Thanks, fixed in 9ffc2c7

  • Except for new line and carriage return, no symbol is inserted for other whitespace characters: bell (could have U+2407 ␇), backspace (could have U+2408 ␈), form feed (could have U+240C ␌), vertical tab (could have U+240B ␋).

It's very unlikely that fzf will see those characters, but why not? Sounds good to me. I'll think about it.

  • Tabs (horizontal) are replaced by spaces, making foo\tbar undistinguished from foo bar (maybe it could be replaced by U+2409 ␉ instead, only for file/dir names).

fzf manually processes tab characters to support a custom --tabstop setting, and it's quite useful for aligning and spacing out fields. But I agree that is not the case for file/dir names. Automatically determining whether to display ␉ or to replace with spaces is a bit tricky, because fzf is a text filter that doesn't know if an entry is a path or not. While we could only display ␉ when the built-in reader is used, then we would not be able to enable this mode in other cases. One idea might be to allow --tabstop=0 as a special value to enable this mode.

@junegunn
Copy link
Owner

I've done some basic testing and the patch seems to work expected, thanks. That said, the changes are a bit dense for me since I'm not very familiar with fish scripting, so I’d like to take a bit more time to fully understand the details.

@bitraid
Copy link
Contributor Author

bitraid commented Apr 12, 2025

Thanks for looking into this, and for the fixes. There is no rush, please take all the time you need, and let me know if any part of the code needs clarification.

@junegunn
Copy link
Owner

Could you share the test cases you used to validate the feature? The processing logic is fairly complex, so seeing some examples would help clarify the intent behind your code.

One strange thing I noticed, vim ./<CTRL-T> starts fzf with . as the query.

This is a rewrite of __fzf_parse_commandline function, that fixes the
following issues, when CTRL-T/ALT-C is used and current command line
token contains:
- Escaped newlines (\n): This never worked correctly, but after 282884a,
  the string would split, and the script would enter an infinite loop
  while trying to set $dir.
- Escaped bell (\a, \cg), backspace (\b), form feed (\v, \cl), carriage
  return (\r), vertical tab (\v, \ck): walker-root would not set
  correctly for existing directories containing any of those characters.
- Regular expression special characters (^, +, ? etc): $dir would not be
  be stripped from $fzf_query if it contained any of those characters.

The lowest supported fish version is v3.1b. For optimal operation, the
function uses more recent commands when supported by the running
version. Specifically, for versions equal or newer than:
- v3.2.0: Sets variables using PCRE2 capture groups of `string match
  --regex` when needing to preserve any trailing newlines and
  simultaneously omit the extra newline that is appended by `string
  collect -N`.
- v3.5.0: Uses the builtin path command for path normalization, dirname
  extraction and existing directories check.
- v4.0.0: Uses the --tokens-expanded option of commandline, for
  expansion and dealing with unbalanced quotes and incomplete escape
  sequences. It also uses the regex style of string-escape, to prepare
  variable contents for regex operations. This is not used in older
  versions, because they don't escape newlines.
@bitraid
Copy link
Contributor Author

bitraid commented Apr 14, 2025

One strange thing I noticed, vim ./ starts fzf with . as the query.

The condition that strips $dir from $fzf_query was not met in this case. I fixed and updated the commit.


Below, I give one example for each fix of the second commit:

Escaped newlines (\n): This never worked correctly, but after 282884a,
the string would split, and the script would enter an infinite loop
while trying to set $dir.

ls foo\nbar # <ctrl-t>

Escaped bell (\a, \cg), backspace (\b), form feed (\v, \cl), carriage
return (\r), vertical tab (\v, \ck): walker-root would not set
correctly for existing directories containing any of those characters.

mkdir foo\rbar
touch foo\rbar/baz
ls foo\rbar/ # <ctrl-t>

Regular expression special characters (^, +, ? etc): $dir would not be
be stripped from $fzf_query if it contained any of those characters.

mkdir foo+bar
ls foo+bar/ # <ctrl-t>

To make it easier to understand the code, I list below the function for fish v4+ only, and then explain what each part does and how it is adjusted for older versions:

    set -l fzf_query ''
    set -l prefix ''
    set -l dir '.'

    # Set $prefix and expanded $fzf_query with preserved trailing newlines.
    string match -q -r -- '(?<prefix>^-[^\s=]+=)?(?<fzf_query>[\s\S]*?(?=\n?$)$)' \
      (commandline --current-token --tokens-expanded | string collect -N)

    if test -n "$fzf_query"
      # Normalize path in $fzf_query, set $dir to the longest existing directory.
      set -- fzf_query (path normalize -- $fzf_query)
      set -- dir $fzf_query
      while not path is -d $dir
        set -- dir (path dirname $dir)
      end

      if not string match -q -- '.' $dir; or string match -q -r -- '^\./|^\.$' $fzf_query
        # Strip $dir from $fzf_query - preserve trailing newlines.
        string match -q -r -- '^'(string escape --style=regex -- $dir)'/?(?<fzf_query>[\s\S]*)' $fzf_query
      end
    end

    string escape -n -- "$dir" "$fzf_query" "$prefix"
  1. The $prefix and $fzf_query variables are set by the capture group of string-match on the output of commandline command. The token expansion is done by the --tokens-expanded switch. The pipe to string collect -N preserves any trailing newlines, but has the undesired effect of always adding one newline, that's why the regex of the fzf_query group doesn't match the last trailing newline. Older version don't have the --tokens-expanded switch, so the expansion is done by eval (after first escaping the string and undoing the escape of leading ~ and variable names). Versions older that v3.2.0, don't support setting variables with string-match, so there are more steps for the operation, and the two extra newlines (there are two calls to string-collect) are removed from the escaped string before passed to eval.

  2. The $dir variable is set by using the built-in path command. The path is first normalized in $fzf_query with the normalize sub-command (squashes duplicate /, collapses ../ with earlier components and removes trailing / and . components). The is sub-command is used to check the existence of a directory, and the dirname sub-command is used to remove basename when a directory doesn't exist (this handles correctly newlines). Versions older that v3.5.0 don't have the path command, so coreutils dirname is used instead (with -z switch, piped to string split0 for handling newlines). Also, the test command is used instead for checking the existence of directories, and partial normalization (squash duplicate '/' and remove trailing /) is done with string-replace piped to string-collect (for preserving trailing newlines). The variable is set with string-match for excluding extra trailing newline, or by using eval for versions older than v3.2.0.

  3. The path in $dir is striped from $fzf_query by using string-match with a regex than skips the leading contents present in $dir, and also any remaining leading /. For proper operation, the contents of $dir are escaped using the regex style of string-escape. On older versions, the regex style of string-escape does not escape newlines, so a simple/single string-replace is performed instead. Since the output is again piped to string-collect in this case, the extra newline is skipped by the regex of string-match, or by using eval for versions older that v3.2.0.

Please let me know if there are parts of the code that I should provide more information.

@junegunn
Copy link
Owner

Thank you for the detailed explanation.

One suggestion. $prefix currently only accepts --foo= or -foo=, but what do you think about also supporting single-letter options without = such as -fFILEPATH? Many tools allow a short option immediately followed by a file path. e.g. wget -ofile https://...

@bitraid
Copy link
Contributor Author

bitraid commented Apr 15, 2025

One suggestion. $prefix currently only accepts --foo= or -foo=, but what do you think about also supporting single-letter options without = such as -fFILEPATH? Many tools allow a short option immediately followed by a file path. e.g. wget -ofile https://...

I like the suggestion, but what do you think of also adding some logic that disables the option prefix when -- is prepended in the current process command (split at logical operators, terminators, and pipes) so that files/dirs starting with - can be accepted?
For example:

cmd-a -ofilename # <ctrl-t> $prefix="-o" and $fzf_query="filename", as suggested
cmd-a -- -filename # <ctrl-t> $prefix="" and $fzf_query="-filename"
cmd-a -- -filename-a; cmd-b -ofilename-b # <ctrl-t> $prefix="-o" and $fzf_query="filename-b"
cmd-a -- -filename-a | cmd-b -ofilename-b # <ctrl-t> $prefix="-o" and $fzf_query="filename-b"

It shouldn't need much code.

@junegunn
Copy link
Owner

It shouldn't need much code.

Really? Then I'm all for it.

@bitraid
Copy link
Contributor Author

bitraid commented Apr 15, 2025

Great!
A small change like the one below adds this functionality. I'll take some time to test it with different fish versions. Should it better be a separate commit?

diff --git a/shell/key-bindings.fish b/shell/key-bindings.fish
index 72b42079..fedc2c68 100644
--- a/shell/key-bindings.fish
+++ b/shell/key-bindings.fish
@@ -51,20 +51,26 @@ function fzf_key_bindings
     set -l -- fish_major (string match -r -- '^\d+' $version)
     set -l -- fish_minor (string match -r -- '^\d+\.(\d+)' $version)[2]
 
+    set -l -- match_regex '(?<fzf_query>[\s\S]*?(?=\n?$)$)'
+    set -l -- prefix_regex ''
+
+    if string match -q -v -- '* -- *' (string sub -e (commandline -Cp) -- (commandline -p))
+      set -- prefix_regex '^-[^\s=]+=|^-[^\s]'
+      set -- match_regex '(?<prefix>'$prefix_regex')?'$match_regex
+    end
+
     # Set $prefix and expanded $fzf_query with preserved trailing newlines.
     if test "$fish_major" -ge 4
       # fish v4.0.0 and newer
-      string match -q -r -- '(?<prefix>^-[^\s=]+=)?(?<fzf_query>[\s\S]*?(?=\n?$)$)' \
-        (commandline --current-token --tokens-expanded | string collect -N)
+      string match -q -r -- $match_regex (commandline --current-token --tokens-expanded | string collect -N)
     else if test "$fish_major" -eq 3 -a "$fish_minor" -ge 2
       # fish v3.2.0 - v3.7.1 (last v3)
-      string match -q -r -- '(?<prefix>^-[^\s=]+=)?(?<fzf_query>[\s\S]*?(?=\n?$)$)' \
-        (commandline --current-token --tokenize | string collect -N)
+      string match -q -r -- $match_regex (commandline --current-token --tokenize | string collect -N)
       eval set -- fzf_query (string escape -n -- $fzf_query | string replace -r -a '^\\\(?=~)|\\\(?=\$\w)' '')
     else
       # fish older than v3.2.0 (v3.1b1 - v3.1.2)
       set -l -- cl_token (commandline --current-token --tokenize | string collect -N)
-      set -- prefix (string match -r -- '^-[^\s=]+=' $cl_token)
+      set -- prefix (string match -r -- $prefix_regex $cl_token)
       set -- fzf_query (string replace -- "$prefix" '' $cl_token | string collect -N)
       eval set -- fzf_query (string escape -n -- $fzf_query | string replace -r -a '^\\\(?=~)|\\\(?=\$\w)|\\\n\\\n$' '')
     end

@junegunn
Copy link
Owner

Thanks. Please push a separate commit to this PR branch, and I'll merge without squashing.

- Support single-letter options without = such as -fFILEPATH
- fish v3.3.0 and newer: Disable option prefix if -- is preceded
@bitraid
Copy link
Contributor Author

bitraid commented Apr 16, 2025

I pushed the change, but modified the code for skipping the option prefix on --, to take effect only on v3.3.0 and newer, because older versions aren't able to return the cursor position relative to the current command process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants