diff --git a/docs/tar.md b/docs/tar.md
index f9e82e507..1d3a05a5a 100644
--- a/docs/tar.md
+++ b/docs/tar.md
@@ -100,7 +100,7 @@ Rule that executes BSD `tar`. Most users should use the [`tar`](#tar) macro, rat
| out | Resulting tar file to write. If absent, `[name].tar` is written. | Label | optional | `None` |
| args | Additional flags permitted by BSD tar; see the man page. | List of strings | optional | `[]` |
| compress | Compress the archive file with a supported algorithm. | String | optional | `""` |
-| compute_unused_inputs | Whether to discover and prune input files that will not contribute to the archive.
Unused inputs are discovered by comparing the set of input files in `srcs` to the set of files referenced by `mtree`. Files not used for content by the mtree specification will not be read by the `tar` tool when creating the archive and can be pruned from the input set using the `unused_inputs_list` [mechanism](https://bazel.build/contribute/codebase#input-discovery).
Benefits: pruning unused input files can reduce the amount of work the build system must perform. Pruned files are not included in the action cache key; changes to them do not invalidate the cache entry, which can lead to higher cache hit rates. Actions do not need to block on the availability of pruned inputs, which can increase the available parallelism of builds. Pruned files do not need to be transferred to remote-execution workers, which can reduce network costs.
Risks: pruning an actually-used input file can lead to unexpected, incorrect results. The comparison performed between `srcs` and `mtree` is currently inexact and may fail to handle handwritten or externally-derived mtree specifications. However, it is safe to use this feature when the lines found in `mtree` are derived from one or more `mtree_spec` rules, filtered and/or merged on whole-line basis only.
Possible values:
- `compute_unused_inputs = 1`: Always perform unused input discovery and pruning. - `compute_unused_inputs = 0`: Never discover or prune unused inputs. - `compute_unused_inputs = -1`: Discovery and pruning of unused inputs is controlled by the --[no]@aspect_bazel_lib//lib:tar_compute_unused_inputs flag. | Integer | optional | `-1` |
+| compute_unused_inputs | Whether to discover and prune input files that will not contribute to the archive.
Unused inputs are discovered by comparing the set of input files in `srcs` to the set of files referenced by `mtree`. Files not used for content by the mtree specification will not be read by the `tar` tool when creating the archive and can be pruned from the input set using the `unused_inputs_list` [mechanism](https://bazel.build/contribute/codebase#input-discovery).
Benefits: pruning unused input files can reduce the amount of work the build system must perform. Pruned files are not included in the action cache key; changes to them do not invalidate the cache entry, which can lead to higher cache hit rates. Actions do not need to block on the availability of pruned inputs, which can increase the available parallelism of builds. Pruned files do not need to be transferred to remote-execution workers, which can reduce network costs.
Risks: pruning an actually-used input file can lead to unexpected, incorrect results. The comparison performed between `srcs` and `mtree` is exact. There are no known circumstances where incorrect results are anticipated.
Possible values:
- `compute_unused_inputs = 1`: Always perform unused input discovery and pruning. - `compute_unused_inputs = 0`: Never discover or prune unused inputs. - `compute_unused_inputs = -1`: Discovery and pruning of unused inputs is controlled by the --[no]@aspect_bazel_lib//lib:tar_compute_unused_inputs flag. | Integer | optional | `-1` |
| mode | A mode indicator from the following list, copied from the tar manpage:
- create: Create a new archive containing the specified items. - append: Like `create`, but new entries are appended to the archive. Note that this only works on uncompressed archives stored in regular files. The -f option is required. - list: List archive contents to stdout. - update: Like `append`, but new entries are added only if they have a modification date newer than the corresponding entry in the archive. Note that this only works on uncompressed archives stored in regular files. The -f option is required. - extract: Extract to disk from the archive. If a file with the same name appears more than once in the archive, each copy will be extracted, with later copies overwriting (replacing) earlier copies. | String | optional | `"create"` |
| mtree | An mtree specification file | Label | required | |
diff --git a/lib/private/BUILD.bazel b/lib/private/BUILD.bazel
index 2e9654044..a788efba6 100644
--- a/lib/private/BUILD.bazel
+++ b/lib/private/BUILD.bazel
@@ -8,6 +8,9 @@ exports_files(
"modify_mtree.awk",
"parse_status_file.jq",
"parse_status_file.yq",
+ "unvis.gawk",
+ "vis_canonicalize.gawk",
+ "vis_escape.gawk",
],
visibility = ["//visibility:public"],
)
@@ -279,9 +282,12 @@ bzl_library(
bzl_library(
name = "tar",
- srcs = ["tar.bzl"],
+ srcs = [
+ "tar.bzl",
+ ],
visibility = ["//lib:__subpackages__"],
deps = [
+ ":strings",
"@aspect_bazel_lib//lib:paths",
"@bazel_skylib//rules:common_settings",
],
@@ -362,6 +368,9 @@ bzl_library(
name = "strings",
srcs = ["strings.bzl"],
visibility = ["//lib:__subpackages__"],
+ deps = [
+ "@bazel_skylib//lib:types",
+ ],
)
bzl_library(
diff --git a/lib/private/tar.bzl b/lib/private/tar.bzl
index dc5cff549..d28e8eb6e 100644
--- a/lib/private/tar.bzl
+++ b/lib/private/tar.bzl
@@ -103,10 +103,8 @@ parallelism of builds. Pruned files do not need to be transferred to remote-exec
workers, which can reduce network costs.
Risks: pruning an actually-used input file can lead to unexpected, incorrect results. The
-comparison performed between `srcs` and `mtree` is currently inexact and may fail to
-handle handwritten or externally-derived mtree specifications. However, it is safe to use
-this feature when the lines found in `mtree` are derived from one or more `mtree_spec`
-rules, filtered and/or merged on whole-line basis only.
+comparison performed between `srcs` and `mtree` is exact. There are no known
+circumstances where incorrect results are anticipated.
Possible values:
@@ -119,11 +117,15 @@ Possible values:
values = [-1, 0, 1],
),
"_compute_unused_inputs_flag": attr.label(default = Label("//lib:tar_compute_unused_inputs")),
+ "_unvis": attr.label(allow_single_file = True, default = Label("//lib/private:unvis.gawk")),
+ "_vis_canonicalize": attr.label(allow_single_file = True, default = Label("//lib/private:vis_canonicalize.gawk")),
+ "_vis_escape": attr.label(allow_single_file = True, default = Label("//lib/private:vis_escape.gawk")),
}
_mtree_attrs = {
"srcs": attr.label_list(doc = "Files that are placed into the tar", allow_files = True),
"out": attr.output(doc = "Resulting specification file to write"),
+ "_vis_escape": attr.label(allow_single_file = True, default = Label("//lib/private:vis_escape.gawk")),
}
def _add_compression_args(compress, args):
@@ -185,18 +187,12 @@ def _is_unprunable(file):
p = file.path
return p[0].isspace() or p[-1].isspace() or "\n" in p or "\r" in p
-def _fmt_pruanble_inputs_line(file):
+def _fmt_prunable_inputs_line(file):
if _is_unprunable(file):
return None
-
- # The tar.prunable_inputs.txt file has a two columns:
- # 1. vis-encoded paths of the files, used in comparison
- # 2. un-vis-encoded paths of the files, used for reporting back to Bazel after filtering
- path = file.path
- return _vis_encode(path) + " " + path
+ return _vis_encode(file.path)
def _fmt_keep_inputs_line(file):
- # The tar.keep_inputs.txt file has a single column of vis-encoded paths of the files to keep.
return _vis_encode(file.path)
def _configured_unused_inputs_file(ctx, srcs, keep):
@@ -225,7 +221,7 @@ def _configured_unused_inputs_file(ctx, srcs, keep):
.set_param_file_format("multiline")
.add_all(
srcs,
- map_each = _fmt_pruanble_inputs_line,
+ map_each = _fmt_prunable_inputs_line,
),
)
ctx.actions.write(
@@ -243,26 +239,33 @@ def _configured_unused_inputs_file(ctx, srcs, keep):
# * are not found in any content= or contents= keyword in the MTREE
# * are not in the hardcoded KEEP_INPUTS set
#
- # Comparison and filtering of PRUNABLE_INPUTS is performed in the vis-encoded representation, stored in field 1,
- # before being written out in the un-vis-encoded form Bazel understands, from field 2.
+ # Comparison and filtering of PRUNABLE_INPUTS is performed in the vis-encoded representation
+ # before being written out in the un-vis-encoded form Bazel understands.
#
# Note: bsdtar (libarchive) accepts both content= and contents= to identify source file:
# ref https://github.com/libarchive/libarchive/blob/a90e9d84ec147be2ef6a720955f3b315cb54bca3/libarchive/archive_read_support_format_mtree.c#L1640
- #
- # TODO: Make comparison exact by converting all inputs to a canonical vis-encoded form before comparing.
- # See also: https://github.com/bazel-contrib/bazel-lib/issues/794
ctx.actions.run_shell(
outputs = [unused_inputs],
- inputs = [prunable_inputs, keep_inputs, ctx.file.mtree],
+ inputs = [
+ prunable_inputs,
+ keep_inputs,
+ ctx.file.mtree,
+ ctx.file._unvis,
+ ctx.file._vis_canonicalize,
+ ctx.file._vis_escape,
+ ],
tools = [coreutils],
command = '''
- "$COREUTILS" join -v 1 \\
- <("$COREUTILS" sort -u "$PRUNABLE_INPUTS") \\
- <("$COREUTILS" sort -u \\
- <(grep -o '\\bcontents\\?=\\S*' "$MTREE" | "$COREUTILS" cut -d'=' -f 2-) \\
- "$KEEP_INPUTS" \\
- ) \\
- | "$COREUTILS" cut -d' ' -f 2- \\
+ "$COREUTILS" join -v 1 \\
+ <(gawk -bf "$VIS_ESCAPE" "$PRUNABLE_INPUTS" | "$COREUTILS" sort -u) \\
+ <("$COREUTILS" sort -u \\
+ <(grep -o '\\bcontents\\?=\\S*' "$MTREE" \\
+ | "$COREUTILS" cut -d'=' -f 2- \\
+ | gawk -bf "$VIS_CANONICALIZE" \\
+ ) \\
+ <(gawk -bf "$VIS_ESCAPE" "$KEEP_INPUTS") \\
+ ) \\
+ | gawk -bf "$UNVIS" \\
> "$UNUSED_INPUTS"
''',
env = {
@@ -271,6 +274,9 @@ def _configured_unused_inputs_file(ctx, srcs, keep):
"KEEP_INPUTS": keep_inputs.path,
"MTREE": ctx.file.mtree.path,
"UNUSED_INPUTS": unused_inputs.path,
+ "UNVIS": ctx.file._unvis.path,
+ "VIS_CANONICALIZE": ctx.file._vis_canonicalize.path,
+ "VIS_ESCAPE": ctx.file._vis_escape.path,
},
mnemonic = "UnusedTarInputs",
toolchain = "@aspect_bazel_lib//lib:coreutils_toolchain_type",
@@ -278,7 +284,6 @@ def _configured_unused_inputs_file(ctx, srcs, keep):
return unused_inputs
-
# TODO(3.0): Access field directly after minimum bazel_compatibility advanced to or beyond v7.0.0.
def _repo_mapping_manifest(files_to_run):
return getattr(files_to_run, "repo_mapping_manifest", None)
@@ -372,8 +377,10 @@ def _to_rlocation_path(file, workspace):
return workspace + "/" + file.short_path
def _vis_encode(filename):
- # TODO(#794): correctly encode all filenames by using vis(3) (or porting it)
- return filename.replace(" ", "\\040")
+ # Escaping of non-ASCII bytes cannot be performed within Starlark.
+ # After writing content out, a second pass is performed with vis_escape.gawk.
+ # Backslash, newline, and space are not handled by vis_escape.gawk; we encode only these in-process.
+ return filename.replace("\\", "\\134").replace("\n", "\\012").replace(" ", "\\040")
def _expand(file, expander, transform = to_repository_relative_path):
expanded = expander.expand(file)
@@ -400,6 +407,7 @@ def _expand(file, expander, transform = to_repository_relative_path):
def _mtree_impl(ctx):
out = ctx.outputs.out or ctx.actions.declare_file(ctx.attr.name + ".spec")
+ unescaped = ctx.actions.declare_file(ctx.attr.name + ".spec.unescaped")
content = ctx.actions.args()
content.set_param_file_format("multiline")
@@ -444,7 +452,18 @@ def _mtree_impl(ctx):
_mtree_line(_vis_encode(runfiles_dir + "/_repo_mapping"), "file", content = _vis_encode(repo_mapping.path)),
)
- ctx.actions.write(out, content = content)
+ ctx.actions.write(unescaped, content = content)
+ ctx.actions.run_shell(
+ outputs = [out],
+ inputs = [unescaped, ctx.file._vis_escape],
+ command = 'gawk -bf "$VIS_ESCAPE" "$UNESCAPED" > "$OUT"',
+ env = {
+ "VIS_ESCAPE": ctx.file._vis_escape.path,
+ "UNESCAPED": unescaped.path,
+ "OUT": out.path,
+ },
+ mnemonic = "VisEscape",
+ )
return DefaultInfo(files = depset([out]), runfiles = ctx.runfiles([out]))
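The unused-input computation in `_configured_unused_inputs_file` is spread across Starlark and a shell pipeline. The Python sketch below restates it for illustration only. It is not code from this change, and every function name in it is hypothetical. It assumes paths are UTF-8 and that `[:graph:]` means bytes 0x21 to 0x7E (the C locale). Python is used because Starlark is a Python dialect. The `referenced` argument stands in for `content=` values that have already been canonicalized.

```python
# Hypothetical Python model of how tar.bzl finds unused inputs.
# Assumes UTF-8 paths and that [:graph:] means bytes 0x21-0x7E (C locale).


def starlark_vis_encode(path: str) -> str:
    # Mirrors _vis_encode: backslash is escaped first so the backslashes
    # introduced for newline and space are not escaped a second time.
    return path.replace("\\", "\\134").replace("\n", "\\012").replace(" ", "\\040")


def vis_escape(text: str) -> str:
    # Mirrors vis_escape.gawk: bytes outside 0x20-0x7E become \ooo, while
    # space and backslash pass through. Newlines never reach this step
    # because starlark_vis_encode has already escaped them.
    out = []
    for byte in text.encode("utf-8"):
        if 0x20 <= byte <= 0x7E:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return "".join(out)


def encode_path(path: str) -> str:
    # Two passes, as in the rule: in-process escaping, then vis_escape.
    return vis_escape(starlark_vis_encode(path))


def is_unprunable(path: str) -> bool:
    # Approximately mirrors _is_unprunable: such paths are never pruned.
    return path[0].isspace() or path[-1].isspace() or "\n" in path or "\r" in path


def unused_inputs(srcs, referenced, keep):
    # Set-based equivalent of `join -v 1`: a prunable source is unused when
    # its encoding appears neither among the canonical content= values
    # (`referenced`) nor among the encoded keep paths. Original paths are
    # returned, ordered by their encodings, in place of the unvis.gawk pass.
    used = set(referenced) | {encode_path(p) for p in keep}
    candidates = {encode_path(p): p for p in srcs if not is_unprunable(p)}
    return [candidates[e] for e in sorted(candidates) if e not in used]


if __name__ == "__main__":
    srcs = ["dir/info", "dir/space in name.txt", "dir/Unicode\u00ae support?\U0001F91E"]
    print(encode_path(srcs[2]))  # dir/Unicode\302\256\040support?\360\237\244\236
    # "dir/info" is referenced by the mtree, so only the other two are unused.
    for path in unused_inputs(srcs, referenced=["dir/info"], keep=[]):
        print(encode_path(path))
```

The pipeline compares encoded strings instead of raw paths. This lets a single `sort`/`join` pass work on newline-delimited records, because after encoding no path contains a newline or a field-separating space.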
diff --git a/lib/private/unvis.gawk b/lib/private/unvis.gawk
new file mode 100755
index 000000000..a0802e335
--- /dev/null
+++ b/lib/private/unvis.gawk
@@ -0,0 +1,19 @@
+#!/usr/bin/env -S gawk --characters-as-bytes --file
+#
+# Replace octal escape sequences with the bytes they represent.
+# NOTE: not a fully general unvis program.
+
+BEGIN {
+ for (i = 0x00; i <= 0xFF; i++) {
+ b = sprintf("%c", i)
+ esc = sprintf("\\%03o", i)
+ REPLACE[esc] = b
+ }
+}
+
+{
+ n = split($0, verbatim_parts, /[\\][0-3][0-7][0-7]/, replace_parts)
+ for (i = 1; i < n; i++)
+ printf "%s%s", verbatim_parts[i], REPLACE[replace_parts[i]]
+ printf "%s%s", verbatim_parts[n], RT
+}
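`unvis.gawk` only needs to undo three-digit octal escapes, because that is the only form `_vis_encode` and `vis_escape.gawk` emit. A minimal Python equivalent might look like the following. It operates on raw bytes, as `gawk -b` does. The names are illustrative and are not part of the change.

```python
import re

# Hypothetical Python model of unvis.gawk, operating on raw bytes (like gawk -b).
# Only three-digit octal escapes \000-\377 are decoded, matching the gawk regex.
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7][0-7])")


def unvis(line: bytes) -> bytes:
    # Replace each \ooo with the single byte it encodes. Every other sequence,
    # including special forms such as \n or \s, is copied through unchanged.
    return OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), line)


if __name__ == "__main__":
    print(unvis(rb"space\040in\040name.txt"))  # b'space in name.txt'
    print(unvis(rb"Unicode\302\256 \n"))        # b'Unicode\xc2\xae \\n'
```

This partial decoder is safe on the pipeline's output. A literal backslash in a path is always written as `\134` before the comparison, so every backslash that remains begins an octal escape.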
diff --git a/lib/private/vis_canonicalize.gawk b/lib/private/vis_canonicalize.gawk
new file mode 100755
index 000000000..5c19bb7e4
--- /dev/null
+++ b/lib/private/vis_canonicalize.gawk
@@ -0,0 +1,45 @@
+#!/usr/bin/env -S gawk --characters-as-bytes --file
+#
+# Convert lines of vis-encoded content to a bespoke canonical form. After canonicalization, equality checks are trivial.
+# Backslash, space characters, and all characters outside the 95 printable ASCII set are represented using escaped three-digit octal.
+# The remaining characters are not escaped; they represent themselves.
+# Newlines are the record separator and are exempt from replacement, although the escaped special form \n is canonicalized to octal.
+#
+# Input is interpreted as libarchive would interpret it, which accepts escape sequences beyond three-digit octal:
+# * \\, \a, \b, \f, \n, \r, \t, \v have their conventional C meanings
+# * \0 means NUL when not the start of a three-digit octal escape sequence
+# * \s means SPACE
+# * \ is valid as an ordinary backslash when not the start of a valid escape sequence
+#
+# See: https://github.com/libarchive/libarchive/blob/a90e9d84ec147be2ef6a720955f3b315cb54bca3/libarchive/archive_read_support_format_mtree.c#L1942
+
+BEGIN {
+ REPLACE["\\\\"] = "\\134"
+ REPLACE["\\0"] = "\\000"
+ REPLACE["\\a"] = "\\007"
+ REPLACE["\\b"] = "\\010"
+ REPLACE["\\f"] = "\\014"
+ REPLACE["\\n"] = "\\012"
+ REPLACE["\\r"] = "\\015"
+ REPLACE["\\s"] = "\\040"
+ REPLACE["\\t"] = "\\011"
+ REPLACE["\\v"] = "\\013"
+
+ for (i = 0x00; i <= 0xFF; i++) {
+ b = sprintf("%c", i)
+ esc = sprintf("\\%03o", i)
+ if (match(b, /[^[:graph:]]|[\\]/)) {
+ REPLACE[b] = esc
+ REPLACE[esc] = esc
+ } else {
+ REPLACE[esc] = b
+ }
+ }
+}
+
+{
+ n = split($0, verbatim_parts, /[\\][\\0abfnrstv]|[\\][0-3][0-7][0-7]|[^[:graph:]]|[\\]/, replace_parts)
+ for (i = 1; i < n; i++)
+ printf "%s%s", verbatim_parts[i], REPLACE[replace_parts[i]]
+ printf "%s%s", verbatim_parts[n], RT
+}
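`vis_canonicalize.gawk` maps every spelling of a byte that libarchive accepts onto a single spelling, so that string equality implies path equality. The Python sketch below restates that mapping for illustration. It is not part of the change, and the names in it are hypothetical. It assumes `[:graph:]` covers exactly bytes 0x21 to 0x7E (the C locale). Python's `re` takes the first alternative that matches rather than the longest, so the octal alternative is listed first. This reproduces the result gawk gets from leftmost-longest matching.

```python
import re

# Byte denoted by each single-character escape libarchive's mtree reader accepts.
SPECIAL = {ord("\\"): 0x5C, ord("0"): 0x00, ord("a"): 0x07, ord("b"): 0x08,
           ord("f"): 0x0C, ord("n"): 0x0A, ord("r"): 0x0D, ord("s"): 0x20,
           ord("t"): 0x09, ord("v"): 0x0B}

# Octal first, then special forms; any other byte (including a lone backslash)
# falls through to the last alternative, so tokens cover the whole input.
TOKEN = re.compile(rb"\\([0-3][0-7][0-7])|\\([\\0abfnrstv])|(.)", re.DOTALL)


def canonical_byte(value: int) -> bytes:
    # Printable non-space ASCII other than backslash represents itself;
    # everything else is written as a three-digit octal escape.
    if 0x21 <= value <= 0x7E and value != 0x5C:
        return bytes([value])
    return b"\\%03o" % value


def canonicalize(line: bytes) -> bytes:
    out = []
    for m in TOKEN.finditer(line):
        octal, special, raw = m.groups()
        if octal is not None:
            value = int(octal, 8)
        elif special is not None:
            value = SPECIAL[special[0]]
        else:
            value = raw[0]
        out.append(canonical_byte(value))
    return b"".join(out)


if __name__ == "__main__":
    print(canonicalize(rb"space\sin\sname.txt").decode("ascii"))  # space\040in\040name.txt
    print(canonicalize(rb"\151\156\146\157").decode("ascii"))     # info
```

Under this form, the `space\sin\sname.txt` spelling used in Example 16 and the `space\040in\040name.txt` spelling produced by the `mtree` rule compare equal.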
diff --git a/lib/private/vis_escape.gawk b/lib/private/vis_escape.gawk
new file mode 100755
index 000000000..e09db2b71
--- /dev/null
+++ b/lib/private/vis_escape.gawk
@@ -0,0 +1,21 @@
+#!/usr/bin/env -S gawk --characters-as-bytes --file
+#
+# Replace most bytes with their octal escape sequences.
+# Backslashes, newlines, and spaces remain in place to preserve newline-delimited records of space-delimited fields
+# while allowing upstream producers to include these delimiters in vis-encoded content.
+
+BEGIN {
+ # Not all entries in REPLACE will be used but over-inclusion is simpler.
+ for (i = 0x00; i <= 0xFF; i++) {
+ b = sprintf("%c", i)
+ esc = sprintf("\\%03o", i)
+ REPLACE[b] = esc
+ }
+}
+
+{
+ n = split($0, verbatim_parts, /[^[:graph:] \\]/, replace_parts)
+ for (i = 1; i < n; i++)
+ printf "%s%s", verbatim_parts[i], REPLACE[replace_parts[i]]
+ printf "%s%s", verbatim_parts[n], RT
+}
diff --git a/lib/tests/BUILD.bazel b/lib/tests/BUILD.bazel
index 5e98a2a8f..05c2c4fbe 100644
--- a/lib/tests/BUILD.bazel
+++ b/lib/tests/BUILD.bazel
@@ -12,6 +12,7 @@ load(":lists_test.bzl", "lists_test_suite")
load(":paths_test.bzl", "paths_test_suite")
load(":strings_tests.bzl", "strings_test_suite")
load(":utils_test.bzl", "utils_test_suite")
+load("//lib:bats.bzl", "bats_test")
exports_files(["a.js"])
@@ -76,3 +77,30 @@ bzl_library(
srcs = ["generate_outputs.bzl"],
visibility = ["//visibility:public"],
)
+
+genrule(
+ name = "coreutils",
+ toolchains = [
+ "@coreutils_toolchains//:resolved_toolchain",
+ ],
+ outs = ["coreutils_bin"],
+ cmd = "cp $(COREUTILS_BIN) $@",
+)
+
+bats_test(
+ name = "vis_encoding",
+ srcs = ["vis_encoding.bats"],
+ size = "small",
+ data = [
+ "//lib/private:vis_escape.gawk",
+ "//lib/private:unvis.gawk",
+ "//lib/private:vis_canonicalize.gawk",
+ ":coreutils",
+ ],
+ env = {
+ "VIS_ESCAPE": "$(location //lib/private:vis_escape.gawk)",
+ "UNVIS": "$(location //lib/private:unvis.gawk)",
+ "VIS_CANONICALIZE": "$(location //lib/private:vis_canonicalize.gawk)",
+ "COREUTILS": "$(rootpath :coreutils)",
+ },
+)
diff --git a/lib/tests/tar/BUILD.bazel b/lib/tests/tar/BUILD.bazel
index d2b499668..34a454b1f 100644
--- a/lib/tests/tar/BUILD.bazel
+++ b/lib/tests/tar/BUILD.bazel
@@ -227,10 +227,12 @@ assert_tar_listing(
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/",
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/",
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/srcdir/",
+ "-rwxr-xr-x 0 0 0 4 Jan 1 2023 lib/tests/tar/srcdir/Unicode® support?🤞",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/srcdir/info",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/srcdir/pkg",
"-rwxr-xr-x 0 0 0 1 Jan 1 2023 lib/tests/tar/srcdir/space in name.txt",
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/",
+ "-rwxr-xr-x 0 0 0 4 Jan 1 2023 lib/tests/tar/treeartifact/Unicode® support?🤞",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/info",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/pkg",
"-rwxr-xr-x 0 0 0 1 Jan 1 2023 lib/tests/tar/treeartifact/space in name.txt",
@@ -450,6 +452,7 @@ assert_tar_listing(
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/",
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/",
"drwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/",
+ "-rwxr-xr-x 0 0 0 4 Jan 1 2023 lib/tests/tar/treeartifact/Unicode® support?🤞",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/info",
"-rwxr-xr-x 0 0 0 0 Jan 1 2023 lib/tests/tar/treeartifact/pkg",
"-rwxr-xr-x 0 0 0 1 Jan 1 2023 lib/tests/tar/treeartifact/space in name.txt",
@@ -460,8 +463,46 @@ assert_unused_listing(
name = "test_unused_inputs_listed",
actual = ":tar15",
expected = [
+ "lib/tests/tar/unused/Unicode® support?🤞",
"lib/tests/tar/unused/info",
"lib/tests/tar/unused/pkg",
"lib/tests/tar/unused/space in name.txt",
],
)
+
+#############
+# Example 16: custom mtree with alternate escape sequences
+# In explicit or externally-derived mtree specs,
+# there is no need to match the path encoding that would be produced by our mtree macro exactly.
+# All escape sequences supported by bsdtar / libarchive will be understood.
+# This includes \s for SPACE, octal encoding of characters that don't need it, and others.
+# These alternate forms are not necessarily recommended, but they will work.
+
+tar(
+ name = "tar16",
+ srcs = [":treeartifact"],
+ out = "16.tar",
+ compute_unused_inputs = 1,
+ mtree = [
+ r"info uid=0 gid=0 time=1672560000 mode=0755 type=file content=$(location :treeartifact)/\151\156\146\157",
+ r"space\sin\sname.txt uid=0 gid=0 time=1672560000 mode=0755 type=file content=$(location :treeartifact)/space\sin\sname.txt",
+ ],
+)
+
+assert_tar_listing(
+ name = "test_custom_mtree2",
+ actual = ":tar16",
+ expected = [
+ "-rwxr-xr-x 0 0 0 0 Jan 1 2023 info",
+ "-rwxr-xr-x 0 0 0 1 Jan 1 2023 space in name.txt",
+ ],
+)
+
+assert_unused_listing(
+ name = "test_unused_inputs_listed2",
+ actual = ":tar16",
+ expected = [
+ "lib/tests/tar/treeartifact/Unicode® support?🤞",
+ "lib/tests/tar/treeartifact/pkg",
+ ],
+)
diff --git a/lib/tests/tar/asserts.bzl b/lib/tests/tar/asserts.bzl
index 2884e5fd0..81a8aea9d 100644
--- a/lib/tests/tar/asserts.bzl
+++ b/lib/tests/tar/asserts.bzl
@@ -13,7 +13,11 @@ def assert_tar_listing(name, actual, expected):
srcs = [actual],
testonly = True,
outs = ["_{}.listing".format(name)],
- cmd = "$(BSDTAR_BIN) -tvf $(execpath {}) >$@".format(actual),
+ # HACK: under the default and POSIX locales, macOS 15.1 and Ubuntu 22.04 disagree on how Unicode filenames should be printed.
+ # LC_ALL=en_US may not match the filenames' actual encoding, but because it selects a dense 8-bit, single-byte encoding,
+ # the bytes are left alone and the output is consistent enough to assert against.
+ cmd = "LC_ALL=en_US $(BSDTAR_BIN) -tvf $(execpath {}) >$@".format(actual),
+ #
toolchains = ["@bsd_tar_toolchains//:resolved_toolchain"],
)
diff --git "a/lib/tests/tar/srcdir/Unicode\302\256 support?\360\237\244\236" "b/lib/tests/tar/srcdir/Unicode\302\256 support?\360\237\244\236"
new file mode 100644
index 000000000..388e04c99
--- /dev/null
+++ "b/lib/tests/tar/srcdir/Unicode\302\256 support?\360\237\244\236"
@@ -0,0 +1 @@
+💯
\ No newline at end of file
diff --git a/lib/tests/vis_encoding.bats b/lib/tests/vis_encoding.bats
new file mode 100644
index 000000000..5ce6b6123
--- /dev/null
+++ b/lib/tests/vis_encoding.bats
@@ -0,0 +1,362 @@
+# Tests of the vis encoding support scripts.
+#
+# Most test cases make use of the fact that newline characters are passed through verbatim by all of these scripts.
+# For this reason, paragraph-delimited records of newline-delimited fields are a natural framing structure that will
+# be preserved through the encoding/decoding/canonicalizing transformation.
+
+# Try to use utilities from toolchains and avoid dependencies on system utilities as much as possible.
+# This gives us the greatest chance at consistency across platforms.
+basenc() {
+ "$COREUTILS" basenc "$@"
+}
+cat() {
+ "$COREUTILS" cat "$@"
+}
+cp() {
+ "$COREUTILS" cp "$@"
+}
+cut() {
+ "$COREUTILS" cut "$@"
+}
+diff() {
+ # No toolchain diff tool available; rely on system version. `diff` is part of POSIX; it should be available.
+ "$(which diff)" "$@"
+}
+gawk() {
+ # TODO: from toolchain
+ command gawk "$@"
+}
+od() {
+ "$COREUTILS" od "$@"
+}
+paste() {
+ "$COREUTILS" paste "$@"
+}
+tr() {
+ "$COREUTILS" tr "$@"
+}
+
+@test "vis encode passthrough text" {
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/input"
+Newlines (\n), backslashes (\\), spaces (\s), and graphical ASCII ([[:graph:]]) characters are passed through unencoded.
+Upstream encoders should escape the first three in content they feed to the general encoder.
+
+ Newline => \012
+ Backslash => \134
+ Space => \040
+
+These gaps enable our encoder to operate on newline-delimited records of space-delimited fields of vis-encoded content.
+EOF
+
+ gawk -bf "$VIS_ESCAPE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output"
+
+ # Content chosen to pass through encoder unmodified.
+ cp "$BATS_TEST_TMPDIR/input" "$BATS_TEST_TMPDIR/want"
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis encode each byte" {
+ gawk -v OFS="0A" -v ORS="0A0A" '{ $1 = $1; print }' <<'EOF' | basenc --decode --base16 >"$BATS_TEST_TMPDIR/input"
+00 01 02 03 04 05 06 07 08 09 0B 0C 0D 0E 0F
+10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
+20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
+30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
+40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
+50 51 52 53 54 55 56 57 58 59 5A 5B 5D 5E 5F
+60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
+70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
+80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
+90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
+A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
+B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
+C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
+D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
+E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
+F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
+EOF
+
+ gawk -bf "$VIS_ESCAPE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ gawk -v FS='\n' -v RS='\n\n' '
+ NR == rshift(0x00, 4) + 1 { for (i = NF; i > 0x0A; i--) $(i+1) = $(i); $(0x0A+1) = "" } # Newline gap
+ NR == rshift(0x50, 4) + 1 { for (i = NF; i > 0x0C; i--) $(i+1) = $(i); $(0x0C+1) = "" } # Backslash gap
+ { for (i = 1; i <= NF; i++) printf "%4s%s", $(i), i == NF ? ORS : OFS } # Emit table with fixed-width columns.
+ ' <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/want"
+\000 \001 \002 \003 \004 \005 \006 \007 \010 \011 \013 \014 \015 \016 \017
+\020 \021 \022 \023 \024 \025 \026 \027 \030 \031 \032 \033 \034 \035 \036 \037
+ ! " # $ % & ' ( ) * + , - . /
+ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
+ @ A B C D E F G H I J K L M N O
+ P Q R S T U V W X Y Z [ ] ^ _
+ ` a b c d e f g h i j k l m n o
+ p q r s t u v w x y z { | } ~ \177
+\200 \201 \202 \203 \204 \205 \206 \207 \210 \211 \212 \213 \214 \215 \216 \217
+\220 \221 \222 \223 \224 \225 \226 \227 \230 \231 \232 \233 \234 \235 \236 \237
+\240 \241 \242 \243 \244 \245 \246 \247 \250 \251 \252 \253 \254 \255 \256 \257
+\260 \261 \262 \263 \264 \265 \266 \267 \270 \271 \272 \273 \274 \275 \276 \277
+\300 \301 \302 \303 \304 \305 \306 \307 \310 \311 \312 \313 \314 \315 \316 \317
+\320 \321 \322 \323 \324 \325 \326 \327 \330 \331 \332 \333 \334 \335 \336 \337
+\340 \341 \342 \343 \344 \345 \346 \347 \350 \351 \352 \353 \354 \355 \356 \357
+\360 \361 \362 \363 \364 \365 \366 \367 \370 \371 \372 \373 \374 \375 \376 \377
+EOF
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis decode passthrough text" {
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/input"
+All text that is not a 3-digit octal escape sequence is passed through the decoder.
+This includes backslashes (\), even those part of special forms sometimes recognized elsewhere (e.g. \n, \r, \v, \0, etc.).
+EOF
+
+ gawk -bf "$UNVIS" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output"
+
+ # Content chosen to pass through decoder unmodified.
+ cp "$BATS_TEST_TMPDIR/input" "$BATS_TEST_TMPDIR/want"
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis decode passthrough all non-escape-sequence bytes" {
+ tr -d ' \n' <<'EOF' | basenc --decode --base16 >"$BATS_TEST_TMPDIR/input"
+00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
+10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
+20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
+30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
+40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
+50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
+60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
+70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
+80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
+90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
+A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
+B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
+C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
+D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
+E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
+F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
+EOF
+
+ gawk -bf "$UNVIS" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ # Decoded content contains unprintable control characters. Diff the hexdump instead.
+ od -Ax -tx1 <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/want"
+000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
+000010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
+000020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
+000030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
+000040 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
+000050 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
+000060 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
+000070 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
+000080 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
+000090 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
+0000A0 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
+0000B0 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
+0000C0 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
+0000D0 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
+0000E0 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
+0000F0 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff
+000100
+EOF
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis decode all octal escape-sequences" {
+ tr -d ' \n' <<'EOF' >"$BATS_TEST_TMPDIR/input"
+\000 \001 \002 \003 \004 \005 \006 \007 \010 \011 \012 \013 \014 \015 \016 \017
+\020 \021 \022 \023 \024 \025 \026 \027 \030 \031 \032 \033 \034 \035 \036 \037
+\040 \041 \042 \043 \044 \045 \046 \047 \050 \051 \052 \053 \054 \055 \056 \057
+\060 \061 \062 \063 \064 \065 \066 \067 \070 \071 \072 \073 \074 \075 \076 \077
+\100 \101 \102 \103 \104 \105 \106 \107 \110 \111 \112 \113 \114 \115 \116 \117
+\120 \121 \122 \123 \124 \125 \126 \127 \130 \131 \132 \133 \134 \135 \136 \137
+\140 \141 \142 \143 \144 \145 \146 \147 \150 \151 \152 \153 \154 \155 \156 \157
+\160 \161 \162 \163 \164 \165 \166 \167 \170 \171 \172 \173 \174 \175 \176 \177
+\200 \201 \202 \203 \204 \205 \206 \207 \210 \211 \212 \213 \214 \215 \216 \217
+\220 \221 \222 \223 \224 \225 \226 \227 \230 \231 \232 \233 \234 \235 \236 \237
+\240 \241 \242 \243 \244 \245 \246 \247 \250 \251 \252 \253 \254 \255 \256 \257
+\260 \261 \262 \263 \264 \265 \266 \267 \270 \271 \272 \273 \274 \275 \276 \277
+\300 \301 \302 \303 \304 \305 \306 \307 \310 \311 \312 \313 \314 \315 \316 \317
+\320 \321 \322 \323 \324 \325 \326 \327 \330 \331 \332 \333 \334 \335 \336 \337
+\340 \341 \342 \343 \344 \345 \346 \347 \350 \351 \352 \353 \354 \355 \356 \357
+\360 \361 \362 \363 \364 \365 \366 \367 \370 \371 \372 \373 \374 \375 \376 \377
+EOF
+
+ gawk -bf "$UNVIS" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ # Decoded content contains unprintable control characters. Diff the hexdump instead.
+ od -Ax -tx1 <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/want"
+000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
+000010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
+000020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
+000030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
+000040 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
+000050 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
+000060 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
+000070 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
+000080 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
+000090 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
+0000A0 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
+0000B0 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
+0000C0 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
+0000D0 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
+0000E0 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
+0000F0 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff
+000100
+EOF
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis canonicalize passthrough already-canonical" {
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/input.table"
+\000 \001 \002 \003 \004 \005 \006 \007 \010 \011 \012 \013 \014 \015 \016 \017
+\020 \021 \022 \023 \024 \025 \026 \027 \030 \031 \032 \033 \034 \035 \036 \037
+\040 ! " # $ % & ' ( ) * + , - . /
+ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
+ @ A B C D E F G H I J K L M N O
+ P Q R S T U V W X Y Z [ \134 ] ^ _
+ ` a b c d e f g h i j k l m n o
+ p q r s t u v w x y z { | } ~ \177
+\200 \201 \202 \203 \204 \205 \206 \207 \210 \211 \212 \213 \214 \215 \216 \217
+\220 \221 \222 \223 \224 \225 \226 \227 \230 \231 \232 \233 \234 \235 \236 \237
+\240 \241 \242 \243 \244 \245 \246 \247 \250 \251 \252 \253 \254 \255 \256 \257
+\260 \261 \262 \263 \264 \265 \266 \267 \270 \271 \272 \273 \274 \275 \276 \277
+\300 \301 \302 \303 \304 \305 \306 \307 \310 \311 \312 \313 \314 \315 \316 \317
+\320 \321 \322 \323 \324 \325 \326 \327 \330 \331 \332 \333 \334 \335 \336 \337
+\340 \341 \342 \343 \344 \345 \346 \347 \350 \351 \352 \353 \354 \355 \356 \357
+\360 \361 \362 \363 \364 \365 \366 \367 \370 \371 \372 \373 \374 \375 \376 \377
+EOF
+ gawk -v OFS='\n' -v ORS='\n\n' '{ $1 = $1; print }' <"$BATS_TEST_TMPDIR/input.table" >"$BATS_TEST_TMPDIR/input"
+
+ gawk -bf "$VIS_CANONICALIZE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ gawk -v FS='\n' -v RS='\n\n' '
+ { for (i = 1; i <= NF; i++) printf "%4s%s", $(i), i == NF ? ORS : OFS } # Emit table with fixed-width columns.
+ ' <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ # Content chosen to pass through encoder unmodified.
+ cp "$BATS_TEST_TMPDIR/input.table" "$BATS_TEST_TMPDIR/want"
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis canonicalize unnecessarily escaped" {
+ gawk -v OFS='\n' -v ORS='\n\n' '{ $1 = $1; print }' <<'EOF' >"$BATS_TEST_TMPDIR/input"
+ \041 \042 \043 \044 \045 \046 \047 \050 \051 \052 \053 \054 \055 \056 \057
+\060 \061 \062 \063 \064 \065 \066 \067 \070 \071 \072 \073 \074 \075 \076 \077
+\100 \101 \102 \103 \104 \105 \106 \107 \110 \111 \112 \113 \114 \115 \116 \117
+\120 \121 \122 \123 \124 \125 \126 \127 \130 \131 \132 \133 \135 \136 \137
+\140 \141 \142 \143 \144 \145 \146 \147 \150 \151 \152 \153 \154 \155 \156 \157
+\160 \161 \162 \163 \164 \165 \166 \167 \170 \171 \172 \173 \174 \175 \176
+EOF
+
+ gawk -bf "$VIS_CANONICALIZE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ gawk -v FS='\n' -v RS='\n\n' '
+ NR == rshift(0x20 - 0x20, 4) + 1 { for (i = NF; i > 0x00; i--) $(i+1) = $(i); $(0x00+1) = "" } # Space gap
+ NR == rshift(0x50 - 0x20, 4) + 1 { for (i = NF; i > 0x0C; i--) $(i+1) = $(i); $(0x0C+1) = "" } # Backslash gap
+ NR == rshift(0x70 - 0x20, 4) + 1 { for (i = NF; i > 0x0F; i--) $(i+1) = $(i); $(0x0F+1) = "" } # Delete gap
+ { for (i = 1; i <= NF; i++) printf "%1s%s", $(i), i == NF ? ORS : OFS } # Emit table with fixed-width columns.
+ ' <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/want"
+ ! " # $ % & ' ( ) * + , - . /
+0 1 2 3 4 5 6 7 8 9 : ; < = > ?
+@ A B C D E F G H I J K L M N O
+P Q R S T U V W X Y Z [ ] ^ _
+` a b c d e f g h i j k l m n o
+p q r s t u v w x y z { | } ~
+EOF
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis canonicalize unescaped" {
+ gawk -v OFS='0A' -v ORS='0A0A' '{ $1 = $1; print }' <<'EOF' | basenc --decode --base16 >"$BATS_TEST_TMPDIR/input"
+00 01 02 03 04 05 06 07 08 09 0B 0C 0D 0E 0F
+10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
+20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
+30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
+40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
+50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
+60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
+70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
+80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
+90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
+A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
+B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
+C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
+D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
+E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
+F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
+EOF
+
+ gawk -bf "$VIS_CANONICALIZE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output.raw"
+
+ gawk -v FS='\n' -v RS='\n\n' '
+ NR == rshift(0x00, 4) + 1 { for (i = NF; i > 0x0A; i--) $(i+1) = $(i); $(0x0A+1) = "" } # Newline gap
+ { for (i = 1; i <= NF; i++) printf "%4s%s", $(i), i == NF ? ORS : OFS } # Emit table with fixed-width columns.
+ ' <"$BATS_TEST_TMPDIR/output.raw" >"$BATS_TEST_TMPDIR/output"
+
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/want"
+\000 \001 \002 \003 \004 \005 \006 \007 \010 \011 \013 \014 \015 \016 \017
+\020 \021 \022 \023 \024 \025 \026 \027 \030 \031 \032 \033 \034 \035 \036 \037
+\040 ! " # $ % & ' ( ) * + , - . /
+ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
+ @ A B C D E F G H I J K L M N O
+ P Q R S T U V W X Y Z [ \134 ] ^ _
+ ` a b c d e f g h i j k l m n o
+ p q r s t u v w x y z { | } ~ \177
+\200 \201 \202 \203 \204 \205 \206 \207 \210 \211 \212 \213 \214 \215 \216 \217
+\220 \221 \222 \223 \224 \225 \226 \227 \230 \231 \232 \233 \234 \235 \236 \237
+\240 \241 \242 \243 \244 \245 \246 \247 \250 \251 \252 \253 \254 \255 \256 \257
+\260 \261 \262 \263 \264 \265 \266 \267 \270 \271 \272 \273 \274 \275 \276 \277
+\300 \301 \302 \303 \304 \305 \306 \307 \310 \311 \312 \313 \314 \315 \316 \317
+\320 \321 \322 \323 \324 \325 \326 \327 \330 \331 \332 \333 \334 \335 \336 \337
+\340 \341 \342 \343 \344 \345 \346 \347 \350 \351 \352 \353 \354 \355 \356 \357
+\360 \361 \362 \363 \364 \365 \366 \367 \370 \371 \372 \373 \374 \375 \376 \377
+EOF
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u want output
+}
+
+@test "vis canonicalize special forms" {
+ cat <<'EOF' >"$BATS_TEST_TMPDIR/input_want"
+\0 \000
+\ \134
+\\ \134
+\a \007
+\b \010
+\f \014
+\n \012
+\r \015
+\s \040
+\t \011
+\v \013
+EOF
+ cut -f1 <"$BATS_TEST_TMPDIR/input_want" >"$BATS_TEST_TMPDIR/input"
+
+ gawk -bf "$VIS_CANONICALIZE" <"$BATS_TEST_TMPDIR/input" >"$BATS_TEST_TMPDIR/output"
+
+ paste "$BATS_TEST_TMPDIR/input" "$BATS_TEST_TMPDIR/output" >"$BATS_TEST_TMPDIR/input_output"
+
+ cd "$BATS_TEST_TMPDIR"
+ diff -u input_want input_output
+}