Skip to content

[FileCheck] improve prefix validation #92248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

klensy
Copy link
Contributor

@klensy klensy commented May 15, 2024

First commit fixes already existing check with wrong regex, enforcing actual behavior: "prefix must start with a letter or digit", this should break few tests like:

; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+outline-atomics -O0 | FileCheck %s --check-prefixes=CHECK,-O0

Historically documented was "must start with letter" but not enforced, so tests with prefixes starting with digit was spread across test suites.

Second commit forbids prefixes ends with directive name, like here, to prevent ambiguity:
Should CHECK-EXTAB-NEXT: xxx be parsed as prefix: xxx or this is missing prefix CHECK-EXTAB with -NEXT directive?

// RUN: llvm-objdump -s --triple=armv7a-none-linux-gnueabi %t2 | FileCheck --check-prefix=CHECK-EXTAB-NEXT %s

TODO: mention filecheck_lint.py in FileCheck docs.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 15, 2024

@llvm/pr-subscribers-testing-tools

Author: klensy (klensy)

Changes

First commit fixes already existing check with wrong regex, this should break few tests: like

; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+outline-atomics -O0 | FileCheck %s --check-prefixes=CHECK,-O0

Second commit forbids prefixes ends with directive name, like here, to prevent ambiguity:

// RUN: llvm-objdump -s --triple=armv7a-none-linux-gnueabi %t2 | FileCheck --check-prefix=CHECK-EXTAB-NEXT %s


Full diff: https://github.com/llvm/llvm-project/pull/92248.diff

1 Files Affected:

  • (modified) llvm/lib/FileCheck/FileCheck.cpp (+12-1)
diff --git a/llvm/lib/FileCheck/FileCheck.cpp b/llvm/lib/FileCheck/FileCheck.cpp
index 1719f8ef2b436..0d511611511d2 100644
--- a/llvm/lib/FileCheck/FileCheck.cpp
+++ b/llvm/lib/FileCheck/FileCheck.cpp
@@ -2468,13 +2468,16 @@ FileCheckString::CheckDag(const SourceMgr &SM, StringRef Buffer,
 
 static bool ValidatePrefixes(StringRef Kind, StringSet<> &UniquePrefixes,
                              ArrayRef<StringRef> SuppliedPrefixes) {
+  static const char *Directives[] = {"-NEXT",  "-SAME", "-EMPTY", "-NOT",
+                                     "-COUNT", "-DAG",  "-LABEL"};
+
   for (StringRef Prefix : SuppliedPrefixes) {
     if (Prefix.empty()) {
       errs() << "error: supplied " << Kind << " prefix must not be the empty "
              << "string\n";
       return false;
     }
-    static const Regex Validator("^[a-zA-Z0-9_-]*$");
+    static const Regex Validator("^[a-zA-Z][a-zA-Z0-9_-]*$");
     if (!Validator.match(Prefix)) {
       errs() << "error: supplied " << Kind << " prefix must start with a "
              << "letter and contain only alphanumeric characters, hyphens, and "
@@ -2486,6 +2489,14 @@ static bool ValidatePrefixes(StringRef Kind, StringSet<> &UniquePrefixes,
              << "check and comment prefixes: '" << Prefix << "'\n";
       return false;
     }
+    for (StringRef Directive : Directives) {
+      if (Prefix.ends_with(Directive)) {
+        errs() << "error: supplied " << Kind << " prefix must not ends with "
+               << "directive: '" << Directive << "', prefix: '" << Prefix
+               << "'\n";
+        return false;
+      }
+    }
   }
   return true;
 }

@klensy
Copy link
Contributor Author

klensy commented May 15, 2024

Oh, a lot of failures after 1st commit :-(

RKSimon added a commit that referenced this pull request May 15, 2024
Avoid using numbers as check prefix - replace with actual triple config names where possible
@RKSimon
Copy link
Collaborator

RKSimon commented May 15, 2024

I'll take a look at the X86 failures

RKSimon added a commit that referenced this pull request May 15, 2024
Avoid using numbers as check prefix - replace with actual triple config names
RKSimon added a commit that referenced this pull request May 15, 2024
Avoid using numbers as check prefix - replace with actual triple config names
RKSimon added a commit that referenced this pull request May 15, 2024
…92248

Avoid using numbers as check prefix - replace with actual triple config names
RKSimon added a commit that referenced this pull request May 15, 2024
Avoid using leading numbers in check prefixes - replace with actual triple config names (and makes it easier to add X64 test coverage in a future commit).
Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@@ -2486,6 +2489,14 @@ static bool ValidatePrefixes(StringRef Kind, StringSet<> &UniquePrefixes,
<< "check and comment prefixes: '" << Prefix << "'\n";
return false;
}
for (StringRef Directive : Directives) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this part of the change? Has it caused issues?

If we are doing this, a test should be added to the FileCheck test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a prefix that looks like a prefix + directive is very misleading and can easily lead to bug as the test gets updated. E.g. a prefix CHECK-NEXT could easily go wrong. That sounds like a good change to me but should have a lit test indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has it caused issues?

@klensy reported that
https://github.com/llvm/llvm-project/blame/main/lld/test/ELF/arm-exidx-shared.s used // RUN: llvm-objdump -s --triple=armv7a-none-linux-gnueabi %t2 | FileCheck --check-prefix=CHECK-EXTAB-NEXT %s.

CHECK-EXTAB-NEXT was very confusing. I fixed it.

RKSimon added a commit that referenced this pull request May 15, 2024
…#92248

Don't include "-LABEL" (or any other FileCheck modifier) in the core check prefix name
RKSimon added a commit that referenced this pull request May 15, 2024
Avoid using leading numbers in check prefixes - replace with actual triple config names.
@@ -2468,13 +2468,16 @@ FileCheckString::CheckDag(const SourceMgr &SM, StringRef Buffer,

static bool ValidatePrefixes(StringRef Kind, StringSet<> &UniquePrefixes,
ArrayRef<StringRef> SuppliedPrefixes) {
static const char *Directives[] = {"-NEXT", "-SAME", "-EMPTY", "-NOT",
"-COUNT", "-DAG", "-LABEL"};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be called Suffixes not Directives.

There is a place that checks for some (but not all) cases of multiple suffixes, in the test file. That place should use the same list of suffixes that this function does. Ideally that place would also be enhanced to check for all combinations.

@MaskRay
Copy link
Member

MaskRay commented May 16, 2024

Oh, a lot of failures after 1st commit :-(

During development you can do ninja check-llvm-filecheck to test just llvm/test/FileCheck.

I typically do ninja check-llvm check-clang check-lld before creating a patch, which would give quite good coverage.

More complete coverage requires check-all (or check-bolt check-clang-tools check-flang check-lldb check-mlir ...).

If there many issues, don't feel stressed to fix them all by yourself. You can make a post on https://discourse.llvm.org/ for help.

klensy added 3 commits May 19, 2024 12:49
"prefix must start with a letter and contain only alphanumeric characters, hyphens, and underscores"
But actual prefixes starting with digit quite common, so allow them too, but forbid to start prefix with hyphen and underscore.
@@ -1,5 +1,5 @@
# Comment prefixes plus check directive suffixes are not comment directives
# and are treated as plain text.
# Comment prefixes plus check directive suffixes are forbidden.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There still some unresolved error, working on it

# note: command had no output on stdout or stderr
# error: command failed with exit status: 2

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use force push to rebase (it's against LLVM's guidelines because it makes it harder to review what's changed since the previous version that was reviewed). Instead, use a merge commit from main, if rebasing is really necessary.

for (StringRef Prefix : SuppliedPrefixes) {
if (Prefix.empty()) {
errs() << "error: supplied " << Kind << " prefix must not be the empty "
<< "string\n";
return false;
}
static const Regex Validator("^[a-zA-Z0-9_-]*$");
// TODO: restrict prefixes to start with only letter eventually
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this TODO. We DO NOT want to restrict this: there is no benefit to doing so.

Please move adjusting the error message to match the actual behaviour into a separate PR, as it is not related to "improving prefix validation".

for (StringRef Directive : Suffixes) {
if (Prefix.ends_with(Directive)) {
errs() << "error: supplied " << Kind << " prefix must not end with "
<< "directive: '" << Directive << "', prefix: '" << Prefix
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the variable name and message here to use "suffix", per the previous request.

for (StringRef Prefix : SuppliedPrefixes) {
if (Prefix.empty()) {
errs() << "error: supplied " << Kind << " prefix must not be the empty "
<< "string\n";
return false;
}
static const Regex Validator("^[a-zA-Z0-9_-]*$");
// TODO: restrict prefixes to start with only letter eventually
static const Regex Validator("^[a-zA-Z0-9][a-zA-Z0-9_-]*$");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to change the behaviour? What's wrong with a leading underscore or dash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected for prefixes to be kind-of-ident (which can't start from digits and dashes, but digits actually used, bc no check for that was added, oh, well), and be similar to substitution blocks (which should be "[a-zA-Z_][0-9a-zA-Z_]*", as per doc).

Less restrictions makes it harder to catch possible errors in directives; currently possible typos simply ignored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most languages, identifiers can start with a leading underscore (this is sometimes reserved by the system, but it's still a valid identifier). FileCheck's behaviour essentially treats dashes like underscores (hence why custom check prefixes can contain them).

Have you seen any examples of typos where the problem was a leading underscore/dash/digit?

@klensy
Copy link
Contributor Author

klensy commented May 20, 2024

I'll move second commit to separate PR, as it looks like less controversial (and it simpler to bless tests).

@@ -2468,24 +2468,36 @@ FileCheckString::CheckDag(const SourceMgr &SM, StringRef Buffer,

static bool ValidatePrefixes(StringRef Kind, StringSet<> &UniquePrefixes,
ArrayRef<StringRef> SuppliedPrefixes) {
static const char *Suffixes[] = {"-NEXT", "-SAME", "-EMPTY", "-NOT",
"-COUNT", "-DAG", "-LABEL"};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a place that checks for some (but not all) cases of multiple suffixes. That place should use the same list of suffixes that this function does, that is, there should be one list that is shared between them. Ideally that other place would also be enhanced to check for all combinations, but that can wait for another patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which place exactly? I don't very knowledgeable about codebase.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search for "DAG-NOT" and you will see the place I mean. Looks like it specifically checks for NOT combined with anything else, but not other combinations. FileCheck does not have defined semantics for any combination of suffixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants