
@AlexBlueSteele
Contributor

@AlexBlueSteele AlexBlueSteele commented Oct 8, 2025

Ticket: #2186
Test run output:
File index

{
  "_index": "test",
  "_id": "REDACTED",
  "_score": null,
  "_source": {
    "meta": {
      "raw": {
        "X-TIKA:Parsed-By": "org.apache.tika.parser.DefaultParser",
        "X-TIKA:Parsed-By-Full-Set": "org.apache.tika.parser.DefaultParser",
        "resourceName": "test.xml",
        "Content-Type": "application/xml"
      }
    },
    "file": {
      "extension": "xml",
      "content_type": "application/xml",
      "created": "REDACTED",
      "last_modified": "REDACTED",
      "last_accessed": "REDACTED",
      "indexing_date": "REDACTED",
      "filesize": REDACTED,
      "filename": "test.xml",
      "url": "file://REDACTED//test.xml",
      "indexed_chars": REDACTED
    },
    "path": {
      "root": "REDACTED",
      "virtual": "\\REDACTED\\test.xml",
      "real": "\\\\REDACTED\\test.xml"
    },
    "attributes": {
      "owner": "DOMAIN1\\testuser",
      "permissions": 0,
      "acl": [
        {
          "principal": "DOMAIN1\\testuser",
          "type": "ALLOW",
          "permissions": [
            "APPEND_DATA",
            "DELETE",
            "DELETE_CHILD",
            "EXECUTE",
            "READ_ACL",
            "READ_ATTRIBUTES",
            "READ_DATA",
            "READ_NAMED_ATTRS",
            "SYNCHRONIZE",
            "WRITE_ACL",
            "WRITE_ATTRIBUTES",
            "WRITE_DATA",
            "WRITE_NAMED_ATTRS",
            "WRITE_OWNER"
          ]
        },
        {
          "principal": "DOMAIN1\\Test Admins",
          "type": "ALLOW",
          "permissions": [
            "APPEND_DATA",
            "DELETE",
            "DELETE_CHILD",
            "EXECUTE",
            "READ_ACL",
            "READ_ATTRIBUTES",
            "READ_DATA",
            "READ_NAMED_ATTRS",
            "SYNCHRONIZE",
            "WRITE_ACL",
            "WRITE_ATTRIBUTES",
            "WRITE_DATA",
            "WRITE_NAMED_ATTRS",
            "WRITE_OWNER"
          ]
        }
      ]
    }
  },
  "highlight": {
    "path.virtual": [
      "<b>\\REDACTED\\test.xml</b>"
    ],
    "file.filename": [
      "<b>test.xml</b>"
    ]
  },
  "sort": [
    REDACTED
  ]
}

Folder index

    {
        "path": {
            "root": "REDACTED",
            "virtual": "Group3Folder",
            "real": "W:\\Group3Folder"
        },
        "file": {
            "content_type": "text/directory",
            "created": "REDACTED",
            "last_modified": "REDACTED",
            "last_accessed": "REDACTED",
            "filename": "Group3Folder"
        },
        "attributes": {
            "owner": "DOMAIN\\FileShareManagers",
            "permissions": 0,
            "acl": [
                {
                    "principal": "DOMAIN\\group3",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT",
                        "INHERIT_ONLY"
                    ]
                },
                {
                    "principal": "BUILTIN\\Administrators",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT"
                    ]
                },
                {
                    "principal": "DOMAIN\\Domain Users",
                    "type": "ALLOW",
                    "permissions": [
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE"
                    ]
                },
                {
                    "principal": "DOMAIN\\service accounts",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT"
                    ]
                },
                {
                    "principal": "DOMAIN\\FileShareManagers",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ]
                },
                {
                    "principal": "\\CREATOR OWNER",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT",
                        "INHERIT_ONLY"
                    ]
                },
                {
                    "principal": "NT AUTHORITY\\SYSTEM",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT"
                    ]
                },
                {
                    "principal": "DOMAIN\\FileShareManagers",
                    "type": "ALLOW",
                    "permissions": [
                        "APPEND_DATA",
                        "DELETE",
                        "DELETE_CHILD",
                        "EXECUTE",
                        "READ_ACL",
                        "READ_ATTRIBUTES",
                        "READ_DATA",
                        "READ_NAMED_ATTRS",
                        "SYNCHRONIZE",
                        "WRITE_ACL",
                        "WRITE_ATTRIBUTES",
                        "WRITE_DATA",
                        "WRITE_NAMED_ATTRS",
                        "WRITE_OWNER"
                    ],
                    "flags": [
                        "DIRECTORY_INHERIT",
                        "FILE_INHERIT",
                        "INHERIT_ONLY"
                    ]
                }
            ]
        },
        "_id": "REDACTED"
    }

Note

Adds fs.acl_support to collect and index file/folder ACLs, updates mappings/templates, wiring, docs, and tests.

  • Framework/Core:
    • Introduces FileAcl and FsCrawlerUtil#getFileAcls(...); adds ACL extraction for local files.
    • Extends beans.Attributes with acl and beans.Folder with attributes; propagates attributes/ACLs in FsParserAbstract.
    • FileAbstractModel carries acls; FileAbstractorFile fills ACLs (FTP/SSH return empty lists).
  • Settings:
    • New flag fs.acl_support (with defaults, parsing, and a validation warning if attributes_support is false).
  • Elasticsearch:
    • Adds component template fscrawler_mapping_attributes with attributes.acl.{principal,type,permissions,flags}.
    • Includes the attributes component in the folders index template and loads it at startup.
  • Docs:
    • Documents acl_support in the admin guides (index.rst, local-fs.rst, rest.rst) with examples.
  • Tests:
    • Adds ACL-related tests (FsCrawlerUtilTest, JsonUtilTest, settings loader/parser tests) and updates sample configs.

Written by Cursor Bugbot for commit b61843c.
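The summary names FileAcl and FsCrawlerUtil#getFileAcls(...) without showing how an ACL entry becomes an attributes.acl document. A minimal sketch, assuming the local-file extraction reads Java NIO's AclFileAttributeView and converts each java.nio.file.attribute.AclEntry into one entry. The FileAcl record here is a hypothetical stand-in (the PR's class may differ), and sorting enum names alphabetically is an assumption that happens to match the order in the sample output.

```java
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclEntryFlag;
import java.nio.file.attribute.AclEntryPermission;
import java.nio.file.attribute.AclEntryType;
import java.nio.file.attribute.UserPrincipal;
import java.util.List;
import java.util.stream.Collectors;

public final class AclMappingSketch {

    /** Hypothetical stand-in for FileAcl; fields mirror attributes.acl.{principal,type,permissions,flags}. */
    public record FileAcl(String principal, String type, List<String> permissions, List<String> flags) {
    }

    /** Converts one NIO ACL entry into a FileAcl, with enum names sorted alphabetically. */
    static FileAcl toFileAcl(AclEntry entry) {
        List<String> permissions = entry.permissions().stream()
                .map(Enum::name)
                .sorted()
                .collect(Collectors.toList());
        List<String> flags = entry.flags().stream()
                .map(Enum::name)
                .sorted()
                .collect(Collectors.toList());
        return new FileAcl(entry.principal().getName(), entry.type().name(), permissions, flags);
    }

    public static void main(String[] args) {
        // Fake principal for the demo: UserPrincipal's only abstract method is getName().
        UserPrincipal group = () -> "DOMAIN\\group3";
        AclEntry entry = AclEntry.newBuilder()
                .setType(AclEntryType.ALLOW)
                .setPrincipal(group)
                .setPermissions(AclEntryPermission.READ_DATA, AclEntryPermission.EXECUTE)
                .setFlags(AclEntryFlag.INHERIT_ONLY, AclEntryFlag.FILE_INHERIT)
                .build();
        System.out.println(toFileAcl(entry));
    }
}
```

Because the mapping is a pure function over AclEntry, it can be unit-tested on any OS by building entries with AclEntry.newBuilder(), even where the file system exposes no ACL view.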

@dadoonet
Owner

@AlexBlueSteele Do you want to continue on this PR? If so, could you make sure that all the tests are passing locally?

And then I can start reviewing the code ;)

@AlexBlueSteele
Contributor Author

I will work on this. Thanks!

@AlexBlueSteele AlexBlueSteele marked this pull request as draft November 6, 2025 16:03
@sonarqubecloud

sonarqubecloud bot commented Nov 6, 2025

@dadoonet
Owner

dadoonet commented Nov 6, 2025

I'm wondering whether we need the acl_support setting. I believe the existing attributes_support setting is enough. WDYT?

@AlexBlueSteele
Contributor Author

I think it's nice to have, because some people might not care about ACLs when crawling on Windows, and collecting them adds slight storage and performance overhead.
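The trade-off being discussed can be made concrete with a small sketch. It assumes, based on the PR summary's "validation warning if attributes_support is false", that ACLs are only collected when both fs.attributes_support and fs.acl_support are true. The FsFlags record and the warning text are hypothetical, not FsCrawler's actual settings classes.

```java
import java.util.ArrayList;
import java.util.List;

public final class AclSettingsSketch {

    /** Hypothetical subset of the fs settings under discussion. */
    public record FsFlags(boolean attributesSupport, boolean aclSupport) {
    }

    /** Assumption: ACLs are only collected when both flags are enabled. */
    static boolean shouldCollectAcls(FsFlags fs) {
        return fs.attributesSupport() && fs.aclSupport();
    }

    /** Returns validation warnings; the message wording is illustrative only. */
    static List<String> validate(FsFlags fs) {
        List<String> warnings = new ArrayList<>();
        if (fs.aclSupport() && !fs.attributesSupport()) {
            warnings.add("fs.acl_support is ignored because fs.attributes_support is false");
        }
        return warnings;
    }

    public static void main(String[] args) {
        FsFlags fs = new FsFlags(false, true);
        System.out.println(shouldCollectAcls(fs) + " " + validate(fs));
    }
}
```

If the two settings were merged as suggested, shouldCollectAcls would reduce to attributesSupport() and the warning would disappear; keeping both lets users skip the ACL cost while still indexing owner and permissions.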

@AlexBlueSteele
Contributor Author

@dadoonet I updated the demo data above to also reflect ACL collection for the folder index. The last part that I think needs to change is the verbosity of the debug statements; I don't know how verbose you want them. Thoughts?

@dadoonet dadoonet marked this pull request as ready for review November 18, 2025 15:21
Owner

@dadoonet dadoonet left a comment

That's a great start and thank you so much for working on this.
Bonus points for writing some documentation ;)

Could you fix the issues I mentioned before?

Also, IMO the next step would be to add an integration test for this, if possible. Maybe in FsCrawlerTestAttributesIT?

Would that be possible for you?

Also, did you run the code locally, and does it do what you expect?

I'm still wondering whether we really need a separate acl_support setting, or whether we should simply collect all the available metadata on files and folders whenever attributes are requested...

import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
Owner

Copy/paste?

if (acls == null || acls.isEmpty()) {
this.acls = Collections.emptyList();
} else {
this.acls = Collections.unmodifiableList(new ArrayList<>(acls));
Owner

Can't we write this?

this.acls = Collections.unmodifiableList(acls);

Or maybe better:

Replace the whole:

        if (acls == null || acls.isEmpty()) {
            this.acls = Collections.emptyList();
        } else {
            this.acls = Collections.unmodifiableList(new ArrayList<>(acls));
        }

With:

this.acls = acls;
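The three options differ mainly in how the stored list relates to the caller's list. A minimal sketch, using plain strings as hypothetical stand-ins for FileAcl entries and static methods in place of the constructor:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class AclListSemantics {

    // Option 1 (current PR code): defensive copy wrapped as read-only; null or empty becomes emptyList().
    static List<String> defensiveCopy(List<String> acls) {
        if (acls == null || acls.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(new ArrayList<>(acls));
    }

    // Option 2: read-only view; later changes to the caller's list still show through.
    // Throws NullPointerException if acls is null.
    static List<String> readOnlyView(List<String> acls) {
        return Collections.unmodifiableList(acls);
    }

    // Option 3: plain assignment; the field simply shares the caller's list.
    static List<String> direct(List<String> acls) {
        return acls;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("READ_DATA"));
        List<String> copy = defensiveCopy(source);
        List<String> view = readOnlyView(source);
        List<String> shared = direct(source);
        source.add("WRITE_DATA");
        System.out.println(copy.size() + " " + view.size() + " " + shared.size()); // prints "1 2 2"
    }
}
```

If every producer (such as getFileAcls) returns a fresh list that nothing mutates afterwards, plain assignment is enough, which supports the simpler version. On Java 10+, List.copyOf(acls) is another option: it copies into an unmodifiable list but throws NullPointerException for a null list or null elements.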

}
if (fsSettings.getFs().isAclSupport()) {
List<FileAcl> fileAcls = fileAbstractModel.getAcls();
if (!fileAcls.isEmpty()) {
Owner

Why do we need this check? Can't we just use whatever fileAbstractModel.getAcls() returns?

path = new Path();
file = new File();
file.setContentType(CONTENT_TYPE);
attributes = null;
Owner

Not sure it's needed.

file.setLastModified(localDateTimeToDate(modification));
file.setCreated(localDateTimeToDate(creation));
file.setLastAccessed(localDateTimeToDate(lastAccess));
attributes = null;
Owner

Not sure it's needed.

System.out.println("[ACL DEBUG] Entry -> principal=" + principal + ", type=" + type + ", permissions=" + permissions + ", flags=" + flags);
}

return Collections.unmodifiableList(result);
Owner

Just return the result.

result.add(new FileAcl(principal, type, permissions, flags));
System.out.println("[ACL DEBUG] Entry -> principal=" + principal + ", type=" + type + ", permissions=" + permissions + ", flags=" + flags);
}

Owner

Add something like:

logger.debug("ACL entries found for [{}]: {}", file, result);


return Collections.unmodifiableList(result);
} catch (Exception e) {
System.out.println("[ACL DEBUG] Failed to resolve ACLs for " + file.getAbsolutePath() + ": " + e.getMessage());
Owner

Remove this line

} catch (Exception e) {
System.out.println("[ACL DEBUG] Failed to resolve ACLs for " + file.getAbsolutePath() + ": " + e.getMessage());
logger.warn("Failed to determine 'acl' of {}: {}", file, e.getMessage());
return Collections.emptyList();
Owner

Also add logger.debug("Failed to determine acl for [{}]", file, e);
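The logging suggestions (a debug line listing the entries on success, a short warning with only the exception message on failure, plus a debug line carrying the full stack trace) could fit together as below. A minimal sketch, assuming extraction via NIO's AclFileAttributeView; readAcl is a hypothetical name. It uses the JDK's System.Logger only so that it compiles without dependencies, so its placeholders are MessageFormat-style {0}/{1} rather than the {} style in the snippets above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclFileAttributeView;
import java.util.Collections;
import java.util.List;

public final class AclReadSketch {

    private static final System.Logger LOGGER = System.getLogger(AclReadSketch.class.getName());

    /** Reads the raw ACL of a file, returning an empty list instead of failing. */
    static List<AclEntry> readAcl(Path file) {
        AclFileAttributeView view = Files.getFileAttributeView(file, AclFileAttributeView.class);
        if (view == null) {
            // The file system provider exposes no ACL view (e.g. the default provider on Linux).
            LOGGER.log(System.Logger.Level.DEBUG, "No ACL view available for [{0}]", file);
            return Collections.emptyList();
        }
        try {
            List<AclEntry> result = view.getAcl();
            LOGGER.log(System.Logger.Level.DEBUG, "ACL entries found for [{0}]: {1}", file, result);
            return result;
        } catch (IOException | SecurityException e) {
            // Short message at WARNING; the full stack trace only at DEBUG.
            LOGGER.log(System.Logger.Level.WARNING, "Failed to determine acl of {0}: {1}", file, e.getMessage());
            LOGGER.log(System.Logger.Level.DEBUG, "Failed to determine acl for [" + file + "]", e);
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        Path target = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(readAcl(target).size() + " ACL entries for " + target.toAbsolutePath());
    }
}
```

Returning an empty list both when no ACL view exists and when reading fails keeps callers free of null checks, and no System.out.println is needed for diagnostics.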


@Test
public void aclEntries() {
assertThat(FsCrawlerUtil.getFileAcls(file)).isNotNull();
Owner

Maybe this is not going to work on Linux?
I did not test your code yet ;)

But ideally, if we are running on Windows (see lines 56-58), we could check whether the file has some ACL entries and then test that the output of getFileAcls is what we expect?
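One way to structure that Windows-only check is to compute the expected principals straight from the JDK when running on Windows and skip otherwise. A minimal sketch in plain Java; the os.name test is an assumption about what the existing lines 56-58 do, and the comparison against getFileAcls is left to the real test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclFileAttributeView;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public final class AclTestGateSketch {

    /** Assumed OS check; the existing test's exact code is not shown in this thread. */
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Principals the JDK reports for the file, or Optional.empty() when the check should be skipped. */
    static Optional<List<String>> expectedPrincipals(Path file) {
        if (!isWindows()) {
            return Optional.empty();
        }
        AclFileAttributeView view = Files.getFileAttributeView(file, AclFileAttributeView.class);
        if (view == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(view.getAcl().stream()
                    .map(entry -> entry.principal().getName())
                    .toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Path.of(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        expectedPrincipals(file).ifPresentOrElse(
                principals -> System.out.println("Expected principals: " + principals),
                () -> System.out.println("Not on Windows (or no ACL view): skip the ACL assertions"));
    }
}
```

In the actual test, the Optional would gate the assertions: when present, compare it with the principals returned by FsCrawlerUtil.getFileAcls(file); when empty, the test can only check that the result is not null, as aclEntries() currently does.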

@AlexBlueSteele
Contributor Author

I will work on this as soon as I get the chance! Thanks for taking the time to review!!
