Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Performance on LocalFileSystem.ls() if detail=False #1789

Conversation

FrankEssenberger
Copy link
Contributor

Relates to #1788

Only get the infos on a file if they are requested by the user - otherwise return just the path as usual.

@martindurant
Copy link
Member

The failures are not your fault (also seen in #1790 ), I'll fix early next week.

@martindurant
Copy link
Member

The windows failure I think is genuine - we are getting windows path separators inside an otherwise posix path in the ls() code path.

…o-details' into improve-performance-when-using-no-details

# Conflicts:
#	fsspec/implementations/tests/test_local.py
@FrankEssenberger
Copy link
Contributor Author

removed the hard coded / in the tests, I hope it is ok now - I see a different error in the pipeline which could be unrelated I guess.

@martindurant
Copy link
Member

I think using pathlib is the wrong choice for two reasons:

  • it is considerably slower than string manipulation
  • you will get windows paths (in the test), but actually we want the real code to return posix paths always, that's the contract fsspec promises. So instead of fixing the test, we need to fix the code.

Hah, yeah, instead of removing the one errant "\", it made all the separators like that:

  -     'C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-runneradmin\\pytest-0\\test_ls_on_files0\\file_2.json',
  ?        ^^     ^^           ^^       ^^     ^^    ^^                     ^^        ^^                 ^^
  +     'C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_ls_on_files0/file_2.json',

@FrankEssenberger
Copy link
Contributor Author

FrankEssenberger commented Feb 25, 2025

ah I was not aware of the we want the real code to return posix paths always paradime - who uses windows anyway :-). No jokes aside there is the make_path_posix this should do the trick if I use it properly.

@martindurant
Copy link
Member

Can you do a speed test, please, and then I'll merge this. I ask because make_path_posix isn't free either, but should still be faster than info().

@FrankEssenberger
Copy link
Contributor Author

Can you do a speed test, please, and then I'll merge this. I ask because make_path_posix isn't free either, but should still be faster than info().

sure. Since the info calls the make_path_posix as well it must be faster. I would assume the path.stat() is the costly thing to do in the info and if I add a very deep structure with a lot of data the aggregation in the test it will take time. What do you think would be a fair setup for such a test? Is there some test creating a representative data layout?

@martindurant
Copy link
Member

I meant a quick and dirty test, perhaps on your home directory - no need to go crazy or to add code.

@martindurant
Copy link
Member

Running the following locally:

In [1]: import fsspec
In [2]: fs = fsspec.filesystem("file")
In [3]: %timeit fs.ls("", detail=False)

I get 85us on this branch and 171us on master. So 👍 .

@martindurant martindurant merged commit 56054c0 into fsspec:master Feb 25, 2025
10 checks passed
@FrankEssenberger
Copy link
Contributor Author

Thanks for taking care of the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants