-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
of::filesystem::path with implicit narrow string conversion #8353
base: master
Are you sure you want to change the base?
Conversation
ha! ok well at this point we're in unchartered territory... especially pasted text, it may have underwent UTF-8 conversion without knowing how MSVC's editor processes the encoding... maybe try an image load from a method that receives a live existing path from a WIN API? that would ensure it's actually wide. also ofImage looks like it's ready to correctly process wide paths down to the so considering you're not crashing with the normal stuff and that you can assign a path to std::string, we can consider this PR ready to be merged, but I'd still wait post 12.1, both to ensure a thorough cleanup around the -FS calls, and allowing a bit of time for git-users to throw things at it. |
One good thing to test are filenames with cyrillic characters. |
Testing with an image from this discussion: (#7435) ( also the same output with the nightly )
Outputs:
The following causes a crash:
Crashes in ofLog(), openFrameworks/libs/openFrameworks/utils/ofLog.h Lines 507 to 511 in 45c3458
|
@NickHardeman good to know that does not crash! @dimitre was your panic from a test typing "someunicodestring🫥" in source code, or from getting an actual filename from WIN? in any case getting your triggering data would be good. |
loading an actual file, Probably one from here, https://github.com/dimitre/ofTests/tree/main/WinEncoding/bin/data |
@NickHardeman can you test @dimitre's file? so we have a good data point vs previous behaviour. at this point we must consider the problem of unicode in general in OF separated from "convenient widepath-to-string conversion on windows" (the goal of this PR, which has been confirmed to be attained). ultimately we should be able to trigger exceptions by generating broken unicode to clarify how it behaves with normal std:filesystem::path, for instance on macOS14 the below non-OF code throws an exception: try {
std::wstring ws = std::wstring{L"\xD800\xDC00"};
std::filesystem::path p{ws};
std::string str = p.string();
std::cout << "Path string: " << str << std::endl;
}
catch (const std::exception& e) {
std::cerr << "Caught exception: " << e.what() << std::endl;
} |
@NickHardeman I think it would be worth testing all the examples in Both with the nightly and with @artificiel's PR applied. Also even something simple like: As I can imagine there are a lot of projects and addons that might do something like that? I am guessing that does break with the current nightly and would be fixed by this PR And thanks for doing all this testing!! |
@ofTheo I think you need an but we know it works because this PR includes a test (line 189 would fail on the nightly): openFrameworks/tests/utils/fileUtils/src/main.cpp Lines 189 to 190 in 8911f00
why I'm suggesting not merging this now is not for the base function, but the more expanded usage (like the virtual interface problem that was raised when this topic was active in October, and certainly other edge cases like vectors). and to be clear the snippet in my post just above will fail everywhere independently of OF as the escaped sequence is not convertible, so it's important to be specific in the tests and expectations of this PR as |
Testing the above in @artificiel's branch and the nightly outputs: Testing the following
In this branch In the nightly: std::filesystem::path p(L"canapés-спациба.jpg"); This branch: Nightly: When I change the following in ofFileUtils.h in this branch, it avoids the crash at least and the image does not load.
and more catches in
|
In this branch:
Output from the nightly
|
OK! the direct w_char constructor should deal with |
All of the examples in input_output seem to function as they should on both the nightly and this branch in regards to file paths, except the following: Both the nightly and this branch have the following error, though not sure it's related to fs::path
|
@artificiel thank you, can you add the extra catches in |
Great advancements @NickHardeman @artificiel |
@NickHardeman I've added the catch to ofPathToString() which is a convenience function to facilitate transition by catching conversion errors. however the scope of this PR is (1) anything that breaks direct use of what is related to actual wide-narrow conversion or internal OF/libs treatment is part of a larger effort, and before that work is attempted it might be ineffective to test too deeply as many errors will just go away when the FS stuff is revised. |
@dimitre we crossed posts! I was just clarifying the scope of this PR, and yes to what you said, but it's a second layer of work and my PR's for this are not ready. |
Thanks @artificiel - I would be in favor of merging this as soon as possible ( before 0.12.1 ) as it will prevent a good amount of issues for Windows users. We can do some follow up PRs if needed for edge cases but might be helpful to have in the nightlies so we can ask people on the forum to test it. |
@artificiel thank you for the clarification on the scope. Thinking of the additional functionality for this PR for Windows, specifically the std::filesystem::path to std::string conversion; it seems like a benefit to include for 0.12.1. Additional PRs could build upon this one, incorporating fixes from the PR @dimitre mentioned. |
@ofTheo @NickHardeman in theory this PR will not fix so many true existing issues as people are currently not sending wide strings through OF (it just does not work). if they start trying funky things between "now and 12.1's release", they will hit the same blocks we know are there but not-handled-yet. another PR is required to clean FileUtils (which was a work-in-progress that stopped in the fall), and most of it is done but not completely thought-out. then an audit of every place OF processes a path should be tested with wide unicode to ensure underlying calls are doing the right things fair task which requires a clean slate of time, and a couple of separate issues/PR (ex: as was noted above |
Thanks @artificiel for the explanation. I see what you mean about the need to clean up ofFileUtils with the ofToDataPathFS using .string() I think my priority for 0.12.1 is not as much to have unicode file paths supported ( which as you mentioned isn't working in 0.12.0 ), but to have what is in the current nightly / master work the same for Windows users as it did before. ( ie: no downgrade / errors with file paths ). To me that should be the priority for 0.12.1, and pushing wide file paths working well on all platforms to 0.13.0 From my understanding the main issue is that currently in the nightly builds any function that returns a file path where it used to return a string will break for Windows users. With that in mind if the goal is to have 0.12.0 like behavior can you see any minimal changes that could allow that without reverting all of the filesystem progress we have in there now? |
@ofTheo ah! but the current nightly does not introduce systematic of::filesystem::path! that's what why the FS-flavored functions are still laying there; there was a moment where it was enabled, then backtracked, which led to this PR to fix things properly then momentum sort of fizzled as "real world windows" was needed to test (what we're now doing). current ofToDataPath() returns a string, as do all path-ish methods: I now get why you want to push it out, but things are currently patched for whatever worked in 12.0 to keep working the same. |
@artificiel thanks!! I think I forgot about that rollback. 🤦♂️ So basically as it is now, on Windows, we'll only get issues if someone passes a wide string in as an argument to ofToDataPath? And in that case it will trigger an exception in |
@ofTheo exactly! all path methods are wrapped in |
@artificiel what do you think would be the best approach for removing ofPathToString()? Keep building/submitting in this PR? |
@nick it would make this PR complicated and hard to validate if "further work" is started. it's really a stepping stone. because OF commits are squashed (which is good IMO) it would also make a very wide single commit as it will touch so many files. once this is committed we can go step by step. so the sooner 12.1 can be tagged, the better! also one thing I hope is OK to do is break down ofFileUtils (it will need a bit of project template tweaks); it's a huge file and the interdependencies can be broken down with a bit of work, but after that it really make working on the classes better: |
BACKGROUND
We want to support
std::filesystem
at is provides many cross-platform filesystem functions, includingstd::filesystem::path
which is a representation of a thing on a filesystem (including knowing intelligently about it’s root drive (ex: C:) and the native separator, if it’s a directory or a file, permissions, etc). Being standardized means the responsability of dealing with the idiosyncrasies of diifferent OSes and filesystems is transferred to the compiler vendors.one great feature of
std::filesystem::path
is the conversion support withstd::(w)string
. combined with the constructor takingstd::(w)string
as an origin path, it means they can be used interchangeably in most «in» and «out» operations: you can pass a string to a method expecting a path (and it will get constructed in place) or you can pass a path to a method expecting a string, and it will get converted by it’s operator, essentially callingpath.string()
. almost awesome.PROBLEM
the standard stipulates that the conversion behaviour be done for the platform-native char type, which means
std::string
on most systems, butstd::wstring
on windows. so what’s written above is not totally true: on windows thestd::string
conversion operator is not implemented… meaning the above code does not compile. it should be written asthis is because on windows, handling file paths as narrow
std::string
is a blasphemy that makes it impossible to support the exotic wide unicode that might be present in a windows file path. however little to no OF code supportswstring
.the «solution» OF is providing is
ofPathToString()
which forces-converts and catches the exceptions, but that requires adding calls toofPathToString()
, and if you are doing that (which is just hiding the blasphemy) you might as well upgrade to supportstd::filesystem::path
to get proper native path support.so what is really needed is to add the conversion to
std::string
on msvc, essentially adding a line around here:https://github.com/microsoft/STL/blob/0d8f517ae3828284fed594741b847db940167a59/stl/inc/filesystem#L921-L923
OBJECTIVE
we want to tolerate the blasphemy of converting wide paths to narrow paths because a lot of historical «cross-platform» OF code carries paths inside
std:::string
.this must be considered as a stopgap «transparent» measure, and anywhere
std::string
is used to contain a path, it should be changed tostd::filesystem::path
.SOLUTION
microsoft has no interest in deviating from the standard in order to support our buggy cross-plaform usage (properly written win32 string-as-path cases are using wstring/wchar and switches between wide and narrow interface calls, often with
#ifdef WIN32
).it is repeated all over the place that the stdlib classes are not designed to be subclassed, so the obvious solution of doing:
is an upwards battle against the lib, which is discouraged.
contrary to other languages, C++ does not allow the refinement of classes after they are declared (for instance Objective-C has Categories, which allow you to graft methods and properties to existing classes dynamically at runtime). so there is no mecanism to specialize
std::filesystem::path
«in place».the solution is to make a new class that composes a
std::filesystem::path
and implements all its interface, forwarding everything to and from the internal member:which is basically this PR.
IMPACT ON EXISTING CODE
for historical reasons OF aliases
std::filesystem
asof::filesystem
, but that reason does not hold anymore. moreover if we want to specializeof::filesystem::path
we cannot «overwrite» it if it’s been imported fromstd::filesystem::path
, nor can we selectively importstd::filesystem::everyhing_but_path
and sneakily slide our own impl in the path gap.the approach taken is to stop importing
std::filesystem
(which is pointless anyway) meaning the core calls likeof::filesystems::exists()
must be corrected tostd::filesystem::exists()
(which has no side effect as it’s already what’s happening, and they’re easy to find as compilation will not work with them).then in ofConstants.h:
which means, for windows, the string conversion is now defined and other platforms behave exactly as they currently are (because they are already using
std::filesystem::path
even if it’s labelledof::filesystem::path
).there are no regression risks on the new functionality as currently none of this is possible. there is a small risk of not handling correctly all possible windows path uses, which can be corrected as we move forward and get feedback from windows people throwing their funny things into
of::filesystem::path
.as was discussed in #8138 there is a risk of virtual interfaces being thrown off in derived classes not supporting
of::filesystem::path
; the solution for backwards-compatibility is to maintain but deprecate thestd::string
interface and define impure (not-required)of::filesystem::paths
methods.finally, there are edge cases where conversion does not kick such as expressions (presuming one wants to transduce a path to a string in order to display a filename):
(these cases such as
ofDrawBitmapString()
should supportstd::wstring
(to display the funky unicode), or more probably be upgraded tostd::u8string
, new kid on the block (C++20), but it's outside the scope of the general mechanism proposed here, which can be expanded later)also, passing paths as strings in vectors
std::vector<std::string> my_files;
— or other indirect typing will not get the conversion treatment, so these edge cases will need fixing.QUESTIONS
should the try-catch of exception (due to rare incompatible exotic unicode) in narrowing be handled internally by
of::filesystem::path
? I have conflicted opinions: spontaneously, yes, to streamline things for users; but it introduces a magic value (return empty string to mean failure) a pattern which should be avoided. this PR keeps it clean (no try-catch to hide the blasphemous behaviour) with the idea that putting it to test will show what it reveals? in theory, the cases triggering that exception (such as throwing an asian path toofImage
) would already occur in current OF, and it might be better to identify these cases and fix the root causes? (and anyhow the end user will be out of luck as the path will be empty).the PR implements a
const char* to_narrow_cstr() const
method defined around amutable std::string cached_narrow_str_;
. it's assembled from internet sources and seems to work, but it would be good to have criticism on the strategy and implementation. (this method is required to supportw_char-to-char
conversion (we talk mostly about strings but everything also applies tochar *
))