Conversation

I am not sure about this PR. I am trying not to change the API without significant advantages. But I'll think more about this.

It would appear this PR makes the code slower for no good reason other than to appear as "safe" as Annex K.

I made this PR because I don't want to append a final \0 to mmap'ed files. A \0 can appear inside a normal file, and the current parser stops parsing at the first \0. Maybe that should be an error, or maybe the \0 should be ignored?
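To illustrate the mmap point, here is a minimal sketch (a hypothetical helper, not MIR code): mmap hands the caller the file's exact bytes as a (pointer, length) pair. There is no \0 after the last byte, and \0 bytes may legally occur inside the data, so treating the mapping as a C string is wrong in both directions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and return its base address; the byte
   length goes into *len_p.  The caller gets a (pointer, length) pair:
   reading base[*len_p] would be out of bounds, and '\0' may occur
   anywhere inside the mapped bytes. */
static char *map_whole_file (const char *path, size_t *len_p) {
  int fd = open (path, O_RDONLY);
  if (fd < 0) return NULL;
  struct stat st;
  if (fstat (fd, &st) != 0 || st.st_size == 0) { close (fd); return NULL; }
  char *base = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close (fd); /* the mapping stays valid after close */
  if (base == MAP_FAILED) return NULL;
  *len_p = (size_t) st.st_size;
  return base;
}
```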

Not all strings are nul-terminated. Raw text files are a common case, as mentioned above. And this comes up a lot in FFI: I've worked with several languages whose strings are passed into C as unterminated (pointer, length) pairs. IIRC both Go and Python do this. And then there's C++'s …

If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max length to a huge value in the existing function so the nul byte will be hit first. (There's no valid reason to have a 0x00 byte in either ASCII or UTF-8 text.)

On the other hand, I see the string API as mostly for debugging, so is it really important to save the overhead of copying the text into a nul-terminated buffer?
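The suggestion above could look roughly like this (a sketch with placeholder names, not MIR's actual implementation): the length-bounded scanner stops at either the size limit or a \0, and the classic nul-terminated entry point just passes a huge limit so the \0 check fires first, avoiding a separate strlen pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded scanner: consume bytes until the length limit
   or a '\0', whichever comes first, and report how much was consumed. */
static size_t scan_string_s (const char *str, size_t len) {
  size_t i;
  for (i = 0; i < len && str[i] != '\0'; i++) {
    /* ... feed str[i] to the tokenizer here ... */
  }
  return i; /* number of bytes consumed */
}

/* The classic nul-terminated entry point delegates with a huge limit,
   so the '\0' test terminates the loop and no strlen call is needed. */
static size_t scan_string (const char *str) {
  return scan_string_s (str, SIZE_MAX);
}
```

Under this shape the per-character cost is one extra length comparison, which is the overhead the thread is debating.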
I have applied the suggestion in the last commit. All tests pass.

I'm not sure what the best API would be; at least MIR_scan_string_s should be safer. More improvement opportunities: once it is certain that no \0 is in the input string, maybe other code can be simplified too?
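One way to realize that simplification (a sketch with a hypothetical helper, not MIR code): reject embedded \0 bytes once up front with memchr; after that single pass, the tokenizer loops only need the length bound and can drop their per-character nul tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Validate once, up front, that the input contains no embedded '\0'.
   If this holds, downstream scanning loops need only the length bound. */
static bool input_is_nul_free (const char *str, size_t len) {
  return memchr (str, '\0', len) == NULL;
}
```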