-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Fix #3171 : 去掉生成绝对路径函数中的窄字符WindowsAPI调用,改用宽字符API和C++17 std::filesystem来避免win平台中文路径问题 #3255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 4 commits
391b62a
750f6a3
3e6f259
25c5a4e
f27c320
43db5d0
01881d3
989488c
853d5fa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -13,16 +13,30 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||
| #ifdef _WIN32 | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <windows.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #else | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <sys/stat.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <unistd.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <limits.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <stdlib.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| #endif | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include "sherpa-onnx/csrc/macros.h" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| namespace sherpa_onnx { | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| std::wstring ToWideString(const std::string &s); | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| std::string ToString(const std::wstring &s); | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
23
to
+26
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| namespace sherpa_onnx { | |
| std::wstring ToWideString(const std::string &s); | |
| std::string ToString(const std::wstring &s); | |
| #include "sherpa-onnx/csrc/text-utils.h" | |
| namespace sherpa_onnx { |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe which function can throw in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
谢谢,确实触发概率太低,我将去掉
Outdated
Copilot
AI
Mar 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file now catches std::exception, but it does not include <exception>. Add the direct standard header to avoid relying on transitive includes.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition is toooooo complicated. Please use multilines of code to simplify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好滴
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Examine the file around lines 73-81
nl -ba sherpa-onnx/csrc/file-utils.cc | sed -n '65,90p'Repository: k2-fsa/sherpa-onnx
Length of output: 102
🏁 Script executed:
#!/bin/bash
# Examine the file around lines 73-81 using cat
cat -n sherpa-onnx/csrc/file-utils.cc | sed -n '65,90p'Repository: k2-fsa/sherpa-onnx
Length of output: 939
🏁 Script executed:
#!/bin/bash
# Look at the start of the file for includes and to understand context
head -n 64 sherpa-onnx/csrc/file-utils.ccRepository: k2-fsa/sherpa-onnx
Length of output: 1509
🏁 Script executed:
#!/bin/bash
# Check the actual definitions of these constants in Windows
python3 << 'EOF'
# On 64-bit systems:
SIZE_MAX_64 = (1 << 64) - 1 # 2^64 - 1
MAXDWORD = (1 << 32) - 1 # 2^32 - 1 = 4,294,967,295
print(f"SIZE_MAX on 64-bit: {SIZE_MAX_64:,} bytes ({SIZE_MAX_64 / (1024**4):.1f} TB)")
print(f"MAXDWORD (ReadFile limit): {MAXDWORD:,} bytes ({MAXDWORD / (1024**3):.2f} GB)")
print(f"\nIf a file is 5GB:")
print(f" - Passes SIZE_MAX check: YES")
print(f" - Allocated buffer size: 5GB")
print(f" - ReadFile call truncates to: {MAXDWORD:,} bytes")
print(f" - bytes_read: {MAXDWORD:,} bytes")
print(f" - buffer.size(): 5,368,709,120 bytes")
print(f" - bytes_read != buffer.size(): TRUE (causes return {})")
print(f"\nResult: File silently fails to read, appearing as read failure")
EOFRepository: k2-fsa/sherpa-onnx
Length of output: 281
🏁 Script executed:
#!/bin/bash
# Check the exact definitions of these constants in Windows
python3 << 'EOF'
# On 64-bit systems:
SIZE_MAX_64 = (1 << 64) - 1 # 2^64 - 1
MAXDWORD = (1 << 32) - 1 # 2^32 - 1 = 4,294,967,295
print(f"SIZE_MAX on 64-bit: {SIZE_MAX_64:,} bytes ({SIZE_MAX_64 / (1024**4):.1f} TB)")
print(f"MAXDWORD (ReadFile limit): {MAXDWORD:,} bytes ({MAXDWORD / (1024**3):.2f} GB)")
print(f"\nIf a file is 5GB:")
print(f" - Passes SIZE_MAX check: YES")
print(f" - Allocated buffer size: 5GB")
print(f" - ReadFile call truncates to: {MAXDWORD:,} bytes")
print(f" - bytes_read: {MAXDWORD:,} bytes")
print(f" - buffer.size(): 5,368,709,120 bytes")
print(f" - Comparison bytes_read != buffer.size(): TRUE")
print(f"\nResult: File silently fails to read, appearing as read failure")
EOFRepository: k2-fsa/sherpa-onnx
Length of output: 508
Guard against ReadFile DWORD byte-count limit for large files.
Files larger than 4 GB (MAXDWORD) will silently fail to read. At line 80, ReadFile can read at most ~4.3 GB per call, but the code allocates buffers up to SIZE_MAX (~16 exabytes on 64-bit systems) without checking this limit. For files between 4 GB and SIZE_MAX, the cast static_cast<DWORD>(buffer.size()) truncates silently, bytes_read will be less than buffer.size(), and the check at line 81 will reject the file.
Add a guard to reject files exceeding MAXDWORD before allocation:
Proposed fix
- if (!GetFileSizeEx(hFile, &file_size) || file_size.QuadPart > SIZE_MAX) {
+ if (!GetFileSizeEx(hFile, &file_size) || file_size.QuadPart < 0 ||
+ file_size.QuadPart > static_cast<LONGLONG>(SIZE_MAX) ||
+ file_size.QuadPart > static_cast<LONGLONG>(MAXDWORD)) {
return {};
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/file-utils.cc` around lines 73 - 81, The code currently
allocates buffer based on file_size.QuadPart and calls ReadFile with a DWORD
length which will overflow for files > MAXDWORD; add a guard after GetFileSizeEx
to check if file_size.QuadPart > MAXDWORD and return {} (or handle error) before
allocating the std::vector and before casting to DWORD, ensuring ReadFile is
only called with a safe static_cast<DWORD>(buffer.size()) and avoiding silent
truncation; reference GetFileSizeEx, file_size, buffer, ReadFile and MAXDWORD
when making this change.
Copilot
AI
Mar 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows ReadFile casts buffer.size() to DWORD for the ReadFile() call. For files >4GiB this overflows / truncates the requested read size; the function will then fail even though SIZE_MAX is larger. Consider rejecting sizes > DWORD max explicitly or reading in a loop/chunks.
| DWORD bytes_read = 0; | |
| if (!::ReadFile(hFile, buffer.data(), static_cast<DWORD>(buffer.size()), &bytes_read, nullptr) || | |
| bytes_read != buffer.size()) { | |
| return {}; | |
| } | |
| size_t total_read = 0; | |
| while (total_read < buffer.size()) { | |
| size_t remaining = buffer.size() - total_read; | |
| DWORD to_read = remaining > static_cast<size_t>(MAXDWORD) | |
| ? MAXDWORD | |
| : static_cast<DWORD>(remaining); | |
| DWORD bytes_read = 0; | |
| if (!::ReadFile(hFile, buffer.data() + total_read, to_read, &bytes_read, nullptr)) { | |
| return {}; | |
| } | |
| if (bytes_read == 0) { | |
| // Unexpected end of file | |
| return {}; | |
| } | |
| total_read += bytes_read; | |
| } |
Copilot
AI
Mar 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the non-Windows ReadFile path, tellg() can return -1 on failure; constructing std::vector<char> buffer(size) would then attempt a huge allocation due to signed-to-unsigned conversion. Check that size >= 0 (and that the stream is in a good state) before allocating.
Copilot
AI
Mar 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description mentions refactoring ResolveAbsolutePath to use std::filesystem and lexically_normal(), but the implementation still uses GetFullPathNameW/realpath and does not do lexical normalization. Either update the PR description to match the actual approach, or implement the described std::filesystem-based normalization.
Uh oh!
There was an error while loading. Please reload this page.