63 changes: 31 additions & 32 deletions sherpa-onnx/csrc/file-utils.cc
@@ -5,24 +5,28 @@
#include "sherpa-onnx/csrc/file-utils.h"

#include <fstream>
#include <filesystem>
Collaborator

Hello, we now avoid using the `<filesystem>` header. See #2998 for details.

Author

OK, thank you.

Author

Hello, the AI review also flagged a problem with reading large files over 4 GB. Should I consider reading the file in chunks in a loop?
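
For reference, chunked reading could look roughly like the sketch below. It is a minimal illustration, not code from this PR: `ReadFileInChunks` and its `chunk_size` parameter are hypothetical names, and the 1 MiB default is an arbitrary choice. It uses only `std::ifstream`, so it does not depend on `<filesystem>`.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not part of sherpa-onnx): reads a whole file by
// issuing repeated read() calls of at most chunk_size bytes each.
// On Windows, the path would still need the ToWideString() conversion
// used elsewhere in this diff.
std::vector<char> ReadFileInChunks(const std::string &filename,
                                   std::size_t chunk_size = 1 << 20) {
  std::ifstream file(filename, std::ios::binary);
  if (!file.is_open() || chunk_size == 0) {
    return {};
  }

  std::vector<char> data;
  std::vector<char> chunk(chunk_size);
  while (file) {
    file.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    // gcount() reports how many bytes the last read() actually delivered,
    // which is smaller than chunk_size for the final partial chunk.
    std::streamsize n = file.gcount();
    if (n <= 0) {
      break;
    }
    data.insert(data.end(), chunk.begin(), chunk.begin() + n);
  }
  return data;
}
```

Because each `read()` call requests at most `chunk_size` bytes, no single call has to cover the whole file. The trade-off is one extra copy per chunk plus the regrowth of `data` as it fills.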

#include <memory>
#include <sstream>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <limits.h>
#include <stdlib.h>
#endif

#include "sherpa-onnx/csrc/macros.h"

namespace sherpa_onnx {
std::wstring ToWideString(const std::string &s);
} // namespace sherpa_onnx

namespace sherpa_onnx {
medium

These two `namespace sherpa_onnx` blocks are consecutive. Consider merging them into one to improve readability and maintainability. You can remove the namespace closing here and the opening of the next namespace.

Author

Wasn't `u8path` deprecated in C++20? In the long run, shouldn't we avoid using `u8path` here?
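
To illustrate the deprecation concern: `std::filesystem::u8path` is indeed deprecated as of C++20, where a path built from a `char8_t` string is treated as UTF-8 instead. The sketch below shows one way to cover both standards. `PathFromUtf8` is a hypothetical helper, not something in sherpa-onnx, and since the collaborator asked above to avoid `<filesystem>` altogether, it is illustrative only.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of sherpa-onnx): builds a path from a
// std::string that the caller guarantees is UTF-8 encoded.
std::filesystem::path PathFromUtf8(const std::string &s) {
#if defined(__cpp_lib_char8_t)
  // C++20 and later: u8path is deprecated; a char8_t source is always
  // interpreted as UTF-8 by the path constructor.
  return std::filesystem::path(std::u8string(s.begin(), s.end()));
#else
  // C++17: u8path is the standard way to state that the input is UTF-8.
  return std::filesystem::u8path(s);
#endif
}
```

The feature-test macro keeps the same source compiling cleanly under both C++17 and C++20 without triggering the deprecation warning.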


bool FileExists(const std::string &filename) {
#ifdef _WIN32
std::wstring wide_path = ToWideString(filename);
std::ifstream file(wide_path);
return file.good();
#else
return std::ifstream(filename).good();
#endif
}

void AssertFileExists(const std::string &filename) {
@@ -33,7 +37,12 @@ void AssertFileExists(const std::string &filename) {
}

std::vector<char> ReadFile(const std::string &filename) {
#ifdef _WIN32
std::wstring wide_path = ToWideString(filename);
std::ifstream file(wide_path, std::ios::binary | std::ios::ate);
#else
std::ifstream file(filename, std::ios::binary | std::ios::ate);
#endif
if (!file.is_open()) {
return {};
}
@@ -119,33 +128,23 @@ std::string ResolveAbsolutePath(const std::string &path) {
return path;
}

#ifdef _WIN32
// Check if path is already absolute (drive letter or UNC path)
if ((path.size() > 1 && path[1] == ':') ||
(path.size() > 1 && path[0] == '\\' && path[1] == '\\')) {
return path;
}

char buffer[MAX_PATH];
if (GetFullPathNameA(path.c_str(), MAX_PATH, buffer, nullptr)) {
return std::string(buffer);
}

return path; // fallback on failure

#else
// POSIX: absolute paths start with '/'
if (path[0] == '/') {
try {
std::filesystem::path fs_path(path);
high

When a `std::filesystem::path` is constructed from a `std::string`, the encoding behavior is implementation-defined. To explicitly interpret the input string `path` as UTF-8, and so avoid potential encoding problems with non-ASCII characters such as Chinese, consider using `std::filesystem::u8path()`. This makes the intent of the code clearer and the code more robust.

Suggested change
std::filesystem::path fs_path(path);
std::filesystem::path fs_path = std::filesystem::u8path(path);


// If already absolute, return normalized path
if (fs_path.is_absolute()) {
return fs_path.lexically_normal().u8string();
}

// Convert to absolute path and normalize
std::filesystem::path abs_path = std::filesystem::absolute(fs_path);
abs_path = abs_path.lexically_normal();

return abs_path.u8string();
} catch (const std::filesystem::filesystem_error&) {
// If conversion fails, return original path
return path;
}

char buffer[PATH_MAX];
if (realpath(path.c_str(), buffer)) {
return std::string(buffer);
}

return path; // fallback on failure
#endif
}

} // namespace sherpa_onnx