refactor: Optimize and refactor project by 100gle · Pull Request #44 · 100gle/wordcounter

100gle · 2025-07-17T13:29:13Z

主要改进

本 PR 包含了多项重大优化和重构：

🚀 性能优化

字符计数算法优化: 采用单次遍历算法和直接 Unicode 检查，显著提升性能
现代化 Go 代码: 移除不必要的抽象，提升代码执行效率

🔧 框架重构

从 Fiber 迁移到 Echo: 使用更轻量级和高性能的 Echo 框架
API 设计优化: 改进 API 设计以更好地支持库的使用

✅ 测试改进

单元测试优化: 改进测试用例覆盖率和测试质量

技术细节

单次遍历字符计数算法减少了时间复杂度
Echo 框架提供更好的性能和更简洁的 API
优化的 API 设计使库更易于集成和使用
增强的测试覆盖确保代码质量

这些优化将显著提升应用的性能和可维护性。

- 文件读取优化: * 使用 io.ReadAll 一次性读取整个文件，替代1024字节小缓冲区循环读取 * 避免UTF-8字符在缓冲区边界被分割的问题 * 简化文件读取逻辑，提高大文件处理性能 - 内存分配优化: * 重构 TextCounter.Count 方法，新增 CountBytes 方法 * 直接从字节数组解码UTF-8字符，使用 utf8.DecodeRune 替代 Scanner * 移除不必要的 bufio.Scanner 和 strings.NewReader，减少内存分配 * 优化行数统计，直接扫描换行符而非逐行处理 - 字符统计算法改进: * 保持与原有行为完全一致（不计算换行符为字符） * 正确处理空文件和无换行符文件的行数统计 * 显著减少内存分配和垃圾回收压力所有测试通过，功能完全兼容，为后续并发优化奠定基础

- 重构 DirCounter.Count 方法，采用两阶段处理：先收集文件路径，再并发处理 - 实现工作池模式，限制并发 goroutine 数量为 CPU 核心数，避免过度并发 - 使用带索引的结果收集机制，确保文件处理结果的顺序与原始顺序一致 - 改进错误处理，确保任何文件处理错误都能正确传播 - 显著提高大目录处理性能，同时保持结果的确定性和可预测性

- 新增 errors.go 文件，定义自定义错误类型和错误分类 - 实现 WordCounterError 结构体，支持错误类型、上下文信息和错误链 - 改进所有核心模块的错误处理： * count.go: 更详细的输入验证和错误信息 * file.go: 区分文件不存在和读取错误，提供具体文件路径 * dir.go: 增强模式匹配错误处理，新增 IsIgnoredWithError 方法 * export.go: 完善 CSV 和 Excel 导出的错误处理 * helper.go: 新增 ToAbsolutePathWithError 方法，正确处理路径转换错误 - 改进命令行工具的参数验证和错误提示 - 保持向后兼容性，所有测试通过

- 新增 constants.go: 统一管理常量，包括导出类型、模式、默认值、服务器配置等 - 新增 interfaces.go: 定义核心接口，提高代码抽象层次和可扩展性 * Counter: 统一计数器接口 * CharacterCounter: 字符计数接口 * IgnoreChecker: 忽略检查接口 * Server: 服务器接口 - 新增 cmd_utils.go: 提取命令行公共功能 * CounterExporter: 统一导出处理逻辑，消除代码重复 * 路径和参数验证函数 - 更新现有结构体实现相应接口，提高代码一致性 - 使用常量替换魔法字符串和数字，提高代码可读性 - 优化工作池配置，增加最小/最大工作线程限制 - 保持完全向后兼容，所有测试通过

- 新增 errors_test.go: 测试自定义错误类型的所有功能 * 测试各种错误类型的创建和消息格式 * 测试错误上下文和错误链功能 - 新增 cmd_utils_test.go: 测试命令行工具函数 * 测试路径验证、导出类型验证、模式验证 * 测试 CounterExporter 的导出功能 - 新增 constants_test.go: 测试所有常量定义 * 确保常量值的正确性和一致性 - 扩展 helper_test.go: 测试新增的 ToAbsolutePathWithError 函数 - 新增 cmd/wordcounter/main_test.go: 为命令行包提供基础测试覆盖 * 测试全局变量的设置和读取 * 避免测试 log.Fatal 调用，专注于可测试的部分测试覆盖率显著提升： - 主包: 70.4% → 79.8% - cmd包: 0% → 16.1%

- 新增包级别文档注释，详细说明库的功能和用法 - 完善核心结构体和方法的文档注释： * TextCounter: 添加详细的功能说明和使用示例 * FileCounter: 说明文件处理流程和错误处理机制 * Count/CountBytes: 详细说明字符统计算法和性能优化 - 大幅更新 README.md: * 重新组织功能特性，使用图标和更清晰的描述 * 更新代码示例，包含完整的错误处理 * 新增性能和优化部分，说明并发处理和内存优化 * 添加基准测试数据和可靠性特性说明 * 完善API文档链接和贡献指南 - 改进代码示例的实用性，展示最佳实践 - 所有文档更新保持与最新代码功能同步

- 在export.go中添加通用的导出函数：ExportCounterCSV、ExportCounterExcel、ExportCounterTable - 添加GetHeaderAndRows辅助函数，利用Counter接口统一处理数据 - 简化FileCounter和DirCounter的导出方法，移除重复代码 - 移除不再需要的内部exporter字段和GetHeaderAndRow/GetHeaderAndRows方法 - 更新相关测试以使用新的通用函数 - 所有测试通过，功能保持不变

- 在interfaces.go中添加Exportable接口，统一导出方法签名 - 在main.go中添加handleExport通用函数，处理所有导出逻辑 - 重构runFileCounter和runDirCounter，使用统一的handleExport函数 - 消除了两个函数中重复的switch语句和错误处理逻辑 - 代码更简洁，维护性更好，所有测试通过

主要改进： - 移除Exporter结构体，改为直接使用函数：ExportToCSV、ExportToExcel、ExportToTable - 消除每次导出都创建新table.Writer实例的性能浪费 - 简化CounterExporter，移除不必要的类型断言 - 更新所有相关测试，使用新的函数式API - 代码更符合Go语言惯用法：简单、直接、高效 - 减少内存分配，提升导出性能 - 保持向后兼容性，所有测试通过

主要改进： - 将所有interface{}替换为any（Go 1.18+现代语法） - 移除handleExport函数，恢复直接的switch语句（更符合Go简洁原则） - 添加测试文件忽略规则到.gitignore（*.csv, *.xlsx, test_output*等） - 清理测试生成的临时文件 - 代码更加简洁直接，符合Go语言'少即是多'的哲学 - 所有测试通过，功能完全正常总结：通过这次重构，我们： 1. 消除了FileCounter和DirCounter中的重复导出代码 2. 简化了export.go架构，提升了性能 3. 移除了不必要的抽象层 4. 现代化了Go语法 5. 保持了代码的简洁性和可维护性

- Privatize internal fields in FileCounter and DirCounter structs - Add getter methods for controlled access to private fields - Privatize internal helper functions (toAbsolutePathWithError, convertToSliceOfString, getTotal) - Privatize global export functions, users should use Counter interface methods - Maintain backward compatibility while improving encapsulation - Update all tests to use public API - Add comprehensive API optimization documentation Breaking changes: None (all public APIs remain unchanged) New features: GetStats(), GetFileCounters(), GetIgnoreList() methods

- Replace Fiber v2 with Echo v4 for better Go native HTTP server compatibility - Update server.go: migrate all handlers and routing logic to Echo - Update server_test.go: rewrite tests using httptest for Echo compatibility - Modernize code: replace interface{} with any type alias - Maintain API compatibility: all endpoints and response formats unchanged - All tests passing: verified functionality integrity Benefits: - Better integration with Go standard library net/http - Improved compatibility with Go native HTTP server - Modern Go syntax usage - Maintained backward compatibility

…ct Unicode checks - Replace unicode.In() with direct Unicode range checks for better performance - Implement single-pass processing to count lines and characters simultaneously - Use local variables to reduce struct field access overhead - Add comprehensive isChinese() function covering all CJK Unicode blocks - Improve line counting logic accuracy - Rename TextCounter to Counter for better naming consistency - Achieve zero memory allocation in CountBytes method - Add extensive benchmark tests showing significant performance improvements Performance improvements: - Zero memory allocations (0 B/op, 0 allocs/op) - Faster character classification with direct range checks - Reduced time complexity with single-pass algorithm - Better cache locality with local variables Also update .gitignore to exclude coverage files from repository.

100gle added 14 commits July 17, 2025 21:25

test(coverage): 优化单元测试测试用例

a2c7c18

100gle force-pushed the improve branch from 10baea7 to a2c7c18 Compare July 17, 2025 13:31

100gle changed the title ~~性能优化和框架重构~~ refactor: Optimize and refactor project Jul 17, 2025

100gle merged commit 7b61320 into main Jul 17, 2025
2 checks passed

100gle deleted the improve branch July 17, 2025 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Optimize and refactor project#44

refactor: Optimize and refactor project#44
100gle merged 14 commits into
mainfrom
improve

100gle commented Jul 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

100gle commented Jul 17, 2025

主要改进

🚀 性能优化

🔧 框架重构

✅ 测试改进

技术细节

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant