
Commit dcd65a3

feat(action-import-pro-doc): add performance description and suggestions (#381)
* feat(action-import-pro-doc): add performance description and suggestions
1 parent ad5640d commit dcd65a3

2 files changed

Lines changed: 48 additions & 0 deletions

docs/en-US/handbook/action-import-pro/index.md

Lines changed: 24 additions & 0 deletions
@@ -25,6 +25,30 @@ After initiating an import, the import process will be executed in a separate ba

After the import is complete, you can view the import results in the import task.

#### About Performance

To evaluate the performance of large-scale data imports, we ran comparative tests across different scenarios, field types, and trigger configurations (results vary with server and database configuration and are for reference only):

| Data Volume | Field Types | Import Configuration | Processing Time |
|------|---------|---------|---------|
| 1 million records | String, number, date, email, long text | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 1 minute |
| 500,000 records | String, number, date, email, long text, many-to-many | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 16 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 22 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: Async notifications<br>• Duplicate identification: None | Approx. 22 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: Async notifications<br>• Duplicate identification: Update duplicates (50,000 duplicate records) | Approx. 3 hours |

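To compare the scenarios more directly, the processing times above can be converted into approximate throughput. The sketch below is only arithmetic on the table's figures. The scenario labels, the function names, and the linear extrapolation in `estimate_minutes` are our own assumptions, and real imports may not scale linearly.

```python
# Figures copied from the table above: (scenario label, records, minutes).
# The labels are shorthand introduced here, not official names.
RESULTS = [
    ("basic fields only", 1_000_000, 1),
    ("with many-to-many", 500_000, 16),
    ("with many-to-many and many-to-one", 500_000, 22),
    ("with relationships and update duplicates", 500_000, 180),
]


def records_per_second(records, minutes):
    """Average throughput implied by one row of the table."""
    return records / (minutes * 60)


def estimate_minutes(target_records, records, minutes):
    """Linear extrapolation from one measured row (an assumption, not a measurement)."""
    return target_records * minutes / records


for label, records, minutes in RESULTS:
    print(f"{label}: ~{records_per_second(records, minutes):,.0f} records/s")
```

On these figures, throughput falls from roughly 16,700 records per second for plain fields to about 380 with both relationship types, and to about 46 when duplicates are updated.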
Based on these test results and the current design, the key factors affecting import performance, together with our recommendations, are as follows:

1. **Duplicate Record Processing Mechanism**: When the **Update Duplicates** or **Update Duplicates Only** option is selected, the system queries and updates records one at a time, which significantly reduces import efficiency. We recommend deduplicating your data with a dedicated tool before importing it, which can substantially shorten the overall processing time.

2. **Relationship Field Processing Efficiency**: The system resolves relationship fields by querying the associated records one row at a time, which becomes a performance bottleneck at large data volumes. For simple relationship structures (such as a one-to-many association between two tables), we recommend a phased import: first import the main table's basic data, then establish the relationships between tables. If business requirements call for importing relationship data in the same pass, refer to the test results above to plan the import time.

3. **Workflow Processing Mechanism**: We do not recommend enabling workflow triggers when importing large volumes of data, for two reasons:
   - When the import task shows 100%, it does not end immediately, because the system still needs time to create workflow execution plans. During this phase the system generates a workflow execution plan for each imported record, which occupies the import thread but does not affect the use of data that has already been imported.
   - After the import task has fully completed, the concurrent execution of a large number of workflows may strain system resources, degrading overall response speed and user experience.

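As an illustration of the deduplication recommended in item 1, the following minimal sketch removes repeated rows from a CSV export before import, using only the Python standard library. The sample data, the `email` key column, and the trim and lowercase normalization are assumptions made for the example. It only removes duplicates inside the file and knows nothing about records already stored in the system.

```python
import csv
import io

# Hypothetical sample data standing in for an export of the import file.
SAMPLE_CSV = """email,name
a@example.com,Alice
b@example.com,Bob
A@example.com,Alice (duplicate)
"""


def dedupe_rows(rows, key_fields):
    """Yield each row once, keeping the first occurrence of every key."""
    seen = set()
    for row in rows:
        # Normalize the key so that case and padding differences still match.
        key = tuple(row[field].strip().lower() for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        yield row


def dedupe_csv(text, key_fields):
    """Parse CSV text and return its rows with duplicate keys removed."""
    reader = csv.DictReader(io.StringIO(text))
    return list(dedupe_rows(reader, key_fields))


unique_rows = dedupe_csv(SAMPLE_CSV, ["email"])
print(len(unique_rows))
```

Keeping the first occurrence is a design choice for this sketch; if later rows should win instead, store rows in a dictionary keyed by the normalized key and overwrite on each repeat.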
These three factors are being considered for optimization in future updates.

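For the phased import suggested in item 2, one possible way to prepare the data is to split the source file into a main-table file without relationship columns and a second file that carries only the key and the relationship columns. This is a sketch under stated assumptions: the column names `order_no`, `amount`, and `customer` are hypothetical, and how the second file is applied afterward depends on your setup and is not covered here.

```python
import csv
import io

# Hypothetical sample rows: "customer" stands in for a relationship column.
SAMPLE_CSV = """order_no,amount,customer
A1,10,Alice
A2,25,Bob
"""


def split_for_phased_import(rows, key_field, relation_fields):
    """Split rows into base records and (key, relation) link records."""
    base_rows, link_rows = [], []
    for row in rows:
        # Phase 1: everything except the relationship columns.
        base_rows.append({k: v for k, v in row.items() if k not in relation_fields})
        # Phase 2: the key plus the relationship columns only.
        link_rows.append({k: row[k] for k in [key_field, *relation_fields]})
    return base_rows, link_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
base_rows, link_rows = split_for_phased_import(rows, "order_no", ["customer"])
print(base_rows)
print(link_rows)
```
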
### Import Settings

#### Import Options - Trigger Workflow

docs/zh-CN/handbook/action-import-pro/index.md

Lines changed: 24 additions & 0 deletions
@@ -25,6 +25,30 @@
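
After the import finishes, you can view the import results in the import task.

#### About Performance

To evaluate the performance of large-scale data imports, we ran comparative tests across different scenarios, field types, and trigger configurations (results may differ under other server and database configurations and are for reference only):
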
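| Data Volume | Field Types | Import Configuration | Processing Time |
|------|---------|---------|---------|
| 1 million records | String, number, date, email, long text | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 1 minute |
| 500,000 records | String, number, date, email, long text, many-to-many | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 16 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: No<br>• Duplicate identification: None | Approx. 22 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: Async notifications<br>• Duplicate identification: None | Approx. 22 minutes |
| 500,000 records | String, number, date, email, long text, many-to-many, many-to-one | • Trigger workflow: Async notifications<br>• Duplicate identification: Update duplicates (50,000 duplicate records) | Approx. 3 hours |
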
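Based on the test results above and some aspects of the current design, the influencing factors and our recommendations are as follows:

1. **Duplicate Record Processing Mechanism**: When the **Update Duplicates** or **Update Duplicates Only** option is selected, the system queries and updates records one at a time, which significantly reduces import efficiency. Unneeded duplicate rows in your Excel file slow the import down even further, so we recommend removing them before importing (for example, with a dedicated deduplication tool) to avoid wasting time.

2. **Relationship Field Processing Efficiency**: The system resolves relationship fields by querying the associated records one row at a time, which becomes a performance bottleneck at large data volumes. For simple relationship structures (such as a one-to-many association between two tables), we recommend a step-by-step import: first import the main table's basic data, then establish the relationships between tables once that is done. If business requirements call for importing relationship data at the same time, refer to the test results in the table above to plan the import time.

3. **Workflow Processing Mechanism**: We do not recommend enabling workflow triggers for large-scale data imports, for two reasons:
   - When the import task shows 100%, it does not end immediately, because the system still needs time to create workflow execution plans. During this phase the system generates a workflow execution plan for each imported record, which occupies the import thread but does not affect the use of data that has already been imported.
   - After the import task has fully completed, the concurrent execution of a large number of workflows may strain system resources, degrading overall response speed and user experience.

These three factors will be considered for further optimization in later releases.
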
### Import Settings

#### Import Options - Trigger Workflow
