Open
Description
Issue Type
Bug
Source
binary
Secretflow Version
secretflow 1.0.0b3
OS Platform and Distribution
Asianux
Python version
3.8.16
Bazel version
No response
GCC/Compiler version
No response
What happened and what you expected to happen.
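Running the 星河杯 (Xinghe Cup) private information retrieval (隐匿查询) code fails on a dataset of 10 million records (1000w). The error says the node is running low on memory and the worker process was killed.
Environment: 4 CPU cores, 8 GB RAM
Swap space: 20 GB
How should the Ray cluster be configured to handle this workload?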
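A rough way to see why the 20 GB of swap probably does not prevent the kill: Ray's memory monitor compares the node's memory usage with a fraction of its total memory (the "kill threshold" set through RAY_memory_usage_threshold in the log). The sketch below assumes Ray's default threshold of 0.95 and assumes that swap is not counted toward total memory; the helper name oom_kill_point_gb is hypothetical.

```python
# Sketch: estimate where Ray's memory monitor starts killing workers on this node.
# Assumptions: the default RAY_memory_usage_threshold is 0.95, and swap does not
# count toward the node's total memory. The helper name is hypothetical.

DEFAULT_THRESHOLD = 0.95  # assumed default of RAY_memory_usage_threshold


def oom_kill_point_gb(total_ram_gb: float, threshold: float = DEFAULT_THRESHOLD) -> float:
    """Return the used-memory level (in GB) above which workers may be killed."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return total_ram_gb * threshold


if __name__ == "__main__":
    ram_gb = 8  # physical RAM reported in this issue; the 20 GB swap is ignored
    print(f"kill point: {oom_kill_point_gb(ram_gb):.2f} GB of {ram_gb} GB RAM")
```

Under these assumptions, the SPURuntime actors together with everything else on the node must stay below roughly 7.6 GB, however much swap is configured.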
Reproduction code to reproduce the issue.
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class name: SPURuntime
The actor is dead because its worker process has died. Worker exit type: NODE_OUT_OF_MEMORY Worker exit detail: Task was killed due to the node running low on memory
Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html.
Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task.
Set max_restarts and max_task_retries to enable retry when the task crashes due to OOM.
To adjust the kill threshold, set the environment variable 'RAY_memory_usage_threshold' when starting Ray.
To disable worker killing, set the environment variable 'RAY_memory_monitor_refresh_ms' to zero.
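The last two suggestions in the log are environment variables that have to be present when the Ray processes start. A minimal sketch, assuming Ray is started by hand with "ray start --head" (an assumption; SecretFlow deployments may also start Ray through ray.init), that builds such an environment and prints the equivalent shell line without executing anything. The helper ray_start_env and the value 0.98 are hypothetical illustrations, not recommended settings.

```python
import os
import shlex


def ray_start_env(memory_usage_threshold=None, disable_worker_killing=False, base_env=None):
    """Return a copy of the environment with Ray's OOM-killer variables set.

    Hypothetical helper: it only prepares variables and never starts Ray.
    """
    env = dict(os.environ if base_env is None else base_env)
    if memory_usage_threshold is not None:
        if not 0.0 < memory_usage_threshold <= 1.0:
            raise ValueError("memory_usage_threshold must be in (0, 1]")
        # Fraction of node memory at which Ray starts killing workers.
        env["RAY_memory_usage_threshold"] = str(memory_usage_threshold)
    if disable_worker_killing:
        # Per the error message, a refresh interval of 0 disables worker killing.
        env["RAY_memory_monitor_refresh_ms"] = "0"
    return env


if __name__ == "__main__":
    env = ray_start_env(memory_usage_threshold=0.98, base_env={})
    cmd = ["ray", "start", "--head"]
    # Print a shell line that could be run by hand; nothing is executed here.
    ray_vars = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(env.items()) if k.startswith("RAY_"))
    print(ray_vars, shlex.join(cmd))
```

Raising the threshold or disabling worker killing only defers the problem: with 8 GB of RAM the operating system may then swap heavily, or its own OOM killer may terminate processes. Adding memory or lowering parallelism, as the log also suggests, addresses the cause.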