Ray error when running the Xinghe Cup private information retrieval (PIR) example code #126

@integrationex01

Description

Issue Type

Bug

Source

binary

Secretflow Version

secretflow 1.0.0b3

OS Platform and Distribution

Asianux

Python version

3.8.16

Bazel version

No response

GCC/Compiler version

No response

What happened and what you expected to happen.

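Running the Xinghe Cup private information retrieval (PIR) code fails with an error (data size: 10 million records). The message says the node was running low on memory and the worker process was killed.
Environment: 4 CPU cores, 8 GB RAM
swap_space: 20 GB
How should the Ray cluster be configured to handle this workload?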

Reproduction code to reproduce the issue.

ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class name: SPURuntime


The actor is dead because its worker process has died. Worker exit type: NODE_OUT_OF_MEMORY Worker exit detail: Task was killed due to the node running low on memory.
Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html.
Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task.
Set max_restarts and max_task_retries to enable retry when the task crashes due to OOM.
To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray.
To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
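
For reference, the two environment variables named in the message above could be prepared as in the sketch below. It is not a verified fix for this workload: the helper names `ray_oom_env` and `kill_threshold_gb` are hypothetical, the value 0.98 is only an illustration, and the sketch assumes Ray is started from the same process (or a child of it) after the variables are set. Raising the threshold or disabling the monitor does not add memory; the job may still be killed by the operating system if it needs more than the node can provide.

```python
import os


def ray_oom_env(usage_threshold=0.95, refresh_ms=250):
    """Build the environment variables mentioned in Ray's OOM message.

    usage_threshold: fraction of node memory at which Ray starts killing
        workers (variable RAY_memory_usage_threshold).
    refresh_ms: memory monitor polling interval; 0 disables worker killing
        (variable RAY_memory_monitor_refresh_ms).
    """
    if not 0.0 < usage_threshold <= 1.0:
        raise ValueError("usage_threshold must be in (0, 1]")
    if refresh_ms < 0:
        raise ValueError("refresh_ms must be >= 0")
    return {
        "RAY_memory_usage_threshold": str(usage_threshold),
        "RAY_memory_monitor_refresh_ms": str(refresh_ms),
    }


def kill_threshold_gb(total_gb, usage_threshold):
    """Approximate memory usage (in GB) at which the monitor would act."""
    return total_gb * usage_threshold


if __name__ == "__main__":
    # Illustrative values for the 8 GB node described in this issue.
    env = ray_oom_env(usage_threshold=0.98)
    # Must be set before Ray is started so the Ray processes inherit it.
    os.environ.update(env)
    print(env)
    print(f"approx. kill threshold: {kill_threshold_gb(8, 0.98):.2f} GB")
```

With an 8 GB node, a threshold of 0.95 corresponds to roughly 7.6 GB of used memory, so a 10-million-record run that needs more than that would likely still require more RAM or a smaller batch of data per run.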
