Skip to content

Conversation

@pushpavanthar
Copy link

@pushpavanthar pushpavanthar commented Jan 16, 2026

Why are these changes needed?

This PR replaces SHA-1 with SHA-256 across Ray's internal hash operations to address security concerns in #53435 with a conservative approach for runtime environment paths. SHA-1 has known collision vulnerabilities and is deprecated - switching to SHA-256 improves security and FIPS compliance.

Related issue number

Fixes #53435

SHA-256 Migration (15 files)

  • Function collision detection
  • SSH control path hashing
  • Configuration cache keys
  • CloudWatch config hashing (with method renames and UTF-8 encoding fix)
  • File lock paths
  • Deployment code versioning
  • Feature hashing
  • Test files updated

SHA-1 Retained (5 files)

Runtime environment path generation kept SHA-1 to avoid:

  • Path length issues on Windows and network filesystems
  • Breaking existing cached environments
  • Unnecessary environment rebuilds

For content-addressable hashing (non-cryptographic), SHA-1 provides sufficient collision resistance.

Testing

  • All linter checks pass
  • No new errors introduced
  • Zero breaking changes to runtime environments

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly replaces SHA-1 with SHA-256 for internal hashing operations, which is a great security improvement. The changes are mostly straightforward. I've added a few comments to suggest explicitly specifying the encoding as 'utf-8' when calling .encode() on strings before hashing. This ensures consistent behavior across different platforms and avoids potential issues with default encodings.

sha1 = hashlib.sha1()
sha1.update(str(filepath.relative_to(relative_path)).encode())
sha256 = hashlib.sha256()
sha256.update(str(filepath.relative_to(relative_path)).encode())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
sha256.update(str(filepath.relative_to(relative_path)).encode())
sha256.update(str(filepath.relative_to(relative_path)).encode("utf-8"))

Comment on lines +176 to +177
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
ssh_control_hash = hashlib.sha256(cluster_name.encode("utf-8")).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode("utf-8")).hexdigest()

Comment on lines +48 to +49
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
ssh_control_hash = hashlib.sha256(cluster_name.encode("utf-8")).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode("utf-8")).hexdigest()

Comment on lines +122 to +123
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
ssh_control_hash = hashlib.sha256(cluster_name.encode("utf-8")).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode("utf-8")).hexdigest()

Comment on lines +65 to +66
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
ssh_control_hash = hashlib.sha256(cluster_name.encode("utf-8")).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode("utf-8")).hexdigest()

Comment on lines +132 to +133
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
ssh_control_hash = hashlib.sha256(cluster_name.encode()).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode()).hexdigest()
ssh_control_hash = hashlib.sha256(cluster_name.encode("utf-8")).hexdigest()
ssh_user_hash = hashlib.sha256(getuser().encode("utf-8")).hexdigest()

if not group_name:
raise ValueError("group_name is None.")
hexlified_name = hashlib.sha1(group_name.encode()).hexdigest()
hexlified_name = hashlib.sha256(group_name.encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and to avoid platform-dependent behavior, it's better to explicitly specify the encoding as utf-8 when calling .encode().

Suggested change
hexlified_name = hashlib.sha256(group_name.encode()).hexdigest()
hexlified_name = hashlib.sha256(group_name.encode("utf-8")).hexdigest()

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

israbbani and others added 4 commits January 17, 2026 03:54
Pure refactoring. The file isn't used anywhere. Unused code adds to
maintenance burden.

Signed-off-by: irabbani <[email protected]>
Signed-off-by: Purushotham Pushpavanth <[email protected]>
This commit addresses security concerns by migrating all internal
hash operations from SHA-1 to SHA-256 across the Ray codebase.

Fixes ray-project#53435

Signed-off-by: Purushotham Pushpavanth <[email protected]>
## Description

When the autoscaler commits to draining a node, immediately update
`ClusterResourceManager` so the scheduler sees the node as unavailable.

### Why
There's a race condition where work can be scheduled to nodes that are
committed for draining:
1. Autoscaler calls `HandleDrainNode()` -> `SetNodeDraining()` updates
`GcsNodeManager::draining_nodes_`.
2. Scheduler checks `ClusterResourceManager::IsNodeDraining()` which
reads from `NodeResources::is_draining`.
3. But `is_draining` is only updated when the raylet broadcasts via
RaySyncer.

During this window, scheduler sees `is_draining=false` and schedules
work to the draining node. To fix this, `GcsResourceManager` now
registers a listener with `GcsNodeManager` that immediately updates
`ClusterResourceManager` when a node is set to draining. This closes the
race window where work could be scheduled to draining nodes.

---------

Signed-off-by: Sagar Sumit <[email protected]>
Co-authored-by: Zac Policzer <[email protected]>
Signed-off-by: Purushotham Pushpavanth <[email protected]>
Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ changes look irrelevant here -- maybe a bad merge?

I believe some of these hashes are used in the runtime_env directory paths. I'd be concerned that extending their length could cause problems in some environments. I don't think this change in those areas is worth the risk of breakage.

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Jan 17, 2026
This commit reverts runtime_env hash functions back to SHA-1 to address
reviewer concerns about filesystem path length limitations.

Runtime environment hashes are used in:
- Directory paths: /tmp/ray/.../pip/<hash>
- GCS URIs: gcs://_ray_pkg_<hash>.zip
- Conda env names: ray-<hash>

SHA-256 would increase hash length from 40 to 64 characters (+24 chars),
which could cause issues on:
- Windows systems (PATH_MAX = 260 characters)
- Network filesystems with shorter path limits
- Legacy systems with path restrictions

Additionally, changing these hashes would invalidate all existing cached
runtime environments, causing unnecessary rebuilds.

For content-addressable hashing (non-cryptographic use), SHA-1 provides
sufficient collision resistance. SHA-256 is retained for security-critical
operations like SSH control paths, config hashing, and CloudWatch configs.

Files reverted to SHA-1:
- python/ray/_private/runtime_env/pip.py
- python/ray/_private/runtime_env/uv.py
- python/ray/_private/runtime_env/conda.py
- python/ray/_private/runtime_env/conda_utils.py
- python/ray/_private/runtime_env/packaging.py

Related to ray-project#53435

Signed-off-by: Purushotham Pushpavanth <[email protected]>
@pushpavanthar
Copy link
Author

pushpavanthar commented Jan 17, 2026

@edoakes Thank you for the feedback!

The C++ changes look irrelevant here -- maybe a bad merge?

You are right about the bad merge, fixed it by pulling master again.

I believe some of these hashes are used in the runtime_env directory paths. I'd be concerned that extending their length could cause problems in some environments. I don't think this change in those areas is worth the risk of breakage.

Great catch! I've reverted the 5 runtime_env files back to SHA-1 to avoid the path length issues you identified. SHA-256 would add +24 characters to filesystem paths, which could definitely cause problems on Windows and network filesystems, plus it would invalidate all existing cached environments.

@pushpavanthar pushpavanthar requested a review from edoakes January 17, 2026 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace SHA-1 usage with safer alternatives (e.g., SHA-256)

4 participants