
Conversation

@wenxie-amd (Contributor) commented Nov 4, 2025

Add two parameters, both disabled by default:

HSA_KERNARG_POOL_SIZE: With many small operators, the default kernel-argument (kernarg) pool can run out of space, forcing kernel launches to wait and significantly increasing launch latency. Raising the pool size helps most in workloads that launch a large number of small kernels.
ENABLE_NUMA_BINDING: When enabled, binds each GPU's host process to the CPU socket with direct affinity to that GPU, improving CPU efficiency and memory locality. A sketch of how a launcher might apply this binding follows the snippet below.

```bash
# Increase the HSA kernarg pool size to 12 MB (12582912 bytes) for models
# that launch a lot of kernels; disabled (commented out) by default
# export HSA_KERNARG_POOL_SIZE=${HSA_KERNARG_POOL_SIZE:-12582912}

# Enable NUMA binding for better memory locality (may improve stability for large models)
export ENABLE_NUMA_BINDING=${ENABLE_NUMA_BINDING:-0}
```
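For illustration, here is a minimal sketch of how a launcher could honor ENABLE_NUMA_BINDING. This is not the PR's actual implementation: LOCAL_RANK, the rank-to-card mapping, and train.sh are all assumptions made for the example. It reads the GPU's NUMA node from the standard PCI sysfs attribute and, if one is reported, re-execs the trainer under numactl pinned to that node.

```bash
# Hypothetical wrapper sketch (not this PR's implementation): when
# ENABLE_NUMA_BINDING=1, pin the current rank's process to the NUMA node
# closest to its GPU before launching the trainer.
if [ "${ENABLE_NUMA_BINDING:-0}" = "1" ]; then
    GPU_ID=${LOCAL_RANK:-0}  # assumption: rank N drives /dev/dri/cardN
    # numa_node is a standard PCI sysfs attribute; -1 means "no affinity"
    NUMA_NODE=$(cat "/sys/class/drm/card${GPU_ID}/device/numa_node" 2>/dev/null)
    if [ -n "$NUMA_NODE" ] && [ "$NUMA_NODE" -ge 0 ]; then
        exec numactl --cpunodebind="$NUMA_NODE" --membind="$NUMA_NODE" \
            bash train.sh "$@"
    fi
fi
exec bash train.sh "$@"  # fall back to an unbound launch
```

On ROCm systems the GPU-to-NUMA mapping can also be cross-checked with rocm-smi's topology output, which is one way to verify each rank is being pinned to the right socket.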

@wenxie-amd force-pushed the dev/wenx/numa_binding branch from fba890e to aca932a on November 4, 2025 at 10:44
@limou102 (Contributor) left a comment:

Do these two newly added environment variables definitely speed up training? Why aren't they enabled by default, especially since the NUMA setting feels a bit tricky?

@wenxie-amd (Contributor, Author) replied:

> Do these two newly added environment variables definitely speed up training? Why aren't they enabled by default, especially since the NUMA setting feels a bit tricky?

The main branch is currently undergoing testing for the new Docker release, and we're a bit cautious about these two environment variables potentially causing side effects, so we haven't enabled them yet. They're being included as opt-in capabilities, and we'll consider turning them on for testing if specific models need them in the future.

@wenxie-amd merged commit 8be6a6e into main on Nov 4, 2025
3 checks passed