static uint32_t configureKernel() {
  uint32_t size;
  // Read the kernel's required dynamic shared memory size from the device symbol.
  cudaMemcpyFromSymbol(&size, smemSize, sizeof(smemSize));
  // Raise the launch limit so kernel_mha can be launched with that much dynamic shared memory.
  cudaFuncSetAttribute(kernel_mha, cudaFuncAttributeMaxDynamicSharedMemorySize, size);
  return size;
}
I believe this will always set the dynamic shared memory size on GPU 0 when the library loads, since the call runs against whichever device is current at that point (device 0 by default). This makes the library difficult to integrate in multi-GPU environments where a single process has access to more than one GPU, because callers have no way to control which device the attribute is applied to.
If the attribute is applied on the wrong GPU, all kernel launches on the other devices fail with a CUDA invalid argument error.
(See flashinfer/csrc/xqa/mha.cu, lines 2589 to 2594 at 9bf007d.)