PrefillPlan tries to allocate more memory than float_workspace_size_in_bytes passed in. #809

Open
@rchardx

Description

Current PrefillPlan interface:

```cpp
template <typename IdType>
inline cudaError_t PrefillPlan(void* float_buffer, size_t float_workspace_size_in_bytes,
                               void* int_buffer, void* page_locked_int_buffer,
                               size_t int_workspace_size_in_bytes, PrefillPlanInfo& plan_info,
                               IdType* qo_indptr_h, IdType* kv_indptr_h, uint32_t total_num_rows,
                               uint32_t batch_size, uint32_t num_qo_heads, uint32_t num_kv_heads,
                               uint32_t head_dim_qk, uint32_t head_dim_vo, uint32_t page_size,
                               bool enable_cuda_graph, uint32_t sizeof_dtype_o,
                               cudaStream_t stream);
```

float_workspace_size_in_bytes is an input parameter, but the caller has no function call or other means of determining in advance how large it needs to be.

Currently, PrefillPlan may attempt to allocate buffers such as batch_prefill_tmp_v and batch_prefill_tmp_s without checking whether they would exceed the available float workspace size.

To prevent this, a function should be added that tells users the required float workspace size before they call PrefillPlan.
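A rough sketch of what I have in mind, to make the request concrete. Everything below is hypothetical and not taken from the library: the names CheckedWorkspace, IllustrativeTmpSizes, RequiredFloatWorkspaceSize, and FitsInFloatWorkspace, the 16-byte alignment, and the size formulas are all assumptions for illustration. The idea is that the same reservation sequence could be run once against an unbounded counter to report the required size, and once against the real capacity to reject an undersized workspace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical bounds-checked bump counter over a workspace of fixed capacity.
// It tracks offsets only, so it can also be used for a "dry run" sizing pass.
class CheckedWorkspace {
 public:
  explicit CheckedWorkspace(size_t capacity_in_bytes) : capacity_(capacity_in_bytes) {}

  // Reserves `size` bytes at the next offset aligned to `alignment` (a power of two).
  // Returns false, leaving the state unchanged, if the reservation does not fit.
  bool Reserve(size_t size, size_t* offset = nullptr, size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t start = (used_ + alignment - 1) & ~(alignment - 1);
    if (start > capacity_ || size > capacity_ - start) return false;
    if (offset != nullptr) *offset = start;
    used_ = start + size;
    return true;
  }

  size_t used() const { return used_; }

 private:
  size_t capacity_;
  size_t used_ = 0;
};

struct TmpBufferSizes {
  size_t tmp_v_bytes;
  size_t tmp_s_bytes;
};

// ASSUMPTION: illustrative formulas only; the real sizes depend on the plan.
inline TmpBufferSizes IllustrativeTmpSizes(uint32_t num_qo_heads, uint32_t padded_rows,
                                           uint32_t head_dim_vo, uint32_t sizeof_dtype_o) {
  size_t rows = static_cast<size_t>(num_qo_heads) * padded_rows;  // widen before multiplying
  return {rows * head_dim_vo * sizeof_dtype_o, rows * sizeof(float)};
}

// Dry run: same reservation order as the checked pass, but with no capacity limit.
inline size_t RequiredFloatWorkspaceSize(const TmpBufferSizes& s) {
  CheckedWorkspace dry_run(std::numeric_limits<size_t>::max());
  dry_run.Reserve(s.tmp_v_bytes);
  dry_run.Reserve(s.tmp_s_bytes);
  return dry_run.used();
}

// Checked pass: reports whether both buffers fit in the caller's workspace.
inline bool FitsInFloatWorkspace(const TmpBufferSizes& s, size_t float_workspace_size_in_bytes) {
  CheckedWorkspace ws(float_workspace_size_in_bytes);
  return ws.Reserve(s.tmp_v_bytes) && ws.Reserve(s.tmp_s_bytes);
}
```

With something like this, a caller could query RequiredFloatWorkspaceSize before allocating float_buffer, and PrefillPlan could return an error instead of writing past the end when the check fails.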
