vllm plugin#35
- Refactored and simplified multimodal support and forward code
- load_weights bug fix
- Improved docs throughout to explain the pipeline and batching behavior
- Downloads checkpoint from source HF repo
- Converts .ckpt to model.safetensors
- Extracts and saves config.json with placeholder for modifications
- Copies tokenizer files and README
- Adds reference to original model in README
- Uploads to new HF repository
- Reduced from 306 to 143 lines
- Simplified logic and removed verbose print statements
- Cleaner error handling with try/except passes
- More Pythonic file operations
- Maintained all functionality
More descriptive name that clearly indicates the script's purpose
Sets architectures to ['BiomedRnaForSequenceEmbedding'] which is required for vLLM plugin to properly register and load the model
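A minimal sketch of the config step described above, assuming the config is a plain JSON dictionary. The helper name `write_vllm_config` and the `hidden_size` field are hypothetical and are not taken from the PR's script; only the architecture name comes from the commit message.

```python
import json
import tempfile
from pathlib import Path

# Architecture name taken from the commit message above.
ARCHITECTURE = "BiomedRnaForSequenceEmbedding"


def write_vllm_config(config: dict, out_dir: Path) -> Path:
    """Write config.json with `architectures` set (hypothetical helper)."""
    updated = dict(config)  # copy so the caller's dict is not mutated
    updated["architectures"] = [ARCHITECTURE]
    out_path = out_dir / "config.json"
    out_path.write_text(json.dumps(updated, indent=2))
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # `hidden_size` is an illustrative value, not the real model's.
    path = write_vllm_config({"hidden_size": 8}, Path(tmp))
    saved = json.loads(path.read_text())
```

Copying the dictionary before editing keeps the original checkpoint config untouched, which may help if the same config is reused elsewhere in a conversion script.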
- Removed --token argument for better security
- Updated documentation to explain HF_TOKEN setup
- Token won't appear in shell history or process lists
- Follows HuggingFace CLI standard practice
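The environment-based token handling above could look roughly like the following. This is a sketch under assumptions: the function name `get_hf_token` and the error message are hypothetical, and the script in this PR may read the token differently.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token from HF_TOKEN, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running the script")
    return token
```

Passing the environment as a parameter is only a convenience for testing; in normal use the function would be called with no arguments and read `os.environ`.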
- Removed --- markers that were interpreted as invalid YAML
- Use markdown blockquote for reference instead
- Cleaner, more standard markdown format
Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>
# Call registration when module is imported
register_biomed_rna_model()
I am pretty sure this function is invoked by vLLM when it discovers the plugins you have registered in the vllm.general_plugins entry point group. There should be no need to invoke it explicitly.
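To illustrate the reviewer's point, here is a hedged sketch that does not import vLLM. `discover_plugins` is a hypothetical helper showing a standard-library entry point lookup, not vLLM's actual loader, and the body of the registration function is a placeholder. Making the hook idempotent means an extra call, whether from an import or from the plugin loader, would be harmless.

```python
from importlib.metadata import entry_points
from typing import List

_REGISTERED = False


def register_biomed_rna_model() -> bool:
    """Idempotent registration hook; returns True only on the first call."""
    global _REGISTERED
    if _REGISTERED:
        return False
    # Placeholder: the real plugin would register its model class here.
    _REGISTERED = True
    return True


def discover_plugins(group: str = "vllm.general_plugins") -> List[str]:
    """List entry point names installed under `group` (hypothetical helper)."""
    eps = entry_points()
    # Newer Python versions expose select(); older ones return a dict.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

With a hook like this, the module-level call can be dropped and the function left for the plugin loader to invoke.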
AutoTokenizer.register(
    LlamaForMultiTaskConfig, fast_tokenizer_class=NoOpTokenizer
)
I would double-check why vLLM tries to initialize a tokenizer even if you set skip_tokenizer_init.
The pooling entry point constructs an IO processor for scoring even though we don't use it, and that processor expects a tokenizer, so I get:
  File ".../scoring/io_processor.py", line 45, in __init__
    self.tokenizer = self.renderer.get_tokenizer()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../renderers/base.py", line 142, in get_tokenizer
    raise ValueError("Tokenizer not available when `skip_tokenizer_init=True`")
ValueError: Tokenizer not available when `skip_tokenizer_init=True`
Maybe this could be fixed in the vLLM code, or bypassed in another way?
def pre_process(
    self,
    prompt: RnaPrompt,
    request_id: str | None = None,
    **kwargs,
Do you really need the IO Processor plugin?
It looks like the pre-process step only creates the attention mask. Other than that, the data is not really pre-processed.
Isn't an IO processor plugin a must if we want online vLLM serving for multimodal data? The input format is complex; it's not just a list of tokens.
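For context on what "only creating the attention mask" amounts to, a minimal pure-Python sketch follows. It is hypothetical: the fields of `RnaPrompt` are not shown in this thread, so the function name `pad_and_mask`, right-side padding, and `pad_id=0` are all assumptions.

```python
from typing import List, Tuple


def pad_and_mask(
    batch: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad sequences to a common length and build a 0/1 attention mask."""
    max_len = max((len(seq) for seq in batch), default=0)
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


ids, mask = pad_and_mask([[5, 6, 7], [8]])
```

If the mask can be derived from the sequence lengths like this, it could in principle be built inside the model's forward pass instead, which is the trade-off the thread is weighing.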
raise ValueError(f"Cannot find embedding data in output: {type(output)}")

if isinstance(embedding, torch.Tensor):
    embedding_list = embedding.cpu().tolist()
The data is already on the CPU, so there is no need to move it to the CPU here.
if isinstance(embedding, torch.Tensor):
    embedding_list = embedding.cpu().tolist()
else:
    embedding_list = list(embedding)
Is there ever a case where the output from the pooler is not a list?
EmbeddingIdentityPooler always returns a list as far as I can see.
But at this point it's PoolingOutput.data, which is a Tensor.
I will remove the else clause.
# ---------------------------------------------------------------------------

class BiomedRnaDummyProcessor:
Is this class ever used?
No :) Removed it.
    },
}

def post_process(
Same comment as for the pre_process. Is this needed at all?
christian-pinto left a comment
LGTM now, thanks
"execution_count": null,
"metadata": {},
"outputs": [],
"cell_type": "markdown",
Can you remove this from the PR?
mmdanziger left a comment
LGTM! Curious to see how it works :)
https://huggingface.co/sivanravid/biomed.rna.llama.47m.wced.multitask.v1.vllm/
https://huggingface.co/sivanravid/biomed.rna.llama.32m.mlm.multitask.v1.vllm/
Converted with scripts/create_vllm_compatible_hf_model_repo.py
Todos for next PR: