Skip to content

vllm plugin#35

Merged
sivanravidos merged 21 commits into
mainfrom
vllm
Jun 21, 2026
Merged

vllm plugin#35
sivanravidos merged 21 commits into
mainfrom
vllm

Conversation

@sivanravidos

@sivanravidos sivanravidos commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Todos for next PR:

  • support pooling types, get it via request argument and move pooling from forward to vlllm Pooler class
  • add classification output in addition to embeddings

@sivanravidos sivanravidos changed the title vllm plugin first draft vllm plugin May 26, 2026
Sivan Ravid and others added 17 commits May 26, 2026 10:03
- refactored and simplified muilti modal support and forward code
- load_weights bug fix
-  Improved
docs throughout to explain the pipeline and batching behavior.
- Downloads checkpoint from source HF repo
- Converts .ckpt to model.safetensors
- Extracts and saves config.json with placeholder for modifications
- Copies tokenizer files and README
- Adds reference to original model in README
- Uploads to new HF repository
- Reduced from 306 to 143 lines
- Simplified logic and removed verbose print statements
- Cleaner error handling with try/except passes
- More Pythonic file operations
- Maintained all functionality
More descriptive name that clearly indicates the script's purpose
Sets architectures to ['BiomedRnaForSequenceEmbedding'] which is required
for vLLM plugin to properly register and load the model
- Removed --token argument for better security
- Updated documentation to explain HF_TOKEN setup
- Token won't appear in shell history or process lists
- Follows HuggingFace CLI standard practice
- Removed --- markers that were interpreted as invalid YAML
- Use markdown blockquote for reference instead
- Cleaner, more standard markdown format
Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>
Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>
Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>
Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>
@sivanravidos sivanravidos marked this pull request as ready for review June 11, 2026 07:54
Comment thread vllm/vllm_biomed_rna_plugin/__init__.py Outdated


# Call registration when module is imported
register_biomed_rna_model()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure this function is invoked by vLLM when discovering the plugins you have registered in the vllm.general_plugins entrypoint group. there should be no need to explictly invoke it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines +83 to +85
AutoTokenizer.register(
LlamaForMultiTaskConfig, fast_tokenizer_class=NoOpTokenizer
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would double check why vLLM tries to initialize a tokenizer even if you set skip_tokenizer_init

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pooling entry point constructs an IO processor for the scoring even though we dont use and that processor expects a tokenizer so I get:

File ".../scoring/io_processor.py", line 45, in __init__
    self.tokenizer = self.renderer.get_tokenizer()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../renderers/base.py", line 142, in get_tokenizer
    raise ValueError("Tokenizer not available when `skip_tokenizer_init=True`")
ValueError: Tokenizer not available when `skip_tokenizer_init=True`

maybe this could be fixed in vllm code? or bypassed in another way?

Comment on lines +77 to +81
def pre_process(
self,
prompt: RnaPrompt,
request_id: str | None = None,
**kwargs,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need the IO Processor plugin?

It looks like the pre-process is only creating the attention mask. Other than that data is not really pre-processed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't an IO processor plugin a must if we want online vllm serving for multimodal data? cause the input format is complex, its not just a list of tokens

raise ValueError(f"Cannot find embedding data in output: {type(output)}")

if isinstance(embedding, torch.Tensor):
embedding_list = embedding.cpu().tolist()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data in on cpu already, no need to move it to cpu here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if isinstance(embedding, torch.Tensor):
embedding_list = embedding.cpu().tolist()
else:
embedding_list = list(embedding)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a case where the output from the pooler is not a list?

EmbeddingIdentityPooler always returns a list as far as I can see.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but at this point its PoolingOutput.data which is a Tensor
I will remove the else clause

# ---------------------------------------------------------------------------


class BiomedRnaDummyProcessor:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class ever used?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no:) removed it

},
}

def post_process(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the pre_process. Is this needed at all?

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

@christian-pinto christian-pinto left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks

"execution_count": null,
"metadata": {},
"outputs": [],
"cell_type": "markdown",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this from pr?

@mmdanziger mmdanziger left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! curious to see how it works :)

Sivan Ravid added 2 commits June 21, 2026 06:14
@sivanravidos sivanravidos merged commit c510024 into main Jun 21, 2026
8 of 9 checks passed
@sivanravidos sivanravidos deleted the vllm branch June 21, 2026 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants