vllm plugin by sivanravidos · Pull Request #35 · BiomedSciAI/biomed-multi-omic

sivanravidos · 2026-05-26T10:15:15Z

- General plugin for the biomed-rna model, with multimodal support for the RNA modality.
- IOProcessor plugin for handling the RNA data input and output format.
- Supports both the MLM and WCED variants.
- Uses a copy of the HF model in safetensors format, available here:
  https://huggingface.co/sivanravid/biomed.rna.llama.47m.wced.multitask.v1.vllm/
  https://huggingface.co/sivanravid/biomed.rna.llama.32m.mlm.multitask.v1.vllm/
- Converted with scripts/create_vllm_compatible_hf_model_repo.py.

Todos for next PR:

- Support pooling types: get the type via a request argument and move pooling from forward to the vLLM Pooler class.
- Add classification output in addition to embeddings.

- Refactored and simplified multimodal support and forward code
- load_weights bug fix
- Improved docs throughout to explain the pipeline and batching behavior

- Downloads checkpoint from source HF repo
- Converts .ckpt to model.safetensors
- Extracts and saves config.json with placeholder for modifications
- Copies tokenizer files and README
- Adds reference to original model in README
- Uploads to new HF repository

- Reduced from 306 to 143 lines
- Simplified logic and removed verbose print statements
- Cleaner error handling with try/except passes
- More Pythonic file operations
- Maintained all functionality

More descriptive name that clearly indicates the script's purpose

Sets architectures to ['BiomedRnaForSequenceEmbedding'] which is required for vLLM plugin to properly register and load the model
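A minimal sketch of this config step, assuming the converted repository keeps its settings in a plain `config.json`. The helper name `set_architectures` is hypothetical and is not taken from the conversion script; only the architecture name comes from the commit message.

```python
import json
from pathlib import Path

# Architecture name quoted in the commit message above.
ARCHITECTURE = "BiomedRnaForSequenceEmbedding"


def set_architectures(config_path: Path, architecture: str = ARCHITECTURE) -> dict:
    """Set the `architectures` key in a Hugging Face style config.json.

    Hypothetical helper: all other keys in the file are left untouched.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["architectures"] = [architecture]
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Rewriting the file in place keeps the rest of the extracted config intact, so later manual modifications are not lost.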

- Removed --token argument for better security
- Updated documentation to explain HF_TOKEN setup
- Token won't appear in shell history or process lists
- Follows HuggingFace CLI standard practice
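The environment-variable approach can be sketched as follows. The helper name `get_hf_token` and the error message are assumptions for illustration; the script's actual handling may differ.

```python
import os
from typing import Mapping


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the Hugging Face token from the HF_TOKEN environment variable.

    Hypothetical helper: fails early with a clear message instead of
    accepting the token as a command line argument.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running the upload.")
    return token
```

Because the token is never part of `sys.argv`, it does not end up in shell history or in process listings.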

- Removed --- markers that were interpreted as invalid YAML
- Use markdown blockquote for reference instead
- Cleaner, more standard markdown format
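One way to add such a reference without creating a second `---` block is sketched below. The function name `add_model_reference`, the wording of the note, and the `org/model` style repo id are all assumptions, not the script's actual code.

```python
def add_model_reference(readme: str, source_repo: str) -> str:
    """Insert a blockquote pointing at the source model.

    Hypothetical helper: the note is a Markdown blockquote, so no extra
    `---` markers are written. Existing YAML front matter, if any, stays
    first in the file.
    """
    note = f"> Converted from [{source_repo}](https://huggingface.co/{source_repo}).\n\n"
    if readme.startswith("---\n"):
        close = readme.find("\n---\n", 3)  # closing marker of the front matter
        if close != -1:
            split_at = close + len("\n---\n")
            return readme[:split_at] + "\n" + note + readme[split_at:].lstrip("\n")
    return note + readme
```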

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

christian-pinto · 2026-06-16T08:42:21Z

+
+
+# Call registration when module is imported
+register_biomed_rna_model()


I am pretty sure this function is invoked by vLLM when it discovers the plugins you have registered in the vllm.general_plugins entry point group. There should be no need to invoke it explicitly.
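To illustrate the mechanism behind this comment, here is a small sketch of how entry points in a group can be listed with the standard library. It shows the general entry point mechanism only, not vLLM's loader code; the function name `discover_plugins` is hypothetical.

```python
from importlib.metadata import entry_points

# Group name quoted in the review comment above.
PLUGIN_GROUP = "vllm.general_plugins"


def discover_plugins(group: str = PLUGIN_GROUP) -> dict:
    """Return {entry point name: "module:function"} for installed plugins."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict of groups.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in discover_plugins().items():
        print(f"{name} -> {target}")
```

A host that loads plugins this way typically calls `ep.load()` and then invokes the returned function itself, which is why an additional module-level call would register the model a second time.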

christian-pinto · 2026-06-16T08:43:47Z

+        AutoTokenizer.register(
+            LlamaForMultiTaskConfig, fast_tokenizer_class=NoOpTokenizer
+        )


I would double-check why vLLM tries to initialize a tokenizer even when skip_tokenizer_init is set.

The pooling entry point constructs an IO processor for scoring even though we don't use it, and that processor expects a tokenizer, so I get:

  File ".../scoring/io_processor.py", line 45, in __init__
    self.tokenizer = self.renderer.get_tokenizer()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../renderers/base.py", line 142, in get_tokenizer
    raise ValueError("Tokenizer not available when `skip_tokenizer_init=True`")
ValueError: Tokenizer not available when `skip_tokenizer_init=True`

Maybe this could be fixed in the vLLM code, or bypassed in another way?

christian-pinto · 2026-06-16T12:13:41Z

+    def pre_process(
+        self,
+        prompt: RnaPrompt,
+        request_id: str | None = None,
+        **kwargs,


Do you really need the IO Processor plugin?

It looks like pre_process only creates the attention mask. Other than that, the data is not really pre-processed.

Isn't an IO processor plugin a must if we want online vLLM serving for multimodal data? The input format is complex; it's not just a list of tokens.
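For context, the attention-mask step mentioned in this thread can be sketched in plain Python. This assumes variable-length lists of token ids and a pad id of 0; the function name `pad_and_mask` is hypothetical, and the plugin's real RnaPrompt layout is not shown in this thread.

```python
from typing import List, Tuple

PAD_ID = 0  # assumption: the real pad id depends on the model's vocabulary


def pad_and_mask(
    sequences: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad sequences to a common length and build the attention mask.

    Mask entries are 1 for real tokens and 0 for padding.
    """
    max_len = max((len(s) for s in sequences), default=0)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask
```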

christian-pinto · 2026-06-16T12:15:01Z

+            raise ValueError(f"Cannot find embedding data in output: {type(output)}")
+
+        if isinstance(embedding, torch.Tensor):
+            embedding_list = embedding.cpu().tolist()


The data is already on the CPU, so there is no need to move it to the CPU here.

christian-pinto · 2026-06-16T12:17:59Z

+        if isinstance(embedding, torch.Tensor):
+            embedding_list = embedding.cpu().tolist()
+        else:
+            embedding_list = list(embedding)


Is there ever a case where the output from the pooler is not a list?

EmbeddingIdentityPooler always returns a list as far as I can see.

But at this point it's PoolingOutput.data, which is a Tensor.
I will remove the else clause.

christian-pinto · 2026-06-16T12:19:50Z

+# ---------------------------------------------------------------------------
+
+
+class BiomedRnaDummyProcessor:


Is this class ever used?

No :) Removed it.

christian-pinto · 2026-06-16T12:21:57Z

+            },
+        }
+
+    def post_process(


Same comment as for the pre_process. Is this needed at all?


christian-pinto

LGTM now, thanks

mmdanziger · 2026-06-18T09:49:02Z

-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "cell_type": "markdown",


Can you remove this from the PR?

mmdanziger

lgtm! curious to see how it works :)


vllm plugin first draft

92ce269

sivanravidos changed the title ~~vllm plugin first draft~~ vllm plugin May 26, 2026

Sivan Ravid and others added 17 commits May 26, 2026 10:03

tests and examples update

6e7406a

pre-commit

0cd67f0

cleanups

6f0fd0d

refactor flow and improve documentation

668c617


io plugin, online example and tests refactoring

883d25b

fix ruff on old notebook

0cd817e

Refactor: Make conversion script more elegant and concise

a6b6ae5


Rename script to create_vllm_compatible_hf_model_repo.py

7137d59


Add architectures key to config for vLLM compatibility

aed415a


Use HF_TOKEN env var instead of command line arg

23b51e2


Fix README YAML frontmatter validation error

aeb8b4e


get model from HF repo

0e6caed

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

Merge remote-tracking branch 'origin/vllm' into hf_vllm

dfd38fa

conversion script from ckpt to safetensors

da97cce

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

add model ckpt to safetensors conversion script

5dee03f

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

support MLM model as well

59bb80c

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

sivanravidos marked this pull request as ready for review June 11, 2026 07:54

sivanravidos requested review from christian-pinto and mmdanziger June 11, 2026 07:54

christian-pinto reviewed Jun 16, 2026


cleanups per PR review

e4a5281

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

christian-pinto approved these changes Jun 18, 2026


mmdanziger reviewed Jun 18, 2026


mmdanziger approved these changes Jun 18, 2026


Sivan Ravid added 2 commits June 21, 2026 06:14

update vllm model ref to ibm-research on HF

937a19a

Signed-off-by: Sivan Ravid <sivanra@il.ibm.com>

update vllm model ref to ibm-research on HF

c934826

sivanravidos merged commit c510024 into main Jun 21, 2026
8 of 9 checks passed

sivanravidos deleted the vllm branch June 21, 2026 13:03




3 participants
