[WIP] [Model] Index tts by divyanshsinghvi · Pull Request #334 · vllm-project/vllm-omni

divyanshsinghvi · 2025-12-16T10:20:31Z

Purpose

Resolves #229. Integrates the index-tts model: https://github.com/index-tts/index-tts

Test Plan

Test Result

divyanshsinghvi · 2025-12-16T10:58:00Z

@hsliuustc0106

I'm thinking of structuring this in 3 stages, like qwen3-omni. But I'm unsure about the files listed below: should I add all of them together in this PR? Or should I find the few that match an existing implementation, replace those, and add the rest with this PR?

├── __pycache__
├── gpt
│   ├── __init__.py
│   ├── conformer
│   │   ├── __init__.py
│   │   ├── attention.py
│   │   ├── embedding.py
│   │   └── subsampling.py
│   ├── conformer_encoder.py
│   ├── model_v2.py
│   └── perceiver.py
├── index_tts.py
├── index_tts_config.py
├── s2mel
│   ├── dac
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── model
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── dac.py
│   │   │   ├── discriminator.py
│   │   │   └── encodec.py
│   │   ├── nn
│   │   │   ├── __init__.py
│   │   │   ├── layers.py
│   │   │   ├── loss.py
│   │   │   └── quantize.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── decode.py
│   │       └── encode.py
│   └── modules
│       ├── audio.py
│       ├── bigvgan
│       │   ├── activations.py
│       │   ├── alias_free_activation
│       │   │   ├── cuda
│       │   │   │   ├── __init__.py
│       │   │   │   ├── activation1d.py
│       │   │   │   ├── anti_alias_activation.cpp
│       │   │   │   ├── anti_alias_activation_cuda.cu
│       │   │   │   ├── compat.h
│       │   │   │   ├── load.py
│       │   │   │   └── type_shim.h
│       │   │   └── torch
│       │   │       ├── __init__.py
│       │   │       ├── act.py
│       │   │       ├── filter.py
│       │   │       └── resample.py
│       │   ├── bigvgan.py
│       │   ├── config.json
│       │   ├── env.py
│       │   ├── meldataset.py
│       │   └── utils.py
│       ├── campplus
│       │   ├── DTDNN.py
│       │   ├── classifier.py
│       │   └── layers.py
│       ├── commons.py
│       ├── diffusion_transformer.py
│       ├── encodec.py
│       ├── flow_matching.py
│       ├── gpt_fast
│       │   ├── generate.py
│       │   ├── model.py
│       │   └── quantize.py
│       ├── layers.py
│       ├── length_regulator.py
│       ├── quantize.py
│       ├── vocos
│       │   ├── __init__.py
│       │   ├── heads.py
│       │   ├── helpers.py
│       │   ├── loss.py
│       │   ├── models.py
│       │   ├── modules.py
│       │   ├── pretrained.py
│       │   └── spectral_ops.py
│       └── wavenet.py
├── utils
│   ├── __init__.py
│   ├── arch_util.py
│   ├── checkpoint.py
│   ├── common.py
│   ├── feature_extractors.py
│   ├── front.py
│   ├── maskgct
│   │   └── models
│   │       ├── codec
│   │       │   ├── __init__.py
│   │       │   ├── amphion_codec
│   │       │   │   ├── codec.py
│   │       │   │   ├── quantize
│   │       │   │   │   ├── __init__.py
│   │       │   │   │   ├── factorized_vector_quantize.py
│   │       │   │   │   ├── lookup_free_quantize.py
│   │       │   │   │   ├── residual_vq.py
│   │       │   │   │   └── vector_quantize.py
│   │       │   │   └── vocos.py
│   │       │   ├── codec_dataset.py
│   │       │   ├── codec_inference.py
│   │       │   ├── codec_sampler.py
│   │       │   ├── codec_trainer.py
│   │       │   ├── kmeans
│   │       │   │   ├── repcodec_model.py
│   │       │   │   └── vocos.py
│   │       │   └── melvqgan
│   │       │       └── melspec.py
│   │       └── tts
│   │           └── maskgct
│   │               ├── ckpt
│   │               │   └── wav2vec2bert_stats.pt
│   │               ├── llama_nar.py
│   │               └── maskgct_s2a.py
│   ├── maskgct_utils.py
│   ├── qwen0.6bemo4mergeconfig.json
│   ├── qwen_emotion.py
│   ├── tagger_cache
│   │   ├── zh_tn_tagger.fst
│   │   └── zh_tn_verbalizer.fst
│   ├── typical_sampling.py
│   ├── utils.py
│   └── xtransformers.py
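
One way to approach the question above is to triage the vendored tree before deciding what to include. The sketch below is only a heuristic and not part of this PR: the `triage` helper and the `TRAINING_ONLY_HINTS` list are hypothetical names, and the assumption that files named after losses, discriminators, trainers, or datasets are training-only needs a manual check, since such modules sometimes also hold helpers used at inference.

```python
from pathlib import PurePosixPath

# Assumption (hypothetical heuristic): file names containing these words
# usually hold training-only code that an inference integration can skip.
TRAINING_ONLY_HINTS = ("loss", "discriminator", "trainer", "dataset")


def triage(paths):
    """Split relative file paths into (likely_inference, likely_training_only)."""
    keep, review = [], []
    for path in paths:
        stem = PurePosixPath(path).stem.lower()
        if any(hint in stem for hint in TRAINING_ONLY_HINTS):
            review.append(path)
        else:
            keep.append(path)
    return keep, review


# A few paths taken from the tree above.
sample = [
    "gpt/model_v2.py",
    "s2mel/dac/nn/loss.py",
    "s2mel/dac/model/discriminator.py",
    "s2mel/modules/bigvgan/meldataset.py",
    "s2mel/modules/flow_matching.py",
    "utils/maskgct/models/codec/codec_trainer.py",
]

keep, review = triage(sample)
print("keep:", keep)
print("review for removal:", review)
```

Files in the `review` list are candidates to leave out of the PR, not a verdict; each one still needs to be checked against the imports of the inference path.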

hsliuustc0106 · 2025-12-19T23:21:38Z

This one looks quite complex :) Maybe we can try a simpler one as a start for TTS: #315

divyanshsinghvi · 2025-12-23T06:40:05Z

> This one looks quite complex :) Maybe we can try a simpler one as a start for TTS: #315

Will raise a PR for it today. Ironing out lots of issues that came up while adding #315.

divyanshsinghvi · 2026-01-10T10:21:57Z

reopen

hsliuustc0106 · 2026-01-10T22:14:05Z

@linyueqian PTAL

divyanshsinghvi · 2026-01-10T22:15:55Z

I still have to push a working PR. I had to clean things up, so please give me 2 more days to finish it.

divyanshsinghvi mentioned this pull request Dec 16, 2025

[New Model]: index-tts2 #229

Open

1 task

david6666666 mentioned this pull request Dec 16, 2025

[RFC]: DiT model and feature support enhancement #85

Closed

58 tasks

divyanshsinghvi changed the title ~~Index tts~~ [WIP] Index tts Dec 16, 2025

divyanshsinghvi force-pushed the index_tts branch from a50349b to 9a4c7e4 Compare December 16, 2025 12:57

divyanshsinghvi changed the title ~~[WIP] Index tts~~ [WIP] [Model] Index tts Dec 18, 2025

hsliuustc0106 mentioned this pull request Dec 30, 2025

[RFC]: Does the official team of vllm-omni plan to support the integration of the index-tts model? #545

Open

1 task

divyanshsinghvi closed this Jan 10, 2026

divyanshsinghvi force-pushed the index_tts branch from d77b84d to 9c2f746 Compare January 10, 2026 10:13

reopen

9eab38c

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

divyanshsinghvi reopened this Jan 10, 2026

Stage 1 working; need to adapt index_tts2 actual network implementation but framework works; onto stage2

db0550b

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

david6666666 mentioned this pull request Jan 16, 2026

vLLM-Omni Model Support #808

Open

51 tasks

divyanshsinghvi added 12 commits January 26, 2026 22:22

Fixes indextts2 stage 1 completed

3c01363

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

Merge branch 'main' into index_tts

8b19975

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

Merge branch 'main' into index_tts

d4f539b

Fixes but needs to be checked

e85536b

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

remove wrong file added

faaafea

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

Small import fixes

2417af1

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

few fixes

1508c89

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

few refactoring

386d64a

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

e2e working; output needs to be fixed

c14c5f3

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

e2e working; output now legible, but not correct conditional of prompts

8b2a42c

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

indexing

a827d55

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

Merge branch 'main' into index_tts

7991410

divyanshsinghvi wants to merge 14 commits into vllm-project:main from divyanshsinghvi:index_tts

2 participants
