[WIP] [Model] Index tts #334

Draft
divyanshsinghvi wants to merge 14 commits into vllm-project:main from divyanshsinghvi:index_tts
@divyanshsinghvi
Contributor

@divyanshsinghvi divyanshsinghvi commented Dec 16, 2025

Purpose

Resolves #229. Integrates the index-tts model (https://github.com/index-tts/index-tts).

Test Plan

Test Result


@divyanshsinghvi divyanshsinghvi mentioned this pull request Dec 16, 2025
1 task
@divyanshsinghvi
Contributor Author

divyanshsinghvi commented Dec 16, 2025

@hsliuustc0106

I'm thinking of structuring this in 3 steps, like qwen3-omni. But I'm unsure about the specific files below: should I add all of them together in this PR? Or should I find the few that match an existing implementation, replace those, and add the rest as-is with this PR?

├── __pycache__
├── gpt
│   ├── __init__.py
│   ├── conformer
│   │   ├── __init__.py
│   │   ├── attention.py
│   │   ├── embedding.py
│   │   └── subsampling.py
│   ├── conformer_encoder.py
│   ├── model_v2.py
│   └── perceiver.py
├── index_tts.py
├── index_tts_config.py
├── s2mel
│   ├── dac
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── model
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── dac.py
│   │   │   ├── discriminator.py
│   │   │   └── encodec.py
│   │   ├── nn
│   │   │   ├── __init__.py
│   │   │   ├── layers.py
│   │   │   ├── loss.py
│   │   │   └── quantize.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── decode.py
│   │       └── encode.py
│   └── modules
│       ├── audio.py
│       ├── bigvgan
│       │   ├── activations.py
│       │   ├── alias_free_activation
│       │   │   ├── cuda
│       │   │   │   ├── __init__.py
│       │   │   │   ├── activation1d.py
│       │   │   │   ├── anti_alias_activation.cpp
│       │   │   │   ├── anti_alias_activation_cuda.cu
│       │   │   │   ├── compat.h
│       │   │   │   ├── load.py
│       │   │   │   └── type_shim.h
│       │   │   └── torch
│       │   │       ├── __init__.py
│       │   │       ├── act.py
│       │   │       ├── filter.py
│       │   │       └── resample.py
│       │   ├── bigvgan.py
│       │   ├── config.json
│       │   ├── env.py
│       │   ├── meldataset.py
│       │   └── utils.py
│       ├── campplus
│       │   ├── DTDNN.py
│       │   ├── classifier.py
│       │   └── layers.py
│       ├── commons.py
│       ├── diffusion_transformer.py
│       ├── encodec.py
│       ├── flow_matching.py
│       ├── gpt_fast
│       │   ├── generate.py
│       │   ├── model.py
│       │   └── quantize.py
│       ├── layers.py
│       ├── length_regulator.py
│       ├── quantize.py
│       ├── vocos
│       │   ├── __init__.py
│       │   ├── heads.py
│       │   ├── helpers.py
│       │   ├── loss.py
│       │   ├── models.py
│       │   ├── modules.py
│       │   ├── pretrained.py
│       │   └── spectral_ops.py
│       └── wavenet.py
├── utils
│   ├── __init__.py
│   ├── arch_util.py
│   ├── checkpoint.py
│   ├── common.py
│   ├── feature_extractors.py
│   ├── front.py
│   ├── maskgct
│   │   └── models
│   │       ├── codec
│   │       │   ├── __init__.py
│   │       │   ├── amphion_codec
│   │       │   │   ├── codec.py
│   │       │   │   ├── quantize
│   │       │   │   │   ├── __init__.py
│   │       │   │   │   ├── factorized_vector_quantize.py
│   │       │   │   │   ├── lookup_free_quantize.py
│   │       │   │   │   ├── residual_vq.py
│   │       │   │   │   └── vector_quantize.py
│   │       │   │   └── vocos.py
│   │       │   ├── codec_dataset.py
│   │       │   ├── codec_inference.py
│   │       │   ├── codec_sampler.py
│   │       │   ├── codec_trainer.py
│   │       │   ├── kmeans
│   │       │   │   ├── repcodec_model.py
│   │       │   │   └── vocos.py
│   │       │   └── melvqgan
│   │       │       └── melspec.py
│   │       └── tts
│   │           └── maskgct
│   │               ├── ckpt
│   │               │   └── wav2vec2bert_stats.pt
│   │               ├── llama_nar.py
│   │               └── maskgct_s2a.py
│   ├── maskgct_utils.py
│   ├── qwen0.6bemo4mergeconfig.json
│   ├── qwen_emotion.py
│   ├── tagger_cache
│   │   ├── zh_tn_tagger.fst
│   │   └── zh_tn_verbalizer.fst
│   ├── typical_sampling.py
│   ├── utils.py
│   └── xtransformers.py
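
One way to approach the question above is to triage the tree by file name before deciding what to port. The sketch below is only an illustration: the `triage` helper and the `TRAINING_HINTS` list are hypothetical, and the assumption that names such as `loss.py` or `discriminator.py` are training-only still has to be checked against what the inference path actually imports.

```python
from pathlib import PurePosixPath

# Hypothetical heuristic: file-name fragments that often mark training-only
# code. This list is an assumption, not something decided in this PR.
TRAINING_HINTS = ("loss", "discriminator", "trainer", "dataset")


def triage(paths):
    """Split relative paths into (keep, review) lists by name heuristics.

    'review' holds files that look training-only or are build artifacts
    (anything under __pycache__); everything else lands in 'keep'.
    """
    keep, review = [], []
    for p in paths:
        path = PurePosixPath(p)
        stem = path.stem.lower()
        is_cache = "__pycache__" in path.parts
        looks_training = any(hint in stem for hint in TRAINING_HINTS)
        if is_cache or looks_training:
            review.append(p)
        else:
            keep.append(p)
    return keep, review


if __name__ == "__main__":
    # A few paths taken from the tree above.
    sample = [
        "gpt/model_v2.py",
        "s2mel/dac/nn/loss.py",
        "s2mel/dac/model/discriminator.py",
        "s2mel/modules/bigvgan/meldataset.py",
        "utils/typical_sampling.py",
    ]
    keep, review = triage(sample)
    print("keep:", keep)
    print("review:", review)
```

Files that end up in the review list are only candidates for dropping; a name match alone does not prove a module is unused at inference time.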
 

@divyanshsinghvi divyanshsinghvi changed the title Index tts [WIP] Index tts Dec 16, 2025
@divyanshsinghvi divyanshsinghvi changed the title [WIP] Index tts [WIP] [Model] Index tts Dec 18, 2025
@hsliuustc0106
Collaborator

this one looks quite complex :) maybe we can try a simple one as a start in tts #315

@divyanshsinghvi
Contributor Author

this one looks quite complex :) maybe we can try a simple one as a start in tts #315

Will raise a PR for it today. Ironing out lots of issues that came up while adding #315.

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
@divyanshsinghvi
Contributor Author

reopen

@hsliuustc0106
Collaborator

@linyueqian PTAL

@divyanshsinghvi
Contributor Author

I still have to push a working PR; it needs some cleanup first. Please give me about 2 days to finish it.

…on but framework works; onto stage2

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
@david6666666 david6666666 mentioned this pull request Jan 16, 2026
51 tasks
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Development

Successfully merging this pull request may close these issues.

[New Model]: index-tts2

2 participants