Lecture Materials on Speech Language Models

This repository distributes the lecture slides and demo scripts.

Venue	Date	Slides
Ongaku Symposium 2025 (音学シンポジウム2025)	June 13, 2025	Link
6th joint ASA/ASJ meeting	December 3, 2025	Link
3rd Online Seminar of the Acoustical Society of Japan, Kyushu Chapter	February 27, 2026, 18:00-19:30	Link

Questions

In demo2, how is the LlamaForSpeechLM-Instruct - Built with Llama model pretrained? I would appreciate a detailed explanation.

Thank you for your question. Pretraining is carried out with demo2.py in the steps below. Training used a single NVIDIA RTX A6000 GPU (48 GB VRAM).

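1. Run sh scripts/download_clotho.sh to download the Clotho audio captioning dataset.
2. Connect the Whisper encoder and Llama 3.2 1B with a two-layer MLP adapter. Throughout pretraining and instruction tuning, the Whisper and Llama parameters are frozen and only the adapter is updated.
3. Use train() to pretrain on ASR with LibriSpeech and on audio captioning with Clotho.
4. Use generate_data() to synthesize the input text of the text-based Alpaca dataset into speech with VITS, creating a speech-input Alpaca dataset.
5. Use finetune() to perform cross-modal instruction tuning on the resulting Alpaca dataset.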
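The adapter-only training setup described in the steps above can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the code in demo2.py: the tiny nn.Linear and nn.Embedding modules are hypothetical stand-ins for the Whisper encoder and the Llama token embedding, and the dimensions, the GELU activation, the adapter's hidden width, the learning rate, and the choice to prepend speech embeddings to the text prompt are all assumptions made for the example.

```python
import torch
from torch import nn


class MLPAdapter(nn.Module):
    """Two-layer MLP that maps speech-encoder features to the LLM embedding size."""

    def __init__(self, enc_dim: int, llm_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, hidden_dim),
            nn.GELU(),  # assumed activation; the source does not specify one
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients for every parameter and switch to eval mode."""
    for p in module.parameters():
        p.requires_grad = False
    return module.eval()


torch.manual_seed(0)

# Hypothetical small stand-ins so the sketch runs without downloading models.
enc_dim, llm_dim, vocab_size = 32, 64, 100
speech_encoder = freeze(nn.Linear(80, enc_dim))        # stands in for the Whisper encoder
llm_embed = freeze(nn.Embedding(vocab_size, llm_dim))  # stands in for Llama's token embedding
adapter = MLPAdapter(enc_dim, llm_dim, hidden_dim=128)

# Only the adapter's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

mel = torch.randn(2, 50, 80)                        # (batch, frames, mel bins)
prompt_ids = torch.randint(0, vocab_size, (2, 7))   # (batch, prompt tokens)

with torch.no_grad():                # the frozen encoder needs no graph
    feats = speech_encoder(mel)      # (2, 50, enc_dim)
speech_embeds = adapter(feats)       # (2, 50, llm_dim), trainable path

# Assumed layout: speech embeddings first, then the embedded text prompt.
inputs_embeds = torch.cat([speech_embeds, llm_embed(prompt_ids)], dim=1)

# Dummy loss, used only to show where gradients flow.
inputs_embeds.sum().backward()
optimizer.step()
```

After the backward pass, only the adapter's parameters hold gradients; the frozen stand-ins have none, which mirrors the statement that Whisper and Llama stay fixed during both pretraining and instruction tuning.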

Setup

pip install -r requirements.txt

Demo

Speech translation with Phi-4-Multimodal

Zero-shot instruction following by connecting Llama 3.2 and the Whisper encoder with an adapter

Comparing resynthesized speech from phonetic tokens and acoustic tokens
