3 files changed, +28 -1 lines
examples/seld_spatialsoundqa
@@ -15,7 +15,15 @@ Encoder | Projector | LLM |
 [Spatial-AST](https://huggingface.co/datasets/zhisheng01/SpatialAudio/blob/main/SpatialAST/finetuned.pth) | [Q-former](https://huggingface.co/datasets/zhisheng01/SpatialAudio/blob/main/BAT/model.pt) (~73.56M) | [llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b) |
 
 ## Demo (Spatial Audio Inference)
-Try [`inference.ipynb`](https://github.com/X-LANCE/SLAM-LLM/blob/main/examples/seld_spatialsoundqa/inference.ipynb).
+### Environment setup
+```
+cd SLAM-LLM/examples/seld_spatialsoundqa/
+pip install -r requirements.txt
+cd ../../
+pip install -e .
+```
+
+Then try [`inference.ipynb`](https://github.com/X-LANCE/SLAM-LLM/blob/main/examples/seld_spatialsoundqa/inference.ipynb).
 
 
 ## Data preparation
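The two installs run from different directories: the example folder for `requirements.txt`, then the repository root for the editable install of SLAM-LLM. A minimal Python sketch of that sequence, assuming the repository was cloned into a directory named `SLAM-LLM`; `setup_commands` and `run_setup` are hypothetical helpers, not part of SLAM-LLM.

```python
"""Hypothetical helper mirroring the README's environment-setup steps."""
import subprocess
import sys
from pathlib import Path


def setup_commands(repo_root: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs for the two pip installs.

    Assumes repo_root is the top-level SLAM-LLM checkout.
    """
    example_dir = repo_root / "examples" / "seld_spatialsoundqa"
    # Use the current interpreter's pip so packages land in the active environment.
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        # Example-specific dependencies pinned in requirements.txt.
        (pip + ["-r", "requirements.txt"], example_dir),
        # Editable install of the SLAM-LLM package from the repository root.
        (pip + ["-e", "."], repo_root),
    ]


def run_setup(repo_root: Path) -> None:
    """Run each install in its own directory; check=True stops at the first failure."""
    for cmd, cwd in setup_commands(repo_root):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_setup(Path("SLAM-LLM").resolve())` from the directory containing the checkout performs both installs; passing `cwd` explicitly avoids depending on where a previous `cd` left the shell.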
+timm==0.9.10
+soundfile
+numpy==1.26.4
+HyperPyYAML==1.2.2
+conformer==0.3.2
+deepspeed==0.14.2; sys_platform == 'linux'
+diffusers==0.27.2
+gradio==5.3.0
+grpcio==1.57.0
+grpcio-tools==1.57.0
+inflect==7.3.1
+matplotlib==3.7.5
+lightning==2.2.4
+wget==3.2
+librosa
+torchaudio==2.3.0
+torchlibrosa
+transformers==4.51.0
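The new requirements file pins most packages with `==` and makes `deepspeed` conditional on a `sys_platform` marker. Below is a small Python sketch for checking an existing environment against those pins. `parse_pins` and `mismatches` are hypothetical helpers. The parser only understands plain `name == version` lines and the `sys_platform == '...'` marker, so it is not a substitute for pip's own resolver.

```python
"""Hypothetical check of installed versions against `==` pins in a requirements file."""
import sys
from importlib import metadata
from typing import Dict, Optional


def parse_pins(text: str, platform: str = sys.platform) -> Dict[str, str]:
    """Map package name -> pinned version for lines of the form `name == version`.

    Unpinned lines (e.g. `soundfile`) are skipped. Only a
    `sys_platform == '...'` marker is evaluated; other markers are ignored.
    """
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        spec, _, marker = line.partition(";")
        marker = marker.strip()
        if marker.startswith("sys_platform") and "==" in marker:
            wanted = marker.split("==", 1)[1].strip().strip("'\"")
            if wanted != platform:
                continue  # requirement does not apply on this platform
        if "==" in spec:
            name, version = (part.strip() for part in spec.split("==", 1))
            pins[name] = version
    return pins


def mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return packages whose installed version differs from the pin (None = not installed)."""
    found: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            found[name] = installed
    return found
```

For example, `mismatches(parse_pins(Path("requirements.txt").read_text()))` (with `from pathlib import Path`) lists the packages that would need reinstalling. Versions are compared as plain strings, so a local build tag such as `2.3.0+cu121` counts as a mismatch; the `packaging` library handles PEP 440 comparisons properly.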
@@ -17,6 +17,7 @@ class ModelConfig:
     encoder_projector: str = "q-former"
     encoder_dim: int = 768
     qformer_layers: int = 8
+    query_len: int = 64
 
 @dataclass
 class PeftConfig:
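The config change adds a `query_len` field next to `qformer_layers`. The diff does not say what the field does. Assuming it sets the number of learnable query tokens in a BLIP-2-style Q-Former, the projector would emit `query_len` embeddings per clip no matter how many frames the encoder produces. This NumPy sketch illustrates only that idea. It uses a single single-head cross-attention step with random stand-in weights in place of the real `qformer_layers`-deep stack. `ToyProjectorConfig`, `project`, and `llm_dim` are hypothetical names, not SLAM-LLM code.

```python
"""Toy cross-attention projector with `query_len` learnable queries (illustrative only)."""
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyProjectorConfig:
    # encoder_dim and query_len mirror ModelConfig; llm_dim is a hypothetical addition.
    encoder_dim: int = 768
    query_len: int = 64
    llm_dim: int = 4096  # hidden size of Llama-2-7B


def project(enc: np.ndarray, cfg: ToyProjectorConfig, seed: int = 0) -> np.ndarray:
    """Map encoder features of shape (T, encoder_dim) to (query_len, llm_dim)."""
    rng = np.random.default_rng(seed)
    d = cfg.encoder_dim
    # Randomly initialised stand-ins for learned parameters.
    queries = rng.standard_normal((cfg.query_len, d)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d)) / np.sqrt(d)
    w_v = rng.standard_normal((d, d)) / np.sqrt(d)
    w_out = rng.standard_normal((d, cfg.llm_dim)) / np.sqrt(d)

    keys, values = enc @ w_k, enc @ w_v           # (T, d) each
    scores = queries @ keys.T / np.sqrt(d)        # (query_len, T)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over encoder frames
    return (attn @ values) @ w_out                # (query_len, llm_dim)
```

With the defaults, `project(np.zeros((250, 768)), ToyProjectorConfig())` returns an array of shape `(64, 4096)`, the same as for a 10-frame input. Under the stated assumption, that fixed length is what keeps the number of audio tokens handed to the LLM predictable.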