Description
I have been trying to train a model for our use case, but I am having some issues with the training loop and get the following error:
```
Traceback (most recent call last):
  File "/workspace/KBLaM/experiments/train.py", line 963, in <module>
    main()
  File "/workspace/KBLaM/experiments/train.py", line 948, in main
    trainer.train(
  File "/workspace/KBLaM/experiments/train.py", line 616, in train
    kb_embedding = self.kbretriever.get_key_embeddings(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/KBLaM/experiments/train.py", line 444, in get_key_embeddings
    train_set_key, train_set_val = get_kb_embd(
                                   ^^^^^^^^^^^^
  File "/workspace/KBLaM/src/kblam/utils/train_utils.py", line 98, in get_kb_embd
    precomputed_base_embd=np.stack([key_embds[indices], value_embds[indices]]),
                                    ~~~~~~~~~^^^^^^^^^
IndexError: index 1213 is out of bounds for axis 0 with size 448
```
For context, my KB size is 448 and the embedding shapes are:

- Key embeddings: `(448, 1536)`
- Value embeddings: `(448, 1536)`
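
The failure reduces to plain NumPy indexing: the `np.stack` call in the traceback applies the same index array to both embedding matrices, so any index ≥ 448 fails. A minimal reproduction with the shapes above (dummy data, not the actual training code):

```python
import numpy as np

# Dummy arrays with the shapes reported above
key_embds = np.zeros((448, 1536), dtype=np.float32)
value_embds = np.zeros((448, 1536), dtype=np.float32)

# 1213 is the index from the traceback; anything >= 448 fails the same way
indices = np.array([1213])
np.stack([key_embds[indices], value_embds[indices]])
# IndexError: index 1213 is out of bounds for axis 0 with size 448
```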
My flow is:

- Initialize the base Llama model.
- Generate KB embeddings using text-embedding-ada-002 (see the sketch after this list).
- Use the synthetic QA file I had already generated for training.
- Start the training loop with the command below, which then fails with the error above.
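
For completeness, the embedding step was done roughly like this (a minimal sketch calling the OpenAI client directly; `kb_keys`/`kb_values` and the output paths are placeholders for my actual data, not the exact KBLaM helpers):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    # text-embedding-ada-002 returns 1536-dim vectors, matching the shapes above
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data], dtype=np.float32)

key_embds = embed(kb_keys)      # kb_keys / kb_values: my 448 KB entries (placeholder names)
value_embds = embed(kb_values)  # -> (448, 1536)
np.save("datasets/key_embeddings.npy", key_embds)    # placeholder paths
np.save("datasets/value_embeddings.npy", value_embds)
```

And the training command: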
```bash
%%bash
python experiments/train.py \
    --dataset_dir datasets \
    --train_dataset synthetic_data \
    --N 4434 \
    --B 16 \
    --total_steps 120 \
    --gradient_accm_step 12 \
    --encoder_spec OAI \
    --key_embd_src key \
    --use_cached_embd \
    --sep_query_head \
    --kb_token_layer_frequency 3 \
    --llm_type llama3 \
    --hf_model_spec meta-llama/Meta-Llama-3-8B-Instruct \
    --hf_token $HF_TOKEN \
    --model_save_dir output/ \
    --max_seq_len 1536
```
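
One thing I notice: `--N` is 4434 while the cached embeddings only have 448 rows, and the failing index (1213) falls between the two, so I wonder whether indices are being sampled against N rather than against the KB size. A hypothetical pre-flight check I could add on my side (not part of KBLaM; the `.npy` path is a placeholder for wherever the cached embeddings live):

```python
import numpy as np

key_embds = np.load("datasets/key_embeddings.npy")  # placeholder path
N = 4434  # value passed via --N above

# Fail fast if --N exceeds the number of KB rows on disk
assert N <= key_embds.shape[0], (
    f"--N ({N}) is larger than the KB ({key_embds.shape[0]} rows); "
    "any index sampled in [0, N) at or beyond 448 raises the IndexError above"
)
```

Is this the right way to think about `--N` here, or should it match the QA set size? Any pointers on what I am misconfiguring would be appreciated.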