Some questions about the implementation and experiment details. #13

@Bamboos2003

Hello,

I'm currently reproducing Table 2 and Table 3, but there are some problems I cannot figure out by myself.

The questions below are about WorldSense 7B:

1. I use the hyperparameters given in the paper. For the 45% configuration (rho_audio=0.3, rho_video=0.6, g=3, context_ratio=0.05), the retained ratios I get from two approaches are 49.17% (total_audio_and_video_tokens_after_pruning / total_audio_and_video_tokens_before_pruning, pooled across all 3172 samples) and 48.23% (multimodal token retained ratio averaged per sample). For the 35% configuration (rho_audio=0.4, rho_video=0.7, g=3, context_ratio=0.05), the two retained ratios are 39.59% and 38.67%. Why do these differ from the ratios in the paper? Are the retained ratios in Table 1 and Table 2 actually measured, or are they nominal values predicted from the settings? Does the actual token retention rate differ from the paper's setting depending on the data, or is there a problem with my reproduction?

2. Which GPU memory metric do you use to produce the GPU Mem numbers in Table 3? I assume it is the maximum of torch.cuda.max_memory_allocated over all 3172 samples. Or is it torch.cuda.max_memory_reserved?

3. How exactly do you measure prefill time and latency per example? For prefill time, I measure the time of the first forward pass, averaged over all 3172 examples. My results for the full-token 3B and 7B models are 471 ms and 885 ms, but yours are 258 ms and 291 ms. Do you measure the wall-clock time or the CUDA event time of the first forward pass? For latency per example, what is being measured: from the start of which step to the end of which step?

4. Are the FLOPs actually computed, or estimated from the predefined retained ratio? That is, do you compute the FLOPs from the retained audio and video tokens of each sample and then take the average? I ask because when I back-calculate n from formula (9) using the reported FLOPs of the full-token 3B and 7B models (37.4T and 73.2T), I get n = 10003 and n = 10000. Is this a coincidence?
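
To make question 1 concrete, the two ways of computing the retained ratio can be sketched as below. This is only an illustration: the token counts are made-up placeholders rather than WorldSense measurements, and the function names `pooled_ratio` and `per_sample_mean_ratio` are hypothetical. It shows that the two definitions generally disagree whenever samples differ in length, because the pooled ratio weights long samples more heavily.

```python
# Hypothetical (audio + video) token counts per sample: (before pruning, after pruning).
# Placeholder values for illustration only, not real measurements.
samples = [(1000, 400), (4000, 2200), (500, 150)]


def pooled_ratio(samples):
    """Total retained tokens divided by total original tokens over all samples."""
    total_before = sum(before for before, _ in samples)
    total_after = sum(after for _, after in samples)
    return total_after / total_before


def per_sample_mean_ratio(samples):
    """Unweighted mean of each sample's own retained ratio."""
    return sum(after / before for before, after in samples) / len(samples)


print(f"pooled ratio:          {pooled_ratio(samples):.2%}")           # 50.00%
print(f"per-sample mean ratio: {per_sample_mean_ratio(samples):.2%}")  # 41.67%
```

Neither number has to match a nominal target ratio if the amount pruned depends on the content of each sample.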
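
To make question 3 concrete, here is a minimal sketch of a wall-clock measurement with an optional synchronization hook. It is an assumption about one reasonable way to time a forward pass, not the method used in the paper. `time_call_ms` and `fake_prefill` are hypothetical names, and `fake_prefill` is a stand-in workload. On a GPU, one would pass `torch.cuda.synchronize` as `sync`, since CUDA kernels launch asynchronously and an unsynchronized timer can stop before the work has finished.

```python
import time


def time_call_ms(fn, sync=None, warmup=1, repeats=5):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is an optional zero-argument callable that is run right before the
    clock starts and right before it stops (e.g. torch.cuda.synchronize).
    """
    # Warm-up runs are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        fn()

    elapsed = 0.0
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure no earlier work is still pending
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait until the timed work has actually finished
        elapsed += time.perf_counter() - start
    return elapsed / repeats * 1000.0


def fake_prefill():
    """Hypothetical stand-in for the first forward pass."""
    time.sleep(0.01)


print(f"mean time: {time_call_ms(fake_prefill):.1f} ms")
```

Whether warm-up iterations are excluded and whether synchronization is applied can each change the reported number noticeably, which is why the exact measurement protocol matters for comparing prefill times.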

These are the problems I encountered while trying to reproduce the results. My skills are limited, and I hope to get your answers.
