Closed
Description
I'd like to ask about the logic behind how ntk_alpha is computed in Qwen. I see that Qwen computes it like this:
def get_ntk_alpha(self, true_seq_len):
    context_value = math.log(true_seq_len / self.seq_length, 2) + 1
    ntk_alpha = 2 ** math.ceil(context_value) - 1
    ntk_alpha = max(ntk_alpha, 1)
    return ntk_alpha
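The sketch below pulls the formula above out into a standalone function so its outputs are easy to inspect. The function name qwen_ntk_alpha and the training length seq_length=2048 are hypothetical choices for illustration, not taken from the issue. It uses math.log2 instead of math.log(x, 2). Both compute the same base-2 logarithm, but the Python docs note that log2 is usually more accurate.

```python
import math


def qwen_ntk_alpha(true_seq_len: int, seq_length: int = 2048) -> int:
    """Standalone version of the get_ntk_alpha logic quoted above.

    seq_length=2048 is an assumed training length, used only for illustration.
    """
    # log2 of the length ratio, shifted up by 1
    context_value = math.log2(true_seq_len / seq_length) + 1
    # round the exponent up to an integer k, then map it to 2**k - 1
    ntk_alpha = 2 ** math.ceil(context_value) - 1
    # clamp so that sequences within the training length get no scaling
    return max(ntk_alpha, 1)


for n in (1024, 2048, 2049, 4096, 4097, 8192, 16384):
    print(n, qwen_ntk_alpha(n))
```

With these assumptions the loop prints alpha values 1, 1, 3, 3, 7, 7, 15. So alpha is a step function that only takes the values 1, 3, 7, 15, and so on. Because ceil(x + 1) = ceil(x) + 1, the formula equals 2 * 2^ceil(log2 r) - 1, where r = true_seq_len / seq_length. In other words, it is twice the smallest power of two that is >= r, minus one. Going past the training length by a single token (2049) is already enough to push alpha to 3.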
Why is it computed this way? Why not simply compute it like this?
def get_ntk_alpha(self, true_seq_len):
    ntk_alpha = true_seq_len / self.seq_length
    ntk_alpha = max(ntk_alpha, 1)
    return ntk_alpha
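Running the two formulas side by side makes the concrete differences visible. This is a minimal sketch that again assumes a hypothetical training length of 2048, and both function names are made up for illustration. First, for any input length the Qwen version is never smaller than the direct ratio, so it scales more aggressively. Second, it is quantized, so it changes far less often as the length grows. One plausible motivation, which is an assumption and is not stated anywhere in the issue, is that an alpha that changes rarely lets rotary-embedding tables keyed on alpha be reused across many sequence lengths. The direct ratio, by contrast, produces a new alpha for nearly every length.

```python
import math


def qwen_ntk_alpha(true_seq_len: int, seq_length: int = 2048) -> int:
    # Qwen-style formula, quantized to 2**k - 1 (math.log2 used for accuracy)
    context_value = math.log2(true_seq_len / seq_length) + 1
    return max(2 ** math.ceil(context_value) - 1, 1)


def direct_ntk_alpha(true_seq_len: int, seq_length: int = 2048) -> float:
    # The alternative proposed above: the raw length ratio, clamped at 1
    return max(true_seq_len / seq_length, 1)


for n in (1024, 2048, 3000, 4096, 5000, 8192):
    q, d = qwen_ntk_alpha(n), direct_ntk_alpha(n)
    # The quantized alpha is never smaller than the direct ratio
    assert q >= d
    print(f"{n:>6}  qwen={q:<3}  direct={d:.3f}")

# Count distinct alpha values over every length from 2048 to 8192 inclusive;
# each distinct value corresponds to a different set of rotary frequencies
sweep = range(2048, 8193)
print(len({qwen_ntk_alpha(n) for n in sweep}))    # 3
print(len({direct_ntk_alpha(n) for n in sweep}))  # 6145
```

Across that range the quantized formula yields only the three values 1, 3 and 7. The direct ratio yields a different value for every one of the 6145 lengths.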
Looking forward to your reply, thanks!