In the YaRN paper, static YaRN with rope_base=10000 gave excellent extrapolation results. Could the authors clarify whether combining YaRN with rope_base=500000 would be synergistic, i.e., whether it would outperform both YaRN at rope_base=10000 and NTK-aware scaling at rope_base=500000? @jquesnelle
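
For concreteness, here is a minimal sketch of the combination being asked about: YaRN-style "NTK-by-parts" frequency blending computed on top of two different RoPE bases. It is a simplified illustration, not the paper's reference code. The function names, `dim=128`, `scale=16`, `orig_ctx=4096`, and the `beta_fast`/`beta_slow` defaults are all assumptions chosen for the example, and YaRN's attention temperature scaling is left out.

```python
import math

def rope_inv_freq(dim, base):
    # Plain RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def yarn_inv_freq(dim, base, scale, orig_ctx, beta_fast=32, beta_slow=1):
    # Hypothetical helper: blends unscaled and position-interpolated
    # frequencies with a linear ramp over the dimension index.
    def correction_dim(num_rotations):
        # Dimension index whose wavelength completes `num_rotations`
        # full rotations over the original context length.
        return dim * math.log(orig_ctx / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)
    if high == low:
        high += 0.001  # avoid division by zero in the ramp

    blended = []
    for i, freq in enumerate(rope_inv_freq(dim, base)):
        # ramp = 0: keep the original frequency (high-frequency dims)
        # ramp = 1: fully interpolate, i.e. divide by `scale` (low-frequency dims)
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        blended.append(freq * (1 - ramp) + (freq / scale) * ramp)
    return blended

# The two configurations from the question, under the assumed settings.
yarn_10k = yarn_inv_freq(dim=128, base=10000.0, scale=16.0, orig_ctx=4096)
yarn_500k = yarn_inv_freq(dim=128, base=500000.0, scale=16.0, orig_ctx=4096)
```

Changing the base moves both the unscaled frequencies and the dimension range over which the ramp applies, so the two settings interact rather than stacking independently. Whether that interaction actually improves quality is an empirical question this sketch does not answer.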