
ICSC_PreFinalRound_2025

Solutions and report for the International Computer Science Competition (Pre-Final Round 2025)

📄 View the main report (PDF)
📘 View the submission paper (PDF)


Problem C.1: Zipf’s Meaning-Frequency Law (8 Points)

This problem requires you to read the following recently published scientific article:

A New Formulation of Zipf’s Meaning-Frequency Law through Contextual Diversity by Ryo Nagata and Kumiko Tanaka-Ishii (2025) Link: https://aclanthology.org/2025.acl-long.744.pdf

Answer the following questions related to this article:

(a) What are the limitations of dictionary-based studies on measuring Zipf’s Meaning-Frequency Law?

(b) Explain the von Mises-Fisher distribution and how v = 1/κ measures contextual diversity.
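As background for (b) and (c): the paper models a word's context vectors as drawn from a von Mises-Fisher distribution on the unit sphere, whose concentration parameter κ says how tightly the contexts cluster, so v = 1/κ grows as contexts become more dispersed. A minimal sketch of this idea, using the standard moment-based approximation κ̂ ≈ r(d − r²)/(1 − r²) from directional statistics (this is an illustration of the concept, not the authors' exact estimation procedure):

```python
import numpy as np

def estimate_kappa(vectors):
    """Approximate the vMF concentration kappa from a set of vectors.

    Vectors are projected to the unit sphere; r is the length of the
    mean resultant vector (near 1 for tightly clustered directions,
    near 0 for dispersed ones), and d is the dimensionality.
    """
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    d = X.shape[1]
    r = np.linalg.norm(X.mean(axis=0))  # mean resultant length in [0, 1)
    return r * (d - r**2) / (1 - r**2)

def contextual_diversity(vectors):
    """v = 1/kappa: dispersed contexts -> small kappa -> high diversity."""
    return 1.0 / estimate_kappa(vectors)

# Toy comparison: contexts clustered around one direction vs. scattered.
rng = np.random.default_rng(0)
tight = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(200, 3))
spread = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
print(contextual_diversity(tight) < contextual_diversity(spread))  # True
```

The same unit-normalized vectors could feed an average pairwise cosine-similarity measure, which is part of what question (c) asks you to contrast with the vMF formulation.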

(c) Why do the authors use the von Mises-Fisher distribution instead of simpler measures like average pairwise cosine similarity between word vectors?

(d) How do autoregressive models compare to masked language models for observing the Meaning-Frequency law?

(e) How can the proposed method serve as a diagnostic tool for language models?

(f) What does the observation that meaning-frequency law breaks down for small models and out-of-domain data suggest?

(Bonus) What factors may lead more frequent words to have more meanings? What factors may lead to fewer meanings? Give examples of each.

Problem C.2: Self-Improvement Capabilities of LLMs (8 Points)

This problem requires you to read the following recently published scientific article:

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models by Y. Song, H. Zhang, C. Eisenach, S. M. Kakade, D. Foster, and U. Ghai (2025). Link: https://openreview.net/pdf?id=mtJSMcF3ek

Answer the following questions related to this article:

(a) Describe the term self-improvement using the author’s framework. What key assumption are the authors making that allows for self-improvement?

(b) What is the generation-verification gap (GV-Gap)? Why is it a better metric than measuring performance differences after model updates?

(c) What is greedy decoding and why is self-improvement with greedy decoding impossible?
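As background for (c): greedy decoding picks the single highest-probability token at every step, so the output for a given prompt is fully deterministic. A toy sketch (the model here is a stand-in function, not a real LLM) showing why resampling cannot produce a different candidate for a verifier to prefer:

```python
import numpy as np

def greedy_decode(next_token_logits, prompt, max_len=10, eos=0):
    """Greedy decoding: at each step take the argmax token.

    Because argmax is deterministic, running this twice on the same
    prompt yields the identical sequence -- there is never a second,
    possibly better candidate for a verifier to select, which is the
    intuition behind self-improvement being impossible under greedy
    decoding.
    """
    tokens = list(prompt)
    for _ in range(max_len):
        logits = next_token_logits(tokens)
        tok = int(np.argmax(logits))
        tokens.append(tok)
        if tok == eos:
            break
    return tokens

def toy_model(tokens):
    # Hypothetical 5-token model that always favors (last token + 1) mod 5.
    logits = np.zeros(5)
    logits[(tokens[-1] + 1) % 5] = 1.0
    return logits

print(greedy_decode(toy_model, [1], max_len=4))  # [1, 2, 3, 4, 0]
print(greedy_decode(toy_model, [1], max_len=4))  # identical again
```

Contrast this with temperature sampling, where repeated draws yield different candidates and a verifier can create a generation-verification gap to exploit.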

(d) Explain why the relative GV-Gap scales monotonically with pre-training FLOPs for certain verification methods but not others.

(e) Why do most models fail to self-improve on Sudoku puzzles despite the exponential computational complexity separation between generation and verification?

(f) Propose a task domain where you would expect self-improvement to improve performance and explain why.
