Apply the relevant model changes using the shared sampling implementation and merge to main. This PR can be merged only once the incremental PR for TTT Llama3.1-8b is ready.
Relevant pipelines to test functionality:
- metal demo
- vllm nightly
- Models CI for Llama3.3-70b + Qwen3 (failures are expected for Llama3.1-8b)