[20230917] Weekly AI ArXiv Talk (만담) Season 2 - Episode 26

### Zoom: https://navercorp.zoom.us/j/92208940283 
### Facebook: https://www.facebook.com/weeklyaiarxivpage

### News
- Conference
  - ICLR 2024 
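     - Abstract: 9.23 AoE (changed from 9.21), full paper: 9.28
     - LLM usage policy: you may use LLMs, but use them responsibly. An LLM cannot be an author.
  - CHI 2024: Great work, everyone.
- [Is Meta going all-in on AI again? It has reportedly started research aimed at surpassing GPT-4](https://n.news.naver.com/mnews/article/469/0000760900?sid=105)
- [Is the release of Google DeepMind's Gemini approaching?](https://ts2.space/en/the-countdown-to-google-gemini-a-new-era-of-ai/)
  - A closed beta has reportedly started with users outside Google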
- [If you’d bought Apple shares instead of iPhones, you’d now have $147,000](https://techcrunch.com/2023/09/15/one-meeeelion-dollars-muuhahahahaha/)
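  - Feels a bit like "if you had bought NVIDIA stock instead of DGX systems (V100, A100, H100)..."
  - Or "if you had bought Tesla stock instead of a Tesla car..." haha
- [phi-1.5 controversy]
  - Textbooks Are All You Need: crushes much larger models with just 1.3B parameters trained on 150B tokens
  - [Data contamination (training on test data) is suspected](https://x.com/suchenzang/status/1701747947191615697?s=20)
  - For LLMs, international collaborative research seems necessary, starting with a shared protocol for training and evaluating them fairly and accurately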
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/ae2a7ac3-83e9-4cb9-bc81-332fce692f64)
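One building block such a protocol could start from is an n-gram overlap check between training data and benchmark items. Below is a minimal sketch, assuming whitespace tokenization and exact matching; the function names are hypothetical, the default 13-gram window follows the decontamination setup described in the GPT-3 paper, and running it requires access to the training corpus itself:

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Lowercased, whitespace-tokenized n-grams (a deliberately crude normalization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(train_docs: Iterable[str], test_items: List[str], n: int = 13) -> float:
    """Fraction of test items that share at least one n-gram with the training corpus."""
    if not test_items:
        return 0.0
    train_grams: Set[NGram] = set()
    for doc in train_docs:
        train_grams.update(ngrams(doc, n))
    flagged = sum(1 for item in test_items if ngrams(item, n) & train_grams)
    return flagged / len(test_items)


# Toy usage with a short window so the example stays readable.
train = ["the quick brown fox jumps over the lazy dog"]
test = [
    "a quick brown fox jumps over the lazy cat",   # shares "quick brown fox jumps over"
    "an unrelated question about prime numbers",  # no 5-gram in common
]
print(contamination_rate(train, test, n=5))  # → 0.5
```

Exact n-gram matching misses paraphrased or reformatted test items, which is one reason a shared, agreed-upon evaluation protocol matters.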
- [BrainLink 2023 LLM conference](https://sites.google.com/g.skku.edu/brainlink2023)
  - World-class LLM experts from OpenAI, MSR, and more
  - The venue is... Pyeongchang Kensington (lots of national treasures nearby lol)
  - Register here: https://forms.gle/QWFKHzWpdWTDo1o1A
   
### ArXiv
- [In the long (context) run](https://www.harmdevries.com/post/context-length/)
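  - A blog post that gives a good overview of long-context LLMs
  - Since FlashAttention, GPU memory and compute optimizations have greatly reduced the cost of long context
  - Recent long-context models are mostly obtained through fine-tuning.
  - So why not handle long context directly during pretraining?
     - Because of the extra attention overhead during pretraining? (this is not a big burden once models get large)
     - Because there is little long-context data to pretrain on in the first place (this is the key problem)
  - In terms of long documents, among Common Crawl-based datasets RefinedWeb is somewhat better than C4, and code is clearly better off
  - So what can be done?
    - Long-context training at the pretraining stage is not easy. For batch-training efficiency, documents are packed together to fill the maximum length, so a longer window may bring little benefit
    - Ultimately, data with long-context structure has to be built, e.g., by concatenating web pages along their links
    - Collecting more high-quality document data and investing more in data processing is another option (a matter of cost)
    - But when trying to compare long-context pretraining with long-context fine-tuning, evaluation protocols are lacking in the first place, so verifying the effect is hard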
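The claim that attention overhead matters less as models grow can be checked with a back-of-the-envelope FLOP count. A sketch under simplifying assumptions (standard decoder layer with d_ff = 4·d_model, causal-mask savings ignored, embeddings ignored), not the blog's exact accounting; the attention share per token then reduces to n_ctx / (6·d_model + n_ctx), independent of the number of layers:

```python
def attention_flop_fraction(d_model: int, n_ctx: int) -> float:
    """Share of per-token forward FLOPs spent on attention over the context.

    Assumptions (a rough model, not an exact count):
      - dense matmuls per layer (QKV, output projection, FFN with d_ff = 4 * d_model):
        24 * d_model**2 FLOPs per token
      - attention scores QK^T plus the weighted sum AV over n_ctx positions:
        4 * n_ctx * d_model FLOPs per token (causal-mask savings ignored)
    The layer count cancels, so the ratio reduces to n_ctx / (6 * d_model + n_ctx).
    """
    dense = 24 * d_model ** 2
    attention = 4 * n_ctx * d_model
    return attention / (dense + attention)


# d_model = 4096 and 8192 match the widths of LLaMA-2 7B and 70B.
for d_model, n_ctx in [(4096, 4096), (4096, 32768), (8192, 32768)]:
    print(f"d_model={d_model}, n_ctx={n_ctx}: {attention_flop_fraction(d_model, n_ctx):.3f}")
# → 0.143, 0.571, 0.400
```

At a fixed 32k context, doubling d_model lowers the attention share from about 0.571 to 0.400 under these assumptions, which matches the direction of the claim.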
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/8d509173-3941-4046-bdb1-c3591cccb218)
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/b46ba542-ad23-48ea-a099-2a2a46f0a6a3)
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/13086d3a-a05e-428a-bab7-228ca2d3f093)
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/9799bf4d-f941-41d4-8d75-022213baabbf)
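To see why packing undermines long-context pretraining, here is a toy measurement: for every token, count how many preceding tokens in the same window come from the same coherent source. The corpus sizes are made up, the function names are hypothetical, and treating a chain of linked documents as one coherent source is a simplifying assumption, not the blog's exact proposal:

```python
from typing import List


def build_stream(group_lengths: List[int]) -> List[int]:
    """Lay groups out back to back; each token is tagged with the id of its group."""
    stream: List[int] = []
    for gid, length in enumerate(group_lengths):
        stream.extend([gid] * length)
    return stream


def effective_context(stream: List[int], window: int) -> List[int]:
    """For each token, count preceding tokens of the same group inside the same window."""
    ctx: List[int] = []
    for start in range(0, len(stream), window):
        run, prev = 0, None
        for gid in stream[start:start + window]:
            run = run + 1 if gid == prev else 0  # context resets at group and window boundaries
            ctx.append(run)
            prev = gid
    return ctx


def long_context_fraction(group_lengths: List[int], window: int, threshold: int) -> float:
    """Fraction of tokens that see at least `threshold` tokens of coherent preceding context."""
    ctx = effective_context(build_stream(group_lengths), window)
    return sum(c >= threshold for c in ctx) / len(ctx)


# Hypothetical corpus: 64 web documents of 500 tokens each, packed into 8192-token windows.
baseline = long_context_fraction([500] * 64, window=8192, threshold=2048)

# Hypothetical link chains: every 8 mutually linked documents are concatenated
# and treated as one coherent group of 4000 tokens (a simplifying assumption).
chained = long_context_fraction([500 * 8] * 8, window=8192, threshold=2048)

print(f"packed: {baseline:.3f}, link-chained: {chained:.3f}")  # → packed: 0.000, link-chained: 0.452
```

Even with 8192-token windows, no token in the packed baseline sees more than 499 tokens of its own document; link chaining lets roughly 45% of tokens see at least 2048 tokens of related context, though window boundaries still cut some chains.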

 - [DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning](https://arxiv.org/abs/2309.05173)
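   - Looks like a combination of parameter-efficient transfer learning, prefix-tuning (p-tuning v2), and LoRA? (from UCL)
   - Part of the trainable soft prompt's parameter budget is moved into a pair of low-rank matrices: the soft prompt becomes shorter, and the product of the low-rank pair is added to the input token embeddings
   - It sounds plausible, but it is unclear how much it affects the LLM as a whole, or how well it works on large models
   - Experiments are mostly on small models and NLU tasks; a pity there are no experiments on the LLaMA series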
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/41b1d6b0-28af-4793-a5b8-492d09a2269d)
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/3d895b8d-4ed9-4977-a502-32f04f3f87cb)
![image](https://github.com/jungwoo-ha/WeeklyArxivTalk/assets/11782739/e02dd19a-3b8e-4313-a4f4-99c9d2285f92)
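A minimal sketch of the DePT input construction as summarized above, in pure Python for readability: frozen token embeddings E (s×d, with s a fixed maximum sequence length) receive a low-rank update A·B with A (s×r) and B (r×d), and a shorter soft prompt P (m×d) is prepended. The function names and toy shapes are hypothetical, and the budget example picks m and r by hand so that DePT matches a 100-token prompt at d = 768; it is not taken from the paper's configuration table:

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dept_inputs(embeds: Matrix, prompt: Matrix, lora_a: Matrix, lora_b: Matrix) -> Matrix:
    """Prepend the short soft prompt to frozen embeddings updated by A @ B.

    embeds: (s x d) frozen token embeddings of the input sequence
    prompt: (m x d) trainable soft prompt
    lora_a: (s x r) and lora_b: (r x d) trainable low-rank pair
    """
    delta = matmul(lora_a, lora_b)  # (s x d) position-wise update
    updated = [[e + u for e, u in zip(e_row, d_row)] for e_row, d_row in zip(embeds, delta)]
    return prompt + updated  # (m + s) x d sequence fed to the frozen model


def trainable_params(d: int, l: int, m: int, s: int, r: int) -> Tuple[int, int]:
    """(vanilla prompt tuning with l tokens, DePT with m tokens + rank-r pair over s positions)."""
    return l * d, m * d + s * r + r * d


# Toy shapes: s = 3 positions, d = 2, r = 1, m = 1.
out = dept_inputs(
    embeds=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    prompt=[[0.5, 0.5]],
    lora_a=[[1.0], [0.0], [2.0]],
    lora_b=[[0.1, 0.2]],
)
print(len(out))  # → 4 (1 prompt vector + 3 updated embeddings)
print(trainable_params(d=768, l=100, m=60, s=256, r=30))  # → (76800, 76800)
```

With the same trainable-parameter budget, the prompt in this example shrinks from 100 to 60 tokens, so every forward pass handles a shorter sequence; that is where the efficiency argument for the decomposition comes from.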

