corybaird/graspp_2025_spring


graspp_2025_spring

Overview

  • Run the code/notebooks in the cloud via Binder
  • Course materials for "Data Science for Public Policy", a course at the University of Tokyo's Graduate School of Public Policy (GraSPP)

  • Instructor: Cory Baird

Schedule

Module 1: How to Run Statistical Software (3 weeks)

  • Week 1 (Apr. 7): The Easy Way to Code and Useful Tools
  • Week 2 (Apr. 14): Acquiring Data through APIs
  • Week 3 (Apr. 21): Downloading and transforming with tools (functions)

Module 2: Visualization (3 weeks)

  • Week 4 (Apr. 28): Introduction to Data Visualization
  • Week 5 (May 12): More visualization and mapping libraries
  • Week 6 (May 19): Data pipeline and regression

Module 3: Regression, ML, and AI (2 weeks)

  • Week 7 (May 26): Regression & Machine Learning
  • Week 8 (June 2): ML & Neural Networks (A.I.)

Module 4: AI, LLMs, and Text Analysis (4 weeks)

  • Week 9 (June 9): Scraping
  • Week 10 (June 16): Reading PDFs, NLP basics (bag-of-words)
  • Week 11 (June 23): Using LLMs
  • Week 12 (June 30): Fine-tuning/training LLMs

Final Presentations

  • Week 13 (July 7): Final presentations

Group Assignments/Milestones

  • Milestone 1: Data selection and research question

    • Weight: 20% of the final grade
    • Task: Import and manipulate the data, and present descriptive statistics in tables or graphs.
    • Due: by Week 4 (Apr. 28)
  • Milestone 2: Data Visualization and Interpretation

    • Weight: 20% of the final grade
    • Task: Create at least 5 different visualizations (including charts) of the dataset.
    • Due: by Week 7 (May 26)
  • Milestone 3: Analytical Presentation

    • Weight: 20% of the final grade
    • Task: Present the analysis as a white paper, a slide deck, or a dashboard.
    • Due: by Week 11 (June 23)
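The Milestone 1 task asks for descriptive statistics of the chosen dataset. The sketch below shows one way to produce them, both overall and by group. It makes several assumptions. The data are hypothetical placeholder values, not real figures. The column names `region`, `year`, and `value` are invented for illustration. The course does not prescribe a library, so the sketch uses only the standard library (`csv`, `statistics`) and runs anywhere. In practice, pandas' `DataFrame.describe()` combined with `groupby()` produces a similar table in a line or two.

```python
import csv
import io
import statistics

# Hypothetical placeholder data (NOT real figures); the columns
# region/year/value are invented for illustration only.
RAW_CSV = """region,year,value
A,2020,10.0
A,2021,12.0
B,2020,7.0
B,2021,9.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting 'value' to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["value"] = float(row["value"])
    return rows


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two observations
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(RAW_CSV)
overall = describe([r["value"] for r in rows])

# Group-level summary, similar in spirit to pandas' groupby(...).describe()
by_region = {}
for r in rows:
    by_region.setdefault(r["region"], []).append(r["value"])
summary = {region: describe(vals) for region, vals in sorted(by_region.items())}

# Print a simple text table: one row for the full sample, one per group
print(f"{'group':<8}{'count':>6}{'mean':>8}{'median':>8}{'min':>6}{'max':>6}")
for name, stats in [("all", overall), *summary.items()]:
    print(
        f"{name:<8}{stats['count']:>6}{stats['mean']:>8.2f}"
        f"{stats['median']:>8.2f}{stats['min']:>6.1f}{stats['max']:>6.1f}"
    )
```

The same table can be turned into a bar chart or exported to a slide, which also gives a natural starting point for Milestone 2.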

Course Objectives

  • Use Python to collect, clean, and analyze policy-relevant data.
  • Design and implement reproducible research workflows to effectively manage and utilize public data.
  • Apply statistical and machine learning methods to analyze policy problems.
  • Process and analyze text data using traditional NLP and modern LLMs (e.g., ChatGPT) to extract meaningful insights.
  • Develop visualizations that communicate research findings effectively to both technical and non-technical audiences.
  • Collaborate effectively using professional data science tools like GitHub, Overleaf, and Google Colab.

Necessary software

  • Code version control: Git/GitHub

  • Running code AND notebooks

    • VSCode: For running notebooks and code (Download Link)
      • Sublime/PyCharm also acceptable
    • uv: Python version and package management, and running notebooks (Download Link)
  • If you have trouble running the software above
