graspp_2025_spring

Overview

Run the code/notebooks in the cloud via Binder
Course materials for "Data Science for Public Policy", a course at the University of Tokyo's Graduate School of Public Policy (GraSPP).
Instructor: Cory Baird
- GitHub
- LinkedIn

Schedule

Module 1: How to Run Statistical Software (3 weeks)

Week 1 (Apr. 7): The Easy Way to Code and Useful Tools
Week 2 (Apr. 14): Acquiring Data through APIs
Week 3 (Apr. 21): Downloading and transforming data with tools (functions)

Module 2: Visualization (3 weeks)

Week 4 (Apr. 28): Introduction to Data Visualization
Week 5 (May 12): More visualization and mapping libraries
Week 6 (May 19): Data pipeline and regression

Module 3: Regression, ML, AI (2 weeks)

Week 7 (May 26): Regression & Machine Learning
Week 8 (June 2): ML & Neural Networks (A.I.)

Module 4: AI, LLMs, and Text Analysis (4 weeks)

Week 9 (June 9): Scraping
Week 10 (June 16): Reading PDF, NLP basics (Bag-of-words)
Week 11 (June 23): Using LLMs
Week 12 (June 30): Fine-tuning/training LLMs

Final Presentations

Week 13 (July 7): Final presentations

Group Assignments/Milestones

Milestone 1: Data selection and research question
- Grade: 20% of grade
- Task: Import and manipulate the data, and show descriptive statistics in tables or graphs.
- Due: by Week 4 (Apr. 28)
Milestone 2: Data Visualization and Interpretation
- Grade: 20% of grade
- Task: Create at least 5 different visualizations (including charts) of the dataset.
- Due: by Week 7 (May 26)
Milestone 3: Analytical Presentation
- Grade: 20% of grade
- Task: Present the analysis in a whitepaper, slides, or a dashboard.
- Due: by Week 11 (June 23)
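
As a rough illustration of the Milestone 1 task, descriptive statistics for a numeric column can be computed with Python's standard library alone. The sketch below uses a tiny made-up dataset: the `country` and `gdp_growth` column names and all values are hypothetical, not course data. Groups would load their own data instead, and may well prefer a library such as pandas.

```python
import csv
import io
import statistics

# Hypothetical sample data; replace with your group's dataset.
SAMPLE_CSV = """country,gdp_growth
A,1.0
B,2.0
C,3.0
D,6.0
"""


def load_column(csv_text, column):
    """Read one numeric column from CSV text into a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


growth = load_column(SAMPLE_CSV, "gdp_growth")
summary = describe(growth)

# Print a simple two-column table of the results.
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

To read from a file rather than an inline string, the same `csv.DictReader` call works on an open file handle.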

Course Objectives

Use Python to collect, clean, and analyze policy-relevant data.
Design and implement reproducible research workflows to effectively manage and utilize public data.
Apply statistical and machine learning methods to analyze policy problems.
Process and analyze text data using traditional NLP and modern LLMs (e.g., ChatGPT) to extract meaningful insights.
Develop visualizations to communicate research findings effectively to both technical and non-technical audiences.
Collaborate effectively using professional data science tools like GitHub, Overleaf, and Google Colab.

Necessary software

Code version control: Git/GitHub
- GitHub Account: Create an account, then "star" the class page
- GitHub Desktop: For collaboration on code/notebooks
- Git software: https://git-scm.com/downloads
  - Git is installed automatically with GitHub Desktop on macOS but may not be on Windows
Running code and notebooks
- VSCode: For running notebooks and code (Download Link)
  - Sublime/PyCharm also acceptable
- uv: Python version and package management, and running notebooks (Download Link)
If you have trouble running the software above
- The easiest option is GitHub Codespaces, which launches VSCode in the cloud
- Other solutions:
  - Anaconda: https://www.anaconda.com/
  - Jupyter.org Try: https://jupyter.org/try
  - Google Colab: https://colab.research.google.com/
  - pip: The standard tool for installing and managing extra Python libraries that provide specialized functions for data analysis, machine learning, and more.
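
To confirm the setup, a short script can report which of the command-line tools listed above are available on the PATH. This is a minimal sketch: the executable names `git`, `uv`, and `code` (the VSCode command-line launcher) are assumptions about a default installation and may differ on your system.

```python
import shutil
import sys

# Assumed executable names for a default installation; adjust as needed.
TOOLS = ["git", "uv", "code"]


def check_tools(names):
    """Map each tool name to its location on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


status = check_tools(TOOLS)

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, path in status.items():
    print(f"{name}: {path or 'not found'}")
```

A tool reported as "not found" may still be installed but missing from the PATH, which is a common situation on Windows.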
