
Refactor notebooks into reusable src helpers while preserving the existing spatial analysis workflow #1

Open
panghuanzhi62 wants to merge 43 commits into main from agentic-upgrade


panghuanzhi62 commented Apr 10, 2026

Summary

This PR consolidates the current engineering upgrade for the Tokyo foreign population spatial analysis repository while preserving the existing research workflow.

The source of truth for this branch is:

  • notebooks 00 through 08 remain the reference orchestration and interpretation layer
  • notebook-to-module extraction is already completed through 08
  • reusable helper logic now lives under src/tokyo_foreigners/
  • data_raw/ remains the canonical raw-data directory
  • this PR does not migrate paths to data/raw/
  • this PR does not use bulk notebook rewrite scripts
  • this PR does not redo completed extraction work

In addition to the extraction/refactor work, this branch now includes the minimum safe engineering hardening needed to make the repository more reproducible and reviewable.


What changed

1. Reusable helper layer under src/tokyo_foreigners/

Logic previously repeated across notebooks has been centralized into reusable helper modules, including:

  • paths.py
  • boundaries.py
  • station_accessibility.py
  • land_price.py
  • ols.py
  • spatial_diagnostics.py
  • mgwr.py

This keeps the notebook workflow intact while reducing duplication and making future testing and maintenance easier.

2. Path handling aligned with current repository policy

Path handling has been centralized around the repository’s current canonical structure:

  • data_raw/
  • data_processed/
  • outputs/
  • notebooks/
  • docs/

This branch intentionally keeps data_raw/ as canonical and does not introduce a data/raw/ migration.
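To illustrate what centralizing path handling around data_raw/ can look like, here is a minimal sketch. The names `CANONICAL_DIRS`, `find_project_root`, `project_paths`, and `missing_dirs` are hypothetical and are not taken from the actual paths.py; the sketch also assumes the project root is the directory that contains pyproject.toml.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical sketch: the real paths.py may use different names.
# Directory names mirror the canonical structure listed in this PR.
CANONICAL_DIRS = ("data_raw", "data_processed", "outputs", "notebooks", "docs")


def find_project_root(start: Path | None = None, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` (default: cwd) until a directory containing `marker` is found."""
    current = (start if start is not None else Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {current}")


def project_paths(root: Path) -> dict[str, Path]:
    """Map each canonical directory name to its path under `root`."""
    return {name: root / name for name in CANONICAL_DIRS}


def missing_dirs(root: Path) -> list[str]:
    """Inventory check: canonical directories that do not exist under `root`."""
    return [name for name, path in project_paths(root).items() if not path.is_dir()]
```

Because the directory names live in one tuple, a notebook running from notebooks/ and a test running from the repository root resolve the same data_raw/ location, and any future path-policy change would be a one-line diff rather than an edit across notebooks.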

3. Environment baseline moved to uv

This branch adds a minimal, repository-level environment baseline:

  • pyproject.toml
  • uv.lock

uv is now the primary environment and dependency entry point for the project.

This change is meant to standardize environment setup while preserving the current notebook-centered workflow.

4. Minimal linting for reusable helpers

A small Ruff baseline has been added and applied to the reusable helper layer under src/tokyo_foreigners/.

Scope is intentionally limited:

  • linting/formatting is focused on src/ and related testable helper code
  • notebooks are not bulk-reformatted or mass-rewritten

5. Minimal pytest coverage for stable helper modules

This branch adds a lightweight tests/ directory with focused tests for stable helper behavior, including:

  • path inventory / project root checks
  • boundary helper behavior
  • baseline OLS preparation logic
  • station-attribute attachment logic
  • land-price filtering logic

The goal is not full coverage. The goal is to protect the most stable reusable logic with small, reviewable tests.
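As an illustration of the intended test style, a small, data-free test for land-price filtering might look like the sketch below. `LandPriceRecord`, `filter_land_prices`, and the field names are hypothetical stand-ins: the real land_price.py helper probably operates on DataFrames, and this version uses plain Python records only so that it stays self-contained.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class LandPriceRecord:
    # Hypothetical record shape; the real helper likely works on (Geo)DataFrames.
    point_id: str
    year: int
    use_type: str
    price_yen_per_m2: float


def filter_land_prices(
    records: Iterable[LandPriceRecord], year: int, use_type: str = "residential"
) -> list[LandPriceRecord]:
    """Keep records for one survey year and use type that have a positive price."""
    return [
        r
        for r in records
        if r.year == year and r.use_type == use_type and r.price_yen_per_m2 > 0
    ]


# --- pytest-style tests: small, deterministic, no data files required ---
SAMPLE = [
    LandPriceRecord("a", 2020, "residential", 350_000.0),
    LandPriceRecord("b", 2019, "residential", 340_000.0),
    LandPriceRecord("c", 2020, "commercial", 900_000.0),
    LandPriceRecord("d", 2020, "residential", 0.0),
]


def test_keeps_only_target_year_and_use():
    kept = filter_land_prices(SAMPLE, year=2020)
    assert [r.point_id for r in kept] == ["a"]


def test_drops_nonpositive_prices():
    kept = filter_land_prices(SAMPLE, year=2020)
    assert all(r.price_yen_per_m2 > 0 for r in kept)


def test_empty_input_returns_empty_list():
    assert filter_land_prices([], year=2020) == []
```

Placed in a file such as tests/test_land_price.py (a hypothetical name), these functions would be collected by `uv run pytest -q` with no raw data on disk, which is what keeps them cheap enough to run in CI.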

6. Lightweight GitHub Actions CI

A minimal CI workflow has been added that runs on push and pull request events. It currently runs:

  • uv sync --locked
  • uv run ruff check ...
  • uv run pytest -q

This is intentionally lightweight and does not attempt notebook execution in CI.
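For contributors who want to reproduce the CI steps locally before pushing, a minimal Python sketch follows. It is not the workflow file itself. The PR elides the exact Ruff arguments, so `ruff_targets` is left as a parameter, and its `src` default is an assumption based on the lint scope described above. `build_ci_commands` and `run_ci` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
from collections.abc import Callable, Sequence


def build_ci_commands(ruff_targets: Sequence[str] = ("src",)) -> list[list[str]]:
    """Return the CI steps in order. The Ruff targets are an assumption (elided in the PR)."""
    return [
        ["uv", "sync", "--locked"],
        ["uv", "run", "ruff", "check", *ruff_targets],
        ["uv", "run", "pytest", "-q"],
    ]


def run_ci(
    commands: Sequence[Sequence[str]],
    runner: Callable[[list[str]], subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for cmd in commands:
        print("$", " ".join(cmd))
        result = runner(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_ci(build_ci_commands())` from the repository root approximates the CI job's fail-fast behavior: a lint failure stops the run before tests execute. The injectable `runner` exists so the sequencing itself can be tested without invoking uv.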

7. Documentation and repository narrative updates

Repository documentation has been updated to reflect the current state more accurately, including:

  • the notebook-centered workflow as the primary analytical path
  • src/tokyo_foreigners/ as the reusable helper layer
  • uv as the environment entry point
  • minimal lint / test / CI hardening now in place
  • a short engineering status note under docs/refactor_status.md

Why this PR matters

This PR moves the repository away from a mostly notebook-driven research project whose environment assumptions were tied to a local machine, toward a more reproducible and maintainable research codebase, without changing the project’s core analytical style.

In practical terms, the repository now has:

  • a clearer helper/module boundary
  • a standardized environment entry point
  • minimal static checks
  • minimal tests for stable helpers
  • lightweight CI validation
  • improved readability and portfolio value for reviewers

What this PR does not do

This PR intentionally does not:

  • replace notebooks with a fully packaged pipeline
  • migrate data_raw/ to data/raw/
  • use bulk notebook rewrite scripts
  • rerun or redesign the substantive analytical workflow
  • add Docker, PostGIS, Streamlit, or app-layer infrastructure
  • execute the full notebook workflow automatically in CI
  • redo extraction work that has already been completed through notebook 08

Validation

The branch has been validated at a minimal engineering level through:

  • successful uv sync
  • successful uv run jupyter lab
  • successful import smoke check for src/tokyo_foreigners
  • passing Ruff checks for the reusable helper layer
  • passing pytest for the current minimal stable test set
  • passing GitHub Actions CI for lint + tests
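The import smoke check can be expressed as a short script. This sketch assumes src/tokyo_foreigners/ is importable as the `tokyo_foreigners` package after `uv sync`; the module names come from the helper list in this PR, while `smoke_import` itself is a hypothetical name.

```python
from __future__ import annotations

import importlib
from collections.abc import Sequence

# Module names taken from the helper list in this PR.
HELPER_MODULES = (
    "paths",
    "boundaries",
    "station_accessibility",
    "land_price",
    "ols",
    "spatial_diagnostics",
    "mgwr",
)


def smoke_import(package: str, modules: Sequence[str] = HELPER_MODULES) -> dict[str, str]:
    """Import package.module for each name; return {qualified_name: error} for failures."""
    failures: dict[str, str] = {}
    for name in modules:
        qualified = f"{package}.{name}"
        try:
            importlib.import_module(qualified)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[qualified] = f"{type(exc).__name__}: {exc}"
    return failures
```

Used as `assert not smoke_import("tokyo_foreigners")`, a failure reports which helper broke and why. Catching `Exception` rather than only `ImportError` is deliberate: a module-level error inside a helper would otherwise surface as an unrelated traceback instead of a per-module report.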

Reviewer guidance

The most important review questions for this PR are:

  1. Does the branch preserve the existing notebook workflow and repository intent?
  2. Is the helper extraction under src/tokyo_foreigners/ coherent and non-disruptive?
  3. Are the environment, lint, test, and CI additions appropriately minimal for the current stage?
  4. Does the branch correctly keep data_raw/ as canonical without introducing path-policy drift?
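Question 4 could also be backed by a lightweight automated guard. The sketch below flags any line containing `data/raw` in Python files and notebooks under a given directory. `find_path_policy_drift` is hypothetical, and its default suffixes deliberately exclude Markdown, because documentation such as this PR legitimately mentions `data/raw/` when describing what is not migrated.

```python
from __future__ import annotations

from pathlib import Path

# The non-canonical layout this PR explicitly avoids.
FORBIDDEN = "data/raw"


def find_path_policy_drift(
    root: Path, suffixes: tuple[str, ...] = (".py", ".ipynb")
) -> list[tuple[str, int]]:
    """Return (relative_path, line_number) pairs where FORBIDDEN appears under root.

    For .ipynb files, line numbers refer to the raw notebook JSON.
    """
    hits: list[tuple[str, int]] = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip hidden directories and files such as .ipynb_checkpoints/ or .venv/.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if FORBIDDEN in line:
                hits.append((rel.as_posix(), lineno))
    return hits
```

Scanning src/ and notebooks/ separately, rather than the whole repository root, keeps the check fast and avoids third-party code. Because `data_raw` does not contain the substring `data/raw`, the canonical paths never trigger it.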

Net result

After this PR, the repository remains a notebook-centered spatial analysis portfolio, but with a stronger engineering baseline:

  • reusable helpers under src/tokyo_foreigners/
  • canonical data_raw/ path policy preserved
  • uv-managed environment
  • minimal Ruff linting
  • minimal pytest coverage
  • lightweight GitHub Actions CI
  • clearer documentation of current repository status
