Tufts-University/py-onramp

py-onramp

An onramp to the modern data / scientific Python workflow.

Building The Docs Site Locally

Install the project dependencies you need with uv:

uv sync --group docs --group examples

Then build the docs site locally with Quarto:

uv run quarto render

This writes the generated site to site/. Use this when you want to check that everything builds cleanly. Alternatively, you can serve the docs locally during development:

uv run quarto preview

The preview command starts a local development server and rebuilds automatically as you edit the docs, which is more convenient while writing.

The Docs GitHub Actions workflow builds and deploys the site automatically for pushes to master, so generated site artifacts should not be committed or pushed.

Docs Modernization TODO

Must Have

  • docs/README.md
    • Replace the Anaconda-first setup guidance with a uv-first workflow.
    • Add a short "getting started" path for local development: install Python, create/sync the environment, and run examples with uv run.
  • docs/packaging.md
    • Rewrite the installable-project section around pyproject.toml instead of setup.py.
    • Prefer modern packaging guidance based on uv and PEP 517/518/621 rather than setuptools plus manual pip install -e . --user.
    • Add examples for editable installs, dependency groups, and lockfile-driven reproducibility.
  • docs/usability.md
    • Replace setup.py entry_points examples with [project.scripts] in pyproject.toml.
    • Update install/run examples so they reflect uv run, project-local environments, and modern CLI/TOML workflows.
    • Add guidance on debugging with uv and VS Code.
  • docs/performance.md
    • Replace or supplement the Numba section with a more current accelerator/JIT section built around JAX.
    • Refresh the parallelization discussion to mention current options such as JAX, Dask, multiprocessing/joblib, and when each is appropriate.
    • Add guidance on CPU vs GPU execution and the tradeoff between vectorization, JIT compilation, and distributed execution.
  • New docs section: numerical reliability and scientific testing
    • Cover floating-point behavior, tolerances, regression tests for computed results, and the difference between exact and approximate equality.
    • Add examples of testing mathematical invariants, edge cases, and algorithmic correctness, not just code paths.
    • Consider a short introduction to property-based testing for mathematical code.
  • New docs section: reproducible research workflows
    • Cover pinned dependencies, lockfiles, runtime metadata, seeds, and JAX deterministic PRNG patterns.
    • Show how to organize inputs, outputs, and experiment artifacts so results can be rerun months later.
    • Include guidance on config files and recording command invocations or parameters used for a run.
  • New docs section: notebooks versus scripts and packages
    • Explain when notebooks are useful, when code should move into modules, and how to avoid notebooks becoming the only source of truth.
    • Include basic reproducibility guidance for notebook-heavy workflows.
  • New docs section: git and collaboration for research code
    • Cover branching, small commits, reviewable changes, .gitignore, and what data or generated artifacts should stay out of version control.
    • Frame this as basic research hygiene rather than only "software engineering for teams".
  • New docs section: running on a SLURM cluster
    • Add a dedicated page covering sbatch, srun, resource requests, log files, modules, and activating project environments on the cluster.
    • Include a minimal batch script template and an example workflow for launching Python jobs reproducibly.
    • Cover common HPC topics that beginners hit immediately: job arrays, scratch space, file staging, checkpointing, and debugging failed jobs.
  • New docs section: code quality and automation
    • Add a short section on ruff, formatting, optional type checking, and pre-commit.
    • Explain the minimum automation worth adopting even for a small research codebase.
    • Add CI guidance so tests and docs build automatically on each push. Maybe.
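
As a minimal sketch of what the planned numerical reliability and scientific testing section might cover, the difference between exact and approximate equality can be shown with the standard library alone. The function name check_pythagorean_identity, the tolerance values, and the list of inputs are illustrative assumptions, not part of the existing docs.

```python
import math

# Exact equality fails for many decimal fractions, because 0.1 and 0.2
# have no exact binary floating-point representation.
exact_equal = (0.1 + 0.2 == 0.3)

# Approximate equality with an explicit relative tolerance succeeds.
approx_equal = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)


def check_pythagorean_identity(x: float, abs_tol: float = 1e-12) -> bool:
    """Test a mathematical invariant: sin(x)^2 + cos(x)^2 == 1, up to rounding.

    Hypothetical helper for illustration. rel_tol is set to 0.0 so that
    only the absolute tolerance decides the outcome.
    """
    value = math.sin(x) ** 2 + math.cos(x) ** 2
    return math.isclose(value, 1.0, rel_tol=0.0, abs_tol=abs_tol)


# Check the invariant over several inputs, including small and large values.
inputs = [0.0, 1e-8, 1.0, math.pi, -math.pi / 2, 1e6]
invariant_holds = all(check_pythagorean_identity(x) for x in inputs)

print(exact_equal, approx_equal, invariant_holds)
```

The point of the sketch is that a test on computed results should state its tolerance explicitly and should check a property of the mathematics, not only that the code ran.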

Nice to Have

  • New docs section: plotting and communication
    • Cover publication-quality figures, labeling, styles, vector exports, and reproducible figure generation with Matplotlib.
    • TikZ? Liz?
  • New docs section: core scientific Python tools for math research
    • Add guidance on when to use NumPy, SciPy, sparse matrices, optimization/integration/linear algebra routines, and where JAX fits relative to NumPy/SciPy.
  • New docs section: research project layout
    • Show a recommended directory structure for src, tests, docs, scripts, data, results, and generated artifacts.
    • Discuss how to store metrics, including timings.
    • Reproducibility
  • New docs section: remote and cluster-native development
    • Cover remote editing, data transfer, checkpoint/restart patterns, and environment portability across laptop and cluster.
  • New docs section: GitHub workflow
    • Unit tests and GitHub Actions
    • Collaborating
  • New docs section: OOP
  • New docs section: using Claude
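
For the item on storing metrics and timings, one possible shape is sketched below using only the standard library. The names timed_run and sum_of_squares and the choice of metadata fields are assumptions made for illustration. The README does not prescribe a format.

```python
import json
import platform
import sys
import time


def timed_run(func, *args, **kwargs):
    """Run func and return (result, metrics).

    Hypothetical helper: metrics records wall-clock time plus a little
    runtime metadata so the run can be interpreted later.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    metrics = {
        "function": func.__name__,
        "wall_time_seconds": elapsed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "argv": sys.argv,
    }
    return result, metrics


def sum_of_squares(n):
    """Toy workload standing in for a real experiment."""
    return sum(i * i for i in range(n))


result, metrics = timed_run(sum_of_squares, 10_000)

# Serialize to JSON text; a real run would write this next to its other outputs.
metrics_json = json.dumps(metrics, indent=2)
print(metrics_json)
```

Keeping metrics in a plain, structured format such as JSON makes it straightforward to compare runs later without rerunning them.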

Potential Small Projects

  • Using SLURM
    • Hello, World!
    • Parallelization: within script and using SLURM calls
    • Optuna: train basic DNN
    • Python vs. MATLAB
  • Compare optimizers/samplers on testbed 2D problems
    • GD vs Newton
    • JAX vs. loop
    • Deterministic vs. stochastic
  • Testing numerics
    • Derivatives: finite difference/Taylor approximation, first and second order
    • Linear solvers: CG vs. BFGS
  • Replicate results from a paper
    • Write up documentation
  • Image Applications
    • Compression (SVD)
    • Deblurring (Inverse Problem)
  • Solve ODE/PDE
  • ...
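
As a rough illustration of the "Testing numerics" project on derivatives, the sketch below compares a first-order forward difference with a second-order central difference and estimates the observed convergence order. The test function (sin), the evaluation point, the step size, and the helper names are all assumptions chosen for the example.

```python
import math


def forward_difference(f, x, h):
    """First-order approximation of f'(x): error shrinks like h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Second-order approximation of f'(x): error shrinks like h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


def observed_order(approx, f, exact, x, h):
    """Estimate the convergence order by comparing errors at h and h/2."""
    err_h = abs(approx(f, x, h) - exact)
    err_half = abs(approx(f, x, h / 2) - exact)
    return math.log2(err_h / err_half)


x0, h0 = 1.0, 1e-2
exact = math.cos(x0)  # exact derivative of sin at x0

order_forward = observed_order(forward_difference, math.sin, exact, x0, h0)
order_central = observed_order(central_difference, math.sin, exact, x0, h0)

print(f"forward: {order_forward:.3f}, central: {order_central:.3f}")
```

The observed orders should come out close to 1 and 2 respectively. For much smaller step sizes, rounding error starts to dominate and the estimate degrades, which is itself a useful thing for the project to demonstrate.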
