Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md

Analysis Figures (8+ figures)

This directory documents the analysis and motivation figures supporting the LabLink paper. These figures demonstrate the need for LabLink through software complexity analysis, GPU cost trends, dependency visualization, and deployment impact assessment.

Overview

Analysis figures are generated from data analysis, external datasets, and API calls (PyPI, conda-forge, Epoch AI, GitHub). Unlike architecture diagrams which parse Terraform, these figures require data collection and processing steps.

Scripts location: scripts/plotting/ and scripts/analysis/ Output location: figures/main/ and figures/supplementary/

The 8+ Figure Types

1. Software Complexity Over Time

Script: plot_software_complexity.py Purpose: Show exponential growth in scientific software dependencies (2000-2025) motivating need for reproducible environments

Data source: PyPI API, conda-forge feedstocks, GitHub Packages analyzed: NumPy, SciPy, matplotlib, pandas, TensorFlow, PyTorch, scikit-learn, AstroPy, Jupyter

Workflow:

# 1. Collect data (20-40 minutes)
cd scripts/analysis
uv run python collect_dependency_data.py --github-token YOUR_TOKEN

# 2. Process data
uv run python process_dependency_data.py

# 3. Generate figure
cd ../plotting
uv run python plot_software_complexity.py --preset paper --format both

Full documentation

2. SLEAP Dependency Network

Script: generate_sleap_dependency_graph.py Purpose: Visualize complexity of computational research software through SLEAP's dependency network

Data source: PyPI JSON API (transitive dependency resolution) Visualization: Network graph with degree centrality sizing, categorical coloring

Quick start:

uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --sleap-source ../sleap \
  --preset paper \
  --format pdf

Full documentation

3. GPU Cost Trends

Script: plot_gpu_cost_trends.py Purpose: Demonstrate sustained high cost of GPU hardware (2006-2025) motivating cloud-based access

Data source: Epoch AI Machine Learning Hardware Database (CC BY 4.0) Coverage: 160+ GPUs with pricing and performance metrics

Data setup:

Download from https://epoch.ai/data/machine-learning-hardware
Extract to data/raw/gpu_prices/ml_hardware.csv

Generate:

uv run python scripts/plotting/plot_gpu_cost_trends.py --preset poster --include-performance

Full documentation

4. Deployment Impact Timeline

Script: plot_deployment_impact.py Purpose: Show LabLink deployment history and workshop impact

Data source: data/processed/deployment_impact/workshops.csv

Generate:

uv run python scripts/plotting/plot_deployment_impact.py --preset paper

Full documentation

5. OS Distribution Analysis

Script: plot_os_distribution.py Purpose: Analyze operating system distribution in computational research

Data source: TBD (survey data, GitHub, conda-forge)

Generate:

uv run python scripts/plotting/plot_os_distribution.py --preset paper

Full documentation

6. GPU Reliance Scoring

Script: plot_gpu_reliance.py Purpose: Score Python packages by GPU dependency level (0-5 scale)

Data source: PyPI requires_dist metadata Scoring methodology:

0: No GPU support
1-2: Optional GPU acceleration
3-4: GPU-recommended
5: GPU-first packages (CuPy, Numba)

Generate:

uv run python scripts/plotting/plot_gpu_reliance.py --preset paper

Full documentation

7. Configuration Hierarchy

Script: plot_configuration_hierarchy.py, plot_configuration_hierarchy_simple.py Purpose: Visualize LabLink's hierarchical configuration system

Data source: Analysis of config.yaml structure

Generate:

# Detailed version
uv run python scripts/plotting/plot_configuration_hierarchy.py --preset poster

# Simple version
uv run python scripts/plotting/plot_configuration_hierarchy_simple.py --preset paper

Full documentation

8. QR Codes

Script: generate_qr_codes.py Purpose: Generate QR codes for demo access, GitHub repos, documentation

Generate:

uv run python scripts/plotting/generate_qr_codes.py

Full documentation

Common Patterns

Font Size Presets

All scripts support consistent presets:

--preset paper          # 14pt fonts, 300 DPI, ~6.5" width
--preset poster         # 20pt fonts, 300 DPI, ~12" width
--preset presentation   # 16pt fonts, 150 DPI, 10"x7.5"

Output Formats

--format png      # Default, 300 DPI raster
--format pdf      # Vector for LaTeX
--format svg      # Vector for web/presentations
--format both     # PNG + PDF

Output Location

--output figures/main           # Main text figures
--output figures/supplementary  # Supplementary figures

Verbose Logging

--verbose  # or -v for detailed logging

Data Collection Workflow

Some figures require multi-step data collection:

1. Software Complexity

scripts/analysis/collect_dependency_data.py    # Fetch from APIs (20-40 min)
scripts/analysis/process_dependency_data.py    # Clean and aggregate
scripts/plotting/plot_software_complexity.py   # Visualize

2. SLEAP Dependencies

# Auto-fetches and caches on first run
scripts/plotting/generate_sleap_dependency_graph.py
# Uses cached data on subsequent runs

3. GPU Cost Trends

# Manual download required
# 1. Visit https://epoch.ai/data/machine-learning-hardware
# 2. Download ml_hardware.csv to data/raw/gpu_prices/
scripts/plotting/plot_gpu_cost_trends.py

Data Sources Summary

Figure	Data Source	Collection Method	Cache Location
Software Complexity	PyPI, conda-forge, GitHub	`collect_dependency_data.py`	`data/processed/dependency_data.csv`
SLEAP Dependencies	PyPI JSON API	Auto-fetch on first run	`data/processed/sleap_dependencies.json`
GPU Costs	Epoch AI dataset	Manual download	`data/raw/gpu_prices/ml_hardware.csv`
Deployment Impact	Manual curation	CSV file	`data/processed/deployment_impact/workshops.csv`
OS Distribution	TBD	TBD	TBD
GPU Reliance	PyPI metadata	Part of software complexity	Uses processed data
Configuration	Code analysis	Static analysis	N/A
QR Codes	URLs	Encode to QR	`figures/main/qr-*.png`

Caching and Refresh

Most scripts cache API responses to avoid repeated network calls:

# Use cached data (default)
uv run python scripts/plotting/<script>.py

# Force refresh from APIs
uv run python scripts/plotting/<script>.py --force-refresh

Environment Variables

Optional convenience variables:

# SLEAP dependencies
export SLEAP_PATH=../sleap

# GitHub API (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here

Metadata Files

All scripts generate metadata files documenting:

Data sources (URLs, API endpoints)
Generation timestamp
Package versions
Data statistics (counts, date ranges)
Parameters used

Example: software_complexity_metadata.txt, gpu_cost_trends-metadata.txt

Troubleshooting

API rate limits:

# Set GitHub token for higher limits
export GITHUB_TOKEN=ghp_your_token_here

Missing data files:

# Check data/ subdirectory READMEs for download instructions
ls data/raw/*/README.md

Slow data collection:

# Use cached data (don't use --force-refresh unless needed)
# Expected times:
# - Software complexity: 20-40 minutes first run
# - SLEAP dependencies: 1-2 minutes first run
# - GPU costs: <1 minute (pre-downloaded dataset)

Font size issues:

# For large-format posters, use poster preset
--preset poster  # 20pt fonts, 43% larger

# Test at actual print dimensions

Examples

Generate All Analysis Figures for Paper

# Software complexity
uv run python scripts/plotting/plot_software_complexity.py --preset paper --format both

# SLEAP dependencies
uv run python scripts/plotting/generate_sleap_dependency_graph.py --preset paper --format pdf

# GPU costs
uv run python scripts/plotting/plot_gpu_cost_trends.py --preset paper --format both

# Deployment impact
uv run python scripts/plotting/plot_deployment_impact.py --preset paper --format both

# OS distribution
uv run python scripts/plotting/plot_os_distribution.py --preset paper --format both

# GPU reliance
uv run python scripts/plotting/plot_gpu_reliance.py --preset paper --format both

# Configuration hierarchy
uv run python scripts/plotting/plot_configuration_hierarchy_simple.py --preset paper --format both

# QR codes
uv run python scripts/plotting/generate_qr_codes.py

Generate for Poster Presentation

# Use poster preset for all figures
uv run python scripts/plotting/plot_software_complexity.py --preset poster --format pdf
uv run python scripts/plotting/plot_gpu_cost_trends.py --preset poster --include-performance --format pdf
uv run python scripts/plotting/generate_sleap_dependency_graph.py --preset poster --format pdf

Recent Updates

Added QR code generation
Added configuration hierarchy visualization
Fixed GPU reliance scoring bug (see docs/archived/gpu-reliance-bug-2025-11.md)
Added deployment impact timeline
Improved font preset consistency across all scripts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Analysis Figures (8+ figures)

Overview

The 8+ Figure Types

1. Software Complexity Over Time

2. SLEAP Dependency Network

3. GPU Cost Trends

4. Deployment Impact Timeline

5. OS Distribution Analysis

6. GPU Reliance Scoring

7. Configuration Hierarchy

8. QR Codes

Common Patterns

Font Size Presets

Output Formats

Output Location

Verbose Logging

Data Collection Workflow

1. Software Complexity

2. SLEAP Dependencies

3. GPU Cost Trends

Data Sources Summary

Caching and Refresh

Environment Variables

Metadata Files

Troubleshooting

Examples

Generate All Analysis Figures for Paper

Generate for Poster Presentation

Related Documentation

Recent Updates

FilesExpand file tree

analysis-figures

Directory actions

More options

Directory actions

More options

Latest commit

History

analysis-figures

Folders and files

parent directory

README.md

Analysis Figures (8+ figures)

Overview

The 8+ Figure Types

1. Software Complexity Over Time

2. SLEAP Dependency Network

3. GPU Cost Trends

4. Deployment Impact Timeline

5. OS Distribution Analysis

6. GPU Reliance Scoring

7. Configuration Hierarchy

8. QR Codes

Common Patterns

Font Size Presets

Output Formats

Output Location

Verbose Logging

Data Collection Workflow

1. Software Complexity

2. SLEAP Dependencies

3. GPU Cost Trends

Data Sources Summary

Caching and Refresh

Environment Variables

Metadata Files

Troubleshooting

Examples

Generate All Analysis Figures for Paper

Generate for Poster Presentation

Related Documentation

Recent Updates