
corsa-center/metrics


CASS Metrics Collection Framework

A system for collecting and analyzing software sustainability metrics for scientific open-source software.

Overview

This framework collects metrics from multiple sources and integrates with the CORSA Sustainability Dashboard.

Key Features

  • Multi-Source Data Collection: GitHub, Semantic Scholar, OpenAlex, Zenodo
  • Orchestrated Workflows: Configurable collection pipelines
  • CASS Framework: Four dimensions (Impact, Community, Viability, Quality)
  • Dashboard Integration: Generates JSON data for the CORSA dashboard
  • Automated Collection: GitHub Actions workflows
  • Extensible Framework: Placeholder metrics for incremental implementation

Quick Start

Prerequisites

  • Python 3.11+
  • Git

Installation

# Clone repository
git clone https://github.com/brtnfld/metrics
cd metrics

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Basic Usage

Using the Orchestrator

# Set up API credentials (the workflow itself is defined in config/orchestrator.yaml)
cp config/api_credentials.yaml.example config/api_credentials.yaml
# Edit config/api_credentials.yaml with your API keys

# Run orchestrator
python orchestrator.py config/orchestrator.yaml
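The orchestrator runs a configurable collection pipeline. This README does not document the schema of config/orchestrator.yaml, so the following is only a minimal sketch of the general pattern. It assumes a workflow given as an ordered list of named steps, each of which can be switched off. The step names, the `STEPS` registry, and `run_pipeline` are hypothetical and are not taken from orchestrator.py.

```python
from typing import Callable, Dict

# Hypothetical step signature: each step receives and returns a shared state dict.
Step = Callable[[dict], dict]

# Hypothetical registry mapping step names (as a workflow config might list them) to functions.
STEPS: Dict[str, Step] = {}


def step(name: str) -> Callable[[Step], Step]:
    """Decorator that registers a function under a step name."""
    def register(func: Step) -> Step:
        STEPS[name] = func
        return func
    return register


def _record(state: dict, name: str) -> dict:
    # Stand-in for real work: only notes that the step ran.
    state.setdefault("log", []).append(name)
    return state


@step("sync_catalog")
def sync_catalog(state: dict) -> dict:
    return _record(state, "sync_catalog")


@step("collect_citations")
def collect_citations(state: dict) -> dict:
    return _record(state, "collect_citations")


@step("export_dashboard_json")
def export_dashboard_json(state: dict) -> dict:
    return _record(state, "export_dashboard_json")


def run_pipeline(workflow: dict) -> dict:
    """Run enabled steps in the order they appear in workflow["steps"]."""
    state: dict = {}
    for entry in workflow.get("steps", []):
        if not entry.get("enabled", True):
            continue  # step switched off in the config
        name = entry["name"]
        if name not in STEPS:
            raise KeyError(f"unknown step: {name}")
        state = STEPS[name](state)
    return state


# In practice this dict would be parsed from YAML; the structure here is assumed.
workflow = {
    "steps": [
        {"name": "sync_catalog"},
        {"name": "collect_citations"},
        {"name": "export_dashboard_json", "enabled": False},
    ]
}
result = run_pipeline(workflow)
print(result["log"])  # ['sync_catalog', 'collect_citations']
```

Keeping the step order in the config rather than in code is what makes a pipeline like this reconfigurable without editing Python; an unknown step name fails loudly instead of being silently skipped.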

Generate Citation Metrics

# Set environment variables (optional but recommended)
export GITHUB_TOKEN="your_github_token"
export SEMANTIC_SCHOLAR_KEY="your_api_key"
export OPENALEX_EMAIL="your-email@example.com"

# Generate metrics
python scripts/generate_corsa_citations.py \
  --catalog config/software_catalog.yaml \
  --output output/citationMetrics.json
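The script combines citation data from several sources into a single JSON file. How generate_corsa_citations.py actually reconciles the sources is not described here. Purely as an illustration, this sketch assumes per-source counts keyed by package name and keeps the largest count for each package. The function `merge_citation_counts`, the package names, the numbers, and the output shape are all hypothetical.

```python
import json
from typing import Dict, Optional


def merge_citation_counts(
    per_source: Dict[str, Dict[str, Optional[int]]]
) -> Dict[str, dict]:
    """Combine per-source citation counts into one record per package.

    per_source maps a source name to {package name: citation count}, where
    None means the lookup failed or the package was not found.
    """
    merged: Dict[str, dict] = {}
    for source, counts in per_source.items():
        for package, count in counts.items():
            entry = merged.setdefault(package, {"citations": None, "sources": {}})
            entry["sources"][source] = count
            if count is None:
                continue
            # Keep the largest count instead of summing: the same citing paper is
            # often indexed by several databases, so a sum would double-count it.
            current = entry["citations"]
            entry["citations"] = count if current is None else max(current, count)
    return merged


# Illustrative input with made-up package names and numbers.
merged = merge_citation_counts({
    "semantic_scholar": {"pkg-a": 10, "pkg-b": None},
    "openalex": {"pkg-a": 12, "pkg-b": 3},
})
print(merged["pkg-a"]["citations"])  # 12
print(json.dumps(merged, indent=2, sort_keys=True))
```

Keeping the raw per-source values next to the merged figure makes disagreements between databases visible in the dashboard data rather than hiding them.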

CASS Dimensions

The framework follows the CASS (Consortium for the Advancement of Scientific Software) sustainability model with four main dimensions:

Dimension   Status           Description
Impact      ✅ Implemented   Software citation, adoption, and field research impact
Community   🔄 Placeholder   Community health, engagement, and diversity
Viability   ✅ Implemented   Long-term sustainability, security, and licensing
Quality     🔄 Placeholder   Documentation, code quality, testing, and usability

Each dimension contains multiple sub-categories and metrics that contribute to an overall sustainability score.
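The README does not specify how metrics roll up into the overall score. One simple possibility, assumed here only for illustration, is an equal-weight mean at both levels, with scores normalized to [0, 1] and placeholder metrics (`None`) left out rather than counted as zero. The function names and metric names below are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

# Scores are assumed to be normalized to [0, 1]; None marks a placeholder metric.
Metrics = Dict[str, Optional[float]]


def dimension_score(metrics: Metrics) -> Optional[float]:
    """Mean of the implemented metrics in one dimension, or None if there are none."""
    values = [v for v in metrics.values() if v is not None]
    return mean(values) if values else None


def overall_score(dimensions: Dict[str, Metrics]) -> Optional[float]:
    """Equal-weight mean over dimensions that have at least one implemented metric."""
    scores = [s for s in map(dimension_score, dimensions.values()) if s is not None]
    return mean(scores) if scores else None


example = {
    "impact": {"citations": 0.75},
    "community": {"health": None},  # placeholder
    "viability": {"licensing": 0.5},
    "quality": {},                  # placeholder
}
print(overall_score(example))  # 0.625
```

Skipping placeholders keeps unimplemented dimensions from dragging the score down, at the cost of making scores less comparable as more metrics are filled in; a weighted scheme would be an equally valid choice.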

Project Structure

metrics/
├── collectors/              # CASS dimension collectors
│   ├── impact/
│   │   ├── citation.py          # Citation metrics (✅ implemented)
│   │   └── dimension.py         # Impact dimension (🔄 placeholder)
│   ├── community/
│   │   ├── community_health.py  # Legacy community health collector
│   │   └── dimension.py         # Community dimension (🔄 placeholder)
│   ├── viability/
│   │   ├── licensing.py         # License analysis (✅ implemented)
│   │   └── dimension.py         # Viability dimension (🔄 placeholder)
│   ├── quality/
│   │   └── dimension.py         # Quality dimension (🔄 placeholder)
│   └── catalog_sync.py          # Catalog synchronization
│
├── integrations/            # API integrations
│   ├── base.py             # Base API client
│   ├── github_api.py       # GitHub API
│   ├── semantic_scholar.py # Semantic Scholar API
│   ├── openalex.py         # OpenAlex API
│   └── zenodo.py           # Zenodo API
│
├── scripts/
│   └── generate_corsa_citations.py  # CORSA integration
│
├── config/
│   ├── orchestrator.yaml          # Workflow configuration
│   ├── software_catalog.yaml      # Software catalog
│   └── api_credentials.yaml.example
│
├── .github/workflows/
│   └── collect-and-sync.yml      # Automated collection
│
└── orchestrator.py          # Main orchestrator

Configuration

Environment variables (all optional; setting them gives better API rate limits):

Variable               Description
GITHUB_TOKEN           GitHub personal access token
SEMANTIC_SCHOLAR_KEY   Semantic Scholar API key
OPENALEX_EMAIL         Email address for the OpenAlex polite pool
ZENODO_TOKEN           Zenodo access token

See CONFIGURATION.md for detailed setup.
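As a rough sketch of how these optional variables might be turned into request settings, the hypothetical `build_auth` function below applies each service's documented convention: a bearer token for GitHub and Zenodo, an `x-api-key` header for Semantic Scholar, and a `mailto` query parameter for OpenAlex. It is not the code in integrations/; missing variables are simply skipped, so every service still works anonymously at lower rate limits.

```python
import os
from typing import Dict, Mapping


def build_auth(env: Mapping[str, str] = os.environ) -> Dict[str, dict]:
    """Derive per-service headers and query parameters from optional env vars."""
    settings: Dict[str, dict] = {
        "github": {"headers": {"Accept": "application/vnd.github+json"}, "params": {}},
        "semantic_scholar": {"headers": {}, "params": {}},
        "openalex": {"headers": {}, "params": {}},
        "zenodo": {"headers": {}, "params": {}},
    }
    if token := env.get("GITHUB_TOKEN"):
        settings["github"]["headers"]["Authorization"] = f"Bearer {token}"
    if key := env.get("SEMANTIC_SCHOLAR_KEY"):
        settings["semantic_scholar"]["headers"]["x-api-key"] = key
    if email := env.get("OPENALEX_EMAIL"):
        # OpenAlex routes requests that carry a mailto address to its polite pool.
        settings["openalex"]["params"]["mailto"] = email
    if token := env.get("ZENODO_TOKEN"):
        settings["zenodo"]["headers"]["Authorization"] = f"Bearer {token}"
    return settings


print(build_auth({"OPENALEX_EMAIL": "you@example.com"})["openalex"])
# {'headers': {}, 'params': {'mailto': 'you@example.com'}}
```

Passing the environment in as a mapping, with `os.environ` as the default, keeps the function easy to test without touching real credentials.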

Documentation

API Integrations

  • Semantic Scholar: Academic citations
  • OpenAlex: Citation database
  • Zenodo: DOI resolution and downloads
  • GitHub: Repository metadata and dependents
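For orientation, this sketch builds the public lookup URLs these services expose for a single catalog entry, without making any network calls. The endpoint paths are the services' documented REST APIs. The `lookup_urls` helper, the example DOI, and the choice of requested fields are assumptions for illustration. The Zenodo lookup relies on Zenodo DOIs having the form 10.5281/zenodo.<record id>.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# Zenodo-minted DOIs embed the record id after the 10.5281/zenodo. prefix.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def lookup_urls(doi: str, github_repo: str, email: Optional[str] = None) -> Dict[str, str]:
    """Build metadata lookup URLs for one catalog entry (no network calls)."""
    doi = doi.strip().lower()  # DOIs are case-insensitive
    path_doi = quote(doi, safe="/")  # escape unusual characters, keep the slash
    urls = {
        "semantic_scholar": "https://api.semanticscholar.org/graph/v1/paper/DOI:"
        + path_doi + "?" + urlencode({"fields": "citationCount"}),
        "openalex": "https://api.openalex.org/works/https://doi.org/" + path_doi,
        "github": "https://api.github.com/repos/" + github_repo,
    }
    if email:
        urls["openalex"] += "?" + urlencode({"mailto": email})
    match = ZENODO_DOI.match(doi)
    if match:
        urls["zenodo"] = "https://zenodo.org/api/records/" + match.group(1)
    return urls


# Hypothetical DOI used only to show the URL shapes.
urls = lookup_urls("10.5281/zenodo.1234567", "corsa-center/metrics")
print(urls["zenodo"])  # https://zenodo.org/api/records/1234567
```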

License

MIT License - See LICENSE for details.

Contact

About

Tools for the collection and analysis of metrics related to software sustainability
