[Feature]: “Airtimebill” Per-User/NS Usage Tracking for Fair Use #66

@KenKilty

Is there an existing feature request for this?

  • I have searched the existing issues

Problem or Motivation

In a shared KubeAirunway cluster, there is no way to tell who is burning through GPUs or using particular models. One team can crowd out everyone else without anyone noticing. Teams may then resort to deploying separate instances just to keep the peace, which wastes compute and adds management overhead. "Airtimebill" would add straightforward usage tracking by user or namespace, so you can identify the heaviest users, rein them in with quotas if needed, and share a dashboard for transparency.

Down the road, for larger self-hosted setups, it could generate reports for showback, or even chargeback if you need to bill usage back to teams.

Proposed Solution

Airtimebill tracks GPU, memory, and inference usage per user/namespace via labeled Prometheus metrics from an auth proxy, and presents the results as simple dashboards and CSV exports to support fair sharing in multi-team setups.

Components:

  • Proxy Layer: Injects x-user and x-ns headers from OIDC claims; logs requests.
  • Metrics Endpoint: Backend exposes /metrics with user and namespace labels (the CRD controller annotates pods).
  • Storage/Query: Prometheus (user-provided) or in-memory for small setups; query with sum(gpu_time) by (user, ns).
  • UI: Embed a Grafana panel or static charts; export via PromQL-to-CSV.
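The proxy, query, and export steps above can be sketched end to end in a few lines of Python. Everything here is illustrative: the function names are hypothetical, the namespace header is written as `x-ns` (an assumed spelling chosen to match the `ns` label in the query), `preferred_username` and `sub` are standard OIDC claims, and the `namespace` claim is an assumed custom claim that an identity provider would have to be configured to emit.

```python
import csv
import io
from collections import defaultdict


def identity_headers(claims):
    """Map decoded OIDC claims to the headers the proxy would inject."""
    # Fall back to the stable subject identifier if no username is present.
    user = claims.get("preferred_username") or claims["sub"]
    # "namespace" is a hypothetical custom claim; "default" is an assumed fallback.
    ns = claims.get("namespace", "default")
    return {"x-user": user, "x-ns": ns}


def sum_by_user_ns(samples):
    """In-memory equivalent of `sum(gpu_time) by (user, ns)`.

    `samples` is an iterable of (labels, value) pairs, one per series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[(labels["user"], labels["ns"])] += value
    return dict(totals)


def to_csv(totals):
    """Render aggregated totals as CSV, heaviest users first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user", "ns", "gpu_time"])
    for (user, ns), value in sorted(totals.items(), key=lambda kv: -kv[1]):
        writer.writerow([user, ns, value])
    return buf.getvalue()


samples = [
    ({"user": "alice", "ns": "team-a", "pod": "p1"}, 10.0),
    ({"user": "alice", "ns": "team-a", "pod": "p2"}, 5.0),
    ({"user": "bob", "ns": "team-b", "pod": "p3"}, 20.0),
]
print(to_csv(sum_by_user_ns(samples)))
```

Sorting the export by descending usage puts the heaviest consumers at the top, which matches the stated goal of spotting the big users quickly.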

Alternatives Considered

No response

Feature Area

Metrics / Monitoring

How important is this feature to you?

Nice to have

Mockups or Examples

No response

Additional Context

No response

Labels: enhancement (New feature or request)
