feat: Implement automatic package version compaction #13

@mairas

Problem

The APT repository currently accumulates all historical versions of packages indefinitely. This leads to:

  • Repository size bloat over time
  • Multiple versions of the same package in metadata files
  • Confusion about which version to install
  • Unnecessary storage and bandwidth costs

Proposed Solution

Implement automatic package version compaction with intelligent retention policies:

For Semantic Version Packages (e.g., 1.2.3)

Keep a maximum of 4 previous releases:

  • One previous major version (e.g., if current is 2.x.x, keep latest 1.x.x)
  • One previous minor version (e.g., if current is 1.3.x, keep latest 1.2.x)
  • One previous patch version (e.g., if current is 1.2.3, keep 1.2.2)
  • One previous build/pre-release version

Example retention for package at v2.3.4:

  • Keep: 2.3.4 (current), 2.3.3 (prev patch), 2.2.5 (prev minor), 1.8.2 (prev major)
  • Remove: All other versions
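
The selection rule above could be sketched roughly as follows. This is a minimal illustration, not a settled design: the function name `select_semver_retention` is borrowed from the pseudocode in this issue, "previous minor" is interpreted as the newest release with a lower minor number in the same major line, and the build/pre-release slot is left out because its ordering rules are not specified here.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release and build suffixes are out of scope here.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def select_semver_retention(versions):
    """Return the version strings to keep, newest first (assumed policy)."""
    by_key = {}
    for v in versions:
        m = SEMVER_RE.match(v)
        if m is None:
            raise ValueError(f"not a plain semantic version: {v!r}")
        by_key[tuple(int(g) for g in m.groups())] = v
    if not by_key:
        return []

    current = max(by_key)
    older = [k for k in by_key if k < current]
    groups = [
        # previous patch: same major.minor as the current release
        [k for k in older if k[:2] == current[:2]],
        # previous minor: same major, lower minor
        [k for k in older if k[0] == current[0] and k[1] < current[1]],
        # previous major: any lower major
        [k for k in older if k[0] < current[0]],
    ]
    keep = [current] + [max(g) for g in groups if g]
    return [by_key[k] for k in keep]


if __name__ == "__main__":
    pool = ["1.7.0", "1.8.2", "2.1.0", "2.2.4", "2.2.5", "2.3.2", "2.3.3", "2.3.4"]
    print(select_semver_retention(pool))
    # ['2.3.4', '2.3.3', '2.2.5', '1.8.2']
```

A package with fewer versions than the policy allows simply yields a shorter list, so nothing is removed for new packages.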

For Non-Semantic Version Packages

Keep a maximum of 2 versions:

  • Current version
  • One previous version

Example for package at version 20240115-3:

  • Keep: 20240115-3 (current), 20240115-2 (previous)
  • Remove: All older versions
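
For non-semver packages, the main open question is how to order versions. The sketch below assumes a simple "natural" ordering in which runs of digits compare numerically; this is an illustrative stand-in and does not reproduce dpkg's full comparison rules (epochs, `~`, and so on). The `keep` parameter is hypothetical.

```python
import re


def natural_key(version):
    """Split into text and digit runs so that '-10' sorts after '-9'."""
    parts = re.split(r"(\d+)", version)
    # re.split with a capture group puts digit runs at odd indices,
    # so keys always compare str with str and int with int.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def select_simple_retention(versions, keep=2):
    """Return the newest `keep` versions, newest first (assumed ordering)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(set(versions), key=natural_key, reverse=True)
    return ordered[:keep]


if __name__ == "__main__":
    pool = ["20240115-1", "20240115-3", "20240115-2", "20231201-10"]
    print(select_simple_retention(pool))
    # ['20240115-3', '20240115-2']
```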

Implementation Details

The compaction should run:

  1. After each package update - When new packages are added
  2. During scheduled rebuilds - Clean up during daily maintenance
  3. Via manual trigger - Allow forced cleanup when needed

Algorithm

for each package in pool:
  versions = get_all_versions(package)
  if is_semver(versions):
    retained = select_semver_retention(versions)
  else:
    retained = select_simple_retention(versions)
  
  for version in versions:
    if version not in retained:
      remove_package(version)
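
One possible Python rendering of this loop is shown below. It is a sketch under several assumptions: the pool is modelled as a plain dict of package name to version strings, the two retention policies are passed in as callables, and the function only plans and logs removals rather than deleting files, which also leaves an audit trail. None of these names come from the existing codebase.

```python
import logging
import re

log = logging.getLogger("compaction")  # hypothetical logger name

# A package is treated as semver only if every version is plain MAJOR.MINOR.PATCH.
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")


def plan_compaction(pool, semver_policy, simple_policy):
    """Return {package: [versions to remove]} without touching the repository."""
    removals = {}
    for package, versions in pool.items():
        all_semver = all(SEMVER_RE.match(v) for v in versions)
        policy = semver_policy if all_semver else simple_policy
        retained = set(policy(versions))
        doomed = [v for v in versions if v not in retained]
        if doomed:
            log.info("%s: keeping %s, removing %s", package, sorted(retained), doomed)
            removals[package] = doomed
    return removals


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    pool = {
        "alpha": ["1.0.0", "1.0.1", "1.0.2"],
        "beta": ["20240115-1", "20240115-2", "20240115-3"],
    }
    # Stand-in policies for the demo: keep the last two / last one entries.
    plan = plan_compaction(pool, lambda vs: vs[-2:], lambda vs: vs[-1:])
    print(plan)
```

Separating planning from deletion makes a dry-run mode trivial and lets the manual trigger show what would be removed before anything is deleted.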

Benefits

  1. Controlled repository size - Predictable storage requirements
  2. Cleaner package listings - Users see relevant versions only
  3. Faster metadata generation - Fewer packages to scan
  4. Rollback capability - Still maintains previous versions for recovery

Configuration Options

Consider making retention configurable:

  • Environment variables for retention counts
  • Per-package override capability
  • Different policies for stable vs unstable channels
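
As a rough illustration of the environment-variable option, the sketch below reads two retention counts with defaults matching the proposal. The variable names `APT_RETAIN_SEMVER_PREVIOUS` and `APT_RETAIN_SIMPLE_TOTAL` and the `RetentionConfig` class are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionConfig:
    semver_previous: int = 4  # previous releases kept for semver packages
    simple_total: int = 2     # total versions kept for non-semver packages


def load_retention_config(environ=None):
    """Build a config from environment variables (names are assumptions)."""
    env = os.environ if environ is None else environ

    def read_int(name, default, minimum):
        raw = env.get(name)
        if raw is None:
            return default
        value = int(raw)  # raises ValueError on non-numeric input
        if value < minimum:
            raise ValueError(f"{name} must be >= {minimum}, got {value}")
        return value

    return RetentionConfig(
        semver_previous=read_int("APT_RETAIN_SEMVER_PREVIOUS", 4, 0),
        simple_total=read_int("APT_RETAIN_SIMPLE_TOTAL", 2, 1),
    )


if __name__ == "__main__":
    print(load_retention_config({"APT_RETAIN_SIMPLE_TOTAL": "3"}))
```

Per-package overrides and per-channel policies would need a richer format than flat environment variables, for example a small config file; that choice is left open here.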

Edge Cases to Handle

  1. New packages - Don't remove anything if a package has fewer versions than the retention policy allows
  2. Rollback protection - Never remove currently installed versions (if trackable)
  3. Pre-release versions - Special handling for -dev, -beta, -rc versions
  4. Epoch versions - Handle Debian epoch (1:2.3.4) correctly
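
For the epoch case, a Debian version string has the form `[epoch:]upstream_version[-debian_revision]`, so the epoch has to be split off and compared first before any semver logic is applied to the upstream part. The helper below is a minimal sketch of that split; the function name is hypothetical, and for real ordering decisions `dpkg --compare-versions` remains the authoritative comparison.

```python
def split_debian_version(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch_str, colon, rest = version.partition(":")
    if colon:
        epoch = int(epoch_str)
    else:
        # No epoch given: Debian treats this as epoch 0.
        epoch, rest = 0, version

    # The revision is everything after the LAST hyphen, if there is one.
    upstream, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        upstream, revision = rest, ""
    return epoch, upstream, revision


if __name__ == "__main__":
    print(split_debian_version("1:2.3.4-1"))   # (1, '2.3.4', '1')
    print(split_debian_version("2.3.4"))       # (0, '2.3.4', '')
    print(split_debian_version("20240115-3"))  # (0, '20240115', '3')
```

With this split, `1:2.3.4` sorts above `9.9.9` because its epoch is higher, which a naive comparison of the dotted numbers alone would get wrong.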

Alternative Approaches

  1. Time-based retention - Keep packages less than X days old
  2. Size-based retention - Keep total repository under X GB
  3. Manual cleanup - Require explicit removal commands
  4. Reference counting - Track which versions are in use

Acceptance Criteria

  • Semver packages retain appropriate previous versions
  • Non-semver packages retain 2 versions maximum
  • Old versions are automatically removed after new uploads
  • Repository size remains bounded
  • Compaction is logged for audit purposes
  • No version covered by the retention policy is ever removed

🔧 Implementation Requirements

MANDATORY: Follow halos-distro/docs/DEVELOPMENT_WORKFLOW.md during implementation.

See the complete workflow at: https://github.com/hatlabs/halos-distro/blob/main/docs/DEVELOPMENT_WORKFLOW.md

✅ Implementation Checklist

Track your progress through the mandatory development workflow:

□ Phase 1: EXPLORE (No Coding!)

  • Read GitHub issue completely
  • Read /docs/SPEC.md relevant sections (if applicable)
  • Read /docs/ARCHITECTURE.md relevant sections (if applicable)
  • Explore existing code WITHOUT writing anything
  • Use Task tool with subagent_type=Explore for complex navigation

□ Phase 2: PLAN

  • Use "think hard" to evaluate approaches
  • Create detailed implementation plan
  • Identify test scenarios
  • Document plan in comment/file

□ Phase 3: TEST (TDD)

  • Write comprehensive tests FIRST
  • Verify tests FAIL (no mocks)
  • Commit tests: git commit -m "test: add X tests"

□ Phase 4: IMPLEMENT

  • Write code to pass tests
  • Run tests after each change
  • DO NOT modify tests to pass
  • Iterate until ALL tests pass

□ Phase 5: VERIFY

  • Use subagents to verify correctness
  • Check for edge cases
  • Verify matches requirements
  • Ensure follows architecture patterns

□ Phase 6: COMMIT

  • All tests pass
  • Code linted and formatted
  • Type checks pass
  • Commit: git commit -m "feat: implement X" -m "Fixes #13"
  • Create PR if needed
