Add Cloud-Native Performance Monitoring System with GitHub Integration #15624


Draft · wants to merge 7 commits into base `main`

Conversation

satwikmishra11
Copy link

Description

This PR introduces an end-to-end performance monitoring system for Apache DataFusion, automating benchmark execution on cloud infrastructure (AWS/GCP), integrating regression alerts with GitHub, and providing a React-based dashboard for performance analysis.

Key Changes:

  1. Benchmark Automation:
    • GitHub Actions workflows to trigger benchmarks on PRs/releases.
    • Terraform scripts for provisioning EC2 Spot Instances (AWS) or Preemptible VMs (GCP).
  2. Regression Detection:
    • GitHub Checks API integration to flag performance regressions in PRs.
    • Results stored in PostgreSQL + S3 for historical tracking.
  3. Dashboard:
    • React frontend with Plotly visualizations for comparing query performance across versions.
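
The regression flagging in (2) ultimately reduces to comparing per-query timings between a baseline and a candidate run. A minimal Python sketch of that comparison, assuming timings keyed by query name (the function name, dictionary shape, and 5% threshold are illustrative assumptions, not the PR's actual API):

```python
def detect_regressions(baseline, candidate, threshold=0.05):
    """Flag queries whose candidate time exceeds baseline by more
    than `threshold` (relative slowdown). Both inputs map query
    names to wall-clock seconds."""
    regressions = {}
    for query, base_time in baseline.items():
        new_time = candidate.get(query)
        if new_time is None:
            continue  # query not present in the candidate run
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            regressions[query] = slowdown
    return regressions
```

The result can then be turned into a failing GitHub check run when non-empty.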

Dependencies:

  • Terraform >= 1.5.0
  • React >= 18.0
  • @datafusion-dev/client (new SDK for querying benchmark results)

Code Snippets

Terraform Script (AWS)

```hcl
# infra/aws/benchmark.tf
# Note: spot_price and wait_for_fulfillment are arguments of
# aws_spot_instance_request, not aws_instance.
resource "aws_spot_instance_request" "benchmark_runner" {
  ami                  = "ami-0c55b159cbfafe1f0"
  instance_type        = "c5.4xlarge"
  spot_price           = "0.15"
  wait_for_fulfillment = true

  tags = {
    Name = "datafusion-benchmark-runner"
  }
}
```

@github-actions github-actions bot added the `development-process` (Related to development process of DataFusion) and `core` (Core DataFusion crate) labels on Apr 7, 2025
@berkaysynnada
Copy link
Contributor

Thank you @satwikmishra11. This seems really promising, but before discussing method/implementation details, there is a concern raised in #5504 (comment). Have you thought about how to deal with this problem?

@satwikmishra11
Copy link
Author

satwikmishra11 commented Apr 8, 2025

Certainly, thank you for considering my PR, @berkaysynnada.

Automated Performance Benchmarking Solution for Apache DataFusion

Objective

Implement continuous performance monitoring using Conbench to:

  • Catch performance regressions early
  • Enable data-driven optimization decisions
  • Provide historical trend analysis

Solution Architecture

1. Automated Benchmark Execution (GitHub Actions)

`.github/workflows/benchmarks.yml`:

```yaml
name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 12 * * *'  # Daily runs

env:
  CONBENCH_URL: https://datafusion-conbench.ursa.dev
  CONBENCH_EMAIL: [email protected]
  CONBENCH_PASSWORD: ${{ secrets.CONBENCH_PASSWORD }}

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          sudo apt-get install -y cmake
          pip install -e conbench/

      - name: Run benchmarks
        run: |
          cd conbench
          conbench run --python-datafusion --capture=no
```
2. PR Integration

`.github/workflows/benchmark-comment.yml`:

```yaml
name: Benchmark Results Comment
on:
  workflow_run:
    workflows: ["Performance Benchmarks"]
    types:
      - completed

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            // GitHub Script implementation
            // Posts comparison link to PR
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
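
The `github-script` step above is left as a placeholder. One way to build the comment body it would post, sketched in Python for clarity (the Conbench compare-URL path format is an assumption about the Conbench UI, not a documented endpoint):

```python
def comparison_comment(conbench_url, baseline_run_id, contender_run_id):
    """Build a PR comment body linking to a benchmark comparison.
    The /compare/runs/ path is an assumed Conbench UI route."""
    link = f"{conbench_url}/compare/runs/{baseline_run_id}...{contender_run_id}/"
    return (
        "Benchmark results are ready.\n"
        f"[Compare against baseline]({link})"
    )
```

The actual workflow would resolve the two run IDs from the triggering `workflow_run` event before posting.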
3. Dashboard Setup

`conbench/docker-compose.yml`:

```yaml
version: '3'
services:
  conbench:
    image: conbench/conbench:latest
    ports:
      - "5000:5000"
    environment:
      - CONBENCH_DB_NAME=conbench
      - CONBENCH_DB_USER=postgres
      - CONBENCH_DB_PASSWORD=postgres
      - CONBENCH_DB_HOST=postgres
    depends_on:
      - postgres

  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=conbench
```
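
The historical trend analysis promised in the objectives boils down to aggregating stored timings per benchmark per day. A minimal sketch of that aggregation, assuming rows of `(date, benchmark_name, seconds)` pulled from the results database (the row shape is a hypothetical illustration, not the actual schema):

```python
from collections import defaultdict
from statistics import mean

def daily_trend(rows):
    """Aggregate (date_str, benchmark_name, seconds) rows into
    {benchmark: {date: mean_seconds}} for trend plotting."""
    grouped = defaultdict(lambda: defaultdict(list))
    for date, name, secs in rows:
        grouped[name][date].append(secs)
    return {
        name: {date: mean(times) for date, times in by_date.items()}
        for name, by_date in grouped.items()
    }
```

The dashboard can then plot one line per benchmark from the resulting mapping.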
4. Benchmark Maintenance

Example benchmark (`conbench/benchmarks/sort.py`):

```python
import time

import datafusion
from conbench import Benchmark


class SortBenchmark(Benchmark):
    name = "sort"

    def run(self, **kwargs):
        ctx = datafusion.SessionContext()
        start = time.monotonic()
        result = None  # Benchmark implementation: run the sort query via ctx
        duration = time.monotonic() - start
        self.record(
            {"time": duration},
            {},
            output=result,
        )
```
Implementation Checklist

Secrets Configuration
- [ ] Add `CONBENCH_PASSWORD` in GitHub repository secrets
- [ ] Ensure `GITHUB_TOKEN` has appropriate permissions

Infrastructure Requirements
- [ ] Dedicated runner for consistent benchmarking
- [ ] Conbench instance hosting (cloud/on-prem)

Alert Configuration
- [ ] Set statistical significance threshold (p < 0.05)
- [ ] Configure notification channels (Slack/Email)

Expected Outcomes

Automated Execution
- PR-triggered benchmarks
- Daily performance snapshots
- Historical commit-associated data
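
The statistical significance threshold in the checklist (p < 0.05) can be computed without distributional assumptions via a two-sample permutation test on repeated timing measurements. A minimal, self-contained sketch (the function and its interface are illustrative, not part of this PR):

```python
import random
from statistics import mean

def permutation_p_value(baseline, candidate, n_iter=10_000, seed=0):
    """One-sided permutation test: p-value for the hypothesis that
    mean(candidate) > mean(baseline), i.e. a genuine slowdown.
    Inputs are lists of repeated timing measurements in seconds."""
    rng = random.Random(seed)
    observed = mean(candidate) - mean(baseline)
    pooled = list(baseline) + list(candidate)
    k = len(candidate)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Count permutations at least as extreme as the observed difference
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return hits / n_iter
```

An alert would fire only when the permutation p-value falls below the configured threshold, which filters out noise from run-to-run variance on shared runners.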

@alamb alamb marked this pull request as draft April 9, 2025 11:11
@alamb
Copy link
Contributor

alamb commented Apr 9, 2025

I am not sure about this PR -- it has many CI failures and doesn't quite seem to be a complete solution

Can you please get the CI checks passing and show an example of it working? Typically this would be done in your own personal fork and then we could evaluate how it would work in the main DataFusion repo

@alamb
Copy link
Contributor

alamb commented Apr 9, 2025

Thank you for this contribution
