Add Cloud-Native Performance Monitoring System with GitHub Integration #15624


Draft · wants to merge 7 commits into base `main`

Conversation

satwikmishra11
Copy link

Description

This PR introduces an end-to-end performance monitoring system for Apache DataFusion, automating benchmark execution on cloud infrastructure (AWS/GCP), integrating regression alerts with GitHub, and providing a React-based dashboard for performance analysis.

Key Changes:

  1. Benchmark Automation:
    • GitHub Actions workflows to trigger benchmarks on PRs/releases.
    • Terraform scripts for provisioning EC2 Spot Instances (AWS) or Preemptible VMs (GCP).
  2. Regression Detection:
    • GitHub Checks API integration to flag performance regressions in PRs.
    • Results stored in PostgreSQL + S3 for historical tracking.
  3. Dashboard:
    • React frontend with Plotly visualizations for comparing query performance across versions.
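
The regression flagging in (2) ultimately reduces to comparing per-query timings between a baseline and a candidate run. A minimal Python sketch of that comparison, assuming timings keyed by query name (the function name, dictionary shape, and 5% threshold are illustrative assumptions, not the PR's actual API):

```python
def detect_regressions(baseline, candidate, threshold=0.05):
    """Flag queries whose candidate time exceeds baseline by more
    than `threshold` (relative slowdown). Both inputs map query
    names to wall-clock seconds."""
    regressions = {}
    for query, base_time in baseline.items():
        new_time = candidate.get(query)
        if new_time is None:
            continue  # query not present in the candidate run
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            regressions[query] = slowdown
    return regressions
```

The result can then be turned into a failing GitHub check run when non-empty.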

Dependencies:

  • Terraform >= 1.5.0
  • React >= 18.0
  • @datafusion-dev/client (new SDK for querying benchmark results)

Code Snippets

Terraform Script (AWS)

```hcl
# infra/aws/benchmark.tf
# Note: spot_price and wait_for_fulfillment are arguments of
# aws_spot_instance_request, not aws_instance.
resource "aws_spot_instance_request" "benchmark_runner" {
  ami                  = "ami-0c55b159cbfafe1f0"
  instance_type        = "c5.4xlarge"
  spot_price           = "0.15"
  wait_for_fulfillment = true

  tags = {
    Name = "datafusion-benchmark-runner"
  }
}
```

@github-actions github-actions bot added the `development-process` (Related to development process of DataFusion) and `core` (Core DataFusion crate) labels on Apr 7, 2025
@berkaysynnada
Copy link
Contributor

Thank you @satwikmishra11. This seems really promising, but before discussing method/implementation details, there is a concern raised in #5504 (comment). Have you thought about how to deal with this problem?

@satwikmishra11
Copy link
Author

satwikmishra11 commented Apr 8, 2025

Certainly, thank you for considering my PR, @berkaysynnada.

Automated Performance Benchmarking Solution for Apache DataFusion

Objective

Implement continuous performance monitoring using Conbench to:

  • Catch performance regressions early
  • Enable data-driven optimization decisions
  • Provide historical trend analysis

Solution Architecture

1. Automated Benchmark Execution (GitHub Actions)

`.github/workflows/benchmarks.yml`:

```yaml
name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 12 * * *'  # Daily runs

env:
  CONBENCH_URL: https://datafusion-conbench.ursa.dev
  CONBENCH_EMAIL: [email protected]
  CONBENCH_PASSWORD: ${{ secrets.CONBENCH_PASSWORD }}

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          sudo apt-get install -y cmake
          pip install -e conbench/

      - name: Run benchmarks
        run: |
          cd conbench
          conbench run --python-datafusion --capture=no
```
2. PR Integration

`.github/workflows/benchmark-comment.yml`:

```yaml
name: Benchmark Results Comment
on:
  workflow_run:
    workflows: ["Performance Benchmarks"]
    types:
      - completed

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            // GitHub Script implementation
            // Posts comparison link to PR
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
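
The `github-script` step above is left as a placeholder. One way to build the comment body it would post, sketched in Python for clarity (the Conbench compare-URL path format is an assumption about the Conbench UI, not a documented endpoint):

```python
def comparison_comment(conbench_url, baseline_run_id, contender_run_id):
    """Build a PR comment body linking to a benchmark comparison.
    The /compare/runs/ path is an assumed Conbench UI route."""
    link = f"{conbench_url}/compare/runs/{baseline_run_id}...{contender_run_id}/"
    return (
        "Benchmark results are ready.\n"
        f"[Compare against baseline]({link})"
    )
```

The actual workflow would resolve the two run IDs from the triggering `workflow_run` event before posting.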
3. Dashboard Setup

`conbench/docker-compose.yml`:

```yaml
version: '3'
services:
  conbench:
    image: conbench/conbench:latest
    ports:
      - "5000:5000"
    environment:
      - CONBENCH_DB_NAME=conbench
      - CONBENCH_DB_USER=postgres
      - CONBENCH_DB_PASSWORD=postgres
      - CONBENCH_DB_HOST=postgres
    depends_on:
      - postgres

  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=conbench
```
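
The historical trend analysis promised in the objectives boils down to aggregating stored timings per benchmark per day. A minimal sketch of that aggregation, assuming rows of `(date, benchmark_name, seconds)` pulled from the results database (the row shape is a hypothetical illustration, not the actual schema):

```python
from collections import defaultdict
from statistics import mean

def daily_trend(rows):
    """Aggregate (date_str, benchmark_name, seconds) rows into
    {benchmark: {date: mean_seconds}} for trend plotting."""
    grouped = defaultdict(lambda: defaultdict(list))
    for date, name, secs in rows:
        grouped[name][date].append(secs)
    return {
        name: {date: mean(times) for date, times in by_date.items()}
        for name, by_date in grouped.items()
    }
```

The dashboard can then plot one line per benchmark from the resulting mapping.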
4. Benchmark Maintenance

Example benchmark (`conbench/benchmarks/sort.py`):

```python
import time

import datafusion
from conbench import Benchmark


class SortBenchmark(Benchmark):
    name = "sort"

    def run(self, **kwargs):
        ctx = datafusion.SessionContext()
        start = time.monotonic()
        result = None  # Benchmark implementation: run the sort query via ctx
        duration = time.monotonic() - start
        self.record(
            {"time": duration},
            {},
            output=result,
        )
```
Implementation Checklist

Secrets Configuration
- [ ] Add `CONBENCH_PASSWORD` in GitHub repository secrets
- [ ] Ensure `GITHUB_TOKEN` has appropriate permissions

Infrastructure Requirements
- [ ] Dedicated runner for consistent benchmarking
- [ ] Conbench instance hosting (cloud/on-prem)

Alert Configuration
- [ ] Set statistical significance threshold (p < 0.05)
- [ ] Configure notification channels (Slack/Email)

Expected Outcomes

Automated Execution
- PR-triggered benchmarks
- Daily performance snapshots
- Historical commit-associated data
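
The statistical significance threshold in the checklist (p < 0.05) can be computed without distributional assumptions via a two-sample permutation test on repeated timing measurements. A minimal, self-contained sketch (the function and its interface are illustrative, not part of this PR):

```python
import random
from statistics import mean

def permutation_p_value(baseline, candidate, n_iter=10_000, seed=0):
    """One-sided permutation test: p-value for the hypothesis that
    mean(candidate) > mean(baseline), i.e. a genuine slowdown.
    Inputs are lists of repeated timing measurements in seconds."""
    rng = random.Random(seed)
    observed = mean(candidate) - mean(baseline)
    pooled = list(baseline) + list(candidate)
    k = len(candidate)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Count permutations at least as extreme as the observed difference
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return hits / n_iter
```

An alert would fire only when the permutation p-value falls below the configured threshold, which filters out noise from run-to-run variance on shared runners.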

@alamb alamb marked this pull request as draft April 9, 2025 11:11
@alamb
Copy link
Contributor

alamb commented Apr 9, 2025

I am not sure about this PR -- it has many CI failures and doesn't quite seem to be a complete solution

Can you please get the CI checks passing and show an example of it working? Typically this would be done in your own personal fork and then we could evaluate how it would work in the main DataFusion repo

@alamb
Copy link
Contributor

alamb commented Apr 9, 2025

Thank you for this contribution
