Skip to content

707 migrate to graviton instances#708

Open
MuhammadQadora wants to merge 11 commits into
mainfrom
707-migrate-to-graviton-instances
Open

707 migrate to graviton instances#708
MuhammadQadora wants to merge 11 commits into
mainfrom
707-migrate-to-graviton-instances

Conversation

@MuhammadQadora
Copy link
Copy Markdown
Collaborator

@MuhammadQadora MuhammadQadora commented May 20, 2026

Summary by CodeRabbit

  • Infrastructure Updates

    • Updated available compute options to newer ARM-based families across deployment plans (multiple AWS instance type migrations).
  • CI/CD Improvements

    • Added a dedicated multi-architecture healthcheck build that produces amd64/arm64 artifacts consumed by the main pipeline.
  • Docker / Packaging

    • Images now apply a targeted library upgrade during build and package prebuilt healthcheck binaries for each architecture.
  • Misc

    • Ignore list extended to exclude the prebuilt binary directory.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds CI jobs to cross-compile and upload amd64/arm64 healthcheck binaries and consume them in multi-arch image builds; switches healthcheck image to use prebuilts; migrates AWS instance types to ARM families across compose manifests and test matrices; and adds targeted libgnutls30t64 upgrades in runtime Dockerfiles.

Changes

Instance type migration, CI multi-arch, and image/runtime updates

Layer / File(s) Summary
CI: build-healthcheck job and integrate artifact into builds
.github/workflows/build-test-image.yaml, .github/workflows/build-release-image.yaml
Adds build-healthcheck jobs that cross-compile amd64/arm64 healthcheck binaries and upload them as healthcheck-binaries; build-and-push depends on and downloads these prebuilts and builds images for linux/amd64,linux/arm64.
Healthcheck image uses prebuilts & ignore
src/healthcheck_rs/Dockerfile, src/healthcheck_rs/.gitignore
Converts the healthcheck image to copy architecture-specific prebuilt binaries via ARG TARGETARCH; marks them executable; adds bin/ to .gitignore.
Test workflow instance matrix updates
.github/workflows/build-test-image.yaml
Updates Enterprise and Pro AWS test matrix entries from t2.medium/m6i.large to t4g.medium/m6g.large; updates commented upgrade scenarios to t4g.medium.
Compose manifests: Pro & Enterprise instance migration
compose/omnistrate.pro.yaml, compose/omnistrate.enterprise.yaml, compose/omnistrate.enterprise.byoa.yaml
Migrates selectable AWS instance families to ARM equivalents (e.g., t3a.smallt4g.small, t2.medium/m6i*t4g.medium/m6g* etc.) across sentinel, node, and cluster rebalance entries.
Runtime images: libgnutls30t64 only-upgrade
src/falkordb-cluster/Dockerfile, src/falkordb-node/Dockerfile, src/falkordb-sentinel/Dockerfile
Each Dockerfile now runs apt-get install -y --only-upgrade libgnutls30t64 during package installation.
Release workflow Omnistrate plan cleanup
.github/workflows/build-release-image.yaml
Omnistrate plan cleanup sed now removes t4g.medium (was t2.medium).

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels: enhancement

Suggested reviewers:

  • AviAvni
  • dudizimber

"A rabbit hums beside the CI,
prebuilts hopping, tiny and spry,
t2s bow out as t4g leaps high,
binaries cross the arching sky,
libgnutls snug, all images ply."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'migrate to graviton instances' directly and clearly reflects the main objective of the pull request: switching AWS instance types from x86_64-based instances (t2, t3a, m6i, r6i, c6i) to ARM64-based Graviton instances (t4g, m6g, r6g, c6g).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 707-migrate-to-graviton-instances

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MuhammadQadora MuhammadQadora requested a review from dudizimber May 20, 2026 12:11
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/build-release-image.yaml (1)

203-209: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pro plan cleanup likely no-op after labeledOptions migration.

This step now removes t4g.medium, but pro options were migrated to human-readable labels. On pro, this likely won’t remove the intended AWS 2 vCPU option during release plan publishing.

Suggested fix
       - name: Remove e2-machine from pro and enterprise
         if: ${{ contains( vars.OMNISTRATE_RELEASE_PLANS, matrix.plans.key) && (matrix.plans.key == 'pro' || matrix.plans.key == 'enterprise') }}
         working-directory: compose
         run: |
           sed -i 's/- "e2-medium"//g' ${{ matrix.plans.file }}
-          sed -i 's/- "t4g.medium"//g' ${{ matrix.plans.file }}
+          # enterprise (raw instance type)
+          sed -i 's/- "t4g.medium"//g' ${{ matrix.plans.file }}
+          # pro (labeled options)
+          sed -i 's/- "AWS 2 vCPU, 4 GB"//g' ${{ matrix.plans.file }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-release-image.yaml around lines 203 - 209, The
current workflow step ("Remove e2-machine from pro and enterprise") uses sed to
remove legacy instance keys "- \"e2-medium\"" and "- \"t4g.medium\"" from
matrix.plans.file but pro plans were migrated to human-readable labels, so the
sed will be a no-op for pro; update this step to also remove the new
human-readable label(s) used in the pro release plan (or add a second sed that
strips both legacy keys and the migrated label variant), e.g. add/remove a sed
pattern matching the exact human-readable option string introduced by the
labeledOptions migration (verify the label text in the release plan files and
include that exact string in the sed command alongside the existing "-
\"t4g.medium\"" and "- \"e2-medium\"" patterns).
🧹 Nitpick comments (1)
compose/omnistrate.pro.yaml (1)

375-400: ⚡ Quick win

Deduplicate repeated labeledOptions maps to avoid config drift.

The same 24-entry labeledOptions mapping is copied in six places. A single anchor would reduce update mistakes in future instance-family migrations.

♻️ Proposed refactor
+x-node-instance-labeled-options: &node-instance-labeled-options
+  "GCP 2 vCPU, 4 GB": "e2-medium"
+  "GCP 2 vCPU, 8 GB": "e2-standard-2"
+  "GCP 2 vCPU, 16 GB": "e2-highmem-2"
+  "GCP 4 vCPU, 16 GB": "e2-standard-4"
+  "GCP 4 vCPU, 32 GB": "e2-highmem-4"
+  "GCP 8 vCPU, 32 GB": "e2-standard-8"
+  "GCP 8 vCPU, 64 GB": "e2-highmem-8"
+  "GCP 4 vCPU, 8 GB": "e2-custom-4-8192"
+  "GCP 8 vCPU, 16 GB": "e2-custom-8-16384"
+  "GCP 16 vCPU, 32 GB": "e2-custom-16-32768"
+  "GCP 32 vCPU, 64 GB": "e2-custom-32-65536"
+  "AWS 2 vCPU, 4 GB": "t4g.medium"
+  "AWS 2 vCPU, 8 GB": "m6g.large"
+  "AWS 4 vCPU, 16 GB": "m6g.xlarge"
+  "AWS 8 vCPU, 32 GB": "m6g.2xlarge"
+  "AWS 16 vCPU, 64 GB": "m6g.4xlarge"
+  "AWS 32 vCPU, 128 GB": "m6g.8xlarge"
+  "AWS 2 vCPU, 16 GB": "r6g.large"
+  "AWS 4 vCPU, 32 GB": "r6g.xlarge"
+  "AWS 8 vCPU, 64 GB": "r6g.2xlarge"
+  "AWS 16 vCPU, 128 GB": "r6g.4xlarge"
+  "AWS 4 vCPU, 8 GB": "c6g.xlarge"
+  "AWS 8 vCPU, 16 GB": "c6g.2xlarge"
+  "AWS 16 vCPU, 32 GB": "c6g.4xlarge"
+  "AWS 32 vCPU, 64 GB": "c6g.8xlarge"

-        labeledOptions:
-          ...
+        labeledOptions: *node-instance-labeled-options

Also applies to: 577-602, 880-905, 1095-1120, 1674-1699, 1889-1914

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compose/omnistrate.pro.yaml` around lines 375 - 400, Extract the repeated
24-entry labeledOptions map into a single YAML anchor (e.g., define
labeledInstanceOptions: &labeledInstanceOptions { ... } or place the mapping
under a reusable top-level key with &labeledInstanceOptions) and replace each
duplicated labeledOptions block with an alias reference
(*labeledInstanceOptions) so all occurrences (the labeledOptions maps shown in
the diff) point to the single anchored mapping; ensure indentation and mapping
syntax remain valid for each replacement and update all duplicated instances
reported in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/build-release-image.yaml:
- Around line 203-209: The current workflow step ("Remove e2-machine from pro
and enterprise") uses sed to remove legacy instance keys "- \"e2-medium\"" and
"- \"t4g.medium\"" from matrix.plans.file but pro plans were migrated to
human-readable labels, so the sed will be a no-op for pro; update this step to
also remove the new human-readable label(s) used in the pro release plan (or add
a second sed that strips both legacy keys and the migrated label variant), e.g.
add/remove a sed pattern matching the exact human-readable option string
introduced by the labeledOptions migration (verify the label text in the release
plan files and include that exact string in the sed command alongside the
existing "- \"t4g.medium\"" and "- \"e2-medium\"" patterns).

---

Nitpick comments:
In `@compose/omnistrate.pro.yaml`:
- Around line 375-400: Extract the repeated 24-entry labeledOptions map into a
single YAML anchor (e.g., define labeledInstanceOptions: &labeledInstanceOptions
{ ... } or place the mapping under a reusable top-level key with
&labeledInstanceOptions) and replace each duplicated labeledOptions block with
an alias reference (*labeledInstanceOptions) so all occurrences (the
labeledOptions maps shown in the diff) point to the single anchored mapping;
ensure indentation and mapping syntax remain valid for each replacement and
update all duplicated instances reported in the comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f68930d5-f31f-4ffd-be18-4c2b55c0bfad

📥 Commits

Reviewing files that changed from the base of the PR and between a55eb2d and 6d1dc5a.

📒 Files selected for processing (5)
  • .github/workflows/build-release-image.yaml
  • .github/workflows/build-test-image.yaml
  • compose/omnistrate.enterprise.byoa.yaml
  • compose/omnistrate.enterprise.yaml
  • compose/omnistrate.pro.yaml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
.github/workflows/build-test-image.yaml (2)

113-121: ⚡ Quick win

Add persist-credentials: false and explicit permissions to the build-healthcheck job.

Same security concerns as in the release workflow - the checkout persists git credentials by default and the job lacks explicit permissions.

🔒 Proposed fix
   build-healthcheck:
     needs: shellspec
     runs-on: ubuntu-latest
     container: rust:1-slim-bullseye
+    permissions:
+      contents: read
     steps:
       - name: Checkout
         uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
         with:
           ref: ${{ github.event.pull_request.head.sha || github.sha }}
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-test-image.yaml around lines 113 - 121, The
build-healthcheck job currently uses actions/checkout without disabling
credential persistence and lacks explicit workflow permissions; update the
build-healthcheck job configuration (job name: build-healthcheck) to add
persist-credentials: false under the Checkout step (the actions/checkout entry)
and add an explicit permissions block for the job specifying least-privilege
scopes needed (e.g., contents: read, id-token: write if required) so credentials
are not persisted and permissions are explicit.

131-136: ⚖️ Poor tradeoff

Consider cache poisoning risk for PR workflows.

The static analysis flagged that rust-cache is enabled on a workflow triggered by pull_request, which includes PRs from forks. While GitHub provides some branch-level cache isolation, a sophisticated attacker could potentially poison the cache. For security-sensitive builds, consider restricting caching to trusted branches only.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-test-image.yaml around lines 131 - 136, The "Cache
Cargo" step using Swatinem/rust-cache should be skipped for untrusted forked PRs
to avoid cache-poisoning; update the step (named "Cache Cargo" / action
Swatinem/rust-cache@...) to run only for trusted events—e.g., add an if
condition such as checking github.event_name != 'pull_request' or
github.event.pull_request.head.repo.fork == false (so caching runs on push/main
branch builds but not on forked PRs), or otherwise restrict by branch
names/shared-key usage; ensure the conditional prevents the cache inputs
(workspaces, shared-key, cache-on-failure) from executing for forked PR
workflows.
.github/workflows/build-release-image.yaml (1)

30-38: ⚡ Quick win

Add persist-credentials: false and explicit permissions to the build-healthcheck job.

The static analysis tool flagged two security concerns:

  1. The checkout step persists git credentials by default, which could leak via the uploaded artifact
  2. The job lacks an explicit permissions block, inheriting broader default permissions than needed
🔒 Proposed fix
   build-healthcheck:
     if: github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch'
     runs-on: ubuntu-latest
     container: rust:1-slim-bullseye
+    permissions:
+      contents: read
     steps:
       - name: Checkout
         uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
         with:
           ref: ${{ github.event.workflow_run.head_branch }}
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-release-image.yaml around lines 30 - 38, The
build-healthcheck job currently persists git credentials and inherits broad
permissions; update the build-healthcheck job by adding persist-credentials:
false to the Checkout step that uses actions/checkout@... (step named
"Checkout") and add an explicit permissions block at the job level (for example:
set permissions.contents: read and any other minimal scopes actually required by
this job) so the job no longer exposes writable or unnecessary scopes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/falkordb-sentinel/Dockerfile`:
- Line 15: Remove the redundant apt-get install --only-upgrade libgnutls30t64
line in the Dockerfile (the explicit string "apt-get install -y --only-upgrade
libgnutls30t64") because apt-get upgrade -y already updates packages and the t64
suffix may not exist on older base images; if the intent was to ensure a
specific GnuTLS package is present, replace that line with a canonical install
like apt-get install -y libgnutls30 (or the correct package name for the base)
instead of using --only-upgrade.

---

Nitpick comments:
In @.github/workflows/build-release-image.yaml:
- Around line 30-38: The build-healthcheck job currently persists git
credentials and inherits broad permissions; update the build-healthcheck job by
adding persist-credentials: false to the Checkout step that uses
actions/checkout@... (step named "Checkout") and add an explicit permissions
block at the job level (for example: set permissions.contents: read and any
other minimal scopes actually required by this job) so the job no longer exposes
writable or unnecessary scopes.

In @.github/workflows/build-test-image.yaml:
- Around line 113-121: The build-healthcheck job currently uses actions/checkout
without disabling credential persistence and lacks explicit workflow
permissions; update the build-healthcheck job configuration (job name:
build-healthcheck) to add persist-credentials: false under the Checkout step
(the actions/checkout entry) and add an explicit permissions block for the job
specifying least-privilege scopes needed (e.g., contents: read, id-token: write
if required) so credentials are not persisted and permissions are explicit.
- Around line 131-136: The "Cache Cargo" step using Swatinem/rust-cache should
be skipped for untrusted forked PRs to avoid cache-poisoning; update the step
(named "Cache Cargo" / action Swatinem/rust-cache@...) to run only for trusted
events—e.g., add an if condition such as checking github.event_name !=
'pull_request' or github.event.pull_request.head.repo.fork == false (so caching
runs on push/main branch builds but not on forked PRs), or otherwise restrict by
branch names/shared-key usage; ensure the conditional prevents the cache inputs
(workspaces, shared-key, cache-on-failure) from executing for forked PR
workflows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 936634ba-7375-4088-b6bb-b6ebfa228d5e

📥 Commits

Reviewing files that changed from the base of the PR and between 6d1dc5a and 8ced790.

📒 Files selected for processing (7)
  • .github/workflows/build-release-image.yaml
  • .github/workflows/build-test-image.yaml
  • src/falkordb-cluster/Dockerfile
  • src/falkordb-node/Dockerfile
  • src/falkordb-sentinel/Dockerfile
  • src/healthcheck_rs/.gitignore
  • src/healthcheck_rs/Dockerfile
✅ Files skipped from review due to trivial changes (1)
  • src/healthcheck_rs/.gitignore

Comment thread src/falkordb-sentinel/Dockerfile
@MuhammadQadora MuhammadQadora force-pushed the 707-migrate-to-graviton-instances branch from 746bace to a55eb2d Compare May 24, 2026 11:15
@MuhammadQadora MuhammadQadora force-pushed the 707-migrate-to-graviton-instances branch from c0fec43 to 8ced790 Compare May 24, 2026 12:14
@MuhammadQadora MuhammadQadora force-pushed the 707-migrate-to-graviton-instances branch from 8ced790 to a55eb2d Compare May 24, 2026 12:29
@MuhammadQadora MuhammadQadora force-pushed the 707-migrate-to-graviton-instances branch from a81c23f to cca3ad4 Compare May 24, 2026 14:17
Comment on lines +31 to +85
build-rust-binaries:
if: github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
container: rust:1-slim-bullseye
steps:
- name: Checkout
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: ${{ github.event.workflow_run.head_branch }}

- name: Install build dependencies
run: |
apt-get update -y
apt-get install -y pkg-config build-essential cmake libtool openssl libssl-dev gcc-aarch64-linux-gnu

- name: Add ARM64 target
run: rustup target add aarch64-unknown-linux-gnu

- name: Cache Cargo
uses: Swatinem/rust-cache@c19371144df3bb44fab255c43d04cbc2ab54d1c4 # v2.9.1
with:
workspaces: |
src/healthcheck_rs -> target
src/oom-guard -> target
shared-key: "rust-binaries-release"
cache-on-failure: "true"

- name: Build healthcheck for x86_64
working-directory: src/healthcheck_rs
run: cargo build --release --locked --target x86_64-unknown-linux-gnu

- name: Build healthcheck for aarch64
working-directory: src/healthcheck_rs
env:
CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER: aarch64-linux-gnu-gcc
CC_aarch64_unknown_linux_gnu: aarch64-linux-gnu-gcc
CXX_aarch64_unknown_linux_gnu: aarch64-linux-gnu-g++
run: cargo build --release --locked --target aarch64-unknown-linux-gnu

- name: Build oom-guard for x86_64
working-directory: src/oom-guard
run: cargo build --release --locked --target x86_64-unknown-linux-gnu

- name: Build oom-guard for aarch64
working-directory: src/oom-guard
env:
CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER: aarch64-linux-gnu-gcc
CC_aarch64_unknown_linux_gnu: aarch64-linux-gnu-gcc
CXX_aarch64_unknown_linux_gnu: aarch64-linux-gnu-g++
run: cargo build --release --locked --target aarch64-unknown-linux-gnu

- name: Prepare binaries
run: |
mkdir -p src/healthcheck_rs/bin
cp src/healthcheck_rs/target/x86_64-unknown-linux-gnu/release/healthcheck src/healthcheck_rs/bin/healthcheck-amd64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prepare migration to graviton instances by using labeledOptions and arm builds

2 participants