SWATCH-4617: WIP #5808

Draft
lindseyburnett wants to merge 21 commits into main from
lburnett/split-nightly-hourly

Conversation

@lindseyburnett
Collaborator

@lindseyburnett lindseyburnett commented Mar 5, 2026

Jira issue: SWATCH-4617

Description

Problem
When the nightly tally runs, it enqueues a large batch of tasks to the same Kafka topic that hourly tally uses. A single consumer pool processes that topic, so hourly tasks sit behind the nightly batch for 3–6 hours. That delays hourly usage metrics and triggers “remittance halted” alerts (e.g. no increase in swatch_producer_metered_total within the 3h window).

Approach
Isolate hourly snapshot work from nightly by using a separate Kafka topic and a dedicated consumer for hourly tally tasks. Nightly tasks stay on the existing tasks topic; hourly tasks go to a new topic and are consumed by a second listener. The application and task logic are unchanged; only the topic and consumer differ, so hourly tasks no longer queue behind the nightly backlog.
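
To illustrate why the split helps, here is a toy model of the queueing behavior. It is a sketch only, not the swatch-tally code: the class and method names (TallyQueueIsolationSketch, tasksAheadOfHourly, sharedQueue, hourlyOnlyQueue) are hypothetical, and it assumes a single FIFO consumer per queue, ignoring Kafka partitions and consumer concurrency.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of head-of-line blocking; hypothetical, not the swatch-tally implementation. */
final class TallyQueueIsolationSketch {
  static final String NIGHTLY = "UPDATE_SNAPSHOTS";
  static final String HOURLY = "UPDATE_HOURLY_SNAPSHOTS";

  /** Counts the tasks a single FIFO consumer must finish before reaching the first hourly task. */
  static int tasksAheadOfHourly(Deque<String> queue) {
    int ahead = 0;
    for (String task : queue) { // ArrayDeque iterates from head to tail
      if (HOURLY.equals(task)) {
        return ahead;
      }
      ahead++;
    }
    return -1; // no hourly task in this queue
  }

  /** Shared topic: the nightly batch is enqueued first, then one hourly task. */
  static Deque<String> sharedQueue(int nightlyBatchSize) {
    Deque<String> queue = new ArrayDeque<>();
    for (int i = 0; i < nightlyBatchSize; i++) {
      queue.add(NIGHTLY);
    }
    queue.add(HOURLY);
    return queue;
  }

  /** Dedicated hourly topic: only hourly tasks are ever enqueued here. */
  static Deque<String> hourlyOnlyQueue() {
    Deque<String> queue = new ArrayDeque<>();
    queue.add(HOURLY);
    return queue;
  }
}
```

In this model, an hourly task on the shared queue waits behind the entire nightly batch, while the same task on the dedicated queue is at the head regardless of how large the nightly batch is.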

Changes (conceptual)

  • New topic
    platform.rhsm-subscriptions.tally-hourly-tasks (same partition/replica layout as the main tasks topic). Declared in swatch-tally’s ClowdApp.

  • Producer (swatch-tally)
    CaptureSnapshotsTaskManager now enqueues UPDATE_HOURLY_SNAPSHOTS to the hourly topic (from new config) and keeps enqueueing UPDATE_SNAPSHOTS (nightly) to the existing tasks topic. One producer, topic chosen by task type.

  • Consumer (swatch-tally)
    A second Kafka listener (tallyHourlyTaskProcessor) subscribes only to the hourly topic with its own consumer group (swatch-tally-hourly-processor). Same TallyTaskFactory and task execution; no double-processing.

  • Config
    New rhsm-subscriptions.tally-hourly-tasks block (topic + consumer group). Optional env (e.g. TALLY_HOURLY_TASKS_TOPIC) for Clowder; defaults work for local/bonfire.

  • Tests
    Unit test updated so hourly descriptors use the hourly topic. New component test (TallyTaskQueueIsolationComponentTest) verifies that triggering “hourly for all orgs” produces UPDATE_HOURLY_SNAPSHOTS on the hourly topic and that single-org nightly produces UPDATE_SNAPSHOTS on the main tasks topic.
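
The producer and config bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class TallyTaskTopics and its methods are hypothetical, the main tasks topic name is passed in rather than guessed, and the environment is supplied as a Map so the lookup is easy to exercise. The hourly topic name, the TALLY_HOURLY_TASKS_TOPIC variable, the consumer group, and the two task types are taken from the description.

```java
import java.util.Map;

/** Hypothetical sketch of "one producer, topic chosen by task type". */
final class TallyTaskTopics {
  enum TaskType {
    UPDATE_SNAPSHOTS, // nightly
    UPDATE_HOURLY_SNAPSHOTS // hourly
  }

  // Names taken from the PR description.
  static final String DEFAULT_HOURLY_TOPIC = "platform.rhsm-subscriptions.tally-hourly-tasks";
  static final String HOURLY_CONSUMER_GROUP = "swatch-tally-hourly-processor";

  private final String tasksTopic;
  private final String hourlyTopic;

  TallyTaskTopics(String tasksTopic, String hourlyTopic) {
    this.tasksTopic = tasksTopic;
    this.hourlyTopic = hourlyTopic;
  }

  /**
   * Resolves the hourly topic from the environment, falling back to the default.
   * The main tasks topic is supplied by the caller (assumption: it comes from existing config).
   */
  static TallyTaskTopics fromEnv(Map<String, String> env, String tasksTopic) {
    String hourly = env.getOrDefault("TALLY_HOURLY_TASKS_TOPIC", DEFAULT_HOURLY_TOPIC);
    return new TallyTaskTopics(tasksTopic, hourly);
  }

  /** Hourly tasks go to the hourly topic; everything else stays on the main tasks topic. */
  String topicFor(TaskType type) {
    return type == TaskType.UPDATE_HOURLY_SNAPSHOTS ? hourlyTopic : tasksTopic;
  }
}
```

A caller would build the mapping once, for example with TallyTaskTopics.fromEnv(System.getenv(), existingTasksTopic), and ask topicFor(...) each time a task is enqueued. Keeping the choice in one place is one way to ensure a task type never lands on both topics.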

Result
Hourly and nightly tally run on separate streams. Hourly tasks are processed by the dedicated consumer and no longer wait behind the nightly batch, which should reduce delayed hourly metrics and related alerts. No changes to swatch-billable-usage or downstream; isolation is entirely within swatch-tally’s task routing and consumers.

@lindseyburnett lindseyburnett changed the title from "SWATCH-4616: WIP" to "SWATCH-4617: WIP" Mar 5, 2026
@lindseyburnett lindseyburnett added the work in progress WIP, don't review yet. label Mar 5, 2026
@github-actions

github-actions bot commented Mar 5, 2026

⛏️ Workflow Run

🧹 Checkstyle

🧪 JUnit

| | Tests | Passed ☑️ | Skipped ⚠️ | Failed ❌️ |
|---|---|---|---|---|
| JUnit Test Report | 971 ran | 967 passed | 3 skipped | 1 failed |

Details

| Test | Result |
|---|---|
| JUnit Test Report | |
| CaptureSnapshotsTaskManagerTest.tallyOrgByHourlyWithThreadPoolExecutorUsesInMemoryQueue | ❌ failure |

lindseyburnett and others added 7 commits March 5, 2026 14:41
- Add tallyOrgAsync() to TallySwatchService for async tally path (enqueues
  UPDATE_SNAPSHOTS to main tasks topic vs sync in-process)
- Add nightlySnapshotTasksAreProducedToMainTasksTopic test to verify nightly
  tasks go to main tasks topic; use TASKS_TOPIC env for Clowder topic suffix
- Add Given/When/Then structure and should* naming per component test standards

Made-with: Cursor
Keep triggerHourlySnapshotsForAllOrgs and tallyOrgAsync (task queue isolation),
add getInstancesByProduct Javadoc from incoming branch. Remove duplicates.

Made-with: Cursor
- Add Task Queue Isolation section to TEST_PLAN.md (tally-task-queue-isolation-TC001,
  tally-task-queue-isolation-TC002)
- Add task queue isolation to Scope
- Annotate TallyTaskQueueIsolationComponentTest methods with @TestPlanName
- Apply spotless formatting

Made-with: Cursor
- CaptureSnapshotsTaskManagerTest: tallyOrgByHourlyWithThreadPoolExecutorUsesInMemoryQueue,
  nightlyAndHourlyUseDifferentTopics
- TEST_PLAN: tally-task-queue-isolation-TC003 through TC006 (idempotency, no duplicates,
  no cross-topic leakage)
- Apply spotless formatting

Made-with: Cursor

2 participants