Skip to content

Conversation

@jessejlt
Copy link
Member

@jessejlt jessejlt commented Dec 5, 2025

Summary

This PR modifies the operator to maintain named entity-groups in the form of {DatabaseName}Spoke and then utilize them when constructing our alias functions instead of inlining the full cluster list.

There are also several bug that the LLM identified and fixed, each in its own commit to aid reviewing.

Changes

Bug Fixes

  • Fix HeartbeatTTL nil pointer dereference: Added nil check for HeartbeatTTL in FederateClusters guard clause to prevent panic when the pointer is dereferenced in queryHeartbeatTable

Code Quality

  • Remove unused functions: Removed dead code including getBestAvailableSKU, collectInventoryByDatabase, databaseExists, and recommendedSKUs variable
  • Extract magic numbers into named constants: Added requeueShort (1 min), requeueMedium (5 min), requeueLong (10 min), and maxKustoScriptSz (1MB) constants for better maintainability
  • Add error logging for ignored status updates: Added logSetClusterCondition helper and updated all ignored status update calls to log errors at WARN level
  • Add documentation for helper functions: Added Go doc comments for diffSkus, diffIdentities, toSku, toTier, and toDatabase

Testing

  • All existing tests pass
  • Changes are primarily refactoring with no functional changes except the nil pointer fix

Implement Stage 1 of named entity groups migration for federated ADX clusters.
This creates named entity groups based on spoke database inventory, replicated
across all hub databases.

Changes:
- Add mapSpokeDatabases() to map spoke databases to their cluster endpoints
- Add generateEntityGroupDefinitions() to create entity group KQL statements
- Integrate entity group generation into FederateClusters before function creation
- Use drop-then-create pattern since ADX doesn't support .create-or-alter for entity groups

Entity groups follow the naming pattern {SpokeDatabaseName}Spoke (e.g., MetricsSpoke)
and contain all cluster endpoints that have that database.

Includes comprehensive unit tests for:
- Spoke database mapping
- Entity group generation and naming
- Multi-endpoint scenarios
- Deterministic ordering

Documentation updated in concepts.md and designs/operator.md.
Resolved conflicts in operator/adx.go and operator/adx_test.go:
- Integrated entity group definitions (from branch) with hub tables ensuring (from main)
- Combined view support tests from main with entity group tests from branch
- Updated step numbering in FederateClusters to accommodate both features
Add nil check for HeartbeatTTL in the guard clause at the beginning of
FederateClusters to prevent a panic when dereferencing the pointer later
in queryHeartbeatTable.
Remove the following unused functions and their tests:
- getBestAvailableSKU: queries Azure for SKUs but never called
- collectInventoryByDatabase: aggregates schemas but never called
- databaseExists: checks database existence but never called
- recommendedSKUs variable: only used by getBestAvailableSKU

These functions were defined but not used anywhere in production code.
Add named constants for requeue durations and script size limits:
- requeueShort (1 minute): for in-progress operations
- requeueMedium (5 minutes): for provider registration
- requeueLong (10 minutes): for heartbeat/federation cycles
- maxKustoScriptSz (1MB): max size for Kusto script batches

This improves maintainability by centralizing these values and making
their purpose clear through documentation.
Previously, status update errors in non-critical code paths were silently
discarded. This adds logging for these cases to improve observability
while still not blocking reconciliation on status update failures.

- Add logSetClusterCondition helper method that wraps setClusterCondition
  and logs any errors at WARN level
- Replace all '_ = r.setClusterCondition' calls with logSetClusterCondition
- Add explicit error logging for direct r.Status().Update calls in the
  CriteriaExpression handling code
Add Go doc comments for the following functions to improve code
readability and maintainability:

- diffSkus: Explains the SKU comparison logic and when updates are triggered
- diffIdentities: Describes identity reconciliation and preservation of
  unmanaged identities
- toSku: Documents string to AzureSKUName conversion with validation
- toTier: Documents string to AzureSKUTier conversion with validation
- toDatabase: Describes the Azure resource ID construction for databases
@jessejlt jessejlt changed the title ADX operator code review fixes ADX operator maintain named entity-groups Dec 5, 2025
@jessejlt jessejlt requested a review from Copilot December 5, 2025 15:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the ADX operator's federation capabilities by introducing named entity groups for better manageability of cross-cluster queries, while also addressing several bugs and code quality improvements.

Key Changes

  • Named Entity Groups: Replaces inline cluster lists in KQL functions with persistent named entity groups (e.g., MetricsSpoke, LogsSpoke) that are replicated across all hub databases
  • Critical Bug Fix: Adds nil pointer check for HeartbeatTTL to prevent panic during federation reconciliation
  • Code Quality: Removes unused functions, extracts magic numbers into named constants, and adds comprehensive error logging for status updates

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
operator/adx.go Implements named entity group generation and replication, adds HeartbeatTTL nil check, extracts constants (requeueShort/Medium/Long, maxKustoScriptSz), adds logSetClusterCondition helper, removes dead code (getBestAvailableSKU, collectInventoryByDatabase, databaseExists), adds function documentation
operator/adx_test.go Removes tests for deleted functions, adds comprehensive tests for mapSpokeDatabases, generateEntityGroupDefinitions (covering replication, naming, multiple endpoints, deterministic ordering), updates TestGenerateKustoFunctionDefinitions to verify entity group references
docs/designs/operator.md Documents named entity groups feature including naming pattern, drop-then-create workflow, and manageability benefits
docs/concepts.md Updates federation sections to mention named entity groups alongside existing capabilities

When referencing a stored entity group in the macro-expand operator,
the 'entity_group' keyword should NOT be used. The correct syntax is:

  macro-expand MyEntityGroup as X ( X.TableName )

Not:

  macro-expand entity_group MyEntityGroup as X ( X.TableName )

The 'entity_group' keyword is only used in management commands
(.create/.drop/.show entity_group) and inline entity group definitions
with brackets.
Changed 'executing %d entity groups' to 'executing %d entity group statements'
since each entity group generates 2 statements (drop + create).
@jessejlt jessejlt marked this pull request as ready for review December 5, 2025 15:37
Resolved conflict in operator/adx.go by keeping the PR's deletion of
unused functions (collectInventoryByDatabase, databaseExists). Main had
a fix for databaseExists (Name -> DatabaseName), but since the PR
intentionally removes these unused functions, the fix is not needed.
@jessejlt jessejlt enabled auto-merge (squash) December 9, 2025 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants