some refactoring #114

Closed
wants to merge 5 commits into from

Conversation

shubhamrai1993
Contributor

@shubhamrai1993 shubhamrai1993 commented May 28, 2025

Description

Please include a summary of the change and which issue is fixed. Include relevant motivation and context.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

  • Test A
  • Test B

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • A PR is open to update the helm chart [link](https://github.com/truefoundry/elasti/tree/main/charts/elasti) if applicable
  • I have updated the CHANGELOG.md file with the changes I made

Summary by CodeRabbit

  • Bug Fixes
    • Prometheus queries now consistently use the default server address, improving reliability of scaling metrics.
    • ElastiService resource listing is now restricted to a specific namespace, enhancing resource targeting.
  • Refactor
    • Internal scaling logic has been streamlined and integrated into the main controller, resulting in more efficient and maintainable operations.
    • Error handling for server and manager processes is now unified for improved robustness.
    • Unified resource identification parameters to use namespaced names, simplifying event handling and improving code consistency.
    • Server startup sequence adjusted for better synchronization with controller initialization.
  • New Features
    • Added support for recognizing Argo Rollout resources by kind, enabling broader workload compatibility.


coderabbitai bot commented May 28, 2025

Walkthrough

The changes refactor scaling logic from a dedicated ScaleHandler struct and package into the ElastiServiceReconciler, consolidating scaling operations and related methods. Controller-runtime's client replaces previous usage of client-go and dynamic clients. The elasti server now operates with the reconciler, and error handling in the main function is streamlined. Additional minor adjustments include hardcoding the Prometheus server address and adding a new rollout GVK variable.

Changes

Files/Groups and change summary:

  • operator/cmd/main.go: Removed scaling package import and scaleHandler; elasti server now uses reconciler; error handling refactored.
  • operator/internal/controller/elastiservice_controller.go: Removed ScaleHandler and SwitchModeFunc; added ScaleLocks; updated methods to use reconciler fields; restricted CRD listing namespace.
  • operator/internal/controller/scale_handler.go: Refactored from scaling package to controller; moved methods to reconciler; switched to controller-runtime client; renamed and moved scaleLocks.
  • operator/internal/elastiserver/elastiServer.go: Replaced scaleHandler with reconciler in server struct and methods; updated imports and function signatures.
  • operator/internal/controller/opsCRD.go, opsDeployment.go, opsInformer.go, opsModes.go, opsRollout.go: Replaced ctrl.Request parameters with types.NamespacedName in multiple methods for resource identification; updated related logging and calls (see the sketch after this list).
  • operator/internal/informer/informer.go: Replaced ctrl.Request field in RequestWatch struct with types.NamespacedName; updated usages accordingly.
  • pkg/scaling/scalers/prometheus_scaler.go: Hardcoded Prometheus server address to http://localhost:9090 in query URL construction.
  • pkg/values/values.go: Added RolloutGVK variable for Argo Rollout group-version-kind.
  • operator/internal/controller/opsServices.go: Added a TODO comment questioning a specific log condition in private service creation check.
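
Several of the summaries above swap ctrl.Request for types.NamespacedName. As a small, self-contained illustration of why that is a drop-in change, note that ctrl.Request simply embeds a NamespacedName, so any handler that only needs to identify a resource can accept the narrower type directly (the names demo-ns, demo-es, and printTarget below are placeholders, not code from this PR):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// printTarget only needs to know which object to act on, so it takes the
// NamespacedName rather than the full reconcile request.
func printTarget(nn types.NamespacedName) {
	fmt.Println("reconciling", nn.String()) // prints "reconciling demo-ns/demo-es"
}

func main() {
	// ctrl.Request embeds types.NamespacedName, so req.NamespacedName is
	// always available wherever a request is.
	req := ctrl.Request{NamespacedName: types.NamespacedName{Namespace: "demo-ns", Name: "demo-es"}}
	printTarget(req.NamespacedName)
}
```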

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant Manager
    participant ElastiServer
    participant ElastiServiceReconciler

    Main->>ElastiServiceReconciler: Initialize reconciler
    Main->>Manager: Start controller manager (async)
    Main->>ElastiServer: Start elasti server with reconciler (async)
    ElastiServer->>ElastiServiceReconciler: Invoke scaling methods (e.g., ScaleTargetFromZero)
    Main->>Main: Wait for error from Manager or ElastiServer

Poem

In the warren of code, a change hops through,
ScaleHandler's gone, and the reconciler grew!
With mutexes mapped and clients anew,
The elasti server runs with a streamlined view.
Prometheus listens on localhost’s ear,
And RolloutGVK brings new cheer.
🐇✨


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🔭 Outside diff range comments (3)
pkg/scaling/scale_handler.go (3)

181-199: ⚠️ Potential issue

Potential resource leak: scaler.Close not called on early break paths

When createScalerForTrigger fails or ShouldScaleFromZero returns true, the loop breaks without closing the scaler, leaking open HTTP connections or goroutines.

A simple fix is to defer the close immediately after creation:

scaler, err := h.createScalerForTrigger(&trigger)
if err != nil {
    ...
}
defer scaler.Close(ctx)

Or call Close explicitly before each break.
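
A sketch of the explicit-close variant; the loop shape, the triggers variable, and the (bool, error) signature of ShouldScaleFromZero are assumptions mirroring the snippet above, not the actual scale_handler.go code:

```go
for _, trigger := range triggers {
	scaler, err := h.createScalerForTrigger(&trigger)
	if err != nil {
		// Nothing to close yet: the scaler was never created.
		break
	}

	scaleFromZero, err := scaler.ShouldScaleFromZero(ctx)
	if err != nil || scaleFromZero {
		// Release HTTP connections/goroutines before leaving the loop early.
		if cerr := scaler.Close(ctx); cerr != nil {
			h.logger.Warn("failed to close scaler", zap.Error(cerr))
		}
		break
	}

	// Normal path: close before moving on to the next trigger.
	if cerr := scaler.Close(ctx); cerr != nil {
		h.logger.Warn("failed to close scaler", zap.Error(cerr))
	}
}
```

Unlike the defer-in-a-loop variant, this releases each scaler as soon as it is no longer needed instead of piling up deferred calls until the surrounding function returns.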


320-331: ⚠️ Potential issue

Nil-pointer risk in debug log when spec.replicas is undefined

deploy.Spec.Replicas can be nil (it defaults to the controller’s replicas), yet the debug line dereferences it unconditionally:

h.logger.Debug("Deployment found", zap.String("deployment", targetName),
-    zap.Int32("current replicas", *deploy.Spec.Replicas), ...
+    zap.Int32("current replicas", func() int32 {
+        if deploy.Spec.Replicas == nil {
+            return 0
+        }
+        return *deploy.Spec.Replicas
+    }()), ...
)

This prevents a crash in clusters where replicas was never set.
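
A tiny nil-safe helper keeps the log call readable; the function name below is hypothetical, not part of the PR:

```go
// replicasOrZero treats a nil replica pointer (field never set on the
// Deployment spec) as zero, purely for logging purposes.
func replicasOrZero(replicas *int32) int32 {
	if replicas == nil {
		return 0
	}
	return *replicas
}
```

The debug line then becomes zap.Int32("current replicas", replicasOrZero(deploy.Spec.Replicas)) with no inline closure.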


362-380: ⚠️ Potential issue

UpdateKedaScaledObjectPausedState patches the wrong resource

The method intends to update a KEDA ScaledObject yet issues a patch against an empty ElastiService{} object, which silently does nothing (or fails with “object has no namespace/name”). You need an object with the correct GVK (or an unstructured) and the target NamespacedName:

-so := &v1alpha1.ElastiService{}
-if err := h.client.Patch(ctx, so, client.RawPatch(...)); err != nil {
+so := &unstructured.Unstructured{}
+so.SetGroupVersionKind(schema.GroupVersionKind{
+    Group:   "keda.sh",
+    Version: "v1alpha1",
+    Kind:    "ScaledObject",
+})
+so.SetNamespace(namespace)
+so.SetName(scaledObjectName)
+if err := h.client.Patch(ctx, so, client.RawPatch(...)); err != nil {
     ...
 }
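
For reference, a self-contained sketch of patching a KEDA ScaledObject with an unstructured object and the controller-runtime client. It assumes the patch toggles KEDA's autoscaling.keda.sh/paused annotation (check the KEDA version in use; older versions only support paused-replicas); the actual patchBytes in the PR may differ, and the helper name is illustrative:

```go
package controller

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// pauseScaledObject targets the ScaledObject itself (not an ElastiService),
// so the annotation actually reaches the KEDA autoscaler.
func pauseScaledObject(ctx context.Context, c client.Client, name, namespace string, paused bool) error {
	so := &unstructured.Unstructured{}
	so.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "keda.sh",
		Version: "v1alpha1",
		Kind:    "ScaledObject",
	})
	so.SetNamespace(namespace)
	so.SetName(name)

	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"autoscaling.keda.sh/paused":"%t"}}}`, paused))

	if err := c.Patch(ctx, so, client.RawPatch(types.MergePatchType, patch)); err != nil {
		return fmt.Errorf("failed to patch ScaledObject %s/%s: %w", namespace, name, err)
	}
	return nil
}
```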
🧰 Tools
🪛 golangci-lint (1.64.8)

[warning] 362-362: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d439011 and fac82d6.

📒 Files selected for processing (5)
  • operator/cmd/main.go (2 hunks)
  • operator/internal/controller/elastiservice_controller.go (2 hunks)
  • pkg/scaling/scale_handler.go (10 hunks)
  • pkg/scaling/scalers/prometheus_scaler.go (1 hunks)
  • pkg/values/values.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
operator/cmd/main.go (2)
pkg/scaling/scale_handler.go (1)
  • NewScaleHandler (44-49)
operator/internal/elastiserver/elastiServer.go (1)
  • NewServer (43-51)
pkg/scaling/scale_handler.go (2)
operator/api/v1alpha1/elastiservice_types.go (2)
  • ElastiServiceList (76-80)
  • ElastiService (65-71)
pkg/values/values.go (2)
  • ServeMode (17-17)
  • RolloutGVK (33-37)
🪛 golangci-lint (1.64.8)
pkg/scaling/scale_handler.go

[warning] 44-44: unused-parameter: parameter 'config' seems to be unused, consider removing or renaming it as _

(revive)

🔇 Additional comments (3)
pkg/values/values.go (1)

33-37:

✅ Verification successful

GVK addition looks good, but ensure single source of truth.

RolloutGVK duplicates information that already exists in RolloutGVR. If both are required, keep them co-located as here, but ensure they stay in sync when versions upgrade.


I’ll locate where RolloutGVR is defined to confirm whether it’s co-located with RolloutGVK and ensure we maintain a single source of truth.


🏁 Script executed:

#!/bin/bash
# Search for RolloutGVR definition
rg -n 'RolloutGVR'

Length of output: 85


Ensure single source of truth for Rollout API mappings

The RolloutGVK definition is correctly co-located with RolloutGVR. Please maintain these side-by-side in pkg/values/values.go when upgrading versions so they stay in sync:

  • pkg/values/values.go: lines 27–37 (definitions of RolloutGVR and RolloutGVK)
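
For context, a sketch of how the two definitions might sit side by side (the Argo Rollouts group/version shown here is the standard one; the actual layout of pkg/values/values.go may differ):

```go
package values

import "k8s.io/apimachinery/pkg/runtime/schema"

// Keep the Rollout GVR and GVK next to each other so a future API version
// bump only has to touch one spot.
var (
	RolloutGVR = schema.GroupVersionResource{
		Group:    "argoproj.io",
		Version:  "v1alpha1",
		Resource: "rollouts",
	}

	RolloutGVK = schema.GroupVersionKind{
		Group:   "argoproj.io",
		Version: "v1alpha1",
		Kind:    "Rollout",
	}
)
```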
operator/internal/controller/elastiservice_controller.go (1)

108-114:

❓ Verification inconclusive

Commented-out watch may break automatic resyncs.

The watch on ScaleTargetRef is now disabled. If this was intentional for debugging, remove the block entirely or guard it behind a feature flag; leaving dead code complicates maintenance and risks divergence.

Confirm no other component relies on this watch for reconciling public/private service drift.


#!/usr/bin/env bash
set -e

FILE="operator/internal/controller/elastiservice_controller.go"

echo "=== Snippet around commented watch (lines 90–130) ==="
sed -n '90,130p' "$FILE"

echo
echo "=== watchScaleTargetRef definitions/usage ==="
rg -n "watchScaleTargetRef" "$FILE" || true

echo
echo "=== SetupWithManager method and watchers ==="
rg -n "func .*SetupWithManager" -n "$FILE" || true
rg -n "\.Watches" -n "$FILE" || true
rg -n "Owns" -n "$FILE" || true

Review or Remove Dead Watch Code

The watchScaleTargetRef block in operator/internal/controller/elastiservice_controller.go (around lines 108–114) is fully commented out. Leaving this dead code risks confusion and bit-rotting behavior. Either:

  • Remove the commented-out block entirely if it’s no longer needed,
  • Or wrap it behind a clear feature flag (see the sketch below) or a TODO with a tracking issue if it's needed later.

Also, ensure no other part of the controller (e.g. in SetupWithManager or additional Watches(...) calls) depends on watchScaleTargetRef for reconciling public/private service drift.
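
If the block is kept but disabled by default, a minimal guard could look like the following; the ELASTI_WATCH_SCALE_TARGET_REF variable is hypothetical, not something the PR or chart defines:

```go
// Hypothetical feature flag: register the ScaleTargetRef watch only when the
// operator is started with ELASTI_WATCH_SCALE_TARGET_REF=true.
if os.Getenv("ELASTI_WATCH_SCALE_TARGET_REF") == "true" {
	if err := r.watchScaleTargetRef(ctx, es, req); err != nil {
		r.Logger.Error("Failed to add watch for ScaleTargetRef", zap.Error(err))
		return res, err
	}
}
```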

operator/cmd/main.go (1)

181-181: Passing controller-runtime client into ScaleHandler looks correct.

This aligns ScaleHandler with the new client abstraction. Nice job keeping construction in one place.

@@ -65,7 +65,7 @@ func parsePrometheusMetadata(jsonMetadata json.RawMessage) (*prometheusMetadata,
 func (s *prometheusScaler) executePromQuery(ctx context.Context) (float64, error) {
 	t := time.Now().UTC().Format(time.RFC3339)
 	queryEscaped := url.QueryEscape(s.metadata.Query)
-	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", s.metadata.ServerAddress, queryEscaped, t)
+	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", "http://localhost:9090", queryEscaped, t)


🛠️ Refactor suggestion

⚠️ Potential issue

Avoid hard-coding Prometheus address; respect metadata.ServerAddress.

Hard-coding "http://localhost:9090" breaks deployments where Prometheus is remote or uses TLS, and silently renders the serverAddress field in prometheusMetadata dead code.

-	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", "http://localhost:9090", queryEscaped, t)
+	serverAddr := s.metadata.ServerAddress
+	if serverAddr == "" {
+		// Fallback preserves current behaviour while staying configurable.
+		serverAddr = "http://localhost:9090"
+	}
+	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", serverAddr, queryEscaped, t)

Consider also validating serverAddr with url.Parse to catch malformed inputs.
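
A minimal validation step might look like this (illustrative only; it reuses serverAddr, queryEscaped, and t from the suggestion above and the scaler's (float64, error) return shape):

```go
// Reject malformed or scheme-less addresses before issuing the query.
parsed, err := url.Parse(serverAddr)
if err != nil {
	return 0, fmt.Errorf("invalid prometheus serverAddress %q: %w", serverAddr, err)
}
if parsed.Scheme == "" || parsed.Host == "" {
	return 0, fmt.Errorf("prometheus serverAddress %q must include a scheme and host", serverAddr)
}
queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", serverAddr, queryEscaped, t)
```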

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", "http://localhost:9090", queryEscaped, t)
+	serverAddr := s.metadata.ServerAddress
+	if serverAddr == "" {
+		// Fallback preserves current behaviour while staying configurable.
+		serverAddr = "http://localhost:9090"
+	}
+	queryURL := fmt.Sprintf("%s/api/v1/query?query=%s&time=%s", serverAddr, queryEscaped, t)
🤖 Prompt for AI Agents
In pkg/scaling/scalers/prometheus_scaler.go at line 68, replace the hard-coded
Prometheus address "http://localhost:9090" with the value from the
prometheusMetadata's ServerAddress field to support remote or TLS deployments.
Additionally, validate the ServerAddress using url.Parse before use to ensure it
is well-formed and handle any parsing errors appropriately.

@@ -152,7 +152,7 @@ func (r *ElastiServiceReconciler) Initialize(ctx context.Context) error {

 func (r *ElastiServiceReconciler) reconcileExistingCRDs(ctx context.Context) error {
 	crdList := &v1alpha1.ElastiServiceList{}
-	if err := r.List(ctx, crdList); err != nil {
+	if err := r.List(ctx, crdList, client.InNamespace("shub-ws")); err != nil {


🛠️ Refactor suggestion

⚠️ Potential issue

Namespace hard-coded to "shub-ws"; extract to configuration.

Restricting reconciliation with client.InNamespace("shub-ws") prevents the operator from managing ElastiServices in any other namespace and hinders reuse in different environments.

Expose the namespace via:

  1. an environment variable (e.g. ELASTI_NAMESPACE), or
  2. a CLI flag.

Fallback to watching all namespaces when unspecified.

-	if err := r.List(ctx, crdList, client.InNamespace("shub-ws")); err != nil {
+	ns := os.Getenv("ELASTI_NAMESPACE")
+	listOpts := []client.ListOption{}
+	if ns != "" {
+		listOpts = append(listOpts, client.InNamespace(ns))
+	}
+	if err := r.List(ctx, crdList, listOpts...); err != nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	if err := r.List(ctx, crdList, client.InNamespace("shub-ws")); err != nil {
+	// instead of hard-coding "shub-ws", read from ELASTI_NAMESPACE env var (fall back to all namespaces)
+	ns := os.Getenv("ELASTI_NAMESPACE")
+	listOpts := []client.ListOption{}
+	if ns != "" {
+		listOpts = append(listOpts, client.InNamespace(ns))
+	}
+	if err := r.List(ctx, crdList, listOpts...); err != nil {
+		// …
+	}
🤖 Prompt for AI Agents
In operator/internal/controller/elastiservice_controller.go at line 155, the
namespace "shub-ws" is hard-coded in the client.InNamespace call, limiting
flexibility. Refactor the code to read the namespace from a configuration source
such as an environment variable (e.g., ELASTI_NAMESPACE) or a CLI flag. If the
namespace is not specified, remove the InNamespace filter to watch all
namespaces. Update the code to conditionally apply the namespace filter based on
this configuration.

Comment on lines 235 to 251
	// Start the elasti server
	eServer := elastiserver.NewServer(zapLogger, scaleHandler, 30*time.Second)
	elastiServerErrChan := make(chan error, 1)
	go func() {
		if err := eServer.Start(elastiServerPort); err != nil {
			setupLog.Error(err, "elasti server failed to start")
			sentry.CaptureException(err)
			elastiServerErrChan <- fmt.Errorf("elasti server: %w", err)
		}
	}()

	return nil
	select {
	case err := <-elastiServerErrChan:
		return fmt.Errorf("elasti server error: %w", err)
	case err := <-mgrErrChan:
		return fmt.Errorf("manager error: %w", err)
	}


💡 Verification agent

🧩 Analysis chain

Graceful shutdown & channel leak risks in concurrent server/manager run-loop.

  1. eServer.Start may return nil on normal shutdown, leaving elastiServerErrChan empty and the select blocking forever.
  2. Context cancellation (e.g. SIGTERM) isn’t propagated to the elasti server; you rely on the manager’s signal handling only.

Consider using an errgroup or propagate the same context:

+ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
+defer stop()
...
-	go func() {
-		if err := eServer.Start(elastiServerPort); err != nil {
+	go func() {
+		if err := eServer.StartWithContext(ctx, elastiServerPort); err != nil && !errors.Is(err, http.ErrServerClosed) {
             ...
         }
 }()

-select {
+select {
 case <-ctx.Done():
     return nil

This ensures both components exit cleanly and avoids goroutine leaks.


🏁 Script executed:

#!/bin/bash
set -e

echo "Finding NewServer instantiation:"
rg "elastiserver.NewServer" -n

echo -e "\nLocating Server type definition:"
rg "type Server" -n operator/pkg/elastiserver

echo -e "\nListing Start methods on Server:"
rg "func (.*Server) Start" -n operator/pkg/elastiserver

echo -e "\nChecking for StartWithContext method:"
rg "StartWithContext" -n operator/pkg/elastiserver || echo "No StartWithContext found"

echo -e "\nSearching for Shutdown/Stop methods:"
rg "Shutdown" -n operator/pkg/elastiserver || echo "No Shutdown method found"

Length of output: 410


🏁 Script executed:

#!/bin/bash
set -e

echo "Finding directories named elastiserver:"
find . -type d | grep -i elastiserver || echo "No elastiserver directory found"

echo -e "\nSearching for 'package elastiserver' declarations:"
rg "package elastiserver" -n .

echo -e "\nSearching for the NewServer constructor:"
rg "func NewServer" -n .

echo -e "\nSearching for the Server type:"
rg "type Server" -n .

echo -e "\nListing Start methods on Server:"
rg "func (.*Server).*Start" -n .

echo -e "\nChecking for StartWithContext usage:"
rg "StartWithContext" -n .

echo -e "\nLooking for Shutdown/Stop methods on Server:"
rg "Shutdown" -n . || echo "No Shutdown methods found"
rg "Stop(" -n . || echo "No Stop methods found"

Length of output: 787


🏁 Script executed:

#!/bin/bash
set -e

echo "Showing initial lines of elastiServer.go:"
sed -n '1,200p' operator/internal/elastiserver/elastiServer.go

echo -e "\nLooking for StartWithContext in elastiServer.go:"
rg "StartWithContext" -n operator/internal/elastiserver/elastiServer.go || echo "StartWithContext not found"

echo -e "\nLooking for Shutdown methods:"
rg "Shutdown" -n operator/internal/elastiserver/elastiServer.go || echo "Shutdown not found"

echo -e "\nLooking for Stop methods:"
rg "Stop(" -n operator/internal/elastiserver/elastiServer.go || echo "Stop not found"

Length of output: 6676


Coordinate server & manager shutdown via shared context and errgroup

The current select on elastiServerErrChan and mgrErrChan will block forever when both Start calls return nil on graceful exit (no errors sent to their channels). Also, elastiserver.Server.Start installs its own signal handler, preventing a unified shutdown signal from main.

To address this, refactor as follows:

• operator/internal/elastiserver/elastiServer.go
– Remove internal signal.Notify / done logic from Start.
– Add a new method:
```go
func (s *Server) StartWithContext(ctx context.Context, port string) error {
	// set up mux + http.Server as before...
	srv := &http.Server{ /* ... */ }

	// Listen for shutdown from ctx
	go func() {
		<-ctx.Done()
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		if err := srv.Shutdown(shutdownCtx); err != nil {
			s.logger.Error("failed to shut down elasti server", zap.Error(err))
		}
	}()

	s.logger.Info("Starting ElastiServer", zap.String("port", port))
	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		return fmt.Errorf("failed to start ElastiServer: %w", err)
	}
	s.logger.Info("Server stopped")
	return nil
}
```

• operator/cmd/main.go
– At the top of main, derive a signal‐aware context:
```go
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()
```
– Replace the existing go func{ eServer.Start… } + select with an errgroup:
```diff
+ g, ctx := errgroup.WithContext(ctx)
+
+ g.Go(func() error {
+     return mgr.Start(ctx)
+ })
+ g.Go(func() error {
+     return eServer.StartWithContext(ctx, elastiServerPort)
+ })
+
+ if err := g.Wait(); err != nil {
+     setupLog.Error(err, "shutdown due to error")
+     return err
+ }
+ return nil
```

This ensures a single shutdown signal drives both components, avoids blocking on empty channels, and prevents goroutine/channel leaks.

🤖 Prompt for AI Agents
In operator/cmd/main.go around lines 235 to 251, the current code starts the
elasti server in a goroutine and waits on error channels, which can block
indefinitely if the server shuts down gracefully without sending errors. To fix
this, refactor to use a shared context derived from signal.NotifyContext at the
start of main, remove internal signal handling from elastiserver.Server.Start,
and implement a new StartWithContext method in elastiserver.Server that listens
for context cancellation to shut down cleanly. Then replace the goroutine and
select with an errgroup that runs both the elasti server and manager with the
shared context, ensuring coordinated shutdown and no goroutine leaks.
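
Pulling the pieces together, a compact, self-contained sketch of the proposed run-loop; runner, run, and the package name are illustrative, and the components stand in for the controller manager and the proposed context-aware Elasti server:

```go
package app

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"golang.org/x/sync/errgroup"
)

// runner is a stand-in for any component with a blocking, context-aware Start,
// e.g. the controller manager or an Elasti server wrapped by StartWithContext.
type runner interface {
	Start(ctx context.Context) error
}

// run drives every component from one signal-aware context and returns the
// first error, or nil when all of them exit cleanly on shutdown.
func run(components ...runner) error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	g, ctx := errgroup.WithContext(ctx)
	for _, c := range components {
		c := c // capture the loop variable for the closure (pre-Go 1.22 semantics)
		g.Go(func() error { return c.Start(ctx) })
	}
	return g.Wait()
}
```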

Comment on lines 384 to 395
func (h *ScaleHandler) updateLastScaledUpTime(ctx context.Context, crdName, namespace string) error {
now := metav1.Now()
patchBytes := []byte(fmt.Sprintf(`{"status": {"lastScaledUpTime": "%s"}}`, now.Format(time.RFC3339Nano)))

_, err := h.kDynamicClient.Resource(values.ElastiServiceGVR).
Namespace(namespace).
Patch(ctx, crdName, types.MergePatchType, patchBytes, metav1.PatchOptions{}, "status")
if err != nil {
elastiService := &v1alpha1.ElastiService{}
if err := h.client.Get(ctx, types.NamespacedName{Namespace: namespace, Name: crdName}, elastiService); err != nil {
return fmt.Errorf("failed to get ElastiService: %w", err)
}

if err := h.client.Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
return fmt.Errorf("failed to patch ElastiService status: %w", err)
}


⚠️ Potential issue

Status update bypasses the /status sub-resource

Patching status.lastScaledUpTime via a regular MergePatch on the main object often fails with
"the body of the request was in an unknown format" or "the server does not allow this method on the requested resource".
Use the status client instead:

if err := h.client.Status().Patch(ctx, elastiService,
    client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
    return fmt.Errorf("failed to patch ElastiService status: %w", err)
}

This also avoids accidental modification of spec/metadata fields.

🤖 Prompt for AI Agents
In pkg/scaling/scale_handler.go around lines 384 to 395, the code patches the
status field using a regular MergePatch on the main object, which can cause
errors and unintended changes. To fix this, change the patch call to use the
status sub-resource client by calling h.client.Status().Patch instead of
h.client.Patch. This ensures the patch only affects the status field and avoids
errors related to unsupported methods on the resource.
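
As an alternative to a raw JSON merge patch, controller-runtime's MergeFrom helper can build the status patch from the object itself. A hedged sketch, assuming the status field is named LastScaledUpTime and that the generated DeepCopy exists (this is not the PR's code):

```go
func setLastScaledUpTime(ctx context.Context, c client.Client, nn types.NamespacedName) error {
	es := &v1alpha1.ElastiService{}
	if err := c.Get(ctx, nn, es); err != nil {
		return fmt.Errorf("failed to get ElastiService: %w", err)
	}

	base := client.MergeFrom(es.DeepCopy()) // diff against the object as fetched
	now := metav1.Now()
	es.Status.LastScaledUpTime = &now // field name assumed from the "lastScaledUpTime" JSON key

	// Status().Patch touches only the /status subresource.
	if err := c.Status().Patch(ctx, es, base); err != nil {
		return fmt.Errorf("failed to patch ElastiService status: %w", err)
	}
	return nil
}
```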

Comment on lines 44 to 48
func NewScaleHandler(logger *zap.Logger, config *rest.Config, client client.Client) *ScaleHandler {
return &ScaleHandler{
logger: logger.Named("ScaleHandler"),
kClient: kClient,
kDynamicClient: kDynamicClient,
logger: logger.Named("ScaleHandler"),
client: client,
}


🧹 Nitpick (assertive)

Remove the unused config parameter from the constructor

config is no longer referenced after the migration to controller-runtime’s client.Client; keeping it around produces a linter warning and may mislead future readers about hidden side-effects.

-func NewScaleHandler(logger *zap.Logger, config *rest.Config, client client.Client) *ScaleHandler {
+func NewScaleHandler(logger *zap.Logger, client client.Client) *ScaleHandler {
     return &ScaleHandler{
         logger: logger.Named("ScaleHandler"),
         client: client,
     }
 }

Remember to update every call-site (e.g. operator/cmd/main.go) accordingly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-func NewScaleHandler(logger *zap.Logger, config *rest.Config, client client.Client) *ScaleHandler {
-	return &ScaleHandler{
-		logger:         logger.Named("ScaleHandler"),
-		kClient:        kClient,
-		kDynamicClient: kDynamicClient,
-		logger:         logger.Named("ScaleHandler"),
-		client:         client,
-	}
+func NewScaleHandler(logger *zap.Logger, client client.Client) *ScaleHandler {
+	return &ScaleHandler{
+		logger: logger.Named("ScaleHandler"),
+		client: client,
+	}
+}
🧰 Tools
🪛 golangci-lint (1.64.8)

[warning] 44-44: unused-parameter: parameter 'config' seems to be unused, consider removing or renaming it as _

(revive)

🤖 Prompt for AI Agents
In pkg/scaling/scale_handler.go lines 44 to 48, the NewScaleHandler function has
an unused config parameter that should be removed to avoid linter warnings and
confusion. Remove the config parameter from the function signature and from the
function calls inside this file. Then, update all call-sites of NewScaleHandler,
such as in operator/cmd/main.go, to remove the config argument accordingly.

Comment on lines 79 to 82
elastiServiceList := &v1alpha1.ElastiServiceList{}
if err := h.client.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
return fmt.Errorf("failed to list ElastiServices: %w", err)
}


🧹 Nitpick (assertive)

🛠️ Refactor suggestion

Avoid hard-coding the namespace when listing ElastiService CRs

The literal "shub-ws" makes the reconciler unusable in any other namespace and forces code changes for every tenant. Consider one of the following:

  1. Read the namespace from an env-var / CLI flag with a sane default.
  2. Pass the namespace as a field on ScaleHandler, initialised by the caller.
-if err := h.client.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
+targetNS := h.targetNamespace            // e.g. injected via constructor
+if err := h.client.List(ctx, elastiServiceList, client.InNamespace(targetNS)); err != nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	elastiServiceList := &v1alpha1.ElastiServiceList{}
-	if err := h.client.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
-		return fmt.Errorf("failed to list ElastiServices: %w", err)
-	}
+	elastiServiceList := &v1alpha1.ElastiServiceList{}
+	targetNS := h.targetNamespace // e.g. injected via constructor
+	if err := h.client.List(ctx, elastiServiceList, client.InNamespace(targetNS)); err != nil {
+		return fmt.Errorf("failed to list ElastiServices: %w", err)
+	}
🤖 Prompt for AI Agents
In pkg/scaling/scale_handler.go around lines 79 to 82, the namespace "shub-ws"
is hard-coded when listing ElastiService CRs, limiting flexibility. To fix this,
remove the hard-coded string and instead obtain the namespace dynamically by
either reading it from an environment variable or CLI flag with a default
fallback, or by adding a namespace field to the ScaleHandler struct that is set
by the caller. Then use this dynamic namespace value in the client.List call to
make the reconciler reusable across different namespaces.
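
A sketch of option 2, injecting the namespace at construction time; the field and helper names are illustrative rather than the actual ScaleHandler definition:

```go
type ScaleHandler struct {
	logger          *zap.Logger
	client          client.Client
	targetNamespace string // empty string means "all namespaces"
}

func NewScaleHandler(logger *zap.Logger, c client.Client, targetNamespace string) *ScaleHandler {
	return &ScaleHandler{
		logger:          logger.Named("ScaleHandler"),
		client:          c,
		targetNamespace: targetNamespace,
	}
}

// listOpts adds a namespace filter only when one was configured.
func (h *ScaleHandler) listOpts() []client.ListOption {
	if h.targetNamespace == "" {
		return nil
	}
	return []client.ListOption{client.InNamespace(h.targetNamespace)}
}
```

The list call then becomes h.client.List(ctx, elastiServiceList, h.listOpts()...) and works unchanged in single-namespace and cluster-wide deployments.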


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
operator/internal/controller/elastiservice_controller.go (1)

154-154: Namespace hard-coded to "shub-ws"; extract to configuration.

This is the same issue identified in previous reviews. The hardcoded namespace limits the operator's flexibility and reusability across different environments.

As previously suggested, expose the namespace via environment variable or CLI flag with fallback to watching all namespaces when unspecified:

-	if err := r.List(ctx, crdList, client.InNamespace("shub-ws")); err != nil {
+	ns := os.Getenv("ELASTI_NAMESPACE")
+	listOpts := []client.ListOption{}
+	if ns != "" {
+		listOpts = append(listOpts, client.InNamespace(ns))
+	}
+	if err := r.List(ctx, crdList, listOpts...); err != nil {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fac82d6 and 8249d51.

📒 Files selected for processing (1)
  • operator/internal/controller/elastiservice_controller.go (2 hunks)

Comment on lines +108 to +112
// if err := r.watchScaleTargetRef(ctx, es, req); err != nil {
// r.Logger.Error("Failed to add watch for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef), zap.Error(err))
// return res, err
// }
// r.Logger.Info("Watch added for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef))


💡 Verification agent

🧩 Analysis chain

Verify that removing the ScaleTargetRef watch doesn't break reconciliation.

The commented-out code that adds a watch for the ScaleTargetRef could impact the controller's ability to respond to changes in the target resource being scaled. This might cause the controller to miss important state changes.

Please run the following script to verify how ScaleTargetRef changes are handled elsewhere in the codebase:


🏁 Script executed:

#!/bin/bash
# Description: Check if ScaleTargetRef watching is handled elsewhere and verify impact

# Search for alternative ScaleTargetRef handling
echo "=== Searching for ScaleTargetRef usage ==="
rg -A 5 -B 5 "ScaleTargetRef|watchScaleTargetRef"

# Check if informer manager or scale handler provides equivalent functionality
echo "=== Checking informer manager and scale handler for ScaleTargetRef handling ==="
ast-grep --pattern 'func $_($$) {
  $$$
  ScaleTargetRef
  $$$
}'

echo "=== Checking for watch setup in other files ==="
rg -A 3 -B 3 "\.Watch\(" --type go

Length of output: 23526


Re-enable ScaleTargetRef watcher to preserve reconciliation on target changes

The commented-out call to r.watchScaleTargetRef means no informer is registered for ScaleTargetRef updates, so the controller will never react when the target resource scales. Restore this block in operator/internal/controller/elastiservice_controller.go:

@@ operator/internal/controller/elastiservice_controller.go:108
-   // if err := r.watchScaleTargetRef(ctx, es, req); err != nil {
-   //     r.Logger.Error("Failed to add watch for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef), zap.Error(err))
-   //     return res, err
-   // }
-   // r.Logger.Info("Watch added for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef))
+   if err := r.watchScaleTargetRef(ctx, es, req); err != nil {
+       r.Logger.Error("Failed to add watch for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef), zap.Error(err))
+       return res, err
+   }
+   r.Logger.Info("Watch added for ScaleTargetRef", zap.String("es", req.String()), zap.Any("scaleTargetRef", es.Spec.ScaleTargetRef))

Locations needing attention:

  • operator/internal/controller/elastiservice_controller.go (lines ~108–112): re-enable the watcher.
  • Verify in opsCRD.go and opsInformer.go that the informer logic remains correctly wired to handle scale-target events.
🤖 Prompt for AI Agents
In operator/internal/controller/elastiservice_controller.go around lines 108 to
112, the code that adds a watch for ScaleTargetRef is commented out, which
disables the controller's ability to respond to changes in the scale target
resource. To fix this, uncomment the block that calls r.watchScaleTargetRef and
the associated logging statements to re-enable the watcher. Additionally, verify
that the informer logic in opsCRD.go and opsInformer.go is correctly set up to
handle ScaleTargetRef events to ensure full reconciliation functionality.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (2)
operator/cmd/main.go (1)

219-223: 💡 Verification agent

🧩 Analysis chain

WaitForCacheSync uses a background context – consider propagating shutdown signals.

Passing context.Background() means the wait will not be cancelled if the main signal handler is triggered during cache-sync.
Forward ctrl.SetupSignalHandler()’s context instead:

- if !mgr.GetCache().WaitForCacheSync(context.Background()) {
+ sigCtx := ctrl.SetupSignalHandler()
+ if !mgr.GetCache().WaitForCacheSync(sigCtx) {

🏁 Script executed:

#!/bin/bash
set -e

echo "== Import block and ctrl alias =="
sed -n '1,50p' operator/cmd/main.go

echo
echo "== Checking for SetupSignalHandler usage =="
grep -R "SetupSignalHandler" -n operator/cmd/main.go || echo "No occurrences of SetupSignalHandler found"

echo
echo "== Checking for WaitForCacheSync invocation =="
grep -n "WaitForCacheSync" operator/cmd/main.go

Length of output: 2085


🏁 Script executed:

#!/bin/bash
set -e

echo "== main function context: lines 180-260 =="
sed -n '180,260p' operator/cmd/main.go

Length of output: 2648


Propagate signal-handling context to WaitForCacheSync

Using context.Background() means the cache‐sync wait won’t be cancelled on SIGINT/SIGTERM. Capture the context from ctrl.SetupSignalHandler() once and reuse it for both starting the manager and syncing the cache:

--- operator/cmd/main.go
@@ func main() {
-   // start manager
-   go func() {
-       if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
+   // set up signal‐aware context
+   sigCtx := ctrl.SetupSignalHandler()
+
+   // start manager with signal context
+   go func() {
+       if err := mgr.Start(sigCtx); err != nil {
            setupLog.Error(err, "problem running manager")
            mgrErrChan <- fmt.Errorf("manager: %w", err)
        }
    }()
 
-   // Wait for cache to sync
-   if !mgr.GetCache().WaitForCacheSync(context.Background()) {
+   // Wait for cache to sync, aborting on shutdown signal
+   if !mgr.GetCache().WaitForCacheSync(sigCtx) {
        return fmt.Errorf("failed to sync cache")
    }

• File: operator/cmd/main.go at the manager startup and cache-sync section
• Capture sigCtx := ctrl.SetupSignalHandler() and pass sigCtx to both mgr.Start() and WaitForCacheSync() to ensure graceful shutdown during cache sync.

operator/internal/controller/scale_handler.go (1)

346-364: ⚠️ Potential issue

Patching the wrong resource & unused parameter – scaling will silently fail.

UpdateKedaScaledObjectPausedState:

  1. scaledObjectName is never used → linter failure.
  2. The patch is applied to ElastiService, not the KEDA ScaledObject,
    so the annotation update never reaches the autoscaler.
-func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
+func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
     ...
-    if err := h.Patch(ctx, &v1alpha1.ElastiService{}, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
-        return fmt.Errorf("failed to patch ElastiService: %w", err)
+    so := &unstructured.Unstructured{}
+    so.SetGroupVersionKind(schema.GroupVersionKind{
+        Group:   "keda.sh",
+        Version: "v1alpha1",
+        Kind:    "ScaledObject",
+    })
+    so.SetNamespace(namespace)
+    so.SetName(scaledObjectName)
+
+    if err := h.Patch(ctx, so, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+        return fmt.Errorf("failed to patch ScaledObject %s: %w", scaledObjectName, err)
     }
     return nil
 }
🧰 Tools
🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 346-346:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🪛 golangci-lint (1.64.8)

[warning] 346-346: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Actions: Lint and test

[error] 346-346: golangci-lint: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

♻️ Duplicate comments (3)
operator/cmd/main.go (1)

230-246: ⚠️ Potential issue

Graceful-shutdown can still dead-lock; switch to a shared context + errgroup.

The code keeps the previous pattern of:

  1. starting the manager and the Elasti server in independent goroutines,
  2. waiting on two error channels.

If both mgr.Start and eServer.Start terminate gracefully (they return nil), neither goroutine writes to its error channel, so the select{} blocks forever and the process never exits.
A single context propagated to both components via errgroup.WithContext (or run.Group) eliminates this class of leak.

The same concern was raised in an earlier review and remains unresolved.
See previous review for an outline of the refactor.

operator/internal/controller/elastiservice_controller.go (1)

153-155: ⚠️ Potential issue

Hard-coded "shub-ws" namespace prevents multi-tenant use.

The reconciler is limited to one namespace when listing CRDs.
Expose the namespace via an environment variable / CLI flag, or remove the
filter to watch all namespaces.

This issue was pointed out previously and still exists.

operator/internal/controller/scale_handler.go (1)

62-67: ⚠️ Potential issue

Namespace hard-coded to "shub-ws" – same scalability concern.

checkAndScale repeats the single-namespace assumption. Please parameterise or
remove.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8249d51 and 68542b5.

📒 Files selected for processing (4)
  • operator/cmd/main.go (1 hunks)
  • operator/internal/controller/elastiservice_controller.go (3 hunks)
  • operator/internal/controller/scale_handler.go (14 hunks)
  • operator/internal/elastiserver/elastiServer.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
operator/cmd/main.go (1)
operator/internal/elastiserver/elastiServer.go (1)
  • NewServer (43-51)
operator/internal/elastiserver/elastiServer.go (2)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (27-36)
operator/api/v1alpha1/elastiservice_types.go (1)
  • ScaleTargetRef (44-50)
operator/internal/controller/elastiservice_controller.go (1)
operator/api/v1alpha1/elastiservice_types.go (1)
  • ElastiServiceList (76-80)
🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)
operator/internal/controller/scale_handler.go

[failure] 346-346:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🪛 golangci-lint (1.64.8)
operator/internal/controller/scale_handler.go

[warning] 346-346: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Actions: Lint and test
operator/internal/controller/scale_handler.go

[error] 346-346: golangci-lint: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🔇 Additional comments (3)
operator/internal/elastiserver/elastiServer.go (1)

68-85: Embed shutdown into caller-supplied context instead of local signal.Notify.

The server installs its own signal handler and done channel, making it impossible for
the operator’s main program to coordinate a single, orderly shutdown of both
controller-manager and Elasti server.
Accept a context.Context in Start (or provide StartWithContext) and listen for
ctx.Done() to invoke http.Server.Shutdown.
This removes the duplicated signal handling and works seamlessly with an
errgroup in main.go.

-func (s *Server) Start(port string) error {
+func (s *Server) Start(ctx context.Context, port string) error {
   ...
-  quit := make(chan os.Signal, 1)
-  signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
+  go func() {
+        <-ctx.Done()
+        _ = server.Shutdown(context.Background())
+  }()

[ suggest_essential_refactor ]

operator/internal/controller/elastiservice_controller.go (1)

106-112: Re-enable the ScaleTargetRef watcher.

All code that registers a watch on the target resource is still commented out;
the reconciler therefore never reacts to updates on Deployments/Rollouts after
initial reconciliation, risking drift between desired and actual replicas.

Uncomment (or delete if replaced elsewhere) and ensure watchScaleTargetRef
is functional.

[ suggest_essential_refactor ]

operator/internal/controller/scale_handler.go (1)

384-410: Event construction duplicates logic – consider using k8s record.EventRecorder.

The controller-runtime manager already offers mgr.GetEventRecorderFor().
Using it gives consistent event formatting and avoids manual v1.Event
creation.

[ suggest_optional_refactor ]
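
A sketch of the recorder-based approach; GetEventRecorderFor and Eventf are the standard controller-runtime/client-go APIs, while the recorder name and reason strings below are illustrative:

```go
// In main.go (or SetupWithManager): obtain a recorder from the manager once
// and hand it to the reconciler.
recorder := mgr.GetEventRecorderFor("elasti-operator")

// In the reconciler, instead of constructing a v1.Event by hand:
recorder.Eventf(elastiService, corev1.EventTypeNormal, "ScaledUp",
	"scaled %s/%s from zero", namespace, targetName)
```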

Comment on lines +368 to 381
func (h *ElastiServiceReconciler) updateLastScaledUpTime(ctx context.Context, crdName, namespace string) error {
now := metav1.Now()
patchBytes := []byte(fmt.Sprintf(`{"status": {"lastScaledUpTime": "%s"}}`, now.Format(time.RFC3339Nano)))

_, err := h.kDynamicClient.Resource(values.ElastiServiceGVR).
Namespace(namespace).
Patch(ctx, crdName, types.MergePatchType, patchBytes, metav1.PatchOptions{}, "status")
if err != nil {
elastiService := &v1alpha1.ElastiService{}
if err := h.Get(ctx, types.NamespacedName{Namespace: namespace, Name: crdName}, elastiService); err != nil {
return fmt.Errorf("failed to get ElastiService: %w", err)
}

if err := h.Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
return fmt.Errorf("failed to patch ElastiService status: %w", err)
}
return nil
}


🧹 Nitpick (assertive)

Status update should use Status().Patch for clarity and RBAC correctness.

updateLastScaledUpTime crafts a status-only JSON patch but calls the
non-status Patch, which may fail with RBAC restrictions and is less explicit.

- if err := h.Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+ if err := h.Status().Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-func (h *ElastiServiceReconciler) updateLastScaledUpTime(ctx context.Context, crdName, namespace string) error {
-	now := metav1.Now()
-	patchBytes := []byte(fmt.Sprintf(`{"status": {"lastScaledUpTime": "%s"}}`, now.Format(time.RFC3339Nano)))
-	_, err := h.kDynamicClient.Resource(values.ElastiServiceGVR).
-		Namespace(namespace).
-		Patch(ctx, crdName, types.MergePatchType, patchBytes, metav1.PatchOptions{}, "status")
-	if err != nil {
-		elastiService := &v1alpha1.ElastiService{}
-		if err := h.Get(ctx, types.NamespacedName{Namespace: namespace, Name: crdName}, elastiService); err != nil {
-			return fmt.Errorf("failed to get ElastiService: %w", err)
-		}
-		if err := h.Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
-			return fmt.Errorf("failed to patch ElastiService status: %w", err)
-		}
-	return nil
-}
+func (h *ElastiServiceReconciler) updateLastScaledUpTime(ctx context.Context, crdName, namespace string) error {
+	now := metav1.Now()
+	patchBytes := []byte(fmt.Sprintf(`{"status": {"lastScaledUpTime": "%s"}}`, now.Format(time.RFC3339Nano)))
+	elastiService := &v1alpha1.ElastiService{}
+	if err := h.Get(ctx, types.NamespacedName{Namespace: namespace, Name: crdName}, elastiService); err != nil {
+		return fmt.Errorf("failed to get ElastiService: %w", err)
+	}
+	if err := h.Status().Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+		return fmt.Errorf("failed to patch ElastiService status: %w", err)
+	}
+	return nil
+}
🤖 Prompt for AI Agents
In operator/internal/controller/scale_handler.go around lines 368 to 381, the
updateLastScaledUpTime function uses h.Patch to update the status field, but it
should use h.Status().Patch to explicitly patch the status subresource. Modify
the code to call h.Status().Patch with the same patch bytes to ensure RBAC
permissions are correct and the intent is clear.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🔭 Outside diff range comments (3)
operator/internal/controller/opsCRD.go (2)

127-182: 🧹 Nitpick (assertive)

Consider completing the refactoring for consistency.

The watchScaleTargetRef function still uses ctrl.Request parameter while other functions in this file have been refactored to use types.NamespacedName. This creates inconsistency in the API.

Consider updating this function signature to match the pattern used in other functions:

-func (r *ElastiServiceReconciler) watchScaleTargetRef(ctx context.Context, es *v1alpha1.ElastiService, req ctrl.Request) error {
+func (r *ElastiServiceReconciler) watchScaleTargetRef(ctx context.Context, es *v1alpha1.ElastiService, elastiServiceNamespacedName types.NamespacedName) error {

And update the internal references accordingly.

🧰 Tools
🪛 golangci-lint (1.64.8)

127-127: func (*ElastiServiceReconciler).watchScaleTargetRef is unused

(unused)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 127-127:
func (*ElastiServiceReconciler).watchScaleTargetRef is unused (unused)


208-231: 🧹 Nitpick (assertive)

Mixed parameter usage creates inconsistency.

The finalizeCRDIfDeleted function still receives ctrl.Request but only uses req.NamespacedName when calling finalizeCRD. This creates an inconsistent API where some functions expect the full request and others expect just the namespaced name.

Consider completing the refactoring for consistency:

-func (r *ElastiServiceReconciler) finalizeCRDIfDeleted(ctx context.Context, es *v1alpha1.ElastiService, req ctrl.Request) (isDeleted bool, err error) {
+func (r *ElastiServiceReconciler) finalizeCRDIfDeleted(ctx context.Context, es *v1alpha1.ElastiService, elastiServiceNamespacedName types.NamespacedName) (isDeleted bool, err error) {

And update the prometheus logging and other references accordingly.

operator/internal/controller/scale_handler.go (1)

352-370: ⚠️ Potential issue

Patching the wrong resource and unused parameter scaledObjectName

UpdateKedaScaledObjectPausedState should patch the KEDA ScaledObject, not an ElastiService. The function also never uses scaledObjectName, so the patch cannot target the intended object.

-func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
+func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
   ...
- if err := h.Patch(ctx, &v1alpha1.ElastiService{}, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
-   return fmt.Errorf("failed to patch ElastiService: %w", err)
+ so := &unstructured.Unstructured{}
+ so.SetGroupVersionKind(schema.GroupVersionKind{
+   Group:   "keda.sh",
+   Version: "v1alpha1",
+   Kind:    "ScaledObject",
+ })
+
+ if err := h.Get(ctx, types.NamespacedName{Name: scaledObjectName, Namespace: namespace}, so); err != nil {
+   return fmt.Errorf("failed to fetch ScaledObject: %w", err)
+ }
+
+ if err := h.Patch(ctx, so, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+   return fmt.Errorf("failed to patch ScaledObject: %w", err)
   }
   return nil
 }
🧰 Tools
🪛 golangci-lint (1.64.8)

[warning] 352-352: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 352-352:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

♻️ Duplicate comments (1)
operator/internal/controller/scale_handler.go (1)

374-387: Status patch should use Status().Patch (duplicate of prior feedback)

Previous review highlighted that updating status via Patch may fail due to RBAC. Please switch to h.Status().Patch.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68542b5 and 85c23b6.

📒 Files selected for processing (7)
  • operator/internal/controller/opsCRD.go (5 hunks)
  • operator/internal/controller/opsDeployment.go (1 hunks)
  • operator/internal/controller/opsInformer.go (3 hunks)
  • operator/internal/controller/opsModes.go (3 hunks)
  • operator/internal/controller/opsRollout.go (2 hunks)
  • operator/internal/controller/scale_handler.go (14 hunks)
  • operator/internal/informer/informer.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
operator/internal/controller/opsRollout.go (3)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (27-36)
operator/api/v1alpha1/elastiservice_types.go (2)
  • ElastiService (65-71)
  • ScaleTargetRef (44-50)
pkg/values/values.go (3)
  • ProxyMode (18-18)
  • ArgoPhaseHealthy (10-10)
  • ServeMode (17-17)
operator/internal/controller/opsDeployment.go (4)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (27-36)
operator/api/v1alpha1/elastiservice_types.go (1)
  • ElastiService (65-71)
pkg/k8shelper/utils.go (1)
  • UnstructuredToResource (10-17)
pkg/values/values.go (3)
  • ProxyMode (18-18)
  • DeploymentConditionStatusTrue (11-11)
  • ServeMode (17-17)
operator/internal/controller/opsInformer.go (5)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (27-36)
operator/api/v1alpha1/elastiservice_types.go (2)
  • ElastiService (65-71)
  • ScaleTargetRef (44-50)
operator/internal/informer/informer.go (1)
  • KeyParams (343-348)
pkg/values/values.go (4)
  • KindService (15-15)
  • Success (21-21)
  • KindDeployments (13-13)
  • KindRollout (14-14)
operator/internal/prom/prom.go (1)
  • InformerHandlerCounter (58-64)
operator/internal/controller/opsCRD.go (7)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (27-36)
operator/api/v1alpha1/elastiservice_types.go (2)
  • ElastiService (65-71)
  • ScaleTargetRef (44-50)
operator/internal/crddirectory/directory.go (1)
  • RemoveCRD (41-46)
pkg/utils/utils.go (1)
  • ParseAPIVersion (39-48)
operator/internal/informer/informer.go (1)
  • RequestWatch (60-66)
pkg/k8shelper/errors.go (1)
  • ErrNoPublicServiceFound (8-8)
pkg/values/values.go (1)
  • ServiceGVR (39-43)
🪛 golangci-lint (1.64.8)
operator/internal/controller/opsRollout.go

16-16: func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused

(unused)

operator/internal/controller/opsDeployment.go

[warning] 18-18: unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _

(revive)

operator/internal/controller/opsInformer.go

108-108: func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused

(unused)


131-131: func (*ElastiServiceReconciler).handleScaleTargetRefChanges is unused

(unused)

operator/internal/controller/scale_handler.go

[warning] 352-352: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)
operator/internal/controller/opsRollout.go

[failure] 16-16:
func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused (unused)

operator/internal/controller/opsDeployment.go

[failure] 18-18:
unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _ (revive)

operator/internal/controller/opsInformer.go

[failure] 108-108:
func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused (unused)


[failure] 131-131:
func (*ElastiServiceReconciler).handleScaleTargetRefChanges is unused (unused)

operator/internal/controller/scale_handler.go

[failure] 352-352:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🪛 GitHub Actions: Lint and test
operator/internal/controller/opsDeployment.go

[error] 18-18: golangci-lint (revive): parameter 'es' seems to be unused, consider removing or renaming it as _ (unused-parameter)

🔇 Additional comments (9)
operator/internal/controller/opsRollout.go (2)

16-36: LGTM! Function signature refactoring is consistent.

The refactoring from ctrl.Request to types.NamespacedName simplifies the interface and aligns with the broader codebase changes. The function logic correctly handles rollout status changes and mode switching.
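As a side note, controller-runtime's reconcile.Request is only a thin wrapper around the namespaced name, so nothing is lost by narrowing the parameter type; a minimal illustration:

// reconcile.Request embeds types.NamespacedName:
//
//	type Request struct {
//		types.NamespacedName
//	}
//
// so a caller that still holds a ctrl.Request can forward the embedded value directly:
nn := req.NamespacedName
r.Logger.Debug("resolved target", zap.String("es", nn.String()))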

🧰 Tools
🪛 golangci-lint (1.64.8)

16-16: func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused

(unused)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 16-16:
func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused (unused)


16-16:

❌ Incorrect review comment

Verify function usage across the codebase.

The static analysis indicates this function is unused, which could mean it's dead code or missing integration with the new refactored interface.

Run the following script to verify if this function is called anywhere in the codebase:


🏁 Script executed:

#!/bin/bash
# Description: Search for calls to handleTargetRolloutChanges to verify usage
# Expected: Find function calls or confirm it's unused

ast-grep --pattern 'handleTargetRolloutChanges($$$)'

Length of output: 54


To get complete coverage, let’s search for any textual references (not just AST matches):

This will confirm whether the function is truly unused or invoked indirectly.


🏁 Script executed:

#!/bin/bash
# Search for all occurrences of handleTargetRolloutChanges in the repository
rg -n 'handleTargetRolloutChanges' .

Length of output: 400


Function Invocation Verified – No Dead Code

A search shows handleTargetRolloutChanges is invoked in operator/internal/controller/opsInformer.go:137, so it’s not dead code. The signature change from ctrl.Request to types.NamespacedName is correctly applied in both the definition (opsRollout.go) and its caller (opsInformer.go). No further changes are needed.

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 golangci-lint (1.64.8)

16-16: func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused

(unused)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 16-16:
func (*ElastiServiceReconciler).handleTargetRolloutChanges is unused (unused)

operator/internal/controller/opsDeployment.go (1)

18-37: Function signature refactoring looks good.

The change from ctrl.Request to types.NamespacedName is consistent with the broader refactoring effort and simplifies the interface.

🧰 Tools
🪛 golangci-lint (1.64.8)

[warning] 18-18: unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 18-18:
unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _ (revive)

🪛 GitHub Actions: Lint and test

[error] 18-18: golangci-lint (revive): parameter 'es' seems to be unused, consider removing or renaming it as _ (unused-parameter)

operator/internal/informer/informer.go (3)

60-66: LGTM! RequestWatch struct refactoring improves consistency.

The consolidation of ElastiService identification into a single types.NamespacedName field is cleaner and more consistent with Kubernetes conventions.


232-243: LGTM! WatchDeployment method updated correctly.

The method properly maps the controller request to the new RequestWatch struct format.


335-341: LGTM! Key generation logic updated consistently.

The getKeyFromRequestWatch method correctly uses the new ElastiServiceNamespacedName field structure to generate informer keys.

operator/internal/controller/opsCRD.go (2)

78-123: LGTM! finalizeCRD function refactoring is consistent.

The function signature change from ctrl.Request to types.NamespacedName simplifies the interface and all internal usage has been updated correctly.


185-206: LGTM! watchPublicService function refactoring is well-executed.

The function has been properly refactored to use types.NamespacedName and all internal usage is consistent.

operator/internal/controller/opsInformer.go (1)

Static analysis flags these handlers as unused

getScaleTargetRefChangeHandler and handleScaleTargetRefChanges are not referenced after the refactor, so the informer is never started for the scale target ref. Either:

  1. Wire the handler into the informer manager, or
  2. Remove the dead code to silence golangci-lint.

Please verify which path is intended.

🧰 Tools
🪛 golangci-lint (1.64.8)

108-108: func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused

(unused)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 108-108:
func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused (unused)

r.Logger.Info("ScaleTargetRef Deployment has 0 replicas", zap.String("deployment_name", targetDeployment.Name), zap.String("es", req.String()))
if err := r.switchMode(ctx, req, values.ProxyMode); err != nil {
r.Logger.Info("ScaleTargetRef Deployment has 0 replicas", zap.String("deployment_name", targetDeployment.Name), zap.String("es", elastiServiceNamespacedName.String()))
if err := r.switchMode(ctx, elastiServiceNamespacedName, values.ProxyMode); err != nil {
return fmt.Errorf("failed to switch mode: %w", err)
}
} else if targetDeployment.Status.Replicas > 0 && condition[1].Status == values.DeploymentConditionStatusTrue {

⚠️ Potential issue

Fix potential array bounds error.

Accessing condition[1] without checking the array length could cause a panic if the conditions slice has fewer than 2 elements.

Apply this diff to add bounds checking:

-	} else if targetDeployment.Status.Replicas > 0 && condition[1].Status == values.DeploymentConditionStatusTrue {
+	} else if targetDeployment.Status.Replicas > 0 && len(condition) > 1 && condition[1].Status == values.DeploymentConditionStatusTrue {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	} else if targetDeployment.Status.Replicas > 0 && condition[1].Status == values.DeploymentConditionStatusTrue {
+	} else if targetDeployment.Status.Replicas > 0 && len(condition) > 1 && condition[1].Status == values.DeploymentConditionStatusTrue {
🤖 Prompt for AI Agents
In operator/internal/controller/opsDeployment.go at line 30, the code accesses
condition[1] without verifying the length of the condition slice, which can
cause a panic if the slice has fewer than two elements. Add a check to ensure
the length of condition is greater than 1 before accessing condition[1]. This
prevents out-of-bounds errors and ensures safe access to the slice element.

 )
 
-func (r *ElastiServiceReconciler) handleTargetDeploymentChanges(ctx context.Context, obj interface{}, _ *v1alpha1.ElastiService, req ctrl.Request) error {
+func (r *ElastiServiceReconciler) handleTargetDeploymentChanges(ctx context.Context, obj interface{}, elastiServiceNamespacedName types.NamespacedName, es *v1alpha1.ElastiService) error {

🧹 Nitpick (assertive)

Address unused parameter warning.

The static analysis correctly identifies that the es parameter is not used in this function. Consider removing it or renaming it to _ to indicate it's intentionally unused.

Apply this diff to address the unused parameter:

-func (r *ElastiServiceReconciler) handleTargetDeploymentChanges(ctx context.Context, obj interface{}, elastiServiceNamespacedName types.NamespacedName, es *v1alpha1.ElastiService) error {
+func (r *ElastiServiceReconciler) handleTargetDeploymentChanges(ctx context.Context, obj interface{}, elastiServiceNamespacedName types.NamespacedName, _ *v1alpha1.ElastiService) error {
🧰 Tools
🪛 golangci-lint (1.64.8)

[warning] 18-18: unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _

(revive)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 18-18:
unused-parameter: parameter 'es' seems to be unused, consider removing or renaming it as _ (revive)

🪛 GitHub Actions: Lint and test

[error] 18-18: golangci-lint (revive): parameter 'es' seems to be unused, consider removing or renaming it as _ (unused-parameter)

🤖 Prompt for AI Agents
In operator/internal/controller/opsDeployment.go at line 18, the parameter `es`
in the function handleTargetDeploymentChanges is unused, causing a warning. To
fix this, either remove the `es` parameter entirely if it is not needed, or
rename it to `_` to explicitly mark it as unused and suppress the warning.

Comment on lines 36 to 38
 	//nolint: errcheck
-	defer r.updateCRDStatus(ctx, req.NamespacedName, mode)
+	defer r.updateCRDStatus(ctx, elastiServiceNamespacedName, mode)
 	switch mode {

🛠️ Refactor suggestion

Don’t ignore updateCRDStatus errors – they signal reconciliation drift

Suppressing the error with //nolint: errcheck means failures in persisting status never surface, potentially leaving the CRD in an incorrect state that later logic relies on.

Options:

- //nolint: errcheck
- defer r.updateCRDStatus(ctx, elastiServiceNamespacedName, mode)
+ defer func() {
+   if err := r.updateCRDStatus(ctx, elastiServiceNamespacedName, mode); err != nil {
+     r.Logger.Error("failed to update CR status", 
+       zap.String("es", elastiServiceNamespacedName.String()), 
+       zap.Error(err))
+   }
+ }()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	//nolint: errcheck
-	defer r.updateCRDStatus(ctx, elastiServiceNamespacedName, mode)
+	// replace the unchecked defer with an error-checking wrapper
+	defer func() {
+		if err := r.updateCRDStatus(ctx, elastiServiceNamespacedName, mode); err != nil {
+			r.Logger.Error("failed to update CR status",
+				zap.String("es", elastiServiceNamespacedName.String()),
+				zap.Error(err))
+		}
+	}()
 	switch mode {
🤖 Prompt for AI Agents
In operator/internal/controller/opsModes.go around lines 36 to 38, the error
returned by the deferred call to updateCRDStatus is currently ignored using
//nolint: errcheck. To fix this, remove the nolint directive and handle the
error properly by capturing the returned error and logging it or returning it up
the call stack to ensure reconciliation issues are detected and addressed.

Comment on lines +22 to 28
 func (r *ElastiServiceReconciler) switchMode(ctx context.Context, elastiServiceNamespacedName types.NamespacedName, mode string) error {
 	{
-		r.Logger.Debug(fmt.Sprintf("[Switching to %s Mode]", strings.ToUpper(mode)), zap.String("es", req.NamespacedName.String()))
-		mutex := r.getMutexForSwitchMode(req.NamespacedName.String())
+		r.Logger.Debug(fmt.Sprintf("[Switching to %s Mode]", strings.ToUpper(mode)), zap.String("es", elastiServiceNamespacedName.String()))
+		mutex := r.getMutexForSwitchMode(elastiServiceNamespacedName.String())
 		mutex.Lock()
 		defer mutex.Unlock()
 	}

🧹 Nitpick (assertive)

Use structured logging instead of fmt.Sprintf for better observability

The zap logger already supports fields; constructing a message with fmt.Sprintf defeats log-level filtering and structured searching. You can achieve the same output (and richer filtering) by supplying the mode as a field.

- r.Logger.Debug(fmt.Sprintf("[Switching to %s Mode]", strings.ToUpper(mode)), zap.String("es", elastiServiceNamespacedName.String()))
+ r.Logger.Debug("[Switching mode]", 
+   zap.String("desiredMode", strings.ToUpper(mode)),
+   zap.String("es", elastiServiceNamespacedName.String()))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (r *ElastiServiceReconciler) switchMode(ctx context.Context, elastiServiceNamespacedName types.NamespacedName, mode string) error {
 	{
-		r.Logger.Debug(fmt.Sprintf("[Switching to %s Mode]", strings.ToUpper(mode)), zap.String("es", elastiServiceNamespacedName.String()))
+		r.Logger.Debug("[Switching mode]",
+			zap.String("desiredMode", strings.ToUpper(mode)),
+			zap.String("es", elastiServiceNamespacedName.String()))
 		mutex := r.getMutexForSwitchMode(elastiServiceNamespacedName.String())
 		mutex.Lock()
 		defer mutex.Unlock()
 	}
🤖 Prompt for AI Agents
In operator/internal/controller/opsModes.go around lines 22 to 28, replace the
use of fmt.Sprintf inside the r.Logger.Debug call with structured logging by
passing the mode as a zap.String field instead of embedding it in the message
string. This means removing fmt.Sprintf and using a static message like
"[Switching Mode]" while adding zap.String("mode", strings.ToUpper(mode)) as a
field to enable better log-level filtering and structured searching.

Comment on lines +34 to 40
 func (r *ElastiServiceReconciler) getMutexKeyForPublicSVC(namespacedName types.NamespacedName) string {
 	return namespacedName.String() + lockKeyPostfixForPublicSVC
 }
 
-func (r *ElastiServiceReconciler) getMutexKeyForTargetRef(req ctrl.Request) string {
-	return req.String() + lockKeyPostfixForTargetRef
+func (r *ElastiServiceReconciler) getMutexKeyForTargetRef(namespacedName types.NamespacedName) string {
+	return namespacedName.String() + lockKeyPostfixForTargetRef
 }

🧹 Nitpick (assertive)

Add a separator to the mutex key to avoid accidental collisions

Concatenating namespacedName.String() and the postfix without a delimiter can yield ambiguous keys (e.g., foo-barpublic-service vs foo-bar-publicservice).
Recommend inserting ":" (or /) for clarity:

-return namespacedName.String() + lockKeyPostfixForPublicSVC
+return namespacedName.String() + ":" + lockKeyPostfixForPublicSVC

(Apply the same to getMutexKeyForTargetRef.)
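
A minimal sketch of both helpers with an explicit delimiter (the ":" is an arbitrary choice; any character that cannot appear in a namespace or name works):

func (r *ElastiServiceReconciler) getMutexKeyForPublicSVC(namespacedName types.NamespacedName) string {
	// "ns/name" + ":" + postfix keeps keys unambiguous even when a resource name
	// happens to end with the same characters as the postfix.
	return namespacedName.String() + ":" + lockKeyPostfixForPublicSVC
}

func (r *ElastiServiceReconciler) getMutexKeyForTargetRef(namespacedName types.NamespacedName) string {
	return namespacedName.String() + ":" + lockKeyPostfixForTargetRef
}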

🤖 Prompt for AI Agents
In operator/internal/controller/opsInformer.go around lines 34 to 40, the mutex
key generation functions concatenate namespacedName.String() and postfixes
without a separator, which can cause ambiguous keys. Fix this by inserting a
clear delimiter such as ":" or "/" between namespacedName.String() and the
postfix in both getMutexKeyForPublicSVC and getMutexKeyForTargetRef functions to
ensure unique and unambiguous keys.

Comment on lines +108 to 114
 func (r *ElastiServiceReconciler) getScaleTargetRefChangeHandler(ctx context.Context, es *v1alpha1.ElastiService, elastiServiceNamespacedName types.NamespacedName) cache.ResourceEventHandlerFuncs {
 	key := r.InformerManager.GetKey(informer.KeyParams{
-		Namespace:    req.Namespace,
-		CRDName:      req.Name,
+		Namespace:    elastiServiceNamespacedName.Namespace,
+		CRDName:      elastiServiceNamespacedName.Name,
 		ResourceName: es.Spec.ScaleTargetRef.Kind,
 		Resource:     es.Spec.ScaleTargetRef.Name,
 	})

⚠️ Potential issue

Resource and ResourceName are swapped, breaking informer key uniqueness

KeyParams expects:
Resource → kind (e.g., deployments)
ResourceName → actual object name

Here, Kind and Name are assigned in reverse order, which will create duplicate keys and metrics labels when multiple CRDs reference the same workload kind.

-   ResourceName: es.Spec.ScaleTargetRef.Kind,
-   Resource:     es.Spec.ScaleTargetRef.Name,
+   Resource:     es.Spec.ScaleTargetRef.Kind,
+   ResourceName: es.Spec.ScaleTargetRef.Name,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (r *ElastiServiceReconciler) getScaleTargetRefChangeHandler(ctx context.Context, es *v1alpha1.ElastiService, elastiServiceNamespacedName types.NamespacedName) cache.ResourceEventHandlerFuncs {
 	key := r.InformerManager.GetKey(informer.KeyParams{
 		Namespace:    elastiServiceNamespacedName.Namespace,
 		CRDName:      elastiServiceNamespacedName.Name,
-		ResourceName: es.Spec.ScaleTargetRef.Kind,
-		Resource:     es.Spec.ScaleTargetRef.Name,
+		Resource:     es.Spec.ScaleTargetRef.Kind,
+		ResourceName: es.Spec.ScaleTargetRef.Name,
 	})
 	// ...
 }
🧰 Tools
🪛 golangci-lint (1.64.8)

108-108: func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused

(unused)

🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 108-108:
func (*ElastiServiceReconciler).getScaleTargetRefChangeHandler is unused (unused)

🤖 Prompt for AI Agents
In operator/internal/controller/opsInformer.go around lines 108 to 114, the
fields Resource and ResourceName in the KeyParams struct are assigned
incorrectly: Resource should be the kind (e.g., deployments) and ResourceName
should be the actual object name. Swap the assignments so that Resource receives
es.Spec.ScaleTargetRef.Kind and ResourceName receives
es.Spec.ScaleTargetRef.Name to ensure unique informer keys and correct metrics
labeling.

Comment on lines +30 to 33
func (h *ElastiServiceReconciler) getMutexForScale(key string) *sync.Mutex {
l, _ := h.ScaleLocks.LoadOrStore(key, &sync.Mutex{})
return l.(*sync.Mutex)
}

🧹 Nitpick (assertive)

Memory growth risk with ScaleLocks map

Locks are never deleted, so long-running clusters with many distinct services will leak memory. Consider:

defer h.ScaleLocks.Delete(key)

once scaling completes.
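
A sketch of the acquire/release/cleanup pattern, assuming the lock is only needed for the duration of a single scale operation (scaleWithLock is a hypothetical helper, not part of this PR):

func (h *ElastiServiceReconciler) scaleWithLock(ctx context.Context, key string, scale func(context.Context) error) error {
	mutex := h.getMutexForScale(key)
	mutex.Lock()
	defer func() {
		mutex.Unlock()
		// Drop the entry so the map does not grow with every distinct service ever seen.
		h.ScaleLocks.Delete(key)
	}()
	return scale(ctx)
}

Note that deleting the entry while another goroutine is still blocked on the same key means a third caller can obtain a fresh mutex from LoadOrStore, so this trade-off is only safe if concurrent scale requests for one service are rare or otherwise serialized.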

🤖 Prompt for AI Agents
In operator/internal/controller/scale_handler.go around lines 30 to 33, the
ScaleLocks map retains mutexes indefinitely, causing memory growth for
long-running clusters with many services. To fix this, add a call to delete the
mutex from ScaleLocks after the scaling operation completes, for example by
deferring h.ScaleLocks.Delete(key) in the function that uses the mutex to ensure
cleanup and prevent memory leaks.

Comment on lines +62 to 66
func (h *ElastiServiceReconciler) checkAndScale(ctx context.Context) error {
elastiServiceList := &v1alpha1.ElastiServiceList{}
if err := h.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
return fmt.Errorf("failed to list ElastiServices: %w", err)
}

⚠️ Potential issue

Hard-coded namespace prevents scaling outside shub-ws

Listing ElastiServices with client.InNamespace("shub-ws") ignores resources in other namespaces and will break multi-tenant clusters.

- if err := h.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
+ // List across all namespaces. Use label selectors if you need scoping.
+ if err := h.List(ctx, elastiServiceList); err != nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (h *ElastiServiceReconciler) checkAndScale(ctx context.Context) error {
 	elastiServiceList := &v1alpha1.ElastiServiceList{}
-	if err := h.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
+	// List across all namespaces. Use label selectors if you need scoping.
+	if err := h.List(ctx, elastiServiceList); err != nil {
 		return fmt.Errorf("failed to list ElastiServices: %w", err)
 	}
🤖 Prompt for AI Agents
In operator/internal/controller/scale_handler.go around lines 62 to 66, the code
uses a hard-coded namespace "shub-ws" when listing ElastiServices, which
restricts scaling to that namespace only. Modify the code to remove the
namespace filter or make the namespace configurable so that it can list
ElastiServices across all namespaces, enabling support for multi-tenant
clusters.
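
If some scoping is still wanted, one option (the WATCH_NAMESPACE variable name is an assumption, not something this PR defines) is to make the namespace configurable and fall back to cluster-wide listing:

// listScope returns namespace-scoped options when WATCH_NAMESPACE is set,
// and no options (i.e. all namespaces) otherwise.
func listScope() []client.ListOption {
	if ns := os.Getenv("WATCH_NAMESPACE"); ns != "" {
		return []client.ListOption{client.InNamespace(ns)}
	}
	return nil
}

// usage: if err := h.List(ctx, elastiServiceList, listScope()...); err != nil { ... }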

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
operator/internal/controller/scale_handler.go (1)

346-364: ⚠️ Potential issue

Wrong resource patched & unused parameter

  1. The method claims to update a KEDA ScaledObject, but it patches an ElastiService{} object.
  2. scaledObjectName is never used (flagged by golangci-lint).
-func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
+func (h *ElastiServiceReconciler) UpdateKedaScaledObjectPausedState(ctx context.Context, scaledObjectName, namespace string, paused bool) error {
     ...
-    if err := h.Patch(ctx, &v1alpha1.ElastiService{}, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
-        return fmt.Errorf("failed to patch ElastiService: %w", err)
+    so := &unstructured.Unstructured{}
+    so.SetGroupVersionKind(values.ScaledObjectGVK)          // <-- define once in values
+    so.SetName(scaledObjectName)
+    so.SetNamespace(namespace)
+
+    if err := h.Patch(ctx, so, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+        return fmt.Errorf("failed to patch ScaledObject: %w", err)
     }
     return nil
 }

This removes the linter warning and ensures the correct CRD is modified.
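
For the values.ScaledObjectGVK referenced in the diff, a possible definition (the variable name and placement follow the reviewer's suggestion; the group/version/kind match KEDA v2's ScaledObject CRD):

// e.g. in pkg/values/values.go
import "k8s.io/apimachinery/pkg/runtime/schema"

var ScaledObjectGVK = schema.GroupVersionKind{
	Group:   "keda.sh",
	Version: "v1alpha1",
	Kind:    "ScaledObject",
}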

🧰 Tools
🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)

[failure] 346-346:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🪛 golangci-lint (1.64.8)

[warning] 346-346: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

♻️ Duplicate comments (3)
operator/internal/controller/scale_handler.go (3)

30-33: 🛠️ Refactor suggestion

Possible memory-leak: ScaleLocks entries are never removed
getMutexForScale inserts a new mutex into ScaleLocks but nothing deletes it after the scaling operation finishes. On long-running clusters this map can grow unbounded.

 func (h *ElastiServiceReconciler) ScaleTargetFromZero(...){
     mutex := h.getMutexForScale(namespacedName.String())
     mutex.Lock()
-    defer mutex.Unlock()
+    defer func() {
+        mutex.Unlock()
+        h.ScaleLocks.Delete(namespacedName.String()) // cleanup
+    }()
 }

62-66: 🛠️ Refactor suggestion

Hard-coded namespace blocks multi-tenant support
client.InNamespace("shub-ws") limits the operator to a single namespace and ignores other ElastiService objects.

- if err := h.List(ctx, elastiServiceList, client.InNamespace("shub-ws")); err != nil {
+ // list across all namespaces; add a label selector if you need scoping
+ if err := h.List(ctx, elastiServiceList); err != nil {

368-377: 🛠️ Refactor suggestion

Status patch should use Status().Patch
updateLastScaledUpTime writes to the status sub-resource but calls the generic Patch, which can fail with RBAC “forbidden: User ... cannot patch resource ...”. Use the explicit status client:

- if err := h.Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
+ if err := h.Status().Patch(ctx, elastiService, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85c23b6 and eb441bf.

📒 Files selected for processing (2)
  • operator/internal/controller/opsServices.go (1 hunks)
  • operator/internal/controller/scale_handler.go (14 hunks)
🧰 Additional context used
🪛 GitHub Check: golangci-lint (/home/runner/work/elasti/elasti/operator)
operator/internal/controller/scale_handler.go

[failure] 346-346:
unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _ (revive)

🪛 golangci-lint (1.64.8)
operator/internal/controller/scale_handler.go

[warning] 346-346: unused-parameter: parameter 'scaledObjectName' seems to be unused, consider removing or renaming it as _

(revive)

Comment on lines +299 to 316
deploy := &appsv1.Deployment{}
if err := h.Get(ctx, types.NamespacedName{Namespace: ns, Name: targetName}, deploy); err != nil {
return fmt.Errorf("ScaleDeployment - GET: %w", err)
}

h.logger.Debug("Deployment found", zap.String("deployment", targetName), zap.Int32("current replicas", *deploy.Spec.Replicas), zap.Int32("desired replicas", replicas))
h.Logger.Debug("Deployment found", zap.String("deployment", targetName), zap.Int32("current replicas", *deploy.Spec.Replicas), zap.Int32("desired replicas", replicas))
if deploy.Spec.Replicas != nil && *deploy.Spec.Replicas != replicas {
patchBytes := []byte(fmt.Sprintf(`{"spec":{"replicas":%d}}`, replicas))
_, err = deploymentClient.Patch(ctx, targetName, types.StrategicMergePatchType, patchBytes, metav1.PatchOptions{})
err := h.Patch(ctx, deploy, client.RawPatch(types.MergePatchType, patchBytes))
if err != nil {
return fmt.Errorf("ScaleDeployment - Patch: %w", err)
}
h.logger.Info("Deployment scaled", zap.String("deployment", targetName), zap.Int32("replicas", replicas))
h.Logger.Info("Deployment scaled", zap.String("deployment", targetName), zap.Int32("replicas", replicas))
return nil
}
h.logger.Info("Deployment already scaled", zap.String("deployment", targetName), zap.Int32("current replicas", *deploy.Spec.Replicas))
h.Logger.Info("Deployment already scaled", zap.String("deployment", targetName), zap.Int32("current replicas", *deploy.Spec.Replicas))
return nil
}

⚠️ Potential issue

Nil-pointer panic when spec.replicas is unspecified
*deploy.Spec.Replicas is dereferenced for logging before you check if the pointer is nil. A deployment created without an explicit replica count (allowed by the API) will crash the reconciler.

- h.Logger.Debug("Deployment found",
-     zap.String("deployment", targetName),
-     zap.Int32("current replicas", *deploy.Spec.Replicas),
-     zap.Int32("desired replicas", replicas))
+ var current int32
+ if deploy.Spec.Replicas != nil {
+     current = *deploy.Spec.Replicas
+ }
+ h.Logger.Debug("Deployment found",
+     zap.String("deployment", targetName),
+     zap.Int32("current replicas", current),
+     zap.Int32("desired replicas", replicas))

Also guard the equality check:

if deploy.Spec.Replicas == nil || *deploy.Spec.Replicas != replicas { ... }
🤖 Prompt for AI Agents
In operator/internal/controller/scale_handler.go around lines 299 to 316, the
code dereferences deploy.Spec.Replicas for logging before checking if it is nil,
causing a nil-pointer panic if replicas is unspecified. To fix this, add a nil
check before dereferencing deploy.Spec.Replicas in the debug and info logs, and
modify the condition to scale the deployment to: if deploy.Spec.Replicas == nil
|| *deploy.Spec.Replicas != replicas. This ensures safe access and correct
scaling logic.

Comment on lines +44 to 46
// TODO: When would this happen?
r.Logger.Info("Private service already exists", zap.String("private-service", privateServiceNamespacedName.String()))
return privateServiceName, nil

🧹 Nitpick (assertive)

Clarify TODO or remove
The comment // TODO: When would this happen? offers no actionable guidance to future maintainers. Either explain the scenario (e.g., “when informer cache is stale and we receive the same object twice”) or delete the comment to avoid confusion.

🤖 Prompt for AI Agents
In operator/internal/controller/opsServices.go around lines 44 to 46, the TODO
comment "When would this happen?" is unclear and unhelpful. Either replace it
with a clear explanation of the scenario when this condition occurs, such as
"when informer cache is stale and the same object is received twice," or remove
the comment entirely to avoid confusion for future maintainers.
