Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) #15911

Merged
merged 9 commits into from
Mar 31, 2025

Conversation

art-tapin
Copy link

SUMMARY

Implemented a solution that enables AWX's task system to work with both the legacy dispatcher and the new dispatcherd library based on the FEATURE_NEW_DISPATCHER feature flag. Created a registry pattern instead of trying to modify method objects directly.

ISSUE TYPE
  • Bug Fix
COMPONENT NAME
  • Other (Dispatcher System)
AWX VERSION
ADDITIONAL INFORMATION

Problem:
When initially tried to modify methods directly with cluster_node_heartbeat.apply_async._new_method = ..., I encountered AttributeError: 'method' object has no attribute '_new_method' because Python bound methods don't support attribute assignment.

With the feature flag disabled, I also encountered this scheduler error (probably known):

File "/awx_devel/awx/main/dispatch/periodic.py", line 26, in __init__
  self.interval = int(data['schedule'].total_seconds())
AttributeError: 'float' object has no attribute 'total_seconds'

Solution:

  1. In publish.py:

    • Added ALTERNATIVE_TASK_IMPLEMENTATIONS registry at module level
    • Modified apply_async to check the registry when the feature flag is enabled
  2. In system.py:

    • Refactored cluster_node_heartbeat into two implementations:
      • Original AWX version with bind_kwargs
      • New dispatcherd version using bind=True
    • Extracted common code into helper functions
    • Added _get_active_task_ids_from_dispatcherd() to retrieve running tasks from the new dispatcher
    • Registered the alternative implementation in the central registry
  3. Other minor adjustments to ensure compatibility with both systems

Testing Steps:

# (For ansible org) Pull the right image
docker image pull ghcr.io/ansible/awx_devel:feature_dispatcher

# Start AWX with feature dispatcher
COMPOSE_TAG=feature_dispatcher make docker-compose

# In AWX container, test with flag enabled:
# First, enable it in awx/settings/development_defaults.py
awx-manage shell -c "from awx.main.tasks.system import cluster_node_heartbeat; cluster_node_heartbeat.delay()"
# From the logs
grep "Using dispatcherd implementation"

# Test with flag disabled:
# First, enable it in awx/settings/development_defaults.py
awx-manage shell -c "from awx.main.tasks.system import cluster_node_heartbeat; cluster_node_heartbeat.delay()"

For more extensive in-the-code debugging logs, checkout the revision (commit): fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility

…node_heartbeat

Extract common heartbeat logic into helper functions:  _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection.  _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary.  _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.

Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).

Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.

Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.
…implementation

Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.

When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.

Preserve the original behavior for all other tasks and fallback on error.
…per function

Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.
…re flag

Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.

AAP-41775
…mplementation

- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions

This completes the robust implementation for switching between dispatcher
implementations based on feature flags.

AAP-41775
…ag compatibility

Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.

The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging

This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.

AAP-41775
…entation

Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting

Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging

AAP-41775
@AlanCoding
Copy link
Member

AlanCoding commented Mar 29, 2025

AttributeError: 'float' object has no attribute 'total_seconds'

yeah, switched from using a timedelta to a float for seconds, and this is fallout from that.

EDIT: Oh now I know what the issue is. It was in-place modifications I had done

diff --git a/awx/settings/defaults.py b/awx/settings/defaults.py
index 43cbab180a..d2b242eadf 100644
--- a/awx/settings/defaults.py
+++ b/awx/settings/defaults.py
@@ -456,7 +456,7 @@ CELERYBEAT_SCHEDULE = {
 DISPATCHER_SCHEDULE = {}
 for options in CELERYBEAT_SCHEDULE.values():
     task_name = options['task']
-    DISPATCHER_SCHEDULE[task_name] = options
+    DISPATCHER_SCHEDULE[task_name] = options.copy()
     DISPATCHER_SCHEDULE[task_name]['schedule'] = options['schedule'].total_seconds()
 
 # Django Caching Configuration

Copy link
Member

@AlanCoding AlanCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the 1 modification for scheduled jobs, able to run the Demo Job Template and schedules run.

This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.

Refs: AAP-41775
@art-tapin
Copy link
Author

@AlanCoding I've added the .copy() method to create a proper copy in this PR, if you're okay with it

AttributeError: 'float' object has no attribute 'total_seconds'

yeah, switched from using a timedelta to a float for seconds, and this is fallout from that.

EDIT: Oh now I know what the issue is. It was in-place modifications I had done

diff --git a/awx/settings/defaults.py b/awx/settings/defaults.py
index 43cbab180a..d2b242eadf 100644
--- a/awx/settings/defaults.py
+++ b/awx/settings/defaults.py
@@ -456,7 +456,7 @@ CELERYBEAT_SCHEDULE = {
 DISPATCHER_SCHEDULE = {}
 for options in CELERYBEAT_SCHEDULE.values():
     task_name = options['task']
-    DISPATCHER_SCHEDULE[task_name] = options
+    DISPATCHER_SCHEDULE[task_name] = options.copy()
     DISPATCHER_SCHEDULE[task_name]['schedule'] = options['schedule'].total_seconds()
 
 # Django Caching Configuration

@art-tapin art-tapin marked this pull request as ready for review March 31, 2025 12:08
@art-tapin art-tapin merged commit 6ea9821 into ansible:feature_dispatcher Mar 31, 2025
11 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants