Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support to fetch and match worker logs #183

Merged
merged 1 commit into from
Apr 4, 2025

Conversation

leongdl
Copy link
Contributor

@leongdl leongdl commented Apr 3, 2025

What was the problem/requirement? (What/Why)

  • We are adding some new features related to worker features. Currently the test fixtures only have features related to Job logs. I am extending the test fixtures to worker logs.

What was the solution? (How)

  • Add new fixtures and helpers to get worker log group, log stream.
  • Small refactor to log class to re-use existing log search logic for test verification.
  • Allows tests to use assert_pattern_in_log and assert_pattern_not_in_log.

What is the impact of this change?

  • We will have great test coverage!

How was this change tested?

  1. hatch build
  2. hatch run fmt
  3. hatch run all:lint
  4. Used the new worker log functions in a new E2E test case for worker agent.

Was this change documented?

Not Applicable.

Is this a breaking change?

Not Applicable.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@leongdl leongdl force-pushed the workerlog branch 2 times, most recently from ed174a7 to aee1f33 Compare April 3, 2025 21:09
@leongdl leongdl marked this pull request as ready for review April 3, 2025 21:10
@leongdl leongdl requested a review from a team as a code owner April 3, 2025 21:10
@@ -252,6 +264,46 @@ def set_stopped_status(self):
LOG.exception(f"Failed to update worker status: {error}")
raise

def get_worker_logs(self) -> Optional[WorkerLogConfig]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a little more clear, should we change this method to _get_worker_log_config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that.

I was thinking if other tests may want to get the WorkerLogConfig. Both works.

# Default, no log group yet.
return None

def get_all_worker_logs(self, *, logs_client: botocore.client.BaseClient) -> WorkerLog:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just get_logs? Would be more consistent with our Job implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, easy

@leongdl leongdl force-pushed the workerlog branch 2 times, most recently from 6b44091 to 2e47d5b Compare April 3, 2025 22:05
ttblanchard
ttblanchard previously approved these changes Apr 4, 2025
@@ -252,6 +264,46 @@ def set_stopped_status(self):
LOG.exception(f"Failed to update worker status: {error}")
raise

def _get_worker_logs(self) -> Optional[WorkerLogConfig]:
"""Get the log group and log stream for the worker. Retain the API structure"""
response = self.deadline_client.get_worker(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: do we have a data class for GetWorker response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have typing in this library for boto classes :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data class comes from the design and structure of this library. There's no use of boto3 generated types

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we type it as Dict[str, Any]?

fleetId=self.configuration.fleet.id,
workerId=self.worker_id,
)
if log_config := response["log"]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If log isn't a key in response, wouldn't this line throw an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, I'll change it to .get()


def get_logs(self, *, logs_client: botocore.client.BaseClient) -> WorkerLog:
# Get the worker log group and stream from the service.
log_config = self._get_worker_logs()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

logStreamNames=[log_config.cloudwatch_log_stream],
),
)
log_events = filter_log_events_pages.build_full_result()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no typing for boto in this package.

)
log_events = filter_log_events_pages.build_full_result()
log_events = [CloudWatchLogEvent.from_api_response(e) for e in log_events["events"]]
# For debugging test cases.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove this comment? Doesn't seem to be adding any value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, otherwise everyone wants to remove LOG.info :)

@@ -252,6 +264,46 @@ def set_stopped_status(self):
LOG.exception(f"Failed to update worker status: {error}")
raise

def _get_worker_logs(self) -> Optional[WorkerLogConfig]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - what about get_worker_log_config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrrg, I had it as the a non private name before but went for _ so no :) Not changing it

Comment on lines +28 to +29
if TYPE_CHECKING:
from botocore.paginate import PageIterator, Paginator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious - why only import those during type checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done in the rest of the library too, there's some differences in versions of python I think.

workerId=self.worker_id,
)
if log_config := response["log"]:
LOG.info(f"LogGroup structure {log_config}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - you meant LogConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

return WorkerLogConfig(
cloudwatch_log_group=log_group_name, cloudwatch_log_stream=log_stream_name
)
# Default, no log group yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - you meant log config? log group should always exist as it's at fleet level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

if not log_config:
return WorkerLog(worker_id=self.worker_id, logs=[]) # type: ignore[arg-type]

filter_log_events_paginator: Paginator = logs_client.get_paginator("filter_log_events")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure caller has the logs:FilterLogEvents permission to perform this operation. Local test and automated tests use different infrastructure setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same in the rest of the library and we don't check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduce FilterLogEvents call to fleet logs while the prior code filter on session logs no? The permission for CWL is configured at log group level, that's why I suggested checking.

Took a quick look seems we have have this permission on all deadline log groups log-group:/aws/deadline/* , should be good then.

godobyte
godobyte previously approved these changes Apr 4, 2025
@leongdl leongdl enabled auto-merge (squash) April 4, 2025 18:27
Copy link

sonarqubecloud bot commented Apr 4, 2025

@leongdl leongdl merged commit b920331 into aws-deadline:mainline Apr 4, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants