Skip to content

Conversation

@franciscovalentecastro
Copy link
Contributor

@franciscovalentecastro franciscovalentecastro commented Jan 19, 2026

Update hasMatchingLog to return all found logs and create QueryAllLogs.

Context :

@jefferbrecht
Copy link
Member

Why not just leave QueryLog alone and add a new QueryLogs that returns a slice of matching entries? As long as we're iterating all of the returned logs we may as well pass them on to the caller, who can just call len(...) to get the number of matches. That would also have the benefit of not breaking the API requiring us to change all of the call sites.

@franciscovalentecastro franciscovalentecastro changed the title [gce_testing] Return the number of "found" logs in hasMatchingLog and QueryLog. [gce_testing] Update hasMatchingLog to return all found logs and create QueryAllLogs. Jan 19, 2026
@franciscovalentecastro
Copy link
Contributor Author

Why not just leave QueryLog alone and add a new QueryLogs that returns a slice of matching entries? As long as we're iterating all of the returned logs we may as well pass them on to the caller, who can just call len(...) to get the number of matches. That would also have the benefit of not breaking the API requiring us to change all of the call sites.

Thank you for the comment @jefferbrecht ! I was going back and forth between the possibilities of how to expose the "found log count" while minimizing the API updates. The last update returns all found logs in hasMatchingLog and creates QueryAllLogs to be able to return all the found logs.

}
matchingLogs = append(matchingLogs, entry)
}
if found && len(matchingLogs) == 0 {
Copy link
Member

@jefferbrecht jefferbrecht Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: append here is guaranteed to result in a non-empty slice, and we set found at the same time as append, so we no longer need the found variable -- just test on len(matchingLogs) > 0. We can also eliminate the returned bool: if the query turns up empty then we'd signal "logs not found" by returning an empty slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Renamed hasMatchingLogs to findMatchingLogs to represent more accurately the new implementation.

@@ -666,10 +667,36 @@
// found after some retries.
func QueryLog(ctx context.Context, logger *log.Logger, vm *VM, logNameRegex string, window time.Duration, query string, maxAttempts int) (*cloudlogging.Entry, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to implement QueryLog by just calling QueryAllLogs internally and returning the first element of the slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// over the trailing time interval specified by the given window.
// Returns all the log entries found, or an error if the log could not be
// found after some retries.
func QueryAllLogs(ctx context.Context, logger *log.Logger, vm *VM, logNameRegex string, window time.Duration, query string, maxAttempts int) ([]*cloudlogging.Entry, error) {
Copy link
Member

@jefferbrecht jefferbrecht Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryAllLogs arguably does not need the "retry until at least one log is found" logic (although the "retry on retriable failure" logic should stay). IMO that's only there for "get me the first matching log ASAP" (which is just QueryLog), whereas here the caller is more likely to be looking for "get me all matching logs from the given time window" (after they've waited an appropriate amount of time for the eventual consistency of the backend).

Copy link
Contributor Author

@franciscovalentecastro franciscovalentecastro Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how to remove "retry until at least one log is found" from QueryAllLogs without having some downsides.

@jefferbrecht Could you add more details in the expected solution here ?

IMO that's only there for "get me the first matching log ASAP"

I think it could also be thought as "return query result if you find any data". It has the additional value of shorter tests.

QueryAllLogs arguably does not need the "retry until at least one log is found" logic (although the "retry on retriable failure" logic should stay).

Yes, "get me all matching logs from the given time window" maybe the most likely use case of QueryAllLogs, but i don't see an alternative way of "stopping the lookup" that is more helpful for testing.

Alternatives :

  • [Current] Stops querying the backend if any data is found (len(matchingLogs) > 0).
    • Shorter tests.
    • Caller is expecting to see "all found data".
  • Stop when err == nil
    • Shorter tests. May result in "no data found".
    • No consideration for race conditions when log may take longer to be ingested.
  • Create an expectedLogCount and stop when len(matchingLogs) >= expectedLogCount.
    • Same as the previous one, but more restrictive.
  • Remove the "stop after data is found" will result in the always querying "maxAttempts".
    • The tests is long (maxAttempt * backoffTime) and if the time window is small (e.g. 1 minute), the last query could return less data after backoff retries.

Copy link
Contributor Author

@franciscovalentecastro franciscovalentecastro Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thought, i still think stopping the query at len(matchingLogs) > 0 is the best option since its the most aligned with how the other QueryLog and WaitForLog functions have the expectation of finding data (at least one log).

To expectzero logs we can use AssertLogMissing.

Copy link
Member

@jefferbrecht jefferbrecht Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "stop as soon as we find one log" optimization is something you can only do when the caller is interested in finding at least one log. You cannot do that optimization in all other scenarios.

Consider the use cases we have today:

  1. I wrote exactly one log to the Ops Agent, and I expect it to show up in the backend eventually, and I don't care whether it is ingested multiple times -- most of our tests do this
  2. I wrote exactly one log to the Ops Agent, and I expect it to show up in the backend eventually, and I do care whether it is ingested multiple times -- this is the test we're talking about here
  3. I wrote exactly one log to the Ops Agent, and I've configured the Ops Agent to drop that log, so I expect that log to never show up in the backend -- this would be the exclude_logs tests

For (1), you are less bound by eventual consistency because you don't care what happens after you find the log you sent, so you're able to aggressively optimize the check by querying many times in quick succession until you find it.

For (2), it would be an error to stop at the first log because you can't assume that all logs will appear in the backend at the same time. It is possible -- even moreso for logs sent in different batches like we're doing -- that the second log could appear much later than the first. So the caller is required to wait an appropriate amount of time first for eventual consistency and then query for all of the logs. And at that point there's zero value in looping until you find a log: the assumption is that the backend is now fully consistent, because we already waited for it, so the log is either there or it isn't. If the query still returns zero logs after waiting for eventual consistency then I want the test to fail, because that means I need to fix my test (or the Ops Agent is actually broken).

(3) is simpler relative of (2): you need to wait for eventual consistency and then check that there are no logs. So you could reimplement AssertLogMissing in terms of QueryAllLogs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation! I see now the value of returning the successful query data in (2) and (3). Also, this way (3) and (2) are very similar indeed.

Result : I implemented the following :

  • (1) : Implemented of QueryLog that returns the first found log.
  • (2) : Implemented QueryAllLogs that returns the first successful query and does retries only for "retriable" errors.
  • (3) : Implemented AssertLogMissing using QueryAllLogs.

@franciscovalentecastro franciscovalentecastro changed the title [gce_testing] Update hasMatchingLog to return all found logs and create QueryAllLogs. [gce_testing] Create findMatchingLogs to return all found logs and create QueryAllLogs. Jan 19, 2026
@franciscovalentecastro franciscovalentecastro force-pushed the fcovalente-hasmatchinglog-count branch from 32b459e to fb6c69a Compare January 26, 2026 17:24
@franciscovalentecastro franciscovalentecastro force-pushed the fcovalente-hasmatchinglog-count branch from 695408d to 4201281 Compare January 27, 2026 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants