Google Logging API Limit not reasonable for Snakemake  #14

@vsoch

Description

Problem: we need to be able to stream logs for a job to a .snakemake log file. Currently we get job statuses and print them to the console, but for use cases that run entirely in Python (e.g., a test), and for provenance / keeping this metadata, we need the file written.

Google has a Logging API, but its rate limits make it unreasonable to use for this case. The entries.list endpoint is capped at 60 requests per minute, and each request returns a single line of a log that can be many thousands of lines long (for reference, a hello world test workflow produces over 3K lines).

> Number of entries.list requests: 60 per minute, per Google Cloud project
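To make the quota concrete, here is a back-of-envelope check, under the assumption stated above that each entries.list request yields one log line:

```python
# Assumption (from the observation above): one entries.list call returns
# one log line, and the quota is 60 calls per minute per project.
QUOTA_PER_MINUTE = 60
LOG_LINES = 3000  # approximate size of a hello world workflow log

minutes_to_stream = LOG_LINES / QUOTA_PER_MINUTE
print(minutes_to_stream)  # 50.0 minutes for a single small workflow
```

So even a trivial single-job workflow would take most of an hour to stream at the quota ceiling, before any other consumer of the same project quota is considered.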

This means that, in practice, anything with more than one job (e.g., Snakemake running multiple steps), or any client that does not sleep enough between requests, will hit the limit very quickly (I did in a hello world test run). Given this limit and the needs of Snakemake, this is currently not reasonable to add. We are adding a helper script in #13 that can (with a sleep) stream logs, very slowly, for a single job, but ideally we could add this to the core of the executor to stream all logs for all steps.
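The "with a sleep" approach above amounts to client-side throttling. A minimal sketch of that idea (illustrative only; `fetch_entry` is a hypothetical stand-in for the real entries.list call the helper in #13 would make):

```python
import time


def throttled(calls_per_minute):
    """Decorator that spaces out calls to stay under a per-minute quota."""
    min_interval = 60.0 / calls_per_minute

    def wrap(fn):
        last_call = [0.0]  # mutable cell holding the last call timestamp

        def inner(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)

        return inner

    return wrap


@throttled(calls_per_minute=60)  # stay under the documented quota
def fetch_entry(page_token=None):
    # Hypothetical stand-in: the real helper would call the Logging API's
    # entries.list here and page through results one entry at a time.
    ...
```

Note this only protects a single process; multiple Snakemake steps polling in parallel would still share (and exhaust) the same per-project quota, which is the core problem described above.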

I'm seeing that we are able to create "sinks" (https://cloud.google.com/logging/docs/routing/overview#sinks), and I am worried the behavior above is because Google wants us to route logs to, say, Pub/Sub and then retrieve them from there at a more reasonable rate (while paying more for the extra service). If this is the only option we won't have a choice, but it adds complexity to the executor by requiring use (and activation) of another API, and it introduces another source of billing. We will ping Google for advice on the options we have here.
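For reference, the sink route would look roughly like the following (a sketch, assuming a Pub/Sub topic already exists in a project `MY_PROJECT`; the sink name, topic name, and filter are all placeholders, not anything this project has decided on):

```shell
# Route matching log entries to a Pub/Sub topic via a Cloud Logging sink,
# so they can be consumed without going through the entries.list quota.
# MY_PROJECT and the topic/sink names below are hypothetical placeholders.
gcloud logging sinks create snakemake-log-sink \
  pubsub.googleapis.com/projects/MY_PROJECT/topics/snakemake-logs \
  --log-filter='resource.type="gce_instance"'
```

This is exactly the added complexity noted above: the Pub/Sub API must be enabled, the sink's service account must be granted publish permission on the topic, and the topic itself is billed separately.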
