Not all logs are captured correctly #150

@abought

Summary

We have noticed that not all key system logs are captured in cloudgene.log. Some information, such as system startup messages, malformed URLs, and some application exceptions, is sent to stdout/the console instead. Depending on how the server is launched, this output may end up in bootstrap script logs, nohup.out, or systemd logs.

In some cases, stdout is the only place where this information can be found.

Impact

This makes it more difficult to:

  • Detect system bugs (logging info not captured in cloudgene.log)
  • Diagnose malicious activity, including fuzzing attempts, that leads to elevated 500 errors
  • Diagnose general user support incidents, such as job failures whose cause is not written to any other system log
  • Correlate evidence of the above with other logging info, which currently requires a lot of manual alignment of separate logs

Examples

Processing file '/file/path/REDACTED'...

Processed 123 variants
1 variants failed to liftover
0 variants removed (indels on negative strand)
0 variants removed (different chromosome)
122 variants lifted over

[tabix] was bgzip used to compress this file?

java.lang.NullPointerException
at cloudgene.mapred.api.v2.users.LoginUser
(stacktrace redacted)

[redacted] ERROR org.restlet.WebServer.WebApp - Exception or error caught in server resource
org.restlet.resource.ResourceException: Internal Server Error (500) - The server encountered an unexpected condition which prevented it from fulfilling the request

java.lang.IllegalArgumentException: Writing VCF version VCF4_3 is not implemented

java.io.IOException: /mnt/jobs/job-20240702-234717-385/input/files/COL-chr-1.vcf.gz: Line 24862: No GT field found in FORMAT column.

Caused by: java.net.URISyntaxException: Illegal character in path at index 36: file:////home/hadoop/webapp/;;https://interact.sh%22/onmouseover=%22alert(1)%5Cu0007example%5Cu001B]8;;%5Cu0007

Remediations

  • Ensure that a default logger is configured correctly and verify that logs are clean following the fix
  • Add the relevant "other" log location to any configured log capture/export agents (Splunk, CloudWatch, etc.)
  • I mean, honestly, look into some of these error messages
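
As a rough illustration of the first remediation, here is a minimal sketch of one way to route stray console output into a log file. It assumes plain java.util.logging rather than whatever logging framework Cloudgene actually uses (this issue does not say), and the names ConsoleCapture, LoggingStream, install, and fileLogger are hypothetical, not part of the Cloudgene codebase. The idea is to replace System.out and System.err with streams that forward each complete line to a logger, so output from code that bypasses the logger still lands in the same file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper, not part of Cloudgene: forwards console output to a logger.
class ConsoleCapture {

    /** Buffers bytes until a newline, then logs the completed line. */
    static final class LoggingStream extends OutputStream {
        private final Logger logger;
        private final Level level;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        LoggingStream(Logger logger, Level level) {
            this.logger = logger;
            this.level = level;
        }

        @Override
        public synchronized void write(int b) {
            if (b == '\n') {
                emit();
            } else if (b != '\r') {
                buffer.write(b);
            }
        }

        @Override
        public synchronized void close() {
            emit(); // flush a trailing line that had no newline
        }

        private void emit() {
            // Decode with the platform default charset, matching the PrintStream below.
            String line = buffer.toString();
            buffer.reset();
            if (!line.isEmpty()) {
                logger.log(level, line);
            }
        }
    }

    /** Builds a logger that writes only to the given file (the path is an assumption). */
    static Logger fileLogger(String path) throws IOException {
        Logger logger = Logger.getLogger("console.capture");
        // Skip the root ConsoleHandler: it writes to stderr and could loop back into us.
        logger.setUseParentHandlers(false);
        FileHandler handler = new FileHandler(path, true); // append mode
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        return logger;
    }

    /** Redirects stdout/stderr and uncaught exceptions to the logger. */
    static void install(Logger logger) {
        System.setOut(new PrintStream(new LoggingStream(logger, Level.INFO), true));
        System.setErr(new PrintStream(new LoggingStream(logger, Level.SEVERE), true));
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
                logger.log(Level.SEVERE, "Uncaught exception in " + thread.getName(), error));
    }
}
```

Calling `ConsoleCapture.install(ConsoleCapture.fileLogger("cloudgene.log"))` early in startup would be one way to wire this in. Output written by external processes (for example, a tabix binary writing directly to the terminal) would not pass through System.out and would still need to be captured where the process is launched.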


Labels: bug
