Skip to content

fix: friendly logging #220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

tristan-f-r
Copy link
Collaborator

@tristan-f-r tristan-f-r commented May 22, 2025

This adds:

  • context for errors when fetching dockerd (without this, the error is some arbitrary 401 unauthorized error with no context, as the stacktrace was originally hidden.)
  • indentation on stdout and stderr for docker and singularity errors
  • proper formatting of docker and singularity errors instead of the usual one-line concatenated string that is usually printed.

Closes #51. In regards to that issue, it does not approach checking if the daemon has started 'optimally' (rather, it only tries to run docker and throws a friendly error if it can't run a container at all, nor does it try to do the docker detection at the start of the program.)

This adds a note for context (what errors are coming from wrapping the docker daemon, what are coming from actually running the container),
and prints the stacktrace instead of just the error message.
@tristan-f-r tristan-f-r changed the title chore: format docker errors fix: format docker errors May 22, 2025
@tristan-f-r tristan-f-r changed the title fix: format docker errors fix: friendly logging May 26, 2025
@tristan-f-r
Copy link
Collaborator Author

Note that before this PR, errors were usually on one line like so (in reference to #78):

E           docker.errors.ContainerError: Command '['OmicsIntegrator', '-e', 'test/OmicsIntegrator2/input/oi2-edges.txt', '-p', 'test/OmicsIntegrator2/input/oi2-prizes.txt', '-o', 'test/OmicsIntegrator2/output', '--filename', 'oi2', '-w', '5', '-b', '1', '-g', '3', '-noise', '0.1', '--noisy_edges', '0', '--random_terminals', '0', '--dummyMode', 'terminals', '--seed', '2']' in image 'reedcompbio/omics-integrator-2' returned non-zero exit status 1: b'04:07:36 - OI2: INFO - Duplicated gene symbols in the prize file (we\'ll keep the max prize):\n04:07:36 - OI2: INFO - []\n04:07:36 - OI2: INFO - Members of the prize file not present in the interactome:\n04:07:36 - OI2: INFO - []\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn\n    (self._dns_host, self.port), self.timeout, **extra_kw\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection\n    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):\n  File "/opt/conda/envs/oi2/lib/python3.6/socket.py", line 745, in getaddrinfo\n    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):\nsocket.gaierror: [Errno -2] Name or service not known\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen\n    chunked=chunked,\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 382, in _make_request\n    self._validate_conn(conn)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn\n    conn.connect()\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 353, in connect\n    conn = self._new_conn()\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn\n    self, "Failed to establish a new connection: %s" % e\nurllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/adapters.py", line 449, in send\n    timeout=timeout\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen\n    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host=\'code.jquery.com\', port=443): Max retries exceeded with url: /jquery-3.2.1.slim.min.js (Caused by NewConnectionError(\'<urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\',))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/bin/OmicsIntegrator", line 8, in <module>\n    sys.exit(main())\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/OmicsIntegrator/__main__.py", line 83, in main\n    oi.output_networkx_graph_as_interactive_html(augmented_forest, args.output_dir, args.filename+\'.html\')\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/OmicsIntegrator/graph.py", line 889, in output_networkx_graph_as_interactive_html\n    filename=filename)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 450, in graph\n    scripts_block = _scripts_block(scripts, scripts_mode, output_dir)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 81, in _scripts_block\n    script_block = \'\\n\'.join([f"<script>{requests.get(script).text}</script>" for script in scripts])\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 81, in <listcomp>\n    script_block = \'\\n\'.join([f"<script>{requests.get(script).text}</script>" for script in scripts])\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/api.py", line 76, in get\n    return request(\'get\', url, params=params, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/api.py", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/sessions.py", line 542, in request\n    resp = self.send(prep, **send_kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/sessions.py", line 655, in send\n    r = adapter.send(request, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/adapters.py", line 516, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPSConnectionPool(host=\'code.jquery.com\', port=443): Max retries exceeded with url: /jquery-3.2.1.slim.min.js (Caused by NewConnectionError(\'<urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\',))\nERROR conda.cli.main_run:execute(33): Subprocess for \'conda run [\'OmicsIntegrator\', \'-e\', \'test/OmicsIntegrator2/input/oi2-edges.txt\', \'-p\', \'test/OmicsIntegrator2/input/oi2-prizes.txt\', \'-o\', \'test/OmicsIntegrator2/output\', \'--filename\', \'oi2\', \'-w\', \'5\', \'-b\', \'1\', \'-g\', \'3\', \'-noise\', \'0.1\', \'--noisy_edges\', \'0\', \'--random_terminals\', \'0\', \'--dummyMode\', \'terminals\', \'--seed\', \'2\']\' command failed.  (See above for error)\n'

Now, we have slightly formatted errors:

Running DOMINO on framework "docker" with command: domino --active_genes_files /spras/QTS52HB/active_genes.txt --network_file /spras/KIFIAJG/network.txt --slices_file /spras/DHQFXEU/slices.txt --output_folder /spras/OGZE5QT/data1-domino-params-V3X4RW7 --use_cache false --parallelization 1 --visualization true --slice_threshold 0.3 --module_threshold 0.05
(Command formatted as list: `['domino', '--active_genes_files', '/spras/QTS52HB/active_genes.txt', '--network_file', '/spras/KIFIAJG/network.txt', '--slices_file', '/spras/DHQFXEU/slices.txt', '--output_folder', '/spras/OGZE5QT/data1-domino-params-V3X4RW7', '--use_cache', 'false', '--parallelization', '1', '--visualization', 'true', '--slice_threshold', '0.3', '--module_threshold', '0.05']`)
An unexpected non-zero exit status (1) inside the docker image docker.io/reedcompbio/domino occurred:
    Traceback (most recent call last):
      File "/usr/local/bin/domino", line 8, in <module>
        sys.exit(main_domino())
      File "/usr/local/lib/python3.7/site-packages/src/runner.py", line 36, in main_domino
        G_final_modules=domino_main(active_genes_file=cur_ag, network_file=network_file, slices_file=slices_file, slice_threshold=slice_threshold, module_threshold=module_threshold)
      File "/usr/local/lib/python3.7/site-packages/src/core/domino.py", line 349, in main
        ".")[0] + ".pkl"))
      File "/usr/local/lib/python3.7/site-packages/src/core/domino.py", line 83, in prune_network_by_modularity
        G_modularity = nx.algorithms.operators.union_all(G_modules)
      File "/usr/local/lib/python3.7/site-packages/networkx/algorithms/operators/all.py", line 61, in union_all
        raise ValueError('cannot apply union_all to an empty list')
    ValueError: cannot apply union_all to an empty list

@ntalluri
Copy link
Collaborator

ntalluri commented May 27, 2025

Note that before this PR, errors were usually on one line like so (in reference to #78):

E           docker.errors.ContainerError: Command '['OmicsIntegrator', '-e', 'test/OmicsIntegrator2/input/oi2-edges.txt', '-p', 'test/OmicsIntegrator2/input/oi2-prizes.txt', '-o', 'test/OmicsIntegrator2/output', '--filename', 'oi2', '-w', '5', '-b', '1', '-g', '3', '-noise', '0.1', '--noisy_edges', '0', '--random_terminals', '0', '--dummyMode', 'terminals', '--seed', '2']' in image 'reedcompbio/omics-integrator-2' returned non-zero exit status 1: b'04:07:36 - OI2: INFO - Duplicated gene symbols in the prize file (we\'ll keep the max prize):\n04:07:36 - OI2: INFO - []\n04:07:36 - OI2: INFO - Members of the prize file not present in the interactome:\n04:07:36 - OI2: INFO - []\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn\n    (self._dns_host, self.port), self.timeout, **extra_kw\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection\n    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):\n  File "/opt/conda/envs/oi2/lib/python3.6/socket.py", line 745, in getaddrinfo\n    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):\nsocket.gaierror: [Errno -2] Name or service not known\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen\n    chunked=chunked,\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 382, in _make_request\n    self._validate_conn(conn)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn\n    conn.connect()\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 353, in connect\n    conn = self._new_conn()\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn\n    self, "Failed to establish a new connection: %s" % e\nurllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/adapters.py", line 449, in send\n    timeout=timeout\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen\n    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host=\'code.jquery.com\', port=443): Max retries exceeded with url: /jquery-3.2.1.slim.min.js (Caused by NewConnectionError(\'<urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\',))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/conda/envs/oi2/bin/OmicsIntegrator", line 8, in <module>\n    sys.exit(main())\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/OmicsIntegrator/__main__.py", line 83, in main\n    oi.output_networkx_graph_as_interactive_html(augmented_forest, args.output_dir, args.filename+\'.html\')\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/OmicsIntegrator/graph.py", line 889, in output_networkx_graph_as_interactive_html\n    filename=filename)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 450, in graph\n    scripts_block = _scripts_block(scripts, scripts_mode, output_dir)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 81, in _scripts_block\n    script_block = \'\\n\'.join([f"<script>{requests.get(script).text}</script>" for script in scripts])\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/axial/axial.py", line 81, in <listcomp>\n    script_block = \'\\n\'.join([f"<script>{requests.get(script).text}</script>" for script in scripts])\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/api.py", line 76, in get\n    return request(\'get\', url, params=params, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/api.py", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/sessions.py", line 542, in request\n    resp = self.send(prep, **send_kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/sessions.py", line 655, in send\n    r = adapter.send(request, **kwargs)\n  File "/opt/conda/envs/oi2/lib/python3.6/site-packages/requests/adapters.py", line 516, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPSConnectionPool(host=\'code.jquery.com\', port=443): Max retries exceeded with url: /jquery-3.2.1.slim.min.js (Caused by NewConnectionError(\'<urllib3.connection.HTTPSConnection object at 0x7ffb6fbc3ba8>: Failed to establish a new connection: [Errno -2] Name or service not known\',))\nERROR conda.cli.main_run:execute(33): Subprocess for \'conda run [\'OmicsIntegrator\', \'-e\', \'test/OmicsIntegrator2/input/oi2-edges.txt\', \'-p\', \'test/OmicsIntegrator2/input/oi2-prizes.txt\', \'-o\', \'test/OmicsIntegrator2/output\', \'--filename\', \'oi2\', \'-w\', \'5\', \'-b\', \'1\', \'-g\', \'3\', \'-noise\', \'0.1\', \'--noisy_edges\', \'0\', \'--random_terminals\', \'0\', \'--dummyMode\', \'terminals\', \'--seed\', \'2\']\' command failed.  (See above for error)\n'

Now, we have slightly formatted errors:

Running DOMINO on framework "docker" with command: domino --active_genes_files /spras/QTS52HB/active_genes.txt --network_file /spras/KIFIAJG/network.txt --slices_file /spras/DHQFXEU/slices.txt --output_folder /spras/OGZE5QT/data1-domino-params-V3X4RW7 --use_cache false --parallelization 1 --visualization true --slice_threshold 0.3 --module_threshold 0.05
(Command formatted as list: `['domino', '--active_genes_files', '/spras/QTS52HB/active_genes.txt', '--network_file', '/spras/KIFIAJG/network.txt', '--slices_file', '/spras/DHQFXEU/slices.txt', '--output_folder', '/spras/OGZE5QT/data1-domino-params-V3X4RW7', '--use_cache', 'false', '--parallelization', '1', '--visualization', 'true', '--slice_threshold', '0.3', '--module_threshold', '0.05']`)
An unexpected non-zero exit status (1) inside the docker image docker.io/reedcompbio/domino occurred:
    Traceback (most recent call last):
      File "/usr/local/bin/domino", line 8, in <module>
        sys.exit(main_domino())
      File "/usr/local/lib/python3.7/site-packages/src/runner.py", line 36, in main_domino
        G_final_modules=domino_main(active_genes_file=cur_ag, network_file=network_file, slices_file=slices_file, slice_threshold=slice_threshold, module_threshold=module_threshold)
      File "/usr/local/lib/python3.7/site-packages/src/core/domino.py", line 349, in main
        ".")[0] + ".pkl"))
      File "/usr/local/lib/python3.7/site-packages/src/core/domino.py", line 83, in prune_network_by_modularity
        G_modularity = nx.algorithms.operators.union_all(G_modules)
      File "/usr/local/lib/python3.7/site-packages/networkx/algorithms/operators/all.py", line 61, in union_all
        raise ValueError('cannot apply union_all to an empty list')
    ValueError: cannot apply union_all to an empty list

Does this also make all the outputs (not just errors) from the docker containers look pretty? (It's fine if it doesn't)

@tristan-f-r
Copy link
Collaborator Author

tristan-f-r commented May 27, 2025

It does 👍

@tristan-f-r
Copy link
Collaborator Author

This broke all of the singularity test cases - I'll fix those now.

stderr comes in buffer for docker, but not singularity? hm
@tristan-f-r
Copy link
Collaborator Author

Hm, it seems that these changes made some of the errors from Singularity that were usually just caught and printed bleed to the main python context - I'll have to install Singularity locally to properly fix these cases.

formatted singularity errors!
@tristan-f-r
Copy link
Collaborator Author

tristan-f-r commented May 27, 2025

Took a good while to get singularity installed, but that latest latest latest commit works locally 👍

Also, the latest commit does pretty print singularity output as well!

@tristan-f-r
Copy link
Collaborator Author

tristan-f-r commented May 27, 2025

(CI failed but this time because of (now fixed) style issues! All tests pass 👍 )

except docker.errors.ContainerError as err:
print(f"(Command formatted as list: `{err.command}`)")
print(f"An unexpected non-zero exit status ({err.exit_status}) inside the docker image {err.image} occurred:")
err = str(err.stderr if err.stderr is not None else "", "utf-8")
Copy link
Collaborator Author

@tristan-f-r tristan-f-r Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary piece about this PR, as mentioned in slack, is that these err: str strings are constructed per container runtime, and can be easily handled. In another PR, we can have these errors be filtered for messages that we know are thrown by containers in certain situations, and we can handle them appropriately, while sufficiently raising exceptions of unknown errors to stop the Snakemake runtime and report those issues to SPRAS as a bug.

The current way that errors are handled have caused Aden and I a good amount of trouble when wrapping algorithms, since it's quite hard to tell that the output even is an error, and, if the algorithm that's being wrapped happens to touch the output file, it can sometimes make tests pass when they should fail. Even here, for example - if I were to raise this error instead of just printing it, it would break some tests, which is mainly why I wanted to break this into two separate PRs. 👍

@jhiemstrawisc jhiemstrawisc self-requested a review June 5, 2025 21:55
Copy link
Collaborator

@jhiemstrawisc jhiemstrawisc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. You're free to take them or leave them!

Otherwise, looks good to me.

Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and questions. I have not yet tested it locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure misc. changes made to SPRAS itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test Docker is running at startup
4 participants