Commit a222429

Add logging documentation, including vipin-created logs (#199)
- *Category*: documentation
- *JIRA issue*: [MIC-4757](https://jira.ihme.washington.edu/browse/MIC-4757)

Changes and notes

- Adds a logging page to the docs that describes the logs in general, with a section for each "area" of logs.
1 parent e8e6b73 commit a222429

5 files changed: 51 additions, 3 deletions

.zenodo.json

Lines changed: 1 addition & 1 deletion

@@ -39,5 +39,5 @@
 }
 ],
 "access_right": "open",
-"description": "Archival release of Vivarium Cluster Tools, a Python package that makes running ``vivarium`` simulations at scale on a Univa Grid Engine cluster easy."
+"description": "Archival release of Vivarium Cluster Tools, a Python package that makes running ``vivarium`` simulations at scale on a Slurm cluster easy."
 }

CHANGELOG.rst

Lines changed: 4 additions & 0 deletions

@@ -1,3 +1,7 @@
+**1.5.1 - 12/15/23**
+
+- Add logging documentation for psimulate
+
 **1.5.0 - 10/27/23**
 
 - Remove default results directory for 'psimulate run'

README.rst

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ Vivarium Cluster Tools
 :alt: Documentation Status
 
 Vivarium cluster tools is a python package that makes running ``vivarium``
-simulations at scale on a Univa Grid Engine cluster easy.
+simulations at scale on a Slurm cluster easy.
 
 Installation
 ------------

docs/source/index.rst

Lines changed: 2 additions & 1 deletion

@@ -7,7 +7,7 @@
 Vivarium Cluster Tools Documentation
 ====================================
 Vivarium cluster tools is a python package that makes running ``vivarium``
-simulations at scale on a Univa Grid Engine cluster easy.
+simulations at scale on a Slurm cluster easy.
 
 .. toctree::
 :maxdepth: 2
@@ -16,5 +16,6 @@ simulations at scale on a Univa Grid Engine cluster easy.
 distributed_runner
 yaml_basics
 branch
+logging
 api_reference/index
 glossary

docs/source/logging.rst

Lines changed: 43 additions & 0 deletions

@@ -0,0 +1,43 @@

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Logging
============

Sometimes, even with perfect code, things go wrong at sufficient scale.
When they do, the logs are the place to look to see what happened. ``psimulate``
writes its logs to a subdirectory called ``logs`` inside the results directory.
Inside ``logs``, there is a separate directory for each invocation of
``psimulate run``, ``psimulate restart``, or ``psimulate expand``. If the run was
never restarted or expanded, there will be only one directory.
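
The layout described above can be walked with a few lines of Python. The following is a minimal sketch and not part of ``psimulate``: the helper name ``list_run_log_dirs`` is hypothetical, and it assumes only that each subdirectory of ``logs`` belongs to one ``psimulate`` invocation, without relying on how those subdirectories are named.

```python
from pathlib import Path


def list_run_log_dirs(results_dir):
    """Return the per-invocation log directories under ``<results_dir>/logs``.

    Hypothetical helper: assumes each subdirectory of ``logs`` holds the logs
    of one ``psimulate`` run, restart, or expand, whatever its name.
    """
    logs_dir = Path(results_dir) / "logs"
    if not logs_dir.is_dir():
        # No logs written yet, or not a psimulate results directory.
        return []
    # Keep only directories, sorted by name for a deterministic listing.
    return sorted(path for path in logs_dir.iterdir() if path.is_dir())
```

For example, if ``list_run_log_dirs(results_dir)`` returns a single directory, the run was presumably never restarted or expanded.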

Top-level logs
----------------
At the top level of each run's log directory, there are main log files in both
plain-text and JSON format. These are the logs for the runner process. There is
also a log file for each Redis database process, named ``redis.p<port>.log``.
Per-worker logs are in the ``cluster_logs`` and ``worker_logs`` directories,
described below.
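
Because the Redis logs follow the ``redis.p<port>.log`` naming pattern, the port of each database can be recovered from the file name. The sketch below is illustrative only: ``redis_logs_by_port`` is a hypothetical helper, and ``run_log_dir`` is assumed to be one of the per-run directories inside ``logs``.

```python
import re
from pathlib import Path

# The documented naming pattern for Redis logs: redis.p<port>.log
REDIS_LOG_NAME = re.compile(r"redis\.p(\d+)\.log")


def redis_logs_by_port(run_log_dir):
    """Map each Redis port to its log file in a single run's log directory.

    Hypothetical helper; only top-level entries are examined, so the
    ``cluster_logs`` and ``worker_logs`` subdirectories are ignored.
    """
    logs = {}
    for path in Path(run_log_dir).iterdir():
        # fullmatch rejects names that merely contain the pattern.
        match = REDIS_LOG_NAME.fullmatch(path.name)
        if match and path.is_file():
            logs[int(match.group(1))] = path
    return logs
```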

Cluster logs
-------------
The ``cluster_logs`` directory contains logs from the array job processes, with one
file per worker job. Their contents are a superset of what you will find in the
``worker_logs`` directory: they additionally contain information about Redis
heartbeats and other cluster-related activity.

Worker logs
-------------
The ``worker_logs`` directory contains logs from the worker processes as they relate
to running simulations. This directory also contains the performance logs described
in the next section.

Performance logs
-----------------
As part of the VIPIN (VIvarium Performance INformation) feature, ``psimulate`` gathers
per-worker performance information. At the end of the parallel runs, this information is
summarized and stored in the ``worker_logs`` directory as ``log_summary.csv``. The file
contains metadata identifying the run and the worker host, execution timing information, and CPU,
disk, and network performance counters. This logging is intended to help users understand the
performance characteristics of their simulations and, when performance looks suspicious,
to correlate outliers with cluster and hardware events.
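
One way to start looking for outliers in ``log_summary.csv`` is sketched below using only the Python standard library. The column names in ``log_summary.csv`` are not listed here, so the caller passes the column to inspect; any name used with this sketch (such as ``exec_time``) is a hypothetical placeholder. The outlier rule, more than ``threshold`` median absolute deviations from the median, is one common robust choice, not something ``psimulate`` prescribes.

```python
import csv
import statistics
from pathlib import Path


def flag_outliers(summary_csv, column, threshold=3.0):
    """Return the rows of ``summary_csv`` whose ``column`` value is an outlier.

    A value counts as an outlier when it lies more than ``threshold`` median
    absolute deviations (MADs) from the column's median. ``column`` is supplied
    by the caller because the real column names are not assumed here.
    """
    with Path(summary_csv).open(newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return []
    values = [float(row[column]) for row in rows]
    median = statistics.median(values)
    mad = statistics.median(abs(value - median) for value in values)
    if mad == 0:
        # At least half the values equal the median: flag any that differ from it.
        return [row for row, value in zip(rows, values) if value != median]
    return [
        row
        for row, value in zip(rows, values)
        if abs(value - median) / mad > threshold
    ]
```

For example, ``flag_outliers(summary_path, column="exec_time")``, with ``summary_path`` pointing at a run's ``worker_logs/log_summary.csv``, returns the rows for workers whose value stands out; the host metadata in those rows can then be checked against cluster and hardware events. The median and MAD are used instead of the mean and standard deviation so that a single extreme worker does not inflate the baseline it is compared against.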
