The Health Monitor (HMON) interface to the Dyalog interpreter allows a client application or tool to connect and monitor its activity, memory usage etc., to gauge its "state of health". The communications protocol between the interpreter and client is described in this document.
THE HEALTH MONITOR INTERFACE IS UNDER CONTINUING DEVELOPMENT AND IS INCLUDED IN DYALOG 19.0 AND 20.0 AS AN EXPERIMENTAL FEATURE. FURTHER DEVELOPMENT IS PLANNED FOR SUBSEQUENT RELEASES AND THIS MAY RESULT IN CHANGES TO THE INTERFACE WHICH ARE INCOMPATIBLE WITH THE SPECIFICATION DESCRIBED HERE.
Note
Features which differ between different versions of Dyalog are highlighted like this.
The interpreter to be monitored may either listen for incoming client
connections or initiate a connection to a client. It will not do either unless
explicitly configured to do so using either the HMON_INIT config setting
(prior to startup) or 112⌶ (from within the interpreter).
The connection mode, address and port on which the interpreter should listen or connect (the Interface Configuration) is determined by a character sequence of the form:
mode:address:port
where:
- mode is the connection mode - either SERVE (listen for incoming connections) or POLL (initiate outgoing connection).
- address is the address of an interface on the local machine on which the interpreter should listen (SERVE mode) or the address of a machine running a client to which the interpreter should connect (POLL mode).
- port is the TCP port to listen on or connect to.
In SERVE mode address may be specified as follows:
- <empty> – listen on all loopback interfaces, that is, the interpreter only accepts connections from the local machine.
- * – listen on all local machine interfaces, that is, the interpreter listens for connections from any (local or remote) machine/interface.
- The host/DNS name of the machine/interface running the interpreter – listen on that specific interface on the local machine.
- The IPv4 address of the machine/interface running the interpreter – listen on that specific interface on the local machine.
- The IPv6 address of the machine/interface running the interpreter – listen on that specific interface on the local machine.
In POLL mode address may be specified as follows:
- <empty> – connect to a client on the local machine.
- The host/DNS name of the machine/interface running the client.
- The IPv4 address of the machine/interface running the client.
- The IPv6 address of the machine/interface running the client.
For example:
HMON_INIT="SERVE:localhost:4512"
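As an illustration of the mode:address:port form, the following Python sketch splits an Interface Configuration string into its three fields. It is not part of the interpreter; the names `InterfaceConfig` and `parse_interface_config` are hypothetical, and it assumes that the mode is written in upper case and that an IPv6 address is given without brackets, neither of which is stated above.

```python
from typing import NamedTuple


class InterfaceConfig(NamedTuple):
    mode: str      # "SERVE" or "POLL"
    address: str   # may be empty, "*", a host name or an IP address
    port: int      # TCP port to listen on or connect to


def parse_interface_config(text: str) -> InterfaceConfig:
    """Split 'mode:address:port' into its three fields.

    Assumption: the mode is everything before the first colon and the port
    is everything after the last colon, so an address that itself contains
    colons (IPv6) stays intact in the middle.
    """
    mode, first_sep, rest = text.partition(":")
    address, last_sep, port = rest.rpartition(":")
    if not first_sep or not last_sep:
        raise ValueError(f"expected mode:address:port, got {text!r}")
    if mode not in ("SERVE", "POLL"):
        raise ValueError(f"unknown connection mode {mode!r}")
    return InterfaceConfig(mode, address, int(port))
```

For example, `parse_interface_config("SERVE:localhost:4512")` yields the mode `SERVE`, the address `localhost` and the port `4512`.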
The level of information made available by the interpreter is determined by an Access Level:
- 0 - Disallow connections.
- 1 - Permit connections, with restricted information provided and restricted permissions once connected.
- 2 - Permit connections, with full information provided but restricted permissions once connected.
- 3 - Permit connections, with full information provided and full permissions once connected.
Information which is not available when restricted includes anything which exposes the code of the application being run, such as the SI stack. Information such as memory usage, number of running threads etc. is not restricted and is always available once a connection is established.
Permissions which are not available when restricted include anything which can control (rather than observe) the interpreter, such as establishing a connection to RIDE.
The Access Level defaults to 1 with runtime interpreters and 2 with development interpreters. Access level 3 is never enabled by default.
Some information which the interpreter provides requires it to perform additional work (which slows it down) even when a connected client is not requesting it. Whether it does this or not is controlled by setting an Event Gathering Level.
- 0 - Do not gather information for GetLastKnownState requests.
- 1 - Gather information for GetLastKnownState requests - has a runtime performance impact.
The Event Gathering Level defaults to 0.
The Access Level and Event Gathering Level may be altered from within the
interpreter using 112⌶. Alternatively, 112⌶ may be used to
set the interface configuration and the Levels without setting HMON_INIT prior
to starting the interpreter, or to override any existing HMON_INIT setting.
A message starts with a 4-byte big-endian total length field, followed by
the four ASCII-encoded characters HMON and a UTF-8-encoded payload:
Total length "HMON" Payload
┌───────────────────┬───────────────────┬─────~─────┐
│0x00 0x00 0x00 0x0B│0x48 0x4D 0x4F 0x4E│ ... │
└───────────────────┴───────────────────┴─────~─────┘
Total length is therefore 8 + the byte length of the payload.
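The framing described above can be sketched in Python as follows. The function names `encode_frame` and `decode_frame` are hypothetical; the layout (4-byte big-endian total length, the characters HMON, then the UTF-8 payload) is taken from the description above.

```python
import struct

MAGIC = b"HMON"  # the four ASCII characters that follow the length field


def encode_frame(payload: str) -> bytes:
    """Wrap a payload in a frame: total length, 'HMON', UTF-8 payload."""
    body = payload.encode("utf-8")
    # Total length counts the 4 length bytes and the 4 'HMON' bytes as well.
    return struct.pack(">I", 8 + len(body)) + MAGIC + body


def decode_frame(frame: bytes) -> str:
    """Check the header of one complete frame and return its payload."""
    if len(frame) < 8:
        raise ValueError("frame shorter than the 8-byte header")
    (total,) = struct.unpack(">I", frame[:4])
    if frame[4:8] != MAGIC:
        raise ValueError("missing HMON marker")
    if total != len(frame):
        raise ValueError("length field does not match the frame size")
    return frame[8:].decode("utf-8")
```

A 3-byte payload therefore produces the length field 0x00 0x00 0x00 0x0B shown in the diagram.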
The payload is almost always a 2-element JSON array consisting of a Message Name and arguments as key/value pairs:
["MessageName",{"key1":"value1","key2":222,"key3":[3,4,5]}]
The only exceptions are the first two messages that each side sends upon establishing a connection. These constitute the handshake and are not JSON-encoded. Their payloads are:
SupportedProtocols=2
UsingProtocol=2
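A client might exchange the handshake over a connected socket along the following lines. This is a sketch only: the helper names are hypothetical, and it assumes that each side may send both of its handshake messages before reading the peer's, which the description above does not specify.

```python
import socket
import struct


def send_payload(sock: socket.socket, payload: str) -> None:
    """Send one framed message (length, 'HMON', UTF-8 payload)."""
    body = payload.encode("utf-8")
    sock.sendall(struct.pack(">I", 8 + len(body)) + b"HMON" + body)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes; TCP may deliver them in several pieces."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        data += chunk
    return data


def recv_payload(sock: socket.socket) -> str:
    """Receive one framed message and return its payload as text."""
    header = recv_exact(sock, 8)
    (total,) = struct.unpack(">I", header[:4])
    if header[4:] != b"HMON" or total < 8:
        raise ValueError("not an HMON frame")
    return recv_exact(sock, total - 8).decode("utf-8")


def handshake(sock: socket.socket, version: int = 2) -> tuple:
    """Send our two handshake payloads, then read the peer's two.

    Assumption: the ordering between the two sides is not significant.
    """
    send_payload(sock, f"SupportedProtocols={version}")
    send_payload(sock, f"UsingProtocol={version}")
    return recv_payload(sock), recv_payload(sock)
```

After the handshake, the same `send_payload` and `recv_payload` helpers can carry the JSON messages.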
Messages sent by the client to the interpreter are request messages. Messages sent by the interpreter to the client are response messages.
The interpreter will usually respond to a request message by sending a response containing either the information requested, confirmation of receipt, or an error report (indicating, for example, invalid syntax or an invalid message type). It may also send response messages at any time - for example, to inform of a particular event or to provide regular updates on its condition, or if the application that the interpreter is running initiates them.
The interpreter will process most request messages it receives when it is
between execution of APL code or otherwise in a position where it can safely
access its workspace, and may therefore not react immediately. The special
request message GetLastKnownState is designed to remain
active and provide an immediate response in the event that the interpreter
is unable to respond at all to the others.
Messages must be syntactically valid JSON text and of the 2-element array form described above.
Named items in the payload are described in the relevant documentation for each message type. The interpreter will ignore any unexpected named item (so long as it is described using syntactically valid JSON). Except as noted, request messages may include the named item "UID" (a string value) and if present the response will echo the same UID value. UID strings may be used by a client as it wishes - for example, to track requests and responses; the interpreter does not require them.
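The message shape and the optional UID can be illustrated with a small Python sketch. The function names are hypothetical; the rule that StopFacts and BumpFacts may not carry a UID is taken from the descriptions of those messages.

```python
import json

# Requests documented as not accepting a UID.
NO_UID_REQUESTS = {"StopFacts", "BumpFacts"}


def build_request(name, args=None, uid=None):
    """Return the JSON payload for a request, adding a UID if one is given."""
    body = dict(args or {})
    if uid is not None:
        if name in NO_UID_REQUESTS:
            raise ValueError(f"{name} may not include a UID")
        body["UID"] = uid
    return json.dumps([name, body], separators=(",", ":"))


def parse_message(payload):
    """Split a payload into (message name, arguments), checking its shape."""
    message = json.loads(payload)
    if (not isinstance(message, list) or len(message) != 2
            or not isinstance(message[0], str)
            or not isinstance(message[1], dict)):
        raise ValueError("not a 2-element [name, object] message")
    return message[0], message[1]
```

A client that tags each request with a fresh UID can then match a response to its request by comparing the "UID" item returned by `parse_message`.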
GetFacts requests zero or more "facts" about the application state, and will be
responded to with a Facts message.
["GetFacts",{"Facts":["Host","Workspace"]}]
The following facts may be requested:
| Fact ID | Fact name |
|---|---|
| 1 | "Host" |
| 2 | "AccountInformation" |
| 3 | "Workspace" |
| 4 | "Threads" |
| 5 | "SuspendedThreads" |
| 6 | "ThreadCount" |
See the Facts response message for the information provided.
Facts may be requested using either their numeric Fact ID or their alphanumeric
(string) Fact name, so the following GetFacts requests are equivalent to the
one above:
["GetFacts",{"Facts":[1,3]}]
["GetFacts",{"Facts":["Host",3]}]
PollFacts behaves in the same way as GetFacts except that it
polls - that is, the Facts message response will be sent
immediately and then repeated at the specified or default interval.
The interval defaults to 1000ms but any value of 500ms or more may be specified. Values less than 500 will be taken as 500.
Example:
["PollFacts",{"Facts":[1,"Workspace"],"Interval":750}]
If a UID is specified it will appear in every message that is sent.
Messages will continue at the requested frequency until either a new request
is made (which will supersede any already established) or a
StopFacts message is sent.
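The interval rules for PollFacts can be sketched in Python as below. The names are hypothetical, and the clamping is only a client-side prediction of what the interpreter is described as doing; the interpreter applies the rule itself.

```python
import json

DEFAULT_INTERVAL_MS = 1000  # used when no Interval is given
MIN_INTERVAL_MS = 500       # smaller values are taken as 500


def effective_interval(requested=None):
    """Predict the polling interval the interpreter will use, in ms."""
    if requested is None:
        return DEFAULT_INTERVAL_MS
    return max(MIN_INTERVAL_MS, requested)


def poll_facts_request(facts, interval=None):
    """Build a PollFacts payload; facts may mix Fact IDs and Fact names."""
    args = {"Facts": list(facts)}
    if interval is not None:
        args["Interval"] = interval
    return json.dumps(["PollFacts", args], separators=(",", ":"))
```

Calling `poll_facts_request([1, "Workspace"], 750)` reproduces the example request above.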
Note: polling responses may occasionally stop when the interpreter is
waiting on an external event such as a file operation, ⎕NA call, etc. It is
currently also a limitation that polling messages may stop when the
interpreter is inactive - that is, when it is not running APL code or
responding to external input such as HMON requests or keyboard events.
StopFacts cancels PollFacts.
A UID may not be included in the message.
Example:
["StopFacts",{}]
A Facts message will be sent back, with an empty list of facts and
Interval set to 0.
A BumpFacts message will cause a polling Facts message to be sent,
regardless of the time remaining until the next message is due. Messages will
then continue at the normal frequency.
A UID may not be included in the message.
Example:
["BumpFacts",{}]
The Subscribe message tells the interpreter to send notification messages
when certain, specifiable, events occur. The interpreter will confirm the
settings with a Subscribed message in response, and a
Notification message whenever the subscribed events occur.
If the Subscribe message contains a UID then the Subscribed
response and all subsequent Notification messages of all
types will echo that UID.
Each subscribable event has a numeric and alphanumeric (string) identifier, and either may be used for the request. Example:
["Subscribe",{"UID":"XX","Events":[1,4]}]
No event notifications are enabled by default.
The following events may be subscribed to:
| Subscription ID | Subscription name |
|---|---|
| 1 | WorkspaceCompaction |
| 2 | WorkspaceResize |
| 3 | UntrappedSignal |
| 4 | TrappedSignal |
When a subscribed event takes place, the Notification
message will include details of that event, as documented there.
Sending a Subscribe message resets the list of subscribed events to those
specified - that is, it replaces any existing subscriptions. The list may be
empty.
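Because each Subscribe message replaces the whole subscription list, a client may find it convenient to build the message from the complete set of events it wants. The Python sketch below does this; the function name is hypothetical and the ID table is copied from the table above.

```python
import json

# Subscription IDs and names, as listed in the table above.
EVENT_IDS = {
    "WorkspaceCompaction": 1,
    "WorkspaceResize": 2,
    "UntrappedSignal": 3,
    "TrappedSignal": 4,
}


def subscribe_request(events, uid=None):
    """Build a Subscribe payload from event names and/or numeric IDs.

    The events are normalised to numeric IDs; an empty list is allowed
    and cancels all subscriptions.
    """
    ids = []
    for event in events:
        event_id = EVENT_IDS[event] if isinstance(event, str) else int(event)
        if event_id not in EVENT_IDS.values():
            raise ValueError(f"unknown subscription ID {event_id}")
        if event_id not in ids:
            ids.append(event_id)
    args = {}
    if uid is not None:
        args["UID"] = uid
    args["Events"] = ids
    return json.dumps(["Subscribe", args], separators=(",", ":"))
```

For instance, `subscribe_request(["WorkspaceCompaction", 4], uid="XX")` produces the example request shown above.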
Note: following a WSFULL exception the interpreter may be unable to send
either a TrappedSignal or UntrappedSignal.
GetLastKnownState will reliably report the last time a
WSFULL event occurred.
GetLastKnownState requests the last known state of the interpreter, and will be responded to with
a LastKnownState message.
Examples:
["GetLastKnownState",{}]
["GetLastKnownState",{"UID":"123"}]
GetLastKnownState requests can be used when a monitored interpreter becomes
otherwise unresponsive. In "normal" use, GetFacts should be used.
The "Last Known State" is provided from a repository which must be continuously maintained by the interpreter during its normal operation on the off-chance that it might be asked for at any time. This introduces an overhead so is only done if the Event Gathering Level is set to 1. Maintaining this repository is independent of whether a Health Monitor is connected at the time or not.
112⌶ controls the Event Gathering Level.
In addition, ⎕PROFILE must currently be started to provide full
information, e.g.:
⎕PROFILE 'start' 'coverage'
ConnectRide requests that the interpreter connects to RIDE listening at a given address and
on a given port, or disconnects a connection. Once the action has taken place
the request will be responded to with a RideConnection
message.
Examples:
["ConnectRide",{"Address":"localhost","Port":4502}]
If Address and Port are both present in the message, and have string and
numeric values respectively, a connection will be attempted, otherwise any
existing connection will be disconnected.
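The connect-or-disconnect rule can be sketched in Python as follows. The function name is hypothetical; the behaviour (both Address and Port present means connect, anything else means disconnect) follows the paragraph above.

```python
import json


def ride_request(address=None, port=None, uid=None):
    """Build a ConnectRide payload.

    With both address and port the interpreter will attempt a connection;
    with neither, the payload asks it to disconnect any existing one.
    """
    if (address is None) != (port is None):
        raise ValueError("give both address and port, or neither")
    args = {}
    if address is not None:
        args["Address"] = str(address)  # must be a string value
        args["Port"] = int(port)        # must be a numeric value
    if uid is not None:
        args["UID"] = uid
    return json.dumps(["ConnectRide", args], separators=(",", ":"))
```

Here `ride_request("localhost", 4502)` reproduces the example above, while `ride_request()` yields a request with an empty object, which is treated as a disconnection.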
For a connection request, HMON will send 3502⌶'CONNECT:<Address>:<port>'
to the interpreter and, if that is successful, 3502⌶1. For a disconnection
request, HMON will send 3502⌶0. See
Manage RIDE Connections
for details of how 3502⌶ behaves, the return values it produces etc.
Note: the interpreter Access Level must be set to 3 in order to permit this
request. 112⌶ controls the Access Level.
Note
"ConnectRide" is not supported by Dyalog 19.0.
The Facts message reports one or more "facts" about the application state, corresponding to a
GetFacts, PollFacts, StopFacts
or BumpFacts request.
The facts are presented as an array of objects in the same order as in the request. Each will contain a "Value" object or a "Values" array of objects, the contents of which depend on the fact type.
Example:
["Facts",{"UID":"xx","Interval":5000,"Facts":[{"ID":6,"Name":"ThreadCount","Value":{"Total":1,"Suspended":0}}]}]
"Interval" is only present in the response if polling.
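A client will typically want to look facts up by name rather than by position. The Python sketch below indexes a Facts payload in that way; the function name is hypothetical, and it relies only on the "Name", "Value", "Values" and "Interval" items described above.

```python
import json


def facts_by_name(payload):
    """Return (interval, facts) for a Facts payload.

    interval is None unless the message is a polling response; facts maps
    each fact name to its "Value" object or its "Values" array.
    """
    name, body = json.loads(payload)
    if name != "Facts":
        raise ValueError(f"expected a Facts message, got {name!r}")
    facts = {}
    for fact in body.get("Facts", []):
        facts[fact["Name"]] = fact["Value"] if "Value" in fact else fact.get("Values")
    return body.get("Interval"), facts
```

Applied to the example message above, this returns an interval of 5000 and a "ThreadCount" entry whose "Total" is 1.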
For the "Host" fact, the "Value" object contains:
- "Machine" - an object containing facts about the host machine:
- "Name" - the name of the machine.
- "User" - the name of the user account.
- "PID" - the interpreter process ID.
- "Desc" - an application-specific description - see 110⌶.
- "AccessLevel" - the level of rights the Health Monitor is permitted.
Note
"Desc" is always a string value when produced by Dyalog 19.0. It may be any value (including an object) when produced by Dyalog 20.0 onwards.
- "Interpreter" - an object containing facts about the host interpreter:
- "Version" - the interpreter version, in the form "A.B.C".
- "BitWidth" - the interpreter edition word size, either 32 or 64 (bits).
- "IsUnicode" - a Boolean value indicating whether the interpreter is a Unicode edition or not (i.e. is Classic).
- "IsRuntime" - a Boolean value indicating whether the interpreter is a Runtime edition or not (i.e. is a development version).
- "SessionUUID" - a String value containing a unique Session UUID in
RFC 9562 format (see also
113⌶0).
Note
"SessionUUID" is not provided by Dyalog 19.0.
- "CommsLayer" - an object containing facts about the interpreter comms layer servicing the Health Monitor:
- "Version" - the Comms Layer (Conga) version.
- "Address" - the interpreter's network IP address.
- "Port4" - the interpreter's network port number.
- "Port6" - an alternate port number.
- "RIDE" - an object containing facts about the interpreter comms layer servicing RIDE:
- "Listening" - a Boolean value indicating whether the interpreter is listening for RIDE connections. There are no further entries in this object if the value is 0.
- "HTTPServer" - a Boolean value indicating whether the interpreter is running as a RIDE HTTP server ("Zero footprint" RIDE).
- "Version" - the Comms Layer (Conga) version.
- "Address" - the interpreter's network IP address.
- "Port4" - the interpreter's network port number.
- "Port6" - an alternate port number.
For the "AccountInformation" fact, the "Value" object contains:
- "UserIdentification", "ComputeTime", "ConnectTime", "KeyingTime" - elements
from
⎕AI.
For the "Workspace" fact, the "Value" object contains:
- "WSID" - the workspace name.
- "Available", "Used", "Compactions", "GarbageCollections", "GarbagePockets",
"FreePockets", "UsedPockets", "Sediment", "Allocation", "AllocationHWM",
"TrapReserveWanted", "TrapReserveActual" - statistics from
2000⌶.
For the "Threads" fact, the "Values" array contains one or more objects (one per thread), each containing:
- "Tid" - thread ID
- "Stack" - SI stack, as an array of objects each containing:
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
  - "Description" - a line of SI stack information - only if "Restricted" is 0.
- "Suspended" - a Boolean value indicating whether the thread is suspended.
- "State" - a string indicating the current location of the thread.
- "Flags" - e.g. Normal, Paused or Terminated.
If "Suspended" is 1 the object will also contain:
- "DMX" - an object containing elements of ⎕DMX in that thread, either null or:
  - "Category", "DM", "EM", "EN", "ENX", "InternalLocation", "Vendor", "Message", "OSError" - only if "Restricted" is 0.
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
- "Exception" - an object containing elements of ⎕EXCEPTION in that thread, either null or:
  - "Source", "StackTrace", "Message" - only if "Restricted" is 0.
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
For the "SuspendedThreads" fact, the "Values" array contains one or more objects (one per suspended thread), each containing values as described for "Threads", with the exception that the "Suspended" item is omitted as it would always have the value 1.
For the "ThreadCount" fact, the "Value" object contains:
- "Total" - the total number of threads.
- "Suspended" - the number of threads which are suspended.
Subscribed is the response to Subscribe. The message lists all subscription
events and whether they are enabled or disabled.
Example (showing an incomplete list of subscription events):
["Subscribed",{"UID":"XX","Events":[{"ID":1,"Name":"WorkspaceCompaction","Value":1},{"ID":2,"Name":"WorkspaceResize","Value":0}]}]
A Notification message indicates that a specified event, for which
notifications have been subscribed, has taken or is taking place. The
Subscribe message is used to select notified event types. The
Notification message always reports a single event and contains the event ID
and name, along with any specific detail pertaining to that event type.
Example:
["Notification",{"UID":"XX","Event":{"ID":2,"Name":"WorkspaceResize"},"Size":894213}]
A WorkspaceCompaction notification occurs when a workspace compaction has occurred. The additional values provided are:
- "Tid" - thread ID.
- "Stack" - SI stack, as an array of objects each containing:
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
  - "Description" - a line of SI stack information - only if "Restricted" is 0.
A WorkspaceResize notification occurs when the workspace size (the amount of memory committed by the OS) has grown or shrunk. The additional values provided are:
- "Size" - new size, in bytes.
An UntrappedSignal notification occurs when an APL exception has been signalled which has not been trapped. The additional values provided are:
- "Tid" - thread ID.
- "Stack" - SI stack, as an array of objects each containing:
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
  - "Description" - a line of SI stack information - only if "Restricted" is 0.
- "DMX" - an object containing elements of ⎕DMX in that thread, either null or:
  - "Category", "DM", "EM", "EN", "ENX", "InternalLocation", "Vendor", "Message", "OSError" - only if "Restricted" is 0.
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
- "Exception" - an object containing elements of ⎕EXCEPTION in that thread, either null or:
  - "Source", "StackTrace", "Message" - only if "Restricted" is 0.
  - "Restricted": a Boolean value indicating whether some information is restricted (missing) because the Access Level does not permit it.
A TrappedSignal notification occurs when an APL exception has been signalled and has been trapped. The
additional values provided are as described for
UntrappedSignal.
LastKnownState is the response to GetLastKnownState, containing:
- The UID, if provided in the request.
- The interpreter's current UTC clock setting, so that the time elapsed since each of the other reported times can be computed.
- The line currently being executed by the interpreter and the UTC time that this line started or resumed execution, if enabled with 112⌶ and ⎕PROFILE is running (otherwise this information is omitted).
- An activity code and the UTC time that this activity started, if enabled with 112⌶ (otherwise omitted).
- The time a trapped or untrapped WSFULL event last occurred, if enabled with 112⌶ (otherwise omitted).
UTC times are in ISO 8601 basic format with millisecond precision, e.g. 20231231T235959.999Z for the very last millisecond of 2023.
Activity codes are:
| Code | Meaning |
|---|---|
| 1 | Anything not specifically listed below |
| 2 | Performing a workspace allocation |
| 3 | Performing a workspace compaction |
| 4 | Performing a workspace check |
| 222 | Sleeping (an internal testing feature) |
This functionality is at an early stage of development. It is anticipated that this list will be significantly extended in future.
Examples:
No UID provided, and 112⌶0 set:
["LastKnownState",{"TS":"20230111T144700.132Z"}]
UID provided, 112⌶2 1 set and ⎕PROFILE started:
["LastKnownState",{"UID":"123","TS":"20230111T144700.132Z","Activity":{"Code":1,"TS":"20230111T144700.132Z"},"Location":{"Function":"#.f","Line":2,"TS":"20230111T144700.132Z"},"WS FULL":{"TS":"20230111T144620.723Z"}}]
Note: "Location" is updated by the interpreter whenever execution of a line
begins or resumes. If program execution stops for any reason (e.g. exception
or program termination) it will report the last executed line. "Location" does
not report anything about inactive threads - full thread/stack info is
available with GetFacts, so long as the interpreter is
responsive.
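The top-level "TS" item allows a client to compute how long ago each reported time was, without relying on its own clock. The Python sketch below does this for a LastKnownState payload; the function names are hypothetical, and the timestamp layout is inferred from the examples above (a trailing "Z" is accepted but not required).

```python
import json
from datetime import datetime, timezone


def parse_ts(text):
    """Parse a timestamp such as '20230111T144700.132Z' as a UTC datetime."""
    stamp = text[:-1] if text.endswith("Z") else text
    parsed = datetime.strptime(stamp, "%Y%m%dT%H%M%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def seconds_since(payload, item):
    """Seconds between the "TS" of the named item and the interpreter's clock.

    Returns None when the item ("Activity", "Location" or "WS FULL") is
    omitted from the message.
    """
    name, body = json.loads(payload)
    if name != "LastKnownState":
        raise ValueError(f"expected a LastKnownState message, got {name!r}")
    if item not in body:
        return None
    elapsed = parse_ts(body["TS"]) - parse_ts(body[item]["TS"])
    return elapsed.total_seconds()
```

For the second example above, `seconds_since(payload, "WS FULL")` gives 39.409, that is, the last WSFULL event occurred a little under 40 seconds before the response was produced.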
RideConnection is the response to ConnectRide, containing:
- The UID, if provided in the request.
- "Restricted": a Boolean value indicating whether the connection request was disallowed because the Access Level did not permit it.
- "Connect": a Boolean value indicating whether the ConnectRide message was interpreted as a Connect (1) or Disconnect (0) request - only if "Restricted" is 0.
- "Status": the return code issued by 3502⌶ (0 indicates success) - only if "Restricted" is 0.
Example:
["RideConnection",{"UID":"myuid","Restricted":0,"Connect":1,"Status":0}]
Note
"RideConnection" is not provided by Dyalog 19.0.
InvalidSyntax is the response to a syntactically invalid JSON message, or a message which does not strictly define a two-element array with a string name in the first array element and an object in the second array element. Because it is a response to an "unintelligible" message, it will never contain a UID.
Example:
["InvalidSyntax",{}]
DisallowedUID is the response to a message which contains a UID when it should not -
specifically, StopFacts and BumpFacts messages.
Example:
["DisallowedUID",{"UID":"xx","Name":"StopFacts"}]
UnknownCommand is the response to a syntactically correct message with an unrecognised name. The unrecognised name, and the UID if provided, are included in the message.
Example:
["UnknownCommand",{"UID":"ABC","Name":"Hatstand"}]
MalformedCommand is the response to a syntactically correct request message which does not exactly conform to specification. The name, and the UID if provided, are included in the message.
Example:
["MalformedCommand",{"Name":"GetLastKnownState"}]
UserMessage is sent by the interpreter under user/application control using 111⌶.
Example:
["UserMessage",{"UID":"123","Message":"Hello"}]
{R}←(110⌶) Y
Specifies the machine description which will appear in
Facts messages sent to the Health Monitor.
The default machine description is an empty string ("") until one is
explicitly specified.
Y is any array or namespace which can be serialised as JSON text.
Note
Dyalog 19.0 only: Y is a vector or scalar only; the "description" in the
Facts message is therefore always a string value.
The shy result is the value 1.
{R}←{X} (111⌶) Y
Will cause the interpreter to send a UserMessage notification
message to the client, if one is connected.
Y is a character vector or scalar containing the free-form message text.
X is an optional character vector or scalar containing the UID.
The shy result is the value 1.
{R}←{X} (112⌶) Y
Starts and stops the Health Monitor, specifies the Interface Configuration and
controls the Access and Event Gathering Levels. See the section
Establishing connections and controlling access levels
for an explanation of what these are and their permitted values.
Y is a 1 or 2-element numeric array consisting of:
- Settings for Access Level, and optionally
- Event Gathering.
X may be omitted, or be scalar zero, an empty character vector, or a character vector specifying the Interface Configuration.
To start the Health Monitor, X should be a character vector containing either:
- The Interface Configuration, or
- Nothing (i.e. empty), in which case the previous Interface Configuration (if any) is reused.
Y should contain:
- An Access Level value of 1, 2 or 3, and optionally
- The Event Gathering setting; defaults to 0 if not specified.
To change the settings while the Health Monitor is running, the function should be called monadically or with 0 in X.
Y should contain:
- An Access Level value of 1, 2 or 3, and optionally
- The Event Gathering setting, which defaults to 0 if not specified.
The settings can be changed whether or not a client is currently attached and take effect immediately.
To stop the Health Monitor, the function should be called monadically or with 0 in the left argument.
Y should contain:
- The Access Level value 0.
No action will be taken and 0 will be returned if:
- A request is made to start the HMON comms layer when it is already started.
- A request is made to update the Access Level and Event Gathering settings, and the HMON comms layer is not started.
- A request is made to stop the HMON comms layer when it is already stopped.
Otherwise, the comms layer is stopped or started as requested and then:
- An error will be signalled if the operation fails.
- The shy value 1 will be returned if the operation succeeds.
R←X (113⌶) Y
Y must be ⍬
The result is a character vector containing the Session UUID, as reported to an HMON client as a Host fact.
Note
113⌶ is not available in Dyalog 19.0.