Commit dc3e7dc

ckpt
Signed-off-by: Ralph Castain <[email protected]>
1 parent 3a89a04 commit dc3e7dc

2 files changed

+193
-56
lines changed

Chap_API_Job_Mgmt.tex

Lines changed: 190 additions & 56 deletions
@@ -712,8 +712,8 @@ \subsection{\code{PMIx_Process_monitor_nb}}
712712
}
713713

714714
\begin{arglist}
715-
\argin{monitor}{info (handle)}
716-
\argin{error}{status (integer)}
715+
\argin{monitor}{Pointer to \restruct{pmix_info_t} specifying monitoring action (handle)}
716+
\argin{error}{\ac{PMIx} constant to be used when generating an associated monitoring event (integer)}
717717
\argin{directives}{Array of info structures (array of handles)}
718718
\argin{ndirs}{Number of elements in the \refarg{directives} array (integer)}
719719
\argin{cbfunc}{Callback function \refapi{pmix_info_cbfunc_t} (function reference)}
@@ -731,22 +731,49 @@ \subsection{\code{PMIx_Process_monitor_nb}}
731731
\returnend
732732

733733
\optattrstart
734-
The following attributes may be implemented by a \ac{PMIx} library or by the host environment. If supported by the \ac{PMIx} server library, then the library must not pass the supported attributes to the host environment. All attributes not directly supported by the server library must be passed to the host environment if it supports this operation, and the library is \textit{required} to add the \refAttributeItem{PMIX_USERID} and the \refAttributeItem{PMIX_GRPID} attributes of the requesting process:
734+
The following attributes may be implemented by a \ac{PMIx} library or by the host environment. If an attribute is supported by the \ac{PMIx} server library, then the library must not pass the supported attributes to the host environment unless the requested action involves other nodes. In addition, the library is \textit{required} to add the \refAttributeItem{PMIX_USERID} and the \refAttributeItem{PMIX_GRPID} attributes of the requesting process to the directives array when it passes actions to its host.
735735

736-
\pasteAttributeItem{PMIX_MONITOR_ID}
737-
\pasteAttributeItem{PMIX_MONITOR_CANCEL}
738-
\pasteAttributeItem{PMIX_MONITOR_APP_CONTROL}
739-
\pasteAttributeItem{PMIX_MONITOR_HEARTBEAT}
740-
\pasteAttributeItem{PMIX_MONITOR_HEARTBEAT_TIME}
741-
\pasteAttributeItem{PMIX_MONITOR_HEARTBEAT_DROPS}
742-
\pasteAttributeItem{PMIX_MONITOR_FILE}
743-
\pasteAttributeItem{PMIX_MONITOR_FILE_SIZE}
744-
\pasteAttributeItem{PMIX_MONITOR_FILE_ACCESS}
745-
\pasteAttributeItem{PMIX_MONITOR_FILE_MODIFY}
746-
\pasteAttributeItem{PMIX_MONITOR_FILE_CHECK_TIME}
747-
\pasteAttributeItem{PMIX_MONITOR_FILE_DROPS}
748-
\pasteAttributeItem{PMIX_SEND_HEARTBEAT}
749-
\pasteAttributeItem{PMIX_MONITOR_RESOURCE_USAGE}
736+
The \refarg{monitor} argument may contain any of the following actions:
737+
738+
\begin{itemize}
739+
\item \pasteAttributeItem{PMIX_MONITOR_CANCEL}
740+
\item \pasteAttributeItem{PMIX_MONITOR_HEARTBEAT}. The associated \refarg{directives} array may include any of the following:
741+
\begin{itemize}
742+
\item \pasteAttributeItem{PMIX_MONITOR_HEARTBEAT_TIME}
743+
\item \pasteAttributeItem{PMIX_MONITOR_HEARTBEAT_DROPS}
744+
\end{itemize}
745+
\item \pasteAttributeItem{PMIX_SEND_HEARTBEAT}
746+
\item \pasteAttributeItem{PMIX_MONITOR_FILE}. The associated \refarg{directives} array may include any of the following:
747+
\begin{itemize}
748+
\item \pasteAttributeItem{PMIX_MONITOR_FILE_SIZE}
749+
\item \pasteAttributeItem{PMIX_MONITOR_FILE_ACCESS}
750+
\item \pasteAttributeItem{PMIX_MONITOR_FILE_MODIFY}
751+
\item \pasteAttributeItem{PMIX_MONITOR_FILE_CHECK_TIME}
752+
\item \pasteAttributeItem{PMIX_MONITOR_FILE_DROPS}
753+
\end{itemize}
754+
\item \pasteAttributeItem{PMIX_MONITOR_PROC_RESOURCE_USAGE}. The associated \refarg{directives} array may include any of the following:
755+
\begin{itemize}
756+
\item \refattr{PMIX_MONITOR_RESOURCE_RATE}
757+
\item \refattr{PMIX_MONITOR_TARGET_PROCS}
758+
\item \refattr{PMIX_MONITOR_TARGET_PIDS}
759+
\item \refattr{PMIX_MONITOR_TARGET_NODES}. All processes on the specified nodes are to be monitored.
760+
\item \refattr{PMIX_MONITOR_TARGET_NODEIDS}. All processes on the specified nodes are to be monitored.
761+
\end{itemize}
762+
\item \pasteAttributeItem{PMIX_MONITOR_NODE_RESOURCE_USAGE}. The associated \refarg{directives} array may include any of the following:
763+
\begin{itemize}
764+
\item \refattr{PMIX_MONITOR_RESOURCE_RATE}
765+
\item \refattr{PMIX_MONITOR_TARGET_NODES}
766+
\item \refattr{PMIX_MONITOR_TARGET_NODEIDS}
767+
\item \refattr{PMIX_MONITOR_TARGET_PROCS}. Monitor the nodes where the specified processes are located.
768+
\end{itemize}
769+
\end{itemize}
770+
771+
In addition to action-specific directives, the \refarg{directives} array may include:
772+
773+
\begin{itemize}
774+
\item \pasteAttributeItem{PMIX_MONITOR_ID}
775+
\item \pasteAttributeItem{PMIX_MONITOR_APP_CONTROL}
776+
\item \pasteAttributeItem{PMIX_RANGE}. Non-default range to be used when generating the associated event for this monitoring action.
\end{itemize}
750777
\optattrend
751778

752779
%%%%
@@ -790,8 +817,99 @@ \subsection{Monitoring events}
790817
\declareconstitemvalue{PMIX_MONITOR_FILE_ALERT}{-110}
791818
File failed its monitoring detection criteria. The file that triggered this alert will be identified in the event.
792819
%
820+
\declareconstitemvalueProvisional{PMIX_MONITOR_RESUSAGE_UPDATE}{-112}
821+
Resource usage update - the report will be included in the event information.
822+
%
793823
\end{constantdesc}
794824

825+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
826+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
827+
\subsection{Monitoring Datatypes}
828+
829+
The following datatype definitions have been created to support monitoring operations and information.
830+
831+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
832+
\subsubsection{Node PID Structure}
833+
\declarestructProvisional{pmix_node_pid_t}
834+
835+
The \refstruct{pmix_node_pid_t} structure contains the hostname and pid of a process executing on that host.
836+
Since a pid is uniquely associated with a given host, this creates a conjugate pair.
837+
838+
\copySignature{pmix_node_pid_t}{6.0}{
839+
typedef struct pmix_node_pid \{ \\
840+
\hspace*{4\sigspace}char *hostname; \\
841+
\hspace*{4\sigspace}pid_t pid; \\
842+
\} pmix_node_pid_t;
843+
}
844+
845+
The \refarg{pid} field contains the \code{pid_t} of the process, while the \refarg{hostname} field is the name of the node where the process is executing.
846+
847+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
848+
\subsubsection{Node PID support functions}
849+
850+
The following functions are provided for convenience when working with \refstruct{pmix_node_pid_t} structures.
851+
852+
\littleheader{Initialize the node_pid structure}
853+
\declareapiProvisional{PMIx_Nodepid_construct}
854+
855+
Initialize the \refstruct{pmix_node_pid_t} fields.
856+
857+
\copySignature{PMIx_Nodepid_construct}{6.0}{
858+
void
859+
PMIx_Nodepid_construct(pmix_node_pid_t *p);
860+
}
861+
862+
\begin{arglist}
863+
\argin{p}{Pointer to the structure to be initialized (pointer to \refstruct{pmix_node_pid_t})}
864+
\end{arglist}
865+
866+
\littleheader{Destruct the node_pid structure}
867+
\declareapiProvisional{PMIx_Nodepid_destruct}
868+
869+
Destruct the \refstruct{pmix_node_pid_t} fields.
870+
871+
\copySignature{PMIx_Nodepid_destruct}{6.0}{
872+
void
873+
PMIx_Nodepid_destruct(pmix_node_pid_t *p);
874+
}
875+
876+
\begin{arglist}
877+
\argin{p}{Pointer to the structure to be destructed (pointer to \refstruct{pmix_node_pid_t})}
878+
\end{arglist}
879+
880+
\littleheader{Create an array of node_pid structures}
881+
\declareapiProvisional{PMIx_Nodepid_create}
882+
883+
Allocate and initialize an array of \refstruct{pmix_node_pid_t} structures.
884+
885+
\copySignature{PMIx_Nodepid_create}{6.0}{
886+
pmix_node_pid_t*
887+
PMIx_Nodepid_create(size_t n);
888+
}
889+
890+
\begin{arglist}
891+
\argin{n}{Number of \refstruct{pmix_node_pid_t} structures to allocate}
892+
\end{arglist}
893+
894+
Returns a pointer to the allocated array of \refstruct{pmix_node_pid_t} structures.
895+
896+
\littleheader{Release an array of node_pid structures}
897+
\declareapiProvisional{PMIx_Nodepid_free}
898+
899+
Free all allocated memory in an array of \refstruct{pmix_node_pid_t} structures.
900+
901+
\copySignature{PMIx_Nodepid_free}{6.0}{
902+
void
903+
PMIx_Nodepid_free(pmix_node_pid_t *p,
904+
\hspace*{4\sigspace}size_t n);
905+
}
906+
907+
\begin{arglist}
908+
\argin{p}{Pointer to the array to be released (pointer to \refstruct{pmix_node_pid_t})}
909+
\argin{n}{Number of \refstruct{pmix_node_pid_t} structures in array}
910+
\end{arglist}
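The four support functions above follow a construct/destruct/create/free pattern. The following is a minimal, self-contained sketch of that pattern and not the PMIx implementation: the `demo_` names are hypothetical stand-ins, the struct only mirrors the `pmix_node_pid_t` layout shown in the signature, and the exact behavior (zeroing fields on construct, releasing `hostname` on destruct) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in mirroring the pmix_node_pid_t layout. */
typedef struct demo_node_pid {
    char *hostname;
    pid_t pid;
} demo_node_pid_t;

/* Assumed construct semantics: put the fields in a known empty state. */
void demo_nodepid_construct(demo_node_pid_t *p)
{
    p->hostname = NULL;
    p->pid = 0;
}

/* Assumed destruct semantics: release owned memory, then reset the fields. */
void demo_nodepid_destruct(demo_node_pid_t *p)
{
    free(p->hostname);
    demo_nodepid_construct(p);
}

/* Allocate and initialize an array of n structures (NULL if n is 0). */
demo_node_pid_t *demo_nodepid_create(size_t n)
{
    demo_node_pid_t *arr;
    size_t i;

    if (0 == n) {
        return NULL;
    }
    arr = calloc(n, sizeof(*arr));
    if (NULL == arr) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        demo_nodepid_construct(&arr[i]);
    }
    return arr;
}

/* Destruct every element, then release the array itself. */
void demo_nodepid_free(demo_node_pid_t *p, size_t n)
{
    size_t i;

    if (NULL == p) {
        return;
    }
    for (i = 0; i < n; i++) {
        demo_nodepid_destruct(&p[i]);
    }
    free(p);
}

/* Helper: heap-allocated copy of a string (avoids relying on strdup). */
char *demo_copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *out = malloc(len);
    if (NULL != out) {
        memcpy(out, s, len);
    }
    return out;
}

/* Build a two-entry target list and release it; returns 0 on success. */
int demo_nodepid_roundtrip(void)
{
    int ok;
    demo_node_pid_t *targets = demo_nodepid_create(2);

    if (NULL == targets) {
        return -1;
    }
    targets[0].hostname = demo_copy_string("node001");
    targets[0].pid = 4242;
    targets[1].hostname = demo_copy_string("node002");
    targets[1].pid = -1; /* all processes on that node, per PMIX_MONITOR_TARGET_PIDS */

    ok = (NULL != targets[0].hostname) && (NULL != targets[1].hostname) &&
         (0 == strcmp(targets[0].hostname, "node001")) &&
         (-1 == targets[1].pid);
    demo_nodepid_free(targets, 2);
    return ok ? 0 : -1;
}
```

In this sketch the array owns its hostname strings, so a single call to the free routine releases everything; whether the real `PMIx_Nodepid_free` takes ownership in the same way should be checked against the library.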
911+
912+
795913
%%%%%%%%%%%
796914
\subsection{Monitoring attributes}
797915
\label{api:struct:attributes:monitor}
@@ -828,19 +946,19 @@ \subsection{Monitoring attributes}
828946
}
829947
%
830948
\declareAttribute{PMIX_MONITOR_FILE}{"pmix.monitor.fmon"}{char*}{
831-
Register to monitor file for signs of life.
949+
Register to monitor file for signs of life - the value contains the filename to be monitored.
832950
}
833951
%
834952
\declareAttribute{PMIX_MONITOR_FILE_SIZE}{"pmix.monitor.fsize"}{bool}{
835953
Monitor size of given file is growing to determine if the application is running.
836954
}
837955
%
838-
\declareAttribute{PMIX_MONITOR_FILE_ACCESS}{"pmix.monitor.faccess"}{char*}{
839-
Monitor time since last access of given file to determine if the application is running.
956+
\declareAttribute{PMIX_MONITOR_FILE_ACCESS}{"pmix.monitor.faccess"}{bool}{
957+
Monitor time since last access to determine if the application is running.
840958
}
841959
%
842-
\declareAttribute{PMIX_MONITOR_FILE_MODIFY}{"pmix.monitor.fmod"}{char*}{
843-
Monitor time since last modified of given file to determine if the application is running.
960+
\declareAttribute{PMIX_MONITOR_FILE_MODIFY}{"pmix.monitor.fmod"}{bool}{
961+
Monitor time since the file was last modified to determine if the application is running.
844962
}
845963
%
846964
\declareAttribute{PMIX_MONITOR_FILE_CHECK_TIME}{"pmix.monitor.ftime"}{uint32_t}{
@@ -851,31 +969,40 @@ \subsection{Monitoring attributes}
851969
Number of file checks that can be missed before generating the event.
852970
}
853971
%
972+
\declareAttributeProvisional{PMIX_MONITOR_TARGET_PROCS}{pmix.monitor.tgtproc}{pmix_data_array_t*}{
973+
Array of process IDs specifying the processes to be monitored. Can include a
974+
\refconst{PMIX_RANK_WILDCARD} to indicate that all processes
975+
from a given namespace are to be included. If omitted, then
976+
all processes in the session will be monitored. May be included
977+
multiple times to fully specify all processes to be included.
}
978+
%
979+
\declareAttributeProvisional{PMIX_MONITOR_TARGET_PIDS}{pmix.monitor.tgtpid}{pmix_data_array_t*}{
980+
Array of \refstruct{pmix_node_pid_t} structures to be monitored. Can include a
981+
structure containing a hostname with a pid value of \code{-1} to indicate all
982+
processes on that node are to be included. May be included
983+
multiple times to fully specify all processes to be included.
}
984+
%
985+
\declareAttributeProvisional{PMIX_MONITOR_TARGET_NODES}{pmix.monitor.tgtnode}{pmix_data_array_t*}{
986+
Array of host names to be monitored.
}
987+
%
988+
\declareAttributeProvisional{PMIX_MONITOR_TARGET_NODEIDS}{pmix.monitor.tgtndids}{pmix_data_array_t*}{
989+
Array of node IDs (\code{uint32_t}) to be monitored.
}
990+
%
854991
\declareAttributeProvisional{PMIX_MONITOR_RESOURCE_RATE}{pmix.monitor.resrate}{uint64_t}{
855992
Monitor resource usage every N seconds, where N is the value provided by the attribute.
856993
}
857994
%
858-
\declareAttributeProvisional{PMIX_MONITOR_RESOURCE_USAGE}{"pmix.monitor.resuse"}{pmix_data_array_t*}{
859-
Monitor the resources specified in the provided \refstruct{pmix_data_array_t}. Resource types may
995+
\declareAttributeProvisional{PMIX_MONITOR_PROC_RESOURCE_USAGE}{"pmix.monitor.presuse"}{pmix_data_array_t*}{
996+
Monitor the resources specified in the provided \refstruct{pmix_data_array_t}. If the provided array
997+
is \code{NULL}, then all resources shall be monitored. If no targets are provided in the associated
998+
\refarg{directives} array, then
999+
all processes in the session will be monitored. Resource types may
8601000
include any of the following:
8611001

862-
\begin{itemize}
863-
\item \refattr{PMIX_MONITOR_RESOURCE_RATE}. If not provided, then the request will be treated as a one-shot
864-
sampling of resource usage.
865-
\item \refattr{PMIX_PROC_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
866-
all process resource usage values shall be returned for all processes in the session.
867-
Optionally, the array of \refstruct{pmix_info_t} can specify the processes to be monitored, and/or the particular attributes to be included. Note that the values in the provided structures will be
868-
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
869-
\refattr{PMIX_PROC_SAMPLE_TIME} will always be included in the returned data (there is no
870-
need to include it in the request). Optional attributes include:
8711002
\begin{itemize}
872-
\item \refattr{PMIX_PROCID}. Optionally specify the process to be monitored. Can include a
873-
\refconst{PMIX_RANK_WILDCARD} to indicate that all processes
874-
from a given namespace are to be included. If omitted, then
875-
all processes in the session will be monitored. May be included
876-
multiple times to fully specify all processes to be included.
8771003
\item \refattr{PMIX_HOSTNAME}. Include the hostname where the process is located.
878-
\item \refattr{PMIX_PROC_PID} Optionally specify the process to be monitored.
1004+
\item \refattr{PMIX_NODEID}. Include the node ID where the process is located.
1005+
\item \refattr{PMIX_PROC_PID}
8791006
\item \refattr{PMIX_PROC_OS_STATE}
8801007
\item \refattr{PMIX_PROC_TIME}
8811008
\item \refattr{PMIX_PROC_PERCENT_CPU}
@@ -888,22 +1015,22 @@ \subsection{Monitoring attributes}
8881015
\item \refattr{PMIX_PROC_CPU}
8891016
\item \refattr{PMIX_PROC_SAMPLE_TIME}
8901017
\end{itemize}
891-
\item \refattr{PMIX_NODE_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
892-
all node resource usage values shall be returned for all nodes in the session.
893-
Optionally, the array of \refstruct{pmix_info_t} can specify the nodes to be monitored (using the \refattr{PMIX_HOSTNAME} or \refattr{PMIX_NODEID} attributes), and/or the particular attributes to be included. Note that the values in the provided structures will be
894-
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
895-
\refattr{PMIX_NODE_SAMPLE_TIME} will always be included in the returned data (there is no
896-
need to include it in the request). Optional
897-
attributes include:
1018+
1019+
%
1020+
\declareAttributeProvisional{PMIX_PROC_RESOURCE_USAGE}{"pmix.proc.resuse"}{pmix_data_array_t*}{
1021+
Contains the reported resource usage of a given process. The process ID will be the first element in
1022+
the array. Note that the \refattr{PMIX_PROC_SAMPLE_TIME} will always be included in the returned data
1023+
(there is no need to include it in the request).
1024+
}
1025+
%
1026+
\declareAttributeProvisional{PMIX_MONITOR_NODE_RESOURCE_USAGE}{"pmix.monitor.ndresuse"}{pmix_data_array_t*}{
1027+
Monitor the resources specified in the provided \refstruct{pmix_data_array_t}. If the provided array
1028+
is \code{NULL}, then all resources shall be monitored. If no targets are provided in the associated
1029+
\refarg{directives} array, then
1030+
all nodes in the session will be monitored. Resource types may
1031+
include any of the following:
1032+
8981033
\begin{itemize}
899-
\item \refattr{PMIX_HOSTNAME}. Optionally specify the node to be monitored. May be included multiple
900-
times to fully specify all nodes to be included. Only
901-
hostname or node ID need be included (not both). If omitted, then all nodes in the session
902-
shall be monitored.
903-
\item \refattr{PMIX_NODEID}. Optionally specify the process to be monitored. May be included multiple
904-
times to fully specify all nodes to be included. Only
905-
hostname or node ID need be included (not both). If omitted, then all nodes in the session
906-
shall be monitored.
9071034
\item \refattr{PMIX_NODE_LOAD_AVG}
9081035
\item \refattr{PMIX_NODE_LOAD_AVG5}
9091036
\item \refattr{PMIX_NODE_LOAD_AVG15}
@@ -917,7 +1044,7 @@ \subsection{Monitoring attributes}
9171044
\item \refattr{PMIX_NODE_MEM_MAPPED}
9181045
\item \refattr{PMIX_DISK_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
9191046
all disk resource usage values shall be returned for all disks attached to the node.
920-
Optionally, the array of \refstruct{pmix_info_t} can specify the disks to be monitored (using the \refattr{PMIX_DISK_ID} attribute), and/or the particular attributes to be included. Note that the values in the provided structures will be
1047+
Optionally, the array of \refstruct{pmix_info_t} can specify the disks to be monitored (using the \refattr{PMIX_DISK_ID} attribute), and/or the particular attributes to be reported. Note that the values in the provided structures will be
9211048
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
9221049
\refattr{PMIX_DISK_SAMPLE_TIME} will always be included in the returned data (there is no
9231050
need to include it in the request). Optional
@@ -939,7 +1066,7 @@ \subsection{Monitoring attributes}
9391066
\end{itemize}
9401067
\item \refattr{PMIX_NETWORK_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
9411068
all network resource usage values shall be returned for all interfaces on the node.
942-
Optionally, the array of \refstruct{pmix_info_t} can specify the networks to be monitored (using the \refattr{PMIX_NETWORK_ID} attribute), and/or the particular attributes to be included. Note that the values in the provided structures will be
1069+
Optionally, the array of \refstruct{pmix_info_t} can specify the networks to be monitored (using the \refattr{PMIX_NETWORK_ID} attribute), and/or the particular attributes to be reported. Note that the values in the provided structures will be
9431070
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
9441071
\refattr{PMIX_NET_SAMPLE_TIME} will always be included in the returned data (there is no
9451072
need to include it in the request). Optional
@@ -958,6 +1085,13 @@ \subsection{Monitoring attributes}
9581085
\end{itemize}
9591086
}
9601087

1088+
%
1089+
\declareAttributeProvisional{PMIX_NODE_RESOURCE_USAGE}{"pmix.node.resuse"}{pmix_data_array_t*}{
1090+
Contains the reported resource usage of a given node. The hostname and/or node ID will be the first element in
1091+
the array. Note that the \refattr{PMIX_NODE_SAMPLE_TIME} will always be included in the returned data
1092+
(there is no need to include it in the request).
1093+
}
1094+
9611095
%%%%%%%%%%%
9621096
\versionMarkerProvisional{6.0}
9631097
\subsection{Resource usage attributes}

Chap_API_Struct.tex

Lines changed: 3 additions & 0 deletions
@@ -2548,6 +2548,9 @@ \section{Generalized Data Types Used for Packing/Unpacking}
25482548
\declareconstitemvalue{PMIX_STOR_ACCESS_TYPE}{69}
25492549
Bitmask specifying different storage system access types. (\refstruct{pmix_storage_access_type_t}).
25502550
%
2551+
\declareconstitemvalueProvisional{PMIX_NODE_PID}{70}
2552+
Structure containing the hostname and pid of a process
2553+
%
25512554
\declareconstitemvalue{PMIX_DATA_TYPE_MAX}{500}
25522555
A starting point for implementer-specific data types.
25532556
Values above this are guaranteed not to conflict with \ac{PMIx} values.
