Commit b613969

JV0812 and jpipkin1 authored
System Limits Visibility (#5996)
* System Limits Visibility
* minor fixes
* Update docs/manage/health-events.md
  Co-authored-by: John Pipkin (Sumo Logic) <[email protected]>
* Update docs/manage/health-events.md
  Co-authored-by: John Pipkin (Sumo Logic) <[email protected]>
* Update docs/metrics/introduction/metric-formats.md
  Co-authored-by: John Pipkin (Sumo Logic) <[email protected]>
* Update docs/send-data/installed-collectors/sources/streaming-metrics-source.md
  Co-authored-by: John Pipkin (Sumo Logic) <[email protected]>
* Update docs/manage/health-events.md
* Update blog-service/2025-11-14-manage.md
* Update docs/manage/health-events.md
* Change release note date to Nov 25 2025
* Change release note date to Dec 1 2025
* Rename 2025-12-01-manage.md to 2025-11-27-manage.md
* Rename 2025-11-27-manage.md to 2025-12-17-manage.md

---------

Co-authored-by: John Pipkin (Sumo Logic) <[email protected]>
1 parent f9e1e94 commit b613969

5 files changed

+117
-89
lines changed

blog-service/2025-12-17-manage.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
---
2+
title: System Limits Visibility (Manage)
3+
image: https://assets-www.sumologic.com/company-logos/_800x418_crop_center-center_82_none/SumoLogic_Preview_600x600.jpg?mtime=1617040082
4+
keywords:
5+
- system-limits-visibility
6+
- manage
7+
- health-events
8+
hide_table_of_contents: true
9+
---
10+
11+
We’re excited to announce that Health Events are now automatically generated when the 90% usage threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs). You can configure a scheduled search on these health events so that all designated recipients are notified each time a threshold breach occurs. [Learn more](/docs/manage/health-events).

docs/manage/health-events.md

Lines changed: 103 additions & 86 deletions
@@ -1,91 +1,35 @@
11
---
22
id: health-events
33
title: Health Events
4-
description: Monitor the health of your Collectors and Sources.
4+
description: Monitor the health of your Collectors, Sources, and Log data.
55
---
66

77
import useBaseUrl from '@docusaurus/useBaseUrl';
88

9+
System Health Events are generated automatically when the system detects an issue within a Collector or Source, or when a credit usage threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs).
10+
11+
These events provide visibility into the operational health of Collectors, Sources, and Ingest Budgets, enabling administrators to monitor performance and identify potential issues proactively. Health events also help in investigating common errors and warnings known to affect data collection and processing.
12+
13+
Additionally, a health event is triggered when any limit associated with Lookup Tables, Partitions, Fields, or FERs reaches or exceeds 90% of the allocated capacity, allowing timely action to prevent service disruption. The health event auto-resolves when usage falls back below the 90% threshold.
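
The trigger and auto-resolve behavior described above can be sketched in a few lines of Python. This is only an illustration, not how Sumo Logic implements the check: the function name `limit_health_status`, its parameters, and the reuse of the `Healthy`/`Unhealthy` status strings are assumptions made for this example.

```python
def limit_health_status(used: int, capacity: int, threshold_percent: int = 90) -> str:
    """Return a hypothetical health status for a resource limit.

    Assumption: a limit is unhealthy when usage reaches or exceeds the
    threshold, and healthy again once usage falls back below it.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if used * 100 >= capacity * threshold_percent:
        return "Unhealthy"
    return "Healthy"


# Hypothetical numbers: 90 of 100 Lookup Tables in use reaches the threshold.
print(limit_health_status(90, 100))
# Usage dropping to 89 falls back below the threshold.
print(limit_health_status(89, 100))
```

The comparison uses "reaches or exceeds" as stated in this paragraph; a strict "exceeds" reading would change `>=` to `>`.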
14+
15+
:::note
16+
Health events are sent from Installed Collectors of version `19.308-2` and later.
17+
:::
18+
919
## Availability
1020

1121
| Account Type | Account Level |
1222
|:--------------|:---------------------------------------------------------------------------------|
1323
| CloudFlex | Professional, Enterprise |
1424
| Credits | Trial, Essentials, Enterprise Operations, Enterprise Security, Enterprise Suite |
1525

16-
Health events allow you to keep track of the health of your Collectors, Sources, and Ingest Budgets. You can use them to find and investigate common errors and warnings that are known to cause collection issues. 
17-
18-
This framework includes the following:
19-
20-
* Health event logs indexed in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index).
21-
* A [health events table](#health-events-table) on the Alerts page.
22-
* A health status column on the [Collection page](#collection-page).
23-
24-
Health events are sent from Installed Collectors on version 19.308-2 and
25-
later.
26-
27-
## Alerts
28-
29-
Alerts for specific health events are easy to create in the Health Events Table. The details pane of an event provides a **Create Scheduled Search** button to automatically generate the required query.
30-
31-
## Health events
32-
33-
Health events are created when an issue is detected with a Collector or Source. Events are indexed and searchable in a separate partition named **sumologic_system_events** in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index). For details on what information is available in a health event, see the [common parameters](#common-parameters) table.
34-
35-
### Health events table
36-
37-
The health events table allows you to easily view and investigate problems getting your data to Sumo.
26+
## Event schema
3827

39-
On the health events table, you can search, filter, and sort incidents by key aspects like severity, resource name, event name, resource type, and opened since date.
28+
This section defines the structure of System Health Events. The example below shows a sample health event in JSON format, followed by a table that describes each parameter.
4029

41-
[**New UI**](/docs/get-started/sumo-logic-ui/). To access the health events table, in the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. You can also click the **Go To...** menu at the top of the screen and select **Health Events**.
30+
### JSON example
4231

43-
[**Classic UI**](/docs/get-started/sumo-logic-ui-classic). To access the health events table, in the main Sumo Logic menu select **Manage Data > Monitoring > Health Events**.
44-
45-
<img src={useBaseUrl('img/health-events/health-events-table.png')} alt="Health events table" style={{border: '1px solid gray'}} width="800" />
46-
47-
Click on a row to view the details of a health event.
48-
49-
<img src={useBaseUrl('img/health-events/health-event-detail.png')} alt="Health event detail" style={{border: '1px solid gray'}} width="400" />
50-
51-
Click the **Create Scheduled Search** button on the details pane to get alerts for specific health events. The unique identifier of the resource, such as the Source or Collector, is used in the query. See [Schedule a Search](../alerts/scheduled-searches/schedule-search.md) for details.
52-
53-
Under the **More Actions** menu you can select:
54-
55-
* **Event History** to run a search against the **sumologic_system_events** partition to view all of the related event logs.
56-
* **View Object** to view the Collector or Source in the Collection page related to the event.
57-
58-
### Health events severity
59-
60-
Events are categorized by two severity levels, warning and error. The severity column has color-coded error and warning events so you can quickly determine the severity of a given issue.
61-
62-
* <img src={useBaseUrl('img/health-events/warning-label.png')} alt="Warning label" style={{border: '1px solid gray'}} width="75" /> A warning indicates the Collector or Source has a configuration issue or is operating in a degraded state.
63-
* <img src={useBaseUrl('img/health-events/Error-label.png')} alt="Error label" style={{border: '1px solid gray'}} width="50" /> An error indicates the Collector or Source is unable to collect data as expected.
64-
65-
### Common parameters
66-
67-
Each health event log has common keys that categorize it to a product
68-
area and provide details of the event. The following table shows the
69-
common parameters in the order that they are found in health event logs.
70-
71-
| Parameter | Description | Data Type |
72-
|:--|:--|:--|
73-
| status | Either `Healthy` or `Unhealthy` based on the event. | String |
74-
| details | The details of the event include the type as `trackerId`, the `name` of the event, and a `description`. | JSON object of Strings |
75-
| eventType | Health events have a value of `Health-Change`. | String |
76-
| severityLevel | Either `Error` or `Warning` based on the event. | String |
77-
| accountId | The unique identifier of the organization. | String |
78-
| eventId | The unique identifier of the event. | String |
79-
| eventName | The name of the event. | String |
80-
| eventTime | The event timestamp in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. | String |
81-
| eventFormatVersion | The event log format version. | String |
82-
| operator | Information on who did the operation. If it's missing, the Sumo service was the operator. | JSON object of Strings |
83-
| subsystem | The product area of the event. | String |
84-
| resourceIdentity | This includes any unique identifiers, names, and the type of the object associated with the event. | JSON object of Strings |
85-
86-
### Health event log example
87-
88-
```json
32+
```json title="Sample Health Event"
8933
{
9034
"status": "UnHealthy",
9135
"details": {
@@ -111,10 +55,94 @@ common parameters in the order that they are found in health event logs.
11155
}
11256
```
11357

58+
### Parameters table
59+
60+
Each health event log has common keys that categorize it to a product area and provide details of the event. The following table shows the common parameters in the order that they are found in health event logs.
61+
62+
| Parameter | Description | Data type |
63+
|:--|:--|:--|
64+
| status | Either `Healthy` or `Unhealthy` based on the event. | String |
65+
| details | The details of the event include the type as `trackerId`, the `name` of the event, and a `description`. | JSON object of Strings |
66+
| eventType | Health events have a value of `Health-Change`. | String |
67+
| severityLevel | Either `Error` or `Warning` based on the event. | String |
68+
| accountId | The unique identifier of the organization. | String |
69+
| eventId | The unique identifier of the event. | String |
70+
| eventName | The name of the event. | String |
71+
| eventTime | The event timestamp in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. | String |
72+
| eventFormatVersion | The event log format version. | String |
73+
| operator | Information on who did the operation. If it's missing, the Sumo service was the operator. | JSON object of Strings |
74+
| subsystem | The product area of the event. | String |
75+
| resourceIdentity | This includes any unique identifiers, names, and the type of the object associated with the event. | JSON object of Strings |
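
As a rough sketch of how the parameters above might be consumed, the Python snippet below parses a health event log and reports which common keys are absent. The function name `summarize_health_event` and the sample values are hypothetical; the only field path assumed inside a nested object is `resourceIdentity.id`, which is the path the scheduled search queries in this page extract.

```python
import json

# Keys listed in the parameters table.
COMMON_KEYS = [
    "status", "details", "eventType", "severityLevel", "accountId",
    "eventId", "eventName", "eventTime", "eventFormatVersion",
    "operator", "subsystem", "resourceIdentity",
]


def summarize_health_event(raw: str) -> dict:
    """Parse a health event log and return a short summary.

    `operator` is treated as optional because the table says it can be
    missing when the Sumo service was the operator.
    """
    event = json.loads(raw)
    missing = [k for k in COMMON_KEYS if k != "operator" and k not in event]
    return {
        "eventName": event.get("eventName"),
        "status": event.get("status"),
        "severityLevel": event.get("severityLevel"),
        "resourceId": event.get("resourceIdentity", {}).get("id"),
        "missing": missing,
    }


# Hypothetical, deliberately incomplete event used only for illustration.
sample = json.dumps({
    "status": "Unhealthy",
    "eventType": "Health-Change",
    "severityLevel": "Warning",
    "eventName": "LookupsLimitApproaching",
    "resourceIdentity": {"id": "0000000007063B25"},
})
print(summarize_health_event(sample))
```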
76+
77+
## Configure Scheduled Search
78+
79+
Configure a scheduled search for a health event to alert all recipients each time the health event is triggered. To configure it, follow these steps:
80+
81+
1. [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). Go to **Manage Data > Monitoring > Health Events**.<br/>[**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. <br/><img src={useBaseUrl('/img/health-events/health-events-table.png')} alt="health-events-table" style={{border: '1px solid gray'}} width="800"/>
82+
1. Click on the required row to view the details of a health event. <br/><img src={useBaseUrl('/img/health-events/health-event-detail.png')} alt="health-events-detail" style={{border: '1px solid gray'}} width="400"/>
83+
1. Click the **Create Scheduled Search** button and configure it based on your requirements. For more details, refer to [Create a Scheduled Search](/docs/alerts/scheduled-searches/schedule-search/).
84+
:::info
85+
The query is auto-generated for the selected health event.
86+
:::
87+
88+
Use the following scheduled search query to get an alert when the 90% threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs).
89+
90+
```sql
91+
_index=sumologic_system_events "0000000007063B25"
92+
| json "eventType", "resourceIdentity.id" as eventType, resourceId
93+
| where eventType = "Health-Change" AND resourceId = "0000000007063B25"
94+
```
95+
96+
For specific `eventType`, `resourceId`, `eventName`:
97+
98+
```sql
99+
_index=sumologic_system_events "0000000007063B25"
100+
| json "eventType", "resourceIdentity.id", "eventName" as eventType, resourceId, eventName
101+
| where eventType = "Health-Change" AND resourceId = "0000000007063B25" AND eventName="LookupsLimitApproaching"
102+
```
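
If you need the same query for several resources, the two query shapes above can be generated programmatically. The Python helper below is a minimal sketch: `build_health_event_query` is a hypothetical name, not part of any Sumo Logic SDK, and it only assembles the query text shown above without escaping or validating its inputs.

```python
from typing import Optional


def build_health_event_query(resource_id: str, event_name: Optional[str] = None) -> str:
    """Assemble the scheduled search query text for one resource.

    When `event_name` is given, the query also extracts and filters on
    `eventName`, matching the second query shape in this section.
    """
    fields = ['"eventType"', '"resourceIdentity.id"']
    aliases = ["eventType", "resourceId"]
    conditions = ['eventType = "Health-Change"', f'resourceId = "{resource_id}"']
    if event_name is not None:
        fields.append('"eventName"')
        aliases.append("eventName")
        conditions.append(f'eventName = "{event_name}"')

    field_list = ", ".join(fields)
    alias_list = ", ".join(aliases)
    where_clause = " AND ".join(conditions)
    return "\n".join([
        f'_index=sumologic_system_events "{resource_id}"',
        f"| json {field_list} as {alias_list}",
        f"| where {where_clause}",
    ])


# Values taken from the example queries in this section.
print(build_health_event_query("0000000007063B25", "LookupsLimitApproaching"))
```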
103+
104+
## View Health Events
105+
106+
The health events table allows you to easily view and investigate problems that occur while ingesting data into Sumo Logic. On the health events table, you can search, filter, and sort incidents by key aspects like severity, resource name, event name, resource type, and opened since date.
107+
108+
:::info
109+
After detection, it may take up to 15 minutes for a 90% usage breach for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs) to appear on the Health Events page.
110+
:::
111+
112+
1. [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). Go to **Manage Data > Monitoring > Health Events**.<br/>[**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. <br/><img src={useBaseUrl('/img/health-events/health-events-table.png')} alt="health-events-table" style={{border: '1px solid gray'}} width="800"/>
113+
1. Click on the required row to view the details of a health event. <br/><img src={useBaseUrl('/img/health-events/health-event-detail.png')} alt="health-events-detail" style={{border: '1px solid gray'}} width="400"/>
114+
- **Create Scheduled Search**. Click this button to get alerts for specific health events. The unique identifier of the resource is used in the query. See [Schedule a Search](../alerts/scheduled-searches/schedule-search.md) for details.
115+
- Under the **More Actions** menu you can select:
116+
* **Event History** to run a search against the **sumologic_system_events** partition to view all of the related event logs.
117+
* **View Object** to view the resource in detail related to the event.
118+
- **Description**. Provides information about the health event's error or warning.
119+
- **Severity**. Events are categorized by two severity levels: warning and error. The severity column has color-coded error and warning events so you can quickly determine the severity of a given issue.
120+
* ![warning label.png](/img/health-events/warning-label.png) A warning indicates the Collector or Source has a configuration issue or is operating in a degraded state.
121+
* ![Error label.png](/img/health-events/Error-label.png) An error indicates the Collector or Source is unable to collect data as expected.
122+
- **Event Name**. The name or type of the health event that occurred. This identifies what kind of issue or status change was detected.
123+
- **Resource Type**. The category or class of resource affected by the event. For example, Collectors, Sources, or Organizations.
124+
- **Resource ID**. A unique identifier for the affected resource.
125+
- **Created At**. The timestamp indicating when the event was generated by the monitoring system.
126+
- **Collector ID**. The unique identifier of the collector that detected and reported the event. This field is only available for the *Source* resource type.
127+
- **Collector Name**. The name of the collector associated with the event. This field is only available for the *Source* resource type.
128+
- **Error**. A brief summary or title of the detected issue.
129+
- **Service**. Displays the specific resource or service affected by the event.
130+
- **Error Code**. A numeric code associated with the error that provides a quick reference for troubleshooting or mapping to known issue types.
131+
- **Error Info**. Detailed information about the event. This may include error context and suggested corrective actions.
132+
- **Minutes Since Last Heartbeat**. The number of minutes that have elapsed since the system last received a heartbeat signal from the resource. A higher number may indicate the resource is offline or unresponsive. This field is only available for the *Collector* resource type.
133+
134+
## View Health Events in Collection page
135+
136+
A **Health** column on the Collection page shows color-coded healthy, error, and warning states for Collectors and Sources so you can quickly determine their health.<br/><img src={useBaseUrl('/img/health-events/Collection-health-column.png')} alt="Collection-health-column" style={{border: '1px solid gray'}} width="800"/>
137+
138+
To view the number of health events associated with the Collector or Source, perform the following steps:
139+
140+
1. Hover over a **Health** status to view a tooltip that provides the number of health events detected on the selected Collector or Source. <br/><img src={useBaseUrl('/img/health-events/health_tooltip.png')} alt="health_tooltip" style={{border: '1px solid gray'}} width="200"/>
141+
1. Click on the **Health** status of a Collector or Source to view a pop-up displaying a list of related events. <br/><img src={useBaseUrl('/img/health-events/object_event_details.png')} alt="object_event_details" style={{border: '1px solid gray'}} width="500"/>
142+
114143
## Search health events
115144

116-
To search all health events run a query against the internal partition
117-
named **sumologic_system_events**. For example,
145+
Events are indexed and searchable in a separate partition named `sumologic_system_events` in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index). To search all health events, run a query against this partition. For example:
118146

119147
```sql
120148
_index=sumologic_system_events "Health-Change"
@@ -130,17 +158,6 @@ Creating a query that defines built-in metadata field values in the scope can he
130158

131159
| **Metadata Field** | **Assignment Description** |
132160
|:--|:--|
133-
| _sourceCategory | Value of the [common parameter](#common-parameters)`subsystem`. |
134-
| _sourceName | Value of the [common parameter](#common-parameters), `eventName`. |
161+
| _sourceCategory | Value of the [common parameter](#parameters-table), `subsystem`. |
162+
| _sourceName | Value of the [common parameter](#parameters-table), `eventName`. |
135163
| _sourceHost | The remote IP address of the host that made the request. If not available the value will be `no_sourceHost`. |
136-
137-
### Collection page
138-
139-
A **Health** column on the Collection page shows color-coded healthy, error, and warning states for Collectors and Sources so you can quickly determine the health of your Collectors and Sources.
140-
141-
The **status** column now shows the status of Sources manually paused by users.
142-
143-
<img src={useBaseUrl('img/health-events/Collection-health-column.png')} alt="Collection health column" style={{border: '1px solid gray'}} width="800" />
144-
145-
* Hover your mouse over a Collector or Source to view a tooltip that provides the number of health events detected on the Collector or Source.<br/><img src={useBaseUrl('img/health-events/health_tooltip.png')} alt="Health tooltip" style={{border: '1px solid gray'}} width="150" />
146-
* Click on the **Health** status in a row to view a pop-up displaying a list of related events.<br/><img src={useBaseUrl('img/health-events/object_event_details.png')} alt="Object event details" style={{border: '1px solid gray'}} width="500" />

docs/metrics/introduction/metric-formats.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ cluster=cluster-1 node=node-1 cpu=cpu-1 metric=cpu_idle 97.29 1460061337
9797

9898
### Mandatory metric name
9999

100-
Unlike Prometheus, Carbon 2.0 format doesn't enforce the presence of a metric name. It also cannot be reliably inferred automatically. Therefore, Sumo Logic requires a `metric` key to be present among `intrinsic_tags`. All metrics without a `metric` key specified will not be ingested to Sumo Logic and a `MetricsMetricNameMissing` Health Event for the associated Metric Source will be triggered (for more information on Health Events, see [About Health Events](/docs/manage/health-events#health-events)).
100+
Unlike Prometheus, Carbon 2.0 format doesn't enforce the presence of a metric name. It also cannot be reliably inferred automatically. Therefore, Sumo Logic requires a `metric` key to be present among `intrinsic_tags`. All metrics without a `metric` key specified will not be ingested to Sumo Logic and a `MetricsMetricNameMissing` Health Event for the associated Metric Source will be triggered (for more information on Health Events, see [Health Events](/docs/manage/health-events)).
101101

102102
For example, the following metric will be correctly ingested to Sumo Logic:
103103
```

docs/send-data/hosted-collectors/cloud-to-cloud-integration-framework/source-info.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ If the Source has any issues during any one of these states, it is placed in an
1919

2020
When you delete the Source, it is placed in a **Stopping** state. When it has successfully stopped, it is deleted from your Hosted Collector.
2121

22-
On the [Collection page](/docs/manage/health-events#collection-page), the Health and Status for Sources is displayed. Use [Health Events](/docs/manage/health-events.md) to investigate issues with collection.
22+
On the [Collection page](/docs/manage/health-events#view-health-events-in-collection-page), the Health and Status for Sources is displayed. Use [Health Events](/docs/manage/health-events) to investigate issues with collection.
2323

2424
Hover your mouse over the status icon to view a tooltip with a count of the detected errors and warnings. You can click on the status icon to open a Health Events panel with details on each detected issue.
2525
