Commit ae0bf4b

New Features (#5)
* update version
* looping for over 5k results
* add relative to epoch
* Raise maximum query results
* added import
* loop for over 5k results
* update maxcount help text
* add columns
* formatting
* columns regex validator
* consolidate timestamp logic
* support nanoseconds
* add columns
* update no columns and no timestamp logic
* add columns
* set PowerQuery string required
* update r_json if then logging logic
* write event before checkpoint
* powerQuery input
* powerQuery help text
* powerquery
* timeseries
* timeseries
* update timeseries
* update doc
* add timeseries
* timeseries
* add powerquery input and timeseries search
* lowercase method name

Co-authored-by: Mike McGrail <[email protected]>
1 parent 7383249 commit ae0bf4b

14 files changed: +750 / -178 lines

.DS_Store

0 Bytes
Binary file not shown.

README.md

Lines changed: 15 additions & 4 deletions
@@ -81,14 +81,20 @@ The DataSet Add-on for Splunk collects the following inputs utilizing time-based
 | dataset:query | User-defined standard [query](https://app.scalyr.com/help/api#query) API call to index events | - |

 ## SPL Command
-The `| dataset` command allows queries against the DataSet API directly from Splunk's search bar. Five optional parameters are supported:
+The `| dataset` command allows queries against the DataSet API directly from Splunk's search bar. The following optional parameters are supported:

-- **method** - Define `query` or `powerQuery` to call the appropriate REST endpoint. Default is query.
-- **query** - The DataSet [query](https://app.scalyr.com/help/query-language) or Power Query []() used to filter events. Default is no filter (return all events limited by maxCount).
-- **maxcount** - Number of events to return from DataSet. Default is 100.
+- **method** - Define `query`, `powerquery` or `timeseries` to call the appropriate REST endpoint. Default is query.
+- **query** - The DataSet [query](https://app.scalyr.com/help/query-language) or filter used to select events. Default is no filter (return all events limited by maxcount).
+- **maxcount** - Number of events to return from a DataSet query or powerquery. Default is 100. Not used for timeseries.
 - **starttime** - The Splunk time picker can be used (not "All Time"), but if starttime is defined it will take precedence to define the [start time](https://app.scalyr.com/help/time-reference) for DataSet events to return. Use epoch time or relative shorthand in the form of a number followed by d, h, m or s (for days, hours, minutes or seconds), e.g.: `24h`. Default is 24h.
 - **endtime** - The Splunk time picker can be used (not "All Time"), but if endtime is defined it will take precedence to define the [end time](https://app.scalyr.com/help/time-reference) for DataSet events to return. Use epoch time or relative shorthand in the form of a number followed by d, h, m or s (for days, hours, minutes or seconds), e.g.: `5m`. Default is current time at search.

+For timeseries queries, additional parameters include:
+- **function** - Define the value to compute from matching events. Default is rate.
+- **buckets** - The number of numeric values to return, obtained by dividing the time range into equal slices. Default is 1.
+- **createsummaries** - Specify whether to create summaries that are automatically updated by the ingestion pipeline. Default is true; *be sure to set it to false for one-off queries or while testing new queries*.
+- **onlyusesummaries** - Specify whether to use only preexisting timeseries, for the fastest speed.
+
 For all queries, be sure to `"`wrap the entire query in double quotes`"`, and inside use `'`single quotes`'` or double quotes `\"`escaped with a backslash`\"`, as shown in the following examples.

 Query Example:
@@ -116,6 +122,11 @@ Since events are returned in JSON format, the Splunk [spath command](https://doc
 | collect index=dataset
 ```

+Timeseries Query Example:
+```
+| dataset method=timeseries search="serverHost='scalyr-metalog'" function="p90(delayMedian)" starttime="24h" buckets=24 createsummaries=false onlyusesummaries=false
+```
+
 ## Alert Action
 An alert action allows sending an event to the DataSet [addEvents API](https://app.scalyr.com/help/api#addEvents).

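The quoting rule for `| dataset` is simple: wrap the whole value in double quotes, and inside it use single quotes or backslash-escaped double quotes. Scripts that generate these searches can apply the rule mechanically. The following is a minimal Python sketch. `quote_spl_value` and `build_dataset_search` are hypothetical helpers, not part of the add-on. The sketch also assumes the command accepts every value in double quotes, as it does for `starttime="24h"` in the examples.

```python
def quote_spl_value(value):
    """Wrap a value in double quotes, escaping any double quotes inside it with a backslash."""
    escaped = str(value).replace('"', '\\"')
    return '"' + escaped + '"'


def build_dataset_search(**params):
    """Build a `| dataset` search string from keyword arguments.

    Hypothetical helper (not part of the add-on): names are passed through
    unchanged, so use the documented parameter names (method, query, maxcount, ...).
    """
    parts = ["| dataset"]
    for name, value in params.items():  # keyword order is preserved (Python 3.7+)
        parts.append("%s=%s" % (name, quote_spl_value(value)))
    return " ".join(parts)


search = build_dataset_search(
    method="query",
    query='serverHost="scalyr-metalog"',
    maxcount=10,
)
print(search)
```

This prints `| dataset method="query" query="serverHost=\"scalyr-metalog\"" maxcount="10"`. Writing the filter with single quotes (`serverHost='scalyr-metalog'`) avoids escaping entirely.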
globalConfig.json

Lines changed: 140 additions & 4 deletions
@@ -2,7 +2,7 @@
 "meta": {
 "name": "TA-dataset",
 "displayName": "DataSet Add-on for Splunk",
-"version": "1.0.0",
+"version": "1.1.0",
 "restRoot": "TA_dataset",
 "schemaVersion": "0.0.3"
 },
@@ -356,20 +356,156 @@
 }
 ]
 },
+{
+"field": "dataset_query_columns",
+"label": "Columns",
+"help": "If left blank, all columns are returned.",
+"required": false,
+"type": "text",
+"validators": [
+{
+"type": "string",
+"minLength": 0,
+"maxLength": 8192,
+"errorMsg": "Max length of text input is 8192"
+},
+{
+"type": "regex",
+"pattern": "^(\\w+,\\s*)*\\w+$",
+"errorMsg": "Column names must be comma separated."
+}
+]
+},
 {
 "field": "max_count",
 "label": "Max Count",
-"help": "Specifies the maximum number of records to return, from 1 to 5000. If left blank, the default is 100.",
+"help": "Specifies the maximum number of records to return. If left blank, the default is 100.",
 "required": false,
 "type": "text",
 "validators": [
 {
 "type": "number",
 "range": [
 1,
-5000
+9999999
 ],
-"errorMsg": "Max Count must be 1 - 5000"
+"errorMsg": "Max Count must be a number from 1 to 9999999"
+}
+]
+}
+]
+},
+{
+"name": "dataset_powerquery",
+"title": "DataSet PowerQuery",
+"entity": [
+{
+"field": "name",
+"label": "Name",
+"type": "text",
+"help": "Enter a unique name for the data input",
+"required": true,
+"validators": [
+{
+"type": "regex",
+"pattern": "^[a-zA-Z]\\w*$",
+"errorMsg": "Input Name must start with a letter, followed by letters, digits or underscores."
+},
+{
+"type": "string",
+"minLength": 1,
+"maxLength": 100,
+"errorMsg": "Length of input name should be between 1 and 100"
+}
+]
+},
+{
+"field": "interval",
+"label": "Interval",
+"type": "text",
+"required": true,
+"help": "Time interval of input in seconds.",
+"validators": [
+{
+"type": "regex",
+"pattern": "^\\-[1-9]\\d*$|^\\d*$",
+"errorMsg": "Interval must be an integer."
+}
+]
+},
+{
+"field": "index",
+"label": "Index",
+"type": "singleSelect",
+"defaultValue": "default",
+"options": {
+"endpointUrl": "data/indexes",
+"createSearchChoice": true,
+"denyList": "^_.*$"
+},
+"required": true,
+"validators": [
+{
+"type": "string",
+"minLength": 1,
+"maxLength": 80,
+"errorMsg": "Length of index name should be between 1 and 80."
+}
+]
+},
+{
+"field": "start_time",
+"label": "Start Time",
+"help": "Relative time to query back. Use short form relative time, e.g.: 24h or 30d. Reference https://app.scalyr.com/help/time-reference.",
+"required": true,
+"type": "text",
+"defaultValue": "5m",
+"validators": [
+{
+"type": "string",
+"minLength": 0,
+"maxLength": 8192,
+"errorMsg": "Max length of text input is 8192"
+},
+{
+"type": "regex",
+"pattern": "^\\d+(d|h|m|s)$",
+"errorMsg": "Start time must be a number followed by one of: d, h, m, s."
+}
+]
+},
+{
+"field": "end_time",
+"label": "End Time",
+"help": "If left blank, present time at query execution is used. If defined, use short form relative time.",
+"required": false,
+"type": "text",
+"validators": [
+{
+"type": "string",
+"minLength": 0,
+"maxLength": 8192,
+"errorMsg": "Max length of text input is 8192"
+},
+{
+"type": "regex",
+"pattern": "^\\d+(d|h|m|s)$",
+"errorMsg": "End time must be a number followed by one of: d, h, m, s."
+}
+]
+},
+{
+"field": "dataset_query_string",
+"label": "DataSet PowerQuery String",
+"help": "DataSet PowerQuery to return results.",
+"required": true,
+"type": "text",
+"validators": [
+{
+"type": "string",
+"minLength": 0,
+"maxLength": 8192,
+"errorMsg": "Max length of text input is 8192"
 }
 ]
 }
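Python's `re` module offers a quick way to check what the new regex validators accept and reject. The sketch below assumes the framework applies each pattern to the whole field value, as the `^...$` anchors suggest. The patterns are the JSON strings above after decoding, so `\\w` in the JSON becomes `\w`. `is_valid` is an illustrative helper, not part of the add-on.

```python
import re

# Patterns as they appear in globalConfig.json, after JSON decoding.
COLUMNS_PATTERN = re.compile(r"^(\w+,\s*)*\w+$")
RELATIVE_TIME_PATTERN = re.compile(r"^\d+(d|h|m|s)$")
INPUT_NAME_PATTERN = re.compile(r"^[a-zA-Z]\w*$")


def is_valid(pattern, value):
    """Return True only if the entire value matches the validator pattern.

    fullmatch() is used because Python's "$" also matches just before a
    trailing newline, which would otherwise let a value like "24h" plus a
    newline through.
    """
    return pattern.fullmatch(value) is not None


for value in ["timestamp,message", "timestamp, message", "timestamp message"]:
    print("columns", repr(value), is_valid(COLUMNS_PATTERN, value))

for value in ["24h", "30d", "1.5h", "24"]:
    print("time", repr(value), is_valid(RELATIVE_TIME_PATTERN, value))
```

The columns pattern allows optional whitespace after each comma but rejects a trailing comma or space-separated names. The pattern itself also rejects an empty string. Whether a blank optional field is validated at all is up to the framework.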

output/TA-dataset-1.0.0.tar.gz

-2.94 MB
Binary file not shown.

output/TA-dataset-1.1.0.tar.gz

2.95 MB
Binary file not shown.

package/README/inputs.conf.spec

Lines changed: 8 additions & 0 deletions
@@ -2,10 +2,18 @@
 start_time = Start time for the DataSet query to use. Use shortform (e.g.: 1m, 24h, 3d).
 end_time = If left blank, present time at query execution is used.
 dataset_query_string = If left blank, all records (limited by max count) are retrieved.
+dataset_query_columns = If left blank, all columns are retrieved.
 max_count = Specifies the maximum number of records to return, from 1 to 5000. If left blank, the default is 100.
 python.version = {default|python|python2|python3}
 start_by_shell = {true|false}

+[dataset_powerquery://<name>]
+start_time = Start time for the DataSet query to use. Use shortform (e.g.: 1m, 24h, 3d).
+end_time = If left blank, present time at query execution is used.
+dataset_query_string = Required. DataSet PowerQuery to return results.
+python.version = {default|python|python2|python3}
+start_by_shell = {true|false}
+
 [dataset_alerts://<name>]
 start_time = Relative time to query back. Use short form relative time, e.g.: 24h or 30d. Reference https://app.scalyr.com/help/time-reference
 python.version = {default|python|python2|python3}
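For a `dataset_powerquery` stanza, `start_time` is a shortform relative time, and a blank `end_time` means the present time at query execution. The sketch below turns those two settings into an epoch-seconds window. It follows the same arithmetic as `relative_to_epoch` in `dataset_common.py`, but it takes `now` as a parameter so it can be tested. `relative_seconds` and `query_window` are hypothetical names. Two behaviors are choices made for this sketch, not the add-on's behavior: it rejects malformed values instead of treating unknown units as seconds, and it rejects a window whose start is not before its end.

```python
import re
import time

# Seconds per shortform unit; the letters match the inputs' d|h|m|s validator.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 60 * 60, "d": 60 * 60 * 24}
_RELATIVE = re.compile(r"(\d+)([dhms])")


def relative_seconds(shortform):
    """Convert a shortform relative time such as '24h' into seconds."""
    match = _RELATIVE.fullmatch(shortform)
    if match is None:
        raise ValueError("expected a number followed by d, h, m or s, got %r" % shortform)
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


def query_window(start_time, end_time="", now=None):
    """Return (start_epoch, end_epoch) in whole seconds.

    A blank end_time means the present time at query execution.
    `now` can be injected for testing; it defaults to the current epoch second.
    """
    if now is None:
        now = int(time.time())
    end = now - relative_seconds(end_time) if end_time else now
    start = now - relative_seconds(start_time)
    if start >= end:
        raise ValueError("start_time must be further in the past than end_time")
    return start, end


print(query_window("24h", now=1_700_000_000))  # (1699913600, 1700000000)
```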

package/app.manifest

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 "id": {
 "group": null,
 "name": "TA-dataset",
-"version": "1.0.0"
+"version": "1.1.0"
 },
 "author": [
 {

package/bin/dataset_alerts.py

Lines changed: 7 additions & 7 deletions
@@ -91,17 +91,17 @@ def stream_events(self, inputs, ew):
 r_json = r.json() #parse results json

 #log information from results
-if r_json['status']:
+if 'status' in r_json:
 logger.info("response status=%s" % str(r_json['status']))

-if r_json['warnings']:
+if 'warnings' in r_json:
 for warning in r_json['warnings']:
 logger.warning("response warning=%s" % str(warning))

-if r_json['matchingEvents']:
+if 'matchingEvents' in r_json:
 logger.info("response matches=%s" % str(r_json['matchingEvents']))

-if r_json['omittedEvents']:
+if 'omittedEvents' in r_json:
 logger.warning("response omitted=%s" % str(r_json['omittedEvents']))

 #parse results, match returned columns with corresponding values
@@ -125,9 +125,6 @@ def stream_events(self, inputs, ew):

 if event_time > checkpoint_time:
 #if greater than current checkpoint, update checkpoint and write event
-logger.debug("saving checkpoint %s" % (str(event_time)))
-checkpoint.update(input_name, {"timestamp": event_time})
-
 splunk_dt = normalize_time(int(event_time))
 ds_event = json.dumps(ds_event_dict)
 #create and write event
@@ -139,6 +136,9 @@ def stream_events(self, inputs, ew):
 )
 logger.debug("writing event with event_time=%s and checkpoint=%s" % (str(event_time), str(checkpoint_time)))
 ew.write_event(event)
+
+logger.debug("saving checkpoint %s" % (str(event_time)))
+checkpoint.update(input_name, {"timestamp": event_time})
 else:
 logger.debug("skipping due to event_time=%s is less than checkpoint=%s" % (str(event_time), str(checkpoint_time)))
 else:
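After this change, `stream_events` saves the checkpoint only once `ew.write_event` has returned. A failure between the two steps can therefore re-ingest an event on the next run, but it cannot silently skip one. The sketch below shows that ordering together with the `'key' in r_json` style of checks. `FakeWriter`, the plain-dict `checkpoint` and the sample `matches` format (dicts with a nanosecond `timestamp`) are stand-ins invented for illustration. They are not the Splunk SDK's event writer or the add-on's checkpointer. The sketch also records the newest timestamp seen, so it does not depend on the order of the results.

```python
import json
import logging

logger = logging.getLogger("dataset_sketch")


class FakeWriter:
    """Stand-in for an event writer; it just collects events in a list."""

    def __init__(self):
        self.events = []

    def write_event(self, event):
        self.events.append(event)


def process_results(r_json, writer, checkpoint, input_name):
    """Write events newer than the saved checkpoint, advancing it only after each write.

    Assumes (hypothetically) that r_json["matches"] is a list of dicts with a
    nanosecond "timestamp"; `checkpoint` is a plain dict standing in for the checkpointer.
    """
    # Membership tests avoid KeyError when the API omits a field.
    if "warnings" in r_json:
        for warning in r_json["warnings"]:
            logger.warning("response warning=%s", warning)

    saved = checkpoint.get(input_name, 0)
    newest = saved
    for match in r_json.get("matches", []):
        event_time = int(match["timestamp"])
        if event_time <= saved:
            continue  # already indexed by an earlier run
        writer.write_event(json.dumps(match))
        # Record progress only once the event has been written.
        newest = max(newest, event_time)
        checkpoint[input_name] = newest
    return checkpoint
```

If `write_event` raises, the exception propagates before the checkpoint line runs, so the stored value still points at the last event that was written successfully.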

package/bin/dataset_common.py

Lines changed: 32 additions & 1 deletion
@@ -2,6 +2,7 @@
 import os.path as op
 import sys
 import json
+import time
 from collections import OrderedDict

 import import_declare_test
@@ -128,4 +129,34 @@ def get_proxy(session_key, logger):

 def normalize_time(ds_time):
     splunk_dt = ds_time / 1000000000
-    return splunk_dt
+    return splunk_dt
+
+
+def relative_to_epoch(relative):
+    """
+    Return the epoch time (in seconds) corresponding to a relative time.
+    :param relative: shorthand relative timestamp (e.g. "24h" for 24 hours ago)
+    :return: time_relative as integer epoch seconds
+    """
+    relative_num = int(relative[0:-1])
+    relative_unit = relative[-1:]
+    #get current epoch time in seconds
+    time_current = int(time.time())
+    num_seconds = 1
+    if relative_unit == 'm':
+        num_seconds = num_seconds * 60
+    elif relative_unit == 'h':
+        num_seconds = num_seconds * 60 * 60
+    elif relative_unit == 'd':
+        num_seconds = num_seconds * 60 * 60 * 24
+
+    time_relative = time_current - (relative_num * num_seconds)
+    return time_relative
+
+
+def get_maxcount(max):
+    #query API returns max 5,000 results per call
+    if max > 5000:
+        return 5000
+    else:
+        return max
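`get_maxcount` caps each request at 5,000 results because the query API returns at most that many per call. Reaching a larger `max_count` (now allowed up to 9999999) therefore takes several calls, which is the "loop for over 5k results" work in this commit. The sketch below shows only the batching arithmetic and the stopping rule. `batch_sizes` and `collect` are hypothetical helpers. `fetch_page` is a hypothetical callback that returns at most `count` results and is responsible for resuming where the previous call stopped. This diff does not show how the add-on itself resumes.

```python
API_MAX_PER_CALL = 5000  # per-call limit noted in get_maxcount


def batch_sizes(max_count, per_call=API_MAX_PER_CALL):
    """Split a requested max_count into per-request counts no larger than per_call."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    sizes = []
    remaining = max_count
    while remaining > 0:
        size = min(remaining, per_call)
        sizes.append(size)
        remaining -= size
    return sizes


def collect(max_count, fetch_page):
    """Call fetch_page(count) until max_count results are gathered or a short page arrives."""
    results = []
    for size in batch_sizes(max_count):
        page = fetch_page(size)
        results.extend(page)
        if len(page) < size:
            break  # fewer results than requested: nothing more to fetch
    return results


print(batch_sizes(12000))  # [5000, 5000, 2000]
```

Stopping on a short page avoids an extra request when the matching events run out before `max_count` is reached.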
