@@ -159,7 +159,9 @@ Supported write options:
### Reading from Microsoft Sentinel / Azure Monitor
The data source supports both batch and streaming reads from Azure Monitor / Log Analytics workspaces using KQL (Kusto Query Language) queries. If schema isn't specified with `.schema`, it will be inferred automatically.
```python
query = "MyCustomTable_CL | where TimeGenerated > ago(1h)"
```
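As a sketch, a batch read could look like the following. The format name `"azure-monitor"` is an assumption (this section does not state the registered name for this source; substitute whatever the package actually registers), and the credential values are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Batch read from a Log Analytics workspace. The schema is inferred
# automatically unless one is supplied via .schema(...).
df = (
    spark.read.format("azure-monitor")  # assumed format name
    .option("workspace_id", "<workspace-id>")
    .option("query", "MyCustomTable_CL | where TimeGenerated > ago(1h)")
    .option("tenant_id", "<tenant-id>")
    .option("client_id", "<client-id>")
    .option("client_secret", "<client-secret>")
    .load()
)
df.show()
```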
#### Streaming Read
The data source supports streaming reads from Azure Monitor / Log Analytics. The streaming reader uses time-based offsets to track progress and splits time ranges into partitions for parallel processing.
Supported streaming options:

- `workspace_id` (string, required) - Log Analytics workspace ID
- `query` (string, required) - KQL query to execute (should not include time filters - these are added automatically)
- `start_time` (string, optional, default: "latest") - Start time in ISO 8601 format (e.g., "2024-01-01T00:00:00Z"). Use "latest" to start from the current time
- `partition_duration` (int, optional, default: 3600) - Duration in seconds for each partition (controls parallelism)
- `tenant_id` (string, required) - Azure Tenant ID
- `client_id` (string, required) - Application ID (client ID) of Azure Service Principal
- `client_secret` (string, required) - Client Secret of Azure Service Principal
- `checkpointLocation` (string, required) - Directory path for Spark streaming checkpoints
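A minimal streaming read, as a sketch under the same assumption about the format name (`"azure-monitor"` is not confirmed by this section), with the checkpoint passed on the write side as is standard for Spark Structured Streaming:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Streaming read: the reader injects TimeGenerated filters from its
# offsets, so the query itself carries no time filter.
stream = (
    spark.readStream.format("azure-monitor")  # assumed format name
    .option("workspace_id", "<workspace-id>")
    .option("query", "MyCustomTable_CL")
    .option("start_time", "latest")       # begin at the current time
    .option("partition_duration", 3600)   # one-hour partitions
    .option("tenant_id", "<tenant-id>")
    .option("client_id", "<client-id>")
    .option("client_secret", "<client-secret>")
    .load()
)

# Progress (the last processed timestamp) is tracked in the checkpoint.
query = (
    stream.writeStream.format("console")
    .option("checkpointLocation", "/tmp/azure-monitor-checkpoint")
    .start()
)
```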
**Important notes for streaming:**

- The reader automatically tracks the timestamp of the last processed data in checkpoints
- Time ranges are split into partitions based on `partition_duration` for parallel processing
- The query should NOT include time filters (e.g., `where TimeGenerated > ago(1d)`) - the reader adds these automatically based on offsets
- Use `start_time: "latest"` to begin streaming from the current time (useful for monitoring real-time data)
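To make the partitioning behaviour concrete, the following self-contained sketch shows how a time range can be split into windows of at most `partition_duration` seconds, one window per partition. This is an illustration of the mechanism described above, not the connector's actual implementation:

```python
from datetime import datetime, timedelta

def split_time_range(start, end, partition_duration):
    """Split [start, end) into consecutive windows of at most
    partition_duration seconds; each window maps to one partition."""
    step = timedelta(seconds=partition_duration)
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows

# A 2.5-hour range with the default 3600-second duration yields three
# partitions: two full hours and one final half-hour window.
for lo, hi in split_time_range(
    datetime(2024, 1, 1, 0, 0, 0),
    datetime(2024, 1, 1, 2, 30, 0),
    3600,
):
    print(lo.isoformat(), "->", hi.isoformat())
```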
## Simple REST API
Right now this source only implements writing to arbitrary REST APIs - both batch & streaming. The registered data source name is `rest`.