A flexible synthetic data generator in Go, designed to produce realistic logs and metrics for testing and development.
Requires Go version 1.24+
- Clone the repository (alternatively, download the latest release)
- Copy `config.sample.yaml` and edit as needed
- Start the data generator with the config file as the only parameter:

```sh
go run cmd/main.go --config ./config.yaml
```
Tip: Use the `--debug` flag for debug-level logs.
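For example, a debug run against the sample config might look like this (a sketch, using only the flags documented above):

```sh
# Run the generator with verbose, debug-level logging enabled
go run cmd/main.go --config ./config.yaml --debug
```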
The supported configuration options are described below; check `config.sample.yaml` for reference.

The supported input options and their environment variable overrides are:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `type` | `ENV_INPUT_TYPE` | Specifies the input data type (e.g., `LOGS`, `METRICS`, `ALB`, `NLB`, `VPC`, `CLOUDTRAIL`, `WAF`). |
| `delay` | `ENV_INPUT_DELAY` | Delay between data points. Accepts values such as `5s` (5 seconds) or `10ms` (10 milliseconds). |
| `batching` | `ENV_INPUT_BATCHING` | Time delay between data batches. Accepts a duration value like `delay`. Defaults to `0` (no batching). Note: batching is most effective with bulk ingest endpoints such as S3 and Firehose; for CloudWatch Logs it may not be suitable, since it concatenates multiple log entries into single messages. |
| `max_batch_size` | `ENV_INPUT_MAX_BATCH_SIZE` | Maximum byte size of a batch. Default is no maximum. |
| `max_data_points` | `ENV_INPUT_MAX_DATA_POINTS` | Maximum number of data points to generate. Default is no limit. |
| `max_runtime` | `ENV_INPUT_MAX_RUNTIME` | Maximum duration the load generator will run. Default is no limit. |
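Any of these options can be set from the environment. As a sketch (assuming, as the term "override" suggests, that environment variables take precedence over values in the config file):

```sh
# Override the input type and delay without editing config.yaml
ENV_INPUT_TYPE=ALB ENV_INPUT_DELAY=100ms go run cmd/main.go --config ./config.yaml
```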
Notes about input types:

- `LOGS`: ECS (Elastic Common Schema) formatted logs based on zap
- `METRICS`: Generates metrics similar to a CloudWatch metrics entry
- `ALB`: Generates AWS ALB formatted logs with some random content
- `NLB`: Generates AWS NLB formatted logs with some random content
- `VPC`: Generates AWS VPC formatted logs with randomized content
- `CLOUDTRAIL`: Generates AWS CloudTrail formatted logs with randomized content. Data is generated for the AWS S3 Data Event.
- `WAF`: Generates AWS WAF formatted logs with randomized content
Example:

```yaml
input:
  type: LOGS              # Input type LOGS
  delay: 500ms            # 500 milliseconds between each data point
  batching: 10s           # Emit generated data batched within 10 seconds
  max_batch_size: 10000   # Limit maximum batch size to 10,000 bytes, capping output at ~1,000 bytes/second
  max_data_points: 10000  # Exit after generating 10,000 data points
```

Tip: When `max_batch_size` is reached, the elapsed batching time is taken into account before new data is generated.
The supported output types (environment variable `ENV_OUT_TYPE`) are:

- `FILE`: Output to a file
- `FIREHOSE`: Output to a Firehose stream
- `CLOUDWATCH_LOG`: Output to a CloudWatch log group
- `S3`: Output to an S3 bucket
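The output type can likewise be switched from the environment. A sketch (the `./debug-out` path is purely illustrative):

```sh
# Redirect output to a local file instead of the configured destination
ENV_OUT_TYPE=FILE ENV_OUT_LOCATION=./debug-out go run cmd/main.go --config ./config.yaml
```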
The sections below describe output-specific configuration.

`FILE` output configuration:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `location` | `ENV_OUT_LOCATION` | Output file location. Defaults to `./out`. When batching, the file suffix increments with numbers (e.g., `out_0`, `out_2`). |
Example:

```yaml
output:
  type: FILE
  config:
    location: "./data"
```

`S3` output configuration:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `s3_bucket` | `ENV_OUT_S3_BUCKET` | S3 bucket name (required). |
| `compression` | `ENV_OUT_COMPRESSION` | Whether to compress the output. Currently supports `gzip`. |
| `path_prefix` | `ENV_OUT_PATH_PREFIX` | Optional prefix for the bucket entry. Defaults to `logFile-`. |
Example:

```yaml
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: gzip
    path_prefix: "datagen"
```

`FIREHOSE` output configuration:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `stream_name` | `ENV_OUT_STREAM_NAME` | Firehose stream name (required). |
Example:

```yaml
output:
  type: FIREHOSE
  config:
    stream_name: "my-firehose-stream"
```

`CLOUDWATCH_LOG` output configuration:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `log_group` | `ENV_OUT_LOG_GROUP` | CloudWatch log group name. |
| `log_stream` | `ENV_OUT_LOG_STREAM` | Log group stream name. |
Example:

```yaml
output:
  type: CLOUDWATCH_LOG
  config:
    log_group: "MyGroup"
    log_stream: "data"
```
Note: The CloudWatch Logs API (`PutLogEvents`) is optimized for single log messages per API call. When batching is enabled, multiple log entries are concatenated into a single message, which may not be ideal for log analysis and searching in CloudWatch. For CloudWatch destinations, consider setting `batching: 0s` (no batching) or using a small delay without batching. Batching is better suited to bulk ingest endpoints such as S3 and Firehose.
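Following that recommendation, a CloudWatch run might disable batching entirely via environment overrides (a sketch using only the options documented above; the group and stream names are placeholders):

```sh
# Send each log entry as its own PutLogEvents message (no batching)
ENV_INPUT_BATCHING=0s ENV_OUT_TYPE=CLOUDWATCH_LOG \
ENV_OUT_LOG_GROUP=MyGroup ENV_OUT_LOG_STREAM=data \
go run cmd/main.go --config ./config.yaml
```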
Currently, this project only supports AWS as the cloud service provider (CSP). The available configurations are:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `region` | `AWS_REGION` | Region used by exporters. Defaults to `us-east-1`. |
| `profile` | `AWS_PROFILE` | Credential profile used by exporters. Defaults to `default`. |
Example:

```yaml
aws:
  region: "us-east-1"
  profile: "default"
```

Generate ECS-formatted logs every 2 seconds, batch them over 10 seconds, and forward them to an S3 bucket:
```yaml
input:
  type: LOGS
  delay: 2s
  batching: 10s
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
```

Generate ALB logs with no delay between data points (continuous generation). Batching is limited to 10 seconds and the maximum batch size to 10 MB, which translates to a load of roughly 1 MB/second. S3 files will be gzip-compressed:
```yaml
input:
  type: ALB
  delay: 0s
  batching: 10s
  max_batch_size: 10000000
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate VPC logs, limit to 2 data points, and upload them to S3 in gzip format:
```yaml
input:
  type: VPC
  delay: 1s
  max_data_points: 2
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate CLOUDTRAIL logs and limit the generator runtime to 5 minutes:
```yaml
input:
  type: CLOUDTRAIL
  delay: 10us     # 10 microseconds between data points
  batching: 10s
  max_runtime: 5m # 5 minutes
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```
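As one more illustration, the documented `METRICS` input and `FILE` output can be combined entirely through environment overrides (a sketch with hypothetical values; the `./metrics-out` path is a placeholder):

```sh
# Generate CloudWatch-style metric entries once per second into a local file
ENV_INPUT_TYPE=METRICS ENV_INPUT_DELAY=1s \
ENV_OUT_TYPE=FILE ENV_OUT_LOCATION=./metrics-out \
go run cmd/main.go --config ./config.yaml
```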