A system that fetches, transforms, and routes audit logs from multiple SaaS platforms and CSV sources to various destinations. The system normalizes different log formats into a unified schema while providing enrichment capabilities.
Key features:
- Multi-source log ingestion (SaaS APIs, CSV files)
- Log transformation and normalization
- Data enrichment capabilities
- Multi-destination routing
- Configurable processing pipeline
SaaS APIs:
- GitHub
- AWS CloudTrail
- Google Workspace
- Slack
- Jira
- 1Password
- (Extensible for additional sources)
CSV Files:
- Custom format support
- Automatic schema detection
- Configurable field mapping
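The CSV path above (header detection plus configurable field mapping) could be sketched as follows. The column names and the dotted-path mapping convention are illustrative assumptions, not a fixed spec:

```python
import csv
import io

# Hypothetical mapping from CSV column names to unified-schema paths
# (dotted paths denote nesting, e.g. "actor.name").
FIELD_MAP = {
    "time": "timestamp",
    "user": "actor.name",
    "event": "action",
}

def ingest_csv(text: str, field_map: dict) -> list:
    """Read CSV rows (header row detected by DictReader) and remap
    columns into nested event dicts according to `field_map`."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        event = {}
        for column, path in field_map.items():
            target = event
            *parents, leaf = path.split(".")
            for key in parents:
                target = target.setdefault(key, {})
            target[leaf] = row.get(column, "")
        events.append(event)
    return events
```

Because the mapping is data, adding a new custom CSV format is a configuration change rather than a code change.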
Destinations:
- SIEM systems
- Log analytics platforms
- Cloud storage (S3, Azure Blob, etc.)
Unified log schema:
{
  "timestamp": "ISO8601 datetime",
  "source": "string",
  "event_type": "string",
  "actor": {
    "id": "string",
    "name": "string",
    "email": "string",
    "ip_address": "string"
  },
  "target": {
    "id": "string",
    "type": "string",
    "name": "string"
  },
  "action": "string",
  "metadata": {
    "raw_event": "object",
    "enrichments": "object"
  }
}
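A normalizer for one source might map raw events into this schema as below. The raw GitHub field names (`@timestamp` in epoch milliseconds, `actor`, `repo`, `action`) are assumptions about that API for illustration; each source would get its own mapper:

```python
from datetime import datetime, timezone

def normalize_github_event(raw: dict) -> dict:
    """Map a raw GitHub audit-log entry into the unified schema.
    Raw field names here are illustrative assumptions."""
    return {
        # Assumed: GitHub audit logs carry epoch milliseconds in "@timestamp".
        "timestamp": datetime.fromtimestamp(
            raw["@timestamp"] / 1000, tz=timezone.utc
        ).isoformat(),
        "source": "github",
        "event_type": raw.get("action", "unknown"),
        "actor": {
            "id": str(raw.get("actor_id", "")),
            "name": raw.get("actor", ""),
            "email": raw.get("actor_email", ""),
            "ip_address": raw.get("actor_ip", ""),
        },
        "target": {
            "id": str(raw.get("repo_id", "")),
            "type": "repository",
            "name": raw.get("repo", ""),
        },
        "action": raw.get("action", ""),
        # Keep the original payload so no information is lost in transit.
        "metadata": {"raw_event": raw, "enrichments": {}},
    }
```

Preserving the full raw event under `metadata.raw_event` lets downstream consumers recover source-specific fields the unified schema does not model.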
Enrichment:
- IP geolocation
- User role mapping
- Asset classification
- Threat intelligence lookups
- Custom enrichment plugins
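One plausible shape for the plugin mechanism is a small interface where each plugin writes one key into `metadata.enrichments`. The class and method names below are hypothetical, not an existing API:

```python
from abc import ABC, abstractmethod

class EnrichmentPlugin(ABC):
    """Hypothetical plugin interface: each plugin adds one named entry
    to event["metadata"]["enrichments"]."""

    name: str

    @abstractmethod
    def enrich(self, event: dict) -> dict:
        ...

class GeoIPPlugin(EnrichmentPlugin):
    name = "geoip"

    def __init__(self, lookup):
        # `lookup` maps an IP string to a location dict; in practice it
        # would wrap a GeoIP database client.
        self.lookup = lookup

    def enrich(self, event: dict) -> dict:
        ip = event.get("actor", {}).get("ip_address")
        if ip:
            event["metadata"]["enrichments"][self.name] = self.lookup(ip)
        return event

def run_enrichments(event: dict, plugins: list) -> dict:
    """Apply each configured plugin in order."""
    for plugin in plugins:
        event = plugin.enrich(event)
    return event
```

Namespacing each plugin's output under its `name` keeps plugins independent and makes their contributions easy to audit.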
Configuration:
- YAML-based configuration
- Environment variable support
- Source-specific credentials management
- Rate limiting and throttling controls
- Retry policies
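A configuration file combining these points might look like the fragment below. The key names and layout are illustrative assumptions, not a fixed spec:

```yaml
# Hypothetical layout; key names are illustrative.
sources:
  github:
    token: ${GITHUB_TOKEN}      # resolved from environment variables
    rate_limit:
      requests_per_minute: 60   # throttling control
    retry:
      max_attempts: 5
      backoff_seconds: 30
destinations:
  s3:
    bucket: audit-logs-archive
```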
Monitoring:
- Health checks
- Performance metrics
- Error tracking
- Processing statistics
- Alerting capabilities
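Processing statistics could start as simply as a counter per source, exported later to whatever metrics backend is chosen. This sketch is illustrative; the class name and fields are assumptions:

```python
import time
from collections import Counter

class PipelineStats:
    """Minimal processing-statistics tracker. A real deployment would
    export these counters to a metrics backend for alerting."""

    def __init__(self):
        self.counts = Counter()
        self.started = time.monotonic()

    def record(self, source: str, ok: bool):
        # Count every processed event, and errors separately.
        self.counts[f"{source}.processed"] += 1
        if not ok:
            self.counts[f"{source}.errors"] += 1

    def snapshot(self) -> dict:
        return {
            "uptime_seconds": round(time.monotonic() - self.started, 1),
            **self.counts,
        }
```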
Security:
- Credential encryption
- TLS for all external communications
- Access control
- Audit logging of system operations
Development:
- Python 3.8+
- Docker support
- CI/CD pipeline integration
- Testing framework
- Documentation requirements
Deployment options:
- Docker containers
- Kubernetes
- Serverless functions
- On-premise installation
Future enhancements:
- Real-time processing
- Machine learning integration
- Advanced analytics
- Custom plugin framework
- Horizontal scaling
Event ordering:
Log fetchers must return events in chronological order (oldest first). This matters for:
- Consistent processing of events
- Correct state management in Windmill scripts
- Reliable incremental fetching
For example, a fetch covering 13:00 to 14:00 UTC should yield timestamps in the schema's ISO8601 form, in ascending order:
[
  {"timestamp": "2024-01-01T13:00:00Z", ...},
  {"timestamp": "2024-01-01T13:00:05Z", ...},
  {"timestamp": "2024-01-01T13:15:30Z", ...},
  {"timestamp": "2024-01-01T13:59:59Z", ...}
]
This ordering requirement applies both to:
- Events within a single page of results
- Events across multiple pages when pagination is used
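A paginated fetcher can enforce this contract at drain time. The `fetch_page(start, end, cursor)` interface below is an assumption for illustration, returning a page of events plus an opaque continuation cursor:

```python
def fetch_all(fetch_page, start, end):
    """Drain a paginated source, verifying the chronological-order
    contract both within and across pages.

    `fetch_page(start, end, cursor)` is an assumed interface returning
    (events, next_cursor); events carry ISO8601 "timestamp" fields,
    which compare correctly as strings.
    """
    events, cursor = [], None
    while True:
        page, cursor = fetch_page(start, end, cursor)
        events.extend(page)
        if cursor is None:
            break
    # Fail fast rather than corrupt downstream incremental state.
    for earlier, later in zip(events, events[1:]):
        if earlier["timestamp"] > later["timestamp"]:
            raise ValueError("events are not in chronological order")
    return events
```

Failing loudly here protects the Windmill scripts' incremental state: a silently mis-ordered batch would otherwise move the high-water mark past unprocessed events.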