Feature Description
Introduce a configurable log processing pipeline that automatically parses, transforms, and enriches log data at ingestion time — turning unstructured log lines from existing systems into fully queryable, structured entries without requiring application changes.
Problem/Use Case
Many teams want to use Logtide to centralize logs from existing infrastructure (nginx, syslog, legacy applications) that emit unstructured or semi-structured text. Today, these logs land in Logtide as raw message strings with no queryable fields beyond the top-level ones. You can't filter by HTTP status code, response time, or request path unless the application already emits structured JSON — which most legacy systems don't.
The current PII masking applies to raw messages, but there's no way to first extract fields and then mask sensitive values within them.
Proposed Solution
- Pipeline definitions: per-project configuration (YAML or UI-based) defining an ordered sequence of processing steps applied to incoming logs before storage
- Built-in parsers for common log formats: nginx access log, Apache combined log, syslog, logfmt, and auto-detected JSON
- Custom pattern matching: grok-like named captures that extract fields from arbitrary log lines and promote them to queryable metadata (e.g. extracting `http_status`, `response_time_ms`, and `request_path` from a raw nginx line)
- Enrichment steps: GeoIP lookup for extracted IP fields, service name normalization, environment tag injection based on source
- Pipeline ordering with PII masking: pipeline steps execute first to extract structured fields, then the existing PII masking rules apply on top of the extracted data
- Preview UI: paste a sample log line and see the parsed output in real time before saving the pipeline configuration
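As a rough sketch of what a per-project pipeline definition could look like in code, here is a TypeScript model of the ordered steps. The type names, step shapes, and field names are illustrative assumptions, not a committed schema:

```typescript
// Hypothetical shape for a per-project pipeline definition.
// Step names and fields are illustrative only.
type PipelineStep =
  | { type: "parser"; format: "nginx" | "apache" | "syslog" | "logfmt" | "json" }
  | { type: "grok"; pattern: string }
  | { type: "geoip"; sourceField: string; targetField: string }
  | { type: "tag"; key: string; value: string };

interface PipelineConfig {
  project: string;
  steps: PipelineStep[]; // executed in order, before PII masking runs
}

const examplePipeline: PipelineConfig = {
  project: "edge-proxy",
  steps: [
    { type: "parser", format: "nginx" },
    { type: "geoip", sourceField: "remote_addr", targetField: "geo" },
    { type: "tag", key: "environment", value: "production" },
  ],
};

console.log(examplePipeline.steps.length); // 3
```

A discriminated union like this keeps each step's options type-checked while still serializing cleanly to the JSON stored in the database.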
Alternatives Considered
- Logstash/Fluentd as a sidecar: requiring users to run a log shipper that pre-processes logs before sending to Logtide. Rejected because it adds operational complexity and goes against the goal of simplifying the self-hosted stack.
- Client-side SDK enrichment: asking developers to structure logs in the SDK before sending. This works for greenfield apps but doesn't help with existing infrastructure logs.
Implementation Details (Optional)
- Pipelines should run inside BullMQ ingestion workers, not in the HTTP ingest handler, to avoid adding latency to the ingest endpoint and to allow retries on processing failures
- Pipeline configurations stored in the database as JSON, editable via the settings UI and optionally importable from YAML files
- The grok pattern engine can be implemented with a well-tested library (e.g. `node-grok`) or a custom named-regex approach to keep the dependency footprint small
- Parsed fields should be stored in the existing `metadata` JSONB column on the logs hypertable, so no schema changes are required
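The custom named-regex approach could be sketched roughly as below, using JavaScript's native named capture groups. The pattern, the field names, and the assumption that the line carries a trailing request time (e.g. via nginx's `$request_time`) are all illustrative; a real pipeline step would compile patterns from the stored configuration:

```typescript
// Minimal sketch of a named-regex field extractor for nginx-style access
// lines. Captured group names become the queryable metadata fields.
const NGINX_ACCESS = new RegExp(
  String.raw`^(?<remote_addr>\S+) \S+ \S+ \[(?<time_local>[^\]]+)\] ` +
  String.raw`"(?<method>\S+) (?<request_path>\S+) [^"]*" ` +
  String.raw`(?<http_status>\d{3}) (?<body_bytes>\d+)` +
  String.raw`(?: (?<response_time_ms>\d+(?:\.\d+)?))?`
);

function parseNginxLine(line: string): Record<string, string> | null {
  const match = NGINX_ACCESS.exec(line);
  if (!match?.groups) return null; // unparseable lines pass through unchanged
  // Drop undefined optional captures so only extracted fields reach metadata.
  return Object.fromEntries(
    Object.entries(match.groups).filter(([, v]) => v !== undefined)
  ) as Record<string, string>;
}

const fields = parseNginxLine(
  '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api/health HTTP/1.1" 200 512 12.5'
);
console.log(fields?.http_status);      // "200"
console.log(fields?.request_path);     // "/api/health"
console.log(fields?.response_time_ms); // "12.5"
```

Returning `null` for non-matching lines lets the pipeline fall back to storing the raw message, so a bad pattern never drops logs.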
Priority
- Critical - Blocking my usage of Logtide
- High - Would significantly improve my workflow
- Medium - Nice to have
- Low - Minor enhancement
Target Users
- DevOps Engineers
- Developers
- Security/SIEM Users
- System Administrators
- All Users
Additional Context
This feature significantly lowers the barrier to adopting Logtide for teams with existing infrastructure, removing the "you need to refactor your logging first" objection. The preview UI is particularly important for usability — grok-style patterns are notoriously hard to write without immediate feedback.
Contribution
- I would like to work on implementing this feature