Description
Context
It's not uncommon for documents to have fields with dots in them:
{
"nested": {
"a.b.c": "This is a test"
}
}
When specifying a field in a processor (e.g. grok, rename or others), it's currently not possible to target these fields, because dots are always interpreted as nested objects. { "grok": { "field": "nested.a.b.c" }} will only work on { "nested": { "a": { "b": { "c": "This is a test" } } } }.
This is especially relevant for OTel data and the streams project, which plans to transform all incoming data to match the OTel format.
Solution
A new syntax should be introduced to allow accessing these fields in all processors. Dots are interpreted as nested objects except when enclosed in [' and ']:
{ "grok": { "field": "nested['a.b.c']" }}
Some examples:
"resource.attributes['bar.foo']" // matches {"resource": {"attributes": {"bar.foo": "…"}}}
"['resource']['attributes']['bar.foo']" // same as above
"resource.attributes.bar.foo" // matches {"resource": {"attributes": {"bar": {"foo": "…"}}}}
"['resource']['attributes']['bar']['foo']" // matches {"resource": {"attributes": {"bar": {"foo": "…"}}}}
"['resource.attributes']['bar.foo']" // matches {"resource.attributes": {"bar.foo": "…"}}
"['resource.attributes.bar.foo']" // matches {"resource.attributes.bar.foo": "…"}
It's possible to escape quotes inside the bracketed segment using \ to still access field names with brackets in them:
my['weird[\'fieldname\']'] // matches { "my": { "weird['fieldname']": "..." } }
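The rules above can be sketched as a small path parser. This is only an illustration of the proposed syntax, not the implementation from the POC; parse_field_path and resolve are hypothetical names, and details such as how strictly malformed paths are rejected are assumptions.

```python
def parse_field_path(path: str) -> list[str]:
    """Split a field path into segments (hypothetical helper).

    Dots separate segments, except inside ['...'], where the text is taken
    literally. Inside the quotes, a backslash escapes the next character.
    """
    segments: list[str] = []
    i, n = 0, len(path)
    while i < n:
        if path.startswith("['", i):
            # Bracketed segment: read until the closing '] (honouring escapes).
            i += 2
            buf: list[str] = []
            while not path.startswith("']", i):
                if i >= n:
                    raise ValueError(f"unterminated bracket segment in {path!r}")
                if path[i] == "\\" and i + 1 < n:
                    i += 1  # skip the backslash, keep the escaped character
                buf.append(path[i])
                i += 1
            i += 2  # consume the closing ']
            segments.append("".join(buf))
        else:
            # Plain segment: read until the next dot or opening ['.
            j = i
            while j < n and path[j] != "." and not path.startswith("['", j):
                j += 1
            if j == i:
                raise ValueError(f"empty segment in {path!r}")
            segments.append(path[i:j])
            i = j
        # A dot after a segment only acts as a separator.
        if i < n and path[i] == ".":
            i += 1
            if i == n:
                raise ValueError(f"trailing dot in {path!r}")
    return segments


def resolve(doc: dict, path: str):
    """Walk a document along the parsed segments (hypothetical helper)."""
    current = doc
    for key in parse_field_path(path):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(path)
        current = current[key]
    return current
```

With this sketch, "resource.attributes['bar.foo']" parses to the segments resource, attributes and bar.foo, while "resource.attributes.bar.foo" parses to four segments and therefore only matches the fully nested document.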
Open questions
How does this syntax interact with Mustache templates, which are supported in some cases? For the scope of the observability team, it would be OK not to support this initially; it could be added later on.
Breaking change
This feature constitutes a change of behavior: using [' followed by '] in a field name specified in an ingest pipeline is currently allowed, and these sequences are treated as regular characters. However, these cases are expected to be very rare.
Draft for breaking change proposal: https://github.com/elastic/dev/issues/3091
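To get a rough sense of whether an existing pipeline would be affected, a definition could be scanned for field names that already contain [' followed by ']. The following is a minimal sketch under stated assumptions: find_affected_fields is a hypothetical helper, it only looks at the field and target_field options of top-level processors, and it ignores nested processors (for example inside on_failure).

```python
import re

# Matches [' followed later by '] inside a single field name.
BRACKET_QUOTE = re.compile(r"\['.*?'\]")

# Assumption: only these two options are checked; other options that take
# field paths would need to be added here.
FIELD_OPTIONS = ("field", "target_field")


def find_affected_fields(pipeline: dict) -> list[tuple[str, str, str]]:
    """Return (processor type, option, value) for field names whose meaning
    would change under the proposed syntax (hypothetical helper)."""
    hits = []
    for processor in pipeline.get("processors", []):
        for proc_type, options in processor.items():
            if not isinstance(options, dict):
                continue
            for option in FIELD_OPTIONS:
                value = options.get(option)
                if isinstance(value, str) and BRACKET_QUOTE.search(value):
                    hits.append((proc_type, option, value))
    return hits
```

An empty result does not prove a pipeline is unaffected, since the sketch skips nested processors and other options.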
Why not dot_expander?
The dot_expander processor addresses a similar need by normalizing the data instead of letting the user specify which form they mean. However, it has some downsides which are unacceptable in some cases:
- Not possible to have a prefix of a dotted field name as a primitive value (especially in OTel this is a common format):
{
"host": "abc",
"host.name": "def" // can't be dot-expanded without breaking host
}
- Possible collisions:
{
"host": { "name": "abc" },
"host.name": "def"
}
- Different from OTTL, which allows this style of access
- Changes the shape of the data, which loses information: after expansion it is impossible to tell dotted field names and nested field names apart
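The first, second and last downsides can be reproduced with a naive expansion routine. The sketch below is a stand-in written for illustration only; naive_dot_expand is a hypothetical function and does not reflect how the dot_expander processor is actually implemented or configured.

```python
import copy


def naive_dot_expand(doc: dict) -> dict:
    """Expand top-level dotted keys into nested objects (illustrative only)."""
    out: dict = {}
    for key, value in doc.items():
        parts = key.split(".")
        target = out
        for part in parts[:-1]:
            child = target.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. "host" already holds "abc", so "host.name" has no parent object
                raise ValueError(f"cannot expand {key!r}: {part!r} holds a primitive value")
            target = child
        leaf = parts[-1]
        if leaf in target:
            # e.g. {"host": {"name": ...}} and "host.name" both map to the same path
            raise ValueError(f"cannot expand {key!r}: collides with an existing field")
        target[leaf] = copy.deepcopy(value)
    return out
```

Both example documents above raise an error in this sketch, and {"a.b": 1} and {"a": {"b": 1}} expand to the same result, which is the information loss described in the last bullet.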
References
POC: #125566
Discussion: https://github.com/elastic/streams-program/discussions/224