
VRL function parse_tokens does not handle unescaped "\" #1426

@CyBeRoni


A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When parsing a string from a source that does not treat \ as special, and therefore does not escape it (as \\), parse_tokens does not seem to tokenize that string correctly.

Given the following message (from HAProxy via syslog, with irrelevant parts removed):

{
    "message1": "start \"GET https://vector.dev/\\ HTTP/2.0\" remainder",
    "message2": "start \"GET https://vector.dev/ HTTP/2.0\" remainder"
}

(The quotes and the backslash are escaped here only because of the JSON encoding.)

and the following VRL program:

.tokens_bad = parse_tokens!(.message1)
.tokens_good = parse_tokens!(.message2)

This should produce three tokens for each message, but message1 yields only two, and its second token is mangled: it still contains the surrounding quotes and has swallowed the trailing "remainder":

{
	"tokens_bad": [
		"start",
		"\"GET https://vector.dev/\\ HTTP/2.0\" remainder"
	],
	"tokens_good": [
		"start",
		"GET https://vector.dev/ HTTP/2.0",
		"remainder"
	]
}
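To make the expected behaviour concrete, here is a minimal Rust sketch of a tokenizer that treats \ as an ordinary character inside double quotes. It is a hypothetical illustration, not VRL's actual parse_tokens implementation: the function name tokenize_literal_backslash is made up, and the sketch deliberately ignores the bracket and escape handling the real function may perform. With this rule, both messages split into the three tokens described above.

```rust
/// Hypothetical tokenizer (NOT VRL's parse_tokens): splits on whitespace,
/// treats a double-quoted run as one token with the quotes stripped, and
/// treats a backslash as a literal character rather than an escape.
fn tokenize_literal_backslash(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        // Skip whitespace between tokens.
        if c.is_whitespace() {
            chars.next();
            continue;
        }

        let mut token = String::new();
        if c == '"' {
            chars.next(); // drop the opening quote
            // Take everything up to the next quote; '\' gets no special meaning.
            for ch in chars.by_ref() {
                if ch == '"' {
                    break; // drop the closing quote
                }
                token.push(ch);
            }
        } else {
            // Unquoted token: read until the next whitespace character.
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                token.push(ch);
                chars.next();
            }
        }
        tokens.push(token);
    }
    tokens
}

fn main() {
    // Same strings as message1 and message2, after JSON decoding.
    let message1 = "start \"GET https://vector.dev/\\ HTTP/2.0\" remainder";
    let message2 = "start \"GET https://vector.dev/ HTTP/2.0\" remainder";
    println!("{:?}", tokenize_literal_backslash(message1));
    println!("{:?}", tokenize_literal_backslash(message2));
}
```

The only difference from an escape-aware tokenizer is that the quoted branch never looks at the character after a backslash, so a lone \ before a space cannot derail the closing-quote search.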

Version

vector 0.47.0 (x86_64-unknown-linux-gnu 3d5af22 2025-05-20 13:53:41.638057046)
