RFC: Parsing record contents directly #1530

We often need to construct an RR from its name, type, TTL and content (for example, when reading records from the database or from user API requests).

We currently use something similar to `NewRR(fmt.Sprintf("%s %d IN %s %s", name, ttl, type, content))`, but that requires a fair amount of hacky extra work:

- The content (or even the name!) could contain semicolons, which would be interpreted as starting a comment.
- The content could contain newlines, causing the A record content `1.2.3.4\n5.6.7.8` to parse without errors (as `1.2.3.4`).
- Perhaps it would even be possible to inject an `$INCLUDE` somehow, so we need to disable that as a precaution.
- We have to deal with special cases around quoting and escaping.
- We ignore leading and trailing whitespace, meaning `  fe80::1\t` is considered a valid AAAA record content.
- etc.
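To give an idea of the kind of pre-flight checking this pushes onto callers, here is a minimal sketch. `checkContent` is a hypothetical helper, not part of the library, and it only covers the cases listed above (whitespace, newlines, unquoted semicolons); real code needs more than this.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkContent is a hypothetical pre-flight check that a caller might run
// before handing content to NewRR via fmt.Sprintf. It is deliberately
// incomplete and only illustrates the pitfalls listed above.
func checkContent(content string) error {
	if strings.ContainsAny(content, "\r\n") {
		return errors.New("embedded newline")
	}
	if content != strings.TrimSpace(content) {
		return errors.New("leading or trailing whitespace")
	}
	inQuotes, escaped := false, false
	for _, c := range content {
		switch {
		case escaped:
			// The previous character was a backslash; take this one literally.
			escaped = false
		case c == '\\':
			escaped = true
		case c == '"':
			inQuotes = !inQuotes
		case c == ';' && !inQuotes:
			return errors.New("unquoted semicolon would start a comment")
		}
	}
	if inQuotes {
		return errors.New("unterminated quoted string")
	}
	return nil
}

func main() {
	for _, c := range []string{
		"1.2.3.4",
		"1.2.3.4\n5.6.7.8",
		"  fe80::1\t",
		`"v=spf1; -all"`,
		"1.2.3.4 ; oops",
	} {
		fmt.Printf("%q: %v\n", c, checkContent(c))
	}
}
```

Even this small check has to understand quoting and escaping, which is exactly the logic that already lives in the lexer and that we would rather not duplicate.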

It's also currently impossible to parse generic (RFC 3597) records into a `*dns.RFC3597` when the underlying type is known to this library because `NewRR` always returns the known type's struct in that case, even when the content started with a `\#` token. `ToRFC3597` can be used to convert it back to a `*dns.RFC3597`, but that calls `pack()`, which changes behavior depending on the underlying type. (For example, calling `NewRR` with `. 1 IN TYPE9999 \# 3 abcdef` produces Rdata `\# 3 abcdef` as expected, whereas changing TYPE9999 to TYPE48 results in `\# 4 abcdef00` instead.)

This is far from ideal for our use case, so I've taken a stab at trying to build something more reasonable, which would require additional public interface. In short, I've been trying to build something that parses only the record's content, without any special zone file syntax.

My first thought was that this would be fairly simple to do with a single function (this one isn't super clean; consider it an example or first draft):

```go
func ParseRdata(newFn func() RR, h *RR_Header, origin string, r io.Reader) (RR, error) {
       if origin != "" {
               origin = Fqdn(origin)
               if _, ok := IsDomainName(origin); !ok {
                       return nil, &ParseError{"", "bad initial origin name", lex{}}
               }
       }

       c := newZLexer(r)

       var rr RR

       parseAsRFC3597 := newFn == nil

       // If a newFn function was provided but the content starts with
       // a `\#` token, ignore newFn and parse as RFC 3597 regardless.
       if c.Peek().token == `\#` {
               parseAsRFC3597 = true
       }

       if parseAsRFC3597 {
               rr = &RFC3597{Hdr: *h}
       } else {
               rr = newFn()
               *rr.Header() = *h
       }

       if err := rr.parse(c, origin); err != nil {
               // err is a concrete *ParseError. There is no zone file in
               // this context, so the file field is left empty.

               // err.lex may be the zero value, in which case there is no
               // token to report and only the message is carried over.
               if err.lex == (lex{}) {
                       return nil, &ParseError{err: err.err}
               }

               return nil, &ParseError{"", err.err, err.lex}
       }

       if err := c.Err(); err != nil {
               return nil, err
       }

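       // Draft: after a successful parse nothing should be left in the
       // input, so treat any further token (such as a second line) as
       // an error.
       if _, ok := c.Next(); ok {
               return nil, &ParseError{err: "unexpected data after record content"}
       }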

       return rr, nil
}
```

This gets us most of the way there, but I realized the _lexer_ is context-aware too, and it also parses comments. The former is easily worked around by initializing the `zlexer` with `owner: false`, which makes it expect content instead of an owner or directive. The latter is more problematic and seems to require an entirely new, purpose-built lexer. Since `RR.parse()` expects a `*zlexer`, that would involve either making `zlexer` an interface or adding the new logic to the existing `zlexer` struct. In addition, I noticed that all `RR.parse()` functions call `slurpRemainder()`, which would then possibly have to be moved out.

I'd be happy to work on this, but before I spend too much time I wanted to ask whether this is a change you'd consider merging at all (hopefully), and, if so, whether someone with a deeper understanding of the parser and lexer has any design thoughts for me. I know you all tend to be conservative when it comes to adding new public interface (which is good), but I think this could be a useful addition that would enable use cases that are otherwise very difficult to get right.
