Use standard io Writer and Reader where possible #101

philhassey merged 9 commits into cedar-policy:main from
Conversation
philhassey left a comment
This looks good; see the linter errors for a few things to appease. I suggest adding an extra test just to prove trailing line breaks work as expected. Make sure you obtain 100% code coverage in your unit testing as well.
Thanks a ton!
@philhassey I fixed the lint violations and added the line breaks. For the record: those errors were not handled before this change (I think golangci-lint might have a default errcheck rule not to warn on buf.Write methods). For now I opted to keep the previous behavior and ignore the errors, but I wonder if that's the right thing to do.
I think that should be a separate PR, though. Happy to open an issue with the details if you agree.

Coverage should be at 100% now.
Force-pushed from fd72b96 to fcfc2ca
@philhassey I made the requested changes. But I just realized that the last write to the writer would still normally return an error: `_, _ = w.Write(buf.Bytes())`. I propose returning an error from MarshalCedar (similar to the Unmarshal one). If you agree, I can make the change.
philhassey left a comment
Thanks again, just a few more tweaks to get this over the line.
Signed-off-by: Mark Sagi-Kazar <mark.sagikazar@gmail.com>
Force-pushed from fcfc2ca to 8bad87e
Force-pushed from 8bad87e to 4160a94
@philhassey I've just realized I've been working with the wrong … Obviously, encoding/decoding … Answering your latest round of review:

What changed with the latest version of the PR:

- I added an encoder and a decoder type to the root package. Since they follow standard Go practices, I think their API is stable enough to skip the exp package (though I'm happy to move them if you give me guidance on where exactly they should be placed).
- They call into the parser encoder/decoder (which is still necessary due to how parsing works internally).
- The public encoder/decoder have doc comments as requested.

Sorry for the confusion, but I think this PR is now solid.
philhassey left a comment
Okay, I think the shape looks good to me now. I'll get one more maintainer to give it a look just to be sure. Thanks a ton for the work on this! There's a little bit of coverage still missing:
I do wonder if it'd be better if that function used OnceValue to cache the error, so that the same error is returned in all future calls; otherwise I think it might panic at line 67.
So maybe add a test that calls it a second time, to prove it correctly emits the same error again in that case?
patjakdev left a comment
LGTM! Thank you so much for your contribution.
The one significant piece of feedback I have is around making the tokenizer stream-oriented as well so that Decode() doesn't block until EOF. It doesn't have to be done in this PR, but I'm imagining we're going to want that functionality at some point.
Review comment on the diff context:

```
@@ -74,9 +74,13 @@ func (t Token) intValue() (int64, error) {
}

func Tokenize(src []byte) ([]Token, error) {
```
I'd be fine just changing the signature of Tokenize() and forcing callers who have a byte slice to call bytes.NewBuffer() to convert it to an io.Reader. I think there are only two non-test callers.
See my comment about a new PR.
internal/parser/cedar_unmarshal.go (outdated)
```go
type Decoder struct {
	r io.Reader

	once sync.Once
```
Let's not use a synchronization primitive for what's basically just a boolean. Just check for p.parser == nil in Decode().
It's not really there for synchronization, but to make sure that part of the code is called exactly once. The tokenizer reads the entire stream, and executing that code again would yield a different error on subsequent calls.
(But as @philhassey pointed out, even this code has a bug: subsequent calls would currently lead to a panic.)
I'm happy to remove the once variable, @patjakdev, if you tell me what you'd like to see instead. I don't think a nil check is enough, because that would still run the tokenizer again.
If the tokenizer becomes stream-aware, this becomes a non-issue and we can get rid of it.
internal/parser/cedar_unmarshal.go (outdated)
```go
	var err error

	d.once.Do(func() {
		tokens, e := TokenizeReader(d.r)
```
I wonder a bit about this...
As it stands, the tokenizer is going to read until EOF, so you can't really use the Decoder to accept a stream of policies like you might hope without changing the tokenizer to also be stream-oriented.
For example, say you had a TCP connection open to a remote server which was sending you policies but the protocol you're using requires that the client acknowledge each policy before it will send the next one. If you passed the receiver stream into the Decoder and then called Decode(), it would just hang forever waiting for EOF.
Maybe we don't have to tackle that in this PR, but we should think about it...
My thoughts exactly. I didn't want to go down that rabbit hole in this PR though.
From an API perspective, the current PR solves my problem, but I'm happy to take a stab at making the tokenizer stream-oriented in a followup PR.
Good catch @philhassey! Fixed coverage and added another call to Decode. I believe I replied to both of your comments; if I left anything unanswered, let me know.
```go
	createParser func() (*parser, error)
}

func NewDecoder(r io.Reader) *Decoder {
```
@patjakdev @sagikazarmark How's this for a variation? I think it handles the errors how I want, does what @sagikazarmark needs, and avoids using sync like @patjakdev wants:
```go
type Decoder struct {
	r         io.Reader
	parser    *parser
	parserErr error
}

func (d *Decoder) getParser() (*parser, error) {
	if d.parser != nil || d.parserErr != nil {
		return d.parser, d.parserErr
	}
	tokens, err := TokenizeReader(d.r)
	if err != nil {
		d.parserErr = err
		return nil, err
	}
	p := newParser(tokens)
	d.parser = &p
	return d.parser, nil
}

func NewDecoder(r io.Reader) *Decoder {
	return &Decoder{
		r: r,
	}
}

func (d *Decoder) Decode(p *Policy) error {
	parser, err := d.getParser()
	if err != nil {
		return err
	}
	if parser.peek().isEOF() {
		return io.EOF
	}
	return p.fromCedar(parser)
}
```
If the tokenizer gets rewritten anyway, I don't think it makes much of a difference (except maybe that the current solution is more idiomatic, IMO).
I can make the change if you want me to, but my vote goes to the current version.
(I wouldn't mind getting this one merged and iterating later, now that I managed to get the CI green.)
If we're going to be serious about ensuring the same error is always returned from Decode() once an error is encountered, how about something like this?
```go
type Decoder struct {
	r      io.Reader
	parser *parser
	err    error
}

func NewDecoder(r io.Reader) *Decoder {
	return &Decoder{
		r: r,
	}
}

func (d *Decoder) decode(p *Policy) error {
	if d.parser == nil {
		tokens, err := TokenizeReader(d.r)
		if err != nil {
			return err
		}
		parser := newParser(tokens)
		d.parser = &parser
	}
	if d.parser.peek().isEOF() {
		return io.EOF
	}
	return p.fromCedar(d.parser)
}

func (d *Decoder) Decode(p *Policy) error {
	if d.err != nil {
		return d.err
	}
	d.err = d.decode(p)
	return d.err
}
```
You could also inline decode() into Decode(), but I thought that ended up with a bit too much indentation for the bulk of the code in the function.
That said, I'm fine with what you've got right now and happy to merge it.
I started working on the tokenizer changes, branching off from this one. If you're good with this PR, could you merge it? I can provide a PR for the tokenizer shortly.
Friendly ping :)
Sorry for the delay here; there was some trouble with our notifications! I'm merging this. I might do a tiny follow-up PR to remove the sync dependency, then I can cut a release.
Description of changes:
This PR contains a few refactors and features:

- Refactor: use standard `io.Writer` and `io.Reader` where possible instead of `bytes.Buffer` and `[]byte`. This is necessary for the feature in this PR, but it's also good hygiene (relying on standard interfaces).
- Feature: add a Decoder to the parser package (and x/exp) that allows reading multiple policies from any `io.Reader` (practically: files). It is modeled after the decoder in the stdlib JSON package. The primary use case is not having to read all contents of a file first, but letting the component that actually needs the content do the reading (which is good practice).

Naturally, an encoder would similarly make sense, but I wanted to open a PR first to get feedback.