Use standard io Writer and Reader where possible#101

Merged
philhassey merged 9 commits into cedar-policy:main from sagikazarmark:marshal-io
Oct 20, 2025
Conversation

@sagikazarmark
Contributor

@sagikazarmark sagikazarmark commented Sep 19, 2025

Description of changes:

This PR contains a few refactors and features:

Refactor: use standard io.Writer and io.Reader where possible, instead of bytes.Buffer and []byte. This is necessary for the feature in this PR, but it's also good hygiene (relying on standard interfaces).

Feature: add a Decoder to the parser package (and x/exp) that allows reading multiple policies from any io.Reader (practically: files). It is modeled after the decoder in the stdlib JSON package.

The primary use case is not having to read the entire contents of a file up front, but letting the component that actually needs the content do the reading (which is good practice).

Naturally, an encoder would similarly make sense, but I wanted to open a PR first to get feedback.

Contributor

@philhassey philhassey left a comment


This looks good; see the linter errors for a few things to fix to appease it. I suggest adding an extra test just to prove trailing line breaks work as expected. Make sure you maintain 100% code coverage in your unit tests as well.

Thanks a ton!

@sagikazarmark
Contributor Author

@philhassey I fixed the lint violations and added the linebreaks.

For the record: those errors were not handled before this change (I think golangci-lint might have a default errcheck rule that skips warning on buf.Write methods).

For now I opted to keep the previous behavior: ignore errors.

But I wonder if that's the right thing to do.

UnmarshalCedar returns an error, I think MarshalCedar should too.

But I think that should be a separate PR. Happy to open an issue with the details if you agree.

@sagikazarmark
Contributor Author

Coverage should be at 100% now.

@sagikazarmark sagikazarmark force-pushed the marshal-io branch 3 times, most recently from fd72b96 to fcfc2ca Compare September 24, 2025 15:40
@sagikazarmark
Contributor Author

@philhassey I made the requested changes.

But I just realized that the last write to the writer would still normally return an error:

_, _ = w.Write(buf.Bytes())

I propose returning an error from MarshalCedar (similar to the Unmarshal one).

If you agree, I can make the change.

Contributor

@philhassey philhassey left a comment


Thanks again, just a few more tweaks to get this over the line.

Signed-off-by: Mark Sagi-Kazar <mark.sagikazar@gmail.com>
@sagikazarmark
Contributor Author

@philhassey I've just realized I've been working with the wrong Policy type the whole time. 🤦

Obviously, encoding/decoding parser.Policy is necessary, but far from enough since that's not the exported Policy. All these policy structs confused me. 😄

Answering your latest round of review:

  • No, the decoder can't be moved out of parser, because it relies on exported features
  • As a result, I think the encoder should remain in the parser package too
  • The exp/parser package is not even necessary, because it's not what I, as a user, want to interact with
  • So I kept the decoder and encoder in the parser package as they are
  • Reverted changes to existing marshaling/unmarshaling code as you requested (with the exception that the Tokenizer accepts a reader as well)
  • I did NOT add doc comments to the decoder and encoder in the parser package, because none of the other functions have them

What changed with the latest version of the PR:

I added an encoder and a decoder type to the root package.

Since they follow standard Go practices, I think their API is stable enough to skip the exp package (though I'm happy to move them if you give me guidance where exactly they should be placed)

They call into the parser encoder/decoder (which is still necessary due to how parsing works internally)

The public encoder/decoder have doc comments as requested.

Sorry for the confusion. But I think this PR is now solid.

Contributor

@philhassey philhassey left a comment


Okay, I think the shape looks good to me now. I'll get one more maintainer to give it a look just to be sure. Thanks a ton for the work on this! There's a little bit of coverage still missing:

[screenshot: coverage report showing the uncovered lines]

I do wonder if it'd be better if that function used OnceValue to cache the error, so that the same error is returned on all future calls? Otherwise I think it might panic at line 67.

So maybe a test to call it a second time to prove it correctly emits the same error again in that case?

Collaborator

@patjakdev patjakdev left a comment


LGTM! Thank you so much for your contribution.

The one significant piece of feedback I have is around making the tokenizer stream-oriented as well so that Decode() doesn't block until EOF. It doesn't have to be done in this PR, but I'm imagining we're going to want that functionality at some point.

@@ -74,9 +74,13 @@ func (t Token) intValue() (int64, error) {
}

func Tokenize(src []byte) ([]Token, error) {
Collaborator


I'd be fine just changing the signature of Tokenize() and forcing callers who have a byte slice to call bytes.NewBuffer() to convert it to an io.Reader. I think there are only two non-test callers.

Contributor Author


See my comment about a new PR.

type Decoder struct {
r io.Reader

once sync.Once
Collaborator


Let's not use a synchronization primitive for what's basically just a boolean. Just check for p.parser == nil in Decode().

Contributor Author


It's not really there for synchronization, but to make sure that part of the code is called exactly once.

The Tokenizer reads the entire stream and executing that code again would yield a different error on subsequent calls.

(But as @philhassey pointed out, even this code has a bug in it: subsequent calls would indeed lead to panic at the moment.)

I'm happy to remove the once variable @patjakdev if you tell me what you would like to see instead. I don't think a nil check is enough, because that would still run the tokenizer again.

If the tokenizer becomes stream-aware, this becomes a non-issue though and we can get rid of it.

var err error

d.once.Do(func() {
tokens, e := TokenizeReader(d.r)
Collaborator


I wonder a bit about this...

As it stands, the tokenizer is going to read until EOF, so you can't really use the Decoder to accept a stream of policies like you might hope without changing the tokenizer to also be stream-oriented.

For example, say you had a TCP connection open to a remote server which was sending you policies but the protocol you're using requires that the client acknowledge each policy before it will send the next one. If you passed the receiver stream into the Decoder and then called Decode(), it would just hang forever waiting for EOF.

Maybe we don't have to tackle that in this PR, but we should think about it...

Contributor Author


My thoughts exactly. I didn't want to go down that rabbit hole in this PR though.

From an API perspective, the current PR solves my problem, but I'm happy to take a stab at making the tokenizer stream-oriented in a followup PR.

Signed-off-by: Mark Sagi-Kazar <mark.sagikazar@gmail.com>
@sagikazarmark
Contributor Author

Good catch @philhassey !

Fixed coverage and added another call to Decode.

I believe I replied to both of your comments. If I left anything unanswered, let me know.

createParser func() (*parser, error)
}

func NewDecoder(r io.Reader) *Decoder {
Contributor

@philhassey philhassey Sep 30, 2025


@patjakdev @sagikazarmark How's this for a variation, I think it handles the errors how I want, does what @sagikazarmark needs and avoids using sync like @patjakdev wants:

type Decoder struct {
	r         io.Reader
	parser    *parser
	parserErr error
}

func (d *Decoder) getParser() (*parser, error) {
	if d.parser != nil || d.parserErr != nil {
		return d.parser, d.parserErr
	}
	tokens, err := TokenizeReader(d.r)
	if err != nil {
		d.parserErr = err
		return nil, err
	}
	p := newParser(tokens)
	d.parser = &p
	return d.parser, nil
}

func NewDecoder(r io.Reader) *Decoder {
	return &Decoder{
		r: r,
	}
}

func (d *Decoder) Decode(p *Policy) error {
	parser, err := d.getParser()
	if err != nil {
		return err
	}

	if parser.peek().isEOF() {
		return io.EOF
	}

	return p.fromCedar(parser)
}

Contributor Author

@sagikazarmark sagikazarmark Sep 30, 2025


If the tokenizer gets rewritten anyway, I don't think it makes much of a difference (except maybe that the current solution is more idiomatic IMO).

I can make the change if you want me to, but my vote goes to the current version.

(I wouldn't mind getting this one merged and iterate later now that I managed to get the CI green)

Collaborator


If we're going to be serious about ensuring the same error is always returned from Decode() once an error is encountered, how about something like this?

type Decoder struct {
	r       io.Reader
	parser  *parser
	err     error
}

func NewDecoder(r io.Reader) *Decoder {
	return &Decoder{
		r: r,
	}
}

func (d *Decoder) decode(p *Policy) error {
	if d.parser == nil {
		tokens, err := TokenizeReader(d.r)
		if err != nil {
			return err
		}

		parser := newParser(tokens)
		d.parser = &parser
	}

	if d.parser.peek().isEOF() {
		return io.EOF
	}

	return p.fromCedar(d.parser)
}

func (d *Decoder) Decode(p *Policy) error {
	if d.err != nil {
		return d.err
	}

	d.err = d.decode(p)
	return d.err
}

You could also inline decode() into Decode(), but I thought that ended up with a bit too much indentation for the bulk of the code in the function.

Collaborator


That said, I'm fine with what you've got right now and happy to merge it.

@sagikazarmark
Contributor Author

I started working on making the tokenizer stream-oriented, branching off from this one.

If you are good with this PR, could you merge it? I can provide a PR for the tokenizer shortly.

@sagikazarmark
Contributor Author

Friendly ping :)


@philhassey philhassey dismissed their stale review October 20, 2025 17:59

Patrick OK'd it.

@philhassey
Contributor

Sorry for the delay here, there was some trouble with our notifications! I'm merging this, I might do a tiny follow-up PR to remove the sync dependency, then I can cut a release.

@philhassey philhassey merged commit 960e041 into cedar-policy:main Oct 20, 2025
2 checks passed
@sagikazarmark sagikazarmark deleted the marshal-io branch November 21, 2025 18:37