Skip to content

Support encoding indefinite containers #256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

CZDanol
Copy link

@CZDanol CZDanol commented Mar 26, 2025

Changes

  • Added the indefinite_containers parameter to the encoder functions. If the parameter is set to True, containers (maps and arrays) are encoded an indefinite containers.

I am using this library to for unittesting a C++ CBOR generator, for which case I needed to generate structures with indefinite containers to be able to do 1:1 byte checks.

@CZDanol CZDanol force-pushed the indefinite-containers branch 9 times, most recently from b473365 to b672722 Compare March 26, 2025 13:55
@agronholm
Copy link
Owner

How are the tests passing for you locally?

@CZDanol CZDanol force-pushed the indefinite-containers branch from b672722 to 088d68b Compare March 26, 2025 14:00
@CZDanol
Copy link
Author

CZDanol commented Mar 26, 2025

I'm getting all sorts of SKIPPED [1] tests/test_types.py:97: requires CPython and I don't know how to resolve it :( So I am only managing to test the python implementation locally.

@CZDanol CZDanol force-pushed the indefinite-containers branch from 088d68b to 597fc86 Compare March 26, 2025 14:01
@agronholm
Copy link
Owner

I'm getting all sorts of SKIPPED [1] tests/test_types.py:97: requires CPython and I don't know how to resolve it :( So I am only managing to test the python implementation locally.

This message indicates that the C extension is unavailable. So likely it didn't compile successfully, or you didn't do it in the first place. How are you running the tests? tox is a good way to test against multiple Python versions and its configuration on cbor2 ensures that the C extensions get built.

@CZDanol
Copy link
Author

CZDanol commented Mar 26, 2025

I was just running pytest on python3.12.
I now tried tox, it gives me "OK" even when I have garbage in the .c files.

Add the indefinite_containers parameter to the encoder functions. If the parameter is set to True, containers (maps and arrays) are encoded an indefinite containers.
@CZDanol CZDanol force-pushed the indefinite-containers branch from 597fc86 to b8b0d89 Compare March 26, 2025 14:09
@CZDanol
Copy link
Author

CZDanol commented Mar 26, 2025

WOOHOO! Passes the tests now :)

@agronholm
Copy link
Owner

That's odd. Try setting CBOR2_BUILD_C_EXTENSION=1 in your environment and then run pip install -e .. That should make sure that the compilation happens.

@coveralls
Copy link

coveralls commented Mar 26, 2025

Coverage Status

coverage: 93.607% (+0.07%) from 93.542%
when pulling 0604190 on CZDanol:indefinite-containers
into d9cee77 on agronholm:master.

@CZDanol
Copy link
Author

CZDanol commented Mar 26, 2025

That CBOR2_BUILD_C_EXTENSION=1 pip install -e . seems to have made the trick. Thanks :)

@agronholm
Copy link
Owner

From a cursory look, only minor code style issues remain. In Python code, I prefer a blank line after a control block if the code continues on the same level. In C code, I prefer a space between if and the opening parentheses. I'll give this a proper review some time later.

@CZDanol
Copy link
Author

CZDanol commented Mar 28, 2025

No problem with doing these changes, but wouldn't these styling issues be better resolved by a pre-commit formatter hook?

@agronholm
Copy link
Owner

If you can point me to a Ruff rule (or a rule in another linter) that can enforce these, I would be most grateful!

@CZDanol
Copy link
Author

CZDanol commented Mar 28, 2025

Don't know about the Python part, but I am quite positive there is an option for the ifs in ClangFormat. There seems to be no autoformatting for the C code set up at the moment.

@agronholm
Copy link
Owner

Something occurred to me: couldn't we simply wrap indefinite containers to mark them to be encoded as such?

@CZDanol
Copy link
Author

CZDanol commented Mar 31, 2025

You mean that you would need to use a specific class other than the dict? I suppose that would also work, but it wouldn't be that practical for my use case - I need to encode all containers as indefinite, implicitly. I could imagine both approaches being accepted/implemented - the wrapping approach would offer granular control over how a specific element gets encoded, whereas my PR provides a general configuration for the encoder, which I believe is also valid and useful.

@agronholm
Copy link
Owner

Alright, I'll take that under advisement. Reviewing this is still on my to-do list, but I don't expect any trouble from that.

static int
encode_length(CBOREncoderObject *self, const uint8_t major_tag,
const uint64_t length)
encode_length_possibly_indefinite(CBOREncoderObject *self, const uint8_t major_tag,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you had to split this function into two parts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I wanted to reduce diff by keeping rest of the encode_length calls the same (not having to add the extra parameter everywhere)
  2. I felt that the bool parameter would be somewhat cryptic if there was not some addition to the function name, so I created encode_length_possibly_indefinite. Most of the use cases don't care about indefinite encoding, so it didn't make sense to drag this function there - so we have two now.

Copy link
Owner

@agronholm agronholm Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a better look at this, and it doesn't make a whole lot of sense. uint64_t cannot possibly be -1 since it's unsigned. The actual value would then become a very large integer (18446744073709551615) instead, yes? While I doubt anyone will really try to actually encode such enormous structures, it does raise other questions, like if you felt this was an acceptable sentinel value, why then did you have to split encode_length() instead of just using -1 as the sentinel for indefinite length? And why can't encode_length() just directly look at self->indefinite_containers and if it's true, ignore the length parameter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode_length cannot directly use indefinite_containers, because it is also used on other places outside of encoding array/map length.


// CBOREncoder.encode_length(self, major_tag, length)
static PyObject *
CBOREncoder_encode_length(CBOREncoderObject *self, PyObject *args)
{
uint8_t major_tag;
uint64_t length;
uint64_t length = -1;
Copy link
Author

@CZDanol CZDanol Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, -1 becomes a very large integer.

I did this because I was too lazy to set up a custom structure for the uint64_or_none that would be required for PyArg_ParseTuple. It would be cleaner to have it distinct, yes. It is somewhat dirty, so I didn't want to bring this approach to other places.

Should I remake it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants