Skip to content

Component.__eq__ semantics: multiset vs dedup-set, and a hash-based redesign worth considering #1460

Description

@SashankBhamidi

While shipping the fix for GHSA-cv84-9p8j-fj68 last week, two larger questions about how Component.__eq__ should actually behave came up. They didn't belong in a security release, so we parked them. Opening this to track properly.

The fix in 7.1.3 settled on multiset semantics for subcomponent equality. [A, A] is not equal to [A], [A, B] == [B, A], counts of each distinct subcomponent matter. That matches Python's intuition and is what most people expect when they call == on two objects.

@niccokunzmann pointed out during the advisory review that the spec doesn't necessarily agree. His reading: a calendar with duplicated components describes the same calendar as one without, because the iCalendar spec is a description not a process. Under that interpretation, equality would be dedup-set instead. [A, A] == [A], counts wouldn't matter.

@angatha put together a clean test table on the advisory thread showing exactly where the two interpretations diverge. The relevant cases are the ones with duplicates on one side but not the other, and the implementations differ on those.

So we have a genuine open question: should icalendar treat subcomponents as a bag (current) or as a set (spec-strict per Nicco's reading)? This is a behavior decision that affects every consumer doing == on calendars, and it's worth wider input than a single PR.

Separately, @niccokunzmann proposed an XOR-based hash approach in the same advisory thread. Hash every property line, value and parameter, XOR them so order doesn't matter, combine subcomponent hashes the same way. Equal hashes imply equal calendars. That gets us O(N) equality instead of the current polynomial behavior, and would let us implement __hash__ on Component which we currently can't because subcomponents aren't hashable.

@angatha extended this with a caching idea: a custom __eq_with_cache that memoizes hashes by identity, otherwise hash calculation itself ends up O(n²).

Either approach is a meaningful refactor. It touches every vProperty type, needs careful handling of TZID and other parameter edge cases, and probably wants a visitor pattern. I'd like to do it but it's substantially more work than the security fix and shouldn't have been bundled with it. I committed on the advisory thread to picking this up as follow-up work, so consider this issue the start.

Two things I'd like out of this:

  • Decide the spec semantics question (multiset vs dedup-set) with broader input.
  • Once that's decided, scope the hash-based implementation as a separate PR.

References: advisory GHSA-cv84-9p8j-fj68, and #1287 (the PR that introduced the original buggy __eq__ while fixing #1224 commutativity).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions