Atom: drink up the string soup #1497

Draft
Jannik2099 wants to merge 3 commits into gentoo:master from Jannik2099:string_soup

@Jannik2099
Contributor

First step to making portage.dep.Atom properly typed.

It still retains a __str__ method, as that will always be useful. Some call sites will still do something like str(atom).split(...), because they usually operate on variant types of str | Atom | foo and are completely untyped :(

Some properties of Atom should probably be dropped entirely, and some more be added. For memory efficiency, we may also want to synthesize some of them from the string representation as needed, instead of storing them.
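As a rough illustration of the "synthesize as needed" idea, here is a minimal sketch. `LazyAtom` and its `use_text` property are hypothetical stand-ins, not `portage.dep.Atom`; the class stores only the raw string and derives everything else on access.

```python
from typing import Optional


class LazyAtom:
    """Toy stand-in (hypothetical, NOT portage.dep.Atom)."""

    # Only the raw string is stored; __slots__ avoids a per-instance __dict__.
    __slots__ = ("_raw",)

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        return self._raw

    @property
    def blocker(self) -> bool:
        # Derived from the string on every access instead of being stored.
        return self._raw.startswith("!")

    @property
    def use_text(self) -> Optional[str]:
        # Text between the trailing [...] brackets, or None if there is none.
        start = self._raw.find("[")
        if start == -1 or not self._raw.endswith("]"):
            return None
        return self._raw[start + 1 : -1]


atom = LazyAtom("!dev-libs/E[foo?]")
plain = LazyAtom("dev-libs/E")
```

The trade-off is memory against repeated parsing: properties that are read in hot loops may still be worth caching.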

Also, it seems that depgraph.py is more cursed than previously thought possible, and writes to Atom._orig_atom. I've just added it as a field for now...

Ultimately there are still a lot more str(atom) users than I'd like, hence this is just a draft.
There are a few remaining test failures that I am working on right now.

Note that there's also a selinux sandbox commit to fix some tests locally, that's ofc not related and I'll drop the commit later. Working with @WavyEbuilder to figure out why this only appears in tests.

Jannik2099 force-pushed the string_soup branch 2 times, most recently from 44c1be6 to ada5b2c on November 1, 2025 15:13
@Jannik2099
Contributor Author

There's a test failure in lib/portage/tests/resolver/test_slot_collisions.py::SlotCollisionTestCase::testSlotCollision that I can't figure out. It's not obviously related to "treat Atom as str". Perhaps some cache key corruption.

thesamesam requested a review from zmedico on November 1, 2025 15:19
@zmedico
Member

zmedico commented Nov 1, 2025

> There's a test failure in lib/portage/tests/resolver/test_slot_collisions.py::SlotCollisionTestCase::testSlotCollision that I can't figure out. It's not obviously related to "treat Atom as str". Perhaps some cache key corruption.

The debug output for the string_soup branch shows that it didn't account for the conditional [foo?] USE dep here, where it's not supposed to match dev-libs/E-1::test_repo because that package does not have foo in its IUSE:

Parent:    (app-misc/E-1:0/0::test_repo, ebuild scheduled for merge)
Depstring: dev-libs/E[foo?]
Priority:  runtime
Candidates: ['dev-libs/E']
   ebuild: dev-libs/E-1::test_repo

The debug output for the master branch shows that it correctly selects dev-libs/E-2::test_repo because it has foo in its IUSE:

Parent:    (app-misc/E-1:0/0::test_repo, ebuild scheduled for merge)
Depstring: dev-libs/E[foo?]
Priority:  runtime
Candidates: ['dev-libs/E']
   ebuild: dev-libs/E-2::test_repo

Comment thread lib/portage/dep/__init__.py Outdated

    if isinstance(mydep, Atom):
        mydep = str(mydep)

Member

The unevaluated_atom attribute is lost here, which causes the testSlotCollision failure.

Contributor Author

Superb find, thanks!

Yeah, this is why I think there are too many foo = str(atom) conversions in this PR right now. Honestly a miracle that I only managed to break a single test?

Contributor Author

Hmm, not sure.

There's only one place where the unevaluated_atom attribute is getattr'd with a fallback, in paren_enclose. All other accesses to the attribute would throw if they were operating on a str, so this is the only site where dropping the attribute could go wrong, right?

Yet if I patch that function to do

            if unevaluated_atom:
                if not hasattr(x, "unevaluated_atom"):
                    x = Atom(x).unevaluated_atom
                else:
                    x = getattr(x, "unevaluated_atom", x)

it still fails.

Member

The specific code in match_from_list that will cause testSlotCollision to fail if you don't preserve unevaluated_atom is this:

    if mydep.unevaluated_atom.use:
        candidate_list = mylist
        mylist = []
        for x in candidate_list:
            use = getattr(x, "use", None)
            if use is not None:
                if mydep.unevaluated_atom.use and not x.iuse.is_valid_flag(
                    mydep.unevaluated_atom.use.required
                ):
                    continue
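
To make the failure mode concrete, here is a toy model of that filter. `ToyAtom`, `ToyPackage` and `filter_by_iuse` are hypothetical simplifications rather than portage code, and the "evaluated" form of the atom is an assumption made for the example. It shows how a round trip through `str` silently drops the unevaluated atom, so the IUSE check never runs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an atom; NOT portage.dep.Atom."""

    text: str
    required_use: FrozenSet[str] = frozenset()
    unevaluated: Optional["ToyAtom"] = None

    @property
    def unevaluated_atom(self) -> "ToyAtom":
        # Falls back to self when no unevaluated form was recorded.
        return self.unevaluated if self.unevaluated is not None else self

    def __str__(self) -> str:
        return self.text


@dataclass(frozen=True)
class ToyPackage:
    cpv: str
    iuse: FrozenSet[str]


def filter_by_iuse(dep: ToyAtom, candidates: List[ToyPackage]) -> List[ToyPackage]:
    # Keep only candidates whose IUSE covers the flags the unevaluated atom requires.
    required = dep.unevaluated_atom.required_use
    return [pkg for pkg in candidates if required <= pkg.iuse]


original = ToyAtom("dev-libs/E[foo?]", frozenset({"foo"}))
evaluated = ToyAtom("dev-libs/E", unevaluated=original)  # assumed evaluated form
round_tripped = ToyAtom(str(evaluated))  # str() loses the unevaluated atom

candidates = [
    ToyPackage("dev-libs/E-1", frozenset()),
    ToyPackage("dev-libs/E-2", frozenset({"foo"})),
]
kept = [pkg.cpv for pkg in filter_by_iuse(evaluated, candidates)]
leaked = [pkg.cpv for pkg in filter_by_iuse(round_tripped, candidates)]
```

In this model `kept` contains only dev-libs/E-2, while `leaked` also lets dev-libs/E-1 through, which mirrors the difference between the two debug outputs above.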

Jannik2099 force-pushed the string_soup branch 4 times, most recently from 4370306 to 870884b on November 2, 2025 12:05
@Jannik2099
Contributor Author

Jannik2099 commented Nov 2, 2025

Trying to put some proper typing on portage.dep.

It seems that Atom occasionally gets instances of Package. I am now handling them inside Atom so that the auxiliary functions don't have to pick up a Package argument type.

Comment thread lib/_emerge/actions.py
     atoms.append((x, atom))

    -myvars = sorted(set(atoms))
    +myvars = sorted(set(atoms), key=lambda t: str(t[0]))
Member

It looks like we can omit this sort because myvars is only used to populate the cp_map variable.


     expanded = cpv_expand(mydep, mydb=mydb, use_cache=use_cache, settings=settings)
    -return Atom(orig_dep.replace(mydep, expanded, 1), allow_repo=True)
    +return Atom(str(orig_dep).replace(mydep, expanded, 1), allow_repo=True)
Member

Should pass str(mydep) to replace in case it's an Atom.
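
A small sketch of why this matters: `str.replace` only accepts `str` arguments, so an object that merely defines `__str__` raises `TypeError`. `AtomLike` and the sample values are hypothetical and only mimic a non-`str` atom.

```python
class AtomLike:
    """Hypothetical non-str object with a string form; NOT portage.dep.Atom."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


orig_dep = AtomLike(">=foo-1.0")
mydep = AtomLike("foo-1.0")
expanded = "sys-apps/foo-1.0"

try:
    # Passing the object itself as the search argument is rejected by str.replace.
    str(orig_dep).replace(mydep, expanded, 1)
    raised = False
except TypeError:
    raised = True

# Converting both sides explicitly works regardless of the input type.
result = str(orig_dep).replace(str(mydep), expanded, 1)
```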

     if mydep is not None:
         tmp = len(mydep) >= 1
    -    if deplist[mypos][0] == "!":
    +    if str(deplist[mypos]).startswith("!"):
Member

They say startswith is marginally slower than slice comparison, and for the large number of these it might make sense to avoid startswith.

Member

Actually, it's better if we use the Atom.blocker attribute here since that also avoids the str conversion.
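
The three variants discussed here can be compared side by side. This is only a sketch: `AtomLike` is a hypothetical stand-in whose `blocker` is a plain bool, which simplifies whatever the real attribute holds. The attribute check is the only one that skips building a string first.

```python
class AtomLike:
    """Hypothetical stand-in; here blocker is a plain bool computed once."""

    __slots__ = ("_text", "blocker")

    def __init__(self, text: str) -> None:
        self._text = text
        self.blocker = text.startswith("!")

    def __str__(self) -> str:
        return self._text


deps = [AtomLike("!dev-libs/E"), AtomLike("dev-libs/E")]

# 1. Convert to str, then call a method.
via_startswith = [str(dep).startswith("!") for dep in deps]
# 2. Convert to str, then compare a slice (also safe for empty strings).
via_slice = [str(dep)[:1] == "!" for dep in deps]
# 3. Read the precomputed attribute; no str conversion at all.
via_attribute = [bool(dep.blocker) for dep in deps]
```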

@Jannik2099
Contributor Author

I discovered that dep_getcpv, dep_getusedeps and get_operator have no uses within portage. Can these be removed, or are they considered public API?

@zmedico
Member

zmedico commented Nov 2, 2025

> I discovered that dep_getcpv, dep_getusedeps and get_operator have no uses within portage. Can these be removed, or are they considered public API?

They are public but deprecated, so let's just add a DeprecationWarning for now.

Actually, we could use UserWarning to make them more noisy.
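
For context on "more noisy": Python's default warning filters hide `DeprecationWarning` unless it is triggered by code running in `__main__`, whereas `UserWarning` is shown by default. A minimal sketch of a wrapper that takes the category as a parameter follows; the `deprecated` decorator and `legacy_helper` are hypothetical names, not existing portage helpers.

```python
import functools
import warnings


def deprecated(category=DeprecationWarning):
    """Hypothetical decorator: warn with the given category on every call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not the wrapper.
            warnings.warn(
                f"{func.__name__}() is deprecated", category, stacklevel=2
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(UserWarning)
def legacy_helper(dep: str) -> str:
    # Placeholder body standing in for an old public function.
    return dep


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = legacy_helper("dev-libs/E")
```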



    -def match_to_list(mypkg, mylist):
    +def match_to_list(mypkg: Union[str, Atom], mylist: list) -> list:
Member

mypkg should not be Atom, but _pkg_str or Package.

Contributor Author

I had a hard time following the typing for these functions, thanks.
Ideally these would be decoupled so that the auxiliary functions in the Atom file don't get Package objects, seeing that Package imports Atom... I'll see if I can unmangle them.



    -def best_match_to_list(mypkg, mylist):
    +def best_match_to_list(mypkg: Union[str, Atom], mylist: list) -> Optional[Atom]:
Member

mypkg should not be Atom, but _pkg_str or Package.

    -if not isinstance(mydep, Atom):
    +if isinstance(mydep, Atom):
    +    if mydep.blocker:
    +        mydep = Atom(str(mydep).lstrip("!"), allow_wildcard=True, allow_repo=True)
Member

It might not really be necessary to do this, and would be better to avoid if possible because it discards the unevaluated_atom.

Comment thread lib/portage/dep/__init__.py
        xs = x.cpv_split
    elif hasattr(x, "cpv"):
        # Package object
        xs = x.cpv.cpv_split
Member

This could be Atom since it has a cpv attribute, right?

    # Only package objects have 'use' and 'iuse' attributes
    if not hasattr(x, "use"):
        mylist.append(x)
        continue
Member

@zmedico zmedico Nov 2, 2025

This could be Atom since it has a use attribute, right? We can't handle it like Package because it has no iuse attribute.
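
One possible way to tell the cases apart, sketched with hypothetical stand-in classes (`ToyAtom`, `ToyPackage` and `classify` are not portage code): checking for `use` alone cannot distinguish an atom from a package, so the check below requires both `use` and `iuse` before treating an object as a package.

```python
class ToyAtom:
    """Hypothetical stand-in: has 'use' but no 'iuse'."""

    use = ("foo",)


class ToyPackage:
    """Hypothetical stand-in: has both 'use' and 'iuse'."""

    use = ("foo",)
    iuse = ("foo", "bar")


def classify(x: object) -> str:
    # Require both attributes before taking the package code path.
    if hasattr(x, "use") and hasattr(x, "iuse"):
        return "package"
    if hasattr(x, "use"):
        return "atom"
    return "other"


kinds = [classify(ToyPackage()), classify(ToyAtom()), classify("dev-libs/E-1")]
```

An explicit `isinstance` check would be stricter, at the cost of the auxiliary functions needing to import the package type.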

2 participants