Skip to content

Atom: drink up the string soup (rebase of #1497 + more)#1594

Closed
mattst88 wants to merge 10 commits into
gentoo:masterfrom
mattst88:string_soup
Closed

Atom: drink up the string soup (rebase of #1497 + more)#1594
mattst88 wants to merge 10 commits into
gentoo:masterfrom
mattst88:string_soup

Conversation

@mattst88

Copy link
Copy Markdown
Contributor

No description provided.

@mattst88 mattst88 force-pushed the string_soup branch 2 times, most recently from 59bed7f to 59cb02b Compare June 21, 2026 11:51
@Jannik2099

Copy link
Copy Markdown
Contributor

thanks matt for picking up on this - I've been very short on time lately, sorry.

In the long term we should probably get rid of all sorted(atoms, key=str) aswell, as the only legitimate use of atom ordering is version comparisons (right?).

For the public record, I approve of reusing and modifying my commits for this.

@mattst88 mattst88 force-pushed the string_soup branch 2 times, most recently from 385f57b to f788603 Compare June 21, 2026 15:50

@thesamesam thesamesam left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments from a first pass. Thanks a lot to you and Jannik for doing this.

Please make sure you're daily driving it if you're not already (I recommend this in general for Portage patches), it's invaluable testing.

Comment thread lib/_emerge/depgraph.py Outdated
Comment thread lib/_emerge/depgraph.py Outdated
Comment thread lib/_emerge/depgraph.py
else:
cp_exists = False
if atom.package and not atom.cp.startswith("null/"):
if atom.package and atom.category != "null":

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not asking you to fix this now, but this null crap really needs to go in the future.

@mattst88 mattst88 force-pushed the string_soup branch 3 times, most recently from 0628c37 to 9e16de7 Compare June 23, 2026 20:04
Comment thread lib/_emerge/depgraph.py Outdated
Comment thread lib/portage/dep/__init__.py
Comment thread lib/portage/versions.py Outdated
Comment thread lib/_emerge/resolver/slot_collision.py Outdated
Jannik2099 and others added 10 commits June 23, 2026 20:42
[mattst88]
Fix call sites that relied on Atom being a str subclass:
- wrap str method calls (lstrip, indexing, join, concatenation) with str()
  in BlockerDB, glsa, actions, depgraph, bintree, binrepo/config,
  repository/config
- use key=str in emaint/world sorted() call instead of adding ordering
  methods to Atom
- add TYPE_CHECKING guard and lazyimport block in versions.py

Co-authored-by: Matt Turner <mattst88@gentoo.org>
Signed-off-by: Matt Turner <mattst88@gentoo.org>
Convert @type/@rtype docstring annotations to PEP 3107 function
annotations in dep/__init__.py. Also replace getattr()-based duck
typing in dep_getslot() and dep_getrepo() with explicit isinstance()
checks now that Atom has a defined interface rather than being a str
subclass with arbitrarily-set attributes. In paren_enclose(), replace
getattr(x, "unevaluated_atom", x) with an isinstance guard.

Signed-off-by: Matt Turner <mattst88@gentoo.org>
…erator

Signed-off-by: Matt Turner <mattst88@gentoo.org>
Replace the post-construction __setattr__ hack for _orig_atom with a
proper orig_atom keyword argument. The slot is now always initialized
in __init__ (to None for ordinary atoms), evaluate_conditionals()
propagates it, and depgraph.py accesses it via getattr() to remain
safe for SonameAtom and other non-Atom types in mixed-type loops.

Signed-off-by: Matt Turner <mattst88@gentoo.org>
Add Atom.with_cp(cp) to replace the str(atom).replace(atom.cp, cp, 1)
pattern, and update three call sites in _sets/base.py and depgraph.py.

Signed-off-by: Matt Turner <mattst88@gentoo.org>
Defer construction of the without_use Atom until first access. For atoms
with USE deps the __init__ previously always built a second full Atom
eagerly; with the lazy property it is computed on demand and cached in the
slot. For atoms without USE deps the result is still self, at zero cost.

Benchmarks on USE-dep atoms: ~40% faster construction, ~22% less memory.

Signed-off-by: Matt Turner <mattst88@gentoo.org>
_pkg_str(cpv) called catpkgsplit() which fully re-parsed the atom string
in __init__. Since the atom regex already extracts both cpv and cp as
separate groups, the version is simply cpv[len(cp)+1:] -- no second parse
needed. Store _cpv as a plain string and compute _version directly.

Update best_match_to_list's cmp_cpv to use cpv_getversion() as a fallback
for plain strings that lack a .version attribute.

Construction time (µs):          Before → After
  =sys-apps/portage-3.0.50         4.5  →  2.8  (-38%)
  >=dev-libs/glib-2.68:2           4.9  →  3.2  (-35%)
  >=dev-python/requests-2.28[ssl]  6.3  →  4.6  (-27%)

Memory (bytes):
  =sys-apps/portage-3.0.50        1665  →   487  (-71%)
  >=dev-libs/glib-2.68:2          1690  →   518  (-69%)
  >=dev-python/requests-2.28[ssl] 3321  →  2137  (-36%)

Signed-off-by: Matt Turner <mattst88@gentoo.org>
Add __repr__ returning Atom('=cat/pkg-1.0') for readable output in
debuggers and the REPL.

Add is_versioned property (self._version is not None) to replace the
atom.cp != atom.cpv string comparison pattern at call sites. Avoids a
full string equality check in favor of a single None check.

  hyperfine 'python bench.py' (250k is_versioned checks):
    cp != cpv (before):  89.2 ms ± 3.8 ms
    is_versioned (after): 83.7 ms ± 3.4 ms  (-6%)

Signed-off-by: Matt Turner <mattst88@gentoo.org>
__eq__ previously called str() on both operands, invoking __str__ even
when we can read _string directly. Compare _string to _string for
Atom-to-Atom and _string to the str directly.

category used split("/")[0] which allocates a list; partition("/")[0]
returns the first element without the list allocation.

  hyperfine (500k × (a==a, a==a_unequal, a==str, a.category)):
    before: 420.7 ms ± 12.3 ms
    after:  286.5 ms ±  4.8 ms  (-32%)

Signed-off-by: Matt Turner <mattst88@gentoo.org>
vardb.match() calls match_from_list(), which uses only cp, cpv, use,
operator, etc. -- none of which differ between an evaluated Atom and a
freshly constructed Atom(str(evaluated_atom)). Pass the atom directly.

Update the comment to drop the now-misleading Atom(str(atom)) reference.

Signed-off-by: Matt Turner <mattst88@gentoo.org>

@thesamesam thesamesam left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, thanks! I assume all of @zmedico's comments from #1497 are handled (at a glance it looks like they are).

I wouldn't be surprised if there's some fallout, so be prepared for that.

@mattst88

Copy link
Copy Markdown
Contributor Author

Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants