@savannahostrowski savannahostrowski commented Dec 15, 2025

@Fidget-Spinner Fidget-Spinner left a comment

Nice job, you're pretty close already. Just one thing missing.

@Fidget-Spinner
Just to demonstrate the power of this class of optimization: if you were able to remove all decrefs from _STORE_ATTR_SLOT (including the Py_XDECREF), this is what you'd get.

Before (current main):

    // 
    // _STORE_ATTR_SLOT_r20.o:        file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 48 89 f8                      movq    %rdi, %rax
    // 3: 48 83 e0 fe                   andq    $-0x2, %rax
    // 7: 48 b9 00 00 00 00 00 00 00 00 movabsq $0x0, %rcx
    // 0000000000000009:  R_X86_64_64  _JIT_OPERAND0
    // 11: 0f b7 c9                      movzwl  %cx, %ecx
    // 14: 48 8b 1c 08                   movq    (%rax,%rcx), %rbx
    // 18: 41 f6 c7 01                   testb   $0x1, %r15b
    // 1c: 74 14                         je      0x32 <_JIT_ENTRY+0x32>
    // 1e: 49 83 e7 fe                   andq    $-0x2, %r15
    // 22: 41 8b 17                      movl    (%r15), %edx
    // 25: 81 fa ff ff ff bf             cmpl    $0xbfffffff, %edx       # imm = 0xBFFFFFFF
    // 2b: 77 05                         ja      0x32 <_JIT_ENTRY+0x32>
    // 2d: ff c2                         incl    %edx
    // 2f: 41 89 17                      movl    %edx, (%r15)
    // 32: 50                            pushq   %rax
    // 33: 4c 89 3c 08                   movq    %r15, (%rax,%rcx)
    // 37: 4d 89 6c 24 40                movq    %r13, 0x40(%r12)
    // 3c: 40 f6 c7 01                   testb   $0x1, %dil
    // 40: 75 0a                         jne     0x4c <_JIT_ENTRY+0x4c>
    // 42: ff 0f                         decl    (%rdi)
    // 44: 75 06                         jne     0x4c <_JIT_ENTRY+0x4c>
    // 46: ff 15 00 00 00 00             callq   *(%rip)                 # 0x4c <_JIT_ENTRY+0x4c>
    // 0000000000000048:  R_X86_64_GOTPCRELX   _Py_Dealloc-0x4
    // 4c: 48 85 db                      testq   %rbx, %rbx
    // 4f: 74 15                         je      0x66 <_JIT_ENTRY+0x66>
    // 51: 8b 03                         movl    (%rbx), %eax
    // 53: 85 c0                         testl   %eax, %eax
    // 55: 78 0f                         js      0x66 <_JIT_ENTRY+0x66>
    // 57: ff c8                         decl    %eax
    // 59: 89 03                         movl    %eax, (%rbx)
    // 5b: 75 09                         jne     0x66 <_JIT_ENTRY+0x66>
    // 5d: 48 89 df                      movq    %rbx, %rdi
    // 60: ff 15 00 00 00 00             callq   *(%rip)                 # 0x66 <_JIT_ENTRY+0x66>
    // 0000000000000062:  R_X86_64_GOTPCRELX   _Py_Dealloc-0x4
    // 66: 4d 8b 6c 24 40                movq    0x40(%r12), %r13
    // 6b: 45 31 ff                      xorl    %r15d, %r15d
    // 6e: 31 ff                         xorl    %edi, %edi
    // 70: 31 f6                         xorl    %esi, %esi
    // 72: 58                            popq    %rax
    const unsigned char code_body[115] = {

After:

    // _STORE_ATTR_SLOT_r11.o:        file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 49 8b 45 f8                   movq    -0x8(%r13), %rax
    // 4: 49 83 c5 f8                   addq    $-0x8, %r13
    // 8: a8 01                         testb   $0x1, %al
    // a: 74 12                         je      0x1e <_JIT_ENTRY+0x1e>
    // c: 48 83 e0 fe                   andq    $-0x2, %rax
    // 10: 8b 08                         movl    (%rax), %ecx
    // 12: 81 f9 ff ff ff bf             cmpl    $0xbfffffff, %ecx       # imm = 0xBFFFFFFF
    // 18: 77 04                         ja      0x1e <_JIT_ENTRY+0x1e>
    // 1a: ff c1                         incl    %ecx
    // 1c: 89 08                         movl    %ecx, (%rax)
    // 1e: 4c 89 f9                      movq    %r15, %rcx
    // 21: 48 83 e1 fe                   andq    $-0x2, %rcx
    // 25: 48 ba 00 00 00 00 00 00 00 00 movabsq $0x0, %rdx
    // 0000000000000027:  R_X86_64_64  _JIT_OPERAND0
    // 2f: 0f b7 d2                      movzwl  %dx, %edx
    // 32: 48 89 04 11                   movq    %rax, (%rcx,%rdx)
    const unsigned char code_body[54] = {

So the code is roughly half its original size (54 bytes versus 115).
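For context on what the Py_XDECREF and the second _Py_Dealloc call in the "before" listing are for, here is a rough Python-level sketch. It assumes CPython's reference counting, and the class names `Value` and `Point` are made up for illustration. Overwriting a slot that already holds a value drops the last reference to the old value, which frees it on the spot:

```python
import weakref


class Value:
    """Hypothetical payload type; plain object() cannot be weakly referenced."""


class Point:
    """Hypothetical slotted class, so attribute stores go through a slot."""
    __slots__ = ("x",)


p = Point()
p.x = Value()

# Track the old value without keeping it alive.
old_ref = weakref.ref(p.x)
alive_before = old_ref() is not None

# Overwriting the slot releases the previous value. On CPython this is the
# decref of the old slot contents, and here it drops the refcount to zero.
p.x = Value()
alive_after = old_ref() is not None

print(alive_before, alive_after)
```

That release of the old value is the part the optimizer cannot drop unless it can prove the slot was empty before the store.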

The problem right now is that we can't (yet) remove the Py_XDECREF in _STORE_ATTR_SLOT. However, once we specialize on more forms of object creation, which should happen in the very near future, we can optimize through that. Then for something like:

    class A:
        def __init__(self, x, y):
            self.x = x
            self.y = y

We would be able to reason that the object being initialized is a fresh one, so the attribute has no previous value and no decref is needed. That would give us the full optimization. (The example above is for non-slots, but it's the same idea.)
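A small sketch of the "fresh object" observation, using a slotted variant of the class above (the `__slots__` declaration and the `observed` list are additions for illustration, not part of the PR). Inside `__init__` the slot is still empty before the first store, so there is no old value to release:

```python
observed = []  # hypothetical helper list recording what __init__ sees


class A:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        # On a freshly created instance the slot is unset, so there is
        # no previous value that would need a decref.
        observed.append(hasattr(self, "x"))
        self.x = x
        # After the first store the slot is populated; any later store
        # would have an old value to release.
        observed.append(hasattr(self, "x"))
        self.y = y


a = A(1, 2)
print(observed)
```

This only shows the property from the Python side. Whether the optimizer can prove it for a given trace depends on the object-creation specialization mentioned above.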

@Fidget-Spinner Fidget-Spinner left a comment

Thanks, LGTM. Let's wait a day before merging.

@savannahostrowski savannahostrowski merged commit bef63d2 into python:main Dec 15, 2025
68 checks passed