Skip to content

Conversation

@aaronj0
Copy link
Contributor

@aaronj0 aaronj0 commented Jun 4, 2025

Based on root-project/root#18768, pinging @wlav cc @guitargeek

Currently, the iadd implementation for std::vector in Pythonize.cxx works perfectly with Python lists and iterables for which we have an ItemGetter.

>>> from cppyy.gbl.std import vector
>>> cppyy.cppdef("class A{};")
>>> clA = cppyy.gbl.A
>>> d = vector(clA)()
>>> a = [clA(), clA(), clA()]
>>> d+=a
>>> d
<cppyy.gbl.std.vector<A> object at 0x27de0280>
>>> d.size()
3

But the second code path for buffers:

CPyCppyy/src/Pythonize.cxx

Lines 444 to 456 in c6d623a

// if no getter, it could still be b/c we have a buffer (e.g. numpy); looping over
// a buffer here is slow, so use insert() instead
if (PyTuple_GET_SIZE(args) == 1) {
PyObject* fi = PyTuple_GET_ITEM(args, 0);
if (PyObject_CheckBuffer(fi) && !(CPyCppyy_PyText_Check(fi) || PyBytes_Check(fi))) {
PyObject* vend = PyObject_CallMethodNoArgs(self, PyStrings::gEnd);
if (vend) {
PyObject* result = PyObject_CallMethodObjArgs(self, PyStrings::gInsert, vend, fi, nullptr);
Py_DECREF(vend);
return result;
}
}
}

which we hit when i-adding an array.array or np.array, does not work as expected:

>>> d = cppyy.gbl.std.vector["unsigned short"]({0, 1})
>>> a = np.array([2, 3], dtype = np.short)
>>> d+=a
 
>>> d
<cppyy.gbl.__gnu_cxx.__normal_iterator<unsigned short*,vector<unsigned short> > object at 0x2099dd0>
>>> d.size()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'size'

If I try the equivalent by directly dispatching to std::vector::insert it works:

>>> d.insert(d.end(), a)
<cppyy.gbl.__gnu_cxx.__normal_iterator<unsigned short*,std::vector<unsigned short>> object at 0x3ad3e7d0>
>>> d
<cppyy.gbl.std.vector<unsigned short> object at 0x3ad50e50>
>>> d.size()
4
>>> list(d)
[0, 1, 2, 3]

here, the difference between x+=y and x.__iadd__(y) leads to unintuitive behaviour: if the users intention is to obtain the resulting iterator when calling insert, they can do d.insert(d.end(), a) which makes more sense than expecting it from d+=a. When the user does call d+=a, since __iadd__ is overriden, operation does not end with the call to VectorIAdd which should just return the iterator, and the std::vector d should reflect the change by insertion . Instead, it ends with d = d.__iadd__(a) which reassigns the vector to the returned iterator, losing the vector in the process.

This PR fixes this issue by returning self instead.

@aaronj0
Copy link
Contributor Author

aaronj0 commented Jun 12, 2025

@wlav Apologies for pinging you again on this, would be great to get a review on this when possible. Thanks!

@wlav
Copy link
Owner

wlav commented Jun 12, 2025

Don't change the indentation convention (the patch goes to 2 spaces instead of 4 for some reason).

If the insert fails, just return nullptr. The error set at that point will be a more detailed failure message from the call to insert itself, rather than the generic message that it failed as is now in the patch.

For the last comment, the change is to allow the assignment of the result in case of an override, not to prevent an assignment.

Aside, since the assignment is to self, the resulting call to operator= should be a no-op, right? I wonder whether there are better ways of avoiding it (in Python, obviously, there's not custom operator=).

@aaronj0
Copy link
Contributor Author

aaronj0 commented Jun 16, 2025

Don't change the indentation convention (the patch goes to 2 spaces instead of 4 for some reason).

If the insert fails, just return nullptr. The error set at that point will be a more detailed failure message from the call to insert itself, rather than the generic message that it failed as is now in the patch.

For the last comment, the change is to allow the assignment of the result in case of an override, not to prevent an assignment.

Thanks! I have addressed these comments.

Aside, since the assignment is to self, the resulting call to operator= should be a no-op, right? I wonder whether there are better ways of avoiding it (in Python, obviously, there's not custom operator=).

I am unsure how Python handles that... but yes, that would make sense if the reassignment causes the original vector to be garbage collected, without an extra reference to prevent that (if I understand correctly)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants