Dealing with python object structure #1299

@Thrameos

Background

This is the last foundational piece that I see as necessary for long-term support of the JPype module. For a long time JPype has used a hack to trick the Python memory model into supporting multiple inheritance. While it looks like Python has multiple inheritance, that is an illusion because it only works if the objects have no memory to back them other than a dictionary. We can try to conform to such a model, which was the state of JPype 0.6, but it has huge pitfalls as it means hundreds of dictionary lookups per method resolution. My solution was to make a custom allocator which functioned exactly as the alloc/free contract was designed. I add the memory that I need at the end of the object, outside of the view of Python, and create a way to navigate to it. It is not without issues, as some objects grow down, meaning that knowing exactly where the memory is for those objects takes some work, but it was very workable.
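The illusion can be seen from pure Python. The following is a minimal sketch, not taken from JPype, and all class names in it are hypothetical. Classes whose only per-instance storage is a dictionary combine freely, while classes that reserve their own instance memory (here through non-empty `__slots__`, or built-in types such as `int` and `str`) are rejected with a layout conflict.

```python
# Classes backed only by a dictionary: multiple inheritance works.
class DictA:
    pass


class DictB:
    pass


# Classes that each reserve their own instance memory via __slots__.
class SlotA:
    __slots__ = ("a",)


class SlotB:
    __slots__ = ("b",)


def try_combine(*bases):
    """Return the new class, or the TypeError raised while creating it."""
    try:
        return type("Combined", bases, {})
    except TypeError as exc:
        return exc


dict_result = try_combine(DictA, DictB)   # accepted
slot_result = try_combine(SlotA, SlotB)   # rejected: instance layout conflict
builtin_result = try_combine(int, str)    # rejected for the same reason

print(dict_result)
print(repr(slot_result))
print(repr(builtin_result))
```

The exact error text may differ between Python versions, so the sketch only relies on the exception type.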

Unfortunately, Python core developers have busted their contracts in several fundamental ways. First, the int contract was updated such that it no longer had a firm grasp of the size of the object. That broke us because I lost track of the end of the object. That was fixable. Next, a clever Python core developer decided, let's merge the Python dictionary into the end of the object. At the same time they hid the size of the dictionary and removed all of the symbols required to use the allocator. Thus they both made it hard to see how much memory I need to skip and disabled the contract that I was using, without any deprecation notice or PEP to indicate such a change was happening.

That was really the last straw for me as far as nice behavior goes. I have had many very important proposals, such as a massive speedup in the isinstance checks for derived objects or generalized support for merged exceptions, that have gone nowhere because I don't have the time to dedicate to Python core to write a PEP (though I was able to sign the Python contributor contract), but the core developers don't have to follow the same rules. There is a reason my avatar has been "Anger" from "Inside Out" and it isn't for my winning personality nor the fact I am a Lewis Black fan. I simply lack the demeanor to work in a political structure.

Hence my only solution was to perform gross manipulations of the type system. There were two mechanisms in Python which are supposed to aid our usage. The first manipulated the base class size, which simply runs into the same problems. The second is an Unstable API which, when I tested it, was broken, meaning it may have worked in one version of the Python development, but the same destructive contract breaking that got me destroyed it. They have light unit testing, but it misses the broken behavior, meaning they simply don't know that it is nonfunctional. Python needs an overhaul of its memory model to add a layout object. Instead it has a pile of unorganized kludges with if chains checking if something is gc, has an inline dict, or has weak reference list attachments, etc. I am not impressed. (Sorry, refer to my avatar.) It also doesn't help that my work cycle only has free time in the spring quarter and never in the fall and winter, which corresponds to the Python release cycle. Meaning I am forced to find solutions only when I have no time to devote to solving them.

Fundamental issue

The Python and Java type trees are incompatible because Exceptions, String, Long, and Float are all related in slightly different ways. And as multiple inheritance is not on the table currently, that means out-of-scope memory is the "easiest" solution. However, it is not the only way to tackle it.

The other way is more challenging. If we can't add memory at the base object, we could add it at all the leaf nodes (or at least the ones that Python allows to be extended). Most of the objects fit such a description, meaning that if we are willing to accept a few slot modifications to help with finding the slot, we can operate a bit like a relocatable dictionary.

That leaves two types in limbo: java.lang.Object and java.lang.Number. Both of these are cursed because if I add memory to them they will have base conflicts with the derived classes. And there is no way we can simply drop those classes, as they exist in Java. Though Number is not often used, Object is a foundational class. Even derived classes have to go through an Object to get to the invocation stage. Meaning unless I can beat that one type into submission, I am stuck with horrible kludges which will never allow us to move to alternative Python implementations.
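The base conflict for these two types can be reproduced in pure Python. This is a minimal sketch with hypothetical names (`ObjectWithMemory`, `ObjectAbstract`, `_jvalue`), where `int` stands in for the Python concrete type behind a boxed Java long. It only models the layout rule and is not JPype code.

```python
class ObjectWithMemory:
    # Stand-in for java.lang.Object carrying its own per-instance memory.
    __slots__ = ("_jvalue",)


class ObjectAbstract:
    # Stand-in for java.lang.Object with no memory of its own.
    __slots__ = ()


def derive(name, *bases):
    """Return the derived class, or the TypeError raised while creating it."""
    try:
        return type(name, bases, {})
    except TypeError as exc:
        return exc


# Memory on the base collides with the layout that int already owns.
conflict = derive("JLongBroken", ObjectWithMemory, int)

# A memory-free base can sit above int without any conflict.
ok = derive("JLong", ObjectAbstract, int)

print(repr(conflict))
print(ok.__mro__)
```

The memory-free variant still behaves as an `int`, which is why a class without storage is compatible with every derived type but has nowhere to keep the Java value.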

An alternative approach

So here is my latest attempt at solving the issue. First we are going to add some memory to the _jpype._JClass type, which will give us specialized slots. As every class has a JClass instance backing it, we can store a number of hooks to help us get to our memory. I can give a slot offset which then points to the address of our memory. I will then need to implement a derived type for every Python concrete class that is in conflict. But that only solves their problem.

Next up are those classes which are not part of the concrete class tree. Currently they get their memory from the JValue structure. But I can analyze the tree at run time and determine if any base class would be in conflict, and if I don't find a conflict, add the slot. That solves 98% of the conflicts without any need for gross hacks.
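One way to picture the run-time analysis is to let Python's own layout check make the decision. This is a minimal sketch under assumptions: `_SlotCarrier`, `_jvalue`, and `can_add_slot` are hypothetical names, and the real check would presumably be done in C against the type structures rather than by building a throwaway class.

```python
class _SlotCarrier:
    """Hypothetical mixin that owns the memory for the Java value."""
    __slots__ = ("_jvalue",)


def can_add_slot(*bases):
    """Return True if a class with these bases can also carry the slot."""
    try:
        # Python rejects the probe class if any base has a conflicting layout.
        type("_Probe", (_SlotCarrier, *bases), {})
    except TypeError:
        return False
    return True


class PlainBase:
    """A dictionary-backed class, which imposes no layout of its own."""


for candidate in (PlainBase, int, float, str, Exception):
    print(candidate.__name__, can_add_slot(candidate))
```

Bases that pass the probe can take the slot directly; the ones that fail are the concrete types that need a dedicated derived type.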

That leaves just two classes which are broken: Object and Number. My proposal is to make these "dual classes" in which we have both an abstract version which has no memory and a concrete object which holds our memory. The Python system will take any instance that was destined for the original class and vector it to our concrete class.

This isn't without pitfalls, as the type of the concrete class is not the same as the type of the abstract class. Thus if someone wrote code that checked isinstance relationships using the type function on an object, it would be broken. However, that seems like a very small compromise for getting rid of all of the hacks that I am currently forced to do and gaining the ability to run on PyPy.

Request for input

@marscher @astrelsky @pelson @michi42 @altendky @Christopher-Chianelli you all contributed to the success of this package, so it would help to get your opinions. This would be the first time since 0.7 that I have actually broken a contract that we have in our testbench. Though I suspect the alternative system will still be functionally equivalent, there will be code somewhere that gets broken by this change. Is it worth changing the core object memory model to make us comply with the Python object model even if it creates edge cases?

Benefits:

  • Hack "free" solution. There are some classes in Python that don't have the BASE_CLASS tag that I still need to manipulate, but most will work.
  • Build with PyPy
  • Immunity from some level of Python core changes.

Downsides:

  • More complex object model
  • Different models will be present for classes, meaning at least one extra function call on method resolution. Our current system is heavily optimized, so one check will cost in terms of speed.
  • Some objects may be forced to use the dictionary solution if I can't make them comply. Those edge cases will be slower. Clarification: this assumes that, like jproxy, we can keep the dict check confined; if it is required on all checks it will be cripplingly slow.
  • Special isinstance and issubclass checks to try to fool the user into not seeing extra class relationships.
  • Two classes will report the same name, which can be confusing to users.
  • An unknown number of packages may be broken even if our testbench passes, because we can't anticipate how the model change affects the edge cases.

Labels

enhancement: Improvement in capability planned for future release