|
| 1 | +# Type Mapping |
| 2 | +One of the challenging tasks for the Java to Python bridge is selecting the |
| 3 | +correct set of Java interfaces to implement so that the available behaviors |
| 4 | +of each Python class are presented in Java. |
| 5 | + |
| 6 | +This is represented in two ways. |
| 7 | + |
| 8 | + - A list of interfaces that describes the functional behavior of |
| 9 | + the object in Java. |
| 10 | + |
| 11 | + - A dictionary that provides a mapping from Java function names |
| 12 | + to Python callables. |
| 13 | + |
| 14 | +With these two objects we can create a JProxy that captures |
| 15 | +the behaviors of the Python object in Java. |
| 16 | + |
| 17 | +We will break the wrappers into two types. |
| 18 | + |
| 19 | + - Protocol wrappers are entirely behavioral. They look at the |
| 20 | + dunder functions of the class and present that behavior to the |
| 21 | + user. |
| 22 | + |
| 23 | + - Concrete mappings are specializations for each of the |
| 24 | + Python builtin classes. We would also like to support |
| 25 | + additional specialized wrappers registered later by the user. |
| 26 | + (See Specialized Wrappers) |
| 27 | + |
| 28 | + |
| 29 | +## Probing |
| 30 | + |
| 31 | +Probes will take place in two stages. First we need to collect |
| 32 | +a list of protocols that the object supports. These will be |
| 33 | +provided as a bit mapped field. Second we need to identify the concrete type. |
| 34 | + |
| 35 | +There are three ways we can probe for concrete types. We can consult the mro |
| 36 | +and look at the second type slot. For objects that haven't reordered |
| 37 | +their mro this will be successful. We can consult the `tp_flags` for those |
| 38 | +types that Python already provides an accelerated subclass check for. |
| 39 | +We can perform a series of isinstance checks. The last has the significant |
| 40 | +downside that it is O(n) in the number of concrete types. |
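
The three strategies above can be sketched in Python. This is a minimal illustration under stated assumptions, not the bridge's implementation: the function names and tables are hypothetical, "second type slot" is read here as the entry just before `object` in the mro, and `type.__flags__` is a CPython-specific attribute whose bit positions are taken from CPython's `object.h`.

```python
# Hypothetical lookup tables; only a subset of the concrete types is listed.
BUILTINS = {
    int: "PyLong", list: "PyList", tuple: "PyTuple", bytes: "PyBytes",
    str: "PyString", dict: "PyDict", BaseException: "PyExceptionBase",
    type: "PyType", set: "PySet", complex: "PyComplex",
    bytearray: "PyByteArray",
}

# CPython subclass flag bits (Py_TPFLAGS_*_SUBCLASS).
SUBCLASS_FLAGS = {
    1 << 24: "PyLong", 1 << 25: "PyList", 1 << 26: "PyTuple",
    1 << 27: "PyBytes", 1 << 28: "PyString", 1 << 29: "PyDict",
    1 << 30: "PyExceptionBase", 1 << 31: "PyType",
}


def probe_by_mro(cls):
    """Look at the mro entry just before `object` (assumed reading)."""
    mro = cls.__mro__
    if len(mro) == 1:
        return "PyObject"          # only `object` itself has len(mro) == 1
    return BUILTINS.get(mro[-2])   # None when the mro was reordered


def probe_by_flags(cls):
    """Test the subclass bits that CPython maintains on every type."""
    flags = cls.__flags__
    for bit, name in SUBCLASS_FLAGS.items():
        if flags & bit:
            return name
    return None                    # no flag exists for set, complex, ...


def probe_by_isinstance(cls):
    """Linear scan over the known builtins: always works, but O(n)."""
    for base, name in BUILTINS.items():
        if issubclass(cls, base):
            return name
    return "PyObject"
```

A class such as `class Reordered(list, Mixin)` shows why a fallback is needed: its mro ends in `(..., Mixin, object)`, so the mro probe misses while the flag and scan probes still report `PyList`.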
| 41 | + |
| 42 | + |
| 43 | +Concrete Types: |
| 44 | + |
| 45 | +- PyByteArray |
| 46 | +- PyBytes - inherits from bytes `Py_TPFLAGS_BYTES_SUBCLASS` |
| 47 | +- PyComplex |
| 48 | +- PyDict - inherits from dict `Py_TPFLAGS_DICT_SUBCLASS` |
| 49 | +- PyEnumerate |
| 50 | +- PyLong - inherits from int `Py_TPFLAGS_LONG_SUBCLASS` |
| 51 | +- PyExceptionBase - inherits from BaseException `Py_TPFLAGS_BASE_EXC_SUBCLASS` |
| 52 | +- PyList - inherits from list `Py_TPFLAGS_LIST_SUBCLASS` |
| 53 | +- PyMemoryView |
| 54 | +- PyObject - Base class and special case when an object has len(mro)==1 |
| 55 | +- PyRange |
| 56 | +- PySet |
| 57 | +- PySlice |
| 58 | +- PyString - inherits from str `Py_TPFLAGS_UNICODE_SUBCLASS` |
| 59 | +- PyTuple - inherits from tuple `Py_TPFLAGS_TUPLE_SUBCLASS` |
| 60 | +- PyType - inherits from type `Py_TPFLAGS_TYPE_SUBCLASS` |
| 61 | +- PyZip |
| 62 | + |
| 63 | + |
| 64 | +Protocols include: |
| 65 | + |
| 66 | +- PyASync - Abstract interface for classes with the async behavior. |
| 67 | + Identify with `tp_as_async` slot. |
| 68 | + |
| 69 | +- PyAttributes - Virtual map interface to the attributes of an object. |
| 70 | + Python has two kinds of map-like access: getattr/setattr and getitem/setitem. |
| 71 | + As we can't have the same function mapping for two behaviors we |
| 72 | + will split this into a separate wrapper. |
| 73 | + |
| 74 | +- PyBuffer - Abstract interface for objects which look like memory buffers. |
| 75 | + Identify with `tp_as_buffer` |
| 76 | + |
| 77 | +- PyCallable - Abstraction for every way that an object can be called. |
| 78 | + This will have methods to simplify keyword arguments in Java syntax. |
| 79 | + Identify with `tp_call` slot. |
| 80 | + |
| 81 | +- PyComparable - Class with the `tp_richcompare` slot. |
| 82 | + |
| 83 | +- PyGenerator - An abstraction for an object which looks like both an iterable and an iterator. |
| 84 | + Generators are troublesome as Java requires that every use of an iterable starts |
| 85 | + back from the start of the collection. We will need to create an illusion of this. |
| 86 | + Identify with `tp_iter` and `tp_iternext` |
| 87 | + |
| 88 | +- PyIterable - Abstract interface that can be used as a Java Iterable. |
| 89 | + Identify with `tp_iter` slot. |
| 90 | + |
| 91 | +- PyIter - Abstract interface that can be converted to a Java Iterator. |
| 92 | + PyIter is not directly compatible with the Java representation because |
| 93 | + it requires a lot of state information to be wrapped, thus we |
| 94 | + will need both a Python like and a Java like instance. The |
| 95 | + concrete type PyIterator is compatible with the Java concept. |
| 96 | + Identify with `tp_iternext` (the issue is that many Python iterators also |
| 97 | + look like generators, so they may be hard to distinguish.) |
| 98 | + |
| 99 | +- PySequence - Abstract interface for ordered list of items. |
| 100 | + Identify with `Py_TPFLAGS_SEQUENCE` |
| 101 | + |
| 102 | +- PyMapping - Abstract interface that looks like a Java Map. |
| 103 | + Identify with `Py_TPFLAGS_MAPPING` |
| 104 | + |
| 105 | +- PyNumber - Abstraction for class that supports some subset of numerical operations. |
| 106 | + This one will be a bit hit or miss because number slots can mean |
| 107 | + number like, matrix like, array like, or some other use of |
| 108 | + operators. |
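
A bit mapped protocol field of the kind described in the Probing section could look like the following sketch. It is an illustration only: the `Protocol` flag names, the subset of protocols covered, and the use of dunder lookups as a Python-level stand-in for inspecting the C type slots are all assumptions.

```python
import enum


class Protocol(enum.IntFlag):
    """Hypothetical bit assignments for a few of the protocols."""
    ASYNC = 1 << 0
    CALLABLE = 1 << 1
    ITERABLE = 1 << 2
    ITER = 1 << 3


# Dunder name that stands in for each slot in this sketch.
_DUNDERS = {
    "__await__": Protocol.ASYNC,
    "__call__": Protocol.CALLABLE,
    "__iter__": Protocol.ITERABLE,
    "__next__": Protocol.ITER,
}


def _defines(cls, name):
    """True if cls or one of its bases defines `name`.

    Walking the mro avoids picking up attributes of the metaclass,
    such as type.__call__, which every class would otherwise match.
    """
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name] is not None
    return False


def probe_protocols(cls):
    """Collect the supported protocols of cls into one bit field."""
    bits = Protocol(0)
    for name, bit in _DUNDERS.items():
        if _defines(cls, name):
            bits |= bit
    return bits


def looks_like_generator(bits):
    """Generator-like objects are both iterable and iterator."""
    both = Protocol.ITERABLE | Protocol.ITER
    return (bits & both) == both
```

Note that a plain iterator such as `list_iterator` also sets both bits, which is the ambiguity between PyIter and PyGenerator mentioned above.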
| 109 | + |
| 110 | + |
| 111 | +One special case exists which happens when a Java object is passed through a |
| 112 | +Python interface as a return. In such a case we have no way to satisfy the type |
| 113 | +relationship of being a PyObject. Thus we will provide a PyObject wrapper that |
| 114 | +holds the Java object. |
| 115 | + |
| 116 | +We will loosely separate these two types of interfaces into two Java packages |
| 117 | +`python.lang` and `python.protocol`. There are also numerous private internal |
| 118 | +classes that are used in wrapping. |
| 119 | + |
| 120 | + |
| 121 | +## Presentation |
| 122 | + |
| 123 | +Python objects should wherever possible conform to the nearest Java |
| 124 | +concept. This means the method names will often need to be remapped to |
| 125 | +Java behaviors. This is somewhat burdensome when Java has a different |
| 126 | +concept of what is returned or different argument orders. We don't need |
| 127 | +to be fully complete here as the user always has the option of using the |
| 128 | +builtin methods or using a string eval to get at unwrapped behavior. |
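
As an illustration of this remapping, the sketch below builds a dictionary from `java.util.List` method names to Python callables for a list. The function name `java_list_methods` and the choice of methods are hypothetical; the return conventions shown (`add` returns true, `indexOf` returns -1, `set` returns the replaced element) are those of `java.util.List`.

```python
def java_list_methods(seq):
    """Map java.util.List-style names to callables over a Python list."""

    def add(item):
        seq.append(item)
        return True            # List.add reports success; append returns None

    def index_of(item):
        try:
            return seq.index(item)
        except ValueError:
            return -1          # List.indexOf returns -1 instead of raising

    def set_item(index, item):
        previous = seq[index]
        seq[index] = item
        return previous        # List.set returns the element it replaced

    return {
        "size": seq.__len__,
        "get": seq.__getitem__,
        "contains": seq.__contains__,
        "add": add,
        "indexOf": index_of,
        "set": set_item,
    }
```

Methods that already agree with Java, such as `size`, map straight to the dunder function; only those with a different return convention need a small adapter.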
| 129 | + |
| 130 | +### Name Conflicts |
| 131 | + |
| 132 | +Python has two behaviors that share the same set of slots but have very |
| 133 | +different meanings. Sequence and Mapping both use the dunder functions |
| 134 | +getitem/setitem but in one case it can only accept an index and the other any |
| 135 | +object. When these are wrapped they map to two different collections on the |
| 136 | +Java side which have extensive name conflicts. Thus the wrapping algorithm |
| 137 | +must ensure that these behaviors are mutually exclusive. |
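
One way to keep the two behaviors mutually exclusive is to resolve them in a fixed order and return at most one. The sketch below uses the `collections.abc` abstract base classes as a stand-in for the `Py_TPFLAGS_SEQUENCE` and `Py_TPFLAGS_MAPPING` checks; the function name and the choice to test Mapping first are assumptions.

```python
from collections.abc import Mapping, Sequence


def container_protocol(cls):
    """Return at most one of the two getitem-based protocols for cls."""
    if issubclass(cls, Mapping):
        return "PyMapping"     # checked first, so Mapping wins any tie
    if issubclass(cls, Sequence):
        return "PySequence"
    return None                # getitem alone is not enough to decide
```

A class that only defines `__getitem__` and is registered with neither ABC gets neither wrapper here, which avoids guessing between the conflicting Java collections.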
| 138 | + |
| 139 | +## Specialized Wrappers |
| 140 | + |
| 141 | +In addition to the predefined wrappers of Python concrete classes, we may want |
| 142 | +to provide the user with some way to add additional specializations. For |
| 143 | +example, wrappers for specific classes in numpy, scipy, and matplotlib may be |
| 144 | +desirable. |
| 145 | + |
| 146 | +To support this we need a way for a Java module to register its wrappers and |
| 147 | +add them to the interfaces and dictionaries produced by the probe. There is also |
| 148 | +a minor issue that the interpreter may not be active when the Jar is loaded. |
| 149 | + |
| 150 | +Potential implementations: |
| 151 | + |
| 152 | +- Initialization of a static field to call registerTypes() would be one option. |
| 153 | +But Java is lazy in loading classes. That means that if we probe a Python class |
| 154 | +before the corresponding Java wrapper class is loaded we will be stuck with a bad wrapper. |
| 155 | + |
| 156 | +- The JNI `JNI_OnLoad` function is guaranteed to be called when the jar is first |
| 157 | +encountered. But it has the significant disadvantage that it must |
| 158 | +be tied to a machine architecture. |
| 159 | + |
| 160 | +- Java module services may be able to force a static block to run. |
| 161 | +This will take experimentation as the behavior of this feature is |
| 162 | +not well documented. |
| 163 | + |
| 164 | + |
| 165 | + |
| 166 | +## Internal implementation |
| 167 | + |
| 168 | +### Converters |
| 169 | + |
| 170 | +We will need to define two converters for each Python type. The first gives |
| 171 | +the most specific type, so that if Java requests a specialized type it is |
| 172 | +guaranteed to get it (or get a failure exception). The second targets the base |
| 173 | +class PyObject, for when Java has requested an untyped object. |
| 174 | + |
| 175 | + |
| 176 | +### Probe Goals |
| 177 | + |
| 178 | +The system must endeavor to satisfy these goals. |
| 179 | + |
| 180 | +1) Minimize the number of probes for each object type. |
| 181 | + (Probing will always be an expensive operation) |
| 182 | + |
| 183 | +2) Be memory efficient. |
| 184 | + (Many types will share the same list of interfaces and dictionary, which |
| 185 | + means that wherever possible we will want to join like items in the table.) |
| 186 | + |
| 187 | + |
| 188 | +To satisfy these goals we will use a caching strategy. |
| 189 | + |
| 190 | +We can consolidate the storage of our entities by using three maps. |
| 191 | + |
| 192 | +- WeakKeyDict(Type, (Interfaces[], Dict)) holds a cached copy of the results |
| 193 | + of every probe. This will be consulted first to avoid unnecessary probe |
| 194 | + calls. |
| 195 | + |
| 196 | +- Dict(Interfaces[], Interfaces[]) will hold the unique tuple of |
| 197 | + interfaces that were the results of the probes. When caching we form |
| 198 | + the dynamic desired tuple of interfaces, then consult this map to get |
| 199 | + the correct instance to store in the cache. |
| 200 | + |
| 201 | +- Dict(Interfaces[], Dict) holds the method dictionary for each unique tuple of |
| 202 | + interfaces. As the Java methods are fixed by the interfaces provided, we only need one dict per tuple. |
| 203 | + |
| 204 | +These will all be created at module load time. |
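
The three maps can be sketched in Python as follows. This is a minimal illustration, assuming the probe and the method-dictionary builder are supplied as callables; the names `lookup`, `probe` and `build_methods` are hypothetical.

```python
import weakref

# Type -> (interfaces, methods); entries vanish when the type is collected.
_probe_cache = weakref.WeakKeyDictionary()
# Interface tuple -> the one canonical instance of that tuple.
_interface_sets = {}
# Canonical interface tuple -> shared method dictionary.
_method_dicts = {}


def lookup(cls, probe, build_methods):
    """Return (interfaces, methods) for cls, probing at most once."""
    cached = _probe_cache.get(cls)
    if cached is not None:
        return cached

    # Form the desired tuple, then swap it for the canonical instance
    # so that equal tuples are stored only once.
    wanted = tuple(probe(cls))
    interfaces = _interface_sets.setdefault(wanted, wanted)

    # One method dictionary per unique tuple of interfaces.
    methods = _method_dicts.get(interfaces)
    if methods is None:
        methods = _method_dicts[interfaces] = build_methods(interfaces)

    entry = (interfaces, methods)
    _probe_cache[cls] = entry
    return entry
```

Two unrelated classes that probe to the same interfaces end up sharing both the tuple and the dictionary, so the per-type cost is a single cache entry.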
| 205 | + |
| 206 | + |
| 207 | +### Limitations |
| 208 | +As probing of classes only happens once, any monkey patching of classes will |
| 209 | +not be reflected in the Java typing system. Any changes to the type dictionary |
| 210 | +will likely result in broken behaviors in the Java representation. |
| 211 | + |
| 212 | + |
| 213 | +### Exceptions |
| 214 | + |
| 215 | +Exceptions are a particularly challenging type to wrap. To present an exception |
| 216 | +in Java, we need to deal with the fact that both Python and Java choose to use |
| 217 | +concrete types. There is no easy way to use a proxy. Thus Python |
| 218 | +exceptions will be represented in two pieces. An interface based one will |
| 219 | +be used to wrap the Python class in a proxy. A concrete type in Java will |
| 220 | +redirect the Java behaviors to the Python proxy. This may lead to some |
| 221 | +confusion as the user can encounter both copies even if we try to |
| 222 | +keep them automatically unwrapped. (For example, Java captures an exception and |
| 223 | +places it in a log, then a log listener implemented in Python gets |
| 224 | +the Java version of the wrapper.) |
| 225 | + |