Java to Python bridge #1274
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1274 +/- ##
==========================================
- Coverage 86.96% 85.29% -1.67%
==========================================
Files 113 115 +2
Lines 10354 10812 +458
Branches 4064 4184 +120
==========================================
+ Hits 9004 9222 +218
- Misses 757 995 +238
- Partials 593 595 +2
|
How do you intend to make this usable from a Java project? The JPype jar is bundled with the PyPI wheel, so it would be a bit complicated unless it was made available independently on Maven Central or something. The only way I see that working is if it is in its own jar outside of the JPype jar. Would it make sense to have Or is this intended to only ever be an example? |
With regards to distribution, it requires the JPype distribution and C++ code to function, so it would be bundled with JPype. Though I expect that if one were to operate purely as a reverse bridge, it would likely be formulated as a maven build in which it builds a portion of the support jar, then requires jpype installed in Python for the rest. Unfortunately, I don't play much in that field, so I would be happy if someone had more insight. Under the original plan from years ago I was going to build two jars, "org.jpype" for the forward bridge and "org.epypj" for the reverse bridge. Both would be built from Python, but the reverse would have a separate packager so that one could install from maven. For testing I am building the support pieces into our current jar, but that is not the final home. As for function as map, I would not want it to give the map functions freely. If you want to get the dictionary-like behavior then you would ask for the member to produce a wrapper for it. That is the reason for all those additional interfaces like PyTuple, PyList, PyDict. Those are wrappers that add to the behavior that is available. If I were to add dictionary behaviors to the base Object it would conflict with those specializations. I want all the Python standard collections to map to the same standard collections in Java. As for wrapping attributes we would have two choices:
I did the second as it was quick and dirty for this demonstration, but by my philosophy I prefer language bindings to present best in the language being used (except for naming conventions, as changing those will break documentation lookup). As to what this PR is.... Currently it is a demonstration with the goal of finding enough interest to make it part of our project. I did a very intricate design last time geared towards speed, but just didn't find sufficient users to justify the hours required to complete the test bench. This is a lightweight design that could be grown into something larger without the burden of requiring restructuring of the jpype core (which doomed the last effort). Thanks for the feedback. As this is an experiment in progress, user feedback is required to come up with a design that fits the needs. Important design decisions thus far. Previously, I was returning Java objects because one can't tell if a return object is going to be Python or Java, though Java was a rare case. This forced a huge amount of casting on the user when I was working on testing. This time I return a PyObjectJava, which is a wrapper to access the Java Object. This is fairly critical as collections need to be of uniform type. I believe usability favors this direction...
|
My question on the bundling and distribution is more towards just building a project that would use it. It would be weird to configure a Java project to look for a jar inside a Python installation or virtual environment. You wouldn't bundle it with the project since jpype would load it. Actually it may be interesting to be able to have a project include the jar too. It would allow some way of having Java code that works whether or not jpype and Python are running. |
@astrelsky I checked in an example of how it would be packaged. There would be a separate jar containing all the code required for the binding along with the code required to spawn a Python interpreter scope. Unfortunately it does not yet work because I haven't installed the magic compiler switches required to get Java to properly load the library, meaning it fails on busted symbols. (I remember it took me a while to find the compiler flags, but they should still be in epypj.) org.jpype.bridge.Bridge does most of the work. It figures out where the Python libraries are located and tests jpype to make sure it can be loaded. It caches the values to ensure that the probe is only needed once. Afterward it would directly start by launching libpython. There is one support hook required, but that can be located in the main JPype dll so there is no need for a second native library. One odd bug is that I was not able to get it running from within netbeans. It appears that there is an env variable that must be set for Python to locate packages from site-library. I don't recall that issue the last time I worked on it. |
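The probe-then-cache behavior described for the Bridge can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in org.jpype.bridge.Bridge: the function names `probe_python` and `load_or_probe`, the cache file, and the set of recorded fields are all assumptions made for the example.

```python
import importlib.util
import json
import sys
import sysconfig
from pathlib import Path


def probe_python():
    """Collect the facts a launcher would need (hypothetical field names)."""
    spec = importlib.util.find_spec("jpype")  # None when jpype is not installed
    return {
        "executable": sys.executable,
        "libdir": sysconfig.get_config_var("LIBDIR"),  # may be None on Windows
        "version": sysconfig.get_python_version(),
        "jpype_found": spec is not None,
    }


def load_or_probe(cache_file, probe=probe_python):
    """Return cached probe results; run the probe only when no cache exists."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = probe()
    cache_file.write_text(json.dumps(result))
    return result
```

A second call with the same cache file returns the stored values without probing again, which is the property the comment above relies on. A real cache would also need some way to invalidate itself when the Python installation changes.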
@astrelsky I successfully loaded the natives on the Linux build. It also tested fine on the Windows build. Thus all that is missing for it to start a Python session is the entry point for launching an interpreter. Does this build system make sense to you? When using the Java->Python bridge:
When using the Python->Java bridge:
You can follow the sequence in the Bridge class. |
Yes this makes sense to me. Thank you. |
@VincentDary It seems like I need to modify the proxy code to make it a bit more flexible, which would make this easy. As currently written it can only take a dictionary or an instance object but not both. To make it easy to wrap a Python object it would be much better if one could define the methods and the object to be passed as self separately. I was able to pull over most of the rest of the bridge code from the old repo that was relevant. I will ping you again when it is ready for some testing. If you want to influence the direction of the wrapper please make reviews while the work is in progress. Currently the code is divided into two parts.
The basic usage would be
You may have multiple scopes open in one Java program, but as they share an interpreter all module level changes will be global. I am open to naming suggestions, feature requests, and general direction advice as I would rather have something that people are interested in using rather than another academic exercise. |
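The point about scopes sharing one interpreter can be seen from plain Python, without the bridge. In this sketch each "scope" is just a separate globals dictionary passed to `exec`; the module name `shared_demo` is made up for the example and is not part of the proposed API.

```python
import sys
import types

# A stand-in module; any imported module behaves the same way.
shared = types.ModuleType("shared_demo")
shared.setting = "default"
sys.modules["shared_demo"] = shared

# Two independent scopes: each one is its own globals dictionary.
scope_a = {}
scope_b = {}

# Scope A defines a variable and changes module-level state.
exec("x = 1\nimport shared_demo\nshared_demo.setting = 'changed by A'", scope_a)

# Scope B imports the same module and reads the state.
exec("import shared_demo\nseen = shared_demo.setting", scope_b)

print("x" in scope_b)    # False: variables stay private to their scope
print(scope_b["seen"])   # changed by A: module state is shared
```

Variables defined in one scope are invisible to the other, but anything stored on a module is visible everywhere because `sys.modules` belongs to the interpreter, not to the scope.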
@astrelsky I put another session in. Does this look like a workable direction for the API? I broke the Python object into two sets: concrete and protocols. The basic object is just a fan out to protocols. I would try to make it castable with the Java cast operations, but there is no way to know if I have the right binding up front, and without ASM support I can't freely mix in interfaces. Thus the user would need to request the protocol that best suits their needs from the object. We then have a bunch of predefined concrete types which will get both object and protocols as well as any of the special behaviors that define the type. Last, the protocols define behaviors and where possible map back to Java concepts such that Python objects can be handled by typical Java Collections type classes. As always this is a monster amount of typing and testing to make it work. |
I'll look through it over the week as I was already burnt out by the time I saw this today. As a disclaimer, I hate just about every API until I actually use it. I see the potential for its use in Ghidra, which is where I would intend to use it as well, so maybe @ryanmkurtz could offer an opinion as well. As for the ASM support, it is used in the extension classes anyways so maybe that will free that up? I'll open that pull request as a draft for now just to get my ass moving on it again. It just needs to pull in commits, deal with conflicts, be reformatted and then sit through review. |
I won't expect a lot of love when it comes to attempts to wrap an untyped language in a typed one. I have torn up the classes three times already and I can't say it is satisfying in any form. There is always a leakage between the arguments that can be taken and what Python can wrap properly. It is also exposing some very strange edge cases in JPype that I will be fighting to fix. (Did you know that if a Java interface looked like an iterable the JProxy would assume the class methods were a list and try to press it through a tuple?) I will be trying to avoid asm this time for a few reasons. The main use I had for it was to create mixins so I could take a bunch of interfaces and mash them together so that the Java isinstance would actually be able to navigate them via casting. That meant probing a class, finding all its dunder functions and then making a wrapper with the right exposure. That was cool in that it looked neat, but it has some serious downsides.
This time I am just going to accept that there may be many wrappers for the same Python object with different interfaces exposed for each. Though until I program with it for a while I am not sure how comfortable it will be. As for automatically wrapping modules, if we aren't doing it on the fly we don't really need ASM. We could just generate code and compile it. So without the mixin there we could avoid it entirely. Items that remain:
|
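The two approaches discussed above, probing an object's dunder methods and handing out a separate wrapper per interface, can be sketched in pure Python. The protocol table and the names `supported_protocols` and `ProtocolView` are hypothetical and chosen only for this illustration; the PR's Java interfaces are not reproduced here.

```python
# Hypothetical protocol table: protocol name -> attributes a type must define.
PROTOCOLS = {
    "Callable": {"__call__"},
    "Iterable": {"__iter__"},
    "Mapping": {"__getitem__", "__len__", "keys"},
    "Sequence": {"__getitem__", "__len__", "index"},
    "Sized": {"__len__"},
}


def supported_protocols(obj):
    """Probe the object's type and list every protocol it satisfies."""
    names = set(dir(type(obj)))
    return sorted(p for p, required in PROTOCOLS.items() if required <= names)


class ProtocolView:
    """A wrapper exposing only the attributes of one requested protocol."""

    def __init__(self, target, protocol):
        missing = PROTOCOLS[protocol] - set(dir(type(target)))
        if missing:
            raise TypeError(f"{type(target).__name__} lacks {sorted(missing)}")
        self._target = target
        self._allowed = PROTOCOLS[protocol]

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so _target/_allowed are safe.
        if name in self._allowed:
            return getattr(self._target, name)
        raise AttributeError(name)
```

Several views may wrap the same object at once, for example `ProtocolView(data, "Sized")` and `ProtocolView(data, "Iterable")`, each exposing a different slice of its behavior. Note that the view forwards attribute access only, so callers use `view.__len__()` rather than `len(view)`.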
Very ugly segfault was fixed. I needed a handful of upgrades to the proxy system and it triggered a bunch of very nasty referencing errors. I was forced to unroll all of the native changes and apply them one by one to locate the problem. My secondary issue is that my build system on Windows produces corrupt dlls, which means that everything must happen on WSL, which is a much slower path. Dlls built from azure work; it is just my local visual studio install that fails. Once I get the native part working under Windows (I may have to trigger a release to get a working binary) I can start asking netbeans to stub some test cases. That will be the first exercise of the paths. |
test/jpypetest/test_fault.py
Outdated
# with self.assertRaises(TypeError): | ||
# _jpype._JProxy(None, None) | ||
# with self.assertRaises(TypeError): | ||
# _jpype._JProxy(None, []) | ||
# with self.assertRaises(TypeError): | ||
# _jpype._JProxy(None, [type]) |
Check notice
Code scanning / CodeQL
Commented-out code Note test
I just thought of this while looking through the code. I'm not sure if you already are or not but you might want to filter out access to the package from Python to prevent anything unexpected if you think that could cause a problem. |
I was briefly able to test from both Java and Python before I took a header on Windows and tried to execute “pip uninstall JPype1”, which sadly deleted my entire development directory because the install was pointed back to my internal copy. I thought it would just remove the link that setup.py installed, but nope, it did an rm -Rf instead. Lost a good 3 hours of work on that one.
As for the corrupt builds, I suspect it is a library conflict between launching from netbeans and from the command line. I have no issue running the same DLL from cmd, but the minute I run it under netbeans I get a Windows access error. Copying the dll from the release build works, but it is missing the hooks I need for the new code. I could put in a second PR with just the proxy enhancements. Thus my plan is to finish the C++ changes, force a release build run, install the wheel from azure, then debug the Python and Java bindings. There are a good number of entry points, so I need the netbeans auto test writer to do a good chunk of the work.
The “missing” feature was the ability to specify both a dict and an inst to the JProxy. We need both: the inst holds a “self” reference to the wrapped object, and the dict is a mapping which tells us which methods Java should see. With that the rest of the binding is trivial. I already had a working copy of the launch script. Everything else is just trying to satisfy the Java contracts with the tools that Python provides.
|
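Why a proxy needs both a method table and a separate “self” can be shown with a small pure-Python stand-in. The class `MethodTableProxy` and its `invoke` method are invented for this sketch and are not the JProxy API; they only model the dispatch rule of looking up a name in the dict and calling it with the wrapped instance.

```python
class MethodTableProxy:
    """Hypothetical stand-in for a proxy holding both an inst and a dict."""

    def __init__(self, inst, methods):
        self._inst = inst              # object passed as "self" on every call
        self._methods = dict(methods)  # names the other side may call

    def invoke(self, name, *args):
        try:
            func = self._methods[name]
        except KeyError:
            raise NotImplementedError(name) from None
        return func(self._inst, *args)


def _add(self, item):
    """Mimic a Java-style add(): append and report that the list changed."""
    self.append(item)
    return True


# Map Java-style collection names onto an ordinary Python list.
LIST_AS_COLLECTION = {
    "size": lambda self: len(self),
    "contains": lambda self, item: item in self,
    "add": _add,
}

data = [1, 2]
proxy = MethodTableProxy(data, LIST_AS_COLLECTION)
proxy.invoke("add", 3)
print(proxy.invoke("size"))  # 3
```

With only a dict there is nowhere to keep the wrapped list, and with only an instance the list would need to have methods named `size` and `contains` itself. Supplying both lets a new interface be laid over an existing object without modifying it.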
native/common/jp_bridge.cpp
Outdated
{ | ||
jboolean copy; | ||
const char* name = env->GetStringUTFChars(str, &copy);
for (std::list<void*>::iterator iter = libraries.begin(); |
I can't tell if this is leftover code that still needs to be refactored, but I couldn't find the Java native method declaration.
Also, the access to the libraries list is not thread safe. I know msvc does thread safe static globals by default (which is really stupid to create a mutex for every global by default and silently add lock accesses) but I don't know what gcc or clang does. I would like to think they are sane and you would encounter a problem here.
Sorry for going on a minor tangent, I can't help myself sometimes.
That was code left over from the epypj ap which require name lookup for the CFFI address lookup. I copied it over just to make sure that the libraries are loaded properly as it provides a method to search for library resources. The good news is the only code in that file which will be in the finished product is the “start” method which searches for the Python modules “jpype” and “_jpype” and then calls the initialize resources. That bypasses the entire startJVM path.
I agree it would be a problem (it was marked to refactor to something safe where I copied it from) but as it is now dead code you can safely ignore it.
|
@astrelsky It seems like there are some wrinkles with using maven to build the support jar. I can't get the order of the main jar to function properly. The only way that it will work is if the org.jpype components are already in memory prior to the start of Python. If not, then org.jpype doesn't get the permissions it needs to operate. I don't think this is a show stopper. During testing I often run with an org.jpype that did not come installed with Python. I have even had it named differently than expected. What it does mean is that rather than using maven I will likely need to build the package the usual way under Python and then use Apache Ivy to publish a bare artifact to maven central rather than an actual maven pom. Yes, that means there will be two copies of org.jpype on the user's system, one under the Python site and the other under the Java resources. So long as one of them gets loaded on the class path it won't matter which copy we run from. It does mean I will need to add some version guards to the process, as one could imagine cases in which there was version skew between the two copies. (Oh well, so much for maven.... back to using ant.) Scratch that.... ant with modules is now completely busted in netbeans. So back to maven again! |
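A version guard of the kind mentioned above could be as simple as comparing the version reported by each copy at startup. The sketch below is an assumption about what such a guard might look like: the function names and the rule that major and minor numbers must match are illustrative, not a policy stated in this PR.

```python
def parse_version(text):
    """Turn '1.5.2' or '1.5.2.dev0' into its leading integers: (1, 5, 2)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def check_versions(python_side, java_side):
    """Raise if the two copies disagree on major or minor version.

    The major.minor rule is an assumed policy for this example.
    """
    py = parse_version(python_side)
    jv = parse_version(java_side)
    if len(py) < 2 or len(jv) < 2:
        raise ValueError("version strings need at least major.minor")
    if py[:2] != jv[:2]:
        raise RuntimeError(
            f"org.jpype version skew: Python side {python_side}, "
            f"Java side {java_side}"
        )
```

Calling `check_versions("1.5.2", "1.5.0")` passes silently, while `check_versions("1.5.2", "1.6.0")` raises before any mismatched code gets a chance to run.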
@marscher Sorry about this turning into another monster PR. I was hoping to just put all of the changes into one directory and leave the rest untouched, but it didn't work out that way. The only way to get it functional is to place all of the contents in one tree and build a unified jar from it. That requires moving every file in native/java into native/jpype_module/src/main/java. Most of it is just simple motion, but even with git move it still makes for a huge number of files changed. I could split out the C++ changes into a separate PR if that helps. The C++ changes were to fix three bugs I found in the type system and to make it so that one can create a JProxy with both inst and dict set, which is useful for mapping a new interface over an existing object. One other important note.... getting this to function in Java 1.8 would be a hopeless task, so I have to move the minimum version to 11, which is the next LTS. |
* | ||
* Optionally, specific bytes can be deleted during decoding. | ||
* | ||
* @param encoding the encoding to use for decoding (e.g., "utf-8"). |
Check notice
Code scanning / CodeQL
Spurious Javadoc @param tags Note
* Optionally, specific bytes can be deleted during decoding. | ||
* | ||
* @param encoding the encoding to use for decoding (e.g., "utf-8"). | ||
* @param delete the bytes to delete during decoding, or {@code null} for no |
Check notice
Code scanning / CodeQL
Spurious Javadoc @param tags Note
Started work on unit testing and documentation. That is a very large amount of typing. I am still trying to decide if the protocols rightly belong in the python.lang package or if they should be in their own python.protocol package. I moved PyBuiltIn to python.lang as it looked better there. Still up for work.
|
@astrelsky Looks like it won't be until next weekend that I finish with the probe interface which creates mixins. But I did get a lot of the documentation and a portion of the test bench done so that I can see what the code may actually look like. Most of the Python language features are in "PyBuiltIn" which is intended to use the statement The item that bothers me the most is what the method name should be to retrieve the Python static type element from each class. I had a conflict with getType() in base PyObject, though I have dropped the method and can rename them now. Having two class systems will be a challenge. The intent is that if we do have an actual Python module converted to Java, we would start by scanning the type info and autogenerating a Java stub package which then registers itself as a service. That would add the new types to the autoconversion facility. It will still be clunky, as most interfaces just give you a PyObject and then the user needs to know what to cast it to. I haven't looked into jep to see how much we would have to do to be on parity in terms of capabilities. I assume that I have overshot in that I am implementing significant details of the Python language. |
I could have sworn I said that I was going to try and put an example together using json this weekend so I can try it out and provide feedback, but I can't find where I said this. Anyway I'm aware that it isn't currently in a functional state. Let me know when it is and the commit hash and I'll set aside some time to put something together and provide feedback. I'll need the commit hash because I anticipate you'll keep working and break things, this way I don't need to go searching to find a working state. |
@@ -0,0 +1,878 @@ | |||
import jpype |
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
def iter_filter(self, callable): | ||
return filter(callable, self) | ||
|
||
def list_add(self, e, v=missing): |
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns Note test
@@ -204,8 +205,8 @@ | |||
break; | |||
case 5: | |||
case 6: | |||
is.skip(6); // double and long are special as they are double entries | |||
i++; // long and double take two slots | |||
is.skip(6); // Double and long are special as they take two slots |
Check notice
Code scanning / CodeQL
Ignored error status of call Note
* Creates a Python tuple from a variable-length array of arguments. | ||
* | ||
* @param items the objects to include in the tuple. | ||
* @param <T> the type of the objects. |
Check notice
Code scanning / CodeQL
Spurious Javadoc @param tags Note
public Iterator<V> iterator() | ||
{ | ||
Function<K, V> function = p -> this.map.get(p); | ||
Iterator<?> iter = this.map.keySet().iterator(); |
Check notice
Code scanning / CodeQL
Unread local variable Note
/** | ||
* Checks if the tuple contains the specified object. | ||
* | ||
* @param o the object to check |
Check notice
Code scanning / CodeQL
Spurious Javadoc @param tags Note
Still working on the probe api. While I am making progress, it takes a while to work through all the details required to pull this off. I implemented the backend to match the current interfaces, but still need to add C++ hooks to make the type conversion magic all work. |
Just wanted to clarify that I'm in no rush. I actually didn't want to yesterday but decided to try and take a look anyway because I thought I said I would. I am very much absorbed in something else at the moment. |
All good. I am rushing for personal reasons as I know that this will take 80+ hours if I want a full and complete wrapper of Python from within Java. It is hard to maintain focus for that many nights and weekends as I have many other projects that I work on. I appreciate your support in getting this capability launched. |
if (interface == nullptr) | ||
return nullptr; | ||
|
||
PyObject* methods_item = PyDict_GetItem(_methodsDict, interface); // Borrowed reference |
Check notice
Code scanning / CodeQL
Declaration hides variable Note
line 840
I need to finish an additional foundational PR before I can resume this one. This PR makes it frighteningly easy to end up with irresolvable reference loops. I believe that I can set up a cross language garbage collector that deals with the issue. Required changes: Next we add two sentinels into the Python gc to provide memory barriers for interactions between Java held Python objects and Python held Java objects. This allows us to turn these irresolvable loops into Java side resolvable ones. |
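The kind of loop being described can be imitated in plain Python. In this sketch `FakeJavaHeap` is a made-up stand-in for the Java side: whatever it holds behaves like an external root that Python's collector cannot reclaim. It models only the symptom, not the sentinel mechanism proposed above.

```python
import gc
import weakref


class PythonObject:
    """A Python object that will hold a handle to the 'Java' side."""


class FakeJavaHeap:
    """Stand-in for Java: objects it holds look like roots to Python's gc."""

    def __init__(self):
        self._held = {}

    def hold(self, key, obj):
        self._held[key] = obj

    def release(self, key):
        self._held.pop(key, None)


def demo():
    java = FakeJavaHeap()
    obj = PythonObject()
    obj.java_handle = java        # Python -> "Java"
    java.hold("listener", obj)    # "Java" -> Python, closing the loop

    alive = weakref.ref(obj)
    del obj                       # the program drops its only direct reference
    gc.collect()
    leaked = alive() is not None  # still alive: Python cannot break the loop

    java.release("listener")      # only the "Java" side can cut the edge
    gc.collect()
    freed = alive() is None
    return leaked, freed


print(demo())  # (True, True)
```

Python's collector handles cycles it can see end to end, but here one edge lives in a heap it does not own, so the object survives until the other side lets go. That matches the goal stated above of turning such loops into ones the Java side can resolve.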
This is a placeholder for now.