Java to Python bridge #1274

Draft
wants to merge 14 commits into
base: master

Thrameos
Contributor

This is a placeholder for now.

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 52.85996% with 239 lines in your changes missing coverage. Please review.

Project coverage is 85.29%. Comparing base (60398fc) to head (ab99467).

Files with missing lines | Patch % | Lines
native/common/jp_bridge.cpp | 0.00% | 147 Missing ⚠️
jpype/_jbridge.py | 70.03% | 86 Missing ⚠️
native/common/include/jp_exception.h | 0.00% | 2 Missing ⚠️
native/common/jp_context.cpp | 80.00% | 0 Missing and 2 partials ⚠️
native/common/jp_proxy.cpp | 95.65% | 0 Missing and 1 partial ⚠️
native/python/pyjp_proxy.cpp | 88.88% | 0 Missing and 1 partial ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
- Coverage   86.96%   85.29%   -1.67%     
==========================================
  Files         113      115       +2     
  Lines       10354    10812     +458     
  Branches     4064     4184     +120     
==========================================
+ Hits         9004     9222     +218     
- Misses        757      995     +238     
- Partials      593      595       +2     
Files with missing lines | Coverage Δ
jpype/__init__.py | 95.23% <100.00%> (+0.11%) ⬆️
jpype/_core.py | 84.82% <100.00%> (+0.10%) ⬆️
jpype/_jproxy.py | 97.87% <100.00%> (ø)
native/common/include/jp_class.h | 100.00% <ø> (ø)
native/common/include/jp_proxy.h | 100.00% <ø> (ø)
native/common/jp_class.cpp | 89.71% <100.00%> (+0.34%) ⬆️
native/common/jp_classhints.cpp | 76.72% <100.00%> (+0.67%) ⬆️
native/common/jp_exception.cpp | 78.73% <ø> (ø)
native/common/jp_functional.cpp | 78.94% <100.00%> (+0.56%) ⬆️
native/python/include/pyjp.h | 100.00% <ø> (ø)
... and 6 more

... and 2 files with indirect coverage changes


@astrelsky
Contributor

astrelsky commented Apr 11, 2025

How do you intend to make this usable from a Java project? The JPype jar is bundled with the PyPI wheel, so it would be a bit complicated unless it were made available independently on Maven Central or something similar. The only way I see that working is if it is in its own jar outside of the JPype jar.

Would it make sense to have PyObject implement Map to obey the unwritten rule that everything in Python is a dict? (As I type I'm leaning more towards no.) Maybe it doesn't need to be everything, but I think you should be able to get a list of all the attributes as Java strings, for example.

Or is this intended to only ever be an example?

@Thrameos
Contributor Author

With regard to distribution, it requires the JPype distribution and C++ code to function, so it would be bundled with JPype. Though I expect that if one were to operate purely as a reverse bridge, it would likely be formulated as a Maven build in which it builds a portion of the support jar, then requires JPype installed in Python for the rest. Unfortunately, I don't play much in that field, so I would be happy if someone had more insight. Under the original plan from years ago I was going to build two jars: "org.jpype" for the forward bridge and "org.epypj" for the reverse bridge. Both would be built from Python, but the reverse would have a separate packager so that one could install from Maven. For testing I am building the support pieces into our current jar, but that is not the final home.

As for functioning as a map, I would not want it to give map functions freely. If you want the dictionary-like behavior, you would ask the member to produce a wrapper for it. That is the reason for all those additional interfaces like PyTuple, PyList, PyDict. Those are wrappers that add to the behavior that is available. If I were to add dictionary behaviors to the base Object it would conflict with those specializations. I want all the Python standard collections to map to the same standard collections in Java.

As for wrapping attributes we would have two choices:

  1. An attributes() property that gives a Map-like view with Java-centered notations (add, remove, size, entrySet, keys, values)
  2. Expose the same set of members that Python uses (getattr, hasattr, delattr, setattr)
  3. Both??

I did the second as it was quick and dirty for this demonstration, but logically by my philosophy I prefer language bindings to best present in the language being used (except for naming conventions as that will break documentation lookup).
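
To make the first option concrete, here is a minimal Python-side sketch of a Map-like view layered over the same getattr/setattr/delattr calls used by the second option. `AttributeView` and `Point` are hypothetical names for illustration only and are not part of this PR; a Java version would presumably expose the same idea through java.util.Map.

```python
from collections.abc import MutableMapping


class AttributeView(MutableMapping):
    """Hypothetical Map-like view over an object's instance attributes."""

    def __init__(self, target):
        self._target = target

    def __getitem__(self, name):
        try:
            return getattr(self._target, name)
        except AttributeError:
            raise KeyError(name) from None

    def __setitem__(self, name, value):
        setattr(self._target, name, value)

    def __delitem__(self, name):
        try:
            delattr(self._target, name)
        except AttributeError:
            raise KeyError(name) from None

    def __iter__(self):
        # Only instance attributes are listed, to keep the sketch small.
        return iter(vars(self._target))

    def __len__(self):
        return len(vars(self._target))


class Point:
    def __init__(self):
        self.x = 1
        self.y = 2


view = AttributeView(Point())
view["z"] = 3      # same effect as setattr(point, "z", 3)
del view["x"]      # same effect as delattr(point, "x")
print(sorted(view.items()))
```

Because the view only forwards to the attribute functions, offering both options would not require two separate implementations.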

As to what this PR is... Currently it is a demonstration with the goal of finding enough interest to make it part of our project. I did a very intricate design last time geared towards speed, but just didn't find sufficient users to justify the hours required to complete the test bench. This is a lightweight design that could be grown into something larger without the burden of restructuring the jpype core (which doomed the last effort).

Thanks for the feedback. As this is an experiment in progress user feedback is required to come up with a design that fits the needs.

Important design decisions thus far. Previously, I was returning Java objects because one can't tell if a return object is going to be Python or Java though Java was a rare case. This forced a huge amount of casting on the user when I was working on testing. This time I return a PyObjectJava which is a wrapper to access the Java Object. This is fairly critical as collections need to be of uniform type. I believe usability favors this direction...

   PyObject obj;

   // If every return is expected to be a PyObject, no cast is needed here, which allows chaining.
   obj.getAttr("field").asFloat();

   for (Map.Entry<PyObject,PyObject> entry : obj.attributes().entrySet())
   {
      // If the entry contains Java you will need to check and call get().
      if (!(entry.getValue() instanceof PyObjectJava))
         continue;  // skip ahead to the Java attributes

      MyClass c = (MyClass) entry.getValue().get();  // casting was going to be needed anyway so no real loss here
   }

@astrelsky
Contributor

astrelsky commented Apr 11, 2025

My question on the bundling and distribution is more towards just building a project that would use it. It would be weird to configure a Java project to look for a jar inside a python installation or virtual environment. You wouldn't bundle it with the project since jpype would load it.

Actually it may be interesting to be able to have a project include the jar too. It would allow some way of having java code that works whether or not jpype and python are running.

@Thrameos
Contributor Author

@astrelsky I checked in an example of how it would be packaged. There would be a separate jar containing all the code required for the binding, along with the code required to spawn a Python interpreter scope. Unfortunately it does not yet work because I haven't installed the magic compiler switches required to get Java to properly load the library, meaning it fails on busted symbols. (I remember it took me a while to find the compiler flags, but they should still be in epypj.)

org.jpype.bridge.Bridge does most of the work. It figures out where the Python libraries are located and tests jpype to make sure it can be loaded. It caches the values to ensure that the probe is only needed once. Afterward it would start directly by launching libpython. There is one support hook required, but that can be located in the main JPype dll, so there is no need for a second native library.
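
As a rough illustration of the probe-then-cache idea, the Python sketch below gathers the kind of facts a launcher might need and memoizes them so the probe runs only once per process. The dictionary keys are assumptions made for this example, and the real Bridge class is Java code whose actual probe and cache format are not shown in this thread.

```python
import functools
import importlib.util
import sys
import sysconfig


@functools.lru_cache(maxsize=None)
def probe():
    """Collect launch facts once; later calls return the cached dict."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # Either value may be None on platforms that do not define it.
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        # True only if an importable jpype package is on the path.
        "jpype_found": importlib.util.find_spec("jpype") is not None,
    }


info = probe()
print(info["version"], info["jpype_found"])
```

A launcher that persists these values between runs would also need a way to invalidate them when the Python install changes, which this sketch does not attempt.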

One odd bug is that I was not able to get it running from within NetBeans. It appears that there is an env variable that must be set for Python to locate packages from site-library. I don't recall that issue the last time I worked on it.

@Thrameos
Contributor Author

@astrelsky I successfully loaded the natives on the Linux build. It also tested fine on the Windows build. Thus all that is missing for it to start a Python session is the entry point for launching an interpreter.

Does this build system make sense to you?

When using the Java->Python bridge:

  • There would be a Maven project, available on Maven Central, that distributes the portion that Java must compile against.
  • The compilable portion would mostly be a bunch of interfaces without any substance.
  • There would be a single entry point class which finds the Python install, locates the real jar file, and launches the entry point from the native portion which was shipped in the JPype Python package.
  • If JPype/Python is not found it would error out.
    (If there were a bundled version it would ship Python and set the properties to go to the bundled version.)

When using the Python->Java bridge:

  • JPype ships with both jars in the Python install directory.
  • Python launches the org.jpype jar and entry points.
  • During startup we load the reverse portion of the bridge from org.jpype.bridge so that the same behaviors used with the reverse bridge are available in the forward bridge (if some function takes PyObject in Java we will convert to it).

You can follow the sequence in the Bridge class.

@astrelsky
Contributor

Yes this makes sense to me. Thank you.

@Thrameos
Contributor Author

@VincentDary It seems like I need to modify the proxy code to make it a bit more flexible, which would make this easy. As currently written it can only take a dictionary or an instance object, but not both. To make it easy to wrap a Python object it would be much better if one could define the methods and the object to be passed as self separately.
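
The idea of supplying the method table and the self object separately can be sketched in plain Python. `Dispatcher` is a hypothetical stand-in for the lookup a proxy performs; it is not JPype's implementation and does not use the JProxy API.

```python
class Dispatcher:
    """Hypothetical proxy lookup: methods come from a dict, self from inst."""

    def __init__(self, methods, inst):
        self._methods = methods
        self._inst = inst

    def invoke(self, name, *args):
        # Look the method up by name, then pass the wrapped object as self.
        func = self._methods[name]
        return func(self._inst, *args)


# Map Java-style method names onto an existing Python list without
# subclassing or modifying the list itself.
methods = {
    "size": lambda self: len(self),
    "get": lambda self, index: self[index],
}
proxy = Dispatcher(methods, ["a", "b", "c"])
print(proxy.invoke("size"), proxy.invoke("get", 1))
```

Keeping the two inputs separate means one object can be presented through several different method tables.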

I was able to pull over most of the rest of bridge code from the old repo that was relevant. I will ping you again when it is ready for some testing. If you want to influence the direction of the wrapper please make reviews as the work is in progress.

Currently the code is divided into two parts.

  • A set of objects that don't exist directly in Python which represent the Java interface to the interpreter.
  • A set of interfaces under python.lang representing actual Python concrete classes that can be manipulated.

The basic usage would be

  • Create a Bridge which starts the interpreter.
  • Use the bridge to create a Scope which represents a global dictionary. It could be named module, but I don't want confusion between actual Python modules or Java modules. It is just a dictionary that would be returned by globals().
  • Use methods in the scope to import packages, manipulate objects, evaluate python code.

You may have multiple scopes open in one Java program, but as they share an interpreter all module level changes will be global.
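
The distinction between per-scope names and interpreter-wide module state can be shown with plain dictionaries and `exec`. This is only an analogy for the Scope objects described above; `bridge_demo` is a throwaway module created for the example.

```python
import sys
import types

# A stand-in module so the demo does not modify a real one.
sys.modules["bridge_demo"] = types.ModuleType("bridge_demo")

# Each scope is just a dictionary used as globals().
scope_a = {}
scope_b = {}

# Names assigned in one scope are invisible to the other.
exec("x = 1", scope_a)
exec("x = 2", scope_b)

# Module-level changes are shared because both scopes use one interpreter.
exec("import bridge_demo; bridge_demo.flag = 'set from scope_a'", scope_a)
exec("import bridge_demo; seen = bridge_demo.flag", scope_b)

print(scope_a["x"], scope_b["x"], scope_b["seen"])
```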

I am open to naming suggestions, feature requests, and general direction advice as I would rather have something that people are interested in using rather than another academic exercise.

@Thrameos
Contributor Author

@astrelsky I put another session in. Does this look like a workable direction for the API?

I broke the Python object into two sets: concrete and protocols.

The basic object is just a fan out to protocols. I would try to make it castable with the Java cast operations, but there is no way to know if I have the right binding up front and without ASM support I can't freely mixin interfaces. Thus the user would need to request the protocol that best suits their needs from the object.

We then have a bunch of predefined concrete types which will get both object and protocols as well as any of the special behaviors that define the type.

Last, the protocols define behaviors and, where possible, map back to Java concepts such that Python objects can be handled by typical Java Collections type classes.

As always this is a monster amount of typing and testing to make it work.

@astrelsky
Contributor

I'll look through it over the week as I was already burnt out by the time I saw this today.

As a disclaimer, I hate just about every API until I actually use it.

I see the potential for its use in Ghidra, which is where I would intend to use it as well, so maybe @ryanmkurtz could offer an opinion as well.

As for the ASM support, it is used in the extension classes anyway, so maybe that will free that up? I'll open that pull request as a draft for now just to get my ass moving on it again. It just needs to pull in commits, deal with conflicts, be reformatted, and then sit through review.

@Thrameos
Contributor Author

I wouldn't expect a lot of love when it comes to attempts to wrap an untyped language in a typed one. I have torn up the classes three times already and I can't say it is satisfying in any form. There is always a leakage between the arguments that can be taken and what Python can wrap properly. It is also exposing some very strange edge cases in JPype that I will be fighting to fix. (Did you know that if a Java interface looked like an iterable, the JProxy would assume the class methods were a list and try to press it through a tuple?)

I will be trying to avoid ASM this time for a few reasons. The main use I had for it was to create mixins so I could take a bunch of interfaces and mash them together so that the Java instanceof would actually be able to navigate them via casting. That meant probing a class, finding all its dunder functions, and then making a wrapper with the right exposure. That was cool in that it looked neat, but it has some serious downsides.

  1. We will never be able to use it on Android.
  2. Objects often had conflicting interfaces which meant it was hard to get all the corner cases without some interfaces being badly compromised.

This time I am just going to accept that there may be many wrappers for the same Python object with different interfaces exposed for each. Though until I program with it for a while I am not sure how comfortable it will be.
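
For a sense of what probing an object for its protocols involves, here is a small Python sketch using the standard abstract base classes. The protocol names in `PROTOCOLS` are placeholders chosen for this example and are not the interface names used in the PR.

```python
from collections.abc import Callable, Iterable, Mapping, Sequence, Sized

# Hypothetical protocol names mapped to the ABC used to detect them.
# Iterable, Sized and Callable are detected from dunder methods;
# Sequence and Mapping rely on explicit registration or inheritance.
PROTOCOLS = {
    "iterable": Iterable,
    "sized": Sized,
    "sequence": Sequence,
    "mapping": Mapping,
    "callable": Callable,
}


def probe_protocols(obj):
    """Return the names of every protocol the object appears to support."""
    return [name for name, abc in PROTOCOLS.items() if isinstance(obj, abc)]


print(probe_protocols([1, 2]))
print(probe_protocols({"a": 1}))
print(probe_protocols(len))
```

With a result like this, a caller could ask for one wrapper per protocol instead of a single generated class that mixes them all together.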

As for automatically wrapping modules, if we aren't doing it on the fly we don't really need ASM. We could just generate code and compile it. So without the mixin there we could avoid it entirely.

Items that remain:

  1. Exception handling.
  2. Finish the collections behaviors.
  3. Testing.

@Thrameos
Contributor Author

A very ugly segfault was fixed. I needed a handful of upgrades to the proxy system and they triggered a bunch of very nasty referencing errors. I was forced to unroll all of the native changes and apply them one by one to locate the problem. My secondary issue is that my build system on Windows produces corrupt DLLs, which means that everything must happen on WSL, which is a much slower path. DLLs built on Azure work; only the ones from my local Visual Studio install are broken.

Once I get the native part working under Windows (I may have to trigger a release to get a working binary) I can start asking NetBeans to sub some test cases. That will be the first exercise of paths.

Comment on lines 345 to 351
# with self.assertRaises(TypeError):
# _jpype._JProxy(None, None)
# with self.assertRaises(TypeError):
# _jpype._JProxy(None, [])
# with self.assertRaises(TypeError):
# _jpype._JProxy(None, [type])

Check notice

Code scanning / CodeQL

Commented-out code Note test

This comment appears to contain commented-out code.
@astrelsky
Contributor

I just thought of this while looking through the code. I'm not sure if you already are or not but you might want to filter out access to the package from Python to prevent anything unexpected if you think that could cause a problem.

@Thrameos
Contributor Author

Thrameos commented Apr 14, 2025 via email

{
jboolean copy;
const char* name = env->GetStringUTFChars(str, &copy);
for (std::list<void*>::iterator iter = libraries.begin();

I can't tell if this is leftovers that still needs to be refactored but I couldn't find the Java native method declaration.

Also, the access to the libraries list is not thread safe. I know msvc does thread safe static globals by default (which is really stupid to create a mutex for every global by default and silently add lock accesses) but I don't know what gcc or clang does. I would like to think they are sane and you would encounter a problem here.

Sorry for going on a minor tangent, I can't help myself sometimes.
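
The thread-safety concern is about a shared list being read and modified without a guard. The snippet under review is C++, so the Python sketch below only illustrates the pattern of guarding every access with one lock; `LibraryRegistry` is a hypothetical name and not a class from this PR.

```python
import threading


class LibraryRegistry:
    """Hypothetical registry: every access to the list holds the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handles = []

    def add(self, handle):
        with self._lock:
            self._handles.append(handle)

    def snapshot(self):
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._handles)


registry = LibraryRegistry()


def worker(start):
    for offset in range(100):
        registry.add(start + offset)


threads = [threading.Thread(target=worker, args=(n * 100,)) for n in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(registry.snapshot()))
```

In C++ the same shape would typically use a mutex and a scoped lock around both the insertion and the iteration.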

@Thrameos
Contributor Author

Thrameos commented Apr 14, 2025 via email

@Thrameos
Contributor Author

Thrameos commented Apr 15, 2025

@astrelsky It seems like there are some wrinkles with using maven to build the support Jar. I can't get the order of the main jar to function properly. The only way that it will work is if the org.jpype components are already in memory prior to start of the Python. If not then org.jpype doesn't get the permissions it needs to operate.

I don't think this is a show stopper. I often run with an org.jpype that did not come installed with Python during testing. I have even had it named differently than expected. What it does mean is that rather than using Maven I will likely need to build the package the usual way under Python and then use Apache Ivy to publish a bare artifact to Maven Central rather than an actual Maven pom. Yes, that means there will be two copies of org.jpype on the user's system: one under the Python site and the other under the Java resources. So long as one of them gets loaded on the class path it won't matter which copy we run from.

It does mean I will need to add some version guards to the process, as one could imagine cases in which there was version skew between the two copies.
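
A version guard of this kind could be as simple as comparing the version strings reported by the two copies at startup. The sketch below assumes purely numeric dotted versions and a policy that major and minor must match; both the policy and the function names are assumptions for illustration, not something decided in this thread.

```python
def parse_version(text):
    """Parse a purely numeric dotted version such as '1.5.2'."""
    return tuple(int(part) for part in text.split("."))


def check_versions(python_side, java_side):
    """Raise if the two copies differ in major or minor version."""
    if parse_version(python_side)[:2] != parse_version(java_side)[:2]:
        raise RuntimeError(
            "org.jpype version mismatch: Python side %s, Java side %s"
            % (python_side, java_side))


check_versions("1.5.2", "1.5.0")  # accepted under this policy

try:
    check_versions("1.5.2", "1.6.0")
    mismatch_detected = False
except RuntimeError as exc:
    mismatch_detected = True
    print(exc)
```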

(Oh well so much for maven.... back to using ant)

Scratch that... ant with modules is now completely busted in NetBeans. So back to Maven again!

@Thrameos
Contributor Author

Thrameos commented Apr 15, 2025

@marscher Sorry about this turning into another monster PR. I was hoping to just put all of the changes into one directory and leave the rest untouched, but it didn't work out that way. The only way to get it functional is to place all of the contents in one tree and build a unified jar from it. That requires moving every file in the native/java into a native/jpype_module/src/main/java. Most of it is just simple motion, but even with git move it still makes for a huge amount of files changed.

I could split out the C++ changes into a separate PR if that helps. The C++ changes were to fix three bugs I found in the type system and make it so that one can create JProxy with both inst and dict set which is useful for mapping a new interface over an existing object.

One other important note... getting this to function in Java 1.8 would be a hopeless task, so I have to move the minimum version to 11, which is the next LTS.

*
* Optionally, specific bytes can be deleted during decoding.
*
* @param encoding the encoding to use for decoding (e.g., "utf-8").

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "encoding" does not match any actual parameter of method "fromHex()".
* Optionally, specific bytes can be deleted during decoding.
*
* @param encoding the encoding to use for decoding (e.g., "utf-8").
* @param delete the bytes to delete during decoding, or {@code null} for no

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "delete" does not match any actual parameter of method "fromHex()".
/**
* Creates an empty Python tuple.
*
* @param args the objects to include in the tuple.

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "args" does not match any actual parameter of method "tuple()".
@Thrameos
Contributor Author

Started work on unit testing and documentation. That is a very large amount of typing.

I am still trying to decide if the protocols rightly belong in the python.lang package or if they should be in their own python.protocol package. I moved PyBuiltIn to the python.lang as it looked better there.

Still up for work.

  • Finish the probe mixin interface.
  • Plan how services will work so that we can implement extended wrappers for modules.
  • Exceptions
  • Resync the Python model with the updated Java code base.
  • Get a module built on the jpype.release so that I can test everything.

@Thrameos
Contributor Author

Thrameos commented Apr 27, 2025

@astrelsky Looks like it won't be until next weekend that I finish with the probe interface which creates mixins. But I did get a lot of the documentation and a portion of the test bench so that I can see what the code may actually look like.

Most of the Python language features are in "PyBuiltIn", which is intended to be used with the statement import static python.lang.PyBuiltIn.*. I have tried to keep it to Python names with moderate type safety. The rest is just wrappers to make it easier to interact with Python objects, though those generally have Java naming conventions. I dropped the as* interfaces in favor of actual casting because the number of protocols had grown unwieldy. That means PyObject is very minimal without some cast applied to it.

The item that bothers me the most is what the method name should be to retrieve the Python static type element from each class. I had a conflict with getType() in base PyObject, though I have dropped the method and can rename them now. Having two class systems will be a challenge.

The intent is that if we do have an actual Python module converted to Java, we would start by scanning the type info and autogenerating a Java stub package which then registers itself as a service. That would add the new types to the autoconversion facility. It will still be clunky, as most interfaces just give you a PyObject and then the user needs to know what to cast it to.

I haven't looked into jep to see how much we would have to do to be on parity in terms of capabilities. I assume that I have overshot in that I am implementing significant details of the Python language.

@astrelsky
Contributor

I could have sworn I said that I was going to try and put an example together using json this weekend so I can try it out and provide feedback, but I can't find where I said this.

Anyway I'm aware that it isn't currently in a functional state. Let me know when it is and the commit hash and I'll set aside some time to put something together and provide feedback. I'll need the commit hash because I anticipate you'll keep working and break things, this way I don't need to go searching to find a working state.

@@ -0,0 +1,878 @@
import jpype

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from' Note test

Module 'jpype' is imported with both 'import' and 'import from'.
def iter_filter(self, callable):
return filter(callable, self)

def list_add(self, e, v=missing):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note test

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
@@ -204,8 +205,8 @@
break;
case 5:
case 6:
is.skip(6); // double and long are special as they are double entries
i++; // long and double take two slots
is.skip(6); // Double and long are special as they take two slots

Check notice

Code scanning / CodeQL

Ignored error status of call Note

Method isPublic ignores exceptional return value of InputStream.skip.
* Creates a Python tuple from a variable-length array of arguments.
*
* @param items the objects to include in the tuple.
* @param <T> the type of the objects.

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "<T>" does not match any actual parameter of method "tuple()".
public Iterator<V> iterator()
{
Function<K, V> function = p -> this.map.get(p);
Iterator<?> iter = this.map.keySet().iterator();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'Iterator<?> iter' is never read.
/**
* Checks if the tuple contains the specified object.
*
* @param o the object to check

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "o" does not match any actual parameter of method "contains()".
@Thrameos
Contributor Author

Still working on the probe API. While I am making progress, it takes a while to work through all the details required to pull this off. I implemented the backend to match the current interfaces, but still need to add C++ hooks to make the type conversion magic all work.

@astrelsky
Contributor

astrelsky commented Apr 28, 2025

Just wanted to clarify that I'm in no rush. I actually didn't want to yesterday but decided to try and take a look anyway because I thought I said I would. I am very much absorbed in something else at the moment.

@Thrameos
Contributor Author

All good. I am rushing for personal reasons as I know that this will take 80+ hours if I want a full and complete wrapper of Python from within Java. It is hard to maintain focus for that many nights and weekends as I have many other projects that I work on. I appreciate your support in getting this capability launched.

def initialize():
    return
    # Install the handler
    bridge = JClass("org.jpype.bridge.Interpreter").getInstance()

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.
if (interface == nullptr)
return nullptr;

PyObject* methods_item = PyDict_GetItem(_methodsDict, interface); // Borrowed reference

Check notice

Code scanning / CodeQL

Declaration hides variable Note

Variable methods_item hides another variable of the same name (on line 840).
@Thrameos
Contributor Author

Thrameos commented May 5, 2025

I need to finish an additional foundational PR before I can resume this one. This PR makes it frighteningly easy to end up with irresolvable reference loops. I believe that I can set up a cross-language garbage collector that deals with the issue.

Required changes:
We need to remove gc from as many objects as possible and drop local dictionaries on jobjects. It was technically possible to set attributes on jobjects, though doing so was purely transient. This will allow us to make generally safer objects and potentially fix the memory placement problems that lead to version issues. (There are still going to be edge cases, but allowing us to choose the location of the jslot should help.)

Next we add two sentinels into the Python gc to provide memory barriers for interactions between Java-held Python objects and Python-held Java objects. This allows us to turn these irresolvable loops into Java-side resolvable ones.
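
The underlying problem can be illustrated with Python's own cycle collector. The sketch below is only an analogy and not the sentinel design described above: a plain list plays the role of a reference held by the JVM, which from Python's side looks like an outside root that keeps the whole loop alive.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_loop():
    """Build a two-object reference loop; return one member and a weak ref."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return a, weakref.ref(a)


# Case 1: the whole loop is visible to Python's collector, so it is freed.
node, ref_plain = make_loop()
del node
gc.collect()
collected_plain = ref_plain() is None

# Case 2: an outside root pins the loop. The list stands in for a reference
# held by the JVM; to Python's collector it is simply an outside root.
outside_roots = []
node, ref_pinned = make_loop()
outside_roots.append(node)
del node
gc.collect()
still_alive = ref_pinned() is not None

print(collected_plain, still_alive)
```

When each language holds the other's half of a loop, neither collector can see that the outside root is itself unreachable, which is presumably the situation the proposed barriers are meant to resolve.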

Labels
enhancement Improvement in capability planned for future release
2 participants