Recover From Arbitrary Tor Crashes #190

Open
bitmold wants to merge 5 commits into master from abort-report

@bitmold bitmold commented Mar 1, 2026

Moving tor into its own process has kept Orbot from crashing completely whenever there's an error in tor. These errors often come from torerr.h's tor_raw_abort_(void) function.

When this happens, Orbot gets into all sorts of broken states. You have to reopen it, see that it's broken, and force stop the app. It's not really clear to the user that the background process stopped, other than that the VPN died. If users don't have "always-on VPN" and "block connections without VPN" enabled, they might not even realize that Orbot suddenly broke in the background and that they are now browsing in the clear...

With a small function inserted into tor_raw_abort_() we can quickly invoke a method in TorService that tells Orbot, hey, the tor process is about to be killed. This lets Orbot recover from the crash automatically, and can also be used to inform the user that Tor had a technical difficulty and will be back online shortly. This is so much better than users having to reopen the app, force stop it, restart it, etc.

This works by:

  • Attempting to give Orbot access to JNI_GetCreatedJavaVMs. You normally need a min API of 31 to access it, but it has often been present on Android devices over the years, just not officially included in what the NDK exposes. We can use this function to grab a reference to the JVM. I've found that some huge libraries attempt this kind of approach too. So even though JNI_GetCreatedJavaVMs isn't linked into apps building against the NDK, it's often found in shared objects that are on Android devices and can be dynamically loaded. Most devices have at least one of [libnativehelper.so](https://android.googlesource.com/platform/libnativehelper/+/HEAD/include_jni/jni.h?autodive=0#1103), libart.so, or libdvm.so with a working implementation of this function.
  • So first, try to dlopen() each of these shared objects. If the JNI function isn't found after trying all of them, just give up and let tor_raw_abort_() do what it does.
  • If you succeeded in getting a dynamically loaded reference to that function, use it to obtain a reference in C to the TorService Java class. If we can load TorService in the JVM, we can invoke a new static method that I've defined in TorService to quickly blurt out an Intent to the main Orbot app saying that tor's about to die, right before the abort() takes place.
  • But if you can't reach the method in TorService, just return and go back to the abort(). Again, when this new approach fails, you just fall back to the behavior we have today. You'll hit tor_raw_abort_(), tor will crash, and users will be a bit confused with Orbot...
  • This implementation seems pretty solid after trying it across various Android API levels and devices. If the C -> Java pathway isn't possible, it fails gracefully. If for some reason this new C code causes some unforeseen problem that brings about a crash, well.... that's still what was about to happen anyway. (This happened a bit before I finished debugging this implementation...) 🤷‍♀️ So even if this mentality of "idk it shouldn't crash but it might" 🤷‍♀️ is normally completely unacceptable, it's actually useful and harmless here, because one way or the other everything's just going to fatally crash & go up in flames all the same 😈
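To make the probing step from the list above concrete, here is a minimal, generic sketch of "try several shared objects until one exports the symbol". The helper name `find_symbol_in_libs` is hypothetical (it is not part of tor, Orbot, or the NDK), and it only uses the standard `dlopen()`, `dlsym()`, and `dlclose()` calls. It assumes the caller is fine with leaving the matching library loaded.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: walk a NULL-terminated list of
 * library names and return the first address found for `symbol`, or NULL. */
static void *find_symbol_in_libs(const char *const libs[], const char *symbol)
{
  for (const char *const *name = libs; *name != NULL; name++) {
    void *handle = dlopen(*name, RTLD_NOW);
    if (handle == NULL) {
      continue;          /* this library is not available here: try the next */
    }
    void *address = dlsym(handle, symbol);
    if (address != NULL) {
      return address;    /* keep the handle open so the address stays valid */
    }
    dlclose(handle);     /* library loaded but lacks the symbol: try the next */
  }
  return NULL;           /* nothing found: caller falls back to old behavior */
}
```

For this PR the list would be libnativehelper.so, libart.so, and libdvm.so, the symbol would be "JNI_GetCreatedJavaVMs", and the returned pointer would be cast to the matching function pointer type. The point of the sketch is that a library which loads but lacks the symbol should not end the search.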

Here are the changes to tor's liberr.c (I'm putting the #includes and everything else here just to make it all visible on this page in one place; it's one new function that we invoke once in tor_raw_abort_()):

#define LOGCAT_TAG "OrbotAbortReport"
#include <jni.h>
#include <dlfcn.h>
#include <android/log.h>
// Same shape as the JNI invocation API: JNI_GetCreatedJavaVMs(JavaVM **vmBuf, jsize bufLen, jsize *nVMs)
typedef jint (*GetCreatedJavaVMs)(JavaVM **vmBuf, jsize bufLen, jsize *nVMs);
void attempt_to_tell_orbot_of_abort(void);
void attempt_to_tell_orbot_of_abort(void)
{
  static const char* possibleLibs[] = {"libnativehelper.so", "libart.so", "libdvm.so", NULL};
  // Start from NULL so the check below is well defined if no library can be opened
  GetCreatedJavaVMs getCreatedJavaVMs = NULL;
  const char ** libPointer = possibleLibs;
  // Keep trying libraries until one of them actually exports the symbol
  while (*libPointer != NULL && !getCreatedJavaVMs) {
    void *lib = dlopen(*libPointer, RTLD_NOW);
    if (lib) {
      getCreatedJavaVMs = (GetCreatedJavaVMs)dlsym(lib, "JNI_GetCreatedJavaVMs");
    }
    libPointer++;
  }

  if (!getCreatedJavaVMs) {
    __android_log_print(ANDROID_LOG_VERBOSE, LOGCAT_TAG, "couldn't dlsym JNI_GetCreatedJavaVMs method");
    return;
  }

  // Ask for at most one VM and make sure one was actually returned
  JavaVM *jvm = NULL;
  jsize jvmCreatedSize = 0;
  jint result = getCreatedJavaVMs(&jvm, 1, &jvmCreatedSize);
  if (result != JNI_OK || jvmCreatedSize < 1 || !jvm) {
    __android_log_print(ANDROID_LOG_VERBOSE, LOGCAT_TAG, "issue with JVM");
    return;
  }

  // Attach the current thread to the JVM (Android's jni.h takes a JNIEnv** here)
  JNIEnv* jniEnv = NULL;
  result = (*jvm)->AttachCurrentThread(jvm, &jniEnv, NULL);
  if (result != JNI_OK) {
    __android_log_print(ANDROID_LOG_VERBOSE, LOGCAT_TAG, "couldn't attach to current thread in JVM");
    return;
  }

  jclass clazz = (*jniEnv)->FindClass(jniEnv, "org/torproject/jni/TorService");
  if (!clazz) {
    __android_log_print(ANDROID_LOG_VERBOSE, LOGCAT_TAG, "couldn't find TorService class");
    return;
  }

  jmethodID func = (*jniEnv)->GetStaticMethodID(jniEnv, clazz, "onTorRawAbort", "()V");
  if (!func) {
    __android_log_print(ANDROID_LOG_VERBOSE, LOGCAT_TAG, "couldn't find onTorRawAbort() in Java");
    return;
  }

  // Tell TorService that tor is about to abort, then hand control back to tor_raw_abort_()
  (*jniEnv)->CallStaticVoidMethod(jniEnv, clazz, func);
  (*jvm)->DetachCurrentThread(jvm);
}

We need this kind of nonsense because you can't normally start from C and jump into Java. Under normal circumstances it is possible to go from Java -> C -> Java, but you are always starting from Java. When you go from Java -> C, your C function has a JNIEnv pointer as one of its parameters. This is the critical thing you need in order to invoke arbitrary Java code from within a C function.

So with the normal constraints (which are otherwise sensible) it's impossible to deliver the news to Orbot. We have to do this cursed ritual in order to pull the necessary JNIEnv pointer out of thin air.

I've been testing this in Orbot by creating a function in TorService's C code that calls tor_raw_abort_(). I wired it into a button on the main screen and have been using it to crash the app on demand. I've consistently had success running some Java just before tor's abort() nukes everything...

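#include "lib/err/torerr.h"
JNIEXPORT jboolean JNICALL
Java_org_torproject_jni_TorService_fatal
(JNIEnv *env, jclass clazz)
{
  // Test-only hook, callable from Java via `TorService.fatal()`.
  // fatal() is declared static in Java, so the second parameter is the class.
  tor_raw_abort_();
  return JNI_TRUE; // not expected to be reached, since tor_raw_abort_() ends in abort()
}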

private native int runMain();


public native static boolean fatal();

@bitmold bitmold Mar 1, 2026

We can remove this fatal() method and the corresponding C function Java_org_torproject_jni_TorService_fatal. It's only here temporarily for testing.

@bitmold bitmold changed the title from "Recover From Tor Crashes" to "Recover From Arbitrary Tor Crashes" Mar 1, 2026
@bitmold bitmold requested a review from n8fr8 March 1, 2026 13:32
@bitmold bitmold force-pushed the abort-report branch 3 times, most recently from 5d73f6d to ba73a7f Compare March 1, 2026 15:31

syphyr commented Mar 1, 2026

What branch should I be looking at for the required changes to Tor?


bitmold commented Mar 1, 2026

For tor-android it's this branch, abort-report.
For tor, the repo is https://github.com/bitmold/tor and the branch is also abort-report.

I think if you run this in tor-android it should add my fork as a remote and then automatically grab the branch. I'm not 100% certain right now:

cd external/tor
git remote add bitmold https://github.com/bitmold/tor
git fetch bitmold
cd ../..
./tor-droid-make.sh fetch -c

I don't have a modified version of Orbot online yet. I didn't change it much, I was just printing debug logs in TorService and sending an unused Intent to the main application.


syphyr commented Mar 1, 2026

Thanks.

I just cherry-picked your commit, since I have additional changes that I want to keep.

git fetch https://github.com/bitmold/tor abort-report && git cherry-pick e150d5b1ef6aec8f4e49553216751ea22fdc023b


syphyr commented Mar 1, 2026

One thing I have noticed since Tor started running in its own process: when Android is under heavy load, the low memory killer can sometimes kill Orbot's process. When that happens, the Tor service becomes zombified, and simply restarting Orbot does not work. I have to go into Android developer settings, select running services, and manually stop the remaining cached Orbot processes before Orbot will run properly again.

@bitmold bitmold force-pushed the abort-report branch 2 times, most recently from da4c889 to 51e6203 Compare March 2, 2026 22:06