The backend system for Pwno. We build the plumbing that lets LLMs do security research on binaries. This is the system we designed to orchestrate pwno-mcp with FastAPI, k8s (GKE Autopilot), and Supabase: from uploading a binary (authorization, storage, API design...) to exposing an MCP instance to the public (libc (env) setups, k8s ingress, cloudflared...).
This backend went through around 7 months of iteration; I started working on it in late March. The core idea shifted a bit but never fundamentally: it's about making the perfect wheel for letting LLMs do security research on a binary, effortlessly. When I got started I knew nothing about k8s, and it was a huge pain-in-the-ass. I remember debugging one seemingly simple issue for at least 2 weeks, only to find out I needed to rewrite the entire component (it was a GCP k8s authentication problem). But in the meantime I learned an incredible amount about k8s, everything.
Honestly, most of the time spent on this backend went into thinking rather than building. E.g., one question we kept coming back to: how do you build a system that balances utility and UX? Since we originally built this for security researchers, observability and "looking cool" were questions for us too. As you read on, you'll see how we switched from buffering k8s-native streams and pure socket-I/O transport through a centralized MCP backend, to moving each MCP server into its own k8s instance and exposing it via k8s ingress; and how we tried a load balancer first and it didn't work out great (we were looking at around 10 sec of spin-up time with the load balancer, while switching to ingress + cloudflared took us down to about 1 sec!). The entire k8s architecture was torn down and rebuilt at one point, for efficiency for LLMs.
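To make the per-instance ingress shape concrete, here is a minimal sketch of what routing one session to its own MCP pod can look like. All names (`pwno-session-abc123`, `mcp-session-abc123`, the path, the port) are illustrative, not the actual pwno manifests; cloudflared would sit in front of the ingress controller:

```yaml
# Hypothetical sketch: one Ingress rule routing a single session's
# traffic to that session's dedicated MCP Service. Names are made up.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pwno-session-abc123
spec:
  rules:
    - http:
        paths:
          - path: /sessions/abc123
            pathType: Prefix
            backend:
              service:
                name: mcp-session-abc123   # one Service per MCP instance
                port:
                  number: 8000
```

The appeal of this shape is that creating a session is just applying another rule/Service pair, which is much faster than provisioning a fresh cloud load balancer per environment.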
I am open sourcing this, on one hand, because we're moving our focus to a more ingenious angle on this question of "putting LLMs into innermost security (low-level security)"; pre-binary research is cool, but we're moving to a more ambitious point of view. On the other hand, I think k8s is the type of thing where, if you see how other people built it, you learn so much faster. I want this to help people build their own thing.
And on the other other hand: I wrote this system, so I have the right to... (hahaha). The past 7 months were honestly a bit hard; we've gained little traction since we're still figuring things out, but it's always a process and we trust what the experience taught us. From a slightly selfish POV, I just want to show people what I've been working on for the past months :)
History of the past 7 months
- I first tried writing a gdb plugin and executing via the gdb Python API (autogdb.io), integrating the MCP backend into a single backend, with authorization done by rewriting a bit of the low-level implementation of early MCP's SSE (this was around March, see this post). It didn't work out well: first of all, capturing program stdio was a problem for the gdb APIs (we did try delimiters, but there's another story regarding timing that we'll mention later), and stopping a multi-threaded binary is a bit problematic (which makes the actual-execution part pretty much unusable). That said, this version was quite scalable, with only one command backend needed. (autogdb was only solving the problem of connecting your debugging (research) machine to an agent client for research; it sounds easy, but it was mixed with jumping between frontend, auth, and specific compatibility problems.)
- After realizing the scalability problem of autogdb.io, I started on the idea of bringing the entire research environment to the cloud, with scalable pre-configured environments. Tons of time learning and making mistakes in k8s, specifically GKE, pretty much learning everything from thin air. We got a working MVP around 2 weeks into diving in (back then I still had my AP exams). On the backend, it was still a major problem of "how do we start an environment for everyone, and how do we let everyone access their own environment?"
We still stuck with the original centralized MCP backend approach, but this time we assigned a k8s stream channel to each user, and used these IO channels to natively interact with gdb (with delimiters). This was still intended to solve the problem of capturing program IO, which is a tricky problem. I then figured users should also be able to see their gdb session in the cloud, so I came up with the approach of duplicating the stdio channel back into the frontend via k8s streams and websockets. After around 2 months of development, we got pwno.io up and running, but there were still tons of problems that ate an incredible amount of time that I haven't mentioned, from GKE integration to network issues.
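A toy sketch of the delimiter trick described above (illustrative only, not the actual autogdb/pwno code, and demonstrated against `sh` instead of gdb): wrap each command with a unique sentinel so you know where that command's output ends in a shared stdio stream. The timing caveat the text alludes to is also visible here: if the target program writes to stdout concurrently, its output interleaves with the command's, which is exactly why this approach was never fully stable.

```python
# Illustrative sketch of delimiter-based IO capture over a shared
# stdio stream. Names and the sh-based demo are hypothetical.
import subprocess
import uuid

# A sentinel unlikely to appear in real program output.
SENTINEL = f"__PWNO_DONE_{uuid.uuid4().hex}__"

def run_until_delimiter(proc: subprocess.Popen, command: str) -> str:
    """Send one command, then read output until our sentinel appears."""
    # Issue the command, then echo the sentinel so we can tell where
    # this command's output ends in the interleaved stream.
    proc.stdin.write(command + "\n")
    proc.stdin.write(f"echo {SENTINEL}\n")
    proc.stdin.flush()

    lines = []
    for line in proc.stdout:
        if SENTINEL in line:
            break  # everything before this belongs to our command
        lines.append(line)
    return "".join(lines)
```

With gdb the same idea applies, except the "echo" becomes a gdb command whose output you control, and the fragile part is that the inferior's stdout shares the stream with gdb's own.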
- pwno.io was working, I can't say well, but at a working level; there were still async/synchronization problems and GKE-native problems, but we had managed to solve the most pain-in-the-ass scalability and interactive-IO problems, which we had spent around 3 months on by that point. This is when I started working on pwnuous, our cooperation with GGML, which needed something like the previous version of pwno-mcp but with more stable support. Since the previous version plugged into GDB via a direct IO stream, the synchronization problem I mentioned was another huge pain-in-the-ass: some IO slipped away, and it just wasn't stable enough for real use. This is when I started thinking about rewriting everything, and throwing parts away purely for LLM usability and full agentic compatibility. I was working on my Black Hat talk back then, so I had thought a little about statefulness, and I learned about this wonderful thing that just seems born for us: GDB/MI (Debugging with GDB). I spent a few days rewriting the entire thing by reading the docs. I definitely spent less time conceptualizing the backend architecture for this version of pwno-mcp than for pwno.io (around 2 days, mainly on GKE gateway things). It's definitely not a very elaborate or sophisticated framework by any means, but it did come from a ton of trial-and-error on my part while thinking about how to make something that can scale (multi-agent, researchers using it), so I'd say it's by far my best conceptualization and work for the purpose of LLMs using it with stability and scalability.
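For context on why GDB/MI was such a relief after the delimiter era: unlike the human-oriented console, MI emits one machine-parsable record per line (`^done`/`^error` result records, `*stopped`/`*running` async records, `~"..."` console stream output, and a bare `(gdb)` line terminating each response), so no sentinels or scraping are needed. A minimal illustrative classifier (not the pwno-mcp parser, and ignoring most of MI's full grammar):

```python
# Illustrative GDB/MI record classifier. Real MI results are a nested
# key=value grammar; here we only split off the record class.

def parse_mi_record(line: str):
    """Classify one GDB/MI output line into (kind, payload)."""
    line = line.strip()
    if line == "(gdb)":              # prompt: the response is complete
        return ("prompt", None)
    if line.startswith("~"):         # console stream output, quoted
        return ("console", line[1:].strip('"'))
    if line.startswith("^"):         # result record: ^done, ^error, ...
        kind = "result"
    elif line.startswith("*"):       # exec async record: *stopped, ...
        kind = "exec-async"
    else:
        kind = "other"
    body = line[1:]
    cls, _, rest = body.partition(",")
    return (kind, {"class": cls, "results": rest})
```

The key property is the `(gdb)` terminator: an agent can issue a command and know deterministically when the response has ended, which is exactly what the stdio-delimiter hack was trying to fake.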