
Add multi-worker ResultServer: one process per VM#2923

Open
wmetcalf wants to merge 2 commits into kevoreilly:master from wmetcalf:multiworker-resultserver

Conversation

@wmetcalf
Contributor

Summary

  • Adds optional multi-worker mode to ResultServer: one dedicated multiprocessing.Process per VM, each with its own gevent event loop on a unique port
  • Bypasses GIL serialization when many VMs upload results concurrently
  • Default off (multiworker = no) — existing single-port behavior completely unchanged
  • Auto-assigns sequential ports to VMs (base_port + machine_index) when no explicit resultserver_port is configured per machine
  • Fixes analysis_manager.py rooter calls to use machine.resultserver_port instead of global config port, so iptables rules open the correct per-VM port
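The port auto-assignment in the bullets above can be sketched as a small helper. This is an illustrative sketch only: `assign_resultserver_ports` and the dict-based machine records are assumptions for the example, not the PR's actual code in abstracts.py.

```python
def assign_resultserver_ports(machines, base_port=2042, multiworker=True):
    """Give each machine a dedicated ResultServer port.

    Machines with an explicit resultserver_port keep it; in multiworker
    mode the rest receive base_port + machine_index.
    """
    for machine_index, machine in enumerate(machines):
        if multiworker and machine.get("resultserver_port") is None:
            machine["resultserver_port"] = base_port + machine_index
    return machines


machines = assign_resultserver_ports([
    {"name": "vm0", "resultserver_port": None},  # auto: 2042 + 0
    {"name": "vm1", "resultserver_port": 9999},  # explicit port kept
    {"name": "vm2", "resultserver_port": None},  # auto: 2042 + 2
])
```

Note that the offset is the machine's index, not a running counter of unassigned machines, so ports stay stable even when some machines carry explicit ports.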

Config

[resultserver]
# Enable multi-worker mode: one ResultServer process per VM
multiworker = no
# Base port for auto-assigning per-VM ports (multiworker mode)
base_port = 2042

Files changed

File | Change
conf/default/cuckoo.conf.default | Add multiworker and base_port settings
lib/cuckoo/core/resultserver.py | Add ResultServerWorkerProcess, SingleVMResultServerWorker, refactor ResultServer singleton
lib/cuckoo/common/abstracts.py | Auto-assign sequential ports when multiworker enabled
lib/cuckoo/core/analysis_manager.py | Use machine.resultserver_port in all rooter calls
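The resultserver.py refactor can be pictured with a minimal dispatch sketch. The class name and dict-based worker records here are illustrative assumptions; the PR's real ResultServer spawns ResultServerWorkerProcess instances at the marked point rather than storing dicts.

```python
class ResultServerSketch:
    """Toy model of the refactored singleton's add_task dispatch."""

    def __init__(self, multiworker=False):
        self.multiworker = multiworker
        self.tasks = {}    # legacy mode: machine_ip -> task_id
        self.workers = {}  # multiworker mode: machine_ip -> worker record

    def add_task(self, task_id, machine_ip, port):
        if self.multiworker:
            self._add_task_multiworker(task_id, machine_ip, port)
        else:
            # Legacy single-port path: one shared server, tasks keyed by VM IP.
            self.tasks[machine_ip] = task_id

    def _add_task_multiworker(self, task_id, machine_ip, port):
        worker = self.workers.get(machine_ip)
        if worker is None or not worker["alive"]:
            # The PR spawns a dedicated process bound to the VM's port here;
            # a dead worker is replaced on the next task, as in the test plan.
            worker = {"port": port, "alive": True, "task_id": None}
            self.workers[machine_ip] = worker
        worker["task_id"] = task_id
```

The key property the sketch captures: with multiworker off, nothing touches the worker table, which is how the legacy behavior stays untouched.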

Test plan

  • multiworker = no — verify existing behavior unchanged
  • multiworker = yes with single VM — confirm results arrive on per-VM port
  • multiworker = yes with multiple concurrent VMs — confirm no cross-contamination
  • Kill a worker process mid-analysis — confirm replacement spawns on next task
  • Verify all routing modes (drop, inetsim, tor, socks5, VPN) open correct per-VM port

Please test — my testing environment is limited these days so additional eyes on this would be greatly appreciated.

Spawns a dedicated Python process per VM, each with its own gevent
event loop on a unique port, bypassing GIL serialization of result
uploads across concurrent analyses. Enabled via multiworker=yes in
cuckoo.conf [resultserver] section; default off preserves existing
single-port behavior unchanged.

- ResultServerWorkerProcess: multiprocessing.Process per VM
- SingleVMResultServerWorker: gevent server reading task_id from
  shared multiprocessing.Value
- Auto-assigns sequential ports (base_port + index) to machines
- analysis_manager rooter calls use machine.resultserver_port
  instead of global config port for correct per-VM firewall rules
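The shared multiprocessing.Value handoff described in the commit message can be demonstrated with the stdlib alone. This is a sketch of the mechanism, not the PR's SingleVMResultServerWorker: the worker here just echoes the published task id back instead of running a gevent server.

```python
import multiprocessing


def echo_task_id(shared_task_id, out_queue):
    # Worker side: read the task id the parent published for this VM.
    out_queue.put(shared_task_id.value)


def demo():
    # "i" = C int; 0 stands in for "no task registered yet".
    shared_task_id = multiprocessing.Value("i", 0)
    out_queue = multiprocessing.Queue()

    shared_task_id.value = 42  # parent registers a task for the VM
    worker = multiprocessing.Process(
        target=echo_task_id, args=(shared_task_id, out_queue)
    )
    worker.start()
    worker.join()
    return out_queue.get()


if __name__ == "__main__":
    print(demo())
```

Because the Value lives in shared memory, the parent can swap the current task id without restarting the worker, which is what lets one long-lived process per VM serve successive analyses.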
@gemini-code-assist
Contributor

Summary of Changes

Hello @wmetcalf, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ResultServer's ability to handle concurrent result uploads from multiple virtual machines by introducing an optional multi-worker architecture. By dedicating a separate process and gevent event loop to each VM, it effectively bypasses Python's Global Interpreter Lock (GIL) limitations, improving performance and stability under heavy load. The changes are opt-in, ensuring backward compatibility with existing single-port configurations, and include automatic port assignment for ease of deployment.

Highlights

  • Multi-Worker ResultServer: Introduced an optional multi-worker mode for the ResultServer, dedicating a separate 'multiprocessing.Process' for each virtual machine.
  • GIL Bypass: Implemented per-VM gevent event loops on unique ports to mitigate Python GIL contention during concurrent result uploads.
  • Configuration Options: Added configuration options ('multiworker' and 'base_port') to 'cuckoo.conf.default' for enabling and configuring the new mode, which is off by default.
  • Automatic Port Assignment: Enabled automatic assignment of sequential ports to VMs when 'multiworker' mode is active and no explicit 'resultserver_port' is set.
  • Dynamic Port Usage in Analysis Manager: Updated 'analysis_manager.py' to ensure rooter calls utilize the correct, dynamically assigned per-VM 'resultserver_port'.
Changelog
  • conf/default/cuckoo.conf.default
    • Added 'multiworker' configuration option to enable/disable the new multi-worker mode.
    • Added 'base_port' configuration option for specifying the starting port for auto-assigned per-VM result servers.
  • lib/cuckoo/common/abstracts.py
    • Modified '_initialize' method to detect 'multiworker' mode and 'base_port' from configuration.
    • Updated the loop iterating through 'machine_id's to include 'machine_index'.
    • Implemented logic to auto-assign 'machine.resultserver_port' sequentially based on 'base_port' and 'machine_index' when 'multiworker' is enabled and no specific port is configured for the machine.
  • lib/cuckoo/core/analysis_manager.py
    • Updated all 'rooter' calls within 'route_network' and 'unroute_network' methods to use 'self.machine.resultserver_port' instead of 'self.cfg.resultserver.port'. This ensures that network routing rules are configured for the correct per-VM result server port.
  • lib/cuckoo/core/resultserver.py
    • Imported 'multiprocessing' and 'signal' modules.
    • Introduced 'SingleVMResultServerWorker' class, a 'GeventResultServerWorker' variant designed for multi-worker mode, which uses a shared 'multiprocessing.Value' for 'task_id' instead of IP lookup.
    • Added 'ResultServerWorkerProcess' class, a 'multiprocessing.Process' subclass, to encapsulate a dedicated gevent event loop and 'StreamServer' for a single VM, handling its own port and task management.
    • Refactored the 'ResultServer' singleton to support both legacy single-worker mode and the new multi-worker mode.
    • Added 'multiworker' and 'ip' attributes to 'ResultServer' initialization.
    • Implemented '_start_legacy_server' method to encapsulate the original single-port server setup.
    • Modified 'add_task' and 'del_task' methods to conditionally call '_add_task_multiworker' and '_del_task_multiworker' based on the 'multiworker' flag.
    • Added '_add_task_multiworker' method to manage spawning or reusing 'ResultServerWorkerProcess' instances for VMs and registering tasks.
    • Added '_del_task_multiworker' method to clear tasks from workers.
    • Added 'shutdown_workers' method to gracefully terminate all multi-worker processes.
Activity
  • No human activity (comments, reviews, or progress updates) has been recorded for this pull request yet.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a multi-worker mode for the ResultServer, which is a significant improvement for handling concurrent analyses by dedicating a process per VM. The implementation using multiprocessing is well-structured. The changes to configuration, machine initialization, and network routing are all consistent with this new architecture.

I've found a couple of areas for improvement: a potential race condition that could lead to misattributed analysis results, and a minor code cleanup opportunity.

Comment on lines +635 to +636
            if not protocol:
                return

Severity: high

There is a potential race condition here. The task could be cancelled, and a new one started for this VM, while negotiate_protocol() is blocking on I/O. The original GeventResultServerWorker handles this by re-checking the task ID after negotiation. That check is missing here and should be added, otherwise results can be associated with the wrong task and produce incorrect analysis output.

            if self._shared_task_id.value != task_id:
                log.warning(
                    "Task #%d for VM %s was cancelled during negotiation, new task is %s",
                    task_id, self._vm_ip, self._shared_task_id.value or "none"
                )
                return

            if not protocol:
                return

- Add task_id re-check after negotiate_protocol() to prevent results
  from being associated with wrong task if cancelled during I/O
- Remove unused task_id parameter from clear_task()
