
Add vm create collector #23

Merged
mohanson merged 5 commits into nervosnetwork:main from mohanson-fork:vm_create_collector
Sep 2, 2025

Conversation

@mohanson
Collaborator

Collect VM creation information

@mohanson mohanson requested a review from xxuejie August 25, 2025 03:28

#[derive(Default, Clone)]
pub struct VmCreateCollector {
    pending: Arc<Mutex<Option<(VmId, BinaryLocator)>>>,
Contributor


Two things:

  • Personally, I would not simply track the creation of each VM. For spawn calls, I would recommend tracking both the caller VM ID and the callee VM ID; the goal here is really to be able to build a tree structure of VM invocations within a single script group. Exec syscalls would also fit the tree structure.
  • To me the invocation tree is parallel to the BinaryLocator collector: there should be one collector tracking the invocations between VMs, and another collector tracking the executed binary for each VM.

Collaborator Author


I updated the code.

  • Now the caller is stored in the key of VmCreateCollector.data (a CollectorKey) and the callees are stored in its value (a Vec<VmId>), allowing me to fully construct the call tree.
  • You're correct. I used VmCreateCollector and BinaryLocatorCollector together, forming BinaryLocatorCollector<VmCreateCollector>, which allows me to track both the VM ID and the binary of each VM.
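
The caller-to-callees mapping described above is enough to rebuild the invocation tree. A minimal sketch (the `flatten_tree` helper and the plain `VmId` keys are illustrative, not the crate's actual API — the real key is a `CollectorKey`):

```rust
use std::collections::HashMap;

type VmId = u64;

// Walk the caller -> callees map depth-first, collecting (depth, vm_id)
// pairs in invocation order, which is all a tree printer needs.
fn flatten_tree(
    data: &HashMap<VmId, Vec<VmId>>,
    root: VmId,
    depth: usize,
    out: &mut Vec<(usize, VmId)>,
) {
    out.push((depth, root));
    if let Some(callees) = data.get(&root) {
        for &callee in callees {
            flatten_tree(data, callee, depth + 1, out);
        }
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert(0, vec![1, 2]); // VM 0 spawned VM 1 and VM 2
    data.insert(1, vec![3]);    // VM 1 spawned VM 3
    let mut out = Vec::new();
    flatten_tree(&data, 0, 0, &mut out);
    for (depth, id) in &out {
        println!("{}{}", "  ".repeat(*depth), id);
    }
    // prints: 0, then "  1", "    3", "  2" on separate lines
}
```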

@mohanson mohanson force-pushed the vm_create_collector branch from 390dc01 to 3c37d0e Compare August 26, 2025 04:12
Contributor

@xxuejie left a comment


The code looks okay.

However, if I were doing this, I would make 2 changes:

  • I would track exec (generation changes) in addition to spawn syscalls. They could also be helpful when generating the VM invocation tree; explicitly stating potential exec calls reveals the structure better.
  • What is needed here is a CombineCollector or similarly named type, which takes more than one collector, runs them all, and collects all the results. This can be a generic feature so the collector tracking VM creation and the collector tracking loaded binaries can be loaded together.

@mohanson
Collaborator Author

> The code looks okay.
>
> However, if I were doing this, I would make 2 changes:
>
>   • I would track exec (generation changes) in addition to spawn syscalls. They could also be helpful in generation of the VM invocation tree. Explicitly stating potential exec calls helps reveal the structure better.
>   • What is needed here is a CombineCollector or similar names, which takes more than one collector, runs them all, and collects all the results. This can be a generic feature so collector tracking VM creation, and collector tracking loaded binaries can be loaded together.

Note that the key of the collected data is CollectorKey:

pub struct CollectorKey {
    pub vm_id: VmId,
    pub generation_id: u64,
}

So, in fact, we can already reconstruct the exec call stack through CollectorKey alone. For example, given the two keys CollectorKey { vm_id: 0, generation_id: 0 } and CollectorKey { vm_id: 0, generation_id: 1 }, we can infer that the former execs into the latter.
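
The inference described here can be sketched as follows: group the generation ids by vm_id and sort them, so that within one VM each generation execs into the next. The `exec_chains` helper is hypothetical, not part of the crate; `CollectorKey` mirrors the struct quoted above:

```rust
use std::collections::HashMap;

type VmId = u64;

// Mirrors the crate's key type quoted in the comment above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CollectorKey {
    pub vm_id: VmId,
    pub generation_id: u64,
}

// Hypothetical helper: group generation ids per VM and sort them.
// Within one VM, generation n execs into generation n + 1.
fn exec_chains(keys: &[CollectorKey]) -> HashMap<VmId, Vec<u64>> {
    let mut chains: HashMap<VmId, Vec<u64>> = HashMap::new();
    for key in keys {
        chains.entry(key.vm_id).or_default().push(key.generation_id);
    }
    for generations in chains.values_mut() {
        generations.sort_unstable();
    }
    chains
}

fn main() {
    let keys = vec![
        CollectorKey { vm_id: 0, generation_id: 1 },
        CollectorKey { vm_id: 0, generation_id: 0 },
    ];
    let chains = exec_chains(&keys);
    // VM 0: generation 0 execs into generation 1.
    assert_eq!(chains[&0], vec![0, 1]);
}
```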

Regarding the second point, I really want to implement it, but I'm currently unsure how to structure the code. Logically, I'd like to put different Collectors into a Vec, something like:

struct CombineCollect {
    inner: Vec<Box<dyn Collector<Trace = ???>>>
}

However, since Collector has an unknown associated type Trace, I cannot put different Collectors into a single Vector.

pub trait Collector: Clone + Default {
    type Trace;
}

I can think of some very "inelegant" implementation approaches, like the code below, but this is really not what I want.

struct CombineCollect<T: Collector, U: Collector, V: Collector, ...> {
    c1: T,
    c2: U,
    c3: V,
    ...
}

Do you have any ideas? @xxuejie

@xxuejie
Contributor

xxuejie commented Aug 29, 2025

I can only say from personal experience that the following style of code:

struct CombineCollect<T: Collector, U: Collector, V: Collector, ...> {
    c1: T,
    c2: U,
    c3: V,
    ...
}

is really common practice in this industry. I see code like this everywhere, deep inside complicated projects, such as:

@xxuejie
Contributor

xxuejie commented Aug 29, 2025

> so in fact, we can already reconstruct the exec call stack just through CollectorKey. For example, if there are two keys CollectorKey {vm_id: 0, generation_id: 0} and CollectorKey {vm_id: 0, generation_id: 1}, we can infer that the former execs the latter.

I can only say that if I were doing this, I'd make this more explicit than the current behavior. Or, if you think it is possible, you should do it when sealing the data.

@mohanson mohanson force-pushed the vm_create_collector branch from f05b506 to 8833a13 Compare September 1, 2025 02:21
@mohanson
Collaborator Author

mohanson commented Sep 1, 2025

@xxuejie I have implemented the two improvements mentioned above; please review.

}

impl Collector for VmCreateCollector {
    type Trace = Vec<CollectorKey>;
Contributor


Let's just discuss this Trace type: assuming I want to use this collector to build a VM invocation tree for a particular CKB script, how should I use the data structure here?

And just to add: yes, we will need an answer here, but it would be best if the answer to this question were also an example in the current crate.

Collaborator Author


A practical example is: https://github.com/mohanson-fork/ckb-standalone-debugger/blob/tracer/ckb-debugger/src/main.rs#L466-L518

$ ckb-debugger --mode full --tx-file examples/spawn.json

It will output:

Spawn tree: 0                    cell_dep[0][0..101624]
Spawn tree: └── 1                cell_dep[1][0..100960]

Contributor


Oh I see, this is my fault: I forgot that seal returns a HashMap, not just a Trace type. In that sense this should be fine.

I would approve this, but I still recommend a simple minimal example, not just part of the debugger.


define_combine_collector!(CombineCollector2, CombineCollector2Trace, c1: C, c2: D);
define_combine_collector!(CombineCollector3, CombineCollector3Trace, c1: C, c2: D, c3: E);
define_combine_collector!(CombineCollector4, CombineCollector4Trace, c1: C, c2: D, c3: E, c4: F);
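
A macro of this shape might be structured roughly as follows. This is a sketch under a simplified stand-in Collector trait; the crate's real macro, which also takes a dedicated trace struct name (e.g. CombineCollector2Trace), will differ:

```rust
// Simplified stand-in for the crate's Collector trait.
pub trait Collector: Default {
    type Trace;
    fn seal(self) -> Self::Trace;
}

// Generates a struct with one field per inner collector, whose Trace
// is the tuple of the inner collectors' traces.
macro_rules! define_combine_collector {
    ($name:ident, $($field:ident: $ty:ident),+) => {
        #[derive(Default)]
        pub struct $name<$($ty: Collector),+> {
            $(pub $field: $ty,)+
        }

        impl<$($ty: Collector),+> Collector for $name<$($ty),+> {
            type Trace = ($($ty::Trace,)+);
            fn seal(self) -> Self::Trace {
                ($(self.$field.seal(),)+)
            }
        }
    };
}

define_combine_collector!(CombineCollector2, c1: C, c2: D);
define_combine_collector!(CombineCollector3, c1: C, c2: D, c3: E);

// A toy collector for demonstration only.
#[derive(Default)]
struct CountCollector {
    n: u64,
}

impl Collector for CountCollector {
    type Trace = u64;
    fn seal(self) -> u64 {
        self.n
    }
}

fn main() {
    let mut c = CombineCollector2::<CountCollector, CountCollector>::default();
    c.c1.n = 3;
    c.c2.n = 5;
    // Sealing the combined collector seals both inner collectors.
    assert_eq!(c.seal(), (3, 5));
}
```

The repetition `$($field:ident: $ty:ident),+` is what lets one macro cover the 2-through-16 variants with a single body.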
Contributor


Now that you are writing a macro, I would recommend writing it from 2 all the way to 16.

Collaborator Author


done

@mohanson mohanson merged commit 3386de3 into nervosnetwork:main Sep 2, 2025
1 check passed
@mohanson mohanson deleted the vm_create_collector branch September 2, 2025 09:35