refactor(BA-3160): Separate loader and writer of Kernel registry recovery#6958
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the kernel registry code by separating loading and writing responsibilities into distinct components, following the Single Responsibility Principle.
- Introduces separate loader and writer abstractions with concrete pickle-based implementations
- Creates a composed
PickleBasedKernelRegistrythat uses both loader and writer components - Converts
kernel_registry.pyfrom a class definition to a type variable for generic typing support
Reviewed changes
Copilot reviewed 7 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ai/backend/agent/kernel_registry/kernel_registry.py | Replaced KernelRegistry class with TKernelRegistry type variable for generic typing |
| src/ai/backend/agent/kernel_registry/loader/abc.py | Renamed AbstractKernelRegistryRecovery to AbstractKernelRegistryLoader and removed save method, now focused only on loading |
| src/ai/backend/agent/kernel_registry/loader/pickle_based.py | Refactored to PickleBasedKernelRegistryLoader, removed save functionality and simplified initialization with explicit path parameters |
| src/ai/backend/agent/kernel_registry/writer/abc.py | New abstract base class for writer functionality |
| src/ai/backend/agent/kernel_registry/writer/types.py | New data class for save metadata |
| src/ai/backend/agent/kernel_registry/writer/pickle_based.py | New pickle-based writer implementation extracted from the original loader |
| src/ai/backend/agent/kernel_registry/writer/noop.py | New no-op writer implementation for scenarios where persistence is not needed |
| src/ai/backend/agent/kernel_registry/pickle/kernel_registry.py | New composition class that combines loader and writer for complete pickle-based registry operations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2ac0bc5 to
b1def96
Compare
| ) | ||
| try: | ||
| with open(final_file_path, "rb") as f: | ||
| return pickle.load(f) |
Check notice
Code scanning / devskim
Deserializing attacker-supplied data using `pickle` or `cPickle` can result in code execution. Note
|
|
||
| @abstractmethod | ||
| async def load_kernel_registry(self) -> KernelRegistry: | ||
| async def load_kernel_registry(self) -> TKernelRegistry: |
There was a problem hiding this comment.
Is there a reason to make it generic?
| class PickleBasedKernelRegistryRecovery: | ||
| def __init__(self, args: PickleBasedKernelRegistryRecoveryArgs) -> None: | ||
| registry_file_name = f"kernel_registry.{args.agent_id}.dat" | ||
| fallback_registry_file_name = f"kernel_registry.{args.local_instance_id}.dat" | ||
| legacy_registry_file_path = args.ipc_base_path / registry_file_name | ||
| fallback_registry_file_path = args.var_base_path / fallback_registry_file_name | ||
| last_registry_file_path = args.var_base_path / registry_file_name | ||
|
|
||
| self._loader = PickleBasedKernelRegistryLoader( | ||
| last_registry_file_path, fallback_registry_file_path, legacy_registry_file_path | ||
| ) | ||
| self._writer = PickleBasedKernelRegistryWriter(last_registry_file_path) | ||
|
|
||
| async def save_kernel_registry( | ||
| self, registry: KernelRegistry, metadata: KernelRegistrySaveMetadata | ||
| ) -> None: | ||
| await self._writer.save_kernel_registry(registry, metadata) | ||
|
|
||
| async def load_kernel_registry(self) -> KernelRegistry: | ||
| return await self._loader.load_kernel_registry() |
There was a problem hiding this comment.
This doesn't seem right. I think we need to create a structure that receives multiple loaders and a structure that receives one writer.
|
|
||
| from ai.backend.logging import BraceStyleAdapter | ||
|
|
||
| from ...kernel import KernelRegistry |
There was a problem hiding this comment.
KernelRegistry here is not the correct type. KernelRegistry is actually a MutableMapping[AgentKernelRegistryKey, AbstractKernel], where AgentKernelRegistryKey is essentially a named tuple of AgentId and KernelId.
The name of the class is a bit misleading, but this class is actually a global registry of all kernels shared by all agents within the same agent runtime. Individual agents get a "slice" or "view" of the global KernelRegistry, which is an object of type KernelRegistryAgentMapping.
(it is obtained through calling KernelRegistry.agent_mapping(AgentId) and passed to Agent on construction)
There is currently no type alias defined for a Mapping[KernelId, AbstractKernel] or a MutableMapping[KernelId, AbstractKernel]
| log = BraceStyleAdapter(logging.getLogger(__spec__.name)) | ||
|
|
||
|
|
||
| class NoopKernelRegistryWriter(AbstractKernelRegistryWriter[None]): |
There was a problem hiding this comment.
Maybe we can just remove generics here and just make the no-op writer ignore whatever Mapping object given to it.
Also we could define a loader that also just always returns an empty dictionary {}
hhoikoo
left a comment
There was a problem hiding this comment.
Also, this is not something that should be done in this change, but after this is merged we could/should move the existing KernelRegistry and related classes (KernelRegistryAgentMapping and KernelRegistryGlobalView) currently in src/ai/backend/agent/kernel.py to this submodule
resolves #6957 (BA-3160)
Checklist: (if applicable)
ai.backend.testdocsdirectory