refactor(BA-3160): Separate loader and writer of Kernel registry recovery#6958

Merged: HyeockJinKim merged 4 commits into main from refactor/divide-loader-writer-in-kernel-registry on Nov 26, 2025
@fregataa fregataa commented Nov 26, 2025

Member

resolves #6957 (BA-3160)

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@fregataa fregataa added this to the 25.18 milestone Nov 26, 2025
@fregataa fregataa self-assigned this Nov 26, 2025
Copilot AI review requested due to automatic review settings November 26, 2025 11:02
@github-actions github-actions Bot added size:L 100~500 LoC comp:agent Related to Agent component labels Nov 26, 2025
@fregataa fregataa marked this pull request as draft November 26, 2025 11:03

Copilot AI left a comment

Contributor

Pull request overview

This PR refactors the kernel registry code by separating loading and writing responsibilities into distinct components, following the Single Responsibility Principle.

  • Introduces separate loader and writer abstractions with concrete pickle-based implementations
  • Creates a composed PickleBasedKernelRegistry that uses both loader and writer components
  • Converts kernel_registry.py from a class definition to a type variable for generic typing support
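
The split described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class names `AbstractLoader`, `AbstractWriter`, `PickleLoader`, and `PickleWriter`, the `Registry` alias, and the file name are all hypothetical, and the save metadata argument that the real writer accepts is omitted.

```python
import asyncio
import pickle
import tempfile
from abc import ABC, abstractmethod
from collections.abc import Mapping, MutableMapping
from pathlib import Path

# Hypothetical stand-in for the real registry type; the PR's actual types differ.
Registry = MutableMapping[str, str]


class AbstractLoader(ABC):
    """Read side only: recovers a registry and never writes."""

    @abstractmethod
    async def load_kernel_registry(self) -> Registry: ...


class AbstractWriter(ABC):
    """Write side only: persists a registry and never reads."""

    @abstractmethod
    async def save_kernel_registry(self, registry: Mapping[str, str]) -> None: ...


class PickleLoader(AbstractLoader):
    def __init__(self, path: Path) -> None:
        self._path = path

    async def load_kernel_registry(self) -> Registry:
        with open(self._path, "rb") as f:
            # Only acceptable for files that this process wrote itself.
            return pickle.load(f)


class PickleWriter(AbstractWriter):
    def __init__(self, path: Path) -> None:
        self._path = path

    async def save_kernel_registry(self, registry: Mapping[str, str]) -> None:
        with open(self._path, "wb") as f:
            pickle.dump(dict(registry), f)


async def round_trip(path: Path) -> Registry:
    # Save through the writer, then recover through the loader.
    await PickleWriter(path).save_kernel_registry({"kernel-1": "running"})
    return await PickleLoader(path).load_kernel_registry()


def demo() -> Registry:
    with tempfile.TemporaryDirectory() as tmp:
        return asyncio.run(round_trip(Path(tmp) / "kernel_registry.dat"))
```

Because each side depends only on a path, a caller that never persists state can take a loader without a writer, and the reverse.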

Reviewed changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/ai/backend/agent/kernel_registry/kernel_registry.py | Replaced KernelRegistry class with TKernelRegistry type variable for generic typing |
| src/ai/backend/agent/kernel_registry/loader/abc.py | Renamed AbstractKernelRegistryRecovery to AbstractKernelRegistryLoader and removed save method, now focused only on loading |
| src/ai/backend/agent/kernel_registry/loader/pickle_based.py | Refactored to PickleBasedKernelRegistryLoader, removed save functionality and simplified initialization with explicit path parameters |
| src/ai/backend/agent/kernel_registry/writer/abc.py | New abstract base class for writer functionality |
| src/ai/backend/agent/kernel_registry/writer/types.py | New data class for save metadata |
| src/ai/backend/agent/kernel_registry/writer/pickle_based.py | New pickle-based writer implementation extracted from the original loader |
| src/ai/backend/agent/kernel_registry/writer/noop.py | New no-op writer implementation for scenarios where persistence is not needed |
| src/ai/backend/agent/kernel_registry/pickle/kernel_registry.py | New composition class that combines loader and writer for complete pickle-based registry operations |

Comment thread src/ai/backend/agent/kernel_registry/pickle/kernel_registry.py Outdated
Comment thread src/ai/backend/agent/kernel_registry/writer/pickle.py Outdated
Comment thread src/ai/backend/agent/kernel_registry/writer/abc.py
Comment thread src/ai/backend/agent/kernel_registry/writer/abc.py
Comment thread src/ai/backend/agent/kernel_registry/loader/abc.py
Comment thread src/ai/backend/agent/kernel_registry/writer/noop.py
Comment thread src/ai/backend/agent/kernel_registry/writer/pickle.py
@fregataa fregataa force-pushed the refactor/divide-loader-writer-in-kernel-registry branch from 2ac0bc5 to b1def96 Compare November 26, 2025 11:17
@fregataa fregataa marked this pull request as ready for review November 26, 2025 11:17
@fregataa fregataa changed the title from "refactor(BA-3160): Divide loader and writer of kernel registry" to "refactor(BA-3160): Separate loader and writer of Kernel registry recovery" Nov 26, 2025
    )
    try:
        with open(final_file_path, "rb") as f:
            return pickle.load(f)

Check notice (Code scanning / devskim, severity: Note)

Deserializing attacker-supplied data using `pickle` or `cPickle` can result in code execution. Do not deserialize untrusted data.
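
One common way to narrow this risk, described in the Python `pickle` documentation under restricting globals, is to subclass `pickle.Unpickler` and override `find_class`. The sketch below is illustrative only and is not part of this PR: the allowlist is an assumption, and a real kernel registry would need its own kernel classes added before this could load actual registry files.

```python
import builtins
import io
import pickle

# Illustrative allowlist (assumption); real registry files reference more classes.
_ALLOWED = {("builtins", "dict"), ("builtins", "list"), ("builtins", "set")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Called whenever the pickle stream references a global by name.
        if (module, name) in _ALLOWED:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but rejects globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers of primitives still load, while a payload that references any other global (for example a pickled reference to `print`) raises `pickle.UnpicklingError`. This reduces the attack surface but does not make untrusted pickles safe in general.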

      @abstractmethod
    - async def load_kernel_registry(self) -> KernelRegistry:
    + async def load_kernel_registry(self) -> TKernelRegistry:

Collaborator

Is there a reason to make it generic?

Comment on lines +20 to +39
class PickleBasedKernelRegistryRecovery:
    def __init__(self, args: PickleBasedKernelRegistryRecoveryArgs) -> None:
        registry_file_name = f"kernel_registry.{args.agent_id}.dat"
        fallback_registry_file_name = f"kernel_registry.{args.local_instance_id}.dat"
        legacy_registry_file_path = args.ipc_base_path / registry_file_name
        fallback_registry_file_path = args.var_base_path / fallback_registry_file_name
        last_registry_file_path = args.var_base_path / registry_file_name

        self._loader = PickleBasedKernelRegistryLoader(
            last_registry_file_path, fallback_registry_file_path, legacy_registry_file_path
        )
        self._writer = PickleBasedKernelRegistryWriter(last_registry_file_path)

    async def save_kernel_registry(
        self, registry: KernelRegistry, metadata: KernelRegistrySaveMetadata
    ) -> None:
        await self._writer.save_kernel_registry(registry, metadata)

    async def load_kernel_registry(self) -> KernelRegistry:
        return await self._loader.load_kernel_registry()

Collaborator

This doesn't seem right. I think we need to create a structure that receives multiple loaders and a structure that receives one writer.
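
One possible reading of this suggestion, sketched under assumptions: a wrapper receives several loaders and tries them in order, while persistence goes through exactly one writer. Every name here (`FallbackLoader`, `MissingLoader`, `StaticLoader`, `InMemoryWriter`, `recover_and_persist`) is hypothetical, and returning an empty registry when every source is missing is an assumed policy rather than something the PR states.

```python
import asyncio
from collections.abc import Mapping, Sequence
from typing import Protocol

Registry = dict[str, str]  # placeholder for the real registry mapping


class Loader(Protocol):
    async def load_kernel_registry(self) -> Registry: ...


class Writer(Protocol):
    async def save_kernel_registry(self, registry: Mapping[str, str]) -> None: ...


class FallbackLoader:
    """Receives several loaders and returns the first result that loads."""

    def __init__(self, loaders: Sequence[Loader]) -> None:
        self._loaders = list(loaders)

    async def load_kernel_registry(self) -> Registry:
        for loader in self._loaders:
            try:
                return await loader.load_kernel_registry()
            except FileNotFoundError:
                continue  # this source is missing; try the next one
        return {}  # assumption: no source available means an empty registry


class MissingLoader:
    """Test double that behaves like a loader whose file does not exist."""

    async def load_kernel_registry(self) -> Registry:
        raise FileNotFoundError("registry file not found")


class StaticLoader:
    """Test double that returns a fixed registry."""

    def __init__(self, data: Mapping[str, str]) -> None:
        self._data = dict(data)

    async def load_kernel_registry(self) -> Registry:
        return dict(self._data)


class InMemoryWriter:
    """Test double for the single writer; keeps the last saved snapshot."""

    def __init__(self) -> None:
        self.saved: Registry = {}

    async def save_kernel_registry(self, registry: Mapping[str, str]) -> None:
        self.saved = dict(registry)


async def recover_and_persist(loader: Loader, writer: Writer) -> Registry:
    # Load from whichever source is available, then persist via the one writer.
    registry = await loader.load_kernel_registry()
    await writer.save_kernel_registry(registry)
    return registry
```

With this shape, the last, fallback, and legacy file paths from the snippet above would each become one loader in the sequence instead of three constructor arguments to a single loader.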

Comment thread src/ai/backend/agent/kernel_registry/writer/pickle.py Outdated

from ai.backend.logging import BraceStyleAdapter

from ...kernel import KernelRegistry

@hhoikoo hhoikoo Nov 26, 2025

Member

KernelRegistry here is not the correct type. KernelRegistry is actually a MutableMapping[AgentKernelRegistryKey, AbstractKernel], where AgentKernelRegistryKey is essentially a named tuple of AgentId and KernelId.
The class name is a bit misleading: it is actually a global registry of all kernels, shared by all agents within the same agent runtime. Each individual agent gets a "slice" or "view" of the global KernelRegistry, which is an object of type KernelRegistryAgentMapping. (The view is obtained by calling KernelRegistry.agent_mapping(AgentId) and is passed to Agent on construction.)

There is currently no type alias defined for Mapping[KernelId, AbstractKernel] or MutableMapping[KernelId, AbstractKernel].
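
To make the distinction concrete, here is a toy model of a global store keyed by (agent, kernel) with a per-agent view over it. It is not the real implementation: `GlobalRegistry`, `AgentView`, `RegistryKey`, and the `Kernel` placeholder are hypothetical names standing in for KernelRegistry, KernelRegistryAgentMapping, AgentKernelRegistryKey, and AbstractKernel, and only the `agent_mapping` method name comes from the comment above.

```python
from collections.abc import Iterator, MutableMapping
from typing import NamedTuple, NewType

AgentId = NewType("AgentId", str)
KernelId = NewType("KernelId", str)
Kernel = str  # placeholder for AbstractKernel


class RegistryKey(NamedTuple):
    agent_id: AgentId
    kernel_id: KernelId


class AgentView(MutableMapping[KernelId, Kernel]):
    """Per-agent slice: looks like a mapping from KernelId to Kernel."""

    def __init__(self, store: dict[RegistryKey, Kernel], agent_id: AgentId) -> None:
        self._store = store  # shared with the global registry, not copied
        self._agent_id = agent_id

    def __getitem__(self, kernel_id: KernelId) -> Kernel:
        return self._store[RegistryKey(self._agent_id, kernel_id)]

    def __setitem__(self, kernel_id: KernelId, kernel: Kernel) -> None:
        self._store[RegistryKey(self._agent_id, kernel_id)] = kernel

    def __delitem__(self, kernel_id: KernelId) -> None:
        del self._store[RegistryKey(self._agent_id, kernel_id)]

    def __iter__(self) -> Iterator[KernelId]:
        # Snapshot the keys so the view can be mutated while iterating.
        return iter([k.kernel_id for k in list(self._store) if k.agent_id == self._agent_id])

    def __len__(self) -> int:
        return sum(1 for _ in self)


class GlobalRegistry:
    """Global store of all kernels, keyed by (agent_id, kernel_id)."""

    def __init__(self) -> None:
        self._store: dict[RegistryKey, Kernel] = {}

    def agent_mapping(self, agent_id: AgentId) -> AgentView:
        return AgentView(self._store, agent_id)

    def __len__(self) -> int:
        return len(self._store)
```

In this model, a loader or writer that serves a single agent would naturally be typed against the per-agent mapping shape rather than the global registry, which is the mismatch the comment points out.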

log = BraceStyleAdapter(logging.getLogger(__spec__.name))


class NoopKernelRegistryWriter(AbstractKernelRegistryWriter[None]):

Member

Maybe we can remove the generics here and make the no-op writer ignore whatever Mapping object is given to it.
We could also define a loader that always returns an empty dictionary {}.
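
A minimal sketch of that idea, assuming non-generic interfaces; `NoopWriter` and `EmptyLoader` are hypothetical names, and the metadata argument of the real writer is left out.

```python
import asyncio
from collections.abc import Mapping
from typing import Any


class NoopWriter:
    """Accepts any mapping and deliberately persists nothing."""

    async def save_kernel_registry(self, registry: Mapping[Any, Any]) -> None:
        return None


class EmptyLoader:
    """Always recovers an empty registry, as if nothing had been saved."""

    async def load_kernel_registry(self) -> dict[Any, Any]:
        return {}


async def demo() -> dict[Any, Any]:
    await NoopWriter().save_kernel_registry({"kernel-1": "running"})
    # The saved entry is intentionally not recoverable.
    return await EmptyLoader().load_kernel_registry()
```

Pairing the two gives a null-object configuration for setups where persistence is not needed, without a `[None]` type parameter.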

@hhoikoo hhoikoo left a comment

Member

Also, this should not be done in this change, but after this is merged we could (and probably should) move the existing KernelRegistry and its related classes (KernelRegistryAgentMapping and KernelRegistryGlobalView), currently in src/ai/backend/agent/kernel.py, into this submodule.

@HyeockJinKim HyeockJinKim added this pull request to the merge queue Nov 26, 2025
Merged via the queue into main with commit 807482d Nov 26, 2025
28 checks passed
@HyeockJinKim HyeockJinKim deleted the refactor/divide-loader-writer-in-kernel-registry branch November 26, 2025 13:29
Development

Successfully merging this pull request may close these issues.

Refactor kernel registry by separating loader and writer

5 participants