Reduce artifact footprint for Session Manager use-case #76

@andy-k-improving

Description

Problem statement

A subset of use cases requires generic graph instance connectivity and operational interaction, independent of any NetworkX-based graph representation or algorithmic features.

However, the current package pulls in NetworkX, which adds significant dependency weight. This creates friction in constrained environments—especially AWS Lambda—where:

  • deployment package size matters, and
  • installation time and complexity increase due to the extra dependency footprint.

As a result, use cases that need only graph instance–level operations are forced to install a larger package than necessary.

Proposed solution

To resolve this issue, we will introduce a second distribution artifact (wheel) that contains only the code required for graph instance interactions, with no NetworkX dependency. This provides a smaller runtime footprint while preserving the existing full-featured package.

High-level approach

  • Extract all Graph interaction logic into a standalone module (working name: Neptune Controller / Neptune Connector).
  • Introduce Protocol interfaces where needed to decouple the new module from the existing nx_neptune package.

Implementation considerations

1) No breaking changes to nx_neptune

The refactor may move modules internally, but it must not remove or change existing public functionality:

  • All currently exposed classes and functions must remain available after the change.
  • Package path changes are acceptable only if they are handled via compatibility re-exports (and ideally aligned with a major release boundary).

2) Stable import path across “full” and “lite” installs

User code should import the same symbols regardless of whether they install the full package (nx-neptune) or the lightweight package (neptune-controller).

Example user code (run inside an async context):

from neptune_controller import export_csv_to_s3

task_id = await export_csv_to_s3("g-xxx", s3_location_export)

Expectation

  • This code works when only neptune-controller is installed.
  • It also works when only nx-neptune is installed (the full package should include a compatibility layer that provides the same neptune_controller import surface).
  • Users should not need to modify application code to switch between the two distributions. Any adaptation must be handled internally via re-exports or thin wrappers.
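One way to satisfy this expectation is a pure re-export shim: the full wheel bundles a tiny neptune_controller package whose __init__.py does nothing but re-export the implementations living under nx_neptune. The sketch below simulates both modules in-process to illustrate the pattern; the module contents and the body of export_csv_to_s3 are stand-ins, not the real nx_neptune API.

```python
import asyncio
import sys
import types

# Stand-in for the implementation module that ships inside nx-neptune.
# (The function body is hypothetical; only the re-export pattern matters.)
impl = types.ModuleType("nx_neptune.instance_management")

async def export_csv_to_s3(graph_id, s3_location):
    return f"task-{graph_id}"

impl.export_csv_to_s3 = export_csv_to_s3
sys.modules["nx_neptune.instance_management"] = impl

# The compatibility layer the full package would ship as
# neptune_controller/__init__.py: nothing but re-exports.
shim = types.ModuleType("neptune_controller")
shim.export_csv_to_s3 = impl.export_csv_to_s3
shim.__all__ = ["export_csv_to_s3"]
sys.modules["neptune_controller"] = shim

# User code is byte-for-byte identical under both installs.
from neptune_controller import export_csv_to_s3 as user_fn

print(asyncio.run(user_fn("g-xxx", "s3://bucket/prefix")))  # task-g-xxx
```

Because the shim only aliases existing objects, the "lite" and "full" wheels can never drift apart in behavior for the shared surface.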

To-do list

The following phases outline the work required to complete the PoC and make the feature production-ready.

Phase 1: Code separation (internal refactor)

Goal

Perform the initial separation of the codebase with minimal user impact (no breaking changes).

Implementation

  • Introduce a GraphContext Protocol to cover all existing usages of NeptuneGraph inside instance_management.py.

  • Create a new Python package (working name: neptune_controller) that contains:

    • instance_management.py
    • its direct dependencies (e.g., connection and task orchestration modules such as task_future.py)
  • Place the new package at the same directory level as nx_neptune (sibling package layout).

  • Update the existing pyproject.toml to include the new package in the build.

  • Update __init__.py exports and internal imports so all internal usages are routed through the new package path.
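The GraphContext Protocol from the first bullet could look roughly like the sketch below. The member names are illustrative placeholders; the real Protocol should list exactly the NeptuneGraph members that instance_management.py actually calls.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class GraphContext(Protocol):
    """Minimal graph-handle surface needed by instance_management.

    Members here are hypothetical, not the real NeptuneGraph API.
    """

    graph_id: str

    async def execute_call(self, api: str, **params: Any) -> dict: ...

# Any object with a matching shape satisfies the Protocol structurally,
# so neptune_controller never has to import nx_neptune (or NetworkX).
class StubGraph:
    graph_id = "g-xxx"

    async def execute_call(self, api: str, **params: Any) -> dict:
        return {"api": api, "status": "accepted"}

assert isinstance(StubGraph(), GraphContext)
print(asyncio.run(StubGraph().execute_call("StartExportTask"))["status"])  # accepted
```

Structural typing keeps the dependency arrow pointing the right way: nx_neptune's NeptuneGraph satisfies the Protocol without neptune_controller knowing it exists.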

Success criteria

  • Users can invoke all existing Instance Management and Session Manager APIs from the existing distribution even if networkx is not installed in the runtime environment (e.g., a fresh venv without NetworkX).
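This criterion can be smoke-tested without building a fresh venv by temporarily making networkx unimportable in the current interpreter. A sketch using a meta-path finder (test-harness code, not part of either package):

```python
import sys

class BlockImport:
    """Meta-path finder that makes a distribution look uninstalled."""

    def __init__(self, name: str):
        self.name = name

    def find_spec(self, fullname, path=None, target=None):
        # Raising here aborts the import as if the package were absent.
        if fullname == self.name or fullname.startswith(self.name + "."):
            raise ModuleNotFoundError(f"{self.name!r} is blocked for this test")
        return None

sys.meta_path.insert(0, BlockImport("networkx"))
sys.modules.pop("networkx", None)  # drop any cached copy

try:
    import networkx  # noqa: F401
    raise AssertionError("networkx import should have failed")
except ModuleNotFoundError:
    print("networkx unavailable, as intended")

# With the blocker active, importing the Instance Management / Session Manager
# APIs must still succeed for Phase 1 to pass.
```

A CI job using a genuinely clean venv (pip install without networkx) remains the authoritative check; the blocker is just a fast local proxy.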

Phase 2: New distribution artifact (standalone package)

Goal

Formalize and publish the new package as a standalone installable artifact, with a clear user guide. No breaking changes are expected.

Implementation

  • Define and confirm packaging metadata:

    • package name
    • versioning strategy
    • dependency strategy (shared dependencies vs duplicated vs optional extras) between neptune_controller (lite) and nx_neptune (full)
  • Create a standalone public surface:

    • dedicated neptune_controller/__init__.py for the new package’s audience
  • Restructure tests:

    • move controller-related tests into a dedicated subfolder (e.g., tests/neptune_controller/)
  • Update GitHub workflows:

    • build and publish both artifacts (nx_neptune and neptune_controller)
  • Update documentation:

    • README with installation instructions, usage examples, and migration notes (if any)

Success criteria

  • Users can follow the guide to install neptune_controller and develop applications using it without requiring nx_neptune or networkx.

Phase 3: Validation, cleanup, and GA readiness

Goal

Housekeeping and validation to ensure both packages are GA-ready.

Implementation

  • Validate existing notebooks and examples:

    • confirm all flows relying on instance_management.py continue to work unchanged
  • Confirm there are no circular dependencies at all levels:

    • public exports layer (__init__.py)
    • module import graph (runtime imports)
  • Resolve any circular dependencies by enforcing the following dependency direction:

    • neptune_controller → nx_neptune only via lightweight/shared interfaces (e.g., Protocols like GraphContext), or move these interfaces to a shared core module if required
    • nx_neptune → neptune_controller by switching imports to neptune_controller.* where instance management is used
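A generic cycle check over a module-dependency map can back this verification. In practice the map would be extracted from the codebase (e.g. by scanning import statements); the edges below are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of module names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {m: WHITE for m in deps}
    stack = []

    def visit(m):
        color[m] = GREY
        stack.append(m)
        for d in deps.get(m, ()):
            if color.get(d, WHITE) == GREY:          # back edge: cycle found
                return stack[stack.index(d):] + [d]
            if color.get(d, WHITE) == WHITE and d in deps:
                cycle = visit(d)
                if cycle:
                    return cycle
        stack.pop()
        color[m] = BLACK
        return None

    for m in list(deps):
        if color[m] == WHITE:
            cycle = visit(m)
            if cycle:
                return cycle
    return None

# Desired direction: nx_neptune may depend on neptune_controller, never the reverse.
ok = {"nx_neptune": ["neptune_controller", "networkx"], "neptune_controller": []}
bad = {"nx_neptune": ["neptune_controller"], "neptune_controller": ["nx_neptune"]}

print(find_cycle(ok))   # None
print(find_cycle(bad))  # ['nx_neptune', 'neptune_controller', 'nx_neptune']
```

Running a check like this in CI over both the public-export layer and the runtime import graph would catch regressions before release.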

Success criteria

  • nx_neptune continues to work as-is with no breaking changes.
  • neptune_controller is installable and usable as a standalone package.
  • Users who only need instance management can switch from nx_neptune to neptune_controller with minimal or no code changes and no runtime issues.

Reduced dependency footprint

By introducing a standalone Neptune Controller artifact that excludes NetworkX, users can significantly reduce their runtime footprint—especially in constrained environments like AWS Lambda.

A comparison of virtual environment sizes illustrates the impact:

  • Plain venv: ~23 MB
  • With neptune-controller: ~60 MB
  • With nx-neptune: ~72 MB

This reduction improves deployment size, cold-start performance, and operational simplicity for users who only require Graph control and management functionality, while still preserving full feature availability through nx-neptune for advanced graph use cases.

    Labels

    enhancement (New feature or request), question (Further information is requested)
