@JadenFiotto-Kaufman

No description provided.

JadenFiotto-Kaufman and others added 11 commits January 12, 2026 19:39
refactor(config): streamline environment variable handling in AppConfigModel

- Simplified the logic for setting API key and host from environment variables and Colab userdata.
…nd responses

- Added zstandard as a dependency for improved data compression.
- Updated configuration to replace ZLIB with a COMPRESS flag for better clarity.
- Refactored serialization and deserialization methods to support zstandard compression.
- Enhanced RemoteBackend to utilize the new compression method during data transmission.
- Updated InterleavingTracer to set the source for each mediator's intervention.
- Improved NNsight and LanguageModel classes with custom __getstate__ and __setstate__ methods for better serialization of the model and tokenizer.
- Enhanced RemoteableMixin to include persistent objects in the remoteable model key.
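The `__getstate__`/`__setstate__` hooks mentioned above can be illustrated with a generic sketch. This is not the NNsight implementation: the class `CheapToPickle`, its `weights` attribute, and the `_load` helper are hypothetical stand-ins, and the only assumption is that some heavy attribute is better rebuilt on load than shipped inside the pickle.

```python
class CheapToPickle:
    """Hypothetical wrapper that drops a heavy attribute when pickled."""

    def __init__(self, name):
        self.name = name
        self.weights = self._load(name)  # stand-in for an expensive resource

    @staticmethod
    def _load(name):
        # Placeholder for loading a model or tokenizer by name.
        return [float(len(name))] * 4

    def __getstate__(self):
        # Copy the instance dict and leave out the heavy attribute.
        state = self.__dict__.copy()
        state.pop("weights", None)
        return state

    def __setstate__(self, state):
        # Restore the light attributes, then rebuild the heavy one.
        self.__dict__.update(state)
        self.weights = self._load(self.name)
```

With these hooks in place, `pickle.dumps` stores only `name`, and `pickle.loads` reconstructs `weights` on the receiving side.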
…ncy with pickle API

- Updated serialization methods to match the standard pickle module naming conventions.
- Enhanced error handling in make_function and CustomCloudUnpickler for better robustness.
- Improved documentation and examples to reflect the new method names and added support for pathlib.Path in file handling.
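As a rough illustration of pickle-style naming with `pathlib.Path` support, the sketch below mirrors the four standard names (`dump`, `dumps`, `load`, `loads`). It wraps the standard `pickle` module only for demonstration; the actual serializer in this PR is described as cloudpickle-based, and these function bodies are assumptions rather than the project's code.

```python
import pickle
from pathlib import Path
from typing import Any, BinaryIO, Union

FileLike = Union[str, Path, BinaryIO]


def dumps(obj: Any) -> bytes:
    """Serialize an object to bytes, like pickle.dumps."""
    return pickle.dumps(obj)


def loads(data: bytes) -> Any:
    """Deserialize an object from bytes, like pickle.loads."""
    return pickle.loads(data)


def dump(obj: Any, file: FileLike) -> None:
    """Write to a path (str or Path) or to an open binary file object."""
    if isinstance(file, (str, Path)):
        Path(file).write_bytes(dumps(obj))
    else:
        file.write(dumps(obj))


def load(file: FileLike) -> Any:
    """Read from a path (str or Path) or from an open binary file object."""
    if isinstance(file, (str, Path)):
        return loads(Path(file).read_bytes())
    return loads(file.read())
```

Accepting both paths and file objects keeps the call sites identical to `pickle` while sparing callers the `open(...)` boilerplate.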
… NDIF servers

- Introduced RemoteBackend class for managing remote job execution, including HTTP submission and WebSocket status updates.
- Enhanced JobStatusDisplay for improved user feedback in terminal and Jupyter environments.
- Added support for both blocking and non-blocking execution modes, allowing flexible job handling.
- Updated documentation to reflect new features and usage examples for remote execution.
…IF setups

- Introduced NdifEnvComparison class to compare local and remote Python environments, including package versions and Python version.
- Implemented methods to fetch local environment data and request remote environment data from the NDIF API.
- Enhanced output with a formatted comparison table highlighting version differences, especially for critical packages.
- Updated ndif_status function to improve clarity and maintainability.
- Added a custom exception hook to provide clean tracebacks for NNsight exceptions, enhancing error visibility in non-debug modes.
- Integrated support for IPython to handle exceptions with rich formatting.
- Introduced ExceptionWrapper class to encapsulate exceptions with additional context and improved traceback formatting.
- Enhanced the print_rich method for better output in terminal environments, utilizing syntax highlighting when possible.
- Refactored the ExceptionWrapper class to improve traceback collection and formatting.
- Introduced a new method `_collect_frames` to centralize frame collection logic, enhancing maintainability.
- Updated `format_traceback` and `print_rich` methods to utilize the new frame collection method, ensuring consistent output.
- Improved handling of internal NNsight frames and added support for rich formatting in terminal outputs.
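The environment comparison described in the commits above can be approximated with the standard library. This is a minimal sketch, not the `NdifEnvComparison` class itself: `local_versions` and `diff_environments` are hypothetical helper names, and the remote side is assumed to arrive as a plain name-to-version mapping.

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional, Tuple

Versions = Dict[str, Optional[str]]


def local_versions(packages: Iterable[str]) -> Versions:
    """Look up installed versions; None marks a package that is not installed."""
    versions: Versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_environments(
    local: Versions, remote: Versions
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (package, local_version, remote_version) rows that differ."""
    rows = []
    for name in sorted(set(local) | set(remote)):
        if local.get(name) != remote.get(name):
            rows.append((name, local.get(name), remote.get(name)))
    return rows
```

A caller would pass the packages it cares about (for example `nnsight`, `transformers`, and `torch`) to `local_versions` and diff the result against the mapping returned by the server.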
@greptile-apps

greptile-apps bot commented Jan 17, 2026

Greptile Summary

This PR implements v0.5.16 with major enhancements to remote execution on NDIF servers. The changes focus on improving cross-version compatibility, user experience, and robustness.

Key Changes:

  • Source-based serialization: Replaced bytecode serialization with source code serialization to enable cross-Python-version compatibility between client and server (e.g., Python 3.10 client with Python 3.11 server)
  • Compression upgrade: Migrated from zlib to zstandard for better compression ratios and performance in API requests/responses
  • Environment comparison: Added compare() function to help users identify version mismatches between local and remote environments, highlighting critical packages like nnsight, transformers, and torch
  • Enhanced error handling: Implemented ExceptionWrapper with rich traceback formatting and proper source code context for traced code
  • Improved remote backend: Complete refactor with WebSocket support for real-time status updates, animated spinners, elapsed time tracking, and hybrid local/remote execution via streaming
  • Better pickling support: Added __getstate__/__setstate__ methods to NNsight base class for proper serialization
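To make the source-based serialization idea concrete, here is a minimal sketch of shipping a function as text and rebuilding it on the other side. It is not the PR's `CustomCloudPickler`: `serialize_function` and `rebuild_function` are hypothetical names, and the payload format is an assumption. The point is only that source text compiles on any Python version that accepts its syntax, whereas compiled bytecode is tied to the interpreter version that produced it.

```python
import textwrap
from typing import Any, Callable, Dict, Optional


def serialize_function(name: str, source: str) -> Dict[str, str]:
    """Package a function as plain text (in practice the source might come
    from inspect.getsource)."""
    return {"name": name, "source": textwrap.dedent(source)}


def rebuild_function(
    payload: Dict[str, str], globals_: Optional[Dict[str, Any]] = None
) -> Callable:
    """Compile the source with the receiving interpreter and return the function."""
    namespace: Dict[str, Any] = dict(globals_ or {})
    filename = "<serialized {}>".format(payload["name"])
    code = compile(payload["source"], filename, "exec")
    exec(code, namespace)
    return namespace[payload["name"]]
```

For example, `rebuild_function(serialize_function("double", "def double(x):\n    return 2 * x\n"))` yields a callable that doubles its argument. Executing received source is only appropriate when the sender is trusted.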

Code Quality:

  • Extensive documentation with detailed docstrings throughout
  • Well-structured architecture with clear separation of concerns
  • Proper error handling and graceful degradation

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The code is well-documented, follows best practices, and implements significant improvements to the remote execution infrastructure. The changes are substantial but well-structured, with proper error handling and graceful fallbacks. No logical errors, security issues, or breaking changes were identified.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/nnsight/intervention/backends/remote.py | Major refactor implementing remote NDIF backend with WebSocket support, status display, and comprehensive error handling. Code is well-documented with detailed docstrings. |
| src/nnsight/intervention/serialization.py | Complete rewrite replacing bytecode-based serialization with source-based serialization for cross-Python-version compatibility. Well-documented implementation using cloudpickle. |
| src/nnsight/intervention/tracing/util.py | Enhanced exception handling with rich traceback formatting, better frame reconstruction logic, and support for displaying traced code context. Refactored ExceptionWrapper for cleaner output. |
| src/nnsight/ndif.py | Added environment comparison feature (NdifEnvComparison class and compare() function) to check local vs remote package versions. Improved status checking with better formatting. |
| src/nnsight/schema/request.py | Updated compression from zlib to zstandard, added source code attachment to interventions, improved deserialization to support persistent objects. |
| src/nnsight/__init__.py | Added linecache preloading for better traceback support, integrated custom exception hooks for both standard Python and IPython environments to display clean NNsight exceptions. |
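The custom exception hook mentioned for the package initializer can be pictured with the standard `sys.excepthook` mechanism. The sketch below is an illustration under assumptions, not the NNsight hook: `TracedCodeError` and `clean_excepthook` are hypothetical names, and the `debug` switch stands in for whatever debug setting the library uses.

```python
import sys
import traceback


class TracedCodeError(Exception):
    """Hypothetical stand-in for a library-specific exception type."""


def clean_excepthook(exc_type, exc, tb, *, debug=False, stream=None):
    """Print a short message for library errors, a full traceback otherwise."""
    stream = stream if stream is not None else sys.stderr
    if issubclass(exc_type, TracedCodeError) and not debug:
        # Show only the final "Type: message" line and hide internal frames.
        stream.write("".join(traceback.format_exception_only(exc_type, exc)))
    else:
        # Unrelated exceptions, or debug mode, keep the full traceback.
        traceback.print_exception(exc_type, exc, tb, file=stream)


# Installing the hook is opt-in:
# sys.excepthook = clean_excepthook
```

Because `sys.excepthook` is not consulted by IPython, a separate registration (IPython exposes `set_custom_exc` for this purpose) would be needed there, which matches the summary's note about handling both environments.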

Sequence Diagram

sequenceDiagram
    participant Client as Client (Local)
    participant Tracer as Tracer
    participant Backend as RemoteBackend
    participant Serializer as CustomCloudPickler
    participant HTTP as NDIF HTTP API
    participant WS as WebSocket
    participant Server as NDIF Server
    participant LocalTracer as LocalTracer

    Client->>Tracer: trace(input, remote=True)
    Tracer->>Backend: __call__(tracer)
    Backend->>Serializer: dumps(RequestModel)
    Note over Serializer: Serializes functions<br/>by source code
    Serializer-->>Backend: serialized bytes
    Backend->>Backend: compress with zstandard
    Backend->>WS: connect()
    WS-->>Backend: session_id
    Backend->>HTTP: POST /request (data, headers)
    HTTP-->>Backend: job_id
    Backend->>Backend: JobStatusDisplay.update("RECEIVED")
    
    loop Status Updates
        Server->>WS: emit(ResponseModel)
        WS-->>Backend: status update
        Backend->>Backend: JobStatusDisplay.update(status)
        alt Status: STREAM
            Note over Backend: Server requests<br/>local execution
            Backend->>LocalTracer: execute(streamed_fn)
            LocalTracer->>LocalTracer: run locally
            LocalTracer-->>Backend: local_values
            Backend->>WS: emit("stream_upload", values)
        end
    end
    
    Server->>WS: emit("COMPLETED", result_url)
    Backend->>Backend: JobStatusDisplay.update("COMPLETED")
    Backend->>HTTP: GET result_url
    HTTP-->>Backend: compressed result bytes
    Backend->>Backend: decompress with zstandard
    Backend->>Backend: torch.load(result)
    Backend-->>Tracer: result
    Tracer-->>Client: traced values

@greptile-apps

greptile-apps bot commented Jan 17, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

…NDIF

- Introduced a new `register` function to enable local module serialization by value for remote execution.
- This function wraps `cloudpickle.register_pickle_by_value`, allowing local modules to be sent along with requests to avoid `ModuleNotFoundError`.
- Added detailed docstring with usage examples to guide users on registering modules before remote execution.
…sDisplay

- Enhanced the `_format_time` method to provide more precise elapsed time formatting based on verbosity.
- Adjusted timeout settings in the `RemoteBackend` class to optimize spinner animation updates, differentiating between verbose and non-verbose modes.
- Ensured consistent time representation for elapsed seconds in both verbose and non-verbose modes.
…aceModel

- Introduced an ID_CACHE dictionary to store model IDs, reducing redundant API calls.
- Updated the _remoteable_model_key method to check the cache before fetching model info from HfApi, which avoids hitting rate limits.
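The caching pattern in this commit can be sketched as a module-level dictionary consulted before any network call. `ID_CACHE` is the name given in the commit message; `cached_model_id` and the `fetch` callable are hypothetical, with `fetch` standing in for whatever lookup the real method performs.

```python
from typing import Callable, Dict

# Maps a repository identifier to its resolved model ID.
ID_CACHE: Dict[str, str] = {}


def cached_model_id(repo_id: str, fetch: Callable[[str], str]) -> str:
    """Return the cached ID, calling fetch only on the first request."""
    if repo_id not in ID_CACHE:
        ID_CACHE[repo_id] = fetch(repo_id)
    return ID_CACHE[repo_id]
```

Repeated calls with the same `repo_id` then cost a dictionary lookup instead of an API request. The cache lives for the lifetime of the process, so a model whose upstream ID changes would need a restart or an explicit eviction.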