
Visual Perception Engine

We created the Visual Perception Engine to be a single coherent module that processes visual percepts on the GPU and outputs desired insights such as monocular depth. It was designed to be efficient, modular, memory-constant, and dynamic, so that it can be controlled at runtime. We provide an implementation that computes monocular depth estimation (based on DepthAnythingV2), semantic segmentation, and object detections in real time (30Hz). Furthermore, the engine was created to be easily configurable and extendable, so that one can add their own model heads to the framework or change the existing ones.

Tip

We also provide a C++ ROS2 (Humble) node that can be easily incorporated into existing robotic stacks.

How does it work?

The engine leverages a single powerful vision foundation model (DINOv2 in our case) that computes features, and several model heads that use those features to compute the desired insights. Whenever possible we converted the models into TensorRT engines for efficiency, but our architecture also supports PyTorch models. Each model (the foundation model and each individual model head) runs in a separate process, and data is exchanged between processes using custom queues and buffers that allow sending pointers to tensors on the GPU. See the figure below for an overview:

Figure 1: Overview of the engine's internal structure

Data transfer

Between all elements of the engine, data is exchanged in a dictionary format with meaningful keys for identifiability. For example, DINOv2 outputs a tuple of 8 tensors, to which we assign names such as FM_INTERMEDIATE_FEATURES_3, FM_INTERMEDIATE_CLS_TOKEN_3, FM_OUTPUT_FEATURES, and FM_OUTPUT_CLS_TOKEN. This way each model head can specify which tensors it needs. The names of all keys used for data transfer should be declared in src/engine/naming_convention.py.
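For illustration, a model head's input could be assembled as in the sketch below. The shapes are placeholders, and the exact key constants live in src/engine/naming_convention.py, so the string literals here only stand in for those names.

import torch

# Illustrative only: the foundation model's outputs travel between processes
# as a dict keyed by names from src/engine/naming_convention.py.
fm_outputs = {
    "FM_OUTPUT_FEATURES": torch.randn(1, 1369, 384),  # placeholder shape
    "FM_OUTPUT_CLS_TOKEN": torch.randn(1, 384),       # placeholder shape
}

# A model head declares which keys it needs and receives only that subset.
needed_keys = ["FM_OUTPUT_FEATURES"]
head_input = {k: fm_outputs[k] for k in needed_keys}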

How to use

In this section you will learn how to set up the engine and use it as a black box module.

Note

Visual Perception Engine was developed and tested on Jetson Orin AGX.

First, you will have to clone the repository:

git clone https://github.com/nasa-jpl/visual-perception-engine

If you want to use the ROS2 node, be careful where you clone it. For more info see the section on building the ROS2 node.

Next, for simplicity of use, install the package locally. To do so, make sure that you are in the nn_engine/ directory containing pyproject.toml. Then run:

python3 -m pip install .

# Or if you intend to modify the package internals use
python3 -m pip install -e .

Now you can verify that the package was successfully installed by running pip show nn_engine.

Set up docker

For portability we use Docker. Our environment is based on the great work by Dustin Franklin. First, navigate to the nn_engine/docker directory and then run these commands:

docker compose -f docker-compose.yml build # add `--no-cache` at the end to build from scratch
docker compose -f docker-compose.yml up -d

After these are completed run the following commands inside the container:

sudo git clone https://github.com/NVIDIA-AI-IOT/torch2trt /opt/torch2trt
cd /opt/torch2trt
sudo pip3 install .
cd - # to return to previous directory

Preparing model checkpoints

To run the default version of the engine, you will first have to download all the necessary checkpoints from here and place them in the nn_engine/models/checkpoints folder. Once they are in place, run this command:

python3 -c "import nn_engine; nn_engine.export_default_models()"

This will export all the PyTorch models to TensorRT engines (stored in the nn_engine/models/engines directory) and register all the models (i.e. add them to the registry file nn_engine/model_registry/registry.jsonl), so that they can be easily loaded into the engine with the desired parameters (e.g. precision).
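If you want to check what ended up in the registry, each line of registry.jsonl is a standalone JSON record. A quick, non-authoritative way to list the registered models (the snippet just prints whole entries, since the exact field names depend on the registry format):

import json

# Print every record in the model registry, one JSON object per line.
with open("nn_engine/model_registry/registry.jsonl") as f:
    for line in f:
        line = line.strip()
        if line:
            print(json.loads(line))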

Set CUDA MPS

The engine uses multiple processes, each using the same GPU. To make this possible you need to enable CUDA MPS, which is available for Tegra devices starting with CUDA 12.5. To enable it, run:

# As $UID, run the following commands:

export CUDA_VISIBLE_DEVICES=0                   # select GPU 0
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps  # select a location that's accessible to the given $UID
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log   # select a location that's accessible to the given $UID
nvidia-cuda-mps-control -d                      # start the daemon

# This starts the MPS control daemon, which will spawn a new MPS server instance
# for any application started as that $UID and associate it with the GPU visible
# to the control daemon.

Engine configuration

The engine was made to be easily configurable. You can write your own configuration file based on your needs and use it with the engine. The config should be in JSON format and follow the schema defined in nn_engine/schemas/nn_engine_config.json. In nn_engine/configs there is already a default.json configuration file which specifies the default configuration with 3 model heads.

In each configuration file one needs to specify the name of the desired foundation model/model heads as listed in the model registry (the canonical name). Additionally, one can specify an alias that will be used across the engine instead of the lengthy canonical name. For the foundation model you can specify a preprocessing function, and for each model head you can specify a postprocessing function. Lastly, for each model you can specify a rate, which is an upper bound on that model's inference frequency (i.e. the model can run slower in unexpected cases, but it will not run faster than the specified value). A sketch for validating a custom configuration against the schema follows the parameter list below.

Other parameters:

  • log_to_consol specifies whether the log will be visible in the console
  • logging_level specifies the level of information during logging
  • log_to_file specifies the path to file where the log will be written, if not provided no log will be saved
  • canonical_image_shape_hwc specifies the shape of images used as input to the engine
  • queue_sizes specifies how many slots individual queues and buffers have
  • output_dir is an optional parameter that is used with some internal testing functions
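As a rough sketch, you can sanity-check a custom configuration against the shipped schema before handing it to the engine. This assumes the jsonschema package is installed and uses only the file paths mentioned above:

import json
from jsonschema import validate  # assumes `pip install jsonschema`

with open("nn_engine/configs/default.json") as f:  # swap in your own config file here
    config = json.load(f)
with open("nn_engine/schemas/nn_engine_config.json") as f:
    schema = json.load(f)

validate(instance=config, schema=schema)  # raises jsonschema.ValidationError on a malformed config
print("Configuration matches the schema.")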

Launching the engine

Once all of the above is completed you can use the engine. Below is an example code snippet:

from nn_engine import Engine

engine = Engine()

# starts all the processes and establishes interprocess communication
engine.build()

# sends the start signal that all models are waiting for
engine.start_inference()

# sends 3 dummy inputs to test and warm up the engine, HIGHLY RECOMMENDED
was_success: bool = engine.test()

### engine is READY

Caution

Once you no longer need the engine, call engine.stop(). This makes sure all the child processes are closed correctly; without it, some processes might be left hanging. The same can happen if the engine crashes (using CTRL+C is fine).
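A defensive pattern, sketched with only the methods shown above, is to wrap your usage so that engine.stop() runs even if your own code raises:

from nn_engine import Engine

engine = Engine()
engine.build()
engine.start_inference()

try:
    if not engine.test():
        raise RuntimeError("Engine warmup test failed")
    # ... feed images and read outputs here ...
finally:
    engine.stop()  # always tear down the child processes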

Now for basic functionality:

import numpy as np
import torch

# example image dimensions (HWC layout, matching canonical_image_shape_hwc)
HEIGHT, WIDTH, N_CHANNELS = 480, 640, 3

# get an image in torch or numpy format
img = torch.zeros(HEIGHT, WIDTH, N_CHANNELS).to(torch.uint8)

# insert the image into the engine for processing
was_success: bool = engine.input_image(img)

# get info about all the models
fm_params: dict = engine.get_foundation_model_params()
mh_params: list[dict] = engine.get_model_heads_params()
# In the parameters you can see what type of output to expect.
# Currently it can be either an image or an object detection result (labels+scores+normalized_boxes).

# check if the output of head number X is ready
X = 0  # as an example
output: None | np.ndarray | list[np.ndarray] = engine.get_head_output(X)

# change the firing rate of any model at runtime
new_rate = 10  # Hz
was_success: bool = engine.change_model_rate("Model_name_to_target", new_rate)
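Putting it together, a minimal polling loop might look like the sketch below. It reuses the engine and img objects from the snippet above, and the sleep interval is an arbitrary choice:

import time

# Feed one frame and wait until head 0 has produced its output.
engine.input_image(img)

output = None
while output is None:
    output = engine.get_head_output(0)  # returns None until the result is ready
    time.sleep(0.005)

print(type(output))  # np.ndarray for image-like heads, a list of arrays for detections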

Note

The Engine class and its methods can only be called from within the process in which the engine was created. The engine is not designed to be moved across processes.

ROS2 node

If you want to use the provided ROS2 (Humble) node, you will have to build it first. To do so you should have a ROS2 workspace directory set up (e.g. ~/ros2_ws) with a src directory containing the source code of all your packages. Ideally, this repository should be in that src directory (e.g. ~/ros2_ws/src/nn_engine). Then navigate to the ROS2 workspace directory and run the following commands:

source /opt/ros/humble/install/setup.bash # set up ROS2 underlay, i.e. be able to use ros2 from command line
colcon build --packages-select nn_engine
source install/setup.bash

Note

The core files for the node can be found in nn_engine/ros_node or nn_engine/include/nn_engine directories.

Usage

Once the package is built you can launch it using:

ros2 launch nn_engine engine_launch.xml

The launch file nn_engine/launch/engine_launch.xml contains several parameters that you can adjust as needed, for example the topic name from which the images should be taken.

The node will create a publisher for every model head listed in the configuration, using the alias as the topic name (or the canonical name if no alias was provided).

The node offers two services: GetModelNames, which returns a list of the names of all models inside the engine, and ChangeModelRate, which allows the user to change the firing rate of a particular model at runtime.
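As a rough sketch of consuming one of those topics from Python, assuming an image-type head published under the alias depth (both the topic name and the sensor_msgs/msg/Image message type are assumptions here, not something the package guarantees):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image  # assumption: image-type heads publish sensor_msgs/Image


class DepthListener(Node):
    def __init__(self):
        super().__init__("depth_listener")
        # "depth" is a hypothetical alias from the engine configuration.
        self.sub = self.create_subscription(Image, "depth", self.on_depth, 10)

    def on_depth(self, msg: Image):
        self.get_logger().info(f"Received a {msg.width}x{msg.height} frame")


def main():
    rclpy.init()
    rclpy.spin(DepthListener())
    rclpy.shutdown()


if __name__ == "__main__":
    main()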

For developers

In this section you will learn how the engine works internally and how to extend it with your own models.

If you would like to contribute check out CONTRIBUTING.md

Adding new models

  1. Add your implementation to src/model_architectures. Whether it is a backbone or a model head, make sure to implement ModelInterfaceBase. In particular, implement the input_signature and output_signature properties correctly: there are shape checks throughout the engine, so you need to know what kind of inputs and outputs to expect (a skeleton sketch follows this list).
  2. Add the class name to __all__ in src/model_architectures/__init__.py. Make sure to correctly import the class.
  3. Use ModelExporter to initialize the model. This way it will be converted to correct precision/framework and it will be registered in ModelRegistry, so that it will be easier to use later on.
  4. Once your model is in the model registry you can simply add it to the configuration file and it's ready to use!
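A non-authoritative skeleton for step 1 follows; the exact ModelInterfaceBase interface may differ, and the import path, signature format, shapes, and key names below are placeholders/assumptions:

import torch
import torch.nn as nn

from model_architectures import ModelInterfaceBase  # import path is an assumption


class MyHead(ModelInterfaceBase, nn.Module):
    """Toy model head that consumes the foundation model's output features."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(384, 1)  # placeholder architecture

    @property
    def input_signature(self) -> dict:
        # Keys must match the foundation model's output_signature.
        return {"FM_OUTPUT_FEATURES": (1, 1369, 384)}  # placeholder shape

    @property
    def output_signature(self) -> dict:
        return {"MY_HEAD_OUTPUT": (1, 1369, 1)}  # placeholder key and shape

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)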

Adding new preprocessing/postprocessing functions

The process here is similar to the one for models. However, here you need to implement either AbstractPreprocessing or AbstractPostprocessing, and add the class name to __all__ in src/transforms/__init__.py.

How all of this works internally

All models within the engine (foundation models and model heads) implement ModelInterfaceBase to make the tensor exchange between individual models easier. For example, tensors are packed into a dict when moved between different models. This allows model A to output a set of different feature tensors such that model B can seamlessly take a subset of these features. A high-level method, forward_annotated, is exposed which takes a dict as input and outputs a dict, allowing users to directly stack models on top of each other.

However, each model is internally built around a forward call, which can take arguments (and output results) of any shape its creators desired. Combined with the fact that ONNX (our implementation first converts models to ONNX and only then to TensorRT) does not support dicts, this resulted in forward_annotated being a wrapper around forward rather than a drop-in replacement as originally planned. Furthermore, ModelInterfaceBase requires one to implement the deannotate_input and annotate_output methods. The first converts the input dict into the format accepted by forward, and the latter converts the output of forward into a dict.

Important: For proper functioning, ensure that dict keys match across different models. E.g. when you implement a new model head, make sure its input_signature has keys that appear in the output_signature of the foundation model.
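A toy, self-contained illustration of that wrapping pattern (not the actual implementation):

import torch


class ToyModel:
    """Toy model illustrating the forward_annotated pattern described above."""

    def deannotate_input(self, inputs: dict) -> tuple:
        # Named dict -> positional arguments that forward() expects.
        return (inputs["FM_OUTPUT_FEATURES"],)

    def annotate_output(self, raw: torch.Tensor) -> dict:
        # Raw forward() output -> named dict for the next stage.
        return {"MY_HEAD_OUTPUT": raw}

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features * 2  # placeholder computation

    def forward_annotated(self, inputs: dict) -> dict:
        return self.annotate_output(self.forward(*self.deannotate_input(inputs)))


out = ToyModel().forward_annotated({"FM_OUTPUT_FEATURES": torch.ones(2, 3)})
print(out["MY_HEAD_OUTPUT"].shape)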
