Commit 80351a4

DOCS-2675: Document cloud inference (#4322)
1 parent 6e22728 commit 80351a4

File tree: 2 files changed (+118, -39 lines)

docs/data-ai/ai/run-inference.md

Lines changed: 82 additions & 39 deletions
@@ -1,6 +1,6 @@
 ---
 linkTitle: "Run inference"
-title: "Run inference on a model"
+title: "Run inference"
 weight: 50
 layout: "docs"
 type: "docs"
@@ -18,57 +18,64 @@ aliases:
 description: "Run inference on a model with a vision service or an SDK."
 ---
 
-After deploying an ml model, you need to configure an additional service to use the inferences the deployed model makes.
-You can run inference on an ML model with a vision service or use an SDK to further process inferences.
+Inference is the process of generating output from a machine learning (ML) model.
+With Viam, you can run inference to generate the following kinds of output:
 
-## Use a vision service
+- object detection (using bounding boxes)
+- classification (using tags)
 
-Vision services work to provide computer vision.
-They use an ML model and apply it to the stream of images from your camera.
+You can run inference locally on a Viam machine, or remotely in the Viam cloud.
 
-{{<resources_svc api="rdk:service:vision" type="vision">}}
-
-{{< readfile "/static/include/create-your-own-mr.md" >}}
-
-Note that many of these services have built in ML models, and thus do not need to be run alongside an ML model service.
+## Machine inference
 
-One vision service you can use to run inference on a camera stream if you have an ML model service configured is the `mlmodel` service.
+You can use `viam-server` to deploy and run ML models directly on your machines.
 
-### Configure an mlmodel vision service
+You can run inference on your machine in the following ways:
 
-Add the `vision / ML model` service to your machine.
-Then, from the **Select model** dropdown, select the name of the ML model service you configured when [deploying](/data-ai/ai/deploy/) your model (for example, `mlmodel-1`).
+- with a vision service
+- manually in application logic with an SDK
 
-**Save** your changes.
+Entry-level devices such as the Raspberry Pi 4 can run small ML models, such as TensorFlow Lite (TFLite).
+More powerful hardware, including the Jetson Xavier or Raspberry Pi 5 with an AI HAT+, can process larger AI models, including TensorFlow and ONNX.
 
-### Test your changes
+{{< tabs >}}
+{{% tab name="Vision service" %}}
 
-You can test a deployed vision service by clicking on the **Test** area of its configuration panel or from the [**CONTROL** page](/manage/troubleshoot/teleoperate/default-interface/#viam-app).
+Vision services apply an ML model to a stream of images from a camera to generate bounding boxes or classifications.
 
-The camera stream shows when the vision service identifies something.
-Try pointing the camera at a scene similar to your training data.
+{{<resources_svc api="rdk:service:vision" type="vision">}}
 
-{{< imgproc src="/tutorials/data-management/blue-star.png" alt="Detected blue star" resize="x200" class="shadow" >}}
-{{< imgproc src="/tutorials/filtered-camera-module/viam-figure-preview.png" alt="Detection of a viam figure with a confidence score of 0.97" resize="x200" class="shadow" >}}
+{{% alert title="Tip" color="tip" %}}
+Some vision services include their own ML models, and thus do not require a deployed ML model.
+If your vision service does not include an ML model, you must [deploy an ML model to your machine](/data-ai/ai/deploy/) to use that service.
+{{% /alert %}}
 
-{{% expand "Want to limit the number of shown classifications or detections? Click here." %}}
+To use a vision service:
 
-If you are seeing a lot of classifications or detections, you can set a minimum confidence threshold.
+1. Visit the **CONFIGURE** page of the Viam app.
+1. Click the **+** icon next to your main machine part and select **Component or service**.
+1. Type in the name of the service and select a vision service.
+1. If your vision service does not include an ML model, [deploy an ML model to your machine](/data-ai/ai/deploy/) to use that service.
+1. Configure the service based on your use case.
+1. To view the deployed vision service, use the live detection feed in the Viam app.
+   The feed shows an overlay of detected objects or classifications on top of a live camera feed.
+   On the **CONFIGURE** or **CONTROL** pages for your machine, expand the **Test** area of the service panel to view the feed.
 
-Start by setting the value to 0.8.
-This reduces your output by filtering out anything below a threshold of 80% confidence.
-You can adjust this attribute as necessary.
+{{< imgproc src="/tutorials/data-management/blue-star.png" alt="Detected blue star" resize="x200" class="shadow" >}}
+{{< imgproc src="/tutorials/filtered-camera-module/viam-figure-preview.png" alt="Detection of a viam figure with a confidence score of 0.97" resize="x200" class="shadow" >}}
 
-Click the **Save** button in the top right corner of the page to save your configuration, then close and reopen the **TEST** panel of the vision service configuration panel.
-Now if you reopen the panel, you will only see classifications or detections with a confidence value higher than the `default_minimum_confidence` attribute.
+For instance, you could use [`viam:vision:mlmodel`](/operate/reference/services/vision/mlmodel/) with the `EfficientDet-COCO` ML model to detect a variety of objects, including people, bicycles, and apples, in a camera feed.
 
-{{< /expand>}}
+Alternatively, you could use [`viam-soleng:vision:openalpr`](https://app.viam.com/module/viam-soleng/viamalpr) to detect license plates in images.
+Since this service includes its own ML model, there is no need to configure a separate ML model.
 
-For more detailed information, including optional attribute configuration, see the [`mlmodel` docs](/operate/reference/services/vision/mlmodel/).
+After adding a vision service, you can use a vision service API method with a classifier or a detector to get inferences programmatically.
+For more information, see the APIs for [ML Model](/dev/reference/apis/services/ml/) and [Vision](/dev/reference/apis/services/vision/).
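
For example, a minimal Python sketch along these lines fetches detections from a configured vision service and prints each detected class with its confidence. The machine address, API key, and the `detector-1` and `camera-1` resource names are placeholders for your own values; check the Vision service API reference for exact method signatures.

```python
import asyncio

from viam.robot.client import RobotClient
from viam.services.vision import VisionClient


async def main():
    # Placeholder connection details: replace with your machine's address and API key.
    machine = await RobotClient.at_address(
        "<MACHINE-ADDRESS>",
        RobotClient.Options.with_api_key(api_key="<API-KEY>", api_key_id="<API-KEY-ID>"),
    )

    # "detector-1" and "camera-1" are placeholder names for your configured
    # vision service and camera.
    detector = VisionClient.from_robot(machine, "detector-1")
    detections = await detector.get_detections_from_camera("camera-1")

    for detection in detections:
        print(f"{detection.class_name}: {detection.confidence:.2f}")

    await machine.close()


asyncio.run(main())
```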
 
-## Use an SDK
+{{% /tab %}}
+{{% tab name="SDK" %}}
 
-You can also run inference using a Viam SDK.
+With the Viam SDK, you can pass image data to an ML model service, read the output annotations, and react to output in your own code.
 Use the [`Infer`](/dev/reference/apis/services/ml/#infer) method of the ML Model API to make inferences.
 
 For example:
@@ -103,10 +110,46 @@ output_tensors, err := myMLModel.Infer(context.Background(), input_tensors)
 {{% /tab %}}
 {{< /tabs >}}
 
-After adding a vision service, you can use a vision service API method with a classifier or a detector to get inferences programmatically.
-For more information, see the ML Model and Vision APIs:
+{{% /tab %}}
+{{< /tabs >}}
+
+## Cloud inference
+
+Cloud inference enables you to run machine learning models in the Viam cloud, instead of on a local machine.
+Cloud inference often provides more computing power than edge devices, so you can benefit from:
+
+- larger, more accurate models
+- faster inference times
+
+You can run cloud inference using any TensorFlow model in the Viam registry, including private models owned by or shared with your organization.
+
+To run cloud inference, you must pass:
+
+- the binary data ID and organization of the data you want to run inference on
+- the name, version, and organization of the model you want to use for inference
+
+The [`viam infer`](/dev/tools/cli/#infer) CLI command runs inference in the cloud on a piece of data using the specified ML model:
+
+```sh {class="command-line" data-prompt="$" data-output="2-18"}
+viam infer --binary-data-id <binary-data-id> --model-name <model-name> --model-org-id <org-id-that-owns-model> --model-version "2025-04-14T16-38-25" --org-id <org-id-that-executes-inference>
+Inference Response:
+Output Tensors:
+  Tensor Name: num_detections
+    Shape: [1]
+    Values: [1.0000]
+  Tensor Name: classes
+    Shape: [32 1]
+    Values: [...]
+  Tensor Name: boxes
+    Shape: [32 1 4]
+    Values: [...]
+  Tensor Name: confidence
+    Shape: [32 1]
+    Values: [...]
+Annotations:
+  Bounding Box Format: [x_min, y_min, x_max, y_max]
+  No annotations.
+```
 
-{{< cards >}}
-{{< card link="/dev/reference/apis/services/ml/" customTitle="ML Model API" noimage="True" >}}
-{{% card link="/dev/reference/apis/services/vision/" customTitle="Vision service API" noimage="True" %}}
-{{< /cards >}}
+`infer` returns a list of detected classes or bounding boxes depending on the output of the ML model you specified, as well as a list of confidence values for those classes or boxes.
+The command returns bounding box output using proportional coordinates between 0 and 1, with the origin `(0, 0)` in the top left of the image and `(1, 1)` in the bottom right.
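
To work with this output in your own code, you can pair each box with its class and confidence and scale the proportional coordinates to pixel coordinates. The following Python sketch uses made-up example values shaped like the output above and an assumed 640 x 480 source image:

```python
# Hypothetical example values shaped like the CLI output above (for illustration only).
boxes = [[0.10, 0.20, 0.45, 0.80]]   # [x_min, y_min, x_max, y_max], proportional (0-1)
classes = [17]                        # class indices from the model's label set
confidences = [0.97]

IMAGE_WIDTH, IMAGE_HEIGHT = 640, 480
CONFIDENCE_THRESHOLD = 0.8

for box, cls, conf in zip(boxes, classes, confidences):
    if conf < CONFIDENCE_THRESHOLD:
        continue  # skip low-confidence detections
    x_min, y_min, x_max, y_max = box
    # Scale proportional coordinates (origin at top left) to pixel coordinates.
    pixel_box = (
        int(x_min * IMAGE_WIDTH),
        int(y_min * IMAGE_HEIGHT),
        int(x_max * IMAGE_WIDTH),
        int(y_max * IMAGE_HEIGHT),
    )
    print(f"class {cls} ({conf:.2f}): {pixel_box}")
```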

docs/dev/tools/cli.md

Lines changed: 36 additions & 0 deletions
@@ -544,6 +544,42 @@ done
 | `--resource-name` | Resource name. Sometimes called "component name". | `export tabular` | **Required** |
 | `--resource-subtype` | Resource {{< glossary_tooltip term_id="api-namespace-triplet" text="API namespace triplet" >}}. | `export tabular` | **Required** |
 
+### `infer`
+
+The `infer` command enables you to run [cloud inference](/data-ai/ai/run-inference/#cloud-inference) on data. Inference runs in the Viam cloud instead of on a local machine.
+
+```sh {class="command-line" data-prompt="$" data-output="2-18"}
+viam infer --binary-data-id <binary-data-id> --model-name <model-name> --model-org-id <org-id-that-owns-model> --model-version "2025-04-14T16-38-25" --org-id <org-id-that-executes-inference>
+Inference Response:
+Output Tensors:
+  Tensor Name: num_detections
+    Shape: [1]
+    Values: [1.0000]
+  Tensor Name: classes
+    Shape: [32 1]
+    Values: [...]
+  Tensor Name: boxes
+    Shape: [32 1 4]
+    Values: [...]
+  Tensor Name: confidence
+    Shape: [32 1]
+    Values: [...]
+Annotations:
+  Bounding Box Format: [x_min, y_min, x_max, y_max]
+  No annotations.
+```
+
+#### Named arguments
+
+<!-- prettier-ignore -->
+| Argument | Description | Required? |
+| -------- | ----------- | --------- |
+| `--binary-data-id` | The binary data ID of the image you want to run inference on. | **Required** |
+| `--model-name` | The name of the model that you want to run in the cloud. | **Required** |
+| `--model-version` | The version of the model that you want to run in the cloud. To find the latest version string for a model, visit the [registry page](https://app.viam.com/registry?type=ML+Model) for that model. You can find the latest version string in the **Version history** section, for instance "2024-02-16T12-55-32". Pass this value as a string, using double quotes. | **Required** |
+| `--org-id` | The organization ID of the organization that will run the inference. | **Required** |
+| `--model-org-id` | The organization ID of the organization that owns the model. | **Required** |
+
 ### `locations`
 
 The `locations` command allows you to manage the [locations](/manage/reference/organize/) that you have access to.
