[AGX Orin] GroundingDINO Speed is not as advertised

Dear NVIDIA team,

I am currently testing GroundingDINO (GDINO) on Jetson Platform Services (JPS), following [this guide](https://docs.nvidia.com/jetson/jps/inference-services/gdino.html) step by step. However, inference is much slower than advertised (2-3 FPS vs. the advertised 11.6 FPS). Am I missing something here?

FYI, I tested it on an NVIDIA AGX Orin 64GB with JetPack (L4T R36.4):
```
# R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic, EABI: aarch64, DATE: Wed Jan  8 01:49:37 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
```

For debugging purposes, the experiment logs and the steps to reproduce the results are below.

# INFO

Image can be obtained [here](https://www.patriotledger.com/gcdn/authoring/2008/06/27/NPAL/ghows-WL-1dd01bec-6a11-4b67-ba4e-a77dd47fa524-68200e26.jpeg).
Input: jpeg, 800x1600px

Prompt: 
```
{
    "model": "Grounding-Dino",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "car, bike"
                },
                {
                    "type": "media_url",
                    "media_url": {
                        "url": "data:image/jpeg;asset_id,7f1465db-8f52-4641-822d-d6e94f1a63e2"
                    }
                }
            ]
        }
    ],
    "threshold": 0.35
}
```
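For anyone reproducing this, the payload above can also be generated from the asset id instead of being edited by hand. This is only a minimal sketch: `build_request` and `write_request` are hypothetical helper names, and the field layout is simply copied from the JSON above, not taken from any official client.

```python
import json


def build_request(asset_id, prompt, threshold=0.35, model="Grounding-Dino"):
    """Build a payload with the same shape as the JSON shown above.

    Hypothetical helper: the structure mirrors the request in this issue.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "media_url",
                        "media_url": {
                            "url": f"data:image/jpeg;asset_id,{asset_id}"
                        },
                    },
                ],
            }
        ],
        "threshold": threshold,
    }


def write_request(path, asset_id, prompt, threshold=0.35):
    """Write the payload to a file that curl can send with `-d @<path>`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_request(asset_id, prompt, threshold), f, indent=4)
```

For example, `write_request("test.json", "7f1465db-8f52-4641-822d-d6e94f1a63e2", "car, bike")` would produce a file equivalent to the prompt above.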

# Experiment and Log

Here is how I tested it, following the same steps as the tutorial linked above:
```
# Step 1: upload the image
curl -X POST "http://localhost:8000/files" -H "Content-Type: multipart/form-data" -F purpose="vision" -F media_type="image" -F "file=@image.jpeg"

# Step 2: run inference
curl -X POST http://localhost:8000/inference -H "Content-Type: application/json" -d @test.json
```

I also ran the inference request in a loop:

```
import time
import os

# Upload the image once beforehand (same as Step 1 above):
# os.system('curl -X POST "http://localhost:8000/files" -H "Content-Type: multipart/form-data" -F purpose="vision" -F media_type="image" -F "file=@image.jpeg"')

# Send the same inference request 10 times and print the wall-clock time of each call
for i in range(10):
    start = time.time()
    os.system('curl -X POST http://localhost:8000/inference -H "Content-Type: application/json" -d @test.json')
    print(i, f'prompt time: {time.time()-start:.3f}s')
```

These are the results I get (2-3 FPS):

![Image](https://github.com/user-attachments/assets/c729e8b3-fc05-4a05-8dae-bbc3d4877263)
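The loop above also times the cost of spawning a `curl` process for every request, and the first call may include warm-up. A minimal sketch of a measurement that avoids both is shown here, assuming the same endpoint and `test.json` as above. The helper names `post_inference`, `time_calls`, and `summarize` are hypothetical, and only the Python standard library is used.

```python
import time
import urllib.request


def post_inference(payload_path="test.json",
                   url="http://localhost:8000/inference"):
    """POST the JSON payload once and return the raw response body."""
    with open(payload_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def time_calls(send, n=10):
    """Call send() n times and return the per-call latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, warmup=1):
    """Drop the first `warmup` calls and return (mean latency in s, FPS)."""
    steady = latencies[warmup:]
    if not steady:
        raise ValueError("need more calls than warm-up iterations")
    mean = sum(steady) / len(steady)
    return mean, 1.0 / mean
```

With the service running, `mean, fps = summarize(time_calls(post_inference, n=10))` gives a steady-state estimate. This is still end-to-end HTTP latency (request handling, image decoding, pre- and post-processing), so it may not be directly comparable to however the advertised 11.6 FPS was measured.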