[Compiled Model API] Does it support the dynamic size of the input tensor shape on NPU? #6085

@leeroyka

Description

LiteRT v2.1.1 CompiledModel API
C++ Android
NPU with Qualcomm backend

--

Based on the QNN documentation, variable batch sizes are supported for input tensors. However, when trying to create buffers for a batch size other than 1, CreateInputBuffers returns an error.

Example: an int8 model with input tensor shape [-1, 224, 224, 3].

    std::vector<litert::Environment::Option> environment_options;
    environment_options.push_back(litert::Environment::Option{
        litert::Environment::OptionTag::DispatchLibraryDir,
        absl::string_view(lib_path),
    });
    environment_options.push_back(litert::Environment::Option{
        litert::Environment::OptionTag::CompilerPluginLibraryDir,
        absl::string_view(lib_path),
    });
    setenv("ADSP_LIBRARY_PATH", lib_path, 1);
    LITERT_ASSIGN_OR_ABORT(auto envPtr, litert::Environment::Create(std::move(environment_options)));
    
    LITERT_ASSIGN_OR_ABORT(
        litert::Options options, litert::Options::Create()
    );
    options.SetHardwareAccelerators(
        litert::HwAccelerators::kNpu
    );
    
    LITERT_ASSIGN_OR_ABORT(
        auto& qnn_opts, options.GetQualcommOptions()
    );
    qnn_opts.SetLogLevel(litert::qualcomm::QualcommOptions::LogLevel::kVerbose);
    qnn_opts.SetHtpPerformanceMode(
        litert::qualcomm::QualcommOptions::HtpPerformanceMode::kBurst
    );
    qnn_opts.SetUseHtpPreference(true);
    LITERT_ASSIGN_OR_ABORT(
        auto compiled_model,
        litert::CompiledModel::Create(
            envPtr,
            litert::BufferRef<uint8_t>(
                reinterpret_cast<uint8_t*>(buffer), size
            ),
            options
        )
    );
    
    std::vector<int> shape = {5, 224, 224, 3}; // no error with [1, 224, 224, 3]
    LITERT_ABORT_IF_ERROR(compiled_model.ResizeInputTensor(0, shape));
    LITERT_ASSIGN_OR_ABORT(
        auto input_buffers,
        compiled_model.CreateInputBuffers() // < error here
    );
    LITERT_ASSIGN_OR_ABORT(
        auto output_buffers,
        compiled_model.CreateOutputBuffers()
    );

logcat:

13:41:58.270 qnn                 I  INFO: [Qnn] Set HTP performance mode: 2
13:41:58.272 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1934: manage_poll_qos: poll mode updated to 3 for domain 3, handle 0xb400007440c1e640 for timeout 9999
13:41:58.272 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2454: remote_handle_control_domain: requested QOS 3, latency 9999 for domain 3 handle 0xb400007440c1e640
13:41:58.272 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_latency.c:97: fastrpc_latency_thread_handler started for QoS with activity window 100 ms
13:41:58.272 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2454: remote_handle_control_domain: requested QOS 1, latency 0 for domain 3 handle 0xb400007440c1e640
13:41:58.272 litert              I  [qnn_compiler_plugin.cc:286] QNN manager created
13:41:58.425 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FC -> CONV2D
13:41:58.425 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FAILURE: Unsupported Input
13:41:58.425 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FC -> CONV2D
13:41:58.425 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FAILURE: Unsupported Input
13:41:58.426 litert              I  [compiler_plugin.cc:472] Partition strategy: 0
13:41:58.427 litert              I  [model.cc:588] DCE removed 192 ops, 328 tensors
13:41:58.427 litert              I  [compiler_plugin.cc:645] Partitioned subgraph<0>, selected 193 ops, from a total of 193 ops. resulted in 1 partitions.
13:41:58.427 litert              I  [qnn_compiler_plugin.cc:349] Starting QNN Compilation for 1 subgraphs, soc_model=(null)
13:41:58.427 litert              I  [qnn_compiler_plugin.cc:414] Creating context handle
13:41:58.427 litert              I  [qnn_compiler_plugin.cc:451] Context handle created
13:41:58.427 litert              I  [qnn_compiler_plugin.cc:466] Composing graph
13:41:58.427 litert              I  [qnn_compiler_plugin.cc:470] Entry point name: qnn_partition_0
13:41:58.585 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FC -> CONV2D
13:41:58.585 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FAILURE: Unsupported Input
13:41:58.585 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FC -> CONV2D
13:41:58.585 qnn                 I  INFO: [Qnn] [FullyConnected Optimization] FAILURE: Unsupported Input
13:41:59.194 litert              I  [qnn_compiler_plugin.cc:476] Graph composed
13:41:59.194 litert              I  [qnn_compiler_plugin.cc:489] Generating context binary
13:41:59.206 litert              I  [qnn_manager.cc:316] Serialized a context bin of size (bytes): 5132288
13:41:59.206 litert              I  [qnn_compiler_plugin.cc:492] Context binary 0 generated
13:41:59.221 litert              I  [compiled_model.cc:406] 1 compiler plugins were applied successfully: Qualcomm compiler plugin (ver 0.1.0)
13:41:59.222 litert              W  [compiled_model.cc:408] Plugin errs:
13:41:59.222 litert              I  [compiled_model.cc:716] JIT compilation changed model, reserializing...
13:41:59.223 tflite              I  Initialized TensorFlow Lite runtime.
13:41:59.223 litert              I  [dynamic_loading.cc:83] Found shared library: /data/app/~~HY11Ung6C_hMISWvTa2LGA==/com.example.shelf_recognition-h7tx2ccETTHuTSiXeqC95g==/lib/arm64/libLiteRtDispatch_Qualcomm.so
13:41:59.223 litert              I  [litert_dispatch.cc:126] Loading shared library: /data/app/~~HY11Ung6C_hMISWvTa2LGA==/com.example.shelf_recognition-h7tx2ccETTHuTSiXeqC95g==/lib/arm64/libLiteRtDispatch_Qualcomm.so
13:41:59.231 litert              I  [common.h:135]
                                    ::qnn::Options:
                                    LogLevel: 4
                                    Profiling: 0
                                    UseHtpPreference: true
                                    UseQint16AsQuint16: false
                                    EnableWeightSharing: false
                                    UseConvHMX: true
                                    UseFoldReLU: true
                                    HtpPerformanceMode: 2
                                    DumpTensorIds:
                                    IrJsonDir:
                                    DlcDir:
                                    VtcmSize: 0
                                    HvxThread: 0
                                    OptimizationLevel: 2
13:41:59.231 litert              I  [qnn_manager.cc:358] Adding shared library dir to path: /data/app/~~HY11Ung6C_hMISWvTa2LGA==/com.example.shelf_recognition-h7tx2ccETTHuTSiXeqC95g==/lib/arm64
13:41:59.231 litert              I  [dynamic_loading.cc:143] Adding /data/app/~~HY11Ung6C_hMISWvTa2LGA==/com.example.shelf_recognition-h7tx2ccETTHuTSiXeqC95g==/lib/arm64 to LD_LIBRARY_PATH
13:41:59.231 litert              I  [qnn_manager.cc:117] Loading qnn shared library from "libQnnHtp.so"
13:41:59.232 litert              I  [qnn_manager.cc:119] Loaded qnn shared library
13:41:59.232 qnn                 E  ERROR: [Qnn] Failed to find available SoC!
13:41:59.232 qnn                 I  INFO: [Qnn] Succssfully get platform info. SoC model: 68. SoC name: NotFound.
13:41:59.232 qnn                 W  WARNING: [Qnn] Fail to get SoC info, using default.
13:41:59.232 qnn                 I  INFO: [Qnn] Initializing QNN backend for SoC model: SM8550
13:41:59.232 qnn                 I  INFO: [Qnn] Set HTP performance mode: 2
13:41:59.232 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1934: manage_poll_qos: poll mode updated to 3 for domain 3, handle 0xb400007440c1e640 for timeout 9999
13:41:59.233 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2454: remote_handle_control_domain: requested QOS 3, latency 9999 for domain 3 handle 0xb400007440c1e640
13:41:59.233 com.exam...gnition  I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2454: remote_handle_control_domain: requested QOS 1, latency 0 for domain 3 handle 0xb400007440c1e640
13:41:59.233 litert              I  [dispatch_delegate.cc:172] Dispatch API vendor ID: Qualcomm
13:41:59.233 litert              I  [dispatch_delegate.cc:176] Dispatch API build ID: Qualcomm Dispatch API version 0.1.0, QNN API version 2.31.0, build id: v2.41.0.251128145156_191518
13:41:59.233 litert              I  [dispatch_delegate.cc:181] Dispatch API version: 0.1.0
13:41:59.233 litert              I  [dispatch_delegate.cc:192] Dispatch API capabilities: 1
13:41:59.233 tflite              I  Replacing 1 out of 1 node(s) with delegate (DispatchDelegate) node, yielding 1 partitions for subgraph 0.
13:41:59.233 litert              I  [context_binary_info.cc:110] Found qnn graph: qnn_partition_0
13:41:59.261 litert              E  ERROR: [/home/blokhin/AndroidStudioProjects/shelf_recognition/app/src/main/cpp/native-lib.cpp:206]
                                    └ ERROR: [/home/blokhin/AndroidStudioProjects/shelf_recognition/app/src/main/vendor/litert_cc_sdk/litert/cc/litert_compiled_model.cc:139]
                                    └ ERROR: [/home/blokhin/AndroidStudioProjects/shelf_recognition/app/src/main/vendor/litert_cc_sdk/litert/cc/litert_compiled_model.cc:82]
                                    └ ERROR: [/home/blokhin/AndroidStudioProjects/shelf_recognition/app/src/main/vendor/litert_cc_sdk/litert/cc/litert_tensor_buffer.cc:53]
13:41:59.261 libc                A  Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 4405 (elf_recognition), pid 4405 (elf_recognition)
