Implement AI-image-reader plugin #1925

kai2321 · 2025-03-19T02:15:56Z

Ⅰ. Describe what this PR did

Implement AI-image-reader plugin

Ⅱ. Does this pull request fix one issue?

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

docker-compose.yaml

version: '3.7'
services:
  envoy:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/gateway:v2.0.2
    entrypoint: /usr/local/bin/envoy
    # Debug-level logging is enabled to make debugging easier
    command: -c /etc/envoy/envoy.yaml --component-log-level wasm:debug
    depends_on:
      - echo-server
    networks:
      - higress-net
    ports:
      - "10000:10000"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./plugin.wasm:/etc/envoy/plugin.wasm
      - ./dify.wasm:/etc/envoy/dify.wasm
  echo-server:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/echo-server:1.3.0
    networks:
      - higress-net
    ports:
      - "3000:3000"
networks:
  higress-net: {}

envoy.yaml

# File generated by hgctl. Modify as required.
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                scheme_header_transformation:
                  scheme_to_overwrite: https
                stat_prefix: ingress_http
                # Output envoy logs to stdout
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                # Modify as required
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: [ "*" ]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: dify
                            timeout: 300s
                http_filters:
                  - name: imagereader
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: imagereader
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/plugin.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                "apiKey": "***************",
                                "serviceName": "dashscope",
                                "servicePort": "443"
                              }
                  - name: wasmtest
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: wasmtest
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/dify.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                "provider": {
                                  "type": "dify",
                                  "apiTokens": [
                                    "************"
                                  ],
                                  "modelMapping": {
                                    "*": "dify"
                                  },
                                  "botType": "Chat"
                                }
                              }
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: dify
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: dify
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: api.dify.ai
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "api.dify.ai"
    - name: outbound|443||dashscope
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: outbound|443||dashscope
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: dashscope.aliyuncs.com
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "dashscope.aliyuncs.com"
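
The imagereader configuration above carries `serviceName` and `servicePort`, and the second cluster is named `outbound|443||dashscope`. A minimal Go sketch of how the two could be related is shown here. It assumes the plugin resolves its upstream through a cluster named `outbound|<port>||<service>`; the `readerConfig` type and both helper functions are hypothetical and are not taken from the plugin source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// readerConfig mirrors the three keys of the imagereader configuration.
// The type and field names are hypothetical.
type readerConfig struct {
	APIKey      string `json:"apiKey"`
	ServiceName string `json:"serviceName"`
	ServicePort string `json:"servicePort"` // the config passes the port as a string
}

// parseReaderConfig decodes the JSON configuration and validates the port.
func parseReaderConfig(raw []byte) (readerConfig, error) {
	var cfg readerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return readerConfig{}, fmt.Errorf("decode config: %w", err)
	}
	if cfg.ServiceName == "" {
		return readerConfig{}, fmt.Errorf("serviceName is required")
	}
	port, err := strconv.Atoi(cfg.ServicePort)
	if err != nil || port < 1 || port > 65535 {
		return readerConfig{}, fmt.Errorf("invalid servicePort %q", cfg.ServicePort)
	}
	return cfg, nil
}

// clusterName builds the Envoy cluster name under the assumed
// "outbound|<port>||<service>" naming pattern.
func clusterName(cfg readerConfig) string {
	return fmt.Sprintf("outbound|%s||%s", cfg.ServicePort, cfg.ServiceName)
}

func main() {
	raw := []byte(`{"apiKey": "placeholder", "serviceName": "dashscope", "servicePort": "443"}`)
	cfg, err := parseReaderConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(clusterName(cfg)) // prints "outbound|443||dashscope"
}
```

If the cluster name in envoy.yaml and the name derived from the plugin configuration differ, the plugin's outbound call would have no cluster to route to, so this pairing is worth checking first when the setup fails.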

Test request:

curl -X POST 'http://localhost:10000/v1/chat/completions' -H 'Content-Type: application/json' -d '{
    "model": "gpt-4-turbo",
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What is in the image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg"
                    }
                }
            ]
        }
    ],
    "stream": false
}'
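
The request above mixes a `text` part and an `image_url` part in one user message. As a rough illustration of the first step an image-reading plugin has to perform, the sketch below pulls the image URLs out of such a body. It uses the standard `encoding/json` package for self-containment; the `imageURLs` function and the struct names are hypothetical, and the real plugin may parse the body differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest keeps message content raw because it may be either a plain
// string or an array of typed parts.
type chatRequest struct {
	Messages []struct {
		Role    string          `json:"role"`
		Content json.RawMessage `json:"content"`
	} `json:"messages"`
}

// contentPart models one element of an array-style content field.
type contentPart struct {
	Type     string `json:"type"`
	Text     string `json:"text"`
	ImageURL struct {
		URL string `json:"url"`
	} `json:"image_url"`
}

// imageURLs returns the URLs of all image_url parts found in user messages.
func imageURLs(body []byte) ([]string, error) {
	var req chatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decode request: %w", err)
	}
	var urls []string
	for _, m := range req.Messages {
		if m.Role != "user" {
			continue
		}
		var parts []contentPart
		if err := json.Unmarshal(m.Content, &parts); err != nil {
			continue // content is a plain string (or absent), so it has no image parts
		}
		for _, p := range parts {
			if p.Type == "image_url" && p.ImageURL.URL != "" {
				urls = append(urls, p.ImageURL.URL)
			}
		}
	}
	return urls, nil
}

func main() {
	body := []byte(`{"messages":[{"role":"user","content":[
		{"type":"text","text":"What is in the image?"},
		{"type":"image_url","image_url":{"url":"https://example.com/a.jpg"}}]}]}`)
	urls, err := imageURLs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```

Messages whose `content` is a plain string are skipped rather than rejected, so text-only requests pass through unchanged.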

Response:

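{
    "id": "3f06c905-8b99-4152-a53e-b29f5b76cd28",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I can see the content of the image you gave me. It says that if you are a system administrator working in a Linux environment, learning to write shell scripts will benefit you greatly. The book also covers how to automate system administration tasks, including monitoring system statistics and generating reports. Home Linux enthusiasts can also benefit from the book by learning how to perform tasks on the command line. It looks like this book is a very useful resource for Linux users."
            },
            "finish_reason": "stop"
        }
    ],
    "created": 1742349993,
    "model": "dify",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 491,
        "completion_tokens": 136,
        "total_tokens": 627
    }
}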
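
A response of this shape can be checked mechanically rather than by eye. The following sketch is one possible sanity check, not part of the PR: it decodes the body, confirms that the first choice is a finished assistant message, and verifies that the token counts add up (491 + 136 = 627 in the response above). The `checkResponse` function is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// chatResponse covers only the fields the check needs.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Usage struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
		TotalTokens      int `json:"total_tokens"`
	} `json:"usage"`
}

// checkResponse returns nil when the body looks like a completed,
// non-empty assistant reply with consistent usage numbers.
func checkResponse(body []byte) error {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decode response: %w", err)
	}
	if len(resp.Choices) == 0 {
		return errors.New("response has no choices")
	}
	c := resp.Choices[0]
	if c.Message.Role != "assistant" || c.Message.Content == "" {
		return errors.New("first choice is not a non-empty assistant message")
	}
	if c.FinishReason != "stop" {
		return fmt.Errorf("unexpected finish_reason %q", c.FinishReason)
	}
	u := resp.Usage
	if u.PromptTokens+u.CompletionTokens != u.TotalTokens {
		return fmt.Errorf("usage mismatch: %d + %d != %d",
			u.PromptTokens, u.CompletionTokens, u.TotalTokens)
	}
	return nil
}

func main() {
	body := []byte(`{"choices":[{"index":0,"message":{"role":"assistant","content":"example"},
		"finish_reason":"stop"}],"usage":{"prompt_tokens":491,"completion_tokens":136,"total_tokens":627}}`)
	if err := checkResponse(body); err != nil {
		panic(err)
	}
	fmt.Println("response looks valid")
}
```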

Ⅴ. Special notes for reviews

codecov-commenter · 2025-03-19T02:19:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.02%. Comparing base (ef31e09) to head (e082f00).
Report is 583 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1925       +/-   ##
===========================================
+ Coverage   35.91%   46.02%   +10.11%     
===========================================
  Files          69       81       +12     
  Lines       11576    13020     +1444     
===========================================
+ Hits         4157     5993     +1836     
+ Misses       7104     6681      -423     
- Partials      315      346       +31

see 78 files with indirect coverage changes

kai2321 · 2025-03-19T02:45:45Z

PTAL, Thanks @johnlanni

plugins/wasm-go/extensions/ai-image-reader/README.md

kai2321 · 2025-03-24T07:31:18Z

PTAL, thanks @johnlanni

johnlanni

This plugin should not be tightly coupled to the Qwen model. It should be designed with an abstraction layer that treats Qwen as one implementation and also supports other OCR models, such as Mistral.
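
One way to read this request is as a provider interface plus a registry keyed by provider name. The Go sketch below illustrates that shape only. The `Provider` interface, the registry, and the stub implementation are all hypothetical; they are not the code that was later merged, and a real Qwen or Mistral implementation would replace the stub with its own request and response handling.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a hypothetical abstraction over one OCR backend.
type Provider interface {
	Name() string
	ExtractText(imageURL string) (string, error)
}

// factory builds a Provider from its credentials.
type factory func(apiKey string) (Provider, error)

var registry = map[string]factory{}

// register makes a provider available under the given name.
func register(name string, f factory) { registry[name] = f }

// newProvider looks up a provider by name, so the plugin core never
// references a concrete backend directly.
func newProvider(name, apiKey string) (Provider, error) {
	f, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown OCR provider %q, registered: %v", name, names)
	}
	return f(apiKey)
}

// stubProvider stands in for a real backend and performs no network calls.
type stubProvider struct{ name, apiKey string }

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) ExtractText(imageURL string) (string, error) {
	if imageURL == "" {
		return "", fmt.Errorf("%s: empty image URL", s.name)
	}
	return fmt.Sprintf("[%s] text extracted from %s", s.name, imageURL), nil
}

func init() {
	// Both names map to the stub here; real implementations would differ.
	for _, name := range []string{"qwen", "mistral"} {
		name := name
		register(name, func(apiKey string) (Provider, error) {
			if apiKey == "" {
				return nil, fmt.Errorf("%s: apiKey is required", name)
			}
			return stubProvider{name: name, apiKey: apiKey}, nil
		})
	}
}

func main() {
	p, err := newProvider("qwen", "placeholder-key")
	if err != nil {
		panic(err)
	}
	text, err := p.ExtractText("https://example.com/a.jpg")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

With this layout, adding another OCR backend means adding one file that implements `Provider` and calls `register`, without touching the request-handling code.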

kai2321 · 2025-04-25T16:09:59Z

redesigned, PTAL thanks. @johnlanni

kai2321 · 2025-05-20T15:00:23Z

to be reviewed

plugins/wasm-go/extensions/ai-image-reader/README.md

plugins/wasm-go/extensions/ai-image-reader/ocr/qwen/qwen.go

plugins/wasm-go/extensions/ai-image-reader/ocr/types.go

plugins/wasm-go/extensions/ai-image-reader/README_EN.md

plugins/wasm-go/extensions/ai-image-reader/main.go

plugins/wasm-go/extensions/ai-image-reader/provider.go

CH3CHO

LGTM

johnlanni

LGTM

Implement AI-image-reader plugin

e15db36

kai2321 requested review from johnlanni, CH3CHO and rinfx as code owners March 19, 2025 02:15

CH3CHO reviewed Mar 19, 2025

plugins/wasm-go/extensions/ai-image-reader/README.md (outdated, resolved)

plugins/wasm-go/extensions/ai-image-reader/README.md (resolved)

plugins/wasm-go/extensions/ai-image-reader/README.md (resolved)

Implement AI-image-reader plugin

7f36eba

johnlanni requested changes Apr 18, 2025

redesigned with an abstraction layer

28291d4

johnlanni requested changes May 26, 2025

plugins/wasm-go/extensions/ai-image-reader/README.md (outdated, resolved)

kai2321 added 2 commits June 4, 2025 20:47

add provider field

83e831e

add provider field

a60d6eb

johnlanni requested changes Jun 5, 2025

plugins/wasm-go/extensions/ai-image-reader/README.md (outdated, resolved)

add readme-en

d39474a

CH3CHO reviewed Jun 17, 2025

redesign the interface

448262d

CH3CHO reviewed Jun 24, 2025

plugins/wasm-go/extensions/ai-image-reader/README_EN.md (outdated, resolved)

plugins/wasm-go/extensions/ai-image-reader/README_EN.md (outdated, resolved)

CH3CHO reviewed Jun 24, 2025

change to new plugin format

e082f00

CH3CHO reviewed Jun 25, 2025

johnlanni approved these changes Jun 25, 2025

johnlanni merged commit 1fe5eb6 into alibaba:main Jun 25, 2025
12 of 21 checks passed
