
Image Renderer Fails with Timeout & Retries on 60s Interval w/ InfluxDB Datasources #785

@nharper-usgs

Description

What happened:
I have recently set up the Docker version of the Grafana image renderer. This service is part of a larger Compose stack that also includes InfluxDB and Grafana. The services can talk to each other, and I can successfully download rendered images via curl and other methods.

However, I am seeing fairly sporadic latency spikes that I believe I've traced back to the renderer retrying after an initial failed network request.

Most of the time the image downloads in about 2 s; however, roughly every 2-5 requests, the download takes almost exactly 61-62 s.

(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52310  100 52310    0     0  31474      0  0:00:01  0:00:01 --:--:-- 31474
(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11196  100 11196    0     0   6397      0  0:00:01  0:00:01 --:--:--  6401
(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11187  100 11187    0     0    181      0  0:01:01  0:01:01 --:--:--  3199
(base) [user@server grafana]$ curl -L "http://admin:admin@localhost:3001/render/d-solo/cemi669r5v5s0f?orgId=1&from=now()&to=now()-1m&var-Drainages=Carbon&panelId=panel-67&&width=400&height=300&tz=UTC&theme=light" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10246  100 10246    0     0    165      0  0:01:02  0:01:01  0:00:01  2180
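
To narrow down where the ~60 s goes, a curl write-out that breaks the request into connect / time-to-first-byte / total phases can be used (a sketch; [URL] stands for the same render URL as above). It should show whether the extra time is spent waiting for the first byte (i.e. the renderer holding the request) or in the transfer itself:

# print connect time, time to first byte, and total time for one render request
curl -L -s -o /dev/null \
  -w 'connect=%{time_connect}s  ttfb=%{time_starttransfer}s  total=%{time_total}s\n' \
  "http://[URL]"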

In the renderer logs, I get the following only when I experience the 60s+ requests:

renderer  | {"failure":"net::ERR_ABORTED","level":"error","message":"Browser request failed","method":"POST","url":"http://grafana:3000/api/ds/query?ds_type=influxdb&requestId=SQR100"}
renderer  | {"err":"TimeoutError: Waiting failed: 60000ms exceeded\n    at new WaitTask (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:50:34)\n    at IsolatedWorld.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Realm.js:25:26)\n    at CdpFrame.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Frame.js:561:43)\n    at CdpFrame.<anonymous> (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/util/decorators.js:98:27)\n    at CdpPage.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:1366:37)\n    at waitForQueriesAndVisualizations (/home/nonroot/build/browser/browser.js:595:16)\n    at /home/nonroot/build/browser/browser.js:375:19\n    at callback (/home/nonroot/build/browser/browser.js:546:34)\n    at ClusteredBrowser.withMonitoring (/home/nonroot/build/browser/browser.js:553:16)\n    at ClusteredBrowser.performStep (/home/nonroot/build/browser/browser.js:509:36)","level":"error","message":"Error while performing step","step":"panelsRendered","url":"http://grafana:3000/d-solo/[URL....]"}

The reason I think a retry is involved is that the image always downloads successfully after the 60s timeout elapses. It also only seems to happen with the InfluxDB queries: I tried replicating the issue against a Prometheus backend and never hit it. To be clear, I don't see any delays when running the same Influx query in Grafana directly.

What you expected to happen:
I'd expect consistent download times. It feels like this could be mitigated simply by shortening the retry period.

How to reproduce it (as minimally and precisely as possible):
Repeatedly issue the same image rendering request; roughly every 2-5 requests, one takes ~60 s longer than usual (see the loop sketch below).
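
A minimal loop like the following (a sketch; [URL] is the same render URL as in the transcript above) surfaces the slow requests within a handful of iterations by printing the total time per request:

# fetch the same rendered panel 10 times and report how long each request took
for i in $(seq 1 10); do
  curl -L -s -o /dev/null -w "request $i: %{time_total}s\n" "http://[URL]"
done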

Anything else we need to know?:

Environment:

  • Grafana Image Renderer version: latest (4.x.x+)
  • Grafana version: latest (12.x.x+)
  • Installed plugin or remote renderer service: remote
  • OS Grafana Image Renderer is installed on: default docker OS
  • User OS & Browser: RHEL
  • Others:
    • InfluxDB v2.7
    • I've included the Compose setup for the grafana and renderer services below:
  renderer:
    image: grafana/grafana-image-renderer:latest
    container_name: renderer
    shm_size: 1g
    environment: 
      - AUTH_TOKEN=test-token
      - RENDERING_MODE=clustered
      - RENDERING_CLUSTERING_TIMEOUT=600
      - RENDERING_VIEWPORT_MAX_WIDTH=3000
      - RENDERING_VIEWPORT_MAX_HEIGHT=3000
      - ENABLE_METRICS=true
      - RENDERING_TIMING_METRICS=true
      # Try timeout
      - LOG_LEVEL=debug
    ports:
      - "8081:8081"
    networks: 
      - test_network

  grafana:
    build:
      context: ./grafana
      dockerfile: Dockerfile
    image: grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=test
      - GF_SECURITY_ADMIN_PASSWORD=test
      - GF_SERVER_DOMAIN=grafana
      - GF_SERVER_ROOT_URL=http://grafana:3000/
      - GF_RENDERING_CALLBACK_URL=http://grafana:3000/
      - GF_RENDERING_SERVER_URL=http://renderer:8081/render
      - GF_RENDERING_RENDERER_TOKEN=test-token
      - GF_RENDERING_RENDERING_TIMEOUT=30
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - influxdb
      - renderer
    networks:
      - test_network
