Minimum graphs needed for top-level health reporting on the ipfs.io gateway #5

@BigLep

Background

IP Shipyard has been entrusted to steward the ipfs.io gateway. Other leaders in the ecosystem should have the ability to see the health and usage of the ipfs.io gateway. This issue is about defining the minimum graphs needed to give others confidence in the maintained health of the service.

Graphs

Some general requests include:

  • Provide time ranges of 1+ year. Why? The longer context helps highlight small changes over time that can be missed when the time period is too short.
  • Weekly rather than daily granularity. Why? It makes longer time horizons easier to read. There has been discussion on how to accomplish this in a Slack thread.
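As a rough illustration of the weekly rollup, here is a minimal sketch that sums daily counts into ISO-week buckets. The function name `weekly_totals`, the input shape (a dict of `date` to count), and the sample numbers are all assumptions made for illustration; this is not how the existing dashboards are built.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_counts):
    """Sum daily counts into buckets keyed by (ISO year, ISO week)."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        totals[(iso[0], iso[1])] += count
    return dict(sorted(totals.items()))


# Made-up sample values, not real gateway traffic.
daily = {
    date(2024, 1, 1): 100,  # Monday, ISO week 1
    date(2024, 1, 7): 50,   # Sunday, ISO week 1
    date(2024, 1, 8): 70,   # Monday, ISO week 2
}
print(weekly_totals(daily))  # {(2024, 1): 150, (2024, 2): 70}
```

Summing like this only works for additive metrics such as request counts. Unique-client counts are not additive across days (the same client can appear on several days), so a weekly unique count has to be computed from the underlying records rather than from daily totals.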

Unique Clients accessing ipfs.io / dweb.link

Current source: https://probelab.io/ipfsgateways/#daily-unique-clients-accessing-ipfsio--dweblink
Snapshot: gateway-clients-overall (image)
Improvements needed: (image)

HTTP Requests to ipfs.io / dweb.link, by region

Current source: https://probelab.io/ipfsgateways/#daily-http-requests-to-ipfsio--dweblink-by-region
Snapshot: gateway-requests-region (image)
Improvements needed: (image)

p95 of TTFB for “200” responses

Current source: none currently, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1 . I'm also not sure whether that value covers only "200" responses or all responses.
Existing data: https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224 contains the data shown here: (image). That said, I don't know whether it is for "200" responses only or for all responses.

What's needed:

  • Create a new weekly plot for p95 of TTFB for “200” responses. We don't need to combine it with the existing data.
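To make the definition concrete, here is a minimal sketch of computing a weekly p95 of TTFB restricted to "200" responses. The record shape `(day, status_code, ttfb_ms)` and the function names are assumptions for illustration, and the nearest-rank percentile used here is just one common definition; whatever metrics backend produces the real plot may interpolate differently.

```python
from collections import defaultdict
from datetime import date


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def weekly_p95_ttfb(records, status=200):
    """Group TTFB samples by ISO week, keeping only the given status code."""
    by_week = defaultdict(list)
    for day, code, ttfb_ms in records:
        if code != status:
            continue  # e.g. 410 responses must not skew the latency figure
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(ttfb_ms)
    return {week: p95(vals) for week, vals in sorted(by_week.items())}


# Made-up samples: twenty 200 responses (1..20 ms) and one slow 410.
records = [(date(2024, 1, 1), 200, ms) for ms in range(1, 21)]
records.append((date(2024, 1, 2), 410, 5000))
print(weekly_p95_ttfb(records))  # {(2024, 1): 19}
```

Filtering by status code before taking the percentile is the important part: it removes the ambiguity about whether the number covers "200" responses or all responses.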

Response code distribution

For the requests in a given week, we should be able to show how the gateway is responding.

Why:

  1. Catch if there is a deployment issue that is affecting traffic.
  2. Prove the value of certain functionality.

Example looking at the last 7 days:
image

The high share of 410s emphasizes the importance of “Badbits”. Without it, the majority of requests would end up serving content we don’t want to serve.
If this distribution were ever to change (e.g., “Badbits” was disabled), that would be bad and we’d want to see it.

Current source: none currently, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1

What's needed:

  • Create a new weekly plot that shows the status code distribution. Maybe use the top 5-10 status codes and bucket the rest as "other".
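The "top N plus other" bucketing could look roughly like the sketch below. The function name `status_distribution`, the `top_n` parameter, and the sample counts are assumptions for illustration; the numbers are made up and are not the real gateway distribution.

```python
from collections import Counter


def status_distribution(status_codes, top_n=5):
    """Return each top status code's share of requests, rest as 'other'."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_n)
    shares = {str(code): n / total for code, n in top}
    rest = total - sum(n for _, n in top)
    if rest:
        shares["other"] = rest / total
    return shares


# Made-up week of responses: six 410s, three 200s, one 404.
codes = [410] * 6 + [200] * 3 + [404]
print(status_distribution(codes, top_n=2))
# {'410': 0.6, '200': 0.3, 'other': 0.1}
```

Plotting shares rather than raw counts keeps the weekly bars comparable even when total traffic changes, which makes a shift in the distribution easier to spot.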

Unique CIDs requested per week

Why: Gives a sense of how much of the content addressable space is being requested through the ipfs.io gateway.

What's needed:

  • Weekly plot for the number of top-level / root CIDs requested by clients
  • Weekly plot for the total number of CIDs fetched by the ipfs.io gateway
    • If clients only fetched non-badbit content, then this number would always be larger than the number above.
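For the first plot, counting distinct root CIDs per week could be sketched as follows. The record shape `(day, root_cid)`, the function name `weekly_unique_cids`, and the placeholder CID strings are assumptions for illustration, not the format of the actual gateway logs.

```python
from collections import defaultdict
from datetime import date


def weekly_unique_cids(requests):
    """Count distinct CIDs per (ISO year, ISO week)."""
    seen = defaultdict(set)
    for day, cid in requests:
        iso = day.isocalendar()
        seen[(iso[0], iso[1])].add(cid)  # the set drops repeat requests
    return {week: len(cids) for week, cids in sorted(seen.items())}


# Placeholder CID strings, not real CIDs.
requests = [
    (date(2024, 1, 1), "cid-a"),
    (date(2024, 1, 3), "cid-a"),  # repeat within the same week
    (date(2024, 1, 5), "cid-b"),
    (date(2024, 1, 9), "cid-a"),  # counted again in the next week
]
print(weekly_unique_cids(requests))  # {(2024, 1): 2, (2024, 2): 1}
```

A CID requested in two different weeks counts once in each week, so the weekly values cannot be summed to get a yearly unique count.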
