Increase Matomo Sampling to 100%, Eliminate load for bot request user-agents #11620

@mekarpeles

Feature Request

Problem / Opportunity

Right now we're forced to down-sample to 1% of traffic for anonymized analytics.

We're losing out on the ability to measure how often searches are successful.

If we drastically reduce the matomo/athena load caused by bad bot traffic, we can turn matomo sampling up to 100%.

Proposal

  1. First, we will need to update our nginx js rules to flag/label requests we want to exclude.
  2. We will use this label + known user-agents to avoid loading matomo + athena.js for those requests.
  3. We will update matomo rates with @scottbarnes so 100% of traffic is sampled.
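As a rough sketch of step 2, the decision could combine the nginx label with a user-agent check. Everything named here is an assumption for illustration: the `X-OL-Bot` header, the value `1`, and the user-agent fragments are hypothetical and not taken from the current nginx rules or from `is_bot`.

```python
# Hypothetical header that the nginx js rules might set on flagged requests.
BOT_LABEL_HEADER = "X-OL-Bot"

# Illustrative user-agent fragments only; not the list Open Library uses.
KNOWN_BOT_UA_FRAGMENTS = ("bot", "crawler", "spider", "slurp")


def should_load_analytics(headers: dict[str, str]) -> bool:
    """Return True when matomo + athena.js should be loaded for this request."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # 1. Trust the label applied upstream by nginx.
    if normalized.get(BOT_LABEL_HEADER.lower(), "").strip() == "1":
        return False

    # 2. Fall back to known user-agent fragments.
    user_agent = normalized.get("user-agent", "").lower()
    if not user_agent:
        # Assumption: a missing user-agent is treated as automated traffic.
        return False
    return not any(fragment in user_agent for fragment in KNOWN_BOT_UA_FRAGMENTS)
```

Checking the nginx label first keeps the user-agent list as a fallback, so requests flagged upstream are excluded even when they send a browser-like user-agent.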

Breakdown

Related files

Refer to this map of common Endpoints:

is_bot from https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/openlibrary/code.py#L1321-L1348 is already a web.ctx global

We already do something similar with $if not is_bot(): to skip the donation banner for bots:

    $if not is_bot():
        $:render_template('site/donation_banner')
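The same guard could be applied to the analytics scripts. Below is a minimal Python sketch of the idea, assuming the caller already has the result of `is_bot()`; the helper name and the script paths are hypothetical placeholders, not the real Open Library assets or template code.

```python
from html import escape

# Hypothetical script paths; the real asset URLs are not given in this issue.
ANALYTICS_SCRIPTS = ("/static/js/matomo.js", "/static/js/athena.js")


def analytics_script_tags(is_bot: bool, scripts=ANALYTICS_SCRIPTS) -> str:
    """Return the analytics <script> tags, or an empty string for bot requests."""
    if is_bot:
        # Bots get no analytics scripts, so they add no matomo/athena load.
        return ""
    return "\n".join(
        f'<script src="{escape(src, quote=True)}" async></script>'
        for src in scripts
    )
```

In a template this would correspond to wrapping the script includes in the same `$if not is_bot():` block used for the donation banner.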

Metadata

Assignees

No one assigned

    Labels

    Lead: @mekarpeles (Issues overseen by Mek (Staff: Program Lead) [managed])
    Needs: Breakdown (This big issue needs a checklist or subissues to describe a breakdown of work. [managed])
    Needs: Staff / Internal (Reviewed a PR but don't have merge powers? Use this.)
    Priority: 2 (Important, as time permits. [managed])
    Type: Feature Request (Issue describes a feature or enhancement we'd like to implement. [managed])
