Telemetry #5381

Draft
gmarcon wants to merge 11 commits into fisharebest:main from gmarcon:telemetry

@gmarcon
Contributor

@gmarcon gmarcon commented May 22, 2026

This draft pull request addresses the proposal for a core telemetry system as discussed in #5377.

It implements an entirely opt-in, on-demand flow to collect anonymous configuration and usage metrics, helping development focus on real-world usage patterns.

Key Features Implemented:

  • Opt-In Banner: Added a message and a "Send data" CTA in the Control Panel underneath the version check block.
  • Transparency Preview Page: Clicking the CTA routes the admin to a preview page showing the exact assembled JSON payload before any data is transmitted.
  • Resilient Database Setup: The backend stores each payload in a single PostgreSQL JSONB column, so metrics can be added or dropped in the future without database schema migrations.
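
The banner, preview, and confirm steps above can be sketched as follows. This is a minimal illustration in Python, not the PHP code in this pull request: `preview_payload`, `submit_if_confirmed`, and the `send` callback are hypothetical names, and the key point is only that the text shown to the admin is the exact text transmitted.

```python
import json
from typing import Callable


def preview_payload(site_uuid: str, metrics: dict) -> str:
    """Return the exact JSON text that would be transmitted."""
    payload = {"p_site_uuid": site_uuid, "p_metrics": metrics}
    return json.dumps(payload, indent=4, sort_keys=True)


def submit_if_confirmed(preview: str, confirmed: bool,
                        send: Callable[[str], None]) -> bool:
    """Transmit the previewed text only after explicit confirmation."""
    if not confirmed:
        return False  # nothing leaves the server
    send(preview)     # send exactly what was shown on the preview page
    return True


# Usage: a list stands in for the network call (hypothetical transport).
sent = []
preview = preview_payload("example-uuid", {"php_version": "8.3.28"})
declined = submit_if_confirmed(preview, confirmed=False, send=sent.append)
accepted = submit_if_confirmed(preview, confirmed=True, send=sent.append)
```

Passing the already rendered preview string to `send`, rather than rebuilding the payload, is one way to guarantee that what was approved and what was sent cannot drift apart.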

Sample payload structure

{
    "p_site_uuid": "removed",
    "p_metrics": {
        "php_version": "8.3.28",
        "php_memory_limit": "512M",
        "db_type": "mysql",
        "db_version": "8.4.3",
        "os_type": "Windows NT",
        "webtrees_version": "2.2.7-dev",
        "default_language": "en-US",
        "changes_count": 47129,
        "trees": [
            {
                "individuals_count": 6349,
                "families_count": 2343,
                "sources_count": 104,
                "repositories_count": 0,
                "media_count": 3957,
                "notes_count": 0,
                "places_count": 759,
                "user_permissions_count": {
                    "access": 5,
                    "edit": 19,
                    "admin": 1
                }
            }
        ],
        "default_theme": "_jc-theme-justlight_",
        "users_count": 25,
        "user_settings": [
            {
                "language": "it",
                "last_login_age_days": 1515,
                "account_age_days": 1565
            },
            {
                "language": "de",
                "last_login_age_days": 460,
                "account_age_days": 1776
            },
            {
                "language": "en-US",
                "last_login_age_days": 0,
                "account_age_days": 7047
            },
            {
                "language": "en-US",
                "last_login_age_days": -1,
                "account_age_days": 7047
            },
            ...
        ],
        "enabled_standard_modules": [
            "ahnentafel_report",
            ...
        ],
        "custom_modules": [
            "JustCarmen\\Webtrees\\Module\\JustlightTheme\\JustlightTheme",
            "JustCarmen\\Webtrees\\Module\\FancyTreeview\\FancyTreeviewModule",
            ...
        ]
    }
}
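
Because the backend accepts arbitrary JSONB, a light structural check on the payload may be useful in tests or before storing it. The sketch below is an assumption, not part of this pull request: `validate_payload` is a hypothetical helper that only checks the two top-level keys and that every per-tree counter is a non-negative integer.

```python
def _count_errors(prefix, obj):
    """Collect errors for counters that are not non-negative integers."""
    errors = []
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # e.g. user_permissions_count holds nested counters
            errors += _count_errors(path, value)
        elif isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(f"{path} must be a non-negative integer")
    return errors


def validate_payload(payload):
    """Return a list of problems; an empty list means the shape looks fine."""
    if set(payload) != {"p_site_uuid", "p_metrics"}:
        return ["top level must contain exactly p_site_uuid and p_metrics"]
    errors = []
    for i, tree in enumerate(payload["p_metrics"].get("trees", [])):
        errors += _count_errors(f"trees[{i}]", tree)
    return errors


good = {
    "p_site_uuid": "removed",
    "p_metrics": {
        "php_version": "8.3.28",
        "trees": [{
            "individuals_count": 6349,
            "families_count": 2343,
            "user_permissions_count": {"access": 5, "edit": 19, "admin": 1},
        }],
    },
}
bad = {"p_site_uuid": "removed",
       "p_metrics": {"trees": [{"individuals_count": -5}]}}

good_errors = validate_payload(good)
bad_errors = validate_payload(bad)
```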

Database

The backend was created in a dedicated Supabase organization/project with the following table and function:

CREATE TABLE webtrees_telemetry (
    site_uuid TEXT PRIMARY KEY,
    metrics JSONB NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL
);
-- Protect the table from direct public read/write access
ALTER TABLE webtrees_telemetry ENABLE ROW LEVEL SECURITY;

CREATE OR REPLACE FUNCTION submit_webtrees_telemetry(p_site_uuid TEXT, p_metrics JSONB)
RETURNS VOID AS $$
BEGIN
  -- Upsert the site data. One site uuid = exactly one row of truth.
  INSERT INTO webtrees_telemetry (site_uuid, metrics, updated_at)
  VALUES (p_site_uuid, p_metrics, now())
  ON CONFLICT (site_uuid) 
  DO UPDATE SET metrics = EXCLUDED.metrics, updated_at = now();
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
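
The `p_site_uuid` and `p_metrics` keys in the sample payload match the arguments of this function, which suggests the payload is posted as-is to an RPC endpoint. The sketch below builds (but does not send) such a request, assuming the function is exposed through Supabase's REST interface at `/rest/v1/rpc/<function name>` and authenticated with an `apikey` header. The project URL and key are placeholders, and the actual transport used by the PHP code in this pull request may differ.

```python
import json
import urllib.request


def build_rpc_request(project_url, api_key, site_uuid, metrics):
    """Build a POST request that calls submit_webtrees_telemetry."""
    body = json.dumps({"p_site_uuid": site_uuid, "p_metrics": metrics})
    return urllib.request.Request(
        url=f"{project_url}/rest/v1/rpc/submit_webtrees_telemetry",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "apikey": api_key},
    )


# Placeholder project URL and key (assumptions, not real values).
request = build_rpc_request(
    "https://example.supabase.co",
    "PUBLIC_ANON_KEY",
    "removed",
    {"php_version": "8.3.28"},
)
# urllib.request.urlopen(request) would transmit it; deliberately not called.
```

Because the function upserts on `site_uuid`, sending the same request twice leaves a single row with the latest metrics and a refreshed `updated_at`.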

These are some sample queries that can be run against the JSON structure shown above (fields that are missing from a payload evaluate to NULL):

-- Total individuals:
SELECT SUM((tree->>'individuals_count')::bigint) AS total_individuals
FROM webtrees_telemetry, jsonb_array_elements(metrics->'trees') AS tree
WHERE updated_at > now() - INTERVAL '90 days';

-- Biggest tree:
SELECT MAX((tree->>'individuals_count')::bigint) AS biggest_tree
FROM webtrees_telemetry, jsonb_array_elements(metrics->'trees') AS tree
WHERE updated_at > now() - INTERVAL '90 days';

-- Number of sites by PHP version:
SELECT metrics->>'php_version' AS php_version, COUNT(*) AS site_count
FROM webtrees_telemetry
GROUP BY metrics->>'php_version'
ORDER BY site_count DESC;

-- Oldest account age in years (a proxy for site age):
SELECT ROUND(MAX((settings->>'account_age_days')::int) / 365.25, 1) AS oldest_account_age_years
FROM webtrees_telemetry, jsonb_array_elements(metrics->'user_settings') AS settings
WHERE updated_at > now() - INTERVAL '90 days';
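
To make the semantics of these queries concrete, here is a rough in-memory equivalent in Python. The three `rows` are invented illustration data (each dict stands for one site's `metrics` column), and the 90-day filter is omitted for brevity.

```python
from collections import Counter

# Invented sample data: one dict per site, mirroring the metrics column.
rows = [
    {"php_version": "8.3.28",
     "trees": [{"individuals_count": 6349}],
     "user_settings": [{"account_age_days": 7047}, {"account_age_days": 1565}]},
    {"php_version": "8.3.28",
     "trees": [{"individuals_count": 120}, {"individuals_count": 31}],
     "user_settings": [{"account_age_days": 400}]},
    {"php_version": "8.2.12", "user_settings": []},  # no "trees" key at all
]

# jsonb_array_elements(metrics->'trees') yields one row per tree; a site
# without a "trees" key simply contributes no rows.
tree_sizes = [t["individuals_count"] for m in rows for t in m.get("trees", [])]
total_individuals = sum(tree_sizes)
biggest_tree = max(tree_sizes)

# GROUP BY metrics->>'php_version' ORDER BY site_count DESC
sites_by_php = Counter(m.get("php_version") for m in rows).most_common()

# ROUND(MAX(account_age_days) / 365.25, 1)
ages = [s["account_age_days"] for m in rows for s in m.get("user_settings", [])]
oldest_account_age_years = round(max(ages) / 365.25, 1)
```

With this data, `total_individuals` is 6500, `biggest_tree` is 6349, and `oldest_account_age_years` is 19.3.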

A generic function can also be written to retrieve data aggregated, filtered and grouped in a configurable way. For instance, the function below, whose execution is reserved for webtrees developers:

CREATE OR REPLACE FUNCTION get_dynamic_telemetry(
  p_group_by JSONB DEFAULT '[]'::jsonb,
  p_aggregates JSONB DEFAULT '[]'::jsonb,
  p_filters JSONB DEFAULT '[]'::jsonb
) RETURNS SETOF JSONB AS $$
DECLARE
  v_sql TEXT;
  v_select TEXT[] := '{}';
  v_groupby TEXT[] := '{}';
  v_where TEXT[] := ARRAY['updated_at > now() - INTERVAL ''90 days'''];
  v_item JSONB;
  v_field TEXT;
  v_op TEXT;
  v_val TEXT;
  v_type TEXT;
  v_alias TEXT;
  v_path_expr TEXT;
BEGIN
  -- 1. BUILD GROUP BY CLAUSES
  -- Iterate with a TEXT variable, since jsonb_array_elements_text() returns text.
  FOR v_field IN SELECT * FROM jsonb_array_elements_text(p_group_by) LOOP
    v_path_expr := format('metrics #>> %L::text[]', string_to_array(v_field, '.'));
    v_select := array_append(v_select, v_path_expr || format(' AS %I', replace(v_field, '.', '_')));
    v_groupby := array_append(v_groupby, v_path_expr);
  END LOOP;

  -- 2. BUILD AGGREGATION CLAUSES
  FOR v_item IN SELECT * FROM jsonb_array_elements(p_aggregates) LOOP
    v_field := v_item->>'field';
    v_op := lower(v_item->>'operator');
    v_alias := COALESCE(v_item->>'alias', v_op || '_' || replace(v_field, '.', '_'));

    IF v_op NOT IN ('max', 'min', 'avg', 'sum', 'count') THEN
      RAISE EXCEPTION 'Invalid aggregation operator: %', v_op;
    END IF;

    IF v_op = 'count' AND v_field = '*' THEN
      v_select := array_append(v_select, format('COUNT(*) AS %I', v_alias));
    ELSE
      v_path_expr := format('metrics #>> %L::text[]', string_to_array(v_field, '.'));
      IF v_op = 'count' THEN
        v_select := array_append(v_select, format('COUNT(%s) AS %I', v_path_expr, v_alias));
      ELSE
        v_select := array_append(v_select, format('%s((%s)::numeric) AS %I', upper(v_op), v_path_expr, v_alias));
      END IF;
    END IF;
  END LOOP;

  -- 3. BUILD FILTER CLAUSES
  FOR v_item IN SELECT * FROM jsonb_array_elements(p_filters) LOOP
    v_field := v_item->>'field';
    v_op := lower(v_item->>'operator'); -- normalise case so 'LIKE' and 'like' both match
    v_val := v_item->>'value';
    v_type := COALESCE(v_item->>'type', 'text');

    -- Whitelist of allowed filter operators
    IF v_op NOT IN ('=', '!=', '>', '<', '>=', '<=', 'like', 'ilike') THEN
      RAISE EXCEPTION 'Invalid filter operator: %', v_op;
    END IF;

    v_path_expr := format('metrics #>> %L::text[]', string_to_array(v_field, '.'));
    
    IF v_type = 'numeric' THEN
      v_where := array_append(v_where, format('(%s)::numeric %s %L::numeric', v_path_expr, v_op, v_val));
    ELSE
      v_where := array_append(v_where, format('%s %s %L', v_path_expr, upper(v_op), v_val));
    END IF;
  END LOOP;

  -- 4. CONSTRUCT AND EXECUTE FINAL QUERY
  v_sql := 'SELECT to_jsonb(t) FROM (SELECT ';
  
  IF array_length(v_select, 1) > 0 THEN
    v_sql := v_sql || array_to_string(v_select, ', ');
  ELSE
    v_sql := v_sql || '*'; 
  END IF;

  v_sql := v_sql || ' FROM webtrees_telemetry WHERE ' || array_to_string(v_where, ' AND ');

  IF array_length(v_groupby, 1) > 0 THEN
    v_sql := v_sql || ' GROUP BY ' || array_to_string(v_groupby, ', ');
  END IF;

  v_sql := v_sql || ') t';

  RETURN QUERY EXECUTE v_sql;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE EXECUTE ON FUNCTION get_dynamic_telemetry(JSONB, JSONB, JSONB) FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION get_dynamic_telemetry(JSONB, JSONB, JSONB) FROM anon;
GRANT EXECUTE ON FUNCTION get_dynamic_telemetry(JSONB, JSONB, JSONB) TO service_role;
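
As I read the function, each field is a dotted path that is split on `.` and resolved with the `#>>` operator, and the three arguments are JSON arrays. The Python sketch below shows one plausible set of arguments and a rough emulation of the path lookup. `path_text` is a hypothetical helper written for illustration (non-negative array indexes only), not part of the pull request.

```python
import json


def path_text(metrics, dotted_path):
    """Rough equivalent of: metrics #>> string_to_array(dotted_path, '.')"""
    node = metrics
    for step in dotted_path.split("."):
        if isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(node, list) and step.isdigit() and int(step) < len(node):
            node = node[int(step)]
        else:
            return None  # missing path, which SQL reports as NULL
    if node is None:
        return None
    # #>> returns text: strings unquoted, other values as JSON text
    return node if isinstance(node, str) else json.dumps(node)


# Arguments for "sites per PHP version, MySQL installations only".
call_args = {
    "p_group_by": ["php_version"],
    "p_aggregates": [{"field": "*", "operator": "count", "alias": "site_count"}],
    "p_filters": [{"field": "db_type", "operator": "=", "value": "mysql"}],
}

metrics = {"php_version": "8.3.28", "db_type": "mysql",
           "trees": [{"individuals_count": 6349}]}
first_tree_size = path_text(metrics, "trees.0.individuals_count")
missing = path_text(metrics, "trees.3.individuals_count")
```

Note that a path such as `trees.0.individuals_count` addresses one array element only; aggregating across all trees still needs `jsonb_array_elements`, as in the hand-written queries.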

The free Supabase account allows for a 500 MB database. With 8,000 sites, that leaves a budget of around 60 KB per site. Since the sample payload is only about 9 KB, it fits well within the limit.
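
The arithmetic behind this estimate, as a quick check. It assumes decimal units (1 MB = 1000 KB) and ignores index and row overhead, so the real footprint would be somewhat larger.

```python
DB_LIMIT_MB = 500   # free-tier database size quoted above
SITES = 8_000       # number of sites assumed above
PAYLOAD_KB = 9      # approximate size of the sample payload

budget_kb_per_site = DB_LIMIT_MB * 1000 / SITES   # 62.5 KB per site
total_mb = SITES * PAYLOAD_KB / 1000              # 72.0 MB if every site reports
share_of_limit = total_mb / DB_LIMIT_MB           # about 14% of the quota
```

Because the table keeps exactly one row per site, storage grows with the number of reporting sites rather than with the number of submissions.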

UI Screenshots

  • Settings page (screenshot: SETTINGS)
  • Control panel call to action (screenshot: CONTROL PANEL)
  • Preview page before sending (screenshot: PREVIEW)
  • Data sent confirmation (screenshot: CONFIRM)

@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 2.81690% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.40%. Comparing base (a64cc99) to head (308cc63).
⚠️ Report is 4 commits behind head on main.

Files with missing lines                                 Patch %   Lines
app/Services/TelemetryDataService.php                    0.00%     78 Missing ⚠️
app/Http/RequestHandlers/TelemetrySubmitAction.php       0.00%     28 Missing ⚠️
app/Http/RequestHandlers/TelemetryPreviewPage.php        0.00%     10 Missing ⚠️
app/Http/RequestHandlers/TelemetrySettingsPage.php       0.00%     8 Missing ⚠️
...p/Http/RequestHandlers/TelemetrySettingsAction.php    0.00%     7 Missing ⚠️
app/Site.php                                             0.00%     6 Missing ⚠️
app/Services/UpgradeService.php                          0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5381      +/-   ##
============================================
- Coverage     35.49%   35.40%   -0.10%     
- Complexity    11211    11241      +30     
============================================
  Files          1166     1171       +5     
  Lines         48077    48214     +137     
============================================
+ Hits          17065    17068       +3     
- Misses        31012    31146     +134     


@kiwi3685
Contributor

I support all the comments about this in the various other topics about the same thing.

Performance and theoretical (statistically flawed) "popularity" of add-on modules is not the responsibility of webtrees. If module creators want this they should jointly create it among themselves as a stand-alone module.

Personally I will not have any part of this code on my server.

Opt-in/out is not a satisfactory solution.

@gmarcon
Contributor Author

gmarcon commented May 26, 2026

> I support all the comments about this in the various other topics about the same thing.
>
> Performance and theoretical (statistically flawed) "popularity" of add-on modules is not the responsibility of webtrees. If module creators want this they should jointly create it among themselves as a stand-alone module.
>
> Personally I will not have any part of this code on my server.
>
> Opt-in/out is not a satisfactory solution.

Thank you for your feedback! Just a couple of quick clarifications:

  • The draft pull request is simply a functional contribution to help guide the discussion of issue #5377 (Enhancement: installed custom modules statistics, include installed modules in the latest-version check).
  • The scope of the original issue has evolved from tracking custom module installations into a more generic collection of core site metrics (like PHP/DB versions, tree scales, and layout choices) to help drive core development choices.
  • The proposed implementation does not send any data automatically or run in the background. It is not really opt-in/opt-out but fully on-demand: nothing is ever transmitted unless an administrator explicitly clicks the button and approves the exact JSON payload shown on the preview screen.

Looking forward to other comments from the community!

@kiwi3685
Contributor

But all of that can only result in statistically meaningless data. There is no way to judge "relative popularity" from it.

So enjoy the development, but I shall, as said, exclude the entire feature from my site.

@gmarcon
Contributor Author

gmarcon commented May 26, 2026

> But all of that can only result in statistically meaningless data. There is no way to judge "relative popularity" from it.

Why do you think the data would be "statistically meaningless"? Of course, the collected data would need to be weighed against the total number of active installations (which we already know thanks to the core upgrade checker).

> So enjoy the development, but I shall, as said, exclude the entire feature from my site.

The benefit of the manual, "on-demand" approach is that, should this feature ever be included in a future webtrees version (which is entirely up for discussion), you would not need to patch the software or alter the code to remove it. You simply choose not to click the "Send data" button in the control panel, and nothing is ever transmitted. Even after clicking it, you would still have to confirm on the preview page before any telemetry data is sent.
