This project monitors various aspects of internet connectivity for a list of HTTP(S) sites hosted in different countries. It supports local monitoring (with Zscaler client) and remote monitoring over SSH (unmodified internet).
See also: Design Criteria and Principles → DESIGN_CRITERIA.md, Architecture → ARCHITECTURE.md
Quick network performance + proxy/cache heuristics monitor + batch analyzer.
The binary has two modes:
- Collection mode (default): actively probe all sites (optionally via per-IP task fanout) and append new lines, then analyze.
- Analyze-only: read existing `monitor_results.jsonl` batches, print summaries / deltas, emit alert JSON. Enable with `--analyze-only=true`.
Recent changes:
- Per-IP-family aggregation: batch JSON summaries now include `ipv4` / `ipv6` objects with the same metrics (averages & rates) restricted to that family. Console lines append succinct `v4(...)` / `v6(...)` segments when present.
- High-resolution speed fallback: ultra-fast tiny transfers (<1 ms) now compute non-zero speeds (previously rounded to 0 when the duration in whole ms was 0).
- Expanded proxy / CDN / enterprise security proxy detection: new fields `proxy_name`, `proxy_source`, `proxy_indicators` classify common CDNs (cloudfront, fastly, akamai, cloudflare, azurecdn, cachefly, google) plus enterprise proxies (zscaler, bluecoat, netskope, paloalto, forcepoint, varnish, squid) and record the raw header tokens used for detection.
- Environment proxy awareness & usage tracking: fields `env_proxy_url`, `env_proxy_bypassed`, `using_env_proxy` indicate proxy configuration and whether a proxy transport path was actually used; `proxy_remote_ip` / `proxy_remote_is_proxy` record the proxy endpoint address.
- Origin IP candidate: `origin_ip_candidate` provides a best-effort origin server IP (direct connections: equals `remote_ip`; proxied: first DNS IP) to retain geographic/ASN context when tunneling through a proxy.
- TLS certificate heuristics: `tls_cert_subject`, `tls_cert_issuer` are captured for corporate proxy MITM detection (zscaler, bluecoat, netskope, paloalto, forcepoint, etc.).
- Batch proxy aggregation metrics: analysis now reports `env_proxy_usage_rate_pct`, `classified_proxy_rate_pct`, and per-proxy `proxy_name_counts` & `proxy_name_rate_pct` maps; the console adds `env_proxy=` and `proxy_classified=` percentages plus top proxy names for a quick scan.
- Overall multi-batch aggregation line: when more than one batch is analyzed, an additional `overall across <N> batches` line is printed, showing line-weighted aggregate metrics (including IPv4/IPv6 splits) across all displayed batches. Individual batch lines are now marked with `(per-batch)` for clarity.
- Console output units: analysis lines now include explicit units for clarity (kbps, ms, B, kbps/s).
- Calibration (default‑on in collection mode): on startup the monitor performs a local loopback speed calibration and embeds the results into metadata. Targets default to 10/30 per decade up to the measured local max when not specified. The CLI prints a tolerance summary ("within N%: X/Y") and logs "[calibration] samples per target: [..]" to show how many read iterations contributed per target. See also `README_analysis.md` (calibration fields) and `README_iqmviewer.md` (Diagnostics view).
First time (no results yet), you likely want to collect data (this is now the default mode):
go run ./src/main.go --analyze-only=false --iterations 1 --parallel 2
Subsequent (analyze latest 10 batches only):
go run ./src/main.go
Collect 3 new batches (interleaving analysis after each) with 5 workers:
go run ./src/main.go --analyze-only=false --parallel 5 --iterations 3
Specify a custom output file for results & then analyze it:
go run ./src/main.go --analyze-only=false --out results_$(date +%Y%m%d).jsonl
go run ./src/main.go --out results_$(date +%Y%m%d).jsonl # analyze only
Pretty-print last result line:
tail -n 1 monitor_results.jsonl | jq '.'
What you get per site (JSON line):
- Phase timings: DNS / TCP / TLS / TTFB / connect
- Throughput stats: p50/p90/p95/p99, stddev, CoV, jitter, slope, plateaus
- Patterns + human-readable insights
- Cache/proxy indicators (Age, X-Cache, Via, IP mismatch, warm HEAD, range GET)
- Geo & ASN enrichment
- First RTT goodput
Key object: `speed_analysis` (stats, patterns, plateaus, insights)
The monitor exits after the configured iterations; results are appended continuously to the output JSONL (default `monitor_results.jsonl`) via an async single-writer queue.
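A minimal sketch of that single-writer queue pattern (illustrative only; the type and function names here are invented, not the project's actual implementation):

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

// resultWriter owns the output file; workers only send records on ch.
type resultWriter struct {
	ch   chan any
	done chan struct{}
}

func newResultWriter(path string) (*resultWriter, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	w := &resultWriter{ch: make(chan any, 64), done: make(chan struct{})}
	go func() { // the single writer goroutine: nothing else touches f
		defer close(w.done)
		defer f.Close()
		enc := json.NewEncoder(f) // Encode appends "\n": one JSON object per line (JSONL)
		for rec := range w.ch {
			if err := enc.Encode(rec); err != nil {
				log.Printf("write error: %v", err)
			}
		}
	}()
	return w, nil
}

func (w *resultWriter) Enqueue(rec any) { w.ch <- rec }

func (w *resultWriter) Close() { close(w.ch); <-w.done }

func main() {
	w, err := newResultWriter("monitor_results.jsonl")
	if err != nil {
		log.Fatal(err)
	}
	w.Enqueue(map[string]any{"demo": true}) // stands in for a full {meta, site_result} record
	w.Close()
}
```

Because only one goroutine ever holds the file handle, appended lines never interleave even with many concurrent workers.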
Prerequisite: Go 1.22 or newer (see `go.mod`). If `go version` fails or shows an older version, install/update Go using one of the options below.
macOS (Homebrew):
brew update
brew install go
go version
Ubuntu/Debian:
- Easiest (Snap, usually latest stable):
sudo snap install go --classic
go version
- APT (may be older on some releases):
sudo apt-get update
sudo apt-get install -y golang-go
go version
- Official tarball (recommended if you need the very latest):
- Find the latest version at https://go.dev/dl/
- Download the tarball matching your CPU/arch (example below uses amd64 and 1.22.x):
cd /tmp
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz # replace with the latest
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
echo 'export PATH=/usr/local/go/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
go version
Windows:
- Using winget (Windows 10/11):
winget install GoLang.Go
go version
- Using Chocolatey:
choco install golang -y
go version
- Or download the MSI installer from https://go.dev/dl/ and follow the wizard (adds Go to PATH). Reopen your terminal and run `go version` to verify.
Quick repo smoke test after installing Go:
go build ./...
go test ./...
If both commands succeed, you’re ready to run the monitor.
The repository splits tests into fast pure units and opt-in heavier suites using Go build tags:
Tag | Included Tests | Excluded By Default | Notes |
---|---|---|---|
(none) | Pure helper / analysis unit tests (including uihelpers sizing logic) | yes (crosshair, integration) | Fast CI signal |
`crosshair` | Crosshair interaction & coordinate mapping tests in the viewer | no | Pure math / rendering logic, no file I/O |
`integration` | Screenshot rendering, command builder tests (network command strings), PNG width determinism | no | Generates temp JSONL + images |
Examples:
# Fast default
go test ./cmd/iqmviewer/... -count=1
# Add crosshair logic
go test -tags=crosshair ./cmd/iqmviewer -run TestIndexModeMapping_CentersAndSelection -count=1
# Integration (screenshots, command builders)
go test -tags=integration ./cmd/iqmviewer -run TestScreenshotWidths_ -count=1
# Full viewer suite
go test -tags='crosshair integration' ./cmd/iqmviewer/... -count=1
Helper refactor (viewer): `ComputeChartDimensions` and `ComputeTableColumnWidths` moved to `cmd/iqmviewer/uihelpers` so their tests run without spinning up GUI code.
Fedora:
sudo dnf install -y golang
go version
Arch Linux:
sudo pacman -Syu --needed go
go version
Nix / NixOS:
- One-off shell with Go available:
nix-shell -p go
# inside the shell
go version
- Or install into your user profile (persistent):
nix profile install nixpkgs#go
go version
Core timing & network phases:
- DNS resolution time, TCP connect time, TLS handshake time
- HTTP connect + granular phase timings via httptrace (DNS, connect, TLS, time-to-conn, TTFB)
Transfer performance:
- Full transfer time, size, instantaneous sampled speeds (100ms interval)
- Statistical speed analysis: min / max / average / stddev / coefficient of variation
- Percentiles: p50, p90, p95, p99
- Linear regression slope (kbps/s) to detect ramp-up or decline
- Jitter (mean absolute percent change between consecutive samples)
- Pattern heuristics (slow start, mid-transfer dip, gradual increase/decline, volatility, plateau etc.)
- Plateau detection (segments within ±10% of median throughput) with count, longest duration, stability flag
- First RTT goodput estimation (bytes delivered within 1 RTT and derived kbps)
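To make the jitter, coefficient-of-variation, and plateau definitions above concrete, here is a small self-contained sketch (not the project's code; the sample values are invented):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// jitterMeanAbsPct: mean absolute percent change between consecutive speed samples.
func jitterMeanAbsPct(speeds []float64) float64 {
	if len(speeds) < 2 {
		return 0
	}
	var sum float64
	for i := 1; i < len(speeds); i++ {
		if prev := speeds[i-1]; prev > 0 {
			sum += math.Abs(speeds[i]-prev) / prev * 100
		}
	}
	return sum / float64(len(speeds)-1)
}

// coefVariation: stddev divided by mean of the sampled speeds.
func coefVariation(speeds []float64) float64 {
	if len(speeds) == 0 {
		return 0
	}
	var mean float64
	for _, s := range speeds {
		mean += s
	}
	mean /= float64(len(speeds))
	if mean == 0 {
		return 0
	}
	var variance float64
	for _, s := range speeds {
		variance += (s - mean) * (s - mean)
	}
	return math.Sqrt(variance/float64(len(speeds))) / mean
}

// plateauFlags marks samples lying within ±10% of the median throughput.
func plateauFlags(speeds []float64) []bool {
	if len(speeds) == 0 {
		return nil
	}
	sorted := append([]float64(nil), speeds...)
	sort.Float64s(sorted)
	median := sorted[len(sorted)/2]
	flags := make([]bool, len(speeds))
	for i, s := range speeds {
		flags[i] = math.Abs(s-median) <= 0.10*median
	}
	return flags
}

func main() {
	speeds := []float64{1200, 1180, 1220, 900, 1210, 1195} // kbps samples ~100 ms apart
	fmt.Printf("jitter=%.1f%% cov=%.2f plateau=%v\n",
		jitterMeanAbsPct(speeds), coefVariation(speeds), plateauFlags(speeds))
}
```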
Caching / proxy / CDN heuristics:
- Detects cache indicators (Age header, X-Cache HIT, Via)
- Prefetch suspicion (GET much faster than HEAD)
- IP mismatch (resolved DNS IPs vs remote connection IP)
- Secondary Range GET timing + header inspection
- Warm HEAD probe (connection reuse / cache speedup)
- Probe header echo detection
Geolocation & ASN enrichment:
- Country verification (configured vs GeoIP detected) using GeoLite2 then legacy GeoIP fallback
- ASN number & organization lookup
Output enrichment:
- Human-readable insights summarizing stability, variability, patterns, plateaus, slope, jitter
- All results appended as JSON Lines (JSONL) for easy streaming / ingestion
Operational conveniences:
- JSONC sites configuration (comments allowed)
- Parallel monitoring with bounded concurrency (--parallel)
- Asynchronous single-writer result queue (low contention, durable append)
- Multiple iterations (--iterations)
The monitor emits concise progress and stall logs during transfers. When the target size is known (via Content-Length or Content-Range), logs include current/target bytes and percentage; otherwise they show current/unknown.
- Periodic progress (primary GET):
- Debug: "[site ip] transfer progress current/target bytes (X.Y%)" or "current/unknown"
- Info: "[site ip] progress current/target bytes (MB, X.Y%)" or "current/unknown (MB)"
- Frequency: ~3s at debug; ~10s at info level.
- Watchdog (no recent progress):
- Known size: "watchdog: no progress for 10s (current/target bytes, X.Y%)"
- Unknown size: "watchdog: no progress for 10s (current/unknown bytes)"
- Stall abort (past stall-timeout without reads):
- Known size: "transfer stalled for 20s, aborting (current/target bytes, X.Y%)"
- Unknown size: "transfer stalled for 20s, aborting (current/unknown bytes)"
- Secondary Range GET progress:
- Uses Content-Range to determine the expected bytes (fallback to Content-Length if present).
- Logs similar progress lines: "range progress current/target bytes (X.Y%)" or "current/unknown".
Tip: Set `--log-level=debug` to see more frequent progress updates.
- Add sites to `sites.jsonc`
- Run the monitor from your local PC or via SSH
Windows users
- If you have Git Bash or WSL installed, you can run the existing `.sh` scripts directly.
- Alternatively, use the provided PowerShell runner `monitor_core.ps1`, which mirrors the `.sh` behavior (collection + analysis steps):
# PowerShell
./monitor_core.ps1 -Situation "Home_CorporateLaptop_CorpProxy_SequencedTest" -Parallel 2 -Iterations 1 -AnalysisBatches 15
# Or via env vars
$env:SITUATION = "Home_WiFi"; $env:PARALLEL = "2"; ./monitor_core.ps1
PowerShell prerequisites: Go on PATH; PowerShell 7+ recommended. The Go binary itself remains cross‑platform.
PowerShell scenario wrappers (Windows)
- For each Bash wrapper listed below, a PowerShell equivalent exists with the same basename and a `.ps1` extension.
- They call `monitor_core.ps1` and behave identically: a Monitor (collection) phase followed by an Analysis phase for the same `SITUATION`.
Examples:
# Run the corporate-at-home scenario wrapper (PowerShell)
./monitor_atHomeCorpLaptop.ps1
# Increase the analysis window to 25 batches for a one-off run
$env:ANALYSIS_BATCHES = "25"; ./monitor_atHomeCorpLaptop.ps1
Available curated PowerShell wrappers:
Script | Intended Situation | Proxy State | Concurrency Mode | Default PARALLEL | Default ITERATIONS | Notes |
---|---|---|---|---|---|---|
`monitor_atHomeCorpLaptop.ps1` | Corporate laptop at home | Corporate proxy enabled | Sequential | 1 | 1 | Baseline for controlled home + corp proxy path |
`monitor_atHomeCorpLaptopNoCorpProxy.ps1` | Corporate laptop at home | Corporate proxy disabled | Sequential | 1 | 1 | Contrast vs proxy-enabled to isolate proxy impact |
`monitor_atOfficeCorpLaptop.ps1` | Corporate laptop in office | Corporate proxy enabled | Sequential | 1 | 1 | On‑prem baseline |
`monitor_atOfficeCorpLaptopNoCorpProxy.ps1` | Corporate laptop in office | Proxy disabled | Sequential | 1 | 1 | Office direct path comparison |
`monitor_atHomePrivateLaptop.ps1` | Private (non‑corp) laptop at home | No corporate proxy | Sequential | 1 | 1 | Clean consumer environment |
Tip: Copy any of the above to a new filename (e.g., `monitor_HotelWifi.ps1`) and adjust the `Situation` default at the top to create your own repeatable scenario.
For consistent, comparable batches, use the provided wrapper scripts instead of manually exporting environment variables each time. Each wrapper sets a descriptive `SITUATION` label and then execs `monitor_core.sh`, which performs the collection using a single persistent results file (`monitor_results.jsonl` by default). All situations append to the same file; the `meta.situation` field lets you later filter / compare.
Available curated wrappers:
Script | Intended Situation | Proxy State | Concurrency Mode | Default PARALLEL | Default ITERATIONS | Notes |
---|---|---|---|---|---|---|
`monitor_atHomeCorpLaptop.sh` | Corporate laptop at home | Corporate proxy enabled | Sequential | 1 | 1 | Baseline for controlled home + corp proxy path |
`monitor_atHomeCorpLaptopNoCorpProxy.sh` | Corporate laptop at home | Corporate proxy disabled | Sequential | 1 | 1 | Contrast vs proxy-enabled to isolate proxy impact |
`monitor_atOfficeCorpLaptop.sh` | Corporate laptop in office | Corporate proxy enabled | Sequential | 1 | 1 | On‑prem baseline |
`monitor_atOfficeCorpLaptopNoCorpProxy.sh` | Corporate laptop in office | Proxy disabled | Sequential | 1 | 1 | Office direct path comparison |
`monitor_atHomePrivateLaptop.sh` | Private (non‑corp) laptop at home | No corporate proxy | Sequential | 1 | 1 | Clean consumer environment |
`monitor_bulk_example.sh` | Private laptop stress / capacity test | No corporate proxy | Parallel | 8 | 1 | Higher load; watch impact on local link |
Template / examples to clone & adapt: `monitor_scenario7.sh`, `monitor_scenario8.sh`, `monitor_scenario9.sh` – kept intentionally simple so you can copy, rename, and set a custom `SITUATION` (e.g. `Hotel_WiFi_VPNOn`, `MobileHotspot`, `Home_ISP_Failover`). Create as many as you need; all improvements in `monitor_core.sh` automatically apply.
Quick start with a wrapper:
./monitor_atHomeCorpLaptop.sh
./monitor_atHomeCorpLaptopNoCorpProxy.sh
./monitor_bulk_example.sh # high parallel example
What happens when you run a wrapper
- The core runner (Bash: `monitor_core.sh`, PowerShell: `monitor_core.ps1`) performs two phases:
  - Monitor/collection phase: collect a fresh batch and append to the results file
  - Analysis phase: summarize the most recent batches for the same `SITUATION`
- After a successful monitor phase, the analysis runs automatically and prints per‑batch lines and an overall aggregate across your most recent batches.
- Control the analysis window using the environment variable `ANALYSIS_BATCHES` (default: 15). Examples:
ANALYSIS_BATCHES=25 ./monitor_atHomeCorpLaptop.sh
$env:ANALYSIS_BATCHES = "25"; ./monitor_atHomeCorpLaptop.ps1
The monitor banner prints the selected parameters (parallel, iterations, situation, log level, sites, output path, analysis batches) so runs are self-describing in logs.
TTFB (Avg) and TTFB Percentiles (ms)
- TTFB is the time to first byte for a request. Avg TTFB shows the mean per batch (Overall/IPv4/IPv6).
- TTFB Percentiles show P50/P90/P95/P99 per batch. Expect P99 ≥ P95 ≥ P90 ≥ P50. Larger gaps mean heavier tail latency (e.g., slow DNS/TLS, cold caches, backend spikes).
Speed Percentiles
- P50/P90/P95/P99 of transfer speed. Wide gaps indicate jitter or unstable throughput.
Error Rate, Jitter, CoV
- Error Rate: percentage of requests with errors.
- Jitter (%): mean absolute relative change between consecutive sample speeds in a transfer; higher = more erratic.
- Coefficient of Variation (%): stddev/mean of speeds; higher = less consistent.
Plateau metrics
- Plateau Count: number of intra-transfer stable segments.
- Longest Plateau (ms): the longest such segment; long values can indicate stalls.
- Plateau Stable Rate (%): fraction of time spent in stable plateaus (smoother throughput).
Cache/Proxy signals
- Cache Hit Rate (%): fraction likely served from caches.
- Proxy Suspected Rate (%): requests likely traversing proxies.
- Warm Cache Suspected Rate (%): requests likely benefiting from warm caches.
Diagnostics (tail, deltas, and SLA)
- Tail Heaviness (P99/P50 Speed): Ratio of P99 to P50 throughput. Values > 2 suggest a heavy tail (a wide gap between peak and median throughput). Useful to spot jittery or bursty networks.
- TTFB Tail Heaviness (P95/P50): Ratio of tail to median latency per batch. Values close to 1.0 indicate a tight latency distribution; higher values mean heavier tail latency and more outliers/spikes.
- Family Delta – Speed (IPv6−IPv4): Difference in average transfer speed between IPv6 and IPv4. Positive means IPv6 faster; negative means IPv4 faster.
- Family Delta – TTFB (IPv4−IPv6): Difference in average TTFB between IPv4 and IPv6. Positive means IPv4 slower (higher latency); negative means IPv6 slower.
- Family Delta – Speed % (IPv6 vs IPv4): Percent difference vs IPv4 baseline: (IPv6−IPv4)/IPv4 × 100%. Positive means IPv6 faster.
- Family Delta – TTFB % (IPv6 vs IPv4): Percent difference vs IPv6 baseline: (IPv4−IPv6)/IPv6 × 100%. Positive means IPv6 lower latency.
- SLA Compliance – Speed: Estimated percent of requests that meet or exceed the P50 speed target. Approximated from percentiles; higher is better. Threshold is user‑configurable.
- SLA Compliance – TTFB: Estimated percent of requests that meet or beat the P95 TTFB target (i.e., P95 ≤ threshold). Approximated from percentiles; higher is better. Threshold is user‑configurable.
- SLA Compliance Delta – Speed (pp): IPv6 compliance minus IPv4 compliance, in percentage points. Positive means IPv6 meets the speed SLA more often.
- SLA Compliance Delta – TTFB (pp): IPv6 compliance minus IPv4 compliance, in percentage points. Positive means IPv6 meets the TTFB SLA more often.
- TTFB P95−P50 Gap: Difference between tail and median latency (P95 minus P50) per batch. Larger gaps indicate heavier tail latency and more outliers/spikes.
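A small worked example of the delta and tail-heaviness formulas above, using invented numbers only:

```go
package main

import "fmt"

func main() {
	// Illustrative per-batch averages; none of these are real measurements.
	v4Speed, v6Speed := 19000.0, 17000.0 // avg kbps
	v4TTFB, v6TTFB := 205.0, 225.0       // avg ms
	p50Speed, p99Speed := 18000.0, 40000.0
	p50TTFB, p95TTFB := 180.0, 420.0

	fmt.Printf("speed delta: %.0f kbps (%.1f%% vs IPv4 baseline)\n",
		v6Speed-v4Speed, (v6Speed-v4Speed)/v4Speed*100) // negative => IPv4 faster
	fmt.Printf("TTFB delta: %.0f ms (%.1f%% vs IPv6 baseline)\n",
		v4TTFB-v6TTFB, (v4TTFB-v6TTFB)/v6TTFB*100) // negative => IPv6 slower
	fmt.Printf("speed tail heaviness p99/p50: %.2f\n", p99Speed/p50Speed) // > 2 => heavy tail
	fmt.Printf("TTFB tail heaviness p95/p50: %.2f (gap %.0f ms)\n",
		p95TTFB/p50TTFB, p95TTFB-p50TTFB)
}
```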
TTFB Tail Heaviness (P95/P50)
- What it shows: The ratio of P95 TTFB to P50 TTFB per batch (Overall/IPv4/IPv6).
- How to read: ~1.0 means the tail is close to median (tight distribution). Higher ratios mean heavier tails and greater latency variability.
- Why it matters: Complements the absolute P95−P50 Gap by giving a dimensionless view that is comparable across sites and situations.
- Situation filter: Use the Situation dropdown in the toolbar to scope charts and the table to a single scenario or All.
- Titles: Chart titles are kept clean and do not include the situation label.
- Watermark: The active Situation is shown in a subtle bottom-right on-image watermark (e.g., "Situation: Home_WiFi"). This watermark is embedded into exported PNGs so context is preserved when sharing.
- X-Axis modes: Batch, RunTag, or Time (Settings → X‑Axis). Y-Scale: Absolute (zero baseline) or Relative (nice bounds) (Settings → Y‑Scale).
- SLA thresholds (configurable): Settings → “SLA Thresholds…” lets you set corporate‑friendly defaults or your own targets:
- SLA P50 Speed (kbps): default 10,000 kbps (≈10 Mbps). Used by the SLA Compliance – Speed chart and hover labels.
- SLA P95 TTFB (ms): default 200 ms. Used by the SLA Compliance – TTFB chart and hover labels. Values are persisted across sessions. Chart titles reflect the current thresholds and units.
- Exports: Individual export items mirror the on‑screen order. "Export All (One Image)" stitches charts together in the same order shown in the UI, maintaining the watermark per chart.
Rolling overlays (Avg Speed & Avg TTFB)
- Rolling window N: default 7 (persisted). Change via Settings → “Rolling Window…”. Smooths out short-term noise.
- Rolling Mean (μ): toggle to show a smoothed trend line across the last N batches.
- ±1σ Band: independent toggle to show a translucent band between μ−σ and μ+σ. Legend shows a single entry “Rolling μ±1σ (N)” per chart when enabled.
- Help dialogs on the Speed/TTFB charts include a short hint on what the band means and how changing N affects smoothing and band width.
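A minimal sketch of the rolling μ/σ calculation behind these overlays (illustrative only; the viewer's actual implementation may differ):

```go
package main

import (
	"fmt"
	"math"
)

// rollingMeanStd returns μ and σ over the trailing window ending at index i.
func rollingMeanStd(values []float64, i, window int) (mu, sigma float64) {
	start := i - window + 1
	if start < 0 {
		start = 0
	}
	win := values[start : i+1]
	for _, v := range win {
		mu += v
	}
	mu /= float64(len(win))
	for _, v := range win {
		sigma += (v - mu) * (v - mu)
	}
	sigma = math.Sqrt(sigma / float64(len(win)))
	return mu, sigma
}

func main() {
	avgSpeed := []float64{18200, 18500, 17900, 18800, 16500, 19000, 18700, 18300} // per-batch avg kbps
	const window = 7                                                              // viewer default
	for i := range avgSpeed {
		mu, sigma := rollingMeanStd(avgSpeed, i, window)
		fmt.Printf("batch %d: mu=%.0f band=[%.0f, %.0f]\n", i, mu, mu-sigma, mu+sigma)
	}
}
```

A larger window smooths the μ line more aggressively and usually widens the band, since more batch-to-batch variation falls inside each window.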
Stability & quality (viewer)
- Low‑Speed Time Share (%): time spent below the Low‑Speed Threshold; highlights choppiness even when averages look fine.
- Stall Rate (%): share of requests that stalled; good as a reliability/quality symptom.
- Avg Stall Time (ms): average stalled time for stalled requests; higher means longer buffering events.
- Stalled Requests Count: quick absolute count derived as round(Lines × Stall Rate%). Has a dedicated export option.
Low‑Speed Threshold control
- Settings → “Low‑Speed Threshold…” sets the cutoff for Low‑Speed Time Share. Default 1000 kbps; persisted. Changing it re‑analyzes data on the fly. Stall metrics are independent of this threshold.
See also:
- `README_iqmviewer.md` for a compact, feature‑focused viewer overview and troubleshooting.
- `README_analysis.md` for analysis thresholds, options, and derived metrics reference.
Quick screenshots from the viewer and additional visuals are available; see below for how to regenerate them.
Micro‑stalls are short pauses during transfer where no bytes are received for a brief period, but the request later continues and completes (distinct from hard stalls that hit the stall‑timeout and abort).
- Default threshold: a micro‑stall is detected when the cumulative bytes do not increase for ≥500 ms between speed samples.
- Metrics produced by analysis:
- Transient Stall Rate (%): share of requests with ≥1 micro‑stall.
- Avg Transient Stall Time (ms): average total paused time per affected request.
- Avg Transient Stall Count: average number of micro‑stall events per affected request.
These metrics surface in the analysis summaries and are visualized in the viewer (with dedicated exports and inclusion in combined/exported screenshots).
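A sketch of the micro-stall heuristic, assuming samples shaped like `transfer_speed_samples` entries (`time_ms`, cumulative `bytes`); the accumulation details here are a simplified approximation, not the analyzer's exact code:

```go
package main

import "fmt"

type sample struct {
	TimeMs int64 // offset of the sample within the transfer
	Bytes  int64 // cumulative bytes received so far
}

// microStalls accumulates runs where cumulative bytes do not grow between samples
// and counts a run as a micro-stall once it reaches thresholdMs.
func microStalls(samples []sample, thresholdMs int64) (count int, totalMs int64) {
	var run int64
	flush := func() {
		if run >= thresholdMs {
			count++
			totalMs += run
		}
		run = 0
	}
	for i := 1; i < len(samples); i++ {
		if samples[i].Bytes == samples[i-1].Bytes {
			run += samples[i].TimeMs - samples[i-1].TimeMs
		} else {
			flush()
		}
	}
	flush()
	return count, totalMs
}

func main() {
	s := []sample{{0, 0}, {100, 40000}, {200, 40000}, {800, 40000}, {900, 120000}}
	n, ms := microStalls(s, 500) // 500 ms default threshold described above
	fmt.Printf("micro-stalls=%d paused=%dms\n", n, ms)
}
```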
Use the helper script to regenerate images from your latest results:
./update_screenshots.sh [RESULTS_FILE] [SITUATION] [THEME] [VARIANTS] [BATCHES] [LOW_SPEED_KBPS] [DNS_LEGACY]
Defaults: `RESULTS_FILE=monitor_results.jsonl`, `SITUATION=All`, `THEME=auto`, `VARIANTS=averages`, `BATCHES=50`, `LOW_SPEED_KBPS=1000`, `DNS_LEGACY=false`.
Examples:
- Auto theme (system), include action variants, last 50 batches:
./update_screenshots.sh monitor_results.jsonl All auto averages 50 1000
- Force light theme, disable action variants, use 25 batches and a 1500 kbps low-speed threshold:
./update_screenshots.sh monitor_results.jsonl Home_WiFi light none 25 1500
- Include dashed legacy DNS overlay in DNS chart screenshots (7th arg):
./update_screenshots.sh monitor_results.jsonl All auto averages 50 1000 true
Or run headless directly via the viewer:
go build ./cmd/iqmviewer
./iqmviewer -file monitor_results.jsonl \
--screenshot \
--screenshot-outdir docs/images \
--screenshot-situation All \
--screenshot-batches 50 \
--screenshot-rolling-window 7 \
--screenshot-dns-legacy true \
--screenshot-rolling-band \
--screenshot-theme auto \
--screenshot-variants averages \
--screenshot-low-speed-threshold-kbps 1000
Setup timing screenshots (written to `docs/images` when enabled):
- `dns_lookup_time.png`
- `tcp_connect_time.png`
- `tls_handshake_time.png`
Analyzing only recent batches for a specific situation:
go run ./src/main.go --out monitor_results.jsonl --analysis-batches 5 --situation Home_CorporateLaptop_CorpProxy_SequencedTest
Ad‑hoc custom run without a wrapper (still allowed):
SITUATION="Hotel_WiFi" PARALLEL=2 ITERATIONS=2 ./monitor_core.sh
Filtering results by situation with jq:
grep '"Home_CorporateLaptop_NoCorpProxy_SequentialTest"' monitor_results.jsonl | tail -n 3 | jq '.site_result.name, .site_result.transfer_speed_kbps'
Comparing two situations’ average speeds (last line of each situation):
last_home_proxy=$(grep '"Home_CorporateLaptop_CorpProxy_SequencedTest"' monitor_results.jsonl | tail -n 1)
last_home_noproxy=$(grep '"Home_CorporateLaptop_NoCorpProxy_SequentialTest"' monitor_results.jsonl | tail -n 1)
echo "$last_home_proxy" | jq '.meta.situation, .site_result.transfer_speed_kbps'
echo "---"
echo "$last_home_noproxy" | jq '.meta.situation, .site_result.transfer_speed_kbps'
- The built-in analysis phase groups and aggregates batches by the exact `SITUATION` label. Batches with different labels are not co-aggregated nor compared programmatically.
- Cross-situation comparison (e.g., corporate laptop vs private laptop) is not implemented and may never be. Workaround: compare the printed analysis outputs manually and keep test conditions identical.
- Tips for fair comparisons:
  - Use the same `SITES`, `PARALLEL`, `ITERATIONS`, and `ANALYSIS_BATCHES` for both scenarios.
  - Run each scenario wrapper back-to-back so the time window is similar, then compare the per-batch and `overall across` lines.
  - Optionally export results to separate files via different `OUT_BASENAME` values for side-by-side inspection.
Field note (anecdotal): In one setup, a 15-year-old ThinkPad T430 (Linux, i7‑3840QM) measured significantly better throughput and TTFB than a modern Mac M2 using a corporate proxy—about 3x in sequential and parallel (8x) modes. If proxying is part of your environment, it’s worth investigating its impact on speed and TTFB.
Guidelines for creating a new wrapper:
- Copy one of the `monitor_scenario*.sh` files to a meaningful name, e.g. `monitor_hotelWifi.sh`.
- Edit the banner comment to describe location/device/proxy state.
- Set a stable `SITUATION` string (avoid spaces; use underscores for parsing ease).
- Optionally adjust `PARALLEL`, `ITERATIONS`, or `OUT_BASENAME` (if you want to isolate output).
- Run it and later analyze by passing `--situation <label>` to the analyzer.
Rationale: Re-using identical labels provides clean before/after and environment-to-environment comparisons and enables simple grep/jq pipelines without extra bookkeeping.
From repository root (module directory):
go run ./src/main.go [--analyze-only=(true|false)] [--analysis-batches N] \
[--sites <path>] [--iterations N] [--parallel M] [--out <file>] [--log-level level]
Flags:
- `--analyze-only` (bool, default `false`): When true, no new measurements are collected; existing result batches are summarized & compared.
- `--analysis-batches` (int, default `10`): Maximum recent batches to parse when analyzing only (caps work; older batches ignored beyond this window).
- `--sites` (string, default `./sites.jsonc` when collecting): Path to JSONC site list (ignored in analyze-only mode).
- `--iterations` (int, default `1`): Sequential passes over the site list (collection mode only).
- `--parallel` (int, default `1`): Maximum concurrent site monitors (collection mode only).
- `--out` (string, default `monitor_results.jsonl`): Output JSON Lines file (each line = root object `{meta, site_result}`). Both modes read this path.
- `--http-timeout` (duration, default `120s`): Overall timeout per individual HTTP request (HEAD / GET / range / warm HEAD) including body transfer.
- `--stall-timeout` (duration, default `20s`): Abort an in-progress body transfer if no additional bytes arrive within this window (marks the line with `transfer_stalled`).
- `--site-timeout` (duration, default `120s`): Overall budget per site (sequential mode) or per (site, IP) task (fanout) including DNS and all probes; aborts remaining steps if exceeded.
- `--dns-timeout` (duration, default `5s`): Default DNS lookup timeout used when `--site-timeout` is 0. In IP fanout, each pre-resolve uses `min(--site-timeout, --dns-timeout)`.
Additional flags:
- `--pre-ttfb-stall`: Enables an optional pre‑TTFB stall watchdog for the primary GET. If no first byte arrives within `--stall-timeout`, the request is canceled early and the line records `http_error = "stall_pre_ttfb"`. Default is disabled to preserve historical behavior.
- `--max-ips-per-site` (int, default `0` = unlimited): Limit probed IPs per site (first IPv4 + first IPv6 typical when set to 2) to prevent long multi-IP sites monopolizing workers.
- `--ip-fanout` (bool, default `true`): Pre-resolve all sites, build one task per selected IP, shuffle for fairness, then process concurrently. Disable with `--ip-fanout=false` to use classic per-site sequential IP iteration.
- Progress logging controls (collection mode):
  - `--progress-interval` (duration, default `5s`): Emit periodic worker status (0 disables).
  - `--progress-sites` (bool, default `true`): Show active site/IP labels in progress lines.
  - `--progress-resolve-ip` (bool, default `true`): In non-fanout mode, attempt short-timeout DNS to display the first 1–2 IPs inline.
Notes:
- DNS lookups in the monitor are always context-aware. When `--site-timeout` is set, DNS is bounded by that value; otherwise it uses `--dns-timeout`.
- Progress inline IP resolution uses a fixed 1s DNS deadline to avoid blocking the progress logger.
- `--situation` (string, default `Unknown`): Arbitrary label describing the current network context (e.g. `Home`, `Office`, `VPN`, `Hotel`). Stored in each result's `meta.situation` to segment and compare batches later.
- Alert thresholds (percentages unless noted) to emit `[alert ...]` lines comparing the newest batch vs the aggregate of prior batches:
  - `--speed-drop-alert` (default `30`): Trigger if average speed decreased by at least this percent.
  - `--ttfb-increase-alert` (default `50`): Trigger if average TTFB increased by at least this percent.
  - `--error-rate-alert` (default `20`): Trigger if error lines (tcp/http) exceed this percentage of batch lines.
  - `--jitter-alert` (default `25`): Trigger if average jitter (mean absolute % change) exceeds this percent.
  - `--p99p50-ratio-alert` (default `2.0`): Trigger if the p99/p50 throughput ratio equals or exceeds this value.
- `--alerts-json <path>`: If set, writes a structured JSON alert report summarizing the latest batch, deltas, thresholds, and triggered alerts.
Examples:
# Single pass, default sites file
go run ./src/main.go
# Three passes over custom file
go run ./src/main.go --sites ./sites.jsonc --iterations 3
# Parallel monitoring with 5 workers writing to a custom file
go run ./src/main.go --parallel 5 --out fast_run.jsonl
# Disable IP fanout (process each site's IPs sequentially inside its worker slot)
go run ./src/main.go --ip-fanout=false --parallel 4
# Verbose run (detailed phase logging + progress + queue counters every 3s)
go run ./src/main.go --parallel 2 --log-level debug --progress-interval 3s
# Collect then write JSON alert report explicitly named
go run ./src/main.go --analyze-only=false --parallel 4 --iterations 2 \
--speed-drop-alert 25 --ttfb-increase-alert 40 --error-rate-alert 10 \
--jitter-alert 20 --p99p50-ratio-alert 1.8 --alerts-json alerts_latest.json
# Inspect alert JSON
jq '.' alerts_latest.json
The program appends JSON objects to `monitor_results.jsonl` (opened once; writes via queue):
tail -f monitor_results.jsonl | jq '.site_result.name, .site_result.transfer_speed_kbps'
# Last record pretty
tail -n 1 monitor_results.jsonl | jq '.'
# Filter for a site and show latest
grep '"name":"Google US"' monitor_results.jsonl | tail -n 1 | jq '.'
Field groups:

Identity & config: `name`, `url`, `country_configured`, `country_geoip`

Geo / ASN: `asn_number`, `asn_org`

DNS / connection / TLS / httptrace: `dns_time_ms`, `dns_ips` (array of resolved IP strings), `tcp_time_ms`, `ssl_handshake_time_ms`, `http_connect_time_ms`, `trace_dns_ms`, `trace_connect_ms`, `trace_tls_ms`, `trace_time_to_conn_ms`, `trace_ttfb_ms`

Headers / proxy / cache signals: `header_via`, `header_x_cache`, `header_age`, `cache_present`, `proxy_suspected`, `prefetch_suspected`, `ip_mismatch`, `proxy_name`, `proxy_source`, `proxy_indicators` (classification + hints: via/x-cache/server or specialized headers like X-Zscaler-*), `env_proxy_url`, `env_proxy_bypassed`, `using_env_proxy`, `proxy_remote_ip`, `proxy_remote_is_proxy`, `origin_ip_candidate`, `tls_cert_subject`, `tls_cert_issuer`, `head_time_ms`, `head_status`, `head_error`, `head_get_time_ratio`, `second_get_status`, `second_get_time_ms`, `second_get_header_age`, `second_get_x_cache`, `second_get_content_range`, `second_get_cache_present`, `warm_head_time_ms`, `warm_head_speedup`, `warm_cache_suspected`

Probe / connection reuse: `probe_header_value`, `probe_echoed`, `dial_count`, `connection_reused_second_get`, `remote_ip`, `ip_family` (ipv4|ipv6), `ip_index` (order among selected IPs), `resolved_ip`

Transfer stats: `transfer_time_ms`, `transfer_size_bytes`, `transfer_speed_kbps`, `transfer_speed_samples` (array of `{time_ms, bytes, speed_kbps}`), `content_length_header`, `content_length_mismatch`, `first_rtt_bytes`, `first_rtt_goodput_kbps`, `transfer_stalled` (bool) & `stall_elapsed_ms` when a stall timeout aborts the body download

Speed analysis object (`speed_analysis`): `average_kbps`, `stddev_kbps`, `coef_variation`, `min_kbps`, `max_kbps`, `p50_kbps`, `p90_kbps`, `p95_kbps`, `p99_kbps`, `slope_kbps_per_sec`, `jitter_mean_abs_pct`, `patterns` (array of pattern strings), plateau metrics `plateau_count`, `longest_plateau_ms`, `plateau_stable`, `plateau_segments` (array of `{start_ms,end_ms,duration_ms,avg_kbps}`), `insights` (array of human-readable summary strings)

Errors: `dns_error`, `tcp_error`, `ssl_error`, `http_error`, `second_get_error` (present only on failures)
Key metrics guidance:
- Low `coef_variation` (< 0.15) + high `plateau_stable` suggests steady throughput.
- A large gap between `p99_kbps` and `p50_kbps` indicates bursty peaks.
- Positive `slope_kbps_per_sec` implies ramp-up (e.g. congestion window growth / server warmup); negative indicates throughput decay.
- `jitter_mean_abs_pct` > 0.15 (15%) indicates unstable delivery.
- Multiple long plateau segments with gaps may show adaptive bitrate shifts or shaping.

When `--alerts-json` is provided, a JSON file is written after analysis (analysis is now fully centralized in `src/analysis`).
If you do NOT provide `--alerts-json`:
- In analyze-only mode: an alert report for the LAST batch is written: `alerts_<last_run_tag>.json` (repo root).
- In collection mode: a per-iteration report is written: `alerts_<run_tag>.json` (repo root).

Default filename pattern: `alerts_<run_tag>.json`. Examples: `alerts_20250818_093154.json`, `alerts_20250818_093154_i2.json` (2nd iteration), or in analyze-only mode just the newest batch tag.

Files are placed at the repository root even if you run from `src/` (the code detects the `src` CWD and writes to the parent). Provide an explicit `--alerts-json` path if you want to control location/name or consolidate across iterations.
Example schema (multi-batch comparison):
{
"generated_at": "2025-08-18T12:34:56.123456Z",
"schema_version": 3,
"run_tag": "20250818_123456",
"batches_compared": 5,
"last_batch_summary": {
"lines": 42,
"avg_speed_kbps": 18500.4,
"median_speed_kbps": 18120.7,
"avg_ttfb_ms": 210.0,
"avg_bytes": 52428800,
"error_lines": 1,
"error_rate_pct": 2.38,
"first_rtt_goodput_kbps": 950.2,
"p50_kbps": 18200.3,
"p99_p50_ratio": 1.35,
"plateau_count": 2,
"longest_plateau_ms": 3400,
"jitter_mean_abs_pct": 8.4
},
"comparison": { // omitted and replaced by "single_batch": true when only one batch exists
"prev_avg_speed_kbps": 19200.1,
"prev_avg_ttfb_ms": 180.0,
"speed_delta_pct": -3.65,
"ttfb_delta_pct": 16.7
},
"alerts": [
"ttfb_increase 16.7% >= 15.0%"
],
"thresholds": {
"speed_drop_pct": 30,
"ttfb_increase_pct": 50,
"error_rate_pct": 20,
"jitter_pct": 25,
"p99_p50_ratio": 2
}
}
Single-batch schema (no comparison yet):
{ "generated_at": "2025-08-18T12:34:56.123456Z", "schema_version": 3, "run_tag": "20250818_123456", "batches_compared": 1, "last_batch_summary": { ... }, "single_batch": true, "alerts": [], "thresholds": { ... } }
If only one batch exists, `single_batch: true` is included and no deltas are computed. The `alerts` field is always an array (it may be empty `[]` when no thresholds are exceeded).
Implementation notes:
- Analysis & aggregation logic lives in `analysis.AnalyzeRecentResultsFull`, replacing older duplicated logic that previously lived inside `main.go`.
- Extended metrics (first RTT goodput, p50, p90, p95, p99, p99/p50 ratio, plateau stats, jitter, slope, coefficient of variation %, head/get ratio, cache & proxy related rates) are averaged or rate-derived per batch across site lines with data.

You can feed this JSON into automation or monitoring systems (e.g. ship it as an event, create Grafana annotations, or trigger CI gates). Combine with cron to run periodically and inspect the `alerts` array length for gating decisions.
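For example, a CI gate might parse the report and fail the build when any alert fired; this sketch reads only fields shown in the schema below and assumes the report was written as `alerts_latest.json`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// alertReport picks out only the fields this gate needs from the alert JSON.
type alertReport struct {
	RunTag string   `json:"run_tag"`
	Alerts []string `json:"alerts"`
}

func main() {
	data, err := os.ReadFile("alerts_latest.json") // path passed to --alerts-json
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(2)
	}
	var rep alertReport
	if err := json.Unmarshal(data, &rep); err != nil {
		fmt.Fprintln(os.Stderr, "parse:", err)
		os.Exit(2)
	}
	if len(rep.Alerts) > 0 {
		fmt.Printf("run %s: %d alert(s): %v\n", rep.RunTag, len(rep.Alerts), rep.Alerts)
		os.Exit(1) // non-zero exit fails the CI step
	}
	fmt.Printf("run %s: no alerts\n", rep.RunTag)
}
```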
Batch summary glossary:

For each batch the analyzer outputs a line with the following aggregated fields (JSON names in parentheses):

Core:
- Average speed (avg_speed_kbps) / Median speed (median_speed_kbps)
- Average TTFB ms (avg_ttfb_ms)
- Average transferred bytes (avg_bytes)
- Error line count (error_lines) – lines containing tcp_error or http_error
Latency & initial delivery:
- First RTT goodput kbps (avg_first_rtt_goodput_kbps)
- HEAD/GET time ratio (avg_head_get_time_ratio) – mean HEAD latency divided by initial GET latency
Throughput distribution:
- p50/p90/p95/p99 averages (avg_p50_kbps, avg_p90_kbps, avg_p95_kbps, avg_p99_kbps)
- p99/p50 ratio (avg_p99_p50_ratio) – burstiness indicator ( >2 often volatile )
Variability & dynamics:
- Jitter mean absolute % (avg_jitter_mean_abs_pct)
- Coefficient of variation % (avg_coef_variation_pct)
- Slope kbps/s (avg_slope_kbps_per_sec) – linear regression gradient over samples
Plateaus & stability:
- Plateau count (avg_plateau_count)
- Longest plateau ms (avg_longest_plateau_ms)
- Stable plateau rate % (plateau_stable_rate_pct)
Caching / proxy / reuse rates (% of lines where condition true):
- Cache hit rate (cache_hit_rate_pct)
- Proxy suspected rate (proxy_suspected_rate_pct)
- Deprecated in the Viewer UI: The legacy combined "Proxy Suspected Rate (%)" chart is hidden/removed from the Viewer. Use the split charts "Enterprise Proxy Rate (%)" and "Server-side Proxy Rate (%)" instead.
- Backward compatibility: The analysis still emits `proxy_suspected_rate_pct` for downstream consumers and older tooling.
- IP mismatch rate (ip_mismatch_rate_pct)
- Prefetch suspected rate (prefetch_suspected_rate_pct)
- Warm cache suspected rate (warm_cache_suspected_rate_pct)
- Connection reuse rate (conn_reuse_rate_pct)
Use these to correlate: e.g. a rise in `ip_mismatch_rate_pct` plus degraded `avg_speed_kbps` may indicate path changes; increasing `avg_head_get_time_ratio` with stable speed might highlight control plane latency growth.
Per-family batch objects (`ipv4`, `ipv6`): each, when present, mirrors the core metric names (e.g. `avg_speed_kbps`, `avg_ttfb_ms`, `avg_p50_kbps`, rates like `cache_hit_rate_pct`) restricted to that IP family. They are omitted if a family has zero lines in the batch. Alert JSON currently reports only the combined batch (family-level alerts are not yet emitted).
Protocol/TLS/encoding rollups (to help diagnose proxy/network path issues):
- http_protocol_counts: number of lines per HTTP protocol, e.g. HTTP/1.1 vs HTTP/2.0
- http_protocol_rate_pct: share for each HTTP protocol within the batch
- avg_speed_by_http_protocol_kbps: average transfer speed per HTTP protocol
- stall_rate_by_http_protocol_pct: stall rate for each HTTP protocol
- error_rate_by_http_protocol_pct: error rate for each HTTP protocol
- tls_version_counts / tls_version_rate_pct: counts and shares per TLS version (e.g., TLS1.2, TLS1.3)
- alpn_counts / alpn_rate_pct: counts and shares per negotiated ALPN (e.g., h2, http/1.1)
- chunked_rate_pct: fraction of lines using chunked transfer encoding
These metrics are derived from the primary GET response and can be used to correlate performance or reliability differences between HTTP/1.1 and HTTP/2, TLS versions, or usage of chunked encoding.
Note: When the primary GET is unavailable (e.g., due to a timeout), the monitor now populates protocol/TLS/encoding fields from earlier or subsequent phases when possible:
- HEAD response: protocol/TLS/ALPN are captured on success and retained if GET later fails.
- Range (partial) GET response: protocol/TLS/ALPN are also captured here and typically match the primary GET connection. This ensures the viewer’s protocol/TLS charts don’t degrade to Unknown solely because the primary GET failed.
Example console line:
A typical per-batch analyzer line (fields may be omitted when zero) now looks like:
[batch 20250818_131129] (per-batch) lines=42 dur=33500ms avg_speed=18450.7 median=18210.0 ttfb=210 bytes=52428800 errors=1 first_rtt=950.2 p50=18200.3 p99/p50=1.35 jitter=8.4% slope=120.5 cov%=9.8 cache_hit=40.0% reuse=35.0% plateaus=2.0 longest_ms=3400 v4(lines=30 spd=19010.4 ttfb=205 p50=18800.1) v6(lines=12 spd=16880.2 ttfb=225 p50=16750.0)
When more than one batch is in scope (rolling or analyze-only with >1), a line-weighted overall line follows:
[overall across 5 batches] lines=210 avg_speed=18620.3 avg_ttfb=205 avg_bytes=52400000 first_rtt=955.1 p50=18310.4 p99/p50=1.37 jitter=8.6% slope=118.2 cov%=9.5 plateaus=1.9 longest_ms=3600 cache_hit=39.2% reuse=34.1% v4(lines=150 spd=19120.5 ttfb=198 p50=18850.2) v6(lines=60 spd=17110.7 ttfb=223 p50=16780.0)
Interpretation:
- `(per-batch)` lines summarize a single run_tag batch (one iteration or ip-fanout group).
- The `overall across` line aggregates all batches displayed (line-weighted) for a quick broader trend view; use this to observe slow drift vs previous collection windows.
Field order is optimized for quick visual scanning: core throughput & latency first, variability & stability next, then rates and IP-family splits.
Extended JSON schema:

`last_batch_summary` in the alert report always includes the core fields; extended fields are present when data exists (non-zero or at least one contributing line):
{
"lines": 42,
"avg_speed_kbps": 18450.7,
"median_speed_kbps": 18210.0,
"avg_ttfb_ms": 210.0,
"avg_bytes": 52428800,
"error_lines": 1,
"error_rate_pct": 2.38, // derived during alert JSON generation
"avg_first_rtt_goodput_kbps": 950.2,
"avg_p50_kbps": 18200.3,
"avg_p90_kbps": 18340.1, // optional
"avg_p95_kbps": 18390.6, // optional
"avg_p99_kbps": 18550.9, // optional
"avg_p99_p50_ratio": 1.35,
"avg_plateau_count": 2.0,
"avg_longest_plateau_ms": 3400,
"avg_jitter_mean_abs_pct": 8.4,
"avg_slope_kbps_per_sec": 120.5, // optional
"avg_coef_variation_pct": 9.8, // optional
"avg_head_get_time_ratio": 0.62, // optional
"cache_hit_rate_pct": 40.0, // optional (percent of lines)
"proxy_suspected_rate_pct": 5.0, // optional
"ip_mismatch_rate_pct": 10.0, // optional
"prefetch_suspected_rate_pct": 2.5, // optional
"warm_cache_suspected_rate_pct": 7.5, // optional
"conn_reuse_rate_pct": 35.0, // optional
"plateau_stable_rate_pct": 45.0 // optional
}
Notes:
- Percent fields are already expressed as percentages (0–100) with one decimal typical.
- A field may be omitted (not just zero) in raw summaries when no contributing lines expose that signal; alert JSON focuses on non-zero metrics for brevity.
- `error_rate_pct` is computed only in the alert JSON (not stored in batch summaries) to keep the summary struct focused on raw counts & averages.
Ideas (see improvement doc) include anomaly flagging, adaptive sampling, rotating logs, exporting Prometheus metrics.
Repository structure:
- `src/main.go`: Entry point
- `src/monitor/monitor.go`: Monitoring logic
- `src/types/types.go`: Type definitions
- `sites.jsonc`: List of sites to monitor

Platform portability:
The collector is designed to run on Linux, macOS, and Windows with graceful degradation:
Linux (full detail):
- Load averages from `/proc/loadavg`
- Uptime from `/proc/uptime`
- Kernel version from `/proc/sys/kernel/osrelease`
- Container detection via `/.dockerenv` and `/proc/1/cgroup`
macOS / Windows / other:
- Load averages omitted (fields absent)
- Uptime approximated using process start time (duration since program launch)
- Kernel version reported as `<GOOS>-unknown`
- Container detection falls back to `false` (no Linux cgroup inspection)
Common (all platforms):
- Local outbound IP discovered via a short UDP dial to `8.8.8.8:80` (no packets exchanged beyond socket metadata)
- Default interface derived by matching the local IP to enumerated interfaces (may be blank if not resolvable)
- Connection type heuristic (wifi vs ethernet) inferred from interface name prefixes (`wl*`, `wlan*`, `wifi`, `ath`, etc.); may return `unknown` if no pattern matches
Result meta object always includes only the fields successfully collected on the current platform to avoid placeholder or misleading values.
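A standard-library sketch of this discovery approach (not the project's exact code; it only detects the wifi prefixes listed above and reports everything else as `unknown`):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// localOutboundIP asks the OS which source address would be used toward 8.8.8.8;
// a UDP "dial" sends no packets until something is written.
func localOutboundIP() (net.IP, error) {
	conn, err := net.Dial("udp", "8.8.8.8:80")
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).IP, nil
}

// connectionType guesses wifi from interface name prefixes; everything else is unknown here.
func connectionType(ifName string) string {
	for _, p := range []string{"wl", "wlan", "wifi", "ath"} {
		if strings.HasPrefix(ifName, p) {
			return "wifi"
		}
	}
	return "unknown"
}

func main() {
	ip, err := localOutboundIP()
	if err != nil {
		fmt.Println("no outbound IP:", err)
		return
	}
	ifaces, _ := net.Interfaces()
	for _, ifc := range ifaces {
		addrs, _ := ifc.Addrs()
		for _, a := range addrs {
			if ipNet, ok := a.(*net.IPNet); ok && ipNet.IP.Equal(ip) {
				fmt.Printf("ip=%s iface=%s type=%s\n", ip, ifc.Name, connectionType(ifc.Name))
				return
			}
		}
	}
	fmt.Printf("ip=%s iface=unresolved\n", ip)
}
```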
Two scheduling strategies exist for collection mode:
- IP Fanout (default `--ip-fanout=true`):
  - All sites are DNS-resolved first; IPs may be limited by `--max-ips-per-site`.
  - A task list of (site, IP) pairs is created, then shuffled (Fisher–Yates) to distribute multi-IP sites across workers.
  - Each task enforces its own `--site-timeout` budget.
  - Debug mode (`--log-level debug`) prints the task order before & after shuffle.
  - Progress line example:
    `[iteration 1 progress] workers_busy=3/8 remaining=42 done=10/55 active=[ExampleSite(203.0.113.10),Another(2001:db8::1),Third(198.51.100.7)]`
    Meaning: 3 busy of 8 workers, 42 tasks not yet started, 10 finished out of 55 total per-IP tasks.
- Sequential Site (`--ip-fanout=false`):
  - Each worker processes a whole site sequentially over its selected IP list.
  - Optional inline short-DNS (1s) shows the first IPs in progress lines when `--progress-resolve-ip` is true.
  - Remaining / done counts refer to sites, not (site, IP) tasks.

Accurate remaining/done counters are maintained via atomic counters (not `len(channel)`, which is always 0 for unbuffered channels).

Use these progress signals to identify stalls or pathological sites (e.g. one site repeatedly occupying a worker due to many slow IPs); mitigate via `--max-ips-per-site` or by enabling fanout.
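A condensed sketch of the fanout task construction and shuffle (the `site`/`task` types are hypothetical; the real scheduler also applies per-task `--site-timeout` budgets and a worker pool):

```go
package main

import (
	"fmt"
	"math/rand"
)

// site and task are hypothetical stand-ins for the monitor's internal types.
type site struct {
	Name string
	IPs  []string // pre-resolved, possibly capped by --max-ips-per-site
}

type task struct {
	SiteName string
	IP       string
}

// buildTasks expands each site into one task per selected IP, then shuffles the
// flat list (Fisher–Yates via rand.Shuffle) so multi-IP sites spread across workers.
func buildTasks(sites []site, maxIPsPerSite int) []task {
	var tasks []task
	for _, s := range sites {
		ips := s.IPs
		if maxIPsPerSite > 0 && len(ips) > maxIPsPerSite {
			ips = ips[:maxIPsPerSite]
		}
		for _, ip := range ips {
			tasks = append(tasks, task{SiteName: s.Name, IP: ip})
		}
	}
	rand.Shuffle(len(tasks), func(i, j int) { tasks[i], tasks[j] = tasks[j], tasks[i] })
	return tasks
}

func main() {
	sites := []site{
		{Name: "ExampleSite", IPs: []string{"203.0.113.10", "2001:db8::1"}},
		{Name: "Another", IPs: []string{"198.51.100.7"}},
	}
	for _, t := range buildTasks(sites, 2) {
		fmt.Printf("%s(%s)\n", t.SiteName, t.IP) // workers would pull these from a channel
	}
}
```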
The tool now records ONLY per-family public IP information (the unified `public_ip_candidates` / `public_ip_consensus` fields have been removed):
- `public_ipv4_candidates` / `public_ipv4_consensus`
- `public_ipv6_candidates` / `public_ipv6_consensus`
Consensus is the most frequently observed address in its candidate list (simple frequency tally). If only one address is fetched, it becomes the consensus. Absence of a list means no successful discovery for that family within the timeout.
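The consensus rule is a plain frequency tally; a sketch (the tie-breaking behavior here is an assumption, not a documented rule):

```go
package main

import "fmt"

// consensus returns the most frequently observed address; on a tie the address
// that first reached the top count wins.
func consensus(candidates []string) string {
	counts := map[string]int{}
	best, bestN := "", 0
	for _, c := range candidates {
		counts[c]++
		if counts[c] > bestN {
			best, bestN = c, counts[c]
		}
	}
	return best
}

func main() {
	v4 := []string{"198.51.100.23", "198.51.100.23", "203.0.113.9"}
	fmt.Println(consensus(v4)) // 198.51.100.23
}
```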
Example extraction with `jq`:
# Last line IPv4 consensus
tail -n 1 monitor_results.jsonl | jq '.meta.public_ipv4_consensus'
# Last line IPv6 consensus
tail -n 1 monitor_results.jsonl | jq '.meta.public_ipv6_consensus'
# Show both candidate arrays (safely handle nulls)
tail -n 1 monitor_results.jsonl | jq '{v4: .meta.public_ipv4_candidates, v6: .meta.public_ipv6_candidates}'
# Filter only runs where both families were detected
grep -F 'public_ipv4_consensus' monitor_results.jsonl | grep -F 'public_ipv6_consensus' | tail -n 5 | jq '.meta | {time: .timestamp_utc, v4: .public_ipv4_consensus, v6: .public_ipv6_consensus}'
Rationale: downstream processing often needs fast separation of IPv4 vs IPv6 without post-filtering a mixed array. Removing the legacy unified fields eliminates duplication and ambiguity when both families are present.
If you need consistent cross-platform fields, post-process by adding defaults where absent (e.g. set `load_avg_*` to null explicitly in downstream tooling).
GeoIP details:

This project supports both modern and legacy GeoIP databases for country verification:
- Modern (GeoLite2 MMDB):
  - Uses the MaxMind GeoLite2-Country.mmdb file (free, but requires registration).
  - Place the file at `/usr/share/GeoIP/GeoLite2-Country.mmdb` or update the path in the code.
  - The Go code uses the `github.com/oschwald/geoip2-golang` package.
- Legacy (GeoIP.dat):
  - Uses the legacy MaxMind GeoIP.dat and GeoIPv6.dat files.
  - Place the files at `/usr/share/GeoIP/GeoIP.dat` and `/usr/share/GeoIP/GeoIPv6.dat`.
  - The Go code uses the `github.com/oschwald/geoip` package as a fallback if the modern database is not available.

How it works:
- For each monitored site, the tool resolves the IP address and attempts to look up the country using the modern database first.
- If the modern database is unavailable, it falls back to the legacy database.
- Both the configured country (from your input) and the detected country (from GeoIP) are logged in the results.
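A minimal country-lookup sketch using the modern database via `github.com/oschwald/geoip2-golang` (the legacy `.dat` fallback is omitted; the path and sample IP are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/oschwald/geoip2-golang"
)

func main() {
	db, err := geoip2.Open("/usr/share/GeoIP/GeoLite2-Country.mmdb")
	if err != nil {
		log.Fatal(err) // the real tool would fall back to the legacy GeoIP.dat reader here
	}
	defer db.Close()

	record, err := db.Country(net.ParseIP("8.8.8.8"))
	if err != nil {
		log.Fatal(err)
	}
	// Compare the detected ISO code against the site's configured country.
	fmt.Println("detected country:", record.Country.IsoCode)
}
```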
Setup:
- Download the GeoLite2-Country.mmdb from MaxMind (https://dev.maxmind.com/geoip/geolite2-free-geolocation-data).
- Download the legacy GeoIP.dat from MaxMind or other sources if needed.
- Place the files in `/usr/share/GeoIP/` or update the code to use your preferred path.
- Install the required Go packages: `go get github.com/oschwald/geoip2-golang` and `go get github.com/oschwald/geoip`
To use GeoIP2 (GeoLite2) databases for country verification, you can install and update them using Ubuntu packages:
- Install the geoipupdate tool: `sudo apt install geoipupdate`
- Configure your MaxMind account and license key in `/etc/GeoIP.conf` (required for database downloads).
  - Register for a free MaxMind account at https://www.maxmind.com/en/geolite2/signup
  - Add your account details and license key to `/etc/GeoIP.conf`.
- Download and update the databases: `sudo geoipupdate`
- The databases (e.g., `GeoLite2-Country.mmdb`) will be placed in `/usr/share/GeoIP/` by default.
Your Go code will then be able to use these databases for GeoIP lookups.
Result:
- The output JSONL file will show both the configured and detected country for each site, helping you verify and troubleshoot geographic routing and CDN issues.
Common issues & resolutions
Symptom | Likely Cause | Resolution |
---|---|---|
"no records" error | Empty results file or schema_version mismatch | Run with `--analyze-only=false` to collect; ensure `monitor.SchemaVersion` matches lines |
Extended metrics all zero | Missing `speed_analysis` (errors or aborted transfers) | Inspect recent lines; reduce errors; ensure successful GETs |
High error rate alert | Real network failures or strict thresholds | View the last 20 error lines (grep for `tcp_error` / `http_error` in the results file) |
GeoIP fields empty | GeoLite2 DB not installed | Install via `geoipupdate`; verify path `/usr/share/GeoIP/GeoLite2-Country.mmdb` |
Slow analysis | Very large results file | Limit with `--analysis-batches`; rotate/archive old lines |
Memory concern | Many recent batches retained | Lower `--analysis-batches` (default 10) |
HEAD slower than GET (ratio >1) | Proxy / caching anomaly | Prefer the split metrics (`enterprise_proxy_rate_pct`, `server_proxy_rate_pct`); `proxy_suspected_rate_pct` remains available for compatibility. Also inspect headers (`Via`, `X-Cache`) |
Sudden p99/p50 spike | Bursty traffic or fewer samples | Validate sample count; look for plateau instability |
`ip_mismatch_rate_pct` spike | CDN path shift or proxy insertion | Compare ASN/org; correlate with `enterprise_proxy_rate_pct` / `server_proxy_rate_pct` (or legacy `proxy_suspected_rate_pct`) |
Warm cache suspected w/ low cache hit rate | HEAD path cached, object not | Inspect `header_age` / `x-cache` and warm HEAD timings |
Need verbose analysis | Want batch grouping debug | `ANALYSIS_DEBUG=1 go run ./src/main.go` |
Need baseline only | Old data noisy | Move or delete the results file; collect a fresh single batch |
All speeds zero for tiny object | Sub-ms transfer (older versions) | Update to latest (uses high-res timing) or test with a larger asset |
Quick focused test rerun:
go test ./src/analysis -run TestExtendedRateAndSlopeMetrics -count=1
See `CHANGELOG.md` for a summary of recent changes across the analyzer and viewer.