Published public DNS and DoH resolver datasets, generated automatically from the crawler submodule.
This repository is the data/output repo. The crawler code lives in the submodule; this repo stores the generated assets. CSV exports are not currently generated by the crawler.
- `json/` - published JSON exports from the crawler (`accepted.json`, `validated.json`)
- `txt/` - published text/config exports from the crawler (`resolvers.txt`, `resolvers-doh.txt`, `dnsdist.conf`, `unbound-forward.conf`)
- `probe-corpus/` - generated probe definitions used for validation
- `meta/` - build metadata and historical run memory (`history.duckdb`)
- `crawler/` - git submodule with the generator/crawler code
Data is refreshed by GitHub Actions:
- daily on a schedule
- manually via the Actions tab
The workflow:
- `discover-and-split`: checks out this repo and the `crawler` submodule, generates and validates the probe corpus, discovers candidates, applies historical quarantine, and writes 10 shard inputs
- `validate-shards`: runs a 10-job matrix where each VM validates one shard with configurable per-VM validation parallelism
- `merge-and-publish`: merges validated shards, materializes generic output files, updates `meta/history.duckdb`, regenerates the README stats section, and commits changes
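The split into 10 shard inputs can be pictured roughly as follows. This is a minimal sketch, not the crawler's actual code: it assumes candidates are assigned to shards by a stable hash of their endpoint string, and `split_into_shards` is a hypothetical helper name.

```python
import hashlib


def split_into_shards(candidates, shard_count=10):
    """Assign each candidate endpoint to one of `shard_count` shards.

    Assumption: a stable hash of the endpoint string picks the shard, so the
    same endpoint lands in the same shard on every run.
    """
    shards = [[] for _ in range(shard_count)]
    for endpoint in candidates:
        digest = hashlib.sha256(endpoint.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % shard_count
        shards[index].append(endpoint)
    return shards
```

A hash-based split keeps shard membership independent of input order, which makes per-shard results easier to compare between runs. A simple round-robin split would balance shard sizes more evenly.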
- Latest run: `2026-04-07` (`run_id=local-sonar-20260407T203533Z`)
- Runs tracked: `1`
- Latest totals: `7635` accepted, `2044` candidate, `113444` rejected, `0` filtered
- 30-day trend: accepted `+0`, rejected `+0`
- Currently quarantined DNS hosts: `0`
| Reason | Count |
|---|---|
| `timeout_or_error` | 111238 |
| `timeout_rate_high` | 105298 |
| `no_latency_samples` | 102881 |
| `udp_only` | 57453 |
| `latency_moderate` | 7519 |
Hosts that are rejected for 14 consecutive daily runs are quarantined for 90 days before they are tested again.
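The quarantine rule above can be sketched as follows. This is an illustration only: it assumes the 90-day window starts on the date of the last rejected run, and the function and constant names are hypothetical rather than taken from the crawler.

```python
from datetime import date, timedelta

REJECT_STREAK_THRESHOLD = 14  # consecutive daily rejections before quarantine
QUARANTINE_DAYS = 90          # how long a quarantined host is skipped


def quarantine_until(consecutive_rejected_runs, last_run):
    """Return the date a host may be tested again, or None if not quarantined."""
    if consecutive_rejected_runs >= REJECT_STREAK_THRESHOLD:
        # Assumption: the window is counted from the last rejected run.
        return last_run + timedelta(days=QUARANTINE_DAYS)
    return None


def is_quarantined(consecutive_rejected_runs, last_run, today):
    """True while `today` still falls inside the quarantine window."""
    until = quarantine_until(consecutive_rejected_runs, last_run)
    return until is not None and today < until
```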
Resolver candidates are gathered by the crawler submodule from:
- public-dns.info nameserver CSV: https://public-dns.info/nameservers.csv
- curl wiki DoH list: https://raw.githubusercontent.com/wiki/curl/curl/DNS-over-HTTPS.md
- AdGuard DNS providers list: https://raw.githubusercontent.com/AdguardTeam/KnowledgeBaseDNS/master/docs/general/dns-providers.md
- local manual seeds:
  - `crawler/configs/manual-dns.txt`
  - `crawler/configs/manual-doh.toml`
Consume the generated files directly from this repository.
For reproducible use, pin to a specific commit instead of following the latest repository state.
- `json/accepted.json` - resolvers that passed validation strongly enough to be accepted for normal use
- `json/candidate.json` - resolvers that are reachable but did not score high enough for accepted status
- `json/rejected.json` - resolvers that failed validation; only failed probes are retained, and `all_probes_failed` is set when every probe failed
- `json/filtered.json` - candidates removed before validation, for example duplicates, invalid endpoints, low source reliability, or historical quarantine
- `txt/resolvers.txt` - accepted plain DNS resolvers only, as `host:port`
- `txt/resolvers-doh.txt` - accepted DoH resolvers only, as HTTPS endpoint URLs
- `txt/dnsdist.conf` - dnsdist backend config for non-rejected resolvers, including candidate backends that dnsdist can still health-check
- `txt/unbound-forward.conf` - accepted plain DNS forward-zone config for Unbound
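As an example of consuming one of these files, the `host:port` lines in `txt/resolvers.txt` could be parsed along these lines. This is a sketch under assumptions: that IPv6 hosts, if present, are written in brackets, and that blank lines or `#` comments may occur and should be skipped. `parse_resolver_line` is a hypothetical helper.

```python
def parse_resolver_line(line):
    """Parse one `host:port` line; return (host, port) or None for blanks/comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    # Split on the last colon so bracketed IPv6 hosts keep their inner colons.
    host, _, port = text.rpartition(":")
    if not host:
        raise ValueError(f"expected host:port, got {text!r}")
    # Assumption: IPv6 hosts are bracketed, e.g. [2001:db8::1]:53.
    return host.strip("[]"), int(port)


def load_resolvers(path):
    """Read a resolvers file and return a list of (host, port) tuples."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_resolver_line(line) for line in handle)
        return [entry for entry in parsed if entry is not None]
```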
Large JSON files may be split into *.part-XXXX.json files to stay below repository limits. When that happens, the part files together replace the unsplit file.
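A consumer that wants to handle both the unsplit and the split layout might do something like the following. This is a sketch, assuming each file holds a JSON array of resolver entries and that part files sort correctly by name; neither detail is confirmed here, and `load_export` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_export(json_dir, name):
    """Load `<name>.json`, or merge `<name>.part-XXXX.json` files if it was split.

    Assumption: every file contains a JSON array of resolver entries.
    """
    directory = Path(json_dir)
    whole = directory / f"{name}.json"
    if whole.exists():
        paths = [whole]
    else:
        # Part files replace the unsplit file; read them in name order.
        paths = sorted(directory.glob(f"{name}.part-*.json"))
    if not paths:
        raise FileNotFoundError(f"no {name}.json or part files in {directory}")
    entries = []
    for path in paths:
        with path.open(encoding="utf-8") as handle:
            entries.extend(json.load(handle))
    return entries
```

For example, `load_export("json", "accepted")` would return all accepted entries from a local checkout, whichever layout the latest run produced.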
Each resolver receives a composite score (0-100) based on four weighted components:
| Component | Weight | Description |
|---|---|---|
| Correctness | 0-50 | DNS/TLS errors, answer mismatches, NXDOMAIN spoofing |
| Availability | 0-20 | Probe success rate (100% = 20 pts) |
| Performance | 0-20 | Latency penalties for p50, p95, and jitter |
| History | 0-10 | Stability rewards, flapping/failure penalties |
Score caps may be applied, depending on how many runs have been observed:
- 0-2 runs observed: max 90
- 3-6 runs: max 95
- 7-13 runs: max 98
- 14+ runs: no cap
Severe correctness issues (NXDOMAIN spoofing, TLS mismatch, answer mismatch) cap the score at 59.
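Putting the component table and the caps together, the scoring could be sketched as below. This is a simplified illustration, not the crawler's implementation: it assumes the four component values are summed and that each applicable cap is applied with `min`. The function names are hypothetical.

```python
SEVERE_CORRECTNESS_CAP = 59


def history_cap(runs_observed):
    """Maximum score allowed for a resolver with this many observed runs."""
    if runs_observed <= 2:
        return 90
    if runs_observed <= 6:
        return 95
    if runs_observed <= 13:
        return 98
    return 100  # 14+ runs: no cap


def composite_score(correctness, availability, performance, history,
                    runs_observed, severe_correctness_issue=False):
    """Sum the four components, then apply the caps (assumed to combine via min)."""
    score = correctness + availability + performance + history
    score = min(score, history_cap(runs_observed))
    if severe_correctness_issue:
        score = min(score, SEVERE_CORRECTNESS_CAP)
    return score
```

With component values of 50, 18, 12, and 7 and five observed runs, the sum is 87, which is below the 95 cap for 3-6 runs and so is returned unchanged.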
The confidence score (0-100) reflects measurement certainty, separately from quality. It considers probe count, latency samples, historical observations, and source reliability metadata.
Each resolver entry includes:
```json
{
  "status": "accepted",
  "score": 87,
  "score_breakdown": {
    "correctness": 50,
    "availability": 18,
    "performance": 12,
    "history": 7
  },
  "confidence_score": 65,
  "score_caps_applied": ["insufficient_history"],
  "derived_metrics": {
    "p50_latency_ms": 45.2,
    "p95_latency_ms": 120.5,
    "jitter_ms": 75.3,
    "latency_sample_count": 10,
    "runs_seen_30d": 5,
    "runs_seen_7d": 3,
    "flaps_30d": 0,
    "consecutive_success_days": 5,
    "consecutive_fail_days": 0
  },
  "reasons": ["latency_high"]
}
```

To generate the data locally, initialize the submodule and run the crawler:

```bash
git submodule update --init --recursive
cd crawler
uv sync --group dev
uv run resolver-inventory generate-probe-corpus \
  --config configs/probe-corpus.toml \
  --output ../probe-corpus
uv run resolver-inventory validate-probe-corpus \
  --config configs/probe-corpus.toml \
  --input ../probe-corpus/probe-corpus.json
uv run resolver-inventory refresh \
  --config configs/default.toml \
  --probe-corpus ../probe-corpus/probe-corpus.json \
  --output ../_build
```

For server-side end-to-end runs without GitHub Actions stages, use `scripts/local-deploy.sh`. It supports per-run overrides such as `--validation-parallelism 12` and `--validate-jobs 10`.
Large JSON outputs can also be chunked with `--split-json-max-bytes` (the local deploy default is 100000000 bytes, about 100 MB).
Example:
```bash
bash scripts/local-deploy.sh \
  --validation-parallelism 8 \
  --validate-jobs 4
```