Fix AllTheLyrics false positive detection #2598

Open
SayanDey322 wants to merge 1 commit into soxoj:main from SayanDey322:fix-allthelyrics-detection

Conversation

@SayanDey322
Contributor

Summary

Fix false positives for AllTheLyrics by switching from status-code detection to message-based detection.

What changed

  • Changed AllTheLyrics from checkType: "status_code" to checkType: "message"
  • Added page content markers for claimed and unclaimed profiles

Why

AllTheLyrics appears to return HTTP 200 for both valid and invalid usernames, which makes status-code detection unreliable and causes false positives.

Switching to message-based detection uses page content instead of HTTP status and prevents incorrect claimed results.
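
The difference between the two approaches can be sketched roughly as follows. This is an illustration only, not Maigret's actual implementation: the function names, the marker strings, and the sample pages are all hypothetical.

```python
from typing import Sequence


def check_by_status(status_code: int) -> bool:
    """Status-code detection: any 2xx response is treated as a claimed profile."""
    return 200 <= status_code < 300


def check_by_message(body: str, presence: Sequence[str], absence: Sequence[str]) -> bool:
    """Message detection: claimed only if no absence marker appears in the
    page body and at least one presence marker does."""
    if any(marker in body for marker in absence):
        return False
    return any(marker in body for marker in presence)


# Hypothetical markers and page bodies, NOT the real AllTheLyrics strings.
PRESENCE = ["View Profile:"]
ABSENCE = ["This user has not registered"]

missing_page = "<html>This user has not registered and has no profile.</html>"
profile_page = "<html><title>View Profile: alice</title></html>"

# Both pages arrive with HTTP 200, so status-code detection claims both,
# while message detection separates them.
status_says_claimed = check_by_status(200)
message_says_missing = check_by_message(missing_page, PRESENCE, ABSENCE)
message_says_claimed = check_by_message(profile_page, PRESENCE, ABSENCE)
```

With a site that answers 200 for every username, only the body-based check can tell the two pages apart.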

Closes #2574

@soxoj
Owner

soxoj commented May 3, 2026

Hi @SayanDey322, I'm batching feedback for #2596, #2599, #2600, #2601, #2602, #2604 and #2605, which all follow the same one-line pattern of just flipping disabled: true. In their current shape these PRs aren't mergeable. Three things need to change before I can review them seriously.

1. Disabling is a last resort, not a first response

The most valuable contribution to this project is keeping checks accurate, not pruning them. Before opening a PR that disables a site, please walk the diagnose flow described in CONTRIBUTING.md, typically:

maigret --self-check --site "SiteName" --diagnose --use-disabled-sites
python utils/site_check.py --site "SiteName" --diagnose
python utils/site_check.py --site "SiteName" --compare-methods

Most "broken" sites are fixable in 2–10 minutes: switching checkType, refreshing presenseStrs/absenceStrs after a redesign, adding tls_fingerprint protection, or updating the URL path. A disabled: true PR with no investigation log (what does the site actually return now? what did you try?) doesn't give a reviewer enough to act on. If the site really is unfixable, the PR description should say what you tried and what the failure mode is (DNS gone, full Cloudflare block with no API, login-walled, soft-404 with no markers, etc.). Without that context the right action is usually to leave the entry alone and let someone else fix it.

2. Every database edit needs to run the pre-commit hook

Editing maigret/resources/data.json without regenerating its sidecar files is a silent-corruption bug, not a cosmetic miss. Two files must stay in sync with data.json:

  • maigret/resources/db_meta.json: generated by utils/generate_db_meta.py. It carries the SHA-256 of data.json and is what the client uses to decide whether to pull a fresh database from GitHub. If you change data.json but leave db_meta.json stale, every Maigret install on your branch sees a hash mismatch and the auto-update path misbehaves.
  • sites.md: generated by utils/update_site_data.py (exposed as poetry run update_sitesmd). This is the canonical user-facing list of supported sites and statistics.
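
To make the staleness problem concrete, here is a minimal sketch of the kind of hash comparison involved. It is not Maigret's actual update code: the helper names and the data_sha256 key are hypothetical, chosen only for illustration, and the real layout of db_meta.json may differ.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def meta_is_stale(data_bytes: bytes, meta_text: str, key: str = "data_sha256") -> bool:
    """True when the hash recorded in the metadata no longer matches the
    database bytes. The key name is an assumption made for this sketch."""
    meta = json.loads(meta_text)
    return meta.get(key) != sha256_hex(data_bytes)


# In-memory stand-ins for data.json before and after a hand edit.
original_db = b'{"sites": {}}'
edited_db = b'{"sites": {"Example": {"disabled": true}}}'

# Metadata generated for the original database only.
meta_text = json.dumps({"data_sha256": sha256_hex(original_db)})

fresh = meta_is_stale(original_db, meta_text)  # metadata matches the database
stale = meta_is_stale(edited_db, meta_text)    # database edited, metadata not regenerated
```

Any edit to the database bytes changes the digest, so metadata that was not regenerated in the same commit will always disagree with the file it describes.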

The repo ships a pre-commit hook that does both regenerations and re-stages the result automatically; you only need to enable it once after cloning:

git clone https://github.com/soxoj/maigret && cd maigret
poetry install --with dev
git config --local core.hooksPath .githooks/

The hook itself lives at .githooks/pre-commit; please read it once so you know what it does. After this, every git commit that touches data.json will auto-update db_meta.json and sites.md and stage them into the same commit. Don't bypass it with --no-verify.

3. Please redo these PRs properly

Concretely, for each of the disable PRs:

  1. Run --self-check --diagnose --use-disabled-sites (or utils/site_check.py) against the target site and paste the relevant output into the PR description.
  2. Try to fix the check first. Open a PR with disabled: true only after a real attempt, and explain in the description what failed and why disabling is the only option.
  3. Enable the hook (step 2 above) and re-commit so db_meta.json and sites.md regenerate. PRs that touch data.json but don't touch those two files will be closed without further review.
  4. Batch related disables into a single PR with the title format the project already uses, e.g. "Fix site checks: N fixed, N disabled". One PR per broken site is unnecessary churn.
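
As a quick self-check before pushing, the rule in step 3 can be expressed as a small function over the list of files changed in a commit. This is a hypothetical helper, not something shipped in the repository, and it assumes sites.md sits at the repository root.

```python
from typing import Iterable, List

DATA_FILE = "maigret/resources/data.json"
# Files that must change together with data.json.
# The sites.md location at the repository root is an assumption.
SIDECARS = ("maigret/resources/db_meta.json", "sites.md")


def missing_sidecars(changed_files: Iterable[str]) -> List[str]:
    """Return the sidecar files a commit forgot to regenerate.

    An empty list means the commit is consistent: either data.json was not
    touched at all, or both sidecars were updated alongside it.
    """
    changed = set(changed_files)
    if DATA_FILE not in changed:
        return []
    return [path for path in SIDECARS if path not in changed]


# A one-line edit committed without the hook: both sidecars are missing.
incomplete = missing_sidecars(["maigret/resources/data.json"])

# The same edit committed with the hook enabled.
complete = missing_sidecars(
    ["maigret/resources/data.json", "maigret/resources/db_meta.json", "sites.md"]
)
```

The changed-file list could come from something like git diff --name-only, but enabling the hook remains the intended way to keep the three files in sync.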

I'd much rather have one well-investigated PR that fixes 3 sites and disables 2 with explanations than seven one-liners. Please update or close-and-resubmit. Thanks for the energy you're putting in; let's redirect it into fixes 👍

Development

Successfully merging this pull request may close these issues.

[Maigret bot] False-positive site probe: AllTheLyrics
