parse ABB info hash from detail rows #53

Merged: JeremiahM37 merged 1 commit into JeremiahM37:main from sgerner:codex/abb-infohash-parser on Jun 2, 2026

@sgerner sgerner commented Jun 1, 2026

Contributor

This keeps AudioBookBay magnet resolution working after ABB changed the detail-page markup.

What changed:

  • Parse the Info Hash value from the detail table row instead of relying on a same-line regex.
  • Add regression coverage for the row-based HTML shape ABB currently serves.
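The row-based parsing described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the review on this PR indicates the change uses goquery, while this sketch swaps in standard-library regular expressions to stay dependency-free. The function name `extractInfoHashFromRows` and the sample table markup are assumptions made for the example, not ABB's verified HTML.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// rowRe captures the inner HTML of each table row; (?is) makes the
	// match case-insensitive and lets "." span newlines.
	rowRe = regexp.MustCompile(`(?is)<tr\b[^>]*>(.*?)</tr>`)
	// cellRe captures the inner HTML of each <td> cell within a row.
	cellRe = regexp.MustCompile(`(?is)<td\b[^>]*>(.*?)</td>`)
	// tagRe strips any nested tags left inside a cell.
	tagRe = regexp.MustCompile(`<[^>]+>`)
	// hashRe accepts a 40-character hex (SHA-1) info hash.
	hashRe = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)
)

// cellText removes markup and surrounding whitespace from a cell.
func cellText(s string) string {
	return strings.TrimSpace(tagRe.ReplaceAllString(s, ""))
}

// extractInfoHashFromRows (hypothetical name) looks for a row whose first
// cell is labelled "Info Hash" and returns the hash from the second cell,
// lowercased. It returns "" when no such row is found.
func extractInfoHashFromRows(html string) string {
	for _, row := range rowRe.FindAllStringSubmatch(html, -1) {
		cells := cellRe.FindAllStringSubmatch(row[1], -1)
		if len(cells) < 2 {
			continue
		}
		label := strings.TrimSpace(strings.TrimSuffix(cellText(cells[0][1]), ":"))
		if !strings.EqualFold(label, "Info Hash") {
			continue
		}
		if v := cellText(cells[1][1]); hashRe.MatchString(v) {
			return strings.ToLower(v)
		}
	}
	return ""
}

func main() {
	// Assumed row-based shape: label and value in separate cells and lines.
	page := `<table>
<tr><td>Format:</td><td>MP3</td></tr>
<tr>
  <td>Info Hash:</td>
  <td>0123456789ABCDEF0123456789abcdef01234567</td>
</tr>
</table>`
	fmt.Println(extractInfoHashFromRows(page))
}
```

Matching on the label cell rather than on a single line of text is what lets this approach tolerate the value sitting on a different line from the label.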

Validation:

  • go test ./internal/search -run 'TestExtractABBInfoHash|TestResolveABBMagnetUsesInfoHashRow'

@sgerner sgerner marked this pull request as ready for review June 1, 2026 00:35
@JeremiahM37 JeremiahM37 self-assigned this Jun 1, 2026

@JeremiahM37 JeremiahM37 left a comment

Owner

Switching to goquery row parsing is a nice upgrade over the same-line regex. One concern about removing the old path entirely — see inline.

@@ -206,3 +203,29 @@ func ResolveABBMagnet(ctx context.Context, client *http.Client, userAgent, abbPa
	}
	return "", fmt.Errorf("failed to resolve ABB magnet from all domains")
}

Owner

Could you keep the old same-line regex as a fallback after this? If any ABB mirror still serves the previous shape (Info Hash: ...HASH), this PR breaks that mirror with no recovery path. Try extractABBInfoHash first, then fall back to the old infoHashRe on htmlContent if it returns empty. A test for the old shape would also be worth adding alongside the new one.

Owner

Softening this — the diff confirms the old regex is removed, but you say ABB moved off that shape so the fallback is just defensive coding, not a real bug. Treat as a suggestion, not a blocker.

@JeremiahM37 JeremiahM37 dismissed their stale review June 1, 2026 02:39

Downgrading from CHANGES_REQUESTED — the fallback request is defensive engineering, not a verified bug. See reply on the inline.

@JeremiahM37 JeremiahM37 assigned sgerner and unassigned JeremiahM37 Jun 1, 2026
@sgerner sgerner force-pushed the codex/abb-infohash-parser branch from 52caa59 to 1023081 Compare June 1, 2026 04:42
@JeremiahM37 JeremiahM37 merged commit d64acc8 into JeremiahM37:main Jun 2, 2026
4 checks passed