Streamed HTML parsing + content sniffing #77

## Summary
Adopt streamed HTML parsing to reduce allocations, and sniff the content type early so that binary or oversized payloads are skipped unless configuration allows them.

## Motivation
- Lower memory usage during large crawls
- Skip non-HTML payloads by default

## Scope
- `internal/parse`:
  - Streaming parse (`net/html` and/or `goquery` on a `Reader`)
  - Extract absolute links (respect `base` tags)
  - Sniff Content-Type + size guardrails
- Config flag to allow binary downloads

## Acceptance Criteria
- Heap profile shows fewer allocations than the pre-change baseline
- Tests cover: base href, meta refresh, unusual encodings
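
To make the base href and meta refresh cases concrete, here is a hedged sketch of the resolution logic such tests might exercise. `resolveLink` and `refreshTarget` are hypothetical helpers, not existing functions in this repo, and the meta refresh parsing is deliberately simplified compared with what the HTML specification describes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveLink resolves href against the page URL, or against baseHref when
// the document declared a <base href="...">. Pass "" when there is no base tag.
func resolveLink(pageURL, baseHref, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	if baseHref != "" {
		// The base href may itself be relative to the page URL.
		b, err := base.Parse(baseHref)
		if err != nil {
			return "", err
		}
		base = b
	}
	u, err := base.Parse(href)
	if err != nil {
		return "", err
	}
	return u.String(), nil
}

// refreshTarget extracts the URL from a meta refresh content value such as
// "5; url=/next". Simplified: it expects a ';' separator and a "url=" prefix.
func refreshTarget(content string) (string, bool) {
	i := strings.IndexByte(content, ';')
	if i < 0 {
		return "", false
	}
	rest := strings.TrimSpace(content[i+1:])
	if len(rest) < 4 || !strings.EqualFold(rest[:4], "url=") {
		return "", false
	}
	target := strings.Trim(strings.TrimSpace(rest[4:]), `"'`)
	return target, target != ""
}

func main() {
	page := "https://example.com/a/b.html"
	cases := []struct{ base, href string }{
		{"", "c.html"},
		{"/docs/", "c.html"},
		{"https://cdn.example.org/x/", "../y"},
	}
	for _, c := range cases {
		got, err := resolveLink(page, c.base, c.href)
		fmt.Printf("base=%q href=%q -> %s (err=%v)\n", c.base, c.href, got, err)
	}
	if target, ok := refreshTarget("0; URL='/next'"); ok {
		got, _ := resolveLink(page, "", target)
		fmt.Println("refresh ->", got)
	}
}
```

Fixture pages could feed the same inputs through the real extractor and compare against values like these.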

## Tasks
- [ ] Implement streamed extraction
- [ ] Add content-type guards
- [ ] Unit tests with fixture pages

