Commit 2e29385

skeeto and claude committed
URLs: percent-encode non-ASCII bytes before launching the browser
URLs with non-ASCII characters (Cyrillic Reddit thread titles, CJK paths, accented hostnames) were passed verbatim through wxLaunchDefaultBrowser, which on macOS routes through NSURL or the `open` command. Both want a strict RFC 3986 URI with ASCII only. The raw UTF-8 bytes failed validation with EILSEQ ("Error 92: Illegal byte sequence") and the browser never opened. The same URL pasted from the clipboard worked because browsers do the IRI-to-URI dance themselves when reading address-bar input.

iri_to_uri() in util walks the string and percent-encodes any byte >= 0x80 as %XX. ASCII passes through unchanged. That includes the URL-reserved set (/, ?, #, &, =), so the URL structure is preserved, and it includes '%', so already-encoded URLs come through unchanged and the conversion is idempotent.

Applied at every wxLaunchDefaultBrowser call site:

* action_open_in_browser (`b` from the listing)
* FeedsPanel right-click → Open in browser
* EntryDetail's link-label click
* EntryDetail's wxHtmlWindow link-click handler (inline links inside the preview body)

Clipboard paths are untouched: action_copy_link still yanks the original IRI so the user can paste it wherever they want in the original Unicode shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
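To make the byte-level behaviour concrete, here is a minimal standalone sketch. The `iri_to_uri` body follows the description in the commit message; `demo_check` and the example.com URL are illustrative assumptions and not part of the commit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encode every byte >= 0x80 as %XX; ASCII passes through unchanged.
std::string iri_to_uri(const std::string &iri)
{
    std::string out;
    out.reserve(iri.size());
    for (unsigned char c : iri) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%', two hex digits, terminating NUL
            std::snprintf(buf, sizeof(buf), "%%%02X",
                          static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical self-check (not in the commit).
void demo_check()
{
    // "é" is U+00E9, which UTF-8 encodes as the two bytes C3 A9.
    const std::string iri = "https://example.com/caf\xC3\xA9?q=1#top";
    const std::string uri = iri_to_uri(iri);

    // Only the two non-ASCII bytes change; '/', '?', '=', '#' survive.
    assert(uri == "https://example.com/caf%C3%A9?q=1#top");

    // A second pass sees only ASCII (including '%') and changes nothing.
    assert(iri_to_uri(uri) == uri);
}
```

Because the function only looks at the high bit of each byte, it does not need to decode UTF-8 or know where one character ends and the next begins.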
1 parent 1c9088a commit 2e29385

5 files changed

Lines changed: 43 additions & 4 deletions

src/entry_detail.cpp

Lines changed: 5 additions & 2 deletions
@@ -68,7 +68,9 @@ EntryDetail::EntryDetail(wxWindow *parent, Elfeed *app)
     // Route every link click to the system default browser.
     body_->Bind(wxEVT_HTML_LINK_CLICKED,
                 [](wxHtmlLinkEvent &e) {
-                    wxLaunchDefaultBrowser(e.GetLinkInfo().GetHref());
+                    wxLaunchDefaultBrowser(wxString::FromUTF8(
+                        iri_to_uri(e.GetLinkInfo().GetHref()
+                                       .utf8_string())));
                 });
     // Re-render on system theme switch (light ↔ dark) so the
     // preview's <body> wrapper picks up the new system colors.
@@ -109,7 +111,8 @@ EntryDetail::EntryDetail(wxWindow *parent, Elfeed *app)
 void EntryDetail::on_link_click(wxMouseEvent &)
 {
     if (!link_url_.empty())
-        wxLaunchDefaultBrowser(wxString::FromUTF8(link_url_));
+        wxLaunchDefaultBrowser(
+            wxString::FromUTF8(iri_to_uri(link_url_)));
 }

 void EntryDetail::focus_body()

src/feeds_panel.cpp

Lines changed: 1 addition & 1 deletion
@@ -232,7 +232,7 @@ void FeedsPanel::on_context_menu(wxDataViewEvent &e)
         // happen again.
         const std::string &target =
             r.canonical_url.empty() ? r.url : r.canonical_url;
-        wxLaunchDefaultBrowser(wxString::FromUTF8(target));
+        wxLaunchDefaultBrowser(wxString::FromUTF8(iri_to_uri(target)));
     }
 }

src/main_frame.cpp

Lines changed: 2 additions & 1 deletion
@@ -1443,7 +1443,8 @@ void MainFrame::action_open_in_browser()
         if (i < 0 || (size_t)i >= app_->entries.size()) continue;
         Entry &e = app_->entries[(size_t)i];
         if (!e.link.empty())
-            wxLaunchDefaultBrowser(wxString::FromUTF8(e.link));
+            wxLaunchDefaultBrowser(
+                wxString::FromUTF8(iri_to_uri(e.link)));
         // Opening in the browser counts as reading it — match 'r'.
         strip_unread(e, app_);
         list_->refresh_row(i);

src/util.cpp

Lines changed: 20 additions & 0 deletions
@@ -314,6 +314,26 @@ std::string disambiguate_path(const std::string &dir,
     return path;
 }

+std::string iri_to_uri(const std::string &iri)
+{
+    std::string out;
+    out.reserve(iri.size());
+    for (unsigned char c : iri) {
+        if (c < 0x80) {
+            // ASCII — pass through. Includes `%`, so an already-
+            // percent-encoded URL survives a round-trip unchanged.
+            out += (char)c;
+        } else {
+            // Non-ASCII byte (always part of a UTF-8 multibyte
+            // sequence in our inputs). Encode each byte as %XX.
+            char buf[4];
+            std::snprintf(buf, sizeof(buf), "%%%02X", c);
+            out += buf;
+        }
+    }
+    return out;
+}
+
 // Compute UTC epoch seconds from Gregorian (Y, M, D, h, m, s) without
 // any platform's timegm(). Uses std::chrono's civil-calendar routines,
 // which are guaranteed UTC and need no timezone data.
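As a possible alternative to the `std::snprintf` call in the hunk above, the `%XX` triple could be built from a hex-digit table. The sketch below is not what the commit does; the name `iri_to_uri_table` and the check function are hypothetical, and the output is intended to match the committed function byte for byte.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant (name is an assumption, not in the commit):
// same output as iri_to_uri, but without a formatted-print call.
std::string iri_to_uri_table(const std::string &iri)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(iri.size());
    for (unsigned char c : iri) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII, including '%', unchanged
        } else {
            out += '%';
            out += hex[c >> 4];           // high nibble
            out += hex[c & 0x0F];         // low nibble
        }
    }
    return out;
}

// Hypothetical check: Cyrillic "д" is U+0434, UTF-8 bytes D0 B4.
void check_table_variant()
{
    assert(iri_to_uri_table("https://example.com/\xD0\xB4/x")
           == "https://example.com/%D0%B4/x");
}
```

Either form emits uppercase hex digits, which is the form RFC 3986 recommends for percent-encodings.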

src/util.hpp

Lines changed: 15 additions & 0 deletions
@@ -150,6 +150,21 @@ std::string disambiguate_path(const std::string &dir,
                               const std::string &base,
                               const std::string &ext);

+// ---- URL handling ----
+
+// Convert an IRI (RFC 3987 — URL containing UTF-8 multibyte
+// characters, e.g. a Reddit /r/foo/comments/.../модифицированный)
+// into a RFC 3986 URI by percent-encoding every byte ≥ 0x80.
+// ASCII bytes (including the URL-reserved set: %, /, ?, #, &, etc.)
+// pass through unchanged, which keeps the URL structure intact and
+// makes the operation idempotent on already-encoded input.
+//
+// wxLaunchDefaultBrowser on macOS routes through NSURL / `open`,
+// which insist on ASCII-only URIs. Most browsers do the IRI-to-URI
+// dance themselves when you paste a URL, but the platform-launch
+// APIs we hand them are stricter.
+std::string iri_to_uri(const std::string &iri);
+
 // ---- Feed date parsing ----

 // Parse an ISO 8601 date/datetime. Fields without a timezone are
