A lot of pages have some links "screwed". We need a filter to hot-fix these somehow. #15

@Lisias

I found these two URLs in my "ALL" report this month (which doesn't mean they weren't there before, I just noticed them today):

https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/128696-killashley/
https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/42312-alexsheff/

Note the %7B___base_url___%7D substring, which decodes to {___base_url___}. Almost surely this is a missing $ after the opening curly brace.
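For anyone who wants to check the decoding, here is a minimal Python sketch using only the standard library. The variable names are mine, nothing here comes from the project code.

```python
from urllib.parse import unquote

# One of the two URLs from the report above.
broken = (
    "https://forum.kerbalspaceprogram.com/"
    "%7B___base_url___%7D/index.php?/profile/128696-killashley/"
)

# unquote() turns percent-escapes back into characters (%7B -> "{", %7D -> "}").
decoded = unquote(broken)
print(decoded)
```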

Curious about the issue, and knowing that this kind of issue reproduces like rabbits :P, I coded a quick report of all the occurrences in the current (and WIP) WARCs, and boy, I found a lot (note: the file is in CSV format, ignore anything starting with #): Uploading url_weirdities.csv…

The earliest thread with the problem is 278, and the biggest id is 209425.

`cat url_weirdities.csv | grep -Eo 'https://forum\.kerbalspaceprogram\.com/index\.php\?/topic/([0-9]+)-' | sed -E 's#^https://forum\.kerbalspaceprogram\.com/index\.php\?/topic/([0-9]+)-$#\1#' | sort -n | uniq`
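The same extraction can be sketched in Python, in case it ends up inside the report tooling. This is only a sketch: the function name `thread_ids` is made up for illustration, and I am assuming nothing about the CSV columns beyond "topic URLs appear somewhere on the line" and "lines starting with # are comments".

```python
import re
from typing import Iterable, List

# Same pattern as the grep above: a topic URL followed by the numeric thread id.
TOPIC_RE = re.compile(
    r"https://forum\.kerbalspaceprogram\.com/index\.php\?/topic/([0-9]+)-"
)


def thread_ids(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated thread ids found in the given lines."""
    ids = set()
    for line in lines:
        # Comment lines in the report start with '#', skip them.
        if line.startswith("#"):
            continue
        for match in TOPIC_RE.finditer(line):
            ids.add(int(match.group(1)))
    return sorted(ids)
```

Something like `thread_ids(open("url_weirdities.csv"))` should then give the list, with the first and last elements being the lowest and highest affected thread ids.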

Fixing the problem in the WARC files is out of the question (the records need to stay exactly as I fetched them), so we need to find a way to work around these problems.

A filter on the playback machine that detects and fixes these will do, but then we will need a cache to keep the thing responsive: Python is not exactly the fastest cookie in the jar.
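A rough idea of what such a filter could look like, as a sketch only. It is not tied to any playback software, `fix_url` and `PLACEHOLDER_RE` are names I made up, and it assumes that the right fix is simply to drop the placeholder segment (that is, that the base URL was meant to be the site root). The cache is just the standard library `functools.lru_cache`, with an arbitrary size.

```python
import re
from functools import lru_cache

# Matches the unexpanded placeholder plus its trailing slash, both
# percent-encoded (%7B...%7D, any letter case) and raw ({...}).
PLACEHOLDER_RE = re.compile(
    r"(?:%7B|\{)___base_url___(?:%7D|\})/", re.IGNORECASE
)


@lru_cache(maxsize=65536)  # assumption: the size is arbitrary, tune as needed
def fix_url(url: str) -> str:
    """Return the URL with the broken placeholder segment removed.

    URLs without the placeholder are returned unchanged. Results are
    memoized, so a repeated URL costs a dictionary lookup, not a regex pass.
    """
    return PLACEHOLDER_RE.sub("", url)
```

With this, `fix_url("https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/42312-alexsheff/")` would give back the plain `index.php?/profile/42312-alexsheff/` URL on the forum root, and the WARC content itself stays untouched because the rewrite only happens at playback time.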
