I found these two URLs on my "ALL" report this month (which doesn't mean they weren't there before; I just noticed them today):
https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/128696-killashley/
https://forum.kerbalspaceprogram.com/%7B___base_url___%7D/index.php?/profile/42312-alexsheff/
Note the `%7B___base_url___%7D` substring, which URL-decodes to `{___base_url___}`. It is almost surely caused by a missing `$` after the opening curly brace.
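A quick way to confirm the decoding, and to flag affected URLs programmatically, is a minimal Python sketch using the standard library's `urllib.parse.unquote`. The helper name `has_unexpanded_base_url` is hypothetical and not part of any existing tool:

```python
from urllib.parse import unquote

# The literal placeholder left behind by the forum template (as seen in the URLs above).
PLACEHOLDER = "{___base_url___}"


def has_unexpanded_base_url(url: str) -> bool:
    """Return True if the percent-decoded URL still contains the raw placeholder."""
    return PLACEHOLDER in unquote(url)


url = ("https://forum.kerbalspaceprogram.com/%7B___base_url___%7D"
       "/index.php?/profile/128696-killashley/")
print(unquote(url))
# https://forum.kerbalspaceprogram.com/{___base_url___}/index.php?/profile/128696-killashley/
print(has_unexpanded_base_url(url))  # True
```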
Curious about the issue, and knowing that this kind of issue reproduces like rabbits :P, I coded a quick report of all the occurrences in the current (and WIP) WARCs, and boy, did I find a lot: url_weirdities.csv (the file is in CSV format; ignore any line starting with #).
The earliest thread with the problem is 278, and the highest thread ID is 209425. The affected thread IDs can be listed with:
`grep -Eo 'https://forum\.kerbalspaceprogram\.com/index\.php\?/topic/[0-9]+-' url_weirdities.csv | sed -E 's#^.*/topic/([0-9]+)-$#\1#' | sort -n | uniq`
Fixing the problem in the WARC files is out of the question (they need to stay exactly as I fetched them), so we need to find a way to work around these problems.
A filter on the playback machine that detects and fixes these URLs will do, but we will need a cache to keep it responsive: Python is not exactly the fastest cookie in the jar.
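A minimal sketch of such a filter, assuming (not verified against the archive) that the correct URL is simply the broken one with the placeholder path segment removed, so `.../%7B___base_url___%7D/index.php?/profile/...` becomes `.../index.php?/profile/...`. Here `functools.lru_cache` stands in for whatever cache the playback machine ends up using; `fix_url` and the cache size are hypothetical:

```python
import functools
import re

# Assumption: the placeholder shows up as one path segment, either percent-encoded
# (%7B...%7D, any hex case) or raw ({...}), and is always followed by another "/".
_PLACEHOLDER_SEGMENT = re.compile(
    r"/(?:%7B|\{)___base_url___(?:%7D|\})(?=/)", re.IGNORECASE
)


@functools.lru_cache(maxsize=65536)  # hypothetical size; tune to the archive
def fix_url(url: str) -> str:
    """Drop every unexpanded base_url segment; repeated URLs are served from the cache."""
    return _PLACEHOLDER_SEGMENT.sub("", url)


broken = ("https://forum.kerbalspaceprogram.com/%7B___base_url___%7D"
          "/index.php?/profile/42312-alexsheff/")
print(fix_url(broken))
# https://forum.kerbalspaceprogram.com/index.php?/profile/42312-alexsheff/
fix_url(broken)              # second call is answered from the cache
print(fix_url.cache_info())  # hits=1, misses=1
```

The cache only pays off if the same URLs are requested repeatedly during playback; the regex itself is cheap, so if profiling shows the real cost is elsewhere (e.g. index lookups), the same memoization idea belongs there instead.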