Description
Unfortunately I'm using this library from JRuby, so either that layer will turn out to be responsible for my issue, or it will add a bit of difficulty in tracking things down.
Problem: I'm making a request whose response is a 301 with a Location header containing non-Latin characters (in the examples below, Devanagari). Something (AHC, a library it uses, or the JRuby/Java string conversion) is returning that header as an improperly decoded string.
If I call setFollowRedirect(false) on the RequestBuilder instance, I can see the improperly decoded Location header:
```ruby
require 'java'
# The async-http-client jar and its dependencies are assumed to be on the classpath.

java_import 'org.asynchttpclient.AsyncCompletionHandler'
java_import 'org.asynchttpclient.AsyncHandler'
java_import 'org.asynchttpclient.DefaultAsyncHttpClient'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig$Builder'
java_import 'org.asynchttpclient.RequestBuilder'

# Forwards AHC's callbacks to the Ruby lambdas passed in.
class Handler < AsyncCompletionHandler
  def initialize(options = {})
    super()
    @on_throwable = options[:on_throwable]
    @on_headers_received = options[:on_headers_received]
    @on_completed = options[:on_completed]
  end

  def onThrowable(exception)
    @on_throwable.call(exception)
  end

  def onHeadersReceived(headers)
    @on_headers_received.call(headers)
    AsyncHandler::State::CONTINUE # tell AHC to keep processing the response
  end

  def onCompleted(netty_response)
    @on_completed.call(netty_response)
  end
end

method = 'GET'
url = 'https://www.facebook.com/100005359001746'

# Redirects are not followed, so the raw 301 and its Location header are visible.
request_builder = RequestBuilder.new(method).
  setUrl(url).
  setFollowRedirect(false)
request = request_builder.build

handler = Handler.new(
  {
    on_throwable: ->(e) { puts "throwable: #{e}" },
    on_headers_received: ->(h) {
      puts "headers_received: #{h}"
      location = h.get('Location')
      puts "Location: #{location} (encoding: #{location.encoding})"
    },
    on_completed: ->(res) { puts "completed: #{res.inspect}" }
  }
)

builder = DefaultAsyncHttpClientConfig::Builder.new
client = DefaultAsyncHttpClient.new(builder.build)
client.executeRequest(request, handler)
```
which outputs:

```
headers_received: DefaultHttpHeaders[Location: https://www.facebook.com/people/प�रव�श-��मार-सनातन�/100005359001746, Strict-Transport-Security: max-age=15552000; preload, Content-Type: text/html; charset="utf-8", X-FB-Debug: o8uQCGiu/Fq/ixVADK5b7YmL4laDuTKy4hT8CkC5SEc0UE5/shmYrFIsoySCMppLvJfrEtOPbjpDTOziteOyJw==, Date: Tue, 01 Dec 2020 17:15:08 GMT, Alt-Svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600, Connection: keep-alive, Content-Length: 0]
Location: https://www.facebook.com/people/प�रव�श-��मार-सनातन�/100005359001746 (encoding: UTF-8)
completed: #<Java::OrgAsynchttpclientNetty::NettyResponse:0x10cbb284>
```
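If the cause is what I suspect (not confirmed in AHC or Netty), the header's UTF-8 bytes are being decoded as ISO-8859-1, one character per byte. The pure-Ruby sketch below reproduces that decoding step and shows a possible caller-side repair; decode_as_latin1 and repair_latin1_mojibake are hypothetical helpers of my own, not AHC APIs. Since the printed output above already contains replacement characters, the repair can only work if the string reaching Ruby still holds one character per original byte, which I haven't verified.

```ruby
# Assumption (unverified): the header bytes were decoded as ISO-8859-1
# (one char per byte) and the resulting chars were then stored as UTF-8.

# Hypothetical: simulates that decoding step on raw UTF-8 header bytes.
def decode_as_latin1(raw_bytes)
  raw_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Hypothetical repair: map each char back to the byte it came from, then
# reinterpret those bytes as UTF-8. Returns the input unchanged if it
# contains chars above U+00FF (i.e. it was not Latin-1 mojibake).
def repair_latin1_mojibake(str)
  repaired = str.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  repaired.valid_encoding? ? repaired : str
rescue Encoding::UndefinedConversionError
  str
end

original = "https://www.facebook.com/people/अभिषेक-सिंह/100016673803484"
garbled = decode_as_latin1(original.b)
puts garbled                          # Latin-1 reading of the UTF-8 bytes
puts repair_latin1_mojibake(garbled)  # the original Devanagari URL
```

Inside on_headers_received this would be repair_latin1_mojibake(h.get('Location')). It only fixes the value my callback sees, not the one AHC uses when it follows the redirect itself.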
If I instead call setFollowRedirect(true) on the RequestBuilder, onThrowable is called with a MaxRedirectException. For this URL, when the server receives a request for the mis-decoded URL, it responds with a 301 redirect to the correctly encoded URL; that Location header is misinterpreted in the same way, so the client loops (request the wrong URL, get redirected to the right URL, misread it as the wrong URL again) until it hits the redirect limit.
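The loop can be modeled without any networking. In this sketch, fake_server stands in for the real server (it accepts only the canonical URL and otherwise redirects there with a raw UTF-8 Location), and follow is a hypothetical, simplified redirect follower; neither reflects AHC's actual implementation. Decoding the Location bytes as ISO-8859-1 never reaches the canonical URL, while decoding them as UTF-8 does on the first hop.

```ruby
CANONICAL = "https://www.facebook.com/people/अभिषेक-सिंह/100016673803484".freeze

# Hypothetical server: 200 for the canonical URL, otherwise a 301 whose
# Location header carries the canonical URL as raw UTF-8 bytes.
def fake_server(url)
  return [200, nil] if url == CANONICAL
  [301, CANONICAL.b]
end

# Hypothetical follower: decodes each Location with header_decoder and
# gives up after max_redirects hops, like a MaxRedirectException.
def follow(start_url, max_redirects, header_decoder)
  url = start_url
  max_redirects.times do
    status, location_bytes = fake_server(url)
    return [:ok, url] if status == 200
    url = header_decoder.call(location_bytes)
  end
  [:max_redirects, url]
end

latin1 = ->(bytes) { bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8') }
utf8 = ->(bytes) { bytes.dup.force_encoding('UTF-8') }

start = "https://www.facebook.com/100016673803484"
p follow(start, 5, latin1).first
p follow(start, 5, utf8).first
```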
Using curl to inspect how the server handles this request shows that the redirect is followed successfully there:
```
$ curl -I -L --include "https://www.facebook.com/100016673803484"
HTTP/2 301
location: https://www.facebook.com/people/अभिषेक-सिंह/100016673803484
strict-transport-security: max-age=15552000; preload
content-type: text/html; charset="utf-8"
x-fb-debug: tcsOzDNjy2Y/SoHPvIsDfI1eEumptXWlOXlFHbxspemPIl+0B7YS7hzHg02uq+/+fFuVCNpIX6t/QGOmUprDxw==
content-length: 0
date: Tue, 01 Dec 2020 17:04:52 GMT
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
HTTP/2 200
```
Chrome follows the redirect correctly as well.
I'm open to the possibility that the server at www.facebook.com isn't respecting some restriction on the charsets allowed in the Location header (I'm not familiar with the relevant RFCs), in which case:
- it would be nice if this library decoded the header by default the way curl and my browser lead me to expect, or
- failing that, it would be nice if this library offered an option to decode the header that way
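For context, RFC 7231 (section 7.1.2) defines the Location value as a URI-reference, and RFC 3986 URIs are ASCII-only, so raw UTF-8 there is technically non-conforming. Browsers accept it anyway; the WHATWG URL parser percent-encodes non-ASCII path characters as their UTF-8 bytes. A minimal sketch of that normalization, assuming the header has already been decoded as UTF-8 (percent_encode_non_ascii is a hypothetical helper, not an AHC option):

```ruby
require 'uri'

# Hypothetical helper: percent-encodes every non-ASCII character as its
# UTF-8 bytes and leaves ASCII characters (including '/', '-', '%') alone.
def percent_encode_non_ascii(url)
  url.each_char.map do |char|
    next char if char.ascii_only?
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end.join
end

raw_location = "https://www.facebook.com/people/अभिषेक-सिंह/100016673803484"
encoded = percent_encode_non_ascii(raw_location)
puts encoded
puts URI.parse(encoded).path

# Ruby's own RFC 3986 parser rejects the raw header value.
begin
  URI.parse(raw_location)
rescue URI::InvalidURIError => e
  puts "raw value rejected: #{e.message}"
end
```

Following the encoded form should reach the same resource without the server redirecting again, which is the behavior I'd hope either option above would produce.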