Decoding of response header "location" is incorrect #1750

Open
@bblack

Unfortunately I'm using this library from JRuby, so either that layer will turn out to be responsible for my issue, or it will add a bit of difficulty in tracking things down.

Problem: I'm making a request. The response is a 301 with a Location header that contains non-Latin characters (Devanagari, judging by the output below). Something (either AHC, some library it uses, or something in the JRuby-to-Java translation) is returning the header as an improperly decoded string.

If I call setFollowRedirect(false) on the RequestBuilder instance, I can see the improperly decoded Location header:

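require 'java'

java_import 'org.asynchttpclient.AsyncCompletionHandler'
java_import 'org.asynchttpclient.DefaultAsyncHttpClient'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig$Builder'
# RequestBuilder is used below, so it needs to be imported as well
java_import 'org.asynchttpclient.RequestBuilder'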

class Handler < AsyncCompletionHandler
  def initialize(options={})
    super()

    @on_throwable = options[:on_throwable]
    @on_headers_received = options[:on_headers_received]
    @on_completed = options[:on_completed]
  end

  def onThrowable(exception)
    @on_throwable.call(exception)
  end

  def onHeadersReceived(headers)
    @on_headers_received.call(headers)
  end

  def onCompleted(netty_response)
    @on_completed.call(netty_response)
  end
end

method = 'GET'
url = 'https://www.facebook.com/100005359001746'
request_builder = RequestBuilder.new(method).
  setUrl(url).
  setFollowRedirect(false)
request = request_builder.build
handler = Handler.new(
  {
    on_throwable: -> (e) { puts "throwable: #{e}" },
    on_headers_received: -> (h) {
      puts "headers_received: #{h}"
      location = h.get('Location')
      puts "Location: #{location} (encoding: #{location.encoding})"
    },
    on_completed: -> (res) { puts "completed: #{res.inspect}" }
  }
)
builder = DefaultAsyncHttpClientConfig::Builder.new()
client = DefaultAsyncHttpClient.new(builder.build())

client.executeRequest(request, handler)

which outputs:

headers_received: DefaultHttpHeaders[Location: https://www.facebook.com/people/प�रव�श-��मार-सनातन�/100005359001746, Strict-Transport-Security: max-age=15552000; preload, Content-Type: text/html; charset="utf-8", X-FB-Debug: o8uQCGiu/Fq/ixVADK5b7YmL4laDuTKy4hT8CkC5SEc0UE5/shmYrFIsoySCMppLvJfrEtOPbjpDTOziteOyJw==, Date: Tue, 01 Dec 2020 17:15:08 GMT, Alt-Svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600, Connection: keep-alive, Content-Length: 0]
Location: https://www.facebook.com/people/प�रव�श-��मार-सनातन�/100005359001746 (encoding: UTF-8)
completed: #<Java::OrgAsynchttpclientNetty::NettyResponse:0x10cbb284>

If instead I call setFollowRedirect(true) on the RequestBuilder, onThrowable is called with a MaxRedirectException. For this URL, when the server receives the request for the badly decoded URL, it responds with a 301 redirect to the properly encoded URL. The client then misinterprets the encoding in the same way, producing a loop of "request wrong URL; get redirected to right URL; misinterpret as wrong URL" that ends only when the redirect limit is reached.
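To illustrate the kind of mis-decoding that could cause this, here is a minimal plain-Ruby sketch. It rests on an unverified assumption: that the header's UTF-8 bytes were decoded one byte per character, ISO-8859-1 style. The helper name redecode_latin1_as_utf8 is hypothetical and not part of AsyncHttpClient. If the string already contains U+FFFD replacement characters, as the output above appears to, the original bytes are gone and the helper returns the value unchanged.

```ruby
# Hypothetical helper, not part of AsyncHttpClient.
# Assumption: `value` came from decoding UTF-8 bytes one byte per character
# (ISO-8859-1 style), so every character maps back to exactly one byte.
def redecode_latin1_as_utf8(value)
  bytes = value.encode('ISO-8859-1')    # chars U+0000..U+00FF -> original bytes
  utf8 = bytes.force_encoding('UTF-8')  # reinterpret those bytes as UTF-8
  utf8.valid_encoding? ? utf8 : value
rescue Encoding::UndefinedConversionError
  # The value has characters outside Latin-1 (for example U+FFFD),
  # so the assumption does not hold; return it unchanged.
  value
end

# Simulate the suspected mis-decoding using the URL from the curl output.
original = 'https://www.facebook.com/people/अभिषेक-सिंह/100016673803484'
misdecoded = original.b.force_encoding('ISO-8859-1').encode('UTF-8')
repaired = redecode_latin1_as_utf8(misdecoded)
puts repaired == original
```

In the setup from this issue, such a helper would only be usable with setFollowRedirect(false), following the redirect manually with the repaired URL.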

I can use curl to inspect the behavior of the server in fulfilling this request, to see that the redirect is taken successfully there:

$ curl -I -L --include "https://www.facebook.com/100016673803484"
HTTP/2 301
location: https://www.facebook.com/people/अभिषेक-सिंह/100016673803484
strict-transport-security: max-age=15552000; preload
content-type: text/html; charset="utf-8"
x-fb-debug: tcsOzDNjy2Y/SoHPvIsDfI1eEumptXWlOXlFHbxspemPIl+0B7YS7hzHg02uq+/+fFuVCNpIX6t/QGOmUprDxw==
content-length: 0
date: Tue, 01 Dec 2020 17:04:52 GMT
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600

HTTP/2 200

And likewise in Chrome.

I'm open to the possibility that the server at www.facebook.com isn't respecting some restriction on charsets allowed for the Location header (I'm not familiar with those RFCs), in which case:

  • it would be nice to have this library decode the header in a way that curl and my browser lead me to expect
  • it would be nice to have this library allow me to specify some option to decode the header in a way that curl and my browser lead me to expect
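On the RFC question: URIs as defined by RFC 3986 are restricted to ASCII, with any other bytes expected to be percent-encoded, so a raw UTF-8 Location value is arguably outside what a client is required to handle. A tolerant client could percent-encode the non-ASCII bytes before following the redirect. The sketch below shows that step in plain Ruby. The helper name percent_encode_non_ascii is hypothetical, and it assumes the raw header bytes are still available, which is not the case once they have been mis-decoded.

```ruby
# Hypothetical helper: percent-encode every non-ASCII byte of a URL string,
# leaving ASCII characters (including existing %XX escapes) untouched.
def percent_encode_non_ascii(url)
  url.b.gsub(/[^\x00-\x7F]/n) { |byte| format('%%%02X', byte.ord) }
     .force_encoding('US-ASCII')
end

puts percent_encode_non_ascii('https://example.com/people/é/1')
# prints https://example.com/people/%C3%A9/1
```

The example.com URL is a placeholder. The same function applied to the Location value from the curl output would yield an ASCII-only URL that any HTTP client can request without further decoding decisions.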
