Down is a utility tool for streaming, flexible and safe downloading of remote
files. It can use open-uri + Net::HTTP, http.rb or HTTPX as the backend
HTTP library.
gem "down", "~> 5.0"The primary method is Down.download, which downloads the remote file into a
Tempfile:
require "down"
tempfile = Down.download("http://example.com/nature.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>The returned Tempfile has some additional attributes extracted from the
response data:
tempfile.content_type #=> "text/plain"
tempfile.original_filename #=> "document.txt"
tempfile.charset #=> "utf-8"When you're accepting URLs from an outside source, it's a good idea to limit
the filesize (because attackers want to give a lot of work to your servers).
Down allows you to pass a :max_size option:
Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
# Down::TooLarge: file is too large (max is 5MB)What is the advantage over simply checking size after downloading? Well, Down
terminates the download very early, as soon as it gets the Content-Length
header. And if the Content-Length header is missing, Down will terminate the
download as soon as the downloaded content surpasses the maximum size.
By default the remote file will be downloaded into a temporary location and
returned as a Tempfile. If you would like the file to be downloaded to a
specific location on disk, you can specify the :destination option:
Down.download("http://example.com/image.jpg", destination: "/path/to/destination")
#=> nilIn this case Down.download won't have any return value, so if you need a File
object you'll have to create it manually.
You can also keep the tempfile, but override the extension:
tempfile = Down.download("http://example.com/some/file", extension: "txt")
File.extname(tempfile.path) #=> ".txt"You can also override the default tempfile prefix:
tempfile = Down.download("http://example.com/image.jpg", tempfile_name: "custom-prefix")
File.basename(tempfile.path) #=> "custom-prefix20150925-55456-z7vxqz.jpg"Down.download and Down.open will automatically detect and apply HTTP basic
authentication from the URL:
Down.download("http://user:[email protected]")
Down.open("http://user:[email protected]")Down.download supports :content_length_proc, which gets called with the
value of the Content-Length header as soon as it's received, and
:progress_proc, which gets called with current filesize whenever a new chunk
is downloaded.
Down.download "http://example.com/movie.mp4",
content_length_proc: -> (content_length) { ... },
progress_proc: -> (progress) { ... }Down has the ability to retrieve content of the remote file as it is being
downloaded. The Down.open method returns a Down::ChunkedIO object which
represents the remote file on the given URL. When you read from it, Down
internally downloads chunks of the remote file, but only how much is needed.
remote_file = Down.open("http://example.com/image.jpg")
remote_file.size # read from the "Content-Length" header
remote_file.read(1024) # downloads and returns first 1 KB
remote_file.read(1024) # downloads and returns next 1 KB
remote_file.eof? #=> false
remote_file.read # downloads and returns the rest of the file content
remote_file.eof? #=> true
remote_file.close # closes the HTTP connection and deletes the internal TempfileThe following IO methods are implemented:
#read&#readpartial#gets#seek#pos&#tell#eof?#rewind#close
By default the downloaded content is internally cached into a Tempfile, so
that when you rewind the Down::ChunkedIO, it continues reading the cached
content that it had already retrieved.
remote_file = Down.open("http://example.com/image.jpg")
remote_file.read(1*1024*1024) # downloads, caches, and returns first 1MB
remote_file.rewind
remote_file.read(1*1024*1024) # reads the cached content
remote_file.read(1*1024*1024) # downloads the next 1MBIf you want to save on IO calls and on disk usage, and don't need to be able to
rewind the Down::ChunkedIO, you can disable caching downloaded content:
Down.open("http://example.com/image.jpg", rewindable: false)You can also yield chunks directly as they're downloaded via #each_chunk, in
which case the downloaded content is not cached into a file regardless of the
:rewindable option.
remote_file = Down.open("http://example.com/image.jpg")
remote_file.each_chunk { |chunk| ... }
remote_file.closeYou can access the response status and headers of the HTTP request that was made:
remote_file = Down.open("http://example.com/image.jpg")
remote_file.data[:status] #=> 200
remote_file.data[:headers] #=> { "Content-Type" => "image/jpeg", ... } (header names are normalized)
remote_file.data[:response] # returns the response objectNote that a Down::ResponseError exception will automatically be raised if
response status was 4xx or 5xx.
The Down.open performs HTTP logic and returns an instance of
Down::ChunkedIO. However, Down::ChunkedIO is a generic class that can wrap
any kind of streaming. It accepts an Enumerator that yields chunks of
content, and provides IO-like interface over that enumerator, calling it
whenever more content is needed.
require "down/chunked_io"
Down::ChunkedIO.new(...):chunks–Enumeratorthat yields chunks of content:size– size of the file if it's known (returned by#size):on_close– called when streaming finishes or IO is closed:data- custom data that you want to store (returned by#data):rewindable- whether to cache retrieved data into a file (defaults totrue):encoding- force content to be returned in specified encoding (defaults toEncoding::BINARY)
Here is an example of creating a streaming IO of a MongoDB GridFS file:
require "down/chunked_io"
mongo = Mongo::Client.new(...)
bucket = mongo.database.fs
content_length = bucket.find(_id: id).first[:length]
stream = bucket.open_download_stream(id)
io = Down::ChunkedIO.new(
size: content_length,
chunks: stream.enum_for(:each),
on_close: -> { stream.close },
)Down tries to recognize various types of exceptions and re-raise them as one of
the Down::Error subclasses. This is Down's exception hierarchy:
Down::ErrorDown::TooLargeDown::InvalidUrlDown::TooManyRedirectsDown::NotModifiedDown::ResponseErrorDown::ClientErrorDown::NotFound
Down::ServerError
Down::ConnectionErrorDown::TimeoutErrorDown::SSLError
The following backends are available:
- Down::NetHttp (default)
- Down::Http
- Down::Httpx
You can use the backend directly:
require "down/net_http"
Down::NetHttp.download("...")
Down::NetHttp.open("...")Or you can set the backend globally (default is :net_http):
require "down"
Down.backend :http # use the Down::Http backend
Down.download("...")
Down.open("...")The Down::NetHttp backend implements downloads using open-uri and
Net::HTTP standard libraries.
gem "down", "~> 5.0"require "down/net_http"
tempfile = Down::NetHttp.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
io = Down::NetHttp.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>Down::NetHttp.download is implemented as a wrapper around open-uri, and fixes
some of open-uri's undesired behaviours:
- uses
URI::HTTP#openorURI::HTTPS#opendirectly for security - always returns a
Tempfileobject, whereas open-uri returnsStringIOwhen file is smaller than 10KB - gives the extension to the
Tempfileobject from the URL - allows you to limit maximum number of redirects
On the other hand Down::NetHttp.open is implemented using Net::HTTP directly,
as open-uri doesn't support downloading on-demand.
Down::NetHttp#download turns off open-uri's following redirects, as open-uri
doesn't have a way to limit the maximum number of hops, and implements its own.
By default maximum of 2 redirects will be followed, but you can change it via
the :max_redirects option:
Down::NetHttp.download("http://example.com/image.jpg") # 2 redirects allowed
Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
Down::NetHttp.open("http://example.com/image.jpg") # 2 redirects allowed
Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowedAn HTTP proxy can be specified via the :proxy option:
Down::NetHttp.download("http://example.com/image.jpg", proxy: "http://proxy.org")
Down::NetHttp.open("http://example.com/image.jpg", proxy: "http://user:[email protected]")Timeouts can be configured via the :open_timeout and :read_timeout options:
Down::NetHttp.download("http://example.com/image.jpg", open_timeout: 5)
Down::NetHttp.open("http://example.com/image.jpg", read_timeout: 10)Request headers can be added via the :headers option:
Down::NetHttp.download("http://example.com/image.jpg", headers: { "Header" => "Value" })
Down::NetHttp.open("http://example.com/image.jpg", headers: { "Header" => "Value" })The :ssl_ca_cert and :ssl_verify_mode options are supported, and they have
the same semantics as in open-uri:
Down::NetHttp.open("http://example.com/image.jpg",
ssl_ca_cert: "/path/to/cert",
ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER)If the URL isn't parseable by URI.parse, Down::NetHttp will
attempt to normalize the URL using Addressable::URI, URI-escaping
any potentially unescaped characters. You can change the normalizer
via the :uri_normalizer option:
# this skips URL normalization
Down::NetHttp.download("http://example.com/image.jpg", uri_normalizer: -> (url) { url })Any additional options passed to Down.download will be forwarded to
open-uri, so you can for example add basic authentication or a timeout:
Down::NetHttp.download "http://example.com/image.jpg",
http_basic_authentication: ['john', 'secret'],
read_timeout: 5You can also initialize the backend with default options:
net_http = Down::NetHttp.new(open_timeout: 3)
net_http.download("http://example.com/image.jpg")
net_http.open("http://example.com/image.jpg")The Down::Http backend implements downloads using the http.rb gem.
gem "down", "~> 5.0"
gem "http", "~> 5.0"require "down/http"
tempfile = Down::Http.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
io = Down::Http.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>Some features that give the http.rb backend an advantage over open-uri and
Net::HTTP include:
- Low memory usage (10x less than
open-uri/Net::HTTP) - Proper SSL support
- Support for persistent connections
- Global timeouts (limiting how long the whole request can take)
- Chainable builder API for setting default options
All additional options will be forwarded to HTTP::Client#request:
Down::Http.download("http://example.org/image.jpg", headers: { "Foo" => "Bar" })
Down::Http.open("http://example.org/image.jpg", follow: { max_hops: 0 })However, it's recommended to configure request options using http.rb's chainable API, as it's more convenient than passing raw options.
Down::Http.open("http://example.org/image.jpg") do |client|
client.timeout(connect: 3, read: 3)
endYou can also initialize the backend with default options:
http = Down::Http.new(headers: { "Foo" => "Bar" })
# or
http = Down::Http.new { |client| client.timeout(connect: 3) }
http.download("http://example.com/image.jpg")
http.open("http://example.com/image.jpg")By default Down::Http makes a GET request to the specified endpoint, but you
can specify a different request method using the :method option:
Down::Http.download("http://example.org/image.jpg", method: :post)
Down::Http.open("http://example.org/image.jpg", method: :post)
down = Down::Http.new(method: :post)
down.download("http://example.org/image.jpg")The Down::Httpx backend implements downloads using the HTTPX gem, which
supports the HTTP/2 protocol, in addition to many other features.
gem "down", "~> 5.0"
gem "httpx", "~> 1.0"require "down/httpx"
tempfile = Down::Httpx.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
io = Down::Httpx.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>It's implemented in much of the same way as Down::Http, so be sure to check
its docs for ways to pass additional options.
Tests require that a httpbin server is running locally, which you can do via Docker:
$ docker pull kennethreitz/httpbin
$ docker run -p 80:80 kennethreitz/httpbinThen you can run tests:
$ bundle exec rake test