Gerardo's solution #314

Open: wants to merge 1 commit into base: master
5 changes: 5 additions & 0 deletions Gemfile
@@ -0,0 +1,5 @@
source "https://rubygems.org"

gem 'rspec'
gem 'nokogiri'
gem 'ferrum'
47 changes: 47 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,47 @@
GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.8.7)
      public_suffix (>= 2.0.2, < 7.0)
    base64 (0.2.0)
    concurrent-ruby (1.3.5)
    diff-lcs (1.6.1)
    ferrum (0.16)
      addressable (~> 2.5)
      base64 (~> 0.2)
      concurrent-ruby (~> 1.1)
      webrick (~> 1.7)
      websocket-driver (~> 0.7)
    nokogiri (1.18.7-x86_64-linux-gnu)
      racc (~> 1.4)
    public_suffix (6.0.1)
    racc (1.8.1)
    rspec (3.13.0)
      rspec-core (~> 3.13.0)
      rspec-expectations (~> 3.13.0)
      rspec-mocks (~> 3.13.0)
    rspec-core (3.13.3)
      rspec-support (~> 3.13.0)
    rspec-expectations (3.13.3)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.13.0)
    rspec-mocks (3.13.2)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.13.0)
    rspec-support (3.13.2)
    webrick (1.9.1)
    websocket-driver (0.7.7)
      base64
      websocket-extensions (>= 0.1.0)
    websocket-extensions (0.1.5)

PLATFORMS
  x86_64-linux

DEPENDENCIES
  ferrum
  nokogiri
  rspec

BUNDLED WITH
   2.4.10
44 changes: 44 additions & 0 deletions README.md
@@ -26,3 +26,47 @@ Add also to your array the painting thumbnails present in the result page file (
Test against 2 other similar result pages to make sure it works against different layouts. (Pages that contain the same kind of carousel; they don't necessarily have to be paintings.)

The suggested time for this challenge is 4 hours. But, you can take your time and work more on it if you want.

## Solution

I tried to keep the solution as simple as possible: one implementation file (`scraper.rb`) and one
test file (`test.rb`). I added the Gemfile at the end for convenience.

The instructions only require that it work for the same kind of carousel, which is quite specific,
since other image searches show a different type of carousel.

My focus was to reproduce the same result as the expected-array.json file without making any
HTTP requests. I also want to elaborate a little on the test suite:

- I think every test should have only one `expect`.
- Every test must have a clear goal, with only one scenario.

I think this is best illustrated by the following section:

```ruby
let(:artwork_with_empty_extensions) { result["artworks"].select{|a| a["name"]=="Sunflowers"}.first}
describe 'extensions' do
  it "must be an Array, if present" do
    expect(artwork["extensions"]).to be_an Array
  end

  it "can't contain empty Strings" do
    expect(artwork["extensions"]).not_to include("")
  end

  it "should not be present if value is empty" do
    expect(artwork_with_empty_extensions).not_to include("extensions")
  end
end
```

There was a scenario where the extensions are not set: in the expected results the `extensions` key
is absent for that artwork, so I added a dedicated `let` (`artwork_with_empty_extensions`) to test
that case explicitly.
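
The rule described above (keep `extensions` only when something non-empty remains) can be sketched in
isolation. This is a minimal illustration, not the code in `scraper.rb`: `build_artwork` is a
hypothetical helper that takes already-extracted strings instead of HTML nodes, and the sample values
are made up rather than taken from expected-array.json.

```ruby
# Hypothetical helper for illustration only; scraper.rb works on Nokogiri nodes.
# detail_texts stands in for the text of each "div.KHK6lb > div" node.
def build_artwork(name, detail_texts)
  artwork = { "name" => name }
  # Same rule as the scraper: skip the first entry, then drop blank strings.
  extensions = detail_texts.drop(1).reject(&:empty?)
  # Only set the key when there is something to report.
  artwork["extensions"] = extensions unless extensions.empty?
  artwork
end

# Illustrative sample values (assumptions, not fixture data).
with_extensions = build_artwork("The Starry Night", ["The Starry Night", "1889"])
without_extensions = build_artwork("Sunflowers", ["Sunflowers", ""])

puts with_extensions.key?("extensions")    # true
puts without_extensions.key?("extensions") # false
```

Leaving the key out, rather than storing an empty Array, is what lets a test assert
`not_to include("extensions")` on the Hash.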

### To test

Just run:
```
bundle install
ruby test.rb
```
32 changes: 32 additions & 0 deletions files/claude-monet-paintings.html

Large diffs are not rendered by default.

383 changes: 383 additions & 0 deletions files/expected-array-1.json

Large diffs are not rendered by default.

311 changes: 311 additions & 0 deletions files/expected-array-2.json

Large diffs are not rendered by default.

32 changes: 32 additions & 0 deletions files/miguel-angel-sculptures.html

Large diffs are not rendered by default.

46 changes: 46 additions & 0 deletions scraper.rb
@@ -0,0 +1,46 @@
require 'nokogiri'
require 'ferrum'

class Scraper
  HOST = "https://www.google.com"
  DEFAULT_FILE = "./files/van-gogh-paintings.html"

  def initialize(file_path = DEFAULT_FILE)
    @file_path = file_path
  end

  # Loads the saved results page in a headless browser (Ferrum), parses the
  # rendered HTML with Nokogiri and returns { "artworks" => [...] }.
  def perform
    result = { "artworks" => [] }
    browser = Ferrum::Browser.new
    begin
      # On POSIX systems expand_path already starts with "/", so prefixing
      # "file://" yields the usual file:///abs/path form.
      browser.goto("file://#{File.expand_path(@file_path)}")
      browser.network.wait_for_idle
      document = Nokogiri::HTML(browser.body)
      document.search("div.Cz5hV > div").each do |image_item|
        result["artworks"] << process_section(image_item)
      end
    ensure
      # Always shut the browser down, even if loading or parsing raises.
      browser.quit
    end
    result
  end

  private

  # Builds one artwork Hash from a single carousel item node.
  def process_section(image_item)
    artwork = {}
    artwork["name"] = image_item.search("div.pgNMRc").text
    # The first div is skipped; the remaining non-empty texts are the extensions.
    extensions = image_item.search("div.KHK6lb > div").drop(1).map(&:text).reject(&:empty?)
    # Omit the key entirely when there are no extensions.
    artwork["extensions"] = extensions unless extensions.empty?
    artwork["link"] = "#{HOST}#{image_item.search("a").attribute("href").value}"
    # Prefer data-src and fall back to src.
    image = image_item.search("a > img.taFZJe")
    image_attr = image.attribute("data-src") || image.attribute("src")
    artwork["image"] = image_attr.value
    artwork
  end
end
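
A note on the URL passed to `browser.goto`: `File.expand_path` returns an absolute path, which on
Linux and macOS already begins with `/`. Prefixing it with `file:///` therefore produces four
slashes, while `file://` produces the conventional three. The sketch below shows the difference;
`file_url` is a hypothetical helper, not part of this PR.

```ruby
# Hypothetical helper for illustration; assumes a POSIX-style filesystem
# where absolute paths start with "/".
def file_url(path)
  # "file://" + "/abs/path" => "file:///abs/path"
  "file://#{File.expand_path(path)}"
end

url = file_url("./files/van-gogh-paintings.html")

puts url.start_with?("file:///")  # true on Linux/macOS
puts url.start_with?("file:////") # false
```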
83 changes: 83 additions & 0 deletions test.rb
@@ -0,0 +1,83 @@
require 'rspec/autorun'
require './scraper'
require 'json'

describe Scraper do

  subject { Scraper.new }
  let(:result) { subject.perform }
  let(:another_search) { Scraper.new "./files/claude-monet-paintings.html" }
  let(:mg_search) { Scraper.new "./files/miguel-angel-sculptures.html" }

  it 'should return a Hash' do
    expect(result).to be_a(Hash)
  end

  it 'should contain an artworks Array' do
    expect(result["artworks"]).to be_an(Array)
  end

  describe 'artworks' do
    let(:artwork) { result["artworks"][0] }
    let(:artwork_with_empty_extensions) { result["artworks"].select{|a| a["name"]=="Sunflowers"}.first}

    it "should contain a name" do
      expect(artwork["name"]).to be_a String
    end

    it "should not have an empty name" do
      expect(artwork["name"]).to_not be_empty
    end

    it "should contain link" do
      expect(artwork["link"]).to be_a String
    end

    it "should not have an empty link" do
      expect(artwork["link"]).to_not be_empty
    end

    it "should contain image" do
      expect(artwork["image"]).to be_a String
    end

    it "should not contain an empty image" do
      expect(artwork["image"]).to_not be_empty
    end

    describe 'extensions' do
      it "must be an Array, if present" do
        expect(artwork["extensions"]).to be_an Array
      end

      it "can't contain empty Strings" do
        expect(artwork["extensions"]).not_to include("")
      end

      it "should not be present if value is empty" do
        expect(artwork_with_empty_extensions).not_to include("extensions")
      end
    end
  end

  it 'should be equal to example file' do
    # File.read opens, reads and closes the file in one call
    file_data = File.read("./files/expected-array.json")
    content = JSON.parse(file_data)
    expect(result).to eq(content)
  end

  it 'should work with other image search pages' do
    another_result = another_search.perform
    file_data = File.read("./files/expected-array-1.json")
    content = JSON.parse(file_data)
    expect(another_result).to eq(content)
  end

  it 'should work with miguel angel too :)' do
    mg_result = mg_search.perform
    file_data = File.read("./files/expected-array-2.json")
    content = JSON.parse(file_data)
    expect(mg_result).to eq(content)
  end

end