Goldmine

Extract a wealth of information from lists.

Goldmine is especially helpful when working with source data that is difficult to query, such as CSV files and API results.

Uses

Data mining
Data transformation
Data blending
Data visualization prep
CSV report generation

Quick Start

gem install goldmine

require "goldmine"

list = [1,2,3,4,5,6,7,8,9]

Goldmine(list)
  .pivot("< 5") { |i| i < 5 }
  .to_h

{
  [["< 5", true]]  => [1, 2, 3, 4],
  [["< 5", false]] => [5, 6, 7, 8, 9]
}
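
To build intuition for the shape of a pivot result, the same grouping can be approximated in plain Ruby with Enumerable#group_by. This is a conceptual sketch only, not Goldmine's implementation; the `pivoted` variable is a name introduced here for illustration.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Each item is filed under a key that pairs the pivot name with the
# block's return value, mirroring the key shape shown above.
pivoted = list.group_by { |i| [["< 5", i < 5]] }

pivoted.each do |key, items|
  puts "#{key.inspect} has #{items.size} items"
end
```

The key is an array of name/value pairs so that further pivots can append their own pair to it.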

Array Value Pivots

users = [
  { :name => "Sally",   :favorite_colors => [:blue] },
  { :name => "John",    :favorite_colors => [:blue, :green] },
  { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] },
  { :name => "Emily",   :favorite_colors => [:orange, :green] },
  { :name => "Joe",     :favorite_colors => [:red] }
]

Goldmine(users)
  .pivot(:favorite_color) { |record| record[:favorite_colors] }
  .to_h

{
  [:favorite_color, :blue]   => [{:name=>"Sally", :favorite_colors=>[:blue]}, {:name=>"John", :favorite_colors=>[:blue, :green]}],
  [:favorite_color, :green]  => [{:name=>"John", :favorite_colors=>[:blue, :green]}, {:name=>"Emily", :favorite_colors=>[:orange, :green]}],
  [:favorite_color, :red]    => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}, {:name=>"Joe", :favorite_colors=>[:red]}],
  [:favorite_color, :pink]   => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}],
  [:favorite_color, :purple] => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}],
  [:favorite_color, :orange] => [{:name=>"Emily", :favorite_colors=>[:orange, :green]}]
}
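
When the pivot block returns an array, each record appears once per value. A plain-Ruby sketch of that fan-out follows; it assumes nothing about Goldmine's internals, and `by_color` is a hypothetical name used only here.

```ruby
users = [
  { :name => "Sally", :favorite_colors => [:blue] },
  { :name => "John",  :favorite_colors => [:blue, :green] },
  { :name => "Joe",   :favorite_colors => [:red] }
]

# Start every new key with an empty list of records.
by_color = Hash.new { |hash, key| hash[key] = [] }

# File each user under every color they list, so one record can
# land in several groups.
users.each do |user|
  user[:favorite_colors].each do |color|
    by_color[[:favorite_color, color]] << user
  end
end

names_by_color = by_color.transform_values { |group| group.map { |u| u[:name] } }
```

Here John is counted under both blue and green, which is the behavior the output above demonstrates.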

Chained Pivots

users = [
  { :name => "Sally",   :age => 21 },
  { :name => "John",    :age => 28 },
  { :name => "Stephen", :age => 37 },
  { :name => "Emily",   :age => 32 },
  { :name => "Joe",     :age => 18 }
]

Goldmine(users)
  .pivot("'e' in name") { |user| !!user[:name].match(/e/i) }
  .pivot("21 or over") { |user| user[:age] >= 21 }
  .to_h

{
  [["'e' in name", false], ["21 or over", true]]  => [{:name=>"Sally", :age=>21}, {:name=>"John", :age=>28}],
  [["'e' in name", true],  ["21 or over", true]]  => [{:name=>"Stephen", :age=>37}, {:name=>"Emily", :age=>32}],
  [["'e' in name", true],  ["21 or over", false]] => [{:name=>"Joe", :age=>18}]
}
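
Chaining pivots can be pictured as grouping by a composite key with one name/value pair per pivot. The sketch below reproduces the result above in plain Ruby; the `pivots` list of lambdas is an illustrative device, not part of Goldmine's API.

```ruby
users = [
  { :name => "Sally",   :age => 21 },
  { :name => "John",    :age => 28 },
  { :name => "Stephen", :age => 37 },
  { :name => "Emily",   :age => 32 },
  { :name => "Joe",     :age => 18 }
]

# Each pivot is a name plus a callable that classifies a record.
pivots = [
  ["'e' in name", ->(user) { user[:name].match?(/e/i) }],
  ["21 or over",  ->(user) { user[:age] >= 21 }]
]

# The group key collects one [name, value] pair per pivot, in order.
grouped = users.group_by do |user|
  pivots.map { |name, test| [name, test.call(user)] }
end

names = grouped.transform_values { |group| group.map { |u| u[:name] } }
```

Only combinations that actually occur in the data become keys, which is why the result has three entries rather than four.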

Rollups

Rollups provide an intuitive way to aggregate pivoted data into a report friendly format. Think computed columns.

Rollups are blocks that are executed once for each pivot entry. They can also be chained.

list = [1,2,3,4,5,6,7,8,9]

Goldmine(list)
  .pivot("< 5") { |i| i < 5 }
  .pivot("even") { |i| i % 2 == 0 }
  .rollup("count", &:count)
  .to_h

{
  [["< 5", true],  ["even", false]] => [["count", 2]],
  [["< 5", true],  ["even", true]]  => [["count", 2]],
  [["< 5", false], ["even", false]] => [["count", 3]],
  [["< 5", false], ["even", true]]  => [["count", 2]]
}
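
Conceptually, a rollup replaces each pivot entry's list of records with named, computed values. A rough plain-Ruby equivalent of the example above, offered as a sketch rather than as how Goldmine does it:

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Pivot on two conditions at once using a composite key.
pivoted = list.group_by { |i| [["< 5", i < 5], ["even", i.even?]] }

# Roll up: swap each group's items for a [name, value] pair.
rolled_up = pivoted.transform_values { |items| [["count", items.count]] }
```

Each value is an array of pairs so that additional rollups can be appended alongside "count".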

Rollup Caching

Rollups can be computationally expensive. Optional caching stores the result of each rollup so that later rollups can reuse it through cache instead of recomputing it.

list = [1,2,3,4,5,6,7,8,9]

Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .to_h

{
  [[:less_than_5, true]]  => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
}
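
The idea behind the cache can be sketched in plain Ruby: compute rollups in order for each pivot entry and keep the results in a per-entry hash that later rollups may read. The `rollups` table and its two-argument lambdas are assumptions made for this illustration and do not reflect Goldmine's actual interface.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Ordered rollups; each receives the group's items and the cache of
# results computed so far for that group.
rollups = {
  :count           => ->(items, _cache) { items.count },
  :evens           => ->(items, _cache) { items.count(&:even?) },
  :even_percentage => ->(_items, cache) { cache[:evens] / cache[:count].to_f }
}

pivoted = list.group_by { |i| [[:less_than_5, i < 5]] }

rolled_up = pivoted.transform_values do |items|
  cache = {}
  rollups.each { |name, block| cache[name] = block.call(items, cache) }
  cache.to_a
end
```

The last rollup never touches the items, so the even count is computed once per entry rather than twice.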

Rows

It's often helpful to flatten rollups into rows.

list = [1,2,3,4,5,6,7,8,9]

result = Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .result

result.to_rows

[
  [[:less_than_5, true], [:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false], [:count, 5], [:evens, 2], [:even_percentage, 0.4]]
]

result.to_hash_rows

[
  {:less_than_5=>true, :count=>4, :evens=>2, :even_percentage=>0.5},
  {:less_than_5=>false, :count=>5, :evens=>2, :even_percentage=>0.4}
]
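
Both row shapes follow mechanically from the rolled-up hash. As a sketch in plain Ruby (not Goldmine's code), a row is the pivot pairs followed by the rollup pairs, and a hash row is that same list of pairs converted to a Hash:

```ruby
rolled_up = {
  [[:less_than_5, true]]  => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
}

# One row per pivot entry: pivot pairs first, then rollup pairs.
rows = rolled_up.map { |pivot_pairs, rollup_pairs| pivot_pairs + rollup_pairs }

# Each row is an array of [key, value] pairs, so Array#to_h applies directly.
hash_rows = rows.map(&:to_h)
```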

Tabular

Rollups can also be converted into tabular format.

list = [1,2,3,4,5,6,7,8,9]

Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_tabular

[
  [:less_than_5, :even, :count],
  [true, false, 2],
  [true, true, 2],
  [false, false, 3],
  [false, true, 2]
]
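
The tabular layout is a header row followed by one row of values per pivot entry. A minimal plain-Ruby sketch of that conversion, starting from hash rows like those in the Rows section (the `hash_rows` literal here is written out by hand for illustration):

```ruby
hash_rows = [
  { :less_than_5 => true,  :even => false, :count => 2 },
  { :less_than_5 => true,  :even => true,  :count => 2 },
  { :less_than_5 => false, :even => false, :count => 3 },
  { :less_than_5 => false, :even => true,  :count => 2 }
]

# Take the column names from the first row, then pull each row's
# values in that same column order.
header  = hash_rows.first.keys
tabular = [header] + hash_rows.map { |row| row.values_at(*header) }
```

Using values_at with the header keeps columns aligned even if a row's keys were inserted in a different order.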

CSV

Rollups can also be converted into CSV format.

list = [1,2,3,4,5,6,7,8,9]

Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_csv

"less_than_5,even,count\ntrue,false,2\ntrue,true,2\nfalse,false,3\nfalse,true,2\n"
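
Given tabular rows, equivalent CSV text can be produced with Ruby's standard csv library. This sketch shows the relationship between the two formats and is not a description of how to_csv is implemented.

```ruby
require "csv"

tabular = [
  [:less_than_5, :even, :count],
  [true, false, 2],
  [true, true, 2],
  [false, false, 3],
  [false, true, 2]
]

# Convert every field to a string explicitly, then let CSV handle
# separators, quoting, and line endings.
csv = CSV.generate do |out|
  tabular.each { |row| out << row.map(&:to_s) }
end

puts csv
```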

Example Apps

All examples are small Sinatra apps. They are designed to help communicate Goldmine use-cases.

Setup

git clone git@github.com:hopsoft/goldmine.git
cd /path/to/goldmine
bundle

New York Wifi Hotspots

Uses data from https://github.com/hopsoft/goldmine/blob/master/examples/new_york_wifi_hotspots/DOITT_WIFI_HOTSPOT_01_13SEPT2010.csv

In this example, we mine out the following information.

Total hotspots by city, zip, & area code
Free hotspots by city, zip, & area code
Paid hotspots by city, zip, & area code
Library hotspots by city, zip, & area code
Starbucks hotspots by city, zip, & area code
McDonalds hotspots by city, zip, & area code

ruby examples/new_york_wifi_hotspots/app.rb

curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv

Medicare Physician Comparison

Uses data from http://dev.socrata.com/foundry/#/data.medicare.gov/aeay-dfax

In this example, we mine out the following information.

Total doctors by state & specialty
Preferred doctors by state & specialty
Female doctors by state & specialty
Male doctors by state & specialty
Preferred female doctors by state & specialty
Preferred male doctors by state & specialty

ruby examples/medicare_physician_compare/app.rb

curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv

Performance

The Medicare dataset is large & works well for performance testing.

My MacBook Pro yields the following benchmarks.

3.1 GHz Intel Core i7
16 GB 1867 MHz DDR3

100,000 Records

                      user     system      total        real
pivoted           0.630000   0.030000   0.660000 (  0.670409)
rolled_up         0.570000   0.030000   0.600000 (  0.626413)
rows              0.010000   0.000000   0.010000 (  0.003258)
tabular           0.010000   0.000000   0.010000 (  0.010110)
csv               0.050000   0.000000   0.050000 (  0.057677)

1,000,000 Records

                      user     system      total        real
pivoted           7.270000   0.300000   7.570000 (  8.053166)
rolled_up         6.800000   0.830000   7.630000 (  8.051707)
rows              0.000000   0.000000   0.000000 (  0.003934)
tabular           0.010000   0.000000   0.010000 (  0.011825)
csv               0.210000   0.010000   0.220000 (  0.222752)
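
The column layout above (user, system, total, real) matches what Ruby's standard Benchmark.bm prints. A timing harness of that kind might look like the sketch below. It times plain-Ruby stand-ins on synthetic records invented for this example, not Goldmine on the Medicare dataset, so its numbers are not comparable to the tables above.

```ruby
require "benchmark"

# Synthetic records, made up purely to have something to time.
states  = %w[NY CA TX]
records = Array.new(100_000) { |i| { :id => i, :state => states[i % 3] } }

pivoted   = nil
rolled_up = nil

Benchmark.bm(12) do |bm|
  # Stand-in for a pivot: group records by state.
  bm.report("pivoted")   { pivoted = records.group_by { |r| [[:state, r[:state]]] } }
  # Stand-in for a rollup: count the records in each group.
  bm.report("rolled_up") { rolled_up = pivoted.transform_values { |g| [[:count, g.count]] } }
end
```

Swapping the stand-in blocks for real Goldmine calls would give a like-for-like measurement on your own hardware.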

Summary

Goldmine makes data highly malleable. It allows you to combine the power of pivots, rollups, tabular data, & CSV to construct deep insights with minimal effort.

Real world use cases include:

Build a better understanding of database data before canonizing reports in SQL
Create source data for building user interfaces & data visualizations
Transform CSV data from one format to another
