| 1 | +# Bigboy |
| 2 | +Extract data from SQL Server, PostgreSQL, or MySQL, transforming SQL-to-JSON or JSON-to-JSON. |
| 3 | + |
| 4 | +Written by Dave Templin |
| 5 | + |
| 6 | +# Overview |
| 7 | +Bigboy extracts data from SQL Server, PostgreSQL, or MySQL databases and transforms it SQL-to-JSON or JSON-to-JSON, covering the **E** and the **T** of **ETL** *(Extract/Transform/Load)*. The tool provides a simple model for configuring SQL extraction queries and, optionally, JavaScript functions for transformations. A simple but powerful command-line interface (CLI) supports both ad hoc and batch processing scenarios (Bash, cron, etc.). The tool is also designed to make full use of available local compute resources so that massive data volumes can be extracted and transformed in a time-efficient way. |
| 8 | + |
| 9 | +## Features |
| 10 | +* Extract data from SQL Server, PostgreSQL, or MySQL |
| 11 | +* Perform SQL-to-JSON or JSON-to-JSON transformations |
| 12 | +* Nest rows to form complex hierarchical (or document-oriented) data |
| 13 | +* Leverage JavaScript functions to perform arbitrarily complex data transformations |
| 14 | +* Define command-line parameters to create dynamic queries and scripts |
| 15 | +* Combine data from multiple different database sources |
| 16 | +* Apply timezone to dates stored without a timezone |
| 17 | +* Configure the tool to maximize local compute resources and minimize processing time |
| 18 | + |
| 19 | +## Quickstart |
| 20 | + |
| 21 | + |
| 22 | + |
| 23 | +# Concepts |
| 24 | + |
| 25 | +## Connections |
| 26 | + |
| 27 | +## Targets |
| 28 | + |
| 29 | +## Fetching and Prefetching |
| 30 | +fetch, prefetch |
| 31 | + |
| 32 | +## Transforms |
| 33 | +nest, script, split, timezone |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | +# Reference |
| 38 | + |
| 39 | +## Command Arguments |
| 40 | + |
| 41 | +* `-e` Maximum overall number of errors before aborting *(default=100)* |
| 42 | +* `-n` Include nulls in output *(default=false)* |
| 43 | +* `-o` Output directory *(creates "out" directory if not specified)* |
| 44 | +* `-p` Number of rows per page extracted *(default=1000)* |
| 45 | +* `-q` Suppress informational output *(default=false)* |
| 46 | +* `-r` Number of consecutive errors before aborting *(default=3)* |
| 47 | +* `-v` Print version info about bigboy and exit |
| 48 | +* `-w` Number of background workers *(default=4)* |
| 49 | + |
| 50 | +> Above defaults can also be configured in the `config.json` file. |
| 51 | + |
| 52 | +## config.json |
| 53 | +This section describes the `config.json` file format. |
| 54 | + |
| 55 | +| Name | Description | |
| 56 | +| --- | --- | |
| 57 | +| `connections` | ... | |
| 58 | +| `errors` | ... | |
| 59 | +| `nulls` | ... | |
| 60 | +| `page` | ... | |
| 61 | +| `quiet` | ... | |
| 62 | +| `retries` | ... | |
| 63 | +| `workers` | ... | |
| 64 | + |
| 65 | +### connections |
| 66 | +| Name | Description | |
| 67 | +| --- | --- | |
| 68 | +| `driver` | ... | |
| 69 | +| `server` | ... | |
| 70 | +| `database` | ... | |
| 71 | +| `dsn` | ... | |
| 72 | +| `port` | ... | |
| 73 | +| `user` | ... | |
| 74 | +| `password` | ... | |
| 75 | +| `max` | ... | |
| 76 | +| `timezone` | ... | |
| 77 | + |
| 78 | + |
| 79 | +## target.json |
| 80 | +This section describes the `target.json` file format. |
| 81 | + |
| 82 | +| Name | Description | |
| 83 | +| --- | --- | |
| 84 | +| `connection` | ... | |
| 85 | +| `fetch` | ... | |
| 86 | +| `params` | ... | |
| 87 | +| `prefetch` | ... | |
| 88 | +| `nest` | ... | |
| 89 | +| `script` | ... | |
| 90 | +| `split` | ... | |
| 91 | +| `timezone` | ... | |
| 92 | + |
| 93 | +### nest |
| 94 | +| Name | Description | |
| 95 | +| --- | --- | |
| 96 | +| `connection` | ... | |
| 97 | +| `childKey` | ... | |
| 98 | +| `parentKey` | ... | |
| 99 | +| `fetch` | ... | |
| 100 | +| `timezone` | ... | |
| 101 | + |
| 102 | +### param |
| 103 | +| Name | Description | |
| 104 | +| --- | --- | |
| 105 | +| `name` | ... | |
| 106 | +| `type` | ... | |
| 107 | +| `default` | ... | |
| 108 | + |
| 109 | +### split |
| 110 | +| Name | Description | |
| 111 | +| --- | --- | |
| 112 | +| `by` | ... | |
| 113 | +| `value` | ... | |
| 114 | + |
| 115 | + |
| 116 | +## Date Format |
| 117 | + |
| 118 | +All dates are assumed to be in GMT unless a timezone is specified. |
| 119 | +If a time is not specified then midnight GMT is assumed. |
| 120 | +Examples below illustrate various scenarios of specifying a date or date-range. |
| 121 | + |
| 122 | +The following examples assume there is a target named `log` with a single parameter of type `date` representing a start date for the extraction. |
| 123 | + |
| 124 | +| Example | Comments |
| 125 | +| ------------------------------------- | ------------------------------------------------------- | |
| 126 | +| `bigboy log 2017-07-21` | 7/21/2017 at midnight GMT |
| 127 | +| `bigboy log "2017-07-21 15:00:00"` | 7/21/2017 at 3pm GMT |
| 128 | +| `bigboy log today` | Midnight GMT of the current day |
| 129 | +| `bigboy log yesterday` | Midnight GMT of the previous day |
| 130 | + |
| 131 | +The following examples assume there is a target named `sales` with two parameters of type `date` representing a date range for the extraction. |
| 132 | + |
| 133 | +| Example | Comments |
| 134 | +| ------------------------------------- | ------------------------------------------------------- | |
| 135 | +| `bigboy sales 2017-07-21 2017-07-23` | From 7/21/2017 to 7/23/2017 midnight-to-midnight GMT |
| 136 | +| `bigboy sales 2017-07-21 2d` | From 7/21/2017 at midnight GMT, spanning 2 days |
| 137 | + |
| 138 | + |
| 139 | +> The time zone database needed by Go's `time.LoadLocation` may not be present on all systems, especially non-Unix systems. `LoadLocation` looks in the directory or uncompressed zip file named by the `ZONEINFO` environment variable, if any, then looks in known installation locations on Unix systems, and finally looks in `$GOROOT/lib/time/zoneinfo.zip`. |
| 140 | + |
| 141 | + |
| 142 | + |
| 143 | +# Build |
| 144 | + |
| 145 | +Install [golang](https://golang.org/dl/) |
| 146 | + |
| 147 | +``` |
| 148 | +$ go get github.com/denisenkom/go-mssqldb |
| 149 | +$ go get github.com/lib/pq |
| 150 | +$ go get github.com/go-sql-driver/mysql |
| 151 | +$ git clone https://github.com/davetemplin/bigboy.git |
| 152 | +$ cd bigboy && go build |
| 153 | +``` |
| 154 | + |
| 155 | +## Cross compile |
| 156 | +``` |
| 157 | +$ build windows |
| 158 | +$ build linux |
| 159 | +$ build mac |
| 160 | +``` |
| 161 | + |
| 162 | + |
| 163 | + |
| 164 | +# References |
| 165 | + |
| 166 | +There are lots of ways to approach ETL, and lots of vendors that want to sell you a solution! |
| 167 | +Here are some additional references that may be helpful... |
| 168 | + |
| 169 | +* [Wikipedia article on ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) |
| 170 | +* [Performing ETL from a Relational Database into BigQuery](https://cloud.google.com/solutions/performing-etl-from-relational-database-into-bigquery) |
| 171 | +* [ETL Software: Top 63](https://www.predictiveanalyticstoday.com/top-free-extract-transform-load-etl-software/) |