@@ -39,45 +39,81 @@ in JSON format on the STDOUT. This schema file can be fed back into the **bq
 load** tool to create a table that is more compatible with the data fields in
 the input dataset.

+## Installation
+
+Install from [PyPI](https://pypi.python.org/pypi) repository using:
+```
+$ pip3 install bigquery_schema_generator
+```
+
 ## Usage

 The `generate_schema.py` script accepts a newline-delimited JSON data file on
 the STDIN. (CSV is not supported currently.) It scans every record in the
 input data file to deduce the table's schema. It prints the JSON formatted
-schema file on the STDOUT:
+schema file on the STDOUT. There are at least 3 ways to run this script:
+
+If you installed using `pip3`, then it should have installed a small helper
+script named `generate-schema` in your local `./bin` directory of your current
+environment (depending on whether you are using a virtual environment).
+
 ```
-$ generate_schema.py < file.data.json > file.schema.json
+$ generate-schema < file.data.json > file.schema.json
 ```

-The schema file can be used in the **bq** command using:
+You can invoke the module directly using:
+```
+$ python3 -m bigquery_schema_generator.generate_schema < file.data.json > file.schema.json
+```
+
+If you retrieved this code from its [GitHub
+repository](https://github.com/bxparks/bigquery-schema-generator), then you can invoke
+the Python script directly:
+```
+$ ./generate_schema.py < file.data.json > file.schema.json
+```
+
+The resulting schema file can be used in the **bq load** command using the
+`--schema` flag:
 ```
 $ bq load --schema file.schema.json mydataset.mytable file.data.json
 ```

 where `mydataset.mytable` is the target table in BigQuery.

-A useful flag for **bq load** is `--ignore_unknown_values`, which causes `bq load`
+A useful flag for **bq load** is `--ignore_unknown_values`, which causes **bq load**
 to ignore fields in the input data which are not defined in the schema. When
 `generate_schema.py` detects an inconsistency in the definition of a particular
 field in the input data, it removes the field from the schema definition.
 Without the `--ignore_unknown_values`, the **bq load** fails when the
 inconsistent data record is read.
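
The field-removal behavior described above can be illustrated with a minimal sketch. This is not the actual implementation of `generate_schema.py`; the names `infer_type` and `deduce_schema_sketch` are hypothetical, only top-level scalar fields are handled, and treating any type mismatch as an inconsistency is an assumption made for illustration.

```python
import json


def infer_type(value):
    """Map a JSON scalar to a BigQuery-style type name (simplified, illustrative)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    return None  # nulls, arrays, and records are skipped in this sketch


def deduce_schema_sketch(lines):
    """Scan newline-delimited JSON records and drop fields whose types conflict."""
    seen = {}           # field name -> first type observed
    conflicted = set()  # fields removed because of an inconsistency
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for name, value in record.items():
            type_name = infer_type(value)
            if type_name is None or name in conflicted:
                continue
            if name not in seen:
                seen[name] = type_name
            elif seen[name] != type_name:
                # Assumption: any mismatch counts as an inconsistency.
                del seen[name]
                conflicted.add(name)
    return [
        {"name": name, "type": type_name, "mode": "NULLABLE"}
        for name, type_name in sorted(seen.items())
    ]
```

In this sketch a field such as `a` that appears as both an integer and a string ends up absent from the schema, which is why loading the same data file would then need `--ignore_unknown_values`.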

 After the BigQuery table is loaded, the schema can be retrieved using:
+
 ```
 $ bq show --schema mydataset.mytable | python -m json.tool
 ```
+
 (The `python -m json.tool` command will pretty-print the JSON formatted schema
 file.) This schema file should be identical to `file.schema.json`.
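
One way to check that the retrieved schema matches the generated one is to compare the two parsed JSON documents rather than their text. The sketch below is an illustration only: the helper names `normalize_schema` and `schemas_equal` are hypothetical, and it assumes each schema is a JSON list of field objects that each carry a `name` key, with nested records under a `fields` key.

```python
def normalize_schema(fields):
    """Return a copy of a schema (list of field dicts) sorted by field name."""
    normalized = []
    for field in fields:
        entry = dict(field)
        if "fields" in entry:
            # Nested record: normalize its sub-fields the same way.
            entry["fields"] = normalize_schema(entry["fields"])
        normalized.append(entry)
    return sorted(normalized, key=lambda f: f["name"])


def schemas_equal(schema_a, schema_b):
    """Compare two schemas while ignoring the order of their fields."""
    return normalize_schema(schema_a) == normalize_schema(schema_b)
```

To apply this to the files above, parse each one with `json.load` and pass the two resulting lists to `schemas_equal`.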

 ### Options

 The `generate_schema.py` script supports a handful of command line flags:

+* `--help` Prints the usage with the list of supported flags.
 * `--keep_nulls` Print the schema for null values, empty arrays or empty records.
 * `--debugging_interval lines` Number of lines between heartbeat debugging messages. Default 1000.
 * `--debugging_map` Print the metadata schema map for debugging purposes

+#### Help
+
+Print the built-in help strings:
+
+```
+$ ./generate_schema.py --help
+```
+
 #### Null Values

 Normally when the input data file contains a field which has a null, empty
@@ -122,7 +158,7 @@ With the ``keep_nulls``, the resulting schema file will be:
 Example:

 ```
-$ generate_schema.py --keep_nulls < file.data.json > file.schema.json
+$ ./generate_schema.py --keep_nulls < file.data.json > file.schema.json
 ```

 #### Debugging Interval
@@ -132,7 +168,7 @@ every 1000 lines of input data. This interval can be changed using the
 `--debugging_interval` flag.

 ```
-$ generate_schema.py --debugging_interval 1000 < file.data.json > file.schema.json
+$ ./generate_schema.py --debugging_interval 1000 < file.data.json > file.schema.json
 ```

 #### Debugging Map
@@ -143,7 +179,7 @@ various fields and theirs types that was inferred using the data file. This
 flag is intended to be used for debugging.

 ```
-$ generate_schema.py --debugging_map < file.data.json > file.schema.json
+$ ./generate_schema.py --debugging_map < file.data.json > file.schema.json
 ```

 ## Examples
@@ -212,7 +248,7 @@ $ cat file.schema.json
 ## System Requirements

 This project was developed on Ubuntu 17.04 using Python 3.5. It is likely
-compatible with other python environments but I have not yet verified those.
+compatible with other Python environments but I have not yet verified those.

 ## Author
