11# BigQuery Schema Generator
22
3+ [![BigQuery Schema Generator CI](https://github.com/bxparks/bigquery-schema-generator/actions/workflows/pythonpackage.yml/badge.svg)](https://github.com/bxparks/bigquery-schema-generator/actions/workflows/pythonpackage.yml)
4+
35This script generates the BigQuery schema from the newline-delimited data
46records on the STDIN. The records can be in JSON format or CSV format. The
57BigQuery data importer (`bq load`) uses only the first 100 lines when the schema
@@ -12,7 +14,7 @@ $ generate-schema < file.data.json > file.schema.json
1214$ generate-schema --input_format csv < file.data.csv > file.schema.json
1315```
1416
15- **Version**: 1.4 (2020-12-09)
17+ **Version**: 1.4.1 (2021-08-23)
1618
1719**Changelog**: [CHANGELOG.md](CHANGELOG.md)
1820
@@ -37,14 +39,17 @@ $ generate-schema --input_format csv < file.data.csv > file.schema.json
3739 * [Ignore Invalid Lines (`--ignore_invalid_lines`)](#IgnoreInvalidLines)
3840 * [Existing Schema Path (`--existing_schema_path`)](#ExistingSchemaPath)
3941 * [Using as a Library](#UsingAsLibrary)
42+ * [`SchemaGenerator.run()`](#SchemaGeneratorRun)
43+ * [`SchemaGenerator.deduce_schema()`](#SchemaGeneratorDeduceSchema)
4044* [Schema Types](#SchemaTypes)
4145 * [Supported Types](#SupportedTypes)
4246 * [Type Inferrence](#TypeInferrence)
4347* [Examples](#Examples)
4448* [Benchmarks](#Benchmarks)
4549* [System Requirements](#SystemRequirements)
46- * [Authors](#Authors)
4750* [License](#License)
51+ * [Feedback and Support](#Feedback)
52+ * [Authors](#Authors)
4853
4954<a name="Background"></a>
5055## Background
@@ -290,7 +295,8 @@ Generate BigQuery schema from JSON or CSV file.
290295optional arguments:
291296 -h, --help show this help message and exit
292297 --input_format INPUT_FORMAT
293- Specify an alternative input format ('csv', 'json')
298+ Specify an alternative input format ('csv', 'json',
299+ 'dict')
294300 --keep_nulls Print the schema for null values, empty arrays or
295301 empty records
296302 --quoted_values_are_strings
@@ -312,7 +318,20 @@ optional arguments:
312318<a name="InputFormat"></a>
313319#### Input Format (`--input_format`)
314320
315- Specifies the format of the input file, either `json` (default) or `csv`.
321+ Specifies the format of the input file as a string. It must be one of `json`
322+ (default), `csv`, or `dict`:
323+
324+ * `json`
325+ * a "file-like" object containing newline-delimited JSON
326+ * `csv`
327+ * a "file-like" object containing newline-delimited CSV
328+ * `dict`
329+ * a `list` of Python `dict` objects corresponding to a list of
330+ newline-delimited JSON records, in other words `List[Dict[str, Any]]`
331+ * applies only if `SchemaGenerator` is used as a library through the
332+ `run()` or `deduce_schema()` method
333+ * useful if the input data (usually JSON) has already been read into memory
334+ and parsed from newline-delimited JSON into native Python dict objects.
316335
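As a rough illustration of the `dict` format described above, the following sketch turns newline-delimited JSON into a `List[Dict[str, Any]]` using only the Python standard library. The helper name `ndjson_to_dicts` and the sample records are hypothetical and are not part of this project; the sketch does not call `SchemaGenerator` itself.

```python
import io
import json
from typing import Any, Dict, List


def ndjson_to_dicts(stream) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON from a file-like object (hypothetical helper)."""
    records = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


# Hypothetical sample data standing in for a real newline-delimited JSON file.
sample = io.StringIO(
    '{"name": "alice", "age": 30}\n'
    '{"name": "bob", "tags": ["a", "b"]}\n'
)

# A list of dicts in this shape is what the `dict` input format expects.
input_data = ndjson_to_dicts(sample)
```

A list built this way could then be supplied as the input when `input_format` is `dict`, presumably avoiding a second round of JSON parsing.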
317336If the `csv` format is specified, the `--keep_nulls` flag is automatically activated.
318337This is required because CSV columns are defined positionally, so the schema
@@ -531,6 +550,12 @@ more details.
531550<a name="UsingAsLibrary"></a>
532551### Using As a Library
533552
553+ The `SchemaGenerator` class can be used programmatically as a library from a
554+ larger Python application.
555+
556+ <a name="SchemaGeneratorRun"></a>
557+ #### `SchemaGenerator.run()`
558+
534559The `bigquery_schema_generator` module can be used as a library by external
535560Python client code by creating an instance of `SchemaGenerator` and calling the
536561`run(input, output)` method:
@@ -551,6 +576,17 @@ generator = SchemaGenerator(
551576generator.run(input_file=input_file, output_file=output_file)
552577```
553578
579+ The `input_format` is one of `json`, `csv`, or `dict`, as described in the
580+ [Input Format](#InputFormat) section above. The `input_file` must match the
581+ format given by this parameter.
582+
583+ See the `TestSchemaGeneratorDeduce.test_run_with_input_and_output()` test
584+ case in [examples/test_generate_schema.py](examples/test_generate_schema.py) for
585+ an example of an `input_file` of type `json`.
586+
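For the `json` and `csv` formats, the input is a "file-like" object. As a minimal sketch of what that can mean when the data is already in memory, the standard library's `io.StringIO` wraps a string so that it can be iterated line by line like an open file. The variable names and sample data below are hypothetical, and the sketch only builds and inspects the objects rather than calling `run()`.

```python
import csv
import io

# Hypothetical in-memory data wrapped as file-like objects.
json_file = io.StringIO('{"id": 1}\n{"id": 2}\n')
csv_file = io.StringIO('id,name\n1,alice\n2,bob\n')

# A file-like object yields one record per line, like an open text file.
json_lines = [line.rstrip("\n") for line in json_file]

# The first CSV line is the header that names the columns.
csv_rows = list(csv.DictReader(csv_file))

# Rewind before handing the objects to any other consumer.
json_file.seek(0)
csv_file.seek(0)
```

An object opened with the built-in `open()` in text mode behaves the same way, so either form should be usable wherever a file-like input is expected.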
587+ <a name="SchemaGeneratorDeduceSchema"></a>
588+ #### `SchemaGenerator.deduce_schema()`
589+
554590If you need to process the generated schema programmatically, use the
555591`deduce_schema()` method and process the resulting `schema_map` and `error_log`
556592data structures like this:
@@ -583,12 +619,12 @@ schema_map2, error_logs = generator.deduce_schema(
583619)
584620```
585621
586- When using the `SchemaGenerator` object directly, the `input_format` parameter
587- supports `dict` as a third input format in addition to the `json` and `csv`
588- formats. The `dict` input format tells `SchemaGenerator.deduce_schema()` to
589- accept a list of Python dict objects as the `input_data`. This is useful if the
590- input data (usually JSON) has already been read into memory and parsed from
591- newline-delimited JSON into native Python dict objects.
622+ The `input_data` must match the `input_format` given in the constructor. The
623+ format is described in the [Input Format](#InputFormat) section above.
624+
625+ See the `TestSchemaGeneratorDeduce.test_deduce_schema_with_dict_input()` test
626+ case in [examples/test_generate_schema.py](examples/test_generate_schema.py) for
627+ an example of an `input_data` of type `dict`.
592628
593629<a name="SchemaTypes"></a>
594630## Schema Types
@@ -864,6 +900,22 @@ and 3.8.
864900
865901Apache License 2.0
866902
903+ <a name="Feedback"></a>
904+ ## Feedback and Support
905+
906+ If you have questions, comments, or other support requests about how to
907+ use this library, use the
908+ [GitHub Discussions](https://github.com/bxparks/bigquery-schema-generator/discussions)
909+ for this project. If you have bug reports or feature requests, file a ticket in
910+ [GitHub Issues](https://github.com/bxparks/bigquery-schema-generator/issues).
911+ I'd love to hear about how this software and its documentation can be improved.
912+ I can't promise that I will incorporate everything, but I will give your ideas
913+ serious consideration.
914+
915+ Please refrain from emailing me directly unless the content is sensitive. The
916+ problem with email is that I cannot reference the email conversation when other
917+ people ask similar questions later.
918+
868920<a name="Authors"></a>
869921## Authors
869921