@@ -14,7 +14,7 @@ $ generate-schema < file.data.json > file.schema.json
 $ generate-schema --input_format csv < file.data.csv > file.schema.json
 ```
 
-**Version**: 1.4.1 (2021-08-23)
+**Version**: 1.5 (2021-11-14)
 
 **Changelog**: [CHANGELOG.md](CHANGELOG.md)
 
@@ -38,6 +38,8 @@ $ generate-schema --input_format csv < file.data.csv > file.schema.json
 * [Sanitize Names (`--sanitize_names`)](#SanitizedNames)
 * [Ignore Invalid Lines (`--ignore_invalid_lines`)](#IgnoreInvalidLines)
 * [Existing Schema Path (`--existing_schema_path`)](#ExistingSchemaPath)
+* [Preserve Input Sort Order
+  (`--preserve_input_sort_order`)](#PreserveInputSortOrder)
 * [Using as a Library](#UsingAsLibrary)
 * [`SchemaGenerator.run()`](#SchemaGeneratorRun)
 * [`SchemaGenerator.deduce_schema()`](#SchemaGeneratorDeduceSchema)
@@ -289,6 +291,7 @@ usage: generate-schema [-h] [--input_format INPUT_FORMAT] [--keep_nulls]
                        [--debugging_map] [--sanitize_names]
                        [--ignore_invalid_lines]
                        [--existing_schema_path EXISTING_SCHEMA_PATH]
+                       [--preserve_input_sort_order]
 
 Generate BigQuery schema from JSON or CSV file.
 
@@ -313,6 +316,11 @@ optional arguments:
                         File that contains the existing BigQuery schema for a
                         table. This can be fetched with: `bq show --schema
                         <project_id>:<dataset>:<table_name>
+  --preserve_input_sort_order
+                        Preserve the original ordering of columns from input
+                        instead of sorting alphabetically. This only impacts
+                        `input_format` of json or dict
+
 ```
 
 <a name="InputFormat"></a>
@@ -547,6 +555,89 @@ See discussion in
 [PR #57](https://github.com/bxparks/bigquery-schema-generator/pull/57) for
 more details.
 
+<a name="PreserveInputSortOrder"></a>
+#### Preserve Input Sort Order (`--preserve_input_sort_order`)
+
+By default, the columns in the BQ schema file are sorted lexicographically,
+which matched the original behavior of `bq load --autodetect`. If the
+`--preserve_input_sort_order` flag is given, the columns in the resulting
+schema file are not sorted; instead, they keep their order of appearance in
+the input JSON data. For example, the following JSON data with the
+`--preserve_input_sort_order` flag produces:
+
+```bash
+$ generate-schema --preserve_input_sort_order
+{ "s": "string", "i": 3, "x": 3.2, "b": true }
+^D
+[
+  {
+    "mode": "NULLABLE",
+    "name": "s",
+    "type": "STRING"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "i",
+    "type": "INTEGER"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "x",
+    "type": "FLOAT"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "b",
+    "type": "BOOLEAN"
+  }
+]
+```
+
+Each JSON record line may contain only a subset of all the columns in the
+data set. In that case, the columns in the BQ schema appear in the order in
+which each column was first *seen* by the script, as in the following
+example:
+
+```bash
+$ generate-schema --preserve_input_sort_order
+{ "s": "string", "i": 3 }
+{ "x": 3.2, "s": "string", "i": 3 }
+{ "b": true, "x": 3.2, "s": "string", "i": 3 }
+^D
+[
+  {
+    "mode": "NULLABLE",
+    "name": "s",
+    "type": "STRING"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "i",
+    "type": "INTEGER"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "x",
+    "type": "FLOAT"
+  },
+  {
+    "mode": "NULLABLE",
+    "name": "b",
+    "type": "BOOLEAN"
+  }
+]
+```
+
+**Note**: In Python 3.6 (the earliest version supported by this project), the
+keys of a `dict` are kept in insertion order, but only as a CPython
+implementation detail, not a language guarantee. Python 3.7 made this ordering
+part of the language specification. So the `--preserve_input_sort_order` flag
+**should** work in Python 3.6, but this is not guaranteed.
+
+See discussion in
+[PR #75](https://github.com/bxparks/bigquery-schema-generator/pull/75) for
+more details.
+
 <a name="UsingAsLibrary"></a>
 ### Using As a Library
 
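The ordering rule described in the new *Preserve Input Sort Order* section can be modeled in a few lines of Python. This is a minimal sketch, not the project's implementation: `infer_type()` and `deduce_flat_schema()` are hypothetical helpers, only top-level unquoted scalars are handled, every column is `NULLABLE`, and each column keeps the first type seen rather than having its type merged across records as `generate-schema` does. It only illustrates the two orderings: lexicographic by default, first-seen when `preserve_input_sort_order` is true.

```python
import json
from typing import Any, Dict, Iterable, List


def infer_type(value: Any) -> str:
    """Map a JSON scalar to a BigQuery type name (simplified, hypothetical)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def deduce_flat_schema(
    lines: Iterable[str], preserve_input_sort_order: bool = False
) -> List[Dict[str, str]]:
    """Build a flat schema from newline-delimited JSON records (sketch only).

    Columns are recorded in the order they are first seen. A plain dict
    preserves insertion order, guaranteed by the language since Python 3.7.
    """
    first_seen: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for name, value in record.items():
            # Only the first occurrence counts; later records never move a column.
            if name not in first_seen:
                first_seen[name] = infer_type(value)

    # Default: lexicographic order. With the flag: first-seen order.
    names = list(first_seen) if preserve_input_sort_order else sorted(first_seen)
    return [
        {"mode": "NULLABLE", "name": name, "type": first_seen[name]}
        for name in names
    ]


if __name__ == "__main__":
    records = [
        '{ "s": "string", "i": 3 }',
        '{ "x": 3.2, "s": "string", "i": 3 }',
        '{ "b": true, "x": 3.2, "s": "string", "i": 3 }',
    ]
    print([c["name"] for c in deduce_flat_schema(records, True)])
    print([c["name"] for c in deduce_flat_schema(records)])
```

With the three records from the second README example, the demo prints `['s', 'i', 'x', 'b']` with the flag and `['b', 'i', 's', 'x']` without it. In this sketch, the first-seen order comes from `dict` insertion order, the same property the README's Python 3.6 note refers to.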
@@ -572,6 +663,7 @@ generator = SchemaGenerator(
     debugging_map=debugging_map,
     sanitize_names=sanitize_names,
     ignore_invalid_lines=ignore_invalid_lines,
+    preserve_input_sort_order=preserve_input_sort_order,
 )
 generator.run(input_file=input_file, output_file=output_file)
 ```
@@ -903,14 +995,14 @@ Apache License 2.0
 <a name="Feedback"></a>
 ## Feedback and Support
 
-If you have any questions, comments and other support questions about how to
-use this library, use the
-[GitHub Discussions](https://github.com/bxparks/bigquery-schema-generator/discussions)
-for this project. If you have bug reports or feature requests, file a ticket in
-[GitHub Issues](https://github.com/bxparks/bigquery-schema-generator/issues).
-I'd love to hear about how this software and its documentation can be improved.
-I can't promise that I will incorporate everything, but I will give your ideas
-serious consideration.
+If you have any questions, comments, or feature requests for this library,
+please use the [GitHub
+Discussions](https://github.com/bxparks/bigquery-schema-generator/discussions)
+for this project. If you have bug reports, please file a ticket in [GitHub
+Issues](https://github.com/bxparks/bigquery-schema-generator/issues). Feature
+requests should go into Discussions first, because they often have alternative
+solutions that are worth keeping visible, rather than disappearing from the
+default view of the Issue tracker after the ticket is closed.
 
 Please refrain from emailing me directly unless the content is sensitive. The
 problem with email is that I cannot reference the email conversation when other
@@ -936,4 +1028,6 @@ people ask similar questions later.
   by Austin Brogle (abroglesc@) and Bozo Dragojevic (bozzzzo@).
 * Allow `SchemaGenerator.deduce_schema()` to accept a list of native Python
   `dict` objects, by Zigfrid Zvezdin (ZiggerZZ@).
-
+* Make the column order in the BQ schema file match the order of appearance in
+  the JSON data file using the `--preserve_input_sort_order` flag. By Kevin
+  Deggelman (kdeggelman@).