|
1 | 1 | # Atlas File Format |
2 | 2 |
|
3 | | -The Atlas format is a JSONL (JSON Lines) format adapted hold district maps from redistricting efforts. The Atlas format is a simple extension of the JSONL format that allows for the storage of maps and their associated data in a single file. The [AtlasIO.jl Julia library](https://github.com/jonmjonm/AtlasIO.jl/) provides the ability to read and write Atlas files. The [AtlasIO.py Python library](https://github.com/jonmjonm/AtlasIO.jl/tree/main/PythonReader), from the same git repository, provides the read but not the ability to write files. This format was developed by the [Duke Quantifying Gerrymandering Group](https://sites.duke.edu/quantifyinggerrymandering/). |
| 3 | +The Atlas format is a JSONL (JSON Lines) format adapted to hold district maps from redistricting efforts. The Atlas format is a simple extension of the JSONL format that allows for the storage of maps and their associated data in a single file. The [AtlasIO.jl Julia library](https://github.com/jonmjonm/AtlasIO.jl/) provides the ability to read and write Atlas files. The [AtlasIO.py Python library](https://github.com/jonmjonm/AtlasIO.jl/tree/main/PythonReader), from the same git repository, provides the ability to read but not the ability to write files. This format was developed by the [Duke Quantifying Gerrymandering Group](https://sites.duke.edu/quantifyinggerrymandering/). |
4 | 4 | ## Structure of an Atlas File |
5 | 5 | Each individual line of an Atlas file is a JSON object. As such, the file can be read line by line, unlike a single JSON document. |
6 | 6 | * The first line is a comment that identifies the file as an Atlas of maps and describes the Atlas format. |
7 | 7 | * The second line is a JSON object that describes the basic information of the collection of maps saved in this Atlas. |
8 | 8 | * The third line is a JSON object that describes the extra data assigned to each map. It can be adapted to the particular setting. In particular, it gives the data types and key names associated with the additional data. |
9 | | -* Each of the following lines, starting with the 4th line, is a JSON object a JSON object that describes a map and its associated data. |
| 9 | +* Each of the following lines, starting with the 4th line, is a JSON object that describes a map and its associated data. |
10 | 10 |
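
The line layout listed above can be sketched in Python as follows. This is a minimal illustration, not the AtlasIO API: the function name `split_atlas_lines` is hypothetical, the first line is kept as raw text because it is described as a comment, and lines 2 onward are assumed to parse with `json.loads`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def split_atlas_lines(
    lines: Iterable[str],
) -> Tuple[str, Dict[str, Any], Dict[str, Any], Iterator[Dict[str, Any]]]:
    """Split an iterable of Atlas lines into its four parts (hypothetical helper)."""
    it = iter(lines)
    # Line 1: comment identifying the file; kept as raw text, not parsed.
    comment = next(it).rstrip("\n")
    # Line 2: basic information about the collection of maps.
    atlas_info = json.loads(next(it))
    # Line 3: description of the extra data attached to each map.
    map_data_info = json.loads(next(it))
    # Line 4 onward: one map per line, parsed lazily; blank lines are skipped.
    maps = (json.loads(line) for line in it if line.strip())
    return comment, atlas_info, map_data_info, maps
```

Because the maps are returned as a generator, a large Atlas can be processed one map at a time without loading the whole file into memory.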
|
11 | 11 | ## File Extension and Compression |
12 | 12 |
|
13 | | -Atlas files the file extension `.jsonl` if the file in Atlas is plan, uncompressed text. If the Atlas is compressed it will either use the file extension `.jsonl.gz` or `.jsonl.bz2`. |
| 13 | +Atlas files use the file extension `.jsonl` if the Atlas is plain, uncompressed text. If the Atlas is compressed, it uses either the file extension `.jsonl.gz` or `.jsonl.bz2`. |
14 | 14 |
|
15 | 15 | The `.gz` extension signifies the use of the standard [**Gnu Zip tools**](https://en.m.wikipedia.org/wiki/Gzip) (`gzip`, `gunzip`, `zcat`) and can be read by a number of libraries and command line tools. These tools use the standard Deflate algorithm to compress data. |
16 | 16 |
|
17 | 17 | The `.bz2` extension signifies the use of the standard |
18 | 18 | [**BZip2 tools**](https://en.m.wikipedia.org/wiki/Bzip2) (`bzip2`, `bzcat`) and also can be read by a number of libraries and command line tools. These tools use the standard Burrows–Wheeler algorithm to compress data. |
19 | 19 |
|
20 | | -The Bzip2 compression format typically results is smaller file that the Gzip compression format. However, the Bzip2 compression is slower to compress and uncompressed. We also explored saving files by saving the incremental changes in the maps. However, it was decided that the advantage of using standard compression tools was significant in light of the very high compression rations they delivered out of the box. |
| 20 | +The Bzip2 compression format typically results in a smaller file than the Gzip compression format. However, the Bzip2 compression is slower to compress and uncompress. We also explored saving files by saving the incremental changes in the maps. However, it was decided that the advantage of using standard compression tools was significant in light of the very high compression ratios they delivered out of the box. |
21 | 21 |
|
22 | | -## Work Directly with Compress Files |
| 22 | +## Work Directly with Compressed Files |
23 | 23 |
|
24 | 24 | One nice feature of the AtlasIO libraries, both in Julia and Python, is that they can read and write compressed files directly. This both increases the speed of writing and decreases the size of the Atlas files significantly. |
25 | 25 |
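
For readers who want to see what working directly with compressed files can look like without AtlasIO, here is a minimal sketch using only the Python standard library. The helper names `open_atlas_text` and `iter_json_lines` are hypothetical and are not part of AtlasIO; the only assumption taken from the text is that the file extension (`.jsonl`, `.jsonl.gz`, `.jsonl.bz2`) indicates the compression used.

```python
import bz2
import gzip
import json


def open_atlas_text(path, mode="rt"):
    """Open a plain, gzip, or bzip2 JSONL file in text mode, chosen by extension."""
    name = str(path)
    if name.endswith(".jsonl.gz"):
        return gzip.open(name, mode, encoding="utf-8")
    if name.endswith(".jsonl.bz2"):
        return bz2.open(name, mode, encoding="utf-8")
    if name.endswith(".jsonl"):
        return open(name, mode, encoding="utf-8")
    raise ValueError(f"unrecognized Atlas file extension: {name}")


def iter_json_lines(path, skip=0):
    """Yield one parsed JSON object per non-blank line, skipping the first `skip` lines."""
    with open_atlas_text(path) as handle:
        for index, line in enumerate(handle):
            if index >= skip and line.strip():
                yield json.loads(line)
```

Both `gzip.open` and `bz2.open` decompress as the file is read, so the file never has to be expanded on disk. Passing `mode="wt"` to `open_atlas_text` gives the matching streaming writer.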
|
|