JSONL formatted input #347

@daleroberts

Description

Note that JSON was already requested in #51

I would like to suggest that we implement JSONL input format as I believe it shouldn't be too hard to do.

Mapping XML schema to JSONL schema

The DynaML XML format is already a tree, so it would map 1:1. A station in XML:

  <DnaStation>
    <Name>ALICE</Name>
    <Constraints>CCC</Constraints>
    <Type>LLH</Type>
    <StationCoord>
      <Name>ALICE</Name>
      <XAxis>-23.6701</XAxis>
      <YAxis>133.8855</YAxis>
      <Height>603.35</Height>
    </StationCoord>
    <Description>Alice Springs</Description>
  </DnaStation>

becomes:

  {"DnaStation":{"Name":"ALICE","Constraints":"CCC","Type":"LLH","StationCoord":{"Name":"ALICE","XAxis":"-23.6701","YAxis":"133.8855","Height":"603.35"},"Description":"Alice Springs"}}

Instead of trying to debate what should be the new schema, I suggest that the element and field names stay identical to the XML format. This means no new schema to learn and no mapping ambiguity.
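A minimal sketch of that 1:1 mapping, using Python's xml.etree.ElementTree. The converter itself and the choice to keep every leaf value as a string are assumptions for illustration; repeated sibling tags (e.g. multiple Directions) would need list handling on top of this:

```python
# Sketch: convert a DynaML <DnaStation> element to one JSONL record,
# keeping element names identical to the XML schema.
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Recursively map an XML element to a dict with the same names."""
    if len(elem) == 0:          # leaf element: keep its text as a string
        return elem.text
    return {child.tag: element_to_dict(child) for child in elem}

xml_station = """<DnaStation>
  <Name>ALICE</Name>
  <Constraints>CCC</Constraints>
  <Type>LLH</Type>
  <StationCoord>
    <Name>ALICE</Name>
    <XAxis>-23.6701</XAxis>
    <YAxis>133.8855</YAxis>
    <Height>603.35</Height>
  </StationCoord>
  <Description>Alice Springs</Description>
</DnaStation>"""

root = ET.fromstring(xml_station)
line = json.dumps({root.tag: element_to_dict(root)})
print(line)  # one JSONL record, ready to append to a .jsonl file
```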

Business logic / data tests

Keeping the same schema will allow us to refactor the code easily and apply the same data validation rules to both the XML and JSON input formats. E.g.,

  • There must be a positive number of directions
  • Type must not be blank
  • Standard deviations must be valid

and so forth.
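In outline, the shared-rules idea could look like this. The function name and the specific checks are illustrative only, not taken from the existing codebase; the point is that both the XML and JSONL readers hand the validator the same dict shape:

```python
# Sketch: one validation rule set serving both input formats, operating
# on the common dict representation of a station record.
def validate_station(station):
    errors = []
    if not station.get("Type"):                 # "You can't have blank Type"
        errors.append("blank Type")
    coord = station.get("StationCoord", {})
    for axis in ("XAxis", "YAxis", "Height"):   # coordinates must be present
        if axis not in coord:
            errors.append(f"missing {axis}")
    return errors

ok = {"Type": "LLH",
      "StationCoord": {"XAxis": "-23.6701", "YAxis": "133.8855", "Height": "603.35"}}
bad = {"Type": "", "StationCoord": {}}
print(validate_station(ok))   # no errors expected
print(validate_station(bad))  # blank Type plus missing coordinates
```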

JSONL: Line-oriented JSON

Having worked with GeoJSON and GeoJSONL (its line-oriented variant) on previous projects, I would like to suggest that we use JSONL (line-oriented JSON) instead of standard JSON (see #51). This means we would have one line per station or measurement.

Why is JSONL better than JSON?

Although JSON is clean and widely used, and rich tooling exists (jq, Python's json module, etc.), there are some disadvantages:

  • It is still a single document (you can't simply concatenate: cat vic.json nsw.json > vic+nsw.json produces invalid JSON)
  • You must parse the whole file into memory to work with it
  • No comments are allowed, so we would lose survey metadata comments

By using JSONL instead, we would get:

  • Concatenation is easy: cat survey1.jsonl survey2.jsonl > combined.jsonl works!
  • On concatenation there are no headers to strip, no closing tags, which makes batch workflows easier
  • Line-oriented streaming/reading: read a line, parse it, discard it. Memory usage stays flat regardless of file size. Same performance profile as the current (XML) SAX parser but with far simpler code
  • jq handles it natively: jq '.DnaStation.Name' < stations.jsonl works on JSONL without flags. jq -s '.' survey.jsonl slurps it into an array if you need that
  • grep works: grep '"Type":"G"' measurements.jsonl finds all GPS baselines. You can't do that with XML (tags span lines unpredictably) or with a single JSON document
  • wc -l gives you the record count (minus any header line)
  • head -n 100 gives you the first 100 lines (99 records if the first line is a header). tail, split, shuf all work
  • Easy to generate: no need to track document state, just print one JSON object per line
  • python -c "import json, sys; records = [json.loads(line) for line in sys.stdin]" is all you need to parse it in Python.
  • Parallel processing: split file by line count, process chunks independently
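The flat-memory streaming point above can be sketched as a short generator. The read_records name is a hypothetical helper, and the blank-line skip is an assumed convenience, not a JSONL requirement:

```python
# Sketch: streaming JSONL reader. One json.loads per line means memory
# use is independent of file size -- the "read a line, parse it,
# discard it" pattern described above.
import json

def read_records(path):
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:                 # tolerate blank lines
                yield json.loads(line)

# usage: for rec in read_records("stations.jsonl"): ...
```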

Some cons:

  • No comments either (same as JSON). Survey metadata would need to go in a header object or a separate metadata line.
  • Less human-readable than pretty-printed JSON/XML for single records (everything is on one line), but jq '.' < survey.jsonl pretty-prints each record when you need that
  • Not as universally known as JSON or XML; some people haven't seen the format before
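The "separate metadata line" workaround from the first con might look like this. The reserved "Metadata" key on the first line is an assumed convention for illustration, not part of DynaML:

```python
# Sketch: carrying survey metadata (lost XML comments) as a reserved
# first record, followed by ordinary station records.
import json

lines = [
    json.dumps({"Metadata": {"comment": "hypothetical survey note"}}),
    json.dumps({"DnaStation": {"Name": "ALICE", "Type": "LLH"}}),
]
records = [json.loads(l) for l in lines]
meta = records[0].get("Metadata")                      # header record
stations = [r for r in records if "DnaStation" in r]   # data records
```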

Interesting use cases

Extract all GPS baselines from measurements and import

grep '"Type":"G"' measurements.jsonl > gps.jsonl
dnaimport gps.jsonl

Dependencies

We could use:

  • simdjson (super fast!)
  • nlohmann/json (fast)

Considerations

We could probably do:

  • JSON, and
  • JSONL

Metadata

Labels: New feature (Request a new feature or function)