⭐ parse.xml #5423

Merged
merged 4 commits into from
Apr 23, 2025

@arlimus arlimus commented Apr 6, 2025

Add support for parsing XML files, either by providing the file or by providing the XML contents directly.

Using inline data:

> parse.xml(content: '<root />').params
parse.xml.params: {
  root: {}
}

Via file example.xml:

<root>
  <box>
    <hello a="1"/>
  </box>
  <box>
    <world b="2">
      <c>3</c>
      4
    </world>
  </box>
  <box>🌎</box>
</root>

Running this in MQL:

> parse.xml('example.xml').params
parse.xml.params: {
  root: {
    box: [
      0: {
        hello: {
          @a: "1"
        }
      }
...

Since the conversion from XML to a flat (JSON-like) structure isn't standardized, we took an approach very similar to other formatters (e.g. xq, jsonformatter). This means:

  • single items are added as a flat child element, e.g. the first <box/> element has a flat field hello above
  • once there are multiple elements, they are turned into a list, e.g. root.box is an array above
  • attributes are added as @attribute fields in the element
  • if an element only has text contents, the value is just set to the text, e.g. <root>val</root> is turned into {"root": "val"}
  • if an element has a mix of child elements and text contents, all text contents are added to the __text child field
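For illustration, the rules above can be sketched in Go roughly as follows. This is a minimal sketch and not the code from this PR: the `node` type and the `parse`, `value`, and `toParams` helpers are hypothetical names. Trimming whitespace, joining multiple text fragments with a space, and putting the text of an element that has attributes but no children into `__text` are all assumptions that the description above does not specify.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// node is a hypothetical intermediate tree, not the type used in the PR.
type node struct {
	name     string
	attrs    []xml.Attr
	children []*node
	text     []string
}

// parse reads an XML document into a node tree. The returned node is a
// synthetic parent whose only child is the document element.
func parse(src string) (*node, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	root := &node{}
	stack := []*node{root}
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return root, nil
		}
		if err != nil {
			return nil, err
		}
		cur := stack[len(stack)-1]
		switch t := tok.(type) {
		case xml.StartElement:
			n := &node{name: t.Name.Local, attrs: t.Copy().Attr}
			cur.children = append(cur.children, n)
			stack = append(stack, n)
		case xml.EndElement:
			stack = stack[:len(stack)-1]
		case xml.CharData:
			// Assumption: surrounding whitespace is not significant.
			if s := strings.TrimSpace(string(t)); s != "" {
				cur.text = append(cur.text, s)
			}
		}
	}
}

// value flattens a node according to the rules listed in the description.
func (n *node) value() any {
	text := strings.Join(n.text, " ") // assumption: fragments joined by a space
	if len(n.attrs) == 0 && len(n.children) == 0 {
		if text == "" {
			return map[string]any{} // empty element, e.g. <root />
		}
		return text // text-only element becomes a plain string
	}
	out := map[string]any{}
	for _, a := range n.attrs {
		out["@"+a.Name.Local] = a.Value
	}
	for _, c := range n.children {
		v := c.value()
		prev, seen := out[c.name]
		if !seen {
			out[c.name] = v // single child: flat field
			continue
		}
		if list, ok := prev.([]any); ok {
			out[c.name] = append(list, v)
		} else {
			out[c.name] = []any{prev, v} // second occurrence: switch to a list
		}
	}
	if text != "" {
		out["__text"] = text // mixed content keeps its text here
	}
	return out
}

// toParams parses src and returns the flattened structure.
func toParams(src string) (any, error) {
	root, err := parse(src)
	if err != nil {
		return nil, err
	}
	return root.value(), nil
}

func main() {
	params, err := toParams(`<root><box><hello a="1"/></box><box><world b="2"><c>3</c>4</world></box><box>🌎</box></root>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(params, "", "  ")
	fmt.Println(string(out))
}
```

In this sketch a repeated element name only becomes a list on its second occurrence, which is why a consumer cannot know from a single document whether a field is a scalar or a list.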

A future consideration is to add access to the entire XML structure, which is more complex to traverse but may be required for some use cases. We will also handle streaming data separately (for all of these formats).

Add support for parsing XML files, either by providing the file or by providing the XML contents directly.

Signed-off-by: Dominik Richter <[email protected]>

github-actions bot commented Apr 6, 2025

Test Results

3 705 tests  +11   3 701 ✅ +11   2m 16s ⏱️ +29s
  400 suites ± 0       4 💤 ± 0 
   30 files   ± 0       0 ❌ ± 0 

Results for commit 82e7121. ± Comparison against base commit da56dae.

♻️ This comment has been updated with latest results.


@afiune afiune left a comment

One nit and one comment that needs to be addressed before merging (map re-initialization). Besides that, this looks good to me!

child := x.children[i]
data, isElem, params := child._params()

// text data is added flat

nit: I wonder if we should separate the text content into a slice to simplify this code, maybe by adding the slice to xmlElem and appending to it in UnmarshalXML() for the xml.CharData case.
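To make the suggestion concrete, here is a rough sketch of what collecting text into a slice could look like. The real `xmlElem` in this PR is not shown in the thread, so the fields below (`name`, `attrs`, `children`, `text`) and the `parseElem` helper are hypothetical, and trimming whitespace is an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// xmlElem here is a hypothetical stand-in for the type in the PR.
type xmlElem struct {
	name     string
	attrs    []xml.Attr
	children []*xmlElem
	text     []string // text fragments, collected separately from children
}

// UnmarshalXML consumes one element, appending child elements and text
// fragments to their own slices.
func (x *xmlElem) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	x.name = start.Name.Local
	x.attrs = start.Copy().Attr
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child := &xmlElem{}
			// DecodeElement calls child.UnmarshalXML and consumes the
			// child's matching end element.
			if err := d.DecodeElement(child, &t); err != nil {
				return err
			}
			x.children = append(x.children, child)
		case xml.CharData:
			// Assumption: whitespace-only fragments are dropped.
			if s := strings.TrimSpace(string(t)); s != "" {
				x.text = append(x.text, s)
			}
		case xml.EndElement:
			// Children were fully consumed above, so this closes x.
			return nil
		}
	}
}

// parseElem is a small hypothetical helper around xml.Unmarshal.
func parseElem(src string) (*xmlElem, error) {
	e := &xmlElem{}
	if err := xml.Unmarshal([]byte(src), e); err != nil {
		return nil, err
	}
	return e, nil
}

func main() {
	e, err := parseElem(`<world b="2"><c>3</c> 4 </world>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(e.name, len(e.children), e.text)
}
```

With text kept apart from `children`, the conversion step would no longer need to tell text nodes and element nodes apart while iterating.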

@chris-rock chris-rock merged commit 03bc935 into main Apr 23, 2025
17 checks passed
@chris-rock chris-rock deleted the dom/parse.xml branch April 23, 2025 13:54
@github-actions github-actions bot locked and limited conversation to collaborators Apr 23, 2025