Skip to content

⭐ parse.xml #5423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

⭐ parse.xml #5423

wants to merge 3 commits into from

Conversation

arlimus
Copy link
Member

@arlimus arlimus commented Apr 6, 2025

Add support for parsing XML files, either by providing the file or by providing the XML contents directly.

Using inline data:

> parse.xml(content: '<root />').params
parse.xml.params: {
  root: {}
}

Via file example.xml:

<root>
  <box>
    <hello a="1"/>
  </box>
  <box>
    <world b="2">
      <c>3</c>
      4
    </world>
  </box>
  <box>🌎</box>
</root>

Running this in MQL:

> parse.json('example.xml').params
parse.xml.params: {
  root: {
    box: [
      0: {
        hello: {
          @a: "1"
        }
      }
...

Since the conversion from XML to a flat (JSON-like) structure isn't standardized, we took an approach very similar to other formatters (e.g. xq, jsonformatter). This means:

  • single items are added as a flat child element, e.g. the first <box/> element has a flat field hello above
  • once there are multiple elements, they are turned into a list, e.g. root.box is an array above
  • attributes are added as @attribute fields in the element
  • if an element only has text contents, the value is just set to the text, e.g. <root>val</root> is turned into {"root": "val"}
  • if an element has a mix of child elements and text contents, all text contents are added to the __text child field

Future consideration are to add access to the entire XML structure, which is more complex to traverse, but may be required for some use-cases. We will also handle streaming data separately (for all of these formats)

Add support for parsing XML files, either by providing the file or by providing the XML contents directly.

Signed-off-by: Dominik Richter <[email protected]>
Copy link
Contributor

github-actions bot commented Apr 6, 2025

Test Results

3 705 tests  +11   3 701 ✅ +11   1m 51s ⏱️ +4s
  400 suites ± 0       4 💤 ± 0 
   30 files   ± 0       0 ❌ ± 0 

Results for commit d3f4a33. ± Comparison against base commit da56dae.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@afiune afiune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit and one comment that needs to get in before merging (map re-initialization), besides that, this looks good to me!

if len(a) == 0 {
return
}
x.attributes = map[string]string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This always re-initializes the map.

Suggested change
x.attributes = map[string]string{}
if x.attributes == nil {
x.attributes = map[string]string{}
}

child := x.children[i]
data, isElem, params := child._params()

// text data is added flat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I wonder if we should separate the text content into a slice to simplify this code, maybe adding the slice to the xmlElem and append it in UnmarshalXML() for the xml.CharData case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants