XML to dictionary decoding

I would like to propose a change on the way XML data is decoded in `pyxdf`.

The current way that XML is decoded in `pyxdf` is to create `defaultdict` with a default factory constructor that creates a list. Attributes are not handled at all (which can be ok, but may restrict users) but the specification does not mention any restriction concerning the XML content.
More importantly, using a `defaultdict` with a list as a default creates an unnecessary deeper structure that has an important impact on how the 'info' header is handled. For example, my user code, and the `pyxdf` code has many of the following idiomatic examples:

```python
self.nchns = int(xml['info']['channel_count'][0])
self.srate = round(float(xml['info']['nominal_srate'][0]))
logger.debug('  found stream ' + hdr['info']['name'][0])
``` 

In these examples, one has two systematically add `[0]` to get the element. This can get messy with deeper XML contents. For example, I have something like:
```xml
<info>
       <desc>
                <channels>
                        <channel>
                                <label>L1</label>
                        </channel>
                        <channel>
                                <label>L2</label>
                        </channel>
                        <channel>
                                <label>L3</label>
                        </channel>
                </channels>
        </desc>
</info>
```
To get the name of the first channel I need to do:
```python
streams[1]['info']['desc'][0]['channels'][0]['channel'][0]
```

The main problem of converting XML to a dictionary is that this is a complicated task and it would ideally need a schema. Converters from XML to JSON exist, which can be useful here, but there is an ambiguity on the attributes because JSON does not support them. There are several "standard" conventions to solve this problem, with different benefits/disadvantages (see http://wiki.open311.org/JSON_and_XML_Conversion)

What I propose is to two ideas:

* Let the user provide a function to convert XML to whatever format they choose. The user may have some knowledge on what to expect on their streams, but currently, they cannot handle the XML directly.
* Change the current XML parsing in favor of another library that handles this well. XML parsing is hard and it would probably be sensible to let someone else do that. I propose to replace the current `_load_xml` function for one that adopts the [Parker convention](http://wiki.open311.org/JSON_and_XML_Conversion/#the-parker-convention) as follows:
```
from xmljson import parker  # https://github.com/sanand0/xmljson


def _load_xml(t):
    """Convert an attribute-less etree.Element into a dict."""
    return parker.data(t, preserve_root=True)
```

In this proposal, it would be easier to traverse the information dictionary. The complicated example mentioned above becomes:
```python
streams[1]['info']['desc']['channels']['channel'][0] 
```

One point that needs discussion in my proposal is that the temporary per-stream data object (amongst other cases) would need to be updated; it would not need to index `[0]` everywhere to get the contents:
```python
    class StreamData:
        """Temporary per-stream data."""
        def __init__(self, xml):
            """Init a new StreamData object from a stream header."""
            ...
            # number of channels
            self.nchns = int(xml['info']['channel_count'])
            # nominal sampling rate in Hz
            self.srate = round(float(xml['info']['nominal_srate']))
            # format string (int8, int16, int32, float32, double64, string)
            self.fmt = xml['info']['channel_format']
```

More importantly, if the user provides a function, it might become their responsibility to provide this information object, which complicates the usage. Maybe we could leave the current `_load_xml` but fill the "info" element of each stream with the provided user function (which defaults to the current one). 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

XML to dictionary decoding #36

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

XML to dictionary decoding #36

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions