-
Notifications
You must be signed in to change notification settings - Fork 34
Description
I would like to propose a change on the way XML data is decoded in pyxdf.
The current way that XML is decoded in pyxdf is to create defaultdict with a default factory constructor that creates a list. Attributes are not handled at all (which can be ok, but may restrict users) but the specification does not mention any restriction concerning the XML content.
More importantly, using a defaultdict with a list as a default creates an unnecessary deeper structure that has an important impact on how the 'info' header is handled. For example, my user code, and the pyxdf code has many of the following idiomatic examples:
self.nchns = int(xml['info']['channel_count'][0])
self.srate = round(float(xml['info']['nominal_srate'][0]))
logger.debug(' found stream ' + hdr['info']['name'][0])In these examples, one has two systematically add [0] to get the element. This can get messy with deeper XML contents. For example, I have something like:
<info>
<desc>
<channels>
<channel>
<label>L1</label>
</channel>
<channel>
<label>L2</label>
</channel>
<channel>
<label>L3</label>
</channel>
</channels>
</desc>
</info>To get the name of the first channel I need to do:
streams[1]['info']['desc'][0]['channels'][0]['channel'][0]The main problem of converting XML to a dictionary is that this is a complicated task and it would ideally need a schema. Converters from XML to JSON exist, which can be useful here, but there is an ambiguity on the attributes because JSON does not support them. There are several "standard" conventions to solve this problem, with different benefits/disadvantages (see http://wiki.open311.org/JSON_and_XML_Conversion)
What I propose is to two ideas:
- Let the user provide a function to convert XML to whatever format they choose. The user may have some knowledge on what to expect on their streams, but currently, they cannot handle the XML directly.
- Change the current XML parsing in favor of another library that handles this well. XML parsing is hard and it would probably be sensible to let someone else do that. I propose to replace the current
_load_xmlfunction for one that adopts the Parker convention as follows:
from xmljson import parker # https://github.com/sanand0/xmljson
def _load_xml(t):
"""Convert an attribute-less etree.Element into a dict."""
return parker.data(t, preserve_root=True)
In this proposal, it would be easier to traverse the information dictionary. The complicated example mentioned above becomes:
streams[1]['info']['desc']['channels']['channel'][0] One point that needs discussion in my proposal is that the temporary per-stream data object (amongst other cases) would need to be updated; it would not need to index [0] everywhere to get the contents:
class StreamData:
"""Temporary per-stream data."""
def __init__(self, xml):
"""Init a new StreamData object from a stream header."""
...
# number of channels
self.nchns = int(xml['info']['channel_count'])
# nominal sampling rate in Hz
self.srate = round(float(xml['info']['nominal_srate']))
# format string (int8, int16, int32, float32, double64, string)
self.fmt = xml['info']['channel_format']More importantly, if the user provides a function, it might become their responsibility to provide this information object, which complicates the usage. Maybe we could leave the current _load_xml but fill the "info" element of each stream with the provided user function (which defaults to the current one).