Preliminary XML support #224
Conversation
This library has had a joke in the documentation that "xml will never be supported". :) I enjoy this joke, and it would entertain me to see it remain true. But, it can easily be removed instead.
I haven't executed the code, but the approach looks relatively sensible. Do you think XML input and output will be useful, even though the forms XML data takes vary so much?
Haha, I am aware of the joke. We will try to find another format to denigrate once this is done (let's say RDF)! Actually, most datasets provided in XML format come in two pretty simple flavours: 1) one record per line, with fields as attributes (like the Stack Exchange data dump), and 2) records as elements, with fields as sub-elements. In my experience, many XML data dumps are either already in one of these formats or can be reduced to one of them with a simple XPath. My objective, if I have time, is to support these two formats first. Then we can add XPath support in a later version. That is exactly what Google Sheets does, and I have found it largely sufficient for most data imports. For now the reading code is a bit buggy, but it reads 70-80 percent of the files I have tried in the two formats I just mentioned. The XML writer should be more robust; I spent some time on it this morning. I sincerely hate dealing with XML files, and that is why I am writing this: I just want to be able to turn them into other, less finicky formats with as little hassle as possible. I am more of a data analyst than a programmer, and I think such a tool can be very useful for people like me. Try it with some data from Stack Exchange. It doesn't work with every dataset yet, but the outcome is pretty cool.
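To make the two flavours concrete, here is a minimal sketch of how a reader could handle both using only the standard library's `xml.etree.ElementTree`. It is not the code from this pull request: the function name `read_records`, the sample documents, and the rule "a record with child elements is element-style, otherwise attribute-style" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Flavour 1: one record per line, fields as attributes (hypothetical sample).
ATTRIBUTE_STYLE = """<posts>
  <row Id="1" Score="10" Title="First" />
  <row Id="2" Score="7" Title="Second" />
</posts>"""

# Flavour 2: records as elements, fields as sub-elements (hypothetical sample).
ELEMENT_STYLE = """<posts>
  <row><Id>1</Id><Score>10</Score><Title>First</Title></row>
  <row><Id>2</Id><Score>7</Score><Title>Second</Title></row>
</posts>"""


def read_records(xml_text):
    """Return (headers, rows) for either flavour. Illustrative helper only."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root:
        if len(record):
            # The record has child elements: treat each child as a field.
            fields = {child.tag: (child.text or "") for child in record}
        else:
            # No child elements: treat the attributes as the fields.
            fields = dict(record.attrib)
        records.append(fields)

    # Collect field names in first-seen order so ragged records still line up.
    headers = []
    for fields in records:
        for name in fields:
            if name not in headers:
                headers.append(name)

    # Missing fields are filled with an empty string.
    rows = [tuple(fields.get(name, "") for name in headers) for fields in records]
    return headers, rows


headers, rows = read_records(ATTRIBUTE_STYLE)
```

Both sample documents yield the same headers and rows here, and every value is read as a string. Real dumps with nested elements or mixed styles would need more care than this sketch provides.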
We support RDF too! Don't worry, we'll think of something ;) This is great work, and I'm excited about it. If there's anything I can do to support the process, please let me know!
Just let me know if there are conventions I should follow that you have kept to in the code so far. I am pretty excited too: finally I will have one data interchange package for all my needs (or most of them at least...)!
This commit adds preliminary support for XML dataset import and export (no databook support yet). The code uses only Python's standard library and works on both Python 2 and 3. It supports reading XML datasets whose fields are stored either as sub-elements or as attributes. I am sure the code has a lot of room for improvement, but I would prefer to get some early feedback before finishing up.
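For the export side, a rough sketch of writing rows out in either style might look like the following. Again, this is not the code in this commit: `export_records`, its parameters, and the default tag names are hypothetical, and the sketch assumes Python 3 (for `encoding="unicode"`) and header names that are valid XML names.

```python
import xml.etree.ElementTree as ET


def export_records(headers, rows, root_tag="dataset", row_tag="row",
                   as_attributes=False):
    """Serialize rows to an XML string. Illustrative helper only.

    Assumes every header is a valid XML name (no spaces, no leading digit).
    """
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(headers, row):
            if as_attributes:
                # Flavour 1: each field becomes an attribute of the record.
                record.set(name, str(value))
            else:
                # Flavour 2: each field becomes a sub-element of the record.
                ET.SubElement(record, name).text = str(value)
    # ElementTree escapes special characters such as "&" when serializing.
    return ET.tostring(root, encoding="unicode")


xml_text = export_records(["Id", "Title"], [(1, "A & B")])
```

Building the tree with `ElementTree` rather than concatenating strings leaves character escaping to the library, which is one of the easier ways to keep a writer robust.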