Feature/File Type Conversion/XML to JSON #47

Open: wants to merge 3 commits into base: main
48 changes: 48 additions & 0 deletions ea_airflow_util/callables/file_type.py
Collaborator

Good question about the naming conventions. Because we already have a translate_csv_file_to_jsonl() method in the jsonl callable submodule, I'd argue we should put it there.

@@ -0,0 +1,48 @@
import json
import xmltodict
import os
import logging

def xml_to_json(
    xml_path: str,
    output_path: str = None
):
    """
    Transform an XML file into a JSON format.
    """

    # Open the input XML file and read data in form of python dictionary using the "xmltodict" module.
    try:
        with open(xml_path) as xml_file:
            data_dict = xmltodict.parse(xml_file.read())
    except FileNotFoundError as error:
        logging.error(f"Error: {str(error)} (XML file not found)")
    except Exception as error:
        logging.error(f"Error: {str(error)}")

    # Generate the json_data object using json.dumps().
    json_data = json.dumps(data_dict)

    # Check if output_path is provided, otherwise set it to a default value (original XML folder path).
    if output_path is None:
        output_path = os.path.dirname(xml_path)
        output_directory = f'{output_path}/json'
    else:
        output_directory = f'{output_path}/json'
Collaborator

If an output path is explicitly provided, will we still need to add the JSON subfolder?
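
One possible answer, as a minimal sketch: append the `json` subfolder only when no output path is given, and use an explicit `output_path` as-is. The helper name `resolve_output_directory` is hypothetical and not part of this PR.

```python
import os
from typing import Optional


def resolve_output_directory(xml_path: str, output_path: Optional[str] = None) -> str:
    """Hypothetical helper: choose the directory the JSON file is written to."""
    if output_path is None:
        # No explicit path: default to a "json" subfolder next to the XML file.
        return os.path.join(os.path.dirname(xml_path), "json")
    # Explicit path: assume the caller already chose the final directory.
    return output_path
```

With this behavior, callers who pass `output_path` get exactly the directory they asked for, and the subfolder only appears in the default case.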


    # Set the name of the json file to the name of the XML file provided.
    file_name = os.path.splitext(os.path.basename(xml_path))[0]

    # Create the output directory if it doesn't exist.
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)

    # Write the contents of the JSON file into the output folder path.
    file_path = os.path.join(output_directory, f'{file_name}.json')
Collaborator

Same question here. Should a non-empty output_path be a file or a directory?
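
If both forms should be accepted, one option is to decide by file extension. The sketch below assumes that a path ending in `.json` or `.jsonl` is a file and that anything else is a directory. The function name `resolve_output_file` and the extension rule are illustrative assumptions, not something the PR defines.

```python
import os


def resolve_output_file(xml_path: str, output_path: str) -> str:
    """Hypothetical helper: accept either a target file or a target directory."""
    _, extension = os.path.splitext(output_path)
    if extension.lower() in (".json", ".jsonl"):
        # Assumption: a JSON-like extension means the caller named the file itself.
        return output_path
    # Otherwise treat output_path as a directory and reuse the XML file's base name.
    file_name = os.path.splitext(os.path.basename(xml_path))[0]
    return os.path.join(output_path, f"{file_name}.json")
```

Documenting `output_path` as directory-only would be simpler. The extension check is only worth it if callers need to control the file name.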

    try:
        with open(file_path, "w") as json_file:
            json_file.write(json_data)
Collaborator

I'm not very familiar with XML. Will this write out rows that are bounded by brackets ([])? If so, we'll want to write out JSON lines by iterating over the dictionary rows and writing them out with a newline at the end.
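
For context, `xmltodict.parse()` returns a single dictionary keyed by the root element, so `json.dumps()` here writes one JSON object rather than a bracketed array. Repeated child elements show up as a list nested inside that dictionary, and a lone child parses to a plain dict instead of a one-item list. A rough sketch of writing those rows as JSON lines follows. The helper names and the `root_key`/`row_key` parameters are hypothetical, and the real element names depend on the XML being converted.

```python
import json
from typing import Any, Iterable, List, Mapping


def extract_rows(parsed: Mapping[str, Any], root_key: str, row_key: str) -> List[Any]:
    """Hypothetical helper: return the repeated elements under root_key/row_key."""
    rows = parsed[root_key][row_key]
    # A single child element parses to a dict, so wrap it to keep the return type stable.
    return rows if isinstance(rows, list) else [rows]


def write_jsonl(rows: Iterable[Any], output_path: str) -> None:
    """Write one JSON object per line, each terminated by a newline."""
    with open(output_path, "w") as jsonl_file:
        for row in rows:
            jsonl_file.write(json.dumps(row) + "\n")
```

For example, a document shaped like `<records><record>...</record><record>...</record></records>` would use `extract_rows(data_dict, "records", "record")` before calling `write_jsonl`.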

    except Exception as error:
        logging.error(f"Error: {str(error)}")

    return json_data