2. Azure Synchronization

Danny Collier edited this page May 31, 2025 · 20 revisions

Overview

The ultimate deliverables for our application will be a set of data files (XML, TXT, CSV) and media files delivered to Azure file storage. (Note that this is Azure file storage, not Azure Blob Storage.) SkillRX will store and manage its media files in its own file storage. A process within SkillRX will manage synchronization with Azure.

The Azure storage includes folders for delivery to these devices:

  • Mini computers (CSV and media)
  • Raspberry Pi Devices (XML, TXT, and media)
  • USB Storage (XML, TXT, and media)

The systems we are replacing do not deliver CSV files, so the launch of the mini computers in the first week of August 2025 is dependent on SkillRX going live. As of this writing, we are not planning to support USB storage devices.

Core Directories and Core Directory Archives

Our Azure storage is arranged by device and language. The root directory contains:

  • a directory for all mini computers (CMES-Mini)
  • a directory for English language Raspberry Pi content (CMES-Pi)
  • a directory for archived English language Raspberry Pi content (CMES-Pi_Archive)
  • a directory and matching archive directory for Raspberry Pi content for every other language that has content (currently only Spanish). These have Language.code and an underscore prepended to their names, e.g. "SP_CMES-Pi" and "SP_CMES-Pi_Archive".

In these requirements, we refer to them as "core directories" and "core directory archives".
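The naming rule above can be sketched as a small helper. This is only an illustration: the method name `core_directory` and the `UNPREFIXED_LANGUAGE_CODE` constant are hypothetical, and the sketch assumes English is identified by the code "EN".

```ruby
# Hypothetical helper, not part of the application code.
# Assumption: English content uses no prefix and is identified by "EN";
# every other language uses its Language.code plus an underscore.
UNPREFIXED_LANGUAGE_CODE = "EN"

def core_directory(language_code, archive: false)
  code = language_code.to_s.upcase
  prefix = code == UNPREFIXED_LANGUAGE_CODE ? "" : "#{code}_"
  name = "#{prefix}CMES-Pi"
  archive ? "#{name}_Archive" : name
end

core_directory("EN")                 # => "CMES-Pi"
core_directory("SP")                 # => "SP_CMES-Pi"
core_directory("SP", archive: true)  # => "SP_CMES-Pi_Archive"
```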

Each core directory contains subdirectories into which we will place the relevant data files and media. Media storage is consistent across core directories. In all cases, the uploaded media files go into "[core directory]/assets/content". Archived media files go into the root of the relevant core directory archive. E.g. an archived training material PDF from "SP_CMES-Pi/assets/content" will be moved into "SP_CMES-Pi_Archive".

The file names for media will follow this pattern: "[topic_id]_[filename_with_extension]". We are using the same naming convention when storing these files in S3 for SkillRX and we are maintaining the Topic ID values when we import the data from CMES-Pi.
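As a rough sketch of how the naming pattern and the storage locations fit together, the helpers below build the active and archived paths for one media file. The method names are hypothetical and chosen only for this example.

```ruby
# Hypothetical helpers illustrating the documented patterns:
#   file name:     "[topic_id]_[filename_with_extension]"
#   active path:   "[core directory]/assets/content/<file name>"
#   archived path: "[core directory]_Archive/<file name>"
def media_file_name(topic_id, filename)
  "#{topic_id}_#{filename}"
end

def media_path(core_directory, topic_id, filename)
  File.join(core_directory, "assets", "content", media_file_name(topic_id, filename))
end

def archived_media_path(core_directory, topic_id, filename)
  File.join("#{core_directory}_Archive", media_file_name(topic_id, filename))
end

media_path("SP_CMES-Pi", 42, "guide.pdf")
# => "SP_CMES-Pi/assets/content/42_guide.pdf"
archived_media_path("SP_CMES-Pi", 42, "guide.pdf")
# => "SP_CMES-Pi_Archive/42_guide.pdf"
```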

Mini Computer Data Files

The mini computers rely on a set of .csv files which they import into their local database. All text fields should be sanitized to remove anything that could disrupt a CSV import. (The example Tag.csv file currently on staging contains problematic tags.)
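One possible approach to sanitizing is sketched below. It assumes that "sanitizing" means collapsing line breaks and tabs and dropping other control characters, and that commas and quotes are handled by letting Ruby's standard CSV library quote the fields. The helper name `sanitize_csv_text` and the sample rows are made up for illustration; the exact rules still need to be confirmed against the mini computer import.

```ruby
require "csv"

# Assumption: line breaks, tabs, and control characters are what break the
# import. Commas and double quotes are escaped by the CSV library itself.
def sanitize_csv_text(value)
  value.to_s
       .gsub(/[\r\n\t]+/, " ")   # line breaks and tabs become a single space
       .gsub(/[[:cntrl:]]/, "")  # drop any remaining control characters
       .strip
end

rows = [[1, "first aid, basic"], [2, "say \"ahh\"\nnow"]]

csv = CSV.generate do |out|
  out << %w[TagID Tag]
  rows.each { |id, tag| out << [id, sanitize_csv_text(tag)] }
end

puts csv
# TagID,Tag
# 1,"first aid, basic"
# 2,"say ""ahh"" now"
```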

Location in Azure

The files are uploaded to CMES-Mini/assets/csv.

Files and Contents

Author.csv

This file exists in the Azure repository but we are not supporting it. We can disregard this file.

CmesExtra.csv

I believe we can disregard this file. [need to check with stakeholders]

File.csv

Information about training materials.

Fields:

  • TopicID. The ID of the topic with which the training material media is associated.
  • FileName. The full name of the file following the naming conventions described in our requirements for training material uploads. [INSERT LINK TO REQUIREMENTS]
  • FileType. 2 for MP3. 1 for PDF. [Check with stakeholders for other values.]
  • FileSize. The size of the file.
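A minimal sketch of the FileType lookup, covering only the two values listed above. The constant and method names are hypothetical, and deriving the type from the file extension is an assumption. Unknown extensions raise an error rather than guessing a code, since the remaining values have not been confirmed.

```ruby
# Only the two documented codes; other values are still to be confirmed.
FILE_TYPE_CODES = { ".pdf" => 1, ".mp3" => 2 }.freeze

# Assumption: the type can be derived from the file extension.
def file_type_code(filename)
  FILE_TYPE_CODES.fetch(File.extname(filename).downcase) do |ext|
    raise ArgumentError, "No FileType code known for #{ext.inspect}"
  end
end

file_type_code("12_intro.PDF")    # => 1
file_type_code("12_lecture.mp3")  # => 2
```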

Tag.csv

Fields:

  • TagID. The ID in SkillRX.
  • Tag. The text of the tag.

Topic.csv

Fields:

  • TopicID. The ID in SkillRX.
  • TopicName. Topic.title.
  • TopicVolume. Topic.published_at.year.
  • TopicIssue. [I believe we decided not to include this. Check requirements.]
  • TopicYear. Topic.published_at.year.
  • TopicMonth. Topic.published_at.month.
  • ContentProvider. Topic.provider.name.
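The field mapping above could be expressed roughly as follows. The `TopicRecord` and `ProviderRecord` structs stand in for the real models and are hypothetical, as are the sample values. TopicIssue is left blank here because its inclusion is still undecided.

```ruby
require "date"

# Stand-ins for the real models, for illustration only.
ProviderRecord = Struct.new(:name)
TopicRecord = Struct.new(:id, :title, :published_at, :provider)

TOPIC_HEADERS = %w[
  TopicID TopicName TopicVolume TopicIssue TopicYear TopicMonth ContentProvider
].freeze

def topic_row(topic)
  [
    topic.id,
    topic.title,
    topic.published_at.year,   # TopicVolume
    nil,                       # TopicIssue: undecided, left blank in this sketch
    topic.published_at.year,   # TopicYear
    topic.published_at.month,  # TopicMonth
    topic.provider.name        # ContentProvider
  ]
end

topic = TopicRecord.new(7, "Wound care", Date.new(2025, 5, 31), ProviderRecord.new("Example Provider"))
topic_row(topic)
# => [7, "Wound care", 2025, nil, 2025, 5, "Example Provider"]
```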

TopicAuthor.csv

We are not managing authors. Disregard this file.

TopicTag.csv

The association between topics and tags.

Fields:

  • TopicID. The ID in SkillRX of the topic with which the tag is associated.
  • TagID. The ID in SkillRX of the tag.

Raspberry Pi XML Files

The mini computers get all of their data from one set of .csv files. The Raspberry Pi device data delivery is more complex. There are multiple file types, multiple files of some types, and even some redundancy due to differences in the configuration of different generations of client software on the Raspberry Pi devices currently in the field.

For instance, there are two versions of the XML files that contain topics: Provider and Legacy. These terms can be confusing, but the two files follow the exact same structure with only one difference: the "legacy" file is a single file containing the topics for all providers. The "provider" files each contain only the topics for one provider.
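To make the distinction concrete, here is a small sketch of how one list of topics would be split between the two kinds of files. The hash keys and sample data are invented for illustration, and the sketch deliberately says nothing about the XML structure itself.

```ruby
# Sample data only; real topics come from SkillRX.
topics = [
  { id: 1, title: "Wound care",   provider: "Provider A" },
  { id: 2, title: "Hypertension", provider: "Provider B" },
  { id: 3, title: "Burns",        provider: "Provider A" }
]

# The legacy file receives every topic; each provider file receives only
# the topics belonging to that provider.
def xml_file_groups(topics)
  {
    legacy: topics,
    providers: topics.group_by { |topic| topic[:provider] }
  }
end

groups = xml_file_groups(topics)
groups[:legacy].map { |t| t[:id] }                   # => [1, 2, 3]
groups[:providers]["Provider A"].map { |t| t[:id] }  # => [1, 3]
groups[:providers]["Provider B"].map { |t| t[:id] }  # => [2]
```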

Storage Paths

We will generate and deliver one set of files for each language, delivered to the root storage path for the environment plus the paths and filenames specified here:

  • Legacy XML for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]Server_XML.xml
  • Provider XML for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix][provider name].xml for every provider
  • New topics for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]New_Uploads.xml
  • Top topics for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]Top_Topics.xml. We will not generate this file, as it depends on collecting stats from remote devices.
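The path patterns above can be sketched as a single helper. The method name `xml_paths` is hypothetical, and the example assumes that `file_storage_prefix` is an empty string for English and "SP_" for Spanish. Top_Topics.xml is omitted because we will not generate it.

```ruby
# Hypothetical helper; paths are relative to the environment's root storage path.
# Assumption: prefix is "" for English and "SP_" for Spanish.
def xml_paths(prefix, provider_names)
  dir = "#{prefix}CMES-Pi/assets/XML"
  {
    legacy: "#{dir}/#{prefix}Server_XML.xml",
    new_uploads: "#{dir}/#{prefix}New_Uploads.xml",
    providers: provider_names.to_h { |name| [name, "#{dir}/#{prefix}#{name}.xml"] }
  }
end

paths = xml_paths("SP_", ["Provider A"])
paths[:legacy]                   # => "SP_CMES-Pi/assets/XML/SP_Server_XML.xml"
paths[:new_uploads]              # => "SP_CMES-Pi/assets/XML/SP_New_Uploads.xml"
paths[:providers]["Provider A"]  # => "SP_CMES-Pi/assets/XML/SP_Provider A.xml"
```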

For More Information

These file names and locations follow the patterns described in the XML Generation application.

See the XML Generation application for details. You may need to ask us to request access from the stakeholders. Here are some excerpts from that documentation:

[Image: excerpt from the XML File Generation app documentation]

[Image: "Provider XML" structure from the XML Generation app documentation]

[Image: "Legacy XML" structure from the XML Generation app documentation]

Tag Files

In addition to the XML and the media files, the Raspberry Pi core directories will receive .txt files containing tag information.

We will generate and deliver one set of files for each language, delivered to the root storage path for the environment plus the paths and filenames specified here:

  • Tag file for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/Tags/[language.file_storage_prefix]tags.txt
  • Tags and title file for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/Tags/[language.file_storage_prefix]tagsAndTitle.txt

Azure Client

Our Azure interface uses Azure File Shares, a gem written by Ruby for Good volunteer Dmitry Trager.

We have not found a viable way to set up Azure file storage for local development environments, so we are using a shared Azure file storage created by our stakeholders. We can provide the name of this environment and access details to developers who take on work related to Azure storage. Since this is a shared environment, local development environments will all be writing to the same storage, so developers will need to coordinate.

The Azure client will be used to:

  • Authenticate
  • Add individual media, csv, txt, and xml files to Azure.
  • Archive files when a topic is archived. There is no API for moving files, so we will delete the file and upload it to the archive location.
  • Delete files when a topic is deleted.
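Because there is no move operation, archiving amounts to a copy followed by a delete. The sketch below shows that sequence against an in-memory stand-in. The `FakeFileStore` class and its `upload`, `download`, `delete`, and `exist?` methods are hypothetical and are not taken from the Azure File Shares gem. The sketch uploads the archive copy before deleting the original so that a failed upload leaves the original in place; that ordering is our assumption.

```ruby
# In-memory stand-in for the Azure client. Method names are hypothetical
# and do not reflect the real gem's API.
class FakeFileStore
  def initialize
    @files = {}
  end

  def upload(path, data)
    @files[path] = data
  end

  def download(path)
    @files.fetch(path)
  end

  def delete(path)
    @files.delete(path)
  end

  def exist?(path)
    @files.key?(path)
  end
end

# "Move" a media file into the archive: copy to the archive, then delete.
def archive_media(store, core_directory, file_name)
  source = "#{core_directory}/assets/content/#{file_name}"
  target = "#{core_directory}_Archive/#{file_name}"
  store.upload(target, store.download(source))
  store.delete(source)
  target
end

store = FakeFileStore.new
store.upload("SP_CMES-Pi/assets/content/42_guide.pdf", "pdf bytes")
archive_media(store, "SP_CMES-Pi", "42_guide.pdf")
# => "SP_CMES-Pi_Archive/42_guide.pdf"
```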

Synchronization

Synchronization between SkillRX and the Azure file storage has two main aspects: training material media and generated files. The media can be synchronized in real time, but file generation and synchronization will need to be handled asynchronously.

Training Materials

With training materials, we are maintaining two separate repositories with the same content but with different organizational structures. Files can be added, replaced, moved, or (in rare cases and only by admins) deleted.

We will handle these updates in real time. When an editor makes changes to a topic that affect that topic's training materials, we will complete those changes in SkillRX, then synchronize them to the appropriate places in Azure.

Generated Files

With the generated xml, csv, and txt files, each change within SkillRX can result in changes in multiple files. A series of changes during an editing session can result in a cascading set of updates. We will handle file generation as a scheduled task, executed at intervals (TBD). We will also provide a way for admins to trigger a file generation/sync through the user interface.
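One way to keep a burst of edits from triggering repeated regeneration is to record only which languages have pending changes and let the scheduled task (or the admin trigger) process each one once. The `PendingGeneration` class below is a hypothetical in-memory sketch of that idea; tracking changes per language, and the language codes used, are assumptions. A real implementation would need persistent storage.

```ruby
require "set"

# Hypothetical tracker: many edits to the same language collapse into a
# single pending entry until the next generation run.
class PendingGeneration
  def initialize
    @languages = Set.new
  end

  # Called whenever a change in SkillRX affects generated files.
  def mark(language_code)
    @languages << language_code
  end

  # Called by the scheduled task or the admin trigger. Yields each pending
  # language once, clears the list, and returns what was processed.
  def drain
    pending = @languages.to_a
    @languages.clear
    pending.each { |code| yield code }
    pending
  end
end

tracker = PendingGeneration.new
3.times { tracker.mark("EN") }  # three edits during one session
tracker.mark("SP")

generated = []
tracker.drain { |code| generated << code }
generated                          # => ["EN", "SP"]
tracker.drain { |code| code }      # => []
```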