Draft: Feature/tx wip #331

Draft
ladsmund wants to merge 26 commits into main from feature/tx_wip

ladsmund commented Apr 22, 2025

This PR restructures the email, Iridium and binary payload parsing functionality, with a focus on:

  • Isolating functionality between the individual layers: payload, Iridium message, Gmail
  • Simplifying the parser implementations
  • Keeping the current encoding versions

The most important new files are:

  • gmail_client.py: Module for interacting with the Gmail server
  • iridium.py: Module for parsing Iridium messages to extract the relevant metadata and payload attachment
  • payload_decoder.py: Module for decoding the different kinds of binary payload formats described in payload_formats.csv
  • process.py: Example script combining the three modules above to fetch and decode data messages transmitted via Iridium. It is not finished; it might later be built around a cache or split into multiple steps in order to monitor transmissions and avoid slow re-downloads.
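
To make the payload layer concrete, here is a minimal sketch of a table-driven decoder with a dedicated parsing exception. It is not the code in payload_decoder.py: the format table, the field names, the leading format-id byte, and the names `PayloadParseError` and `decode_payload` are all assumptions made for illustration. The real layouts live in payload_formats.csv and are not reproduced here.

```python
import struct


class PayloadParseError(Exception):
    """Raised when a binary payload cannot be decoded (hypothetical name)."""


# Hypothetical format table: format id -> (struct layout, field names).
# The real formats are described in payload_formats.csv.
FORMATS = {
    1: ("<Ihh", ("timestamp", "t_air_raw", "p_raw")),
}


def decode_payload(payload: bytes) -> dict:
    """Decode a payload whose first byte selects the format (an assumption)."""
    if not payload:
        raise PayloadParseError("empty payload")
    format_id, body = payload[0], payload[1:]
    try:
        layout, names = FORMATS[format_id]
    except KeyError:
        raise PayloadParseError(f"unknown format id {format_id}") from None
    expected = struct.calcsize(layout)
    if len(body) != expected:
        raise PayloadParseError(
            f"format {format_id} expects {expected} bytes, got {len(body)}"
        )
    # Pair each unpacked value with its field name.
    return dict(zip(names, struct.unpack(layout, body)))


# Build a made-up payload and decode it again.
example = bytes([1]) + struct.pack("<Ihh", 1745354000, -123, 987)
print(decode_payload(example))
```

Because every failure raises the same exception type, a caller such as process.py could catch `PayloadParseError`, log the offending message, and carry on with the next one.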

ladsmund requested a review from PennyHow April 22, 2025 20:32
ladsmund marked this pull request as draft April 22, 2025 20:32
ladsmund added 2 commits May 19, 2025 13:24
Drafting a main script exemplifying the main components. This includes:
* gmail client
* iridium message parsing
* binary payload parsing

wip tx processing
* Implemented fully functional binary payload decoder
* Added parsing exception with handling examples

Added various comments

github-actions bot commented Oct 28, 2025

Dataset Comparison Report

Differences have been found between the datasets produced using the PR branch and the main branch.
If you did not expect your PR to change the dataset, please check this report and update your branch accordingly.

Variables missing in PR dataset

Variable
precip_u

Variables missing in Main dataset

Variable
z_boom_cor_u
z_stake_cor

Variable differences

Variable Issue
dsr Data values differ
albedo Data values differ
lon Data values differ
t_i_10m Data values differ
usr_cor Data values differ
d_t_i_6 Data values differ
wspd_x_u Data values differ
z_stake Data values differ
usr Data values differ
snow_height Data values differ
d_t_i_5 Data values differ
d_t_i_7 Data values differ
wspd_y_u Data values differ
z_boom_u Data values differ
d_t_i_4 Data values differ
wdir_u Data values differ
d_t_i_8 Data values differ
dsr_cor Data values differ
lat Data values differ
gps_lon Data values differ
z_ice_surf Data values differ
d_t_i_1 Data values differ
gps_alt Data values differ
d_t_i_2 Data values differ
z_surf_combined Data values differ
d_t_i_3 Data values differ
alt Data values differ
gps_lat Data values differ

Dataset attribute differences

Original dataset attributes (main branch)

['acknowledgements', 'altitude', 'altitude_origin', 'attribute', 'bedrock', 'cdm_data_type', 'comment', 'contributor_name', 'contributor_role', 'conventions', 'creator_email', 'creator_institution', 'creator_name', 'creator_type', 'creator_url', 'date_created', 'date_issued', 'date_metadata_modified', 'date_modified', 'detected_file_type', 'featureType', 'filename', 'format', 'geospatial_bounds', 'geospatial_bounds_crs', 'geospatial_lat_extents_match', 'geospatial_lat_max', 'geospatial_lat_min', 'geospatial_lat_resolution', 'geospatial_lat_units', 'geospatial_lon_extents_match', 'geospatial_lon_max', 'geospatial_lon_min', 'geospatial_lon_resolution', 'geospatial_lon_units', 'geospatial_vertical_max', 'geospatial_vertical_min', 'geospatial_vertical_positive', 'geospatial_vertical_resolution', 'geospatial_vertical_units', 'history', 'id', 'institution', 'instrument', 'instrument_vocabulary', 'keywords', 'keywords_vocabulary', 'latitude', 'latitude_origin', 'level', 'license', 'location_type', 'logger_type', 'longitude', 'longitude_origin', 'metadata_link', 'naming_authority', 'number_of_booms', 'platform', 'platform_vocabulary', 'processing_level', 'product_status', 'product_version', 'program', 'project', 'publisher_email', 'publisher_institution', 'publisher_name', 'publisher_type', 'publisher_url', 'references', 'references_bib', 'site_type', 'source', 'standard_name_vocabulary', 'station_id', 'summary', 'time_coverage_duration', 'time_coverage_end', 'time_coverage_resolution', 'time_coverage_start', 'title']

New dataset attributes (PR branch)

['acknowledgements', 'altitude', 'altitude_origin', 'attribute', 'bedrock', 'cdm_data_type', 'comment', 'contributor_name', 'contributor_role', 'conventions', 'creater_email', 'creater_url', 'creator_institution', 'creator_name', 'creator_type', 'date_created', 'date_issued', 'date_metadata_modified', 'date_modified', 'featureType', 'format', 'geospatial_bounds', 'geospatial_bounds_crs', 'geospatial_lat_extents_match', 'geospatial_lat_max', 'geospatial_lat_min', 'geospatial_lat_resolution', 'geospatial_lat_units', 'geospatial_lon_extents_match', 'geospatial_lon_max', 'geospatial_lon_min', 'geospatial_lon_resolution', 'geospatial_lon_units', 'geospatial_vertical_max', 'geospatial_vertical_min', 'geospatial_vertical_positive', 'geospatial_vertical_resolution', 'geospatial_vertical_units', 'history', 'id', 'institution', 'instrument', 'instrument_vocabulary', 'keywords', 'keywords_vocabulary', 'latitude', 'latitude_origin', 'level', 'license', 'location_type', 'logger_type', 'longitude', 'longitude_origin', 'metadata_link', 'naming_authority', 'number_of_booms', 'platform', 'platform_vocabulary', 'processing_level', 'product_status', 'product_version', 'program', 'project', 'publisher_email', 'publisher_institution', 'publisher_name', 'publisher_type', 'publisher_url', 'references', 'references_bib', 'site_type', 'source', 'standard_name_vocabulary', 'station_id', 'summary', 'time_coverage_duration', 'time_coverage_end', 'time_coverage_resolution', 'time_coverage_start', 'title']
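
The two attribute lists are long and differ in only a few entries, which a set difference makes easy to spot. The sketch below uses an excerpt of both lists (the entries from 'conventions' to 'format'); the remaining entries are identical in the two lists above. The variable names are illustrative only.

```python
# Excerpt of the main-branch attribute list shown above.
main_attrs = [
    'conventions', 'creator_email', 'creator_institution', 'creator_name',
    'creator_type', 'creator_url', 'date_created', 'date_issued',
    'date_metadata_modified', 'date_modified', 'detected_file_type',
    'featureType', 'filename', 'format',
]
# Excerpt of the PR-branch attribute list shown above.
pr_attrs = [
    'conventions', 'creater_email', 'creater_url', 'creator_institution',
    'creator_name', 'creator_type', 'date_created', 'date_issued',
    'date_metadata_modified', 'date_modified', 'featureType', 'format',
]

only_in_main = sorted(set(main_attrs) - set(pr_attrs))
only_in_pr = sorted(set(pr_attrs) - set(main_attrs))

print(only_in_main)  # ['creator_email', 'creator_url', 'detected_file_type', 'filename']
print(only_in_pr)    # ['creater_email', 'creater_url']
```

The `creater_*` entries on the PR branch appear alongside the loss of `creator_email` and `creator_url`, which may point to a misspelled attribute name rather than an intended change.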

Coordinate differences

None

@PennyHow (Member) commented

So I have made a REST API client that should work alongside the IMAP client. The main things are:

  1. BaseGmailClient
     • Pure abstract interface.
     • Ensures both IMAP and REST clients have consistent methods (iter_messages_since, get_latest_uid).
  2. Concrete clients (IMAPClient, RestAPIClient)
     • Focus on client-specific logic
     • Implement the interface methods
     • Don't touch UID persistence
  3. uid_handler.py
     • Independent of transport.
     • Purely handles reading/writing the last UID/historyId.
     • Can be reused in the IMAP or REST workflow.

                 ┌─────────────────────────┐
                 │    uid_handler.py       │
                 │-------------------------│
                 │ read_uid_from_file()    │
                 │ write_uid_to_file()     │
                 │-------------------------│
                 │  Pure UID persistence   │
                 │  logic (independent of  │
                 │  transport)             │
                 └────────────┬────────────┘
                              │
                              │ used by
                              ▼
             ┌─────────────────────────────────┐
             │        BaseGmailClient          │
             │---------------------------------│
             │  Abstract base class / interface│
             │                                 │
             │  Abstract methods:              │
             │   - iter_messages_since(uid)    │
             │   - get_latest_uid()            │
             │                                 │
             │  No UID persistence here        │
             └────────────┬────────────────────┘
                          │
        ┌─────────────────┴─────────────────┐
        │                                   │
        ▼                                   ▼
┌─────────────────┐                 ┌─────────────────┐
│   IMAPClient    │                 │  RestAPIClient  │
│─────────────────│                 │─────────────────│
│  Implements     │                 │  Implements     │
│  iter_messages  │                 │  iter_messages  │
│  _since(uid)    │                 │  _since(uid)    │
│  get_latest_uid │                 │  get_latest_uid │
│                 │                 │                 │
│ Uses Gmail IMAP │                 │ Uses Gmail      │
│ protocol        │                 │ REST API        │
└─────────────────┘                 └─────────────────┘
        │                                   │
        └─────────────┬─────────────────────┘
                      │
                      ▼
           ┌─────────────────────────────────────────┐
           │          AWS Message Processing         │
           │-----------------------------------------│
           │  - Uses client.iter_messages_since(uid) │
           │  - Uses uid_handler.read/write_uid      │
           │  - Handles parsing, filtering, saving   │
           └─────────────────────────────────────────┘
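
For reference, the interface in the diagram could be expressed with Python's `abc` module roughly as follows. This is only a sketch: the type hints are assumptions (the REST client may track a historyId rather than an integer UID), and `InMemoryClient` is a hypothetical stand-in for IMAPClient / RestAPIClient so the example runs without network access.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class BaseGmailClient(ABC):
    """Abstract interface shared by the concrete clients."""

    @abstractmethod
    def iter_messages_since(self, uid: int) -> Iterator[bytes]:
        """Yield raw messages newer than `uid`."""

    @abstractmethod
    def get_latest_uid(self) -> int:
        """Return the UID of the newest message."""


class InMemoryClient(BaseGmailClient):
    """Hypothetical stand-in for a real transport, for illustration only."""

    def __init__(self, messages: dict[int, bytes]):
        self._messages = messages  # maps UID -> raw message

    def iter_messages_since(self, uid: int) -> Iterator[bytes]:
        for message_uid in sorted(self._messages):
            if message_uid > uid:
                yield self._messages[message_uid]

    def get_latest_uid(self) -> int:
        return max(self._messages, default=0)


client = InMemoryClient({1: b"a", 2: b"b", 3: b"c"})
print(list(client.iter_messages_since(1)), client.get_latest_uid())
```

Instantiating `BaseGmailClient` directly raises `TypeError`, so a concrete client that forgets one of the two methods fails early instead of at call time.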

So the workflow would look something like this:

  1. Read last UID from file using uid_handler
  2. Pass UID to Gmail client (IMAPClient or RestAPIClient), along with credentials for client access
  3. Client yields messages since that UID
  4. Decode messages
  5. Write latest UID back using uid_handler
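
The five steps above can be sketched as follows. The function names `read_uid_from_file` and `write_uid_to_file` come from the diagram, but their signatures are assumptions, as are `StubClient` (a hypothetical stand-in for IMAPClient / RestAPIClient), `process_new_messages`, and the placeholder ASCII decoding used in place of the real Iridium and payload decoding.

```python
import tempfile
from pathlib import Path


def read_uid_from_file(path: Path, default: int = 0) -> int:
    """Return the stored UID, or `default` if the file does not exist yet."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return default


def write_uid_to_file(path: Path, uid: int) -> None:
    """Persist the UID as plain text."""
    path.write_text(str(uid))


class StubClient:
    """Hypothetical stand-in exposing the two interface methods."""

    def __init__(self, messages):
        self._messages = messages  # maps UID -> raw message bytes

    def iter_messages_since(self, uid):
        return (m for u, m in sorted(self._messages.items()) if u > uid)

    def get_latest_uid(self):
        return max(self._messages, default=0)


def process_new_messages(client, uid_file: Path) -> list:
    last_uid = read_uid_from_file(uid_file)               # step 1
    decoded = [
        message.decode("ascii")                           # step 4 (placeholder decoder)
        for message in client.iter_messages_since(last_uid)  # steps 2-3
    ]
    write_uid_to_file(uid_file, client.get_latest_uid())  # step 5
    return decoded


with tempfile.TemporaryDirectory() as tmp:
    uid_file = Path(tmp) / "last_uid.txt"
    client = StubClient({1: b"first", 2: b"second"})
    first_run = process_new_messages(client, uid_file)
    second_run = process_new_messages(client, uid_file)

print(first_run)   # ['first', 'second']
print(second_run)  # []
```

One design point to consider: writing `get_latest_uid()` after iterating could skip a message that arrives between the two calls, so a real implementation might prefer to store the UID of the last message it actually processed.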
