This is a Singer tap that produces JSON-formatted data following the Singer spec.
This tap:

- Pulls raw data from the SparkPost API
- Extracts the resources (streams) listed below
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
- Data Key = results
- Primary keys: ['event_id']
- Replication strategy: INCREMENTAL

- Data Key = results
- Primary keys: ['id']
- Replication strategy: INCREMENTAL

- Data Key = results
- Primary keys: ['id']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['domain']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['domain']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['recipient']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['id']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['customer_id']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['timestamp']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['id']
- Replication strategy: FULL_TABLE

- Data Key = results
- Primary keys: ['id']
- Replication strategy: FULL_TABLE
metrics_recipient_domain
- Data Key = results
- Primary keys: ['timestamp', 'domain']
- Replication strategy: INCREMENTAL

metrics_sending_ip
- Data Key = results
- Primary keys: ['timestamp', 'sending_ip']
- Replication strategy: INCREMENTAL

metrics_ip_pool
- Data Key = results
- Primary keys: ['timestamp', 'ip_pool']
- Replication strategy: INCREMENTAL

metrics_sending_domain
- Data Key = results
- Primary keys: ['timestamp', 'sending_domain']
- Replication strategy: INCREMENTAL

metrics_subaccount
- Data Key = results
- Primary keys: ['timestamp', 'subaccount_id']
- Replication strategy: INCREMENTAL

metrics_campaign
- Data Key = results
- Primary keys: ['timestamp', 'campaign_id']
- Replication strategy: INCREMENTAL

metrics_template
- Data Key = results
- Primary keys: ['timestamp', 'template_id']
- Replication strategy: INCREMENTAL

metrics_subject_campaign
- Data Key = results
- Primary keys: ['timestamp', 'subject_campaign']
- Replication strategy: INCREMENTAL

metrics_watched_domain
- Data Key = results
- Primary keys: ['timestamp', 'watched_domain']
- Replication strategy: INCREMENTAL

metrics_mailbox_provider
- Data Key = results
- Primary keys: ['timestamp', 'mailbox_provider']
- Replication strategy: INCREMENTAL

metrics_mailbox_provider_region
- Data Key = results
- Primary keys: ['timestamp', 'mailbox_provider_region']
- Replication strategy: INCREMENTAL

metrics_time_series
- Data Key = results
- Primary keys: ['timestamp']
- Replication strategy: INCREMENTAL
- Supports precision parameter: Controls aggregation level (1min, 5min, 15min, hour, 12hr, day, week, month)

metrics_bounce_reason
- Data Key = results
- Primary keys: ['timestamp', 'reason', 'classification_id']
- Replication strategy: INCREMENTAL

metrics_bounce_reason_by_domain
- Data Key = results
- Primary keys: ['timestamp', 'reason', 'domain', 'classification_id']
- Replication strategy: INCREMENTAL

metrics_bounce_classification
- Data Key = results
- Primary keys: ['timestamp', 'classification_id']
- Replication strategy: INCREMENTAL

metrics_rejection_reason
- Data Key = results
- Primary keys: ['timestamp', 'reason', 'rejection_category_id']
- Replication strategy: INCREMENTAL

metrics_rejection_reason_by_domain
- Data Key = results
- Primary keys: ['timestamp', 'reason', 'domain', 'rejection_category_id']
- Replication strategy: INCREMENTAL

metrics_delay_reason
- Data Key = results
- Primary keys: ['timestamp', 'reason']
- Replication strategy: INCREMENTAL

metrics_delay_reason_by_domain
- Data Key = results
- Primary keys: ['timestamp', 'reason', 'domain']
- Replication strategy: INCREMENTAL

metrics_link_name
- Data Key = results
- Primary keys: ['timestamp', 'link_name']
- Replication strategy: INCREMENTAL

metrics_attempt
- Data Key = results
- Primary keys: ['timestamp', 'attempt']
- Replication strategy: INCREMENTAL
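Conceptually, the stream list above amounts to a small per-stream metadata table. The sketch below is an illustration only; the `STREAMS` mapping and `is_incremental` helper are assumptions for exposition, not the tap's actual internals:

```python
# Illustrative sketch only, not the tap's internals: the stream properties
# listed above (data key, primary keys, replication strategy) organized as
# a metadata table keyed by stream name.
STREAMS = {
    "metrics_sending_ip": {
        "data_key": "results",
        "key_properties": ["timestamp", "sending_ip"],
        "replication_method": "INCREMENTAL",
    },
    "metrics_time_series": {
        "data_key": "results",
        "key_properties": ["timestamp"],
        "replication_method": "INCREMENTAL",
        # The only stream that accepts a precision parameter.
        "params": {"precision": "day"},
    },
}

def is_incremental(stream_name):
    """True when a stream is bookmarked and synced incrementally."""
    return STREAMS[stream_name]["replication_method"] == "INCREMENTAL"

print(is_incremental("metrics_time_series"))  # True
```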
Install

Clone this repository, and then install using setup.py. We recommend using a virtualenv:

```
> virtualenv -p python3 venv
> source venv/bin/activate
> python setup.py install
```

OR

```
> cd .../tap-sparkpost
> pip install -e .
```
Dependent libraries

The following dependent libraries were installed:

```
> pip install singer-python
> pip install target-stitch
> pip install target-json
```
Create your tap's config.json file. The tap config file for this tap should include these entries:

- api_key (string, required): Your SparkPost API key
- start_date (string, required): The default value to use if no bookmark exists for an endpoint (RFC 3339 date string). Example: "2019-01-01T00:00:00Z"
- request_timeout (integer, optional): Maximum time in seconds to wait for a response. Default: 300
- precision (string, optional): Time-series metrics only. Controls the aggregation level for the metrics_time_series endpoint. Default: "day"

Precision Parameter Values:

The precision parameter is only applicable to the metrics_time_series stream. It controls how data is aggregated across time:

- "1min": 1-minute aggregation - returns metrics aggregated in 1-minute intervals
- "5min": 5-minute aggregation - returns metrics aggregated in 5-minute intervals
- "15min": 15-minute aggregation - returns metrics aggregated in 15-minute intervals
- "hour": Hourly aggregation - returns metrics aggregated in 1-hour intervals
- "12hr": 12-hour aggregation - returns metrics aggregated in 12-hour intervals
- "day": Daily aggregation (default) - returns metrics aggregated per day
- "week": Weekly aggregation - returns metrics aggregated per week
- "month": Monthly aggregation - returns metrics aggregated per month
Important Notes:
- Precision parameter is NOT supported by other metrics endpoints (metrics_recipient_domain, metrics_sending_ip, etc.)
- Once a sync begins with a specific precision, do not change it during the sync to avoid mixed aggregation levels
- Smaller precision values (1min, 5min) will return more granular data but may impact API performance
- Reference: SparkPost Time-Series Metrics API
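As a minimal sketch of checking these config entries before starting a sync, assuming the entries documented above; the `validate_config` helper is hypothetical and not part of the tap:

```python
import json

# Valid values for the optional precision entry, per the list above.
VALID_PRECISIONS = {"1min", "5min", "15min", "hour", "12hr", "day", "week", "month"}

def validate_config(config):
    """Check required keys, apply defaults, and reject bad precision values."""
    for key in ("api_key", "start_date"):
        if key not in config:
            raise ValueError(f"Missing required config key: {key}")
    precision = config.get("precision", "day")
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"Invalid precision: {precision}")
    return {
        "api_key": config["api_key"],
        "start_date": config["start_date"],
        "request_timeout": config.get("request_timeout", 300),
        "precision": precision,
    }

cfg = validate_config(json.loads(
    '{"api_key": "key", "start_date": "2019-01-01T00:00:00Z"}'
))
print(cfg["precision"])  # day
```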
Example config.json:

```json
{
  "api_key": "your_sparkpost_api_key_here",
  "start_date": "2019-01-01T00:00:00Z",
  "request_timeout": 300,
  "precision": "day"
}
```

Optionally, also create a state.json file. currently_syncing is an optional attribute used for identifying the last object to be synced in case the job is interrupted mid-stream. The next run would begin where the last job left off.

```json
{
  "currently_syncing": "dummy_stream1",
  "bookmarks": {
    "dummy_stream1": "2019-09-27T22:34:39.000000Z",
    "dummy_stream2": "2019-09-28T15:30:26.000000Z",
    "dummy_stream3": "2019-09-28T18:23:53Z"
  }
}
```
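As a sketch of how a tap consumes and updates such a state during an incremental sync; these helpers are illustrative stand-ins (singer-python provides similar functions in singer.bookmarks), and the stream names are the dummy placeholders from the example above:

```python
def get_bookmark(state, stream, default):
    """Return the saved bookmark for a stream, or the default (start_date)."""
    return state.get("bookmarks", {}).get(stream, default)

def write_bookmark(state, stream, value):
    """Record the latest replicated value so the next run resumes from it."""
    state.setdefault("bookmarks", {})[stream] = value
    return state

state = {
    "currently_syncing": "dummy_stream1",
    "bookmarks": {"dummy_stream1": "2019-09-27T22:34:39.000000Z"},
}
# No bookmark yet for dummy_stream2, so the configured start_date is used.
print(get_bookmark(state, "dummy_stream2", "2019-01-01T00:00:00Z"))
state = write_bookmark(state, "dummy_stream1", "2019-09-29T00:00:00Z")
```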
Run the Tap in Discovery Mode

This creates a catalog.json for selecting objects/fields to integrate:

```
> tap-sparkpost --config config.json --discover > catalog.json
```

See the Singer docs on discovery mode here.
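In the Singer spec, a discovered stream is selected for sync by setting `selected: true` in its top-level (empty breadcrumb) metadata entry in catalog.json. A minimal sketch; the catalog below is abbreviated to a placeholder stream, and real entries carry full JSON schemas:

```python
import json

def select_stream(catalog, stream_id):
    """Mark one stream as selected in its top-level metadata entry."""
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != stream_id:
            continue
        for entry in stream["metadata"]:
            if entry["breadcrumb"] == []:  # top-level (stream) metadata
                entry["metadata"]["selected"] = True
    return catalog

catalog = {"streams": [{"tap_stream_id": "dummy_stream1",
                        "metadata": [{"breadcrumb": [], "metadata": {}}]}]}
catalog = select_stream(catalog, "dummy_stream1")
print(json.dumps(catalog["streams"][0]["metadata"][0]["metadata"]))
```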
Run the Tap in Sync Mode (with catalog) and write out to state file

For sync mode:

```
> tap-sparkpost --config tap_config.json --catalog catalog.json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
```

To load to JSON files to verify outputs:

```
> tap-sparkpost --config tap_config.json --catalog catalog.json | target-json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
```

To pseudo-load to the Stitch Import API with a dry run:

```
> tap-sparkpost --config tap_config.json --catalog catalog.json | target-stitch --config target_config.json --dry-run > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
```
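The `tail -1` step above works because a Singer target emits one JSON STATE message per line, and only the final line reflects the fully completed sync. A small illustration of the same idea in Python (the stream name is a placeholder, as in the state.json example):

```python
import json

def latest_state(lines):
    """Return the last non-empty line parsed as JSON, or {} if none exist."""
    states = [ln for ln in lines if ln.strip()]
    return json.loads(states[-1]) if states else {}

# Simulated target output: one state line per completed batch.
output = [
    '{"bookmarks": {"dummy_stream1": "2019-09-27T00:00:00Z"}}',
    '{"bookmarks": {"dummy_stream1": "2019-09-28T00:00:00Z"}}',
]
print(latest_state(output))
```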
Test the Tap

While developing the SparkPost tap, the following utilities were run in accordance with Singer.io best practices.

Pylint to improve code quality:

```
> pylint tap_sparkpost -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments
```

The Pylint test resulted in the following score:

```
Your code has been rated at 9.67/10
```

To check the tap and verify it is working:

```
> tap-sparkpost --config tap_config.json --catalog catalog.json | singer-check-tap > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
```

Unit tests may be run with the following:

```
> python -m pytest --verbose
```

Note: you may need to install test dependencies:

```
> pip install -e .'[dev]'
```
Copyright © 2019 Stitch