[SAC-30315] - Discovery, Sync Implementation#25
[SAC-30315] - Discovery, Sync Implementation#25mittal-tushar wants to merge 28 commits intomaster-rewritefrom
Conversation
…ylint 10/10 - Modularized streams into dedicated files (suppression, marketing, contacts, senders, suppression_groups, templates) with a backward-compatible core.py shim - Added 3 new streams: senders, marketing_contacts_count, marketing_field_definitions - Removed deprecated campaigns stream (API returns 401 at account level) - All 16 stream endpoints return HTTP 200 on the live SendGrid API - Added full JSON schemas for all 16 streams - Added unit tests (21 tests, 100% pass rate) covering client, discovery, sync, abstracts, streams registry, and entrypoint - Added integration test scaffolding in tests/base.py for all 16 streams - Achieved pylint 10.00/10 with full docstrings across all modules - Rewrote README in standard Singer tap format with per-stream details - Updated setup.py with extras_require[dev] and glob-based package_data - Added CircleCI config using stitch-tap-tester-uv image with uv, coverage, and daily build schedule - Added .gitignore, MANIFEST.in, CHANGELOG.md, LICENSE, PR template, and Copilot instructions
…sendgrid into SAC-30315-feature/discovery-sync
Schema changes: - Remove root-level 'additionalProperties' from all 13 affected schemas - Add 'format: date-time' to campaigns.created_at/updated_at, templates.updated_at, single_sends.send_at - Normalise all schema files to consistent 2-space JSON indent Bug fixes in abstracts.py: - IncrementalStream.sync(): fix TypeError when cursor_type='datetime' (was calling datetime.fromtimestamp on an ISO string) - IncrementalStream.sync() & FullTableStream.sync(): fix record count always returning 0 because Singer's Counter resets counter.value to 0 on __exit__; introduce separate record_count variable Unit tests: - tests/unittests/test_schemas.py: 23 parametrised tests covering valid JSON, root-level additionalProperties absent, date-time format, nullable types, UTF-8 encoding - tests/unittests/test_sync_behavior.py: 9 tests for datetime cursor, unix cursor, full-table count accuracy, discovery metadata structure
…ntal schemas The 'created' field in blocks, bounces, spam_reports, invalid_emails, global_suppressions and suppression_group_members is typed as [null, integer, string]. Since the union includes 'string' (used when a bookmark ISO value flows through), Singer spec and project conventions require format:date-time to be present. Also strengthen test_schemas.py: add 'created' to DATETIME_STRING_FIELD_NAMES, and add test_incremental_suppression_created_has_datetime_format to explicitly assert the annotation is present on all 6 streams.
…in integration tests; install tap_sendgrid and pytest in CI integration step
- Remove format:date-time from suppression created fields (API returns
Unix integer timestamps, not ISO strings; caused SchemaMismatch crash)
- Change suppression created type from [null,integer,string] to [null,integer]
- Remove created field from suppression_group_members (ASM endpoint returns
email strings with no timestamp)
- Fix segments.json: rename contact_count->contacts_count; add missing
fields: query_version, created_at, updated_at, sample_updated_at,
next_sample_update, parent_list_id, status (verified against real API)
- Fix senders.json: verified is an object {status, reason}, not boolean
- Fix marketing_contacts_count.json: identifier_counts keys AID/EID/EXTID/PHNID
match actual API response (not email/alternate_emails/phone_number_id/...)
- Remove orphan campaigns.json (no stream class; endpoint requires scopes
not available; new Marketing Campaigns API covered by single_sends)
- Fix SuppressionGroupMembers: override get_records to handle email-string
list response from /v3/asm/groups/{id}/suppressions
- Update unit tests to match new expected schema contracts: 171 passing
- Pylint 10.00/10 maintained
…ited attrs, drop redundant overrides - abstracts.py: add replication_method/replication_keys defaults to FullTableStream and IncrementalStream so subclasses don't repeat them - suppression.py: extract _SuppressionIncrementalStream base; each of the 5 suppression classes now only declares tap_stream_id and path - marketing.py: remove replication_method/replication_keys from all 5 classes (now inherited from FullTableStream) - contacts.py: same inherited-attr removal; drop next_page_params overrides that only returned None (CursorPagedStream already handles non-cursor responses by returning None) - senders.py: same cleanup; unused Any/Dict/Optional imports removed - templates.py: inherited-attr removal - suppression_groups.py: inherited-attr removal; drop redundant next_page_params on both classes; drop get_path override on SuppressionGroupMembers (identical to BaseStream.get_path) No logic changes. Pylint 10/10, 171/171 unit tests pass, discovery 16 streams, sync 16 SCHEMA + 36 RECORD + 31 STATE.
abstracts.py: - BaseStream.page_size = 500 (suppression endpoints support up to 500; add comment explaining the split) - FullTableStream.page_size = 50 overrides the base value (most SendGrid marketing/transactional APIs cap page_size at 100; 50 is a safe conservative default that works for all full-table endpoints) - FullTableStream.default_params now uses self.page_size consistently, so any subclass can tune its page size via a class attribute tests/base.py: - Add inline comment above expected_metadata explaining that API_LIMIT is the page_size value tap-tester injects during the pagination test run (not a ceiling on total records returned); set to 1 so pagination is validated even when the test account holds as few as 2 records Pylint 10/10, 171/171 unit tests, 16 SCHEMA + 36 RECORD + 31 STATE.
…andling, page_size
sync.py _order_streams_for_resume:
Replace bookmark-presence categorisation with circular ordering based on
STREAMS dict position. Starting from currently_syncing and continuing
in original dict order (wrapping at end) correctly handles the case
where tap-tester merges the first-sync state with manipulate_state(),
which puts all streams in bookmarks and caused 'already_completed'
ordering to put blocks before global_suppressions.
Fixes test_interrupted_sync_stream_order assertSetEqual failure.
schema.py write_schema:
- Remove streams_to_sync parameter (was always set(STREAMS.keys()) i.e.
all streams, making the guard meaningless).
- Guard against catalog.get_stream(child) == None (defensive: prevents
AttributeError crash when tap-tester supplies a trimmed catalog).
- Use child_obj.is_selected() to decide whether to add child to
parent.child_to_sync. Prevents suppression_groups making API calls
when neither it nor suppression_group_members is selected.
streams/abstracts.py FullTableStream.default_params:
Use self.page_size unconditionally (fixed at 50). Reading page_size
from config is dangerous: if a user sets page_size=500, marketing
endpoints that cap at 100 return HTTP 400. Incremental streams still
read page_size from config (suppression endpoints support up to 500).
tests/test_all_fields.py:
MISSING_FIELDS {} removed stale {segments: {contact_count}} entry.
Field was renamed contacts_count in schema and IS returned by the API.
tests/unittests/test_schema_extended.py:
Update write_schema call to match new signature (remove streams_to_sync).
.circleci/config.yml:
- mkdir -p test_output before pytest
- --junitxml=test_output/report.xml so CircleCI test-results tab works
- coverage report for inline text summary
- store_test_results: path: test_output (directory, not a single file)
README.md: clarify page_size applies to incremental streams only
CHANGELOG.md: add 0.2.0 entry
setup.py: bump version to 0.2.0
Pylint 10/10, 171/171 unit tests, 16 SCHEMA + 37 RECORD + 31 STATE.
…t branch Ensures integration tests run against the in-memory backend version of tap-tester (without a live target), matching the environment used during local development and CI validation of this PR.
…unts, fix interrupted sync order assertion
…ISO string, not unix int)
- Move _schema_type and _generate_value into SendgridBaseTest for reuse - Remove standalone SendgridDynamicMockGenerationTest class; fold mock validation into test_all_fields directly - Expand bookmark test to cover all 5 incremental streams (was 1) - Remove all module-level helper functions from test files - Fix import order (third-party before local) for pylint compliance - Fix W0221 arguments-differ in test_start_date (class attrs vs properties) - Remove trailing newlines in all test files - All 7 integration tests pass; pylint 10/10
There was a problem hiding this comment.
Pull request overview
Implements a Singer-compatible tap-sendgrid with discovery + sync orchestration, stream implementations, JSON schemas, and a full unit/integration test suite to validate pagination, bookmarking, and schema correctness.
Changes:
- Added core tap runtime: HTTP client (retry/rate-limit), discovery catalog building, and sync/resume orchestration.
- Implemented SendGrid streams (incremental suppression, full-table marketing/templates/senders, suppression group parent/child) plus registry/shims.
- Added JSON schemas, extensive unit tests, tap-tester integration tests, and CI configuration/docs.
Reviewed changes
Copilot reviewed 57 out of 60 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unittests/test_sync_behavior.py | Unit coverage for sync/bookmark/count behavior across cursor types and discovery metadata expectations. |
| tests/unittests/test_streams.py | Validates stream registry and suppression group member normalization behavior. |
| tests/unittests/test_schemas.py | Validates schema file presence/shape and key field constraints. |
| tests/unittests/test_schema_sync.py | Unit tests for schema loading and sync orchestration helpers. |
| tests/unittests/test_schema_extended.py | Tests child-stream schema setup behavior. |
| tests/unittests/test_entrypoint.py | Tests CLI entrypoint dispatch for discover vs sync. |
| tests/unittests/test_discover.py | Smoke test for discovery catalog containing expected streams. |
| tests/unittests/test_client_extended.py | Additional client/error-handling and mock-mode tests. |
| tests/unittests/test_client.py | Tests retry-after parsing, GET success path, and rate-limit error path. |
| tests/unittests/test_abstracts.py | Tests abstract stream base behavior, pagination helpers, unix cursor params, and bookmark write. |
| tests/unittests/init.py | Marks unittest package. |
| tests/test_start_date.py | tap-tester integration test for start_date behavior. |
| tests/test_pagination.py | tap-tester integration test for pagination behavior. |
| tests/test_interrupted_sync.py | tap-tester integration test for resume/interrupted sync recovery. |
| tests/test_discovery.py | tap-tester integration test for discovery expectations. |
| tests/test_bookmark.py | tap-tester integration test for incremental bookmarking behavior. |
| tests/test_automatic_fields.py | tap-tester integration test for automatic fields (PK) behavior. |
| tests/test_all_fields.py | tap-tester integration test for all-fields selection behavior. |
| tests/base.py | Base tap-tester harness for metadata expectations and value generation/parsing. |
| tests/init.py | Marks integration test package. |
| tap_sendgrid/sync.py | Implements sync loop with resume ordering + state writes. |
| tap_sendgrid/streams/templates.py | Templates full-table stream with generations param. |
| tap_sendgrid/streams/suppression_groups.py | Suppression groups parent + members child stream normalization. |
| tap_sendgrid/streams/suppression.py | Incremental suppression streams using unix cursor. |
| tap_sendgrid/streams/senders.py | Senders full-table stream with timestamp normalization to ISO. |
| tap_sendgrid/streams/marketing.py | Marketing full-table streams (lists/segments/single sends/stats). |
| tap_sendgrid/streams/core.py | Backward-compatible re-export shim for stream classes. |
| tap_sendgrid/streams/contacts.py | Marketing contacts count + field definitions streams with custom parsing. |
| tap_sendgrid/streams/abstracts.py | Base stream framework: pagination, incremental/full-table sync, bookmarking, counters. |
| tap_sendgrid/streams/init.py | Stream registry + ordering warning for parent/child sync. |
| tap_sendgrid/schemas/templates.json | Templates schema. |
| tap_sendgrid/schemas/suppression_groups.json | Suppression groups schema. |
| tap_sendgrid/schemas/suppression_group_members.json | Suppression group members schema. |
| tap_sendgrid/schemas/stats_automations.json | Stats automations schema. |
| tap_sendgrid/schemas/spam_reports.json | Spam reports schema. |
| tap_sendgrid/schemas/single_sends.json | Single sends schema. |
| tap_sendgrid/schemas/single_send_stats.json | Single send stats schema. |
| tap_sendgrid/schemas/senders.json | Senders schema (timestamps as date-time strings). |
| tap_sendgrid/schemas/segments.json | Segments schema (contacts_count + datetime fields). |
| tap_sendgrid/schemas/marketing_field_definitions.json | Marketing field definitions schema. |
| tap_sendgrid/schemas/marketing_contacts_count.json | Marketing contacts count schema. |
| tap_sendgrid/schemas/lists.json | Lists schema. |
| tap_sendgrid/schemas/invalid_emails.json | Invalid emails schema. |
| tap_sendgrid/schemas/global_suppressions.json | Global suppressions schema. |
| tap_sendgrid/schemas/bounces.json | Bounces schema. |
| tap_sendgrid/schemas/blocks.json | Blocks schema. |
| tap_sendgrid/schema.py | Loads schemas + constructs Singer metadata; sets up child schemas. |
| tap_sendgrid/exceptions.py | Defines typed exception hierarchy + status-code mapping. |
| tap_sendgrid/discover.py | Builds Singer Catalog from schemas/metadata. |
| tap_sendgrid/client.py | HTTP client with auth, backoff, rate-limit handling, and mock-data mode. |
| tap_sendgrid/init.py | Tap entrypoint: discover + main() CLI dispatch. |
| setup.py | Packaging + dependencies + console script entrypoint. |
| README.md | Tap documentation: streams, auth scopes, config, usage, testing. |
| MANIFEST.in | Includes license and schema JSONs in sdist. |
| LICENSE | Adds AGPLv3 license text. |
| CHANGELOG.md | Adds initial changelog entry. |
| .gitignore | Adds Python/test/build artifacts ignores. |
| .github/pull_request_template.md | Adds PR template with AI-generated code checkbox/link. |
| .github/copilot-instructions.md | Repo-specific implementation guidelines for Singer taps/targets. |
| .circleci/config.yml | CI: lint, unit tests, coverage artifacts, and tap-tester integration tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Unit tests for JSON schema files. | ||
|
|
||
| Validates that all schema files are: | ||
| - Valid JSON | ||
| - Properly structured (type, properties) | ||
| - Free of root-level additionalProperties | ||
| - Using date-time format on all datetime string fields | ||
| - Using consistent 2-space indentation | ||
| """ |
There was a problem hiding this comment.
The module docstring says schemas are "Free of root-level additionalProperties", but test_schema_no_root_additional_properties enforces that additionalProperties (when present) must be true. Please reconcile the docs/comments so the intended rule is unambiguous.
There was a problem hiding this comment.
Fixed:
setup.py
Outdated
| author="Stitch", | ||
| url="http://singer.io", | ||
| classifiers=["Programming Language :: Python :: 3 :: Only"], | ||
| py_modules=["tap_sendgrid"], |
There was a problem hiding this comment.
py_modules=["tap_sendgrid"] points to a top-level tap_sendgrid.py module, but this repo only has the tap_sendgrid/ package directory. This can break packaging/install builds. Remove py_modules (since packages=find_packages() already covers the package) or add the missing module file if that was intended.
| py_modules=["tap_sendgrid"], |
tap_sendgrid/client.py
Outdated
| if stripped.lstrip("-").isdigit(): | ||
| return int(stripped) | ||
| dt = datetime.fromisoformat(stripped.replace("Z", "+00:00")) | ||
| return int(dt.replace(tzinfo=timezone.utc).timestamp()) |
There was a problem hiding this comment.
to_unix_timestamp() uses dt.replace(tzinfo=timezone.utc) which changes the instant for inputs with a non-UTC offset (e.g. ...-05:00). This will compute the wrong epoch and can cause incorrect incremental start_time/bookmark behavior. Convert with dt.astimezone(timezone.utc) (or dt.timestamp() on an aware datetime) instead of replacing tzinfo.
| return int(dt.replace(tzinfo=timezone.utc).timestamp()) | |
| if dt.tzinfo is None: | |
| dt = dt.replace(tzinfo=timezone.utc) | |
| else: | |
| dt = dt.astimezone(timezone.utc) | |
| return int(dt.timestamp()) |
| bookmark = self.get_start_value(state) | ||
| if self.cursor_type == "unix" and isinstance(bookmark, str): | ||
| bookmark = to_unix_timestamp(bookmark) | ||
| current_max = bookmark | ||
| record_count = 0 | ||
|
|
||
| with metrics.record_counter(self.tap_stream_id) as counter: | ||
| params = self.get_params_for_sync(bookmark) | ||
| for record in self.get_records(params=params, parent_obj=parent_obj): | ||
| transformed_record = transformer.transform( | ||
| copy.deepcopy(record), self.schema, self.metadata | ||
| ) | ||
| record_value = self.normalize_record_cursor(transformed_record) | ||
| if record_value is None or record_value < bookmark: | ||
| continue | ||
|
|
||
| if self.is_selected(): | ||
| write_record(self.tap_stream_id, transformed_record) | ||
| counter.increment() | ||
| record_count += 1 | ||
|
|
||
| current_max = max(current_max, record_value) | ||
| for child in self.child_to_sync: | ||
| child.sync(state=state, transformer=transformer, parent_obj=record) | ||
|
|
||
| if self.cursor_type == "unix": | ||
| bk_str = datetime.fromtimestamp(current_max, tz=timezone.utc).isoformat() | ||
| else: | ||
| bk_str = str(current_max) | ||
| write_bookmark(state, self.tap_stream_id, self.replication_keys[0], bk_str) |
There was a problem hiding this comment.
For cursor_type == "datetime", bookmark filtering and current_max = max(...) rely on raw string comparison (record_value < bookmark). RFC3339 strings are not safely orderable lexicographically across equivalent-but-different representations (e.g. Z vs +00:00, different offsets), which can lead to skipped or duplicated records and incorrect bookmarks. Parse both bookmark and record cursor to timezone-aware datetime (or normalize to a canonical UTC string) before comparisons/max.
There was a problem hiding this comment.
Fixed:
| bookmark_format = "%Y-%m-%dT%H:%M:%S%z" | ||
| initial_bookmarks = { | ||
| "bookmarks": { | ||
| "blocks": {"created": "2020-01-01T00:00:00.000000Z"}, | ||
| "bounces": {"created": "2020-01-01T00:00:00.000000Z"}, | ||
| "spam_reports": {"created": "2020-01-01T00:00:00.000000Z"}, | ||
| "invalid_emails": {"created": "2020-01-01T00:00:00.000000Z"}, | ||
| "global_suppressions":{"created": "2020-01-01T00:00:00.000000Z"}, | ||
| } |
There was a problem hiding this comment.
bookmark_format is %Y-%m-%dT%H:%M:%S%z, but initial_bookmarks values include fractional seconds and a trailing Z (e.g. 2020-01-01T00:00:00.000000Z). This mismatch is likely to fail tap-tester parsing/validation. Either adjust the values to match the declared format (e.g. ...+0000 without microseconds) or update bookmark_format to match the actual timestamp shape used.
There was a problem hiding this comment.
Fixed:
tap-sendgrid/tests/test_bookmark.py
Line 17 in a933890
RushiT0122
left a comment
There was a problem hiding this comment.
Some request changes:
- In schema files last lines are missing
- Fix truncated changelog
Since your base branch is non-master branch, not able to identify breaking changes. Please highlight breaking changes and whether we need major/minor release.
Description of Changes
Ticket: https://qlik-dev.atlassian.net/browse/SAC-30315
Results: https://qlik-dev.atlassian.net/browse/SAC-30315?focusedCommentId=1056700
Breaking Changes (v2.0.0 — Major Release)
Breaking Changes
Streams migrated:
listsandsegmentsmoved to v3 Marketing API endpoints (/v3/marketing/lists,/v3/marketing/segments)Streams removed (deprecated v2 API endpoints):
contactslists_memberssegments_memberscampaignsNew streams added:
single_sendssingle_send_statsstats_automationssendersmarketing_contacts_countmarketing_field_definitionsUpdated dependencies:
singer-python5.13.2 → 6.1.1backoff==2.2.1addedpendulumandpytzremovedManual QA Steps
Rollback Steps
Revert the PR merge if needed.