Do a full_table refresh using version messages #10

Open
cdilga wants to merge 7 commits into singer-io:master from cdilga:master

@cdilga
Contributor

cdilga commented Sep 25, 2019

This fixes the assignments schema, which fails to delete old records from the target.
See #9 and #5 for background on the issue, and singer-io/tap-harvest#16 for a related issue in a different tap.

For a target to remove stale rows, each sync needs a version number. With this change, all old versions are removed, and the warehouse no longer retains deleted records from the assignments table.

@cdilga
Contributor Author

cdilga commented Oct 9, 2019

Testing setup:

See the updated README.md for a description of the setup, as the old README was outdated and didn't work.

To test, a code was generated for OAuth from the developer section of Harvest and formatted as specified in README.md, with the client ID, OAuth token, and refresh token retrieved from https://id.getharvest.com/oauth2/authorize?client_id={OAUTH_CLIENT_ID}&response_type=code

Then the code was run locally as per the getting started guide.

The tap was tested with tap-postgres and tap-singer in a local Postgres environment in Docker, and then with our Panoply data warehouse.

To recreate the behavior, first create a user and a project in Harvest Forecast, and assign that user to the project.
Run the tap and target.
Then delete the assignment and rerun the tap and target.
With the original code, as described in #9 and #5, the assignment persists in the target.
With the changes in this pull request, the deletion of the assignment is reflected in the target.

Since this pull request was submitted, a laptop has been running a WSL Bash script every 30 minutes, scheduled through Windows Task Scheduler, to push new changes through to Stitch and then to the Panoply data warehouse.

@cdilga
Contributor Author

cdilga commented Nov 18, 2019

Description of change

The assignments schema fails to delete old records from the target.

See #9 and #5 for background on the issue, and singer-io/tap-harvest#16 for a related issue in a different tap.

The problem is that deletes in the source do not cause corresponding deletes in the target.
For a target to remove stale rows, each sync needs a version number. With this change, all old versions are removed, and the warehouse no longer retains deleted records from the assignments table.

This change simply introduces the versioning messages and, on completion, sets the active version to point to the new data. Due to limitations in the source, the updated_at field, which is present on most schemas (notably not on the placeholders schema), cannot serve as a reference key: the source API does not expose delete actions with an updated_at value newer than the previous refresh.
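
To illustrate the versioning flow described above, here is a minimal sketch of what a tap could emit for one full-table sync. It is not the diff in this PR: the function name `full_table_messages` is hypothetical, the message shapes follow the Singer `RECORD` / `ACTIVATE_VERSION` convention as I understand it, and using a millisecond timestamp as the version is an assumption.

```python
import json
import time


def full_table_messages(stream, records, version=None):
    """Yield Singer-style messages for one full-table sync of `stream`.

    Hypothetical helper for illustration only.
    """
    # Assumption: a millisecond timestamp is a convenient, increasing version.
    if version is None:
        version = int(time.time() * 1000)

    # Every record in this sync is tagged with the same version.
    for record in records:
        yield {
            "type": "RECORD",
            "stream": stream,
            "record": record,
            "version": version,
        }

    # Only after all records are written is the new version activated,
    # which tells the target it may discard rows from older versions.
    yield {"type": "ACTIVATE_VERSION", "stream": stream, "version": version}


if __name__ == "__main__":
    rows = [{"id": 1, "project_id": 10}, {"id": 2, "project_id": 11}]
    for message in full_table_messages("assignments", rows):
        print(json.dumps(message))
```

Activating the version last means a target never switches to a half-written table if the sync fails midway.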

Manual QA steps

See the updated README.md for a description of the setup, as the old README was outdated and didn't work.

To test, a code was generated for OAuth from the developer section of Harvest and formatted as specified in README.md, with the client ID, OAuth token, and refresh token retrieved from https://id.getharvest.com/oauth2/authorize?client_id={OAUTH_CLIENT_ID}&response_type=code

Then the code was run locally as per the getting started guide.

The tap was tested with tap-postgres and tap-singer in a local Postgres environment in Docker, and then with our Panoply data warehouse.

To recreate the behavior, first create a user and a project in Harvest Forecast, and assign that user to the project.
Run the tap and target.
Then delete the assignment and rerun the tap and target.
With the original code, as described in #9 and #5, the assignment persists in the target.
With the changes in this pull request, the deletion of the assignment is reflected in the target.
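
The reproduction steps above can be simulated without Harvest or a warehouse. The sketch below is a toy in-memory target, not any real target's implementation: `InMemoryTarget` and `sync` are hypothetical names, and dropping every row whose version differs from the activated one is an assumption about how a target might honor `ACTIVATE_VERSION`.

```python
class InMemoryTarget:
    """Toy target that keeps rows in a dict (illustration only)."""

    def __init__(self):
        # (stream, record id) -> (version, record)
        self.rows = {}

    def handle(self, message):
        stream = message["stream"]
        if message["type"] == "RECORD":
            key = (stream, message["record"]["id"])
            self.rows[key] = (message["version"], message["record"])
        elif message["type"] == "ACTIVATE_VERSION":
            version = message["version"]
            # Drop rows of this stream that were not re-sent in this version.
            stale = [
                key
                for key, (row_version, _) in self.rows.items()
                if key[0] == stream and row_version != version
            ]
            for key in stale:
                del self.rows[key]

    def ids(self, stream):
        return sorted(key[1] for key in self.rows if key[0] == stream)


def sync(target, stream, records, version):
    """Send one full-table sync: all records, then the version activation."""
    for record in records:
        target.handle(
            {"type": "RECORD", "stream": stream, "record": record, "version": version}
        )
    target.handle({"type": "ACTIVATE_VERSION", "stream": stream, "version": version})


if __name__ == "__main__":
    target = InMemoryTarget()
    # First run: two assignments exist in the source.
    sync(target, "assignments", [{"id": 1}, {"id": 2}], version=1)
    # Second run: assignment 2 was deleted in the source.
    sync(target, "assignments", [{"id": 1}], version=2)
    print(target.ids("assignments"))  # prints [1]
```

Without the `ACTIVATE_VERSION` branch, the second run would leave assignment 2 in place, which matches the behavior reported in #9 and #5.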

Since this pull request was submitted, a laptop has been running a WSL Bash script every 30 minutes, scheduled through Windows Task Scheduler, to push new changes through to Stitch and then to the Panoply data warehouse.

Risks

  • Performance issues on full table refresh for every single table

Rollback steps

  • revert this branch

@robink

robink commented Jun 30, 2020

Hey there, do you still use this tap, and do you still encounter the updated_at error?

@lochlmond

Hi there,
Can this tap with full_table refresh be used through Stitch?

@cdilga
Contributor Author

cdilga commented Jan 31, 2021

Yep, we use this!
It definitely fixes the issue with hard deletes not being propagated.

@cdilga
Contributor Author

cdilga commented Jan 31, 2021

@lochlmond if this PR is merged, everyone on Stitch will automatically start using it.

Until it is merged, please share this PR with Stitch support to encourage integration.

Currently, the answer is no.

@cdilga
Contributor Author

cdilga commented Apr 21, 2021

I'll check out these merge conflicts and hopefully it'll get approved again

@cdilga
Contributor Author

cdilga commented Apr 21, 2021

I believe that this was actually included when we merged #17
