WIP DMS: Pass timeout through function calls#37136

Closed
nijave wants to merge 3 commits into
hashicorp:mainfrom
nijave:nv-dms-passthru-timeout

@nijave
Contributor

@nijave nijave commented Apr 26, 2024

The hard-coded timeouts in the DMS waiters are being exceeded, and there is currently no way to configure them. This change passes the remaining time through the function calls.

Description

More configurable DMS task timeouts.

Relations

Closes #37026

References

Output from Acceptance Testing

% make testacc TESTS=TestAccXXX PKG=ec2

...

TODO
- Update the DMS task resource so the timeouts {} block works
- Switch to using context.WithTimeout instead of passing around time.Duration
- Tests

@github-actions
Contributor

Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions github-actions Bot added size/S Managed by automation to categorize the size of a PR. service/dms Issues and PRs that pertain to the dms service. labels Apr 26, 2024
@terraform-aws-provider terraform-aws-provider Bot added the needs-triage Waiting for first response or review from a maintainer. label Apr 26, 2024
@github-actions github-actions Bot added size/M Managed by automation to categorize the size of a PR. and removed size/S Managed by automation to categorize the size of a PR. labels Apr 26, 2024
@nijave nijave changed the title WIP Pass timeout through function calls WIP DMS: Pass timeout through function calls Apr 26, 2024
@nijave nijave force-pushed the nv-dms-passthru-timeout branch 3 times, most recently from b09397a to 559b0cb Compare April 27, 2024 00:49
@justinretzolk justinretzolk added enhancement Requests to existing resources that expand the functionality or scope. timeouts Pertains to timeout increases. and removed needs-triage Waiting for first response or review from a maintainer. labels Apr 30, 2024
@nijave
Contributor Author

nijave commented Apr 30, 2024

I did a manual test in our environment, and it looks like I'm no longer hitting the same errors as before. Here are the timings:

aws_dms_replication_task.this["8aabfa60a1b702e"]: Creation complete after 2m27s
aws_dms_replication_task.this["522c2b69a3d9dd5"]: Creation complete after 2m28s
aws_dms_replication_task.this["5482d767bbe155d"]: Destruction complete after 2m47s
aws_dms_replication_task.this["3434cb15ee8ba41"]: Destruction complete after 3m11s
aws_dms_replication_task.this["ecbaf063f420459"]: Destruction complete after 3m32s
aws_dms_replication_task.this["b872cc0298b7f0f"]: Creation complete after 4m19s
aws_dms_replication_task.this["46c1ca1e675936e"]: Creation complete after 4m59s
aws_dms_replication_task.this["993888985ffc7fb"]: Destruction complete after 5m10s
aws_dms_replication_task.this["32e9b9111b4c84c"]: Destruction complete after 5m50s
aws_dms_replication_task.this["ef0f02a57e4f5ea"]: Destruction complete after 6m4s
aws_dms_replication_task.this["e001486c347f9d7"]: Creation complete after 3m22s
aws_dms_replication_task.this["f145f7ba992c01c"]: Destruction complete after 6m51s
aws_dms_replication_task.this["05e56d6f53c18e7"]: Creation complete after 6m50s
aws_dms_replication_task.this["de607074351a17b"]: Creation complete after 7m20s
aws_dms_replication_task.this["3390840100e90a4"]: Creation complete after 3m53s
aws_dms_replication_task.this["44c1097330d673f"]: Creation complete after 8m24s
aws_dms_replication_task.this["10e1a2cbfaec4bc"]: Creation complete after 8m31s
aws_dms_replication_task.this["95e72ada0a6b985"]: Destruction complete after 8m41s
aws_dms_replication_task.this["881d1e357c2b272"]: Creation complete after 2m32s
aws_dms_replication_task.this["11733dba9ac93b8"]: Creation complete after 10m12s
aws_dms_replication_task.this["ae9e13555ee3e70"]: Destruction complete after 4m32s
aws_dms_replication_task.this["168d6da7c7d597d"]: Creation complete after 7m45s
aws_dms_replication_task.this["7d95bf1e7b08845"]: Destruction complete after 10m52s
aws_dms_replication_task.this["21fefed376eb8b3"]: Creation complete after 11m42s
aws_dms_replication_task.this["22072973d3eb470"]: Destruction complete after 9m25s
aws_dms_replication_task.this["64deafe212c1a40"]: Creation complete after 12m13s
aws_dms_replication_task.this["df81c749e2c872d"]: Creation complete after 12m23s
aws_dms_replication_task.this["17aa04c78ebd5b4"]: Creation complete after 12m43s
aws_dms_replication_task.this["ef16594f218a594"]: Creation complete after 2m42s
aws_dms_replication_task.this["8d5896137a49680"]: Destruction complete after 9m34s
aws_dms_replication_task.this["5515fc9f0ab2970"]: Destruction complete after 9m10s
aws_dms_replication_task.this["8509008450dbf1a"]: Creation complete after 11m22s
aws_dms_replication_task.this["5f1fd49d58f37f2"]: Creation complete after 6m23s
aws_dms_replication_task.this["029dbb1e0732576"]: Creation complete after 14m38s
aws_dms_replication_task.this["1d093541e00558a"]: Destruction complete after 11m45s
aws_dms_replication_task.this["120c79a10960e52"]: Creation complete after 9m45s
aws_dms_replication_task.this["94111270ac3f4c6"]: Destruction complete after 12m56s
aws_dms_replication_task.this["c3111361a470cdb"]: Destruction complete after 9m4s
aws_dms_replication_task.this["5f8c3c51b99c92a"]: Creation complete after 18m36s
aws_dms_replication_task.this["3878b48e3580e83"]: Destruction complete after 10m4s
aws_dms_replication_task.this["ffdbc21c5f55e86"]: Creation complete after 10m36s
aws_dms_replication_task.this["e7cc7d348ac9056"]: Destruction complete after 13m58s
aws_dms_replication_task.this["0519458dc38f705"]: Destruction complete after 15m47s
aws_dms_replication_task.this["1a0e32d2a7a5502"]: Destruction complete after 8m55s
aws_dms_replication_task.this["abd54e439de3945"]: Destruction complete after 11m26s
aws_dms_replication_task.this["8a25f50d4473342"]: Destruction complete after 15m57s
aws_dms_replication_task.this["c4ab4aa61bd2639"]: Creation complete after 14m18s

I believe creation and deletion times increase over the course of the run because all of these tasks are on the same replication instance, and the changes get queued internally on the AWS side (with limited concurrency).

@nijave
Contributor Author

nijave commented Apr 30, 2024

I did hit

│ Error: starting DMS Replication Task (...): InvalidResourceStateFault: Test connection for replication instance ...-datalake-default and endpoint ...-datalake should be successful for starting the replication task
│
│   with aws_dms_replication_task.this["4c24b14bcd90514"],
│   on dms_tasks.tf line 57, in resource "aws_dms_replication_task" "this":
│   57: resource "aws_dms_replication_task" "this" {

now, which I think must be a result of how connection testing works. All of these DMS tasks use different PostgreSQL databases/tables and send to the same target (S3) on the same replication instance.

nijave and others added 2 commits April 30, 2024 16:13
These hardcoded timeouts in waiters are being exceeded and there's
currently no way to configure them. This passes through remaining time using context.WithTimeout
@nijave nijave force-pushed the nv-dms-passthru-timeout branch 3 times, most recently from 58457fd to c6f0904 Compare April 30, 2024 21:33
- Check to see if there's already an existing successful test before starting a new one
- Don't fail out of the test loop if a connection is already being tested
- Add replication instance filter to TestConnection waiter since tests are based on (replication instance, endpoint)
@nijave nijave force-pushed the nv-dms-passthru-timeout branch from c6f0904 to 478c123 Compare May 1, 2024 12:48
@Bexanderthebex

@justinretzolk, any chance this can get reviewed? I had to write a few lines of bash to ensure that the DMS tasks reach a certain state before our terraform apply stage, and again after it to ensure that the DMS tasks are properly started.

@nijave nijave closed this May 20, 2026
@github-actions
Contributor

Warning

This Issue has been closed, meaning that any additional comments are much easier for the maintainers to miss. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

Labels

enhancement Requests to existing resources that expand the functionality or scope. service/dms Issues and PRs that pertain to the dms service. size/M Managed by automation to categorize the size of a PR. timeouts Pertains to timeout increases.

Development

Successfully merging this pull request may close these issues.

[Enhancement]: aws_dms_replication_task take too long in aws and terraform terminate with a timeout, state report tainted object

3 participants