Long running SFTP transfers with :ssh_sftp.read/4 fail #8724

Open
nsweeting opened this issue Aug 15, 2024 · 1 comment
Labels: bug, priority:medium, team:PS

nsweeting commented Aug 15, 2024

Describe the bug
Long-running SFTP transfers using :ssh_sftp.read/4 consistently fail at some point. The failure shows up as :ssh_sftp.read/4 getting stuck indefinitely, as a result of an :infinity timeout. The overall task wrapping the transfer eventually times out after x minutes of no data movement.

To Reproduce
Unfortunately this is a bit difficult. We were more or less able to reproduce it, but it takes a long time. Essentially: execute a long-running SFTP transfer using :ssh_sftp.read/4 with a throttled download speed (500-600 kb/s range). After about 6-7GB of transfer, the read function seems to get "stuck" with no data movement.
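A minimal sketch of the kind of chunked read loop described above. This is not our production code; the module name, chunk size, and 60-second read timeout are hypothetical values chosen for illustration. It passes a finite timeout to :ssh_sftp.read/4 instead of :infinity, on the assumption that a stalled read would then come back as an error tuple rather than blocking forever. Whether that holds for this particular stall is untested.

```elixir
defmodule SftpReadSketch do
  # Hypothetical values for illustration, not taken from the report.
  @chunk_size 65_536
  @read_timeout 60_000

  # Copies remote_path to local_path over an already started SFTP channel.
  # Returns {:ok, bytes_copied} or {:error, reason}.
  def download(channel, remote_path, local_path) do
    with {:ok, handle} <-
           :ssh_sftp.open(channel, String.to_charlist(remote_path), [:read, :binary]) do
      try do
        case File.open(local_path, [:write, :binary], &copy(channel, handle, &1, 0)) do
          {:ok, result} -> result
          {:error, _reason} = error -> error
        end
      after
        # Always release the remote file handle, even if the copy fails.
        :ssh_sftp.close(channel, handle)
      end
    end
  end

  # Reads one chunk at a time until :eof, tracking how many bytes were moved
  # so a failure can be reported together with the offset it occurred at.
  defp copy(channel, handle, io, bytes) do
    case :ssh_sftp.read(channel, handle, @chunk_size, @read_timeout) do
      {:ok, data} ->
        :ok = IO.binwrite(io, data)
        copy(channel, handle, io, bytes + byte_size(data))

      :eof ->
        {:ok, bytes}

      {:error, reason} ->
        {:error, {reason, bytes}}
    end
  end
end
```

The channel argument is the pid returned by :ssh_sftp.start_channel/3. Returning the byte count alongside the error reason makes it possible to check whether failures cluster in the 6-7GB range.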

Expected behavior
Long running SFTP transfers using :ssh_sftp.read/4 should complete.

Affected versions
erlang-26.2.5.1

Additional context
We run a service that is responsible for moving data from SFTP locations to our internal network. We move thousands of files a day. In this specific context, the servers are hosted by Salesforce. Download speeds are typically throttled to the 500-600 kb/s range. We can have many of these transfers running at the same time against the same server. We normally have no issue.

We recently upgraded the base docker image we use from hexpm/elixir:1.16.0-erlang-26.2.1-alpine-3.18.4 to hexpm/elixir:1.17.1-erlang-26.2.5.1-alpine-3.20.1. After this upgrade we had consistent failures for long-running transfers, specifically for files in the 15GB range. They consistently failed in the 6-7GB range. We had days of these failures accumulate, so it seems to be fairly reproducible, although it takes a long time. As soon as we switched back to hexpm/elixir:1.16.0-erlang-26.2.1-alpine-3.18.4, all transfer jobs succeeded. Shorter transfers seem to have no issue.

It's difficult to know specifically whether this issue was introduced by the OTP upgrade, but at this point it seems related. There were a couple of updates to the :ssh module within this upgrade range.

@nsweeting nsweeting added the bug Issue is reported as a bug label Aug 15, 2024
@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Aug 15, 2024
IngelaAndin (Contributor) commented

Spontaneously, this sounds like an ssh window_adjustment problem, similar to the one fixed for a different scenario described in #7483. Our ssh expert is on vacation right now, but he will be back soon and will look into this.
