Description
Describe the bug
The built-in retry functionality only retries when it gets a valid HTTP response representing a server error (or rate limit error), and does not retry when there’s an actual networking failure, e.g. if the server hangs up early or can’t be contacted.
My general experience is that a notable number of intermittent errors are networking failures where there is no completed HTTP response at all. Often this is a gateway/router/proxy server restarting, some intermediary service that gets overloaded, a container that is still starting and doesn’t yet have network access, or any number of other weird scenarios. Because of this, I usually want to retry on most networking errors (i.e. when await fetch(...) throws).
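For illustration, here is a minimal sketch of the behavior I mean. The requestWithRetry helper and its options are hypothetical names made up for this example; they are not part of @datadog/datadog-api-client. It treats a thrown error (no HTTP response at all) as retryable, just like a 429 or 5xx response:

```javascript
// Hypothetical helper, NOT part of @datadog/datadog-api-client.
// Retries when doRequest() throws (no HTTP response was received)
// or when it resolves with a 429 or 5xx status.
async function requestWithRetry(doRequest, { maxRetries = 1, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await doRequest();
      const retryable = response.status === 429 || response.status >= 500;
      // Return successes, non-retryable errors, and the final attempt as-is.
      if (!retryable || attempt >= maxRetries) return response;
    } catch (error) {
      // Networking failure: rethrow only once the retries are used up.
      if (attempt >= maxRetries) throw error;
    }
    // Exponential backoff before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

It could be used as `await requestWithRetry(() => fetch(url), { maxRetries: 1 })`. Note that this sketch retries on every thrown error, which also covers things like aborted requests or invalid URLs; a real implementation would probably want to be more selective.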
I noticed the built-in retry support a while back (#1293) and have been meaning to try it out instead of my old home-grown solution. I started to play with it today, and it’s not usable for me because of this issue.
To Reproduce
You can try this with a dummy HTTP server in Express that breaks the connection on every other request:
const express = require("express");

const app = express();
const port = 8001;
let count = 0;

app.use((req, res) => {
  console.log(`Request for "${req.originalUrl}"`);
  count++;
  // Even-numbered requests succeed; odd-numbered ones get their socket destroyed.
  if (count % 2 === 0) {
    console.log(` Allowing response`);
    res.end("ok");
  } else {
    console.log(` Forcing socket error`);
    res.destroy();
  }
});

app.listen(port, () => {
  console.log(`Listening on port ${port}...`);
});
Run that server, and try to make a request to it with retries:
import { client, v1 } from '@datadog/datadog-api-client';

const configuration = client.createConfiguration({
  authMethods: { apiKeyAuth: 'abc123' },
  enableRetry: true,
  maxRetries: 1,
  // Use index 1, where we can easily configure things to use the test server.
  serverIndex: 1,
});
configuration.setServerVariables({
  name: 'api.datadoghq.test:8001',
  protocol: 'http',
});

try {
  const result = await (new v1.MetricsApi(configuration)).submitMetrics({
    body: {
      series: [{
        metric: 'nodepackage.test.metric',
        points: [[Math.floor(Date.now() / 1000), 2]],
        type: 'count',
      }]
    }
  });
  console.log('Result:', result);
} catch (error) {
  console.error('Error:', error);
  process.exitCode = 1;
}
This fails on the first request rather than retrying and succeeding.
You can make this succeed by replacing the res.destroy() line in the server with res.statusCode = 500; res.end('error');.
Expected behavior
I’d expect the above to make two requests and end with a successful result.
Environment and Versions (please complete the following information):
- Node.js 22.9.0
- @datadog/datadog-api-client 1.30.0