-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fix: handle TLSSocket errors to prevent process crashes #3474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
B4nan
wants to merge
8
commits into
master
Choose a base branch
from
claude/slack-fix-ssl-crash-b5tLS
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+151
−0
Open
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
5f89c36
fix: handle TLSSocket errors to prevent process crashes
claude 4f76257
test: add tests for TLSSocket error handling in GotScrapingHttpClient
claude 6d97b3d
fix(test): remove unused variable in failedRequestHandler callbacks
claude 373f84c
test: simplify socket error handling tests for reliability
claude 3cca8cd
fix(test): convert body Buffer to string before assertion
claude 0c70d66
style: format code with prettier
claude 6f952e0
style: format code with biome
claude 22bada4
fix: use socket.once to prevent memory leaks from accumulated listeners
claude File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -59,6 +59,17 @@ export class GotScrapingHttpClient implements BaseHttpClient { | |
| stream.on('error', reject); | ||
|
|
||
| stream.on('response', (response: PlainResponse) => { | ||
| // Handle socket errors to prevent unhandled TLS errors from crashing the process. | ||
| // TLS errors are emitted on the underlying socket (response.socket), not the got stream. | ||
| // Without this handler, errors like ERR_SSL_SSLV3_ALERT_UNEXPECTED_MESSAGE crash the process. | ||
| // Using `once` to avoid memory leaks from accumulated listeners on pooled sockets. | ||
| const socket = (response as any).socket; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, this is unnecessary even for higher-level errors (like the closed socket in the new test). Internal got error handling already handles these errors. See that the test suite will pass even without these changes here. |
||
| if (socket && typeof socket.once === 'function') { | ||
| socket.once('error', (err: Error) => { | ||
| stream.destroy(err); | ||
| }); | ||
| } | ||
|
|
||
| const result: StreamingHttpResponse = { | ||
| stream, | ||
| request, | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| import http from 'node:http'; | ||
| import type { AddressInfo, Socket } from 'node:net'; | ||
|
|
||
| import { GotScrapingHttpClient, HttpCrawler } from '@crawlee/http'; | ||
| import { MemoryStorageEmulator } from 'test/shared/MemoryStorageEmulator'; | ||
|
|
||
| let server: http.Server; | ||
| let url: string; | ||
|
|
||
| const sockets = new Set<Socket>(); | ||
|
|
||
| const router = new Map<string, http.RequestListener>(); | ||
|
|
||
| router.set('/ok', (_req, res) => { | ||
| res.setHeader('content-type', 'text/html; charset=utf-8'); | ||
| res.end('<html><body>OK</body></html>'); | ||
| }); | ||
|
|
||
| router.set('/destroy-socket-after-headers', (req, res) => { | ||
| // Send headers, start body, then destroy the socket to simulate a mid-response error. | ||
| // This simulates the TLS error scenario where the socket fails after headers are sent. | ||
| res.setHeader('content-type', 'text/html; charset=utf-8'); | ||
| res.setHeader('content-length', '10000'); | ||
| res.write('<html>'); | ||
|
|
||
| // Destroy the underlying socket after a short delay to trigger an error | ||
| // after the 'response' event has already fired. | ||
| setTimeout(() => { | ||
| req.socket.destroy(); | ||
| }, 50); | ||
| }); | ||
|
|
||
| beforeAll(async () => { | ||
| server = http.createServer((request, response) => { | ||
| try { | ||
| const requestUrl = new URL(request.url!, 'http://localhost'); | ||
| const handler = router.get(requestUrl.pathname); | ||
| if (handler) { | ||
| handler(request, response); | ||
| } else { | ||
| response.statusCode = 404; | ||
| response.end(); | ||
| } | ||
| } catch { | ||
| response.destroy(); | ||
| } | ||
| }); | ||
|
|
||
| server.on('connection', (socket) => { | ||
| sockets.add(socket); | ||
| socket.on('close', () => sockets.delete(socket)); | ||
| }); | ||
|
|
||
| await new Promise<void>((resolve) => | ||
| server.listen(() => { | ||
| url = `http://127.0.0.1:${(server.address() as AddressInfo).port}`; | ||
| resolve(); | ||
| }), | ||
| ); | ||
| }); | ||
|
|
||
| const localStorageEmulator = new MemoryStorageEmulator(); | ||
|
|
||
| beforeEach(async () => { | ||
| await localStorageEmulator.init(); | ||
| }); | ||
|
|
||
| afterAll(async () => { | ||
| for (const socket of sockets) { | ||
| socket.destroy(); | ||
| } | ||
| await new Promise((resolve) => server.close(resolve)); | ||
| await localStorageEmulator.destroy(); | ||
| }); | ||
|
|
||
| describe('HttpCrawler socket error handling', () => { | ||
| test('should handle mid-response socket destruction gracefully without crashing', async () => { | ||
| const errors: Error[] = []; | ||
|
|
||
| const crawler = new HttpCrawler({ | ||
| httpClient: new GotScrapingHttpClient(), | ||
| maxRequestRetries: 0, | ||
| maxConcurrency: 1, | ||
| requestHandler: () => { | ||
| // Should not complete successfully for the error case | ||
| }, | ||
| failedRequestHandler: (_ctx, error) => { | ||
| errors.push(error as Error); | ||
| }, | ||
| }); | ||
|
|
||
| await crawler.run([`${url}/destroy-socket-after-headers`]); | ||
|
|
||
| // The request should have failed (not crashed the process). | ||
| // The key assertion is that we reach this point without process crash. | ||
| expect(errors.length).toBe(1); | ||
| }); | ||
|
|
||
| test('normal requests still work correctly', async () => { | ||
| const results: string[] = []; | ||
|
|
||
| const crawler = new HttpCrawler({ | ||
| httpClient: new GotScrapingHttpClient(), | ||
| maxRequestRetries: 0, | ||
| maxConcurrency: 1, | ||
| requestHandler: ({ body }) => { | ||
| results.push(body.toString()); | ||
| }, | ||
| }); | ||
|
|
||
| await crawler.run([`${url}/ok`]); | ||
|
|
||
| expect(results.length).toBe(1); | ||
| expect(results[0]).toContain('OK'); | ||
| }); | ||
|
|
||
| test('crawler recovers after socket error and processes next request', async () => { | ||
| const results: string[] = []; | ||
| const errors: Error[] = []; | ||
|
|
||
| const crawler = new HttpCrawler({ | ||
| httpClient: new GotScrapingHttpClient(), | ||
| maxRequestRetries: 0, | ||
| maxConcurrency: 1, | ||
| requestHandler: ({ body }) => { | ||
| results.push(body.toString()); | ||
| }, | ||
| failedRequestHandler: (_ctx, error) => { | ||
| errors.push(error as Error); | ||
| }, | ||
| }); | ||
|
|
||
| await crawler.run([`${url}/destroy-socket-after-headers`, `${url}/ok`]); | ||
|
|
||
| // One should fail and one should succeed, but no process crash. | ||
| expect(errors.length).toBe(1); | ||
| expect(results.length).toBe(1); | ||
| expect(results[0]).toContain('OK'); | ||
| }); | ||
| }); |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
'response'event is emitted once the HTTP headers arrive, which is a long time after the TLS connection has been established (and theunexpected_messagealert has likely been received).