fix: indefinite hangs during TLS handshake and cipher enumeration#960
fix: indefinite hangs during TLS handshake and cipher enumeration#960allornothingai wants to merge 3 commits intoprojectdiscovery:mainfrom
Conversation
Neo - PR Security ReviewNo security issues found Highlights
Hardening Notes
Comment |
WalkthroughTLS handshakes were made context-aware with per-call timeouts and refactored to run the handshake in a goroutine with context cancellation; PDCP upload writer buffer was enlarged and channel sends made non-blocking with dropped-result counting and warning logs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ZTLS as ztls.Client
participant Goroutine as Handshake Goroutine
participant Server as Remote TLS Server
participant CtxMgr as Context (with timeout)
Client->>ZTLS: EnumerateCiphers / ConnectWithOptions
ZTLS->>CtxMgr: create ctx (with optional timeout)
ZTLS->>Goroutine: start goroutine to run tlsConn.Handshake()
ZTLS->>CtxMgr: monitor ctx.Done()
Goroutine->>Server: perform Handshake (tlsConn.Handshake)
alt Handshake completes
Server-->>Goroutine: handshake result/error
Goroutine-->>ZTLS: send error on errChan
ZTLS->>ZTLS: receive err, treat tls.ErrCertsOnly as nil
else Context cancels / times out
CtxMgr-->>ZTLS: ctx.Done signal
ZTLS->>Goroutine: consider handshake aborted (select chooses ctx)
end
ZTLS->>CtxMgr: cancel context and cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can generate a title for your PR based on the changes with custom instructions.Set the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/tlsx/ztls/ztls.go (1)
326-341: Goroutine-based timeout pattern correctly fixes the race condition.The buffered channel (size 1) ensures the goroutine won't leak when the context times out before the handshake completes. The
ErrCertsOnlyhandling aligns with the client'sCertsOnlymode.Minor convention note: Go idiom places
ctx context.Contextas the first parameter. Consider reordering for consistency with standard library patterns.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 326 - 341, The function tlsHandshakeWithTimeout currently takes (tlsConn *tls.Conn, ctx context.Context); reorder parameters to place ctx context.Context first to follow Go conventions (i.e., change signature to tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)), and update all call sites that invoke tlsHandshakeWithTimeout to pass the context as the first argument; keep the existing logic inside the function (channel/goroutine handshake and ErrCertsOnly handling) unchanged.internal/pdcp/writer.go (1)
94-98: Non-blocking send prevents hangs but silently drops data.The select/default pattern correctly prevents goroutine stalls. However, when the buffer is full, results are permanently lost with only a warning log. Consider tracking dropped results with a counter (similar to
u.counterfor uploads) so users can verify scan completeness at the end.♻️ Suggested enhancement to track dropped results
Add a field to
UploadWriter:droppedCounter atomic.Int32Then update the callback:
func (u *UploadWriter) GetWriterCallback() func(*clients.Response) { return func(resp *clients.Response) { select { case u.data <- resp: default: + u.droppedCounter.Add(1) gologger.Warning().Msgf("PDCP upload buffer full, skipping result") } } }And report in
autoCommitdefer:if dropped := u.droppedCounter.Load(); dropped > 0 { gologger.Warning().Msgf("Dropped %v results due to upload buffer overflow", dropped) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pdcp/writer.go` around lines 94 - 98, Add a dropped results counter to UploadWriter and increment it whenever the non-blocking send to u.data fails; specifically add a droppedCounter (e.g., atomic.Int32) to the UploadWriter struct, replace the select/default branch that currently logs "PDCP upload buffer full, skipping result" with logic that increments u.droppedCounter and still logs the warning, and then in autoCommit's defer read/report the counter (u.droppedCounter.Load()) and emit a warning if >0 so callers can verify dropped results (keep existing u.counter for successful uploads).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-244: The handshake loop uses context.WithTimeout with
c.options.Timeout which if zero yields an already-expired context and makes
conn.HandshakeContext fail; modify the code around
context.WithTimeout/conn.HandshakeContext so it first checks c.options.Timeout
!= 0 and only creates a timeout context in that case, otherwise use
context.Background() (or context.WithCancel) for the handshake; update the
cancel() usage accordingly to only call cancel when a cancellable timeout
context was created (refer to c.options.Timeout, context.WithTimeout,
conn.HandshakeContext, and how ConnectWithOptions guards Timeout).
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 260-265: The cipher-enumeration loop uses context.WithTimeout with
c.options.Timeout which when zero makes the context expire immediately; update
the code around tlsHandshakeWithTimeout and the block that builds ctx, cancel to
guard for c.options.Timeout == 0 (same pattern as ConnectWithOptions) by
choosing a sensible default or skipping the timeout logic so handshakes can run,
then call tlsHandshakeWithTimeout(conn, ctx) and cancel() as before; reference
c.options.Timeout, tlsHandshakeWithTimeout, conn.GetHandshakeLog and
enumeratedCiphers to locate and apply the guard.
---
Nitpick comments:
In `@internal/pdcp/writer.go`:
- Around line 94-98: Add a dropped results counter to UploadWriter and increment
it whenever the non-blocking send to u.data fails; specifically add a
droppedCounter (e.g., atomic.Int32) to the UploadWriter struct, replace the
select/default branch that currently logs "PDCP upload buffer full, skipping
result" with logic that increments u.droppedCounter and still logs the warning,
and then in autoCommit's defer read/report the counter (u.droppedCounter.Load())
and emit a warning if >0 so callers can verify dropped results (keep existing
u.counter for successful uploads).
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 326-341: The function tlsHandshakeWithTimeout currently takes
(tlsConn *tls.Conn, ctx context.Context); reorder parameters to place ctx
context.Context first to follow Go conventions (i.e., change signature to
tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)), and update all
call sites that invoke tlsHandshakeWithTimeout to pass the context as the first
argument; keep the existing logic inside the function (channel/goroutine
handshake and ErrCertsOnly handling) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5846bef1-b0a3-42d1-9379-34888db32676
📒 Files selected for processing (3)
internal/pdcp/writer.gopkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
Guard against Timeout == 0 causing immediate context expiration. Co-authored-by: Paperclip <noreply@paperclip.ing>
|
I have addressed the review comments regarding timeout validation in . Added guards against to prevent immediate context expiration. |
- Add dropped results counter to PDCP UploadWriter\n- Reorder parameters of tlsHandshakeWithTimeout to follow Go conventions\n\nCo-authored-by: Paperclip <noreply@paperclip.ing>
|
I have addressed the review comments and nitpicks:
|
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/tlsx/ztls/ztls.go (1)
260-264:⚠️ Potential issue | 🔴 CriticalSet a default 5-second timeout in the ztls cipher enumeration when no timeout is provided.
The codebase convention (as seen in
pkg/tlsx/clients/utils.goand documented incmd/tlsx/main.go) establishes thatTimeout == 0should use a 5-second default. The current code at lines 260-264 breaks this by usingcontext.Background()instead, which creates a context without any deadline. This allows cipher enumeration to hang indefinitely against servers that do not complete the TLS handshake.Proposed fix
- ctx := context.Background() - var cancel context.CancelFunc - if c.options.Timeout != 0 { - ctx, cancel = context.WithTimeout(ctx, time.Duration(c.options.Timeout)*time.Second) - } + timeout := time.Duration(c.options.Timeout) * time.Second + if timeout == 0 { + timeout = 5 * time.Second + } + ctx, cancel := context.WithTimeout(context.Background(), timeout) if err := c.tlsHandshakeWithTimeout(ctx, conn); err == nil { h1 := conn.GetHandshakeLog() enumeratedCiphers = append(enumeratedCiphers, h1.ServerHello.CipherSuite.String()) } - if cancel != nil { - cancel() - } + cancel()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 260 - 264, The cipher enumeration uses context.Background() when c.options.Timeout == 0, which allows hangs; change this so that when c.options.Timeout == 0 you create a context with a 5-second timeout instead (use context.WithTimeout(ctx, 5*time.Second)) and ensure the resulting cancel function is assigned and deferred; update the logic around ctx, cancel in the ztls cipher enumeration code (referencing c.options.Timeout, ctx and cancel in ztls.go) so the default timeout follows the project convention.
🧹 Nitpick comments (1)
internal/pdcp/writer.go (1)
95-100: Consider rate-limiting the per-drop warning log.The non-blocking send correctly prevents goroutine stalls. However, logging a warning for every dropped result can cause log spam during sustained buffer overflow. Based on the call sites in
runner.go(lines 220-230, 353-359, 414-417), tight loops can generate thousands of results, potentially flooding logs with warnings.Since the total dropped count is already reported at termination (line 135), consider either:
- Removing the per-drop warning entirely, or
- Rate-limiting it (e.g., log once per N drops or once per time interval)
♻️ Option: Log only on first drop, rely on final summary
return func(resp *clients.Response) { select { case u.data <- resp: default: - u.droppedCounter.Add(1) - gologger.Warning().Msgf("PDCP upload buffer full, skipping result") + if u.droppedCounter.Add(1) == 1 { + gologger.Warning().Msgf("PDCP upload buffer full, some results will be skipped (total reported at end)") + } } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pdcp/writer.go` around lines 95 - 100, The per-drop gologger.Warning in the non-blocking send (sending to u.data with default branch) causes log spam; change it to either remove the warning or rate-limit it: add a small logging guard in the uploader (e.g., track a counter or timestamp on the uploader struct) and only call gologger.Warning().Msgf("PDCP upload buffer full, skipping result") on the first drop or once per N drops / per time interval; use u.droppedCounter (or an atomic/int field like lastDropLog or dropsSinceLastLog) to decide when to log so the drop increment stays but the warning is emitted rarely. Ensure the conditional surrounds the existing gologger call and preserve the u.droppedCounter.Add(1) increment and non-blocking send behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-243: EnumerateCiphers currently creates a background context
and only sets a deadline if c.options.Timeout != 0, causing
conn.HandshakeContext to potentially block indefinitely; change the logic in
EnumerateCiphers (around ctx := context.Background(), pool.Acquire and
conn.HandshakeContext) to treat c.options.Timeout == 0 as a 5-second default
(the same default used by GetConn()), i.e. create a context.WithTimeout(ctx,
time.Duration(timeout)*time.Second) where timeout := c.options.Timeout; if
timeout == 0 set timeout = 5, and ensure you call the cancel function defer
cancel() after creating the context so the deadline is always enforced and
resources are released.
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 332-347: tlsHandshakeWithTimeout returns on context timeout while
the goroutine running tlsConn.Handshake may still reference a mutable tls.Config
(created via tls.Client using baseCfg), which allows concurrent mutation of
baseCfg.CipherSuites to race with the in-flight handshake; fix by cloning the
tls.Config for each handshake attempt so the goroutine uses an immutable copy
(i.e., create a shallow copy of baseCfg or use baseCfg.Clone() before calling
tls.Client/handshake) and pass that cloned config to tls.Client/tlsConn so
mutations to baseCfg.CipherSuites in later iterations cannot affect the running
handshake.
---
Duplicate comments:
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 260-264: The cipher enumeration uses context.Background() when
c.options.Timeout == 0, which allows hangs; change this so that when
c.options.Timeout == 0 you create a context with a 5-second timeout instead (use
context.WithTimeout(ctx, 5*time.Second)) and ensure the resulting cancel
function is assigned and deferred; update the logic around ctx, cancel in the
ztls cipher enumeration code (referencing c.options.Timeout, ctx and cancel in
ztls.go) so the default timeout follows the project convention.
---
Nitpick comments:
In `@internal/pdcp/writer.go`:
- Around line 95-100: The per-drop gologger.Warning in the non-blocking send
(sending to u.data with default branch) causes log spam; change it to either
remove the warning or rate-limit it: add a small logging guard in the uploader
(e.g., track a counter or timestamp on the uploader struct) and only call
gologger.Warning().Msgf("PDCP upload buffer full, skipping result") on the first
drop or once per N drops / per time interval; use u.droppedCounter (or an
atomic/int field like lastDropLog or dropsSinceLastLog) to decide when to log so
the drop increment stays but the warning is emitted rarely. Ensure the
conditional surrounds the existing gologger call and preserve the
u.droppedCounter.Add(1) increment and non-blocking send behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 45e33b6f-d3a9-4a56-a2b5-2b61f2108cab
📒 Files selected for processing (3)
internal/pdcp/writer.gopkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
| ctx := context.Background() | ||
| var cancel context.CancelFunc | ||
| if c.options.Timeout != 0 { | ||
| ctx, cancel = context.WithTimeout(ctx, time.Duration(c.options.Timeout)*time.Second) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and locate the relevant files
fd -t f "tls\.go|utils\.go" | grep -E "(pkg/tlsx|tlsx)" | head -20Repository: projectdiscovery/tlsx
Length of output: 203
🏁 Script executed:
# Read the actual code at lines 239-243 and surrounding context
cat -n pkg/tlsx/tls/tls.go | sed -n '230,260p'Repository: projectdiscovery/tlsx
Length of output: 1252
🏁 Script executed:
# Find and examine the clients/utils.go file to understand the timeout default
fd -t f "utils\.go" | xargs grep -l "Timeout" | grep clientsRepository: projectdiscovery/tlsx
Length of output: 90
🏁 Script executed:
# Read the clients/utils.go file
cat -n pkg/tlsx/clients/utils.goRepository: projectdiscovery/tlsx
Length of output: 5897
🏁 Script executed:
# Search for Timeout handling and GetConn function in clients
rg -A 10 -B 5 "Timeout.*==.*0|GetConn|5.*time\.Second" pkg/tlsx/clients/Repository: projectdiscovery/tlsx
Length of output: 2422
🏁 Script executed:
# Check imports in tls.go to verify ProjectDiscovery library usage
head -50 pkg/tlsx/tls/tls.go | grep -E "^import|^\"" | head -30Repository: projectdiscovery/tlsx
Length of output: 73
🏁 Script executed:
# Get full imports section
cat -n pkg/tlsx/tls/tls.go | sed -n '1,30p'Repository: projectdiscovery/tlsx
Length of output: 1067
🏁 Script executed:
# Final verification: Check if there are any other places in EnumerateCiphers that handle timeout
cat -n pkg/tlsx/tls/tls.go | sed -n '200,255p'Repository: projectdiscovery/tlsx
Length of output: 2120
Add default handshake timeout to cipher enumeration to prevent indefinite blocking.
pkg/tlsx/clients/utils.go treats Timeout == 0 as a 5-second default via GetConn(). However, EnumerateCiphers uses pool.Acquire() directly, bypassing this default. Without a context deadline, conn.HandshakeContext() can hang indefinitely if a peer accepts TCP but never completes the TLS handshake.
Proposed fix
- ctx := context.Background()
- var cancel context.CancelFunc
- if c.options.Timeout != 0 {
- ctx, cancel = context.WithTimeout(ctx, time.Duration(c.options.Timeout)*time.Second)
- }
+ timeout := time.Duration(c.options.Timeout) * time.Second
+ if timeout == 0 {
+ timeout = 5 * time.Second
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), timeout)
if err := conn.HandshakeContext(ctx); err == nil {
ciphersuite := conn.ConnectionState().CipherSuite
enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite))
}
- if cancel != nil {
- cancel()
- }
+ cancel()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/tlsx/tls/tls.go` around lines 239 - 243, EnumerateCiphers currently
creates a background context and only sets a deadline if c.options.Timeout != 0,
causing conn.HandshakeContext to potentially block indefinitely; change the
logic in EnumerateCiphers (around ctx := context.Background(), pool.Acquire and
conn.HandshakeContext) to treat c.options.Timeout == 0 as a 5-second default
(the same default used by GetConn()), i.e. create a context.WithTimeout(ctx,
time.Duration(timeout)*time.Second) where timeout := c.options.Timeout; if
timeout == 0 set timeout = 5, and ensure you call the cancel function defer
cancel() after creating the context so the deadline is always enforced and
resources are released.
| func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error { | ||
| errChan := make(chan error, 1) | ||
| defer close(errChan) | ||
|
|
||
| go func() { | ||
| errChan <- tlsConn.Handshake() | ||
| }() | ||
|
|
||
| select { | ||
| case <-ctx.Done(): | ||
| return errorutil.NewWithTag("ztls", "timeout while attempting handshake") //nolint | ||
| case errChan <- tlsConn.Handshake(): | ||
| } | ||
|
|
||
| err := <-errChan | ||
| if err == tls.ErrCertsOnly { | ||
| err = nil | ||
| case err := <-errChan: | ||
| if err == tls.ErrCertsOnly { | ||
| err = nil | ||
| } | ||
| return err | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the file to see the code at lines 332-347 and surrounding context
wc -l pkg/tlsx/ztls/ztls.goRepository: projectdiscovery/tlsx
Length of output: 90
🏁 Script executed:
# Read the specific section mentioned in the review
sed -n '320,360p' pkg/tlsx/ztls/ztls.goRepository: projectdiscovery/tlsx
Length of output: 816
🏁 Script executed:
# Search for EnumerateCiphers function and how baseCfg.CipherSuites is used
rg -A 30 "func.*EnumerateCiphers" pkg/tlsx/ztls/Repository: projectdiscovery/tlsx
Length of output: 1792
🏁 Script executed:
# Search for how tlsHandshakeWithTimeout is called and how baseCfg is passed
rg -B 10 "tlsHandshakeWithTimeout" pkg/tlsx/ztls/Repository: projectdiscovery/tlsx
Length of output: 1835
🏁 Script executed:
# Get more context around the tlsHandshakeWithTimeout call in EnumerateCiphers
rg -B 5 -A 15 "conn := tls.Client.*baseConn.*baseCfg" pkg/tlsx/ztls/Repository: projectdiscovery/tlsx
Length of output: 1274
🏁 Script executed:
# Search for the loop that contains this code
rg -B 30 "baseCfg.CipherSuites = \[\]uint16" pkg/tlsx/ztls/Repository: projectdiscovery/tlsx
Length of output: 1860
🏁 Script executed:
# Check if there's any defer cancel() or cleanup that ensures goroutines are waited on
sed -n '140,280p' pkg/tlsx/ztls/ztls.goRepository: projectdiscovery/tlsx
Length of output: 4787
🏁 Script executed:
# Check for go.mod or go version requirements
cat go.mod | head -5Repository: projectdiscovery/tlsx
Length of output: 126
🏁 Script executed:
# Check tls.Config.Clone() documentation from Go standard library
web_search
Go tls.Config Clone() method standard libraryRepository: projectdiscovery/tlsx
Length of output: 246
🌐 Web query:
Go crypto/tls Config Clone() method availability version
💡 Result:
(*crypto/tls.Config).Clone() is part of the Go standard library starting in Go 1.8 (“added in go1.8” in the official crypto/tls docs), and is therefore available in Go 1.8+. [1]
Clone the config to prevent race conditions in cipher enumeration.
On timeout, tlsHandshakeWithTimeout returns immediately while the handshake goroutine continues running. Since tls.Client() holds a reference to the config (not a copy), the next iteration's mutation of baseCfg.CipherSuites can affect the in-flight handshake, causing enumeration flakiness under timeout pressure.
Fix: Clone config for each attempt
- conn := tls.Client(baseConn, baseCfg)
- baseCfg.CipherSuites = []uint16{ztlsCiphers[v]}
+ cfg := baseCfg.Clone()
+ cfg.CipherSuites = []uint16{ztlsCiphers[v]}
+ conn := tls.Client(baseConn, cfg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/tlsx/ztls/ztls.go` around lines 332 - 347, tlsHandshakeWithTimeout
returns on context timeout while the goroutine running tlsConn.Handshake may
still reference a mutable tls.Config (created via tls.Client using baseCfg),
which allows concurrent mutation of baseCfg.CipherSuites to race with the
in-flight handshake; fix by cloning the tls.Config for each handshake attempt so
the goroutine uses an immutable copy (i.e., create a shallow copy of baseCfg or
use baseCfg.Clone() before calling tls.Client/handshake) and pass that cloned
config to tls.Client/tlsConn so mutations to baseCfg.CipherSuites in later
iterations cannot affect the running handshake.
This PR fixes multiple indefinite hangs in tlsx:
These fixes resolve issue #819 where tlsx hangs indefinitely after processing a large number of targets.
Summary by CodeRabbit