Skip to content

fix(lfs): fix open file handle blocking rename and concurrent upload race#808

Closed
dvrd wants to merge 75 commits intocharmbracelet:mainfrom
dvrd:fix/soft-serve-770
Closed

fix(lfs): fix open file handle blocking rename and concurrent upload race#808
dvrd wants to merge 75 commits intocharmbracelet:mainfrom
dvrd:fix/soft-serve-770

Conversation

@dvrd
Copy link
Copy Markdown

@dvrd dvrd commented Mar 28, 2026

Summary

  • SSH LFS (pkg/git/lfs.go): Remove unnecessary storage.Open() call before storage.Rename(). The open file handle caused ERROR_SHARING_VIOLATION on Windows, preventing temp files from being moved to their final path. tempName is already in scope and passed directly to Rename.
  • HTTP LFS (pkg/web/git_lfs.go): Treat db.ErrDuplicateKey from CreateLFSObject as idempotent success (HTTP 200). git-lfs sends parallel PUT requests for large pushes; the first concurrent request wins the DB insert and subsequent ones hit a duplicate key constraint. Returning 500 caused the client to mark the transfer as failed.

Upstream: #770

Test plan

  • LFS push on Windows — temp file rename no longer fails with sharing violation
  • LFS push with multiple large files — parallel PUTs for the same OID return 200 instead of 500
  • Existing LFS tests still pass: go test ./pkg/git/... ./pkg/web/...

🤖 Generated with Claude Code

dvrd and others added 30 commits March 27, 2026 10:23
…racelet#800)

Access tokens were not providing authentication because:
1. KeyboardInteractiveHandler ignored the challenge callback entirely,
   never prompting for or validating tokens.
2. SSH commands used UserByPublicKey/AccessLevelByPublicKey with no
   fallback to the context user, so token sessions got no identity.

Changes:
- Rewrite KeyboardInteractiveHandler to prompt for an access token,
  validate it via UserByAccessToken, and store the username in the
  permissions extensions under "token-user" key.
- Update AuthenticationMiddleware to resolve token-authenticated users
  from the "token-user" extension and set them in context.
- Add currentUser() helper in cmd.go that resolves user by public key
  first, then falls back to context user (for token sessions).
- Update info, pubkey, set_username commands to use currentUser().
- Fix N+1 query in repo list and TUI selection by using context user
  instead of per-repo public key lookups.
- Add nil guard to AccessLevelByPublicKey with context-user fallback.
- Add integration tests covering all token auth scenarios.

Fixes charmbracelet#800

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(ssh): validate access tokens in keyboard-interactive auth
When passing a public key via SSH, the SSH layer splits the command
string on whitespace. A key like "ssh-ed25519 AAAA..." becomes two
separate tokens, so cobra received the key type as the flag value and
the base64 blob as an extra positional arg, failing ExactArgs(1).

Fix by accepting extra positional args after the username and joining
them as the key value when -k is not provided. This mirrors the
pattern already used by user add-pubkey.

Fixes charmbracelet#750

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a repository is created by an authenticated user, set the
gitweb.owner git config variable to the creator's username.

This allows gitweb and other tools that read gitweb.owner to
display the correct repository owner.

Fixes charmbracelet#753

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…racelet#769)

When viewing a repository in the TUI, pressing 'c' now copies the
git clone command to the clipboard. Previously this shortcut was
only available from the repository list, not inside a repo view.

Fixes charmbracelet#769

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(ssh): accept split key args in user create -k (charmbracelet#750)
feat(backend): set gitweb.owner on repo create (charmbracelet#753)
feat(ui): add 'c' shortcut to copy clone cmd in repo view (charmbracelet#769)
When SSH splits -k "ssh-ed25519 AAAA...", cobra may capture the first
token into the -k flag value with the rest as positional args. The
previous key == "" guard missed this case. Now always merge trailing
args into the key value, matching the add-pubkey pattern.

Also reverts Use string to "create USERNAME" to avoid advertising
an unintended positional-key API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Only trigger clone command copy on Readme tab to avoid conflicting
  with pane-level copy handlers (Files, Log, Refs, Stash each have
  their own 'c' handler for copying file names, commit hashes, etc.)
- Guard against empty clone command when HideCloneCmd is true
- Use more specific status bar message "Clone command copied"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix misleading log when token resolves to nil user (logged err=<nil>,
  now logs err="user not found")
- Simplify currentUser() to use context user directly since
  AuthenticationMiddleware already resolves the user for both
  public key and token auth paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When -k is not provided but extra positional args are present (e.g.
user create alice bob), return a clear error instead of silently
treating them as key material.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The TUI repo selection guard checked only for public key presence,
blocking token-authenticated users when AllowKeyless was disabled.
Now also checks for a context user (set by token auth middleware).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove AccessLevelByPublicKey (zero callers after migration to
  AccessLevelForUser)
- Remove unused *backend.Backend parameter from currentUser()
- Clean up unused 'be' variables in info.go and pubkey.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
initializePermissions created a new permissions object when
ctx.Permissions() returned nil but never called ctx.SetValue
to persist it back, leaving a potential nil-pointer risk for
subsequent callers of ctx.Permissions().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminates a TOCTOU race where a username rename between
KeyboardInteractiveHandler and AuthenticationMiddleware could
resolve to the wrong user. Now stores the user ID and resolves
via UserByID in the middleware.

Also tightens test assertion from 'ssh-' to 'ssh-ed25519 '.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace weak 'stderr .' with specific 'stderr no key found'
- Add TEST 6: extra args without -k flag produce clear error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(ssh): validate access tokens in keyboard-interactive auth (charmbracelet#800)
fix(ssh): accept split key args in user create -k (charmbracelet#750)
feat(ui): add 'c' shortcut to copy clone cmd in repo view (charmbracelet#769)
Root cause: when Write consumed fewer bytes than Read produced, the code
returned n, err where err was nil — silently hiding data loss. Also, a
Read that returns n>0 alongside io.EOF (valid per io.Reader contract)
had its bytes dropped.

Fix: restructure the loop to process any bytes returned before checking
the read error, and return io.ErrShortWrite on a partial write.

Closes charmbracelet#616
…ediately

Root cause: Start() launched all servers in errgroup goroutines. When one
server failed to bind (e.g. EACCES on a privileged port), errg.Wait() had
to wait for all other goroutines -- but those goroutines blocked in
ListenAndServe() forever, so the error was never returned to the caller.

Fix: bind all TCP listeners before launching goroutines. A bind failure
is returned immediately from Start() before any server accepts connections.
Each server gains a Serve(net.Listener) method. HTTPServer.Serve wraps
the listener with TLS when configured.

Closes charmbracelet#645
filepath.Join and filepath.Dir use backslash on Windows. Git tree
paths always use forward slashes, so TreePath lookups failed after
the first directory level -- the path was built with backslashes
but git expected forward slashes, causing navigation to silently
revert to the parent directory on each Enter press.

Replace filepath.Join/Dir with path.Join/Dir (the slash-only
standard library package) in the TUI Files and Readme components.

Closes charmbracelet#681
Operators can now set initial anonymous access level and keyless mode
via config file or environment variables instead of requiring a manual
post-start SSH session with the settings command.

Both settings are applied to the database on every startup when
configured, making it possible to stand up a fully automated server:

  SOFT_SERVE_ANON_ACCESS=no-access
  SOFT_SERVE_ALLOW_KEYLESS=false

Or in config.yaml:
  anon_access: read-only
  allow_keyless: true

The fields are optional: omitting them preserves whatever is in the
database (set via the settings command). Validation in Validate()
rejects unknown access-level strings at startup.

Closes charmbracelet#758
feat(config): add anon_access and allow_keyless config fields (charmbracelet#758)
fix(web): return io.ErrShortWrite and handle partial Read in ReadFrom (charmbracelet#616)
fix(ui): use path instead of filepath for git tree paths on Windows (charmbracelet#681)
fix(server): pre-bind listeners so port permission errors surface immediately (charmbracelet#645)
dvrd and others added 26 commits March 28, 2026 11:16
The previous else-if checked only ctx.Err()==context.Canceled which could
suppress genuine pipe/IO errors. Check exec.ErrWaitDelay explicitly — the
only non-ExitError produced by WaitDelay expiry — before suppressing.
… compat

filepath.Join produces OS-native separators (backslash on Windows) but
glob patterns and git tree paths always use forward slashes. Use path.Join
(always forward slash) so subdirectory README patterns match on Windows.
…ows compat

On Windows, filepath.Dir produces backslash-separated paths. Combined with
the existing path.Join on line 31, this creates mixed separators that break
glob matching. Switch to path.Dir to keep all virtual path operations
consistently using forward slashes.
Pre-existing bug: the goroutine logging stderr copy failures checked
the outer err variable (always nil at that point) instead of the
local erro return value, silently swallowing all stderr pipe errors.
Disk I/O and permission errors were swallowed, leaving users with a
blank readme and no indication of failure. Return ErrorMsg on non-nil
errors so the TUI surfaces them correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ExitCode()==-1 only signals a killed process on Unix; on Windows,
TerminateProcess sets exit code 1. Gate only on ctx.Err()==Canceled
which is true on both platforms when we killed the process.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cmd.Wait() never returns os.ErrNotExist; the ErrInvalidRepo translation
at Wait-time was unreachable dead code copied from the cmd.Start() guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(ui): stop loading spinner when readme tab opens on empty repo
fix(ui): pressing copy key no longer duplicates items in filtered list
fix(git): treat signal-killed git processes as client disconnects
feat(backend): find README in docs/, .github/, .gitlab/ subdirectories
#16)

* fix(ssh): user create correctly handles space-separated public keys

SSH exec tokenizes commands on spaces, splitting "ssh-ed25519 AAAA..."
into separate args. Reconstruct the full key from the -k flag value and
any trailing positional args. Also handles the positional-only path
(no -k flag) and rejects an empty -k "" value.

Closes charmbracelet#750
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): replace hardcoded key-type allowlist with ParseAuthorizedKey probe

The validKeyPrefixes slice (ssh-, ecdsa-, sk-) would silently reject
future key types (xmss-, post-quantum, etc.). Replace the prefix check
with a full ParseAuthorizedKey probe on the reconstructed key — the
parser is the authoritative validator and handles all current and future
key types without a maintained allowlist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): show full reconstructed key in unexpected-argument error

The error message for an unrecognised positional argument cited only
args[1] (the key-type token) even when multiple extra tokens were
passed. Show the full joined string so the user sees exactly what was
rejected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(ssh): remove redundant ParseAuthorizedKey probe in user create

The probe validated the same joined string that ParseAuthorizedKey at
line 72 would parse immediately after, making it a redundant double
parse with a misleading error frame ("unexpected argument") for what is
really a parse failure. Remove the probe; the final parse provides an
accurate error message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…17)

* fix(git): prevent zombie processes when git commands are context-cancelled

Without cmd.WaitDelay, a context-cancelled exec.CommandContext sends SIGKILL
but the process entry lingers until Wait() is called. Setting WaitDelay ensures
the OS process is reaped even if the stdin pipe goroutine is still running.

Closes charmbracelet#797

* fix(git): clarify WaitDelay semantics — post-termination only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(git): document that ctx carries no deadline in gitServiceHandler

SSH and git-daemon timeouts operate at the net.Conn layer via
SetDeadline, not via context.WithDeadline. The context passed here is
cancellation-only, so WaitDelay cannot interfere with legitimate large
pushes or clones.

* fix(git): suppress pipe-drain error logs on context cancellation

When the git process is killed after context cancellation, the stdout,
stderr and stdin io.Copy goroutines return broken-pipe errors. These
were logged at Error level, producing spurious noise on every client
disconnect. Guard each log with ctx.Err() == nil so errors are only
reported for genuinely unexpected I/O failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(git): suppress ErrWaitDelay unconditionally when no ExitError present

When git exits cleanly (code 0) but pipe goroutines stall for longer
than WaitDelay (30s), cmd.Wait() returns exec.ErrWaitDelay alone with
no ExitError. The previous guard required ctx.Err()==Canceled, so a
slow-reading client on a non-cancelled context would receive a spurious
error even though the git command succeeded.

Remove the context check: this branch is only reached when there is no
ExitError component, meaning git exited 0. ErrWaitDelay is an internal
drain timeout, not a caller-visible error in that case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(git): strip ErrWaitDelay noise from non-zero-exit errors

When git exits non-zero and WaitDelay also fires, cmd.Wait returns
errors.Join(exitErr, exec.ErrWaitDelay). The caller was receiving that
joined error string, which includes "exec: WaitDelay expired" noise
alongside the real exit-status error.

Return exitErr directly (with stderr appended if present) so callers
see a clean exit-status error, not an internal drain-timeout detail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(git): downgrade stdin goroutine broken-pipe log to debug

When git exits cleanly before the client finishes sending, cmd.Wait()
closes the stdin pipe write-end, causing the stdin io.Copy goroutine to
return a broken-pipe error. Since ctx is not cancelled at this point,
the ctx.Err() guard did not suppress the log, producing spurious
Error-level noise on normal push/clone completions.

Downgrade to Debug: a stdin broken-pipe after git exits is expected
operational behaviour, not an error worth alerting on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ssh): create host key directory before generating key

When ssh.key_path points to a directory that does not yet exist, the
keygen.New call inside wish.WithHostKeyPath fails silently. wish then
falls back to a new ephemeral in-memory host key, causing the server's
SSH identity to change on every restart and triggering host-key-changed
errors for clients.

Add os.MkdirAll for the key file's parent directory before building the
SSH server options. This mirrors the implicit assumption that the
default ssh/ directory exists, and makes any custom key_path work
correctly on first start.

Closes charmbracelet#779
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): skip MkdirAll when KeyPath has no directory component

filepath.Dir on a bare filename returns "." which would silently make
MkdirAll a no-op targeting the process CWD. Guard with dir != "." to
skip the call in that case and avoid confusing failure modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…mbracelet#645) (#19)

* fix(server): surface bind errors immediately by pre-binding listeners

Previously, Start() launched all servers as goroutines and called
ListenAndServe() inside each one. If a server failed to bind (EACCES
on a privileged port, address already in use, etc.) the goroutine
returned an error to the errgroup — but the errgroup's derived context
was discarded (`errg, _ := errgroup.WithContext`), so the remaining
goroutines kept running and errg.Wait() blocked forever. The error
was never propagated to the caller and the process appeared healthy.

Fix: pre-bind all net.Listeners synchronously before starting any
goroutines. If any net.Listen call fails, all already-bound listeners
are closed (via deferred cleanup) and the error is returned immediately.
Once all binds succeed, goroutines call Serve(ln) with the pre-bound
listener.

Add Serve(net.Listener) to HTTPServer and StatsServer to support this
pattern (SSHServer and GitDaemon already had Serve methods).

Closes charmbracelet#645

* fix(server): shut down all servers when any goroutine fails unexpectedly

errgroup.WithContext was called but its derived context was discarded
(`errg, _ := ...`). When a server goroutine fails at runtime, the
errgroup cancels its derived context but nothing observes it, so the
other goroutines keep running and errg.Wait() blocks forever.

Add a monitor goroutine that waits for the errgroup context to be
cancelled, then calls s.Shutdown() to stop all remaining servers so
errg.Wait() unblocks and the caller sees the original error. When the
parent context (s.ctx) is already done (e.g. SIGTERM), the signal
handler in serve.go is already driving shutdown, so we do nothing to
avoid calling Shutdown twice.

* fix(server): move shutdown monitor outside errgroup to avoid blocking Wait()

The previous commit added a monitor goroutine *inside* the errgroup to
trigger shutdown when a server goroutine fails. This was wrong: during
normal SIGTERM shutdown all server goroutines return nil (ErrServerClosed
is filtered), so the errgroup's derived context is never cancelled by an
error. The monitor goroutine then blocked on <-gctx.Done() forever and
errg.Wait() never returned.

Move the monitor to a bare goroutine outside the errgroup so it does not
participate in errg.Wait(). errgroup.Wait() cancels gctx on return, so
the goroutine always unblocks eventually — no goroutine leak.

Normal SIGTERM path: serve.go's signal handler calls s.Shutdown(), all
server goroutines exit with ErrServerClosed (filtered → nil), errg.Wait()
returns nil immediately, gctx fires, monitor sees s.ctx is cancelled,
returns without calling Shutdown again.

Unexpected failure path: failing goroutine cancels gctx, monitor fires,
s.ctx is not yet cancelled, monitor calls s.Shutdown() to stop remaining
servers so errg.Wait() unblocks and the caller sees the error.

* fix(server): use context.Cause to guard monitor goroutine

The previous guard (s.ctx.Err() != nil) was always false on the SIGTERM
path because serve.go's signal handler calls s.Shutdown() directly
without cancelling s.ctx. This meant the monitor goroutine always called
Shutdown a second time, racing with the signal handler's own call.

errgroup.WithContext uses context.WithCancelCause internally:
- When a goroutine returns a non-nil error, it calls cancel(err)
  → context.Cause(gctx) == err (non-nil) → server crashed, Shutdown needed
- When Wait() returns normally or the parent is cancelled, it calls
  cancel(nil) → context.Cause(gctx) == nil → skip, avoid double-Shutdown

Closes charmbracelet#645

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(server): address adversarial review findings

- server.go: add shutdownOnce to prevent double-Shutdown when SIGTERM
  and a goroutine failure race; fix comment to accurately describe
  errgroup cancel semantics (cancel(g.err), not always cancel(nil))
- http.go: guard ServeTLS against TLSConfig without a certificate
  source (GetCertificate or Certificates); add errors import
- stats.go: document that the stats endpoint intentionally omits TLS

Closes charmbracelet#645

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(server): make Shutdown idempotent via shutdownOnce

Move shutdownOnce protection into Shutdown() itself so every call
path — monitor goroutine and serve.go's SIGTERM handler — is covered
by the same Once guard. The previous approach only wrapped the monitor
path, leaving the SIGTERM path unguarded.

Also fix TLS guard in HTTPServer.Serve: add GetConfigForClient to the
certificate-source check so SNI-routing configs are not rejected.

Closes charmbracelet#645

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(server): propagate shutdown error and guard ListenAndServe TLS

- Store shutdownErr on Server so all concurrent Shutdown callers
  (both monitor goroutine and SIGTERM handler) return the real error;
  process exits with non-zero when shutdown fails after a crash
- Add errgroup version note to context.Cause usage comment
- Apply TLS certificate source guard to ListenAndServe (was only in Serve)

Closes charmbracelet#645

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…armbracelet#456) (#20)

* feat(web): add GET /{repo}/raw/{ref}/{filepath} endpoint

Exposes raw file content over HTTP, closing the parity gap with the
SSH `repo blob` command.

- Route: GET /{repo}/raw/{ref}/{filepath:.*}
- Requires at least ReadOnlyAccess (respects public/private repo settings)
- Content-Type inferred from file extension; falls back to text/plain
  for text files and application/octet-stream for binaries
- Accept: application/octet-stream triggers Content-Disposition: attachment
  so browsers download the file instead of rendering it
- Returns 404 for unknown repo/ref/path and for paths that resolve to
  a directory (tree) rather than a file (blob)

Closes charmbracelet#456

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(web): address security review findings on raw blob endpoint

- Remove redundant access check from handler; withAccess catch-all
  already enforces ReadOnlyAccess (404 on insufficient access) for all
  unrecognised service paths
- Sanitize Content-Type: downgrade text/html, text/javascript, SVG,
  XML and XHTMLvariants to text/plain to prevent stored-XSS
- Add X-Content-Type-Options: nosniff on every response
- Guard against OOM/DoS: blobs > 32 MiB return 413 instead of being
  fully buffered into memory
- Quote and escape Content-Disposition filename per RFC 6266 §4.3
- Detect binary using already-loaded bytes (gitb.IsBinary) instead of
  re-running git cat-file via te.File().IsBinary()
- Add Cache-Control: no-store to prevent proxy caching of mutable refs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(web): strengthen blob endpoint security — allowlist MIME and belt-and-suspenders size check

- Switch sanitizeMIME from a blocklist to an allowlist: only
  text/plain, application/octet-stream, application/json, image/*
  (excluding SVG), audio/*, and video/* pass through; everything else
  (HTML, CSS, JavaScript, SVG, XML, PDF, fonts, multipart) is
  downgraded to text/plain
- Add post-load size check after te.Contents() as a belt-and-suspenders
  guard for the case where te.Size() silently returns 0 on git error
- Fix Content-Length to use FormatInt(int64) instead of Itoa(int)
- Document intentional fall-through in withAccess switch for routes
  that carry no service var (raw blob, go-get)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(web): strip control chars from Content-Disposition filename; add safety comment

- Strip ASCII control chars (0x00-0x1F, 0x7F) from blob filename before
  inserting into Content-Disposition to prevent response-header injection
  via crafted filenames (e.g. CRLF sequences)
- Add comment noting sanitizeMIME has already run before the Accept-override
  block so future edits know they must maintain that invariant

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ge repos (#21)

* fix(mirror): disable gc.auto and remove timeout to prevent CPU peg

git fetch --prune and git remote update --prune both trigger gc --auto
by default. On large repos this spawns git rev-list --objects --all
which pegs a CPU for the entire mirror-sync window.

Fix by prepending -c gc.auto=0 to all git commands in the mirror job.

Also add .WithTimeout(-1) so large-repo syncs are not killed by the
1-minute DefaultTimeout — matching the same flag already used by
ImportRepository for the initial clone.

Also break out of the command loop on first error so git remote update
is not attempted when git fetch already failed.

Closes charmbracelet#777

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mirror): skip LFS sync when git sync fails

When git fetch --prune fails, break out of the command loop and return
early to avoid running LFS sync against a repo whose refs may be
out of date. LFS sync on stale refs pulls wrong objects or fails
in confusing ways.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mirror): fix misleading error log and hoist SSH env outside loop

- Log message now includes the specific command that failed instead of
  hardcoding "git remote update" for all failures including fetch
- Build GIT_SSH_COMMAND string once outside the inner loop; identical
  value was rebuilt on every iteration unnecessarily
- Extend WithTimeout(-1) comment to cite the library guard condition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ssh): validate access tokens in keyboard-interactive auth

KeyboardInteractiveHandler was ignoring the challenge entirely
(discarding it with _) and only checking AllowKeyless. This meant:
- tokens were never validated — any string was treated as no-token
- token-authenticated sessions had no user identity, so
  collaborator-based repo access control was silently bypassed

Fix:
- Prompt for an access token via the keyboard-interactive challenge
- Validate the token with UserByAccessToken; on success store the
  user ID in perms.Extensions[tokenUserExtKey] and return true
  regardless of AllowKeyless
- Fall back to AllowKeyless behavior when no valid token is provided
- Update AuthenticationMiddleware to resolve the user by stored token
  user ID when no public key is present, so collaborator-based
  filtering works for token-authenticated sessions

Also move ctx.SetValue into initializePermissions to ensure the
permissions struct is always persisted after initialization.

Closes charmbracelet#800

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): prevent certificate extension injection in token auth

The previous fix stored the token-authenticated user ID in
perms.Extensions[tokenUserExtKey] (a string key). gossh merges
SSH certificate extensions into the same map during public-key auth,
so a client presenting a crafted certificate with a matching extension
key could impersonate any user without a valid token.

Fix: replace the string-keyed perms.Extensions carrier with a
package-private context key type (tokenAuthUserIDKey{}). A struct type
key can only be set from within this package, making injection
impossible regardless of certificate contents.

Also: log challenge errors at debug level instead of silently treating
them as "no token", and log UserByID failures in the middleware rather
than silently downgrading to anonymous.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): explicit SetValue after pubkey-fp writes; log pubkey lookup error

- Re-add ctx.SetValue(ssh.ContextKeyPermissions, perms) after every
  perms.Extensions mutation in PublicKeyHandler and
  KeyboardInteractiveHandler to make the write explicit rather than
  relying on the implicit pointer identity guarantee from the wish
  context implementation
- Add comment explaining why pubkey-fp is cleared on successful token auth
- Log UserByPublicKey failures at debug level instead of discarding them
  silently with _

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lfs): encode expires_in as seconds, not nanoseconds

time.Duration serialises as nanoseconds in JSON. The git-lfs spec
defines expires_in as seconds, and git-lfs clients expect a small
integer (e.g. 300 for 5 minutes). On 32-bit platforms (i686) the
nanosecond value 300000000000 overflows int32 and causes:

  json: cannot unmarshal number 300000000000 into Go struct field
  sshAuthResponse.expires_in of type int

Fix:
- AuthenticateResponse.ExpiresIn: change time.Duration → int64 and
  encode with int64(d / time.Second) in lfs_auth.go
- Link.ExpiresIn: change *time.Duration → *int64 (field is currently
  unused by the server but must match the spec for future callers)

Closes charmbracelet#781

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lfs): strengthen Link.ExpiresIn doc and remove stale comment

- Explicitly document that Link.ExpiresIn must be whole seconds and
  warn callers against passing raw time.Duration values
- Remove "expire in an hour" comment from lfs_auth.go that was wrong
  (the token actually expires in 5 minutes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#24)

* fix(web): return io.ErrShortWrite and handle EOF-with-data in ReadFrom

Two bugs in flushResponseWriter.ReadFrom:

1. When nRead != nWrite the code returned `err` which was nil at that
   point (the Write succeeded but was short). Fixed to io.ErrShortWrite,
   matching the behaviour described in io.Copy.

2. The loop broke on io.EOF before writing the bytes returned in the
   same Read call. The io.Reader contract allows returning n > 0 and
   io.EOF simultaneously; breaking before the Write dropped the last
   chunk of data silently. Fixed by writing nRead bytes first, then
   acting on the read error.

Closes charmbracelet#616

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(web): count actual written bytes and add ReadFrom tests

- Count n by nWrite (actual bytes written) not nRead; the old code
  over-reported n when a short write occurred
- Use nWrite < nRead (not !=) to match the io.Copy convention
- Add pkg/web/git_test.go with three regression tests:
  EOF-with-data propagation, io.ErrShortWrite on short write,
  non-EOF read error propagation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(web): fix flush error message format (label before sentinel)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#25)

* fix(cmd): user create -k accepts multi-word key from unquoted SSH args

OpenSSH strips shell quoting before sending commands over the wire.
Running:
  ssh host user create alice -k 'ssh-ed25519 AAAA...'
transmits as separate tokens: [-k, ssh-ed25519, AAAA...]
Cobra assigns only 'ssh-ed25519' to the -k flag and treats 'AAAA...'
as a second positional arg, triggering "accepts 1 arg(s), received 2".

Fix: change Args from ExactArgs(1) to MinimumNArgs(1) and, when -k
was explicitly provided, re-join the flag value with any trailing
positional args to reconstruct the full authorized-key string.
This matches the existing behaviour of the add-pubkey command which
uses strings.Join(args[1:], " ") for the same reason.

Extra positional args without -k are still rejected.

Closes charmbracelet#750

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cmd): handle empty -k flag, better error messages, add tests

- Explicit error when -k is provided with an empty value instead of
  a confusing ParseAuthorizedKey failure
- Use switch statement for cleaner flag-handling logic
- Format unexpected-args error with strings.Join instead of %v
  (avoids bracket notation in the message)
- Update Use string to reflect that key args are accepted
- Add TestUserCreateKeyReconstruction covering: full key in flag,
  split key across flag + positional args, key type only (error),
  empty value (error), invalid base64 (error)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cmd): add Long description, document empty-key test limitation

- Replace misleading Use string with a Long description that explains
  the SSH quoting stripping behaviour with examples
- Document in the test file that the Changed("key") && key=="" guard
  requires an integration test (testscript) because PersistentPreRunE
  needs a full backend context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…connect (#26)

* fix(ssh): downgrade spurious signal-killed log to debug on client disconnect

When a git client (e.g. Flux Source controller) disconnects early during
git-upload-pack — which is expected behaviour: the client closes the TCP
connection once it has collected the ref advertisements it needs — Go's
exec.CommandContext kills the git subprocess with SIGKILL, causing
cmd.Wait() to return "signal: killed". That error propagated to the
upload-pack handler in pkg/ssh/cmd/git.go and was logged unconditionally
at ERROR level, flooding operator dashboards with false alarms.

Fix: check ctx.Err() before choosing the log level. A non-nil ctx.Err()
means the session context was already cancelled when the error occurred,
so the kill was triggered by a client disconnect, not a server fault.
Log those cases at DEBUG. Genuine server-side failures (ctx still alive)
continue to log at ERROR.

Closes charmbracelet#807

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ssh): use context.Canceled check; include err in debug log

Address two issues from code review:
- ctx.Err() != nil also matches context.DeadlineExceeded (server-side
  timeout), which is a real operational signal operators need to see.
  Switch to ctx.Err() == context.Canceled so only genuine client-
  disconnect cases are silenced to DEBUG.
- The original err value was missing from the debug log, making it
  impossible to distinguish a normal "signal: killed" disconnect from
  an unexpected error that happened to coincide with cancellation.
  Add "err", err to the debug call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Merge all accumulated fixes and features from dev into main. Conflict
resolution strategy:
- server.go: kept main's complete version (shutdownOnce, monitor goroutine,
  listener cleanup on partial failure, TLS validation guard)
- ssh.go: kept main's tokenAuthUserIDKey{} secure approach
- middleware.go: combined both — main's secure tokenAuthUserIDKey{} context
  key for user resolution, plus dev's correct gate check (!isTokenAuth) to
  allow token-auth sessions when AllowKeyless is false. Also adds
  wish.Fatalln denial on token-auth user lookup failure (from dev)
- cmd/user.go: kept main's cleaner key reconstruction logic
- web/git.go: kept main's version (raw blob endpoint, correct EOF-with-data
  fix with n+=nWrite and ErrShortWrite sentinel)
- web/http.go: kept main's version (TLS validation guard in Serve and
  ListenAndServe)
- stats.go: removed duplicate Serve method left by auto-merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…race

SSH LFS transfer: remove unnecessary storage.Open() call before Rename.
The file handle returned by Open() remained open while storage.Rename()
was called, causing ERROR_SHARING_VIOLATION on Windows and preventing
the temp file from being moved to its final location.

HTTP LFS transfer: treat ErrDuplicateKey from CreateLFSObject as
idempotent success. git-lfs sends parallel PUT requests for large pushes;
the first request wins the DB insert and the rest hit a duplicate key
constraint. Previously this returned 500; now returns 200 so the client
does not retry unnecessarily.

Fixes charmbracelet#770

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dvrd dvrd requested a review from aymanbagabas as a code owner March 28, 2026 17:51
@dvrd
Copy link
Copy Markdown
Author

dvrd commented Mar 28, 2026

Note: the SSH change in this PR overlaps with PR #782 (which uses storage.Exists instead of removing the Open call). This PR's unique contribution is the HTTP concurrent-upload fix in pkg/web/git_lfs.go (treating ErrDuplicateKey from CreateLFSObject as idempotent success). Happy to rebase to drop the SSH portion if #782 lands first.

@dvrd
Copy link
Copy Markdown
Author

dvrd commented Mar 28, 2026

Closing — PRs are being redirected to the fork.

@dvrd dvrd closed this Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant