@nitishgargiitd nitishgargiitd commented Nov 1, 2025

Agent name: CellCog Agent
Agent org: CellCog AI
A link to information about the agent: https://cellcog.ai
Model name: Claude Sonnet 4.5
Model org: Anthropic

Terminal Bench was an excellent way to test how the CellCog agent performs in the terminal. Thanks for creating it!

A few notes:

  1. I added a capability for CellCog agents to restart the tmux session:
import time

# Step 1: End current asciinema recording gracefully
if tmux_session._recording_path:
    tmux_session.send_keys(["C-d"], block=False, min_timeout_sec=0.5)  # End asciinema stdin mode
    time.sleep(0.5)

# Step 2: Kill tmux session completely
result = tmux_session._exec_run(["tmux", "kill-session", "-t", tmux_session._session_name])

if result.exit_code != 0:
    log.warning(f"kill-session returned {result.exit_code}")

time.sleep(0.5)

# Step 3: Verify session is dead, force kill if needed
if tmux_session.is_session_alive():
    log.warning("Session still alive after kill-session, force killing...")
    tmux_session._exec_run(["tmux", "kill-server"])
    time.sleep(1.0)

# Step 4: Reset internal state
tmux_session._previous_buffer = None

# Step 5: Create fresh tmux session
tmux_session._exec_run(tmux_session._tmux_start_session)

# Step 6: Restart asciinema with APPEND
if tmux_session._recording_path:
    log.warning(f"Restarting asciinema with --append to {tmux_session._recording_path}")

    # CRITICAL: Use --stdin --append to continue recording
    tmux_session.send_keys(
        keys=[
            f"asciinema rec --stdin --append {tmux_session._recording_path}",
            "Enter",
        ],
        block=False,
        min_timeout_sec=3.0,
    )

    # Clear screen
    tmux_session.send_keys(
        keys=["clear", "Enter"],
        block=False,
        min_timeout_sec=0.5,
    )

# Step 7: Verify session is responsive
time.sleep(1.5)

if not tmux_session.is_session_alive():
    raise RuntimeError("Failed to restart tmux session")

if not self._is_session_responsive(tmux_session):
    log.warning("Session restarted but not responsive, attempting recovery...")
    self._cancel_hung_command(tmux_session)
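
The excerpt above calls `_is_session_responsive` without showing it. One way such a check could work is sketched below; this is only an assumption about the approach, not the agent's actual implementation. The function name, the marker format, and the two injected callables (`send_keys`, `capture_pane`) are hypothetical and stand in for whatever the tmux session object provides.

```python
import time
import uuid
from typing import Callable, List


def is_session_responsive(
    send_keys: Callable[[List[str]], None],
    capture_pane: Callable[[], str],
    timeout_sec: float = 2.0,
    poll_interval_sec: float = 0.1,
) -> bool:
    """Probe a terminal by echoing a unique marker and waiting for the output.

    send_keys and capture_pane are hypothetical hooks: the first types keys
    into the session, the second returns the visible pane text.
    """
    marker = f"__probe_{uuid.uuid4().hex[:8]}__"
    send_keys([f"echo {marker}", "Enter"])

    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        # The marker shows up once in the typed command line. A second
        # occurrence means the shell actually ran the echo.
        if capture_pane().count(marker) >= 2:
            return True
        time.sleep(poll_interval_sec)
    return False
```

Requiring two occurrences of the marker distinguishes a shell that executed the command from a hung foreground process where the terminal merely echoes the typed keys.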
  2. We have a set of file-management tools (Terminal_File_Get/Update/Create/...):
import os
import tempfile
from pathlib import Path

container, tmux_session = _get_container_and_session(post_office)

# Check if file exists
result = container.exec_run(["test", "-f", full_file_path])
if result.exit_code != 0:
    raise FileNotFoundError(f"File not found in container: {full_file_path}")

# Create temp file with new content
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".tmp") as tmp:
    temp_path = tmp.name
    tmp.write(file_content)

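# Copy to container (overwrites existing), then remove the local temp file
parent_dir = os.path.dirname(full_file_path)
try:
    tmux_session.copy_to_container(
        paths=[Path(temp_path)],
        container_dir=parent_dir if parent_dir else "/",
        container_filename=os.path.basename(full_file_path),
    )
finally:
    os.unlink(temp_path)  # delete=False above, so clean up the temp file ourselves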

# Verify file still exists after update
result = container.exec_run(["test", "-f", full_file_path])
if result.exit_code != 0:
    raise RuntimeError(f"File update verification failed: {full_file_path}")

log.debug(f"Updated file in container: {full_file_path}")
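
The verification step above only confirms that the file exists after the copy. A stricter check, sketched here as an assumption rather than as part of the submitted agent, would compare a SHA-256 digest of the intended content with the output of `sha256sum` run inside the container. The helper names are hypothetical, and the sketch assumes the content is written as UTF-8 without newline translation.

```python
import hashlib


def expected_sha256(file_content: str, encoding: str = "utf-8") -> str:
    """Digest of the content we intended to write (assumes UTF-8 bytes on disk)."""
    return hashlib.sha256(file_content.encode(encoding)).hexdigest()


def digest_matches(sha256sum_output: bytes, file_content: str) -> bool:
    """Compare raw `sha256sum <path>` output against the intended content.

    sha256sum prints "<hex digest>  <path>", so the first field is the digest.
    """
    fields = sha256sum_output.decode("utf-8", errors="replace").split()
    return bool(fields) and fields[0] == expected_sha256(file_content)
```

If `container` is a docker-py container (an assumption), the check could be wired in as `result = container.exec_run(["sha256sum", full_file_path])` followed by `digest_matches(result.output, file_content)`, raising an error when it returns `False`.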

@gemini-code-assist
Note

The number of changes in this pull request is too large for Gemini Code Assist to generate a summary.

@TheMikeMerrill
Collaborator

Hey there,

We don't allow submissions to modify how the harness is run, only the agent. In theory, an agent managing its own tmux session would be fine, so this might be acceptable, but I'd want to look at your code to be sure. Could you share a repo with me?
