Symptom
When a Team CR references an existing standard Worker CR via spec.workers[].name, TeamReconciler initially computes the correct channels.matrix.groupAllowFrom = [leader, admin, sibling-workers] and writes it to MinIO (agents/<worker>/openclaw.json) via DeployWorkerConfig.
But any subsequent WorkerReconciler reconcile (informer event / periodic resync / status patch trigger) regenerates openclaw.json and overwrites it back to the standalone default [manager, admin], because workerMemberContext only reads the hiclaw.io/team-leader annotation to determine team membership, and a shared external Worker CR does not carry that annotation.
The specialist's final groupAllowFrom on MinIO:
[
  "@manager:<homeserver>",
  "@admin:<homeserver>"
]
(In the real scenario the user MXIDs injected by the accessibleTeams Human reconciler via ChannelPolicy.GroupAllowExtra are layered on top, but the leader and siblings are still missing.)
Minimal reproduction
# 1. apply a standard Worker CR (no hiclaw.io/team-leader annotation)
cat <<YAML | hiclaw apply -f -
apiVersion: hiclaw.io/v1beta1
kind: Worker
metadata:
  name: worker-pop
spec:
  model: qwen3.5-plus
  runtime: copaw
YAML
# 2. wait until the worker is provisioned, then dump the MinIO config ([manager, admin], as expected)
docker exec hiclaw-controller mc cat hiclaw/hiclaw-storage/agents/worker-pop/openclaw.json | jq .channels.matrix.groupAllowFrom
# 3. apply a Team CR that references the worker above
cat <<YAML | hiclaw apply -f -
apiVersion: hiclaw.io/v1beta1
kind: Team
metadata:
  name: t1
spec:
  description: minimum repro
  leader:
    name: t1-lead
    model: qwen3.5-plus
    soul: ...
  agents: ...
  workers:
    - {name: worker-pop}
YAML
# 4. dump immediately after the Team goes active (to catch the version written by TeamReconciler)
# → this step sometimes shows [t1-lead, admin, ...] (race win), sometimes it has already been overwritten
docker exec hiclaw-controller mc cat hiclaw/hiclaw-storage/agents/worker-pop/openclaw.json | jq .channels.matrix.groupAllowFrom
# 5. trigger any worker reconcile (restart the container / patch status / wait for a periodic resync)
docker restart hiclaw-worker-worker-pop
# 6. dump again (written back to standalone by WorkerReconciler)
docker exec hiclaw-controller mc cat hiclaw/hiclaw-storage/agents/worker-pop/openclaw.json | jq .channels.matrix.groupAllowFrom
# → [
# "@manager:<homeserver>",
# "@admin:<homeserver>"
# ]
Impact
After the Leader's Matrix REST PUT /rooms/<team-room>/send/m.room.message/<txn> dispatch (composed the way AGENTS.md instructs the Leader) lands on the specialist:
- the m.text body contains an @worker-pop:<homeserver> mention
- m.mentions.user_ids contains @worker-pop:<homeserver>
- Matrix server-side delivery succeeds
But the specialist worker's channel filter rejects the event at its first gate, the sender allowlist check: sender @t1-lead:<homeserver> is not in groupAllowFrom, so the event is silently dropped before the requireMention check even runs. The specialist container log shows no trace of the event, no LLM inference happens, and the Team collaboration chain breaks at the leader→specialist hop.
The whole ADR 0010-style Team Room (Leader-led dispatch) is therefore unusable for external consumers.
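The drop happens at a plain membership check. Here is a minimal sketch of that gate (the function name and structure are hypothetical; the real filter implementation is not quoted in this issue):

```go
package main

import "fmt"

// allowedSender is a hypothetical stand-in for the specialist's first
// filter gate: only senders listed in groupAllowFrom pass at all; the
// requireMention check would only run for events that survive this gate.
func allowedSender(sender string, groupAllowFrom []string) bool {
	for _, allowed := range groupAllowFrom {
		if sender == allowed {
			return true
		}
	}
	return false
}

func main() {
	// what WorkerReconciler wrote back (standalone default)
	groupAllowFrom := []string{"@manager:example.org", "@admin:example.org"}

	// the Leader's dispatch is rejected before requireMention ever runs
	fmt.Println(allowedSender("@t1-lead:example.org", groupAllowFrom)) // false → silent drop
	fmt.Println(allowedSender("@admin:example.org", groupAllowFrom))   // true
}
```

Because the rejection precedes any mention handling, the m.mentions.user_ids entry for the worker never matters, which is why the container log stays completely empty.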
Code-level trace
internal/controller/worker_controller.go:264:
TeamLeaderName: w.Annotations["hiclaw.io/team-leader"],
internal/controller/team_controller.go:711-727 + :870-892 teamWorkerSpecToWorkerSpec:
// team-side projection adds leader + admin + peers to ChannelPolicy
policy = appendGroupAllowExtra(policy, t.Spec.Leader.Name)
// ...
for _, peer := range t.Spec.Workers {
    if peer.Name != w.Name {
        policy = appendGroupAllowExtra(policy, peer.Name)
    }
}
internal/agentconfig/generator.go:208-218:
groupAllowFrom := []string{managerMatrixID, adminMatrixID}
if req.TeamLeaderName != "" {
    leaderMatrixID := fmt.Sprintf("@%s:%s", req.TeamLeaderName, domain)
    groupAllowFrom = []string{leaderMatrixID, adminMatrixID}
}
The two reconcilers compute different WorkerConfigRequests from different information sources; since they share the same MinIO write path, whichever writes last determines the specialist's effective allowlist.
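The divergence can be condensed into a self-contained sketch (the struct is stripped to the one relevant field; the domain and IDs are placeholders):

```go
package main

import "fmt"

// workerConfigRequest condenses the quoted generator input down to the
// field that decides the allowlist.
type workerConfigRequest struct {
	// from the hiclaw.io/team-leader annotation; empty for a shared
	// external Worker CR, which is exactly the bug trigger
	TeamLeaderName string
}

// buildGroupAllowFrom mirrors the quoted generator.go:208-218 logic.
func buildGroupAllowFrom(req workerConfigRequest, domain string) []string {
	managerMatrixID := fmt.Sprintf("@manager:%s", domain)
	adminMatrixID := fmt.Sprintf("@admin:%s", domain)
	groupAllowFrom := []string{managerMatrixID, adminMatrixID}
	if req.TeamLeaderName != "" {
		leaderMatrixID := fmt.Sprintf("@%s:%s", req.TeamLeaderName, domain)
		groupAllowFrom = []string{leaderMatrixID, adminMatrixID}
	}
	return groupAllowFrom
}

func main() {
	// TeamReconciler's view: team context carries the leader name
	fmt.Println(buildGroupAllowFrom(workerConfigRequest{TeamLeaderName: "t1-lead"}, "example.org"))
	// WorkerReconciler's view of the same worker: no annotation →
	// standalone default; this request wins the last write to MinIO
	fmt.Println(buildGroupAllowFrom(workerConfigRequest{}, "example.org"))
}
```

Same generator, same write path, two requests: the outcome is pure last-write-wins.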
On my local branch feat/team-worker-groupallow-peers I added a lock-in unit test (TestWorkerReconciler_ExternalTeamWorker_MissesTeamContext) that pins down this asymmetry: PASS means the bug is present. Once a maintainer confirms the direction, I can flip the test into a post-fix parity assertion and submit it together with the PR.
Proposed direction (doctrine questions; needs maintainer alignment first)
My preference is for WorkerReconciler to actively look up Team membership (a field indexer on Team.Spec.Workers[].name) and union the leader/admin/peers of every referencing Team into groupAllowFrom; the hiclaw.io/team-leader annotation would then degrade to an optional hint for the single-team case instead of a hard requirement.
Before that, though, I would like a maintainer's position on:
- Is referencing an existing Worker CR from a Team CR via spec.workers[].name a supported path at all? Or must a Team's workers be defined inline in spec.workers[] (giving TeamReconciler full spec ownership)?
- When the same Worker is referenced by multiple Teams (the "shared worker across teams" use case, e.g. multiple business gateways reusing one worker for the same user), should the allowlist be the union of all teams' leaders/peers, or should teams be isolated (i.e. each team gets its own copy of the worker)?
- Should WorkerReconciler be aware of Team membership at all? Or should TeamReconciler stamp an annotation / owner reference onto the Worker CR while reconciling the Team (leaving WorkerReconciler unchanged, but raising the question of how to encode membership in multiple teams)?
- Feasibility of the owner-reference model: if the Team CR is recorded as an owner reference on the Worker CR, how do multiple referencing Teams work? (K8s ownerReferences allows multiple owners with controller: false, but only one with controller: true.)
Either direction gets Leader dispatches through to the specialist, but they differ in their impact on the K8s ownership model and the upgrade path; I'd like to hear the maintainers' preference before writing the full PR.
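The union I have in mind could be sketched as below. All types and helper names are hypothetical; in the real fix the teams slice would come from a List call backed by a field indexer on Team.Spec.Workers[].name, and whether manager stays in the list is part of the open questions above:

```go
package main

import (
	"fmt"
	"sort"
)

// teamSpec is a minimal stand-in for the Team CR fields the union needs.
type teamSpec struct {
	LeaderName string
	Workers    []string // worker names referenced by this Team
}

// unionGroupAllowFrom computes the allowlist for workerName across every
// referencing Team: admin plus each team's leader and sibling workers.
// Deduplicated and sorted so openclaw.json stays deterministic.
func unionGroupAllowFrom(workerName, domain string, teams []teamSpec) []string {
	seen := map[string]bool{fmt.Sprintf("@admin:%s", domain): true}
	for _, t := range teams {
		seen[fmt.Sprintf("@%s:%s", t.LeaderName, domain)] = true
		for _, peer := range t.Workers {
			if peer != workerName {
				seen[fmt.Sprintf("@%s:%s", peer, domain)] = true
			}
		}
	}
	out := make([]string, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	teams := []teamSpec{
		{LeaderName: "t1-lead", Workers: []string{"worker-pop", "worker-sib"}},
	}
	fmt.Println(unionGroupAllowFrom("worker-pop", "example.org", teams))
	// → [@admin:example.org @t1-lead:example.org @worker-sib:example.org]
}
```

With zero referencing Teams this degrades to the admin-only base, so the standalone path keeps working without the annotation.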
Environment
- dev / embedded: self-build of origin/main = e21ac83, plus PR #796 ("feat(controller): add PUT /api/v1/humans/{name} route — fixes #729") — unrelated to this issue; the PUT route is not exercised
- worker image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/hiclaw-copaw-worker:v1.1.1
- network: hiclaw-net, container DNS aliases correctly attached via --network-alias