fix: collation repair for space_member/user tables (#482)#489
Conversation
The original compat-repair migration (20260512) converted user_verification to utf8mb4_general_ci but missed space_member and user. On deployments where the DB default is utf8mb4_0900_ai_ci, the queryMembers LEFT JOIN on user_verification triggers Error 1267 (Illegal mix of collations), causing the member list endpoint to return 500. Two-part fix: 1. New migration adds space_member and user to the collation repair (root cause — eliminates the collation asymmetry for all JOINs) 2. Explicit COLLATE utf8mb4_general_ci on the uv JOIN condition as defense-in-depth (survives until the migration runs) Fixes #482
Jerry-Xin
left a comment
There was a problem hiding this comment.
APPROVE — collation repair correctly makes the joined columns match on both sides; idempotent, ordering-safe, no FK/uniqueness/data-loss risk.
This PR addresses the MySQL 8 collation drift behind #482 / #420. On deployments where the DB default is utf8mb4_0900_ai_ci, space_member (created without an explicit COLLATE) inherited that default, while user_verification is explicitly utf8mb4_general_ci, so queryMembers' LEFT JOIN user_verification raised Error 1267 (Illegal mix of collations).
Core correctness (verified at head 051815722a160a3ef722ba274026f8f1b0e280b9):
user_verificationDDL (modules/user/sql/20260505000003_user_legacy01.sql) is explicitlyCOLLATE=utf8mb4_general_ci.- The migration converts
space_memberanduser→utf8mb4_general_ci, i.e. it aligns the previously-drifted side to the join counterpart. Both sides of thesm.uid = uv.user_idcomparison now resolve togeneral_ci. ✅ The repair makes the joined columns match, not just change one side. space_membersource DDL has no explicitCOLLATE(confirms the inheritance bug);usersource DDL is alreadygeneral_ci, so theuserALTER is a no-op on standard installs and only acts on drifted ones.
🔴 Blocking — None.
💬 Non-blocking
🟡 modules/space/db.go:161 — The defensive COLLATE utf8mb4_general_ci pin only covers the uv join. The adjacent LEFT JOIN user u ON u.uid=sm.uid is not pinned, so in the narrow window of "new binary, migration not yet applied, drifted schema" the user join could still hit Error 1267 before the migration repairs space_member. Non-blocking: the new migration fixes both tables and runs at normal startup; this only affects the migrations-disabled edge case.
🔵 modules/base/sql/20260627000001_base_collation_repair_space_user.sql — Consider a lightweight migration-content guard test (as the repo does elsewhere) pinning that both space_member and user stay in the repair list.
✅ Highlights
- Idempotent: each
ALTERis gated by anINFORMATION_SCHEMA.TABLEScollation check inside a stored procedure, mirroring the established20260512000001_base_oss_compat_repair.sqlpattern; safe to re-run (sql-migrate also tracks it once ingorp_migrations). - Version
20260627000001is unique and lexicographically ordered after20260614, so it runs after the legacyuser/space_member/user_verificationcreation migrations (global timestamp ordering). - No FK constraints exist on
space_member/user, so no FK-collation ALTER failure.CONVERT TO CHARACTER SET utf8mb4→utf8mb4is metadata-only at row level;general_cifolds no more aggressively than0900_ai_cion the ASCII UID keys, so no duplicate-key collision on theUNIQUE(space_id,uid)index.
ALTER TABLE ... CONVERT TO CHARACTER SET still triggers a COPY rebuild + table-level metadata lock; on a very large space_member/user table, apply off-hours. Not a code blocker.
OctoBoooot
left a comment
There was a problem hiding this comment.
Review: fix: collation repair for space_member/user tables (#489)
Verdict: Needs author input — one load-bearing question (symmetric collation risk on the robot JOIN) that, if unaddressed, turns this fix into the inverse of the bug it fixes. PR is a draft, so this is feedback-while-in-progress, not a merge gate. CI green.
Risk tier: schema migration touching the user hot table. Fix shape (idempotent stored-proc + defensive COLLATE) is sound; the open question is whether the space_member collation flip creates a new mismatch on a sibling JOIN.
Headline question — does flipping space_member break the robot JOIN? ([bidirectional-edge-construction])
The bug: space_member (inherits DB default, possibly utf8mb4_0900_ai_ci) JOINed to user_verification (forced general_ci by 20260512) → Error 1267. The fix flips space_member → general_ci.
But queryMembers (modules/space/db.go:158-164) JOINs space_member to three tables:
LEFT JOIN user u ON u.uid=sm.uid -- user: fixed by this migration ✓
LEFT JOIN user_verification uv ON uv.user_id=sm.uid COLLATE general_ci -- defensive COLLATE added ✓
LEFT JOIN robot r ON r.robot_id=sm.uid -- robot: NOT pinned, NOT in repair list ⚠The robot table is NOT in 20260512's repaired list (that list has robot_apply, not robot) and is NOT converted by this migration. If robot currently inherits 0900_ai_ci (same default that caused the original bug), then:
- Before this PR:
space_member(0900) JOINrobot(0900) = same collation = OK. - After this PR:
space_member(general_ci) JOINrobot(0900) = NEW Error 1267 on therobotJOIN.
So the fix could move the 1267 from the uv JOIN to the robot JOIN — the symmetric bug. The defensive COLLATE was added only to uv, not robot. I couldn't byte-confirm robot's collation from its legacy DDL (modules/robot/sql/*_robot_legacy01.sql — no explicit COLLATE found, which means it inherits the DB default = the danger case).
Please confirm: on a 0900_ai_ci deployment, after this migration, does space_member ⨝ robot still match? If robot isn't general_ci, either (a) add robot to this migration's convert list, or (b) add the same defensive COLLATE utf8mb4_general_ci to the robot JOIN. The whole-table convert of space_member changes its collation relative to every table it joins, not just uv — so the fix needs to be verified against all three JOIN partners, not just the one that reported the error.
Verified — the parts that are right
- ✅ Idempotent stored-proc migration follows the established 20260512 pattern (skip-if-already-target via
TABLE_COLLATIONcheck,DROP PROCEDUREbefore+after). Won't double-convert on re-run. - ✅ Defensive COLLATE on the uv JOIN (
db.go:162) makes the fix work before the migration runs — good "works during the rollout window" thinking. Comment names the issue (#482) and the rationale. - ✅ Description-vs-diff ([description-vs-diff-crosscheck]): the body claims 20260512 "missed space_member and user" — verified, the 20260512 list (lines 50-66) does NOT contain
space_memberoruser(it hasuser_verification,user_api_key,user_pinned_channel, etc. but not bareuser/space_member). Claim accurate. - ✅
Fixes #482links the P0 regression; the diff plausibly addresses it (collation asymmetry on the queryMembers JOIN).
Major — user is a hot table; CONVERT TO locks it
ALTER TABLE user CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci rewrites every row and holds a metadata lock for the duration — on a large user table this can be a multi-second-to-minutes write-blocking operation during migration. Per skill §3 (migration safety), this warrants either:
- A note in the PR body on expected
usertable size + lock duration in the target deployment, OR - Confirmation it runs in a maintenance window.
CONVERT TO CHARACTER SETis NOT an INSTANT operation (unlike an end-of-table ADD COLUMN) — it's a full table rebuild. Forspace_member(likely smaller) the risk is lower; foruserit's the real operational cost. Not a code defect, but the migration's blast radius should be stated.
Praise
- Defensive COLLATE + migration belt-and-suspenders — the author didn't just add the migration (which fixes it after it runs) OR just the query COLLATE (which fixes it without a schema change); they did both, so the fix works during the rollout window (query-level) AND eliminates the asymmetry permanently (schema-level). That's the right shape for a collation bug that bites mid-deploy.
- Idempotent stored-proc pattern reused, not reinvented — follows 20260512's exact structure (TABLE_COLLATION guard, MAX() lookup form). Consistency with the existing repair migration means the same operational expectations apply.
- Root-cause writeup names the exact mechanism — DB-default
0900_ai_civs forcedgeneral_cionuser_verification, the specific JOIN, the Error 1267, the #420 precedent. A reviewer can verify the diagnosis without re-deriving it.
Recommended before un-drafting
- Resolve the
robotJOIN question — confirmrobot's collation post-migration, and either convert it or pin its JOIN. This is the symmetric-bug risk and the load-bearing item. - Add a one-line note on
usertable CONVERT lock duration / maintenance-window expectation. - Optional: sweep the other
space_member ⨝ user/robotJOINs (bot_api/groups.go:335,340,search/api.go:316,space/db.go:355) — they share the same tables and could hit the same 1267 on a 0900 deployment; worth confirming they're covered by theuser/space_memberconversion.
mochashanyao
left a comment
There was a problem hiding this comment.
[Octo-Q · automated review]
Verdict: Request changes — blocking findings below (data-flow traced).
octo-server PR#489 Review Report
Reviewer: Octo-Q (automated review)
Head SHA: 051815722a160a3ef722ba274026f8f1b0e280b9
PR Title: fix: collation repair for space_member/user tables (#482)
Changed files: 2 (+58 −1)
1. Verification Summary
| Item | Status | Evidence |
|---|---|---|
| Migration SQL syntax | ✅ | Stored procedure with idempotent IF guard, MAX(TABLE_COLLATION) avoids shared-handler pitfall |
| Migration pattern fidelity | ✅ | Follows 20260512000001_base_oss_compat_repair.sql pattern (proc + DROP before/after + no-op Down) |
COLLATE placement in queryMembers |
✅ | sm.uid COLLATE utf8mb4_general_ci correctly pins comparison collation; coercibility 1 > 2, so general_ci wins |
| Defensive COLLATE before migration runs | ✅ | Both sides resolve to general_ci — no Error 1267 |
| Migration after it runs (uv JOIN) | ✅ | Both space_member and user_verification at general_ci — no Error 1267 |
| Migration after it runs (user JOIN) | ✅ | Both space_member and user at general_ci — no Error 1267 |
| Migration after it runs (robot JOIN) | ❌ | NEW mismatch introduced — see P1 below |
2. Findings
🔴 P1 — Migration converts space_member to general_ci but excludes robot table → breaks space_member JOIN robot on MySQL 8 default deployments
Severity: P1 (blocking)
Diff-scope: NEW — introduced by this PR's migration. Before the migration both tables are at DB default (same collation → JOIN works). After, they diverge.
Root cause analysis (data-flow trace)
-
robottable DDL (modules/robot/sql/20210926000001_robot_legacy01.sql:6):create table `robot` ( ... robot_id VARCHAR(40) not null default '', ... );
No explicit
CHARSETorCOLLATE— inherits DB default. -
20260512 repair (
modules/base/sql/20260512000001_base_oss_compat_repair.sql:57-63): repair list covers 17 tables includingrobot_apply, butrobotis NOT in the list. No subsequent migration adds COLLATE torobot. -
After this PR's migration runs (on MySQL 8 default
utf8mb4_0900_ai_ci):space_member→utf8mb4_general_ci(converted by this PR)robot→utf8mb4_0900_ai_ci(inherits DB default, NOT repaired)
-
MySQL coercibility rule: both sides have coercibility 2 (string columns with implicit collations). Different collations at equal coercibility → Error 1267: Illegal mix of collations.
Affected queries (ALL currently working → ALL broken after migration)
| Query | File:Line | JOIN | Impact |
|---|---|---|---|
queryMembers |
modules/space/db.go:163 |
LEFT JOIN robot r ON r.robot_id=sm.uid |
Member list endpoint |
querySpaceBots |
modules/space/db.go:354 |
INNER JOIN robot r ON r.robot_id=sm.uid |
Bot list in Space |
| bot listing | modules/robot/api.go:1290 |
INNER JOIN robot r ON r.robot_id = sm.uid |
Robot API |
| integration lookup | modules/integration/db.go:159 |
INNER JOIN space_member sm ON sm.uid=r.robot_id |
Integration |
| integration lookup | modules/integration/db.go:269,298 |
same pattern | Integration |
| botfather check | modules/botfather/db.go:78 |
INNER JOIN space_member sm ON sm.uid=r.robot_id |
BotFather |
Note: queryMembers at line 162 has defensive COLLATE on the uv JOIN but NOT on the robot JOIN at line 163.
R1 test
This makes currently working production paths unavailable (member listing, bot listing, botfather, integrations all return 500). On any MySQL 8 deployment using utf8mb4_0900_ai_ci as default, the migration creates exactly the same class of bug (#482) it's trying to fix — just shifted to a different JOIN pair. Worse than doing nothing for those paths.
Fix options (pick one)
Option A — Include robot in the repair migration (recommended, simplest):
Add a third block to the stored procedure:
-- Repair robot
SELECT MAX(TABLE_COLLATION) INTO v_collation
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'robot';
IF v_collation IS NOT NULL AND v_collation <> 'utf8mb4_general_ci' THEN
ALTER TABLE `robot` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
END IF;Option B — Defensive COLLATE on ALL space_member JOIN robot (belt-and-suspenders):
Same pattern as the uv JOIN fix, but applied to every r.robot_id=sm.uid / sm.uid=r.robot_id across the codebase.
Option C — Verify actual production robot collation before merging:
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN ('space_member', 'user', 'robot', 'user_verification');If robot is already utf8mb4_general_ci (e.g. from an undocumented manual ALTER), no code change needed — but document this in the PR description to close the question.
ℹ️ P2 — Defensive COLLATE only covers queryMembers, not other space_member JOINs (pre-migration window)
Severity: P2 (non-blocking)
Diff-scope: Pre-existing (these JOINs existed before this PR; PR adds protection only for one)
On deployments with MySQL 8 default before the migration runs:
space_member JOIN userinsearchMembers(db.go:183),countMembers(db.go:204),search/api.go:316,bot_api/groups.go:335,340— currently safe because BOTH tables inherit DB default.space_member JOIN robotinquerySpaceBots(db.go:354) — currently safe, same reason.
These are not broken today, but the asymmetric defensive coverage means the PR's "defense-in-depth before migration" guarantee is incomplete. If a future change converts user or robot to general_ci (without converting space_member first), the unprotected JOINs break.
Recommendation: Consider a broader sweep of defensive COLLATEs on all cross-table uid JOINs involving space_member, or merge the migration first and verify before removing COLLATEs.
3. Recommendations
- [Must-fix before merge] Add
robotto the repair migration (Option A above), or verify its production collation and document (Option C). Without this, the migration is a net negative for bot-related queries. - [Should-fix] Consider adding defensive COLLATE to the
robotJOIN inqueryMembers(db.go:163) for the same belt-and-suspenders rationale as theuvJOIN fix. - [Nit] Migration comment at line 7 says "The original repair converted 17 tables … but missed space_member and user" — consider also mentioning
robothere (either "also missed robot, verified safe" or "repairing robot too").
4. Additional Findings
bot_mention_pref(20260603000001_bot_mention_pref.sql:14) IS created withCOLLATE=utf8mb4_general_ci— evidence that newer tables are being properly pinned. The gap is specifically in older tables that predate explicit collation practices.- The
COLLATE utf8mb4_general_ciinqueryMembersis a runtime-only hint — it does NOT alter the column definition. Safe to keep permanently or remove after migration is confirmed on all deployments.
5. Data-Flow Trace
| Consumed data | Upstream source | Flows correctly? |
|---|---|---|
sm.uid in uv JOIN |
space_member.uid column (DB default → general_ci after migration) |
✅ COLLATE pins to general_ci |
uv.user_id in uv JOIN |
user_verification.user_id (explicit general_ci from creation DDL) |
✅ |
u.uid in user JOIN |
user.uid (DB default → general_ci after migration) |
✅ after migration; safe before (both DB default) |
r.robot_id in robot JOIN |
robot.robot_id (DB default, NOT repaired) |
❌ after migration: sm.uid=general_ci, r.robot_id=0900_ai_ci → Error 1267 |
sm.space_id in WHERE |
space_member.space_id |
✅ not affected by collation change |
Migration v_collation |
information_schema.TABLES.TABLE_COLLATION via MAX() |
✅ handles missing tables (NULL check) |
6. Blind-Spot Checklist (R5)
- C1 — Dual-path parity: N/A. Migration is unidirectional (no Down logic, consistent with 20260512 pattern). No subscribe/unsubscribe or create/delete symmetry involved.
- C2 — Control-flow ordering / nesting reuse: N/A. COLLATE is a single-expression modifier; no multi-call-site or nesting risk.
- C3 — Authorization boundary ≠ capability boundary: N/A. Schema DDL change + read-query collation fix. No auth/authz implications.
- C4 — Authorization lifecycle / container-member cascade: N/A. No authorization changes.
- C5 — Build/note ≠ runtime path: HIT — The migration passes
go buildand static SQL review, but the runtime path forspace_member JOIN robotbreaks after migration on MySQL 8 default deployments. Build-time validation does not catch collation mismatches. The PR description notes "实际端点验证需在非规范 collation 部署上进行" — correct, but the gap is that the migration itself creates a new mismatch, not just fails to fix an old one. - C6 — Governance / policy doc consistency: N/A. Not a governance/policy PR.
7. Cross-Round Blocker Re-Check (R6)
N/A — first review of this PR.
[Octo-Q] verdict: REQUEST_CHANGES
Reasoning: The migration converts space_member to utf8mb4_general_ci but excludes robot (which also inherits DB default, confirmed from source DDL). After migration, all space_member JOIN robot queries (queryMembers:163, querySpaceBots:354, robot/api.go:1290, integration/db.go:159/269/298, botfather/db.go:78) will hit Error 1267 on MySQL 8 default deployments — these are currently working paths that the migration makes unavailable (R1). Fix: add robot to the repair procedure, or verify its production collation before merge.
Retracting my APPROVE: the migration converts space_member+user to general_ci but not robot, which queryMembers/searchMembers also join on sm.uid; on MySQL-8 deployments this moves Error 1267 to the robot join. Re-reviewing as REQUEST_CHANGES.
Jerry-Xin
left a comment
There was a problem hiding this comment.
Verdict: REQUEST_CHANGES — retracting my earlier APPROVE. I missed the symmetric-collation regression that OctoBoooot and mochashanyao correctly flagged. After reading the live source I can confirm it is real and PR-introduced.
🔴 Blocking — this migration moves Error 1267 from the uv join to the robot join (new regression)
modules/space/db.go queryMembers joins space_member to THREE tables on sm.uid:
LEFT JOIN user u ON u.uid=sm.uid(not COLLATE-pinned)LEFT JOIN user_verification uv ON uv.user_id=sm.uid COLLATE utf8mb4_general_ci(pinned — this PR's fix)LEFT JOIN robot r ON r.robot_id=sm.uid(not COLLATE-pinned)
searchMembers likewise joins space_member → user and space_member → robot, neither pinned.
The migration converts space_member and user to utf8mb4_general_ci, but robot is never converted — it is not in this migration, and it was not in the prior 20260512000001 repair (that list converts robot_apply, not robot; the only robot references there are an unrelated idx_robot_bot_token index repair). I checked the robot table DDL in modules/robot/sql/20210926000001_robot_legacy01.sql: create table \robot`is declared with **no explicitCOLLATE/CHARACTER SET**, so it inherits the DB default — exactly like space_member/user` did before this PR.
Consequence on a MySQL-8 deployment (DB default utf8mb4_0900_ai_ci):
- Before this PR:
space_member,user,robotall inherit0900_ai_ci→space_member ⋈ robotMATCHES (theuvjoin was the only broken one, becauseuser_verificationwas explicitly converted togeneral_ciby20260512— that is the #482 bug). - After this PR:
space_member→general_ciwhilerobotstays0900_ai_ci→space_member ⋈ robotbecomes a NEWError 1267: Illegal mix of collations, throwing 500 onqueryMembersandsearchMembers.
So as written the migration does not eliminate the collation fault — it relocates it from the verification join to the robot join.
Fix (either is sufficient; (a) preferred for completeness):
- (a) Add
robotto the converted-table list in this migration (same idempotentINFORMATION_SCHEMA-gatedCONVERT TO ... utf8mb4_general_ciproc), so all ofspace_member's join partners (user,user_verification,robot) are uniformlygeneral_ci. SweepqueryMembers/searchMembersfor any othersm.uidjoin partner and include them too. - (b) Or pin
COLLATE utf8mb4_general_cion therobot(anduser) join conditions in bothqueryMembersandsearchMembers, mirroring theuvpin — defense-in-depth for the pre-migration window.
🟡 Non-blocking (carried from prior pass)
- The
userjoin inqueryMembersis also unpinned; the migration convertsuserso it is fixed at startup, but during the rollout window (new binary, migration not yet applied)space_member ⋈ useris unprotected. The (b) pin would close this too. ALTER ... CONVERT TO CHARACTER SETrebuilds the table under a metadata lock (noALGORITHM=INPLACE/INSTANT); fine at current sizes and consistent with the prior migration, but worth an off-hours/maintenance-window note for largeusertables.- A lightweight migration content-preservation test (assert the repair list keeps the intended tables) would guard re-run stability and catch a future dropped table.
✅ What is correct
- The idempotent stored-proc pattern (INFORMATION_SCHEMA collation check, skip-if-already-target,
DROP PROCEDURE IF EXISTSbefore/after) is sound and reuses the established20260512recipe. - The
uv-joinCOLLATEpin and its rationale comment are correct for that join. utf8mb4 → utf8mb4conversion is metadata-only at the row level (no truncation);general_civs0900_ai_ciare both case/accent-insensitive on the ASCII UID keys, so no new uniqueness collision onUNIQUE(space_id,uid); no FK constraints on these tables, so no FK-collation ALTER failure.
The fix direction is right and the migration scaffolding is solid — it just needs to cover all of space_member's join partners (the robot table) so the conversion completes the repair instead of shifting it. Credit to OctoBoooot and mochashanyao for catching the robot-join regression.
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #489 (octo-server)
Reviewed at head SHA 051815722a160a3ef722ba274026f8f1b0e280b9 against main.
Summary
The PR correctly diagnoses #482: on MySQL 8 deployments whose DB default is utf8mb4_0900_ai_ci, space_member and user were created without an explicit COLLATE and so inherited 0900_ai_ci, while the 20260512 compat-repair migration forced user_verification to utf8mb4_general_ci. That asymmetry is what throws Error 1267 on the space_member ⋈ user_verification join in queryMembers.
The fix is the right idea but incomplete in a way that moves the same error to a new join instead of removing it. space_member is joined to three tables in this code path, not one.
Blocking issue (P1) — converting space_member/user introduces a new Error 1267 on the robot join
queryMembers joins space_member to user, user_verification, and robot:
modules/space/db.go:162 LEFT JOIN user u ON u.uid=sm.uid
modules/space/db.go:163 LEFT JOIN user_verification uv ON uv.user_id=sm.uid COLLATE utf8mb4_general_ci
modules/space/db.go:164 LEFT JOIN robot r ON r.robot_id=sm.uid
This migration converts space_member and user to utf8mb4_general_ci, but robot is not converted:
- New migration converts only
space_memberanduser—modules/base/sql/20260627000001_base_collation_repair_space_user.sql:26-40 - The 20260512 repair cursor list does not include
robot(it only repairs a robot index, not the table collation) —modules/base/sql/20260512000001_base_oss_compat_repair.sql:56-67and:98-115 robotsource DDL has no explicit charset/collation, so it also inherits0900_ai_cion MySQL 8 —modules/robot/sql/20210926000001_robot_legacy01.sql:6-17
Net effect on a 0900_ai_ci deployment:
- Before this PR:
space_member,user,robotare all0900_ai_ci→ therobotjoin is fine; only theuser_verificationjoin is broken (= #482). - After this PR:
space_member/userbecomegeneral_ciwhilerobotstays0900_ai_ci→LEFT JOIN robot r ON r.robot_id=sm.uidnow throwsError 1267.
So the migration trades one collation mismatch for another, and the defensive COLLATE was added only to the uv join (db.go:163), not the robot join (db.go:164) — so it does not protect this path either.
The same space_member/user ↔ robot pairing exists in several other queries that will break the same way after the migration:
querySpaceBots—modules/space/db.go:354(INNER JOIN robot r ON r.robot_id=sm.uid)searchMembersbuilds the samespace_member+robotjoin —modules/space/db.go:184- Space bot listing —
modules/robot/api.go:1289-1290 - Integration bot → space lookup —
modules/integration/db.go:159 - BotFather robots-by-creator-in-space —
modules/botfather/db.go:78 user↔robotlookups:modules/user/db_pinned.go:72,modules/group/bot_ownership.go:68,modules/user/api.go:4043,modules/bot_api/obo_db.go:1626
Recommended fix
Two coherent options; either resolves the P1:
- Add
robotto the table-conversion set (and audit any other*_id/uidjoin partner ofspace_member/user). This is the durable fix — it makes the join graph collation-consistent. Note the next-domino risk below. - Pin
COLLATE utf8mb4_general_cion every column-column equality that crosses these tables (therobotjoin inqueryMembers/querySpaceBots/searchMembers, etc.), matching what was done foruv. This survives the pre-migration window but is easy to miss a site.
Preferably both: convert the tables for correctness, and keep the defensive COLLATE on the hot path joins for the deploy-ordering window.
Next-domino risk if you only add robot (P2)
A table-only fix that adds just robot can shift the mismatch to other legacy UID tables that also lack an explicit collation and are joined to robot/user, e.g. group_member and friend:
group_member.uid = robot.robot_id—modules/robot/db.go:224friend.to_uid = user.uid/robot.robot_id—modules/robot/api.go:1401-1402
The safest path is a single coherent collation pass over all UID-join tables rather than table-by-table patching.
Other findings (non-blocking)
P2 — migration ordering creates a transient split-brain window. The procedure alters space_member before user. During that window space_member is general_ci while user is still 0900_ai_ci, so any concurrent space_member ⋈ user query (e.g. queryMembers) can throw Error 1267 mid-migration. modules/base/sql/20260627000001_base_collation_repair_space_user.sql:27-40. This is transient (clears once both ALTERs finish) but worth noting for prod rollout.
P2 — locking / online-DDL. ALTER TABLE ... CONVERT TO CHARACTER SET rebuilds the table and holds a metadata lock; on large user/space_member tables this can block writes and induce replication lag. The 20260512 migration already documents this and tells operators to run off-hours — carry the same operator note here, and consider ALGORITHM=INPLACE, LOCK=NONE where supported. modules/base/sql/20260627000001_base_collation_repair_space_user.sql:28,36.
P2 — idempotency granularity. The guard reads TABLE_COLLATION from information_schema.TABLES. If a prior partial/manual DDL changed the table default but left individual VARCHAR columns (e.g. uid) at the old collation, the script skips the table and leaves the join still broken. Checking information_schema.COLUMNS for the join columns would be more robust. modules/base/sql/20260627000001_base_collation_repair_space_user.sql:23-26,31-34.
What I could not verify
- No live MySQL
0900_ai_ciinstance to reproduceError 1267end-to-end; analysis is static (matches the PR's own note that CI runsgeneral_ciand can't cover this). - Production table sizes for
user/space_member(lock-time impact) are unknown. - I traced the main
space_member/userjoin partners but did not claim exhaustive coverage of every*_idequality in the repo.
Verdict
The migration is well-structured and idempotent, and the diagnosis is correct, but as written it relocates Error 1267 to the space_member/user ↔ robot joins on the exact MySQL 8 deployments it targets, affecting queryMembers, querySpaceBots, and several bot/integration paths. Requesting changes to bring robot (and any other UID-join partner) into the same collation and/or pin COLLATE on the remaining cross-collation joins.
根因 / 实现说明
queryMembers(modules/space/db.go:158) 的LEFT JOIN user_verification uv ON uv.user_id=sm.uid在utf8mb4_0900_ai_ci默认 collation 的部署上触发 Error 1267(space_member继承 DB 默认 collation,user_verification被 compat-repair 迁移强制为general_ci)space_member和user表改动摘要
20260627000001_base_collation_repair_space_user.sql:将space_member和user转换为utf8mb4_general_ci(存储过程模式,幂等,遵循 20260512 的已有模式)queryMembers的uvJOIN 条件加COLLATE utf8mb4_general_ci,在迁移运行前也能正常工作测试
go build ./...✓0900_ai_ci环境的 MySQL 实例(CI 的general_ci环境无法覆盖)SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME IN ('space_member','user_verification','user');验证 (baseline diff, charter §6)
go build ./...: before = pass, after = pass, delta = 0 new failuresFixes #482