fix: name community splits from their members and dedupe community names #603

Open
SHudici wants to merge 1 commit into tirth8205:main from SHudici:feat/community-split-naming

@SHudici SHudici commented Jul 3, 2026

Problem

Two related defects make community output hard to trust on real graphs:

  1. Splits are opaque. _split_oversized names every shard "<parent>-sub<N>" and hardcodes "cohesion": 0.0. On a mid-size production graph (~7k nodes, 103 communities), 26 of the 103 communities were indistinguishable services-load-subN entries, all reporting zero cohesion — the numbers looked like detection failures, not results.
  2. Names collide. Nothing enforces uniqueness: the same graph carried five collision groups (services-normalize ×4, services-job ×3, services-fake ×3, services-sql ×2, services-testclientlazyinit ×2 — 14 communities sharing 5 names). get_community resolves by name match, so 9 of those 14 were unreachable by name.
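The reachability arithmetic can be reproduced with a small sketch. It assumes a name lookup that returns a single (first) match; `get_by_name` and the dict layout are hypothetical stand-ins, not the actual `get_community` implementation. Only the names and group sizes come from the numbers above.

```python
# Hypothetical community list built from the five collision groups reported above.
names = (
    ["services-normalize"] * 4
    + ["services-job"] * 3
    + ["services-fake"] * 3
    + ["services-sql"] * 2
    + ["services-testclientlazyinit"] * 2
)
communities = [{"id": i, "name": n} for i, n in enumerate(names)]


def get_by_name(items, name):
    """Return the first community whose name matches (assumed lookup behaviour)."""
    return next((c for c in items if c["name"] == name), None)


# Each distinct name resolves to exactly one community; the rest are shadowed.
reachable_ids = {get_by_name(communities, n)["id"] for n in set(names)}
unreachable = [c for c in communities if c["id"] not in reachable_ids]

print(len(communities), len(reachable_ids), len(unreachable))
```

Under that assumption, one community per name is reachable and the remaining members of each group are shadowed, which matches the 9 of 14 figure.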

Fix

  • _split_oversized names each shard from its own members via the existing _generate_community_name (fallback "<parent>-<id>" only when members yield nothing), and computes real cohesion for all shards in one _compute_cohesion_batch pass over the full edge set — same cost profile as the top-level detectors.
  • detect_communities runs a new _dedupe_community_names pass after splitting: within a collision group the largest community keeps the bare name; every other member is suffixed with its most distinctive keyword (skipping keywords already in the name and candidates that would collide with any existing name), with a deterministic numeric fallback.
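A minimal sketch of the dedup rule described above, for illustration only. The function name `dedupe_names`, the dict fields (`name`, `size`, `keywords`), the assumption that keywords arrive ordered most distinctive first, and the numeric fallback starting at 2 are all hypothetical; the real `_dedupe_community_names` may differ in each of these.

```python
from collections import defaultdict


def dedupe_names(communities):
    """Make community names unique in place (illustrative sketch).

    Each community is a dict with 'name', 'size', and 'keywords',
    where 'keywords' is assumed to be ordered most distinctive first.
    """
    groups = defaultdict(list)
    for c in communities:
        groups[c["name"]].append(c)

    # Every name currently in use; candidates must not collide with any of them.
    taken = {c["name"] for c in communities}

    for base, group in sorted(groups.items()):
        if len(group) < 2:
            continue  # unique names are left untouched
        # Largest community keeps the bare name; the stable sort makes ties
        # resolve by input order, so the result is deterministic.
        group.sort(key=lambda c: -c["size"])
        name_parts = set(base.split("-"))
        for c in group[1:]:
            candidate = None
            for kw in c["keywords"]:
                if kw in name_parts:
                    continue  # keyword already appears in the name
                if f"{base}-{kw}" in taken:
                    continue  # would collide with an existing name
                candidate = f"{base}-{kw}"
                break
            if candidate is None:
                # Deterministic numeric fallback (start value is an assumption).
                n = 2
                while f"{base}-{n}" in taken:
                    n += 1
                candidate = f"{base}-{n}"
            c["name"] = candidate
            taken.add(candidate)
    return communities


example = dedupe_names([
    {"name": "services-job", "size": 10, "keywords": ["job", "queue"]},
    {"name": "services-job", "size": 40, "keywords": ["job", "runner"]},
    {"name": "services-job", "size": 5, "keywords": ["job"]},
    {"name": "services-sql", "size": 7, "keywords": ["sql"]},
])
print([c["name"] for c in example])
```

In this example the 40-member community keeps `services-job`, the 10-member one becomes `services-job-queue`, the 5-member one has no usable keyword and falls back to `services-job-2`, and `services-sql` is unchanged.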

Measured effect (same production repo)

Re-ran community detection on the same graph (103 communities before and after — the partition is untouched, only naming and cohesion change):

| Metric | Before | After |
| --- | --- | --- |
| -subN placeholder names | 26 | 0 |
| Communities reporting cohesion 0.0 | 26 | 0 |
| Duplicate-name groups | 5 (14 communities) | 0 |
| Communities unreachable via get_community by name | 9 | 0 |

The 26 former services-load-subN shards now carry member-derived names with real cohesion, e.g. services-upload (85 members, 0.29), services-rows (74, 0.42), lx-raw (35, 0.27), iag-market (8, 0.33), afklm-parse (7, 0.41).

Testing

New tests cover both changes. A dumbbell-graph oversized community splits into member-named shards (no -subN), each with the exact expected cohesion (15/16) and parent lineage preserved. Dedup keeps the bare name on the largest community, suffixes the rest by keyword, skips keywords already in the name, avoids colliding with existing community names, falls back to a numeric suffix, and leaves unique names untouched. The full suite passes.
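One way to see where a value of 15/16 can come from, as a sketch under stated assumptions: cohesion is taken here to be internal edges divided by internal plus boundary edges, and the dumbbell is taken to be two 6-node cliques joined by a single bridge. Neither the definition nor the clique size is stated in this PR; both are chosen because they reproduce 15/16. `cohesion_batch` is a hypothetical stand-in for `_compute_cohesion_batch`.

```python
from collections import Counter
from itertools import combinations


def cohesion_batch(edges, membership):
    """Compute cohesion for every shard in one pass over the edge list.

    Assumed definition: internal / (internal + external), where an edge is
    external to a shard if exactly one endpoint lies in that shard.
    """
    internal = Counter()
    external = Counter()
    for u, v in edges:
        cu, cv = membership.get(u), membership.get(v)
        if cu is not None and cu == cv:
            internal[cu] += 1
        else:
            for shard in (cu, cv):
                if shard is not None:
                    external[shard] += 1
    result = {}
    for shard in set(membership.values()):
        total = internal[shard] + external[shard]
        result[shard] = internal[shard] / total if total else 0.0
    return result


# Dumbbell: two 6-node cliques (15 edges each) joined by one bridge edge.
left, right = range(0, 6), range(6, 12)
edges = list(combinations(left, 2)) + list(combinations(right, 2)) + [(5, 6)]
membership = {n: "left" for n in left} | {n: "right" for n in right}

scores = cohesion_batch(edges, membership)
print(scores)
```

Each shard has 15 internal edges and 1 boundary edge, giving 15 / 16 = 0.9375 for both under this definition.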

Composes with #600 (community naming quality) but does not require it; cut independently from main.

🤖 Generated with Claude Code

Splitting an oversized community produced shards named
"<parent>-sub<N>" with a hardcoded cohesion of 0.0 — on a mid-size
production graph, 14 of the 35 largest communities were opaque
"services-load-subN" entries indistinguishable from each other, all
reporting zero cohesion. Separately, nothing enforced name uniqueness:
the same graph carried three distinct communities all named
"services-job", and get_community resolves by name match, so two of
the three were unreachable by name.

- _split_oversized now names each shard from its own members via
  _generate_community_name (falling back to "<parent>-<id>" only when
  members yield nothing) and computes real cohesion for every shard in
  one _compute_cohesion_batch pass over the full edge set.
- detect_communities runs a new _dedupe_community_names pass: within a
  collision group the largest community keeps the bare name; each other
  member gets its most distinctive keyword as a suffix (skipping
  keywords already in the name and candidates that would collide with
  any existing name), with a deterministic numeric fallback.