Skip to content

fix: split camelCase and truncate community slugs at word boundaries#605

Open
SHudici wants to merge 1 commit into
tirth8205:mainfrom
SHudici:fix/community-slug-word-boundary
Open

fix: split camelCase and truncate community slugs at word boundaries#605
SHudici wants to merge 1 commit into
tirth8205:mainfrom
SHudici:fix/community-slug-word-boundary

Conversation

@SHudici

@SHudici SHudici commented Jul 4, 2026

Copy link
Copy Markdown

What

_to_slug() lowercases its input and hard-slices at 30 characters, so a camelCase class name produces an unreadable, mid-word community name. Observed on a real ~4.8k-node graph:

services-testbuildpreanalysispromptbloc   # from TestBuildPreAnalysisPromptBlock...

Two problems compound here: camelCase survives lowercasing as one giant "word" (the existing [^a-z0-9]+ substitution never fires inside it), and the [:30] slice cuts wherever it lands.

How

_to_slug() now:

  1. normalizes punctuation to spaces,
  2. splits camelCase/snake_case into words via the existing _split_name() helper,
  3. joins with hyphens,
  4. truncates at the last complete word within the 30-char budget.

A single word longer than the budget still falls back to a hard cut (best effort).

TestBuildPreAnalysisPromptBlock  ->  test-build-pre-analysis-prompt   (was: testbuildpreanalysispromptbloc)
AuthService                      ->  auth-service                     (was: authservice)
rate_sheet                       ->  rate-sheet                       (unchanged)

Community names derived from dominant classes become hyphenated and readable; names from file stems and already-split keywords are unchanged.

Testing

  • New TestToSlug class: 9 tests covering camelCase splitting, exact-boundary cuts, mid-word cuts, overlong single words, punctuation, and empty input.
  • tests/test_communities.py passes (36 passed on Windows; the teardown errors are the pre-existing Windows unlink issue addressed separately in test: make the suite pass on Windows #596).

Known trade-offs (kept out of scope)

  • Word-boundary truncation can collide slightly more often than a hard cut when two long names share every complete word within the budget (service_SupercalifragilisticAlpha / ...Beta both become service-supercalifragilistic). The hard cut already collided for shared 30-char prefixes; name-level dedup is handled in fix: name community splits from their members and dedupe community names #603.
  • _split_name does not split consecutive-uppercase acronyms (HTTPServerhttpserver, unchanged from current behavior for every existing caller). Improving that helper would also change keyword extraction, so it belongs in its own change if wanted.

_to_slug() previously lowercased the input and hard-sliced at 30
characters, so camelCase class names produced unreadable, mid-word
community names like "testbuildpreanalysispromptbloc" (a real name
observed on a ~4.8k-node production graph).

Slugs are now built by splitting camelCase/snake_case/punctuated input
into words (reusing _split_name), joining with hyphens, and truncating
at the last complete word within the 30-char budget. A single word
longer than the budget still falls back to a hard cut.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant