fix: split camelCase and truncate community slugs at word boundaries #605
Open
SHudici wants to merge 1 commit into
_to_slug() previously lowercased the input and hard-sliced at 30 characters, so camelCase class names produced unreadable, mid-word community names like "testbuildpreanalysispromptbloc" (a real name observed on a ~4.8k-node production graph). Slugs are now built by splitting camelCase/snake_case/punctuated input into words (reusing _split_name), joining with hyphens, and truncating at the last complete word within the 30-char budget. A single word longer than the budget still falls back to a hard cut.
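For reference, here is a minimal sketch of the previous behavior, reconstructed only from the description in this PR: lowercase, a `[^a-z0-9]+` substitution, then a `[:30]` slice. The name `old_to_slug`, the hyphen used as the substitution's replacement, and the input `TestBuildPreAnalysisPromptBlock` are assumptions. That input is one hypothetical class name that yields the observed slug.

```python
import re

SLUG_MAX_LEN = 30  # budget stated in the PR


def old_to_slug(name: str) -> str:
    """Hypothetical reconstruction of the old _to_slug(): lowercase,
    collapse non-alphanumeric runs to hyphens, then hard-slice."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # After lowercasing, camelCase contains no separators, so the
    # substitution above never fires and the slice cuts mid-word.
    return slug[:SLUG_MAX_LEN]


print(old_to_slug("TestBuildPreAnalysisPromptBlock"))  # testbuildpreanalysispromptbloc
```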
What
`_to_slug()` lowercases its input and hard-slices it at 30 characters, so a camelCase class name produces an unreadable, mid-word community name. Observed on a real ~4.8k-node graph: `testbuildpreanalysispromptbloc`.

Two problems compound here: camelCase survives lowercasing as one giant "word" (the existing `[^a-z0-9]+` substitution never fires inside it), and the `[:30]` slice cuts wherever it lands.

How
_to_slug()now:_split_name()helper,A single word longer than the budget still falls back to a hard cut (best effort).
Community names derived from dominant classes become hyphenated and readable; names from file stems and already-split keywords are unchanged.
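The three steps listed above can be sketched as follows. This is not the PR's code. `split_name` is a hypothetical regex stand-in for `_split_name()`, whose exact rules are not shown here, and `to_slug` and `SLUG_MAX_LEN` are illustrative names. Only the 30-character budget, the hyphen join, and the hard-cut fallback come from the description.

```python
import re

SLUG_MAX_LEN = 30  # character budget described in the PR


def split_name(name: str) -> list[str]:
    """Hypothetical stand-in for _split_name(): split on non-alphanumeric
    runs (snake_case, punctuation) and on camelCase boundaries, then lowercase."""
    words: list[str] = []
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Put a space wherever a lowercase letter or digit is followed by an uppercase letter
        words.extend(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk).split())
    return [w.lower() for w in words]


def to_slug(name: str, max_len: int = SLUG_MAX_LEN) -> str:
    """Join words with hyphens, keeping only complete words that fit in max_len."""
    words = split_name(name)
    if not words:
        return ""
    slug = ""
    for word in words:
        candidate = f"{slug}-{word}" if slug else word
        if len(candidate) > max_len:
            break  # the next word would overflow: stop at the last complete word
        slug = candidate
    # Even the first word is too long: fall back to a hard cut (best effort)
    return slug or words[0][:max_len]


print(to_slug("TestBuildPreAnalysisPromptBlock"))  # test-build-pre-analysis-prompt
```

With this sketch, `test-build-pre-analysis-prompt` is exactly 30 characters, so `block` is dropped rather than cut to `bloc`. An overlong single word such as `Supercalifragilisticexpialidocious` falls back to the hard cut `supercalifragilisticexpialidoc`.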
Testing
- `TestToSlug` class: 9 tests covering camelCase splitting, exact-boundary cuts, mid-word cuts, overlong single words, punctuation, and empty input.
- `tests/test_communities.py` passes (36 passed on Windows; the teardown errors are the pre-existing Windows unlink issue addressed separately in "test: make the suite pass on Windows" #596).

Known trade-offs (kept out of scope)
- Word-boundary truncation can map distinct long names to the same slug (e.g. `service_SupercalifragilisticAlpha`/`...Beta` both become `service-supercalifragilistic`). The hard cut already collided for shared 30-char prefixes; name-level dedup is handled in "fix: name community splits from their members and dedupe community names" #603.
- `_split_name` does not split consecutive-uppercase acronyms (`HTTPServer` → `httpserver`, unchanged from current behavior for every existing caller). Improving that helper would also change keyword extraction, so it belongs in its own change if wanted.
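To make the acronym trade-off concrete, the sketch below compares two splitting rules. The first is a camelCase-only boundary rule, a hypothetical stand-in for `_split_name`'s current behavior that is consistent with `HTTPServer` → `httpserver`. The second is one possible acronym-aware pattern. Neither pattern is taken from the project. Because changing the shared helper would also change keyword extraction, this illustrates a possible follow-up, not anything in this PR.

```python
import re

# Hypothetical stand-in for _split_name's current rule: split only where a
# lowercase letter or digit is followed by an uppercase letter.
CAMEL_ONLY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

# One possible acronym-aware rule: additionally split between two capitals
# when the second one starts a lowercase run ("HTTPServer" -> "HTTP Server").
ACRONYM_AWARE = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_with(pattern: re.Pattern, name: str) -> list[str]:
    """Insert spaces at the pattern's boundaries, then split and lowercase."""
    return [w.lower() for w in pattern.sub(" ", name).split()]


for name in ("HTTPServer", "parseHTTPResponse"):
    print(name, split_with(CAMEL_ONLY, name), split_with(ACRONYM_AWARE, name))
```

Under the camelCase-only rule, `parseHTTPResponse` becomes `["parse", "httpresponse"]`. The acronym-aware rule yields `["parse", "http", "response"]`, which is the kind of keyword-level change the trade-off above warns about.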