@satnaing
Owner

Description

This PR implements a hybrid slugify approach to fix the issue where acronyms like "E2E testing" were being converted to "e-2-e-testing" instead of "e2e-testing", while also preserving non-Latin characters in URLs.

Problem

The previous implementation used lodash.kebabcase which:

  • ❌ Converted "E2E testing" → "e-2-e-testing" (adds dashes around numbers)
  • ✅ Preserved non-Latin characters like "နိုင်ငံတကာ" → "နိုင်ငံတကာ"

Solution

Implemented a hybrid approach that:

  • Uses slugify (simov) with { lower: true } for Latin-only strings (handles E2E correctly)
  • Uses lodash.kebabcase for strings containing non-Latin characters (preserves them)
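The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `hasNonAscii`, `createHybridSlugify`, and `routeDemo` are hypothetical names, and "Latin-only" is approximated here as "ASCII-only", which is an assumption (accented Latin letters such as "é" would take the fallback path under this check). The two slugifiers are injected so the sketch runs without any dependencies.

```typescript
type Slugifier = (input: string) => string;

// Assumption: treat any character outside the ASCII range as "non-Latin".
const hasNonAscii = (input: string): boolean => /[^\x00-\x7F]/.test(input);

// Hypothetical factory: picks one of two slugifiers per input string.
function createHybridSlugify(
  latinSlugifier: Slugifier,
  fallbackSlugifier: Slugifier
): Slugifier {
  return (input: string): string =>
    hasNonAscii(input) ? fallbackSlugifier(input) : latinSlugifier(input);
}

// Tagged stand-ins that only reveal which path was taken.
// In a real project these would be, for example,
//   (s) => slugify(s, { lower: true })   // from the "slugify" package
//   (s) => kebabCase(s)                  // from "lodash.kebabcase"
const routeDemo = createHybridSlugify(
  (s) => `latin:${s}`,
  (s) => `fallback:${s}`
);

console.log(routeDemo("E2E testing")); // latin:E2E testing
console.log(routeDemo("E2E 世界")); // fallback:E2E 世界
```

Because the choice is made per string rather than per word, a mixed input such as "E2E 世界" is handled entirely by the fallback slugifier, which is why the acronym is still split in that case.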

Comparison of Solutions Tested

| Solution | "E2E testing" | "TypeScript 5.0" | "HTML5" | "နိုင်ငံတကာ" | "E2E 世界" | Bundle Size | Dependencies | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lodash.kebabcase (original) | ❌ "e-2-e-testing" | ❌ "type-script-5-0" | ❌ "html-5" | ✅ "နိုင်ငံတကာ" | ❌ "e-2-e-世界" | 2.8 kB | 0 | Original issue |
| slugify (simov) | ✅ "e2e-testing" | ✅ "typescript-5.0" | ✅ "html5" | ❌ "" (removed) | ❌ "" (removed) | 20.9 kB | 0 | Removes non-Latin |
| @sindresorhus/slugify | ❌ "e2-e-testing" | ❌ "type-script-5-0" | ❌ "html-5" | ❌ "" (removed) | ❌ "" (removed) | 17.7 kB | 2 | Same E2E issue |
| limax | ✅ "e2e-testing" | ⚠️ "typescript-5-0" | ✅ "html5" | ⚠️ "naingngantka" (transliterates) | ⚠️ "e2e-shi4-jie4" | 28.7 kB | 3 | Transliterates/removes |
| Hybrid (current) | ✅ "e2e-testing" | ✅ "typescript-5.0" | ✅ "html5" | ✅ "နိုင်ငံတကာ" | ⚠️ "e-2-e-世界" | 23.7 kB | 0 | Best fit |

Testing

Tested with various inputs:

  • ✅ "E2E testing" → "e2e-testing"
  • ✅ "TypeScript 5.0" → "typescript-5.0"
  • ✅ "HTML5" → "html5"
  • ✅ "နိုင်ငံတကာ" → "နိုင်ငံတကာ" (Burmese: international)
  • ✅ "ระหว่างประเทศ" → "ระหว่างประเทศ" (Thai: international)
  • ✅ "Hello 世界" → "hello-世界"
  • ⚠️ "E2E 世界" → "e-2-e-世界" (edge case - acceptable)

Types of changes

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update (if none of the other choices apply)
  • Others (any other types not listed above)

Checklist

  • I have read the Contributing Guide
  • I have added the necessary documentation (if appropriate)
  • Breaking Change (fix or feature that would cause existing functionality to not work as expected)

Related Issue

Closes: #584

Update slugify function to better handle acronyms and preserve non-Latin characters.
- uses slugify (simov) for Latin-only strings to fix the acronym issue (e.g. e2e-testing)
- uses lodash.kebabcase for strings with non-Latin characters to preserve them

Fixes #584
@cloudflare-workers-and-pages

Deploying astro-paper with Cloudflare Pages

Latest commit: 6ad340c
Status: ✅  Deploy successful!
Preview URL: https://31560aab.astro-paper.pages.dev
Branch Preview URL: https://fix-slugify-acronyms-and-non.astro-paper.pages.dev

@satnaing satnaing merged commit fb63d96 into main Jan 13, 2026
5 checks passed
@satnaing satnaing deleted the fix/slugify-acronyms-and-non-latin-support branch January 13, 2026 22:20

Development

Successfully merging this pull request may close these issues.

[BUG]: dashes around numbers in URLs for tags