-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
β¨π Added language-specific handling to search for en, fr, de #23122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
""" WalkthroughThis change updates the search indexing functionality to support locale-aware text processing. The Possibly related PRs
π Recent review detailsConfiguration used: CodeRabbit UI π Files selected for processing (3)
β Files skipped from review due to trivial changes (1)
π§ Files skipped from review as they are similar to previous changes (2)
β° Context from checks skipped due to timeout of 90000ms (2)
β¨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. πͺ§ TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
π§Ή Nitpick comments (1)
apps/sodo-search/src/search-index.js (1)
57-83
: Consider adding documentation for supported languagesSince the PR mentions that Flexsearch currently only supports these three languages for stemming but additional ones could be added, it might be helpful to add a comment documenting the currently supported languages and how to add more in the future.
+// Currently supports locale-specific stemming for English (en), French (fr), and German (de). +// To add support for additional languages: +// 1. Import the language preset from flexsearch/lang/<language-code> +// 2. Add a case for the language code in the switch statement below const chooseEncoder = (locale) => { switch (locale) { case 'en':
π Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
π Files selected for processing (2)
apps/sodo-search/src/App.js
(1 hunks)apps/sodo-search/src/search-index.js
(5 hunks)
π§° Additional context used
𧬠Code Graph Analysis (1)
apps/sodo-search/src/search-index.js (1)
ghost/core/core/frontend/src/ghost-stats/ghost-stats.js (1)
locale
(145-145)
π Additional comments (5)
apps/sodo-search/src/App.js (1)
19-20
: LGTM: Locale parameter addition for SearchIndexThe locale parameter is now correctly passed to the SearchIndex constructor, enabling language-specific search functionality. This change aligns perfectly with the PR objectives of adding language-specific stemming support.
apps/sodo-search/src/search-index.js (4)
3-5
: LGTM: Language preset importsAppropriate language presets are imported from Flexsearch to support the locale-specific stemming for English, French, and German.
57-83
: LGTM: Well-implemented locale-specific encoder selectionThe
chooseEncoder
function provides a clean implementation for selecting language-specific encoders based on the locale parameter. The function correctly handles the three supported languages (English, French, German) with a sensible fallback for other locales.This implementation enables the behavior described in the PR objectives where, for example, searching for "informational" in English would now return results containing "information" due to language-specific stemming.
85-85
: LGTM: Updated constructor signatureThe SearchIndex constructor now accepts a locale parameter, which is destructured from the options object. This is consistent with how the other parameters are handled.
101-101
: LGTM: Encoder configuration for all indexesThe change correctly applies the locale-specific encoder to all three indexes (posts, authors, tags), ensuring consistent search behavior across all content types.
Also applies to: 111-111, 121-121
export default class SearchIndex { | ||
constructor({adminUrl, apiKey, dir}) { | ||
constructor({adminUrl, apiKey, dir, locale}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π‘ Verification agent
π§© Analysis chain
Verify the locale value format
Ensure that the locale value passed from App.js is compatible with the switch statement in chooseEncoder. Currently, it expects two-letter language codes ('en', 'fr', 'de'), but browser-provided locale strings might include region codes (e.g., 'en-US', 'fr-FR').
π Script executed:
#!/bin/bash
# Check for any locale handling code that might normalize locale values
rg -A 2 -B 2 "locale.*=|=.*locale" --glob "*.js" --glob "*.ts"
Length of output: 45616
Iβd like to inspect the chooseEncoder
implementation and constructor in apps/sodo-search/src/search-index.js
to confirm thereβs no locale normalization before the switch:
π Script executed:
#!/bin/bash
# Show the chooseEncoder function and constructor in search-index.js
rg -n "chooseEncoder" -A10 -B2 apps/sodo-search/src/search-index.js
Length of output: 1427
Normalize locale to two-letter codes before choosing an encoder
The locale
passed into chooseEncoder
comes directly from the script tag (e.g. "en-US"
) but the switch only matches exact two-letter codes ('en'
, 'fr'
, etc.). Youβll need to strip off any region subtags (and lowercase) so you always pass a supported code:
β’ File: apps/sodo-search/src/search-index.js
β Constructor (lines ~98β102) currently does:
js this.postsIndex = new Flexsearch.Document({ β¦, encoder: chooseEncoder(locale) });
β chooseEncoder
(lines 57β67) only handles exact matches like 'en'
and 'fr'
.
Suggested change:
constructor({adminUrl, apiKey, dir, locale}) {
- const encoder = chooseEncoder(locale);
+ const lang = locale.split(/[-_]/)[0].toLowerCase();
+ const encoder = chooseEncoder(lang);
this.postsIndex = new Flexsearch.Document({
β¦,
encoder
});
// repeat for authorsIndex, tagsIndex
This ensures "en-US"
, "fr-FR"
, etc. fall back to your two-letter presets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (5)
apps/sodo-search/src/search-index.test.js (5)
472-474
: Remove extra blank linesThere are excessive blank lines in these sections that violate your ESLint rules.
await searchIndex.init(); - - let searchResults = searchIndex.search('running');expect(searchResults.posts[0].url).toEqual('http://localhost/ghost/dem-mann/'); - - });Also applies to: 486-489
π§° Tools
πͺ ESLint
[error] 472-474: More than 1 blank line not allowed.
(no-multiple-empty-lines)
539-541
: Remove blank lines before closing braceThese extra blank lines violate your ESLint rules.
expect(searchResults.posts.length).toEqual(2); - - });π§° Tools
πͺ ESLint
[error] 539-541: Block must not be padded by blank lines.
(padded-blocks)
591-593
: Remove blank lines before closing braceThese extra blank lines violate your ESLint rules.
expect(searchResults.posts.length).toEqual(1); - - });π§° Tools
πͺ ESLint
[error] 591-593: Block must not be padded by blank lines.
(padded-blocks)
432-593
: Consider adding a French stemming testThe PR mentions support for French (fr) locale, but there's no test that specifically verifies French stemming behavior. Adding a test case for French would provide more complete coverage of the feature.
Consider adding a test similar to the English and German ones but with French-specific stemming examples, such as "informationnel"/"information" or other common French suffixes.
π§° Tools
πͺ ESLint
[error] 472-474: More than 1 blank line not allowed.
(no-multiple-empty-lines)
[error] 486-489: Block must not be padded by blank lines.
(padded-blocks)
[error] 488-489: More than 1 blank line not allowed.
(no-multiple-empty-lines)
[error] 539-541: Block must not be padded by blank lines.
(padded-blocks)
[error] 591-593: Block must not be padded by blank lines.
(padded-blocks)
432-434
: Improve test suite documentationThe comment is helpful but could be more detailed to explain the specific stemming differences being tested.
- // These tests illustrate differences in stemming between languages en and de. + // These tests illustrate differences in language-specific stemming: + // 1. In English: suffixes like "-ing" are removed, so "running" and "run" match the same documents + // 2. In German: noun case forms like "des Mannes" and "dem Mann" are stemmed to the same root + // 3. With unsupported locales: no language-specific stemming occurs
π Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
π Files selected for processing (1)
apps/sodo-search/src/search-index.test.js
(1 hunks)
π§° Additional context used
πͺ ESLint
apps/sodo-search/src/search-index.test.js
[error] 472-474: More than 1 blank line not allowed.
(no-multiple-empty-lines)
[error] 486-489: Block must not be padded by blank lines.
(padded-blocks)
[error] 488-489: More than 1 blank line not allowed.
(no-multiple-empty-lines)
[error] 539-541: Block must not be padded by blank lines.
(padded-blocks)
[error] 591-593: Block must not be padded by blank lines.
(padded-blocks)
β° Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Setup
π Additional comments (3)
apps/sodo-search/src/search-index.test.js (3)
435-489
: Well-structured test for English-specific stemmingThis test effectively demonstrates how English stemming treats words differently than other languages, showing how "running" and "run" both return the same results due to English suffix removal, while German words maintain their distinct forms.
π§° Tools
πͺ ESLint
[error] 472-474: More than 1 blank line not allowed.
(no-multiple-empty-lines)
[error] 486-489: Block must not be padded by blank lines.
(padded-blocks)
[error] 488-489: More than 1 blank line not allowed.
(no-multiple-empty-lines)
490-541
: Effective test for German-specific stemmingThis test nicely demonstrates how German stemming works differently from English, particularly with noun case forms like "des Mannes" and "dem Mann" which are correctly stemmed to the same root in German locale but not in English.
π§° Tools
πͺ ESLint
[error] 539-541: Block must not be padded by blank lines.
(padded-blocks)
542-593
: Good fallback test for unsupported localesThis test properly verifies the fallback behavior when an unsupported locale is specified, showing that custom stemming doesn't occur in this case.
π§° Tools
πͺ ESLint
[error] 591-593: Block must not be padded by blank lines.
(padded-blocks)
no issue