-
-
Notifications
You must be signed in to change notification settings - Fork 251
Feature/fuzzy search logic #556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/fuzzy search logic #556
Conversation
WalkthroughThis pull request introduces fuzzy search functionality across the system. On the backend, a new query parameter triggers fuzzy matching using a Levenshtein-based algorithm, and the search logic in the repository is updated accordingly. The frontend is modified to support the new parameter through updated type definitions, UI elements, and localization. Additionally, several shell scripts have been added to automate branch creation, testing branch updates, and branch organization for feature development and Docker testing. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant FE as Frontend (items.vue)
participant AH as API Handler (v1_ctrl_items.go)
participant Repo as Repository (repo_items.go)
participant FM as FuzzyMatch (fuzzy_search.go)
U->>FE: Enters search query and toggles fuzzy search
FE->>AH: Sends GET request with fuzzySearch parameter
AH->>Repo: Calls QueryByGroup with fuzzySearch flag enabled
Repo->>FM: Invokes FuzzyMatch for each item attribute
FM-->>Repo: Returns similarity results
Repo-->>AH: Returns filtered search results
AH-->>FE: Sends API response with matching items
FE->>U: Renders search results
Security Recommendations
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (10)
🧰 Additional context used🪛 Shellcheck (0.10.0)update-and-organize-branches.sh[error] 103-103: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @). (SC2199) [warning] 103-103: Remove quotes from right-hand side of =~ to match as a regex rather than literally. (SC2076) 🔇 Additional comments (16)
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 3
🧹 Nitpick comments (11)
frontend/lib/api/classes/items.ts (1)
31-31: LGTM: Proper type definition added for fuzzy search.The addition of the optional boolean property for fuzzy search is well-typed and correctly positioned in the ItemsQuery interface.
Consider adding a JSDoc comment to document what this property controls and its default behavior:
/** * When true, enables fuzzy matching for search queries to find similar results * that don't exactly match the search term. Defaults to false. */ fuzzySearch?: boolean;Always ensure fuzzy search implementation has appropriate rate limiting to prevent potential DoS attacks from resource-intensive string comparisons.
backend/app/api/handlers/v1/v1_ctrl_items.go (1)
34-34: API documentation properly added for the new parameter.The Swagger documentation correctly specifies the new fuzzy search parameter.
Consider adding more detail to the description, such as explaining the default behavior and performance implications:
// @Param fuzzySearch query bool false "Enable fuzzy matching for more flexible search results (may impact performance)"backend/internal/data/repo/repo_items.go (4)
370-370: Consider making the fuzzy threshold configurable.The hardcoded threshold of 0.7 might not be optimal for all scenarios. Different data sets might require different thresholds for optimal search results.
Consider refactoring to make this configurable:
- const fuzzyThreshold = 0.7 // Adjust threshold as needed (0.0 to 1.0) + fuzzyThreshold := 0.7 // Default threshold + if config.FuzzySearchThreshold > 0 { + fuzzyThreshold = config.FuzzySearchThreshold + }
372-381: Security recommendation: Add rate limiting for fuzzy searches.Fuzzy searches are more computationally expensive than regular searches. Without rate limiting, this could potentially be exploited for a denial of service attack by repeatedly making fuzzy search requests with different parameters.
Consider implementing rate limiting specifically for fuzzy search operations to protect against abuse.
373-378: Consider adding logging for rejected fuzzy matches.For debugging purposes, it would be helpful to log when items are close to but don't quite meet the fuzzy threshold, especially during initial deployment.
for _, i := range fuzzyItems { + nameScore := FuzzyMatchScore(i.Name, q.Search) + descScore := FuzzyMatchScore(i.Description, q.Search) + // ... other scores + + log.Debug(). + Str("itemName", i.Name). + Str("itemID", i.ID.String()). + Str("query", q.Search). + Float64("nameScore", nameScore). + Float64("descScore", descScore). + Msg("Fuzzy match scores") + if FuzzyMatch(i.Name, q.Search, fuzzyThreshold) || FuzzyMatch(i.Description, q.Search, fuzzyThreshold) || FuzzyMatch(i.SerialNumber, q.Search, fuzzyThreshold) || FuzzyMatch(i.ModelNumber, q.Search, fuzzyThreshold) || FuzzyMatch(i.Manufacturer, q.Search, fuzzyThreshold) || FuzzyMatch(i.Notes, q.Search, fuzzyThreshold) { fuzzyPredicates = append(fuzzyPredicates, item.ID(i.ID)) } }
355-394:❓ Verification inconclusive
Performance concern with current fuzzy search implementation.
The current implementation fetches all items from the database and then filters them in memory. While this works, it could lead to performance issues with large datasets as:
- It loads all items into memory at once
- The filtering happens in application code rather than at the database level
Consider these alternatives for better performance:
- Implement fuzzy matching directly in the database using PostgreSQL's pg_trgm extension
- Add pagination to the fuzzy search results to limit memory usage
- Make the fuzzy threshold configurable rather than hardcoded to 0.7
Additionally, I recommend adding metrics logging to monitor the performance impact of fuzzy searches versus regular searches.
🏁 Script executed:
#!/bin/bash # Check if this is a PostgreSQL database and if pg_trgm extension could be used # Look for database connection configuration grep -r "postgres" --include="*.go" backend/internal/data/ # Check if we're using PostgreSQL and what ORM/database libraries grep -r "database/sql" --include="*.go" backend/internal/ grep -r "ent" --include="*.go" backend/internal/data/Length of output: 66376
Fuzzy Search Performance & Security Improvements
It’s important to note that the current fuzzy search implementation fetches all items into memory and then applies filtering in application code. This could lead to performance and scalability issues on large datasets. Moreover, while the suggestion to use PostgreSQL’s pg_trgm extension is valid for production environments, our grep results hint that tests run with SQLite. Please manually verify the production database configuration before proceeding.
Recommendations:
- Database Integration: If using PostgreSQL in production, leverage the pg_trgm extension to perform fuzzy matching directly in SQL.
- Pagination: Introduce pagination for fuzzy search results to mitigate excessive memory usage.
- Configurable Threshold: Externalize the fuzzy threshold (currently hardcoded to 0.7) to enable tuning.
- Metrics & Logging: Implement dedicated metrics logging to monitor the performance impact of fuzzy searches versus standard searches.
- Security Advisory: Sanitize and validate the search input rigorously to avoid potential injection risks.
scripts/create-feature-test-branch.sh (2)
1-80: Thorough shell script to create testing branches.This script provides a clean way to create test branches for Docker testing of features. The error handling is good, with proper checks for branch existence and graceful failure handling.
I do notice a security consideration:
The script doesn't sanitize the input parameter, which could potentially lead to command injection if special characters are used in the branch name. Consider adding input validation to ensure the branch name follows Git naming conventions.
+ # Validate branch name format + if [[ ! $FEATURE_BRANCH =~ ^[a-zA-Z0-9_/-]+$ ]]; then + echo "Error: Invalid branch name format. Use only alphanumeric characters, underscores, hyphens and slashes." + exit 1 + fi
59-75: Consider using a better method for cherry-picking changes.The script is currently only trying to cherry-pick the latest commit from the feature branch. If the feature branch has multiple commits, this might not be sufficient.
Consider cherry-picking all commits from where the feature branch diverged from the base:
- # Get the commit hash of the latest commit in the feature branch - FEATURE_COMMIT=$(git rev-parse $FEATURE_BRANCH) - - # Try cherry-picking - if git cherry-pick $FEATURE_COMMIT; then + # Find common ancestor + COMMON_ANCESTOR=$(git merge-base $BASE_BRANCH $FEATURE_BRANCH) + + # Get list of commits to cherry-pick + COMMITS_TO_PICK=$(git rev-list --reverse $COMMON_ANCESTOR..$FEATURE_BRANCH) + + # Try cherry-picking all commits + if [ -n "$COMMITS_TO_PICK" ] && git cherry-pick $COMMON_ANCESTOR..$FEATURE_BRANCH; thenscripts/create-fuzzy-search-branch.sh (1)
1-120: Well-structured script for creating the fuzzy search feature branch.The script is well-organized with clear separation of concerns, helpful documentation, proper error handling, and cleanup mechanisms. I particularly like the trap setup to ensure we return to the original branch even if an error occurs.
Security consideration:
The script creates the branch from main without verifying if the user has the latest main branch, which could lead to divergence issues. Consider adding a verification step to ensure the local main branch is up-to-date with remote before creating the feature branch.
git checkout "$MAIN_BRANCH" git pull origin "$MAIN_BRANCH" + # Verify we have latest main + LOCAL_MAIN_SHA=$(git rev-parse HEAD) + REMOTE_MAIN_SHA=$(git rev-parse origin/"$MAIN_BRANCH") + if [ "$LOCAL_MAIN_SHA" != "$REMOTE_MAIN_SHA" ]; then + echo "Warning: Your local $MAIN_BRANCH is not up-to-date with remote." + read -p "Continue anyway? (y/n) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo "Aborting." + exit 1 + fi + fi git checkout -b "$FEATURE_BRANCH"scripts/update-docker-testing-branch.sh (2)
1-60: Good automation script for managing Docker testing branches.This script provides a seamless way to update the Docker testing branch with multiple feature branches. The error handling for merge conflicts is especially good.
Security consideration:
The script is directly merging feature branches without verifying their content or running tests. Consider adding a verification step before merging each feature branch:
merge_feature() { local branch=$1 echo "Merging $branch into $DOCKER_BRANCH..." + # Verify branch doesn't contain potentially problematic changes + echo "Checking branch $branch for security issues..." + git diff --name-only $MAIN_BRANCH..$branch | grep -q "docker/" && { + echo "⚠️ Warning: Branch $branch modifies Docker-related files!" + read -p "Continue with merge? (y/n) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo "Skipping branch $branch" + return 1 + fi + } + # Try to merge, but don't fail the script if there are conflicts if git merge --no-ff $branch -m "Merge $branch into $DOCKER_BRANCH for testing"; then
40-46: Consider dynamically determining feature branches.The script currently uses a hardcoded list of feature branches to merge. This might require frequent updates as new features are developed.
Consider adding an option to automatically detect feature branches that match certain criteria:
# List of feature branches to merge # Add or remove branches as needed FEATURE_BRANCHES=( "feature/favorite-items" # Add more feature branches here # "feature/another-feature" ) + # Option to dynamically find feature branches + if [ "$1" == "--auto-detect" ]; then + echo "Auto-detecting feature branches..." + # Find all local branches matching feature/* pattern that have commits newer than main + AUTO_BRANCHES=$(git for-each-ref --format='%(refname:short)' refs/heads/feature/* | + while read branch; do + if [ "$(git rev-list -1 $MAIN_BRANCH..$branch)" ]; then + echo "$branch" + fi + done) + + if [ -n "$AUTO_BRANCHES" ]; then + read -p "Found these feature branches to merge: $AUTO_BRANCHES. Use these instead? (y/n) " -n 1 -r + echo + if [[ $REPLY =~ ^[Yy]$ ]]; then + FEATURE_BRANCHES=($AUTO_BRANCHES) + fi + fi + fi
🛑 Comments failed to post (3)
update-and-organize-branches.sh (2)
72-72: 💡 Verification agent
🧩 Analysis chain
Consider adding error handling for git push failures.
The script attempts to delete remote branches but doesn't handle authentication or permission failures specifically.
🏁 Script executed:
#!/bin/bash # Check if git credentials are properly configured if ! git config --get credential.helper >/dev/null; then echo "WARNING: Git credential helper not configured. Push operations may require manual authentication." fi # Check if remote connectivity works if ! git ls-remote --heads origin >/dev/null 2>&1; then echo "ERROR: Cannot connect to remote 'origin'. Check network connectivity and authentication." exit 1 fiLength of output: 272
Action: Enhance error handling for remote branch deletion
- The current command
only prints a general message on failure without properly handling authentication or network issues.git push origin --delete custom-docker-image || echo "Note: Could not delete remote branch custom-docker-image. It may not exist or you may not have permission."- The executed script output showed a warning:
which implies that authentication failures might occur unnoticed.WARNING: Git credential helper not configured. Push operations may require manual authentication.- Consider capturing the actual exit status of the
git pushoperation and providing more descriptive error handling. For instance, check for specific failure codes related to authentication or permission issues, and log these failures accordingly.- Security Recommendation: Ensure that remote branch deletion errors are handled in a way that logs sensitive failures securely, prompting a review of git credentials and network connectivity, to avoid unexpected security oversights.
103-103: 🛠️ Refactor suggestion
Fix shell syntax for array membership check.
The current syntax for checking array membership has issues according to shellcheck.
Apply this fix to properly check if a value is in an array:
-elif [[ " ${FEATURE_BRANCHES[@]} " =~ " ${CURRENT_BRANCH} " ]]; then +elif [[ " ${FEATURE_BRANCHES[*]} " =~ " ${CURRENT_BRANCH} " ]]; thenAlternatively, you could use a loop for clearer array membership checking:
-elif [[ " ${FEATURE_BRANCHES[@]} " =~ " ${CURRENT_BRANCH} " ]]; then +elif is_in_array "${CURRENT_BRANCH}" "${FEATURE_BRANCHES[@]}"; then # Add this function at the top of the script +is_in_array() { + local item="$1" + shift + local array=("$@") + for element in "${array[@]}"; do + if [[ "$element" == "$item" ]]; then + return 0 + fi + done + return 1 +}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.elif [[ " ${FEATURE_BRANCHES[*]} " =~ " ${CURRENT_BRANCH} " ]]; thenis_in_array() { local item="$1" shift local array=("$@") for element in "${array[@]}"; do if [[ "$element" == "$item" ]]; then return 0 fi done return 1 }elif is_in_array "${CURRENT_BRANCH}" "${FEATURE_BRANCHES[@]}"; then🧰 Tools
🪛 Shellcheck (0.10.0)
[error] 103-103: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).
(SC2199)
[warning] 103-103: Remove quotes from right-hand side of =~ to match as a regex rather than literally.
(SC2076)
frontend/pages/items.vue (1)
81-83: 💡 Verification agent
🧩 Analysis chain
Ensure fuzzy search parameter is properly tested with various inputs.
The fuzzy search parameter is parsed from URL and compared to the string 'true', which could lead to unexpected behavior with malformed input.
🏁 Script executed:
#!/bin/bash # Test fuzzy search parameter parsing with various inputs echo "Testing fuzzy search parameter parsing..." echo "Testing with valid values:" echo "'true' should be parsed as true: $('true' === 'true' ? 'PASS' : 'FAIL')" echo "'false' should be parsed as false: $('false' === 'true' ? 'FAIL' : 'PASS')" echo "Testing with invalid values:" echo "'1' parsed as: $('1' === 'true' ? 'true' : 'false')" echo "'yes' parsed as: $('yes' === 'true' ? 'true' : 'false')" echo "'TRUE' parsed as: $('TRUE' === 'true' ? 'true' : 'false')" echo "Consider using a more robust parsing approach:" cat << 'EOF' // Better boolean parsing function parseBoolParam(value: string | null): boolean { if (!value) return false; return ['true', '1', 'yes'].includes(value.toLowerCase()); } // Usage fuzzySearch.value = parseBoolParam(route.query.fuzzySearch as string); EOFLength of output: 1354
Action Required: Revise Fuzzy Search Parameter Parsing for Robustness & Security
Please update the fuzzy search parameter parsing as follows:
- Instead of directly comparing the URL parameter to
'true', use a helper function (e.g.,parseBoolParam) that accepts alternative truthy values such as'1'and'yes'while handling case variations.- Thoroughly test this function with various inputs (e.g.,
'true','false','1','yes','TRUE', etc.) to prevent unexpected behavior for malformed input.- Security recommendation: Sanitize and validate the URL parameter early to ensure that unexpected or malicious input cannot lead to injection vulnerabilities or other security issues.
Suggested Implementation Example:
// Helper boolean parsing function: function parseBoolParam(value: string | null): boolean { if (!value) return false; return ['true', '1', 'yes'].includes(value.toLowerCase()); } // Usage within your code: fuzzySearch.value = parseBoolParam(route.query.fuzzySearch as string);Please integrate this change along with comprehensive tests to cover all edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you closed the last PR, but you did not answer my question. So I am going to put it here again:
Out of curiosity, what order are results returned in? To me it makes sense for exact matches to be returned before fuzzy matches. Fuzzy search that is too fuzzy is frustrating and I often find it useless. EG if I am searching for a USB A to A cable and it returns USB C results first then the search has failed as far as I am concerned.
I also see that you have added a toggle to turn the fuzzy search off in this PR, which does not really adddress the problem.
I have added a review comment. But none of this is up to me, I am not in charge around here.
| await search(); | ||
| // Trigger a search with the reset values | ||
| search(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not awaited any more?
Hey! I'm a Github novice I'm afraid (well, not at Github, but at working on collaborative codebases here). I closed the other PRs because I realised (sadly too late) that I was working off an outdated fork that was significantly behind the current project. Rather than get bogged down trying to fix individual conflicts, I thought it would be quicker to just close those out and open new ones. I didn't see your comment - sorry (I was mostly looking for the test results and between all the bot comments I must have missed them). Thank you for persevering and writing it again. ImplementationIt's added under the options. For example, keyword search for USB A with lowercase Toggling on the fuzzy search adds one result Generally I find that it adds just a couple more results based on minor discrepancies in spelling. In the previous PR, I added another search "enhancement" that widened the logic more significantly but also resulted in an unacceptable amount of false positives (sorry, I discovered this only in retrospect after trying it out for a few days, I guess another reason to close the PR!) |
Agree completely. But I think that the new implementation is a good compromise: By default, it's disabled; user can enable it when you know that the search is, for sure, missing something. Perhaps more ideally there would be a granularity setting but ... I think that would be overcomplicating things. The ordering logic isn't changed: |
Fair enough. For future reference you could have technically done a force push to rewrite history (Just don't rewrite commits you did not make/have already been merged) and clean up the PR. As you have full control over your fork up until it gets merged. Although you would still have to merge all the missing commits into your fork somehow.
I personally think making it a toggle makes things more complicated, at least UX wise as there is already a lot of hidden options. Adding another can be overwhelming for people (especially in your screenshot as there are now 7 options in that pop up, and presumably other people will add more later). This would probably fit better as a search operator. EG You could do
Excellent. |
Yeah, I would say that the search by asset ID PR that I opened would actually be a more pragmatic and useful functionality for most users. Another feature that would be great and solve for much of this is a favorites list. |



Adds fuzzy search logic to make it easier to find items in the inventory