Commit 7c382f3

wesm and claude committed
perf: Use Polars operations in get_all_merchants_for_autocomplete
**Optimization:** Replaced Python set operations with Polars vectorized operations in get_all_merchants_for_autocomplete().

**Before (slow with thousands of merchants):**

```python
merchants_set = set(self.all_merchants)
current_merchants = self.df["merchant"].unique().to_list()
merchants_set.update(current_merchants)
return sorted(merchants_set)
```

- Two conversions to Python (to_list, set)
- Python set operations
- Python sorted()

**After (fast):**

```python
cached_series = pl.Series("merchant", self.all_merchants)
current_series = self.df["merchant"].unique()
all_merchants = pl.concat([cached_series, current_series]).unique().sort()
return all_merchants.to_list()
```

- One conversion to Polars Series (cached merchants)
- Polars concat, unique, sort (vectorized)
- Single to_list() at the end

**Performance Impact:**

This method is called every time the edit merchant modal is opened, so optimization here directly improves UI responsiveness.

With 1000+ merchants:

- Polars concat: O(n) in Rust
- Polars unique: Hash-based, optimized
- Polars sort: Optimized sorting algorithm
- Much faster than Python set operations

**Other Findings:**

Reviewed other sorted/set operations in codebase:

- save_merchants_cache: sorted(set()) is appropriate (preparing for JSON)
- SelectCategoryScreen: sorting dict items, not DataFrame data
- Demo backend: small data, not performance critical
- duplicate_detector: small groups, not critical
- categories.py: small lists, not critical

**Testing:**

- All 765 tests pass
- No behavior changes - purely performance optimization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 9fda8b0 commit 7c382f3

1 file changed: +10 -6 lines changed

moneyflow/data_manager.py

Lines changed: 10 additions & 6 deletions
@@ -278,15 +278,19 @@ def get_all_merchants_for_autocomplete(self) -> List[str]:
         Returns:
             Sorted, deduplicated list of all merchants
         """
-        # Start with cached merchants
-        merchants_set = set(self.all_merchants)
+        # Use Polars operations for performance with large merchant lists
+        # Convert cached merchants to Series
+        cached_series = pl.Series("merchant", self.all_merchants)
 
-        # Add merchants from currently loaded transactions
+        # Merge with current merchants if we have loaded data
         if self.df is not None and not self.df.is_empty():
-            current_merchants = self.df["merchant"].unique().to_list()
-            merchants_set.update(current_merchants)
+            current_series = self.df["merchant"].unique()
+            # Concatenate and deduplicate using Polars
+            all_merchants = pl.concat([cached_series, current_series]).unique().sort()
+        else:
+            all_merchants = cached_series.unique().sort()
 
-        return sorted(merchants_set)
+        return all_merchants.to_list()
 
     async def fetch_all_data(
         self,
