Remove aiocache dependency and use Streamlit's built-in caching decorators (st.cache_data for computations, st.cache_resource for singletons). This is the recommended approach for Streamlit apps.
New File: filebundler/services/cached_operations.py
- Action: Create new module with cached versions of expensive operations:
@st.cache_resourceforget_tiktoken_encoder()(singleton)@st.cache_data(ttl=600, max_entries=2000)forget_file_content(file_path: str, mtime: float)@st.cache_data(ttl=3600, max_entries=1000)forget_file_tokens(file_path: str, mtime: float)@st.cache_data(ttl=60, max_entries=100)forget_total_tokens(file_paths: tuple, mtimes: tuple)- Utility functions:
clear_file_caches(),get_cache_stats()
Key Pattern: All functions use (file_path_str, mtime) as cache key for automatic invalidation
File: filebundler/models/FileItem.py
-
Lines: 1-7 (imports)
-
Action: Add
from filebundler.services.cached_operations import get_file_content, get_file_tokens -
Lines: 52-55 (content property)
-
Action: Replace with:
@property def content(self): if self.path.is_file(): mtime = self.path.stat().st_mtime return get_file_content(str(self.path), mtime)
File: filebundler/models/FileItem.py
- Lines: 58-62 (tokens property)
- Action: Replace with:
@property def tokens(self): if self.path.is_file(): mtime = self.path.stat().st_mtime return get_file_tokens(str(self.path), mtime) else: return sum(fi.tokens for fi in self.children)
Note: Directory token sums benefit from cached file tokens (recursive optimization)
File: filebundler/services/token_count.py
-
Lines: 1-3 (imports)
-
Action: Add
from filebundler.services.cached_operations import get_tiktoken_encoder -
Lines: 11-23 (count_tokens function)
-
Action: Replace with:
def count_tokens(text: str, model: str = "o200k_base") -> int: """Count tokens using cached encoder.""" encoder = get_tiktoken_encoder(model) return len(encoder.encode(text))
File: filebundler/managers/SelectionsManager.py
-
Lines: 1-12 (imports)
-
Action: Add
from filebundler.services.cached_operations import get_total_tokens -
Lines: 54-57 (tokens property)
-
Action: Replace with:
@property def tokens(self): """Return total tokens of selected files (cached).""" selected = self.selected_file_items if not selected: return 0 # Create hashable tuples for cache key file_paths = tuple(str(fi.path) for fi in selected) mtimes = tuple(fi.path.stat().st_mtime for fi in selected) return get_total_tokens(file_paths, mtimes)
File: filebundler/FileBundlerApp.py
-
Lines: 40-50 (after self.file_items initialization)
-
Action: Add
self._highest_token_item: Optional[FileItem] = None -
Lines: 171-174 (in load_directory_recursive, after adding file to file_items)
-
Action: Add tracking logic:
# Track highest token item during loading if not file_item.is_dir: if not self._highest_token_item or file_item.tokens > self._highest_token_item.tokens: self._highest_token_item = file_item
-
Lines: 94-96 (highest_token_item property)
-
Action: Replace with:
@property def highest_token_item(self): return self._highest_token_item
File: pyproject.toml
- Lines: 7-16 (dependencies)
- Action: Add
"aiofiles>=24.1.0",
File: filebundler/FileBundlerApp.py
- Lines: 106-191
- Action: Create new async version
async def _load_directory_recursive_async():- Use
aiofilesfor async file I/O - Use
asyncio.gather()to process subdirectories in parallel - Use
asyncio.to_thread()for CPU-bound operations (pattern matching, sorting) - Wrap in sync function:
def load_directory_recursive()that callsasyncio.run()
- Use
Expected Impact: 50-70% faster loading for large projects (1000+ files)
File: pyproject.toml
- Action: Remove aiocache from dependencies if it's not used for anything else
New Feature: Add button to clear caches for debugging
- Location: Settings or debug panel
- Action: Call
st.cache_data.clear()or specific cache clear functions
- Verify token counts match before/after optimization
- Test file modification detection (change file, verify cache invalidates)
- Test with large project (1000+ files)
- Verify memory usage stays reasonable
- Test selection changes update token counts correctly
- Verify highest_token_item is accurate
- Test with non-UTF8 files (error handling)
- Verify file tree renders faster
- Test token ranking tab performance
| Operation | Before | After Phase 1-5 | After Phase 6 |
|---|---|---|---|
| Project Load (1000 files) | 8-12s | 3-5s | 1-2s |
| File Tree Render | 2-4s | 0.1-0.2s | 0.1-0.2s |
| Token Ranking Tab | 3-6s | 0.2-0.4s | 0.2-0.4s |
| Selection Changes | 1-2s | 0.1-0.15s | 0.1-0.15s |
| Subsequent Reruns | Same as initial | Near-instant | Near-instant |
Overall: 60-85% reduction in latency for most operations
- Streamlit-native: Uses built-in caching designed for Streamlit's execution model
- Automatic invalidation: mtime-based keys ensure stale data never served
- Zero async complexity: No event loop management in Phase 1-5
- Memory efficient: Streamlit handles serialization and eviction automatically
- Thread-safe: Streamlit's caching is inherently thread-safe
- Simple implementation: Minimal code changes, mostly wrapper functions
Recommended: Phases 1-5 first (2.5 hours), then test and measure. Only do Phase 6 if async loading is needed for very large projects.
Quick wins: Phase 1-3 give ~80% of the benefit in ~1.5 hours