Skip to content

Commit 6cfe224

Browse files
Kavirubcclaude
andauthored
feat: Implement 'index' command for bulk issue backfilling (#19)
* feat(github): add ListIssues and ListComments methods * feat(utils): add text chunker for embedding optimization * feat(cli): add 'index' command for bulk issue backfilling * chore: ignore local binaries and configs * feat: Introduce `.simili.yaml` configuration file and adjust `.gitignore` to track it and `.simili-index-checkpoint.json`. * fix: Address linter issues in index command and text chunker - Check error return from MarkFlagRequired - Remove unused checkpointFile constant - Remove unused len() method from RecursiveCharacterSplitter - Remove unused unicode/utf8 import Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: Address Copilot review comments on PR #19 **Error Handling:** - Add descriptive error wrapping to ListIssues and ListComments methods **Robustness:** - Implement pagination for comment fetching (handles >100 comments) - Add nil check for deleted users in comment processing - Handle empty issue bodies and comments gracefully - Pass dryRun as parameter instead of using global variable **Text Chunking:** - Implement proper overlap logic in RecursiveCharacterSplitter - Preserve context across chunks using ChunkOverlap configuration Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * docs: Update session log with details on Copilot review fixes and current verification status. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent ea5c6bf commit 6cfe224

8 files changed

Lines changed: 645 additions & 0 deletions

File tree

Lines changed: 169 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,169 @@
1+
# Development Session - 2026-02-02 00:00
2+
3+
## Overview
4+
- **Start Time**: 2026-02-02 00:00
5+
- **Branch**: feature/bulk-index-0.1.0v
6+
- **Status**: Active
7+
8+
## Goals
9+
- Bug fixes and refinements
10+
- Address issues and refine existing functionality
11+
12+
## Progress
13+
14+
### Tasks Completed
15+
- Fixed linter issues from PR #19 (commit d923ce7):
16+
- Added error check for `MarkFlagRequired` in index command
17+
- Removed unused `checkpointFile` constant
18+
- Removed unused `len()` method from RecursiveCharacterSplitter
19+
- Removed unused `unicode/utf8` import from chunker
20+
21+
- Fixed Copilot review comments (commit a1008d1):
22+
- Added error wrapping to ListIssues/ListComments for better debugging
23+
- Implemented pagination for comment fetching (handles issues with >100 comments)
24+
- Added nil check for deleted users in comment processing
25+
- Handle empty issue bodies and comments gracefully
26+
- Implemented proper overlap logic in text chunker
27+
- Pass dryRun as parameter instead of using global variable
28+
29+
- All tests passing, code verified with go vet and go build
30+
31+
### Current Work
32+
- All linter issues resolved
33+
- CI should pass on next run
34+
35+
### Notes
36+
- GitHub Actions lint job was failing with 3 errors (errcheck, unused)
37+
- All errors addressed in single commit
38+
39+
## Next Steps
40+
- Monitor CI run to ensure all checks pass
41+
- Ready for PR merge once CI is green
42+
43+
---
44+
45+
## Session Summary
46+
47+
**End Time**: 2026-02-02 (Session Duration: ~1 hour)
48+
49+
### Git Changes Summary
50+
- **Total Commits**: 2 (d923ce7, a1008d1)
51+
- **Files Changed**: 5 files
52+
- Modified: cmd/simili/commands/index.go (+60/-36 lines)
53+
- Modified: internal/integrations/github/client.go (+12/-2 lines)
54+
- Modified: internal/utils/text/chunker.go (+32/-9 lines)
55+
- Added: .claude/sessions/.current-session (1 line)
56+
- Added: .claude/sessions/2026-02-02-0000.md (34 lines)
57+
- **Net Changes**: +103 insertions, -36 deletions
58+
59+
**Final Git Status**: Clean (session file modified)
60+
61+
### Commits Made
62+
1. **d923ce7** - `fix: Address linter issues in index command and text chunker`
63+
2. **a1008d1** - `fix: Address Copilot review comments on PR #19`
64+
65+
### Changed Files by Type
66+
- **index.go** (cmd/simili/commands/): Error handling, pagination, nil safety, parameter passing
67+
- **client.go** (internal/integrations/github/): Error wrapping for better debugging
68+
- **chunker.go** (internal/utils/text/): Proper overlap implementation
69+
70+
### Key Accomplishments
71+
72+
#### Linter Issues Fixed (3 issues)
73+
1. ✅ Added error check for `indexCmd.MarkFlagRequired("repo")` - prevents silent failures
74+
2. ✅ Removed unused `checkpointFile` constant - code hygiene
75+
3. ✅ Removed unused `len()` method and `unicode/utf8` import - dead code elimination
76+
77+
#### Copilot Review Issues Fixed (6 critical issues)
78+
1.**Error Handling**: Added descriptive error wrapping to `ListIssues` and `ListComments` methods, consistent with codebase patterns
79+
2.**Pagination Support**: Implemented full pagination for comment fetching in `processIssue` function - critical for issues with >100 comments
80+
3.**Nil Safety**: Added nil check for `c.User` to prevent panic from deleted GitHub users
81+
4.**Empty Content Handling**: Skip empty issue bodies and comments to avoid inflating chunk counts
82+
5.**Chunk Overlap Logic**: Properly implemented overlap in `RecursiveCharacterSplitter` using rune-based calculation to preserve context across chunks
83+
6.**Testability**: Changed `processIssue` to accept `dryRun` as parameter instead of using global variable
84+
85+
### Features Implemented
86+
- **Robust Comment Fetching**: Full pagination support prevents data loss on high-activity issues
87+
- **Production-Ready Error Handling**: All GitHub API calls now return wrapped errors with context
88+
- **Context-Preserving Chunking**: Overlap logic maintains semantic continuity across embedding chunks
89+
- **Deleted User Handling**: Graceful fallback to "deleted-user" for nil user objects
90+
91+
### Problems Encountered & Solutions
92+
93+
**Problem 1: GitHub Actions Lint Failing**
94+
- Error: `errcheck` flagging unchecked `MarkFlagRequired` return value
95+
- Solution: Wrapped call in error check with `log.Fatalf` on failure
96+
97+
**Problem 2: Incomplete Chunk Overlap**
98+
- Error: Copilot identified overlap logic was commented out and not implemented
99+
- Solution: Implemented rune-based overlap calculation to preserve last N characters across chunks
100+
101+
**Problem 3: Pagination Missing**
102+
- Error: Comment fetching limited to first 100 comments per issue
103+
- Solution: Added pagination loop similar to issue fetching pattern
104+
105+
**Problem 4: Potential Nil Panic**
106+
- Error: `c.User.GetLogin()` could panic on deleted users
107+
- Solution: Added nil check with "deleted-user" fallback
108+
109+
### Testing & Verification
110+
-`go vet ./...` - passes
111+
-`go build -v ./...` - builds successfully
112+
-`go test ./... -short` - all tests pass
113+
- ✅ Local validation complete
114+
115+
### Dependencies
116+
- No dependencies added or removed
117+
- Existing dependencies unchanged
118+
119+
### Configuration Changes
120+
- No configuration file changes
121+
- Session tracking files added to `.claude/sessions/`
122+
123+
### Breaking Changes
124+
- **None** - All changes are backward compatible
125+
- Function signature change for `processIssue` is internal (not exported)
126+
127+
### Code Quality Improvements
128+
- Consistent error handling patterns across GitHub client methods
129+
- Better separation of concerns (parameter passing vs global state)
130+
- Memory-safe rune handling for multi-byte UTF-8 characters in overlap logic
131+
- Defensive coding for edge cases (nil users, empty content)
132+
133+
### Important Findings
134+
1. **Codebase uses mixed GitHub username case**: 29 files use "Kavirubc" vs 4 use "kavirubc" - kept majority pattern, but this should be standardized in future
135+
2. **Original checkpoint logic was intentionally omitted**: Code comments indicate checkpoint feature deferred to later version
136+
3. **GitHub API pagination is critical**: Issues can easily exceed 100 comments in active repositories
137+
138+
### What Wasn't Completed
139+
- GitHub username case inconsistency not fixed (low priority cosmetic issue)
140+
- Checkpoint functionality still not implemented (intentionally deferred per original design)
141+
142+
### Lessons Learned
143+
1. Always check linter output before pushing - caught 3 issues early
144+
2. Copilot reviews provide valuable edge case coverage (nil checks, pagination)
145+
3. Pagination should be default assumption for any GitHub API list operation
146+
4. Rune-based string manipulation essential for proper UTF-8 handling in Go
147+
148+
### Tips for Future Developers
149+
1. **Testing Large Issues**: Test index command with issues that have >100 comments to verify pagination
150+
2. **Chunk Overlap Tuning**: Default overlap is 200 chars - may need adjustment based on embedding model context window
151+
3. **Error Messages**: All GitHub client methods now include org/repo/issue context in errors for easier debugging
152+
4. **Deleted Users**: Always check `User != nil` before accessing GitHub user fields
153+
5. **Global Variables**: Avoid using global flags directly in worker functions - pass as parameters for testability
154+
6. **CI/CD**: golangci-lint catches more issues than go vet - always run both locally before pushing
155+
156+
### PR Status
157+
- **PR #19**: Ready for merge once CI passes
158+
- **Branch**: feature/bulk-index-0.1.0v
159+
- **Files Ready**: All changes committed and pushed
160+
- **Tests**: All passing locally
161+
- **Next Action**: Monitor GitHub Actions CI run
162+
163+
### Session Metrics
164+
- **Lines Added**: 103
165+
- **Lines Removed**: 36
166+
- **Files Modified**: 5
167+
- **Commits**: 2
168+
- **Issues Fixed**: 9 (3 linter + 6 Copilot)
169+
- **Tests**: All passing (7 test suites)

.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,3 +31,8 @@ go.work.sum
3131
# .idea/
3232
# .vscode/
3333
.env.local
34+
35+
# Local binaries and configs
36+
simili
37+
38+

.simili.yaml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
qdrant:
2+
url: ${QDRANT_URL}
3+
api_key: ${QDRANT_API_KEY}
4+
collection: ${QDRANT_COLLECTION}
5+
embedding:
6+
provider: gemini
7+
api_key: ${GEMINI_API_KEY}
8+
defaults:
9+
similarity_threshold: 0.7
10+
max_similar_to_show: 5

0 commit comments

Comments
 (0)