feat: parallelization #352
base: develop
Changes from 11 commits
Diff hunk: @@ -0,0 +1,59 @@

    export interface WorkerConfig {
      workerIndex: number;
      startIndex: number;
      endIndex: number;
      batchSize: number;
      pageUrls: string[];
      listSelector: string;
      fields: any;
      pagination: {
        type: string;
        selector: string;
      };
    }

    export interface SharedState {
      totalScraped: number;
      results: any[];
    }

Comment on lines +15 to +18

💡 Codebase verification

Thread safety and type safety concerns are valid issues that need addressing.

The codebase shows:

Suggested improvements:

🔗 Analysis chain

Consider thread safety and type safety for SharedState.

The shared state between workers needs careful consideration:

Consider using a thread-safe data structure or implementing proper synchronization mechanisms. Also, define a generic type parameter:

    -export interface SharedState {
    +export interface SharedState<T = unknown> {
       totalScraped: number;
    -  results: any[];
    +  results: T[];
     }

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

Script:

    #!/bin/bash
    # Find all files importing or using SharedState
    rg "SharedState" -l
    # Look for actual usage patterns of SharedState
    rg "SharedState" -A 5 -B 5
    # Check for any thread synchronization patterns
    rg -i "sync|mutex|lock|atomic" maxun-core/src/
    # Look for type definitions related to worker results
    ast-grep --pattern 'type $_ = $_' maxun-core/src/types/

Length of output: 12076
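
One way the suggestion above could look in practice is sketched below. It is an illustration only, not code from this PR: `AtomicCounter` and `mergeWorkerResults` are hypothetical names, and the sketch assumes the scraped-item count is the only value workers update concurrently, while result arrays are merged by a single owner (for example the main thread). The counter relies on the standard `SharedArrayBuffer` and `Atomics` APIs.

```typescript
// Generic version of SharedState, as proposed in the review comment.
interface SharedState<T = unknown> {
  totalScraped: number;
  results: T[];
}

// Hypothetical helper: a counter stored in shared memory.
// The underlying SharedArrayBuffer can be passed to other threads,
// and Atomics.add keeps each increment indivisible.
class AtomicCounter {
  readonly buffer: SharedArrayBuffer;
  private readonly view: Int32Array;

  constructor(buffer?: SharedArrayBuffer) {
    this.buffer = buffer ?? new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
    this.view = new Int32Array(this.buffer);
  }

  // Adds delta and returns the new total.
  // Atomics.add itself returns the value held before the addition.
  add(delta: number): number {
    return Atomics.add(this.view, 0, delta) + delta;
  }

  get value(): number {
    return Atomics.load(this.view, 0);
  }
}

// Hypothetical helper: merges one worker's batch into the state without
// mutating the previous object. Assumes a single owner performs the merge.
function mergeWorkerResults<T>(state: SharedState<T>, batch: T[]): SharedState<T> {
  return {
    totalScraped: state.totalScraped + batch.length,
    results: [...state.results, ...batch],
  };
}
```

Two `AtomicCounter` instances built over the same buffer observe the same total, which is the property a per-worker counter would need. Whether the PR's workers actually share memory or only exchange messages is not visible in this diff.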

    export interface WorkerProgressData {
      percentage: number;
      currentUrl: string;
      scrapedItems: number;
      timeElapsed: number;
      estimatedTimeRemaining: number;
      failures: number;
      performance: PerformanceMetrics;
    }

    export interface PerformanceMetrics {
      startTime: number;
      endTime: number;
      duration: number;
      pagesProcessed: number;
      itemsScraped: number;
      failedPages: number;
      averageTimePerPage: number;
      memoryUsage: {
        heapUsed: number;
        heapTotal: number;
        external: number;
        rss: number;
      };
      cpuUsage: {
        user: number;
        system: number;
      };
    }

    export interface GlobalMetrics {
      totalPagesProcessed: number;
      totalItemsScraped: number;
      totalFailures: number;
      workersActive: number;
      averageSpeed: number;
      timeElapsed: number;
      memoryUsage: NodeJS.MemoryUsage;
      cpuUsage: NodeJS.CpuUsage;
    }

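
To make the `startIndex`, `endIndex`, and `pageUrls` fields of `WorkerConfig` concrete, here is a minimal sketch of how a list of page URLs might be divided among workers. It is not taken from this PR: `buildWorkerConfigs` is a hypothetical helper, and the sketch assumes that `endIndex` is exclusive and that each worker receives only its own slice of URLs. The diff does not confirm either convention.

```typescript
// Copied from the diff above so the sketch is self-contained.
interface WorkerConfig {
  workerIndex: number;
  startIndex: number;
  endIndex: number;
  batchSize: number;
  pageUrls: string[];
  listSelector: string;
  fields: any;
  pagination: {
    type: string;
    selector: string;
  };
}

// Settings that are identical for every worker.
type SharedSettings = Pick<
  WorkerConfig,
  "batchSize" | "listSelector" | "fields" | "pagination"
>;

// Hypothetical helper: splits pageUrls into contiguous ranges, one per worker.
// Assumption: endIndex is exclusive, so a worker covers [startIndex, endIndex).
function buildWorkerConfigs(
  pageUrls: string[],
  workerCount: number,
  shared: SharedSettings,
): WorkerConfig[] {
  // Never create more workers than there are URLs, and always at least one.
  const count = Math.max(1, Math.min(workerCount, pageUrls.length));
  const chunkSize = Math.ceil(pageUrls.length / count);
  const configs: WorkerConfig[] = [];

  for (let i = 0; i < count; i++) {
    const startIndex = i * chunkSize;
    const endIndex = Math.min(startIndex + chunkSize, pageUrls.length);
    if (startIndex >= endIndex) break; // nothing left to assign
    configs.push({
      ...shared,
      workerIndex: i,
      startIndex,
      endIndex,
      pageUrls: pageUrls.slice(startIndex, endIndex),
    });
  }
  return configs;
}
```

With five URLs and two workers, this sketch assigns the ranges [0, 3) and [3, 5). Rounding the chunk size up means later workers may receive fewer pages, or none at all when the division is very uneven.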
🛠️ Refactor suggestion
Modularize the pagination navigation logic.
The navigation logic is complex and could benefit from being split into smaller, focused functions for better maintainability and testing.
Consider extracting these functionalities:
Example refactor for URL collection: