Improve AliasStorm output stability and related Python 3.13 compatibility fixes #37
Open
SpiritGun91 wants to merge 2 commits into
- buffer per-URL output and print atomically
- add request timeout flag and stalled-scan heartbeat
- add scan completion summary
- update Python 3.13 dependency pins and lazy pydork import
Reviewer's Guide
Refactors AliasStorm's scanning pipeline to buffer per-URL console output under locks, adds a configurable request timeout and a low-noise stalled-scan heartbeat with a final summary, hardens results file writes, and updates the PyDork integration and dependency pins for Python 3.13 compatibility and cleaner UX.
Sequence diagram for AliasStorm buffered scanning and heartbeat
sequenceDiagram
actor User
participant CLI as AliasStorm_CLI
participant Main as main
participant Exec as ThreadPoolExecutor
participant Worker as search_username_on_url
participant Visited as visited_sets_locks
participant Session as HTMLSession
participant File as results_file
participant Console
User->>CLI: run aliastorm.py with args
CLI->>Main: main(username, flags, request_timeout)
Main->>Main: load url_list
Main->>Exec: create pool(max_workers)
loop submit_all_urls
Main->>Exec: submit search_username_on_url(username, url, flags, request_timeout)
end
Main->>Main: track pending futures
loop until all futures done
Main->>Exec: wait(pending, timeout=1.0)
alt one_or_more_done
Exec-->>Main: done_futures
loop each done future
Main->>Worker: future.result()
Worker->>Visited: acquire visited_urls_lock
alt url_already_visited
Visited-->>Worker: seen
Worker-->>Main: ["Skipping duplicate URL line"]
else new_url
Visited-->>Worker: not_seen
Worker->>Session: create HTMLSession
Worker->>Session: get(url, timeout=request_timeout)
Session-->>Worker: response
alt status_code_200
Worker->>Visited: acquire visited_urls_lock
alt html_content_duplicate
Visited-->>Worker: duplicate_html
Worker-->>Main: ["Skipping duplicate HTML content line"]
else new_html_content
Visited-->>Worker: ok
Worker->>Worker: build_query_detection(...)
Worker->>File: write_to_file(...) under file_lock
Worker->>Worker: build_html_lines(...)
Worker-->>Main: buffered_lines
end
else non_200_or_error
Worker-->>Main: []
end
Worker->>Session: close()
end
Main->>Main: increment completed
Main->>Console: print each buffered line under output_lock
end
else no_done_and_idle
Main->>Main: compute idle_for
alt idle_exceeds_stall_threshold
Main->>Console: print stall heartbeat line under output_lock
else below_stall_threshold
Main->>Main: continue waiting
end
end
end
Main->>Console: print final scan summary
Console-->>User: stable, grouped output
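The main loop in the sequence diagram above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `run_scan`, `emit`, the default thresholds, and the wording of the heartbeat and summary lines are all assumptions, and the `except` branch is added here only so the sketch is self-contained. Each worker is assumed to return its buffered lines as a list, which the main thread prints as one group under `output_lock`.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from threading import Lock

output_lock = Lock()


def run_scan(urls, worker, max_workers=8, poll_seconds=1.0,
             stall_after_seconds=15.0, stall_update_interval_seconds=15.0,
             emit=print):
    """Run worker(url) for every URL and print each result's lines as one group."""
    completed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(worker, url) for url in urls}
        last_progress = time.monotonic()
        last_heartbeat = float("-inf")  # no heartbeat printed yet
        while pending:
            # Wake up at least every poll_seconds even if nothing finished.
            done, pending = wait(pending, timeout=poll_seconds,
                                 return_when=FIRST_COMPLETED)
            now = time.monotonic()
            if done:
                last_progress = now
                for future in done:
                    completed += 1
                    try:
                        lines = future.result()
                    except Exception as exc:  # assumption: report, don't abort
                        lines = [f"Worker failed: {exc}"]
                    # Hold the lock for the whole group so lines stay together.
                    with output_lock:
                        for line in lines:
                            emit(line)
                continue
            idle_for = now - last_progress
            if (idle_for >= stall_after_seconds
                    and now - last_heartbeat >= stall_update_interval_seconds):
                with output_lock:
                    emit(f"Still scanning: {completed} done, "
                         f"{len(pending)} pending, idle for {idle_for:.0f}s")
                last_heartbeat = now
    with output_lock:
        emit(f"Scan complete: {completed} URLs processed")
    return completed
```

Passing `emit=some_list.append` instead of `print` makes the grouping easy to inspect in a test. Rate-limiting the heartbeat with a separate interval keeps a long stall from flooding the console.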
Class diagram for aliastorm functions and PyDorkWindow
classDiagram
class AliasStormModule {
+build_profile_url(base_url, username) str
+search_username_on_url(username, url, include_titles, include_descriptions, include_html_content, request_timeout) list
+build_query_detection(username, url, html_content) list
+write_to_file(username, url, status_code, html_content, include_titles, include_descriptions, include_html_content) void
+build_html_lines(html_content, url, query, include_titles, include_descriptions, include_html_content) list
+main(username, include_titles, include_descriptions, include_html_content, request_timeout) void
-visited_urls set
-visited_html_content set
-visited_urls_lock Lock
-file_lock Lock
}
class ThreadingAndIO {
+ThreadPoolExecutor
+output_lock Lock
+stall_after_seconds int
+stall_update_interval_seconds int
+results_file
}
class PyDorkWindow {
+search_input
+result_area
+perform_search() void
}
class SearchEngine {
+set(engine_name) void
+search(query) list
}
AliasStormModule --> ThreadingAndIO : uses
AliasStormModule --> SearchEngine : none
PyDorkWindow --> SearchEngine : imports_and_uses
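The shared state named in the class diagram (`visited_urls`, `visited_html_content`, `visited_urls_lock`) suggests a check-and-add pattern like the one below. This is a sketch under assumptions: the helper names `claim_url` and `claim_html` are hypothetical, and storing a SHA-256 digest rather than the full HTML string is a choice made here to keep the set small, not something the PR states.

```python
import hashlib
from threading import Lock

visited_urls = set()
visited_html_content = set()
visited_urls_lock = Lock()  # the sequence diagram shows one lock guarding both sets


def claim_url(url):
    """Return True the first time a URL is seen, False for duplicates."""
    with visited_urls_lock:
        if url in visited_urls:
            return False
        visited_urls.add(url)
        return True


def claim_html(html_content):
    """Return True the first time a page body is seen, False for duplicates."""
    # Assumption: a digest stands in for the page body to save memory.
    digest = hashlib.sha256(html_content.encode("utf-8")).hexdigest()
    with visited_urls_lock:
        if digest in visited_html_content:
            return False
        visited_html_content.add(digest)
        return True
```

Doing the membership test and the `add` inside the same `with` block is what prevents two worker threads from both treating the same URL as new.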
Architecture flow diagram for AliasStorm CLI and PyDork integration
flowchart LR
subgraph AliasStorm_CLI_Path
U["User CLI"] --> AC["AliasStorm_CLI"]
AC --> AM["aliastorm_main"]
AM --> TPE["ThreadPoolExecutor"]
TPE --> W1["search_username_on_url workers"]
W1 --> VS["visited_sets_with_lock"]
W1 --> RF["results_file_with_file_lock"]
W1 --> WS["Target_websites"]
AM --> HC["Heartbeat_and_summary_output"]
HC --> CCON["Console_output"]
end
subgraph PyDork_GUI_Path
UG["User GUI"] --> PW["PyDorkWindow"]
PW --> VQ["Validate_query_and_errors"]
PW --> WI["Warning_suppressed_SearchEngine_import"]
WI --> SE["PyDork_SearchEngine"]
SE --> GG["Google_search_backend"]
SE --> PR["Search_results"]
PR --> RA["Result_area_text"]
end
Hey - I've found 1 issue, and left some high level feedback:
- In `write_to_file`, `soup` is now created only inside the `include_titles` block, but is still used unconditionally in the `include_descriptions` block, which will raise a `NameError` when titles are excluded but descriptions are included; initialize `soup` before both conditionals or guard the description logic similarly.
- In `PyDorkWindow.perform_search`, the filtered import of `SearchEngine` runs on every search; consider performing this import (and warning suppression) once in `__init__` and caching the `SearchEngine` instance to avoid repeated imports and make failures surface earlier.
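The second suggestion, importing once with warnings suppressed and caching the engine, might look like the sketch below. `import_quietly` and `SearchWindowSketch` are hypothetical names, and the module path is passed in as a parameter because this PR page does not show where `SearchEngine` is imported from. The `set(engine_name)` and `search(query)` calls follow the class diagram; the `"google"` backend follows the flow diagram.

```python
import importlib
import warnings


def import_quietly(module_name, attr_name):
    """Import a module with warnings suppressed and return one attribute from it."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        module = importlib.import_module(module_name)
    return getattr(module, attr_name)


class SearchWindowSketch:
    """Stand-in for PyDorkWindow that loads the engine once, at construction."""

    def __init__(self, module_name, class_name="SearchEngine"):
        # An ImportError surfaces here, when the window is created,
        # instead of on the first search.
        engine_class = import_quietly(module_name, class_name)
        self.engine = engine_class()
        self.engine.set("google")

    def perform_search(self, query):
        if not query.strip():
            raise ValueError("Query must not be empty")
        # Reuses the cached instance; no import happens on this path.
        return self.engine.search(query)
```

The trade-off is that a missing or broken dependency now prevents the window from opening at all, so a real GUI would likely catch the `ImportError` in `__init__` and show it to the user.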
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `write_to_file`, `soup` is now created only inside the `include_titles` block, but is still used unconditionally in the `include_descriptions` block, which will raise a `NameError` when titles are excluded but descriptions are included; initialize `soup` before both conditionals or guard the description logic similarly.
- In `PyDorkWindow.perform_search`, the filtered import of `SearchEngine` runs on every search; consider performing this import (and warning suppression) once in `__init__` and caching the `SearchEngine` instance to avoid repeated imports and make failures surface earlier.
## Individual Comments
### Comment 1
<location path="src/windows/CT/AliaStorm/aliastorm.py" line_range="105-109" />
<code_context>
- results_file.write(f"Username: {username}\n")
- results_file.write(f"URL: {url}\n")
- results_file.write(f"Status Code: {status_code}\n")
- if include_titles:
-     soup = BeautifulSoup(html_content, 'html.parser')
-     title = soup.title.get_text(strip=True) if soup.title else "No title found"
-     results_file.write(f"Title: {title}\n")
- if include_descriptions:
-     meta_description = soup.find("meta", attrs={"name": "description"})
-     description = meta_description['content'] if meta_description else "No meta description found"
</code_context>
<issue_to_address>
**issue (bug_risk):** Initialize `soup` outside the `include_titles` branch so that `include_descriptions=True` without `include_titles` doesn't rely on an undefined variable.
`soup` is only initialized inside `if include_titles:` but is also used when `include_descriptions` is true. With `include_titles=False` and `include_descriptions=True`, this will raise a `NameError` at runtime. Initialize `soup` once before these branches (e.g., when either flag is true) and then use it in both blocks to avoid this latent bug and keep the control flow clear.
</issue_to_address>
✅ Snyk checks have passed. No issues have been found so far.
Initialize BeautifulSoup once when title or description extraction is enabled to avoid NameError when include_titles is false.
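The control flow of that fix can be sketched as below: parse once if either flag needs the parsed page, then let both branches read from the same object. To keep the example runnable without third-party packages it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and it returns lines instead of writing to the results file. `PageMeta`, `result_lines`, and the `Description:` label are assumptions made for illustration.

```python
from html.parser import HTMLParser


class PageMeta(HTMLParser):
    """Stdlib stand-in for BeautifulSoup: records <title> text and the meta description."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            attr_map = dict(attrs)
            if attr_map.get("name") == "description" and self.description is None:
                self.description = attr_map.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def result_lines(username, url, status_code, html_content,
                 include_titles, include_descriptions):
    lines = [f"Username: {username}", f"URL: {url}", f"Status Code: {status_code}"]
    page = None
    # Parse once, before either branch, whenever at least one flag needs it.
    if include_titles or include_descriptions:
        page = PageMeta()
        page.feed(html_content)
    if include_titles:
        title = page.title.strip() if page.title else "No title found"
        lines.append(f"Title: {title}")
    if include_descriptions:
        lines.append(f"Description: {page.description or 'No meta description found'}")
    return lines
```

With `include_titles=False` and `include_descriptions=True`, the description branch now finds `page` already bound, which is the case the review flagged.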
Summary by Sourcery
Improve AliasStorm scan output stability, responsiveness, and dependency compatibility while tightening PyDork integration error handling.