Improve AliasStorm output stability and related Python 3.13 compatibility fixes #37

Open
SpiritGun91 wants to merge 2 commits into AnonCatalyst:main from SpiritGun91:fix/aliastorm-output-and-timeout

@SpiritGun91 SpiritGun91 commented May 10, 2026

Summary

  • Fixes AliasStorm console output corruption caused by concurrent thread printing.
  • Buffers output per URL and flushes it atomically so each result block stays readable.
  • Replaces verbose per-site progress spam with a low-noise stalled-scan heartbeat.
  • Adds a request timeout flag for better control of slow/hanging endpoints.
  • Adds a scan completion summary once all URLs are processed.
  • Updates Python 3.13 dependency pins and improves PyDork import behavior to reduce startup noise/issues.

Details

  • AliasStorm now builds output lines in worker threads and prints only in the main completion loop under a lock.
  • Shared write paths are synchronized:
    • Thread-safe visited URL/content tracking.
    • Thread-safe results file writes.
  • Progress UX now reports only when scans appear stalled, instead of printing every completion.
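
A minimal sketch of the buffering pattern described above, not the PR's actual code. `scan_one` is a hypothetical stand-in for the real worker, and `run_scan`, `emit`, and the line format are assumptions made for illustration. The point it shows is that workers return their lines and only the main thread writes them, one whole block at a time, under `output_lock`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# Lock guarding console writes (name taken from the PR; usage here is illustrative).
output_lock = threading.Lock()


def scan_one(url):
    """Hypothetical worker: builds its output lines and returns them without printing."""
    return [f"[+] {url}", "    status: simulated", ""]


def run_scan(urls, max_workers=4, emit=print):
    """Run workers concurrently and flush each URL's block from the main thread."""
    completed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scan_one, url) for url in urls]
        for future in as_completed(futures):
            lines = future.result()
            completed += 1
            # The whole block is written while holding the lock, so blocks
            # from different URLs cannot interleave.
            with output_lock:
                for line in lines:
                    emit(line)
    return completed
```

Passing `emit=some_list.append` instead of `print` makes it easy to check that every three-line block stays contiguous.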

Validation

  • Syntax check passed for AliasStorm.
  • Manual scan runs complete successfully.
  • Confirmed output no longer interleaves across threads.
  • Confirmed stalled heartbeat appears only during slow periods.
  • Confirmed final scan summary is printed at end.

Why This Change

  • Improves reliability and readability during large concurrent scans.
  • Makes runtime feedback actionable without excessive noise.
  • Keeps compatibility with current Python 3.13 environment.

Summary by Sourcery

Improve AliasStorm scan output stability, responsiveness, and dependency compatibility while tightening PyDork integration error handling.

New Features:

  • Add a configurable per-request timeout flag for AliasStorm scans.
  • Introduce a final scan completion summary after processing all URLs.
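
As a rough illustration of the timeout flag, the sketch below defines a `--request_timeout` option with `argparse`. The flag name comes from this PR; the positional `username` argument, the 10-second default, and the `positive_seconds` validator are assumptions, since the actual parser code is not shown here.

```python
import argparse


def positive_seconds(value):
    """argparse type function: accept only timeouts greater than zero."""
    seconds = float(value)
    if seconds <= 0:
        raise argparse.ArgumentTypeError("timeout must be greater than 0")
    return seconds


def build_parser():
    """Build an illustrative parser; the default value is an assumption."""
    parser = argparse.ArgumentParser(description="Illustrative AliasStorm-style CLI")
    parser.add_argument("username", help="username to look up")
    parser.add_argument(
        "--request_timeout",
        type=positive_seconds,
        default=10.0,
        help="seconds to wait for each HTTP request before giving up",
    )
    return parser
```

The parsed value would then be passed along to each request, for example `build_parser().parse_args(["alice", "--request_timeout", "2.5"]).request_timeout`.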

Bug Fixes:

  • Prevent console output corruption and interleaving by buffering AliasStorm worker output and printing atomically from the main thread.
  • Avoid duplicate URL and HTML content processing with thread-safe visited tracking.
  • Handle missing PyDork dependency and runtime failures gracefully in the PyQt search window.
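
The duplicate-avoidance fix depends on doing the membership check and the insert as one step under a lock. The sketch below shows that check-and-add pattern under stated assumptions: the set and lock names follow the PR, while the helper functions `mark_url_if_new` and `mark_content_if_new` are hypothetical, and hashing the HTML before storing it is a choice made here rather than something the PR confirms.

```python
import hashlib
import threading

visited_urls = set()
visited_html_content = set()
visited_urls_lock = threading.Lock()


def mark_url_if_new(url):
    """Return True exactly once per URL, even when called from many threads."""
    with visited_urls_lock:
        if url in visited_urls:
            return False
        visited_urls.add(url)
        return True


def mark_content_if_new(html_content):
    """Return True the first time a given HTML body is seen."""
    # Assumption: store a digest instead of the full page to keep the set small.
    digest = hashlib.sha256(html_content.encode("utf-8")).hexdigest()
    with visited_urls_lock:
        if digest in visited_html_content:
            return False
        visited_html_content.add(digest)
        return True
```

Without the lock, two workers could both see a URL as unvisited before either adds it, and both would process it.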

Enhancements:

  • Refine AliasStorm progress reporting to emit low-noise stalled-scan heartbeat updates instead of per-URL progress spam.
  • Limit concurrent AliasStorm workers and synchronize results file writes for more reliable concurrent scanning.
  • Improve construction of profile URLs from base URLs and usernames to better support template-like endpoints.
  • Suppress known pkg_resources deprecation warnings from PyDork imports to keep the UI output clean.
  • Return clearer messaging in the PyQt search UI when no search query is provided or no results are found.
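
The PR does not show how `build_profile_url` treats template-like endpoints, so the following is only one plausible sketch. The signature matches the one listed in the class diagram; the placeholder conventions (`{username}` and `{}`) and the fallback of appending the username as a path segment are assumptions.

```python
from urllib.parse import quote


def build_profile_url(base_url, username):
    """Build a profile URL from a base URL or template (illustrative rules only)."""
    safe_name = quote(username, safe="")
    # Assumed template styles: a named or an anonymous placeholder.
    if "{username}" in base_url:
        return base_url.replace("{username}", safe_name)
    if "{}" in base_url:
        return base_url.replace("{}", safe_name)
    # Otherwise treat the base as a prefix and append the username.
    return base_url.rstrip("/") + "/" + safe_name
```

Percent-encoding the username keeps spaces or slashes in the input from changing the shape of the resulting URL.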

Build:

  • Update PyQt6, numpy, lxml_html_clean, html_clean, and psutil pins for Python 3.13 compatibility and current ecosystem versions.

- buffer per-URL output and print atomically

- add request timeout flag and stalled-scan heartbeat

- add scan completion summary

- update Python 3.13 dependency pins and lazy pydork import

sourcery-ai Bot commented May 10, 2026

Reviewer's Guide

Refactors AliasStorm’s scanning pipeline to buffer per-URL console output under locks, adds a configurable request timeout and low-noise stalled-scan heartbeat with a final summary, hardens results file writes, and updates PyDork integration and dependency pins for Python 3.13 compatibility and cleaner UX.

Sequence diagram for AliasStorm buffered scanning and heartbeat

sequenceDiagram
    actor User
    participant CLI as AliasStorm_CLI
    participant Main as main
    participant Exec as ThreadPoolExecutor
    participant Worker as search_username_on_url
    participant Visited as visited_sets_locks
    participant Session as HTMLSession
    participant File as results_file
    participant Console

    User->>CLI: run aliastorm.py with args
    CLI->>Main: main(username, flags, request_timeout)
    Main->>Main: load url_list
    Main->>Exec: create pool(max_workers)

    loop submit_all_urls
        Main->>Exec: submit search_username_on_url(username, url, flags, request_timeout)
    end

    Main->>Main: track pending futures

    loop until all futures done
        Main->>Exec: wait(pending, timeout=1.0)
        alt one_or_more_done
            Exec-->>Main: done_futures
            loop each done future
                Main->>Worker: future.result()
                Worker->>Visited: acquire visited_urls_lock
                alt url_already_visited
                    Visited-->>Worker: seen
                    Worker-->>Main: ["Skipping duplicate URL line"]
                else new_url
                    Visited-->>Worker: not_seen
                    Worker->>Session: create HTMLSession
                    Worker->>Session: get(url, timeout=request_timeout)
                    Session-->>Worker: response
                    alt status_code_200
                        Worker->>Visited: acquire visited_urls_lock
                        alt html_content_duplicate
                            Visited-->>Worker: duplicate_html
                            Worker-->>Main: ["Skipping duplicate HTML content line"]
                        else new_html_content
                            Visited-->>Worker: ok
                            Worker->>Worker: build_query_detection(...)
                            Worker->>File: write_to_file(...) under file_lock
                            Worker->>Worker: build_html_lines(...)
                            Worker-->>Main: buffered_lines
                        end
                    else non_200_or_error
                        Worker-->>Main: []
                    end
                    Worker->>Session: close()
                end
                Main->>Main: increment completed
                Main->>Console: print each buffered line under output_lock
            end
        else no_done_and_idle
            Main->>Main: compute idle_for
            alt idle_exceeds_stall_threshold
                Main->>Console: print stall heartbeat line under output_lock
            else below_stall_threshold
                Main->>Main: continue waiting
            end
        end
    end

    Main->>Console: print final scan summary
    Console-->>User: stable, grouped output
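
The main loop in the diagram above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `run_with_heartbeat`, `emit`, the message formats, and the default thresholds are hypothetical, while `wait(pending, timeout=...)`, `idle_for`, `stall_after_seconds`, and `stall_update_interval_seconds` mirror names that appear in the reviewer's guide.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_heartbeat(tasks, stall_after_seconds=10.0,
                       stall_update_interval_seconds=5.0,
                       poll_seconds=1.0, emit=print):
    """Run callables that return lists of lines; report only when progress stalls."""
    total = len(tasks)
    completed = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(task) for task in tasks}
        last_progress = time.monotonic()
        last_heartbeat = None
        while pending:
            done, pending = wait(pending, timeout=poll_seconds,
                                 return_when=FIRST_COMPLETED)
            now = time.monotonic()
            if done:
                # Flush each finished task's buffered lines from the main thread.
                for future in done:
                    for line in future.result():
                        emit(line)
                    completed += 1
                last_progress = now
                last_heartbeat = None
                continue
            # Nothing finished during this poll: decide whether to report a stall.
            idle_for = now - last_progress
            heartbeat_due = (last_heartbeat is None
                             or now - last_heartbeat >= stall_update_interval_seconds)
            if idle_for >= stall_after_seconds and heartbeat_due:
                emit(f"[heartbeat] {completed}/{total} done, idle for {idle_for:.1f}s")
                last_heartbeat = now
    emit(f"Scan complete: {completed}/{total} URLs processed")
    return completed
```

Because the heartbeat fires only after an idle period, a scan that keeps completing URLs prints nothing beyond its results and the final summary.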

Class diagram for aliastorm functions and PyDorkWindow

classDiagram
    class AliasStormModule {
        +build_profile_url(base_url, username) str
        +search_username_on_url(username, url, include_titles, include_descriptions, include_html_content, request_timeout) list
        +build_query_detection(username, url, html_content) list
        +write_to_file(username, url, status_code, html_content, include_titles, include_descriptions, include_html_content) void
        +build_html_lines(html_content, url, query, include_titles, include_descriptions, include_html_content) list
        +main(username, include_titles, include_descriptions, include_html_content, request_timeout) void
        -visited_urls set
        -visited_html_content set
        -visited_urls_lock Lock
        -file_lock Lock
    }

    class ThreadingAndIO {
        +ThreadPoolExecutor
        +output_lock Lock
        +stall_after_seconds int
        +stall_update_interval_seconds int
        +results_file
    }

    class PyDorkWindow {
        +search_input
        +result_area
        +perform_search() void
    }

    class SearchEngine {
        +set(engine_name) void
        +search(query) list
    }

    AliasStormModule --> ThreadingAndIO : uses
    AliasStormModule --> SearchEngine : none
    PyDorkWindow --> SearchEngine : imports_and_uses

Architecture flow diagram for AliasStorm CLI and PyDork integration

flowchart LR
    subgraph AliasStorm_CLI_Path
        U["User CLI"] --> AC["AliasStorm_CLI"]
        AC --> AM["aliastorm_main"]
        AM --> TPE["ThreadPoolExecutor"]
        TPE --> W1["search_username_on_url workers"]
        W1 --> VS["visited_sets_with_lock"]
        W1 --> RF["results_file_with_file_lock"]
        W1 --> WS["Target_websites"]
        AM --> HC["Heartbeat_and_summary_output"]
        HC --> CCON["Console_output"]
    end

    subgraph PyDork_GUI_Path
        UG["User GUI"] --> PW["PyDorkWindow"]
        PW --> VQ["Validate_query_and_errors"]
        PW --> WI["Warning_suppressed_SearchEngine_import"]
        WI --> SE["PyDork_SearchEngine"]
        SE --> GG["Google_search_backend"]
        SE --> PR["Search_results"]
        PR --> RA["Result_area_text"]
    end

File-Level Changes

Each entry lists the change, its details, and the affected file.
Make AliasStorm’s scanning output buffered, thread-safe, and less noisy while adding timeouts and a completion summary.
  • Introduce build_profile_url helper to construct username-specific profile URLs more robustly.
  • Change search_username_on_url to build and return a list of output lines instead of printing directly, including query detection and HTML snippet lines.
  • Replace direct printing helpers with build_query_detection and build_html_lines functions that assemble output lines without side effects.
  • Guard shared visited URL/content sets with a lock and guard results file writes with a separate file lock for thread safety.
  • Update write_to_file to acquire a file lock around all writes to the shared results file.
  • Rework main to manage a ThreadPoolExecutor with bounded worker count, track completion progress, flush buffered per-URL output under an output lock, emit a stalled-scan heartbeat, and print a final scan summary.
  • Add a --request_timeout CLI flag and plumb the timeout through to HTTP requests, choosing timeout-dependent stall thresholds.
src/windows/CT/AliaStorm/aliastorm.py
Improve PyDork window robustness and reduce noisy deprecation warnings at import time.
  • Defer importing SearchEngine from pydork.engine until perform_search is run, and wrap the import in a warnings filter that suppresses the known pkg_resources deprecation warning.
  • Add validation for empty queries and user-facing error messages when PyDork is missing or initialization/search fails, including a helpful pip install hint.
  • Ensure the UI shows a clear message when no results are returned instead of an empty area.
src/windows/pydork_window.py
Update dependency versions for Python 3.13 compatibility and clean up unused/problematic pins.
  • Bump PyQt6 and numpy to Python 3.13-compatible versions.
  • Update lxml_html_clean, html_clean, and psutil to versions with available wheels and current releases.
  • Remove the asyncio requirement comment line now that asyncio is part of the standard library.
  • Keep security-related pins like zipp and anyio and ensure the rest of the requirements remain unchanged.
requirements.txt
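
The deferred, warning-filtered import described for `pydork_window.py` might look something like the sketch below. The helper names `import_quietly` and `load_optional` are hypothetical, and matching the warning by a `pkg_resources` message pattern (rather than by a specific warning category) is an assumption, since the exact filter used in the PR is not shown.

```python
import importlib
import warnings


def import_quietly(module_name, attribute, message_pattern):
    """Import module_name and return one attribute, hiding matching warnings."""
    with warnings.catch_warnings():
        # The filter applies only inside this block; global filters are restored on exit.
        warnings.filterwarnings("ignore", message=message_pattern)
        module = importlib.import_module(module_name)
    return getattr(module, attribute)


def load_optional(module_name, attribute, message_pattern=r".*pkg_resources.*"):
    """Return (object, None) on success or (None, user-facing error text) on failure."""
    try:
        return import_quietly(module_name, attribute, message_pattern), None
    except ImportError as exc:
        return None, f"{module_name} is not available ({exc}). Install it with pip and retry."
```

In the search window this would be called lazily, for example `load_optional("pydork.engine", "SearchEngine")`, with the error text shown in the result area when the first element is `None`.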


sourcery-ai Bot left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • In write_to_file, soup is now created only inside the include_titles block but is still used unconditionally in the include_descriptions block, so a NameError is raised when titles are excluded but descriptions are included. Initialize soup before both conditionals, or guard the description logic the same way.
  • In PyDorkWindow.perform_search, the filtered import of SearchEngine runs on every search; consider performing this import (and warning suppression) once in __init__ and caching the SearchEngine instance to avoid repeated imports and make failures surface earlier.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/windows/CT/AliaStorm/aliastorm.py" line_range="105-109" />
<code_context>
-        results_file.write(f"Username: {username}\n")
-        results_file.write(f"URL: {url}\n")
-        results_file.write(f"Status Code: {status_code}\n")
-        if include_titles:
-            soup = BeautifulSoup(html_content, 'html.parser')
-            title = soup.title.get_text(strip=True) if soup.title else "No title found"
-            results_file.write(f"Title: {title}\n")
-        if include_descriptions:
-            meta_description = soup.find("meta", attrs={"name": "description"})
-            description = meta_description['content'] if meta_description else "No meta description found"
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid using `soup` only conditionally so `include_descriptions=True` without `include_titles` doesn’t rely on an undefined variable.

`soup` is only initialized inside `if include_titles:` but is also used when `include_descriptions` is true. With `include_titles=False` and `include_descriptions=True`, this will raise a `NameError` at runtime. Initialize `soup` once before these branches (e.g., when either flag is true) and then use it in both blocks to avoid this latent bug and keep the control flow clear.
</issue_to_address>
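
One way to apply the suggested fix is sketched below: parse the page once whenever either flag needs it, then use the parsed result in both branches. To keep the sketch runnable without third-party packages, it swaps BeautifulSoup for a small stdlib `HTMLParser` subclass; `HeadInfo`, `write_record`, and the `Description:` label are hypothetical names, and the write under `file_lock` follows the locking described in the reviewer's guide.

```python
import threading
from html.parser import HTMLParser

file_lock = threading.Lock()


class HeadInfo(HTMLParser):
    """Stdlib stand-in for BeautifulSoup: collects <title> text and the description meta tag."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") == "description":
            self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data.strip()


def write_record(out, username, url, status_code, html_content,
                 include_titles, include_descriptions):
    """Write one result block; the HTML is parsed before either branch runs."""
    info = None
    if include_titles or include_descriptions:
        info = HeadInfo()
        info.feed(html_content)
    lines = [f"Username: {username}", f"URL: {url}", f"Status Code: {status_code}"]
    if include_titles:
        lines.append(f"Title: {info.title or 'No title found'}")
    if include_descriptions:
        lines.append(f"Description: {info.description or 'No meta description found'}")
    # Serialize writes so blocks from different worker threads do not interleave.
    with file_lock:
        out.write("\n".join(lines) + "\n\n")
```

With this shape, calling it with `include_titles=False` and `include_descriptions=True` no longer depends on a variable that was only defined in the title branch.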


Comment thread src/windows/CT/AliaStorm/aliastorm.py
Owner

AnonCatalyst commented May 10, 2026

Snyk checks have passed. No issues have been found so far.

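| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|--------|-------------|----------|------|--------|-----|-----------|
|        | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |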

Initialize BeautifulSoup once when title or description extraction is enabled to avoid NameError when include_titles is false.