Skip to content

[BUG] Prevent infinite loops in AptaTransPipeline and MCTS candidate generation#290

Closed
direkkakkar319-ops wants to merge 4 commits into
gc-os-ai:mainfrom
direkkakkar319-ops:issueLoop-DirekKakkar
Closed

[BUG] Prevent infinite loops in AptaTransPipeline and MCTS candidate generation#290
direkkakkar319-ops wants to merge 4 commits into
gc-os-ai:mainfrom
direkkakkar319-ops:issueLoop-DirekKakkar

Conversation

@direkkakkar319-ops

Copy link
Copy Markdown
Contributor

What does this implement/fix? Explain your changes.

This PR addresses two significant stability issues that could lead to production hangs and unbounded CPU usage:

  1. Infinite Loop in _pipeline.py :

    • Modified AptaTransPipeline.recommend to include a max_attempts guard. This prevents the loop from running indefinitely if the MCTS algorithm repeatedly returns duplicate sequences.
    • The default max_attempts is set to max(100, 10 * n_candidates) .
    • Introduced a strict boolean parameter. When True (default), it raises a RuntimeError upon exhausting attempts. When False , it issues a RuntimeWarning and returns the unique candidates collected so far.
    • Improved uniqueness tracking by using a candidate_map keyed by sequence content.
  2. Infinite Loop in _algorithm.py :

    • Added validation in MCTS.init to ensure n_iterations >= 1 .
    • This prevents a "hard hang" in MCTS.run where a non-positive iteration count would cause the search rounds to execute zero times, leaving the base sequence unchanged and trapped in a non-progressing while loop.

What should a reviewer concentrate their feedback on?

  • The default value logic for max_attempts in recommend .
  • The implementation of the strict flag and whether it provides enough flexibility for different production environments.
  • The error message clarity when thresholds are reached.

Did you add any tests for the change?

Yes, several updates were made to the test suite:

  • Added test_recommend_relax_uniqueness in test_aptatrans.py to verify that strict=False correctly issues a warning and returns partial results.
  • Updated test_recommend_raises_when_unique_candidates_not_reached to verify the strict=True error path.
  • Verified test_init_invalid_n_iterations in pyaptamer/mcts/tests/test_mcts.py correctly catches non-positive iteration counts.

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks.
    ran the precommits
image

ran the tests locally they are passing
image

@direkkakkar319-ops direkkakkar319-ops marked this pull request as ready for review April 4, 2026 22:37
@direkkakkar319-ops

Copy link
Copy Markdown
Contributor Author

These files were added when i ran the command pre-commit run --all-files for pre-commit hooks
image
Screenshot 2026-04-05 051910

@direkkakkar319-ops

direkkakkar319-ops commented Apr 4, 2026

Copy link
Copy Markdown
Contributor Author

ran the checks locally
for code quality
video link: https://github.com/user-attachments/assets/84db1e7e-2e13-4c63-ad96-f5a8a9546c57

in this All actual code-quality steps passed. Post python environment step fails with exitcode '127': node not found — this is the act teardown bug, not your code

detect-notebooks-change with command git diff --quiet origin/main -- examples/ pyaptamer/ pyproject.toml && echo "No notebook changes" || echo "Notebook-related changes detected"
Evidence
image

run-jupyter-notebooks with command bash build_tools/run_all_notebooks.sh
Evidence
image
The notebooks were re-executed and modified in place (--inplace flag) so i committed the changes other wise the CI may see dirty files.

run-tests-no-extras with command python -m pytest pyaptamer -p no:warnings
Evidence (shown in the PR description also)
image

@siddharth7113

Copy link
Copy Markdown
Collaborator

Hi, This PR has a big diff and lots of unnecessary changes, I would request you to either revert those commits or open a new PR and make sure pre-commit is installed before you make new changes.

@siddharth7113

Copy link
Copy Markdown
Collaborator

I am converting this to a draft PR for now.

@siddharth7113 siddharth7113 marked this pull request as draft April 6, 2026 18:05
@direkkakkar319-ops

Copy link
Copy Markdown
Contributor Author

Hi @siddharth7113 i have reverted the unnecessary changes. All checks are now passing, and the PR should be in good shape for review.
Could you please take another look when you get a chance?

Thanks

@siddharth7113 siddharth7113 requested a review from NennoMP April 6, 2026 19:18
@siddharth7113 siddharth7113 marked this pull request as ready for review April 6, 2026 19:18
@NennoMP

NennoMP commented Apr 7, 2026

Copy link
Copy Markdown
Collaborator

Sorry to barge in but is this really an issue? The probability of MCTS returning duplicate sequences so often that we run into an infinite loop seems very low to me, especially given reasonable parameters (e.g., sequence length, number of candidates, etc.).

Could you showcase an example where the current implementations is stuck in an infinite loop? Otherwise, I am not sure whether this change is needed. Also, adding a guard like this introduces additional issues: the algorithm may return a lower number of candidates than the ones specified by the user.

@direkkakkar319-ops

Copy link
Copy Markdown
Contributor Author

Hi @NennoMP, you arre right, the infinite loop can't actually happen. The score in the tuple is a PyTorch tensor, which hashes by object identity, so set.add() treats every mcts.run() result as unique regardless of content. The loop always terminates in exactly n_candidates iterations.

>>> t1 = torch.tensor(0.5); t2 = torch.tensor(0.5)
>>> hash(t1) == hash(t2)
False  # different objects → different hashes
>>> s = set(); s.add(("A", "B", t1)); s.add(("A", "B", t2))
>>> len(s)
2  # both added despite identical content

@direkkakkar319-ops

Copy link
Copy Markdown
Contributor Author

However, this also means the current code doesn't truly deduplicate by sequence content — twoo identical sequences get kept as separate entries because their tensor objects differ.

My candidate_map approach fixes that, but it's a deduplication correctness fix, not an infinite loop fix.

Happy to rework the PR framing or close it if you prefer.
Do you want me to raise a

@NennoMP

NennoMP commented Apr 7, 2026

Copy link
Copy Markdown
Collaborator

@direkkakkar319-ops You are right on the fact that we are not deduplicating corrently, since we are also hashing a tensor for the score. If it wasn't for that, it would probably work, since the neural network running in inference mode likely returns the same score for identical sequences.

I would suggest to start fresh, new issue and new PR, and fix the no deduplication bug. Minimal changes without modifying other parts of the pipeline or MCTS, unless strictly needed.

The fix is relatively straightforward, either make sure the set checks for duplicates based only on the candidate sequence or use a dictionary with the candidate sequence as key. Some example code would look like this, but feel free to do it differently.

candidates = {}
while len(candidates) < n_candidates:
    result = mcts.run(verbose=verbose)
    candidate, sequence, score = tuple(result.values())
    if candidate not in candidates:
        candidates[candidate] = (candidate, sequence, score.item())

if verbose:
    for candidate, sequence, score in candidates.values():
        print(
            f"Candidate: {candidate}, "
            f"Sequence: {sequence}, "
            f"Score: {score:.4f}" 
        )

return set(candidates.values())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants