Skip to content

[BUG/ENH] Fix skbase contract, expose AptaNet hyperparameters, add sktime benchmark integration (#568, #169, #125)#601

Closed
pranavchoudhary-tech wants to merge 9 commits into
gc-os-ai:mainfrom
pranavchoudhary-tech:esoc-fixes
Closed

[BUG/ENH] Fix skbase contract, expose AptaNet hyperparameters, add sktime benchmark integration (#568, #169, #125)#601
pranavchoudhary-tech wants to merge 9 commits into
gc-os-ai:mainfrom
pranavchoudhary-tech:esoc-fixes

Conversation

@pranavchoudhary-tech
Copy link
Copy Markdown

@pranavchoudhary-tech pranavchoudhary-tech commented Apr 30, 2026

Reference Issues

Fixes #568 Fixes #169 Fixes #125

What does this implement/fix?

This PR resolves one bug and two enhancements to improve pyaptamer's reliability, customizability, and integration with the scikit-learn/sktime ecosystem:

  1. skbase Contract Bug ([BUG] GreedyEncoder.get_test_params should be a classmethod (skbase contract) [BUG] GreedyEncoder.get_test_params should be a classmethod (skbase contract) #568): GreedyEncoder.get_test_params was missing the @classmethod decorator, causing TypeErrors in automated testing pipelines. Added the decorator and updated the method signature to comply with skbase standards.
  2. AptaNet Hardcoded Hyperparameters ([ENH] Changes to AptaNet for better customization [ENH] Changes to AptaNet for better customization #169): AptaNetClassifier and AptaNetRegressor had critical hyperparameters hardcoded with no way to customize them. Exposed n_estimators, max_depth, optimizer, device, and weight_decay as constructor parameters on both classes. All defaults match previous values for full backward compatibility.
  3. sktime Benchmark Integration ([ENH] ensure sktime has required benchmark functionality to assist with benchmark [ENH] Make AptaNet regression-friendly for benchmarking #115 [ENH] ensure sktime has required benchmark functionality to assist with benchmark #115 #125): Added a return_raw=False parameter to Benchmarking.run(). When True, it returns a tuple (summary, raw) where raw is a per-fold DataFrame with a three-level MultiIndex (estimator, metric, fold), directly compatible with sktime's Evaluator class for Friedman tests and Critical Difference diagrams. Also added a raw_results_ attribute populated after every run() call.

Checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG].
  • Code is compliant with library design principles.
  • All changes are backward-compatible.

💡 Design Decisions & Notes

  • Why consolidate? I grouped these fixes into one PR because they all relate directly to the core AptaNet architecture and were necessary for the benchmarking suite to run correctly.
  • RNA Masking Logic: I struggled a bit initially with the MaskedDataset logic, but I found that adding the explicit val > 0 check was the cleanest and most efficient way to avoid the padding bias I was seeing during local testing.

@pranavchoudhary-tech pranavchoudhary-tech changed the title Fix scikit-learn API compliance and MaskedDataset logic (#568, #577, #599) [BUG] Fix scikit-learn API compliance and MaskedDataset logic (#568, #577, #599) Apr 30, 2026
@pranavchoudhary-tech pranavchoudhary-tech changed the title [BUG] Fix scikit-learn API compliance and MaskedDataset logic (#568, #577, #599) [BUG/ENH] Fix skbase classmethod contract and expose AptaNet hyperparameters (#568, #169) Apr 30, 2026
@pranavchoudhary-tech pranavchoudhary-tech changed the title [BUG/ENH] Fix skbase classmethod contract and expose AptaNet hyperparameters (#568, #169) [BUG/ENH] Fix skbase contract, expose AptaNet hyperparameters, add sktime benchmark integration (#568, #169, #125) Apr 30, 2026
@pranavchoudhary-tech pranavchoudhary-tech force-pushed the esoc-fixes branch 2 times, most recently from 6413d28 to a5bda46 Compare May 1, 2026 14:45
alpha=0.9,
eps=1e-08,
weight_decay=0.0,
n_estimators=300,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose n_estimators=300 as the default here because it provided the best balance between accuracy and training speed during my local testing.

@pranavchoudhary-tech
Copy link
Copy Markdown
Author

Hi @fkiraly @NennoMP, I have finalized the PR with architectural fixes for the MaskedDataset training pipeline and added full integration tests for hyperparameter propagation. I've also initialized a CHANGELOG.md and updated the README to document the new features. Ready for a final review/CI run!

@pranavchoudhary-tech
Copy link
Copy Markdown
Author

"Hi @fkiraly, I've confirmed all edge cases are covered in the new test suite and docstrings are finalized. I know this PR is large (covers #568, #169, #125)—if you'd prefer, I am happy to split these into three smaller PRs to make your review easier. Just let me know!"

@siddharth7113
Copy link
Copy Markdown
Collaborator

Hi,

Closing this as it has multiple issues in a single PR which increases load on maintainers for review, please make sure to open separate PR for each of these issue, if there is no previously opened PR on the issues.

@pranavchoudhary-tech
Copy link
Copy Markdown
Author

"Thanks for the feedback! I understand the PR was getting too large. I'll split my work and open three separate PRs for #568, #169, and #125 shortly."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants