[LFX Term 1 2026] Restoring Ianvs LLM-Agent setup and usage #407
NishantSinghhhhh wants to merge 1 commit into kubeedge:main
feat: add requirements.txt for dependencies
fix: refactor basemodel.py for improved readability and functionality
refactor: enhance rouge.py to utilize RougeScorer for metric calculations

Signed-off-by: NishantSinghhhhh <nishantsingh_230137@aitpune.edu.in>
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: NishantSinghhhhh. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Screencast.from.2026-04-23.13-42-27.webm

@MooreZheng sir, after making all these changes I was able to restore the LLM-Agent benchmark and run it successfully.
Code Review
This pull request significantly updates the Ianvs LLM-Agent benchmark by providing a comprehensive reproduction guide, adding a requirements file, and refactoring the core model and evaluation logic. Key changes include a rewritten predict method that correctly slices prompt tokens from the output and an updated ROUGE scoring implementation using the rouge_score library. Review feedback focuses on ensuring input tensors are moved to the correct device, removing redundant imports, adopting idiomatic boolean checks, and utilizing the internal calculate_mean function to prevent potential division-by-zero errors in metric calculations.
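The review summary mentions a rewritten predict method that slices the prompt tokens off the generated output before decoding. A minimal sketch of that pattern, using a toy character-level tokenizer so it is self-contained (the names `ToyTokenizer`, `generate`, and `predict` are illustrative stand-ins, not the actual `basemodel.py` implementation):

```python
class ToyTokenizer:
    """Stand-in for a HuggingFace tokenizer: one token id per character."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


def predict(tokenizer, generate, prompt):
    """Return only the newly generated text.

    Causal-LM generation returns the prompt ids followed by the new ids,
    so the prompt portion must be sliced off before decoding.
    """
    input_ids = tokenizer.encode(prompt)
    output_ids = generate(input_ids)                       # prompt ids + new ids
    return tokenizer.decode(output_ids[len(input_ids):])   # drop the prompt tokens


tok = ToyTokenizer()
fake_generate = lambda ids: ids + tok.encode(" world")     # pretend model output
print(predict(tok, fake_generate, "hello"))                # -> " world"
```

With a real `transformers` model, the review additionally asks that the input tensors be moved to the model's device before calling `generate`.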
feat: add requirements.txt for dependencies
fix: refactor basemodel.py for improved readability and functionality
refactor: enhance rouge.py to utilize RougeScorer for metric calculations
What type of PR is this?
/kind feature
/kind cleanup
What this PR does / why we need it:
This PR fixes and refactors the `llm-agent` singletask learning benchmark to make it fully functional end-to-end. The original example code had several issues that prevented it from running: broken relative paths, a missing dataset, deprecated HuggingFace API arguments, a name collision with the Ianvs framework lifecycle hook, and a broken ROUGE metric script.

Changes included:
- `requirements.txt`: Added a `requirements.txt` listing all dependencies needed to run the LLM-agent benchmark (`torch`, `transformers`, `peft`, `datasets`, `evaluate`, `rouge_score`), which were previously undocumented and missing from the environment.
- `basemodel.py`:
  - Replaced the deprecated `use_auth_token=` argument with `token=` to match the current HuggingFace `transformers` API
  - Added the `preprocess(self, **kwargs)` lifecycle hook required by the Ianvs singletask learning framework
  - Renamed `preprocess()` → `_preprocess_sample()` to avoid collision with the framework hook
  - Changed the `_preprocess_sample()` signature to accept plain strings instead of a samples object
  - Fixed `_preprocess_sample()` (removed erroneous `[None]` wrapper)
  - Added a `str()` cast in the `train()` loop when iterating `train_data.x`/`train_data.y` to handle `numpy.str_` types that caused tokenizer failures
- `rouge.py`:
  - Removed a stray `EOF` token at the end of the file (invalid Python causing a `NameError` on import)
  - Replaced `evaluate.load()` (which required a local metrics folder that did not exist) with direct `rouge_score.rouge_scorer.RougeScorer` calls
  - Fixed `y_pred` handling to use a `str()` cast instead of `["generated_text"]` dict access, matching the plain-string output of `basemodel.predict()`

Which issue(s) this PR fixes:
Fixes #
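The review feedback above also asks that metric averaging go through the internal `calculate_mean` helper to avoid division-by-zero when the score list is empty. A plain-Python sketch of that guard (the function body here is an illustrative stand-in, not the repository's actual helper):

```python
def calculate_mean(values):
    """Average a list of per-sample scores, returning 0.0 for an empty list
    instead of raising ZeroDivisionError."""
    return sum(values) / len(values) if values else 0.0


# Per-sample rougeL F1 scores, e.g. as collected from RougeScorer.score(...)
scores = [0.50, 0.25, 0.75]
print(calculate_mean(scores))   # -> 0.5
print(calculate_mean([]))       # -> 0.0, no ZeroDivisionError
```

Guarding the empty case matters for benchmarks: a run with zero valid predictions should report a zero score rather than crash the metric stage.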