
@BartekCupial (Collaborator) commented Dec 2, 2024

Description

Implements few-shot learning so that models can be evaluated with expert demonstration trajectories placed in their context.

To download the demonstrations:

pip install gdown
gdown "https://drive.google.com/uc?export=download&id=1TQbrqMSC5K_SNx9tta1Tlhtg8flSIGaJ"
unzip records.zip

To run few-shot evaluation:

python -m eval agent.type=few_shot eval.icl_episodes=5

Prompt formatting

Example prompt at the moment the agent starts playing the game, with eval.icl_episodes=1:

00 = Message(role=user, content=System Prompt: [], attachment=None)
01 = Message(role=user, content=****** START OF DEMONSTRATION EPISODE 1 ******, attachment=None)
02 = Message(role=user, content=Observation: [], attachment=None)
03 = Message(role=assistant, content=None, attachment=None)
04 = Message(role=user, content=Observation: [], attachment=None)
05 = Message(role=assistant, content=go forward, attachment=None)
06 = Message(role=user, content=Observation: [], attachment=None)
07 = Message(role=assistant, content=go forward, attachment=None)
08 = Message(role=user, content=Observation: [], attachment=None)
09 = Message(role=assistant, content=turn left, attachment=None)
10 = Message(role=user, content=****** END OF DEMONSTRATION EPISODE 1 ******, attachment=None)
11 = Message(role=user, content=****** Now it's your turn to play the game! ******, attachment=None)
12 = Message(role=user, content=Current Observation:
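
The layout above can be reproduced with a short sketch. It is not the FewShotAgent code from this PR: `Message` is a stand-in dataclass, and `build_icl_messages` and the `(observation, action)` step format are hypothetical names chosen for illustration. The first recorded action is `None` only to mirror message 03 in the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Message:
    # Stand-in for the agent's message type (assumption, not the real class).
    role: str
    content: Optional[str]
    attachment: Optional[object] = None


# One demonstration step: (observation text, recorded action or None).
Step = Tuple[str, Optional[str]]


def build_icl_messages(
    system_prompt: str,
    demonstrations: Sequence[Sequence[Step]],
    current_observation: str,
) -> List[Message]:
    """Hypothetical helper: lay out demonstrations in the order shown above."""
    messages = [Message("user", f"System Prompt: {system_prompt}")]
    for i, episode in enumerate(demonstrations, start=1):
        messages.append(Message("user", f"****** START OF DEMONSTRATION EPISODE {i} ******"))
        for observation, action in episode:
            # Each observation is followed by the action recorded for that step.
            messages.append(Message("user", f"Observation: {observation}"))
            messages.append(Message("assistant", action))
        messages.append(Message("user", f"****** END OF DEMONSTRATION EPISODE {i} ******"))
    messages.append(Message("user", "****** Now it's your turn to play the game! ******"))
    messages.append(Message("user", f"Current Observation: {current_observation}"))
    return messages


# Usage: reproduce the eval.icl_episodes=1 layout (observations elided as "[]").
demo = [("[]", None), ("[]", "go forward"), ("[]", "go forward"), ("[]", "turn left")]
messages = build_icl_messages("[]", [demo], "[]")
for idx, msg in enumerate(messages):
    print(f"{idx:02d} = {msg}")
```

With one demonstration of four steps this yields the same 13 messages (00 to 12) as the example; more episodes simply repeat the START/END-delimited section.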

Features

  • each demonstration has a corresponding mp4 file, which allows for quick visual inspection
  • FewShotAgent supports context caching, which can be enabled with agent.cache_icl=True
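
Caching pays off because the demonstration part of the prompt is identical at every step, and only the episode history at the end changes. A minimal sketch of that idea, assuming the prefix is built once and reused. `CachedICLPrefix` is a hypothetical helper, not what agent.cache_icl actually does. Hosted prompt caches typically match on an exact prefix, so keeping it byte-identical and first in the prompt matters.

```python
from typing import Callable, List, Optional, Tuple


class CachedICLPrefix:
    """Hypothetical helper: build the demonstration prefix once, reuse it every step."""

    def __init__(self, build_prefix: Callable[[], List[str]]) -> None:
        self._build_prefix = build_prefix
        self._prefix: Optional[Tuple[str, ...]] = None
        self.num_builds = 0  # how many times the (expensive) prefix was built

    def prefix(self) -> Tuple[str, ...]:
        if self._prefix is None:
            # Stored as a tuple so callers cannot mutate the shared prefix.
            self._prefix = tuple(self._build_prefix())
            self.num_builds += 1
        return self._prefix

    def prompt(self, episode_history: List[str]) -> List[str]:
        # Fixed prefix first, per-step history last: only the tail changes.
        return list(self.prefix()) + episode_history


cache = CachedICLPrefix(lambda: [
    "System Prompt: []",
    "****** START OF DEMONSTRATION EPISODE 1 ******",
    "Observation: []",
    "****** END OF DEMONSTRATION EPISODE 1 ******",
    "****** Now it's your turn to play the game! ******",
])
step_1 = cache.prompt(["Current Observation: a"])
step_2 = cache.prompt(["Observation: a", "go forward", "Current Observation: b"])
```

Both steps share the same five leading entries, and the prefix is built only once.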

Additional Notes:

  • all trajectories are loaded into the context, which can increase the cost of evaluation, especially for environments like NetHack
  • for TextWorld environments, we avoid putting the solution to the game being evaluated into the context
  • in principle, a similar strategy could be incorporated for other environments; for example, in NLE we could load trajectories corresponding to the same character
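
The two selection rules in the notes above can be sketched as follows. The same-character rule for NLE is only the idea proposed in the last bullet, not something this PR implements. `Demo`, `select_demos`, and the `env`/`task`/`character` fields are hypothetical. The role labels ("val", "wiz", "sam") are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Demo:
    env: str                          # e.g. "textworld" or "nle" (hypothetical labels)
    task: str                         # game/task the trajectory was recorded on
    character: Optional[str] = None   # NLE role, if any (hypothetical field)


def select_demos(
    pool: Sequence[Demo],
    env: str,
    task: str,
    k: int,
    character: Optional[str] = None,
    seed: int = 0,
) -> List[Demo]:
    candidates = [d for d in pool if d.env == env]
    if env == "textworld":
        # Never show a demonstration of the game currently being evaluated.
        candidates = [d for d in candidates if d.task != task]
    if env == "nle" and character is not None:
        same = [d for d in candidates if d.character == character]
        if same:  # fall back to any character when none match
            candidates = same
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


pool = [
    Demo("textworld", "coin"), Demo("textworld", "treasure"), Demo("textworld", "cooking"),
    Demo("nle", "nle", "val"), Demo("nle", "nle", "wiz"), Demo("nle", "nle", "val"),
]
tw_demos = select_demos(pool, "textworld", "coin", k=5)
nle_demos = select_demos(pool, "nle", "nle", k=5, character="val")
fallback = select_demos(pool, "nle", "nle", k=5, character="sam")
```

Capping `k` at the number of candidates also bounds the context cost mentioned in the first note.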

@BartekCupial force-pushed the feat/few_shot_learning branch from ee82177 to 542d224 December 4, 2024 07:54
@DavidePaglieri self-requested a review December 4, 2024 15:58
@DavidePaglieri (Contributor) left a comment

Looks good to me.

@BartekCupial merged commit 67a8d26 into balrog-ai:main Dec 5, 2024
4 checks passed