This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

ReDial Dataset #2630

Merged
merged 9 commits into from
May 22, 2020

Conversation

domrigoglioso
Contributor

@domrigoglioso domrigoglioso commented May 5, 2020

Patch description
Adding ReDial dataset to tasks, as mentioned in #492

Testing steps
Ran display_data.py and checked that the conversations were correct. One episode from each datatype is shown below.

Logs
python3 examples/display_data.py -t redial --datatype train

Hi there, how are you? I'm looking for movie recommendations
   I am doing okay. What kind of movies do you like?
I like animations like The Triplets of Belleville and Waking Life I also enjoy Mary and Max Anything artistic
   You might like The Boss Baby that was a good movie.
What's it about?
   It has Alec Baldwin it is about a baby that works for a company and gets adopted it is very funny
That seems like a nice comedy Do you have any animated recommendations that are a bit more dramatic? Like A Scanner Darkly  for example I like comedies but I prefer films with a little more depth
   That is a tough one but I will remember something Final Fantasy: The Spirits Within was a good one
Ooh that seems cool! Thanks for the input. I'm ready to submit if you are.
   It is animated, sci fi, and has action Glad I could help
Nice Take care, cheers!

python3 examples/display_data.py -t redial --datatype valid

- - - NEW EPISODE: redial - - -
Hi I am looking for a movie like Super Troopers
   You should watch Police Academy 
Is that a great one? I have never seen it. I have seen American Pie  I mean American Pie 
   Yes Police Academy  is very funny and so is Police Academy 2: Their First Assignment
It sounds like I need to check them out
   yes you will enjoy them
I appreciate your time. I will need to check those out. Are there any others you would recommend?
   yes Lethal Weapon
Thank you i will watch that too
   and also Beverly Hills Cop
Thanks for the suggestions.
   you are welcome and also 48 Hrs.
thanks goodbye

python3 examples/display_data.py -t redial --datatype test

Hello there. Looking for a good movie?
   Hello How are you? Always
I am well, you?
   I'm not picky I'm fine thank you
Well, I just saw Wind River   and it’s a good mystery/ drama
   Oooh. Sounds good!
I also enjoyed Avengers: Infinity War and Solo: A Star Wars Story Straight forward action movies
   Hmm. Nice! All options I have not seen but heard great things about I did see Star Wars  and really enjoyed that! :)
For a good good comedy, I recommend Game Night Well, hopefully you can enjoy one of those.


Other Information
The dataset only had a test set and no valid set, so I split the test up 50/50 into valid/test.

Data tests (if applicable)
If you added a new teacher, you will be asked to run
python tests/datatests/test_new_tasks.py.

python3 tests/datatests/test_new_tasks.py
.
----------------------------------------------------------------------
Ran 1 test in 641.127s

OK

@github-actions

github-actions bot commented May 5, 2020

Your PR contains a change to a task. Please paste the results of the following command into a comment:

python tests/datatests/test_new_tasks.py

@stephenroller
Contributor

Still marked as draft but a quick eyeball, everything looks great to me!

@domrigoglioso
Contributor Author

The only thing I was unsure about was the numbers: they reference movie titles, so I wasn't sure whether I should substitute the titles for the numbers or leave them as is.

@domrigoglioso domrigoglioso marked this pull request as ready for review May 5, 2020 23:56
@stephenroller
Contributor

I think we'd have to go back to the original paper and figure out why the data is like that. Do we know?

@domrigoglioso
Contributor Author

I believe it was to make tagging movies easier and to match mentioned-movies to specific movie names. The examples they have (https://redialdata.github.io/website/examples) all use the actual movie names rather than these numbers, so maybe I should replace those numbers with actual titles then.
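A minimal sketch of the substitution being discussed, assuming the numbers appear in the text as @-prefixed IDs and that a mapping from ID to title is available for each conversation. The function name `replace_movie_ids` and the `id_to_title` argument are hypothetical, and the IDs in the example are made up; this is not necessarily how the merged teacher does it.

```python
import re


def replace_movie_ids(text, id_to_title):
    """Replace @<digits> placeholders with titles (hypothetical helper).

    id_to_title maps ID strings to movie titles. Placeholders with no
    known title are left untouched rather than dropped.
    """

    def _substitute(match):
        movie_id = match.group(1)
        return id_to_title.get(movie_id, match.group(0))

    return re.sub(r'@(\d+)', _substitute, text)


# Made-up IDs, used only to illustrate the behavior.
example = replace_movie_ids(
    'I like @111 and @222', {'111': 'Waking Life'}
)
```

Leaving unknown IDs in place keeps the message readable and makes missing mappings easy to spot when eyeballing display_data.py output.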

@jaseweston
Contributor

jaseweston commented May 6, 2020 via email

Contributor

@klshuster klshuster left a comment

nice job! just a few nits


def __init__(self, opt, shared=None):
    super().__init__(opt, shared)
    jsonl_path = _path(opt)
Contributor

nit: looks like this has other files besides jsonls, maybe just datapath?

def _setup_data(self, jsonl_path):
    train_path = os.path.join(jsonl_path, 'train_data.jsonl')
    test_path = os.path.join(jsonl_path, 'test_data.jsonl')
    valid_split = 0.5
Contributor

where does this number come from? could you perhaps leave a comment? (i.e. if this is specified in the paper)

Contributor Author

They split the dataset 80/10/10, so I assumed the test data should be split 50/50. Looking at this now, that would be closer to 90/5/5, so I think it would be better to take the valid data from the train data and leave the test data as is, since that would be closer to 80/10/10.
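A rough sketch of that alternative, assuming the train file holds about 90% of all episodes, so that holding out one ninth of it gives roughly 80/10/10 overall. The function name `split_train_valid`, the default fraction, and the choice to take the tail of the list are all hypothetical; this is not the code merged in this PR.

```python
def split_train_valid(train_episodes, valid_fraction=1 / 9):
    """Hold out the tail of the train episodes as a validation set.

    valid_fraction is the share of *train* episodes to hold out. With
    train at ~90% of the data, 1/9 of train is ~10% of the total.
    The split is deterministic so repeated runs see the same episodes.
    """
    n_valid = round(len(train_episodes) * valid_fraction)
    cut = len(train_episodes) - n_valid
    return train_episodes[:cut], train_episodes[cut:]


# Stand-in data: 900 "train" episodes next to an untouched 100-episode test set.
episodes = list(range(900))
train, valid = split_train_valid(episodes)
```

With these stand-in sizes, the result is 800 train and 100 valid episodes alongside the 100 test episodes, which matches the 80/10/10 proportions mentioned above.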

"id": "ReDial",
"display_name": "ReDial",
"task": "redial",
"tags": ["All", "ChitChat"],
Contributor

we could also add the Goal tag since you could see movie recommendation as task/goal-oriented

@stephenroller stephenroller merged commit 99077d5 into master May 22, 2020
@stephenroller stephenroller deleted the redial branch May 22, 2020 04:51
Gnivom pushed a commit to Gnivom/ParlAI that referenced this pull request Jun 8, 2020
* redial dataset

* fix error where initiator in convo speaks last

* added chitchat tag

* fix task list description

* map @Number to movie titles

* deleted comment

* add shared, fix end of odd-length episodes no reply

* nits/data split

Co-authored-by: Stephen Roller <[email protected]>
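The commit notes about the initiator speaking last and odd-length episodes can be illustrated with a small sketch. It assumes, based on the logs above, that consecutive messages from the same speaker are joined into one turn and that a final turn with no reply is still emitted without a label. The function name `build_turns` and the `(speaker, text)` input format are hypothetical, not the teacher's actual interface.

```python
def build_turns(messages):
    """Turn a list of (speaker, text) messages into (text, label) pairs.

    Consecutive messages from the same speaker are merged with a space.
    If the episode has an odd number of merged turns, the last pair has
    label None instead of being dropped or raising an IndexError.
    """
    merged = []
    for speaker, text in messages:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + ' ' + text)
        else:
            merged.append((speaker, text))

    texts = [text for _, text in merged]
    pairs = []
    for i in range(0, len(texts), 2):
        label = texts[i + 1] if i + 1 < len(texts) else None
        pairs.append((texts[i], label))
    return pairs


# The initiator ("A") speaks last, so the final pair has no reply.
pairs = build_turns([
    ('A', 'Hi'),
    ('A', 'Any movie recommendations?'),
    ('B', 'You should watch Police Academy'),
    ('A', 'thanks goodbye'),
])
```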