ReDial Dataset #2630

domrigoglioso · 2020-05-05T19:00:14Z

Patch description
Adding ReDial dataset to tasks, as mentioned in #492

Testing steps
Ran display_data.py and checked that the conversations were correct. One episode from each datatype is shown below.

Logs
python3 examples/display_data.py -t redial --datatype train

Hi there, how are you? I'm looking for movie recommendations
   I am doing okay. What kind of movies do you like?
I like animations like The Triplets of Belleville and Waking Life I also enjoy Mary and Max Anything artistic
   You might like The Boss Baby that was a good movie.
What's it about?
   It has Alec Baldwin it is about a baby that works for a company and gets adopted it is very funny
That seems like a nice comedy Do you have any animated recommendations that are a bit more dramatic? Like A Scanner Darkly  for example I like comedies but I prefer films with a little more depth
   That is a tough one but I will remember something Final Fantasy: The Spirits Within was a good one
Ooh that seems cool! Thanks for the input. I'm ready to submit if you are.
   It is animated, sci fi, and has action Glad I could help
Nice Take care, cheers!

python3 examples/display_data.py -t redial --datatype valid

- - - NEW EPISODE: redial - - -
Hi I am looking for a movie like Super Troopers
   You should watch Police Academy 
Is that a great one? I have never seen it. I have seen American Pie  I mean American Pie 
   Yes Police Academy  is very funny and so is Police Academy 2: Their First Assignment
It sounds like I need to check them out
   yes you will enjoy them
I appreciate your time. I will need to check those out. Are there any others you would recommend?
   yes Lethal Weapon
Thank you i will watch that too
   and also Beverly Hills Cop
Thanks for the suggestions.
   you are welcome and also 48 Hrs.
thanks goodbye

python3 examples/display_data.py -t redial --datatype test

Hello there. Looking for a good movie?
   Hello How are you? Always
I am well, you?
   I'm not picky I'm fine thank you
Well, I just saw Wind River   and it’s a good mystery/ drama
   Oooh. Sounds good!
I also enjoyed Avengers: Infinity War and Solo: A Star Wars Story Straight forward action movies
   Hmm. Nice! All options I have not seen but heard great things about I did see Star Wars  and really enjoyed that! :)
For a good good comedy, I recommend Game Night Well, hopefully you can enjoy one of those.

Other Information
The dataset only had a test set and no valid set, so I split the test up 50/50 into valid/test.

Data tests (if applicable)
If you added a new teacher, you will be asked to run
python tests/datatests/test_new_tasks.py.

python3 tests/datatests/test_new_tasks.py
.
----------------------------------------------------------------------
Ran 1 test in 641.127s

OK

github-actions · 2020-05-05T19:00:38Z

Your PR contains a change to a task. Please paste the results of the following command into a comment:

python tests/datatests/test_new_tasks.py

stephenroller · 2020-05-05T23:49:28Z

Still marked as draft, but from a quick eyeball everything looks great to me!

domrigoglioso · 2020-05-05T23:53:43Z

The only thing I was unsure about was the numbers: they are references to movie titles, so I wasn't sure whether I should substitute the titles for the numbers or leave them as is.

stephenroller · 2020-05-06T00:09:18Z

I think we'd have to go back to the original paper and figure out why the data is like that. Do we know?

domrigoglioso · 2020-05-06T00:20:51Z

I believe it was to make tagging movies easier and to match mentioned-movies to specific movie names. The examples they have (https://redialdata.github.io/website/examples) all use the actual movie names rather than these numbers, so maybe I should replace those numbers with actual titles then.

jaseweston · 2020-05-06T00:22:14Z

Two versions of the dataset? But yes, like this it isn't going to be very useful; I would say the one with the real names makes more sense.
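
The substitution discussed in this thread could be sketched roughly as follows. This is an illustration only, not the code from this PR: the function name `replace_movie_ids`, the `id_to_title` mapping, and the sample ID and title are all hypothetical, and it assumes mentions appear in the raw text as `@` followed by digits.

```python
import re


def replace_movie_ids(text, id_to_title):
    """Replace each @<digits> mention with its title, if one is known.

    `id_to_title` is a hypothetical dict mapping ID strings to titles.
    Unknown IDs are left untouched so nothing is silently dropped.
    """

    def lookup(match):
        # group(1) is the digits after '@'; fall back to the original mention.
        return id_to_title.get(match.group(1), match.group(0))

    return re.sub(r'@(\d+)', lookup, text)


# Hypothetical ID and title, for illustration only.
mentions = {'12345': 'Some Movie'}
print(replace_movie_ids('I am looking for a movie like @12345', mentions))
```

Leaving unknown IDs in place is a design choice for this sketch; a real teacher might prefer to raise an error or log a warning instead.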


klshuster

nice job! just a few nits

klshuster · 2020-05-21T21:09:13Z

parlai/tasks/redial/agents.py

+
+    def __init__(self, opt, shared=None):
+        super().__init__(opt, shared)
+        jsonl_path = _path(opt)


nit: looks like this has other files besides jsonls, maybe just datapath?

klshuster · 2020-05-21T21:10:40Z

parlai/tasks/redial/agents.py

+    def _setup_data(self, jsonl_path):
+        train_path = os.path.join(jsonl_path, 'train_data.jsonl')
+        test_path = os.path.join(jsonl_path, 'test_data.jsonl')
+        valid_split = 0.5


where does this number come from? could you perhaps leave a comment? (e.g. if this is specified in the paper)

domrigoglioso · 2020-05-21

They split the dataset 80/10/10, so I assumed the test data was meant to be split 50/50. Looking at this now, halving the test set would give something closer to 90/5/5, so I think it would be better to take the valid data from the train data and leave the test data as is, since that would be closer to 80/10/10.
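
The arithmetic behind this reply can be checked with a small sketch. It is not the PR's implementation: `carve_valid_from_train` is a hypothetical helper, and the 90/10 train/test sizes are an assumption read off the "closer to 90/5/5" remark rather than the real episode counts.

```python
def carve_valid_from_train(train_episodes, n_valid):
    """Hold out the last `n_valid` training episodes as a validation set."""
    if not 0 <= n_valid <= len(train_episodes):
        raise ValueError('n_valid must be between 0 and len(train_episodes)')
    split_at = len(train_episodes) - n_valid
    return train_episodes[:split_at], train_episodes[split_at:]


# Assumed sizes for illustration: 90 train episodes and 10 test episodes.
train = list(range(90))
test = list(range(90, 100))

# Option A: halve the test set -> 90 / 5 / 5.
half = len(test) // 2
print(len(train), half, len(test) - half)

# Option B: take a valid set the size of the test set out of train -> 80 / 10 / 10.
new_train, valid = carve_valid_from_train(train, n_valid=len(test))
print(len(new_train), len(valid), len(test))
```

Taking a fixed slice from the end keeps the split deterministic across runs; a shuffled split with a fixed seed would be another reasonable choice.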

klshuster · 2020-05-21T21:14:29Z

parlai/tasks/task_list.py

+        "id": "ReDial",
+        "display_name": "ReDial",
+        "task": "redial",
+        "tags": ["All", "ChitChat"],


we could also add the Goal tag since you could see movie recommendation as task/goal-oriented

* redial dataset
* fix error where initiator in convo speaks last
* added chitchat tag
* fix task list description
* map @Number to movie titles
* deleted comment
* add shared, fix end of odd-length episodes no reply
* nits/data split

Co-authored-by: Stephen Roller <[email protected]>
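
The "odd-length episodes" fix mentioned in the commit list concerns episodes where the first speaker also has the last word, as in the logged episodes that end on an unanswered message. One way to handle that case is sketched below. This is an assumption about the general approach, not the merged teacher's code: `build_turns` is a hypothetical helper, and representing a missing reply as `None` is a choice made for this sketch.

```python
def build_turns(utterances):
    """Pair alternating utterances into (text, label) turns.

    If the episode has an odd number of utterances, the final text
    has no reply, so its label is None.
    """
    turns = []
    for i in range(0, len(utterances), 2):
        text = utterances[i]
        label = utterances[i + 1] if i + 1 < len(utterances) else None
        turns.append((text, label))
    return turns


# Tail of the valid episode shown in the logs above.
episode = [
    'Thanks for the suggestions.',
    'you are welcome and also 48 Hrs.',
    'thanks goodbye',
]
for text, label in build_turns(episode):
    print(repr(text), '->', repr(label))
```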

domrigoglioso added 2 commits May 5, 2020 14:44

redial dataset

a35f0c2

fix error where initiator in convo speaks last

d8d1664

facebook-github-bot added the CLA Signed label May 5, 2020

domrigoglioso added 2 commits May 5, 2020 15:02

added chitchat tag

4e83339

fix task list description

fa13ae2

domrigoglioso marked this pull request as ready for review May 5, 2020 23:56

domrigoglioso added 3 commits May 6, 2020 16:34

map @Number to movie titles

50f7c99

deleted comment

a426627

add shared, fix end of odd-length episodes no reply

a18537e

domrigoglioso requested a review from klshuster May 21, 2020 21:02

klshuster approved these changes May 21, 2020


domrigoglioso and others added 2 commits May 21, 2020 17:38

nits/data split

e5ebe07

Merge branch 'master' into redial

b86c3ef

stephenroller merged commit 99077d5 into master May 22, 2020

stephenroller deleted the redial branch May 22, 2020 04:51
