-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Conversation
Your PR contains a change to a task. Please paste the results of the following command into a comment: python tests/datatests/test_new_tasks.py |
Still marked as draft but a quick eyeball, everything looks great to me! |
The only thing I was unsure about was the numbers, they have references to movie titles so I wasn't sure if I should substitute the numbers with those titles or leave it as is. |
I think we'd have to go back to the original paper and figure out why the data is like that. Do we know? |
I believe it was to make tagging movies easier and to match mentioned-movies to specific movie names. The examples they have (https://redialdata.github.io/website/examples) all use the actual movie names rather than these numbers, so maybe I should replace those numbers with actual titles then. |
two versions of dataset? But yes, like this it isn't going to be very useful, I would say the one with the real names will make more sense.
…On Tue, May 5, 2020 at 8:21 PM Dom Rigoglioso ***@***.***> wrote:
I believe it was to make tagging movies easier and to match
mentioned-movies to specific movie names. The examples they have (
https://redialdata.github.io/website/examples) all use the actual movie
names rather than these numbers, so maybe I should replace those numbers
with actual titles then.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2630 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACUOJ6FOYY6DUOUVTROEPZLRQCUPVANCNFSM4MZ2MECQ>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice job! just a few nits
parlai/tasks/redial/agents.py
Outdated
|
||
def __init__(self, opt, shared=None): | ||
super().__init__(opt, shared) | ||
jsonl_path = _path(opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this has other files besides jsonl
s, maybe just datapath
?
parlai/tasks/redial/agents.py
Outdated
def _setup_data(self, jsonl_path): | ||
train_path = os.path.join(jsonl_path, 'train_data.jsonl') | ||
test_path = os.path.join(jsonl_path, 'test_data.jsonl') | ||
valid_split = 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this number come from? could you perhaps leave a comment? (i.e. if this is specified in the paper)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They split the data set up 80/10/10, so I assumed the test data was split 50/50. Looking at this now this would be closer to 90/5/5, so I think it might be better to get valid data from the train data and leave the test data as is, since that would be closer to 80/10/10.
parlai/tasks/task_list.py
Outdated
"id": "ReDial", | ||
"display_name": "ReDial", | ||
"task": "redial", | ||
"tags": ["All", "ChitChat"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also add the Goal
tag since you could see movie recommendation as task/goal-oriented
* redial dataset * fix error where initiator in convo speaks last * added chitchat tag * fix task list description * map @Number to movie titles * deleted comment * add shared, fix end of odd-length episodes no reply * nits/data split Co-authored-by: Stephen Roller <[email protected]>
Patch description
Adding ReDial dataset to tasks, as mentioned in #492
Testing steps
Ran
display_data.py
and checked that conversations were correct. One episode from each below.Logs
python3 examples/display_data.py -t redial --datatype train
python3 examples/display_data.py -t redial --datatype valid
python3 examples/display_data.py -t redial --datatype test
Other Information
The dataset only had a test set and no valid set, so I split the test up 50/50 into valid/test.
Data tests (if applicable)
If you added a new teacher, you will be asked to run
python tests/datatests/test_new_tasks.py
.