-
Notifications
You must be signed in to change notification settings - Fork 95
Make field more robust with None/nan repeated input #757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
marcenacp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| @@ -0,0 +1,3 @@ | |||
| {"default/custom_instruction": "None", "default/topic": "None", "default/model_name": "None", "default/model": "None", "default/skip_prompt_formatting": false, "default/category": "orca", "default/views": null, "default/language": "None", "default/id": "None", "default/title": "None", "default/idx": "None", "default/hash": "None", "default/avatarUrl": "None", "default/system_prompt": "None", "default/source": "airoboros2.2"} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are some of the values "None" and some of the values null?
| else: | ||
| raise ValueError(f"`format` only applies to dates. Got {field.data_type}") | ||
| elif transform.separator is not None: | ||
| elif transform.separator is not None and not _is_na(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the is_na case be handled at the top of the function?
It seems like it could also be a problem for other transform (regex, json_path, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
| [{field.id: v} for v in value] if not _is_na(value) else [{field.id: value}] | ||
| ) | ||
| else: | ||
| if not _is_na(value) and len(value) != len(result[parent_id]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this function be simplified? Something like:
value = [{field.id: None}] if _is_na(value) else value
existing_value = result.get(parent_id, value)
if not _is_na(value) and len(value) != len(existing_value):
raise ValueError(...)
for i in range(existing_value):
result[parent_id][i][field.id] = None if _is_na(value) else value[i]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified it a bit, hope it goes in the direction you had in mind!
| value = ( | ||
| [_cast_value(self.node.ctx, v, field.data_type) for v in value] | ||
| if not _is_na(value) | ||
| else value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this function benefit an early return as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand (we return result, not value) but I changed it to a if..elif..else, it should be more readable now :)
marcenacp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve to unblock but left a few comments.
Uh oh!
There was an error while loading. Please reload this page.