This repository was archived by the owner on Jan 15, 2024. It is now read-only.
for line in lines:
    if not line:
        break
    line = line.strip()
    # Empty lines are used as document delimiters
    if not line:
        results.append([])
    else:
        #<OMITTED FOR BREVITY...>
return results
The snippet above suggests that empty or null lines (e.g. "" or None) break out of the for-loop, so only the lines processed up to that point are returned, whereas lines that are empty only after stripping (e.g. " ") are used as document delimiters.
Could someone shed light on what the empty-line check followed by the break is meant to accomplish? Are empty/null lines used as delimiters?
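To make the two cases concrete, here is a minimal sketch that reproduces the control flow in question. The function name tokenize_lines_sketch and the tokenize parameter are hypothetical, and the else branch is an assumption, since the original branch is omitted; it only stands in for "do something with a non-empty line".

```python
def tokenize_lines_sketch(lines, tokenize=str.split):
    """Mimic the loop from the issue; `tokenize` is a hypothetical stand-in."""
    results = []
    for line in lines:
        # "" and None are both falsy, so the loop stops at the first one
        if not line:
            break
        line = line.strip()
        # Whitespace-only lines become "" after strip(): document delimiter
        if not line:
            results.append([])
        else:
            # Assumption: the omitted branch appends the tokenized line
            results.append(tokenize(line))
    return results


lines = ["first doc\n", " \n", "second doc\n", "", "never reached\n"]
result = tokenize_lines_sketch(lines)
print(result)  # [['first', 'doc'], [], ['second', 'doc']]
```

Note that lines read from a text file keep their trailing newline, so a blank line in a file arrives as "\n", which is truthy until strip() is applied, while "" is what readline() returns at end of file. Whether the break is meant as an end-of-input check is not confirmed by the snippet itself.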
Description
The code snippet at the top of this issue is from the function scripts.pretraining.bert.create_pretraining_data.tokenize_lines().