Inspired by Google's C4 (Colossal Clean Crawled Corpus), this repository is a collection of data cleaning scripts for processing CommonCrawl data. It includes Chinese data processing and cleaning methods from MassiveText.
shjwudp/c4-dataset-script