Is your feature request related to a problem? Please describe.
We are currently running into the problem that we have very large (3GB+) JSON files generated by ODD, but can't process them because we don't have enough RAM to parse the JSON.
I personally love JSON, but it seems like the format is not well-suited for the task (it's not streamable).
Now, you might ask, "why don't you guys just use the .txt file?" The problem is that it is only created after the scan is finished, including file size estimation. After scanning a large OD for ~6h yesterday, I had a couple million links, with over 10M links left in the queue for file size estimation. The actual URLs were already there, but the only way to save them was by hitting J to save as JSON.
Describe the solution you'd like
There are multiple features that would be useful for very large ODs:
- add a key command to save the .txt file early, before the scan has finished
  - this should be no problem at all and is simply a missing option/command at this point
- adopt a new file format that supports streaming parsers
  - think jsonlines, csv, whatever
  - it might also be a good idea to restructure the meta info of the scan and files in order to remove duplicate info and make the output files smaller and easier to work with
- while we're at it, an option for saving the reddit output as well as error logs to a separate file would also be appreciated! :D
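To illustrate what we mean by a streamable format, here is a rough sketch of how a jsonlines output could be written and consumed on our side. This is only an assumption about how such a file might look, not a proposal for the final structure: the field names `url` and `size` and the helper functions are hypothetical and are not taken from ODD's current output.

```python
import io
import json


def write_jsonl(records, fp):
    """Write one JSON object per line, so the file can be appended to during a scan."""
    for record in records:
        fp.write(json.dumps(record) + "\n")


def iter_jsonl(fp):
    """Yield records one at a time; only the current line is held in memory."""
    for line in fp:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def total_known_size(fp):
    """Sum the sizes of all records whose size has already been estimated."""
    total = 0
    for record in iter_jsonl(fp):
        size = record.get("size")  # hypothetical field; None if not yet estimated
        if size is not None:
            total += size
    return total


if __name__ == "__main__":
    # Hypothetical records; a real scan would produce millions of these.
    sample = [
        {"url": "http://example.com/a.iso", "size": 10},
        {"url": "http://example.com/b.iso", "size": None},
        {"url": "http://example.com/c.iso", "size": 5},
    ]
    buffer = io.StringIO()
    write_jsonl(sample, buffer)
    buffer.seek(0)
    print(total_known_size(buffer))
```

Because every line is a complete JSON document, a partially written file is still usable up to the last complete line, and links whose size is still unknown can be written right away instead of waiting for the estimation queue to drain.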
@MCOfficer and I would be glad to discuss the new file structure further, if you're so inclined :)