Open
Hi, I made some changes here:
https://github.com/justRishi/django-raster/tree/improve-working-with-big-raster-files-needs-lot-of-ram
I am posting this as an issue because I am not sure whether you would like a pull request.
Description:
Problem statement
Issues with big raster files in AWS
- /tmp storage fills up because of the 20 GB Docker image limit in AWS
- it takes 7 to 8 hours to process a 10 GB Sentinel 2 tif
The query in parser.py that is written to the db is too big
In process_quadrant, bulk_create produces a query that is too big for postgres to execute: the query string is simply too large, and postgres raises an out of memory error.
Raster layers cannot be removed from the Django admin
Changes
- GDAL vsimem for creating tiles in memory (and not in tmp files) in parser.py:
  dest_file_name = os.path.join('/vsimem/', '{}.tif'.format(uuid.uuid4()))
- Added a second parameter to .bulk_create, with default 50, so the query written to the db is not too big. This second parameter (write in batches of that size) is new since Django (X?).
- Removed the following from admin.py:
  def has_delete_permission(self, request, obj=None):
      return False
Solves
- processing time for 10 GB S2 raster files reduced to 1.5 hours (when using 16 GB RAM and 2 CPUs)
- no "query string buffer is too big" errors from postgres
Drawbacks of using vsimem:
- vsimem needs a lot more memory to process files; when there is not enough RAM, celery crashes
- vsimem only seems to work well with the all_in_one parameter set (RASTER_PARSE_SINGLE_TASK = True)
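Since everything under /vsimem/ lives in RAM, one way to keep memory use bounded might be to release each in-memory tile as soon as it has been handled. The following is only a sketch: the helper name `vsimem_file` and its `unlink` argument are hypothetical and not part of django-raster. In a real setup `unlink` would presumably be `gdal.Unlink` from `osgeo`, and whether the linked branch already does this kind of cleanup is not stated here.

```python
import os
import uuid
from contextlib import contextmanager


@contextmanager
def vsimem_file(unlink, suffix=".tif"):
    """Yield a unique /vsimem/ path and always release it afterwards.

    `unlink` is any callable that deletes the in-memory file. With GDAL
    installed this would presumably be `gdal.Unlink` (an assumption here;
    it is passed in so the sketch itself has no GDAL dependency).
    """
    # Same naming scheme as in the change described above.
    path = os.path.join("/vsimem/", "{}{}".format(uuid.uuid4(), suffix))
    try:
        yield path
    finally:
        # Runs even if tile creation fails, so the memory is not leaked.
        unlink(path)
```

Hypothetical usage, assuming GDAL is available: `with vsimem_file(gdal.Unlink) as dest_file_name: ...` and write the tile to `dest_file_name` inside the block.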
Drawback of using the max batch-size parameter in bulk_create:
- it will not work with old Django versions.
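If support for older Django versions matters, a possible workaround is to split the objects into chunks manually and call bulk_create once per chunk, which keeps each query small without relying on the second parameter. This is a minimal sketch, not what the linked branch does; `iter_batches` and `bulk_create_in_batches` are hypothetical helper names, and `manager` stands for any object with a `bulk_create(objs)` method (for example a model's `objects` manager).

```python
from itertools import islice


def iter_batches(items, batch_size=50):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def bulk_create_in_batches(manager, objs, batch_size=50):
    """Call manager.bulk_create once per batch; return the number of objects passed."""
    total = 0
    for batch in iter_batches(objs, batch_size):
        # One INSERT query per batch instead of a single huge query.
        manager.bulk_create(batch)
        total += len(batch)
    return total
```

The default of 50 mirrors the value mentioned above; a larger batch size would mean fewer queries but a bigger query string per call.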