When a dataset is large and has a large number of columns, `scan.execute(scan_definition, df)` fails with a Spark OOM error on the driver while collecting the computed metrics. A more meaningful error message here would avoid misleading the developer and tell them that the final result is too large and the scan should be filtered or split.
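
A minimal sketch of a caller-side workaround, assuming the `sodaspark` import path; the column-count threshold `MAX_SCAN_COLUMNS` is a hypothetical value, not a documented limit, and the Py4J catch only helps when the driver OOM actually surfaces through Py4J rather than killing the driver outright:

```python
from py4j.protocol import Py4JJavaError
from pyspark.sql import DataFrame
from sodaspark import scan  # assumed import path for scan.execute

# Hypothetical threshold -- not a documented limit; tune to your driver memory.
MAX_SCAN_COLUMNS = 200

def safe_scan(scan_definition: str, df: DataFrame):
    """Run a scan, failing fast with an actionable message instead of a raw driver OOM."""
    if len(df.columns) > MAX_SCAN_COLUMNS:
        raise ValueError(
            f"DataFrame has {len(df.columns)} columns; the collected metrics may "
            f"exceed driver memory. Filter the columns or split the work into "
            f"scans of at most {MAX_SCAN_COLUMNS} columns each."
        )
    try:
        return scan.execute(scan_definition, df)
    except Py4JJavaError as e:
        # A driver-side java.lang.OutOfMemoryError can surface through Py4J
        # when the metric results are collected back to the driver.
        if "OutOfMemoryError" in str(e):
            raise MemoryError(
                "Scan metrics were too large to collect on the driver; "
                "filter or split the dataset and retry."
            ) from e
        raise
```

A similar pre-collection check inside the library itself could raise the clearer message before the metrics are ever collected to the driver.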