When a dataset is large and has a large number of columns, `scan.execute(scan_definition, df)` fails with a Spark OOM error on the driver while collecting the computed metrics. A more meaningful error message here would avoid misleading the developer and tell them that the final result is too large and the scan should be filtered or split.
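
A minimal sketch of a caller-side workaround, assuming the `sodaspark` import path; the column-count threshold `MAX_SCAN_COLUMNS` is a hypothetical value, not a documented limit, and the Py4J catch only helps when the driver OOM actually surfaces through Py4J rather than killing the driver outright:

```python
from py4j.protocol import Py4JJavaError
from pyspark.sql import DataFrame
from sodaspark import scan  # assumed import path for scan.execute

# Hypothetical threshold -- not a documented limit; tune to your driver memory.
MAX_SCAN_COLUMNS = 200

def safe_scan(scan_definition: str, df: DataFrame):
    """Run a scan, failing fast with an actionable message instead of a raw driver OOM."""
    if len(df.columns) > MAX_SCAN_COLUMNS:
        raise ValueError(
            f"DataFrame has {len(df.columns)} columns; the collected metrics may "
            f"exceed driver memory. Filter the columns or split the work into "
            f"scans of at most {MAX_SCAN_COLUMNS} columns each."
        )
    try:
        return scan.execute(scan_definition, df)
    except Py4JJavaError as e:
        # A driver-side java.lang.OutOfMemoryError can surface through Py4J
        # when the metric results are collected back to the driver.
        if "OutOfMemoryError" in str(e):
            raise MemoryError(
                "Scan metrics were too large to collect on the driver; "
                "filter or split the dataset and retry."
            ) from e
        raise
```

A similar pre-collection check inside the library itself could raise the clearer message before the metrics are ever collected to the driver.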