Description
When using the register_pandas_dataframe() method as suggested by the tip here, I get a system memory error:
Error Code: ScriptExecution.ReadDataFrame.Unexpected
Failed Step: 1d7c20d4-6b70-4075-8b68-74f51264bf5b
Error Message: ScriptExecutionException was caused by ReadDataFrameException. Unexpected exception during ReadDataframeFromSocket.
Failed to read DataFrame from host. Exception of type 'System.OutOfMemoryException' was thrown.
When I run the same method with the same DataFrame from a Jupyter notebook, it works as expected. I made sure enough memory was available when running the script (the DataFrame is 1.42 MiB and 6 GB of RAM was free), so I don't think actual memory exhaustion is the cause. I know this method is experimental, so for now I have gone back to the older method, which works fine.
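The report does not say which older method is meant. As a rough sketch only, one route that predates register_pandas_dataframe() is to write the DataFrame to a CSV file, upload it to a datastore, and register a tabular dataset from that file. The helper name register_via_csv, the uploaded-frames folder, and the tabular_factory parameter below are hypothetical and exist only for illustration; the azureml-core calls assumed are Datastore.upload, Dataset.Tabular.from_delimited_files and Dataset.register.

```python
import os
import tempfile


def register_via_csv(df, datastore, workspace, name,
                     target_path="uploaded-frames", tabular_factory=None):
    """Register `df` as a tabular dataset by way of an uploaded CSV file.

    Assumption: this is only one possible "older method"; the report
    does not name the one actually used.
    """
    if tabular_factory is None:
        # Imported lazily so the helper can be defined without azureml-core.
        from azureml.core import Dataset
        tabular_factory = Dataset.Tabular

    file_name = f"{name}.csv"
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Serialize the DataFrame to a local CSV file, without the index column.
        df.to_csv(os.path.join(tmp_dir, file_name), index=False)
        # Upload the temporary folder to the datastore under target_path.
        datastore.upload(src_dir=tmp_dir, target_path=target_path,
                         overwrite=True, show_progress=False)

    # Build a TabularDataset that points at the uploaded file, then register it.
    dataset = tabular_factory.from_delimited_files(
        path=[(datastore, f"{target_path}/{file_name}")]
    )
    return dataset.register(workspace=workspace, name=name,
                            create_new_version=True)
```

With a real workspace, this would be called as register_via_csv(df, ws.get_default_datastore(), ws, "my-dataset"), where ws comes from Workspace.from_config(). Because the data travels as a file, this path does not appear to involve the DataFrame read step named in the error above.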
Another error that sometimes occurs in place of the out-of-memory error is a stream access validation error:
azureml.dataprep.api.errorhandlers.ExecutionError:
Error Code: ScriptExecution.ReadDataFrame.StreamAccess.Validation
Validation Error Code: Invalid
Validation Target: PreppyFile
Failed Step: bfd9a1d5-c01f-485a-8761-99cd0a41d0c3
Error Message: ScriptExecutionException was caused by ReadDataFrameException.
Failed to read Pandas DataFrame form Python host. Make sure Dataflow is created directly from the source Pandas DataFrame.
StreamAccessException was caused by ValidationException.
Trying to read an invalid file. Missing sentinel value in the beginning
| session_id=710cc9c3-4478-4be9-998c-0e4a009800f5
Again, this does not happen when calling the method from a Jupyter notebook, only when running a script on my local machine.
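For anyone trying to reproduce this, a minimal script along the following lines might be a starting point. It is a sketch under assumptions, not the exact code behind the errors above: the helper register_frame, the register-pandas-repro names, the tabular_factory parameter and the sample DataFrame are all made up for illustration, and main() assumes a config.json for the workspace is available locally.

```python
def register_frame(df, datastore, name,
                   target_path="register-pandas-repro", tabular_factory=None):
    """Call register_pandas_dataframe() on the given DataFrame."""
    if tabular_factory is None:
        # Imported lazily so the helper can be defined without azureml-core.
        from azureml.core import Dataset
        tabular_factory = Dataset.Tabular

    # target is a (datastore, relative path) tuple naming the upload location.
    return tabular_factory.register_pandas_dataframe(
        dataframe=df,
        target=(datastore, target_path),
        name=name,
        show_progress=False,
    )


def main():
    # Assumption: the workspace is described by a local config.json.
    import pandas as pd
    from azureml.core import Workspace

    workspace = Workspace.from_config()
    datastore = workspace.get_default_datastore()

    # Small illustrative frame; the DataFrame in the report was about 1.42 MiB.
    df = pd.DataFrame({"id": range(1000),
                       "value": [i * 0.5 for i in range(1000)]})

    dataset = register_frame(df, datastore, name="register-pandas-repro")
    print(dataset.name)
```

Calling main() once from a plain Python script and once from a notebook cell, with the same environment, would show whether the difference described above can be reproduced.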
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: 9b1aee32-5a1f-2ef1-ee76-30b17999db87
- Version Independent ID: f4785272-a81d-e9cc-c6a5-9d41c8dcd90b
- Content: azureml.data.dataset_factory.TabularDatasetFactory class - Azure Machine Learning Python
- Content Source: AzureML-Docset/stable/docs-ref-autogen/azureml-core/azureml.data.dataset_factory.TabularDatasetFactory.yml
- Service: machine-learning
- Sub-service: core
- GitHub Login: @DebFro
- Microsoft Alias: debfro