Hi, guys!
Can you provide some tips for improving the performance of loading data from a Hive table into a Vertica cluster?
My Vertica use case is as follows.
Spark jobs prepare the data (transformations of business data) and write their results to a Hive table in ORC format. After that, I run a sh script that loads the data into Vertica with:
COPY vertica_schema.$my_table_new FROM $table_path_hdfs ON ANY NODE ORC;
The performance of this solution is terrible: for example, loading 1.5B rows takes more than 4 hours. How can I improve it?
Vertica cluster: 6 hosts
Hadoop (Hive) cluster: 30 hosts
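For scale, here is a back-of-the-envelope throughput estimate. It assumes the load takes exactly 4 hours and that rows are spread evenly over the 6 Vertica nodes. Both are simplifications: the real load takes longer, so the true rates are lower than these numbers.

```shell
#!/bin/sh
# Rough load-throughput estimate using integer shell arithmetic (results round down).
# Assumption: exactly 4 hours; the actual load takes longer, so real rates are lower.
rows=1500000000
seconds=$((4 * 3600))
vertica_nodes=6

total_rate=$((rows / seconds))                        # rows/s across the whole Vertica cluster
per_node_rate=$((rows / (seconds * vertica_nodes)))   # rows/s per node, assuming an even split

echo "total:    $total_rate rows/s"
echo "per node: $per_node_rate rows/s"
```

This works out to roughly 104,000 rows/s for the whole cluster, or about 17,000 rows/s per Vertica node, at best.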