zreplicate performance enhancements

On large datasets, the zfs get creation and zfs list calls that run when zreplicate starts take a very long time.  Each of them should query only the dataset subtree that zreplicate is going to manipulate and, if possible, the queries should run in parallel.
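
One possible shape for this, as a rough sketch rather than the actual zreplicate code: limit both queries to the dataset being replicated with -r, and launch them concurrently from a small thread pool. The function names (build_commands, run_command, query_subtree) and the exact output columns are assumptions made for illustration; zreplicate may need different columns, and running the commands on a remote host (for example over ssh) is not shown.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(dataset):
    """Return the two queries, each limited to `dataset` and its descendants."""
    return {
        # Names of every filesystem, volume and snapshot under `dataset`.
        "list": ["zfs", "list", "-H", "-r", "-t", "all", "-o", "name", dataset],
        # Creation time (numeric, via -p) of everything under `dataset`.
        "creation": ["zfs", "get", "-H", "-p", "-r", "-o", "name,value",
                     "creation", dataset],
    }


def run_command(argv):
    """Run one command and return its stdout as text (raises on failure)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def query_subtree(dataset, runner=run_command):
    """Run both queries concurrently and return {label: stdout}.

    `runner` is a hypothetical hook so the commands can be swapped out,
    e.g. for a wrapper that executes them on another host.
    """
    commands = build_commands(dataset)
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = {label: pool.submit(runner, argv)
                   for label, argv in commands.items()}
        return {label: future.result() for label, future in futures.items()}
```

Threads should be enough here because the time is spent waiting on the zfs subprocesses, not in Python itself. Calling query_subtree("tank/data") would then return the raw output of both commands, keyed by "list" and "creation" (tank/data is only a placeholder dataset name).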