Commit 3093425

Tkurth/stats revamp (#31)
* some improvements to data processing scripts, including multi-dimensional virtual concatenation, making it easier to extend existing datasets
* fixing tests
* rewritten stats calculation, now running on GPU and using NCCL for comms. Also, improved comms pattern
* cleanup
1 parent b179910 commit 3093425

2 files changed

+350
-208
lines changed

data_process/concatenate_dataset.py

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ def concatenate(input_dirs: List[str], output_file: str, metadata: dict, channel
     Parameters
     ----------
     input_dir: List[str]
-        directories to where the dataset files are located which are to be concatenated.
-        Files inside that list will concatenated in time and files from different lists will be concatenated in channel dimension.
+        List of directories directories in which the dataset files are located which are to be concatenated.
+        Files inside a sublist will be concatenated in time and files from different lists will be concatenated in channel dimension.
     output_file: str
         file name of the concatenated dataset, has to include the full path.
     metadata : dict
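
The nested-list convention described in the updated docstring can be illustrated with a minimal sketch. This is not the repository's implementation (the commit message refers to virtual concatenation of dataset files); it uses in-memory NumPy arrays as stand-ins for files, assumes a (time, channel, lat, lon) axis layout, and the helper name `concat_nested` is hypothetical.

```python
import numpy as np


def concat_nested(groups):
    """Concatenate a list of sublists of arrays.

    Arrays inside one sublist are joined along the time axis (assumed axis 0);
    the per-sublist results are then joined along the channel axis (assumed axis 1).
    """
    per_group = [np.concatenate(sublist, axis=0) for sublist in groups]
    return np.concatenate(per_group, axis=1)


# Sublist A: two "files" with 3 channels each, covering 2 and 3 time steps.
a1 = np.zeros((2, 3, 4, 5))
a2 = np.zeros((3, 3, 4, 5))
# Sublist B: two "files" with 1 channel each, covering the same time steps.
b1 = np.ones((2, 1, 4, 5))
b2 = np.ones((3, 1, 4, 5))

out = concat_nested([[a1, a2], [b1, b2]])
print(out.shape)  # (5, 4, 4, 5)
```

In this sketch, every sublist must cover the same total number of time steps and share the same spatial shape; otherwise the channel-wise concatenation raises an error.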
