@@ -37,3 +37,74 @@ Two of the new format data readers are the ``python``, ``SMILES``, and
37 37 Several of these readers (SMILES and
38 38 :ref:`HDF5<sec:hdf5_data_reader>`) support the use of :ref:`sample
39 39 lists<sec:sample-lists>`.
40+
41+ "Really New" Data Subsystem
42+ ---------------------------
43+
44+ During execution LBANN will ingest one or more streams of data. There
45+ will be unique streams of data for each execution mode:
46+ - training
47+ - validation
48+ - tournament
49+ - testing
50+ - inference
51+
52+ Note that execution modes should become more flexible and should
53+ support arbitrary names.
54+
55+ The data stream object is responsible for keeping track of the "count"
56+ / state of that data stream for that execution context. For bounded /
57+ batched data streams, this is the current position within the stream
58+ (the index) and the total number of passes over the stream (the
59+ epoch).
60+
61+ For infinite streams the object will just maintain the index /
62+ position within the stream.
63+
64+ In both cases the object must track the "step" size (i.e., the
65+ mini-batch size). Additionally, because the data stream will be
66+ accessed in parallel, the object must track the position of each
67+ rank within the stream as an offset.
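
The bookkeeping described above can be sketched as a small state object. This is only an illustration in Python, not LBANN code: the class name ``DataStreamState``, its fields, and the even contiguous split of a mini-batch across ranks are all assumptions made for the example, and a partial final mini-batch is simply treated as the end of the pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataStreamState:
    """Hypothetical bookkeeping for one data stream in one execution mode."""

    step: int                     # mini-batch size consumed per advance
    num_ranks: int                # number of parallel ranks reading the stream
    length: Optional[int] = None  # total samples; None means an infinite stream
    index: int = 0                # position of the next unread sample
    epoch: int = 0                # completed passes (bounded streams only)

    def rank_offset(self, rank: int) -> int:
        """Start of this rank's slice, assuming an even contiguous split."""
        return self.index + rank * (self.step // self.num_ranks)

    def advance(self) -> None:
        """Move past one mini-batch, wrapping to a new epoch when bounded."""
        self.index += self.step
        if self.length is not None and self.index >= self.length:
            self.index = 0
            self.epoch += 1
```

A bounded stream built as ``DataStreamState(step=8, num_ranks=4, length=20)`` wraps back to index 0 and increments its epoch on the third call to ``advance()``, while a stream built without ``length`` only ever advances its index.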
68+
69+ ..
70+ Data source class file: The data source class tracks the stateful
71+ aspects of one logical stream of data.
72+ Data sources are either bounded or infinite
73+ data sources. The class is responsible for keeping track of state
74+ with respect to
75+
76+
77+ Sample list:
78+
79+ Track how to retrieve a data set from the outside world. This is
80+ typically a set of file locations for each sample as well as a
81+ count of how many samples are in the set.
82+
83+ Data coordinator:
84+
85+ Responsible for managing one or more data streams for each execution
86+ context. It is
87+
88+
89+ Data reader / loader:
90+
91+ Function to ingest bits from outside and place them into an in-memory
92+ object that is managed by the data coordinator.
93+
94+ Data store:
95+ In-memory data repository for holding samples that have been read in.
96+
97+ io_data_buffer:
98+ Holds the sample being fetched, or a future for it.
99+
100+ Data packer:
101+ Copies data fields from Conduit nodes and maps them to Hydrogen
102+ matrices. Specific to a data set.
103+
104+ Data set:
105+
106+ Composed of:
107+ - data reader
108+ - data stream
109+ - sample list
110+ - data packer