-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unordered StridedRectangle, get_allowed_machine_views
#1458
base: repo-refactor
Are you sure you want to change the base?
Unordered StridedRectangle, get_allowed_machine_views
#1458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 35 of 41 files at r1, all commit messages.
Reviewable status: 35 of 41 files reviewed, 14 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/pcg/include/pcg/device_coordinates.struct.toml
line 1 at r1 (raw file):
# DeviceCoordinates is exclusive to machine_view.cc, must not be used outside of it.
I don't think this requirement is necessary anymore with the new design?
lib/pcg/include/pcg/strided_rectangle.struct.toml
line 17 at r1 (raw file):
"utils/hash/unordered_multiset.h", "utils/fmt/unordered_multiset.h", ]
utils/hash
and utils/fmt
are only needed for the implementation, not the declaration
Suggestion:
includes = [
"pcg/strided_rectangle_side.dtg.h",
"<unordered_set>",
]
src_includes = [
"utils/hash/unordered_multiset.h",
"utils/fmt/unordered_multiset.h",
]
lib/pcg/src/pcg/device_id.cc
line 29 at r1 (raw file):
int get_raw_id(device_id_t device_id) { if (get_device_type(device_id) == DeviceType::GPU) {
switch
might be cleaner? You can also use overload
if you'd prefer
Code quote:
if (get_device_type(device_id) == DeviceType::GPU) {
lib/pcg/src/pcg/device_id.cc
line 35 at r1 (raw file):
} else { assert(false && "Unsupported DeviceType"); return -1;
assert
s will get removed in non-debug builds
Suggestion:
throw mk_runtime_error(fmt::format("Unsupported DeviceType {} for device_id_t {}", get_device_type(device_id), device_id));
lib/pcg/test/src/pcg/machine_view.cc
line 66 at r1 (raw file):
CHECK(expected == result); } SUBCASE("get_last_device_id") {
Test functions separately (i.e., group by get_last_device_id
rather than by MachineView
tested on) except for particularly trivial utility functions
lib/pcg/test/src/pcg/machine_view.cc
line 70 at r1 (raw file):
} } SUBCASE("MachineView #2") {
Use better test names
lib/pcg/test/src/pcg/machine_view.cc
line 93 at r1 (raw file):
} TEST_CASE("MachineView make_1d_machine_view - GPU") {
The return type here isn't really necessary and is a bit confusing
Suggestion:
TEST_CASE("make_1d_machine_view - GPU") {
lib/utils/include/utils/containers/foldl.h
line 27 at r1 (raw file):
C remaining(it, c.end()); return foldl(remaining, init, func); }
Suggestion:
template <typename C, typename F, typename E = typename C::value_type>
auto foldl1(C const &c, F func) -> typename C::value_type {
if (c.empty()) {
throw mk_runtime_error(fmt::format("foldl1 received empty container: {}", c));
}
std::optional<E> result;
for (E const &e : c) {
if (!result.has_value()) {
result = e;
} else {
result = f(result.value(), e);
}
}
return result.value();
}
lib/utils/include/utils/containers/range.h
line 11 at r1 (raw file):
template <typename T> std::vector<T> range(T start, T end, T step = 1) {
C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible
Suggestion:
std::vector<int> range(int start, int end, int step = 1) {
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
template <typename T> std::unordered_multiset<T> replicate(std::size_t n, T const &element) {
Prefer int
--size_t
intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet
Code quote:
std::size_t
lib/utils/include/utils/containers/scanl.h
line 9 at r1 (raw file):
template <typename C, typename F, typename T> std::vector<T> scanl(C const &c, T init, F const &op) {
Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂
lib/utils/include/utils/containers/scanl.h
line 24 at r1 (raw file):
template <typename C, typename F> auto scanl1(C const &c, F op) { using T = typename C::value_type;
See implementation of fold1
for a more readable version
lib/utils/test/src/utils/containers/foldl.cc
line 9 at r1 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("foldl") {
I can see you're enjoying Haskell 😛
lib/utils/test/src/utils/containers/range.cc
line 45 at r1 (raw file):
std::vector<int> correct = {}; CHECK(result == correct); }
What's the expected behavior of range(0, 10, -1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 24 of 46 files reviewed, 14 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/pcg/include/pcg/device_coordinates.struct.toml
line 1 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think this requirement is necessary anymore with the new design?
Not a requirement anymore, but I think the user will generally reason about the position of the devices using device_id_t
, so it probably won't be used.
lib/pcg/include/pcg/strided_rectangle.struct.toml
line 17 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
utils/hash
andutils/fmt
are only needed for the implementation, not the declaration
Done.
lib/pcg/src/pcg/device_id.cc
line 29 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
switch
might be cleaner? You can also useoverload
if you'd prefer
Done.
lib/pcg/src/pcg/device_id.cc
line 35 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
assert
s will get removed in non-debug builds
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 66 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Test functions separately (i.e., group by
get_last_device_id
rather than byMachineView
tested on) except for particularly trivial utility functions
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 70 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Use better test names
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 93 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
The return type here isn't really necessary and is a bit confusing
Done.
lib/utils/include/utils/containers/foldl.h
line 27 at r1 (raw file):
C remaining(it, c.end()); return foldl(remaining, init, func); }
cool pattern !
lib/utils/include/utils/containers/range.h
line 11 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible
Done.
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
int
--size_t
intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet
Ahh makes sense thanks.
lib/utils/include/utils/containers/scanl.h
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂
Done.
lib/utils/include/utils/containers/scanl.h
line 24 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See implementation of
fold1
for a more readable version
Done.
lib/utils/test/src/utils/containers/foldl.cc
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I can see you're enjoying Haskell 😛
hahaha yes, I'm starting to see why you're so frustrated with C++. Functional code is so clean, wish the C++ syntax was a bit more accomodating...
lib/utils/test/src/utils/containers/range.cc
line 45 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's the expected behavior of
range(0, 10, -1)
?
empty range (this range is modelled after python's range
s, so I'd like to keep it consistent, but we can also change it to throw an error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 41 files at r1, 19 of 24 files at r2, all commit messages.
Reviewable status: 44 of 49 files reviewed, 32 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/include/compiler/machine_mapping.h
line 63 at r2 (raw file):
std::unordered_set<StartInvariantMachineView> get_allowed_start_invariant_machine_views( MachineSpecification const &machinespec,
Suggestion:
MachineSpecification const &machine_spec,
lib/compiler/src/machine_mapping.cc
line 378 at r2 (raw file):
} std::vector<int> get_tensor_parallel_degrees(ParallelTensorShape const &shape) {
Should not return a vector
as that imposes an order on ParallelTensorShape
's dims, should return either an unordered_set
or a custom dtgen struct
lib/compiler/src/machine_mapping.cc
line 385 at r2 (raw file):
} bool is_valid_machine_view(MachineView const &mv,
A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
tensor_degrees = filter(tensor_degrees, [](int degree) { return degree != 1; }); return without_order(mv_degrees) == without_order(tensor_degrees);
This returns a multiset
, right?
Code quote:
without_order(mv_degrees)
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
Suggestion:
std::vector<int> const &tensor_dims
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
I think a return type annotation here would really improve readability
Code quote:
[](std::vector<int> tensor_dims, int total_devices) {
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
int max_stride_upper_bound = (total_devices + 1) / product(transform(tensor_dims, [](int degree) { return degree - 1; }));
This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)
Code quote:
int max_stride_upper_bound =
(total_devices + 1) /
product(transform(tensor_dims, [](int degree) { return degree - 1; }));
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
(total_devices + 1) / product(transform(tensor_dims, [](int degree) { return degree - 1; })); std::unordered_multiset<std::vector<int>> strides = cartesian_product(
Should this be a stride_size_t
?
Code quote:
int
lib/compiler/src/machine_mapping.cc
line 410 at r2 (raw file):
}; std::vector<int> tensor_dims = filter(get_tensor_parallel_degrees(shape),
Seems like a common pattern, probably good to pull out a separate function?
lib/compiler/src/machine_mapping.cc
line 413 at r2 (raw file):
[](int degree) { return degree != 1; }); std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
Why not just have a get_num_devices
or something function for MachineSpecification
?
Code quote:
int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
lib/compiler/src/machine_mapping.cc
line 414 at r2 (raw file):
std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node; for (std::vector<int> stride :
Suggestion:
for (std::vector<int> const &stride :
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
for (std::vector<int> stride : candidate_strides(tensor_dims, total_devices)) { for (int start_id = 0; start_id < total_devices; start_id++) {
Seems like this loop is intuitively creating StartInvariantMachineView
s and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things like std::vector<int>
or std::vector<std::unordered_set<int>>
is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)
lib/compiler/src/machine_mapping.cc
line 437 at r2 (raw file):
get_candidate_machine_views(machinespec, shape); views = filter(views, [&](MachineView const &view) { return is_valid_machine_view(view, shape);
Suggestion:
return is_valid_machine_view(view, shape) && is_valid_machine_view(view, machinespec);
lib/compiler/test/src/machine_mapping.cc
line 9 at r2 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("get_allowed_machine_view") {
Suggestion:
TEST_CASE("get_allowed_machine_views") {
lib/compiler/test/src/machine_mapping.cc
line 12 at r2 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);
Suggestion:
MachineSpecification ms = MachineSpecification{5, 1, 1, 0, 0};
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
[[fields]] name = "coords" type = "::FlexFlow::FFOrdered<int>"
I don't think FFOrdered
should be used here, since there's no correlation between MachineView
dim ordering and tensor dimension ordering. FFOrdered
represents things that are in the same order as the dims in a TensorShape
(or the shard dims in a ParallelTensorShape
)
lib/pcg/include/pcg/machine_view.h
line 20 at r2 (raw file):
size_t num_dims(MachineView const &mv); size_t num_devices(MachineView const &mv); size_t get_size(MachineView const &mv);
What is get_size
here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 9 at r2 (raw file):
namespace FlexFlow { MachineView to_start_dependent(StartInvariantMachineView const &mv,
Suggestion:
MachineView machine_view_from_start_invariant(StartInvariantMachineView const &mv,
lib/pcg/include/pcg/start_invariant_machine_view.h
line 11 at r2 (raw file):
MachineView to_start_dependent(StartInvariantMachineView const &mv, device_id_t const &start_id); StartInvariantMachineView to_start_invariant(MachineView const &mv);
Suggestion:
StartInvariantMachineView start_invariant_from_machine_view(MachineView const &mv);
lib/pcg/include/pcg/strided_rectangle.h
line 15 at r2 (raw file):
private: std::vector<StridedRectangleSide> _sides;
I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them
Suggestion:
std::vector<StridedRectangleSide> sides;
lib/pcg/include/pcg/strided_rectangle.h
line 28 at r2 (raw file):
bool operator>=(StridedRectangle const &) const; std::vector<StridedRectangleSide> get_sides() const;
For pure getters (things where the result is actually a field) it's safe to return a const &
. That said, if you're ever in doubt just return a value type, as that's always safe 🙂
Suggestion:
std::vector<StridedRectangleSide> const &get_sides() const;
lib/pcg/include/pcg/strided_rectangle.h
line 29 at r2 (raw file):
std::vector<StridedRectangleSide> get_sides() const; };
Suggestion:
private:
std::tuple<decltype(sides) const &> tie() const;
friend struct std::hash<StridedRectangle>;
};
lib/pcg/src/pcg/strided_rectangle.cc
line 21 at r2 (raw file):
bool StridedRectangle::operator==(StridedRectangle const &other) const { return std::tie(this->_sides) == std::tie(other._sides);
Suggestion:
return this->tie() == other.tie();
lib/pcg/src/pcg/strided_rectangle.cc
line 85 at r2 (raw file):
result ^= std::hash<std::vector<::FlexFlow::StridedRectangleSide>>{}( x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2);
FYI this is actually a function in hash-utils
called hash_combine
, we just inline it in dtgen
(for not really any good reason, we just do at the moment)
lib/pcg/src/pcg/strided_rectangle.cc
line 86 at r2 (raw file):
x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result;
Suggestion:
return get_std_hash(x.tie());
lib/utils/include/utils/containers/cartesian_product.h
line 13 at r2 (raw file):
template <typename Container> auto cartesian_product(Container const &containers) {
Seems like this is always going to be called over a std::vector<SomeContainer>
, so maybe do that specialization to simplify this code a bit?
Code quote:
auto cartesian_product(Container const &containers) {
lib/utils/include/utils/containers/cartesian_product.h
line 26 at r2 (raw file):
} for (const auto &item : ordered[depth]) {
Suggestion:
ordered.at(depth)
lib/utils/include/utils/containers/cartesian_product.h
line 28 at r2 (raw file):
for (const auto &item : ordered[depth]) { current.push_back(item); recurse(current, depth + 1);
Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering
lib/utils/src/utils/containers/any_of.h
line 0 at r2 (raw file):
Rename to any_of.cc
?
lib/utils/test/src/utils/containers/cartesian_product.cc
line 58 at r2 (raw file):
{1, 2}, {1, 3}, {1, 3}, {1, 2}}; CHECK(result == correct); }
What if one container is empty and the others aren't? Might be a good test case to add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 63 files reviewed, 32 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/include/compiler/machine_mapping.h
line 63 at r2 (raw file):
std::unordered_set<StartInvariantMachineView> get_allowed_start_invariant_machine_views( MachineSpecification const &machinespec,
Done.
lib/compiler/src/machine_mapping.cc
line 378 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Should not return a
vector
as that imposes an order onParallelTensorShape
's dims, should return either anunordered_set
or a custom dtgen struct
changed to unordered_multiset
(but have to sort the multiset in get_candidate_machine_views
, since I need the ordering to be coherent between stride and dimension).
lib/compiler/src/machine_mapping.cc
line 385 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces
Done.
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This returns a
multiset
, right?
yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
Done.
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I think a return type annotation here would really improve readability
Done.
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)
I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could say max_stride_product = num_total_devices/num_devices_used_by_tensor
(where num_devices_used_by_tensor
is the product of the parallel dims) and so we could simply say that the max stride across any dimension is max_stride_product
, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Should this be a
stride_size_t
?
fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?
lib/compiler/src/machine_mapping.cc
line 410 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like a common pattern, probably good to pull out a separate function?
Done.
lib/compiler/src/machine_mapping.cc
line 413 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why not just have a
get_num_devices
or something function forMachineSpecification
?
Done.
lib/compiler/src/machine_mapping.cc
line 414 at r2 (raw file):
std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node; for (std::vector<int> stride :
Done.
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like this loop is intuitively creating
StartInvariantMachineView
s and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things likestd::vector<int>
orstd::vector<std::unordered_set<int>>
is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)
fixed it up a bit, is this enough? I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase. I could do something like using StrideVec = std::vector<stride_t>
to make it more readable, though I'm not sure it would make it more understandable.
lib/compiler/src/machine_mapping.cc
line 437 at r2 (raw file):
get_candidate_machine_views(machinespec, shape); views = filter(views, [&](MachineView const &view) { return is_valid_machine_view(view, shape);
Done.
lib/compiler/test/src/machine_mapping.cc
line 9 at r2 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("get_allowed_machine_view") {
Done.
lib/compiler/test/src/machine_mapping.cc
line 12 at r2 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);
Done.
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think
FFOrdered
should be used here, since there's no correlation betweenMachineView
dim ordering and tensor dimension ordering.FFOrdered
represents things that are in the same order as the dims in aTensorShape
(or the shard dims in aParallelTensorShape
)
Got it, so FFOrdered
is only for Tensor dims?
lib/pcg/include/pcg/machine_view.h
line 20 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What is
get_size
here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)
Obsolete and removed (originally it was the "volume", so product of all the sizes of the sides, which I thought I could use for doing bounds check, but it's now useless since the devices themselves can occupy a space smaller than that (e.g. imagine a 2D machine view with stride 2 and 2 devices per side, it technically occupies a 4x4 area but can also really fit into a 3x3)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 9 at r2 (raw file):
namespace FlexFlow { MachineView to_start_dependent(StartInvariantMachineView const &mv,
Done.
lib/pcg/include/pcg/start_invariant_machine_view.h
line 11 at r2 (raw file):
MachineView to_start_dependent(StartInvariantMachineView const &mv, device_id_t const &start_id); StartInvariantMachineView to_start_invariant(MachineView const &mv);
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 15 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 28 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For pure getters (things where the result is actually a field) it's safe to return a
const &
. That said, if you're ever in doubt just return a value type, as that's always safe 🙂
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 29 at r2 (raw file):
std::vector<StridedRectangleSide> get_sides() const; };
Done.
lib/pcg/src/pcg/strided_rectangle.cc
line 21 at r2 (raw file):
bool StridedRectangle::operator==(StridedRectangle const &other) const { return std::tie(this->_sides) == std::tie(other._sides);
Done.
lib/pcg/src/pcg/strided_rectangle.cc
line 86 at r2 (raw file):
x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result;
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 13 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like this is always going to be called over a
std::vector<SomeContainer>
, so maybe do that specialization to simplify this code a bit?
Yes you're right, fixed (not it's vector<Containers> -> unordered_multiset<vector<...>>
lib/utils/include/utils/containers/cartesian_product.h
line 26 at r2 (raw file):
} for (const auto &item : ordered[depth]) {
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 28 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering
I like the recursive version and found it simpler to implement.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 58 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if one container is empty and the others aren't? Might be a good test case to add
Done.
lib/utils/src/utils/containers/any_of.h
line at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rename to
any_of.cc
?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 66 files reviewed, 33 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/src/compiler/allowed_machine_views.cc
line 124 at r3 (raw file):
start_invariant_from_machine_view); }
Unsure where to put these functions, will we be using parallel_tensor_dim_idx
as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 33 unresolved discussions (waiting on @lockshaw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 50 unresolved discussions (waiting on @Marsella8)
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "TensorToMachineViewInjection"
Why is this not a surjection? Seems like all of the MachineView
dims need to be mapped to?
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 14 at r3 (raw file):
"utils/bidict/bidict.h", "utils/hash/unordered_map.h" ]
Suggestion:
includes = [
"pcg/machine_view_dim_idx.dtg.h",
"op-attrs/parallel_tensor_dim_idx.dtg.h",
"utils/bidict/bidict.h",
]
lib/compiler/src/compiler/allowed_machine_views.cc
line 36 at r3 (raw file):
} bool is_valid_machine_view(MachineView const &mv,
I don't see this function declared in allowed_machine_view.h
?
lib/compiler/src/compiler/allowed_machine_views.cc
line 124 at r3 (raw file):
Previously, Marsella8 wrote…
Unsure where to put these functions, will we be using
parallel_tensor_dim_idx
as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?
For now let's put get_parallel_dim_at_idx
and get_parallel_tensor_indices
into parallel_tensor_dim_idx.h
lib/compiler/src/compiler/allowed_machine_views.cc
line 152 at r3 (raw file):
std::unordered_set<machine_view_dim_idx> get_machine_view_indices(MachineView const &mv) {
Move to machine_view_dim_idx.h
lib/compiler/src/compiler/allowed_machine_views.cc
line 157 at r3 (raw file):
} bool is_valid_injection(TensorToMachineViewInjection const &injection,
Probably better to move this and the functions below to an tensor_to_machine_view_injection.h
file
lib/compiler/src/compiler/allowed_machine_views.cc
line 175 at r3 (raw file):
std::unordered_set<parallel_tensor_dim_idx> shape_indices = get_parallel_tensor_indices(shape); shape_indices = filter(shape_indices, [&](auto const idx) {
Suggestion:
(parallel_tensor_dim_idx const &idx)
lib/compiler/src/compiler/allowed_machine_views.cc
line 183 at r3 (raw file):
permutations(shape_indices)) { TensorToMachineViewInjection injection = TensorToMachineViewInjection(bidict(zip(sorted(mv_indices), p)));
Suggestion:
std::vector<machine_view_dim_idx> machine_view_dim_ordering = sorted(mv_indices);
std::unordered_set<TensorToMachineViewInjection> result;
for (std::vector<parallel_tensor_dim_idx> const &tensor_dim_ordering :
permutations(shape_indices)) {
TensorToMachineViewInjection injection =
TensorToMachineViewInjection(bidict(zip(machine_view_dim_ordering, tensor_dim_ordering)));
lib/compiler/test/src/allowed_machine_ views.cc
line 62 at r3 (raw file):
views.insert(mv); } return views;
Suggestion:
return unordered_set_of(transform(count(num_starts),
[&](int start) {
return MachineView{
device_id_t{gpu_id_t{start}},
StridedRectangle{{
StridedRectangleSide{num_points_t{2}, stride_t{stride1}},
StridedRectangleSide{num_points_t{3}, stride_t{stride2}}
}},
};
}));
lib/compiler/test/src/allowed_machine_ views.cc
line 64 at r3 (raw file):
return views; }; std::unordered_set<MachineView> correct;
Is it possible to instead consider a smaller MachineSpecification
where the set of MachineView
s is small enough that you don't need all this infrastructure to generate it? This testcase is a bit complicated and might be rather annoying to debug
lib/compiler/test/src/allowed_machine_ views.cc
line 136 at r3 (raw file):
make_2d_view(/*stride1*/ 3, /*stride2*/ 1), make_2d_view(/*stride1*/ 1, /*stride2*/ 3), make_2d_view(/*stride1*/ 1, /*stride2*/ 4)};
Suggestion:
std::unordered_set<StartInvariantMachineView> correct = {
make_2d_view(/*stride1*/ 1, /*stride2*/ 1),
make_2d_view(/*stride1*/ 2, /*stride2*/ 1),
make_2d_view(/*stride1*/ 1, /*stride2*/ 2),
make_2d_view(/*stride1*/ 3, /*stride2*/ 1),
make_2d_view(/*stride1*/ 1, /*stride2*/ 3),
make_2d_view(/*stride1*/ 1, /*stride2*/ 4),
};
lib/compiler/test/src/allowed_machine_ views.cc
line 140 at r3 (raw file):
std::unordered_set<StartInvariantMachineView> result = get_allowed_start_invariant_machine_views(ms, shape); CHECK(result == correct);
Suggestion:
std::unordered_set<StartInvariantMachineView> result =
get_allowed_start_invariant_machine_views(ms, shape);
CHECK(result == correct);
lib/compiler/test/src/allowed_machine_ views.cc
line 163 at r3 (raw file):
StridedRectangleSide{num_points_t(2), stride_t(4)}, StridedRectangleSide{num_points_t(3), stride_t(1)}}, }};
Suggestion:
TEST_CASE("get_all_tensor_to_machine_view_injections") {
ParallelTensorShape shape = ParallelTensorShape{
ParallelTensorDims{
FFOrdered<ShardParallelDim>{
ShardParallelDim{10, 3},
},
ReplicaParallelDimSet{
SumDegree{2},
DiscardCopyDegree{2},
},
},
DataType::FLOAT,
};
MachineView view =
MachineView{
device_id_from_index(0, DeviceType::GPU),
StridedRectangle{{
StridedRectangleSide{num_points_t(2), stride_t(1)},
StridedRectangleSide{num_points_t(2), stride_t(4)},
StridedRectangleSide{num_points_t(3), stride_t(1)},
}},
};
lib/compiler/test/src/allowed_machine_ views.cc
line 178 at r3 (raw file):
std::unordered_set<TensorToMachineViewInjection> result = get_all_tensor_to_machine_view_injections(view, shape); CHECK(correct == result);
Suggestion:
std::unordered_set<TensorToMachineViewInjection> correct = {
TensorToMachineViewInjection{b1}, TensorToMachineViewInjection{b2}};
std::unordered_set<TensorToMachineViewInjection> result =
get_all_tensor_to_machine_view_injections(view, shape);
CHECK(correct == result);
lib/compiler/test/src/machine_mapping.cc
line 7 at r3 (raw file):
TEST_SUITE(FF_TEST_SUITE) { // TEST_CASE("MachineMapping::combine") {
What's happening with this file?
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "parallel_tensor_dim_idx"
Naming convention-wise we usually either do ParallelTensorDimIdx
or parallel_tensor_dim_idx_t
, generally to allow parallel_tensor_dim_idx
to be a variable name
Suggestion:
name = "parallel_tensor_dim_idx_t"
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 10 at r3 (raw file):
"fmt", ] explicit_constructors = false
Prefer explicit_constructors = true
unless there's a good reason. What's the reason here?
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
Previously, Marsella8 wrote…
Got it, so
FFOrdered
is only for Tensor dims?
Yes, to distinguish between the FlexFlow/pytorch ordering of dimensions and the flipped legion ordering of dimensions
lib/pcg/include/pcg/device_coordinates.struct.toml
line 17 at r3 (raw file):
"utils/fmt/vector.h", ]
Suggestion:
includes = [
"<vector>",
]
src_includes = [
"utils/hash/vector.h",
"utils/fmt/vector.h",
]
lib/pcg/include/pcg/device_coordinates.struct.toml
line 20 at r3 (raw file):
[[fields]] name = "coords"
Suggestion:
name = "raw_coords"
lib/pcg/include/pcg/machine_view_dim_idx.struct.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "machine_view_dim_idx"
For convention we usually either use MachineViewDimIdx
or machine_view_dim_idx_t
so that machine_view_dim_idx
can be a variable name
Suggestion:
name = "machine_view_dim_idx_t"
lib/pcg/include/pcg/strided_rectangle.h
line 17 at r3 (raw file):
std::vector<StridedRectangleSide> sides; std::tuple<std::vector<StridedRectangleSide> const &> tie() const; friend struct std::hash<StridedRectangle>;
Usually we put member variables (i.e., sides
) into their own private:
or public:
section for readability
Suggestion:
private:
std::tuple<std::vector<StridedRectangleSide> const &> tie() const;
friend struct std::hash<StridedRectangle>;
lib/pcg/src/pcg/device_id.cc
line 36 at r3 (raw file):
default: throw mk_runtime_error( fmt::format("Unsupported DeviceType {}", get_device_type(device_id)));
Why remove logging of the device_id_t
?
lib/pcg/src/pcg/device_id.cc
line 43 at r3 (raw file):
switch (device_type) { case DeviceType::GPU: return device_id_t(gpu_id_t(idx));
Prefer {...}
initialization for consistency
Suggestion:
return device_id_t{gpu_id_t{idx}};
lib/pcg/src/pcg/machine_view.cc
line 33 at r3 (raw file):
sum(transform(zip(coefficients, as_vector(point.coords)), [](auto const pair) { return pair.first * pair.second; })) + get_raw_id(mv.start);
Suggestion:
size_t coord_offset = sum(transform(zip(coefficients, as_vector(point.coords)),
[](auto const pair) { return pair.first * pair.second; }));
size_t raw_id = get_raw_id(mv.start) + coord_offset;
lib/pcg/src/pcg/machine_view.cc
line 37 at r3 (raw file):
return ((get_device_type(mv) == DeviceType::CPU) ? device_id_t(cpu_id_t(raw_id)) : device_id_t(gpu_id_t(raw_id)));
Suggestion:
return device_id_from_index(raw_id, get_device_type(mv));
lib/pcg/src/pcg/machine_view.cc
line 43 at r3 (raw file):
std::vector<std::vector<int>> ranges = transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) { return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);
Pull the lambda out into a separate function (get_points
or something) in strided_rectangle_side.h
Probably also worth pulling out a get_coords
or something in strided_rectangle.h
Code quote:
transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) {
return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);
lib/pcg/src/pcg/machine_view.cc
line 47 at r3 (raw file):
std::unordered_set<DeviceCoordinates> devices_as_points = unordered_set_of( transform(cartesian_product(ranges), [](auto const &point) { return DeviceCoordinates(point); }));
Prefer concrete type
Suggestion:
(std::vector<int> const &point)
lib/pcg/src/pcg/machine_view.cc
line 55 at r3 (raw file):
} device_id_t get_last_device_id(MachineView const &mv) {
Suggestion:
device_id_t get_maximum_device_id(MachineView const &mv) {
lib/pcg/src/pcg/machine_view.cc
line 59 at r3 (raw file):
// transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) { // return s.stride.unwrapped; // }));
Delete comment
Code quote:
// DeviceCoordinates last_device = DeviceCoordinates(
// transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) {
// return s.stride.unwrapped;
// }));
lib/pcg/src/pcg/machine_view.cc
line 67 at r3 (raw file):
} std::vector<num_points_t> get_num_devices_per_dim(MachineView const &mv) {
Do MachineView
s have ordered sides?
Code quote:
std::vector<num_points_t>
lib/pcg/src/pcg/machine_view.cc
line 73 at r3 (raw file):
} std::vector<side_size_t> get_side_size_per_dim(MachineView const &mv) {
Do MachineView
s have ordered sides?
Code quote:
std::vector<side_size_t>
lib/pcg/src/pcg/strided_rectangle.cc
line 24 at r3 (raw file):
std::tuple<std::vector<StridedRectangleSide> const &> StridedRectangle::tie() const { return std::tie(sides);
Suggestion:
return std::tie(this->sides);
lib/pcg/src/pcg/strided_rectangle.cc
line 52 at r3 (raw file):
std::vector<StridedRectangleSide> const &StridedRectangle::get_sides() const { return sides;
Always use explicit this
to reduce ambiguity
Suggestion:
return this->sides;
lib/pcg/test/src/pcg/machine_view.cc
line 56 at r3 (raw file):
}}; gpu_id_t start(0); MachineView mv{device_id_t{start}, rect};
Suggestion:
MachineView mv = MachineView{
gpu_id_t{0},
StridedRectangle{{
StridedRectangleSide(num_points_t(2), stride_t{3}),
StridedRectangleSide(num_points_t(2), stride_t{2}),
}},
};
lib/pcg/test/src/pcg/machine_view.cc
line 59 at r3 (raw file):
SUBCASE("get_device_ids") { std::unordered_set<device_id_t> expected = make_gpu_device_ids({0, 2, 12, 14});
Why aren't the device ids here {0, 2, 3, 5}
? Or at the very least not {0, 2, 6, 8}
? I feel like I don't understand the calculation here
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 16 at r3 (raw file):
MachineView input = MachineView{start, rect}; MachineView result = machine_view_from_start_invariant(
This is a good property for a rapidcheck
test, but it would be good to also have a concrete test of each of the directions that isn't just testing invertibility
lib/pcg/test/src/pcg/strided_rectangle.cc
line 17 at r3 (raw file):
StridedRectangle r1 = StridedRectangle{{s1, s0}}; CHECK(r0 == r1); CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});
How much do we care that the sides are sorted vs just caring that there is a canonical order?
lib/utils/include/utils/bidict/bidict.h
line 25 at r3 (raw file):
} bidict(std::vector<std::pair<L, R>> init)
Any particular reason for adding this over the existing constructors?
lib/utils/include/utils/bidict/bidict.h
line 26 at r3 (raw file):
bidict(std::vector<std::pair<L, R>> init) : bidict(init.begin(), init.end()) {}
Suggestion:
explicit bidict(std::vector<std::pair<L, R>> init)
: bidict(init.begin(), init.end()) {}
lib/utils/include/utils/containers/cartesian_product.h
line 17 at r3 (raw file):
std::function<void(V &, size_t)> recurse = [&](V ¤t, size_t depth) { if (depth == containers.size()) {
Suggestion:
template <typename C, typename Elem = typename C::value_type>
std::unordered_multiset<std::vector<Elem>>
cartesian_product(std::vector<C> const &containers) {
std::unordered_multiset<std::vector<Elem>> result;
std::function<void(std::vector<Elem> &, size_t)> recurse = [&](std::vector<Elem> ¤t, size_t depth) {
if (depth == containers.size()) {
lib/utils/include/utils/containers/cartesian_product.h
line 22 at r3 (raw file):
} for (const auto &item : containers.at(depth)) {
Suggestion:
for (Elem const &item : containers.at(depth)) {
lib/utils/include/utils/containers/filter.h
line 53 at r3 (raw file):
std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f); return result; }
Suggestion:
template <typename Elem, typename F>
std::unordered_multiset<Elem> filter(std::unordered_multiset<Elem> const &m,
F const &f) {
std::unordered_multiset<Elem> result;
std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f);
return result;
}
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
template <typename C, typename V = std::vector<typename C::value_type>> auto permutations(C const &container) {
FYI there's also a lazy implementation of permutations
on the substitutions-fix
branch at https://github.com/lockshaw/FlexFlow/blob/substitutions-fix/lib/utils/include/utils/containers/get_all_permutations.h (note that having this version too is fine, just wanted to make you aware)
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
template <typename C, typename V = std::vector<typename C::value_type>> auto permutations(C const &container) {
Suggestion:
template <typename C, typename Elem = typename C::value_type>
std::unordered_set<std::vector<Elem>> permutations(C const &container) {
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
Previously, Marsella8 wrote…
Ahh makes sense thanks.
Not seeing this fix?
lib/utils/test/src/utils/containers/cartesian_product.cc
line 10 at r1 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("cartesian_product") { SUBCASE("empty") {
Why remove this test case?
lib/utils/test/src/utils/containers/permutations.cc
line 5 at r3 (raw file):
#include <doctest/doctest.h> #include <unordered_set> #include <vector>
Suggestion:
#include <unordered_set>
#include <vector>
#include "utils/fmt/vector.h"
#include "utils/fmt/unordered_set.h"
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
Previously, Marsella8 wrote…
yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)
Might be good to rename that function to multiset_of
to remove ambiguity
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
Previously, Marsella8 wrote…
I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could saymax_stride_product = num_total_devices/num_devices_used_by_tensor
(wherenum_devices_used_by_tensor
is the product of the parallel dims) and so we could simply say that the max stride across any dimension ismax_stride_product
, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).
Ah okay, yeah I'd add that explanation in as a comment. The code makes a lot more sense with that bit of context
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
Previously, Marsella8 wrote…
fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?
I don't think so, and generally we avoid casting anyway in favor of explicit construction
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase
That makes sense, but I think having a MultiDimensionalStride
or something like that would be fine, especially as it's a pretty simple/obvious concept and could be used elsewhere e.g., to move get_strided_rectangle
out into its own function that constructs a StridedRectangle
from a starting point, a number of repetitions in each dim, and a MultiDimensionalStride
fixed it up a bit, is this enough?
It's definitely a big improvement! I do think breaking it up into stages that produce different named data structures (i.e., first generating the candidate MultiDimensionalStride
s, then using them to generate the StartInvariantMachineView
s, and then using them to generate the MachineView
s) would help, along with a brief comment explaining the stride generation logic which is inherently a bit complicated. It might also be a good idea to add a brief comment to the candidate generation functions explaining the guarantees (if any) they provide on their outputs: for example, are each of the outputs guaranteed to have at least one valid MachineView
, or can there be outputs for which no valid MachineView
s exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 49 unresolved discussions (waiting on @lockshaw)
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 2 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why is this not a surjection? Seems like all of the
MachineView
dims need to be mapped to?
Changed the name to MachineViewToTensorMapping
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 14 at r3 (raw file):
"utils/bidict/bidict.h", "utils/hash/unordered_map.h" ]
placed in bidict.h
(wouldn't compile otherwise)
lib/compiler/src/compiler/allowed_machine_views.cc
line 36 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't see this function declared in
allowed_machine_view.h
?
added (their primary use is within get_allowed_machine_views
but is probably nice to have them available externally).
lib/compiler/src/compiler/allowed_machine_views.cc
line 152 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Move to
machine_view_dim_idx.h
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 157 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Probably better to move this and the functions below to an
tensor_to_machine_view_injection.h
file
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 175 at r3 (raw file):
std::unordered_set<parallel_tensor_dim_idx> shape_indices = get_parallel_tensor_indices(shape); shape_indices = filter(shape_indices, [&](auto const idx) {
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 183 at r3 (raw file):
permutations(shape_indices)) { TensorToMachineViewInjection injection = TensorToMachineViewInjection(bidict(zip(sorted(mv_indices), p)));
Done.
lib/compiler/test/src/allowed_machine_ views.cc
line 62 at r3 (raw file):
views.insert(mv); } return views;
Done.
lib/compiler/test/src/allowed_machine_ views.cc
line 64 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is it possible to instead consider a smaller
MachineSpecification
where the set ofMachineView
s is small enough that you don't need all this infrastructure to generate it? This testcase is a bit complicated and might be rather annoying to debug
I've simplified it a bit (still want to leave enough devices so that a couple of different strides can get generated, and manual enumeration might be too much).
lib/compiler/test/src/allowed_machine_ views.cc
line 136 at r3 (raw file):
make_2d_view(/*stride1*/ 3, /*stride2*/ 1), make_2d_view(/*stride1*/ 1, /*stride2*/ 3), make_2d_view(/*stride1*/ 1, /*stride2*/ 4)};
Done.
lib/compiler/test/src/allowed_machine_ views.cc
line 140 at r3 (raw file):
std::unordered_set<StartInvariantMachineView> result = get_allowed_start_invariant_machine_views(ms, shape); CHECK(result == correct);
Done.
lib/compiler/test/src/allowed_machine_ views.cc
line 163 at r3 (raw file):
StridedRectangleSide{num_points_t(2), stride_t(4)}, StridedRectangleSide{num_points_t(3), stride_t(1)}}, }};
Done.
lib/compiler/test/src/allowed_machine_ views.cc
line 178 at r3 (raw file):
std::unordered_set<TensorToMachineViewInjection> result = get_all_tensor_to_machine_view_injections(view, shape); CHECK(correct == result);
Done.
lib/compiler/test/src/machine_mapping.cc
line 7 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's happening with this file?
old test cases, i haven't touched them.
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 2 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Naming convention-wise we usually either do
ParallelTensorDimIdx
orparallel_tensor_dim_idx_t
, generally to allowparallel_tensor_dim_idx
to be a variable name
Done.
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 10 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
explicit_constructors = true
unless there's a good reason. What's the reason here?
no strong reason other than verbosiveness, changed it
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Yes, to distinguish between the FlexFlow/pytorch ordering of dimensions and the flipped legion ordering of dimensions
thanks
lib/pcg/include/pcg/device_coordinates.struct.toml
line 17 at r3 (raw file):
"utils/fmt/vector.h", ]
Done.
lib/pcg/include/pcg/device_coordinates.struct.toml
line 20 at r3 (raw file):
[[fields]] name = "coords"
Done.
lib/pcg/include/pcg/machine_view_dim_idx.struct.toml
line 2 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For convention we usually either use
MachineViewDimIdx
ormachine_view_dim_idx_t
so thatmachine_view_dim_idx
can be a variable name
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 17 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Usually we put member variables (i.e.,
sides
) into their ownprivate:
orpublic:
section for readability
Done.
lib/pcg/src/pcg/device_id.cc
line 36 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why remove logging of the
device_id_t
?
re-added.
lib/pcg/src/pcg/device_id.cc
line 43 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
{...}
initialization for consistency
Done.
lib/pcg/src/pcg/machine_view.cc
line 33 at r3 (raw file):
sum(transform(zip(coefficients, as_vector(point.coords)), [](auto const pair) { return pair.first * pair.second; })) + get_raw_id(mv.start);
Done.
lib/pcg/src/pcg/machine_view.cc
line 37 at r3 (raw file):
return ((get_device_type(mv) == DeviceType::CPU) ? device_id_t(cpu_id_t(raw_id)) : device_id_t(gpu_id_t(raw_id)));
Done.
lib/pcg/src/pcg/machine_view.cc
line 43 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Pull the lambda out into a separate function (
get_points
or something) instrided_rectangle_side.h
Probably also worth pulling out a
get_coords
or something instrided_rectangle.h
refactored (didn't use lambdas, but it's pretty straightforward now).
Didn't create get_coords since i would like to keep DeviceCoordinates
within machine_view.cc
(since its just there to make my life easier, shouldn't be used externally).
lib/pcg/src/pcg/machine_view.cc
line 47 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer concrete type
Done.
lib/pcg/src/pcg/machine_view.cc
line 55 at r3 (raw file):
} device_id_t get_last_device_id(MachineView const &mv) {
Done.
lib/pcg/src/pcg/machine_view.cc
line 59 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Delete comment
Done.
lib/pcg/src/pcg/machine_view.cc
line 67 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do
MachineView
s have ordered sides?
yes, (but the order is not chosen by the user, the sides are sorted at construction and that is the canonical ordering)
lib/pcg/src/pcg/machine_view.cc
line 73 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do
MachineView
s have ordered sides?
see above
lib/pcg/src/pcg/strided_rectangle.cc
line 24 at r3 (raw file):
std::tuple<std::vector<StridedRectangleSide> const &> StridedRectangle::tie() const { return std::tie(sides);
Done.
lib/pcg/src/pcg/strided_rectangle.cc
line 52 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Always use explicit
this
to reduce ambiguity
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 56 at r3 (raw file):
}}; gpu_id_t start(0); MachineView mv{device_id_t{start}, rect};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 59 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why aren't the device ids here
{0, 2, 3, 5}
? Or at the very least not{0, 2, 6, 8}
? I feel like I don't understand the calculation here
added an explanation, let me know if this is sufficient
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 16 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This is a good property for a
rapidcheck
test, but it would be good to also have a concrete test of each of the directions that isn't just testing invertibility
updated the tests, I think rapidcheck
is a bit overkill since the code itself is very simple.
lib/pcg/test/src/pcg/strided_rectangle.cc
line 17 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
How much do we care that the sides are sorted vs just caring that there is a canonical order?
we only care that it has a canonical order (which is enforced at construction), and it think sorted is the most unambiguous way to go about it.
lib/utils/include/utils/bidict/bidict.h
line 25 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Any particular reason for adding this over the existing constructors?
no strong reason, made it easier to construct a bidict given a zipped container
lib/utils/include/utils/bidict/bidict.h
line 26 at r3 (raw file):
bidict(std::vector<std::pair<L, R>> init) : bidict(init.begin(), init.end()) {}
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 17 at r3 (raw file):
std::function<void(V &, size_t)> recurse = [&](V ¤t, size_t depth) { if (depth == containers.size()) {
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 22 at r3 (raw file):
} for (const auto &item : containers.at(depth)) {
Done.
lib/utils/include/utils/containers/filter.h
line 53 at r3 (raw file):
std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f); return result; }
Done.
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
FYI there's also a lazy implementation of
permutations
on thesubstitutions-fix
branch at https://github.com/lockshaw/FlexFlow/blob/substitutions-fix/lib/utils/include/utils/containers/get_all_permutations.h (note that having this version too is fine, just wanted to make you aware)
nice thanks, substituted my version with yours.
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
template <typename C, typename V = std::vector<typename C::value_type>> auto permutations(C const &container) {
Done.
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Not seeing this fix?
Done.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 10 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why remove this test case?
Not sure if our function should accept an empty collection (mathematically its defined, is a single empty set, and the function actually returns that, but at the same time if somebody passed an empty collection is was probably unintentional) Let me know what you think.
lib/utils/test/src/utils/containers/permutations.cc
line 5 at r3 (raw file):
#include <doctest/doctest.h> #include <unordered_set> #include <vector>
remove the tests (replaced by get_all_permutations
)
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be good to rename that function to
multiset_of
to remove ambiguity
Done.
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Ah okay, yeah I'd add that explanation in as a comment. The code makes a lot more sense with that bit of context
cleaned up and added the explanation
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think so, and generally we avoid casting anyway in favor of explicit construction
got it thanks
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase
That makes sense, but I think having a
MultiDimensionalStride
or something like that would be fine, especially as it's a pretty simple/obvious concept and could be used elsewhere e.g., to moveget_strided_rectangle
out into its own function that constructs aStridedRectangle
from a starting point, a number of repetitions in each dim, and aMultiDimensionalStride
fixed it up a bit, is this enough?
It's definitely a big improvement! I do think breaking it up into stages that produce different named data structures (i.e., first generating the candidate
MultiDimensionalStride
s, then using them to generate theStartInvariantMachineView
s, and then using them to generate theMachineView
s) would help, along with a brief comment explaining the stride generation logic which is inherently a bit complicated. It might also be a good idea to add a brief comment to the candidate generation functions explaining the guarantees (if any) they provide on their outputs: for example, are each of the outputs guaranteed to have at least one validMachineView
, or can there be outputs for which no validMachineView
s exist?
Added MultiDimensionalStride
, and put get_strided_rectangle
in StridedRectangle
.
This + the explanation of candidate_strides
should be enough to make it understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring review until new device id calculation is added.
Reviewed 19 of 41 files at r4.
Reviewable status: 60 of 82 files reviewed, 20 unresolved discussions (waiting on @wmdi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 41 files at r4, 16 of 17 files at r6, all commit messages.
Reviewable status: 85 of 86 files reviewed, 31 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/src/compiler/allowed_machine_views.cc
line 44 at r6 (raw file):
bool is_valid_machine_view(MachineView const &mv, MachineSpecification const &machine_spec) { return false; // TODO: fix
Implement or mark as NOT_IMPLEMENTED()
Code quote:
return false; // TODO: fix
lib/compiler/src/compiler/allowed_machine_views.cc
line 67 at r6 (raw file):
*/ static std::unordered_set<MachineView> get_candidate_machine_views(MachineSpecification const &machine_spec,
Can any of this get pulled out into separate functions that make logical sense by themselves? Each statement is okay to read but the sheer number of statements is a bit intense and doesn't provide a lot of walking the reader through. If it really is this complicated that's fine (there are places where stuff just fundamentally needs a bunch of code that can't get decomposed), but I'd like to be convinced of that 🙂
lib/compiler/test/src/allowed_machine_views.cc
line 0 at r6 (raw file):
What's happening with this?
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 12 at r6 (raw file):
TEST_CASE("get_all_machine_view_to_tensor_mappings") { ParallelTensorShape shape = ParallelTensorShape{ ParallelTensorDims{
Suggestion:
SUBCASE("no possible mappings") {
...@pietro implement this ...
}
SUBCASE("multiple possible mappings") {
ParallelTensorShape shape = ParallelTensorShape{
ParallelTensorDims{
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 26 at r6 (raw file):
MachineView{DeviceCoordinates{{0, 0, 0}}, StridedRectangle{{ StridedRectangleSide{num_points_t(2), stride_t(1)},
For consistency
Suggestion:
num_points_t{2},
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 33 at r6 (raw file):
bidict<machine_view_dim_idx_t, parallel_tensor_dim_idx_t> b1 = { {machine_view_dim_idx_t(2), parallel_tensor_dim_idx_t{ff_dim_t(0)}},
For consistency
Suggestion:
{machine_view_dim_idx_t{2}, parallel_tensor_dim_idx_t{ff_dim_t{0}}},
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 37 at r6 (raw file):
parallel_tensor_dim_idx_t{ReplicaType::SUM}}, {machine_view_dim_idx_t(0), parallel_tensor_dim_idx_t{ReplicaType::DISCARD_COPY}}};
Might make this a bit more readable.
Suggestion:
machine_view_dim_idx_t mv_dim_0 = machine_view_dim_idx_t{0};
machine_view_dim_idx_t mv_dim_1 = machine_view_dim_idx_t{1};
machine_view_dim_idx_t mv_dim_2 = machine_view_dim_idx_t{2};
parallel_tensor_dim_idx_t pt_dim_0 = parallel_tensor_dim_idx_t{ff_dim_t{0}};
parallel_tensor_dim_idx_t pt_dim_sum = parallel_tensor_dim_idx_t{ReplicaType::SUM};
parallel_tensor_dim_idx_t pt_dim_eq = parallel_tensor_dim_idx_t{ReplicaType::DISCARD_COPY};
bidict<machine_view_dim_idx_t, parallel_tensor_dim_idx_t> b1 = {
{mv_dim_2, pt_dim_0},
{mv_dim_1, pt_dim_sum},
{mv_dim_0, pt_dim_eq},
};
lib/local-execution/test/src/test_local_cost_estimator.cc
line 69 at r6 (raw file):
std::vector<ParallelTensorAttrs>{weight_attrs}, std::vector<ParallelTensorAttrs>{output_attrs}, make_1d_machine_view(gpu_id_t{0}, gpu_id_t{1}));
I kinda like the clarify of having the GPU types in here--is there a reason this overload was removed? If you prefer to not wrap the argument types, it would be nice to force the DeviceType
to be passed explicitly so there's no ambiguity (yes we're pretty much always going to be using GPU ids, but as long as CPU ids are an option I'd rather it be 100% clear which one is being constructed).
Another option would be to rename this to make_1d_gpu_machine_view
if you like that better
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx_t.h
line 13 at r6 (raw file):
std::unordered_set<parallel_tensor_dim_idx_t> get_parallel_tensor_indices(ParallelTensorShape const &shape);
Suggestion:
get_parallel_tensor_dim_indices(ParallelTensorShape const &shape);
lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc
line 10 at r6 (raw file):
namespace FlexFlow { ParallelDim get_parallel_dim_at_idx(ParallelTensorShape const &shape,
Probably better to put these in parallel_tensor_shape.h
instead (at least to me that's where I'd go look for them if I wanted to find them, lmk if you disagree)
lib/pcg/include/pcg/device_coordinates.struct.toml
line 2 at r6 (raw file):
namespace = "FlexFlow" name = "DeviceCoordinates"
Makes reasoning about plurality in variable names easier 🙂
Suggestion:
name = "DeviceCoordinate"
lib/pcg/include/pcg/machine_specification_dimension.enum.toml
line 11 at r6 (raw file):
[[values]] name = "INTER"
Suggestion:
name = "INTERNODE"
lib/pcg/include/pcg/machine_specification_dimension.enum.toml
line 14 at r6 (raw file):
[[values]] name = "INTRA"
Suggestion:
name = "INTRANODE"
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 7 at r6 (raw file):
# "ord", "hash", # "json",
Any reason no json serialization? I'm not seeing anything here that's obviously not serializable, but I could totally be missing something
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 18 at r6 (raw file):
"utils/hash/unordered_map.h", "utils/fmt/unordered_map.h", ]
Suggestion:
includes = [
"pcg/machine_view.dtg.h",
"pcg/machine_view_dim_idx_t.dtg.h",
"pcg/machine_specification_dimension.dtg.h",
]
src_includes = [
"utils/hash/unordered_map.h",
"utils/fmt/unordered_map.h",
]
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 21 at r6 (raw file):
[[fields]] name = "raw_projection"
Suggestion:
name = "machine_view_dim_to_machine_spec_dim"
lib/pcg/src/pcg/machine_view.cc
line 38 at r6 (raw file):
std::unordered_set<DeviceCoordinates> device_coordinates = transform(raw_coordinates, [](std::vector<int> const &point) { return DeviceCoordinates(point);
Suggestion:
return DeviceCoordinates{point};
lib/pcg/src/pcg/machine_view.cc
line 47 at r6 (raw file):
} device_id_t get_device_id(MachineView const &mv,
This needs simplifying (pull out some lambdas and/or functions--for example, a function that adds DeviceCoordinate
s together would make a lot of sense, or maybe a function that gives you a number based on a stride
in a StridedRectangleDim
?). When you're dealing with wrapper types, instead of just unwrapping them in your function it's often better to write small wrapper functions that do the unwrapping for you (where those functions logically stand by themselves) and then use those, e.g., the functions in computation_graph.h
lib/pcg/src/pcg/machine_view.cc
line 110 at r6 (raw file):
devices_ids.insert(get_device_id(mv, coordinates, ms, projection)); } return devices_ids;
Suggestion:
return transform(get_device_coordinates(mv), [&](DeviceCoordinates const &c) {
return get_device_id(mv, c, ms, projection);
});
lib/pcg/test/src/pcg/machine_view.cc
line 33 at r6 (raw file):
} SUBCASE("get_num_devices_per_dim") { std::vector<num_points_t> expected = {
Prefer correct
for consistency
Suggestion:
std::vector<num_points_t> correct = {
lib/pcg/test/src/pcg/machine_view.cc
line 52 at r6 (raw file):
DeviceType::GPU}; SUBCASE("get_devices_coordinates") { std::unordered_set<DeviceCoordinates> expected = {
Why is this not { (0, 0), (0, 3), (2, 0), (2, 3) }
here? For a name like DeviceCoordinates
that's kinda what I would expect (i.e. that they're in the device space, i.e., MachineSpec
)? Maybe there are two concepts, MachineViewCoord
and DeviceCoord
that should be differentiated?
lib/pcg/test/src/pcg/machine_view.cc
line 56 at r6 (raw file):
DeviceCoordinates{{0, 1}}, DeviceCoordinates{{1, 0}}, DeviceCoordinates{{1, 1}}}};
Suggestion:
std::unordered_set<DeviceCoordinates> expected = {
DeviceCoordinates{{0, 0}},
DeviceCoordinates{{0, 1}},
DeviceCoordinates{{1, 0}},
DeviceCoordinates{{1, 1}},
};
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 17 at r6 (raw file):
DeviceType device_type = DeviceType::GPU; SUBCASE("To StartInvariantMachineView") {
Suggestion:
SUBCASE("start_invariant_from_machine_view") {
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 28 at r6 (raw file):
} SUBCASE("From StartInvariantMachineView") {
Suggestion:
SUBCASE("machine_view_from_start_invariant") {
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 50 at r6 (raw file):
machine_view_from_start_invariant(correct, start)); CHECK(correct == result); }
Might be a bit clearer
Suggestion:
SUBCASE("conversion is invertible") {
SUBCASE("MachineView -> StrideInvariant -> MachineView") {
MachineView correct = MachineView{start, rect, device_type};
MachineView result = machine_view_from_start_invariant(
start_invariant_from_machine_view(correct), start);
CHECK(correct == result);
}
SUBCASE("StrideInvariant -> MachineView -> StrideInvariant") {
StartInvariantMachineView correct =
StartInvariantMachineView{rect, device_type};
StartInvariantMachineView result = start_invariant_from_machine_view(
machine_view_from_start_invariant(correct, start));
CHECK(correct == result);
}
}
lib/pcg/test/src/pcg/strided_rectangle.cc
line 17 at r3 (raw file):
Previously, Marsella8 wrote…
we only care that it has a canonical order (which is enforced at construction), and it think sorted is the most unambiguous way to go about it.
There are two checks here, the first CHECK(r0 == r1)
seems to be that there is a canonical ordering, and a second CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});
that that canonical ordering is sorting. Maybe it would be better to have separate SUBCASE
s for them mentioning this? Not a big deal
lib/pcg/test/src/pcg/strided_rectangle.cc
line 18 at r6 (raw file):
CHECK(r0 == r1); CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1}); CHECK(r1.get_sides() != std::vector<StridedRectangleSide>{s1, s0});
Isn't this trivially true based on the prior check?
lib/utils/include/utils/bidict/bidict.h
line 25 at r3 (raw file):
Previously, Marsella8 wrote…
no strong reason, made it easier to construct a bidict given a zipped container
You adding this convinced me to add bidict_from_pairs
, use that instead 🙂 (I'm not a huge fan of having a bunch of constructors, at least to me it just makes the container conceptually awkward)
lib/utils/include/utils/containers/replicate.h
line 14 at r6 (raw file):
result.push_back(element); } return result;
Suggestion:
return std::vector<T>(n, element);
lib/utils/test/src/test_containers.cc
line 122 at r6 (raw file):
} TEST_CASE("unordered_multiset_of") {
Delete this test case
lib/utils/test/src/utils/containers/cartesian_product.cc
line 10 at r1 (raw file):
Previously, Marsella8 wrote…
Not sure if our function should accept an empty collection (mathematically its defined, is a single empty set, and the function actually returns that, but at the same time if somebody passed an empty collection is was probably unintentional) Let me know what you think.
I think it's fine to return an empty set in that case, it seems like a well-defined operation. I feel like this isn't a conceptually complicated enough function that I'd feel the need to add guardrails like a non-emptiness check (if the user of the function wants that it feels like they can add it)--but in either case, test-case wise I'd like the behavior well-defined by a corresponding SUBCASE
(either a CHECK
that it returns empty or a CHECK_THROWS
if you really think it should fail on empty input)
lib/utils/include/utils/containers/unordered_multiset_of.h
line 10 at r6 (raw file):
template <typename C, typename T = typename C::value_type> std::unordered_multiset<T> unordered_multiset_of(C const &c) { return {c.cbegin(), c.cend()};
Suggestion:
return {std::cbegin(c), std::cend(c)};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 41 files at r1, 5 of 24 files at r2, 34 of 41 files at r4, 17 of 17 files at r6, all commit messages.
Reviewable status: all files reviewed, 31 unresolved discussions (waiting on @Marsella8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 43 of 89 files reviewed, 30 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/src/compiler/allowed_machine_views.cc
line 44 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Implement or mark as
NOT_IMPLEMENTED()
implemented
lib/compiler/test/src/allowed_machine_views.cc
line at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's happening with this?
now re-enabled
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 12 at r6 (raw file):
TEST_CASE("get_all_machine_view_to_tensor_mappings") { ParallelTensorShape shape = ParallelTensorShape{ ParallelTensorDims{
added, throws an error in this case (since it means that the machine view was generated for another tensor)
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 26 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For consistency
Done.
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 33 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For consistency
Done.
lib/compiler/test/src/machine_view_to_tensor_mapping.cc
line 37 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might make this a bit more readable.
Done.
lib/local-execution/test/src/test_local_cost_estimator.cc
line 69 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I kinda like the clarify of having the GPU types in here--is there a reason this overload was removed? If you prefer to not wrap the argument types, it would be nice to force the
DeviceType
to be passed explicitly so there's no ambiguity (yes we're pretty much always going to be using GPU ids, but as long as CPU ids are an option I'd rather it be 100% clear which one is being constructed).Another option would be to rename this to
make_1d_gpu_machine_view
if you like that better
Removed the default GPU type. Before the interface was different since the start of the MachineView
was a device_id_t
, which implictly stored the device type. Now the start is a devicetype-less n-dimensional point, and thus the device type is stored separately in machineview as a separate attribute (lucklily ends up being a lot cleaner than the previous implementation)
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx_t.h
line 13 at r6 (raw file):
std::unordered_set<parallel_tensor_dim_idx_t> get_parallel_tensor_indices(ParallelTensorShape const &shape);
Done.
lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc
line 10 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Probably better to put these in
parallel_tensor_shape.h
instead (at least to me that's where I'd go look for them if I wanted to find them, lmk if you disagree)
I think the original rationale we agreed on is that we'd put them here since we weren't sure whether parallel_tensor_dim_idx_t
would be used more widely. If you think that this will be used for indexing ParallelTensorShape
throughout the codebase, I'll move it.
lib/pcg/include/pcg/machine_specification_dimension.enum.toml
line 11 at r6 (raw file):
[[values]] name = "INTER"
Done.
lib/pcg/include/pcg/machine_specification_dimension.enum.toml
line 14 at r6 (raw file):
[[values]] name = "INTRA"
Done.
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 7 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Any reason no json serialization? I'm not seeing anything here that's obviously not serializable, but I could totally be missing something
added json serialization
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 18 at r6 (raw file):
"utils/hash/unordered_map.h", "utils/fmt/unordered_map.h", ]
Done.
lib/pcg/include/pcg/machine_view_projection.struct.toml
line 21 at r6 (raw file):
[[fields]] name = "raw_projection"
Done.
lib/pcg/src/pcg/machine_view.cc
line 38 at r6 (raw file):
std::unordered_set<DeviceCoordinates> device_coordinates = transform(raw_coordinates, [](std::vector<int> const &point) { return DeviceCoordinates(point);
Done.
lib/pcg/src/pcg/machine_view.cc
line 110 at r6 (raw file):
devices_ids.insert(get_device_id(mv, coordinates, ms, projection)); } return devices_ids;
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 33 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
correct
for consistency
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 52 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why is this not
{ (0, 0), (0, 3), (2, 0), (2, 3) }
here? For a name likeDeviceCoordinates
that's kinda what I would expect (i.e. that they're in the device space, i.e.,MachineSpec
)? Maybe there are two concepts,MachineViewCoord
andDeviceCoord
that should be differentiated?
Currently the indexing is done "jumping along with the stride" (e.g 3x3 machine view, 2x2 points, 2x2 stride, we have (0,0) be in position (0, 0), (1,0) be in the real position (2, 0), etc...). Depends on whether we want to index the machineview directly (in which case it's better to have the current indexing) or not (in which case you'd prefer the second one). I personally like the second one best, but when we were discussing it a few weeks ago you had used the current one and had assumed you preferred it.
lib/pcg/test/src/pcg/machine_view.cc
line 56 at r6 (raw file):
DeviceCoordinates{{0, 1}}, DeviceCoordinates{{1, 0}}, DeviceCoordinates{{1, 1}}}};
Done.
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 17 at r6 (raw file):
DeviceType device_type = DeviceType::GPU; SUBCASE("To StartInvariantMachineView") {
Done.
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 28 at r6 (raw file):
} SUBCASE("From StartInvariantMachineView") {
Done.
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 50 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be a bit clearer
yes sorry
lib/pcg/test/src/pcg/strided_rectangle.cc
line 17 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
There are two checks here, the first
CHECK(r0 == r1)
seems to be that there is a canonical ordering, and a secondCHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});
that that canonical ordering is sorting. Maybe it would be better to have separateSUBCASE
s for them mentioning this? Not a big deal
also added a docstring
lib/pcg/test/src/pcg/strided_rectangle.cc
line 18 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Isn't this trivially true based on the prior check?
removed.
lib/utils/include/utils/containers/replicate.h
line 14 at r6 (raw file):
result.push_back(element); } return result;
wow thanks
lib/utils/test/src/test_containers.cc
line 122 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Delete this test case
Done.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 10 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I think it's fine to return an empty set in that case, it seems like a well-defined operation. I feel like this isn't a conceptually complicated enough function that I'd feel the need to add guardrails like a non-emptiness check (if the user of the function wants that it feels like they can add it)--but in either case, test-case wise I'd like the behavior well-defined by a corresponding
SUBCASE
(either aCHECK
that it returns empty or aCHECK_THROWS
if you really think it should fail on empty input)
makes sense, I think returning the empty is reasonable, readded the test case.
lib/pcg/include/pcg/device_coordinates.struct.toml
line 2 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Makes reasoning about plurality in variable names easier 🙂
Done.
lib/utils/include/utils/containers/unordered_multiset_of.h
line 10 at r6 (raw file):
template <typename C, typename T = typename C::value_type> std::unordered_multiset<T> unordered_multiset_of(C const &c) { return {c.cbegin(), c.cend()};
why is this preferrable? (just curious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 27 of 46 files at r7, 29 of 34 files at r8, 1 of 21 files at r9.
Reviewable status: 72 of 96 files reviewed, 30 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/include/compiler/allowed_machine_views.h
line 15 at r8 (raw file):
bool is_valid_machine_view(MachineView const &mv, TaskSpaceOperator const &task);
These two overloads seem nontrivially different(?), so giving them different (and clearer) names might help
Code quote:
bool is_valid_machine_view(MachineView const &mv,
TaskSpaceOperator const &task,
MachineSpecification const &ms);
bool is_valid_machine_view(MachineView const &mv,
TaskSpaceOperator const &task);
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
// TEST_SUITE(FF_CUDA_TEST_SUITE) { // TEST_CASE("Local Cost Estimator") {
What's going on here?
lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc
line 10 at r6 (raw file):
Previously, Marsella8 wrote…
I think the original rationale we agreed on is that we'd put them here since we weren't sure whether
parallel_tensor_dim_idx_t
would be used more widely. If you think that this will be used for indexingParallelTensorShape
throughout the codebase, I'll move it.
I don't know how widely parallel_tensor_dim_idx_t
will be used, but it's such an obvious thing to have conceptually that I think it makes sense to integrate into parallel_tensor_shape.h
, etc.
lib/pcg/include/pcg/machine_specification.h
line 19 at r8 (raw file):
bool is_valid_machine_space_coordinates(MachineSpecification const &ms, MachineSpaceCoordinate const &coord);
Suggestion:
bool is_valid_machine_space_coordinate(MachineSpecification const &ms,
MachineSpaceCoordinate const &coord);
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
[[fields]] name = "projection" type = "std::vector<::FlexFlow::MachineSpecificationDimension>"
Why are these separate vectors
rather than a single vector
?
lib/pcg/include/pcg/machine_view_dim_idx_t.h
line 9 at r8 (raw file):
namespace FlexFlow { std::vector<machine_view_dim_idx_t>
Why a vector
here?
lib/pcg/include/pcg/task_space_operator.h
line 12 at r8 (raw file):
std::unordered_set<TaskSpaceCoordinate> get_fragment_coordinates(TaskSpaceOperator const &task);
Why not just get_task_space_coordinates
?
lib/pcg/include/pcg/task_space_operator.h
line 15 at r8 (raw file):
TaskSpaceCoordinate get_maximum_fragment_coordinate(TaskSpaceOperator const &task);
Suggestion:
get_task_space_maximum_coordinate(TaskSpaceOperator const &task);
lib/pcg/include/pcg/task_space_operator.h
line 18 at r8 (raw file):
size_t num_dims(TaskSpaceOperator const &task); size_t num_fragments(TaskSpaceOperator const &task);
Clearer and doesn't require new terminology (i.e., "fragment")--not that I mind new terminology when necessary, just don't think it's necessary here
Suggestion:
size_t num_tasks(TaskSpaceOperator const &task);
lib/pcg/src/pcg/delete.c
line 0 at r8 (raw file):
What's going on here?
lib/pcg/src/pcg/machine_specification.cc
line 43 at r8 (raw file):
device_id_t get_device_id(MachineSpecification const &ms, MachineSpaceCoordinate const &coord) { assert(is_valid_machine_space_coordinates(ms, coord));
Might be better to throw an exception so that the out-of-bounds case can be tested? We can't really test that an assert
is fired
Code quote:
assert(is_valid_machine_space_coordinates(ms, coord));
lib/pcg/test/src/pcg/machine_view.cc
line 35 at r8 (raw file):
MachineSpecification ms = MachineSpecification{ 1, 6, 6, 0, 0}; // Single node with 6 GPUs (0,1,2,3,4,5)
Add argument name comments?
Code quote:
MachineSpecification ms = MachineSpecification{
1, 6, 6, 0, 0}; // Single node with 6 GPUs (0,1,2,3,4,5)
lib/utils/include/utils/containers/get_all_permutations.h
line 105 at r8 (raw file):
template <typename T> struct permutations_with_repetition_container {
I'd move this to a separate get_all_permutations_with_repetitions.h
file unless there's a strong reason it should be here
Code quote:
struct permutations_with_repetition_container {
lib/utils/include/utils/containers/transform.h
line 50 at r8 (raw file):
template <typename F, typename In, typename Out = decltype(std::declval<F>()(std::declval<In>()))>
Suggestion:
typename Out = std::invoke_result_t<F, In>>
lib/utils/include/utils/containers/transform.h
line 53 at r8 (raw file):
std::set<Out> transform(std::set<In> const &v, F const &f) { std::set<Out> result; for (auto const &e : v) {
Suggestion:
for (In const &e : v) {
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 56 at r8 (raw file):
TEST_CASE("get_all_permutations_with_repetition") { SUBCASE("container size = 3, n = 1") {
Come up with better testcase descriptions. The goal of these should be a high level discussion of semantic properties and/or the failure cases they're trying to check--"container size = 3, n = 1" is something I can trivially see from the code and isn't really (at least in the current wording) a high-level idea. If there's a reason this testcase is here (say, testing the limiting case that we have only one element), then the description should be clarified. If there's not a good high-level reason for that specific testcase, it should probably be removed 🙂
Code quote:
SUBCASE("container size = 3, n = 1") {
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 60 at r8 (raw file):
std::unordered_multiset<std::vector<int>> result = unordered_multiset_of(get_all_permutations_with_repetition(input, 1));
I'm guessing this terminology is from https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition--that said, the naming is somewhat confusing (as even wikipedia notes), so either come up with a slightly clearer name for this (isn't this just essentially cartesian product in the case that the tuple size is >= the list size?) or add an explicit docstring with links to that wikipedia page
lib/utils/test/src/utils/containers/range.cc
line 2 at r8 (raw file):
#include "utils/containers/range.h" #include "utils/fmt/vector.h"
Don't these have to be test/utils/doctest/fmt/vector.h
now?
lib/utils/include/utils/fmt/unordered_multiset.h
line 27 at r7 (raw file):
return fmt::to_string(t); }); // }
Delete?
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 17 at r7 (raw file):
[[fields]] name = "inter"
"inter" by itself is unclear. "inter"-what?
Suggestion:
name = "node_idx"
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 21 at r7 (raw file):
[[fields]] name = "intra"
Suggestion:
name = "device_idx"
lib/pcg/include/pcg/task_space_operator.struct.toml
line 2 at r9 (raw file):
namespace = "FlexFlow" name = "TaskSpaceOperator"
Suggestion:
name = "OperatorTaskSpace"
lib/pcg/include/pcg/task_space_operator.struct.toml
line 24 at r9 (raw file):
[[fields]] name = "degrees" type = "std::vector<::FlexFlow::num_points_t>"
Is there a value to using num_points
now? It doesn't seem like there's really an alternative for OperatorTaskSpace
(like, there's no way for it to be strided, so the extra safety doesn't seem as worth it now?)
Code quote:
<::FlexFlow::num_points_t>"
lib/utils/include/utils/containers/unordered_multiset_of.h
line 10 at r6 (raw file):
Previously, Marsella8 wrote…
why is this preferrable? (just curious)
It's not really a big deal either way, but technically std::cbegin
and std::cend
work on raw arrays. This was really more of a "C++ interesting FYI" than a particularly important comment tbh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 21 files at r9.
Reviewable status: 73 of 96 files reviewed, 30 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/include/compiler/allowed_machine_views.h
line 15 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
These two overloads seem nontrivially different(?), so giving them different (and clearer) names might help
second one was redundant.
lib/compiler/src/compiler/allowed_machine_views.cc
line 67 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Can any of this get pulled out into separate functions that make logical sense by themselves? Each statement is okay to read but the sheer number of statements is a bit intense and doesn't provide a lot of walking the reader through. If it really is this complicated that's fine (there are places where stuff just fundamentally needs a bunch of code that can't get decomposed), but I'd like to be convinced of that 🙂
I've tried to split it up a bit more, but it's probably not enough. Not super sure how to express the candidate_strides
computation more cleanly otherwise.
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's going on here?
the interface for MachineView
has changed so there is not really an equivalent for make_1d_machine_view
anymore.
It would have to be something like a OperatorTaskSpace
which encodes the number of points (so degree of the only dimension) and then a MachineView
which tells us the stride and the StartCoordinate within the MachineSpecification space.
lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc
line 10 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't know how widely
parallel_tensor_dim_idx_t
will be used, but it's such an obvious thing to have conceptually that I think it makes sense to integrate intoparallel_tensor_shape.h
, etc.
Done. note that since we don't have the bijective mapping stuff anymore, parallel_tensor_dim_idx
is not needed anymore, but I've moved it to 'parallel_tensor_shape.cc` for future use.
lib/pcg/include/pcg/machine_specification.h
line 19 at r8 (raw file):
bool is_valid_machine_space_coordinates(MachineSpecification const &ms, MachineSpaceCoordinate const &coord);
Done.
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why are these separate
vectors
rather than a singlevector
?
we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.
lib/pcg/include/pcg/machine_view_dim_idx_t.h
line 9 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why a
vector
here?
old residue from the previous interface, now it's deleted
lib/pcg/include/pcg/task_space_operator.h
line 12 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why not just
get_task_space_coordinates
?
Done.
lib/pcg/include/pcg/task_space_operator.h
line 15 at r8 (raw file):
TaskSpaceCoordinate get_maximum_fragment_coordinate(TaskSpaceOperator const &task);
Done.
lib/pcg/include/pcg/task_space_operator.h
line 18 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Clearer and doesn't require new terminology (i.e., "fragment")--not that I mind new terminology when necessary, just don't think it's necessary here
Done.
lib/pcg/src/pcg/delete.c
line at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's going on here?
sorry old file where I dumped some stuff, now deleted
lib/pcg/src/pcg/machine_specification.cc
line 43 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be better to throw an exception so that the out-of-bounds case can be tested? We can't really test that an
assert
is fired
Done + added some tests
lib/pcg/src/pcg/machine_view.cc
line 47 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This needs simplifying (pull out some lambdas and/or functions--for example, a function that adds
DeviceCoordinate
s together would make a lot of sense, or maybe a function that gives you a number based on astride
in aStridedRectangleDim
?). When you're dealing with wrapper types, instead of just unwrapping them in your function it's often better to write small wrapper functions that do the unwrapping for you (where those functions logically stand by themselves) and then use those, e.g., the functions incomputation_graph.h
changed, lmk if this is better (old implementation was crap, I apologize)
lib/pcg/test/src/pcg/machine_view.cc
line 35 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add argument name comments?
Done.
lib/utils/include/utils/containers/get_all_permutations.h
line 105 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'd move this to a separate
get_all_permutations_with_repetitions.h
file unless there's a strong reason it should be here
Done.
lib/utils/include/utils/containers/transform.h
line 50 at r8 (raw file):
template <typename F, typename In, typename Out = decltype(std::declval<F>()(std::declval<In>()))>
Done.
lib/utils/include/utils/containers/transform.h
line 53 at r8 (raw file):
std::set<Out> transform(std::set<In> const &v, F const &f) { std::set<Out> result; for (auto const &e : v) {
Done.
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 56 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Come up with better testcase descriptions. The goal of these should be a high level discussion of semantic properties and/or the failure cases they're trying to check--"container size = 3, n = 1" is something I can trivially see from the code and isn't really (at least in the current wording) a high-level idea. If there's a reason this testcase is here (say, testing the limiting case that we have only one element), then the description should be clarified. If there's not a good high-level reason for that specific testcase, it should probably be removed 🙂
great point, yeah tests weren't super targeted, slightly retooled them to be more informative.
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 60 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'm guessing this terminology is from https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition--that said, the naming is somewhat confusing (as even wikipedia notes), so either come up with a slightly clearer name for this (isn't this just essentially cartesian product in the case that the tuple size is >= the list size?) or add an explicit docstring with links to that wikipedia page
afaik, permutations_with_repetition is the most standard name for this kind of behaviour. I've added a docstring
lib/utils/test/src/utils/containers/range.cc
line 2 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't these have to be
test/utils/doctest/fmt/vector.h
now?
Done.
lib/utils/include/utils/fmt/unordered_multiset.h
line 27 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Delete?
Done.
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 17 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
"inter" by itself is unclear. "inter"-what?
Done.
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 21 at r7 (raw file):
[[fields]] name = "intra"
Done.
lib/pcg/include/pcg/task_space_operator.struct.toml
line 2 at r9 (raw file):
namespace = "FlexFlow" name = "TaskSpaceOperator"
Done.
lib/pcg/include/pcg/task_space_operator.struct.toml
line 24 at r9 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is there a value to using
num_points
now? It doesn't seem like there's really an alternative forOperatorTaskSpace
(like, there's no way for it to be strided, so the extra safety doesn't seem as worth it now?)
agree, only thing i can think of is that num_points_t
makes clear what the degree of an operator means. changed back to int
lib/utils/include/utils/containers/unordered_multiset_of.h
line 10 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
It's not really a big deal either way, but technically
std::cbegin
andstd::cend
work on raw arrays. This was really more of a "C++ interesting FYI" than a particularly important comment tbh
got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 21 files at r9, 34 of 36 files at r10, all commit messages.
Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 18 at r10 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/1,
/*num_cpus_per_node=*/5,
/*num_gpus_per_node=*/5,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/compiler/test/src/allowed_machine_views.cc
line 24 at r10 (raw file):
MachineView{{{stride_t{1}}}, {{MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{0, 0, DeviceType::GPU}},
Suggestion:
MachineSpaceCoordinate{
/*node=idx=*/0,
/*device_idx=*/0,
DeviceType::GPU,
}},
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
SUBCASE("2 degrees of parallelism") { MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/3,
/*num_cpus_per_node=*/3,
/*num_gpus_per_node=*/3,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/compiler/test/src/allowed_machine_views.cc
line 48 at r10 (raw file):
OperatorTaskSpace task = OperatorTaskSpace{{2, 3}}; auto make_2d_views = [&](int start_x,
Suggestion:
auto make_2d_view
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
auto make_2d_views = [&](int start_x, int start_y,
Aren't these in machine space? What does y
mean in a machine space?
Code quote:
int start_y,
lib/compiler/test/src/allowed_machine_views.cc
line 63 at r10 (raw file):
auto inter = MachineSpecificationDimension::INTER_NODE; std::unordered_set<MachineView> correct = { make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),
Suggestion:
make_2d_views(0, 0, /*stride1=*/1, /*stride2=*/1, inter, intra),
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, Marsella8 wrote…
the interface for
MachineView
has changed so there is not really an equivalent formake_1d_machine_view
anymore.
It would have to be something like aOperatorTaskSpace
which encodes the number of points (so degree of the only dimension) and then aMachineView
which tells us the stride and the StartCoordinate within the MachineSpecification space.
Got it, so we likely need to (at some point) add a primitive in op-attrs
that lets us take an UnmappedCostEstimateKey
and return the OperatorTaskSpace
for the operator?
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, Marsella8 wrote…
we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.
I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe MachineDimensionStride
? or StridedMachineDimension
? or even something as basic as MachineDimensionWithStride
? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names) .struct.toml
datatype to represent the combination of the two
lib/pcg/test/src/pcg/machine_specification.cc
line 11 at r10 (raw file):
TEST_CASE("MachineSpecification") { MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/4,
/*num_cpus_per_node=*/16,
/*num_gpus_per_node=*/8,
/*inter_node_bandwidth=*/100.0f,
/*intra_node_bandwidth=*/200.0f,
};
lib/pcg/test/src/pcg/machine_specification.cc
line 29 at r10 (raw file):
SUBCASE("valid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 12, DeviceType::CPU};
Suggestion:
MachineSpaceCoordinate{
/*node_idx=*/2,
/*device_idx=*/12,
DeviceType::CPU,
};
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
CHECK(correct == result); } SUBCASE("invalid MachineSpaceCoordinate") {
Suggestion:
SUBCASE("MachineSpaceCoordinate out of bounds of machine spec") {
lib/pcg/test/src/pcg/machine_specification.cc
line 37 at r10 (raw file):
SUBCASE("invalid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 18, DeviceType::CPU};
Suggestion:
MachineSpaceCoordinate{
/*node_idx=*/2,
/*device_idx=*/18,
DeviceType::CPU,
};
lib/pcg/test/src/pcg/machine_view.cc
line 52 at r6 (raw file):
Previously, Marsella8 wrote…
Currently the indexing is done "jumping along with the stride" (e.g 3x3 machine view, 2x2 points, 2x2 stride, we have (0,0) be in position (0, 0), (1,0) be in the real position (2, 0), etc...). Depends on whether we want to index the machineview directly (in which case it's better to have the current indexing) or not (in which case you'd prefer the second one). I personally like the second one best, but when we were discussing it a few weeks ago you had used the current one and had assumed you preferred it.
I think this was an out-of-date comment from when I reviewed pre-our-few-weeks-ago-discussion, sorry for the mixup
lib/pcg/test/src/pcg/machine_view.cc
line 38 at r10 (raw file):
/*num_gpus_per_node*/ 6, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/1,
/*num_cpus_per_node=*/6,
/*num_gpus_per_node=*/6,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/pcg/test/src/pcg/machine_view.cc
line 40 at r10 (raw file):
0}; SUBCASE("Fragment 0") {
Didn't we get rid of "fragment"?
Code quote:
SUBCASE("Fragment 0") {
lib/pcg/test/src/pcg/machine_view.cc
line 49 at r10 (raw file):
} SUBCASE("Fragment 1") {
Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?
Actually looks like not as there's only one dim, so I'm not sure why Fragment 2
is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).
Code quote:
SUBCASE("Fragment 1") {
lib/pcg/test/src/pcg/machine_view.cc
line 58 at r10 (raw file):
} SUBCASE("Fragment 2") {
What happens if the TaskSpaceCoordinate
is out of bounds?
lib/pcg/test/src/pcg/machine_view.cc
line 74 at r10 (raw file):
{{MachineSpecificationDimension::INTER_NODE, MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{1, 2, DeviceType::GPU}};
Feels a bit weird having the start
field last? 😂
Code quote:
MachineSpaceCoordinate{1, 2, DeviceType::GPU}}
lib/pcg/test/src/pcg/machine_view.cc
line 80 at r10 (raw file):
/*num_gpus_per_node*/ 5, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/3,
/*num_cpus_per_node=*/5,
/*num_gpus_per_node=*/5,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/pcg/test/src/pcg/machine_view.cc
line 82 at r10 (raw file):
0}; SUBCASE("Fragment (0,0)") {
See above about better testcase naming
Code quote:
SUBCASE("Fragment (0,0)") {
lib/pcg/test/src/pcg/machine_view.cc
line 85 at r10 (raw file):
TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}}; MachineSpaceCoordinate correct = MachineSpaceCoordinate{1, 2, DeviceType::GPU};
Suggestion:
MachineSpaceCoordinate{/*node_idx=*/1, /*device_idx=*/2, DeviceType::GPU};
lib/pcg/test/src/pcg/machine_view.cc
line 124 at r10 (raw file):
{{MachineSpecificationDimension::INTER_NODE, MachineSpecificationDimension::INTRA_NODE, MachineSpecificationDimension::INTRA_NODE}},
Rather than having duplicate INTER_NODE
/INTRA_NODE
values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?
Code quote:
OperatorTaskSpace task = OperatorTaskSpace{{2, 2, 2}};
MachineView mv =
MachineView{{{stride_t{1}, stride_t{2}, stride_t{1}}},
{{MachineSpecificationDimension::INTER_NODE,
MachineSpecificationDimension::INTRA_NODE,
MachineSpecificationDimension::INTRA_NODE}},
lib/pcg/test/src/pcg/machine_view.cc
line 128 at r10 (raw file):
MachineSpecification ms = MachineSpecification{/*num_nodes*/ 2, /*num_cpus_per_node*/ 8,
Fix parameter naming comments
lib/pcg/test/src/pcg/machine_view.cc
line 133 at r10 (raw file):
0}; SUBCASE("Fragment (0,0,1)") {
Better testcase names (see above)
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 16 at r10 (raw file):
* @details *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition **/
Suggestion:
/**
* @brief For a given container `c` and integer `n`, return all possible vectors
* of size `n` that only contain (possibly duplicated) elements of `c`.
* @details
* https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition
**/
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 18 at r10 (raw file):
**/ template <typename T> struct permutations_with_repetition_container {
Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing get_permutations
tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason
lib/utils/test/src/utils/containers/cartesian_product.cc
line 4 at r10 (raw file):
#include "test/utils/doctest/fmt/unordered_multiset.h" #include "utils/fmt/unordered_multiset.h" #include "utils/fmt/vector.h"
Are these still necessary?
Code quote:
#include "utils/fmt/unordered_multiset.h"
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/replicate.cc
line 4 at r10 (raw file):
#include "test/utils/doctest/fmt/vector.h" #include "utils/fmt/unordered_set.h" #include "utils/fmt/vector.h"
Are these still necessary?
Code quote:
#include "utils/fmt/unordered_set.h"
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/scanl.cc
line 3 at r10 (raw file):
#include "utils/containers/scanl.h" #include "test/utils/doctest/fmt/vector.h" #include "utils/fmt/vector.h"
Is this still necessary?
Code quote:
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/scanl.cc
line 40 at r10 (raw file):
} TEST_CASE("scanl1") {
What happens if the input to scanl1
is empty?
Code quote:
TEST_CASE("scanl1") {
lib/pcg/include/pcg/operator_task_space.h
line 1 at r10 (raw file):
#ifndef _FLEXFLOW_PCG_INCLUDE_operator_task_space_H
Capitalization?
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
What if the task space dim list is empty?
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
Suggestion:
TEST_CASE("get_task_space_coordinates") {
lib/pcg/test/src/pcg/operator_task_space.cc
line 11 at r10 (raw file):
SUBCASE("get_task_space_coordinates") { SUBCASE("2D Task") {
Suggestion:
SUBCASE("2D Task Space") {
lib/pcg/test/src/pcg/operator_task_space.cc
line 19 at r10 (raw file):
TaskSpaceCoordinate{{0, 1}}, TaskSpaceCoordinate{{1, 0}}, TaskSpaceCoordinate{{1, 1}}}};
Suggestion:
std::unordered_set<TaskSpaceCoordinate> correct = {
TaskSpaceCoordinate{{0, 0}},
TaskSpaceCoordinate{{0, 1}},
TaskSpaceCoordinate{{1, 0}},
TaskSpaceCoordinate{{1, 1}}.
};
lib/pcg/test/src/pcg/operator_task_space.cc
line 32 at r10 (raw file):
TaskSpaceCoordinate{{0, 0, 1}}, TaskSpaceCoordinate{{0, 1, 0}}, TaskSpaceCoordinate{{0, 1, 1}}}};
Suggestion:
std::unordered_set<TaskSpaceCoordinate> correct = {
TaskSpaceCoordinate{{0, 0, 0}},
TaskSpaceCoordinate{{0, 0, 1}},
TaskSpaceCoordinate{{0, 1, 0}},
TaskSpaceCoordinate{{0, 1, 1}},
};
lib/pcg/test/src/pcg/operator_task_space.cc
line 38 at r10 (raw file):
} } SUBCASE("get_task_space_maximum_coordinate") {
Suggestion:
TEST_CASE("get_task_space_maximum_coordinate") {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 18 at r10 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 24 at r10 (raw file):
MachineView{{{stride_t{1}}}, {{MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{0, 0, DeviceType::GPU}},
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
SUBCASE("2 degrees of parallelism") { MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 48 at r10 (raw file):
OperatorTaskSpace task = OperatorTaskSpace{{2, 3}}; auto make_2d_views = [&](int start_x,
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Aren't these in machine space? What does
y
mean in a machine space?
Machine space is the 2D space described by machine specification. (!= Task Space, which is N-dimensional, with N dimension of the parallel tensor). So we have a vector of strides, a vector of dimensions to which to map each dimension of the task, and then a 2D coordinate (start_x, start_y) which is the node and device indices.
lib/compiler/test/src/allowed_machine_views.cc
line 63 at r10 (raw file):
auto inter = MachineSpecificationDimension::INTER_NODE; std::unordered_set<MachineView> correct = { make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),
Done.
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Got it, so we likely need to (at some point) add a primitive in
op-attrs
that lets us take anUnmappedCostEstimateKey
and return theOperatorTaskSpace
for the operator?
yes pretty much (we don't really need OperatorTaskSpace
as long as we have a datatype from which we can get a vector of the parallel degrees, but obviously OperatorTaskSpace makes things more clean)
lib/pcg/test/src/pcg/machine_specification.cc
line 11 at r10 (raw file):
TEST_CASE("MachineSpecification") { MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 29 at r10 (raw file):
SUBCASE("valid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 12, DeviceType::CPU};
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
CHECK(correct == result); } SUBCASE("invalid MachineSpaceCoordinate") {
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 37 at r10 (raw file):
SUBCASE("invalid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 18, DeviceType::CPU};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 38 at r10 (raw file):
/*num_gpus_per_node*/ 6, 0, 0};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 40 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Didn't we get rid of "fragment"?
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 49 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?
Actually looks like not as there's only one dim, so I'm not sure why
Fragment 2
is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).
made a docstring for each case that explains it pretty extensively.
lib/pcg/test/src/pcg/machine_view.cc
line 58 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What happens if the
TaskSpaceCoordinate
is out of bounds?
now returns an optional if out of bound.
(also added testing)
lib/pcg/test/src/pcg/machine_view.cc
line 74 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Feels a bit weird having the
start
field last? 😂
lol fair enough, changed :-)
lib/pcg/test/src/pcg/machine_view.cc
line 80 at r10 (raw file):
/*num_gpus_per_node*/ 5, 0, 0};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 82 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See above about better testcase naming
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 85 at r10 (raw file):
TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}}; MachineSpaceCoordinate correct = MachineSpaceCoordinate{1, 2, DeviceType::GPU};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 124 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rather than having duplicate
INTER_NODE
/INTRA_NODE
values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?
good point, added new test case
lib/pcg/test/src/pcg/machine_view.cc
line 128 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Fix parameter naming comments
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 133 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Better testcase names (see above)
Done.
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 16 at r10 (raw file):
* @details *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition **/
Done.
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 18 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing
get_permutations
tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason
Done.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 4 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Are these still necessary?
Done.
lib/utils/test/src/utils/containers/replicate.cc
line 4 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Are these still necessary?
Done.
lib/utils/test/src/utils/containers/scanl.cc
line 3 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is this still necessary?
Done.
lib/utils/test/src/utils/containers/scanl.cc
line 40 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What happens if the input to
scanl1
is empty?
returns an empty list (added test case)
lib/pcg/include/pcg/operator_task_space.h
line 1 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Capitalization?
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if the task space dim list is empty?
Returns a single empty coordinate (vacuously makes sense?). Not sure what a zero dimensional TaskSpaceOperator would represent, whether something with no parallelism or a straight up wrong data-type (so something with no parallelism would be a TaskSpaceOperator with 1 or more ones. Let me know what you think is best.
lib/pcg/test/src/pcg/operator_task_space.cc
line 11 at r10 (raw file):
SUBCASE("get_task_space_coordinates") { SUBCASE("2D Task") {
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 19 at r10 (raw file):
TaskSpaceCoordinate{{0, 1}}, TaskSpaceCoordinate{{1, 0}}, TaskSpaceCoordinate{{1, 1}}}};
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 32 at r10 (raw file):
TaskSpaceCoordinate{{0, 0, 1}}, TaskSpaceCoordinate{{0, 1, 0}}, TaskSpaceCoordinate{{0, 1, 1}}}};
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 38 at r10 (raw file):
} } SUBCASE("get_task_space_maximum_coordinate") {
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 87 of 102 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe
MachineDimensionStride
? orStridedMachineDimension
? or even something as basic asMachineDimensionWithStride
? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names).struct.toml
datatype to represent the combination of the two
done
Description of changes:
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is