Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unordered StridedRectangle, get_allowed_machine_views #1458

Open
wants to merge 32 commits into
base: repo-refactor
Choose a base branch
from

Conversation

Marsella8
Copy link

@Marsella8 Marsella8 commented Aug 7, 2024

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable

@Marsella8 Marsella8 requested review from lockshaw and wmdi August 7, 2024 03:03
Copy link

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 86.12144% with 112 lines in your changes missing coverage. Please review.

Project coverage is 73.56%. Comparing base (cf96db6) to head (c1e1c8c).

Files with missing lines Patch % Lines
lib/compiler/src/compiler/allowed_machine_views.cc 0.00% 57 Missing ⚠️
lib/op-attrs/src/op-attrs/parallel_tensor_shape.cc 0.00% 18 Missing ⚠️
lib/pcg/src/pcg/device_id.cc 23.52% 13 Missing ⚠️
lib/op-attrs/src/op-attrs/parallel_dim.cc 0.00% 6 Missing ⚠️
lib/pcg/src/pcg/machine_view.cc 90.74% 5 Missing ⚠️
lib/pcg/src/pcg/machine_specification.cc 87.09% 4 Missing ⚠️
lib/pcg/src/pcg/operator_task_space.cc 76.47% 4 Missing ⚠️
lib/utils/include/utils/containers/transform.h 50.00% 3 Missing ⚠️
lib/utils/include/utils/containers/foldl.h 92.85% 1 Missing ⚠️
lib/utils/src/utils/containers/range.cc 91.66% 1 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           repo-refactor    #1458      +/-   ##
=================================================
+ Coverage          73.22%   73.56%   +0.34%     
=================================================
  Files                731      743      +12     
  Lines              23398    23999     +601     
  Branches             690      712      +22     
=================================================
+ Hits               17133    17655     +522     
- Misses              6265     6344      +79     
Flag Coverage Δ
unittests 73.56% <86.12%> (+0.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
lib/compiler/include/compiler/machine_mapping.h 0.00% <ø> (ø)
lib/compiler/src/graph_utils.cc 0.00% <ø> (ø)
...-execution/include/local-execution/cost_estimate.h 0.00% <ø> (ø)
lib/local-execution/src/local_cost_estimator.cc 0.00% <ø> (ø)
lib/pcg/test/src/pcg/computation_graph_builder.cc 90.90% <ø> (ø)
lib/pcg/test/src/pcg/machine_specification.cc 100.00% <100.00%> (ø)
lib/pcg/test/src/pcg/machine_view.cc 100.00% <100.00%> (ø)
lib/pcg/test/src/pcg/operator_task_space.cc 100.00% <100.00%> (ø)
...utils/include/utils/containers/cartesian_product.h 100.00% <100.00%> (ø)
lib/utils/include/utils/containers/filter.h 100.00% <100.00%> (ø)
... and 23 more

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 35 of 41 files at r1, all commit messages.
Reviewable status: 35 of 41 files reviewed, 14 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/pcg/include/pcg/device_coordinates.struct.toml line 1 at r1 (raw file):

# DeviceCoordinates is exclusive to machine_view.cc, must not be used outside of it.

I don't think this requirement is necessary anymore with the new design?


lib/pcg/include/pcg/strided_rectangle.struct.toml line 17 at r1 (raw file):

  "utils/hash/unordered_multiset.h",
  "utils/fmt/unordered_multiset.h",
]

utils/hash and utils/fmt are only needed for the implementation, not the declaration

Suggestion:

includes = [
  "pcg/strided_rectangle_side.dtg.h",
  "<unordered_set>",
]

src_includes = [
  "utils/hash/unordered_multiset.h",
  "utils/fmt/unordered_multiset.h",
]

lib/pcg/src/pcg/device_id.cc line 29 at r1 (raw file):

int get_raw_id(device_id_t device_id) {
  if (get_device_type(device_id) == DeviceType::GPU) {

switch might be cleaner? You can also use overload if you'd prefer

Code quote:

  if (get_device_type(device_id) == DeviceType::GPU) {

lib/pcg/src/pcg/device_id.cc line 35 at r1 (raw file):

  } else {
    assert(false && "Unsupported DeviceType");
    return -1;

asserts will get removed in non-debug builds

Suggestion:

    throw mk_runtime_error(fmt::format("Unsupported DeviceType {} for device_id_t {}", get_device_type(device_id), device_id));

lib/pcg/test/src/pcg/machine_view.cc line 66 at r1 (raw file):

        CHECK(expected == result);
      }
      SUBCASE("get_last_device_id") {

Test functions separately (i.e., group by get_last_device_id rather than by MachineView tested on) except for particularly trivial utility functions


lib/pcg/test/src/pcg/machine_view.cc line 70 at r1 (raw file):

      }
    }
    SUBCASE("MachineView #2") {

Use better test names


lib/pcg/test/src/pcg/machine_view.cc line 93 at r1 (raw file):

  }

  TEST_CASE("MachineView make_1d_machine_view - GPU") {

The return type here isn't really necessary and is a bit confusing

Suggestion:

  TEST_CASE("make_1d_machine_view - GPU") {

lib/utils/include/utils/containers/foldl.h line 27 at r1 (raw file):

  C remaining(it, c.end());
  return foldl(remaining, init, func);
}

Suggestion:

template <typename C, typename F, typename E = typename C::value_type>
auto foldl1(C const &c, F func) -> typename C::value_type {
  if (c.empty()) {
     throw mk_runtime_error(fmt::format("foldl1 received empty container: {}", c)); 
  }
  
  std::optional<E> result;
  
  for (E const &e : c) { 
    if (!result.has_value()) {
      result = e; 
    } else {
      result = f(result.value(), e); 
    }
  }
  
  return result.value();
}

lib/utils/include/utils/containers/range.h line 11 at r1 (raw file):

template <typename T>
std::vector<T> range(T start, T end, T step = 1) {

C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible

Suggestion:

std::vector<int> range(int start, int end, int step = 1) {

lib/utils/include/utils/containers/replicate.h line 9 at r1 (raw file):

template <typename T>
std::unordered_multiset<T> replicate(std::size_t n, T const &element) {

Prefer int--size_t intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet

Code quote:

std::size_t

lib/utils/include/utils/containers/scanl.h line 9 at r1 (raw file):

template <typename C, typename F, typename T>
std::vector<T> scanl(C const &c, T init, F const &op) {

Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂


lib/utils/include/utils/containers/scanl.h line 24 at r1 (raw file):

template <typename C, typename F>
auto scanl1(C const &c, F op) {
  using T = typename C::value_type;

See implementation of fold1 for a more readable version


lib/utils/test/src/utils/containers/foldl.cc line 9 at r1 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("foldl") {

I can see you're enjoying Haskell 😛


lib/utils/test/src/utils/containers/range.cc line 45 at r1 (raw file):

      std::vector<int> correct = {};
      CHECK(result == correct);
    }

What's the expected behavior of range(0, 10, -1)?

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 24 of 46 files reviewed, 14 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/pcg/include/pcg/device_coordinates.struct.toml line 1 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't think this requirement is necessary anymore with the new design?

Not a requirement anymore, but I think the user will generally reason about the position of the devices using device_id_t, so it probably won't be used.


lib/pcg/include/pcg/strided_rectangle.struct.toml line 17 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

utils/hash and utils/fmt are only needed for the implementation, not the declaration

Done.


lib/pcg/src/pcg/device_id.cc line 29 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

switch might be cleaner? You can also use overload if you'd prefer

Done.


lib/pcg/src/pcg/device_id.cc line 35 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

asserts will get removed in non-debug builds

Done.


lib/pcg/test/src/pcg/machine_view.cc line 66 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Test functions separately (i.e., group by get_last_device_id rather than by MachineView tested on) except for particularly trivial utility functions

Done.


lib/pcg/test/src/pcg/machine_view.cc line 70 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Use better test names

Done.


lib/pcg/test/src/pcg/machine_view.cc line 93 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

The return type here isn't really necessary and is a bit confusing

Done.


lib/utils/include/utils/containers/foldl.h line 27 at r1 (raw file):

  C remaining(it, c.end());
  return foldl(remaining, init, func);
}

cool pattern !


lib/utils/include/utils/containers/range.h line 11 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible

Done.


lib/utils/include/utils/containers/replicate.h line 9 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int--size_t intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet

Ahh makes sense thanks.


lib/utils/include/utils/containers/scanl.h line 9 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂

Done.


lib/utils/include/utils/containers/scanl.h line 24 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See implementation of fold1 for a more readable version

Done.


lib/utils/test/src/utils/containers/foldl.cc line 9 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I can see you're enjoying Haskell 😛

hahaha yes, I'm starting to see why you're so frustrated with C++. Functional code is so clean, wish the C++ syntax was a bit more accomodating...


lib/utils/test/src/utils/containers/range.cc line 45 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's the expected behavior of range(0, 10, -1)?

empty range (this range is modelled after python's ranges, so I'd like to keep it consistent, but we can also change it to throw an error).

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 41 files at r1, 19 of 24 files at r2, all commit messages.
Reviewable status: 44 of 49 files reviewed, 32 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/include/compiler/machine_mapping.h line 63 at r2 (raw file):

std::unordered_set<StartInvariantMachineView>
    get_allowed_start_invariant_machine_views(
        MachineSpecification const &machinespec,

Suggestion:

        MachineSpecification const &machine_spec,

lib/compiler/src/machine_mapping.cc line 378 at r2 (raw file):

}

std::vector<int> get_tensor_parallel_degrees(ParallelTensorShape const &shape) {

Should not return a vector as that imposes an order on ParallelTensorShape's dims, should return either an unordered_set or a custom dtgen struct


lib/compiler/src/machine_mapping.cc line 385 at r2 (raw file):

}

bool is_valid_machine_view(MachineView const &mv,

A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces


lib/compiler/src/machine_mapping.cc line 393 at r2 (raw file):

  tensor_degrees =
      filter(tensor_degrees, [](int degree) { return degree != 1; });
  return without_order(mv_degrees) == without_order(tensor_degrees);

This returns a multiset, right?

Code quote:

without_order(mv_degrees)

lib/compiler/src/machine_mapping.cc line 401 at r2 (raw file):

                                ParallelTensorShape const &shape) {

  auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {

Suggestion:

std::vector<int> const &tensor_dims

lib/compiler/src/machine_mapping.cc line 401 at r2 (raw file):

                                ParallelTensorShape const &shape) {

  auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {

I think a return type annotation here would really improve readability

Code quote:

[](std::vector<int> tensor_dims, int total_devices) {

lib/compiler/src/machine_mapping.cc line 404 at r2 (raw file):

    int max_stride_upper_bound =
        (total_devices + 1) /
        product(transform(tensor_dims, [](int degree) { return degree - 1; }));

This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)

Code quote:

    int max_stride_upper_bound =
        (total_devices + 1) /
        product(transform(tensor_dims, [](int degree) { return degree - 1; }));

lib/compiler/src/machine_mapping.cc line 405 at r2 (raw file):

        (total_devices + 1) /
        product(transform(tensor_dims, [](int degree) { return degree - 1; }));
    std::unordered_multiset<std::vector<int>> strides = cartesian_product(

Should this be a stride_size_t?

Code quote:

int

lib/compiler/src/machine_mapping.cc line 410 at r2 (raw file):

  };

  std::vector<int> tensor_dims = filter(get_tensor_parallel_degrees(shape),

Seems like a common pattern, probably good to pull out a separate function?


lib/compiler/src/machine_mapping.cc line 413 at r2 (raw file):

                                        [](int degree) { return degree != 1; });
  std::unordered_set<MachineView> machine_views;
  int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;

Why not just have a get_num_devices or something function for MachineSpecification?

Code quote:

  int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;

lib/compiler/src/machine_mapping.cc line 414 at r2 (raw file):

  std::unordered_set<MachineView> machine_views;
  int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
  for (std::vector<int> stride :

Suggestion:

  for (std::vector<int> const &stride :

lib/compiler/src/machine_mapping.cc line 416 at r2 (raw file):

  for (std::vector<int> stride :
       candidate_strides(tensor_dims, total_devices)) {
    for (int start_id = 0; start_id < total_devices; start_id++) {

Seems like this loop is intuitively creating StartInvariantMachineViews and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things like std::vector<int> or std::vector<std::unordered_set<int>> is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)


lib/compiler/src/machine_mapping.cc line 437 at r2 (raw file):

      get_candidate_machine_views(machinespec, shape);
  views = filter(views, [&](MachineView const &view) {
    return is_valid_machine_view(view, shape);

Suggestion:

return is_valid_machine_view(view, shape) && is_valid_machine_view(view, machinespec);

lib/compiler/test/src/machine_mapping.cc line 9 at r2 (raw file):

TEST_SUITE(FF_TEST_SUITE) {

  TEST_CASE("get_allowed_machine_view") {

Suggestion:

  TEST_CASE("get_allowed_machine_views") {

lib/compiler/test/src/machine_mapping.cc line 12 at r2 (raw file):

    SUBCASE("1 degree of parallelism") {
      MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);

Suggestion:

      MachineSpecification ms = MachineSpecification{5, 1, 1, 0, 0};

lib/pcg/include/pcg/device_coordinates.struct.toml line 18 at r2 (raw file):

[[fields]]
name = "coords"
type = "::FlexFlow::FFOrdered<int>"

I don't think FFOrdered should be used here, since there's no correlation between MachineView dim ordering and tensor dimension ordering. FFOrdered represents things that are in the same order as the dims in a TensorShape (or the shard dims in a ParallelTensorShape)


lib/pcg/include/pcg/machine_view.h line 20 at r2 (raw file):

size_t num_dims(MachineView const &mv);
size_t num_devices(MachineView const &mv);
size_t get_size(MachineView const &mv);

What is get_size here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)


lib/pcg/include/pcg/start_invariant_machine_view.h line 9 at r2 (raw file):

namespace FlexFlow {

MachineView to_start_dependent(StartInvariantMachineView const &mv,

Suggestion:

MachineView machine_view_from_start_invariant(StartInvariantMachineView const &mv,

lib/pcg/include/pcg/start_invariant_machine_view.h line 11 at r2 (raw file):

MachineView to_start_dependent(StartInvariantMachineView const &mv,
                               device_id_t const &start_id);
StartInvariantMachineView to_start_invariant(MachineView const &mv);

Suggestion:

StartInvariantMachineView start_invariant_from_machine_view(MachineView const &mv);

lib/pcg/include/pcg/strided_rectangle.h line 15 at r2 (raw file):

private:
  std::vector<StridedRectangleSide> _sides;

I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them

Suggestion:

  std::vector<StridedRectangleSide> sides;

lib/pcg/include/pcg/strided_rectangle.h line 28 at r2 (raw file):

  bool operator>=(StridedRectangle const &) const;

  std::vector<StridedRectangleSide> get_sides() const;

For pure getters (things where the result is actually a field) it's safe to return a const &. That said, if you're ever in doubt just return a value type, as that's always safe 🙂

Suggestion:

  std::vector<StridedRectangleSide> const &get_sides() const;

lib/pcg/include/pcg/strided_rectangle.h line 29 at r2 (raw file):

  std::vector<StridedRectangleSide> get_sides() const;
};

Suggestion:

private:
  std::tuple<decltype(sides) const &> tie() const;
  
  friend struct std::hash<StridedRectangle>;
};

lib/pcg/src/pcg/strided_rectangle.cc line 21 at r2 (raw file):

bool StridedRectangle::operator==(StridedRectangle const &other) const {
  return std::tie(this->_sides) == std::tie(other._sides);

Suggestion:

  return this->tie() == other.tie();

lib/pcg/src/pcg/strided_rectangle.cc line 85 at r2 (raw file):

  result ^= std::hash<std::vector<::FlexFlow::StridedRectangleSide>>{}(
                x.get_sides()) +
            0x9e3779b9 + (result << 6) + (result >> 2);

FYI this is actually a function in hash-utils called hash_combine, we just inline it in dtgen (for not really any good reason, we just do at the moment)


lib/pcg/src/pcg/strided_rectangle.cc line 86 at r2 (raw file):

                x.get_sides()) +
            0x9e3779b9 + (result << 6) + (result >> 2);
  return result;

Suggestion:

  return get_std_hash(x.tie());

lib/utils/include/utils/containers/cartesian_product.h line 13 at r2 (raw file):

template <typename Container>
auto cartesian_product(Container const &containers) {

Seems like this is always going to be called over a std::vector<SomeContainer>, so maybe do that specialization to simplify this code a bit?

Code quote:

auto cartesian_product(Container const &containers) {

lib/utils/include/utils/containers/cartesian_product.h line 26 at r2 (raw file):

    }

    for (const auto &item : ordered[depth]) {

Suggestion:

ordered.at(depth)

lib/utils/include/utils/containers/cartesian_product.h line 28 at r2 (raw file):

    for (const auto &item : ordered[depth]) {
      current.push_back(item);
      recurse(current, depth + 1);

Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering


lib/utils/src/utils/containers/any_of.h line 0 at r2 (raw file):
Rename to any_of.cc?


lib/utils/test/src/utils/containers/cartesian_product.cc line 58 at r2 (raw file):

          {1, 2}, {1, 3}, {1, 3}, {1, 2}};
      CHECK(result == correct);
    }

What if one container is empty and the others aren't? Might be a good test case to add

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 30 of 63 files reviewed, 32 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/compiler/include/compiler/machine_mapping.h line 63 at r2 (raw file):

std::unordered_set<StartInvariantMachineView>
    get_allowed_start_invariant_machine_views(
        MachineSpecification const &machinespec,

Done.


lib/compiler/src/machine_mapping.cc line 378 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Should not return a vector as that imposes an order on ParallelTensorShape's dims, should return either an unordered_set or a custom dtgen struct

changed to unordered_multiset (but have to sort the multiset in get_candidate_machine_views, since I need the ordering to be coherent between stride and dimension).


lib/compiler/src/machine_mapping.cc line 385 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces

Done.


lib/compiler/src/machine_mapping.cc line 393 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This returns a multiset, right?

yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)


lib/compiler/src/machine_mapping.cc line 401 at r2 (raw file):

                                ParallelTensorShape const &shape) {

  auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {

Done.


lib/compiler/src/machine_mapping.cc line 401 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I think a return type annotation here would really improve readability

Done.


lib/compiler/src/machine_mapping.cc line 404 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)

I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could say max_stride_product = num_total_devices/num_devices_used_by_tensor (where num_devices_used_by_tensor is the product of the parallel dims) and so we could simply say that the max stride across any dimension is max_stride_product, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).


lib/compiler/src/machine_mapping.cc line 405 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Should this be a stride_size_t?

fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?


lib/compiler/src/machine_mapping.cc line 410 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Seems like a common pattern, probably good to pull out a separate function?

Done.


lib/compiler/src/machine_mapping.cc line 413 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not just have a get_num_devices or something function for MachineSpecification?

Done.


lib/compiler/src/machine_mapping.cc line 414 at r2 (raw file):

  std::unordered_set<MachineView> machine_views;
  int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
  for (std::vector<int> stride :

Done.


lib/compiler/src/machine_mapping.cc line 416 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Seems like this loop is intuitively creating StartInvariantMachineViews and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things like std::vector<int> or std::vector<std::unordered_set<int>> is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)

fixed it up a bit, is this enough? I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase. I could do something like using StrideVec = std::vector<stride_t> to make it more readable, though I'm not sure it would make it more understandable.


lib/compiler/src/machine_mapping.cc line 437 at r2 (raw file):

      get_candidate_machine_views(machinespec, shape);
  views = filter(views, [&](MachineView const &view) {
    return is_valid_machine_view(view, shape);

Done.


lib/compiler/test/src/machine_mapping.cc line 9 at r2 (raw file):

TEST_SUITE(FF_TEST_SUITE) {

  TEST_CASE("get_allowed_machine_view") {

Done.


lib/compiler/test/src/machine_mapping.cc line 12 at r2 (raw file):

    SUBCASE("1 degree of parallelism") {
      MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);

Done.


lib/pcg/include/pcg/device_coordinates.struct.toml line 18 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't think FFOrdered should be used here, since there's no correlation between MachineView dim ordering and tensor dimension ordering. FFOrdered represents things that are in the same order as the dims in a TensorShape (or the shard dims in a ParallelTensorShape)

Got it, so FFOrdered is only for Tensor dims?


lib/pcg/include/pcg/machine_view.h line 20 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What is get_size here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)

Obsolete and removed (originally it was the "volume", so product of all the sizes of the sides, which I thought I could use for doing bounds check, but it's now useless since the devices themselves can occupy a space smaller than that (e.g. imagine a 2D machine view with stride 2 and 2 devices per side, it technically occupies a 4x4 area but can also really fit into a 3x3)


lib/pcg/include/pcg/start_invariant_machine_view.h line 9 at r2 (raw file):

namespace FlexFlow {

MachineView to_start_dependent(StartInvariantMachineView const &mv,

Done.


lib/pcg/include/pcg/start_invariant_machine_view.h line 11 at r2 (raw file):

MachineView to_start_dependent(StartInvariantMachineView const &mv,
                               device_id_t const &start_id);
StartInvariantMachineView to_start_invariant(MachineView const &mv);

Done.


lib/pcg/include/pcg/strided_rectangle.h line 15 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them

Done.


lib/pcg/include/pcg/strided_rectangle.h line 28 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

For pure getters (things where the result is actually a field) it's safe to return a const &. That said, if you're ever in doubt just return a value type, as that's always safe 🙂

Done.


lib/pcg/include/pcg/strided_rectangle.h line 29 at r2 (raw file):

  std::vector<StridedRectangleSide> get_sides() const;
};

Done.


lib/pcg/src/pcg/strided_rectangle.cc line 21 at r2 (raw file):

bool StridedRectangle::operator==(StridedRectangle const &other) const {
  return std::tie(this->_sides) == std::tie(other._sides);

Done.


lib/pcg/src/pcg/strided_rectangle.cc line 86 at r2 (raw file):

                x.get_sides()) +
            0x9e3779b9 + (result << 6) + (result >> 2);
  return result;

Done.


lib/utils/include/utils/containers/cartesian_product.h line 13 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Seems like this is always going to be called over a std::vector<SomeContainer>, so maybe do that specialization to simplify this code a bit?

Yes you're right, fixed (not it's vector<Containers> -> unordered_multiset<vector<...>>


lib/utils/include/utils/containers/cartesian_product.h line 26 at r2 (raw file):

    }

    for (const auto &item : ordered[depth]) {

Done.


lib/utils/include/utils/containers/cartesian_product.h line 28 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering

I like the recursive version and found it simpler to implement.


lib/utils/test/src/utils/containers/cartesian_product.cc line 58 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What if one container is empty and the others aren't? Might be a good test case to add

Done.


lib/utils/src/utils/containers/any_of.h line at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Rename to any_of.cc?

Done.

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 30 of 66 files reviewed, 33 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/compiler/src/compiler/allowed_machine_views.cc line 124 at r3 (raw file):

                   start_invariant_from_machine_view);
}

Unsure where to put these functions, will we be using parallel_tensor_dim_idx as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?

wmdi
wmdi previously approved these changes Aug 21, 2024
Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 33 unresolved discussions (waiting on @lockshaw)

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 50 unresolved discussions (waiting on @Marsella8)


lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml line 2 at r3 (raw file):

namespace = "FlexFlow"
name = "TensorToMachineViewInjection"

Why is this not a surjection? Seems like all of the MachineView dims need to be mapped to?


lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml line 14 at r3 (raw file):

  "utils/bidict/bidict.h",
  "utils/hash/unordered_map.h"
]

Suggestion:

includes = [
  "pcg/machine_view_dim_idx.dtg.h",
  "op-attrs/parallel_tensor_dim_idx.dtg.h",
  "utils/bidict/bidict.h",
]

lib/compiler/src/compiler/allowed_machine_views.cc line 36 at r3 (raw file):

}

bool is_valid_machine_view(MachineView const &mv,

I don't see this function declared in allowed_machine_view.h?


lib/compiler/src/compiler/allowed_machine_views.cc line 124 at r3 (raw file):

Previously, Marsella8 wrote…

Unsure where to put these functions, will we be using parallel_tensor_dim_idx as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?

For now let's put get_parallel_dim_at_idx and get_parallel_tensor_indices into parallel_tensor_dim_idx.h


lib/compiler/src/compiler/allowed_machine_views.cc line 152 at r3 (raw file):

std::unordered_set<machine_view_dim_idx>
    get_machine_view_indices(MachineView const &mv) {

Move to machine_view_dim_idx.h


lib/compiler/src/compiler/allowed_machine_views.cc line 157 at r3 (raw file):

}

bool is_valid_injection(TensorToMachineViewInjection const &injection,

Probably better to move this and the functions below to an tensor_to_machine_view_injection.h file


lib/compiler/src/compiler/allowed_machine_views.cc line 175 at r3 (raw file):

  std::unordered_set<parallel_tensor_dim_idx> shape_indices =
      get_parallel_tensor_indices(shape);
  shape_indices = filter(shape_indices, [&](auto const idx) {

Suggestion:

(parallel_tensor_dim_idx const &idx)

lib/compiler/src/compiler/allowed_machine_views.cc line 183 at r3 (raw file):

       permutations(shape_indices)) {
    TensorToMachineViewInjection injection =
        TensorToMachineViewInjection(bidict(zip(sorted(mv_indices), p)));

Suggestion:

  std::vector<machine_view_dim_idx> machine_view_dim_ordering = sorted(mv_indices);
  std::unordered_set<TensorToMachineViewInjection> result;
  for (std::vector<parallel_tensor_dim_idx> const &tensor_dim_ordering :
       permutations(shape_indices)) {
    TensorToMachineViewInjection injection =
        TensorToMachineViewInjection(bidict(zip(machine_view_dim_ordering, tensor_dim_ordering)));

lib/compiler/test/src/allowed_machine_ views.cc line 62 at r3 (raw file):

          views.insert(mv);
        }
        return views;

Suggestion:

        return unordered_set_of(transform(count(num_starts), 
                                          [&](int start) {
                                             return MachineView{
                                               device_id_t{gpu_id_t{start}},
                                               StridedRectangle{{
                                                 StridedRectangleSide{num_points_t{2}, stride_t{stride1}},
                                                 StridedRectangleSide{num_points_t{3}, stride_t{stride2}}
                                               }},
                                             };
                                           }));

lib/compiler/test/src/allowed_machine_ views.cc line 64 at r3 (raw file):

        return views;
      };
      std::unordered_set<MachineView> correct;

Is it possible to instead consider a smaller MachineSpecification where the set of MachineViews is small enough that you don't need all this infrastructure to generate it? This testcase is a bit complicated and might be rather annoying to debug


lib/compiler/test/src/allowed_machine_ views.cc line 136 at r3 (raw file):

          make_2d_view(/*stride1*/ 3, /*stride2*/ 1),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 3),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 4)};

Suggestion:

      std::unordered_set<StartInvariantMachineView> correct = {
          make_2d_view(/*stride1*/ 1, /*stride2*/ 1),
          make_2d_view(/*stride1*/ 2, /*stride2*/ 1),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 2),
          make_2d_view(/*stride1*/ 3, /*stride2*/ 1),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 3),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 4),
      };

lib/compiler/test/src/allowed_machine_ views.cc line 140 at r3 (raw file):

      std::unordered_set<StartInvariantMachineView> result =
          get_allowed_start_invariant_machine_views(ms, shape);
      CHECK(result == correct);

Suggestion:

      std::unordered_set<StartInvariantMachineView> result =
          get_allowed_start_invariant_machine_views(ms, shape);
          
      CHECK(result == correct);

lib/compiler/test/src/allowed_machine_ views.cc line 163 at r3 (raw file):

                         StridedRectangleSide{num_points_t(2), stride_t(4)},
                         StridedRectangleSide{num_points_t(3), stride_t(1)}},
                    }};

Suggestion:

  TEST_CASE("get_all_tensor_to_machine_view_injections") {
    ParallelTensorShape shape = ParallelTensorShape{
        ParallelTensorDims{
            FFOrdered<ShardParallelDim>{
                ShardParallelDim{10, 3},
            },
            ReplicaParallelDimSet{
                SumDegree{2},
                DiscardCopyDegree{2},
            },
        },
        DataType::FLOAT,
    };
    
    MachineView view =
        MachineView{
          device_id_from_index(0, DeviceType::GPU),
          StridedRectangle{{
            StridedRectangleSide{num_points_t(2), stride_t(1)},
            StridedRectangleSide{num_points_t(2), stride_t(4)},
            StridedRectangleSide{num_points_t(3), stride_t(1)},
          }},
        };

lib/compiler/test/src/allowed_machine_ views.cc line 178 at r3 (raw file):

    std::unordered_set<TensorToMachineViewInjection> result =
        get_all_tensor_to_machine_view_injections(view, shape);
    CHECK(correct == result);

Suggestion:

    std::unordered_set<TensorToMachineViewInjection> correct = {
        TensorToMachineViewInjection{b1}, TensorToMachineViewInjection{b2}};
    std::unordered_set<TensorToMachineViewInjection> result =
        get_all_tensor_to_machine_view_injections(view, shape);
        
    CHECK(correct == result);

lib/compiler/test/src/machine_mapping.cc line 7 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {

  // TEST_CASE("MachineMapping::combine") {

What's happening with this file?


lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml line 2 at r3 (raw file):

namespace = "FlexFlow"
name = "parallel_tensor_dim_idx"

Naming convention-wise we usually either do ParallelTensorDimIdx or parallel_tensor_dim_idx_t, generally to allow parallel_tensor_dim_idx to be a variable name

Suggestion:

name = "parallel_tensor_dim_idx_t"

lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml line 10 at r3 (raw file):

  "fmt",
]
explicit_constructors = false

Prefer explicit_constructors = true unless there's a good reason. What's the reason here?


lib/pcg/include/pcg/device_coordinates.struct.toml line 18 at r2 (raw file):

Previously, Marsella8 wrote…

Got it, so FFOrdered is only for Tensor dims?

Yes, to distinguish between the FlexFlow/pytorch ordering of dimensions and the flipped legion ordering of dimensions


lib/pcg/include/pcg/device_coordinates.struct.toml line 17 at r3 (raw file):

  "utils/fmt/vector.h",
  
]

Suggestion:

includes = [
  "<vector>",
]

src_includes = [
  "utils/hash/vector.h",
  "utils/fmt/vector.h",
]

lib/pcg/include/pcg/device_coordinates.struct.toml line 20 at r3 (raw file):

[[fields]]
name = "coords"

Suggestion:

name = "raw_coords"

lib/pcg/include/pcg/machine_view_dim_idx.struct.toml line 2 at r3 (raw file):

namespace = "FlexFlow"
name = "machine_view_dim_idx"

For convention we usually either use MachineViewDimIdx or machine_view_dim_idx_t so that machine_view_dim_idx can be a variable name

Suggestion:

name = "machine_view_dim_idx_t"

lib/pcg/include/pcg/strided_rectangle.h line 17 at r3 (raw file):

  std::vector<StridedRectangleSide> sides;
  std::tuple<std::vector<StridedRectangleSide> const &> tie() const;
  friend struct std::hash<StridedRectangle>;

Usually we put member variables (i.e., sides) into their own private: or public: section for readability

Suggestion:

private: 
  std::tuple<std::vector<StridedRectangleSide> const &> tie() const;
  friend struct std::hash<StridedRectangle>;

lib/pcg/src/pcg/device_id.cc line 36 at r3 (raw file):

    default:
      throw mk_runtime_error(
          fmt::format("Unsupported DeviceType {}", get_device_type(device_id)));

Why remove logging of the device_id_t?


lib/pcg/src/pcg/device_id.cc line 43 at r3 (raw file):

  switch (device_type) {
    case DeviceType::GPU:
      return device_id_t(gpu_id_t(idx));

Prefer {...} initialization for consistency

Suggestion:

      return device_id_t{gpu_id_t{idx}};

lib/pcg/src/pcg/machine_view.cc line 33 at r3 (raw file):

      sum(transform(zip(coefficients, as_vector(point.coords)),
                    [](auto const pair) { return pair.first * pair.second; })) +
      get_raw_id(mv.start);

Suggestion:

  size_t coord_offset = sum(transform(zip(coefficients, as_vector(point.coords)),
                    [](auto const pair) { return pair.first * pair.second; }));
  size_t raw_id = get_raw_id(mv.start) + coord_offset;

lib/pcg/src/pcg/machine_view.cc line 37 at r3 (raw file):

  return ((get_device_type(mv) == DeviceType::CPU)
              ? device_id_t(cpu_id_t(raw_id))
              : device_id_t(gpu_id_t(raw_id)));

Suggestion:

  return device_id_from_index(raw_id, get_device_type(mv));

lib/pcg/src/pcg/machine_view.cc line 43 at r3 (raw file):

  std::vector<std::vector<int>> ranges =
      transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) {
        return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);

Pull the lambda out into a separate function (get_points or something) in strided_rectangle_side.h

Probably also worth pulling out a get_coords or something in strided_rectangle.h

Code quote:

      transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) {
        return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);

lib/pcg/src/pcg/machine_view.cc line 47 at r3 (raw file):

  std::unordered_set<DeviceCoordinates> devices_as_points = unordered_set_of(
      transform(cartesian_product(ranges),
                [](auto const &point) { return DeviceCoordinates(point); }));

Prefer concrete type

Suggestion:

(std::vector<int> const &point) 

lib/pcg/src/pcg/machine_view.cc line 55 at r3 (raw file):

}

device_id_t get_last_device_id(MachineView const &mv) {

Suggestion:

device_id_t get_maximum_device_id(MachineView const &mv) {

lib/pcg/src/pcg/machine_view.cc line 59 at r3 (raw file):

  //     transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) {
  //       return s.stride.unwrapped;
  //     }));

Delete comment

Code quote:

  // DeviceCoordinates last_device = DeviceCoordinates(
  //     transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) {
  //       return s.stride.unwrapped;
  //     }));

lib/pcg/src/pcg/machine_view.cc line 67 at r3 (raw file):

}

std::vector<num_points_t> get_num_devices_per_dim(MachineView const &mv) {

Do MachineViews have ordered sides?

Code quote:

std::vector<num_points_t>

lib/pcg/src/pcg/machine_view.cc line 73 at r3 (raw file):

}

std::vector<side_size_t> get_side_size_per_dim(MachineView const &mv) {

Do MachineViews have ordered sides?

Code quote:

std::vector<side_size_t>

lib/pcg/src/pcg/strided_rectangle.cc line 24 at r3 (raw file):

std::tuple<std::vector<StridedRectangleSide> const &>
    StridedRectangle::tie() const {
  return std::tie(sides);

Suggestion:

  return std::tie(this->sides);

lib/pcg/src/pcg/strided_rectangle.cc line 52 at r3 (raw file):

std::vector<StridedRectangleSide> const &StridedRectangle::get_sides() const {
  return sides;

Always use explicit this to reduce ambiguity

Suggestion:

  return this->sides;

lib/pcg/test/src/pcg/machine_view.cc line 56 at r3 (raw file):

      }};
      gpu_id_t start(0);
      MachineView mv{device_id_t{start}, rect};

Suggestion:

      MachineView mv = MachineView{
        gpu_id_t{0},
        StridedRectangle{{
          StridedRectangleSide(num_points_t(2), stride_t{3}),
          StridedRectangleSide(num_points_t(2), stride_t{2}),
        }},
      };

lib/pcg/test/src/pcg/machine_view.cc line 59 at r3 (raw file):

      SUBCASE("get_device_ids") {
        std::unordered_set<device_id_t> expected =
            make_gpu_device_ids({0, 2, 12, 14});

Why aren't the device ids here {0, 2, 3, 5}? Or at the very least not {0, 2, 6, 8}? I feel like I don't understand the calculation here


lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 16 at r3 (raw file):

    MachineView input = MachineView{start, rect};

    MachineView result = machine_view_from_start_invariant(

This is a good property for a rapidcheck test, but it would be good to also have a concrete test of each of the directions that isn't just testing invertibility


lib/pcg/test/src/pcg/strided_rectangle.cc line 17 at r3 (raw file):

      StridedRectangle r1 = StridedRectangle{{s1, s0}};
      CHECK(r0 == r1);
      CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});

How much do we care that the sides are sorted vs just caring that there is a canonical order?


lib/utils/include/utils/bidict/bidict.h line 25 at r3 (raw file):

  }

  bidict(std::vector<std::pair<L, R>> init)

Any particular reason for adding this over the existing constructors?


lib/utils/include/utils/bidict/bidict.h line 26 at r3 (raw file):

  bidict(std::vector<std::pair<L, R>> init)
      : bidict(init.begin(), init.end()) {}

Suggestion:

  explicit bidict(std::vector<std::pair<L, R>> init)
      : bidict(init.begin(), init.end()) {}

lib/utils/include/utils/containers/cartesian_product.h line 17 at r3 (raw file):

  std::function<void(V &, size_t)> recurse = [&](V &current, size_t depth) {
    if (depth == containers.size()) {

Suggestion:

template <typename C, typename Elem = typename C::value_type>
std::unordered_multiset<std::vector<Elem>> 
  cartesian_product(std::vector<C> const &containers) {
  
  std::unordered_multiset<std::vector<Elem>> result;

  std::function<void(std::vector<Elem> &, size_t)> recurse = [&](std::vector<Elem> &current, size_t depth) {
    if (depth == containers.size()) {

lib/utils/include/utils/containers/cartesian_product.h line 22 at r3 (raw file):

    }

    for (const auto &item : containers.at(depth)) {

Suggestion:

    for (Elem const &item : containers.at(depth)) {

lib/utils/include/utils/containers/filter.h line 53 at r3 (raw file):

  std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f);
  return result;
}

Suggestion:

template <typename Elem, typename F>
std::unordered_multiset<Elem> filter(std::unordered_multiset<Elem> const &m,
                                     F const &f) {
  std::unordered_multiset<Elem> result;
  std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f);
  return result;
}

lib/utils/include/utils/containers/permutations.h line 13 at r3 (raw file):

template <typename C, typename V = std::vector<typename C::value_type>>
auto permutations(C const &container) {

FYI there's also a lazy implementation of permutations on the substitutions-fix branch at https://github.com/lockshaw/FlexFlow/blob/substitutions-fix/lib/utils/include/utils/containers/get_all_permutations.h (note that having this version too is fine, just wanted to make you aware)


lib/utils/include/utils/containers/permutations.h line 13 at r3 (raw file):

template <typename C, typename V = std::vector<typename C::value_type>>
auto permutations(C const &container) {

Suggestion:

template <typename C, typename Elem = typename C::value_type>
std::unordered_set<std::vector<Elem>> permutations(C const &container) {

lib/utils/include/utils/containers/replicate.h line 9 at r1 (raw file):

Previously, Marsella8 wrote…

Ahh makes sense thanks.

Not seeing this fix?


lib/utils/test/src/utils/containers/cartesian_product.cc line 10 at r1 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("cartesian_product") {
    SUBCASE("empty") {

Why remove this test case?


lib/utils/test/src/utils/containers/permutations.cc line 5 at r3 (raw file):

#include <doctest/doctest.h>
#include <unordered_set>
#include <vector>

Suggestion:

#include <unordered_set>
#include <vector>
#include "utils/fmt/vector.h"
#include "utils/fmt/unordered_set.h"

lib/compiler/src/machine_mapping.cc line 393 at r2 (raw file):

Previously, Marsella8 wrote…

yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)

Might be good to rename that function to multiset_of to remove ambiguity


lib/compiler/src/machine_mapping.cc line 404 at r2 (raw file):

Previously, Marsella8 wrote…

I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could say max_stride_product = num_total_devices/num_devices_used_by_tensor (where num_devices_used_by_tensor is the product of the parallel dims) and so we could simply say that the max stride across any dimension is max_stride_product, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).

Ah okay, yeah I'd add that explanation in as a comment. The code makes a lot more sense with that bit of context


lib/compiler/src/machine_mapping.cc line 405 at r2 (raw file):

Previously, Marsella8 wrote…

fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?

I don't think so, and generally we avoid casting anyway in favor of explicit construction


lib/compiler/src/machine_mapping.cc line 416 at r2 (raw file):

I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase

That makes sense, but I think having a MultiDimensionalStride or something like that would be fine, especially as it's a pretty simple/obvious concept and could be used elsewhere e.g., to move get_strided_rectangle out into its own function that constructs a StridedRectangle from a starting point, a number of repetitions in each dim, and a MultiDimensionalStride

fixed it up a bit, is this enough?

It's definitely a big improvement! I do think breaking it up into stages that produce different named data structures (i.e., first generating the candidate MultiDimensionalStrides, then using them to generate the StartInvariantMachineViews, and then using them to generate the MachineViews) would help, along with a brief comment explaining the stride generation logic which is inherently a bit complicated. It might also be a good idea to add a brief comment to the candidate generation functions explaining the guarantees (if any) they provide on their outputs: for example, are each of the outputs guaranteed to have at least one valid MachineView, or can there be outputs for which no valid MachineViews exist?

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 49 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml line 2 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is this not a surjection? Seems like all of the MachineView dims need to be mapped to?

Changed the name to MachineViewToTensorMapping


lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml line 14 at r3 (raw file):

  "utils/bidict/bidict.h",
  "utils/hash/unordered_map.h"
]

placed in bidict.h (wouldn't compile otherwise)


lib/compiler/src/compiler/allowed_machine_views.cc line 36 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't see this function declared in allowed_machine_view.h?

added (their primary use is within get_allowed_machine_views but is probably nice to have them available externally).


lib/compiler/src/compiler/allowed_machine_views.cc line 152 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Move to machine_view_dim_idx.h

Done.


lib/compiler/src/compiler/allowed_machine_views.cc line 157 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably better to move this and the functions below to an tensor_to_machine_view_injection.h file

Done.


lib/compiler/src/compiler/allowed_machine_views.cc line 175 at r3 (raw file):

  std::unordered_set<parallel_tensor_dim_idx> shape_indices =
      get_parallel_tensor_indices(shape);
  shape_indices = filter(shape_indices, [&](auto const idx) {

Done.


lib/compiler/src/compiler/allowed_machine_views.cc line 183 at r3 (raw file):

       permutations(shape_indices)) {
    TensorToMachineViewInjection injection =
        TensorToMachineViewInjection(bidict(zip(sorted(mv_indices), p)));

Done.


lib/compiler/test/src/allowed_machine_ views.cc line 62 at r3 (raw file):

          views.insert(mv);
        }
        return views;

Done.


lib/compiler/test/src/allowed_machine_ views.cc line 64 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is it possible to instead consider a smaller MachineSpecification where the set of MachineViews is small enough that you don't need all this infrastructure to generate it? This testcase is a bit complicated and might be rather annoying to debug

I've simplified it a bit (still want to leave enough devices so that a couple of different strides can get generated, and manual enumeration might be too much).


lib/compiler/test/src/allowed_machine_ views.cc line 136 at r3 (raw file):

          make_2d_view(/*stride1*/ 3, /*stride2*/ 1),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 3),
          make_2d_view(/*stride1*/ 1, /*stride2*/ 4)};

Done.


lib/compiler/test/src/allowed_machine_ views.cc line 140 at r3 (raw file):

      std::unordered_set<StartInvariantMachineView> result =
          get_allowed_start_invariant_machine_views(ms, shape);
      CHECK(result == correct);

Done.


lib/compiler/test/src/allowed_machine_ views.cc line 163 at r3 (raw file):

                         StridedRectangleSide{num_points_t(2), stride_t(4)},
                         StridedRectangleSide{num_points_t(3), stride_t(1)}},
                    }};

Done.


lib/compiler/test/src/allowed_machine_ views.cc line 178 at r3 (raw file):

    std::unordered_set<TensorToMachineViewInjection> result =
        get_all_tensor_to_machine_view_injections(view, shape);
    CHECK(correct == result);

Done.


lib/compiler/test/src/machine_mapping.cc line 7 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's happening with this file?

old test cases, i haven't touched them.


lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml line 2 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Naming convention-wise we usually either do ParallelTensorDimIdx or parallel_tensor_dim_idx_t, generally to allow parallel_tensor_dim_idx to be a variable name

Done.


lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml line 10 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer explicit_constructors = true unless there's a good reason. What's the reason here?

no strong reason other than verbosiveness, changed it


lib/pcg/include/pcg/device_coordinates.struct.toml line 18 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Yes, to distinguish between the FlexFlow/pytorch ordering of dimensions and the flipped legion ordering of dimensions

thanks


lib/pcg/include/pcg/device_coordinates.struct.toml line 17 at r3 (raw file):

  "utils/fmt/vector.h",
  
]

Done.


lib/pcg/include/pcg/device_coordinates.struct.toml line 20 at r3 (raw file):

[[fields]]
name = "coords"

Done.


lib/pcg/include/pcg/machine_view_dim_idx.struct.toml line 2 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

For convention we usually either use MachineViewDimIdx or machine_view_dim_idx_t so that machine_view_dim_idx can be a variable name

Done.


lib/pcg/include/pcg/strided_rectangle.h line 17 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Usually we put member variables (i.e., sides) into their own private: or public: section for readability

Done.


lib/pcg/src/pcg/device_id.cc line 36 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why remove logging of the device_id_t?

re-added.


lib/pcg/src/pcg/device_id.cc line 43 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer {...} initialization for consistency

Done.


lib/pcg/src/pcg/machine_view.cc line 33 at r3 (raw file):

      sum(transform(zip(coefficients, as_vector(point.coords)),
                    [](auto const pair) { return pair.first * pair.second; })) +
      get_raw_id(mv.start);

Done.


lib/pcg/src/pcg/machine_view.cc line 37 at r3 (raw file):

  return ((get_device_type(mv) == DeviceType::CPU)
              ? device_id_t(cpu_id_t(raw_id))
              : device_id_t(gpu_id_t(raw_id)));

Done.


lib/pcg/src/pcg/machine_view.cc line 43 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Pull the lambda out into a separate function (get_points or something) in strided_rectangle_side.h

Probably also worth pulling out a get_coords or something in strided_rectangle.h

refactored (didn't use lambdas, but it's pretty straightforward now).
Didn't create get_coords since i would like to keep DeviceCoordinates within machine_view.cc (since its just there to make my life easier, shouldn't be used externally).


lib/pcg/src/pcg/machine_view.cc line 47 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer concrete type

Done.


lib/pcg/src/pcg/machine_view.cc line 55 at r3 (raw file):

}

device_id_t get_last_device_id(MachineView const &mv) {

Done.


lib/pcg/src/pcg/machine_view.cc line 59 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete comment

Done.


lib/pcg/src/pcg/machine_view.cc line 67 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Do MachineViews have ordered sides?

yes, (but the order is not chosen by the user, the sides are sorted at construction and that is the canonical ordering)


lib/pcg/src/pcg/machine_view.cc line 73 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Do MachineViews have ordered sides?

see above


lib/pcg/src/pcg/strided_rectangle.cc line 24 at r3 (raw file):

std::tuple<std::vector<StridedRectangleSide> const &>
    StridedRectangle::tie() const {
  return std::tie(sides);

Done.


lib/pcg/src/pcg/strided_rectangle.cc line 52 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Always use explicit this to reduce ambiguity

Done.


lib/pcg/test/src/pcg/machine_view.cc line 56 at r3 (raw file):

      }};
      gpu_id_t start(0);
      MachineView mv{device_id_t{start}, rect};

Done.


lib/pcg/test/src/pcg/machine_view.cc line 59 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why aren't the device ids here {0, 2, 3, 5}? Or at the very least not {0, 2, 6, 8}? I feel like I don't understand the calculation here

added an explanation, let me know if this is sufficient


lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 16 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This is a good property for a rapidcheck test, but it would be good to also have a concrete test of each of the directions that isn't just testing invertibility

updated the tests, I think rapidcheck is a bit overkill since the code itself is very simple.


lib/pcg/test/src/pcg/strided_rectangle.cc line 17 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

How much do we care that the sides are sorted vs just caring that there is a canonical order?

we only care that it has a canonical order (which is enforced at construction), and it think sorted is the most unambiguous way to go about it.


lib/utils/include/utils/bidict/bidict.h line 25 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Any particular reason for adding this over the existing constructors?

no strong reason, made it easier to construct a bidict given a zipped container


lib/utils/include/utils/bidict/bidict.h line 26 at r3 (raw file):

  bidict(std::vector<std::pair<L, R>> init)
      : bidict(init.begin(), init.end()) {}

Done.


lib/utils/include/utils/containers/cartesian_product.h line 17 at r3 (raw file):

  std::function<void(V &, size_t)> recurse = [&](V &current, size_t depth) {
    if (depth == containers.size()) {

Done.


lib/utils/include/utils/containers/cartesian_product.h line 22 at r3 (raw file):

    }

    for (const auto &item : containers.at(depth)) {

Done.


lib/utils/include/utils/containers/filter.h line 53 at r3 (raw file):

  std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f);
  return result;
}

Done.


lib/utils/include/utils/containers/permutations.h line 13 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

FYI there's also a lazy implementation of permutations on the substitutions-fix branch at https://github.com/lockshaw/FlexFlow/blob/substitutions-fix/lib/utils/include/utils/containers/get_all_permutations.h (note that having this version too is fine, just wanted to make you aware)

nice thanks, substituted my version with yours.


lib/utils/include/utils/containers/permutations.h line 13 at r3 (raw file):

template <typename C, typename V = std::vector<typename C::value_type>>
auto permutations(C const &container) {

Done.


lib/utils/include/utils/containers/replicate.h line 9 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Not seeing this fix?

Done.


lib/utils/test/src/utils/containers/cartesian_product.cc line 10 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why remove this test case?

Not sure if our function should accept an empty collection (mathematically its defined, is a single empty set, and the function actually returns that, but at the same time if somebody passed an empty collection is was probably unintentional) Let me know what you think.


lib/utils/test/src/utils/containers/permutations.cc line 5 at r3 (raw file):

#include <doctest/doctest.h>
#include <unordered_set>
#include <vector>

remove the tests (replaced by get_all_permutations)


lib/compiler/src/machine_mapping.cc line 393 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be good to rename that function to multiset_of to remove ambiguity

Done.


lib/compiler/src/machine_mapping.cc line 404 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ah okay, yeah I'd add that explanation in as a comment. The code makes a lot more sense with that bit of context

cleaned up and added the explanation


lib/compiler/src/machine_mapping.cc line 405 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't think so, and generally we avoid casting anyway in favor of explicit construction

got it thanks


lib/compiler/src/machine_mapping.cc line 416 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase

That makes sense, but I think having a MultiDimensionalStride or something like that would be fine, especially as it's a pretty simple/obvious concept and could be used elsewhere e.g., to move get_strided_rectangle out into its own function that constructs a StridedRectangle from a starting point, a number of repetitions in each dim, and a MultiDimensionalStride

fixed it up a bit, is this enough?

It's definitely a big improvement! I do think breaking it up into stages that produce different named data structures (i.e., first generating the candidate MultiDimensionalStrides, then using them to generate the StartInvariantMachineViews, and then using them to generate the MachineViews) would help, along with a brief comment explaining the stride generation logic which is inherently a bit complicated. It might also be a good idea to add a brief comment to the candidate generation functions explaining the guarantees (if any) they provide on their outputs: for example, are each of the outputs guaranteed to have at least one valid MachineView, or can there be outputs for which no valid MachineViews exist?

Added MultiDimensionalStride, and put get_strided_rectangle in StridedRectangle.

This + the explanation of candidate_strides should be enough to make it understandable.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring review until new device id calculation is added.

Reviewed 19 of 41 files at r4.
Reviewable status: 60 of 82 files reviewed, 20 unresolved discussions (waiting on @wmdi)

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 41 files at r4, 16 of 17 files at r6, all commit messages.
Reviewable status: 85 of 86 files reviewed, 31 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/src/compiler/allowed_machine_views.cc line 44 at r6 (raw file):

bool is_valid_machine_view(MachineView const &mv,
                           MachineSpecification const &machine_spec) {
  return false; // TODO: fix

Implement or mark as NOT_IMPLEMENTED()

Code quote:

  return false; // TODO: fix

lib/compiler/src/compiler/allowed_machine_views.cc line 67 at r6 (raw file):

 */
static std::unordered_set<MachineView>
    get_candidate_machine_views(MachineSpecification const &machine_spec,

Can any of this get pulled out into separate functions that make logical sense by themselves? Each statement is okay to read but the sheer number of statements is a bit intense and doesn't provide a lot of walking the reader through. If it really is this complicated that's fine (there are places where stuff just fundamentally needs a bunch of code that can't get decomposed), but I'd like to be convinced of that 🙂


lib/compiler/test/src/allowed_machine_views.cc line 0 at r6 (raw file):
What's happening with this?


lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 12 at r6 (raw file):

  TEST_CASE("get_all_machine_view_to_tensor_mappings") {
    ParallelTensorShape shape = ParallelTensorShape{
        ParallelTensorDims{

Suggestion:

    SUBCASE("no possible mappings") {
      ...@pietro implement this ...
    }
    
    SUBCASE("multiple possible mappings") {
      ParallelTensorShape shape = ParallelTensorShape{
          ParallelTensorDims{

lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 26 at r6 (raw file):

        MachineView{DeviceCoordinates{{0, 0, 0}},
                    StridedRectangle{{
                        StridedRectangleSide{num_points_t(2), stride_t(1)},

For consistency

Suggestion:

num_points_t{2},

lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 33 at r6 (raw file):

    bidict<machine_view_dim_idx_t, parallel_tensor_dim_idx_t> b1 = {
        {machine_view_dim_idx_t(2), parallel_tensor_dim_idx_t{ff_dim_t(0)}},

For consistency

Suggestion:

        {machine_view_dim_idx_t{2}, parallel_tensor_dim_idx_t{ff_dim_t{0}}},

lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 37 at r6 (raw file):

         parallel_tensor_dim_idx_t{ReplicaType::SUM}},
        {machine_view_dim_idx_t(0),
         parallel_tensor_dim_idx_t{ReplicaType::DISCARD_COPY}}};

Might make this a bit more readable.

Suggestion:

    machine_view_dim_idx_t mv_dim_0 = machine_view_dim_idx_t{0};
    machine_view_dim_idx_t mv_dim_1 = machine_view_dim_idx_t{1};
    machine_view_dim_idx_t mv_dim_2 = machine_view_dim_idx_t{2};
    parallel_tensor_dim_idx_t pt_dim_0 = parallel_tensor_dim_idx_t{ff_dim_t{0}};
    parallel_tensor_dim_idx_t pt_dim_sum = parallel_tensor_dim_idx_t{ReplicaType::SUM};
    parallel_tensor_dim_idx_t pt_dim_eq = parallel_tensor_dim_idx_t{ReplicaType::DISCARD_COPY};
    
    bidict<machine_view_dim_idx_t, parallel_tensor_dim_idx_t> b1 = {
        {mv_dim_2, pt_dim_0},
        {mv_dim_1, pt_dim_sum},
        {mv_dim_0, pt_dim_eq},
    };

lib/local-execution/test/src/test_local_cost_estimator.cc line 69 at r6 (raw file):

          std::vector<ParallelTensorAttrs>{weight_attrs},
          std::vector<ParallelTensorAttrs>{output_attrs},
          make_1d_machine_view(gpu_id_t{0}, gpu_id_t{1}));

I kinda like the clarify of having the GPU types in here--is there a reason this overload was removed? If you prefer to not wrap the argument types, it would be nice to force the DeviceType to be passed explicitly so there's no ambiguity (yes we're pretty much always going to be using GPU ids, but as long as CPU ids are an option I'd rather it be 100% clear which one is being constructed).

Another option would be to rename this to make_1d_gpu_machine_view if you like that better


lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx_t.h line 13 at r6 (raw file):

std::unordered_set<parallel_tensor_dim_idx_t>
    get_parallel_tensor_indices(ParallelTensorShape const &shape);

Suggestion:

    get_parallel_tensor_dim_indices(ParallelTensorShape const &shape);

lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc line 10 at r6 (raw file):

namespace FlexFlow {

ParallelDim get_parallel_dim_at_idx(ParallelTensorShape const &shape,

Probably better to put these in parallel_tensor_shape.h instead (at least to me that's where I'd go look for them if I wanted to find them, lmk if you disagree)


lib/pcg/include/pcg/device_coordinates.struct.toml line 2 at r6 (raw file):

namespace = "FlexFlow"
name = "DeviceCoordinates"

Makes reasoning about plurality in variable names easier 🙂

Suggestion:

name = "DeviceCoordinate"

lib/pcg/include/pcg/machine_specification_dimension.enum.toml line 11 at r6 (raw file):

[[values]]
name = "INTER"

Suggestion:

name = "INTERNODE"

lib/pcg/include/pcg/machine_specification_dimension.enum.toml line 14 at r6 (raw file):

[[values]]
name = "INTRA"

Suggestion:

name = "INTRANODE"

lib/pcg/include/pcg/machine_view_projection.struct.toml line 7 at r6 (raw file):

  # "ord",
  "hash",
  # "json",

Any reason no json serialization? I'm not seeing anything here that's obviously not serializable, but I could totally be missing something


lib/pcg/include/pcg/machine_view_projection.struct.toml line 18 at r6 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h",
]

Suggestion:

includes = [
  "pcg/machine_view.dtg.h",
  "pcg/machine_view_dim_idx_t.dtg.h",
  "pcg/machine_specification_dimension.dtg.h",
]

src_includes = [
  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h",
]

lib/pcg/include/pcg/machine_view_projection.struct.toml line 21 at r6 (raw file):

[[fields]]
name = "raw_projection"

Suggestion:

name = "machine_view_dim_to_machine_spec_dim"

lib/pcg/src/pcg/machine_view.cc line 38 at r6 (raw file):

  std::unordered_set<DeviceCoordinates> device_coordinates =
      transform(raw_coordinates, [](std::vector<int> const &point) {
        return DeviceCoordinates(point);

Suggestion:

        return DeviceCoordinates{point};

lib/pcg/src/pcg/machine_view.cc line 47 at r6 (raw file):

}

device_id_t get_device_id(MachineView const &mv,

This needs simplifying (pull out some lambdas and/or functions--for example, a function that adds DeviceCoordinates together would make a lot of sense, or maybe a function that gives you a number based on a stride in a StridedRectangleDim?). When you're dealing with wrapper types, instead of just unwrapping them in your function it's often better to write small wrapper functions that do the unwrapping for you (where those functions logically stand by themselves) and then use those, e.g., the functions in computation_graph.h


lib/pcg/src/pcg/machine_view.cc line 110 at r6 (raw file):

    devices_ids.insert(get_device_id(mv, coordinates, ms, projection));
  }
  return devices_ids;

Suggestion:

  return transform(get_device_coordinates(mv), [&](DeviceCoordinates const &c) { 
                                                 return get_device_id(mv, c, ms, projection);
                                               }); 

lib/pcg/test/src/pcg/machine_view.cc line 33 at r6 (raw file):

    }
    SUBCASE("get_num_devices_per_dim") {
      std::vector<num_points_t> expected = {

Prefer correct for consistency

Suggestion:

      std::vector<num_points_t> correct = {

lib/pcg/test/src/pcg/machine_view.cc line 52 at r6 (raw file):

                      DeviceType::GPU};
      SUBCASE("get_devices_coordinates") {
        std::unordered_set<DeviceCoordinates> expected = {

Why is this not { (0, 0), (0, 3), (2, 0), (2, 3) } here? For a name like DeviceCoordinates that's kinda what I would expect (i.e. that they're in the device space, i.e., MachineSpec)? Maybe there are two concepts, MachineViewCoord and DeviceCoord that should be differentiated?


lib/pcg/test/src/pcg/machine_view.cc line 56 at r6 (raw file):

             DeviceCoordinates{{0, 1}},
             DeviceCoordinates{{1, 0}},
             DeviceCoordinates{{1, 1}}}};

Suggestion:

        std::unordered_set<DeviceCoordinates> expected = {
          DeviceCoordinates{{0, 0}},
          DeviceCoordinates{{0, 1}},
          DeviceCoordinates{{1, 0}},
          DeviceCoordinates{{1, 1}},
        };

lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 17 at r6 (raw file):

    DeviceType device_type = DeviceType::GPU;

    SUBCASE("To StartInvariantMachineView") {

Suggestion:

    SUBCASE("start_invariant_from_machine_view") {

lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 28 at r6 (raw file):

    }

    SUBCASE("From StartInvariantMachineView") {

Suggestion:

    SUBCASE("machine_view_from_start_invariant") {

lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 50 at r6 (raw file):

          machine_view_from_start_invariant(correct, start));
      CHECK(correct == result);
    }

Might be a bit clearer

Suggestion:

    SUBCASE("conversion is invertible") {
      SUBCASE("MachineView -> StrideInvariant -> MachineView") {
        MachineView correct = MachineView{start, rect, device_type};
        MachineView result = machine_view_from_start_invariant(
            start_invariant_from_machine_view(correct), start);
        CHECK(correct == result);
      }

      SUBCASE("StrideInvariant -> MachineView -> StrideInvariant") {
        StartInvariantMachineView correct =
            StartInvariantMachineView{rect, device_type};
        StartInvariantMachineView result = start_invariant_from_machine_view(
            machine_view_from_start_invariant(correct, start));
        CHECK(correct == result);
      }
    }

lib/pcg/test/src/pcg/strided_rectangle.cc line 17 at r3 (raw file):

Previously, Marsella8 wrote…

we only care that it has a canonical order (which is enforced at construction), and it think sorted is the most unambiguous way to go about it.

There are two checks here, the first CHECK(r0 == r1) seems to be that there is a canonical ordering, and a second CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1}); that that canonical ordering is sorting. Maybe it would be better to have separate SUBCASEs for them mentioning this? Not a big deal


lib/pcg/test/src/pcg/strided_rectangle.cc line 18 at r6 (raw file):

      CHECK(r0 == r1);
      CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});
      CHECK(r1.get_sides() != std::vector<StridedRectangleSide>{s1, s0});

Isn't this trivially true based on the prior check?


lib/utils/include/utils/bidict/bidict.h line 25 at r3 (raw file):

Previously, Marsella8 wrote…

no strong reason, made it easier to construct a bidict given a zipped container

You adding this convinced me to add bidict_from_pairs, use that instead 🙂 (I'm not a huge fan of having a bunch of constructors, at least to me it just makes the container conceptually awkward)


lib/utils/include/utils/containers/replicate.h line 14 at r6 (raw file):

    result.push_back(element);
  }
  return result;

Suggestion:

  return std::vector<T>(n, element);

lib/utils/test/src/test_containers.cc line 122 at r6 (raw file):

  }

  TEST_CASE("unordered_multiset_of") {

Delete this test case


lib/utils/test/src/utils/containers/cartesian_product.cc line 10 at r1 (raw file):

Previously, Marsella8 wrote…

Not sure if our function should accept an empty collection (mathematically its defined, is a single empty set, and the function actually returns that, but at the same time if somebody passed an empty collection is was probably unintentional) Let me know what you think.

I think it's fine to return an empty set in that case, it seems like a well-defined operation. I feel like this isn't a conceptually complicated enough function that I'd feel the need to add guardrails like a non-emptiness check (if the user of the function wants that it feels like they can add it)--but in either case, test-case wise I'd like the behavior well-defined by a corresponding SUBCASE (either a CHECK that it returns empty or a CHECK_THROWS if you really think it should fail on empty input)


lib/utils/include/utils/containers/unordered_multiset_of.h line 10 at r6 (raw file):

template <typename C, typename T = typename C::value_type>
std::unordered_multiset<T> unordered_multiset_of(C const &c) {
  return {c.cbegin(), c.cend()};

Suggestion:

  return {std::cbegin(c), std::cend(c)};

wmdi
wmdi previously approved these changes Sep 11, 2024
Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 41 files at r1, 5 of 24 files at r2, 34 of 41 files at r4, 17 of 17 files at r6, all commit messages.
Reviewable status: all files reviewed, 31 unresolved discussions (waiting on @Marsella8)

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 43 of 89 files reviewed, 30 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/compiler/src/compiler/allowed_machine_views.cc line 44 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Implement or mark as NOT_IMPLEMENTED()

implemented


lib/compiler/test/src/allowed_machine_views.cc line at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's happening with this?

now re-enabled


lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 12 at r6 (raw file):

  TEST_CASE("get_all_machine_view_to_tensor_mappings") {
    ParallelTensorShape shape = ParallelTensorShape{
        ParallelTensorDims{

added, throws an error in this case (since it means that the machine view was generated for another tensor)


lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 26 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

For consistency

Done.


lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 33 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

For consistency

Done.


lib/compiler/test/src/machine_view_to_tensor_mapping.cc line 37 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might make this a bit more readable.

Done.


lib/local-execution/test/src/test_local_cost_estimator.cc line 69 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I kinda like the clarify of having the GPU types in here--is there a reason this overload was removed? If you prefer to not wrap the argument types, it would be nice to force the DeviceType to be passed explicitly so there's no ambiguity (yes we're pretty much always going to be using GPU ids, but as long as CPU ids are an option I'd rather it be 100% clear which one is being constructed).

Another option would be to rename this to make_1d_gpu_machine_view if you like that better

Removed the default GPU type. Before the interface was different since the start of the MachineView was a device_id_t, which implictly stored the device type. Now the start is a devicetype-less n-dimensional point, and thus the device type is stored separately in machineview as a separate attribute (lucklily ends up being a lot cleaner than the previous implementation)


lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx_t.h line 13 at r6 (raw file):

std::unordered_set<parallel_tensor_dim_idx_t>
    get_parallel_tensor_indices(ParallelTensorShape const &shape);

Done.


lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc line 10 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably better to put these in parallel_tensor_shape.h instead (at least to me that's where I'd go look for them if I wanted to find them, lmk if you disagree)

I think the original rationale we agreed on is that we'd put them here since we weren't sure whether parallel_tensor_dim_idx_t would be used more widely. If you think that this will be used for indexing ParallelTensorShape throughout the codebase, I'll move it.


lib/pcg/include/pcg/machine_specification_dimension.enum.toml line 11 at r6 (raw file):

[[values]]
name = "INTER"

Done.


lib/pcg/include/pcg/machine_specification_dimension.enum.toml line 14 at r6 (raw file):

[[values]]
name = "INTRA"

Done.


lib/pcg/include/pcg/machine_view_projection.struct.toml line 7 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Any reason no json serialization? I'm not seeing anything here that's obviously not serializable, but I could totally be missing something

added json serialization


lib/pcg/include/pcg/machine_view_projection.struct.toml line 18 at r6 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h",
]

Done.


lib/pcg/include/pcg/machine_view_projection.struct.toml line 21 at r6 (raw file):

[[fields]]
name = "raw_projection"

Done.


lib/pcg/src/pcg/machine_view.cc line 38 at r6 (raw file):

  std::unordered_set<DeviceCoordinates> device_coordinates =
      transform(raw_coordinates, [](std::vector<int> const &point) {
        return DeviceCoordinates(point);

Done.


lib/pcg/src/pcg/machine_view.cc line 110 at r6 (raw file):

    devices_ids.insert(get_device_id(mv, coordinates, ms, projection));
  }
  return devices_ids;

Done.


lib/pcg/test/src/pcg/machine_view.cc line 33 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer correct for consistency

Done.


lib/pcg/test/src/pcg/machine_view.cc line 52 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why is this not { (0, 0), (0, 3), (2, 0), (2, 3) } here? For a name like DeviceCoordinates that's kinda what I would expect (i.e. that they're in the device space, i.e., MachineSpec)? Maybe there are two concepts, MachineViewCoord and DeviceCoord that should be differentiated?

Currently the indexing is done "jumping along with the stride" (e.g 3x3 machine view, 2x2 points, 2x2 stride, we have (0,0) be in position (0, 0), (1,0) be in the real position (2, 0), etc...). Depends on whether we want to index the machineview directly (in which case it's better to have the current indexing) or not (in which case you'd prefer the second one). I personally like the second one best, but when we were discussing it a few weeks ago you had used the current one and had assumed you preferred it.


lib/pcg/test/src/pcg/machine_view.cc line 56 at r6 (raw file):

             DeviceCoordinates{{0, 1}},
             DeviceCoordinates{{1, 0}},
             DeviceCoordinates{{1, 1}}}};

Done.


lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 17 at r6 (raw file):

    DeviceType device_type = DeviceType::GPU;

    SUBCASE("To StartInvariantMachineView") {

Done.


lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 28 at r6 (raw file):

    }

    SUBCASE("From StartInvariantMachineView") {

Done.


lib/pcg/test/src/pcg/start_invariant_machine_view.cc line 50 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be a bit clearer

yes sorry


lib/pcg/test/src/pcg/strided_rectangle.cc line 17 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

There are two checks here, the first CHECK(r0 == r1) seems to be that there is a canonical ordering, and a second CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1}); that that canonical ordering is sorting. Maybe it would be better to have separate SUBCASEs for them mentioning this? Not a big deal

also added a docstring


lib/pcg/test/src/pcg/strided_rectangle.cc line 18 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Isn't this trivially true based on the prior check?

removed.


lib/utils/include/utils/containers/replicate.h line 14 at r6 (raw file):

    result.push_back(element);
  }
  return result;

wow thanks


lib/utils/test/src/test_containers.cc line 122 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete this test case

Done.


lib/utils/test/src/utils/containers/cartesian_product.cc line 10 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I think it's fine to return an empty set in that case, it seems like a well-defined operation. I feel like this isn't a conceptually complicated enough function that I'd feel the need to add guardrails like a non-emptiness check (if the user of the function wants that it feels like they can add it)--but in either case, test-case wise I'd like the behavior well-defined by a corresponding SUBCASE (either a CHECK that it returns empty or a CHECK_THROWS if you really think it should fail on empty input)

makes sense, I think returning the empty is reasonable, readded the test case.


lib/pcg/include/pcg/device_coordinates.struct.toml line 2 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Makes reasoning about plurality in variable names easier 🙂

Done.


lib/utils/include/utils/containers/unordered_multiset_of.h line 10 at r6 (raw file):

template <typename C, typename T = typename C::value_type>
std::unordered_multiset<T> unordered_multiset_of(C const &c) {
  return {c.cbegin(), c.cend()};

why is this preferrable? (just curious)

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 27 of 46 files at r7, 29 of 34 files at r8, 1 of 21 files at r9.
Reviewable status: 72 of 96 files reviewed, 30 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/include/compiler/allowed_machine_views.h line 15 at r8 (raw file):

bool is_valid_machine_view(MachineView const &mv,
                           TaskSpaceOperator const &task);

These two overloads seem nontrivially different(?), so giving them different (and clearer) names might help

Code quote:

bool is_valid_machine_view(MachineView const &mv,
                           TaskSpaceOperator const &task,
                           MachineSpecification const &ms);

bool is_valid_machine_view(MachineView const &mv,
                           TaskSpaceOperator const &task);

lib/local-execution/test/src/test_local_cost_estimator.cc line 11 at r8 (raw file):

// TEST_SUITE(FF_CUDA_TEST_SUITE) {
//   TEST_CASE("Local Cost Estimator") {

What's going on here?


lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc line 10 at r6 (raw file):

Previously, Marsella8 wrote…

I think the original rationale we agreed on is that we'd put them here since we weren't sure whether parallel_tensor_dim_idx_t would be used more widely. If you think that this will be used for indexing ParallelTensorShape throughout the codebase, I'll move it.

I don't know how widely parallel_tensor_dim_idx_t will be used, but it's such an obvious thing to have conceptually that I think it makes sense to integrate into parallel_tensor_shape.h, etc.


lib/pcg/include/pcg/machine_specification.h line 19 at r8 (raw file):

bool is_valid_machine_space_coordinates(MachineSpecification const &ms,
                                        MachineSpaceCoordinate const &coord);

Suggestion:

bool is_valid_machine_space_coordinate(MachineSpecification const &ms,
                                        MachineSpaceCoordinate const &coord);

lib/pcg/include/pcg/machine_view.struct.toml line 29 at r8 (raw file):

[[fields]]
name = "projection"
type = "std::vector<::FlexFlow::MachineSpecificationDimension>"

Why are these separate vectors rather than a single vector?


lib/pcg/include/pcg/machine_view_dim_idx_t.h line 9 at r8 (raw file):

namespace FlexFlow {

std::vector<machine_view_dim_idx_t>

Why a vector here?


lib/pcg/include/pcg/task_space_operator.h line 12 at r8 (raw file):

std::unordered_set<TaskSpaceCoordinate>
    get_fragment_coordinates(TaskSpaceOperator const &task);

Why not just get_task_space_coordinates?


lib/pcg/include/pcg/task_space_operator.h line 15 at r8 (raw file):

TaskSpaceCoordinate
    get_maximum_fragment_coordinate(TaskSpaceOperator const &task);

Suggestion:

   get_task_space_maximum_coordinate(TaskSpaceOperator const &task);

lib/pcg/include/pcg/task_space_operator.h line 18 at r8 (raw file):

size_t num_dims(TaskSpaceOperator const &task);
size_t num_fragments(TaskSpaceOperator const &task);

Clearer and doesn't require new terminology (i.e., "fragment")--not that I mind new terminology when necessary, just don't think it's necessary here

Suggestion:

size_t num_tasks(TaskSpaceOperator const &task);

lib/pcg/src/pcg/delete.c line 0 at r8 (raw file):
What's going on here?


lib/pcg/src/pcg/machine_specification.cc line 43 at r8 (raw file):

device_id_t get_device_id(MachineSpecification const &ms,
                          MachineSpaceCoordinate const &coord) {
  assert(is_valid_machine_space_coordinates(ms, coord));

Might be better to throw an exception so that the out-of-bounds case can be tested? We can't really test that an assert is fired

Code quote:

  assert(is_valid_machine_space_coordinates(ms, coord));

lib/pcg/test/src/pcg/machine_view.cc line 35 at r8 (raw file):

      MachineSpecification ms = MachineSpecification{
          1, 6, 6, 0, 0}; // Single node with 6 GPUs (0,1,2,3,4,5)

Add argument name comments?

Code quote:

      MachineSpecification ms = MachineSpecification{
          1, 6, 6, 0, 0}; // Single node with 6 GPUs (0,1,2,3,4,5)

lib/utils/include/utils/containers/get_all_permutations.h line 105 at r8 (raw file):

template <typename T>
struct permutations_with_repetition_container {

I'd move this to a separate get_all_permutations_with_repetitions.h file unless there's a strong reason it should be here

Code quote:

struct permutations_with_repetition_container {

lib/utils/include/utils/containers/transform.h line 50 at r8 (raw file):

template <typename F,
          typename In,
          typename Out = decltype(std::declval<F>()(std::declval<In>()))>

Suggestion:

          typename Out = std::invoke_result_t<F, In>>

lib/utils/include/utils/containers/transform.h line 53 at r8 (raw file):

std::set<Out> transform(std::set<In> const &v, F const &f) {
  std::set<Out> result;
  for (auto const &e : v) {

Suggestion:

  for (In const &e : v) {

lib/utils/test/src/utils/containers/get_all_permutations.cc line 56 at r8 (raw file):

  TEST_CASE("get_all_permutations_with_repetition") {
    SUBCASE("container size =  3, n = 1") {

Come up with better testcase descriptions. The goal of these should be a high level discussion of semantic properties and/or the failure cases they're trying to check--"container size = 3, n = 1" is something I can trivially see from the code and isn't really (at least in the current wording) a high-level idea. If there's a reason this testcase is here (say, testing the limiting case that we have only one element), then the description should be clarified. If there's not a good high-level reason for that specific testcase, it should probably be removed 🙂

Code quote:

    SUBCASE("container size =  3, n = 1") {

lib/utils/test/src/utils/containers/get_all_permutations.cc line 60 at r8 (raw file):

      std::unordered_multiset<std::vector<int>> result =
          unordered_multiset_of(get_all_permutations_with_repetition(input, 1));

I'm guessing this terminology is from https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition--that said, the naming is somewhat confusing (as even wikipedia notes), so either come up with a slightly clearer name for this (isn't this just essentially cartesian product in the case that the tuple size is >= the list size?) or add an explicit docstring with links to that wikipedia page


lib/utils/test/src/utils/containers/range.cc line 2 at r8 (raw file):

#include "utils/containers/range.h"
#include "utils/fmt/vector.h"

Don't these have to be test/utils/doctest/fmt/vector.h now?


lib/utils/include/utils/fmt/unordered_multiset.h line 27 at r7 (raw file):

          return fmt::to_string(t);
        });
    // }

Delete?


lib/pcg/include/pcg/machine_specification_coordinate.struct.toml line 17 at r7 (raw file):

[[fields]]
name = "inter"

"inter" by itself is unclear. "inter"-what?

Suggestion:

name = "node_idx"

lib/pcg/include/pcg/machine_specification_coordinate.struct.toml line 21 at r7 (raw file):

[[fields]]
name = "intra"

Suggestion:

name = "device_idx"

lib/pcg/include/pcg/task_space_operator.struct.toml line 2 at r9 (raw file):

namespace = "FlexFlow"
name = "TaskSpaceOperator"

Suggestion:

name = "OperatorTaskSpace"

lib/pcg/include/pcg/task_space_operator.struct.toml line 24 at r9 (raw file):

[[fields]]
name = "degrees"
type = "std::vector<::FlexFlow::num_points_t>"

Is there a value to using num_points now? It doesn't seem like there's really an alternative for OperatorTaskSpace (like, there's no way for it to be strided, so the extra safety doesn't seem as worth it now?)

Code quote:

<::FlexFlow::num_points_t>"

lib/utils/include/utils/containers/unordered_multiset_of.h line 10 at r6 (raw file):

Previously, Marsella8 wrote…

why is this preferrable? (just curious)

It's not really a big deal either way, but technically std::cbegin and std::cend work on raw arrays. This was really more of a "C++ interesting FYI" than a particularly important comment tbh

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 21 files at r9.
Reviewable status: 73 of 96 files reviewed, 30 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/compiler/include/compiler/allowed_machine_views.h line 15 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

These two overloads seem nontrivially different(?), so giving them different (and clearer) names might help

second one was redundant.


lib/compiler/src/compiler/allowed_machine_views.cc line 67 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Can any of this get pulled out into separate functions that make logical sense by themselves? Each statement is okay to read but the sheer number of statements is a bit intense and doesn't provide a lot of walking the reader through. If it really is this complicated that's fine (there are places where stuff just fundamentally needs a bunch of code that can't get decomposed), but I'd like to be convinced of that 🙂

I've tried to split it up a bit more, but it's probably not enough. Not super sure how to express the candidate_strides computation more cleanly otherwise.


lib/local-execution/test/src/test_local_cost_estimator.cc line 11 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's going on here?

the interface for MachineView has changed so there is not really an equivalent for make_1d_machine_view anymore.
It would have to be something like a OperatorTaskSpace which encodes the number of points (so degree of the only dimension) and then a MachineView which tells us the stride and the StartCoordinate within the MachineSpecification space.


lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc line 10 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I don't know how widely parallel_tensor_dim_idx_t will be used, but it's such an obvious thing to have conceptually that I think it makes sense to integrate into parallel_tensor_shape.h, etc.

Done. note that since we don't have the bijective mapping stuff anymore, parallel_tensor_dim_idx is not needed anymore, but I've moved it to 'parallel_tensor_shape.cc` for future use.


lib/pcg/include/pcg/machine_specification.h line 19 at r8 (raw file):

bool is_valid_machine_space_coordinates(MachineSpecification const &ms,
                                        MachineSpaceCoordinate const &coord);

Done.


lib/pcg/include/pcg/machine_view.struct.toml line 29 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why are these separate vectors rather than a single vector?

we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.


lib/pcg/include/pcg/machine_view_dim_idx_t.h line 9 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why a vector here?

old residue from the previous interface, now it's deleted


lib/pcg/include/pcg/task_space_operator.h line 12 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not just get_task_space_coordinates?

Done.


lib/pcg/include/pcg/task_space_operator.h line 15 at r8 (raw file):

TaskSpaceCoordinate
    get_maximum_fragment_coordinate(TaskSpaceOperator const &task);

Done.


lib/pcg/include/pcg/task_space_operator.h line 18 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Clearer and doesn't require new terminology (i.e., "fragment")--not that I mind new terminology when necessary, just don't think it's necessary here

Done.


lib/pcg/src/pcg/delete.c line at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What's going on here?

sorry old file where I dumped some stuff, now deleted


lib/pcg/src/pcg/machine_specification.cc line 43 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be better to throw an exception so that the out-of-bounds case can be tested? We can't really test that an assert is fired

Done + added some tests


lib/pcg/src/pcg/machine_view.cc line 47 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This needs simplifying (pull out some lambdas and/or functions--for example, a function that adds DeviceCoordinates together would make a lot of sense, or maybe a function that gives you a number based on a stride in a StridedRectangleDim?). When you're dealing with wrapper types, instead of just unwrapping them in your function it's often better to write small wrapper functions that do the unwrapping for you (where those functions logically stand by themselves) and then use those, e.g., the functions in computation_graph.h

changed, lmk if this is better (old implementation was crap, I apologize)


lib/pcg/test/src/pcg/machine_view.cc line 35 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Add argument name comments?

Done.


lib/utils/include/utils/containers/get_all_permutations.h line 105 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'd move this to a separate get_all_permutations_with_repetitions.h file unless there's a strong reason it should be here

Done.


lib/utils/include/utils/containers/transform.h line 50 at r8 (raw file):

template <typename F,
          typename In,
          typename Out = decltype(std::declval<F>()(std::declval<In>()))>

Done.


lib/utils/include/utils/containers/transform.h line 53 at r8 (raw file):

std::set<Out> transform(std::set<In> const &v, F const &f) {
  std::set<Out> result;
  for (auto const &e : v) {

Done.


lib/utils/test/src/utils/containers/get_all_permutations.cc line 56 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Come up with better testcase descriptions. The goal of these should be a high level discussion of semantic properties and/or the failure cases they're trying to check--"container size = 3, n = 1" is something I can trivially see from the code and isn't really (at least in the current wording) a high-level idea. If there's a reason this testcase is here (say, testing the limiting case that we have only one element), then the description should be clarified. If there's not a good high-level reason for that specific testcase, it should probably be removed 🙂

great point, yeah tests weren't super targeted, slightly retooled them to be more informative.


lib/utils/test/src/utils/containers/get_all_permutations.cc line 60 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'm guessing this terminology is from https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition--that said, the naming is somewhat confusing (as even wikipedia notes), so either come up with a slightly clearer name for this (isn't this just essentially cartesian product in the case that the tuple size is >= the list size?) or add an explicit docstring with links to that wikipedia page

afaik, permutations_with_repetition is the most standard name for this kind of behaviour. I've added a docstring


lib/utils/test/src/utils/containers/range.cc line 2 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Don't these have to be test/utils/doctest/fmt/vector.h now?

Done.


lib/utils/include/utils/fmt/unordered_multiset.h line 27 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete?

Done.


lib/pcg/include/pcg/machine_specification_coordinate.struct.toml line 17 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

"inter" by itself is unclear. "inter"-what?

Done.


lib/pcg/include/pcg/machine_specification_coordinate.struct.toml line 21 at r7 (raw file):

[[fields]]
name = "intra"

Done.


lib/pcg/include/pcg/task_space_operator.struct.toml line 2 at r9 (raw file):

namespace = "FlexFlow"
name = "TaskSpaceOperator"

Done.


lib/pcg/include/pcg/task_space_operator.struct.toml line 24 at r9 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is there a value to using num_points now? It doesn't seem like there's really an alternative for OperatorTaskSpace (like, there's no way for it to be strided, so the extra safety doesn't seem as worth it now?)

agree, only thing i can think of is that num_points_t makes clear what the degree of an operator means. changed back to int


lib/utils/include/utils/containers/unordered_multiset_of.h line 10 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

It's not really a big deal either way, but technically std::cbegin and std::cend work on raw arrays. This was really more of a "C++ interesting FYI" than a particularly important comment tbh

got it

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 21 files at r9, 34 of 36 files at r10, all commit messages.
Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/test/src/allowed_machine_views.cc line 18 at r10 (raw file):

    SUBCASE("1 degree of parallelism") {

      MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};

Suggestion:

      MachineSpecification ms = MachineSpecification{
        /*num_nodes=*/1, 
        /*num_cpus_per_node=*/5, 
        /*num_gpus_per_node=*/5, 
        /*inter_node_bandwidth=*/0, 
        /*intra_node_bandwidth=*/0,
      };

lib/compiler/test/src/allowed_machine_views.cc line 24 at r10 (raw file):

          MachineView{{{stride_t{1}}},
                      {{MachineSpecificationDimension::INTRA_NODE}},
                      MachineSpaceCoordinate{0, 0, DeviceType::GPU}},

Suggestion:

                      MachineSpaceCoordinate{
                        /*node=idx=*/0, 
                        /*device_idx=*/0, 
                        DeviceType::GPU,
                      }},

lib/compiler/test/src/allowed_machine_views.cc line 45 at r10 (raw file):

    SUBCASE("2 degrees of parallelism") {

      MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};

Suggestion:

      MachineSpecification ms = MachineSpecification{
        /*num_nodes=*/3, 
        /*num_cpus_per_node=*/3, 
        /*num_gpus_per_node=*/3, 
        /*inter_node_bandwidth=*/0, 
        /*intra_node_bandwidth=*/0,
      };

lib/compiler/test/src/allowed_machine_views.cc line 48 at r10 (raw file):

      OperatorTaskSpace task = OperatorTaskSpace{{2, 3}};

      auto make_2d_views = [&](int start_x,

Suggestion:

      auto make_2d_view

lib/compiler/test/src/allowed_machine_views.cc line 49 at r10 (raw file):

      auto make_2d_views = [&](int start_x,
                               int start_y,

Aren't these in machine space? What does y mean in a machine space?

Code quote:

                               int start_y,

lib/compiler/test/src/allowed_machine_views.cc line 63 at r10 (raw file):

      auto inter = MachineSpecificationDimension::INTER_NODE;
      std::unordered_set<MachineView> correct = {
          make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),

Suggestion:

          make_2d_views(0, 0, /*stride1=*/1, /*stride2=*/1, inter, intra),

lib/local-execution/test/src/test_local_cost_estimator.cc line 11 at r8 (raw file):

Previously, Marsella8 wrote…

the interface for MachineView has changed so there is not really an equivalent for make_1d_machine_view anymore.
It would have to be something like a OperatorTaskSpace which encodes the number of points (so degree of the only dimension) and then a MachineView which tells us the stride and the StartCoordinate within the MachineSpecification space.

Got it, so we likely need to (at some point) add a primitive in op-attrs that lets us take an UnmappedCostEstimateKey and return the OperatorTaskSpace for the operator?


lib/pcg/include/pcg/machine_view.struct.toml line 29 at r8 (raw file):

Previously, Marsella8 wrote…

we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.

I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe MachineDimensionStride? or StridedMachineDimension? or even something as basic as MachineDimensionWithStride? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names) .struct.toml datatype to represent the combination of the two


lib/pcg/test/src/pcg/machine_specification.cc line 11 at r10 (raw file):

  TEST_CASE("MachineSpecification") {

    MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};

Suggestion:

    MachineSpecification ms = MachineSpecification{
      /*num_nodes=*/4, 
      /*num_cpus_per_node=*/16, 
      /*num_gpus_per_node=*/8, 
      /*inter_node_bandwidth=*/100.0f, 
      /*intra_node_bandwidth=*/200.0f,
    };

lib/pcg/test/src/pcg/machine_specification.cc line 29 at r10 (raw file):

      SUBCASE("valid MachineSpaceCoordinate") {
        MachineSpaceCoordinate coord =
            MachineSpaceCoordinate{2, 12, DeviceType::CPU};

Suggestion:

           MachineSpaceCoordinate{
             /*node_idx=*/2, 
             /*device_idx=*/12, 
             DeviceType::CPU,
           };

lib/pcg/test/src/pcg/machine_specification.cc line 35 at r10 (raw file):

        CHECK(correct == result);
      }
      SUBCASE("invalid MachineSpaceCoordinate") {

Suggestion:

      SUBCASE("MachineSpaceCoordinate out of bounds of machine spec") {

lib/pcg/test/src/pcg/machine_specification.cc line 37 at r10 (raw file):

      SUBCASE("invalid MachineSpaceCoordinate") {
        MachineSpaceCoordinate coord =
            MachineSpaceCoordinate{2, 18, DeviceType::CPU};

Suggestion:

            MachineSpaceCoordinate{
              /*node_idx=*/2, 
              /*device_idx=*/18, 
              DeviceType::CPU,
            };

lib/pcg/test/src/pcg/machine_view.cc line 52 at r6 (raw file):

Previously, Marsella8 wrote…

Currently the indexing is done "jumping along with the stride" (e.g 3x3 machine view, 2x2 points, 2x2 stride, we have (0,0) be in position (0, 0), (1,0) be in the real position (2, 0), etc...). Depends on whether we want to index the machineview directly (in which case it's better to have the current indexing) or not (in which case you'd prefer the second one). I personally like the second one best, but when we were discussing it a few weeks ago you had used the current one and had assumed you preferred it.

I think this was an out-of-date comment from when I reviewed pre-our-few-weeks-ago-discussion, sorry for the mixup


lib/pcg/test/src/pcg/machine_view.cc line 38 at r10 (raw file):

                                                     /*num_gpus_per_node*/ 6,
                                                     0,
                                                     0};

Suggestion:

      MachineSpecification ms = MachineSpecification{
        /*num_nodes=*/1,
        /*num_cpus_per_node=*/6,
        /*num_gpus_per_node=*/6,
        /*inter_node_bandwidth=*/0,
        /*intra_node_bandwidth=*/0,
      };

lib/pcg/test/src/pcg/machine_view.cc line 40 at r10 (raw file):

                                                     0};

      SUBCASE("Fragment 0") {

Didn't we get rid of "fragment"?

Code quote:

      SUBCASE("Fragment 0") {

lib/pcg/test/src/pcg/machine_view.cc line 49 at r10 (raw file):

      }

      SUBCASE("Fragment 1") {

Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?

Actually looks like not as there's only one dim, so I'm not sure why Fragment 2 is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).

Code quote:

      SUBCASE("Fragment 1") {

lib/pcg/test/src/pcg/machine_view.cc line 58 at r10 (raw file):

      }

      SUBCASE("Fragment 2") {

What happens if the TaskSpaceCoordinate is out of bounds?


lib/pcg/test/src/pcg/machine_view.cc line 74 at r10 (raw file):

                      {{MachineSpecificationDimension::INTER_NODE,
                        MachineSpecificationDimension::INTRA_NODE}},
                      MachineSpaceCoordinate{1, 2, DeviceType::GPU}};

Feels a bit weird having the start field last? 😂

Code quote:

                      MachineSpaceCoordinate{1, 2, DeviceType::GPU}}

lib/pcg/test/src/pcg/machine_view.cc line 80 at r10 (raw file):

                                                     /*num_gpus_per_node*/ 5,
                                                     0,
                                                     0};

Suggestion:

      MachineSpecification ms = MachineSpecification{
        /*num_nodes=*/3,
        /*num_cpus_per_node=*/5,
        /*num_gpus_per_node=*/5,
        /*inter_node_bandwidth=*/0,
        /*intra_node_bandwidth=*/0,
      };

lib/pcg/test/src/pcg/machine_view.cc line 82 at r10 (raw file):

                                                     0};

      SUBCASE("Fragment (0,0)") {

See above about better testcase naming

Code quote:

     SUBCASE("Fragment (0,0)") {

lib/pcg/test/src/pcg/machine_view.cc line 85 at r10 (raw file):

        TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}};
        MachineSpaceCoordinate correct =
            MachineSpaceCoordinate{1, 2, DeviceType::GPU};

Suggestion:

            MachineSpaceCoordinate{/*node_idx=*/1, /*device_idx=*/2, DeviceType::GPU};

lib/pcg/test/src/pcg/machine_view.cc line 124 at r10 (raw file):

                      {{MachineSpecificationDimension::INTER_NODE,
                        MachineSpecificationDimension::INTRA_NODE,
                        MachineSpecificationDimension::INTRA_NODE}},

Rather than having duplicate INTER_NODE/INTRA_NODE values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?

Code quote:

      OperatorTaskSpace task = OperatorTaskSpace{{2, 2, 2}};
      MachineView mv =
          MachineView{{{stride_t{1}, stride_t{2}, stride_t{1}}},
                      {{MachineSpecificationDimension::INTER_NODE,
                        MachineSpecificationDimension::INTRA_NODE,
                        MachineSpecificationDimension::INTRA_NODE}},

lib/pcg/test/src/pcg/machine_view.cc line 128 at r10 (raw file):

      MachineSpecification ms = MachineSpecification{/*num_nodes*/ 2,
                                                     /*num_cpus_per_node*/ 8,

Fix parameter naming comments


lib/pcg/test/src/pcg/machine_view.cc line 133 at r10 (raw file):

                                                     0};

      SUBCASE("Fragment (0,0,1)") {

Better testcase names (see above)


lib/utils/include/utils/containers/get_all_permutations_with_repetition.h line 16 at r10 (raw file):

 * @details
 *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition
 **/

Suggestion:

/**
 * @brief For a given container `c` and integer `n`, return all possible vectors
 * of size `n` that only contain (possibly duplicated) elements of `c`.
 * @details
 * https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition
 **/

lib/utils/include/utils/containers/get_all_permutations_with_repetition.h line 18 at r10 (raw file):

 **/
template <typename T>
struct permutations_with_repetition_container {

Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing get_permutations tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason


lib/utils/test/src/utils/containers/cartesian_product.cc line 4 at r10 (raw file):

#include "test/utils/doctest/fmt/unordered_multiset.h"
#include "utils/fmt/unordered_multiset.h"
#include "utils/fmt/vector.h"

Are these still necessary?

Code quote:

#include "utils/fmt/unordered_multiset.h"
#include "utils/fmt/vector.h"

lib/utils/test/src/utils/containers/replicate.cc line 4 at r10 (raw file):

#include "test/utils/doctest/fmt/vector.h"
#include "utils/fmt/unordered_set.h"
#include "utils/fmt/vector.h"

Are these still necessary?

Code quote:

#include "utils/fmt/unordered_set.h"
#include "utils/fmt/vector.h"

lib/utils/test/src/utils/containers/scanl.cc line 3 at r10 (raw file):

#include "utils/containers/scanl.h"
#include "test/utils/doctest/fmt/vector.h"
#include "utils/fmt/vector.h"

Is this still necessary?

Code quote:

#include "utils/fmt/vector.h"

lib/utils/test/src/utils/containers/scanl.cc line 40 at r10 (raw file):

  }

  TEST_CASE("scanl1") {

What happens if the input to scanl1 is empty?

Code quote:

 TEST_CASE("scanl1") {

lib/pcg/include/pcg/operator_task_space.h line 1 at r10 (raw file):

#ifndef _FLEXFLOW_PCG_INCLUDE_operator_task_space_H

Capitalization?


lib/pcg/test/src/pcg/operator_task_space.cc line 9 at r10 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("operator_task_space functions") {
    SUBCASE("get_task_space_coordinates") {

What if the task space dim list is empty?


lib/pcg/test/src/pcg/operator_task_space.cc line 9 at r10 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("operator_task_space functions") {
    SUBCASE("get_task_space_coordinates") {

Suggestion:

  TEST_CASE("get_task_space_coordinates") {

lib/pcg/test/src/pcg/operator_task_space.cc line 11 at r10 (raw file):

    SUBCASE("get_task_space_coordinates") {

      SUBCASE("2D Task") {

Suggestion:

      SUBCASE("2D Task Space") {

lib/pcg/test/src/pcg/operator_task_space.cc line 19 at r10 (raw file):

             TaskSpaceCoordinate{{0, 1}},
             TaskSpaceCoordinate{{1, 0}},
             TaskSpaceCoordinate{{1, 1}}}};

Suggestion:

        std::unordered_set<TaskSpaceCoordinate> correct = {
          TaskSpaceCoordinate{{0, 0}},
          TaskSpaceCoordinate{{0, 1}},
          TaskSpaceCoordinate{{1, 0}},
          TaskSpaceCoordinate{{1, 1}}.
        };

lib/pcg/test/src/pcg/operator_task_space.cc line 32 at r10 (raw file):

             TaskSpaceCoordinate{{0, 0, 1}},
             TaskSpaceCoordinate{{0, 1, 0}},
             TaskSpaceCoordinate{{0, 1, 1}}}};

Suggestion:

        std::unordered_set<TaskSpaceCoordinate> correct = {
          TaskSpaceCoordinate{{0, 0, 0}},
          TaskSpaceCoordinate{{0, 0, 1}},
          TaskSpaceCoordinate{{0, 1, 0}},
          TaskSpaceCoordinate{{0, 1, 1}},
        };

lib/pcg/test/src/pcg/operator_task_space.cc line 38 at r10 (raw file):

      }
    }
    SUBCASE("get_task_space_maximum_coordinate") {

Suggestion:

    TEST_CASE("get_task_space_maximum_coordinate") {

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/compiler/test/src/allowed_machine_views.cc line 18 at r10 (raw file):

    SUBCASE("1 degree of parallelism") {

      MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};

Done.


lib/compiler/test/src/allowed_machine_views.cc line 24 at r10 (raw file):

          MachineView{{{stride_t{1}}},
                      {{MachineSpecificationDimension::INTRA_NODE}},
                      MachineSpaceCoordinate{0, 0, DeviceType::GPU}},

Done.


lib/compiler/test/src/allowed_machine_views.cc line 45 at r10 (raw file):

    SUBCASE("2 degrees of parallelism") {

      MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};

Done.


lib/compiler/test/src/allowed_machine_views.cc line 48 at r10 (raw file):

      OperatorTaskSpace task = OperatorTaskSpace{{2, 3}};

      auto make_2d_views = [&](int start_x,

Done.


lib/compiler/test/src/allowed_machine_views.cc line 49 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Aren't these in machine space? What does y mean in a machine space?

Machine space is the 2D space described by machine specification. (!= Task Space, which is N-dimensional, with N dimension of the parallel tensor). So we have a vector of strides, a vector of dimensions to which to map each dimension of the task, and then a 2D coordinate (start_x, start_y) which is the node and device indices.


lib/compiler/test/src/allowed_machine_views.cc line 63 at r10 (raw file):

      auto inter = MachineSpecificationDimension::INTER_NODE;
      std::unordered_set<MachineView> correct = {
          make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),

Done.


lib/local-execution/test/src/test_local_cost_estimator.cc line 11 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Got it, so we likely need to (at some point) add a primitive in op-attrs that lets us take an UnmappedCostEstimateKey and return the OperatorTaskSpace for the operator?

yes pretty much (we don't really need OperatorTaskSpace as long as we have a datatype from which we can get a vector of the parallel degrees, but obviously OperatorTaskSpace makes things more clean)


lib/pcg/test/src/pcg/machine_specification.cc line 11 at r10 (raw file):

  TEST_CASE("MachineSpecification") {

    MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};

Done.


lib/pcg/test/src/pcg/machine_specification.cc line 29 at r10 (raw file):

      SUBCASE("valid MachineSpaceCoordinate") {
        MachineSpaceCoordinate coord =
            MachineSpaceCoordinate{2, 12, DeviceType::CPU};

Done.


lib/pcg/test/src/pcg/machine_specification.cc line 35 at r10 (raw file):

        CHECK(correct == result);
      }
      SUBCASE("invalid MachineSpaceCoordinate") {

Done.


lib/pcg/test/src/pcg/machine_specification.cc line 37 at r10 (raw file):

      SUBCASE("invalid MachineSpaceCoordinate") {
        MachineSpaceCoordinate coord =
            MachineSpaceCoordinate{2, 18, DeviceType::CPU};

Done.


lib/pcg/test/src/pcg/machine_view.cc line 38 at r10 (raw file):

                                                     /*num_gpus_per_node*/ 6,
                                                     0,
                                                     0};

Done.


lib/pcg/test/src/pcg/machine_view.cc line 40 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Didn't we get rid of "fragment"?

Done.


lib/pcg/test/src/pcg/machine_view.cc line 49 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?

Actually looks like not as there's only one dim, so I'm not sure why Fragment 2 is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).

made a docstring for each case that explains it pretty extensively.


lib/pcg/test/src/pcg/machine_view.cc line 58 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What happens if the TaskSpaceCoordinate is out of bounds?

now returns an optional if out of bound.
(also added testing)


lib/pcg/test/src/pcg/machine_view.cc line 74 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Feels a bit weird having the start field last? 😂

lol fair enough, changed :-)


lib/pcg/test/src/pcg/machine_view.cc line 80 at r10 (raw file):

                                                     /*num_gpus_per_node*/ 5,
                                                     0,
                                                     0};

Done.


lib/pcg/test/src/pcg/machine_view.cc line 82 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See above about better testcase naming

Done.


lib/pcg/test/src/pcg/machine_view.cc line 85 at r10 (raw file):

        TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}};
        MachineSpaceCoordinate correct =
            MachineSpaceCoordinate{1, 2, DeviceType::GPU};

Done.


lib/pcg/test/src/pcg/machine_view.cc line 124 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Rather than having duplicate INTER_NODE/INTRA_NODE values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?

good point, added new test case


lib/pcg/test/src/pcg/machine_view.cc line 128 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Fix parameter naming comments

Done.


lib/pcg/test/src/pcg/machine_view.cc line 133 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Better testcase names (see above)

Done.


lib/utils/include/utils/containers/get_all_permutations_with_repetition.h line 16 at r10 (raw file):

 * @details
 *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition
 **/

Done.


lib/utils/include/utils/containers/get_all_permutations_with_repetition.h line 18 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing get_permutations tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason

Done.


lib/utils/test/src/utils/containers/cartesian_product.cc line 4 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Are these still necessary?

Done.


lib/utils/test/src/utils/containers/replicate.cc line 4 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Are these still necessary?

Done.


lib/utils/test/src/utils/containers/scanl.cc line 3 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is this still necessary?

Done.


lib/utils/test/src/utils/containers/scanl.cc line 40 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What happens if the input to scanl1 is empty?

returns an empty list (added test case)


lib/pcg/include/pcg/operator_task_space.h line 1 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Capitalization?

Done.


lib/pcg/test/src/pcg/operator_task_space.cc line 9 at r10 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("operator_task_space functions") {
    SUBCASE("get_task_space_coordinates") {

Done.


lib/pcg/test/src/pcg/operator_task_space.cc line 9 at r10 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What if the task space dim list is empty?

Returns a single empty coordinate (vacuously makes sense?). Not sure what a zero dimensional TaskSpaceOperator would represent, whether something with no parallelism or a straight up wrong data-type (so something with no parallelism would be a TaskSpaceOperator with 1 or more ones. Let me know what you think is best.


lib/pcg/test/src/pcg/operator_task_space.cc line 11 at r10 (raw file):

    SUBCASE("get_task_space_coordinates") {

      SUBCASE("2D Task") {

Done.


lib/pcg/test/src/pcg/operator_task_space.cc line 19 at r10 (raw file):

             TaskSpaceCoordinate{{0, 1}},
             TaskSpaceCoordinate{{1, 0}},
             TaskSpaceCoordinate{{1, 1}}}};

Done.


lib/pcg/test/src/pcg/operator_task_space.cc line 32 at r10 (raw file):

             TaskSpaceCoordinate{{0, 0, 1}},
             TaskSpaceCoordinate{{0, 1, 0}},
             TaskSpaceCoordinate{{0, 1, 1}}}};

Done.


lib/pcg/test/src/pcg/operator_task_space.cc line 38 at r10 (raw file):

      }
    }
    SUBCASE("get_task_space_maximum_coordinate") {

Done.

Copy link
Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 87 of 102 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)


lib/pcg/include/pcg/machine_view.struct.toml line 29 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe MachineDimensionStride? or StridedMachineDimension? or even something as basic as MachineDimensionWithStride? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names) .struct.toml datatype to represent the combination of the two

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants