
[DC][Pipeline] Connecting together DC and pipeline #5477

Open
@mortbopet

Description

The need here is fairly similar to what's been discussed for #4613. However, I fear that placing the synchronization point between DC (Handshake, in that case) and pipelines at the ESI level (meaning that the pipeline has already been lowered) skips the step where we'd actually want to do some meaningful analysis of DC + pipeline interactions (think merging pipelines).

This also relates to the discussion at https://discourse.llvm.org/t/should-ssa-values-in-handshake-always-have-implicit-handshake-semantics/70321. DC takes care of the "implicit SSA handshake semantics" issue, but the handshake.unit proposal is orthogonal, and still a valid design point.

As an example, we want to connect the following pipeline to the surrounding DC logic:

hw.module @myPipeline(%arg0 : !dc.value<i32>) -> (out: !dc.value<i32>) {
  // (elided) DC glue: unwrap %arg0 into raw data %data and derive %go.
  ...
  %out, %done = pipeline.scheduled(%data) clock %clk reset %rst go %go : (i32) -> (i32) {
  ^bb0(%arg0_0: i32, %s0_valid : i1):
    %1 = comb.sub %arg0_0, %arg0_0 : i32
    pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
  ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
    %8 = comb.add %6, %7 : i32
    pipeline.return %8 : i32
  }
  // (elided) DC glue: pack %out with a token derived from %done.
  ...
}
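
For reference, a rough sketch of what that elided glue could look like, assuming dc.unpack/dc.pack roughly as they exist in the DC dialect (the token-to-%go conversion and backpressure/stall handling are hand-waved, and %clk/%rst are assumed to be module inputs):

// Split the DC value into its control token and raw data.
%inToken, %data = dc.unpack %arg0 : !dc.value<i32>
// (elided) turn %inToken into a %go signal, and %done back into an output
// token %outToken - this is exactly the dialect-crossing detail that the
// options below aim to hide.
%res, %done = pipeline.scheduled(%data) clock %clk reset %rst go %go : (i32) -> (i32) { ... }
// Re-associate the pipeline result with the output token.
%out = dc.pack %outToken, %res : i32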

Option 1: DC-interface'd pipeline

This is closer to the original notion of a "latency-insensitive pipeline" that the pipeline dialect was designed with. Back then, the design intention was: how can we have a pipeline abstraction where the body of the pipeline can serve as both a latency-insensitive and a latency-sensitive implementation internally, between each stage? The latter part, we found, wasn't really possible.
However, it's still perfectly valid to have the interface of the statically scheduled pipeline be latency insensitive - this essentially implies that all of the glue logic that was explicit - as in the example above - is now implicit, and implemented by a lowering.

The semantics here would imply unit-rate actor semantics between all of the inputs and outputs of the pipeline, plus accounting for the current state of the pipeline (i.e. II > 1 and stall signal assertions).

hw.module @myPipeline(%arg0 : !dc.value<i32>, %clk : i1, %rst : i1) -> (out: !dc.value<i32>) {
  %out = pipeline.scheduled.li(%arg0) clock %clk reset %rst : (!dc.value<i32>) -> (!dc.value<i32>) {
  ^bb0(%arg0_0: i32, %s0_valid : i1):
    %1 = comb.sub %arg0_0, %arg0_0 : i32
    pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
  ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
    %8 = comb.add %6, %7 : i32
    pipeline.return %8 : i32
  }
  hw.output %out : !dc.value<i32>
}

This obviously puts a larger strain on the lowering. However, what I like about this is the implications for IR analysis: it is trivial to identify latency-insensitive pipelines (since they're now a separate op). Furthermore, we also know that such a pipeline is internally always statically scheduled, with everything that comes with that (known latency, ...).

E.g., for merging two pipelines where one feeds into the other:

hw.module @myPipeline(%arg0 : !dc.value<i32>, %clk : i1, %rst : i1) -> (out: !dc.value<i32>) {
  %out = pipeline.scheduled.li(%arg0) clock %clk reset %rst : (!dc.value<i32>) -> (!dc.value<i32>) {
  ^bb0(%arg0_0: i32, %s0_valid : i1):
    %1 = comb.sub %arg0_0, %arg0_0 : i32
    pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
  ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
    %8 = comb.add %6, %7 : i32
    pipeline.return %8 : i32
  }

  %out2 = pipeline.scheduled.li(%out) clock %clk reset %rst : (!dc.value<i32>) -> (!dc.value<i32>) {
  ^bb0(%arg0_0: i32, %s0_valid : i1):
    %1 = comb.sub %arg0_0, %arg0_0 : i32
    pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
  ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
    %8 = comb.add %6, %7 : i32
    pipeline.return %8 : i32
  }

  hw.output %out2 : !dc.value<i32>
}

// Merges to

hw.module @myPipeline(%arg0 : !dc.value<i32>, %clk : i1, %rst : i1) -> (out: !dc.value<i32>) {
  %out = pipeline.scheduled.li(%arg0) clock %clk reset %rst : (!dc.value<i32>) -> (!dc.value<i32>) {
  ^bb0(%arg0_0: i32, %s0_valid : i1):
    %1 = comb.sub %arg0_0, %arg0_0 : i32
    pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
  ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
    %8 = comb.add %6, %7 : i32
    %9 = comb.sub %8, %8 : i32
    pipeline.stage ^bb2 regs(%9, %8 : i32, i32)
  ^bb2(%10: i32, %11 : i32, %s2_valid : i1):  // pred: ^bb1
    %12 = comb.add %10, %11 : i32
    pipeline.return %12 : i32
  }
  hw.output %out : !dc.value<i32>
}

Option 2: Generic DC "fixed-latency, unit-rate" operation

In practice, I'd assume most DC<->pipeline optimizations pertain to merging known-latency groups of operations, which isn't restricted to just pipeline-dialect operations. This is where the unit-rate actor proposal of https://discourse.llvm.org/t/should-ssa-values-in-handshake-always-have-implicit-handshake-semantics/70321 comes in: it is the inputs and outputs of the operation that have unit-rate actor semantics, allowing us to place essentially anything within the body of the unit-rate actor, so long as it returns an output valid signal.
Additionally, we would be able to tag such a dc.unit operation with information such as latency.

hw.module @myPipeline(%arg0 : !dc.value<i32>, %clk : i1, %rst : i1) -> (out: !dc.value<i32>) {
  %out = dc.unit(%arg0) : (!dc.value<i32>) -> (!dc.value<i32>) {latency = 1} {
    // The body of a unit-rate actor just has the "unwrapped" arguments and
    // the (joined) valid signal, and must return a mandatory "done"/output
    // validity signal alongside the outputs.
    ^bb0(%a0 : i32, %valid : i1):
      %res, %done = pipeline.scheduled(%a0) clock %clk reset %rst go %valid : (i32) -> (i32) {
      ^bb0(%arg0_0: i32, %s0_valid : i1):
        %1 = comb.sub %arg0_0, %arg0_0 : i32
        pipeline.stage ^bb1 regs(%1, %arg0_0 : i32, i32)
      ^bb1(%6: i32, %7: i32, %s1_valid : i1):  // pred: ^bb0
        %8 = comb.add %6, %7 : i32
        pipeline.return %8 : i32
      }
      return %res, %done : i32, i1
  }
  hw.output %out : !dc.value<i32>
}

While this is a very generic approach, I fear that it may make analysis and transformation a bit harder. Consider the case of merging pipelines: is it better to have two pipeline.scheduled.li operations abutting, or two dc.unit operations abutting? Given that the body of a dc.unit may be literally anything, we'd be able to place the two bodies next to each other, but we wouldn't be able to do a proper pipeline merge into a single pipeline.
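
To illustrate: the best a generic merge could do here is wrap both bodies in a single dc.unit and sum the latency attributes - the two pipelines would still sit back-to-back inside, in contrast to the single merged pipeline we'd get from option 1 (a sketch, reusing the hypothetical ops from above):

// Hypothetical result of merging two abutting dc.unit ops: one unit-rate
// actor with the summed latency, but still two separate internal pipelines.
%out = dc.unit(%arg0) : (!dc.value<i32>) -> (!dc.value<i32>) {latency = 2} {
  ^bb0(%a0 : i32, %valid : i1):
    %r0, %done0 = pipeline.scheduled(%a0) clock %clk reset %rst go %valid : (i32) -> (i32) { ... }
    %r1, %done1 = pipeline.scheduled(%r0) clock %clk reset %rst go %done0 : (i32) -> (i32) { ... }
    return %r1, %done1 : i32, i1
}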
