Description
In the [Performance Tips for Frontend Authors](https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type) section of the LLVM documentation, the following suggestion is presented:

> Avoid creating values of aggregate types (i.e. structs and arrays). In particular, avoid loading and storing them, or manipulating them with insertvalue and extractvalue instructions. Instead, only load and store individual fields of the aggregate.
>
> There are some exceptions to this rule:
>
> - It is fine to use values of aggregate type in global variable initializers.
> - It is fine to return structs, if this is done to represent the return of multiple values in registers.
> - It is fine to work with structs returned by LLVM intrinsics, such as the with.overflow family of intrinsics.
> - It is fine to use aggregate types without creating values. For example, they are commonly used in getelementptr instructions or attributes like sret.
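As a small illustration of the guidance (a hypothetical LLVM IR fragment, not taken from the Catalyst codebase), the discouraged and preferred styles look like:

```llvm
; Discouraged: build an aggregate value with insertvalue, then store it in one shot.
%agg0 = insertvalue { ptr, i64 } undef, ptr %p, 0
%agg1 = insertvalue { ptr, i64 } %agg0, i64 %n, 1
store { ptr, i64 } %agg1, ptr %slot

; Preferred: address each field with getelementptr and store scalars directly.
%f0 = getelementptr inbounds { ptr, i64 }, ptr %slot, i32 0, i32 0
store ptr %p, ptr %f0
%f1 = getelementptr inbounds { ptr, i64 }, ptr %slot, i32 0, i32 1
store i64 %n, ptr %f1
```

Both fragments leave the same bytes at `%slot`; only the second avoids materializing an aggregate SSA value.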
The goal of this issue is to evaluate the performance impact of this suggestion. Specifically, when lowering Hamiltonians, we first create an aggregate value and then store that aggregate value to an `alloca` pointer. We could instead avoid creating the aggregate value and store the individual fields through the `alloca` pointer. In other words,
```mlir
func.func @hamiltonian(%obs : !quantum.obs, %p1 : memref<1xf64>, %p2 : memref<3xf64>) {
  quantum.hamiltonian(%p1 : memref<1xf64>) %obs : !quantum.obs
  return
}
```
currently produces:
```mlir
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    %2 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %3 = llvm.insertvalue %arg1, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %4 = llvm.insertvalue %arg2, %3[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %5 = llvm.insertvalue %arg3, %4[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %6 = llvm.insertvalue %arg4, %5[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %7 = llvm.insertvalue %arg5, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %8 = llvm.mlir.constant(1 : i64) : i64
    llvm.store %7, %1 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr
    %9 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %8, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
```
when we could produce:
```mlir
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    // compute field offsets with llvm.getelementptr
    // store the individual fields from the arguments through those pointers
    %2 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %0, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
```
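The elided `gep`/`store` sequence could look roughly like the following sketch (field indices assume the memref descriptor layout shown above; exact syntax may vary with the MLIR version):

```mlir
%2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
llvm.store %arg1, %2 : !llvm.ptr, !llvm.ptr
%3 = llvm.getelementptr %1[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
llvm.store %arg2, %3 : !llvm.ptr, !llvm.ptr
%4 = llvm.getelementptr %1[0, 2] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
llvm.store %arg3, %4 : i64, !llvm.ptr
%5 = llvm.getelementptr %1[0, 3, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
llvm.store %arg4, %5 : i64, !llvm.ptr
%6 = llvm.getelementptr %1[0, 4, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
llvm.store %arg5, %6 : i64, !llvm.ptr
```

Each `getelementptr` addresses one field of the descriptor struct in the `alloca`, so no aggregate SSA value is ever materialized.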
Completion of this issue should examine the following:

- Implement the suggested improvements to the `hamiltonian` lowering.
- Conduct a performance evaluation of the two versions of the `hamiltonian` function.
- Are the improvements suggested in [Performance Tips for Frontend Authors](https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type) worthwhile in this situation? Explain why or why not.
- On initial inspection it is not obvious that replacing the `llvm.insertvalue` instructions with `llvm.store` instructions would result in a positive improvement. If we find that it does in fact yield a speedup, why does it? For example, are there downstream optimizations that LLVM applies?