
Avoid using aggregate types when lowering Hamiltonian #1507

Open
@erick-xanadu

Description


In the [Performance Tips for Frontend Authors](https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type) section of the LLVM documentation, the following suggestion is presented:

> Avoid creating values of aggregate types (i.e. structs and arrays). In particular, avoid loading and storing them, or manipulating them with insertvalue and extractvalue instructions. Instead, only load and store individual fields of the aggregate.
>
> There are some exceptions to this rule:
>
> - It is fine to use values of aggregate type in global variable initializers.
> - It is fine to return structs, if this is done to represent the return of multiple values in registers.
> - It is fine to work with structs returned by LLVM intrinsics, such as the with.overflow family of intrinsics.
> - It is fine to use aggregate types without creating values. For example, they are commonly used in getelementptr instructions or attributes like sret.
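As a minimal sketch of the two patterns in the MLIR LLVM dialect (using a hypothetical two-field struct, not Catalyst's actual memref descriptor):

```mlir
// Aggregate pattern (discouraged): build the struct value with
// insertvalue, then store it as a whole.
llvm.func @store_aggregate(%p: !llvm.ptr, %base: !llvm.ptr, %size: i64) {
  %0 = llvm.mlir.undef : !llvm.struct<(ptr, i64)>
  %1 = llvm.insertvalue %base, %0[0] : !llvm.struct<(ptr, i64)>
  %2 = llvm.insertvalue %size, %1[1] : !llvm.struct<(ptr, i64)>
  llvm.store %2, %p : !llvm.struct<(ptr, i64)>, !llvm.ptr
  llvm.return
}

// Field-wise pattern (suggested): compute each field's address with
// getelementptr and store into it; no aggregate value is ever created.
llvm.func @store_fields(%p: !llvm.ptr, %base: !llvm.ptr, %size: i64) {
  %0 = llvm.getelementptr %p[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64)>
  llvm.store %base, %0 : !llvm.ptr, !llvm.ptr
  %1 = llvm.getelementptr %p[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64)>
  llvm.store %size, %1 : i64, !llvm.ptr
  llvm.return
}
```

Note that the second function only uses the struct *type* (in the `getelementptr` instructions), which the LLVM documentation explicitly allows.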

The goal of this issue is to evaluate the performance impact of this suggestion. Specifically, when lowering Hamiltonians, we first create an aggregate value and then store it to an alloca'd pointer. We could instead avoid creating the aggregate value and store the individual fields through the alloca pointer. In other words,

```mlir
func.func @hamiltonian(%obs : !quantum.obs, %p1 : memref<1xf64>, %p2 : memref<3xf64>) {
    quantum.hamiltonian(%p1 : memref<1xf64>) %obs : !quantum.obs
    return
}
```

currently produces

```mlir
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    %2 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %3 = llvm.insertvalue %arg1, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %4 = llvm.insertvalue %arg2, %3[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %5 = llvm.insertvalue %arg3, %4[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %6 = llvm.insertvalue %arg4, %5[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %7 = llvm.insertvalue %arg5, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %8 = llvm.mlir.constant(1 : i64) : i64
    llvm.store %7, %1 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr
    %9 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %8, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
```

when we could produce:

```mlir
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    // Compute each field's offset with gep and store the corresponding
    // argument directly, instead of building an aggregate value first.
    %2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg1, %2 : !llvm.ptr, !llvm.ptr
    %3 = llvm.getelementptr %1[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg2, %3 : !llvm.ptr, !llvm.ptr
    %4 = llvm.getelementptr %1[0, 2] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg3, %4 : i64, !llvm.ptr
    %5 = llvm.getelementptr %1[0, 3, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg4, %5 : i64, !llvm.ptr
    %6 = llvm.getelementptr %1[0, 4, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg5, %6 : i64, !llvm.ptr
    %7 = llvm.mlir.constant(1 : i64) : i64
    %8 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %7, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
```

Completing this issue should address the following:

  1. Implement the suggested improvements to the hamiltonian function.
  2. Conduct a performance evaluation of the two versions of the hamiltonian function.
  3. Are the improvements suggested in [Performance Tips for Frontend Authors](https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type) worthwhile in this situation? Explain why or why not.
  4. On initial inspection it is not obvious that replacing the llvm.insertvalue instructions with llvm.store instructions would yield an improvement. If we find that it does in fact yield a speedup, why does it? For example, are there downstream optimizations that LLVM applies to one form but not the other?
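For item 4, one concrete thing to inspect is store-to-load forwarding: runtime code typically reads individual fields back out of the descriptor. A minimal sketch (hypothetical struct and function names; the corresponding LLVM IR after `mlir-translate` is what the LLVM passes actually see):

```mlir
// With a field-wise store, the later load reads through the exact same
// gep that was stored to, which is an easy target for store-to-load
// forwarding. With a whole-aggregate store, the optimizer must first
// split or reason through the aggregate store before it can forward the
// field. Whether this matters in practice depends on the pass pipeline
// and LLVM version, which is what the benchmark should determine.
llvm.func @read_back(%p: !llvm.ptr, %base: !llvm.ptr) -> !llvm.ptr {
  %0 = llvm.getelementptr %p[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64)>
  llvm.store %base, %0 : !llvm.ptr, !llvm.ptr
  %1 = llvm.load %0 : !llvm.ptr -> !llvm.ptr
  llvm.return %1 : !llvm.ptr
}
```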
