Introduce SuperOperatorMatrixForm and add matrix_form argument #707

Open
albertomercurio wants to merge 11 commits into qutip:main from albertomercurio:liouvillian-matrix-form

@albertomercurio albertomercurio commented May 4, 2026

Checklist

Thank you for contributing to QuantumToolbox.jl! Please make sure you have finished the following tasks before opening the PR.

  • Please read Contributing to Quantum Toolbox in Julia.
  • Any code changes were done in a way that does not break public API.
  • Appropriate tests were added and tested locally by running: make test.
  • Any code changes should be julia formatted by running: make format.
  • All documents (in docs/ folder) related to code changes were updated and able to build locally by running: make docs.
  • (If necessary) the CHANGELOG.md should be updated (regarding the code changes) and built by running: make changelog.

Request a review after you have completed all the tasks. If you have not finished them all, you can also open a Draft Pull Request to let others know about the ongoing work.

Description

This package currently represents superoperators in the vectorized density-matrix framework, where every superoperator is stored as a matrix acting on the vectorized density matrix. This can be suboptimal in several cases, especially as the system size increases.

Thus, I implemented support for the matrix-form representation here. In this framework, the density matrix remains a matrix (Operator()). The right and left-right actions are obtained through SciMLOperators.jl.
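
For orientation, the two representations can be compared on the standard Lindblad master equation. This is a sketch under stated assumptions (ħ = 1, column-stacking vectorization, collapse operators written as C_k), not a statement of how the package stores its operators internally:

$$
\frac{d\rho}{dt} = -i[H, \rho] + \sum_k \left( C_k \rho C_k^\dagger - \tfrac{1}{2}\left\{ C_k^\dagger C_k, \rho \right\} \right)
$$

$$
\mathrm{vec}(A \rho B) = \left( B^{T} \otimes A \right) \mathrm{vec}(\rho)
$$

In the vectorized form, every term of the first equation is turned into a Kronecker product through the second identity, so a Hilbert space of dimension D yields a D² × D² superoperator. In the matrix form, the factors A and B stay D × D and are applied to ρ from the left and from the right.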

The user just needs to set matrix_form = Val(true) to use this framework. For example:

  • mesolve(H, psi0, tlist, c_ops; matrix_form = Val(true))
  • liouvillian(H, c_ops; matrix_form = Val(true))
  • liouvillian_dressed_nonsecular(H, fields, T_list; matrix_form = Val(true))
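
The equivalence between the two forms can be checked with the standard library alone. The following is a minimal sketch, assuming column-stacking vectorization; the helper names lindblad_matrix_form and lindblad_vectorized are hypothetical and are not part of QuantumToolbox.jl or of this PR:

```julia
using LinearAlgebra

# Hypothetical helpers for illustration only; not the package's implementation.

# Lindblad right-hand side acting directly on the D x D matrix ρ.
function lindblad_matrix_form(H, c_ops, ρ)
    dρ = -1im * (H * ρ - ρ * H)
    for C in c_ops
        CdC = C' * C
        dρ += C * ρ * C' - (CdC * ρ + ρ * CdC) / 2
    end
    return dρ
end

# Same generator as a D^2 x D^2 matrix, using vec(A ρ B) = (transpose(B) ⊗ A) vec(ρ).
function lindblad_vectorized(H, c_ops)
    D = size(H, 1)
    Id = Matrix{ComplexF64}(I, D, D)
    L = -1im * (kron(Id, H) - kron(transpose(H), Id))
    for C in c_ops
        CdC = C' * C
        L += kron(conj(C), C) - (kron(Id, CdC) + kron(transpose(CdC), Id)) / 2
    end
    return L
end

# Single qubit with one decay channel.
H = ComplexF64[0.5 0; 0 -0.5]
C = ComplexF64[0 0; 1 0]
ρ = ComplexF64[0.7 0.2+0.1im; 0.2-0.1im 0.3]

L = lindblad_vectorized(H, (C,))
dρ_vec = reshape(L * vec(ρ), 2, 2)       # vectorized action, reshaped back
dρ_mat = lindblad_matrix_form(H, (C,), ρ) # matrix-form action
@assert dρ_vec ≈ dρ_mat
```

Both routes give the same derivative; the difference is that the vectorized route builds and stores a D² × D² object, while the matrix-form route only touches D × D matrices.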

There is currently an open PR in SciMLOperators.jl (SciML/SciMLOperators.jl#370) that improves the cache efficiency for such cases, reducing the memory usage even further. This PR cannot be merged before that one.

Transverse Field Ising Model Benchmarks

I now benchmark both the memory usage and the computational efficiency of mesolve. The benchmarks are performed on an NVIDIA 4090 GPU.

using LinearAlgebra
using QuantumToolbox
using QuantumToolbox: makeVal, getVal
using CUDA
using ProgressMeter
using Chairmarks
using CairoMakie

# %%

const Jx = 25
const hz = 50

const Δ = 0.1 # Detuning with respect to the drive
const U = -0.05 # Nonlinearity
const F = 2 # Amplitude of the drive
const nth = 0.2 # Thermal photons

const γ = 1 # Decay rate

function multisite_operator_gpu(args...; to_gpu::Val = Val(false))
    op = multisite_operator(args...)
    !getVal(to_gpu) && return op
    return CUSPARSE.CuSparseMatrixCSR(op)
end

function generate_system(N, ::Val{:ising}, to_gpu::Val)
    dims = ntuple(i -> 2, makeVal(N))
    Hz = hz * sum(i -> multisite_operator_gpu(dims, i => sigmaz(), to_gpu = to_gpu), 1:getVal(N))
    Hxx = Jx * sum(i -> multisite_operator_gpu(dims, i => sigmax(), i + 1 => sigmax(), to_gpu = to_gpu), 1:(getVal(N) - 1))
    H = Hz + Hxx

    # c_ops = [sqrt(γ) * local_op(sigmam(), i, N) for i in 1:getVal(N)]
    c_ops = ntuple(i -> sqrt(γ) * multisite_operator_gpu(dims, i => sigmam(), to_gpu = to_gpu), makeVal(N))

    # e_ops = [local_op(sigmaz(), getVal(N), N)]
    e_ops = ntuple(i -> multisite_operator_gpu(dims, i => sigmaz(), to_gpu = to_gpu), makeVal(N))

    return H, c_ops, e_ops
end

function initial_state(N, ::Val{:ising}, to_gpu::Val)
    state = tensor(ntuple(i -> basis(2, 0), makeVal(N))...)
    return getVal(to_gpu) ? cu(state) : state
end

function quantumtoolbox_mesolve(N, system_type::Val; matrix_form = Val(false), to_gpu::Val = Val(false))
    H, c_ops, e_ops = generate_system(N, system_type, to_gpu)

    tlist = range(0, 10, 100)
    ψ0 = initial_state(N, system_type, to_gpu)

    mesolve(H, ψ0, tlist[1:2], c_ops, e_ops = e_ops, progress_bar = Val(false), matrix_form = matrix_form) # Warm-up on the same code path that is benchmarked

    benchmark_result =
        @be mesolve($H, $ψ0, $tlist, $c_ops, e_ops = $e_ops, progress_bar = Val(false), matrix_form = $matrix_form).expect

    return sum(s -> s.time, benchmark_result.samples) / length(benchmark_result.samples)
end

function run_benchmarks(::Val{Nmax}, ::Val{model}; matrix_form = Val(false), to_gpu = Val(false)) where {Nmax, model}
    Nvals = ntuple(i -> Val(i), Val(Nmax))
    return @showprogress map(Nvals[2:end]) do N
        quantumtoolbox_mesolve(N, Val(model); matrix_form = matrix_form, to_gpu = to_gpu)
    end
end

function run_summarysize(::Val{Nmax}, ::Val{model}; matrix_form = Val(false)) where {Nmax, model}
    Nvals = ntuple(i -> Val(i), Val(Nmax))
    return map(Nvals[2:end]) do N
        H, c_ops, e_ops = generate_system(N, Val(model), Val(false))
        L = liouvillian(H, c_ops; matrix_form = matrix_form)
        ρ = rand_dm(ntuple(i -> 2, makeVal(N)))
        L_cached = getVal(matrix_form) ? cache_operator(L, ρ) : L
        # L_cached = L
        Base.summarysize(L_cached)
    end
end

# %%

Nmax = Val(12)

summarysize_vec = run_summarysize(Nmax, Val(:ising); matrix_form = Val(false))
summarysize_mat = run_summarysize(Nmax, Val(:ising); matrix_form = Val(true))

benchmarks_vec = run_benchmarks(Nmax, Val(:ising); matrix_form = Val(false), to_gpu = Val(true))
benchmarks_mat = run_benchmarks(Nmax, Val(:ising); matrix_form = Val(true), to_gpu = Val(true))

# %%

fig = Figure()
ax_memory = Axis(fig[1, 1], xlabel = "N", ylabel = "Memory (MB)", yscale = log10, xticks = 2:2:getVal(Nmax))
ax_time = Axis(fig[2, 1], xlabel = "N", ylabel = "Time (s)", yscale = log10, xticks = 2:2:getVal(Nmax))

scatterlines!(ax_memory, 2:getVal(Nmax), collect(summarysize_vec) ./ 1.0e6, label = "Vectorized")
scatterlines!(ax_memory, 2:getVal(Nmax), collect(summarysize_mat) ./ 1.0e6, label = "Matrix")
scatterlines!(ax_time, 2:getVal(Nmax), collect(benchmarks_vec), label = "Vectorized")
scatterlines!(ax_time, 2:getVal(Nmax), collect(benchmarks_mat), label = "Matrix")

axislegend(ax_memory; position = :lt)
axislegend(ax_time; position = :lt)

fig

[Figure: memory (MB) and time (s) versus N on log scales, for the "Vectorized" and "Matrix" forms]

Liouvillian Dressed Nonsecular Benchmarks

I then test liouvillian_dressed_nonsecular, which produces a Liouvillian that is known to be poorly sparse.

CPU case

using QuantumToolbox
using CUDA
using SciMLOperators
using Adapt
using Chairmarks

# %%

N = 9
ωc1 = 2
ωc2 = 1
ωq = 1
g = 0.6
γ1 = 0.01
γ2 = 0.01

a1 = tensor(destroy(N), qeye(N), qeye(2))
a2 = tensor(qeye(N), destroy(N), qeye(2))
σx = tensor(qeye(N), qeye(N), sigmax())
σz = tensor(qeye(N), qeye(N), sigmaz())

H = ωc1 * a1' * a1 + ωc2 * a2' * a2 + ωq / 2 * σz + g * ((a1 + a1') + (a2 + a2')) * (σx + σz)

fields = ((γ1 / ωc1) * (a1 + a1'), (γ2 / ωc2) * (a2 + a2'))
T_list = (0.0, 0.0)

L_gme_vec = liouvillian_dressed_nonsecular(H, fields, T_list)[3]
GC.gc(true)
L_gme_mat = liouvillian_dressed_nonsecular(H, fields, T_list; matrix_form = Val(true))[3]

ρ0 = ket2dm(fock(N * N * 2, 0; dims = (N, N, 2)))

L_gme_mat_cached = cache_operator(L_gme_mat, ρ0)

Base.summarysize(L_gme_vec) / Base.summarysize(L_gme_mat_cached)

# %%

tlist = range(0, 100, length = 100)
e_ops = (a1' * a1, )

@be mesolve($L_gme_vec, $ρ0, $tlist; progress_bar = Val(false), e_ops = $(e_ops))
@be mesolve($L_gme_mat, $ρ0, $tlist; progress_bar = Val(false), e_ops = $(e_ops), matrix_form = Val(true))

| | Vectorized | Matrix Form | Ratio |
|---|---|---|---|
| Memory Usage (MB) | 4083 | 5.99 | 681 |
| Simulation Time (ms) | 4400 | 480 | 9.1 |

GPU case

Adapt.adapt_structure(to, x::QuantumObject) = QuantumObject(Adapt.adapt_structure(to, x.data), x.type, x.dimensions)
Adapt.adapt_structure(to, x::QuantumObjectEvolution) = QuantumObjectEvolution(Adapt.adapt_structure(to, x.data), x.type, x.dimensions)
Adapt.adapt_structure(to, x::SciMLOperators.AddedOperator) = SciMLOperators.AddedOperator(Adapt.adapt_structure(to, x.ops))
Adapt.adapt_structure(to, x::SciMLOperators.MatrixOperator) = SciMLOperators.MatrixOperator(to(x.A))
Adapt.adapt_structure(to, x::QuantumToolbox.SpostSuperOperator) = QuantumToolbox.SpostSuperOperator(to(x.R))
Adapt.adapt_structure(to, x::QuantumToolbox.SprePostSuperOperator) = QuantumToolbox.SprePostSuperOperator(to(x.L), to(x.R))

L_gme_vec_gpu = CUSPARSE.CuSparseMatrixCSR(L_gme_vec)
L_gme_mat_gpu = adapt(CUSPARSE.CuSparseMatrixCSR, L_gme_mat)
ρ0_gpu = adapt(CuArray, ρ0)

# %%

e_ops_gpu = (CUSPARSE.CuSparseMatrixCSR(a1' * a1), )

@be mesolve($L_gme_vec_gpu, $ρ0_gpu, $tlist; progress_bar = Val(false), e_ops = $(e_ops_gpu))
@be mesolve($L_gme_mat_gpu, $ρ0_gpu, $tlist; progress_bar = Val(false), e_ops = $(e_ops_gpu), matrix_form = Val(true))

| | Vectorized | Matrix Form | Ratio |
|---|---|---|---|
| Simulation Time (ms) | 110 | 39 | 2.8 |

Related Issues

This PR fixes #617

function LinearAlgebra.mul!(v::AbstractMatrix, op::SprePostSuperOperator, u::AbstractMatrix, α::Number, β::Number)
    iscached(op) || throw(ArgumentError("The cache for the SprePostSuperOperator must be initialized before multiplication. Use `cache_operator` to initialize the cache."))
    mul!(op.cache, op.L, u) # cache = L * u
    mul!(v, op.cache, op.R, α, β) # v = α * (L * u * R) + β * v
Collaborator

This is potentially a dense * sparse product, which can hurt performance greatly.

Member Author

Don't we already have efficient methods for dense * sparse multiplication?
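
As a reference point for this exchange, the two-step product in the hunk above can be reproduced with the standard library. This is a minimal sketch; sprepost_mul! is a hypothetical stand-in and not the PR's SprePostSuperOperator method. It only shows which operand is sparse and which is dense in each call, and makes no claim about performance:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical stand-in for the cached left-right product; not the PR's type.
function sprepost_mul!(v, L, R, cache, u, α, β)
    mul!(cache, L, u)        # cache = L * u          (sparse * dense)
    mul!(v, cache, R, α, β)  # v = α * (cache * R) + β * v  (dense * sparse)
    return v
end

L = sparse(ComplexF64[0 1; 0 0])
R = sparse(ComplexF64[0 0; 1 0])
u = ComplexF64[0.7 0.2; 0.2 0.3]
v = ones(ComplexF64, 2, 2)
cache = similar(u)

# Reference result computed with dense matrices before v is overwritten.
expected = 2 * (Matrix(L) * u * Matrix(R)) + 3 * v

sprepost_mul!(v, L, R, cache, u, 2, 3)
@assert v ≈ expected
```

The second call is the dense * sparse product the review comment refers to, since the cache is dense while R stays sparse.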

@albertomercurio albertomercurio marked this pull request as ready for review May 6, 2026 15:04
Development

Successfully merging this pull request may close these issues.

Suboptimal and bad performance in mesolve
