Optimising MPS Time Evolution Example

I’ve been working on implementing time evolution of a transmon chain (Bose-Hubbard model), following this example: MPS Time Evolution. I got the time evolution to work, but it seems “slow”, or at least not as fast as I would hope. I used 4th-order Trotter gates, but other than that what I have done is very similar to the example. Profiling my code, I see that the bottleneck is the apply function, which applies the gates to evolve a single time step. Compared to a similar solver that uses a Krylov method for the time evolution, this one does a lot more allocations and uses a lot more memory. Do you have any tips for optimising this?

To clarify, my main question is: can anything be done in this MPS Time Evolution example to reduce allocations and memory usage when looping over the time steps?

Here are some other questions:

1. I built the 4th-order Trotter gates using the recursion relation without thinking about it too much. Is it possible that the gates can be simplified somehow? (There was a mention in another post that reversing the gates might not generalize to higher order…)
2. In this example the state is normalized after applying the gates, but the apply function also takes the keyword normalize. Is there any difference between these two?
3. In this example the in-place .= is not used to assign the applied state back to the state variable. Should it be?
4. When dealing with vectors I can initialise them using zeros or Vector{ComplexF64}(undef, n). Is there something similar for MPS?

Here is what my time-evolution loop looks like:

# Evolve mps0 in steps of dt up to total time t, storing the state after every step
function mpsevolve(mps0::MPS, gates::Vector{ITensor}, dt::Real, t::Real; kwargs...) # kwargs are forwarded to ITensors.apply
    out = [deepcopy(mps0)]
    for _ in dt:dt:t
        push!(out, apply(gates, out[end]; normalize = true, kwargs...))
    end
    return out
end

I understand that here I could calculate expectation values of the state and store those instead of storing the whole state, but I am more concerned with the possibly unnecessary memory usage of apply, which causes a lot of garbage collection.
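
For concreteness, by storing expectation values I mean something like the sketch below, where only one evolving state is kept; expect is the per-site expectation value function from ITensors, and “N” is just a placeholder observable:

using ITensors, ITensorMPS

# Store per-site <N> after every step instead of every intermediate MPS
function mpsevolve_expect(mps0::MPS, gates::Vector{ITensor}, dt::Real, t::Real; kwargs...)
    psi = deepcopy(mps0)
    out = [expect(psi, "N")]
    for _ in dt:dt:t
        psi = apply(gates, psi; normalize = true, kwargs...)
        push!(out, expect(psi, "N"))
    end
    return out
end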

Thank you for your time 🙂

Hi, thanks for your patience with a slow reply. It’s hard to be sure what is making your code slow, especially since we don’t know what speed you are comparing to or many other details about your setup. But I can make a few conjectures based on the information you provided.

The main thing to understand is that the apply function uses an automatic system: it looks at the next gate you want to apply and rearranges the sites of the MPS so that the sites acted on by that gate become neighbors of each other. So to get apply to perform well, you need to pass the gates in an order that reduces this “swapping” overhead as much as possible.

The ideal gates are ones whose sites are always neighbors in the ordering of the initial MPS, and which follow a 1D pattern: acting on sites (1,2), then (2,3), (3,4), (4,5), etc., and then, when you reach the end of the chain, reversing: (N-1,N), (N-2,N-1), (N-3,N-2), etc.

For a higher-order pattern, if you have some flexibility in choosing the order, try to make it respect the 1D MPS ordering as much as possible to reduce the swapping overhead.
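
For example, here is a minimal sketch of such an ordering; make_gate here is a hypothetical helper standing in for however you construct each two-site gate:

# Build gates in a forward-then-backward sweep over nearest-neighbor bonds,
# so apply never needs to swap MPS sites. make_gate(s, i, dt) is a
# hypothetical helper returning the two-site gate acting on sites (i, i+1).
function sweep_gates(make_gate, s, dt)
    N = length(s)
    forward = [make_gate(s, i, dt) for i in 1:(N - 1)]  # (1,2), (2,3), ..., (N-1,N)
    return vcat(forward, reverse(forward))              # then (N-1,N), ..., (2,3), (1,2)
end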

If you can share the gate order you are using, we can help you more.

Thanks for the reply! I am generating the gates from the Bose-Hubbard model, so each gate acts on two sites, and those sites are always neighbours. I think the ordering I have should be optimal. Here is the code for generating the gates:

# The expansion is of order 2k, i.e. 4th order with the default k = 2
function bosehubbardgates(indices::Vector{<:Index}, dt::Real; k::Integer = 2, w=1.0, U=1.0, J=1.0)
    k < 1 && throw(ArgumentError("Trotter order k < 1"))
    if k == 1
        L = length(indices)
        out = ITensor[]
        for i in 1:L-1
            s1 = indices[i]
            s2 = indices[i + 1]
            # matproduct is a small helper (not shown) that multiplies two operators on the same site
            UTermMat = matproduct(op("N", s1), (op("N", s1) - op("I", s1)))
            h = w * op("N", s1) * op("I", s2)
            h += -U/2 * UTermMat * op("I", s2)
            h += J * (op("adag", s1) * op("a", s2) + op("a", s1) * op("adag", s2))
            exph = exp(-im * dt / 2 * h)
            push!(out, exph)
        end
        s1 = indices[end]
        h = w * op("N", s1) - U/2 * matproduct(op("N", s1), (op("N", s1) - op("I", s1)))
        exph = exp(-im * dt / 2 * h)
        push!(out, exph)
        # Appending the reversed list makes the gate sequence symmetric (a 2nd-order step)
        append!(out, reverse(out))
        return out
    else
        s_k = 1 / (4 - 4^(1 / (2*k - 1)))
        U_1 = bosehubbardgates(indices, s_k * dt; k = k - 1, w, U, J)
        U_2 = bosehubbardgates(indices, (1 - 4 * s_k) * dt; k = k - 1, w, U, J)
        return vcat(U_1, U_1, U_2, U_1, U_1)
    end
end
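
For reference, the recursion in the else branch is the standard Suzuki construction, where the order-2k step is built from order-(2k-2) steps as

U_2k(dt) = U_{2k-2}(s_k dt)^2 · U_{2k-2}((1 - 4 s_k) dt) · U_{2k-2}(s_k dt)^2, with s_k = 1 / (4 - 4^(1/(2k-1))),

so each increase of k by one multiplies the number of gates by five.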

Even for higher-order Trotter, the gates should be ordered so that no swapping is required(?)

I am very curious to figure out whether my code is “slow”, or whether this performance is simply what is expected from MPS. Can you help me with that? How should we compare the speed, and what would you like to know? I have compared the performance to a Krylov-based solver, and I found that at around 13 qubits the MPS starts to outperform it.

I want to use MPS to study measurement-induced phase transitions, where a layer of unitary evolution (the gates) is followed by a layer of random measurements (on each site, with probability p, a measurement “gate” is placed). To study this I need to calculate many trajectories, which is why the slowness of MPS is a problem for me. Even though the time to calculate a single trajectory is reasonable for a large number of sites, I need so many trajectories that using MPS becomes cumbersome even for a small number of sites.

Please ask for whatever you need to compare the performance; it is important for me to figure out whether my code is as fast as it can be. Regarding the measurements, I have checked with the profiler that the bottleneck is in the apply of the time steps, not in the measurements. But is there something in how I apply the measurements that could make the apply in the time steps slow? Maybe something about orthogonalize! or normalize!?

Another question that came to mind: how relevant is conserving quantum numbers for MPS? I understand that in general conserving the boson number means the number of elements in the vectors and operators is reduced, but I have difficulty understanding how you could take advantage of this with MPS, since the total boson number is not a local property. There is the possibility to conserve quantum numbers with the ITensors package, but could you tell me a little about how it works under the hood?
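
(For reference, I mean the feature enabled like this; this is just my reading of the docs, so please correct me if it is not the right way:)

using ITensors, ITensorMPS

# Boson sites carrying a conserved particle-number quantum number ("Nb");
# with this, MPS and gate tensors become block-sparse, with blocks labeled by QN sectors
s = siteinds("Boson", 10; dim = 4, conserve_qns = true)

# The initial state must then have a well-defined total boson number,
# e.g. one boson on every site:
psi = MPS(s, "1")
@show flux(psi)  # the total quantum number of the state, e.g. QN("Nb", 10)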