If your circuit is deep, circuit evolution or VQE with MPS/MPO will take a lot of time and memory because of the growth of entanglement, and it gets worse if you are using nonlocal gates. How many qubits and what circuit depth are you trying to run? As I said, beyond block sparse threading or dense tensor operation threading (described in the documentation linked above), there isn't much parallelization you can take advantage of for that kind of calculation, at least nothing easily accessible in ITensor right now. For a large number of qubits and large circuit depth, there are also no known algorithms that would ultimately circumvent the growth of entanglement, short of an error-corrected quantum computer.
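To give a rough sense of the scaling, here is a worst-case estimate. It is only a sketch under assumptions that are mine, not taken from your setup: a brickwork circuit of nearest-neighbor two-qubit gates on $n$ qubits with $D$ layers, starting from a product state, with no truncation. Each two-qubit gate that crosses a given bond can multiply the Schmidt rank across that bond by at most 4, and each bond is hit by at most $\lceil D/2 \rceil$ gates, so the bond dimension $\chi$ is bounded by

$$
\chi \le \min\left(4^{\lceil D/2 \rceil},\; 2^{\lfloor n/2 \rfloor}\right),
$$

while storing the MPS takes $O(n\,d\,\chi^2)$ memory (with $d = 2$ for qubits) and applying one two-site gate followed by an SVD costs $O(\chi^3)$ at fixed $d$. A nonlocal gate crosses every bond between the two qubits it acts on, so it can raise $\chi$ on all of those bonds at once. In practice a truncation cutoff often keeps $\chi$ well below this bound, but for generic deep circuits the bound tends to be approached.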
Could you raise that as an issue here: Issues · ITensor/ITensors.jl · GitHub with a minimal runnable example (for example, what are `ansatz_to_U`, `ansatz`, `t`, `psi_0`, etc.?). It's hard to help without code that we can run (please read Welcome to ITensor Discourse).
It looks like it may be a bug related to mixing complex and real tensors (which should work, but may be broken right now). As a workaround, you could try `apply(complex.(U), cu(complex(psi_0)); ...)`, which makes both the gates and the state complex before they are combined.