I am trying to use a GPU to speed up a code for studying quantum master equations. In some parts of the code, I need to calculate expectation values with respect to a density-matrix state. I calculate those expectation values using the trace definition $\langle O \rangle = \mathrm{Tr}(\rho O)$.
I found that there is a general problem when I try to calculate traces on the GPU:
using ITensors
using CUDA
sites = siteinds("S=1/2",50)
O = MPO(sites, "Id")
tr(O) # Here I got: 1.1258999068426202e15
# Now, if we try the same on the GPU:
O = cu(O)
tr(O) # Here I got: ArgumentError: cannot take the CPU address of a CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}
For the time being, you can perform the trace manually by contracting the bra and ket site indices of each site with delta tensors. EDIT: You have to make sure the delta tensors are constructed on the GPU before contracting them with the MPO, though. The current code isn’t doing that, which is why it is failing.
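A rough sketch of that manual contraction is below, in case it helps. The helper name `manual_tr` and its `to_device` keyword are made up for illustration and are not part of ITensors. It also assumes that converting each delta tensor with `dense` before moving it to the GPU with `cu` is enough to keep the whole contraction on the device, which I have not checked against every ITensors/CUDA version.

```julia
using ITensors

# Hypothetical helper (not part of ITensors): trace of an MPO obtained by
# contracting the two site indices on every site with an identity tensor.
# `to_device` moves each identity tensor to wherever the MPO lives,
# e.g. `identity` for the CPU or `cu` for the GPU.
function manual_tr(O::MPO, sites; to_device=identity)
  # Assumption: `dense` turns the diagonal delta storage into an ordinary
  # dense tensor, so the device transfer acts on a plain array.
  id(j) = to_device(dense(delta(sites[j], sites[j]')))
  L = O[1] * id(1)      # closes site 1, leaving only the link to site 2
  for j in 2:length(O)
    L *= O[j] * id(j)   # close site j first, then absorb it into L
  end
  return scalar(L)      # every index is contracted, so L holds one number
end

sites = siteinds("S=1/2", 50)
O = MPO(sites, "Id")
manual_tr(O, sites)                       # CPU
# manual_tr(cu(O), sites; to_device=cu)   # GPU, after `using CUDA`
```

Contracting each site with its identity tensor before multiplying into `L` keeps every intermediate tensor at the size of the link indices, so the cost stays linear in the number of sites. If `scalar` complains on the GPU, bringing the final tensor back to the CPU first may be needed.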
The fix implemented by @kmp5 has some room for improvement in terms of performance. Let us know if you see any performance issues and we can try to optimize the implementation.