ITensor on GPU and conserved quantum numbers

Last year, a question was asked about conserved quantum numbers and calculation on GPU: Error with conserved quantum numbers and ITensorGPU

The answer at the time was that this feature was not yet implemented.

I am now running ITensors on the GPU (I’m not using ITensorGPU, but the newly implemented GPU backend for ITensors) and as soon as I include a conserved quantum number, the code does not work anymore. Is this feature still not implemented for GPU, or might the mistake come from something else?

If this is still not implemented, is there some way around this (while still running on GPU)? I have a Hamiltonian that conserves the electron number parity and I want to calculate the ground state energy in the case of even parity and the ground state energy in the case of odd parity.

Thanks for asking. The recently added GPU support was mainly for dense tensors, but at least some quantum number conserving tensor operations do already work with it. It would be helpful to know what bug you are running into. Could you please post the error message you got?

We are still testing that out but it should be available soon.

However, even once it is technically working, it may be nontrivial to make DMRG with conserved quantities fast on GPUs because of the block sparse structure; speedups will depend on the particulars of the block sparsity. So it may take longer to thoroughly investigate the performance and figure out best practices, where it works well, and so on.
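As a rough illustration of why block sparsity changes the GPU performance picture, here is a toy FLOP count (the numbers are purely illustrative, not ITensor benchmarks): splitting one large contraction into many small block contractions reduces the total arithmetic, but each small kernel also utilizes the GPU less efficiently.

```julia
# Toy model: one dense d×d matrix multiply vs. k independent
# (d÷k)×(d÷k) block multiplies along a block-diagonal structure.
dense_flops(d) = 2 * d^3
block_flops(d, k) = k * 2 * (d ÷ k)^3

d, k = 1024, 16
dense_flops(d)     # all the work in one large, GPU-friendly kernel
block_flops(d, k)  # far fewer flops, but spread over 16 small kernels
```

Whether the reduced arithmetic or the per-kernel overhead dominates depends on the block structure, which is why the performance question needs separate investigation.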

@kmp5 could you give a summary of the status of using conserved quantities on GPU?

@melina could you share a minimal code you ran that isn’t working so we can be sure to fix it in future work?

Thank you for your replies!

Here is a minimal example:

using ITensors
using CUDA

let
    N_dots = 10

    sites = siteinds("Electron",N_dots; conserve_nfparity=true)
    #sites = siteinds("Electron",N_dots)

    state_even = ["Emp" for n=1:N_dots]

    psi0_even = cu(MPS(sites,state_even))

    nsweeps = 10
    maxdim = 20
    cutoff = 1E-5
    outputlevel=1

    os = OpSum()
    for j=1:N_dots
        os += 1.0,"Nup",j
        os += 1.0,"Ndn",j
    end
    for j=1:N_dots-1
        os += 1.0,"Cdagup",j,"Cup",j+1
        os += 1.0,"Cdagup",j+1,"Cup",j
        os += 1.0,"Cdagdn",j,"Cdn",j+1
        os += 1.0,"Cdagdn",j+1,"Cdn",j
    end

    H = cu(MPO(os,sites))
        
    energy, psi = dmrg(H, psi0_even; nsweeps, maxdim, cutoff, outputlevel)
end

If the line sites = siteinds("Electron",N_dots) is the active one, the code runs without a problem. If the line sites = siteinds("Electron",N_dots; conserve_nfparity=true) is the active one, the error message is:

ERROR: LoadError: Setting the type parameter of the type DenseVector{Float32} at position NDTensors.SetParameters.Position{1}() to Float32 is not currently defined. Either that type parameter position doesn’t exist in the type, or set_parameter has not been overloaded for this type.

Hi all! Thank you @melina for documenting this problem. Right now the block sparse code primarily has an issue with its constructors and `similar` functions. I believe that all or most of the actual algebra works properly, as in these examples:

  N = 3
  sites = siteinds("S=1/2", N; conserve_qns=true)
  state = [isodd(n) ? "Up" : "Dn" for n in 1:N]
  # Initialize wavefunction to a random MPS
  # of bond dimension 20 with same quantum
  # numbers as `state`
  psi0 = randomMPS(sites, state, 20)
  c = NDTensors.cu(psi0)
  T1 = c[1]
  T2 = c[2]
  NDTensors.cpu(T1 * T2) == NDTensors.cpu(T1) * NDTensors.cpu(T2) # true 
  cU, cS, cV = svd(T1, ind(T1, 1));
  U, S, V = svd(NDTensors.cpu(T1), ind(T1, 1));
  data(cU) == data(U) # true 
  data(cS) == data(S) # true 
  data(cV) == data(V) # true 

  cQ, cR = qr(T1, ind(T1, 1))
  Q, R = qr(NDTensors.cpu(T1), ind(T1, 1))
  data(cQ) == data(Q) # true
  data(cR) == data(R) # true

The problem is that if you look at what is returned from these operations, you find the types are wrong:

typeof(data(cU)) = Vector{Float64}

Obviously we expect a CuVector in this case, not a CPU Vector. I think all we need to do to get the block sparse code "working" is to find and fix the places where BlockSparse constructors are called and make sure they are consistent in their data types. I cannot speak to how well the code will perform, but that should fix the runtime errors.
Much of this issue should be addressed in my UnallocatedZeros PR, where I have started modifying and streamlining the block sparse constructors.
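To illustrate the kind of constructor fix being described, here is a generic toy example (not the actual NDTensors code): allocating output storage with a hard-coded Vector type drops the input's array type, while using Base's `similar` preserves it, so a CuVector input would stay a CuVector.

```julia
# Toy illustration (not NDTensors internals): two ways a constructor
# might allocate output storage for some input data.
make_output_hardcoded(data::AbstractVector) = Vector{eltype(data)}(undef, length(data))
make_output_generic(data::AbstractVector)   = similar(data)

v = rand(Float32, 4)
typeof(make_output_hardcoded(v))            # always Vector{Float32}
typeof(make_output_generic(v)) == typeof(v) # true; also holds for GPU arrays
```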

By the way, for general users, accessing internal ITensor data with the data function is not recommended, since that is considered an internal implementation detail.

For comparisons like that you can do:

ITensors.cpu(cU) ≈ U

Also, ≈ is safer than == for comparing the results of floating point operations, since they can differ up to rounding on CPU vs. GPU.
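As a standalone illustration of why == is too strict for floating point results (plain Julia, independent of ITensor):

```julia
a = 0.1 + 0.2   # not exactly 0.3 in Float64 arithmetic
b = 0.3

a == b  # false: the binary representations differ in the last bits
a ≈ b   # true: isapprox allows a small relative tolerance
```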

EDIT: I just noticed that the indices output by the repeated calls to qr and svd wouldn't be the same, so you would need to do:

using ITensors: cpu

cU, cS, cV = svd(T1, sites[1]);
cu = commonind(cU, cS)
cv = commonind(cV, cS)
U, S, V = svd(cpu(T1), sites[1]);
u = commonind(U, S)
v = commonind(V, S)
replaceinds(cpu(cU), cu => u) ≈ U # true 
replaceinds(cpu(cS), (cu, cv) => (u, v)) ≈ S # true 
replaceinds(cpu(cV), cv => v) ≈ V # true 

cQ, cR = qr(T1, sites[1])
cq = commonind(cQ, cR)
Q, R = qr(cpu(T1), sites[1])
q = commonind(Q, R)
replaceinds(cpu(cQ), cq => q) ≈ Q
replaceinds(cpu(cR), cq => q) ≈ R