Error with conserved quantum numbers and ITensorGPU

Hello,

I am working on adding GPU acceleration to my DMRG studies and am running into an error when using the ITensorGPU package.

With conserve_qns = false, DMRG runs fine both with and without GPU acceleration. With conserve_qns = true, DMRG still runs fine on the CPU, but with GPU acceleration it fails with a DimensionMismatch error.
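
For reference, a stripped-down version of what I am running looks roughly like the code below (my actual model and parameters differ; the heisenberg_mpo helper and the specific values here are just for illustration):

using ITensors
using ITensorGPU

# Helper just for this example: build a spin-1/2 Heisenberg chain MPO.
function heisenberg_mpo(sites)
  os = OpSum()
  for j in 1:(length(sites) - 1)
    os += 0.5, "S+", j, "S-", j + 1
    os += 0.5, "S-", j, "S+", j + 1
    os += "Sz", j, "Sz", j + 1
  end
  return MPO(os, sites)
end

N = 20
sites = siteinds("S=1/2", N; conserve_qns=true)  # DMRG on GPU works when this is false

H = cu(heisenberg_mpo(sites))  # Hamiltonian MPO moved to the GPU
psi0 = cu(randomMPS(sites, [isodd(n) ? "Up" : "Dn" for n in 1:N]; linkdims=10))  # initial MPS on the GPU

sweeps = Sweeps(5)
setmaxdim!(sweeps, 10, 20, 100)
setcutoff!(sweeps, 1e-10)

energy, psi = dmrg(H, psi0, sweeps)  # errors with conserve_qns = true on the GPU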

Here is the error I am seeing:

ERROR: LoadError: DimensionMismatch("new dimensions (5, 3, 3) must be consistent with array size (9,)")
Stacktrace:
  [1] reshape(a::CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, dims::Tuple{Int64, Int64, Int64})
    @ CUDA ~/.julia/packages/CUDA/5jdFl/src/array.jl:665
  [2] reshape
    @ ./reshapedarray.jl:116 [inlined]
  [3] permute!(B::NDTensors.DenseTensor{Float64, 3, Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, A::NDTensors.DenseTensor{Float64, 3, Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}})
    @ ITensorGPU ~/.julia/packages/ITensorGPU/QvjZs/src/tensor/cudense.jl:532
  [4] #1
    @ ~/.julia/packages/ITensorGPU/QvjZs/src/tensor/cudense.jl:61 [inlined]
  [5] permutedims!!
    @ ~/.julia/packages/ITensorGPU/QvjZs/src/tensor/cudense.jl:65 [inlined]
  [6] permutedims!!
    @ ~/.julia/packages/ITensorGPU/QvjZs/src/tensor/cudense.jl:63 [inlined]
  [7] permutedims
    @ ~/.julia/packages/NDTensors/c2BpJ/src/dense.jl:438 [inlined]
  [8] permutedims
    @ ~/.julia/packages/ITensors/OjQuG/src/itensor.jl:1786 [inlined]
  [9] _permute(as::NDTensors.NeverAlias, T::NDTensors.DenseTensor{Float64, 3, Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, new_inds::Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}})
    @ ITensors ~/.julia/packages/ITensors/OjQuG/src/itensor.jl:1791
 [10] permute(as::NDTensors.NeverAlias, T::ITensor, new_inds::Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}})
    @ ITensors ~/.julia/packages/ITensors/OjQuG/src/itensor.jl:1795
 [11] #permute#223
    @ ~/.julia/packages/ITensors/OjQuG/src/itensor.jl:1776 [inlined]
 [12] permute(T::ITensor, new_inds::Tuple{Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}, Index{Vector{Pair{QN, Int64}}}})
    @ ITensors ~/.julia/packages/ITensors/OjQuG/src/itensor.jl:1761
 [13] permute(M::MPO, #unused#::Tuple{typeof(linkind), typeof(siteinds), typeof(linkind)})
    @ ITensors ~/.julia/packages/ITensors/OjQuG/src/mps/dmrg.jl:14
 [14] #dmrg#990
    @ ~/.julia/packages/ITensors/OjQuG/src/mps/dmrg.jl:45 [inlined]
 [15] dmrg
    @ ~/.julia/packages/ITensors/OjQuG/src/mps/dmrg.jl:41 [inlined]

Is this a bug in ITensorGPU, or am I possibly missing something in my code?

Yes, unfortunately our GPU backend doesn’t work for QN-conserving tensors right now. It will be a bit of a project to get it working and we haven’t had the time yet, but hopefully we will support it soon!

Hi @mtfishman

Thank you for getting back to me. That’s a bummer! It would be great to add that to the ITensorGPU documentation so that others do not run into this.

Any idea when QN-conserving tensors might be added to the GPU backend? I am potentially interested in helping with the effort.

Thank you!
Roman

I think it wouldn’t be too hard to get some version working. One strategy would be to work through the primitive operations that DMRG relies on, like tensor contraction and SVD: try running them on QN ITensors allocated on the GPU and see where they break. So, for example, run the code:

using ITensors
using ITensorGPU

i = Index([QN(0) => 2, QN(1) => 2])  # QN-conserving index with two blocks of size 2
j = sim(i)  # same QN block structure, new index IDs
k = sim(i)
A = cu(randomITensor(i, dag(j)))  # random QN ITensors moved to GPU with `cu`
B = cu(randomITensor(j, dag(k)))
A * B  # block sparse tensor contraction
svd(A, i)  # block sparse SVD

and work through the errors that come up.

I think the bigger challenge will be making it efficient, since there aren’t libraries for performing block sparse tensor contractions on GPU. For the CPU code we basically just loop over the QN blocks and perform dense contractions of blocks (with optional multithreading over block contractions that can be done in parallel), but for many small blocks I think this will be inefficient on GPU if done naively.
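
To make the block-wise strategy concrete, here is a toy, self-contained sketch (plain Julia Dicts of dense matrices standing in for block sparse tensors, not the actual NDTensors data structures):

# Toy model of a block sparse matrix: a dictionary mapping a QN sector to a dense block.
blocksA = Dict(0 => rand(2, 2), 1 => rand(3, 3))
blocksB = Dict(0 => rand(2, 2), 1 => rand(3, 3))

# The "contraction" is just a loop over sectors with matching QNs,
# doing an ordinary dense matrix product for each pair of blocks.
blocksC = Dict{Int,Matrix{Float64}}()
for (q, a) in blocksA
  haskey(blocksB, q) || continue
  blocksC[q] = a * blocksB[q]  # on GPU, each block product would be its own dense library call
end

Swapping the dense blocks for GPU arrays turns each block product into a separate GPU call, which is where the naive approach loses efficiency when there are many small blocks.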