Integer Division Error when running `dmrg` on GPU

Hi!

I’ve been trying to work with ITensorGPU, but have been running into the following DivideError:

Stack Trace
ERROR: DivideError: integer division error
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/libcublas.jl:106 [inlined]
  [2] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/src/pool.jl:312 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/libcublas.jl:22 [inlined]
  [4] cublasDnrm2_v2(handle::Ptr{CUDA.CUBLAS.cublasContext}, n::Int64, x::CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, incx::Int64, result::Base.RefValue{Float64})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [5] nrm2
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/wrappers.jl:168 [inlined]
  [6] nrm2
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/wrappers.jl:173 [inlined]
  [7] norm
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/linalg.jl:108 [inlined]
  [8] norm
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/linalg.jl:107 [inlined]
  [9] norm(T::NDTensors.DenseTensor{Float64, 3, Tuple{Index{Int64}, Index{Int64}, Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}})
    @ ITensorGPU ~/.julia/packages/ITensorGPU/x16B1/src/tensor/cudense.jl:33
 [10] norm(T::ITensor)
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/itensor.jl:1757
 [11] initialize(iter::KrylovKit.LanczosIterator{ProjMPO, ITensor, KrylovKit.ModifiedGramSchmidt2}; verbosity::Int64)
    @ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/factorizations/lanczos.jl:170
 [12] eigsolve(A::ProjMPO, x₀::ITensor, howmany::Int64, which::Symbol, alg::KrylovKit.Lanczos{KrylovKit.ModifiedGramSchmidt2, Float64})
    @ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/lanczos.jl:11
 [13] #eigsolve#38
    @ ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/eigsolve.jl:202 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:322 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:321 [inlined]
 [17] macro expansion
    @ ./timing.jl:382 [inlined]
 [18] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:289
 [19] dmrg
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:211 [inlined]
 [20] #dmrg#1016
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:73 [inlined]
 [21] dmrg(H::MPO, psi0::MPS, sweeps::Sweeps)
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:66
 [22] top-level scope
    @ REPL[18]:1
 [23] top-level scope
    @ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

Has anyone come across this “integer division error” before, or does anyone have tips for how to proceed with debugging? I’ve tried uninstalling and reinstalling, but the issue persists.

The example I’m using is one from the ITensorGPU tests (specifically, test_dmrg.jl). I’ve copied the exact code I used below:

Code Example
using ITensors, ITensorGPU, Random

N = 32
sites = siteinds("S=1/2", N)
Random.seed!(432)
psi0 = randomCuMPS(sites)

# Example: Transverse-Field Ising
ampo = AutoMPO()
for j in 1:N
  j < N && add!(ampo, -1.0, "Sz", j, "Sz", j + 1)
  add!(ampo, -0.5, "Sx", j)
end
H = cuMPO(MPO(ampo, sites))

sweeps = Sweeps(5)
maxdim!(sweeps, 10, 20)
cutoff!(sweeps, 1E-12)
noise!(sweeps, 1E-10)
energy, psi = dmrg(H, psi0, sweeps; outputlevel=0)

Here’s the info about the versions of ITensor and ITensorGPU that I have installed, as well as some details about the Julia/CUDA installation:

Version Info

Edit: In light of PR 1107, I should also mention that I am using cuTENSOR version 1.0.1 here.

Thanks for the report. We will try to reproduce the issue you are seeing.

We are unable to reproduce the error you see.
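
One way to narrow this down on your side: the trace bottoms out in cuBLAS `nrm2`, reached through `norm` on a `CuArray` (frames [4] to [9]), so it may be worth checking whether `norm` fails outside of DMRG as well. Below is a minimal sketch. `norm_matches_reference` is a hypothetical helper name, not something from ITensorGPU or CUDA.jl, and the sketch assumes that a plain `sum(abs2, x)` reduction does not go through the same cuBLAS call.

```julia
using LinearAlgebra

# Hypothetical helper (not part of ITensorGPU or CUDA.jl): compare `norm(x)`
# against a reference value computed with a plain reduction.
function norm_matches_reference(x::AbstractVector; rtol=1e-10)
    reference = sqrt(sum(abs2, x))   # generic reduction, no explicit BLAS call
    try
        return isapprox(norm(x), reference; rtol=rtol)
    catch err
        # Report the failure (e.g. a DivideError) instead of aborting.
        @warn "norm(x) threw an exception" exception = err
        return false
    end
end

# CPU sanity check; this part runs without a GPU.
println(norm_matches_reference(rand(Float64, 16)))
```

On the GPU machine, assuming CUDA.jl loads, calling `norm_matches_reference(CUDA.rand(Float64, 16))` after `using CUDA` should exercise the same `norm` path. If that warns with a `DivideError`, the problem is more likely in the CUDA/cuBLAS setup than in the DMRG code itself.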

We have merged Pull Request #1107 in ITensor/ITensors.jl ("[ITensorGPU][Bug] Fix Jenkins using older CUDA/cuTENSOR" by kmp5VT), and the fix should now be available if you upgrade to ITensorGPU v0.1.3. Could you please upgrade to that version, make sure you are using the latest versions of CUDA, CUDA.jl, cuTENSOR, and cuTENSOR.jl that are compatible with ITensorGPU v0.1.3, and check whether the issue persists? All of our GPU tests are passing, including the DMRG example from your first post.
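
To confirm which versions actually ended up in your environment after upgrading, something like the following sketch could help. It relies on `Pkg.dependencies()` from Julia's standard `Pkg` library. `installed_version` is a hypothetical helper name introduced here for illustration only.

```julia
using Pkg

# Hypothetical helper: look up the version of a package in the active
# environment. Returns `nothing` if the package is not installed.
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# Print the versions of the packages relevant to this thread.
for pkg in ("ITensorGPU", "ITensors", "CUDA")
    println(pkg, " => ", installed_version(pkg))
end
```

After running `Pkg.update()` (or `Pkg.add(name="ITensorGPU", version="0.1.3")` to request that version explicitly), `installed_version("ITensorGPU")` should report at least `v"0.1.3"`. `CUDA.versioninfo()` additionally reports the CUDA toolkit and driver versions in use.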