Integer Division Error when running `dmrg` on GPU

shoumikdc · April 2, 2023, 11:38pm

Hi!

I’ve been trying to work with ITensorGPU, but have been running into the following DivideError:

Stack Trace

ERROR: DivideError: integer division error
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/libcublas.jl:106 [inlined]
  [2] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/src/pool.jl:312 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/libcublas.jl:22 [inlined]
  [4] cublasDnrm2_v2(handle::Ptr{CUDA.CUBLAS.cublasContext}, n::Int64, x::CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, incx::Int64, result::Base.RefValue{Float64})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [5] nrm2
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/wrappers.jl:168 [inlined]
  [6] nrm2
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/wrappers.jl:173 [inlined]
  [7] norm
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/linalg.jl:108 [inlined]
  [8] norm
    @ ~/.julia/packages/CUDA/ZdCxS/lib/cublas/linalg.jl:107 [inlined]
  [9] norm(T::NDTensors.DenseTensor{Float64, 3, Tuple{Index{Int64}, Index{Int64}, Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}})
    @ ITensorGPU ~/.julia/packages/ITensorGPU/x16B1/src/tensor/cudense.jl:33
 [10] norm(T::ITensor)
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/itensor.jl:1757
 [11] initialize(iter::KrylovKit.LanczosIterator{ProjMPO, ITensor, KrylovKit.ModifiedGramSchmidt2}; verbosity::Int64)
    @ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/factorizations/lanczos.jl:170
 [12] eigsolve(A::ProjMPO, x₀::ITensor, howmany::Int64, which::Symbol, alg::KrylovKit.Lanczos{KrylovKit.ModifiedGramSchmidt2, Float64})
    @ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/lanczos.jl:11
 [13] #eigsolve#38
    @ ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/eigsolve.jl:202 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:322 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:321 [inlined]
 [17] macro expansion
    @ ./timing.jl:382 [inlined]
 [18] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:289
 [19] dmrg
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:211 [inlined]
 [20] #dmrg#1016
    @ ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:73 [inlined]
 [21] dmrg(H::MPO, psi0::MPS, sweeps::Sweeps)
    @ ITensors ~/.julia/packages/ITensors/4aoLl/src/mps/dmrg.jl:66
 [22] top-level scope
    @ REPL[18]:1
 [23] top-level scope
    @ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

Has anyone come across this “integer division error” before, or does anyone have tips for how to proceed with debugging? I’m not quite sure how to resolve this. I’ve tried uninstalling and reinstalling, but the issue persists.

The example I’m using is one from the ITensorGPU tests (specifically, test_dmrg.jl). I’ve copied the exact code I used below:

Code Example

using ITensors, ITensorGPU, Random

N = 32
sites = siteinds("S=1/2", N)
Random.seed!(432)
psi0 = randomCuMPS(sites)

# Example: Transverse-Field Ising
ampo = AutoMPO()
for j in 1:N
  j < N && add!(ampo, -1.0, "Sz", j, "Sz", j + 1)
  add!(ampo, -0.5, "Sx", j)
end
H = cuMPO(MPO(ampo, sites))

sweeps = Sweeps(5)
maxdim!(sweeps, 10, 20)
cutoff!(sweeps, 1E-12)
noise!(sweeps, 1E-10)
energy, psi = dmrg(H, psi0, sweeps; outputlevel=0)

Here’s the info about the versions of ITensor and ITensorGPU that I have installed, as well as some details about the Julia/CUDA installation:

Version Info

Edit: In light of PR 1107, let me also mention that I am using cuTENSOR version 1.0.1 here.

mtfishman · April 3, 2023, 1:34pm

Thanks for the report, we will try to reproduce the issue you are seeing.

mtfishman · April 3, 2023, 9:25pm

We are unable to reproduce the error you see.

We have merged Pull Request #1107 in ITensor/ITensors.jl ("[ITensorGPU][Bug] Fix Jenkins using older CUDA/cuTENSOR", by kmp5VT), and the fix should now be available if you upgrade to ITensorGPU v0.1.3. Could you please upgrade to that version, make sure you are using the latest versions of CUDA, CUDA.jl, cuTENSOR, and cuTENSOR.jl compatible with ITensorGPU v0.1.3, and see if you still see an issue? All of our GPU tests are passing, which includes that DMRG example in your first post.
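
For anyone following along, one way to confirm which package versions are actually resolved in the active Julia environment is sketched below. This is only an illustration using the standard `Pkg` library: the helper name `installed_versions` is hypothetical (it is not part of ITensors or ITensorGPU), and the list of package names is an assumption based on the packages mentioned in this thread.

```julia
using Pkg

# Hypothetical helper (not part of any ITensor package): look up the
# resolved versions of the named packages in the active environment.
function installed_versions(names)
    found = Dict{String,Any}()
    for (_, info) in Pkg.dependencies()
        if info.name in names
            # `version` can be `nothing` for some standard libraries.
            found[info.name] = info.version
        end
    end
    return found
end

# Package names assumed from this thread; packages that are not
# installed are simply absent from the result.
versions = installed_versions(["ITensors", "ITensorGPU", "CUDA", "cuTENSOR"])
for name in sort(collect(keys(versions)))
    println(name, " => ", versions[name])
end
```

Running this before and after `Pkg.update()` shows whether ITensorGPU actually moved to v0.1.3. On a machine with a working GPU setup, `CUDA.versioninfo()` from CUDA.jl additionally reports the CUDA toolkit and driver details.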
