cuTENSOR not working with DMRG

Hello!

I have been trying out the GPU functionality, but I can’t get DMRG to run when using cuTENSOR. Based on what I read on the “Running on GPUs” page of the ITensors.jl documentation, it should be possible, right? Everything works fine when I only load CUDA in my script, and I can’t find anyone else who has asked about cuTENSOR not working with DMRG, so I’m unsure what I could be doing wrong.

Here is a small code example I put together to reproduce the issue:

Code
using ITensors, ITensorMPS
using Adapt: adapt
using CUDA
using cuTENSOR # If I comment this out, everything works fine

struct Trivial end
struct U1 end

function heisenberg_xxx_1d(
    site_count::Integer,
    symmetry::Union{Type{Trivial}, Type{U1}};
    J::Real = -1.0,
    spin::Union{Integer, Rational} = 1,
)
    # https://docs.itensor.org/ITensorMPS/stable/IncludedSiteTypes.html#%22S1%22-Operators
    site_type(::Val{1}) = "S=1"
    site_type(::Val{1//2}) = "S=1/2"

    conserve(::Type{Trivial}) = (;)
    conserve(::Type{U1}) = (conserve_sz = true,)

    sites = siteinds(site_type(Val(spin)), site_count; conserve(symmetry)...)

    os = OpSum()
    lattice = square_lattice(site_count, 1; yperiodic = false)
    for b in lattice
        os -= J / 2, "S+", b.s1, "S-", b.s2
        os -= J / 2, "S-", b.s1, "S+", b.s2
        os -= J, "Sz", b.s1, "Sz", b.s2
    end

    H = MPO(os, sites)

    return H, sites
end

function main()
    @show CUDA.versioninfo()
    if isdefined(Main, :cuTENSOR)
        @show cuTENSOR.has_cutensor()
        @show cuTENSOR._initialized
    end

    site_count = 10
    symmetry = Trivial
    # symmetry = U1
    sweep_count = 4

    H, sites = heisenberg_xxx_1d(site_count, symmetry)
    states = [iseven(n) ? "Dn" : "Up" for n in 1:site_count]
    psi0 = random_mps(sites, states; linkdims = 10)

    # Use adapt to avoid conversion to Float32
    H = adapt(CuArray, H)
    psi0 = adapt(CuArray, psi0)
    @show typeof(storage(H[1]))
    @show typeof(storage(psi0[1]))

    sweeps = Sweeps(sweep_count)
    maxdim!(sweeps, 100)
    cutoff!(sweeps, 1e-12)

    energy, psi = dmrg(H, psi0, sweeps)

    return nothing
end

main()

Here is the output if I run the code:

Output

julia> include("ITensorMPS-gpu-test.jl")
CUDA runtime 12.9, artifact installation
CUDA driver 12.6
NVIDIA driver 560.94.0

CUDA libraries:

  • CUBLAS: 12.9.1
  • CURAND: 10.3.10
  • CUFFT: 11.4.1
  • CUSOLVER: 11.7.5
  • CUSPARSE: 12.5.10
  • CUPTI: 2025.2.1 (API 28.0.0)
  • NVML: 12.0.0+560.35.2

Julia packages:

  • CUDA: 5.8.2
  • CUDA_Driver_jll: 0.13.1+0
  • CUDA_Runtime_jll: 0.17.1+0

Toolchain:

  • Julia: 1.11.6
  • LLVM: 16.0.6

1 device:
0: NVIDIA GeForce GTX 1050 Ti (sm_61, 2.751 GiB / 4.000 GiB available)
CUDA.versioninfo() = nothing
cuTENSOR.has_cutensor() = true
cuTENSOR._initialized = Base.RefValue{Bool}(true)
typeof(storage(H[1])) = NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}
typeof(storage(psi0[1])) = NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}
ERROR: LoadError: CUTENSORError: operation not supported (yet) (code 15, CUTENSOR_STATUS_NOT_SUPPORTED)
Stacktrace:
[1] throw_api_error(res::cuTENSOR.cutensorStatus_t)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/libcutensor.jl:14
[2] check
@ ~/.julia/packages/cuTENSOR/bRLMl/src/libcutensor.jl:25 [inlined]
[3] cutensorCreatePlan
@ ~/.julia/packages/GPUToolbox/cZlg7/src/ccalls.jl:33 [inlined]
[4] cuTENSOR.CuTensorPlan(desc::Ptr{cuTENSOR.cutensorOperationDescriptor}, pref::Ptr{cuTENSOR.cutensorPlanPreference}; workspacePref::cuTENSOR.cutensorWorksizePreference_t)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:160
[5] CuTensorPlan
@ ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:149 [inlined]
[6] plan_contraction(A::AbstractArray, Ainds::Vector{…}, opA::cuTENSOR.cutensorOperator_t, B::AbstractArray, Binds::Vector{…}, opB::cuTENSOR.cutensorOperator_t, C::AbstractArray, Cinds::Vector{…}, opC::cuTENSOR.cutensorOperator_t, opOut::cuTENSOR.cutensorOperator_t; jit::cuTENSOR.cutensorJitMode_t, workspace::cuTENSOR.cutensorWorksizePreference_t, algo::cuTENSOR.cutensorAlgo_t, compute_type::Nothing)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:343
[7] plan_contraction
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:301 [inlined]
[8] #contract!#87
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:272 [inlined]
[9] contract!
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:259 [inlined]
[10] mul!
@ ~/.julia/packages/cuTENSOR/bRLMl/src/interfaces.jl:57 [inlined]
[11] contract!(exposedR::NDTensors.Expose.Exposed{…}, labelsR::Tuple{…}, exposedT1::NDTensors.Expose.Exposed{…}, labelsT1::Tuple{…}, exposedT2::NDTensors.Expose.Exposed{…}, labelsT2::Tuple{…}, α::Bool, β::Bool)
@ NDTensorscuTENSORExt ~/.julia/packages/NDTensors/Lb78J/ext/NDTensorscuTENSORExt/contract.jl:45
[12] contract!
@ ~/.julia/packages/NDTensors/Lb78J/ext/NDTensorscuTENSORExt/contract.jl:25 [inlined]
[13] _contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:143 [inlined]
[14] _contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:131 [inlined]
[15] contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:219 [inlined]
[16] contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:188 [inlined]
[17] contract(tensor1::NDTensors.DenseTensor{…}, labelstensor1::Tuple{…}, tensor2::NDTensors.DenseTensor{…}, labelstensor2::Tuple{…}, labelsoutput_tensor::Tuple{…})
@ NDTensors ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:113
[18] contract(::Type{…}, tensor1::NDTensors.DenseTensor{…}, labels_tensor1::Tuple{…}, tensor2::NDTensors.DenseTensor{…}, labels_tensor2::Tuple{…})
@ NDTensors ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:91
[19] contract
@ ~/.julia/packages/SimpleTraits/7VJph/src/SimpleTraits.jl:332 [inlined]
[20] _contract(A::NDTensors.DenseTensor{Float64, 3, Tuple{…}, NDTensors.Dense{…}}, B::NDTensors.DenseTensor{Float64, 2, Tuple{…}, NDTensors.Dense{…}})
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:3
[21] _contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:9
[22] contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:74
[23] *(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:61
[24] orthogonalize!(M::MPS, j::Int64; maxdim::Nothing, normalize::Nothing)
@ ITensorMPS ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1643
[25] orthogonalize!
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1603 [inlined]
[26] #orthogonalize!#337
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1656 [inlined]
[27] orthogonalize!
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1655 [inlined]
[28]
@ ITensorMPS ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:192
[29] dmrg
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:158 [inlined]
[30] dmrg#513
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:28 [inlined]
[31] dmrg
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:21 [inlined]
[32] main()
@ Main ~/ITensorGPU/ITensorMPS-gpu-test.jl:63
[33] top-level scope
@ ~/ITensorGPU/ITensorMPS-gpu-test.jl:68
[34] include(fname::String)
@ Main ./sysimg.jl:38
[35] top-level scope
@ REPL[4]:1
in expression starting at /home/per/ITensorGPU/ITensorMPS-gpu-test.jl:68
Some type information was truncated. Use show(err) to see complete types.

Here is the environment status:

Package versions

(ITensorGPU) pkg> status
Status ~/ITensorGPU/Project.toml
[79e6a3ab] Adapt v4.3.0
⌃ [052768ef] CUDA v5.8.2
[0d1a4710] ITensorMPS v0.3.20
[9136182c] ITensors v0.9.9
[011b41b2] cuTENSOR v2.2.3

If you need more information, just let me know.

Hi Per,
Could you please try changing the using / include statements as follows?

  • Remove the line using CUDA
  • Replace it with import CUDA: CuArray
  • Keep the using cuTENSOR line the same

I’m curious if you still get the error with this change.
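For reference, with those changes the top of the script would then look like this (only the CUDA line changes):

using ITensors, ITensorMPS
using Adapt: adapt
import CUDA: CuArray # only CuArray is brought into scope, not all of CUDA
using cuTENSOR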

Hi Miles,

I just tried your suggestion, and from what I can see, I get the same error as before.

I had also tried some other things before, such as simply using cu() instead of adapt() (ignoring the precision conversion, just to test), but that didn’t change anything either.
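For completeness, that variant looked roughly like this (note that cu() converts to Float32 by default, which is why the original script used adapt(CuArray, ...)):

using CUDA: cu

# Move the MPO and MPS to the GPU; unlike adapt(CuArray, ...),
# cu() converts Float64 storage to Float32 by default.
H = cu(H)
psi0 = cu(psi0)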

Thanks for the report. It sounds like a bug either on our end or in cuTENSOR; we’ll have to look into it.

@kmp5 any idea what’s going on here?

Hi @PerSehlstedt ,

I was able to run this on my machine and did not see any issue. The only differences between my setup and yours are that I have an RTX A6000 instead of a GeForce GTX 1050 Ti, and I have newer CUDA and NVIDIA drivers.

CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 570.172.8

CUDA libraries:

  • CUBLAS: 12.6.4
  • CURAND: 10.3.7
  • CUFFT: 11.3.0
  • CUSOLVER: 11.7.1
  • CUSPARSE: 12.5.4
  • CUPTI: 2024.3.2 (API 24.0.0)
  • NVML: 12.0.0+570.172.8

Julia packages:

  • CUDA: 5.8.2
  • CUDA_Driver_jll: 0.13.1+0
  • CUDA_Runtime_jll: 0.17.1+0

Toolchain:

* Julia: 1.11.6
* LLVM: 16.0.6

Preferences:

  • CUDA_Runtime_jll.version: 12.6

1 device:
0: NVIDIA RTX A6000 (sm_86, 44.952 GiB / 47.988 GiB available)
CUDA.versioninfo() = nothing

From the stack trace, the error looks like it is coming from
cuTENSOR.CuTensorPlan(desc::Ptr{cuTENSOR.cutensorOperationDescriptor}, pref::Ptr{cuTENSOR.cutensorPlanPreference}; workspacePref::cuTENSOR.cutensorWorksizePreference_t) @ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:160
and the error is
ERROR: LoadError: CUTENSORError: operation not supported (yet) (code 15, CUTENSOR_STATUS_NOT_SUPPORTED)
This makes me wonder whether your device simply doesn’t support this version of cuTENSOR. I can’t find anything online saying whether the GeForce GTX 1050 Ti does or doesn’t support this library.
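A quick way to see which architecture your device reports is to query its compute capability with CUDA.jl and compare it against the architectures listed in the cuTENSOR release notes (a minimal sketch, assuming only CUDA is loaded):

using CUDA

# The GTX 1050 Ti reports sm_61 (Pascal); the RTX A6000 reports sm_86 (Ampere).
@show CUDA.capability(CUDA.device())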

Thanks for the update!

I was starting to suspect this could be the case, but I had no idea how to verify it. My device is quite old, so it makes sense. I only wanted to run the code on my home computer to see how things work before attempting anything larger scale; I will try again on a different, more modern machine. Good to know that at least the code is not wrong, so thanks again.

Per
