cuTENSOR not working with DMRG

Hello!

I have been trying out the GPU functionality, but I can’t get DMRG to run when using cuTENSOR. Based on what I read on the “Running on GPUs” page of the ITensors.jl documentation, it should be possible, right? Everything works fine when I only load CUDA in my script, and I can’t find anyone else who has asked about cuTENSOR not working with DMRG, so I’m unsure what I could be doing wrong.

Here is a small code example I put together to reproduce the issue:

Code
using ITensors, ITensorMPS
using Adapt: adapt
using CUDA
using cuTENSOR # If I comment this out, everything works fine

struct Trivial end
struct U1 end

function heisenberg_xxx_1d(
    site_count::Integer,
    symmetry::Union{Type{Trivial}, Type{U1}};
    J::Real = -1.0,
    spin::Union{Integer, Rational} = 1,
)
    # https://docs.itensor.org/ITensorMPS/stable/IncludedSiteTypes.html#%22S1%22-Operators
    site_type(::Val{1}) = "S=1"
    site_type(::Val{1//2}) = "S=1/2"

    conserve(::Type{Trivial}) = (;)
    conserve(::Type{U1}) = (conserve_sz = true,)

    sites = siteinds(site_type(Val(spin)), site_count; conserve(symmetry)...)

    os = OpSum()
    lattice = square_lattice(site_count, 1; yperiodic = false)
    for b in lattice
        os -= J / 2, "S+", b.s1, "S-", b.s2
        os -= J / 2, "S-", b.s1, "S+", b.s2
        os -= J, "Sz", b.s1, "Sz", b.s2
    end

    H = MPO(os, sites)

    return H, sites
end

function main()
    @show CUDA.versioninfo()
    if isdefined(Main, :cuTENSOR)
        @show cuTENSOR.has_cutensor()
        @show cuTENSOR._initialized
    end

    site_count = 10
    symmetry = Trivial
    # symmetry = U1
    sweep_count = 4

    H, sites = heisenberg_xxx_1d(site_count, symmetry)
    states = [iseven(n) ? "Dn" : "Up" for n in 1:site_count]
    psi0 = random_mps(sites, states; linkdims = 10)

    # Use adapt to avoid conversion to Float32
    H = adapt(CuArray, H)
    psi0 = adapt(CuArray, psi0)
    @show typeof(storage(H[1]))
    @show typeof(storage(psi0[1]))

    sweeps = Sweeps(sweep_count)
    maxdim!(sweeps, 100)
    cutoff!(sweeps, 1e-12)

    energy, psi = dmrg(H, psi0, sweeps)

    return nothing
end

main()

Here is the output if I run the code:

Output

julia> include("ITensorMPS-gpu-test.jl")
CUDA runtime 12.9, artifact installation
CUDA driver 12.6
NVIDIA driver 560.94.0

CUDA libraries:

  • CUBLAS: 12.9.1
  • CURAND: 10.3.10
  • CUFFT: 11.4.1
  • CUSOLVER: 11.7.5
  • CUSPARSE: 12.5.10
  • CUPTI: 2025.2.1 (API 28.0.0)
  • NVML: 12.0.0+560.35.2

Julia packages:

  • CUDA: 5.8.2
  • CUDA_Driver_jll: 0.13.1+0
  • CUDA_Runtime_jll: 0.17.1+0

Toolchain:

  • Julia: 1.11.6
  • LLVM: 16.0.6

1 device:
0: NVIDIA GeForce GTX 1050 Ti (sm_61, 2.751 GiB / 4.000 GiB available)
CUDA.versioninfo() = nothing
cuTENSOR.has_cutensor() = true
cuTENSOR._initialized = Base.RefValue{Bool}(true)
typeof(storage(H[1])) = NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}
typeof(storage(psi0[1])) = NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}
ERROR: LoadError: CUTENSORError: operation not supported (yet) (code 15, CUTENSOR_STATUS_NOT_SUPPORTED)
Stacktrace:
[1] throw_api_error(res::cuTENSOR.cutensorStatus_t)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/libcutensor.jl:14
[2] check
@ ~/.julia/packages/cuTENSOR/bRLMl/src/libcutensor.jl:25 [inlined]
[3] cutensorCreatePlan
@ ~/.julia/packages/GPUToolbox/cZlg7/src/ccalls.jl:33 [inlined]
[4] cuTENSOR.CuTensorPlan(desc::Ptr{cuTENSOR.cutensorOperationDescriptor}, pref::Ptr{cuTENSOR.cutensorPlanPreference}; workspacePref::cuTENSOR.cutensorWorksizePreference_t)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:160
[5] CuTensorPlan
@ ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:149 [inlined]
[6] plan_contraction(A::AbstractArray, Ainds::Vector{…}, opA::cuTENSOR.cutensorOperator_t, B::AbstractArray, Binds::Vector{…}, opB::cuTENSOR.cutensorOperator_t, C::AbstractArray, Cinds::Vector{…}, opC::cuTENSOR.cutensorOperator_t, opOut::cuTENSOR.cutensorOperator_t; jit::cuTENSOR.cutensorJitMode_t, workspace::cuTENSOR.cutensorWorksizePreference_t, algo::cuTENSOR.cutensorAlgo_t, compute_type::Nothing)
@ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:343
[7] plan_contraction
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:301 [inlined]
[8] #contract!#87
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:272 [inlined]
[9] contract!
@ ~/.julia/packages/cuTENSOR/bRLMl/src/operations.jl:259 [inlined]
[10] mul!
@ ~/.julia/packages/cuTENSOR/bRLMl/src/interfaces.jl:57 [inlined]
[11] contract!(exposedR::NDTensors.Expose.Exposed{…}, labelsR::Tuple{…}, exposedT1::NDTensors.Expose.Exposed{…}, labelsT1::Tuple{…}, exposedT2::NDTensors.Expose.Exposed{…}, labelsT2::Tuple{…}, α::Bool, β::Bool)
@ NDTensorscuTENSORExt ~/.julia/packages/NDTensors/Lb78J/ext/NDTensorscuTENSORExt/contract.jl:45
[12] contract!
@ ~/.julia/packages/NDTensors/Lb78J/ext/NDTensorscuTENSORExt/contract.jl:25 [inlined]
[13] _contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:143 [inlined]
[14] _contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:131 [inlined]
[15] contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:219 [inlined]
[16] contract!!
@ ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:188 [inlined]
[17] contract(tensor1::NDTensors.DenseTensor{…}, labelstensor1::Tuple{…}, tensor2::NDTensors.DenseTensor{…}, labelstensor2::Tuple{…}, labelsoutput_tensor::Tuple{…})
@ NDTensors ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:113
[18] contract(::Type{…}, tensor1::NDTensors.DenseTensor{…}, labels_tensor1::Tuple{…}, tensor2::NDTensors.DenseTensor{…}, labels_tensor2::Tuple{…})
@ NDTensors ~/.julia/packages/NDTensors/Lb78J/src/tensoroperations/generic_tensor_operations.jl:91
[19] contract
@ ~/.julia/packages/SimpleTraits/7VJph/src/SimpleTraits.jl:332 [inlined]
[20] _contract(A::NDTensors.DenseTensor{Float64, 3, Tuple{…}, NDTensors.Dense{…}}, B::NDTensors.DenseTensor{Float64, 2, Tuple{…}, NDTensors.Dense{…}})
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:3
[21] _contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:9
[22] contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:74
[23] *(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VGXV2/src/tensor_operations/tensor_algebra.jl:61
[24] orthogonalize!(M::MPS, j::Int64; maxdim::Nothing, normalize::Nothing)
@ ITensorMPS ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1643
[25] orthogonalize!
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1603 [inlined]
[26] #orthogonalize!#337
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1656 [inlined]
[27] orthogonalize!
@ ~/.julia/packages/ITensorMPS/AQY99/src/abstractmps.jl:1655 [inlined]
[28]
@ ITensorMPS ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:192
[29] dmrg
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:158 [inlined]
[30] dmrg#513
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:28 [inlined]
[31] dmrg
@ ~/.julia/packages/ITensorMPS/AQY99/src/dmrg.jl:21 [inlined]
[32] main()
@ Main ~/ITensorGPU/ITensorMPS-gpu-test.jl:63
[33] top-level scope
@ ~/ITensorGPU/ITensorMPS-gpu-test.jl:68
[34] include(fname::String)
@ Main ./sysimg.jl:38
[35] top-level scope
@ REPL[4]:1
in expression starting at /home/per/ITensorGPU/ITensorMPS-gpu-test.jl:68
Some type information was truncated. Use show(err) to see complete types.

Here is the environment status:

Package versions

(ITensorGPU) pkg> status
Status ~/ITensorGPU/Project.toml
[79e6a3ab] Adapt v4.3.0
⌃ [052768ef] CUDA v5.8.2
[0d1a4710] ITensorMPS v0.3.20
[9136182c] ITensors v0.9.9
[011b41b2] cuTENSOR v2.2.3

If you need more information, just let me know.

Hi Per,
Could you please try changing the using / include statements as follows?

  • Remove the line using CUDA
  • Replace it with import CUDA: CuArray
  • Keep the using cuTENSOR line the same

I’m curious if you still get the error with this change.
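For reference, with those changes the top of the script would then look like this (only the CUDA line changes):

using ITensors, ITensorMPS
using Adapt: adapt
import CUDA: CuArray # only CuArray is brought into scope, not all of CUDA
using cuTENSOR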

Hi Miles,

I just tried your suggestion, and from what I can see, I get the same error as before.

I had also tried some other things before, such as simply using cu() instead of adapt() (ignoring the precision conversion, just to test), but that didn’t change anything either.
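For completeness, that variant looked roughly like this (note that cu() converts to Float32 by default, which is why the original script used adapt(CuArray, ...)):

using CUDA: cu

# Move the MPO and MPS to the GPU; unlike adapt(CuArray, ...),
# cu() converts Float64 storage to Float32 by default.
H = cu(H)
psi0 = cu(psi0)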

Thanks for the report. It sounds like a bug either on our end or in cuTENSOR; we’ll have to look into it.

@kmp5 any idea what’s going on here?

Hi @PerSehlstedt ,

I was able to run this on my machine and did not see any issue. The only differences between my setup and yours are that I have an RTX A6000 instead of a GeForce GTX 1050 Ti, and I have newer CUDA and NVIDIA drivers.

CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 570.172.8

CUDA libraries:

  • CUBLAS: 12.6.4
  • CURAND: 10.3.7
  • CUFFT: 11.3.0
  • CUSOLVER: 11.7.1
  • CUSPARSE: 12.5.4
  • CUPTI: 2024.3.2 (API 24.0.0)
  • NVML: 12.0.0+570.172.8

Julia packages:

  • CUDA: 5.8.2
  • CUDA_Driver_jll: 0.13.1+0
  • CUDA_Runtime_jll: 0.17.1+0

Toolchain:

* Julia: 1.11.6
* LLVM: 16.0.6

Preferences:

  • CUDA_Runtime_jll.version: 12.6

1 device:
0: NVIDIA RTX A6000 (sm_86, 44.952 GiB / 47.988 GiB available)
CUDA.versioninfo() = nothing

From the stack trace, the error looks like it is coming from
cuTENSOR.CuTensorPlan(desc::Ptr{cuTENSOR.cutensorOperationDescriptor}, pref::Ptr{cuTENSOR.cutensorPlanPreference}; workspacePref::cuTENSOR.cutensorWorksizePreference_t) @ cuTENSOR ~/.julia/packages/cuTENSOR/bRLMl/src/types.jl:160
and the error is
ERROR: LoadError: CUTENSORError: operation not supported (yet) (code 15, CUTENSOR_STATUS_NOT_SUPPORTED)
This makes me wonder whether your device simply doesn’t support this version of cuTENSOR. I can’t find anything online saying whether the GeForce GTX 1050 Ti does or doesn’t support this library.
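A quick way to see which architecture your device reports is to query its compute capability with CUDA.jl and compare it against the architectures listed in the cuTENSOR release notes (a minimal sketch, assuming only CUDA is loaded):

using CUDA

# The GTX 1050 Ti reports sm_61 (Pascal); the RTX A6000 reports sm_86 (Ampere).
@show CUDA.capability(CUDA.device())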

Thanks for the update!

I was starting to suspect this could be the case, but I had no idea how to verify it. My device is quite old, so it makes sense. I only wanted to run the code on my home computer to see how things work before attempting anything larger scale; I will try again on a different, more modern machine. Good to know that at least the code is not wrong, so thanks again.

Per
