broadcast (sqrt) operation on GPU

hi,
I was playing around with the new GPU implementations, and I’m loving the simplicity of just having to cu() stuff to get it onto the GPU!

I stumbled upon a little problem: at some point in my code I want to take the sqrt of a diagonal ITensor (say some singular values). Since sqrt(::ITensor) (or ^) is not implemented, I would naively just broadcast it, say sqrt.(iten), but when I try to do that on GPU I’m getting the dreaded

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

I guess I could disable it, but I wonder if there’s a better/proper way of dealing with this kind of situation? Hope I didn’t miss any basic documentation on this…
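For concreteness, here’s a minimal sketch of the kind of thing I’m doing (the indices and values are just made up for illustration):

using ITensors, CUDA

gpu = NDTensors.cu

i, j = Index(2), Index(2)
iten = gpu(diag_itensor([0.9, 0.1], (i, j))) # e.g. some singular values
sqrt.(iten) # this is the call that triggers the error above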

thanks!

Thanks for the report. Interestingly this works for Dense but not Diag storage:

julia> using ITensors: Index, ITensor

julia> using Metal: mtl

julia> function main()
         device = mtl
         d = 2
         a = abs.(randn(d, d))
         i, j = Index.((d, d))
         t = device(ITensor(a, (i, j)))
         return sqrt.(t)
       end
main (generic function with 1 method)

julia> main()
ITensor ord=2 (dim=2|id=983) (dim=2|id=859)
NDTensors.Dense{Float32, Metal.MtlVector{Float32, Metal.MTL.Private}}

while:

julia> using ITensors: Index, diag_itensor

julia> using Metal: mtl

julia> function main()
         device = mtl
         d = 2
         a = abs.(randn(d))
         i, j = Index.((d, d))
         t = device(diag_itensor(a, (i, j)))
         return sqrt.(t)
       end
main (generic function with 1 method)

julia> main()
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/OqrUV/src/host/indexing.jl:48 [inlined]
  [6] getindex
    @ ~/.julia/packages/NDTensors/WB3bb/src/tensorstorage/tensorstorage.jl:26 [inlined]
  [7] getdiagindex
    @ ~/.julia/packages/NDTensors/WB3bb/src/diag/diagtensor.jl:57 [inlined]
  [8] permutedims!(R::NDTensors.DiagTensor{…}, T::NDTensors.DiagTensor{…}, perm::Tuple{…}, f::ITensors.var"#283#284"{…})
    @ NDTensors ~/.julia/packages/NDTensors/WB3bb/src/diag/diagtensor.jl:149
  [9] permutedims!!(R::NDTensors.DiagTensor{…}, T::NDTensors.DiagTensor{…}, perm::Tuple{…}, f::Function)
    @ NDTensors ~/.julia/packages/NDTensors/WB3bb/src/diag/diagtensor.jl:178
 [10] _map!!(f::Function, R::NDTensors.DiagTensor{…}, T1::NDTensors.DiagTensor{…}, T2::NDTensors.DiagTensor{…})
    @ ITensors ~/.julia/packages/ITensors/rH5UO/src/itensor.jl:1903
 [11] map!(f::Function, R::ITensor, T1::ITensor, T2::ITensor)
    @ ITensors ~/.julia/packages/ITensors/rH5UO/src/itensor.jl:1908
 [12] copyto!
    @ ~/.julia/packages/ITensors/rH5UO/src/broadcast.jl:489 [inlined]
 [13] copy
    @ ./broadcast.jl:928 [inlined]
 [14] materialize
    @ ./broadcast.jl:903 [inlined]
 [15] main()
    @ Main ./REPL[36]:7
 [16] top-level scope
    @ REPL[37]:1
 [17] top-level scope
    @ ~/.julia/packages/Metal/q9oGt/src/initialization.jl:57
Some type information was truncated. Use `show(err)` to see complete types.

It looks like it will be an easy fix. The culprit is this function: ITensors.jl/NDTensors/src/diag/diagtensor.jl at v0.6.11 · ITensor/ITensors.jl · GitHub, which could be rewritten to avoid the for loop over the diagonal elements that is causing the scalar indexing issue.
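To illustrate the general pattern with plain CUDA arrays (this is just a stand-in for the diagonal data, not the actual NDTensors code):

using CUDA

d = CUDA.rand(Float32, 3) # stand-in for the diagonal data vector of a Diag storage
r = similar(d)

# A loop like this does scalar getindex/setindex! on the GPU array,
# which is exactly what triggers the "Scalar indexing is disallowed" error:
# for k in eachindex(d)
#     r[k] = sqrt(d[k])
# end

# Broadcasting over the whole vector instead runs as a single GPU kernel:
r .= sqrt.(d)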

@kmp5

I made an issue about it here: [NDTensors] [BUG] Scalar indexing issue when broadcasting tensors with `Diag` storage · Issue #1482 · ITensor/ITensors.jl · GitHub.

Thanks Matt! I didn’t even think of trying with a dense tensor. I guess for now I can work around it by temporarily dense()-ing my tensor while waiting for a fix.
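Something like this seems to do the trick for now (though I haven’t checked whether dense() keeps the data on the GPU or copies through the CPU):

iten_dense = dense(iten) # convert the Diag storage to Dense
sqrt.(iten_dense) # broadcasting over Dense storage works, as shown above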

This is being worked on here: [NDTensors] Fix scalar indexing issue for Diag broadcast on GPU by kmp5VT · Pull Request #1497 · ITensor/ITensors.jl · GitHub.

Also, a note for you and anyone else who comes across issues where an operation works on CPU but not on GPU: you can always convert your ITensor(s) to CPU with NDTensors.cpu(a), perform your operation f(...) on CPU, and then convert back to GPU. Unless the operation is a bottleneck of the calculation, that is a perfectly reasonable strategy, and we do it internally when certain operations like matrix factorizations aren’t implemented on a certain GPU backend. Transferring back and forth between CPU and GPU can be slow if you are moving a lot of data, but it often won’t be the bottleneck in tensor network calculations if your algorithm is dominated by tensor contractions performed on GPU.
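For example, roughly (using sqrt and the CUDA adaptor as in this thread; substitute whatever operation and backend you are using):

using ITensors, CUDA

gpu = NDTensors.cu
cpu = NDTensors.cpu

t_gpu = gpu(diag_itensor([1.0, 2.0], (Index(2), Index(2))))
t_cpu = cpu(t_gpu) # move the ITensor data to CPU
r_cpu = sqrt.(t_cpu) # perform the unsupported operation on CPU
r_gpu = gpu(r_cpu) # move the result back to GPU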

This should be fixed in NDTensors v0.3.33, which is being registered now. Please let us know if you have any more issues.
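(If you’re updating, the usual Pkg commands should pull it in; just check that the resolved NDTensors version is at least v0.3.33:)

using Pkg
Pkg.update("NDTensors")
Pkg.status("NDTensors")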

awesome, thanks again for the great work guys!

While playing some more with diagonal and GPU tensors I stumbled upon a few more odd things. I don’t know if I should open a new thread or add them here, since they seem to belong to the same family of errors. Or, if you prefer me to file bug reports for this kind of thing, let me know!

I’m attaching below the full MWE with a list of examples that don’t work, but long story short: I’m getting “Scalar indexing is disallowed” errors when I calculate the sum() of a CUDA tensor, and also when computing a product of the form Aij*Dji, where D is a diagonal CUDA tensor (whereas it works if D is not diagonal).

Probably unrelated: I saw there is a tr() function for ITensors, but that doesn’t work for me (for either CPU or GPU tensors). Since I couldn’t find any documentation for it, I imagine it’s not recommended for use.


using ITensors, CUDA

gpu = NDTensors.cu

ij = (Index(3), Index(3))
jk = (ij[2], Index(3))
aij = random_itensor(ij)
bjk = random_itensor(jk)
cij = random_itensor(ij)

djk = diag_itensor([1, 2, 3.0 + im], jk)

cua = gpu(aij)
cub = gpu(bjk)
cuc = gpu(cij)
cud = gpu(djk)

cua * cub # ok, rank 2
cua * cuc # ok, rank 0

cua * cud # ok, rank 2

cub * cud # ERROR: Scalar indexing is disallowed.

sum(aij) # ok, scalar
sum(djk) # ok, scalar

sum(cua) # ERROR: Scalar indexing is disallowed.

sum(cud) # ERROR: Scalar indexing is disallowed.

tr(aij) # ERROR: MethodError: no method matching checkdims_perm(::NDTensors.DenseTensor{Float64, 3, Tuple{…}, NDTensors.Dense{…}}, ::NDTensors.DenseTensor{Float64, 2, Tuple{…}, NDTensors.Dense{…}}, ::Tuple{Int64, Int64, Int64})

tr(djk) # ERROR: MethodError: no method matching checkdims_perm(::NDTensors.DenseTensor{…}, ::NDTensors.DenseTensor{…}, ::Tuple{…})

tr(cua) # ERROR: MethodError: no method matching checkdims_perm(::NDTensors.DenseTensor{Float64, 3, Tuple{…}, NDTensors.Dense{…}}, ::NDTensors.DenseTensor{Float64, 2, Tuple{…}, NDTensors.Dense{…}}, ::Tuple{Int64, Int64, Int64})

tr(cud) # ERROR: MethodError: no method matching checkdims_perm(::NDTensors.DenseTensor{…}, ::NDTensors.DenseTensor{…}, ::Tuple{…})

thanks!

Thanks for reporting all of these issues; it’s very helpful for us. Can you make a new post for these new issues so we can keep track of them more easily?
