Interesting that it works for `NDTensors.cu` but not `cu`.
The only difference between the two is that `NDTensors.cu` preserves the element type of the tensors, while `cu` converts them to single precision (see [Memory management · CUDA.jl](https://cuda.juliagpu.org/stable/usage/memory/)). `NDTensors.cu` is just a wrapper around `using Adapt; x_gpu = adapt(CuArray, x_cpu)`.
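A minimal sketch of that difference, assuming CUDA.jl and Adapt.jl are installed and a GPU is available (the variable names here are illustrative):

```julia
using CUDA, Adapt

x_cpu = rand(Float64, 4, 4)

# CUDA.jl's `cu` converts to single precision by default:
x_f32 = cu(x_cpu)              # eltype(x_f32) == Float32

# `adapt(CuArray, ...)` preserves the element type,
# which is what `NDTensors.cu` does under the hood:
x_f64 = adapt(CuArray, x_cpu)  # eltype(x_f64) == Float64
```

So if the failure depends on which function you use to move data to the GPU, the precision change is the first thing to suspect.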
I’m not sure why there would be an issue using single precision in the code you shared; it would be worth investigating. @kmp5 has been doing some analysis of the timings and accuracy/stability of single vs. double precision with DMRG. To some extent it is still an open research question which algorithms keep good convergence and also get real speedups from single precision; most of our experience is with double precision on CPU.