I have been able to use the GPU backend for DMRG studies, and some preliminary benchmarking shows that it becomes significantly faster than the QN conserving CPU backend as the bond dimension increases (since the GPU backend does not support QN conservation at this moment). However, with high bond dimensions, I quickly run out of memory on a single GPU. What are my options for addressing this limitation? Are there approaches that may reduce the memory footprint? Is there a way to distribute the computation over multiple GPUs?
I haven’t tested it in conjunction with the GPU code, but you could try out the write-to-disk feature we have for DMRG, which stores tensors that are not immediately needed in the calculation (like the environment tensors) on disk to reduce memory usage. You can enable it by setting the keyword argument `write_when_maxdim_exceeds` (see the DMRG docs: DMRG · ITensors.jl). I’m curious whether that works with the GPU backend, so please report back and let us know!
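For reference, here is a minimal sketch of what enabling that option looks like in a plain CPU DMRG run. The model, sweep schedule, and threshold value below are just illustrative placeholders, not recommendations, and as noted above this is untested in combination with the GPU backend:

```julia
using ITensors, ITensorMPS

# Illustrative system: a spin-1 Heisenberg chain
N = 100
sites = siteinds("S=1", N)

os = OpSum()
for j in 1:(N - 1)
  os += "Sz", j, "Sz", j + 1
  os += 1/2, "S+", j, "S-", j + 1
  os += 1/2, "S-", j, "S+", j + 1
end
H = MPO(os, sites)
psi0 = random_mps(sites; linkdims=10)

# Once the requested maxdim exceeds 500 (a placeholder threshold),
# environment tensors get written to disk instead of kept in memory:
energy, psi = dmrg(H, psi0;
  nsweeps=10,
  maxdim=[100, 200, 500, 1000, 2000],
  cutoff=1e-10,
  write_when_maxdim_exceeds=500)
```

The idea is to pick the threshold so that the early, cheap sweeps stay fully in memory and only the large-bond-dimension sweeps pay the disk I/O cost.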
We plan to start working on QN-conserving calculations on GPU soon, which should help with memory. We would also like to investigate multi-GPU calculations, but that is a longer-term project.