Low GPU utilization (<25%) when accelerating DMRG with GPU

I’m currently using a GPU to accelerate DMRG computations. My MPS is not large, so the computational scale and memory usage are small (less than 3 GB), all of which is within my expectations. However, the GPU utilization is quite low (less than 25%).

I’m not familiar with GPU computing in Julia, but I suspect the low GPU utilization comes from one of two causes:

  1. The algorithm itself doesn’t fully leverage the GPU’s capabilities;
  2. An inherent issue caused by the small computational scale.

If it’s the latter, I’m considering running other ITensor tasks in parallel to make use of the idle GPU resources, but I have no experience with parallelization within a single GPU. Of course, I’m not asking about the specific implementation details of parallelization. Since you’re familiar with ITensor’s GPU acceleration, I’d just like to ask whether this approach is feasible.

What are the bond dimensions of the MPO and MPS in your DMRG calculation? Additionally, are you conserving QNs? If so, see this post: TEBD time evolution with CUDA backend

The maximum bond dimension is 7 for the MPO and about 20 for the MPS most of the time. I am not conserving any QNs.

Those bond dimensions are very small, so I’m not surprised that you don’t see good GPU utilization in that case. In general we see that bigger tensors get better speedups on GPU, since there are more tensor elements for the GPU to parallelize over. For a rough sense of scale: with an MPS bond dimension of about 20 and a local physical dimension of 2, a two-site wavefunction tensor has only about 20 × 2 × 2 × 20 = 1,600 elements, which is far too few to keep the thousands of cores on a modern GPU busy. Regarding parallelization, for such small bond dimensions I think the only relevant strategy would be real-space parallelization ([1301.3494] Real-Space Parallel Density Matrix Renormalization Group), but that would only be effective for large system sizes, and even then the speedup would be limited by the number of real-space partitions you split your system into.
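As for running several independent ITensor tasks on one GPU to soak up idle capacity: in CUDA.jl, each Julia task is given its own CUDA stream, so independent tasks launched with `Threads.@spawn` can in principle overlap their kernels on the device. Below is a minimal sketch under that assumption, using a stand-in Heisenberg model; it assumes ITensors.jl/ITensorMPS.jl with the CUDA backend, and the function name `run_small_dmrg` and all parameter values are illustrative, not a recommendation. Whether the kernels actually overlap (and whether it helps) depends on the GPU and on how small the individual kernels are.

```julia
# Sketch (untested assumption): running several independent small DMRG
# problems concurrently on one GPU. CUDA.jl assigns each Julia task its
# own stream, so kernels from separate tasks may overlap on the device.
using ITensors, ITensorMPS, CUDA

function run_small_dmrg(N)
  sites = siteinds("S=1/2", N)
  os = OpSum()
  for j in 1:(N - 1)
    os += "Sz", j, "Sz", j + 1
    os += 0.5, "S+", j, "S-", j + 1
    os += 0.5, "S-", j, "S+", j + 1
  end
  H = cu(MPO(os, sites))                    # move the MPO to the GPU
  psi0 = cu(random_mps(sites; linkdims=10)) # move the initial MPS to the GPU
  energy, psi = dmrg(H, psi0; nsweeps=5, maxdim=20, cutoff=1e-8)
  return energy
end

# Launch independent problems as tasks; requires starting Julia with
# multiple threads, e.g. `julia -t 4`.
tasks = [Threads.@spawn run_small_dmrg(40) for _ in 1:4]
energies = fetch.(tasks)
```

Even if the kernels overlap cleanly, each task still pays its own kernel-launch overhead, which tends to dominate at these tensor sizes, so I wouldn’t expect a dramatic improvement in overall throughput.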

Thank you for your reply!