Question about multi-threading in ITensorParallel

Hello,

I am trying to understand how ITensorParallel works. In particular, I am looking at the example that uses Distributed.jl (ITensorParallel.jl/examples/01_parallel_mpo_sum_2d_hubbard_conserve_momentum.jl at main · ITensor/ITensorParallel.jl · GitHub).

  1. I noticed that on line 14 the number of threads for BLAS is set to one:
ITensors.BLAS.set_num_threads(1)

May I ask the reason for this setting? Is it because multithreading in BLAS interferes with the parallelization over the sum of MPOs?

  2. I also didn’t fully understand how ThreadedSum works in this scenario:
main(; Nx=8, Ny=4, nsweeps=10, maxdim=1000, Sum=ThreadedSum);

given that Distributed.jl is used and multiple processes are created at the beginning. Is it that in this case only one process is actually running, with threading over the sum of MPOs?

  3. Another general question: suppose I have many cores (say 20) and I am doing DMRG with quantum numbers at a relatively large bond dimension (~O(10^4)), and the MPO bond dimension is also large (~100). What is the optimal multi-threading/parallelization strategy currently? I have tested a few cases with the different multi-threading options in ITensor (Multithreading · ITensors.jl), and the optimal one seems to be MKL with both Strided and block-sparse threading off (maybe because block-sparse threading only parallelizes contractions, not SVDs?). But I wonder if anyone has experience with using ITensorParallel?

Thank you!

  1. Yes, Julia threading and BLAS threading generally don’t work well together. It may be possible to make them work together (see Julia Threads + BLAS Threads · ThreadPinning.jl), but we haven’t investigated that much, and if you use them together naively you can see big slowdowns.
  2. Yes, in that case it will just perform threading over sums of MPOs on the first process.
  3. We don’t know; that would be considered a research question, and it will depend a lot on the details of the system (what symmetries you have, what kinds of Hamiltonian terms you have, etc.). See [2103.09976] Low communication high performance ab initio density matrix renormalization group algorithms for a reference on advanced DMRG parallelization techniques.
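As a concrete starting point for such experiments, here is a minimal sketch of how the relevant switches can be toggled, following the ITensors.jl multithreading documentation (it assumes Julia was started with threads, e.g. `julia -t 4`, and that the MKL.jl package is installed):

```julia
using MKL           # swap OpenBLAS for MKL (assumes MKL.jl is installed)
using ITensors
using LinearAlgebra

# Give all cores to one backend at a time: if Julia-level threading
# (e.g. ThreadedSum or threaded block sparse) is on, BLAS should be
# single-threaded so the thread pools don't oversubscribe each other.
BLAS.set_num_threads(1)
ITensors.Strided.disable_threads()   # dense permutation/contraction threading off

# Option A: threaded block-sparse contractions. This parallelizes
# contractions over QN blocks, so it mainly helps when the symmetry
# structure produces many blocks per contraction.
ITensors.enable_threaded_blocksparse()

# Option B: plain multithreaded BLAS instead (comment out the lines
# above and uncomment below), which can win at large bond dimension:
# ITensors.disable_threaded_blocksparse()
# BLAS.set_num_threads(Sys.CPU_THREADS)
```

Timing a single DMRG sweep under each option on your own Hamiltonian is usually the quickest way to decide between them.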

ITensorParallel.jl should be considered more of an experimental package, not meant for general use. You’ll have to experiment with it yourself to see if it helps in your own use case. If you do try it out, feedback about successes and failures is helpful; you can share it on the Discourse forum or on the GitHub issue tracker (Issues · ITensor/ITensorParallel.jl · GitHub) if you think you have found an issue with the package.


Thanks Matt, that’s really helpful!