Hi there,
I have a question about the performance of ITensor when it comes to multithreading.
Basically, I have a number of different configurations (in this case, let's assume they are all based on the same physical system, each with a different amount of site disorder). For each one I'm running a DMRG calculation, solving for the lowest ~20 excited states. My naive approach to speeding up the calculation was to run the script with
julia -t NUM mydmrg.jl
where NUM is the number of threads that I have. Then, within the script, I simply have the loop
Threads.@threads for case in 1:numcases
    run_dmrg(args, case)
end
where each instance of run_dmrg() is a standalone DMRG calculation for a particular configuration. What I noticed, however, is that this is much slower than running the cases sequentially: the script with numcases=1 runs for about an hour, whereas a calculation with numcases=40 takes about 15 hours.
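To make the comparison concrete, here is a minimal sketch of the two ways of running the cases. It is not my actual script: fake_work is a hypothetical stand-in for run_dmrg() that just does some dense matrix work, and the sizes and numcases value are arbitrary assumptions chosen so it runs quickly.

```julia
# Hypothetical stand-in for run_dmrg(): dense linear algebra on an n-by-n matrix.
# The real calculation is a full DMRG run; this only mimics "independent heavy work".
function fake_work(case::Int; n::Int=200)
    A = fill(Float64(case), n, n)
    return sum(A * A)
end

# Run every case one after another on a single Julia thread.
function run_sequential(numcases::Int)
    results = Vector{Float64}(undef, numcases)
    for case in 1:numcases
        results[case] = fake_work(case)
    end
    return results
end

# Run the cases with Threads.@threads, as in the loop above.
# Each iteration writes to its own slot, so there is no shared mutable state.
function run_threaded(numcases::Int)
    results = Vector{Float64}(undef, numcases)
    Threads.@threads for case in 1:numcases
        results[case] = fake_work(case)
    end
    return results
end

numcases = 8  # arbitrary, for illustration only
t_seq = @elapsed seq = run_sequential(numcases)
t_thr = @elapsed thr = run_threaded(numcases)
println("sequential: ", t_seq, " s, threaded: ", t_thr, " s")
```

Started with julia -t NUM, the two timings can be compared directly. The first call of each function also includes compilation time, so calling each one twice gives a fairer number.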
I assume these are separate instances of the function with no communication between them whatsoever, so ITensor must have some multithreading built into the evaluation of dmrg? On the other hand, for the single calculation (numcases=1), memory usage is only at 2.7%, so there seems to be something else going on.
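In case it helps with diagnosing this, here is a minimal sketch of how the two thread counts involved could be checked. I am assuming, without having confirmed it, that the dense tensor contractions end up in BLAS, which has its own thread pool separate from the Julia threads set by -t. Pinning BLAS to one thread is shown only as something to experiment with, not as a known fix.

```julia
using LinearAlgebra

# Number of Julia threads, as set by `julia -t NUM`.
julia_threads = Threads.nthreads()

# Number of threads the BLAS library uses internally (independent of -t).
blas_threads_before = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads_before)

# Experiment: restrict BLAS to a single thread so that the Julia threads
# are not competing with BLAS threads for the same cores.
BLAS.set_num_threads(1)
blas_threads_after = BLAS.get_num_threads()
println("BLAS threads after pinning: ", blas_threads_after)
```

If the product of the two counts is much larger than the number of physical cores, the threads may be oversubscribing the machine, which would be one possible explanation for the slowdown.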