Why is the autoMPO in Julia so much slower than in C++

We are trying to generate a MPO from OpSum. However, we notice that this process takes much longer time in Julia than in C++. The operator that we are looking at corresponds to a long range Hamiltonian in a spin-1/2 fermion chain. For the same operator string, the MPO(os, sites) function takes more than 11 hours in Julia but only 36 seconds in C++. We are wondering if there is anything we can do to speed up the autoMPO process in Julia.

Thanks for the report. The algorithms are the same in the Julia and C++ versions, so likely we are missing some basic optimizations in the Julia version. Could you share a code that reproduces those timings so we can track this issue?

As a practical workaround, you could try saving the MPO constructed from C++ and load it into the Julia version, if you are interested in using the Julia DMRG/MPS code with those Hamiltonians.

Also @corbett5 has been working on a faster OpSum to MPO conversion function in the Julia version, see GitHub - ITensor/ITensorMPOConstruction.jl: Alternative backends for constructing MPOs from sums of operators., you could try his code.

@mankai_chow even if you aren’t interested in trying out my new package I would be very interested in seeing how the new algorithm performs on your MPO. If you’re willing to share your Julia code (or even just the description of the Hamiltonian) I’d be happy to give it a spin and let you know how it goes.

Dear Fishman and Corbett :

Thanks a lot for the help.

I cannot upload the file here, so please find a minimal example at the GitHub link https://github.com/mankai-chow/gen_mpo_min.git. On the same computer, the time for the toMPO function in the C++ code is 3.4s and the time for the MPO function in the Julia code is 65s. Since there is precompile issue in Julia, I run the function twice in Julia and take the latter time.

The MPO that we are interested in contains long range interaction. The example has maximal link dimension ~500, but in practice for large system sizes the link dimension could be very large. We overload the Electron type so that we can add an extra quantum number.

I have some trouble getting the new package to work for our model. The new function does not allow that a site has more than one operator acting on it in a term, while our model may need to act terms such as “Cdagup”, “Cdagdn” simultaneously on the same site. I am not sure how to implement that.

Best regards,
Zheng Zhou 周正.

Thanks Zheng,
We will be very interested to benchmark your case (Ben, thanks for taking that on). I’m certain we can get the speed in the Julia version to be the same or even better than the C++ version. This will be very helpful for testing that claim.

Regarding the "Cdagup", "Cdagdn" question, you should be able to put them on the same site like this:

os .+= "Cdagup",i,"Cdagdn",i

where i is the site and of course you can put a coefficient in front too which I have just left out. If you try that and it leads to an error or incorrect result, please let us know.

Dear Miles :

Thank you for helping. I have checked that this indeed produce an error. My minimal example is

os = OpSum()
os += "Cdagup", 2, "Cdagdn", 2, "Cup", 3, "Cdn", 1
hmt = MPO_new(os, sites ; cutoff = 1E-13)

and the error message is

ERROR: LoadError: A site has more than one operator acting on it in a term.

Best regards,
Zheng Zhou 周正.

This is a self imposed limitation (to achieve greater performance for the case I cared about) of my MPO construction algorithm. However if you provide an extra key-word argument to MPO_new it will convert the OpSum above into "Cdagup * Cdagdn", 2, "Cup", 3, "Cdn", 1 which is perfectly valid.

@mankai_chow I created a PR here Changes to work with ITensorMPOConstruction by corbett5 · Pull Request #1 · mankai-chow/gen_mpo_min · GitHub

After my modest changes I can construct the MPO in 0.65 seconds on my laptop.

1 Like

Dear Corbett :

I have tried the version you provide, and now the AutoMPO process takes 0.73s for me on the same computer, even faster than the C++ version of the code ! That exactly solves our questions. Thank you very much for your help.

Best regards,
Zheng Zhou 周正.

@mankai_chow what value of nm was the C++ code able to construct in 36 seconds? I am curious because I haven’t compared my algorithm with the ITensor’s C++ implementation before.

I was able to construct a MPO with nm = 100 and tol=4e-8` resulting in an MPO of bound dimension 1788 with the following timings:

calc_int_ps: 6.427996 seconds (119.91 M allocations: 3.640 GiB, 18.18% gc time)
constructing opsum: 1543.176156 seconds (165.25 M allocations: 9.714 TiB, 15.88% gc time)
constructing MPO: 51.784839 seconds (71.73 M allocations: 249.206 GiB, 30.54% gc time)

After I switched to construting a ITensorMPOConstruction.OpIDSum instead of an OpSum I got the following timings:

calc_int_ps: 6.213813 seconds (119.91 M allocations: 3.640 GiB, 18.60% gc time)
Constructing opsum: 6.255719 seconds (119.91 M allocations: 3.645 GiB, 18.68% gc time)
constructing MPO: 46.975967 seconds (55.70 M allocations: 246.379 GiB, 32.90% gc time)

I pushed the changes to the PR. You are the first person other than me to use ITensorsMPOConstruction. I am very glad it’s working for you. Let me know if you have more questions or problems.

Dear Corbett :

The case I was mentionning was for n_m=48 (I think this should be the maximal size that DMRG can deal with). We tested this on the Julia version using OpSum and MPO_new and the C++ version. For the AutoMPO process, the Julia version takes 9.3s excluding compilation, and the C++ version takes 39.1s.

Also thank you for the further improved code using ITensorMPOConstruction. Do we need to install any other packages in order to use that ?

Best regards,
Zheng Zhou 周正.

Thank you for those timings.

OpIDSum is part of ITensorMPOConstruction so you don’t need anything else. In fact the first step of the algorithm is to convert the OpSum to an OpIDSum. But if you’re only going up to 48 sites the OpSum implementation should be plenty fast.