Why is my eigsolve call so slow?

Hello all,

I am trying to write my own DMRG code, but it is currently orders of magnitude slower than the original ITensor DMRG function. I narrowed the problem down to the eigsolve call, which I wrote as:

@time vals, vecs = eigsolve(
    v -> noprime(M * v),
    phi,
    1,
    :SR;
    ishermitian = ishermitian,
    tol = eigsolve_tol,
    krylovdim = eigsolve_krylovdim,
    maxiter = eigsolve_maxiter,
);

The time this takes grows very rapidly with the bond dimension. For example, with maxdim=20 the time needed to optimize a single pair of tensors is:

site 5, 6
1.086098 seconds (1.55 k allocations: 1.833 GiB, 2.18% gc time, 0.00% compilation time)

I am using the same parameters as ITensor. The only (?) difference from the original ITensor implementation is that I am using an anonymous function instead of a ProjMPO. I do not expect to match its efficiency exactly, but why would my implementation be so much slower?

The difference is most likely due to your point about not using the ProjMPO and using the anonymous function instead. Inside ProjMPO, we do not first contract the projected MPO (the MPO wrapped with the MPS basis) into a matrix M and then multiply by v as a second step. Instead, we multiply the "pieces" of the projected MPO (the left environment, the local MPO tensors, and the right environment) one by one onto the vector v until all of them have been contracted, at which point the multiplication is complete.

Doing it this way results in a scaling of \chi^3, where \chi is the MPS bond dimension or rank, whereas the way you are doing it scales as \chi^4.
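The effect of contraction order can be seen already with plain Julia matrices (this is only an analogy, not ITensor code; the names chi, A, B, v are mine): if the operator factors as a product of pieces, materializing the full matrix before the matvec costs O(\chi^3), while applying the pieces one by one to the vector costs only O(\chi^2) per piece.

```julia
using LinearAlgebra

chi = 200
A = randn(chi, chi)   # stand-in for one "piece" (e.g. an environment)
B = randn(chi, chi)   # stand-in for another piece
v = randn(chi)

slow = (A * B) * v    # builds the full matrix first: O(chi^3)
fast = A * (B * v)    # two matvecs, pieces applied one by one: O(chi^2)

@assert slow ≈ fast   # same result, very different cost
```

The same reasoning, applied to the left environment, local MPO tensors, and right environment instead of two matrices, is what takes the effective-Hamiltonian application from \chi^4 down to \chi^3.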

Hi Miles,

First of all, thank you for the clear and quick reply. Is something as "abstract" as ProjMPO absolutely necessary to reach the \chi^3 scaling? Could I redefine my function to do the contractions the way you propose?

Update:
Using something like:

@time vals, vecs = eigsolve(
    v -> noprime(v * localH * localR * localL),
    phi,
    1,
    :SR;
    ishermitian = ishermitian,
    tol = eigsolve_tol,
    krylovdim = eigsolve_krylovdim,
    maxiter = eigsolve_maxiter,
);

with properly defined local tensors localL, localH, and localR brought the time down to a manageable scaling. It is still not as good as ITensor, but now:

site 5, 6
0.120118 seconds (1.83 k allocations: 304.194 MiB, 0.00% compilation time)

Thanks for the help!

I definitely commend you for trying things out down to this level of detail and writing your own DMRG. It’s a good way to learn.

About the projected MPO: it's not really necessary to use that level of abstraction to get the best performance. We just coded it that way so our DMRG code can work on various cases, such as a single MPO, a "lazy" sum of MPOs, a sum of an MPO and a projector onto another MPS, etc. By introducing the abstraction of a "generalized operator" type, of which ProjMPO is one case, it's easier for us to extend the set of features without changing the internals of the dmrg function each time. Hope that clarifies things –
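The "generalized operator" idea can be sketched in a few lines of plain Julia (the names AbstractLocalOperator, MatrixOperator, SumOperator, and product are hypothetical illustrations, not ITensor's actual internals): any solver written against a single product function then works unchanged for every concrete operator type.

```julia
# Hypothetical sketch of a "generalized operator" interface.
abstract type AbstractLocalOperator end

# One concrete case: an operator stored as a dense matrix.
struct MatrixOperator <: AbstractLocalOperator
    M::Matrix{Float64}
end
product(op::MatrixOperator, v) = op.M * v

# Another case: a lazy sum of operators, applied term by term
# without ever forming the summed matrix.
struct SumOperator <: AbstractLocalOperator
    terms::Vector{AbstractLocalOperator}
end
product(op::SumOperator, v) = sum(product(t, v) for t in op.terms)

# A solver only needs the linear map v -> product(op, v):
apply(op::AbstractLocalOperator) = v -> product(op, v)
```

With such an interface, a call like eigsolve(apply(op), phi, 1, :SR) would not need to change when op switches from a single operator to a lazy sum, which is the extensibility benefit described above.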