In-place Addition


I am trying to determine if there is an in-place version of add or + implemented for MPS. I’ve got a preliminary version of my code working but in order to be performant I need to reduce the number of allocations in my “hot inner loop”. I figured a low-hanging fruit would be to start replacing all my operations with in-place equivalents.

I tried the typical . notation for broadcasting in Julia (some examples using iTensors shown here) but I get the following error (MWE included below): ERROR: ArgumentError: no valid permutation of dimensions

# define data from which to create MPS
n = 5 # number of sites
d = 2 # physical index size
N = d^n
L = 1
x = LinRange(0, L, N)
f = 2 * x
g = 3 * x

# Create MPS
i = siteinds(d, n)
fMPS = MPS(f, i)
gMPS = MPS(g, i)
cacheMPS = deepcopy(fMPS)

# sum and store result in fMPS
fMPS .+= gMPS # throws error 

# Or, store result in cache MPS
@. cacheMPS = fMPS + gMPS # same error

Should I take that to mean in-place MPS-MPS addition is not implemented and I should do it myself? I did take a look around the source for an add! and found 7 methods implemented in opsum_to_mpo_generic.jl but none of the method arguments appeared to be what I was looking for (or I might’ve just missed it).

It’s an interesting question. While in-place operations can be faster for some data structures, it’s not clear to me that in-place addition (however that would be defined) would be faster for MPS.

If you think of an MPS as a 1D array of tensors, then I suppose in-place addition for adding two MPS A and B would mean that instead of storing the new tensors (which it’s important to note could be of a different size and shape from those in A) temporarily in an MPS C, one would start overwriting the tensors of A.

I think to first order this would mostly just save memory (not having to store the other MPS C) and would not obviously be faster.

As far as speeding up your code, I would suggest doing some more profiling at a deeper level, meaning if your profiling does show that MPS addition is the bottleneck of your code, then which part of the MPS addition code inside of ITensor is the bottleneck of that bottleneck? Perhaps the optimization that is needed is something else and would not be solved by an in-place operation.

Hi Miles - thanks for the response.

I understand what you are saying. I think I was probably a little unclear. In the case of in-place MPS addition (e.g., something like A .+= B). I actually would be fine with using something like a cache MPS to store the result - something like C .= A + B.

The problem I am having is that the operation A + B allocates and happens several times in my innermost loop. For example I’ve tested it using @allocated C.= A + B and the result is non-zero. It seemed like a natural first step to remove/minimize allocations to improve my run time (which is quite bad currently, algorithm works but way too many allocations).

Please let me know if that makes sense.

Thanks for explaining a little more.

I think in general it’s going to be hard to make a useful enough in-place style addition for MPS, because one of the key aspects about MPS addition is that one does not know the bond dimensions of C until they are computed by an SVD or similar adaptive procedure. I guess one could always make C large enough that any A+B would fit, but then in general C would usually be too large. At any rate it would end up being a very special purpose thing I think.

One question back to you. I believe you that Julia is reporting a lot of allocations, but one key thing here is that Julia as a language allocates a huge amount in general but many of these allocations are very fast.

So my question is how do you know that allocations are what is driving the slower timings you are seeing? Is it because you are just seeing a large number being reported or is the profiling showing a lot of time is being spent doing allocations? Thanks.

Lastly, there may be other ways to speed up MPS addition, such as different algorithms than the one currently provided by ITensor, or other choices about how to use the algorithms. And are you adding many MPS at once and using a for loop to do it? Then there could be better ways to add many MPS at once. So a lot could depend on the context here.