Efficiency of changing indices

Hello all,

I am trying to write efficient code for large tensor contractions. Is there any difference in this aspect between changing indices with a delta() function or with swapind()?

For example, I might write something like

T = A*delta(i,j)*B*C

or

T = swapind(A,i,j)*B*C

or

swapind!(A,i,j)
T = A*B*C

I am guessing that, from the point of view of memory, the last one might be the best, but I am not sure how delta contractions are implemented and optimized. For example, in the first case, where I write T = A*delta(i,j)*B*C, is an intermediate tensor A*delta(i,j) allocated, occupying extra memory?

Cheers

There is basically no advantage to using swapind! over swapind; in either case, no new tensor data is allocated. In the case of swapind, a new ITensor is constructed, but it still points to the same underlying data, so the overhead is very small. In future versions, I plan to deprecate the in-place version (swapind!), since I think it is confusing to have multiple ways of doing nearly the same thing; it is a holdover from our port from C++ to Julia.

A * delta(i, j) does allocate new tensor data. I think that is difficult to avoid without introducing surprising behavior, given the semantics of memory management in Julia. For example:

A = random_itensor(j, k)
d = delta(i, j)
B = A * d
B .*= 2 # Does this modify `A` as well?

A = random_itensor(Float32, j, k)
d = delta(Float64, i, j)
B = A * d
B .*= 2 # What about now?

So, the rule of thumb I use is that if it becomes difficult to reason about whether the output aliases one of the inputs, and that can change based on seemingly benign changes to the input, it is safer to make a copy.

We could probably add a “mode” of contraction where users specify that they are OK with the output of a tensor contraction potentially being an alias of one of the inputs, e.g.:

@allow_alias begin
  B = A * d
end

That is similar to the allow_alias flag that we have in ITensors.permute (see the ITensors.jl documentation). (Miles will no doubt raise the point that if there were some form of reference counting or copy-on-write in Julia, we could handle this automatically to some extent, but that isn’t available in the language right now.)

Also, you should probably be using replaceind rather than swapind if you are just trying to replace one index with another; that is closer to what delta is doing.
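To make that concrete, here is a minimal sketch of the two options (assuming an ITensors.jl setup along the lines of the examples above; index names and dimensions are just for illustration):

```julia
using ITensors

i = Index(2, "i")
j = Index(2, "j")
k = Index(3, "k")
A = random_itensor(j, k)

# Option 1: contract with a delta tensor.
# This allocates new tensor data for the result.
B1 = A * delta(i, j)

# Option 2: replace the index directly.
# The new ITensor shares A's underlying data, so no tensor data is copied.
B2 = replaceind(A, j => i)
```

Both B1 and B2 end up with indices (i, k); the difference is only whether new tensor data gets allocated along the way.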


Finally, I would recommend using BenchmarkTools.jl (github.com/JuliaCI/BenchmarkTools.jl) to benchmark this yourself and compare the different options (and if you see something unexpected, let us know).

Thank you! So, do I understand correctly that, as convenient as A*delta(i,j) looks, it is always more efficient to use replaceind()?

Yes, but efficiency is relative, and I have rarely found cases where contracting with delta is a significant cost in an overall tensor network algorithm.

One way to think about it: copying a tensor is O(n) in the number of elements n, while contracting two tensors with n elements each, or factorizing a tensor with n elements, is O(n^(3/2)). So I would focus on whatever is easiest to code, and then benchmark/profile at the end to see what actually dominates the overall runtime. Also, you can often organize your code so that index replacements happen ahead of time as a setup step; in an iterative algorithm that involves many contractions/factorizations, the cost of index replacement then becomes even more insignificant.
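To put rough numbers on that rule of thumb (a back-of-the-envelope operation count, not a benchmark): treat a tensor with n elements as a √n × √n matrix, so copying touches each element once while a matrix multiply costs about n^(3/2) operations.

```julia
# Back-of-envelope operation counts for a tensor with n elements,
# viewed as a √n × √n matrix.
n = 10^6
copy_ops = n                       # copying touches each element once: O(n)
contract_ops = round(Int, n^1.5)   # √n × √n matrix multiply: O(n^(3/2))
println(contract_ops ÷ copy_ops)   # prints 1000: contraction dominates by a factor of √n
```

So for a million-element tensor, a single contraction costs on the order of a thousand times more operations than the copy made by contracting with delta, which is why the copy rarely shows up in profiles.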


Indeed, for my application I am more worried about memory efficiency. I have a loop with a line that might look like A*B*delta(i,j)*C*D*delta(k,l)*E*F*delta(m,n)..., and I am seeing quite a lot of garbage collection time right now. If I understood correctly, using delta() unnecessarily allocates memory that then needs to be cleaned up.

That’s fair; your application may indeed be one where using replaceinds gives a noticeable performance improvement over using delta. I was just making the broader point, to you and other forum users, that it is best not to overthink the distinction. In most cases I would put this under the category of “premature optimization”: I recommend focusing on writing your code in a simple and readable way, and you can always benchmark/profile it afterward to see if there are any unexpected performance bottlenecks.

