@Bob it’s a very good question and I’m surprised this is one of the first questions we have received about this!
In general, we already use function barriers effectively to ensure that tensor operations like contractions, additions/permutations, and decompositions, which constitute the vast majority of most tensor network algorithms, are fast. Basically, our design philosophy is that the NDTensors.jl library, where all of the tensor operations are ultimately implemented, is designed to be as type stable as possible. ITensors.jl code is designed to be more dynamic in terms of the types, so it is simpler and more convenient for writing high level tensor network code, and therefore by design is not type stable. However, it is designed to be a light wrapper around calls to type-stable NDTensors.jl operations, which creates a "function barrier": there are only a layer or two of dynamic dispatch, and then Julia can compile fast code for the actual tensor operations (using just-in-time (JIT) compilation) or call out to fast compiled functions in libraries like BLAS or LAPACK.
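To illustrate the general idea, here is a minimal sketch of the function barrier pattern in plain Julia. This is just a schematic (the type and function names are made up for illustration), not the actual ITensors.jl/NDTensors.jl code:

```julia
# Minimal sketch of a "function barrier" (schematic, not the actual
# ITensors.jl/NDTensors.jl code). The outer type hides the concrete
# storage type from the compiler, like ITensor does, so calls into it
# involve dynamic dispatch. The inner kernel sees concrete types, so
# Julia specializes and compiles fast code for it.
struct DynamicTensor
  storage::Any  # concrete type (e.g. Vector{Float64}) not known to the compiler
end

# Inner kernel: type stable, since the arguments have concrete types here.
add_kernel(x, y) = x .+ y

# Outer function: one dynamic dispatch to cross the barrier,
# then the fast specialized kernel does the real work.
add(a::DynamicTensor, b::DynamicTensor) = DynamicTensor(add_kernel(a.storage, b.storage))

a = DynamicTensor(randn(100))
b = DynamicTensor(randn(100))
c = add(a, b)
```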
So basically, the dynamic dispatch you see is at a higher level in the code (at the level of ITensors.jl), and you can generally think of dynamic dispatch as adding an overhead on the order of a microsecond per tensor operation. Here is a helpful benchmark that shows this:
```julia
using BenchmarkTools
using ITensors

function main(d)
  # Contract two random d×d ITensors and compare against the equivalent
  # raw matrix multiplication, which has no dynamic dispatch overhead.
  i, j, k = Index.((d, d, d))
  A = randomITensor(i, j)
  B = randomITensor(j, k)
  t_itensor = @belapsed $A * $B
  t_matrix = @belapsed $(Matrix(A, i, j)) * $(Matrix(B, j, k))
  return (; t_itensor, t_matrix)
end
```
If I run this with a few matrix dimensions, I see the following:

```julia
julia> main(10)
(t_itensor = 1.3916e-6, t_matrix = 3.6658653846153845e-7)

julia> main(20)
(t_itensor = 2.435111111111111e-6, t_matrix = 1.3583e-6)

julia> main(50)
(t_itensor = 1.6083e-5, t_matrix = 1.4875e-5)

julia> main(100)
(t_itensor = 3.3167e-5, t_matrix = 3.2041e-5)
```
You can see that tensor operations like A * B each have an overhead on the order of a microsecond, which quickly becomes unnoticeable for reasonably large dimensions (i.e. the time starts to match the time of doing the same operation as a matrix multiplication, which doesn't have any dynamic dispatch). For example, at d = 10 the difference is about a microsecond and dominates the runtime, while at d = 100 the difference is still about a microsecond but is only a few percent of the total. Most importantly, the overhead is constant in the tensor dimensions, so the larger the dimensions in your calculation, the less you will notice the overhead.
This overhead actually has a few sources. One is dynamic dispatch, as we discussed. Specifically, we unwrap the underlying tensor storage of the ITensors, which is not exposed to the compiler since the ITensor type is not parametrized, so Julia dispatches at runtime to the correct function calls in NDTensors.jl, like NDTensors.contract, based on the type of the storage (Dense, Diag, BlockSparse, etc.). The other source of overhead in the tensor contraction code is analyzing the indices of the ITensors involved in the contraction: dynamically determining which tensors need to be permuted in what way to perform the contraction, and determining the indices of the output tensor of the operation. The overhead of analyzing the tensor indices can often be larger than the dynamic dispatch overhead, especially for expressions with high order tensors, but it still won't be noticeable except for tensor contractions that involve tensors with relatively small index dimensions.
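As a rough illustration of that index analysis (again, just a sketch of the idea, not the actual NDTensors.jl contraction logic), for a pairwise contraction you need to work out which indices are shared between the two tensors and which ones remain in the output, and that analysis happens at runtime for every contraction:

```julia
using ITensors

i, j, k = Index.((2, 3, 4))
A = randomITensor(i, j)
B = randomITensor(j, k)

# The contracted indices are the ones A and B share, and the output
# indices are the ones unique to each tensor. Deciding this, along with
# how to permute A and B into a matrix-matrix multiplication, is done
# dynamically for each contraction and has a small constant cost.
shared = commoninds(A, B)    # contains j
output = noncommoninds(A, B) # contains i and k
```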
In practice, dynamic dispatch doesn't fundamentally cause problems, other than a constant overhead per operation. If you see that those overheads are an issue in practice for your application, please let us know! There are a few cases where I have seen the overhead of analyzing the indices become noticeable in the total computation time, for example an application that contracted an MPS with a very small bond dimension many times to perform a gradient descent. But it is pretty rare to come across that; most applications in practice involve tensor dimensions where any dynamic overhead is not noticeable, and there are ways we could try to optimize away some of the overhead internally if needed.
If needed, you can always drop down to the NDTensors.jl level and write code in terms of NDTensors operations and types like NDTensors.contract and NDTensors.Tensor. This will generally be type stable, and therefore have less dynamic overhead, at the expense of a less convenient interface. I've never needed to do that for writing high level tensor network code, but if you want to define a new low-level tensor operation (like operations for slicing tensor elements that we don't already have available at a high level), that would be the way to do it. Most notably, element-wise operations on ITensors that involve explicitly looping over individual tensor elements are generally too slow, but they are fast for NDTensors.Tensor objects.
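For example, here is a rough sketch of the kind of element-wise loop that is slow on an ITensor (each A[i => a, j => b] access goes through dynamic dispatch) but fast on a plain array; a loop over an NDTensors.Tensor would perform like the array version. This just illustrates the point above, using the same Matrix conversion as the benchmark earlier:

```julia
using ITensors

i, j = Index.((100, 100))
A = randomITensor(i, j)
M = Matrix(A, i, j)

# Slow: each element access on the ITensor goes through a layer of
# dynamic dispatch, so the per-element overhead dominates the loop.
function sum_itensor(A, i, j)
  s = 0.0
  for a in 1:dim(i), b in 1:dim(j)
    s += A[i => a, j => b]
  end
  return s
end

# Fast: the same loop over a plain Matrix (or an NDTensors.Tensor)
# compiles to tight, type-stable code.
function sum_matrix(M)
  s = 0.0
  for a in axes(M, 1), b in axes(M, 2)
    s += M[a, b]
  end
  return s
end
```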
Other Julia packages like DataFrames.jl take a similar design approach, trading off type stability of the DataFrame type for a more convenient and dynamic interface. You can see a discussion about that choice here: Why DataFrame is not type stable and when it matters | Blog by Bogumił Kamiński.