EDIT: For the latest information about GPU support in the Julia version of ITensor, take a look at the documentation page Running on GPUs · ITensors.jl.
We are happy to announce an initial release of our new GPU backends for ITensors.jl. A lot of credit goes to Karl Pierce (@kmp5) for his recent work making our tensor operations code in NDTensors.jl more generic to the data backend so that our GPU code is more integrated into the rest of the library, and @ryanlevy for helping out with initial testing and benchmarks. The new code design in NDTensors.jl will make it much easier to add support for new GPU backends as well as other backends like tensors with distributed data or non-abelian symmetries.
The new GPU backends make use of the new package extension system in Julia, and are a rewrite of, and successor to, the ITensorGPU.jl package spearheaded by Katie Hyatt.
Rather than living in a separate package, the new GPU backends can be used simply by loading the desired Julia GPU package alongside ITensors.jl and then running ITensor operations on GPU. For example, you could load CUDA.jl to perform tensor operations on NVIDIA GPUs or Metal.jl to perform tensor operations on Apple GPUs:
```julia
using ITensors

i, j, k = Index.((2, 2, 2))
A = randomITensor(i, j)
B = randomITensor(j, k)
# Perform tensor operations on CPU
A * B

###########################################

using CUDA # This will trigger the loading of `NDTensorsCUDAExt` in the background
# Move tensors to NVIDIA GPU
Agpu = cu(A)
Bgpu = cu(B)
# Perform tensor operations on NVIDIA GPU
Agpu * Bgpu

###########################################

using Metal # This will trigger the loading of `NDTensorsMetalExt` in the background
# Move tensors to Apple GPU
Agpu = mtl(A)
Bgpu = mtl(B)
# Perform tensor operations on Apple GPU
Agpu * Bgpu
```
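Once a computation is done, results can be moved back to host memory as well. Here is a minimal sketch, assuming the `cpu` conversion function defined in NDTensors (the counterpart of `cu`/`mtl`), which converts an ITensor's storage back to a CPU array:

```julia
using ITensors
using NDTensors: cpu # assumed here; converts GPU storage back to host memory
using CUDA

i, j, k = Index.((2, 2, 2))
Agpu = cu(randomITensor(i, j))
Bgpu = cu(randomITensor(j, k))

# Contract on the GPU, then copy the result back to CPU storage,
# e.g. before saving to disk or passing it to CPU-only code
Cgpu = Agpu * Bgpu
C = cpu(Cgpu)
```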
The current status of the GPU backends is:
- CUDA.jl (NVIDIA GPUs): Best supported, CUDA generally has the most functionality such as a full suite of matrix factorizations. Currently ITensor operations are just using cuBLAS/cuLAPACK but we plan to add optional support for using cuTENSOR to perform faster tensor contractions.
- Metal.jl (Apple GPUs) (UPDATED): Basic operations like ITensor contraction and addition work on GPU. Matrix factorizations like SVD, QR, and eigendecompositions aren't implemented by Metal.jl/Apple yet, so they are currently performed on CPU in ITensor operations.
- AMDGPU.jl (AMD GPUs) (UPDATED): Supported at a similar level as Metal.jl: contraction and addition are performed on GPU while matrix factorizations are performed on CPU. (Support was added in [NDTensors] Add `AMDGPU.jl` (ROCm) based extension for NDTensors by wbernoudy · Pull Request #1325 · ITensor/ITensors.jl · GitHub.)
- oneAPI.jl (Intel GPUs): Not supported yet, but basic support should be easy to add, up to limitations in what operations are available through oneAPI.jl.
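To illustrate the factorization support concretely, here is a hedged sketch of taking the SVD of a GPU-backed ITensor. With the CUDA backend the factorization should run on-device, while with Metal the data would currently be round-tripped through the CPU; the index sizes below are arbitrary:

```julia
using ITensors
using CUDA

i, j = Index.((10, 10))
Agpu = cu(randomITensor(i, j))

# SVD of a GPU ITensor, factoring with `i` as the row index
U, S, V = svd(Agpu, i)

# Check that the factorization reproduces the original tensor
@show norm(U * S * V - Agpu)
```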
See the JuliaGPU organization website for a list of GPU packages being developed in Julia.
Dense ITensor operations are best supported at the moment, though block sparse operations on GPU mostly work for the supported backends, and we will continue developing and testing them in the near future.
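Block sparse operations can be tried in the same way: construct QN-conserving ITensors as usual and move them to GPU with the same conversion functions. A minimal sketch (the QN-index construction is standard ITensors.jl; how completely each backend covers block sparse operations may vary, as noted above):

```julia
using ITensors
using CUDA

# QN-conserving indices produce block sparse ITensor storage
i = Index([QN(0) => 2, QN(1) => 2], "i")
j = Index([QN(0) => 2, QN(1) => 2], "j")

A = randomITensor(i, dag(j))
B = randomITensor(j, dag(i)')

# Move the block sparse tensors to GPU and contract over `j`
Agpu = cu(A)
Bgpu = cu(B)
Agpu * Bgpu
```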
The package extension-based GPU backends are available in the latest ITensors.jl release (as of v0.3.45). Keep in mind that they are still fairly new, so there may be some rough edges, but full DMRG runs using the CUDA backend work and show good speedups over CPU, so we are confident they are ready for fairly sophisticated work. We wanted to announce this now because the ITensorGPU.jl backend has been in a state of limbo: it is only compatible with older versions of NDTensors.jl/ITensors.jl (because of the big changes we have been making to NDTensors.jl in preparation for this work), so we wanted to make the new GPU backends available as soon as possible so that users can run on GPUs with the latest versions of ITensor. In general, we strongly advise using the latest versions of the packages, since we are regularly making improvements and fixing bugs.
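For reference, adapting an existing DMRG calculation to the CUDA backend amounts to converting the MPO and the initial MPS before calling `dmrg`. A hedged sketch, with an illustrative Heisenberg model and arbitrary sweep parameters:

```julia
using ITensors
using CUDA

N = 20
sites = siteinds("S=1/2", N)

# Nearest-neighbor Heisenberg Hamiltonian as an MPO
os = OpSum()
for n in 1:(N - 1)
  os += "Sz", n, "Sz", n + 1
  os += 0.5, "S+", n, "S-", n + 1
  os += 0.5, "S-", n, "S+", n + 1
end
H = MPO(os, sites)
psi0 = randomMPS(sites; linkdims=10)

# Move the Hamiltonian and the initial state to the GPU
Hgpu = cu(H)
psi0gpu = cu(psi0)

# DMRG then runs its tensor operations on the GPU
energy, psi = dmrg(Hgpu, psi0gpu; nsweeps=5, maxdim=[10, 20, 100], cutoff=1e-10)
```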
A big part of this effort was making the ITensor tensor operation backend code, NDTensors.jl, more generic and better organized so that it mostly βjust worksβ when new data types are provided as the backend of an ITensor, given a relatively minimal set of definitions. For example, see the code for the NDTensorsCUDAExt, which is all that is needed to make the CUDA backend work with most ITensor operations. This also relies heavily on the wonderful work done in packages like CUDA.jl to make arrays on GPU work nearly as seamlessly as those on CPU. We hope to formalize a minimal interface for implementing a new data backend for ITensors once the dust settles on our rewrite of NDTensors.jl and we have more backends implemented.
Some things we plan to do in the future related to the GPU backends are:
- better support for more GPU backends,
- fine tuning the performance on GPU as we test out the backends more,
- improving support for block sparse operations on GPU, and
- adding optional support for using cuTENSOR for faster tensor contractions and permutations on NVIDIA GPUs.
Some bigger changes we plan to make are a full revamp of the ITensor storage system, where we plan to get rid of the storage types EmptyStorage, Dense, BlockSparse, Diag, etc. in favor of having ITensors more directly wrap arbitrary Julia AbstractArray types. Some of the current storage types will become normal Julia array types living outside of NDTensors/ITensors, such as BlockSparseArray and DiagonalArray. These changes will mostly happen in the background and are not expected to break any high-level ITensor code (advanced users reaching into internals and using NDTensors directly may be affected, but hopefully for the better in the long run!). This will help make the NDTensors.jl library much simpler and more modular, and make it easier to implement new storage backends for advanced use cases, such as supporting tensors with non-abelian symmetries, which we are currently planning and designing, as well as distributed tensor data, isometric tensor types, and many other ideas we have. It will also make it easier to develop NDTensors.jl, fix bugs that come up, and make performance improvements.
Please try it out and let us know if you come across any issues (either bugs or performance issues), either here or on GitHub. We still need to document these features as well, though for now that may take a back seat to rounding out the features, like finalizing support for block sparse operations on GPU.