Hi,
I’m running a basic TDVP code using ITensor, and I believe it is showing extreme RAM usage. For example, a calculation with 140 sites and maxdim set to 1024 reports the following (the timing/allocation lines are from @time):
Info: TDVP time : 2.0
After sweep 1: maxlinkdim=330 maxerr=9.99E-13 current_time=0.0 - 0.125im time=46.038
After sweep 2: maxlinkdim=338 maxerr=9.99E-13 current_time=0.0 - 0.25im time=46.171
92.210864 seconds (414.42 M allocations: 264.396 GiB, 6.78% gc time)
Info: TDVP time : 2.25
After sweep 1: maxlinkdim=348 maxerr=1.00E-12 current_time=0.0 - 0.125im time=47.948
After sweep 2: maxlinkdim=357 maxerr=9.99E-13 current_time=0.0 - 0.25im time=49.685
97.634782 seconds (419.41 M allocations: 287.083 GiB, 5.80% gc time)
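For context, the driver is roughly of the following form (a simplified sketch rather than my exact script; the exact tdvp keyword names, e.g. time_step vs. nsweeps, differ between ITensorTDVP.jl / ITensorMPS.jl versions, so treat this as schematic):

```julia
using ITensors, ITensorMPS   # or ITensorTDVP.jl, depending on the version

# H::MPO and psi::MPS are constructed elsewhere for the 140-site fermionic model.
dt     = 0.125      # sweep time step (imaginary direction, as in the log above)
nsteps = 10         # illustrative number of outer time steps
for step in 1:nsteps
    @info "TDVP time : $(0.25 * step)"
    @time psi = tdvp(H, -0.25im, psi;
                     time_step   = -im * dt,  # two sweeps per @time block
                     maxdim      = 1024,
                     cutoff      = 1e-12,
                     outputlevel = 1)         # prints the "After sweep ..." lines
end
```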
From a very crude estimate (the tensors are block-sparse fermionic tensors), in the worst case where the tensors are completely saturated I would expect

$$16\,\text{B (ComplexF64)} \times 1024^2\,\text{(bond dim)} \times 2\,\text{(physical dim)} \times 140\,\text{(sites)} \sim 5\,\text{GB}$$

of memory usage, which is nowhere near the allocations reported above.
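Spelling out that arithmetic (purely the worst-case dense count, ignoring block sparsity):

```julia
# Worst-case dense MPS storage: one ComplexF64 per tensor element,
# bond dimension 1024 on both virtual legs, physical dimension 2, 140 sites.
chi = 1024                      # maxdim
d   = 2                         # physical dimension
N   = 140                       # number of sites
bytes = sizeof(ComplexF64) * chi^2 * d * N
println(bytes / 2^30, " GiB")   # ≈ 4.4 GiB, i.e. roughly 5 GB
```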
Or, if the concern is the cost of the eigendecomposition, etc., the effective local Hamiltonian should only have dimension $\sim (1024 \times 2)^2$, which shouldn’t be expensive at all for something like ARPACK or other iterative solvers.
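For scale, even storing that effective Hamiltonian as a dense matrix (which an iterative solver never actually needs to do) would be modest compared to the GiB-scale allocations above:

```julia
# Dense (1024*2) x (1024*2) complex matrix, just for a size comparison.
dim = 1024 * 2
println(dim^2 * sizeof(ComplexF64) / 2^20, " MiB")   # 64 MiB
```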
The output above is from my personal computer with a single thread. On the cluster I run the same code with multithreading (the -t N option), but I don’t think that changes the nature of the problem, since the threads share memory. (The cluster jobs mostly crash with an out-of-memory error for exactly the same simulation parameters when given 16 GB of memory.)
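Inside the cluster job I check the thread setup with something like this, just as a diagnostic (in case BLAS threading turns out to be relevant):

```julia
using LinearAlgebra
@show Threads.nthreads()       # Julia threads from the -t N option
@show BLAS.get_num_threads()   # BLAS threads; these also share the job's memory
```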
Is there something I’m missing here? I’ve been following the HPC guide’s advice about not re-allocating before GC kicks in, etc.
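Roughly, I interpreted that advice as forcing a collection between time steps, i.e. something like:

```julia
# My reading of that advice: force a full collection between time steps
# so Julia returns memory before the next tdvp call (sketch).
GC.gc()
```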