Large memory usage when performing TDVP on a cluster

Hello!
I am trying to use GSE-TDVP (global subspace expansion TDVP) to perform a real-time evolution.

I perform a subspace expansion before every time step of 1-site TDVP until the maximum bond dimension reaches the upper bound (see the code below).

My Hamiltonian is just a one-dimensional Bose-Hubbard model without long-range interactions, and the evolution starts from a product state.
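For reference, here is a minimal sketch of how such a model can be set up in ITensors.jl (the values of N, d, t, and U below are placeholders for illustration, not my actual parameters):

    using ITensors

    N, d = 24, 4        # chain length and local boson cutoff (placeholders)
    t, U = 1.0, 2.0     # hopping and on-site interaction (placeholders)

    s = siteinds("Boson", N; dim=d)

    os = OpSum()
    for j in 1:(N - 1)
        os += -t, "Adag", j, "A", j + 1    # nearest-neighbor hopping
        os += -t, "Adag", j + 1, "A", j
    end
    for j in 1:N
        os += U / 2, "N", j, "N", j        # on-site U/2 * n * (n - 1)
        os += -U / 2, "N", j
    end
    HAL = MPO(os, s)

    psi = MPS(s, ["1" for _ in 1:N])       # product state, one boson per site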

When the chain length is 24 and I set the upper bound to 600, every time step of 1-site TDVP allocates about 28 GB of memory once the bond dimension has reached the upper bound.

After sweep 1: maxlinkdim=602 maxerr=0.00E+00 current_time=0.0 - 0.05im time=24.817
 24.818610 seconds (2.61 M allocations: 28.250 GiB, 1.06% gc time)
tsofar = 100.0
maxlinkdim(psi) = 602
Size of psi: 34220216 bytes 

But disaster strikes when I increase the chain length to 36 and the bond-dimension upper bound to 800: TDVP uses about 108 GB of memory, which results in an out-of-memory error, and the jobs on the cluster get killed (my cluster nodes have about 256 GB of RAM).

After sweep 1: maxlinkdim=819 maxerr=0.00E+00 current_time=0.0 - 0.05im time=122.393
122.396214 seconds (4.70 M allocations: 120.135 GiB, 2.65% gc time)
Free memory before GC is 44.067 GiB
Free memory after GC is 44.088 GiB
tsofar = 5.1000000000000005
maxlinkdim(psi) = 819
Size of psi: 140848968 bytes
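For comparison, a crude upper bound on the memory of the MPS itself, assuming every bond saturates the maximum bond dimension, local dimension 4, and 16-byte complex entries (just an estimate, not ITensors' exact storage layout):

    mps_bytes(N, chi, d) = N * chi^2 * d * 16

    mps_bytes(24, 602, 4) / 2^30   # ~0.5 GiB bound for the chain of 24
    mps_bytes(36, 819, 4) / 2^30   # ~1.4 GiB bound for the chain of 36

So the state itself is tiny compared to the figures above; as far as I understand, the GiB number printed by @time is the total amount allocated during the sweep, not the peak held at any one moment, so the allocations must come from intermediate tensors.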

I call GC.gc() to free memory after every time step of 1-site TDVP, as suggested in the thread "Large memory issue in time evolution using ITensors.jl with apply function" on ITensor Discourse, but it doesn't help much.
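The free-memory lines in the log above come from something like this (a sketch):

    free_gib() = Sys.free_memory() / 2^30

    println("Free memory before GC is $(round(free_gib(); digits=3)) GiB")
    GC.gc(true)   # force a full collection after each time step
    println("Free memory after GC is $(round(free_gib(); digits=3)) GiB")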

My Julia version is 1.8.5.

So, is there any way to handle this? It has been bothering me for a long time.
Any help would be greatly appreciated!

if maxlinkdim(psi) < max
    # Global subspace expansion: build kdim states by repeatedly applying
    # the evolution gates, then enlarge the basis of psi with them.
    phis = Vector{MPS}(undef, kdim)
    for j in 1:kdim
        prev = j == 1 ? psi : phis[j - 1]
        # phis[j] = prev - im * tstep * apply(HAL, prev; cutoff=cutoff1, method=met)
        phis[j] = apply(gates, prev; cutoff=cutoff1, method=met)
        normalize!(phis[j])
    end
    psi = ITensorTDVP.extend(psi, phis; cutoff=cutoff2)
end
phis = 0   # drop the reference so the expansion states can be freed

# One sweep of 1-site TDVP at the enlarged bond dimension
Nsite = 1
@time psi = tdvp(
    HAL,
    -im * tstep,
    psi;
    nsweeps=1,
    maxdim=max,
    cutoff=0,
    nsite=Nsite,
    outputlevel=1,
    normalize=true,
    # solver_backend="applyexp",
)
GC.gc(true)

I thought the problem might be caused by competition between different computation jobs on the same cluster node, which share the same RAM. So I want to ask: is there any way to control or constrain the resources the tdvp function uses in Julia or ITensors?

If your job is sharing resources with another one on the same node, then that can definitely constrain the amount of RAM available and lead to crashes. If possible, you should set up your job request to ask for an entire node.

Regarding ways to make Julia ITensor codes (hopefully) use less memory, we recently added a short guide here:
https://itensor.github.io/ITensors.jl/dev/faq/HPC.html
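For example, along the lines of that guide, limiting linear-algebra multithreading can reduce memory pressure when jobs share a node (a sketch; please check the guide for the current recommendations):

    using ITensors, LinearAlgebra

    BLAS.set_num_threads(1)              # avoid oversubscribing BLAS threads
    ITensors.Strided.disable_threads()   # turn off Strided.jl multithreading

Also, newer Julia versions (1.9 and later, so not 1.8.5) accept a --heap-size-hint flag, e.g. julia --heap-size-hint=100G, which makes the garbage collector collect more aggressively as the heap approaches that size.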