Memory usage in DMRG with Julia 1.x

I came across some problems with memory usage when running DMRG in Julia and wanted to share my findings (similar to the discussions in "Large amount of memory used, when dmrg runs on cluster" and "memory usage in dmrg (julia)").

  • tl;dr: add GC.gc() to an observer, use a small --heap-size-hint, or use a Julia version other than 1.9

Experiment

I created a test case of DMRG with no quantum numbers on a 16-site non-interacting chain of electrons (H=\sum_{\langle i,j\rangle,\sigma} c^\dagger_{i\sigma} c_{j\sigma}). I slowly increase the bond dimension over 15 sweeps:

Sweeps
1 cutoff=1.0E-16, maxdim=64, mindim=1, noise=1.0E-03
2 cutoff=1.0E-16, maxdim=64, mindim=1, noise=1.0E-04
3 cutoff=1.0E-16, maxdim=64, mindim=1, noise=1.0E-08
4 cutoff=1.0E-16, maxdim=64, mindim=1, noise=1.0E-08
5 cutoff=1.0E-16, maxdim=128, mindim=1, noise=1.0E-08
6 cutoff=1.0E-16, maxdim=128, mindim=1, noise=1.0E-08
7 cutoff=1.0E-16, maxdim=128, mindim=1, noise=1.0E-08
8 cutoff=1.0E-16, maxdim=128, mindim=1, noise=1.0E-08
9 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
10 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
11 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
12 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
13 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
14 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08
15 cutoff=1.0E-16, maxdim=256, mindim=1, noise=1.0E-08

I record the peak memory usage (maxRSS) with the code below.

Code
# see https://github.com/JuliaLang/julia/blob/master/test/netload/memtest.jl
struct RUsage
    ru_utime_sec::Clong         #  user CPU time used
    ru_utime_usec::Clong        #  user CPU time used
    ru_stime_sec::Clong         #  system CPU time used
    ru_stime_usec::Clong        #  system CPU time used
    ru_maxrss::Clong            #  maximum resident set size
    ru_ixrss::Clong             #  integral shared memory size
    ru_idrss::Clong             #  integral unshared data size
    ru_isrss::Clong             #  integral unshared stack size
    ru_minflt::Clong            #  page reclaims (soft page faults)
    ru_majflt::Clong            #  page faults (hard page faults)
    ru_nswap::Clong             #  swaps
    ru_inblock::Clong           #  block input operations
    ru_oublock::Clong           #  block output operations
    ru_msgsnd::Clong            #  IPC messages sent
    ru_msgrcv::Clong            #  IPC messages received
    ru_nsignals::Clong          #  signals received
    ru_nvcsw::Clong             #  voluntary context switches
    ru_nivcsw::Clong            #  involuntary context switches
end

function get_vmsize()
    ru = Vector{RUsage}(undef, 1)
    # RUSAGE_SELF = 0: query resource usage for the current process
    ccall(:getrusage, Cint, (Cint, Ptr{Cvoid}), 0, ru)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS)
    return ru[1].ru_maxrss
end
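
For reference, here is a minimal sketch of roughly what the test does. This is not my exact script; the operator terms, sweep setters, and the initial-state linkdims are my reconstruction from the setup described above:

using ITensors

# 16 electron sites, no quantum number conservation
N = 16
sites = siteinds("Electron", N; conserve_qns=false)

# Nearest-neighbor hopping for both spin species (non-interacting chain)
os = OpSum()
for i in 1:(N - 1)
  os += "Cdagup", i, "Cup", i + 1
  os += "Cdagup", i + 1, "Cup", i
  os += "Cdagdn", i, "Cdn", i + 1
  os += "Cdagdn", i + 1, "Cdn", i
end
H = MPO(os, sites)

# Sweep schedule from the table above; the last value repeats for later sweeps
sweeps = Sweeps(15)
setmaxdim!(sweeps, 64, 64, 64, 64, 128, 128, 128, 128, 256)
setmindim!(sweeps, 1)
setcutoff!(sweeps, 1e-16)
setnoise!(sweeps, 1e-3, 1e-4, 1e-8)

psi0 = randomMPS(sites; linkdims=10)  # linkdims here is a guess
energy, psi = dmrg(H, psi0, sweeps)
println("maxRSS (kB): ", get_vmsize())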

This machine has 188 GB of RAM, so there is plenty of room, and there's no Slurm involved.

Results

I ran the test above on several Julia versions (1.8, 1.9, 1.10-beta).
Essentially, if you use the latest release version of Julia (1.9.2), the peak memory usage (maxRSS) continues to grow, compared to both previous and newer versions.


1.8.5 uses the least memory, and 1.10-beta uses somewhat more (a trade-off for faster compilation, I believe), while 1.9.2 slowly explodes. In my production code with 1.9.2 the total RAM usage ends up at > 1 TB after enough time, so don't let the small sizes here fool you.

Possible Solutions

First, and probably the easiest: you can add an observer that runs garbage collection once per sweep

using ITensors

struct GCObserver <: AbstractObserver end

function ITensors.measure!(o::GCObserver; kwargs...)
  bond = kwargs[:bond]
  half_sweep = kwargs[:half_sweep]
  # Run garbage collection once per sweep, at the end of the second half-sweep
  (bond == 1 && half_sweep == 2) && GC.gc()
end
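
Then pass it to dmrg via the observer keyword, something like (variable names as in the sketch above):

obs = GCObserver()
energy, psi = dmrg(H, psi0, sweeps; observer=obs)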

I only tried the above, but let me know if a different frequency than once per sweep works better for you.

That seems to work well and reduced the memory usage significantly, but I have not benchmarked the total time differences for larger problems.
The other solution is the often-recommended --heap-size-hint flag. I find that you really want this number to be smaller than your total memory so that garbage collection actually runs (purple line vs. red line), especially if the environment is very RAM-sensitive.
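
For example, on this 188 GB machine something like the following keeps the heap comfortably below the physical limit (the hint value and script name are just illustrations):

julia --heap-size-hint=100G my_dmrg_script.jl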


Thanks for the in-depth report, Ryan! Really helpful.

Did you find a Julia GitHub issue or Discourse post related to this? If not, we should ask about it on the Julia Discourse and, if there isn't an issue already, consider raising one on Julia's GitHub repository.

May be related to:

which should be fixed by:

which will be included in Julia 1.10 (which just had a new beta release: Julia v1.10.0-beta2 is now available - Announcements - Julia Programming Language):
https://github.com/JuliaLang/julia/blob/v1.10.0-beta2/NEWS.md#compilerruntime-improvements

It looks like it may not be backported to Julia 1.9, so users should either upgrade to Julia 1.10, use Julia 1.6, 1.7, or 1.8, or, if staying on Julia 1.9, manually call the garbage collector as Ryan summarized in his first post.

Thanks for following up on this! Looks like 1.10-beta2 is definitely more sensible with the default GC.


Since 1.10 has a release candidate out and a few versions have passed, I re-ran the benchmark above.

Looks like 1.10 is going to be great! (There is nothing new in NEWS.md about this.)

I also managed to plot some timing data; this is only 6 sweeps at bond dimension 256:
[plot: timing data, 6 sweeps at bond dimension 256]

So there is a small (~25%) performance hit from all of this, though this benchmark is probably below the "BLAS limit" where dense matrix operations dominate the runtime.


That’s great, thanks for tracking that @ryanlevy! Seems like they are making a lot of improvements to the GC.

Curious if the new multithreaded garbage collector also helps DMRG: Julia 1.10-rc1 gives a huge speed-up over Julia 1.9 - General Usage - Julia Programming Language, Multi-Threading · The Julia Language.
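
For anyone who wants to experiment, in 1.10 the number of GC threads can be chosen at startup, e.g. (the thread count and script name here are illustrative):

julia --gcthreads=4 my_dmrg_script.jl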


It will also be interesting to test out alternative GC backends once that interface becomes available, i.e. Introduce an interface for external GC backends by d-netto · Pull Request #51788 · JuliaLang/julia · GitHub.