Out of memory problem

Hi there,
I have a question about running out of memory. I'm running DMRG for an extended Hubbard model on a 4-leg ladder (Nx=20), with Hamiltonian H = -t \sum_{\langle rr'\rangle,\sigma} (c^\dag_{r\sigma} c_{r'\sigma} + h.c.) + U \sum_r n_{r\uparrow} n_{r\downarrow} + V \sum_{\langle rr'\rangle,\sigma\sigma'} n_{r\sigma} n_{r'\sigma'}. I started with bonddim=1024 for 150 sweeps, and now I'm trying to increase bonddim to 4000-5000 for more sweeps. I'm already using the write-to-disk feature,
energy, psi = dmrg(H, psi0, sweeps; write_when_maxdim_exceeds=1000),
but I still get an out-of-memory error. Is there any other way I can solve this problem?
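For reference, here is roughly how I build the Hamiltonian and call dmrg (a simplified sketch, not my exact script; the couplings t, U, V, the lattice helper, and the initial state below are placeholders):

```julia
using ITensors  # on recent ITensors.jl versions, also `using ITensorMPS`

# Extended Hubbard model on a 4-leg ladder (Nx=20, Ny=4)
Nx, Ny = 20, 4
N = Nx * Ny
t, U, V = 1.0, 8.0, 1.0  # placeholder couplings

sites = siteinds("Electron", N; conserve_qns=true)
lattice = square_lattice(Nx, Ny; yperiodic=false)

os = OpSum()
for b in lattice
  # hopping plus Hermitian conjugate, for both spin species
  os += -t, "Cdagup", b.s1, "Cup", b.s2
  os += -t, "Cdagup", b.s2, "Cup", b.s1
  os += -t, "Cdagdn", b.s1, "Cdn", b.s2
  os += -t, "Cdagdn", b.s2, "Cdn", b.s1
  # nearest-neighbor density-density interaction: V * n_r * n_r'
  os += V, "Ntot", b.s1, "Ntot", b.s2
end
for j in 1:N
  # on-site Hubbard interaction: U * n_up * n_dn
  os += U, "Nupdn", j
end
H = MPO(os, sites)

sweeps = Sweeps(150)
setmaxdim!(sweeps, 1024)
setcutoff!(sweeps, 1e-8)

# simple alternating product state as a starting point (placeholder)
psi0 = randomMPS(sites, [isodd(n) ? "Up" : "Dn" for n in 1:N]; linkdims=10)

energy, psi = dmrg(H, psi0, sweeps; write_when_maxdim_exceeds=1000)
```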

Thank you so much and happy new year!

Thanks for the question. Depending on the details, there may be some things you can do, but there also may not be (within the standard approaches available). It’s good that you’re already using the write_when_maxdim_exceeds feature. Do you see that it helps?

The main questions I’d have before investigating further are:

  • how much total RAM does your computer have?
  • is it on a cluster machine, and if so is your process the only one using the machine (and only one such process)?
  • how much RAM is the DMRG calculation using toward the end, before it fails? On Linux you can use the command “free -g” to check this (the -g means “in gigabytes”); see the sketch after this list for a way to check from within Julia.
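If it's easier, you can also do a quick check from inside the Julia session itself. These are standard Julia Base/Sys calls (note that on a shared cluster node the total/free numbers refer to the whole node, not just your job's allocation):

```julia
# Quick memory check from within Julia (no extra packages needed)
println("Total RAM on the node:          ", Base.format_bytes(Sys.total_memory()))
println("Free RAM on the node:           ", Base.format_bytes(Sys.free_memory()))
println("Peak RSS of this Julia process: ", Base.format_bytes(Sys.maxrss()))
```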

Unfortunately, for large enough maxdims, most state-of-the-art DMRG codes ultimately just require a computer with a lot of RAM as the main “solution” to high memory usage. (There are some more sophisticated research efforts involving distributing tensors across different machines, but those are not yet widely available or in a general-purpose form.)

Thanks for your reply!
write_when_maxdim_exceeds does help! But it seems it’s not enough: before using this feature I couldn’t complete even one sweep at bonddim=5000, and with it I can do 3-5 sweeps before running out of memory, but my system hasn’t converged yet.

And I’m running on the Sherlock cluster (so there are tons of jobs running at the same time). I requested 128G of RAM for my job, and the error message I receive is:
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=7510091.0. Some of your processes may have been killed by the cgroup out-of-memory handler.

So my job was killed once it used more than 128G, I guess (maybe I should try requesting more RAM for the job). Also, I’m not sure how to check how much RAM DMRG uses in each sweep on the cluster. I’m trying to use the Julia profiler to extract this information and will let you know!

Thanks!

Thanks for the helpful info. In response to your question, we’ve come up with a nice way to directly print out the amount of memory a DMRG calculation is using. Please see the code example here:

https://itensor.github.io/ITensors.jl/dev/examples/DMRG.html#Monitoring-the-Memory-Usage-of-DMRG

and let me know if you have any questions about it.
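For convenience, here is a minimal sketch in the spirit of that example: a custom observer that prints how much memory the MPS is using at the end of each sweep. It assumes the observer interface passes the keyword arguments psi, sweep, bond, and half_sweep to measure! (see the linked page for the full version):

```julia
using ITensors

# Observer that reports the size of the MPS after every full sweep
mutable struct SizeObserver <: AbstractObserver end

function ITensors.measure!(o::SizeObserver; bond, half_sweep, sweep, psi, kwargs...)
  # measure! is called at every bond; only print once per sweep,
  # when the second half-sweep returns to bond 1
  if bond == 1 && half_sweep == 2
    psi_size = Base.format_bytes(Base.summarysize(psi))
    println("After sweep $sweep, psi is using $psi_size of memory")
  end
end

# Pass the observer to dmrg through the `observer` keyword, e.g.:
# energy, psi = dmrg(H, psi0, sweeps; observer=SizeObserver(), write_when_maxdim_exceeds=1000)
```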
