I am cross-checking some of my VUMPS results with finite DMRG, and I am having difficulty storing the MPS so that I can restart the calculation (my cluster has a 24hr walltime, so this is essential).
I do not believe this is related to the memory issues from older Julia versions, as mentioned here: I already use the heapsize keyword and have implemented that same observer, and neither solves the problem.
The way I am saving is based on this very helpful discussion. To be clear, my code is:
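using ITensors, ITensorMPS
using JLD2  # assumed source of `save`, since the checkpoints are .jld2 files

# Observer that writes a checkpoint at the end of every sweep.
struct DMRGSaver <: AbstractObserver
    filepath::String
    checkpoint::String  # path of the .jld2 checkpoint file
end

function ITensorMPS.checkdone!(o::DMRGSaver; kwargs...)
    println("Energy per site = $(real(kwargs[:energy]) / length(siteinds(kwargs[:psi])))")
    attrs = Dict("ψ" => kwargs[:psi], "energy" => kwargs[:energy], "currentSweep" => kwargs[:sweep])
    GC.gc()
    save(o.checkpoint, attrs)
    attrs = 0  # drop the reference to the dictionary before collecting again
    GC.gc()
    return false  # never request an early stop
end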
I have tried some naive things, like calling GC.gc() explicitly and rebinding the dictionary to zero to free memory, but that does not solve anything.
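To narrow down whether the growth is in live Julia objects or in memory that the process holds on to, the save call could be bracketed with a few memory readings. The following is only a minimal sketch: `log_memory` is a hypothetical helper (not part of ITensorMPS or JLD2), and `payload` is a small stand-in for the `attrs` dictionary so that the snippet runs on its own.

```julia
# Hypothetical helper: print and return a few memory figures with a label.
function log_memory(label::AbstractString; io::IO=stdout)
    live_gib = Base.gc_live_bytes() / 2^30  # bytes of live objects tracked by Julia's GC
    rss_gib = Sys.maxrss() / 2^30           # peak resident set size of this process
    free_gib = Sys.free_memory() / 2^30     # free memory as reported by the OS
    println(io, "[$label] gc_live = $(round(live_gib; digits=3)) GiB, ",
            "max_rss = $(round(rss_gib; digits=3)) GiB, ",
            "free = $(round(free_gib; digits=3)) GiB")
    return (; live_gib, rss_gib, free_gib)
end

# Stand-in for the checkpoint dictionary; in the observer this would be `attrs`.
payload = Dict("ψ" => rand(10^6), "energy" => -1.0, "currentSweep" => 1)
println("payload size in MiB: ", Base.summarysize(payload) / 2^20)

before = log_memory("before save")
# save(o.checkpoint, attrs) would go here in the real observer
GC.gc()
after = log_memory("after save + GC")
```

If `gc_live` stays flat across sweeps while the peak RSS keeps climbing, that would point away from the dictionary itself and towards buffers allocated during the write, though this is a guess until measured.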
I am attaching an image of the memory usage over time. The .jld2 files are around 18GB, so it's not unbelievable that I would have memory issues, but with 200GB on a node, I am hoping I can find a way around this problem. I'm open to trying many things and happy to take suggestions if there's something I'm doing incorrectly.
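One related precaution for the restart itself, given the walltime limit: if the job is killed while a large checkpoint is being written, the only checkpoint could be left incomplete. A common pattern is to write to a temporary file and rename it once the write has finished. The sketch below is an assumption-laden illustration, not what my observer currently does: `atomic_checkpoint` is a hypothetical helper, and the stdlib `Serialization` module stands in for the JLD2 `save` call so that the snippet is self-contained.

```julia
using Serialization  # stdlib stand-in for the JLD2 `save` call

# Hypothetical helper: write to a temporary file next to `path`, then rename it.
function atomic_checkpoint(path::AbstractString, data)
    tmp = path * ".tmp"
    open(tmp, "w") do io
        serialize(io, data)
    end
    # The previous checkpoint is replaced only after the new one is fully written.
    mv(tmp, path; force=true)
    return path
end

dir = mktempdir()
ckpt = joinpath(dir, "checkpoint.bin")
atomic_checkpoint(ckpt, Dict("energy" => -1.0, "currentSweep" => 1))
atomic_checkpoint(ckpt, Dict("energy" => -1.5, "currentSweep" => 2))
restored = deserialize(ckpt)
```

This does not reduce memory use; it only ensures that a valid checkpoint from the previous sweep survives an interrupted write, at the cost of temporarily needing disk space for two copies.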