Problem with "No space left on device" error

Hi ITensor team,
I am using ITensor.jl to compute some simple models and test them. For example, I am testing a 16x16 2D Ising model. The code is as follows:

  using ITensors

  Nx, Ny = 16, 16            # 16x16 lattice as described above
  N = Nx * Ny                # total number of sites
  # h (the transverse field strength) is defined earlier in my script

  sites = siteinds("S=1/2", N; conserve_szparity=true)
  lattice = square_lattice(Nx, Ny; yperiodic = false)

  ampo = OpSum()
  for b in lattice
    ampo .+= -4.0,"Sx",b.s1,"Sx",b.s2
  end
  for j in 1:N
    ampo .+= 2.0*h,"Sz",j
  end
  H = MPO(ampo,sites)
  state = [isodd(n) ? "Up" : "Dn" for n=1:N]
  psi0 = productMPS(sites,state)

  sweeps = Sweeps(10)
  maxdim!(sweeps,200,400,600,900,1200,1500,1700,2000,2000,2000)
  cutoff!(sweeps,1E-5,1E-5,1E-8,1E-8,1E-9,1E-10,1E-10,1E-12)
  noise!(sweeps,1E-5,1E-5,1E-8,1E-9,1E-10,1E-12,1E-12,0)
  @show sweeps

  @time energy,psi = dmrg(H,psi0,sweeps;write_when_maxdim_exceeds=600)

As you can see, I have used write_when_maxdim_exceeds=600 to avoid running out of memory.
Moreover, in the script that submits the job, the parameter
#SBATCH --mem=60000
or
#SBATCH --mem=10G
has been used to specify the memory requested on each node.
But I still get the error:

After sweep 1 energy=-794.757934867198 maxlinkdim=15 maxerr=9.98E-06 time=39.619
After sweep 2 energy=-806.959114424078 maxlinkdim=41 maxerr=9.99E-06 time=3.312
After sweep 3 energy=-811.314749468693 maxlinkdim=141 maxerr=9.99E-09 time=9.225
ERROR: LoadError: SystemError: close: No space left on device
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base ./error.jl:174
  [2] #systemerror#68
    @ ./error.jl:173 [inlined]
  [3] systemerror
    @ ./error.jl:173 [inlined]
  [4] close
    @ ./iostream.jl:63 [inlined]
  [5] open(::Serialization.var"#1#2"{ITensor}, ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:332
  [6] open
    @ ./io.jl:328 [inlined]
  [7] serialize
    @ ~/opt/julia/julia-1.7.2/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:775 [inlined]
  [8] setindex!
    @ ~/.julia/packages/SerializedElementArrays/cdFxy/src/SerializedElementArrays.jl:78 [inlined]
  [9] _makeL!(P::ITensors.DiskProjMPO, psi::MPS, k::Int64)
    @ ITensors ~/.julia/packages/ITensors/z9cMA/src/mps/abstractprojmpo.jl:157
 [10] makeL!
    @ ~/.julia/packages/ITensors/z9cMA/src/mps/diskprojmpo.jl:84 [inlined]
 [11] position!
    @ ~/.julia/packages/ITensors/z9cMA/src/mps/abstractprojmpo.jl:212 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/ITensors/z9cMA/src/mps/dmrg.jl:208 [inlined]
 [13] macro expansion
    @ ~/.julia/packages/TimerOutputs/nDhDw/src/TimerOutput.jl:252 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/ITensors/z9cMA/src/mps/dmrg.jl:207 [inlined]
 [15] macro expansion
    @ ./timing.jl:299 [inlined]
 [16] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:write_when_maxdim_exceeds,), Tuple{Int64}}})
    @ ITensors ~/.julia/packages/ITensors/z9cMA/src/mps/dmrg.jl:188
 [17] #dmrg#949
    @ ~/.julia/packages/ITensors/z9cMA/src/mps/dmrg.jl:47 [inlined]
 [18] macro expansion
    @ timing.jl:220 [inlined]
 [19] macro expansion
    @ ~/article/2dMEPS20.jl:50 [inlined]
 [20] top-level scope
    @ timing.jl:220
in expression starting at 2dMEPS20.jl:17

I want to know why the error still occurs. Is there a problem with the code, with the cluster, or with something in my setup?

Sorry to hear that's happening. If I were in your situation, the first thing I would do is start the job again, then log into the cluster node it is running on to verify that no other jobs are sharing the node with yours, and to monitor how much RAM the job is really using. You can use the Linux command free -g to see a report of memory usage in gigabytes.
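If it helps, here is a small sketch of checking the same thing from inside the running Julia session itself, using only the built-in Sys module (just a convenience for monitoring, nothing ITensor-specific):

  # Print free and total physical memory in GiB; could be called
  # periodically from the job for a rough memory check.
  free_gb  = Sys.free_memory()  / 2^30
  total_gb = Sys.total_memory() / 2^30
  println("free memory: ", round(free_gb; digits=1), " GiB of ",
          round(total_gb; digits=1), " GiB total")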

Hi Miles,
After my attempts, I have made sure that I am not sharing the node with other programs. I ran the command and got the node's parameters: RealMem=62.8G, AvailMem=59.1G.

Referring to this answer, I used the tempname() function, and the location I get is "/tmp/jl_rcHSPG7wxg".

Is this the location where the files are written when write_when_maxdim_exceeds is used? If so, is there a way to change the location where the files are written when using write_when_maxdim_exceeds? I would like to write these temp files to a directory where more data can be stored.
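For reference, this is how I checked the temporary location from the Julia REPL; as I understand it, Julia's tempdir() honors the TMPDIR environment variable on Linux and falls back to /tmp, though I am not sure whether the DMRG write location follows it:

  # Check where Julia currently creates temporary files.
  @show tempdir()    # the directory used for temporary files
  @show tempname()   # a fresh temporary path inside that directory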

Thanks for your kindness.

Hi kevinh,
I see - I misunderstood your question as saying that you were running out of RAM, but now I see that you are running out of disk space. So I agree we should have a feature that allows setting the path where the files get written.

I just made a pull request with a new option for the dmrg function that will let you do this (a sketch of how it would be used is shown after the list below). Unfortunately it won't be available until the next numbered version, but if you need it sooner you could either:

  • wait until the pull request is merged, then do ] dev ITensors to switch to the latest dev version
  • do ] dev ITensors now and edit the code yourself in the same way, though you'd have to change it back if you want to upgrade ITensor again in the future
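Once the new option is in, the call would look roughly like the sketch below. The write_path keyword is the new option from the pull request; the scratch directory shown here is only a placeholder for whichever large-capacity filesystem your cluster provides:

  # Hypothetical scratch directory with plenty of free disk space;
  # substitute a real path available on your cluster.
  scratch_dir = "/scratch/$(ENV["USER"])/itensor_dmrg_tmp"
  mkpath(scratch_dir)

  energy, psi = dmrg(H, psi0, sweeps;
                     write_when_maxdim_exceeds=600,
                     write_path=scratch_dir)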

Best,
Miles

Hi Miles,
I read my question again, and I found that I did misunderstand the error message when I first asked the question, which is my fault.

Your answer is very professional and effective; I think it will help others who run into the same problem. I also have a small wish: that this feature be included in the official documentation of ITensor.jl.

I am very grateful that ITensor will allow setting the path where the files get written in a later version. Thank you for your help and dedication!


That is a good reminder - I just made sure to document the new write_path option in the docstring for the dmrg function.

If you do need to use this feature right away, please do the ] dev ITensors approach, which will get you the feature immediately; then, when the next version is available, you can do ] free ITensors and up ITensors to upgrade officially.
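For completeness, the same steps can be done through the Pkg API instead of Pkg mode; this is just the standard package-manager equivalent of the commands above:

  using Pkg
  Pkg.develop("ITensors")   # same as `] dev ITensors`: track the development version
  # ...later, once the next release is out:
  Pkg.free("ITensors")      # same as `] free ITensors`: return to the registered version
  Pkg.update("ITensors")    # same as `] up ITensors`: upgrade to the latest release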

Following this answer, I have solved the related problem. Thank you for your kindness and help.