Is there a way to save checkpoints for Julia's TD-DMRG?

Hello!

I’m using the TDVP method for DMRG in ITensorMPS. The systems I’m working on are fairly expensive, and I currently have no way to create checkpoint files in case I run out of time. Is there a built-in function in the Julia version of ITensor that creates a checkpoint file, or do you have any advice on how to implement this?

Thank you!

Would it be sufficient to save the state periodically during the evolution? If so, you can use the observer system (which is basically a callback system) for that; see examples/solvers/04_tdvp_observers.jl in the ITensor/ITensorMPS.jl GitHub repository (v0.3.25).

I apologize for my late response! I’ve tried a few things with the observer system, but the best I can get is to save both the MPS and MPO to an HDF5 file and use those to restart a calculation (from this old forum post: Write MPS to file in Julia - ITensor Support Q&A). My current issue is that my calculation is sometimes cancelled before the results can be printed. Is there a way I can output to file during a calculation? These are my current working versions of the example you linked to, both the initial file and the restarted version. Thank you!!

First TDVP calculation:

using ITensorMPS: MPO, MPS, OpSum, expect, inner, siteinds, tdvp
using Observers: observer
using HDF5

function main()
    function heisenberg(N)
        os = OpSum()
        for j in 1:(N - 1)
            os += 0.5, "S+", j, "S-", j + 1
            os += 0.5, "S-", j, "S+", j + 1
            os += "Sz", j, "Sz", j + 1
        end
        return os
    end

    N = 10
    s = siteinds("S=1/2", N; conserve_qns = true)
    H = MPO(heisenberg(N), s)

    step(; sweep) = sweep
    current_time(; current_time) = current_time
    return_state(; state) = state
    measure_sz(; state) = expect(state, "Sz"; sites = length(state) ÷ 2)
    obs = observer(
        "steps" => step, "times" => current_time, "states" => return_state, "sz" => measure_sz
    )

    init = MPS(s, n -> isodd(n) ? "Up" : "Dn")
    state = tdvp(
        H, -1.0im, init; time_step = -0.1im, cutoff = 1.0e-12, (step_observer!) = obs, outputlevel = 1
    )

    println("\nResults")
    println("=======")
    for n in 1:length(obs.steps)
        print("step = ", obs.steps[n])
        print(", time = ", round(obs.times[n]; digits = 3))
        print(", |⟨ψⁿ|ψⁱ⟩| = ", round(abs(inner(obs.states[n], init)); digits = 3))
        print(", |⟨ψⁿ|ψᶠ⟩| = ", round(abs(inner(obs.states[n], state)); digits = 3))
        print(", ⟨Sᶻ⟩ = ", round(obs.sz[n]; digits = 3))
        println()

        fo = h5open("output.h5", "w")
        write(fo, "MPS", state)
        write(fo, "MPO", H)
        close(fo)
    end
    return nothing

end

main()

Second TDVP calculation:

using ITensorMPS: MPO, MPS, OpSum, expect, inner, siteinds, tdvp
using Observers: observer
using HDF5

function main()
    function heisenberg(N)
        os = OpSum()
        for j in 1:(N - 1)
            os += 0.5, "S+", j, "S-", j + 1
            os += 0.5, "S-", j, "S+", j + 1
            os += "Sz", j, "Sz", j + 1
        end
        return os
    end

    f = h5open("output.h5", "r")
    psi = read(f, "MPS", MPS)
    H = read(f, "MPO", MPO)
    close(f)

    N = 10
    #s = siteinds("S=1/2", N; conserve_qns = true)
    #H = MPO(heisenberg(N), s)

    step(; sweep) = sweep
    current_time(; current_time) = current_time
    return_state(; state) = state
    measure_sz(; state) = expect(state, "Sz"; sites = length(state) ÷ 2)
    obs = observer(
        "steps" => step, "times" => current_time, "states" => return_state, "sz" => measure_sz
    )

    init = psi
    state = tdvp(
        H, -1.0im, init; time_step = -0.1im, cutoff = 1.0e-12, (step_observer!) = obs, outputlevel = 1
    )

    println("\nResults")
    println("=======")
    for n in 1:length(obs.steps)
        print("step = ", obs.steps[n])
        print(", time = ", round(obs.times[n]; digits = 3))
        print(", |⟨ψⁿ|ψⁱ⟩| = ", round(abs(inner(obs.states[n], init)); digits = 3))
        print(", |⟨ψⁿ|ψᶠ⟩| = ", round(abs(inner(obs.states[n], state)); digits = 3))
        print(", ⟨Sᶻ⟩ = ", round(obs.sz[n]; digits = 3))
        println()

        fo = h5open("output_2.h5", "w")
        write(fo, "MPS", state)
        write(fo, "MPO", H)
        close(fo)
    end
    return nothing

end

main()

I didn’t read your code carefully, but that approach looks reasonable. What do you mean by:

"Is there a way I can output to file during a calculation?"

It looks like you are saving your MPO/MPS every step, which is the most frequent it could be saved.

Thank you! I think this confirms that my current issue is not with ITensor but with Julia and buffering. I will assign a simulation some amount of time, and the time will run out before the simulation has completed with no results printed or HDF5 file created. I’ve tried using flush(stdout) and hf5flush({filename}) with no luck, so I’ve reached out to the support staff for the cluster I use. Thank you so much!
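
For the printed results specifically, one option is to append each step's measurements to a text file as they are produced, instead of printing them after tdvp returns. The sketch below uses only Julia's standard library; log_line and the file name tdvp_log.csv are hypothetical names, not part of ITensorMPS or Observers. Opening the file in append mode inside a do block closes it again after every line, so nothing is left in a Julia-side buffer if the job is cancelled.

```julia
# Hypothetical helper (not an ITensorMPS or Observers function):
# append one comma-separated line to `path` and close the file right away.
function log_line(path::AbstractString, fields...)
    open(path, "a") do io          # "a" appends; earlier lines are kept
        println(io, join(fields, ", "))
    end                            # the do block closes the file, flushing the line
    return nothing
end

# Stand-in for a time evolution: log the step number and the time after each step.
logfile = joinpath(mktempdir(), "tdvp_log.csv")
for step in 1:3
    log_line(logfile, step, step / 10)
end
print(read(logfile, String))
```

An observer function such as measure_sz could call log_line with the measured value before returning it, assuming the function is declared with the keyword arguments it wants to record (for example sweep and current_time, as in the scripts above).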

Looking at your script more closely, I see that your observer is just saving the states in memory but not writing them to disk with HDF5. I think what you want to do is modify return_state so that it writes the states to disk at each step.

Thank you so much for looking at it again! I made sure to change return_state; here is my approach:

    return_state = function (; state, kwargs...)
        h5open("save.h5", "w") do fo
            write(fo, "MPS", state)
        end
        return state
    end

and used this in obs. Thank you for your help!
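
One caveat with opening save.h5 in "w" mode at every step is that a job killed in the middle of a write can leave a truncated file and no usable checkpoint. A possible refinement, sketched below with the standard library only, is to write each checkpoint to a scratch file and move it over the previous one once the write has finished. write_then_swap is a hypothetical helper, and plain text stands in for the HDF5 write so the sketch runs without extra packages. The swap via mv is brief, but it is not guaranteed to be atomic on every Julia version or filesystem.

```julia
# Hypothetical helper: `writer(tmp)` must create the complete checkpoint at `tmp`.
function write_then_swap(writer, path::AbstractString)
    tmp = path * ".tmp"
    writer(tmp)                  # if the job dies here, `path` still holds the old checkpoint
    mv(tmp, path; force = true)  # replace the old checkpoint with the finished one
    return path
end

# Demonstration with plain text standing in for the HDF5 write.
dir = mktempdir()
ckpt = joinpath(dir, "save.txt")
for step in 1:3
    write_then_swap(ckpt) do tmp
        write(tmp, "state after step $step")
    end
end
println(read(ckpt, String))
```

Inside return_state, the body of the do block would be the same h5open call as in the post above, pointed at tmp instead of "save.h5", with "save.h5" passed as path. Writing the current sweep or time next to the MPS may also help a restarted run work out how much evolution remains.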
