Checkpointing TDVP / VUMPS

I’m trying to both checkpoint a VUMPS calculation and force the increase in bond-dimension to follow a scheduler. What I have written here works fine for the scheduling.

    for i in 1:outer_iters
        wvfcn = input * "-" * string(maxdims[i]) * ".jld2"
        println(wvfcn)
        if isfile(wvfcn)
          println("WaveFunction already exists")
          ψ = load(wvfcn)["ψ"]
        else
          println("\nCheck translational invariance of initial infinite MPS")
          @show norm(contract(ψ.AL[1:N]..., ψ.C[N]) - contract(ψ.C[0], ψ.AR[1:N]...))
          vumps_kwargs = (tol=vumps_tol, maxiter=max_vumps_iters, outputlevel, eager)
          subspace_expansion_kwargs = (cutoff=cutoff, maxdim=maxdims[i])
          #ψ = vumps_subspace_expansion(H, ψ; outer_iters=iter, subspace_expansion_kwargs, vumps_kwargs)
          ψ = @time subspace_expansion(ψ, H; subspace_expansion_kwargs...)
          println("\nRun VUMPS with new bond dimension $(maxlinkdim(ψ))")
          ψ = @time tdvp(H, ψ; time_step=-Inf, vumps_kwargs...)
          data = Dict("ψ" => ψ)
          save(wvfcn, data)
        end
    end

However, if I try to load a previous wavefunction from a jld2 file, I get the following error, whether or not I use the vumps_subspace_expansion wrapper.

norm(contract(ψ.AL[1:N]..., ψ.C[N]) - contract(ψ.C[0], ψ.AR[1:N]...)) = 5.094392794687761e-7
The two-site subspace expansion produced a zero-norm expansion at (1, 2). This is likely due to the long-range nature of the QN conserving Hamiltonian.
The two-site subspace expansion produced a zero-norm expansion at (2, 3). This is likely due to the long-range nature of the QN conserving Hamiltonian.
 16.511079 seconds (17.67 M allocations: 1.139 GiB, 7.27% gc time, 99.70% compilation time)

Run VUMPS with new bond dimension 8
Using VUMPS solver with time step -Inf
Running VUMPS with multisite_update_alg = sequential
ERROR: LoadError: DimensionMismatch: In scalar(T) or T[], ITensor T is not a scalar (it has indices ((dim=4|id=514|"Electron,Site,c=0,n=2"), (dim=4|id=265|"Electron,Site,c=0,n=2")', (dim=4|id=265|"Electron,Site,c=0,n=2"), (dim=4|id=514|"Electron,Site,c=0,n=2")', (dim=4|id=662|"Electron,Site,c=1,n=1"), (dim=4|id=257|"Electron,Site,c=1,n=1")', (dim=4|id=257|"Electron,Site,c=1,n=1"), (dim=4|id=662|"Electron,Site,c=1,n=1")')).
Stacktrace:

Could someone explain what is going wrong, or the correct way to set up such checkpoints?

Probably that means that you have an index mismatch somewhere. As a way to debug that, you can try printing your tensors and see where it might be happening.

Sorry, why is that behaviour possible though? If I’m just loading the saved version of the output, I can’t see why the indices would be disturbed? The same script works if I do not load the wavefunction and just let it run, so that seems highly unlikely.

I’m not sure why in your particular case, but that is almost always the issue when there is an error like that.

@mtfishman

Thanks for the quick response. The tensor I save is

ψ = InfiniteCanonicalMPS
[1] ((dim=4|id=268|"Electron,Site,c=1,n=1"), (dim=64|id=85|"Link,c=0,l=2"), (dim=64|id=34|"Link,c=1,l=1"))
[2] ((dim=4|id=832|"Electron,Site,c=1,n=2"), (dim=64|id=34|"Link,c=1,l=1"), (dim=64|id=85|"Link,c=1,l=2"))

and the tensor I load is


ψ = InfiniteCanonicalMPS
[1] ((dim=4|id=268|"Electron,Site,c=1,n=1"), (dim=64|id=85|"Link,c=0,l=2"), (dim=64|id=34|"Link,c=1,l=1"))
[2] ((dim=4|id=832|"Electron,Site,c=1,n=2"), (dim=64|id=34|"Link,c=1,l=1"), (dim=64|id=85|"Link,c=1,l=2"))

but then the error says


ERROR: LoadError: DimensionMismatch: In scalar(T) or T[], ITensor T is not a scalar (it has indices ((dim=4|id=832|"Electron,Site,c=0,n=2"), (dim=4|id=777|"Electron,Site,c=0,n=2")', (dim=4|id=777|"Electron,Site,c=0,n=2"), (dim=4|id=832|"Electron,Site,c=0,n=2")', (dim=4|id=268|"Electron,Site,c=1,n=1"), (dim=4|id=128|"Electron,Site,c=1,n=1")', (dim=4|id=128|"Electron,Site,c=1,n=1"), (dim=4|id=268|"Electron,Site,c=1,n=1")'))

so I don’t understand where these extra indices are coming from. Is this what you meant?

Do you have a better way of saving and loading wavefunctions?

I wasn’t trying to imply that saving and loading changes the indices, and what you print confirms that they didn’t change. However, it seems like some part of the code logic is leading to an index mismatch, which may be more related to how you reorganized the code in order to implement this checkpointing (I’m being purposefully vague because I’m hoping to nudge you toward debugging this yourself, I don’t have time to read your code in detail).

From the error you printed, you can see that there are multiple indices on sites n=1 and n=2 that have different ID numbers (for example id=832 and id=777 on n=2, and id=268 and id=128 on n=1), so there is an index mismatch happening. I think it is up to you to dig into the code, print out indices at different steps, and try to figure out why that is happening (that is how I would debug this kind of problem, unless you prefer a debugger, which might also help in that process).
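
The kind of mismatch described here can be reproduced with plain ITensors, independent of VUMPS. The following is only a minimal sketch: the names `s_state`, `s_ham`, `ket`, `bra_matching` and `bra_mismatched` are made up for illustration, and the one-site tensors are stand-ins for the real MPS and Hamiltonian tensors. It shows that two indices with the same dimension and tags still have different IDs, and that contracting over them leaves dangling indices instead of a scalar.

```julia
using ITensors

# Two site indices with identical dimension and tags. Each call to Index
# draws a fresh random id, so ITensors treats them as different indices.
s_state = Index(4, "Electron,Site,n=1")  # stand-in for the index stored in ψ
s_ham = Index(4, "Electron,Site,n=1")    # stand-in for the index H was built on

# Printing the indices shows the differing id fields.
@show s_state
@show s_ham
@show s_state == s_ham

ket = onehot(s_state => 1)
bra_matching = dag(onehot(s_state => 1))
bra_mismatched = dag(onehot(s_ham => 1))

overlap_ok = bra_matching * ket      # the shared index is contracted away
overlap_bad = bra_mismatched * ket   # nothing is shared, so nothing is contracted

@show order(overlap_ok)   # number of remaining indices
@show inds(overlap_bad)   # both indices survive; scalar(overlap_bad) would throw
```

Printing `inds(...)` of intermediate tensors in this way, at each step of the real script, is one way to locate where a second set of site indices enters.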

I appreciate the pedagogical nudge, but this seems like a genuine bug in subspace expansion. There is no additional code logic. I’ve rewritten it so that it is explicitly identical to the loop inside the wrapper (taken from the example code), with the only change being that the wavefunction is saved and loaded:

    @time for outer_iter in 1:outer_iters
        wvfcn = input * "wvfcn.jld2"
        println(wvfcn)
        if isfile(wvfcn)
            println("WaveFunction already exists")
            ψ = load(wvfcn)["ψ"]
            println("Loaded wavefunction from $(wvfcn)")
            @show ψ
        end
        #else
        println(
            "\nIncrease bond dimension $(outer_iter) out of $(outer_iters), starting from dimension $(maxlinkdim(ψ))",
        )
        println(
            "cutoff = $(subspace_expansion_kwargs[:cutoff]), maxdim = $(subspace_expansion_kwargs[:maxdim])",
        )
        ψ = @time subspace_expansion(ψ, H; subspace_expansion_kwargs...)
        println("\nRun VUMPS with new bond dimension $(maxlinkdim(ψ))")
        ψ = @time tdvp(H, ψ; time_step=-Inf, vumps_kwargs...)
        data = Dict("ψ" => ψ)
        save(wvfcn, data)
        println("Saved wavefunction to $(wvfcn)")
        @show ψ
        #end
    end

which runs fine when starting from scratch, even with saving and loading the wavefunction, but gives a DimensionMismatch if stopped and restarted from an intermediate bond dimension. This seems to indicate that the dimension, rather than the indices, is the problem.

I believe checkpointing is best practice, so I think it is worth finding a general solution that would be helpful more broadly. I am not asking for a detailed code review.

Quick thing to check: are you constructing H with the same indices as psi (as in the save/load with HDF5 example)?
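
As a rough illustration of this suggestion, here is a small sketch using plain ITensors only. `build_operator` is a hypothetical helper standing in for the Hamiltonian construction, and `ψ_loaded` is a stand-in for a state read back from a checkpoint file; neither comes from the scripts in this thread. The point is only the order of operations on restart: take the site index from the loaded state first, then build the operator on it.

```julia
using ITensors

# Hypothetical helper standing in for the Hamiltonian construction:
# builds a diagonal one-site operator on whatever site index it is given.
function build_operator(s::Index)
  A = zeros(dim(s), dim(s))
  for n in 1:dim(s)
    A[n, n] = n
  end
  return ITensor(A, s', s)
end

# Stand-in for a state tensor read back from a checkpoint file.
ψ_loaded = onehot(Index(4, "Electron,Site,n=1") => 2)

# On restart, take the site index from the loaded state ...
s = only(filterinds(ψ_loaded; tags="Site"))
# ... and build the operator on that index, not on a freshly created one.
O = build_operator(s)

# All indices now match, so the contraction reduces to a scalar.
expectation = scalar(dag(prime(ψ_loaded)) * O * ψ_loaded)
@show expectation
```

In the checkpointing loop this would presumably mean constructing H from the site indices of the loaded ψ whenever a checkpoint file exists, instead of from a newly generated set of site indices; the exact accessor to use for an InfiniteCanonicalMPS is not shown in this thread.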


Thanks, this is exactly the thing I had done wrong. Thanks for pointing me in the right direction!
