SVD failing in DMRG

Hey all,
in the last days I run sometimes in problems where the dmrg function fails due to some error in a SVD calculated at some sweep number:

Start of simulation
##################################################
After sweep 1 energy=-57.216430047955 maxlinkdim=10 maxerr=1.23E-01 time=50.673
After sweep 2 energy=-57.610363826140 maxlinkdim=10 maxerr=6.96E-02 time=27.161
After sweep 3 energy=-57.954014138941 maxlinkdim=10 maxerr=1.69E-02 time=26.724
After sweep 4 energy=-58.086359546880 maxlinkdim=10 maxerr=4.72E-03 time=25.953
After sweep 5 energy=-58.094465745634 maxlinkdim=10 maxerr=3.14E-03 time=25.303
After sweep 6 energy=-58.095061609641 maxlinkdim=10 maxerr=2.97E-03 time=25.246
After sweep 7 energy=-58.095174885906 maxlinkdim=10 maxerr=2.94E-03 time=25.344
After sweep 8 energy=-58.095237650353 maxlinkdim=10 maxerr=2.94E-03 time=25.321
After sweep 9 energy=-58.095299232888 maxlinkdim=10 maxerr=2.93E-03 time=23.758
After sweep 10 energy=-59.513098784853 maxlinkdim=60 maxerr=6.07E-06 time=71.063
After sweep 11 energy=-59.594473384592 maxlinkdim=60 maxerr=9.15E-06 time=77.712
After sweep 12 energy=-59.620823610536 maxlinkdim=60 maxerr=1.85E-05 time=75.851
After sweep 13 energy=-59.622819105192 maxlinkdim=60 maxerr=1.78E-05 time=76.022
After sweep 14 energy=-59.623108803869 maxlinkdim=60 maxerr=1.90E-05 time=76.164
After sweep 15 energy=-59.623142914994 maxlinkdim=60 maxerr=1.95E-05 time=76.331
After sweep 16 energy=-59.623145429623 maxlinkdim=60 maxerr=1.99E-05 time=75.937
After sweep 17 energy=-59.623142510522 maxlinkdim=60 maxerr=2.00E-05 time=75.934
After sweep 18 energy=-59.623141432224 maxlinkdim=60 maxerr=2.01E-05 time=76.780
After sweep 19 energy=-59.635776138916 maxlinkdim=110 maxerr=2.78E-06 time=93.577
After sweep 20 energy=-59.636247577622 maxlinkdim=110 maxerr=3.44E-06 time=102.476
After sweep 21 energy=-59.630274401129 maxlinkdim=110 maxerr=1.96E-04 time=130.475
After sweep 22 energy=-59.630743109198 maxlinkdim=110 maxerr=1.39E-04 time=133.933
After sweep 23 energy=-59.632001211138 maxlinkdim=110 maxerr=6.42E-05 time=133.059
After sweep 24 energy=-59.633931869556 maxlinkdim=110 maxerr=2.51E-05 time=133.139
After sweep 25 energy=-59.635502176972 maxlinkdim=110 maxerr=9.90E-06 time=131.035
After sweep 26 energy=-59.636118353095 maxlinkdim=110 maxerr=4.95E-06 time=128.213
After sweep 27 energy=-59.636292952968 maxlinkdim=110 maxerr=3.66E-06 time=127.692
After sweep 28 energy=-59.637890640527 maxlinkdim=160 maxerr=9.27E-07 time=153.293
After sweep 29 energy=-59.637940200469 maxlinkdim=160 maxerr=1.01E-06 time=163.434
After sweep 30 energy=-59.637944429888 maxlinkdim=160 maxerr=1.06E-06 time=161.888
After sweep 31 energy=-59.637945357337 maxlinkdim=160 maxerr=1.08E-06 time=130.339
After sweep 32 energy=-59.637945423548 maxlinkdim=160 maxerr=1.10E-06 time=130.544
After sweep 33 energy=-59.637945459074 maxlinkdim=160 maxerr=1.11E-06 time=130.376
After sweep 34 energy=-59.637945433403 maxlinkdim=160 maxerr=1.11E-06 time=129.854
After sweep 35 energy=-59.637945379951 maxlinkdim=160 maxerr=1.11E-06 time=130.898
After sweep 36 energy=-59.637945425959 maxlinkdim=160 maxerr=1.11E-06 time=130.653
The SVD algorithm `"divide_and_conquer"` has thrown an error,
likely because of a convergance failure. You can try
other SVD algorithms that may converge better using the
`alg` (or `svd_alg` if called through `factorize` or MPS/MPO functionality) keyword argument:

 - "divide_and_conquer" is a divide-and-conquer algorithm
   (LAPACK's `gesdd`). It is fast, but may lead to some innacurate
   singular values for very ill-conditioned matrices.
   It also may sometimes fail to converge, leading to errors
   (in which case `"qr_iteration"` or `"recursive"` can be tried).

 - `"qr_iteration"` (LAPACK's `gesvd`) is typically slower 
   than "divide_and_conquer", especially for large matrices,
   but is more accurate for very ill-conditioned matrices 
   compared to `"divide_and_conquer"`.

 - `"recursive"` is ITensor's custom SVD algorithm. It is very
   reliable, but may be slow if high precision is needed.
   To get an `svd` of a matrix `A`, an eigendecomposition of
   ``A^{\dagger} A`` is used to compute `U` and then a `qr` of
   ``A^{\dagger} U`` is used to compute `V`. This is performed
   recursively to compute small singular values.

Returning `nothing`. For an output `F = svd(A, ...)` you can check if
`isnothing(F)` in your code and try a different algorithm.

To suppress this message in the future, you can wrap the `svd` call in the
`@suppress` macro from the `Suppressor` package.

ERROR: LoadError: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(!Matched::Union{LinRange, StepRangeLen}) at /p/software/juwels/stages/2022/software/Julia/1.7.1-gcccoremkl-11.2.0-2021.4.0/share/julia/base/range.jl:826
  iterate(!Matched::Union{LinRange, StepRangeLen}, !Matched::Integer) at /p/software/juwels/stages/2022/software/Julia/1.7.1-gcccoremkl-11.2.0-2021.4.0/share/julia/base/range.jl:826
  iterate(!Matched::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at /p/software/juwels/stages/2022/software/Julia/1.7.1-gcccoremkl-11.2.0-2021.4.0/share/julia/base/dict.jl:695
  ...
Stacktrace:
  [1] indexed_iterate(I::Nothing, i::Int64)
    @ Base ./tuple.jl:92
  [2] replacebond!(M::MPS, b::Int64, phi::ITensor; kwargs::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:maxdim, :mindim, :cutoff, :eigen_perturbation, :ortho, :normalize, :which_decomp, :svd_alg), Tuple{Int64, Int64, Float64, Nothing, String, Bool, Nothing, String}}})
    @ ITensors ~/.julia/1.7.1/default-gpu/packages/ITensors/ZMKMP/src/mps/mps.jl:453
  [3] macro expansion
    @ ~/.julia/1.7.1/default-gpu/packages/ITensors/ZMKMP/src/mps/dmrg.jl:250 [inlined]
  [4] macro expansion
    @ /p/software/juwels/stages/2022/software/Julia/1.7.1-gcccoremkl-11.2.0-2021.4.0/extensions/packages/TimerOutputs/5tW2E/src/TimerOutput.jl:252 [inlined]
  [5] macro expansion
    @ ~/.julia/1.7.1/default-gpu/packages/ITensors/ZMKMP/src/mps/dmrg.jl:249 [inlined]
  [6] macro expansion
    @ ./timing.jl:299 [inlined]
  [7] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; kwargs::Base.Pairs{Symbol, TerminationObserver{TriCriticalIsing}, Tuple{Symbol}, NamedTuple{(:observer,), Tuple{TerminationObserver{TriCriticalIsing}}}})
    @ ITensors ~/.julia/1.7.1/default-gpu/packages/ITensors/ZMKMP/src/mps/dmrg.jl:188
  [8] #dmrg#925
    @ ~/.julia/1.7.1/default-gpu/packages/ITensors/ZMKMP/src/mps/dmrg.jl:47 [inlined]
. . .

As you can see, the problem nicely converges but than after sweep 36 it somehow fails. Simulating the model with slightly modified parameters results in a stable simulation again.

Is there any way I can catch such an error and redo the step with a different SVD algorithm, or set the SVD algorithm to a different one from the start?

Thanks!

1 Like

What version of ITensors.jl are you using? At some point I added a feature where it catches these kinds of errors and tries another SVD backend.

Hey,
ok… I was doing calculations on a supercluster where I am bound to use Julia version “1.7.1”.
This limits me to use ITensors version “0.2.15”. Maybe I need to put my own Julia version on the cluster allowing me to use newer versions of ITensors.
Thanks!

Also, in the meantime you’ll see in the middle of the error message above that it shows optional keywords you can pass that will change the SVD backend, precisely to deal with these kinds of crashes (which unfortunately are often due to the LAPACK backend). You can try passing one of these keywords to see if that helps, if you can’t upgrade to use Matt’s new automatic feature.

Ok, since I am scanning a phasediagram with a large number of points I do not want to set this flag for all points where the SVD is converging. So probably I will rather install my own version on the cluster:)

Makes sense, however for DMRG specifically the SVD is not often the most time consuming step. So switching to another backend may only slow down your code by a little bit. But you should test it out on a few runs to check if you do go that route.

Ok makes sense. Thanks!

Also, you should be able to upgrade to the latest ITensors.jl version if you are using Julia 1.7. The only limitation is that at some point we dropped support for Julia versions below 1.6. So if you can’t upgrade to the latest ITensor version, it is for some other reason.

You can try to force Julia to upgrade to the latest version with:

julia> ] add ITensors@0.3.22

Either that should work, or Julia will tell you why it can’t upgrade.

Hey,
sorry for the late reply.

I tried explicitly adding the 0.3.22 version, however it fails with this error:
[
add ITensors@0.3.22
Resolving package versions…
ERROR: Unsatisfiable requirements detected for package ITensors [9136182c]:
ITensors [9136182c] log:
├─possible versions are: 0.1.0-0.2.15 or uninstalled
└─restricted to versions 0.3.22 by an explicit requirement — no versions left
]
I removed the ITensors package and tried readding but still failing. Not sure if this is because the Cluster only has this version available (the cluster is behind a strict firewall, so that it is not possible to directly install from github…) or something else is failing…

Thanks for the effort.

Hm, that’s a strange error message. Perhaps trying update ITensors instead of using the add command will work better, or give a more helpful error message?

This does nothing, just stating no changes to the manifest.toml etc.