Bug in factorize_qr leading to very slow calls to orthogonalize!

If I factorize a tensor with indices i,j,k, and I specify i,j as the left indices, then the factor L has a new third index: its indices are i,j,m. The dimension of m should not be larger than that of k. With factorize_qr it can be larger; with factorize_svd it cannot. In some sense this is just a property of QR: since Q is orthonormal, the full Q must be square, so perhaps one should have called qr on the transpose of the matrix. I discovered this through extremely long run times in truncate!, which was calling orthogonalize!, and the non-truncating factorizations were blowing up in size. So this property of QR must be avoided, by switching to SVD (or perhaps LQ?).
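
To illustrate with plain dense matrices (a LinearAlgebra sketch of the thin-vs-full distinction, separate from ITensor's block-sparse code):

using LinearAlgebra

A = randn(16 * 74, 20)   # rows = dim(i) * dim(j), cols = dim(k), as in the example below
F = qr(A)
@show size(Matrix(F.Q))  # thin Q is (1184, 20): the new index is no larger than k
@show size(F.Q)          # full Q acts as a square (1184, 1184) operator
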
I have a serialized example case, which the script below reads. I don’t see a place to post it here, so I’ll email it to Miles and Matt.

using ITensors, Serialization

Mb,linds,ltags = deserialize("factors.ser")

maxdim = 1000
@show norm(Mb)

# SVD-based factorization: the new index is correctly bounded by dim(k)
L, R = factorize(Mb, linds; which_decomp="svd", tags=ltags, maxdim)
@show dims(Mb), dim.(linds), dims(L)

# Default decomposition: also produces the oversized index here
L, R = factorize(Mb, linds; which_decomp=nothing, tags=ltags, maxdim)
@show dims(Mb), dim.(linds), dims(L)

# QR-based factorization: the new index blows up past dim(k)
L, R = factorize(Mb, linds; which_decomp="qr", tags=ltags, maxdim)
@show dims(Mb), dim.(linds), dims(L)

I forgot to include the output:
julia dofact.jl
norm(Mb) = 515.1285760520133
(dims(Mb), dim.(linds), dims(L)) = ((20, 16, 74), [16, 74], (16, 74, 20))
(dims(Mb), dim.(linds), dims(L)) = ((20, 16, 74), [16, 74], (16, 74, 92))
(dims(Mb), dim.(linds), dims(L)) = ((20, 16, 74), [16, 74], (16, 74, 92))

Thanks, Steve, for catching and digging into this. Some initial tests indicate it probably has to do with the QN-conserving version of our QR code, as I don’t see it happening with dense tensors. I am able to reproduce the error you get above when I load the serialized data. We’ll keep working on it and let you know.

One update on this: I found that when I make a new, randomly generated ITensor with the same indices as Mb above that things work correctly again. So it seemed to be something particular to the data inside Mb.

I then did this:

@show checkflux(Mb)

and it gives ERROR: LoadError: Fluxes not all equal. So it seems that Mb is an invalid QN-conserving ITensor, with some blocks that aren’t compatible with the overall flux (in this case apparently zero flux). Do you know if this could possibly be the case, or did the tensor come from another part of the library (i.e., a bug somewhere else in ITensor)?
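
For reference, here is a minimal sketch of what checkflux verifies, using a toy QN index rather than the data from Mb:

using ITensors

i = Index([QN(0) => 2, QN(1) => 2], "i")
T = randomITensor(QN(0), dag(i), i')  # random ITensor with a definite (zero) flux
checkflux(T)                          # passes: all nonzero blocks share one flux
@show flux(T)                         # QN(0)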

Aha! This comes from a conversion of an MPO into an MPS, where the anomalous norms of MPOs get fixed. The sites are Electron sites, so the index of size 16 comes from combining site indices s and s’. This is my own code (Superpose.jl, which I may have sent you previously), so this indicates that there is a bug in that conversion. The relevant code is very short; perhaps you will see the bug immediately:

using ITensors, LinearAlgebra  # LinearAlgebra provides the identity I

function MPOconverter(sites)
    n = length(sites)
    invmat =
     [0.75   0.25  -0.5  -0.5
      0.25  -0.25   0.5  -0.5
      0.25  -0.25  -0.5   0.5
     -0.25   0.25   0.5   0.5]
    bigim = Matrix(1.0 * I, 16, 16)
    bigim[7:10, 7:10] = invmat
    converter = fill(ITensor(), n)
    unconverter = fill(ITensor(), n)
    mpsinds = Index[]
    for i = 1:n
        # Combine each site index and its prime into one MPS site index
        c = combiner(dag(sites[i]), prime(sites[i]))
        cind = combinedind(c)
        con = ITensor(bigim, dag(cind), cind')
        converter[i] = noprime(c * con; tags="Link")
        uncon = ITensor(inv(bigim)', cind, dag(cind'))
        unconverter[i] = noprime(uncon * dag(c); tags="Link")
        push!(mpsinds, cind)
    end
    converter, unconverter, mpsinds
end

function MPOtoMPS(mpo, mpsinds, conver)
    sites = dag.(firstsiteinds(mpo))
    mps = MPS(mpsinds)
    for i = 1:length(mpo)
        # Fuse each MPO tensor's (s, s') pair into the combined MPS site index
        mps[i] = mpo[i] * conver[i]
    end
    mps
end
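
A hypothetical usage sketch (the sites here are an assumption; H would be an actual QN-conserving MPO):

sites = siteinds("Electron", 4; conserve_qns=true)
converter, unconverter, mpsinds = MPOconverter(sites)
# Given an MPO H on these sites, reinterpret it as an MPS:
# psi = MPOtoMPS(H, mpsinds, converter)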

It’s a good question whether the bug is apparent or not. I don’t think I see it right away, but if I had to guess, it could be constructors like

con = ITensor(bigim,dag(cind),cind')

where you make an ITensor from a matrix, but also with indices that carry quantum numbers. In that case it might be that the matrix is non-zero in multiple blocks that have different total quantum number fluxes.
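
A toy illustration of that concern (made-up index, not the actual data):

using ITensors

s = Index([QN(0) => 2, QN(1) => 2], "s")
A = zeros(4, 4)
A[1:2, 1:2] = [1.0 2.0; 3.0 4.0]  # lives in the (QN(0), QN(0)) block: flux zero
A[3:4, 3:4] = [5.0 6.0; 7.0 8.0]  # lives in the (QN(1), QN(1)) block: also flux zero
T = ITensor(A, dag(s), s')        # consistent, so this constructs cleanly
checkflux(T)                      # passes
# Nonzero entries in, say, A[1:2, 3:4] would sit in a block with a different
# flux, giving exactly the kind of inconsistent tensor described above.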

This code seems to be fine. I tracked the bad flux back to the creation of the MPO and found the bug there, in my code.
So the factorize_qr issue is not so serious: it just responds badly to MPSs with bad fluxes. Thanks for looking at this!

In terms of an actionable item for this, the main question would be where to try to catch this kind of issue.

@srwhite, does your code throw an error if you enable debug checks with ITensors.enable_debug_checks()? That can enable some debug checks including checking that tensors have proper fluxes, but I’m not sure if it would catch this particular issue. If not, we could add a debug check of checkflux() somewhere.
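
That is, something like:

using ITensors

ITensors.enable_debug_checks()   # turn on extra internal consistency checks
# ... run the code that produced the bad tensor ...
ITensors.disable_debug_checks()  # restore normal (faster) operation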

Also, the QR code could check that the block structure is correct (the blocks should have the form of a permutation matrix, i.e. only one block per row or column; my guess is that isn’t the case here).

enable_debug_checks() did not catch anything; checkflux() does. It seems a checkflux() at the beginning of the QN version of factorize_qr would take care of this. The MPO creation function simply flipped the sign of the QN on one of the link indices of each tensor from what it should have been. That routine now has a checkflux() in it, along with the fix.
There is a slight worry that the QR logic itself is flawed and that this case just revealed it, but I doubt it: it is natural for the code to identify the relevant blocks with the assistance of the flux (though I haven’t actually looked at it).
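
The guard being discussed would look something like this sketch (the function name and placement are illustrative, not the actual library code):

using ITensors

function guarded_factorize_qr(T::ITensor, linds...; kwargs...)
    # Fail fast on inconsistent fluxes instead of letting the QR
    # produce oversized factors downstream
    hasqns(T) && checkflux(T)
    return factorize(T, linds...; which_decomp="qr", kwargs...)
end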

I created a PR that now does checkflux on the input to factorize if enable_debug_checks() is on.

It looks like a checkflux method needs to be added to projmposum.jl. When I turn on ITensors.enable_debug_checks() while doing dmrg over a vector of MPOs, I get:
ERROR: LoadError: MethodError: no method matching checkflux(::ProjMPOSum)

Closest candidates are:
checkflux(::ITensor)
@ ITensors ~/.julia/packages/ITensors/MnaxI/src/qn/flux.jl:25
checkflux(::ITensor, ::Any)
@ ITensors ~/.julia/packages/ITensors/MnaxI/src/qn/flux.jl:17
checkflux(::ITensors.AbstractProjMPO)
@ ITensors ~/.julia/packages/ITensors/MnaxI/src/mps/abstractprojmpo.jl:243

Ok thanks for reporting that, should be a simple fix.

checkflux(::ProjMPOSum) is being added in ITensor/ITensors.jl PR #1330 ([ITensors] Implement missing `AbstractProjMPO` `checkflux`, by mtfishman), which will be included in the next release.
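
For reference, the missing method plausibly just forwards the check to each term in the sum, along these lines (a sketch only; the field name pm is an assumption, see the PR for the actual implementation):

function ITensors.checkflux(P::ITensors.ProjMPOSum)
    # Check every ProjMPO summand (assumes a `pm` field holding them)
    foreach(checkflux, P.pm)
    return nothing
end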