Custom Operators using ITensors.op in Distributed HPC Environment

Dear ITensors Team,

I am attempting to run ITensors in an HPC environment using Distributed.jl. In my code I define some custom operators; however, they fail on the individual workers with the following error: UndefVarError: @OpName_str not defined in Main

This happens even though the code has been sent to the workers with @everywhere.

It seems I might be misunderstanding how ITensors.op operates in this distributed context. Could you kindly provide guidance or point me toward a solution?

Thank you for your support.

First, here is a minimal example without Distributed, which works as expected:

using ITensors, ITensorMPS
import LinearAlgebra: diagm, I


# set parameter
const dim_charge_basis = 21

# Simple function to create a custom operator
function create_operator_matrix(op_name::Symbol, dim_charge_basis::Int)
    if op_name == :z
        cutoff = dim_charge_basis/2
        return diagm(Array(range(-cutoff,cutoff-1)))
    else
        return I
    end
end

# Overload ITensors.op for our custom boson operator
ITensors.op(::OpName"Z", ::SiteType"Boson", d::Int) = create_operator_matrix(:z, dim_charge_basis)

# A test function 
function mini_ham(dim_charge_basis::Int)
    os = OpSum()
    os += 0.5, "Z", 1, "Z", 1
    sites = siteinds("Boson", 1, dim = dim_charge_basis)
    mpo = MPO(os, sites)
    return mpo
end

mpo = mini_ham(dim_charge_basis)
@show mpo
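
For reference, the custom operator can also be checked directly on a single site index with the generic op(name, site) form from ITensors (a quick sanity-check sketch):

# Build one Boson site and inspect the matrix returned by the custom "Z" overload
site = siteinds("Boson", 1; dim = dim_charge_basis)[1]
@show op("Z", site)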

Here is the distributed version:

using ITensors
using Distributed

# Add one worker for testing
addprocs(1)

@everywhere begin
    using ITensors, ITensorMPS
    import LinearAlgebra: diagm, I
    
    # set parameter
    const dim_charge_basis = 21
    
    # Simple function to create a custom operator
    function create_operator_matrix(op_name::Symbol, dim_charge_basis::Int)
        if op_name == :z
            cutoff = dim_charge_basis/2
            return diagm(Array(range(-cutoff,cutoff-1)))
        else
            return I
        end
    end
    
    # Overload ITensors.op for our custom boson operator
    ITensors.op(::OpName"Z", ::SiteType"Boson", d::Int) = create_operator_matrix(:z, dim_charge_basis)
    
    # A test function 
    function mini_ham(dim_charge_basis::Int)
        os = OpSum()
        os += 0.5, "Z", 1, "Z", 1
        sites = siteinds("Boson", 1, dim = dim_charge_basis)
        mpo = MPO(os, sites)
        return mpo
    end
end

# Fetch operator from worker 2
op_on_worker = remotecall_fetch(mini_ham, 2, 21)
@show op_on_worker

# Compare with the expected diagonal matrix
expected_op = remotecall_fetch(() -> diagm(Array(range(-dim_charge_basis/2, dim_charge_basis/2 - 1))), 2)
@show expected_op

Which results in the following error:

ERROR: LoadError: On worker 2:
LoadError: UndefVarError: `@OpName_str` not defined in `Main`
Suggestion: this global was defined as `ITensors.SiteTypes.var"@OpName_str"` but not assigned a value.
Hint: a global variable of this name also exists in ITensors.
Hint: a global variable of this name also exists in ITensorMPS.

I am using the following environment:
Julia Version = 1.11.3
ITensorMPS = 0.2.6
ITensors = 0.6.22

Hello,
I found a way to resolve the issue by wrapping all the code that needs to be shared across the workers in a new module. Here is the implementation (MyCustomOps.jl):

module MyCustomOps

using ITensors, ITensorMPS
import LinearAlgebra: diagm, I

export mini_ham

const dim_charge_basis = 21

function create_operator_matrix(op_name::Symbol, dim_charge_basis::Int)
    if op_name == :z
        cutoff = dim_charge_basis ÷ 2
        return diagm(Array(range(-cutoff, cutoff)))
    else
        return I
    end
end

ITensors.op(::OpName"Z", ::SiteType"Boson", d::Int) = create_operator_matrix(:z, dim_charge_basis)

function mini_ham(dim_charge_basis::Int)
    os = OpSum()
    os += 0.5, "Z", 1, "Z", 1
    sites = siteinds("Boson", 1, dim=dim_charge_basis)
    return MPO(os, sites)
end

end # module

I then run it with the following driver script:

using Distributed

addprocs(1)

# Load the module on all processes
@everywhere include("MyCustomOps.jl")

# Set parameter
dim_charge_basis = 21

# Test on worker process
op_on_worker = remotecall_fetch(MyCustomOps.mini_ham, 2, dim_charge_basis)
@show op_on_worker

This approach resolves the issue by encapsulating all of the shared code in a module, which can then be loaded cleanly on every worker.
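
For completeness, my current understanding of the underlying cause: @everywhere begin ... end ships the whole block to each worker as a single expression, so the OpName"Z" and SiteType"Boson" string macros are expanded when that expression is lowered on the worker, before the block's own using ITensors, ITensorMPS has taken effect there. Inside a module (or a plain script) each top-level statement is evaluated in order, which is why the other two versions work. If that is right, a lighter-weight alternative to the module should be to load the packages on the workers in a separate @everywhere call before defining the overloads; a minimal, untested sketch:

using Distributed

addprocs(1)

# Load the packages in a separate top-level call first, so the OpName"..."
# and SiteType"..." string macros are already defined on each worker when
# the following block is lowered there.
@everywhere using ITensors, ITensorMPS
@everywhere import LinearAlgebra: diagm, I

@everywhere begin
    const dim_charge_basis = 21

    function create_operator_matrix(op_name::Symbol, dim_charge_basis::Int)
        if op_name == :z
            cutoff = dim_charge_basis ÷ 2
            return diagm(Array(range(-cutoff, cutoff)))
        else
            return I
        end
    end

    ITensors.op(::OpName"Z", ::SiteType"Boson", d::Int) =
        create_operator_matrix(:z, dim_charge_basis)

    function mini_ham(dim_charge_basis::Int)
        os = OpSum()
        os += 0.5, "Z", 1, "Z", 1
        sites = siteinds("Boson", 1, dim = dim_charge_basis)
        return MPO(os, sites)
    end
end

op_on_worker = remotecall_fetch(mini_ham, 2, 21)
@show op_on_worker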

Since the problem turned out to be related to Julia's global scoping rules and Distributed.jl rather than to ITensors itself, I will close this question. Apologies for the misplacement.

Thank you!


Glad you found the solution! Thanks for updating the post.

