Hi,
I am running a TEBD computation on an HPC cluster and want to understand how to use multithreading effectively. My system conserves quantum numbers (QN). Following the multithreading page of the documentation, in particular the section on multithreaded block sparse operations, I enabled block sparse multithreading and set BLAS and Strided to a single thread each. Here is my code:
using ITensors, ITensorMPS
using Dates
using LinearAlgebra
using Strided
ITensors.enable_threaded_blocksparse(true)
BLAS.set_num_threads(1)
Strided.set_num_threads(1)
let
    println("BLAS version - ", BLAS.vendor())
    println(" number of threads - ", Sys.CPU_THREADS)

    L = 50
    M = 10
    J = 1.0
    Delta = 1.0
    h = 0.0
    chi = 200
    dt = 0.05
    tend = 20
    h_list = [h for i = 1:L]
    tsteps = Int(tend / dt)
    cutoff = 1e-12

    #| creating the lattice and the gates
    s = siteinds("S=1/2", L; conserve_sz=true)
    gates = gates_order4(J, h_list, Delta, dt, s) # this function creates the 2-site gates

    #| creating the initial MPS - particles in the middle
    particles = zeros(Int, L)
    center = floor(Int, L / 2)
    start = center - floor(Int, M / 2) + 1
    stop = center + ceil(Int, M / 2)
    particles[start:stop] .= 1
    psi = MPS(s, n -> particles[n] == 1 ? "Up" : "Dn")

    t_start = time_ns()
    for t in 1:tsteps
        # apply one full TEBD step, truncating to bond dimension chi
        psi = apply(gates, psi; maxdim=chi, cutoff=cutoff)
        normalize!(psi)
    end
    println("time taken = ", (time_ns() - t_start) / 1e9, "s")
end
With 1 CPU the code runs faster than with more than one CPU (about 995 s on 1 CPU versus about 1314 s on 16 CPUs). I also get some warnings.
Here is the output for the two cases:
1 CPU
========================================
SLURM Job Information
========================================
Nodes: 1
Tasks: 1
CPUs per task: 1
Total CPUs: 1
Node List: node2
Partition: short
Submit Directory: /home/tamoghna.ray/project_entropy
Start Time: Tue Nov 18 17:06:45 IST 2025
========================================
WARNING: You are trying to enable block sparse multithreading, but you have started Julia with only a single thread. You can start Julia with `N` threads with `julia -t N`, and check the number of threads Julia can use with `Threads.nthreads()`. Your system has 32 threads available to use, which you can determine by running `Sys.CPU_THREADS`.
BLAS version - lbt
number of threads - 32
time taken = 994.608243258s
16 CPUs
========================================
SLURM Job Information
========================================
Nodes: 1
Tasks: 1
CPUs per task: 16
Total CPUs: 16
Node List: node3
Partition: short
Submit Directory: /home/tamoghna.ray/project_entropy
Start Time: Tue Nov 18 17:06:56 IST 2025
========================================
WARNING: You are enabling block sparse multithreading, but your BLAS configuration LBTConfig([ILP64] libopenblas64_.so) is currently set to use 16 threads. When using block sparse multithreading, we recommend setting BLAS to use only a single thread, otherwise you may see suboptimal performance. You can set it with `using LinearAlgebra; BLAS.set_num_threads(1)`.
WARNING: You are enabling block sparse multithreading, but Strided.jl is currently set to use 16 threads for performing dense tensor permutations. When using block sparse multithreading, we recommend setting Strided.jl to use only a single thread, otherwise you may see suboptimal performance. You can set it with `NDTensors.Strided.disable_threads()` and see the current number of threads it is using with `NDTensors.Strided.get_num_threads()`.
BLAS version - lbt
number of threads - 32
time taken = 1314.07988932s
Although I set the BLAS and Strided thread counts to 1, these warnings state that BLAS and Strided are each currently set to use 16 threads.
Here is the bash script I use to submit the jobs:
#!/bin/bash
#SBATCH --job-name=BM_16
#SBATCH --partition=short,long
#SBATCH --output=/home/tamoghna.ray/project_entropy/error_files/%x_check%N_%j.out
#SBATCH --error=/home/tamoghna.ray/project_entropy/error_files/%x_check%N_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60G
# SBATCH --mail-user=tamoghna.ray@icts.res.in
# SBATCH --mail-type=ALL
#SBATCH --array=0
export JULIA_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Print Slurm job information
echo "========================================"
echo "SLURM Job Information"
echo "========================================"
echo "Nodes: $SLURM_JOB_NUM_NODES"
echo "Tasks: $SLURM_NTASKS"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Total CPUs: $SLURM_CPUS_ON_NODE"
echo "Node List: $SLURM_JOB_NODELIST"
echo "Partition: $SLURM_JOB_PARTITION"
echo "Submit Directory: $SLURM_SUBMIT_DIR"
echo "Start Time: $(date)"
echo "========================================"
echo ""
# Run Julia script with parameters
julia test_multithreading.jl
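One variation of the job script that might be worth comparing against is sketched below. In my Julia script, `ITensors.enable_threaded_blocksparse(true)` runs before the two `set_num_threads(1)` calls, so my guess is that the warnings may reflect the thread settings at the moment of that call. The sketch pins the BLAS thread count through the environment before Julia starts. It assumes that the OpenBLAS library bundled with Julia honors `OPENBLAS_NUM_THREADS`, and it does not cover Strided, which would presumably still have to be set inside Julia.

```shell
# Sketch only: thread-related environment for the Julia job.
# Fall back to 1 thread when SLURM_CPUS_PER_TASK is unset (e.g. outside Slurm).
export JULIA_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the OpenBLAS bundled with Julia reads this variable at load time,
# so BLAS starts single-threaded regardless of the order of calls in the script.
export OPENBLAS_NUM_THREADS=1

# Log what the Julia process will inherit, next to the Slurm information.
echo "JULIA_NUM_THREADS:    $JULIA_NUM_THREADS"
echo "OPENBLAS_NUM_THREADS: $OPENBLAS_NUM_THREADS"
```

These lines would replace the single `export JULIA_NUM_THREADS=...` line, with the `julia test_multithreading.jl` call unchanged.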
What am I missing, and what should I do to use multithreading efficiently for TEBD?