I’m encountering conflicting threading behavior when running DMRG calculations with Julia (v1.11.3) and MKL. Despite efforts to limit thread counts, CPU usage spikes to ~2000% with degraded performance. Here are key observations and questions:
Observations:
- High CPU Usage, Poor Performance:
  - With `-t8` (Julia threads) and `BLAS.set_num_threads(1)` in the script, `BLAS.get_num_threads()` returns 1, but `top` shows CPU usage at ~2000%.
  - DMRG sweep speeds resemble `disable_threaded_blocksparse()`, and processes frequently toggle between `Sleep`/`Running` states.
  - Resolved by setting `export MKL_NUM_THREADS=1` (CPU usage drops to ~300%, performance improves as expected).
- MKL Thread Confirmation:
  `println(ccall((:MKL_Get_Max_Threads, MKL.libmkl_rt), Cint, ()))` returns 128 (max allowed) unless `MKL_NUM_THREADS=1` is set (then returns 1).
- Environment Details:
  julia> versioninfo()
  Julia Version 1.11.3
  Platform Info:
    OS: Linux (x86_64-linux-gnu)
    CPU: 128 × Intel(R) Xeon(R) Gold 6142
  JULIA_NUM_THREADS: 4 (default), 128 virtual cores
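For anyone who wants to compare against their own setup, the checks above can be gathered into one small report. This is only a sketch using the `LinearAlgebra` standard library; the helper name `thread_report` is made up for illustration, and it deliberately does not call into MKL directly, so it runs whether or not `MKL.jl` is loaded.

```julia
using LinearAlgebra

# Collect the thread-related settings visible from inside the Julia session.
# `thread_report` is a hypothetical helper name, not part of any package.
function thread_report()
    return (
        julia_threads = Threads.nthreads(),
        blas_threads  = BLAS.get_num_threads(),
        # Libraries currently serving BLAS/LAPACK calls through libblastrampoline
        blas_libs     = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs],
        # Environment variable as seen by this process ("unset" if absent)
        mkl_env       = get(ENV, "MKL_NUM_THREADS", "unset"),
        cpu_threads   = Sys.CPU_THREADS,
    )
end

BLAS.set_num_threads(1)
report = thread_report()
for (name, value) in pairs(report)
    println(rpad(name, 16), value)
end
```

Running this once with and once without `export MKL_NUM_THREADS=1` should make it easy to see which of the settings actually differ between the slow and the fast case.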
Questions:
- Thread Priority Conflict:
  Why does MKL ignore `BLAS.set_num_threads(1)` and default to 128 threads unless restricted by `MKL_NUM_THREADS=1`?
- Optimal Thread Configuration:
  The ITensor docs warn about conflicts between sparse multithreading and BLAS. Should `MKL_NUM_THREADS=1` always be enforced, or is there a scenario where `MKL_NUM_THREADS=n` (with `n < Julia threads`) improves performance?
- Environment Variables:
  Are there additional variables (e.g., `OPENBLAS_NUM_THREADS`, `JULIA_EXCLUSIVE=1`) or Julia-specific settings (e.g., `LinearAlgebra.BLAS.set_num_threads` vs. `MKL.jl`) that should be prioritized for thread control?
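To make the configuration question concrete, here is a rough sketch of the in-script equivalent of `export MKL_NUM_THREADS=1`. It rests on two assumptions I have not verified: that MKL only reads the variable when the library is first loaded (so it has to be set before `using MKL`), and that the worst case is every Julia thread calling into BLAS at the same time. The function `worst_case_threads` is a hypothetical helper for the arithmetic, not an existing API.

```julia
# Assumption: this must happen before the MKL library initializes,
# i.e. before `using MKL`; it mirrors `export MKL_NUM_THREADS=1` in the shell.
ENV["MKL_NUM_THREADS"] = "1"

using LinearAlgebra
# using MKL   # uncomment when MKL.jl is installed

BLAS.set_num_threads(1)

# Rough upper bound on runnable threads if every Julia thread enters BLAS at once.
# Hypothetical helper, for illustration only.
worst_case_threads(julia_threads, blas_threads) = julia_threads * blas_threads

println("worst case:       ", worst_case_threads(Threads.nthreads(), BLAS.get_num_threads()))
println("hardware threads: ", Sys.CPU_THREADS)
```

Under that worst-case assumption, `-t8` with an MKL maximum of 128 gives `worst_case_threads(8, 128)`, far more than the 128 hardware threads available, which would at least be consistent with the oversubscription I see in `top`.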
Thanks for any guidance on resolving threading conflicts and optimizing MKL/Julia configurations!