I’m encountering conflicting threading behavior when running DMRG calculations with Julia (v1.11.3) and MKL. Despite efforts to limit thread counts, CPU usage spikes to ~2000% with degraded performance. Here are key observations and questions:
Observations:
- High CPU Usage, Poor Performance:
  - With `-t8` (Julia threads) and `BLAS.set_num_threads(1)` in the script, `BLAS.get_num_threads()` returns 1, but `top` shows CPU usage at ~2000%.
  - DMRG sweep speeds resemble those with `disable_threaded_blocksparse()`, and processes frequently toggle between `Sleep`/`Running` states.
  - Resolved by setting `export MKL_NUM_THREADS=1` (CPU usage drops to ~300%, performance improves as expected).
- MKL Thread Confirmation:
  `println(ccall((:MKL_Get_Max_Threads, MKL.libmkl_rt), Cint, ()))` returns 128 (max allowed) unless `MKL_NUM_THREADS=1` is set (then returns 1).
- Environment Details:

      julia> versioninfo()
      Julia Version 1.11.3
      Platform Info:
        OS: Linux (x86_64-linux-gnu)
        CPU: 128 × Intel(R) Xeon(R) Gold 6142
      JULIA_NUM_THREADS: 4 (default), 128 virtual cores
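For anyone trying to reproduce this, the in-session checks described above could be gathered along the following lines. This is only a minimal sketch: it relies on the `LinearAlgebra` standard library alone, the helper name `thread_report` is made up for illustration, and the MKL-specific `ccall` quoted above is left out so that the snippet also runs without MKL.jl installed.

```julia
using LinearAlgebra

# Hypothetical helper (not part of ITensor or MKL.jl): collects the thread
# settings that are visible from inside the running Julia session.
function thread_report()
    return (
        julia_threads = Threads.nthreads(),       # set via `-t` or JULIA_NUM_THREADS
        blas_threads  = BLAS.get_num_threads(),   # what LinearAlgebra reports
        # Names of the BLAS/LAPACK backends currently loaded through libblastrampoline
        blas_libs     = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs],
        mkl_env       = get(ENV, "MKL_NUM_THREADS", "unset"),
        openblas_env  = get(ENV, "OPENBLAS_NUM_THREADS", "unset"),
    )
end

BLAS.set_num_threads(1)   # same call as in the script described above
report = thread_report()
for (name, value) in pairs(report)
    println(rpad(string(name), 14), " => ", value)
end
```

If `using MKL` is added before the report is taken, `blas_libs` should list the MKL runtime, and the `MKL_Get_Max_Threads` value quoted above can then be compared directly against `blas_threads`.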
Questions:
- Thread Priority Conflict:
  Why does MKL ignore `BLAS.set_num_threads(1)` and default to 128 threads unless restricted by `MKL_NUM_THREADS=1`?
- Optimal Thread Configuration:
  The ITensor docs warn about conflicts between sparse multithreading and BLAS. Should `MKL_NUM_THREADS=1` always be enforced, or is there a scenario where `MKL_NUM_THREADS=n` (with `n` < Julia threads) improves performance?
- Environment Variables:
  Are there additional variables (e.g., `OPENBLAS_NUM_THREADS`, `JULIA_EXCLUSIVE=1`) or Julia-specific settings (e.g., `LinearAlgebra.BLAS.set_num_threads` vs. `MKL.jl`) that should be prioritized for thread control?
Thanks for any guidance on resolving threading conflicts and optimizing MKL/Julia configurations!