- I am running several independent C++ ITensor (v3) programs on my Linux (Ubuntu) machine, which has 64 CPUs.
- Is there a way to run each program on a specific number of CPUs?
- For example, if I have 8 jobs, each should occupy 8 CPUs and not interfere with the others.
- I am asking because I noticed that, without control over how the programs spread across the CPUs, their speed drops drastically.
Below is the output from compiling the code on my machine.
g++ -m64 -std=c++17 -fconcepts -fPIC -c -I. -I'/home/pgi3/itensor_new' -O2 -DNDEBUG -Wall -Wno-unknown-pragmas -Wno-unused-variable -o main.o main.cc
g++ -m64 -std=c++17 -fconcepts -fPIC -I. -I'/home/pgi3/itensor_new' -O2 -DNDEBUG -Wall -Wno-unknown-pragmas -Wno-unused-variable main.o -o main -L'/home/pgi3/itensor_new/lib' -litensor -L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_rt -lmkl_core -liomp5 -lpthread
When you say you are running 8 jobs, do you mean on a shared computer cluster? Or on a single machine (your machine, say)?
If on a cluster, then you will need to configure your job submission script to request a certain number of CPUs. For example, if your cluster uses the Slurm system for managing jobs, there are flags or options you can pass to control this, such as --cpus-per-task.
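As a rough illustration of the cluster case, a Slurm batch script along the following lines requests 8 cores for one process. This is only a sketch: the job name is hypothetical, ./main is assumed to be the executable from the compile log above, and your cluster may require extra options (partition, time limit, account) that are not shown.

```shell
#!/bin/bash
#SBATCH --job-name=itensor_job   # hypothetical job name
#SBATCH --ntasks=1               # one process per job
#SBATCH --cpus-per-task=8        # cores reserved for that process

# Match the thread counts to the reserved cores.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"
export MKL_NUM_THREADS=1

# ./main is assumed to be the executable from the compile log above.
srun ./main
```

Submitting this script 8 times with sbatch would give 8 independent jobs, each with its own 8 cores.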
If on your own machine, then you are asking about multithreading and using a certain number of CPU cores. This is controlled by environment variables. The main ones to know about are those controlling the multithreading behavior of BLAS, such as MKL_NUM_THREADS (since I see you’re using MKL). The other kind of multithreading present in ITensor is done using OpenMP and runs over the separate non-zero blocks (if present) that arise from conserved quantum numbers and symmetries. You can control the amount of this multithreading by setting the variable OMP_NUM_THREADS. A best practice for this situation is to set MKL_NUM_THREADS=1 if you are setting OMP_NUM_THREADS to something greater than 1, so that the two kinds of threading do not compete for the same cores.
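For the single-machine case, one possible way to combine these variables with the "8 jobs on 8 CPUs each" layout from the question is to pin each job to its own block of cores with the Linux taskset utility (from util-linux). taskset is not mentioned above, so treat it as one option and not the recommended method. The sketch below assumes the executable is ./main and that logical CPU ids 0 to 63 can be split into contiguous blocks; the helper names core_range and run_pinned are hypothetical. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Sketch: run independent jobs, each pinned to its own block of cores.
# Assumptions: 8 jobs x 8 cores, binary ./main, contiguous CPU ids.

# Print the core list for job $1 (0-based) with $2 cores per job, e.g. "8-15".
core_range() {
    cr_first=$(( $1 * $2 ))
    cr_last=$(( cr_first + $2 - 1 ))
    echo "${cr_first}-${cr_last}"
}

# Print (DRY_RUN=1, the default) or start in the background (DRY_RUN=0)
# one pinned job.  $1 = job index, $2 = cores per job, rest = command.
run_pinned() {
    rp_per_job=$2
    rp_cores=$(core_range "$1" "$2")
    shift 2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "OMP_NUM_THREADS=$rp_per_job MKL_NUM_THREADS=1 taskset -c $rp_cores $*"
    else
        OMP_NUM_THREADS=$rp_per_job MKL_NUM_THREADS=1 taskset -c "$rp_cores" "$@" &
    fi
}

# Jobs 0..7, 8 cores each: prints the 8 commands unless DRY_RUN=0 is set.
for job in 0 1 2 3 4 5 6 7; do
    run_pinned "$job" 8 ./main
done
wait   # with DRY_RUN=0, wait for all background jobs to finish
```

Running the script with DRY_RUN=0 in the environment starts the jobs for real. The thread counts follow the OMP_NUM_THREADS and MKL_NUM_THREADS advice above. If your tensors have no block structure, you may prefer to give the 8 threads to MKL instead.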
Lastly, if you are talking about a single machine, running 8 separate processes may simply not give you eight times the throughput of a single process. CPU cores share a lot of cache memory and other resources, such as memory bandwidth, and multithreading often does not offer ideal speedups beyond a certain point.
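To see how far a given setup is from ideal scaling, you can time the same run with 1 thread and with n threads and compare. The helper below is a small sketch of that arithmetic. The function name scaling is hypothetical, and the timings in the example call are made-up illustrative numbers, not measurements.

```shell
#!/bin/sh
# Sketch: speedup and parallel efficiency from two wall-clock timings.
# $1 = time with 1 thread, $2 = time with n threads, $3 = n
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" \
        'BEGIN { s = t1 / tn; printf "speedup=%.2f efficiency=%.2f\n", s, s / n }'
}

# Illustrative numbers only: 100 s on 1 thread, 25 s on 8 threads.
scaling 100 25 8
```

An efficiency well below 1 suggests that adding more threads to each job is giving diminishing returns.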
Many thanks for your fast and detailed reply.