Julia Threads + BLAS Threads
This page is concerned with the performance and pinning issues that can occur if you run a multithreaded Julia code that, on each thread, performs linear algebra operations (BLAS/LAPACK calls). In this case, one must ensure that cores aren't oversubscribe due to the two levels of multithreading.
Relevant discourse threads, see here and here.
OpenBLAS
If
OPENBLAS_NUM_THREADS=1, OpenBLAS uses the calling Julia thread(s) to run BLAS computations, i.e. it "reuses" the Julia thread that runs a computation.If
OPENBLAS_NUM_THREADS=N>1, OpenBLAS creates and manages its own pool of BLAS threads (Nin total). There is one BLAS thread pool (for all Julia threads).Julia default:
OPENBLAS_NUM_THREADS=8(Julia version ≤ 1.8) andOPENBLAS_NUM_THREADS=Sys.CPU_THREADS(Julia version ≥ 1.8).
When you start Julia in multithreaded mode, i.e. julia -tX or JULIA_NUM_THREADS=X, it is generally recommended to set OPENBLAS_NUM_THREADS=1 or, equivalently, BLAS.set_num_threads(1). Given the behavior above, increasing the number of BLAS threads to N>1 can very easily lead to worse performance, in particular when N<<X! Hence, if you want to or need to deviate from unity, make sure to "jump" from OPENBLAS_NUM_THREADS=1 to OPENBLAS_NUM_THREADS=# of cores or similar.
Intel MKL
Given
MKL_NUM_THREADS=N, MKL startsNBLAS threads per Julia thread that makes a BLAS call.Default:
MKL_NUM_THREADS=# of physical cores, i.e. excluding hyperthreads. (Verified experimentally but would be good to find a source for this.)
When you start Julia in multithreaded mode, i.e. julia -tX or JULIA_NUM_THREADS=X, we recommend to set MKL_NUM_THREADS=(# of cores)/X or, equivalently, BLAS.set_num_threads((# of cores)/X) (after using MKL). Unfortunately, the default is generally suboptimal as soon as you don't run Julia with a single thread. Hence, make sure to tune the settings appropriately.
Side comment: It is particularly bad / confusing that OpenBLAS and MKL behave very differently for multithreaded Julia.
Be aware that calling an MKL function (for the first time) can spoil the pinning of Julia threads! A concrete example is discussed here. TLDR: You want to make sure that MKL_DYNAMIC=false. Apart from setting the environment variable you can also dynamically call ThreadPinning.mkl_set_dynamic(0). Note that, by default, ThreadPinning.jl will warn you if you call one of the pinning functions while MKL_DYNAMIC=true.
threadinfo(; blas=true, hints=true)
To automatically detect whether you (potentially) have suboptimal BLAS thread settings, you can provide the keyword arguments blas=true and hints=true to threadinfo. An example can be found here.