Distributed.jl + Threads
ThreadPinning.jl has dedicated support for pinning the Julia threads of Julia workers (Distributed.jl) in multi-processing applications; see Querying - Distributed.jl and Pinning - Distributed.jl. You can use these tools regardless of whether your parallel application is pure (i.e. each Julia worker runs a single Julia thread) or hybrid (i.e. each Julia worker runs multiple Julia threads).
Basic example
julia> using Distributed
julia> withenv("JULIA_NUM_THREADS" => 2) do
    addprocs(4) # spawn 4 workers with 2 threads each
end
4-element Vector{Int64}:
2
3
4
5
julia> @everywhere using ThreadPinning
julia> distributed_getcpuids()
Dict{Int64, Vector{Int64}} with 4 entries:
5 => [246, 185]
4 => [198, 99]
2 => [135, 226]
3 => [78, 184]
julia> distributed_getispinned() # none pinned yet
Dict{Int64, Vector{Bool}} with 4 entries:
5 => [0, 0]
4 => [0, 0]
2 => [0, 0]
3 => [0, 0]
julia> distributed_pinthreads(:sockets) # pin to sockets (round-robin)
julia> distributed_getispinned() # all pinned
Dict{Int64, Vector{Bool}} with 4 entries:
5 => [1, 1]
4 => [1, 1]
2 => [1, 1]
3 => [1, 1]
julia> distributed_getcpuids()
Dict{Int64, Vector{Int64}} with 4 entries:
5 => [66, 67]
4 => [2, 3]
2 => [0, 1]
3 => [64, 65]
julia> socket(1, 1:4), socket(2, 1:4) # check
([0, 1, 2, 3], [64, 65, 66, 67])
julia> distributed_pinthreads(:numa) # pin to numa domains (round-robin)
julia> distributed_getcpuids()
Dict{Int64, Vector{Int64}} with 4 entries:
5 => [48, 49]
4 => [32, 33]
2 => [0, 1]
3 => [16, 17]
julia> numa(1, 1:2), numa(2, 1:2), numa(3, 1:2), numa(4, 1:2) # check
([0, 1], [16, 17], [32, 33], [48, 49])
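The manual "check" steps above compare the reported CPU IDs against the CPU IDs of each socket or NUMA domain by eye. A minimal sketch of automating that comparison follows. The helper name `is_round_robin` is hypothetical and not part of ThreadPinning.jl, and the sketch assumes that workers are assigned to domains in order of increasing worker ID, which matches the output shown above but is not confirmed here as documented behavior. It uses only Base Julia and operates on plain data, so it runs without any workers.

```julia
# Hypothetical helper (not part of ThreadPinning.jl): returns true if the
# i-th worker (in increasing worker-ID order) has all of its threads on
# CPU IDs belonging to the i-th domain, wrapping around round-robin.
function is_round_robin(cpuids::AbstractDict, domains::AbstractVector)
    worker_ids = sort(collect(keys(cpuids)))
    return all(enumerate(worker_ids)) do (i, w)
        # mod1 wraps 1-based indices: with 2 domains, i = 3 maps back to 1
        domain = domains[mod1(i, length(domains))]
        issubset(cpuids[w], domain)
    end
end

# Data copied from the :sockets example above
cpuids  = Dict(5 => [66, 67], 4 => [2, 3], 2 => [0, 1], 3 => [64, 65])
sockets = [[0, 1, 2, 3], [64, 65, 66, 67]]

println(is_round_robin(cpuids, sockets)) # prints true
```

In a live session, the first argument could presumably be the result of `distributed_getcpuids()` and the second could be built from calls such as `socket(1, 1:4)` and `socket(2, 1:4)`, as in the check above.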