Core-to-Core Latency

Let's measure the inter-core latencies of one of the compute nodes of Noctua 1 at PC2.

using ThreadPinning
latencies = ThreadPinning.bench_core2core_latency()
40×40 Matrix{Float64}:
   0.0   217.05  204.85  206.0   203.3   204.95  211.7   205.2   209.5   210.1   209.65  209.3   198.7   194.95  …  271.1   267.5   265.0   260.85  266.9   267.15  265.8   266.7   265.55  258.85  262.1   263.95  269.4
 215.55    0.0   214.65  215.15  219.8   222.1   219.55  217.1   223.1   224.7   220.25  219.45  213.2   214.6      266.6   269.45  269.85  270.6   271.25  271.1   267.45  265.65  263.15  259.85  260.85  263.2   267.45
 224.2   214.75    0.0   216.05  217.35  219.25  216.45  212.95  219.15  224.4   221.45  219.65  214.55  215.75     270.35  272.55  270.95  275.0   272.15  272.95  267.5   264.15  260.35  263.2   260.8   262.2   264.65
 218.4   216.7   211.9     0.0   220.05  218.5   213.2   215.35  225.85  226.7   220.15  218.7   218.6   216.1      266.85  265.75  266.0   265.8   264.7   265.25  259.7   260.9   260.25  258.6   259.6   262.75  262.0
 221.95  218.5   217.25  212.7     0.0   221.6   220.15  223.75  226.15  224.0   219.45  220.2   214.35  219.3      264.55  267.0   262.6   262.35  264.2   262.2   262.2   263.25  262.4   262.35  264.3   263.7   262.55
 219.85  212.5   214.6   216.25  218.75    0.0   221.5   221.45  222.6   223.8   227.35  222.8   217.95  221.55  …  265.75  267.95  263.8   264.5   265.4   262.95  265.7   264.55  261.9   263.7   265.25  259.95  261.35
 219.15  214.0   214.65  217.8   218.85  217.9     0.0   217.15  227.75  225.6   224.05  217.4   216.8   215.15     266.85  269.95  267.85  264.1   262.55  266.15  267.6   267.1   266.25  263.75  260.95  264.4   267.9
   ⋮                                       ⋮                                       ⋮                             ⋱                            ⋮                                       ⋮
 269.75  265.85  263.4   265.65  265.0   266.8   265.8   264.15  261.35  258.3   262.65  264.45  265.5   268.05     216.55  221.35  219.5   221.05  220.1   211.45    0.0   219.05  219.45  214.95  213.8   212.75  214.9
 267.95  265.7   262.2   262.75  262.0   266.35  264.1   260.45  257.4   264.05  268.05  259.85  264.6   265.4      218.6   225.85  226.8   219.3   220.75  215.7   215.95    0.0   221.05  218.35  214.5   214.2   217.3
 265.65  262.7   263.2   261.8   261.7   260.8   260.95  257.55  259.15  262.05  264.95  263.1   259.55  259.75  …  221.15  223.75  222.8   226.45  226.25  221.05  221.8   219.6     0.0   215.2   216.55  220.45  222.2
 264.4   263.0   265.85  263.6   265.35  257.25  254.2   258.5   261.6   259.95  259.45  262.3   262.65  259.25     219.0   217.95  218.6   223.2   220.75  215.1   215.2   218.25  216.35    0.0   215.75  216.8   220.15
 258.75  262.2   264.2   262.55  262.4   262.6   259.05  258.65  257.5   259.4   265.45  260.1   260.2   261.6      217.4   220.85  219.15  218.05  214.5   214.15  215.8   224.5   217.2   217.35    0.0   218.75  222.8
 264.5   263.35  257.0   262.9   258.65  264.95  266.05  260.75  259.15  264.8   263.95  265.5   267.3   265.35     221.95  222.65  224.25  221.05  220.95  216.5   220.25  220.5   217.2   218.5   218.05    0.0   223.1
 266.6   266.35  262.65  262.2   264.45  267.2   266.8   264.25  263.75  262.75  264.5   266.55  267.4   271.6      223.9   226.0   225.6   228.65  225.3   219.85  218.25  220.55  220.75  217.1   220.2   225.15    0.0

Of course, it is easier to make sense of the result if we visualize it. Here, we use Plots.jl's heatmap function.

using Plots
heatmap(latencies; c = :viridis, frame=:box)

core2core.png

The two sockets / CPUs of the system with 20 cores each are clearly visible since the inter-core latency of cores on different sockets is, expectedly, higher than the same for cores sitting on the same socket / in the same CPU. Note that due to fluctuations in our imperfect benchmark the result is not precisely symmetric (which, of course, it should be in theory).