Can anybody give me ideas that might help me optimize my new cluster for a more linear speed increase as I add computing cores? The new Intel Core2 CPUs are inherently very fast, but my mdrun simulation performance is levelling off at only about two and a half times the speed I can get from a single core.

I have included the log output from mdrun_mpi when using 5 cores at the foot of this email, but first, here is the system overview.

My cluster comprises two computers running Fedora Core 6 and MPI/GAMMA. Both have Intel Core2 CPUs running at a 3GHz core speed (overclocked). The main machine now has a sparkling new quad-core Core2 Quad CPU, and the remote machine still has a dual-core Core2 Duo CPU.

The networking hardware is crossover CAT6 cables. The GAMMA traffic goes through one Intel PRO/1000 board in each computer, with MTU 9000. A Gigabit adapter with a Realtek chipset is the primary Linux network in each machine, with MTU 1500. For the shared filesystem I am running NFS with "async" declared in the exports file: /dev/hde1 is mounted at /media, and /media is then exported via NFS to the remote machine. File I/O does not seem to be a bottleneck.
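
For reference, the export and jumbo-frame settings look roughly like this (the subnet and interface name below are illustrative, not my exact values):

    # /etc/exports on the main machine -- "async" trades write safety for speed
    /media   192.168.1.0/24(rw,async,no_subtree_check)

    # jumbo frames on the GAMMA interface (eth1 is illustrative)
    ifconfig eth1 mtu 9000
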
With mdrun_mpi I am simulating a 240-residue protein plus ligand for 10,000 time steps. Here are the results for various combinations of one, two, three, four and five cores.

One local core only running mdrun:        18.3 hr/ns    2.61 GFlops
Two local cores:                          9.98 hr/ns    4.83 GFlops
Three local cores:                        7.35 hr/ns    6.65 GFlops
Four local cores (one also controlling):  7.72 hr/ns    6.42 GFlops
Three local cores and two remote cores:   7.59 hr/ns    6.72 GFlops
One local and two remote cores:           9.76 hr/ns    5.02 GFlops
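
To make the falloff explicit, here is the same data expressed as speedup over the single-core run (speedup = T1/Tn, efficiency = speedup/cores):

    2 cores:  18.3 / 9.98 = 1.83x   (92% efficiency)
    3 cores:  18.3 / 7.35 = 2.49x   (83% efficiency)
    4 cores:  18.3 / 7.72 = 2.37x   (59% efficiency)
    5 cores:  18.3 / 7.59 = 2.41x   (48% efficiency)
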
I get good performance with one local core doing control and three doing calculations, giving 6.66 GFlops. However, adding two extra remote cores only increases the speed a very small amount, to 6.72 GFlops, even though the log (below) shows good task distribution (I think).

Is there some problem with scaling when using these new fast CPUs? Can I tweak anything in mdrun_mpi to give better scaling?
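
In case it helps, the runs are launched along these lines (the file names are illustrative; with GROMACS 3.x the run input is pre-processed for the same core count with grompp -np):

    grompp -np 5 -f run.mdp -c conf.gro -p topol.top -o run.tpr
    mpirun -np 5 mdrun_mpi -np 5 -s run.tpr
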
Sincerely
Trevor
------------------------------------------
Trevor G Marshall, PhD
School of Biological Sciences and Biotechnology, Murdoch University, Western Australia
Director, Autoimmunity Research Foundation, Thousand Oaks, California
Patron, Australian Autoimmunity Foundation.
------------------------------------------

        M E G A - F L O P S   A C C O U N T I N G

        Parallel run - timing based on wallclock.
   RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
   T=Tabulated  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
   NF=No Forces

Computing:                        M-Number         M-Flops   % of Flops
-----------------------------------------------------------------------
LJ                              928.067418     30626.224794     1.1
Coul(T)                         886.762558     37244.027436     1.4
Coul(T) [W3]                     92.882138     11610.267250     0.4
Coul(T) + LJ                    599.004388     32945.241340     1.2
Coul(T) + LJ [W3]               243.730360     33634.789680     1.2
Coul(T) + LJ [W3-W3]           3292.173000   1257610.086000    45.6
Outer nonbonded loop            945.783063      9457.830630     0.3
1,4 nonbonded interactions       41.184118      3706.570620     0.1
Spread Q Bspline              51931.592640    103863.185280     3.8
Gather F Bspline              51931.592640    623179.111680    22.6
3D-FFT                        40498.449440    323987.595520    11.7
Solve PME                      3000.300000    192019.200000     7.0
NS-Pairs                       1044.424912     21932.923152     0.8
Reset In Box                     24.064040       216.576360     0.0
Shift-X                         961.696160      5770.176960     0.2
CG-CoM                            8.242234       239.024786     0.0
Sum Forces                      721.272120       721.272120     0.0
Bonds                            25.022502      1075.967586     0.0
Angles                           36.343634      5924.012342     0.2
Propers                          13.411341      3071.197089     0.1
Impropers                        12.171217      2531.613136     0.1
Virial                          241.774175      4351.935150     0.2
Ext.ens. Update                 240.424040     12982.898160     0.5
Stop-CM                         240.400000      2404.000000     0.1
Calc-Ekin                       240.448080      6492.098160     0.2
Constraint-V                    240.424040      1442.544240     0.1
Constraint-Vir                  215.884746      5181.233904     0.2
Settle                           71.961582     23243.590986     0.8
-----------------------------------------------------------------------
Total                                        2757465.194361   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)       (%)
Time:           408.000    408.000     100.0
                           6:48
               (Mnbf/s)   (GFlops)  (ns/day)  (hour/ns)
Performance:     14.810      6.758     3.176      7.556

Detailed load balancing info in percentage of average

Type                        NODE:    0    1    2    3    4  Scaling
---------------------------------------------------------------------
LJ:                                423    0    3   41   32      23%
Coul(T):                           500    0    0    0    0      20%
Coul(T) [W3]:                        0    0   32  291  176      34%
Coul(T) + LJ:                      500    0    0    0    0      20%
Coul(T) + LJ [W3]:                   0    0   24  296  178      33%
Coul(T) + LJ [W3-W3]:               60  116  108  106  107      86%
Outer nonbonded loop:              246   42   45   79   85      40%
1,4 nonbonded interactions:        500    0    0    0    0      20%
Spread Q Bspline:                   98  100  102  100   97      97%
Gather F Bspline:                   98  100  102  100   97      97%
3D-FFT:                            100  100  100  100  100     100%
Solve PME:                         100  100  100  100  100     100%
NS-Pairs:                          107   96   91  103  100      93%
Reset In Box:                       99  100  100  100   99      99%
Shift-X:                            99  100  100  100   99      99%
CG-CoM:                            110   97   97   97   97      90%
Sum Forces:                        100  100  100   99   99      99%
Bonds:                             499    0    0    0    0      20%
Angles:                            500    0    0    0    0      20%
Propers:                           499    0    0    0    0      20%
Impropers:                         500    0    0    0    0      20%
Virial:                             99  100  100  100   99      99%
Ext.ens. Update:                    99  100  100  100   99      99%
Stop-CM:                            99  100  100  100   99      99%
Calc-Ekin:                          99  100  100  100   99      99%
Constraint-V:                       99  100  100  100   99      99%
Constraint-Vir:                     54  111  111  111  111      89%
Settle:                             54  111  111  111  111      89%

Total Force:                        93  102   97  104  102      95%

Total Shake:                        56  110  110  110  110      90%

Total Scaling: 95% of max performance

Finished mdrun on node 0 Sun May 27 07:29:57 2007