Hi Trevor,

It's probably due to memory bandwidth limitations, as well as Intel's design.

Intel managed to get quad cores to market by gluing together two dual-core chips. All communication between them has to go over the front side bus, though, and all eight cores in a system share the bandwidth to memory.

This becomes a problem when you're running in parallel, since all eight processes are communicating (i.e., using bus bandwidth) at once and have to share it. You will probably get much better performance by running multiple (8) independent simulations, along the lines of the sketch below.
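For instance, something like this rough sketch (the sim0.tpr ... sim7.tpr inputs are placeholders for eight separately prepared runs, and taskset is only there as a hint to keep each job on its own core; the mdrun flags are the standard input/output options):

    # Eight independent single-core mdrun jobs, one per core.
    # sim0.tpr .. sim7.tpr are hypothetical, separately prepared inputs.
    for i in 0 1 2 3 4 5 6 7; do
        taskset -c $i mdrun -s sim$i.tpr -o sim$i.trr -e sim$i.edr -g sim$i.log &
    done
    wait    # return only when all eight runs have finished

Since the jobs never talk to each other, they scale essentially perfectly; the only resource they still compete for is memory bandwidth.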
Essentially, there's no such thing as a free lunch. Intel's quad-core chips are cheap, but they have the same drawback as their first-generation dual-core chips. AMD's Barcelona, with true quad cores and on-chip memory controllers, is looking a whole lot better, but I also expect it to be quite a bit more expensive.

You might also want to test the CVS version for better scaling; it communicates less data, which might improve performance a bit for you.

Cheers,

Erik


On May 27, 2007, at 6:28 PM, Trevor Marshall wrote:

Can anybody give me any ideas which might help me optimize my new cluster for a more linear speed increase as I add computing cores? The new Intel Core2 CPUs are inherently very fast, but my mdrun simulation performance is becoming asymptotic to a value only about twice the speed I can get from a single core.

I have included the log output from mdrun_mpi when using five cores at the foot of this email. But here is the system overview.

My cluster comprises two computers running Fedora Core 6 and MPI-GAMMA. Both have Intel Core2 CPUs running at a 3 GHz core speed (overclocked). The main machine now has a sparkling new quad-core Core2 Quad CPU and the remote machine still has a dual-core Core2 Duo CPU.

Networking hardware is crossover CAT6 cables. The GAMMA software is connected through one Intel PRO/1000 board in each computer, with MTU 9000. A Gigabit adapter with a Realtek chipset is the primary Linux network in each machine, with MTU 1500. For the common filesystem I am running NFS with "async" declared in the exports file: /dev/hde1 is mounted at /media, and /media is then exported via NFS to the cluster machine. File I/O does not seem to be a bottleneck.
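In concrete terms the setup is roughly the following (the eth1 interface name and the client subnet here are illustrative, not the exact values from my machines):

    # /etc/exports on the main machine -- "async" acknowledges writes
    # before they reach the disk, trading safety for speed
    /media  192.168.0.0/24(rw,async,no_subtree_check)

    # Jumbo frames on the Intel PRO/1000 link used by GAMMA
    # (the Realtek adapter stays at the default MTU of 1500)
    ifconfig eth1 mtu 9000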
With mdrun_mpi I am calculating a 240-residue protein and a ligand for 10,000 time steps. Here are the results for various combinations of one, two, three, four and five cores:

    One local core only (mdrun):               18.3 hr/ns    2.61 GFlops
    Two local cores:                            9.98 hr/ns   4.83 GFlops
    Three local cores:                          7.35 hr/ns   6.65 GFlops
    Four local cores (one also controlling):    7.72 hr/ns   6.42 GFlops
    Three local cores + two remote cores:       7.59 hr/ns   6.72 GFlops
    One local core + two remote cores:          9.76 hr/ns   5.02 GFlops

I get good performance with one local core doing control and three doing calculations, giving 6.66 GFlops. However, adding two extra remote cores only increases the speed a very small amount, to 6.72 GFlops, even though the log (below) shows good task distribution (I think).

Is there some problem with scaling when using these new fast CPUs? Can I tweak anything in mdrun_mpi to give better scaling?

Sincerely,
Trevor
------------------------------------------
Trevor G Marshall, PhD
School of Biological Sciences and Biotechnology, Murdoch University, Western Australia
Director, Autoimmunity Research Foundation, Thousand Oaks, California
Patron, Australian Autoimmunity Foundation
------------------------------------------

        M E G A - F L O P S   A C C O U N T I N G

   Parallel run - timing based on wallclock.
   RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
   T=Tabulated  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
   NF=No Forces

 Computing:                         M-Number         M-Flops   % of Flops
 -----------------------------------------------------------------------
 LJ                               928.067418    30626.224794        1.1
 Coul(T)                          886.762558    37244.027436        1.4
 Coul(T) [W3]                      92.882138    11610.267250        0.4
 Coul(T) + LJ                     599.004388    32945.241340        1.2
 Coul(T) + LJ [W3]                243.730360    33634.789680        1.2
 Coul(T) + LJ [W3-W3]            3292.173000  1257610.086000       45.6
 Outer nonbonded loop             945.783063     9457.830630        0.3
 1,4 nonbonded interactions        41.184118     3706.570620        0.1
 Spread Q Bspline               51931.592640   103863.185280        3.8
 Gather F Bspline               51931.592640   623179.111680       22.6
 3D-FFT                         40498.449440   323987.595520       11.7
 Solve PME                       3000.300000   192019.200000        7.0
 NS-Pairs                        1044.424912    21932.923152        0.8
 Reset In Box                      24.064040      216.576360        0.0
 Shift-X                          961.696160     5770.176960        0.2
 CG-CoM                             8.242234      239.024786        0.0
 Sum Forces                       721.272120      721.272120        0.0
 Bonds                             25.022502     1075.967586        0.0
 Angles                            36.343634     5924.012342        0.2
 Propers                           13.411341     3071.197089        0.1
 Impropers                         12.171217     2531.613136        0.1
 Virial                           241.774175     4351.935150        0.2
 Ext.ens. Update                  240.424040    12982.898160        0.5
 Stop-CM                          240.400000     2404.000000        0.1
 Calc-Ekin                        240.448080     6492.098160        0.2
 Constraint-V                     240.424040     1442.544240        0.1
 Constraint-Vir                   215.884746     5181.233904        0.2
 Settle                            71.961582    23243.590986        0.8
 -----------------------------------------------------------------------
 Total                                        2757465.194361      100.0
 -----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:    408.000    408.000    100.0
                        6:48
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     14.810      6.758      3.176      7.556

   Detailed load balancing info in percentage of average

 Type                  NODE:    0    1    2    3    4   Scaling
 ---------------------------------------------------------------
                         LJ:  423    0    3   41   32     23%
                    Coul(T):  500    0    0    0    0     20%
               Coul(T) [W3]:    0    0   32  291  176     34%
               Coul(T) + LJ:  500    0    0    0    0     20%
          Coul(T) + LJ [W3]:    0    0   24  296  178     33%
       Coul(T) + LJ [W3-W3]:   60  116  108  106  107     86%
       Outer nonbonded loop:  246   42   45   79   85     40%
 1,4 nonbonded interactions:  500    0    0    0    0     20%
           Spread Q Bspline:   98  100  102  100   97     97%
           Gather F Bspline:   98  100  102  100   97     97%
                     3D-FFT:  100  100  100  100  100    100%
                  Solve PME:  100  100  100  100  100    100%
                   NS-Pairs:  107   96   91  103  100     93%
               Reset In Box:   99  100  100  100   99     99%
                    Shift-X:   99  100  100  100   99     99%
                     CG-CoM:  110   97   97   97   97     90%
                 Sum Forces:  100  100  100   99   99     99%
                      Bonds:  499    0    0    0    0     20%
                     Angles:  500    0    0    0    0     20%
                    Propers:  499    0    0    0    0     20%
                  Impropers:  500    0    0    0    0     20%
                     Virial:   99  100  100  100   99     99%
            Ext.ens. Update:   99  100  100  100   99     99%
                    Stop-CM:   99  100  100  100   99     99%
                  Calc-Ekin:   99  100  100  100   99     99%
               Constraint-V:   99  100  100  100   99     99%
             Constraint-Vir:   54  111  111  111  111     89%
                     Settle:   54  111  111  111  111     89%

                Total Force:   93  102   97  104  102     95%

                Total Shake:   56  110  110  110  110     90%

       Total Scaling: 95% of max performance

 Finished mdrun on node 0  Sun May 27 07:29:57 2007