[gmx-users] GROMACS not scaling well with Core4 Quad technology CPUs

Erik Lindahl lindahl at cbr.su.se
Mon May 28 08:57:22 CEST 2007


Hi,

On May 28, 2007, at 1:59 AM, Trevor Marshall wrote:

> Erik,
> I also have older systems which use Opteron 165 CPUs. I have run  
> tests of the AMD Opteron 165 CPUs (2.18GHz) against the Intel Core2  
> Duos (3GHz). Twelve concurrent AutoDock jobs on each machine show  
> the Core2 Duos outperforming the Opterons by a factor of two.

Yes, but are those AutoDock jobs MPI-parallel or just multiple  
independent scalar jobs not communicating between the cores?

Gromacs also provides beautiful performance (close to 100% scaling)  
if you run e.g. 8 independent jobs on a dual quad-core box.


> The data I posted showed inconsistencies which have nothing to do  
> with memory bandwidth, and I was rather hoping for an analysis  
> based upon the manner in which GROMACS mdrun distributes its  
> computing tasks.

Gromacs isn't doing the distribution. That's entirely up to the MPI  
library and the OS.


> I don't believe my data shows memory bandwidth-limiting effects.  
> For example, three 'local' CPUs on the quad core are faster  
> (6.65Gflops) than one of the Quads (5.02 Gflops) and two from the  
> cluster. How does that support the memory bandwidth hypothesis?

As far as I understand, you're using gigabit Ethernet. Even with GAMMA,  
that's going to have much higher latency and lower bandwidth than  
the shared-memory communication on a quad-core machine.
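
To put rough numbers on that, here is a minimal sketch of a simple
latency-plus-bandwidth cost model (time = latency + bytes / bandwidth).
The latency and bandwidth values are illustrative assumptions, not
measurements from any of the systems discussed here, and the function
and constant names are made up for the example.

```python
# Simple latency + bandwidth ("alpha-beta") model for one message.
# All numbers below are illustrative assumptions, not measurements.

def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to deliver one message of n_bytes."""
    return latency_s + n_bytes / bandwidth_bytes_per_s

# Assumed order-of-magnitude link parameters (hypothetical).
# 125e6 bytes/s is the theoretical peak of a 1 Gbit/s link.
GIGABIT_ETHERNET = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}
SHARED_MEMORY = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 2e9}

def compare(n_bytes):
    """Return (ethernet_time, shared_memory_time) for one message."""
    eth = message_time(n_bytes, **GIGABIT_ETHERNET)
    shm = message_time(n_bytes, **SHARED_MEMORY)
    return eth, shm

if __name__ == "__main__":
    for size in (1_000, 100_000, 10_000_000):
        eth, shm = compare(size)
        print(f"{size:>10} bytes: ethernet {eth * 1e6:10.1f} us, "
              f"shared memory {shm * 1e6:10.1f} us, ratio {eth / shm:5.1f}x")
```

With these assumed parameters, small messages are dominated by the
latency term and large ones by the bandwidth term, and the network
link loses on both.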

>
> I figured that it might be possible that the GAMMA MP software is  
> causing overhead, but when I examined the distribution of tasks by  
> GROMACS (in the log I provided) it would seem that the tasks which  
> mdrun distributed to GAMMA actually were distributed well, but that  
> the manner in which CPU0 hogged most of the mdrun calculations  
> might be a bottleneck. It was insight into GROMACS' mdrun  
> distribution methodology which I was seeking. Is there any  
> quantitative data available for me to review?

If you're interested in how the scaling performance of quad-core  
compares to other hardware, I would start with the benchmarks on  
the GROMACS www site.

If it's about getting the highest possible performance, you could  
either play with the "-load" option to grompp or check out the CVS  
development tree, which has full domain decomposition and dynamic load  
balancing implemented (warning: there could still be bugs).
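
For the "-load" route, here is a small sketch of how a set of relative
per-node weights could be derived from node speeds. It assumes the
option takes one whitespace-separated number per node, with larger
numbers meaning more capacity; please check that against the output of
"grompp -h" for your version. The helper name and the example speeds
are hypothetical.

```python
# Build a relative-load string for a set of nodes of unequal speed.
# Assumption: one whitespace-separated number per node, larger meaning
# more capacity; verify the expected format with "grompp -h".

def relative_load_string(speeds):
    """Normalise per-node speeds so the fastest node gets 1.00."""
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need at least one node, all speeds positive")
    fastest = max(speeds)
    return " ".join(f"{s / fastest:.2f}" for s in speeds)

if __name__ == "__main__":
    # Hypothetical relative speeds: two fast nodes, two slower ones.
    print(relative_load_string([3.0, 3.0, 2.0, 2.0]))
```

The resulting string would then be passed, in quotes, as the value of
the "-load" option.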

Cheers,

Erik




