[gmx-users] scalability of Gromacs with MPI

Jan Thorbecke janth at xs4all.nl
Mon Jan 23 16:00:59 CET 2006


Dear Users,

At the moment I'm working on a benchmark for Gromacs. The benchmark
is set up to run on 32 to 128 CPUs. Scalability is fine up to
64 CPUs; beyond that the code does not scale any further (see the
table below). What prevents it from scaling are the (ring)
communication parts move_x and move_f, which together take about
20 s on 128 CPUs. (A minimal sketch of the ring pattern follows the
table.)

CPUs  | Gromacs 3.3 + FFTW3 wall time |
------|-------------------------------|
32    |  142 s                        |
64    |   88 s                        |
128   |   70 s                        |
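
To show what I mean by the ring pattern, here is a minimal MPI sketch
of such an exchange (illustrative only, not the actual Gromacs move_x
code; it assumes every rank holds an equally sized coordinate block):

/* Illustrative ring exchange: each rank's coordinate block is passed
 * around all p ranks, so everyone sees everyone's data after p-1
 * send/receive steps.  NOT the actual Gromacs move_x implementation;
 * equal block sizes per rank are assumed for simplicity. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

void ring_exchange(const float *x_local, int natoms_local,
                   int p, int rank, MPI_Comm comm)
{
    int left  = (rank - 1 + p) % p;
    int right = (rank + 1) % p;
    int n     = 3 * natoms_local;             /* floats per block (x,y,z) */
    float *sendbuf = malloc(n * sizeof(float));
    float *recvbuf = malloc(n * sizeof(float));
    memcpy(sendbuf, x_local, n * sizeof(float));

    for (int step = 0; step < p - 1; step++) {
        /* pass the current block to the right, receive a new one from the left */
        MPI_Sendrecv(sendbuf, n, MPI_FLOAT, right, 0,
                     recvbuf, n, MPI_FLOAT, left,  0,
                     comm, MPI_STATUS_IGNORE);
        /* ... a real code would store/use the received coordinates here ... */
        memcpy(sendbuf, recvbuf, n * sizeof(float));
    }
    free(sendbuf);
    free(recvbuf);
}

The number of communication steps grows linearly with the number of
CPUs while the per-step message only shrinks as 1/p, so the time spent
in these routines does not go down when more CPUs are added.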


I have no background in Molecular Dynamics and am looking at the code
purely from a performance point of view. My questions are:

- Has anybody scaled Gromacs to more than 64 CPUs? My guess is that,
inherent to the MD problem Gromacs solves, there is a limit to the
number of processors that can be used efficiently: at some point the
communication of the forces to all other CPUs will dominate the
wallclock time. (A toy cost model illustrating this is sketched after
this list.)

- I tried to change the ring communication in move_x and move_f to
collective communication, but that did not help scalability. Has
anybody tried other communication schemes? (A sketch of what such a
collective scheme could look like is also given after this list.)

- Are there options in grompp to set up a different domain
decomposition (for example 3-D blocks in x, y, z instead of slabs
along x) or other parallelisation strategies?
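
To make the first point more concrete, here is a toy cost model (the
numbers are hypothetical and chosen only to show the shape of the
curve, not fitted to the table above): if the compute part scales as
t1/p and the ring communication adds an overhead that grows roughly
linearly with p, the total time has a minimum near p = sqrt(t1/c) and
gets worse beyond it.

/* Toy scaling model -- my own illustration, not a Gromacs measurement.
 * t(p) = t1/p + c*p : perfectly scaling compute plus a communication
 * term that grows linearly with the number of CPUs. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double t1 = 4000.0;   /* hypothetical single-CPU compute time [s]    */
    double c  = 0.15;     /* hypothetical per-CPU communication cost [s] */

    for (int p = 32; p <= 256; p *= 2) {
        printf("p = %3d   t = %6.1f s\n", p, t1 / p + c * p);
    }
    printf("optimum near p = %.0f CPUs\n", sqrt(t1 / c));
    return 0;
}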
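
For the second point, a collective version could look roughly like the
sketch below (schematic only; the real move_x/move_f also deal with
charge groups and shift vectors, and the buffer layout here is made up):

/* Illustrative collective alternative: gather every rank's coordinate
 * block on all ranks with a single MPI_Allgatherv instead of p-1 ring
 * steps.  Schematic only, not the actual Gromacs code. */
#include <mpi.h>
#include <stdlib.h>

void allgather_coords(const float *x_local, int natoms_local,
                      float *x_global, int p, MPI_Comm comm)
{
    int *counts  = malloc(p * sizeof(int));
    int *displs  = malloc(p * sizeof(int));
    int  n_local = 3 * natoms_local;          /* floats in this rank's block */

    /* every rank needs to know everyone's block size */
    MPI_Allgather(&n_local, 1, MPI_INT, counts, 1, MPI_INT, comm);

    displs[0] = 0;
    for (int i = 1; i < p; i++) {
        displs[i] = displs[i - 1] + counts[i - 1];
    }

    /* one collective call replaces the p-1 ring send/receive steps */
    MPI_Allgatherv(x_local, n_local, MPI_FLOAT,
                   x_global, counts, displs, MPI_FLOAT, comm);

    free(counts);
    free(displs);
}

In my tests this did not change the overall picture: the total amount
of data that has to reach every CPU is the same, only the way it gets
there differs.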


thanks for your help,

Jan

The run uses (from md0.log):

nsb->nnodes:     64
nsb->cgtotal: 260548
nsb->natoms:  552588
nsb->shift:      33
nsb->bshift:      0

......

parameters of the run:
    integrator           = md
    nsteps               = 1000
    init_step            = 0
    ns_type              = Grid
    nstlist              = 10
    ndelta               = 2
    bDomDecomp           = FALSE
    decomp_dir           = 0
    nstcomm              = 1
    comm_mode            = Linear
    nstcheckpoint        = 0
    nstlog               = 0
    nstxout              = 5000
    nstvout              = 5000
    nstfout              = 0
    nstenergy            = 250
    nstxtcout            = 0
    init_t               = 0
    delta_t              = 0.002
    xtcprec              = 1000
    nkx                  = 320
    nky                  = 320
    nkz                  = 56
    pme_order            = 4
    ewald_rtol           = 1e-05
    ewald_geometry       = 0
    epsilon_surface      = 0
    optimize_fft         = TRUE
    ePBC                 = xyz
    bUncStart            = FALSE
    bShakeSOR            = FALSE
    etc                  = Berendsen
    epc                  = Berendsen
    epctype              = Semiisotropic
    tau_p                = 1



