[gmx-users] No performance increase with single vs multiple nodes

Mark Abraham mark.j.abraham at gmail.com
Mon Oct 9 04:45:48 CEST 2017


Hi,

On Sun, Oct 8, 2017 at 2:40 AM Matthew W Hanley <mwhanley at syr.edu> wrote:

> I am running gromacs 2016.3 on CentOS 7.3 with the following command using
> a PBS scheduler:
>
>
> #PBS -N TEST
>
> #PBS -l nodes=1:ppn=32
>
> export OMP_NUM_THREADS=1
>
> mpirun -N 32  mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 50000 -cpi
> TEST
>
>
> However, I am seeing no performance increase when using more nodes:
>
> On 32 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    28307.873      884.621     3200.0
>                  (ns/day)    (hour/ns)
> Performance:      195.340        0.123
>
> On 64 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    25502.709      398.480     6400.0
>                  (ns/day)    (hour/ns)
> Performance:      216.828        0.111
>
> On 96 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    51977.705      541.434     9600.0
>                  (ns/day)    (hour/ns)
> Performance:      159.579        0.150
>
> On 128 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:   111576.333      871.690    12800.0
>                  (ns/day)    (hour/ns)
> Performance:      198.238        0.121
>
> ?
>

There are several dozen lines of performance analysis at the end of each log
file, which you need to inspect and compare across runs if you want to start
to understand what is going on :-)
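For example, assuming the log for one run is called TEST.log (as -deffnm TEST
would produce), something like the following pulls out that accounting section
so you can put two runs side by side; the exact heading text may differ
slightly between GROMACS versions:

    # Last part of the log holds the per-task timing table and the ns/day summary
    tail -n 80 TEST.log

    # Or jump straight to the cycle/time accounting section
    grep -A 40 "R E A L   C Y C L E" TEST.log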


> Doing an strace of the mdrun process shows mostly this:
>

strace is not a profiling tool. That's a bit like trying to understand the
performance of 100m sprinters by counting how often they call their
relatives on the phone. ;-) GROMACS does lots of arithmetic, not lots of
calls to system functions.
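The numbers you are after are already in mdrun's own logs. As a rough sketch,
a quick loop over per-run logs (the file names here are only illustrative,
e.g. one log per rank count) tabulates the headline ns/day figures before you
dig into the full breakdowns:

    # Print the ns/day line from each run's log (file names are hypothetical)
    for log in TEST_32.log TEST_64.log TEST_96.log TEST_128.log; do
        echo -n "$log  "
        grep "Performance:" "$log"
    done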

Mark

