[gmx-users] load imbalance in multiple GPU simulations

yunshi11 . yunshi09 at gmail.com
Mon Dec 9 00:32:26 CET 2013


Hi Szilard,




On Sun, Dec 8, 2013 at 2:48 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:

> Hi,
>
> That's unfortunate, but not unexpected. You are getting a 3x1x1
> decomposition where the "middle" cell has most of the protein, hence
> most of the bonded forces to calculate, while the ones on the side
> have little (or none).
>
From which values can I tell this?


> Currently, the only thing you can do is to try using more domains,
> perhaps with manual decomposition (such that the initial domains will
> contain as much protein as possible). This may not help much, though.
> In extreme cases (e.g. small system), even using only two of the three
> GPUs could improve performance.
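
Concretely, with the 3 GPUs + 12 cores layout in the log (3 thread-MPI ranks
with 4 OpenMP threads each), those two options would look roughly like the
sketch below. This assumes GROMACS 4.6-style mdrun flags, GPU IDs 0-2 on a
single node, that mdrun on this machine allows sharing each GPU between two
PP ranks, and a hypothetical -deffnm equil; treat it as a starting point,
not a recipe.

  # current layout: one domain per GPU
  mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -deffnm equil

  # more, manually chosen domains: two PP ranks per GPU on a 3x2x1 grid
  mdrun -ntmpi 6 -ntomp 2 -gpu_id 001122 -dd 3 2 1 -deffnm equil

  # extreme case: run on only two of the three GPUs
  mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -deffnm equil

Comparing the "Average load imbalance" and "PME mesh" lines of the resulting
logs shows whether the extra (or fewer) domains actually help.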

> Cheers,
> --
> Szilárd
>
>
> On Sun, Dec 8, 2013 at 8:10 PM, yunshi11 . <yunshi09 at gmail.com> wrote:
> > Hi all,
> >
> > My conventional MD run (equilibration) of a protein in TIP3P water reported
> > "Average load imbalance: 59.4 %" when running with 3 GPUs + 12 CPU cores,
> > so I wonder how to tweak parameters to improve the performance.
> >
> > The end of the log file reads:
> >
> > ......
> >         M E G A - F L O P S   A C C O U N T I N G
> >
> >  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> >  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
> >  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
> >  V&F=Potential and force  V=Potential only  F=Force only
> >
> >  Computing:                               M-Number         M-Flops  % Flops
> > -----------------------------------------------------------------------------
> >  Pair Search distance check           78483.330336      706349.973     0.1
> >  NxN QSTab Elec. + VdW [F]         11321254.234368   464171423.609    95.1
> >  NxN QSTab Elec. + VdW [V&F]         114522.922048     6756852.401     1.4
> >  1,4 nonbonded interactions            1645.932918      148133.963     0.0
> >  Calc Weights                         25454.159073      916349.727     0.2
> >  Spread Q Bspline                    543022.060224     1086044.120     0.2
> >  Gather F Bspline                    543022.060224     3258132.361     0.7
> >  3D-FFT                             1138719.444112     9109755.553     1.9
> >  Solve PME                              353.129616       22600.295     0.0
> >  Reset In Box                           424.227500        1272.682     0.0
> >  CG-CoM                                 424.397191        1273.192     0.0
> >  Bonds                                  330.706614       19511.690     0.0
> >  Angles                                1144.322886      192246.245     0.0
> >  Propers                               1718.934378      393635.973     0.1
> >  Impropers                              134.502690       27976.560     0.0
> >  Pos. Restr.                            321.706434       16085.322     0.0
> >  Virial                                 424.734826        7645.227     0.0
> >  Stop-CM                                 85.184882         851.849     0.0
> >  P-Coupling                            8484.719691       50908.318     0.0
> >  Calc-Ekin                              848.794382       22917.448     0.0
> >  Lincs                                  313.720420       18823.225     0.0
> >  Lincs-Mat                             1564.146576        6256.586     0.0
> >  Constraint-V                          8651.865815       69214.927     0.0
> >  Constraint-Vir                         417.065668       10009.576     0.0
> >  Settle                                2674.808325      863963.089     0.2
> > -----------------------------------------------------------------------------
> >  Total                                               487878233.910   100.0
> > -----------------------------------------------------------------------------
> >
> >
> >     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> >
> >  av. #atoms communicated per step for force:  2 x 63413.7
> >  av. #atoms communicated per step for LINCS:  2 x 3922.5
> >
> >  Average load imbalance: 59.4 %
> >  Part of the total run time spent waiting due to load imbalance: 5.0 %
> >
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> >  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
> > -----------------------------------------------------------------------------
> >  Domain decomp.         3    4       2500      42.792     1300.947     4.4
> >  DD comm. load          3    4         31       0.000        0.014     0.0
> >  Neighbor search        3    4       2501      33.076     1005.542     3.4
> >  Launch GPU ops.        3    4     100002       6.537      198.739     0.7
> >  Comm. coord.           3    4      47500      20.349      618.652     2.1
> >  Force                  3    4      50001      75.093     2282.944     7.8
> >  Wait + Comm. F         3    4      50001      24.850      755.482     2.6
> >  PME mesh               3    4      50001     597.925    18177.760    62.0
> >  Wait GPU nonlocal      3    4      50001       9.862      299.813     1.0
> >  Wait GPU local         3    4      50001       0.262        7.968     0.0
> >  NB X/F buffer ops.     3    4     195002      33.578     1020.833     3.5
> >  Write traj.            3    4         12       0.506       15.385     0.1
> >  Update                 3    4      50001      23.243      706.611     2.4
> >  Constraints            3    4      50001      70.972     2157.657     7.4
> >  Comm. energies         3    4       2501       0.386       11.724     0.0
> >  Rest                   3                      24.466      743.803     2.5
> > -----------------------------------------------------------------------------
> >  Total                  3                     963.899    29303.873   100.0
> > -----------------------------------------------------------------------------
> > -----------------------------------------------------------------------------
> >  PME redist. X/F        3    4     100002     121.844     3704.214    12.6
> >  PME spread/gather      3    4     100002     300.759     9143.486    31.2
> >  PME 3D-FFT             3    4     100002     111.366     3385.682    11.6
> >  PME 3D-FFT Comm.       3    4     100002      55.347     1682.636     5.7
> >  PME solve              3    4      50001       8.199      249.246     0.9
> > -----------------------------------------------------------------------------
> >
> >                Core t (s)   Wall t (s)        (%)
> >        Time:    11533.900      963.899     1196.6
> >                  (ns/day)    (hour/ns)
> > Performance:        8.964        2.677
> > Finished mdrun on node 0 Sun Dec  8 11:04:48 2013
> >
> >
> >
> > And I set rlist = rvdw = rcoulomb = 1.0.
> >
> > Is there any documentation that details what those values, e.g. VdW [V&F], mean?
> >
> > Thanks,
> > Yun
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>


More information about the gromacs.org_gmx-users mailing list