[gmx-users] low performance 2 GTX 980+ Intel CPU Core i7-5930K 3.5 GHz (2011-3)

Justin Lemkul jalemkul at vt.edu
Wed Dec 31 17:38:34 CET 2014



On 12/31/14 10:46 AM, Carlos Navarro Retamal wrote:
> Dear everyone,
> In order to check whether my workstation was able to handle bigger systems, I ran an MD simulation of a system of 265,175 atoms, but sadly this was its performance with one GPU:
>
>>         P P   -   P M E   L O A D   B A L A N C I N G
>>
>>   PP/PME load balancing changed the cut-off and PME settings:
>>             particle-particle                    PME
>>              rcoulomb  rlist            grid      spacing   1/beta
>>     initial  1.400 nm  1.451 nm      96  96  84   0.156 nm  0.448 nm
>>     final    1.464 nm  1.515 nm      84  84  80   0.167 nm  0.469 nm
>>   cost-ratio           1.14             0.73
>>   (note that these numbers concern only part of the total PP and PME load)
>>
>>
>> M E G A - F L O P S   A C C O U N T I N G
>>
>>   NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>>   RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>>   W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>>   V&F=Potential and force  V=Potential only  F=Force only
>>
>>   Computing:                               M-Number         M-Flops  % Flops
>> -----------------------------------------------------------------------------
>>   NB VdW [V&F]                          9330.786612        9330.787     0.0
>>   Pair Search distance check           60538.981664      544850.835     0.0
>>   NxN Ewald Elec. + LJ [F]          23126654.798080  1526359216.673    96.9
>>   NxN Ewald Elec. + LJ [V&F]          234136.147904    25052567.826     1.6
>>   1,4 nonbonded interactions           13156.663128     1184099.682     0.1
>>   Calc Weights                         39777.045525     1431973.639     0.1
>>   Spread Q Bspline                    848576.971200     1697153.942     0.1
>>   Gather F Bspline                    848576.971200     5091461.827     0.3
>>   3D-FFT                             1079386.516464     8635092.132     0.5
>>   Solve PME                              353.070736       22596.527     0.0
>>   Shift-X                                331.733925        1990.404     0.0
>>   Propers                              13320.966414     3050501.309     0.2
>>   Impropers                              340.306806       70783.816     0.0
>>   Virial                                1326.365220       23874.574     0.0
>>   Stop-CM                                133.117850        1331.178     0.0
>>   Calc-Ekin                             2652.280350       71611.569     0.0
>>   Lincs                                 4966.549329      297992.960     0.0
>>   Lincs-Mat                           111969.439344      447877.757     0.0
>>   Constraint-V                         18222.114435      145776.915     0.0
>>   Constraint-Vir                        1325.795106       31819.083     0.0
>>   Settle                                2763.005259      892450.699     0.1
>>   (null)                                 116.802336           0.000     0.0
>> -----------------------------------------------------------------------------
>>   Total                                              1575064354.133   100.0
>> -----------------------------------------------------------------------------
>>
>>
>>       R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>> On 1 MPI rank, each using 12 OpenMP threads
>>
>>   Computing:          Num   Num      Call    Wall time         Giga-Cycles
>>                       Ranks Threads  Count      (s)         total sum    %
>> -----------------------------------------------------------------------------
>>   Neighbor search        1   12       1251      27.117       1138.913   2.4
>>   Launch GPU ops.        1   12      50001       5.444        228.653   0.5
>>   Force                  1   12      50001     390.693      16409.109  34.0
>>   PME mesh               1   12      50001     443.170      18613.138  38.5
>>   Wait GPU local         1   12      50001       8.133        341.590   0.7
>>   NB X/F buffer ops.     1   12      98751      30.272       1271.429   2.6
>>   Write traj.            1   12         12       1.148         48.198   0.1
>>   Update                 1   12      50001      63.980       2687.175   5.6
>>   Constraints            1   12      50001     124.709       5237.788  10.8
>>   Rest                                          55.169       2317.087   4.8
>> -----------------------------------------------------------------------------
>>   Total                                       1149.836      48293.079 100.0
>> -----------------------------------------------------------------------------
>>   Breakdown of PME mesh computation
>> -----------------------------------------------------------------------------
>>   PME spread/gather      1   12     100002     358.298      15048.493  31.2
>>   PME 3D-FFT             1   12     100002      78.270       3287.334   6.8
>>   PME solve Elec         1   12      50001       6.221        261.268   0.5
>> -----------------------------------------------------------------------------
>>
>>   GPU timings
>> -----------------------------------------------------------------------------
>>   Computing:                         Count  Wall t (s)      ms/step       %
>> -----------------------------------------------------------------------------
>>   Pair list H2D                       1251       3.975        3.178     0.5
>>   X / q H2D                          50001      36.248        0.725     4.6
>>   Nonbonded F kernel                 45000     618.354       13.741    78.7
>>   Nonbonded F+ene k.                  3750      72.721       19.392     9.3
>>   Nonbonded F+ene+prune k.            1251      28.993       23.176     3.7
>>   F D2H                              50001      25.267        0.505     3.2
>> -----------------------------------------------------------------------------
>>   Total                                        785.559       15.711   100.0
>> -----------------------------------------------------------------------------
>>
>> Force evaluation time GPU/CPU: 15.711 ms/16.677 ms = 0.942
>> For optimal performance this ratio should be close to 1!
>>
>>                 Core t (s)   Wall t (s)        (%)
>>         Time:    13663.176     1149.836     1188.3
>>                   (ns/day)    (hour/ns)
>> Performance:        7.514        3.194
>> Finished mdrun on rank 0 Wed Dec 31 01:44:22 2014
>
>

This is consistent with what we see for similarly sized systems.

>
>
>
> I also noticed this at the beginning:
>
>> step   80: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3287.7 M-cycles
>> step  160: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3180.2 M-cycles
>> step  240: timed with pme grid 72 72 72, coulomb cutoff 1.708: 3948.2 M-cycles
>> step  320: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3319.4 M-cycles
>> step  400: timed with pme grid 96 96 80, coulomb cutoff 1.435: 3213.8 M-cycles
>> step  480: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3194.6 M-cycles
>> step  560: timed with pme grid 80 80 80, coulomb cutoff 1.537: 3343.4 M-cycles
>> step  640: timed with pme grid 80 80 72, coulomb cutoff 1.594: 3571.9 M-cycles
>>                optimal pme grid 84 84 80, coulomb cutoff 1.464

This is normal tuning by mdrun to get you optimal performance.
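If you want back-to-back benchmark runs to be directly comparable, you can switch this auto-tuning off. A minimal sketch, assuming a 5.0-style invocation with run files named md.*:

    gmx mdrun -deffnm md -notunepme

With -notunepme, mdrun keeps the Coulomb cut-off and PME grid exactly as written in the .tpr, so timing differences between runs reflect your settings rather than the tuner.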

>>             Step           Time         Lambda
>>             5000       10.00000        0.00000
>
>
>
> and when I add the second graphics card, the performance drops again, to about 5-6 ns/day.
>
> This performance is really weird, because I got about 5 ns/day on a different workstation (a GTX 770 and an i7-4770).
> Is there something I'm missing regarding the correct use of a second GPU?

Note the following (from above):

"Force evaluation time GPU/CPU: 15.711 ms/16.677 ms = 0.942"

Adding another GPU won't help you.  Your bottleneck is the CPU.
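For reference, to actually engage both cards you need one PP rank per GPU. A minimal sketch, assuming 5.0-style thread-MPI syntax and the same md.* file names as above:

    gmx mdrun -deffnm md -ntmpi 2 -ntomp 6 -gpu_id 01

That maps GPU 0 to rank 0 and GPU 1 to rank 1, but PME still runs on the CPU, so splitting the i7-5930K's twelve hardware threads across two ranks can easily make the CPU-side bottleneck worse, which is consistent with the 5-6 ns/day you saw.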

-Justin

-- 
==================================================

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==================================================

