<div dir="ltr">Hi,<div><br></div><div>Can you create a Redmine issue and upload the tpr file and the &quot;mdrun -version&quot; output for both versions? Is the performance worse only with the GPU, or also without it? What happens if you use the latest release-5-0 branch (it is fine to use the version with my patch from the previous email)? </div>

<div><br></div><div>Roland</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jun 26, 2014 at 6:07 PM, Mirco Wahab <span dir="ltr">&lt;<a href="mailto:mirco.wahab@chemie.tu-freiberg.de" target="_blank">mirco.wahab@chemie.tu-freiberg.de</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Performance test on a large system:<br>
<br>
  2.4 x 10^6 particles,<br>
  MARTINI vesicle in water<br>
  GTX-660Ti, 6-core Phenom II X6<br>
<br>
  - nstlist              = 40<br>
  - rlist                = 2.4<br>
  - coulombtype          = Reaction-Field<br>
  - cutoff-scheme        = verlet<br>
  - coulomb-modifier     = Potential-shift<br>
  - epsilon_rf           = 0<br>
  - verlet-buffer-drift  = 0.005<br>
  - rcoulomb             = 1.1<br>
  - rcoulomb_switch      = 0.0<br>
  - epsilon_r            = 15<br>
  - vdw_type             = cut-off<br>
  - rvdw_switch          = 0.9<br>
  - rvdw                 = 1.1<br>
  - vdw-modifier         = Potential-shift<br>
  - tcoupl               = v-rescale    ; Berendsen<br>
  - tc-grps              = DPPC BSCHX W<br>
  - tau_t                = 1.0  1.0  1.0<br>
  - ref_t                = 315  315  315<br>
  - Pcoupl               = Berendsen<br>
  - Pcoupltype           = isotropic<br>
  - tau_p                = 6<br>
<br>
Both tests start *from the same tpr* (generated w/4.6.3)<br>
4.6.3    8.363 ns/day<br>
5.0.rc1  6.604 ns/day<br>
<br>
log file summaries here ==&gt;<br>
<br>
======================= 4.6.3 ========================================<br>
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G<br>
<br>
  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %<br>
-----------------------------------------------------------------------------<br>
  Neighbor search        1    6         38      20.559      395.924     6.7<br>
  Launch GPU ops.        1    6       1481       0.439        8.445     0.1<br>
  Force                  1    6       1481      60.728     1169.515    19.8<br>
  Wait GPU local         1    6       1481      57.421     1105.832    18.8<br>
  NB X/F buffer ops.     1    6       2924      53.788     1035.867    17.6<br>
  Write traj.            1    6          2       2.084       40.137     0.7<br>
  Update                 1    6       1481      22.860      440.249     7.5<br>
  Constraints            1    6       1481      57.303     1103.547    18.7<br>
  Rest                   1                      30.818      593.509    10.1<br>
-----------------------------------------------------------------------------<br>
  Total                  1                     306.000     5893.027   100.0<br>
-----------------------------------------------------------------------------<br>
<br>
  GPU timings<br>
-----------------------------------------------------------------------------<br>
  Computing:                         Count  Wall t (s)      ms/step       %<br>
-----------------------------------------------------------------------------<br>
  Pair list H2D                         38       0.640       16.844     0.5<br>
  X / q H2D                           1481      11.715        7.910    10.0<br>
  Nonbonded F kernel                  1436      95.190       66.288    81.1<br>
  Nonbonded F+ene k.                     7       0.473       67.625     0.4<br>
  Nonbonded F+ene+prune k.              38       2.723       71.645     2.3<br>
  F D2H                               1481       6.648        4.489     5.7<br>
-----------------------------------------------------------------------------<br>
  Total                                        117.389       79.263   100.0<br>
-----------------------------------------------------------------------------<br>
Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933<br>
<br>
<br>
======================= 5.0rc1 ========================================<br>
<br>
On 1 MPI rank, each using 6 OpenMP threads<br>
<br>
  Computing:          Num   Num      Call    Wall time         Giga-Cycles<br>
                      Nodes Threads  Count      (s)         total sum    %<br>
-----------------------------------------------------------------------------<br>
  Neighbor search        1    6         69      43.134        906.243   6.1<br>
  Launch GPU ops.        1    6       2721       0.931         19.563   0.1<br>
  Force                  1    6       2721     124.095       2607.240  17.4<br>
  Wait GPU local         1    6       2721     167.917       3527.955  23.6<br>
  NB X/F buffer ops.     1    6       5373     140.788       2957.975  19.8<br>
  Write traj.            1    6          2       2.346         49.282   0.3<br>
  Update                 1    6       2721      59.083       1241.340   8.3<br>
  Constraints            1    6       2721     111.926       2351.573  15.7<br>
  Rest                                          61.781       1298.019   8.7<br>
-----------------------------------------------------------------------------<br>
  Total                                        712.000      14959.191 100.0<br>
-----------------------------------------------------------------------------<br>
<br>
  GPU timings<br>
-----------------------------------------------------------------------------<br>
  Computing:                         Count  Wall t (s)      ms/step       %<br>
-----------------------------------------------------------------------------<br>
  Pair list H2D                         69       1.555       22.542     0.5<br>
  X / q H2D                           2721      27.089        9.955     9.3<br>
  Nonbonded F kernel                  2638     240.172       91.043    82.7<br>
  Nonbonded F+ene k.                    14       1.324       94.599     0.5<br>
  Nonbonded F+ene+prune k.              69       7.308      105.912     2.5<br>
  F D2H                               2721      13.105        4.816     4.5<br>
-----------------------------------------------------------------------------<br>
  Total                                        290.554      106.782   100.0<br>
-----------------------------------------------------------------------------<br>
Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341<br>
<span class="HOEnZb"><font color="#888888"><br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>865-241-1537, ORNL PO BOX 2008 MS6309
</div>
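<div>The two log summaries quoted above can be compared per step rather than per run, since the runs cover different numbers of steps. The following is a minimal sketch, not part of the original exchange: it assumes the &quot;Force&quot; call count equals the number of MD steps, and the dictionary keys and function names are made up for illustration. All numbers are copied from the quoted logs.</div>
<pre>
```python
# Figures copied from the two log summaries in the quoted message.
# Assumption: the "Force" call count equals the number of MD steps.
RUNS = {
    "4.6.3": {"ns_per_day": 8.363, "wall_s": 306.000, "steps": 1481,
              "nb_kernel_ms": 66.288, "wait_gpu_s": 57.421},
    "5.0rc1": {"ns_per_day": 6.604, "wall_s": 712.000, "steps": 2721,
               "nb_kernel_ms": 91.043, "wait_gpu_s": 167.917},
}


def wall_ms_per_step(run):
    """Total wall time divided by the step count, in milliseconds."""
    return 1000.0 * run["wall_s"] / run["steps"]


def wait_gpu_ms_per_step(run):
    """Time the CPU spends in 'Wait GPU local' per step, in milliseconds."""
    return 1000.0 * run["wait_gpu_s"] / run["steps"]


old, new = RUNS["4.6.3"], RUNS["5.0rc1"]

# Ratios above 1 mean 5.0rc1 is slower than 4.6.3.
throughput_ratio = old["ns_per_day"] / new["ns_per_day"]
step_time_ratio = wall_ms_per_step(new) / wall_ms_per_step(old)
kernel_ratio = new["nb_kernel_ms"] / old["nb_kernel_ms"]
wait_ratio = wait_gpu_ms_per_step(new) / wait_gpu_ms_per_step(old)

print(f"ns/day ratio (4.6.3 / 5.0rc1):      {throughput_ratio:.3f}")
print(f"wall time per step (5.0rc1 / 4.6.3): {step_time_ratio:.3f}")
print(f"nonbonded F kernel per step:         {kernel_ratio:.3f}")
print(f"Wait GPU local per step:             {wait_ratio:.3f}")
```
</pre>
<div>With these figures, the ns/day ratio and the wall-time-per-step ratio agree at about 1.27, while the plain nonbonded F kernel takes about 1.37 times as long per step in 5.0rc1. That points at the GPU side, but it does not answer whether the CPU-only run is also slower.</div>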