<div dir="ltr">Hi,<div><br></div><div>Can you create a Redmine issue and upload the tpr and the "mdrun -version" output for both versions? Is the performance only worse with the GPU, or also without it? What about the latest release-5-0 branch (it's fine if you use the version with my patch from the previous email)? </div>
<div><br></div><div>Roland</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jun 26, 2014 at 6:07 PM, Mirco Wahab <span dir="ltr"><<a href="mailto:mirco.wahab@chemie.tu-freiberg.de" target="_blank">mirco.wahab@chemie.tu-freiberg.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Performance test on a large system:<br>
<br>
2.4 x 10^6 particles,<br>
MARTINI vesicle in water<br>
GTX-660Ti, 6-core Phenom II X6<br>
<br>
- nstlist = 40<br>
- rlist = 2.4<br>
- coulombtype = Reaction-Field<br>
- cutoff-scheme = verlet<br>
- coulomb-modifier = Potential-shift<br>
- epsilon_rf = 0<br>
- verlet-buffer-drift = 0.005<br>
- rcoulomb = 1.1<br>
- rcoulomb_switch = 0.0<br>
- epsilon_r = 15<br>
- vdw_type = cut-off<br>
- rvdw_switch = 0.9<br>
- rvdw = 1.1<br>
- vdw-modifier = Potential-shift<br>
- tcoupl = v-rescale ; Berendsen<br>
- tc-grps = DPPC BSCHX W<br>
- tau_t = 1.0 1.0 1.0<br>
- ref_t = 315 315 315<br>
- Pcoupl = Berendsen<br>
- Pcoupltype = isotropic<br>
- tau_p = 6<br>
<br>
Both tests start *from the same tpr* (generated w/4.6.3)<br>
4.6.3 8.363 ns/day<br>
5.0.rc1 6.604 ns/day<br>
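For context, those headline figures amount to roughly a 21% throughput drop. A quick sanity check (numbers copied from the lines above):<br>
<br>
```python
# ns/day reported by the two runs (copied from the results above)
v463 = 8.363
v50rc1 = 6.604

slowdown = 1 - v50rc1 / v463
print(f"5.0rc1 is {slowdown:.1%} slower than 4.6.3")  # ~21.0%
```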
<br>
log file summaries here ==><br>
<br>
======================= 4.6.3 ========================================<br>
R E A L C Y C L E A N D T I M E A C C O U N T I N G<br>
<br>
Computing: Nodes Th. Count Wall t (s) G-Cycles %<br>
-----------------------------------------------------------------------------<br>
Neighbor search 1 6 38 20.559 395.924 6.7<br>
Launch GPU ops. 1 6 1481 0.439 8.445 0.1<br>
Force 1 6 1481 60.728 1169.515 19.8<br>
Wait GPU local 1 6 1481 57.421 1105.832 18.8<br>
NB X/F buffer ops. 1 6 2924 53.788 1035.867 17.6<br>
Write traj. 1 6 2 2.084 40.137 0.7<br>
Update 1 6 1481 22.860 440.249 7.5<br>
Constraints 1 6 1481 57.303 1103.547 18.7<br>
Rest 1 30.818 593.509 10.1<br>
-----------------------------------------------------------------------------<br>
Total 1 306.000 5893.027 100.0<br>
-----------------------------------------------------------------------------<br>
<br>
GPU timings<br>
-----------------------------------------------------------------------------<br>
Computing: Count Wall t (s) ms/step %<br>
-----------------------------------------------------------------------------<br>
Pair list H2D 38 0.640 16.844 0.5<br>
X / q H2D 1481 11.715 7.910 10.0<br>
Nonbonded F kernel 1436 95.190 66.288 81.1<br>
Nonbonded F+ene k. 7 0.473 67.625 0.4<br>
Nonbonded F+ene+prune k. 38 2.723 71.645 2.3<br>
F D2H 1481 6.648 4.489 5.7<br>
-----------------------------------------------------------------------------<br>
Total 117.389 79.263 100.0<br>
-----------------------------------------------------------------------------<br>
Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933<br>
<br>
<br>
======================= 5.0rc1 ========================================<br>
<br>
On 1 MPI rank, each using 6 OpenMP threads<br>
<br>
Computing: Num Num Call Wall time Giga-Cycles<br>
Nodes Threads Count (s) total sum %<br>
-----------------------------------------------------------------------------<br>
Neighbor search 1 6 69 43.134 906.243 6.1<br>
Launch GPU ops. 1 6 2721 0.931 19.563 0.1<br>
Force 1 6 2721 124.095 2607.240 17.4<br>
Wait GPU local 1 6 2721 167.917 3527.955 23.6<br>
NB X/F buffer ops. 1 6 5373 140.788 2957.975 19.8<br>
Write traj. 1 6 2 2.346 49.282 0.3<br>
Update 1 6 2721 59.083 1241.340 8.3<br>
Constraints 1 6 2721 111.926 2351.573 15.7<br>
Rest 61.781 1298.019 8.7<br>
-----------------------------------------------------------------------------<br>
Total 712.000 14959.191 100.0<br>
-----------------------------------------------------------------------------<br>
<br>
GPU timings<br>
-----------------------------------------------------------------------------<br>
Computing: Count Wall t (s) ms/step %<br>
-----------------------------------------------------------------------------<br>
Pair list H2D 69 1.555 22.542 0.5<br>
X / q H2D 2721 27.089 9.955 9.3<br>
Nonbonded F kernel 2638 240.172 91.043 82.7<br>
Nonbonded F+ene k. 14 1.324 94.599 0.5<br>
Nonbonded F+ene+prune k. 69 7.308 105.912 2.5<br>
F D2H 2721 13.105 4.816 4.5<br>
-----------------------------------------------------------------------------<br>
Total 290.554 106.782 100.0<br>
-----------------------------------------------------------------------------<br>
Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341<br>
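Comparing the two GPU-timing tables per step suggests most of the regression sits in the nonbonded kernel itself rather than in transfers. A quick check (ms/step values copied from the tables above):<br>
<br>
```python
# Per-step timings (ms) copied from the 4.6.3 and 5.0rc1 GPU tables
kernel_463, kernel_50 = 66.288, 91.043   # Nonbonded F kernel
total_463, total_50 = 79.263, 106.782    # GPU total

print(f"kernel:    {kernel_50 / kernel_463:.2f}x")  # ~1.37x
print(f"GPU total: {total_50 / total_463:.2f}x")    # ~1.35x
```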
<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
<br>
--<br>
Gromacs Developers mailing list<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>865-241-1537, ORNL PO BOX 2008 MS6309
</div>