Hi,<div><br></div><div>You should have gotten this warning:</div><div><div>* WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *</div><div>We have just committed the new CPU detection code in this branch,</div><div>and will commit new SSE/AVX kernels in a few days. However, this</div>
<div>means that currently only the NxN kernels are accelerated!</div><div>In the mean time, you might want to avoid production runs in 4.6.</div><div><br></div><div>Roland</div><br><div class="gmail_quote">On Thu, Jun 21, 2012 at 12:22 PM, Alexey Shvetsov <span dir="ltr">&lt;<a href="mailto:alexxy@omrb.pnpi.spb.ru" target="_blank">alexxy@omrb.pnpi.spb.ru</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all!<br>
<br>
After merging commit<br>
commit 5ba7125c5972f2aafde2310eaa4a345cbac55da5<br>
Author: Erik Lindahl &lt;<a href="mailto:erik@kth.se">erik@kth.se</a>&gt;<br>
Date: Mon May 28 20:54:17 2012 +0200<br>
<br>
New CPU detection &amp; AVX/SSE code, removed raw assembly files.<br>
<br>
I noticed a regression in GROMACS speed. I used two systems for the tests:<br>
7bna and speptide from the examples.<br>
<br>
For the 7bna system, the old 4.6 version (4.6-dev-20120418-3759a-dirty-unknown)<br>
gives:<br>
<pre>
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes   Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
 Domain decomp.         16     5000      502.025      335.5    1.5
 DD comm. load          16     5000        4.309        2.9    0.0
 DD comm. bounds        16     5000       13.941        9.3    0.0
 Comm. coord.           16    50001      497.769      332.7    1.5
 Neighbor search        16     5001     1630.241     1089.6    4.8
 Force                  16    50001    23079.690    15425.1   67.3
 Wait + Comm. F         16    50001      618.862      413.6    1.8
 PME mesh               16    50001     6564.978     4387.7   19.1
 Write traj.            16      101       16.666       11.1    0.0
 Update                 16    50001      384.280      256.8    1.1
 Constraints            16    50001      592.154      395.8    1.7
 Comm. energies         16     5001      125.537       83.9    0.4
 Rest                   16               256.227      171.2    0.7
-----------------------------------------------------------------------
 Total                  16             34286.680    22915.3  100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F        16   100002     1176.273      786.2    3.4
 PME spread/gather      16   100002     2119.858     1416.8    6.2
 PME 3D-FFT             16   100002     1041.014      695.8    3.0
 PME 3D-FFT Comm.       16   200004     1905.967     1273.8    5.6
 PME solve              16    50001      316.714      211.7    0.9
-----------------------------------------------------------------------

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    716.102    716.102    100.0
                       11:56
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1482.789     73.686     12.066      1.989
</pre>
<br>
The new version (4.6-dev-20120618-283a0e5-dirty-unknown) with SSE4.1<br>
acceleration enabled gives only:<br>
<pre>
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes   Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
 Domain decomp.         16     5000      503.648      336.6    0.5
 DD comm. load          16     5000        5.666        3.8    0.0
 DD comm. bounds        16     5000       11.637        7.8    0.0
 Comm. coord.           16    50001      480.473      321.1    0.4
 Neighbor search        16     5001     1665.565     1113.2    1.5
 Force                  16    50001    98860.466    66073.0   89.0
 Wait + Comm. F         16    50001      608.138      406.4    0.5
 PME mesh               16    50001     7605.687     5083.2    6.8
 Write traj.            16      103       17.010       11.4    0.0
 Update                 16    50001      383.590      256.4    0.3
 Constraints            16    50001      582.954      389.6    0.5
 Comm. energies         16     5001      132.665       88.7    0.1
 Rest                   16               257.063      171.8    0.2
-----------------------------------------------------------------------
 Total                  16            111114.560    74263.0  100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F        16   100002     2258.309     1509.3    2.0
 PME spread/gather      16   100002     2111.979     1411.5    1.9
 PME 3D-FFT             16   100002     1046.271      699.3    0.9
 PME 3D-FFT Comm.       16   200004     1854.221     1239.3    1.7
 PME solve              16    50001      329.985      220.5    0.3
-----------------------------------------------------------------------

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:   2320.719   2320.719    100.0
                       38:40
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    457.569     22.739      3.723      6.446
</pre>
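To put numbers on the regression, the figures reported in the two logs above can be compared directly. The following is a minimal Python sketch, not part of GROMACS: the variable names are illustrative, and the values are copied from the Force and Total rows (Seconds column) and from the Time and Performance lines above.<br>
<pre>
```python
# Values copied from the two mdrun logs above (7bna system, 16 nodes).
# All names here are illustrative; nothing below is a GROMACS API.
old = {"force_s": 15425.1, "total_s": 22915.3, "wall_s": 716.102, "ns_per_day": 12.066}
new = {"force_s": 66073.0, "total_s": 74263.0, "wall_s": 2320.719, "ns_per_day": 3.723}

# Overall slowdown, measured two ways; the two ratios should agree closely.
slowdown_throughput = old["ns_per_day"] / new["ns_per_day"]
slowdown_wallclock = new["wall_s"] / old["wall_s"]

# Slowdown of the Force row alone.
force_slowdown = new["force_s"] / old["force_s"]

# Fraction of the extra accounted time that falls in the Force row.
extra_total_s = new["total_s"] - old["total_s"]
extra_force_s = new["force_s"] - old["force_s"]
force_share_of_increase = extra_force_s / extra_total_s

print(f"overall slowdown (ns/day):     {slowdown_throughput:.2f}x")
print(f"overall slowdown (wall clock): {slowdown_wallclock:.2f}x")
print(f"Force row slowdown:            {force_slowdown:.2f}x")
print(f"share of added time in Force:  {force_share_of_increase:.1%}")
```
</pre>
By these numbers the new build is roughly 3.2 times slower overall and about 4.3 times slower in the Force row. Nearly all of the added time is accounted to Force, which points at the force kernels rather than at PME or communication.<br>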
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
Best Regards,<br>
Alexey 'Alexxy' Shvetsov<br>
Petersburg Nuclear Physics Institute, NRC Kurchatov Institute,<br>
Gatchina, Russia<br>
Department of Molecular and Radiation Biophysics<br>
Gentoo Team Ru<br>
Gentoo Linux Dev<br>
mailto:<a href="mailto:alexxyum@gmail.com">alexxyum@gmail.com</a><br>
mailto:<a href="mailto:alexxy@gentoo.org">alexxy@gentoo.org</a><br>
mailto:<a href="mailto:alexxy@omrb.pnpi.spb.ru">alexxy@omrb.pnpi.spb.ru</a><br>
--<br>
gmx-developers mailing list<br>
<a href="mailto:gmx-developers@gromacs.org">gmx-developers@gromacs.org</a><br>
<a href="http://lists.gromacs.org/mailman/listinfo/gmx-developers" target="_blank">http://lists.gromacs.org/mailman/listinfo/gmx-developers</a><br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>865-241-1537, ORNL PO BOX 2008 MS6309<br>
</div>