Hi Kent,<br><br>Usually FFT is not a bottleneck for MD when run on one or a few processors. You can increase the FFT load slightly by using a small cut-off (rcoulomb in the mdp file) and a fine grid (fourierspacing in the mdp file). Typically one uses a minimum rcoulomb of 0.8 and a fourierspacing of 1.1, but you could decrease fourierspacing further to see the effect on the FFT time.<br>
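As a rough illustration of why a finer grid raises the FFT load, the sketch below estimates the number of PME grid points along each axis as ceil(box / fourierspacing) and compares the textbook N log2 N operation count of a 3D FFT. The 6 nm cubic box and the spacing values are hypothetical, and the helper names are made up for this example. GROMACS also rounds grid dimensions up to FFT-friendly sizes, which this sketch deliberately skips.

```python
import math


def pme_grid_dim(box_nm: float, spacing_nm: float) -> int:
    """Lower bound on PME grid points along one axis: ceil(box / fourierspacing).

    The tiny tolerance keeps floating-point noise (e.g. 6.0 / 0.1) from
    bumping an exact ratio up by one. Rounding up to FFT-friendly sizes,
    as GROMACS does, is omitted here.
    """
    return math.ceil(box_nm / spacing_nm - 1e-9)


def fft_cost(n_points: int) -> float:
    """Textbook operation-count model for an FFT over n points: n * log2(n)."""
    return n_points * math.log2(n_points)


def relative_fft_cost(box_nm: float, ref_spacing: float, spacing: float) -> float:
    """Modelled 3D FFT cost at `spacing` relative to `ref_spacing` (cubic box)."""
    n_ref = pme_grid_dim(box_nm, ref_spacing) ** 3
    n_new = pme_grid_dim(box_nm, spacing) ** 3
    return fft_cost(n_new) / fft_cost(n_ref)


if __name__ == "__main__":
    box = 6.0  # hypothetical cubic box edge in nm
    for spacing in (0.16, 0.12, 0.10, 0.08):
        n = pme_grid_dim(box, spacing)
        ratio = relative_fft_cost(box, 0.16, spacing)
        print(f"fourierspacing={spacing:.2f} nm -> {n}^3 grid, "
              f"{ratio:.2f}x the modelled FFT cost at 0.16 nm")
```

Halving the spacing multiplies the number of grid points by roughly eight, and the modelled FFT cost grows slightly faster than that because of the log factor, while the real-space work set by rcoulomb is unaffected.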

<br>FFT becomes the major bottleneck for parallel runs on more than a few hundred CPUs. I did some work on parallel FFT on Jaguar and Kraken; let me know if you are also interested in parallel FFT. Is it correct that ACML only supports serial FFT so far? Do you plan to add a parallel FFT, or an extension analogous to what AMD provides for the linear algebra routines with ScaLAPACK?<br>
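One general reason a distributed 3D FFT stops scaling (background only, not a description of the Jaguar/Kraken work) is the data decomposition: with a 1D slab decomposition of an n^3 grid each process needs at least one plane, so at most n processes can be used, while a 2D pencil decomposition allows up to n^2. Each transpose is also an all-to-all exchange, so as the process count P grows the messages shrink and latency dominates. The sketch below counts these quantities for a hypothetical cubic grid; the function names are illustrative, and the message-size formula assumes a slab-style transpose in which every process exchanges an equal share with every other process.

```python
def slab_max_procs(n: int) -> int:
    """1D slab decomposition of an n^3 grid: each process needs at least one plane."""
    return n


def pencil_max_procs(n: int) -> int:
    """2D pencil decomposition of an n^3 grid: each process needs at least one pencil."""
    return n * n


def transpose_message_points(n: int, procs: int) -> float:
    """Grid points one process sends to each other process in a slab-style
    all-to-all transpose: its n^3 / procs points split over procs destinations."""
    return n**3 / procs**2


if __name__ == "__main__":
    n = 128  # hypothetical PME grid edge
    print(f"slab limit: {slab_max_procs(n)} processes, "
          f"pencil limit: {pencil_max_procs(n)} processes")
    for p in (8, 32, 128):
        pts = transpose_message_points(n, p)
        # 8 bytes per single-precision complex value (two 4-byte floats)
        print(f"P={p:4d}: {pts:.0f} points ({pts * 8 / 1024:.1f} KiB) per message")
```

With n = 128 and P = 128, each message in this simplified model carries only 128 points (1 KiB in single-precision complex), which is small enough that per-message latency, not bandwidth, sets the cost.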

<br>Roland<br><br><div class="gmail_quote">On Tue, Dec 9, 2008 at 2:40 PM, Knox, Kent <span dir="ltr">&lt;<a href="mailto:Kent.Knox@amd.com">Kent.Knox@amd.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hi all,<br>

<br>

I&#39;m interested in GROMACS to find out how it uses FFTs. I&#39;ve downloaded the latest tarballs from the website:<br>

<br>

Gromacs-4.0.2.tar.gz<br>

Gmxtest-3.3.3.tgz<br>

Gmxbench-3.0.tar.gz<br>

<br>

I built everything single-threaded and single-precision, and grabbed a profile of the d.lzm benchmark, which I understand is the only included benchmark that uses FFTs.<br>

<br>

After profiling the resulting executables with OProfile, I find that only about 2% of the time is spent in FFTs.<br>

<br>

<pre>CPU: AMD64 family10, speed 2300 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
 11992213 41.3899 mdrun
  6312771 21.7879 libmd.so.5.0.0
  5121677 17.6770 libgmx.so.5.0.0
  3659315 12.6298 no-vmlinux
  1201312  4.1462 libm-2.4.so
   620405  2.1413 libfftw3f.so.3.2.2
    43580  0.1504 oprofiled
    20128  0.0695 libc-2.4.so
</pre>
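To put the 2.1413% share of libfftw3f.so.3.2.2 in perspective, Amdahl&#39;s law bounds the overall speedup from accelerating only the FFT: 1 / ((1 - p) + p / s) for a fraction p of the runtime sped up by a factor s. The sketch below takes p from the profile above, treating the sample share as the fraction of runtime (an assumption, since the samples also include the kernel and oprofiled); the speedup factors are hypothetical.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def amdahl_limit(fraction: float) -> float:
    """Upper bound on overall speedup as the accelerated part's time goes to zero."""
    return 1.0 / (1.0 - fraction)


# libfftw3f.so.3.2.2 share from the OProfile output, assumed to equal runtime share
FFT_FRACTION = 0.021413

if __name__ == "__main__":
    for s in (2.0, 10.0):
        print(f"FFT {s:.0f}x faster -> whole run "
              f"{amdahl_speedup(FFT_FRACTION, s):.4f}x faster")
    print(f"upper bound (infinitely fast FFT): {amdahl_limit(FFT_FRACTION):.4f}x")
```

Even an infinitely fast FFT would make this single-processor run only about 2.2% faster (1 / (1 - 0.021413) is roughly 1.0219), consistent with FFT not being a bottleneck at this scale.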

<br>

Does this sound typical? Are FFTs really not a bottleneck? Are there better workloads available that stress FFT computation?<br>

<br>

For disclosure, I do not know much about molecular dynamics, but I am looking for interesting and useful workloads that stress math computation.<br>

<br>

Thanks in advance for any of your time.<br>

<br>

Kent Knox<br>

Member of Technical Staff; AMD ACML<br>

<br>

_______________________________________________<br>

gmx-developers mailing list<br>

<a href="mailto:gmx-developers@gromacs.org">gmx-developers@gromacs.org</a><br>

<a href="http://www.gromacs.org/mailman/listinfo/gmx-developers" target="_blank">http://www.gromacs.org/mailman/listinfo/gmx-developers</a><br>


</blockquote></div><br><br clear="all"><br>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>