<div class="gmail_quote"><div>Hi,<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Anybody have any real-world comparisons of using MKL vs. FFTW3?<br>

</blockquote><div><br><a href="http://www.quantumespresso.org/user_guide/node16.html">http://www.quantumespresso.org/user_guide/node16.html</a> says: <br><br><div style="margin-left: 40px;">Axel Kohlmeyer suggests the following (April 2008): 

&quot;(I&#39;ve) found that Intel is now turning on multithreading without any warning, and that is, for example, why their FFT seems faster than FFTW. For serial and OpenMP-based runs this makes no difference (in fact, the multi-threaded FFT helps), but if you run MPI locally, you actually lose performance. Also, if you use the &#39;numactl&#39; tool on Linux to bind a job to a specific CPU core, MKL will still try to use all available cores (and slow down badly). The cleanest way of avoiding this mess is to either link with

</div><pre style="margin-left: 40px;">-lmkl_intel_lp64 -lmkl_sequential -lmkl_core (on 64-bit: x86_64, ia64)<br>-lmkl_intel -lmkl_sequential -lmkl_core (on 32-bit, i.e. ia32)<br></pre><div style="margin-left: 40px;">

or edit the libmkl_&#39;platform&#39;.a file; I&#39;m now using a file libmkl10.a with:

</div><pre style="margin-left: 40px;">  GROUP (libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a)<br></pre><div style="margin-left: 40px;">

It works like a charm&quot;.

<br></div><br>So MKL&#39;s implicit multithreading might be contributing to your problem. Please tell us if Axel&#39;s suggestion works for you!<br><br>Best regards,<br>Vasilii<br></div></div>
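<div><br>Below is a minimal Python sketch of the linking advice quoted above. The helper names <code>mkl_sequential_libs</code>, <code>link_flags</code> and <code>group_script</code> are hypothetical, nothing here calls MKL itself, and the library names are only those given in the quote (MKL 10.x era), so check the Intel documentation for your MKL version. It builds the sequential link line for a given architecture and the one-line GNU ld <code>GROUP</code> script that can stand in for a libmkl_&#39;platform&#39;.a archive:</div>

```python
from __future__ import annotations

# Hypothetical helpers: they only build the strings described in the quoted
# advice (MKL 10.x library names); they do not load or call MKL.

# Interface library per architecture, as listed in the quoted advice.
INTERFACE_LIB = {
    "x86_64": "mkl_intel_lp64",  # 64-bit
    "ia64": "mkl_intel_lp64",    # 64-bit (Itanium)
    "ia32": "mkl_intel",         # 32-bit
}


def mkl_sequential_libs(arch: str) -> list[str]:
    """Return the MKL libraries for a sequential (single-threaded) build."""
    if arch not in INTERFACE_LIB:
        raise ValueError(f"unsupported architecture: {arch!r}")
    # mkl_sequential is linked instead of a threaded layer, so MKL runs
    # single-threaded and does not compete with MPI ranks for cores.
    return [INTERFACE_LIB[arch], "mkl_sequential", "mkl_core"]


def link_flags(arch: str) -> str:
    """Linker flags for the sequential build, e.g. for a LIBS variable."""
    return " ".join("-l" + lib for lib in mkl_sequential_libs(arch))


def group_script(arch: str) -> str:
    """One-line GNU ld script that can replace a libmkl_'platform'.a file."""
    archives = " ".join("lib" + lib + ".a" for lib in mkl_sequential_libs(arch))
    return f"GROUP ({archives})\n"


if __name__ == "__main__":
    print(link_flags("x86_64"))
    # -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
    print(group_script("x86_64"), end="")
    # GROUP (libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a)
```

<div>Driving both the link line and the linker script from one table keeps the two options from the quote in sync if you switch architectures.</div>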