<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Hi,<br>

      <br>

      Why not simply use MPI parallelization?<br>

      <br>

      But what (exotic) architecture does not support OpenMP and SIMD?

      If the machine has no SIMD, I would think it is not worth using for

      production. You get great performance from a cheap Intel CPU +

      NVIDIA GPU machine.<br>

      <br>

      Cheers,<br>

      <br>

      Berk<br>

      <br>

      On 2015-11-24 06:05, Vinson Leung wrote:<br>

    </div>

    <blockquote

cite="mid:CAAvvnSriKP+ZsNRrv1qLg1hF5BYqGpYcbnUfYdgHrxFv-Rj-ww@mail.gmail.com"

      type="cite">


      <div dir="ltr">Hi everyone, I am new to Gromacs and I want to get

        Gromacs running on a multi-core CPU machine for my research.

        Because the machine we use only supports MPI (no OpenMP, no

        SIMD), I profiled the MPI-only version of Gromacs-5.0.4 and

        found that the hotspot was nbnxn_kernel_ref() in

        src/gromacs/mdlib/nbnxn_kernel_ref.c, which took 80% of the

        total running time. Naturally I want to accelerate

        nbnxn_kernel_ref() by parallelizing it with multiple threads.

        After some analysis I found that the structure of

        nbnxn_kernel_ref() is as follows:

        <div>========================================================</div>

        <div>for (nb = 0 ; nb &lt; nnbl; nb++)</div>

        <div>{</div>

        <div>......</div>

        <div>      for( n = 0 ; n &lt; nbl-&gt;nci ; n++ )  // defined

          in nbnxn_kernel_ref_outer.h</div>

        <div>      {</div>

        <div>      ....</div>

        <div>      }</div>

        <div>......</div>

        <div>}</div>

        <div>========================================================</div>

        <div>So here is my question. When I compile with OpenMP=OFF, the

          value of nnbl is 1 at runtime. Can I parallelize the inner

          loop by simply splitting its iterations (nbl-&gt;nci) evenly

          across the cores?</div>

        <div>Alternatively, I found that when I compile with OpenMP=ON

          the value of nnbl is not 1, so I could parallelize the outer

          loop with multiple threads. But my machine does not support

          OpenMP. Is there any way to modify the code so that nnbl is

          not 1 when compiling with OpenMP=OFF?</div>
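        <div>A minimal sketch of the inner-loop split asked about above,

          under stated assumptions. In a pair-force loop, different

          i-entries generally update forces on shared atoms, so dividing

          the nbl-&gt;nci iterations among cores would race on the output

          array unless each worker accumulates into its own buffer and

          the buffers are summed afterwards. The names split_range,

          toy_kernel and run_split are hypothetical and are not Gromacs

          functions; the toy kernel only mimics the write pattern, and

          the loop over rank is a sequential stand-in for whatever

          threading mechanism the machine offers.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Gromacs function): compute the half-open
 * range [*begin, *end) of loop entries handled by worker `rank` out of
 * `nworkers`, giving one extra entry to each of the first n % nworkers
 * workers so the split is as even as possible. */
static void split_range(int n, int nworkers, int rank, int *begin, int *end)
{
    assert(nworkers > 0 && rank >= 0 && rank < nworkers);
    int base = n / nworkers;
    int rem  = n % nworkers;
    *begin = rank * base + (rank < rem ? rank : rem);
    *end   = *begin + base + (rank < rem ? 1 : 0);
}

/* Toy stand-in for the loop body: entry n adds +1 to its own slot and
 * -1 to the next slot, so neighbouring entries write to the same
 * element, which is what makes a naive split unsafe. */
static void toy_kernel(int begin, int end, int natoms, double *f)
{
    for (int n = begin; n < end; n++)
    {
        f[n]                += 1.0;
        f[(n + 1) % natoms] -= 1.0;
    }
}

/* Split nci entries among nworkers. Each worker writes only to its own
 * slice of `work` (size nworkers * natoms); the slices are then summed
 * into f. The first loop is where real threads would run concurrently. */
static void run_split(int nci, int nworkers, int natoms,
                      double *work, double *f)
{
    assert(nci <= natoms);
    memset(work, 0, (size_t)nworkers * natoms * sizeof(double));

    for (int rank = 0; rank < nworkers; rank++)
    {
        int begin, end;
        split_range(nci, nworkers, rank, &begin, &end);
        toy_kernel(begin, end, natoms, work + (size_t)rank * natoms);
    }

    /* Reduction: add every private buffer into the shared output. */
    for (int rank = 0; rank < nworkers; rank++)
    {
        for (int a = 0; a < natoms; a++)
        {
            f[a] += work[(size_t)rank * natoms + a];
        }
    }
}
```

        <div>Calling run_split() with one worker and with four workers on

          the same input should give the same f array. The price is one

          extra buffer per worker plus the reduction pass.</div>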

        <div>Thanks.</div>

        <div><br>

        </div>

      </div>

      <br>


      <br>

    </blockquote>

    <br>

  </body>

</html>