<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    On 8/02/2011 1:48 AM, Qiong Zhang wrote:

    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"

      type="cite">

      <table border="0" cellpadding="0" cellspacing="0">

        <tbody>

          <tr>

            <td style="font: inherit;" valign="top"><br>

              Hi Mark,<br>

              <br>

              Many thanks for your fast response!<br>

              <br>


              <p class="MsoNormal"><i style=""><span lang="EN-US">What's

                    the network hardware? Can other machine load

                    influence your network

                    performance?</span></i></p>

              <p class="MsoNormal"><span lang="EN-US">The supercomputer
                  system is based on the
                  Cray Gemini interconnect technology. I suppose this
                  counts as fast network hardware.</span></p>

              <p class="MsoNormal"><br>

                <span lang="EN-US"></span></p>

              <p class="MsoNormal"><i style=""><span lang="EN-US">Are
                    the systems in the NVT ensemble? Use diff to check
                    that the .mdp files differ only in the ways
                    you think they do.</span></i></p>

              <p class="MsoNormal"><span lang="EN-US">The systems are in
                  the NPT ensemble. I saw some
                  discussions on the mailing list suggesting that the
                  NPT ensemble is superior to the NVT ensemble
                  for REMD. The .mdp files differ only in the
                  temperature.</span></p>

            </td>

          </tr>

        </tbody>

      </table>

    </blockquote>

    <br>

    Maybe so, but under NPT the density varies with T, and so with

    replica. This means the size of neighbour lists varies, and the cost

    of the computation (PME or not) varies. The generalized ensemble is

    limited by the progress of the slowest replica. If using PME, in

    theory, you can juggle the contribution of the various terms to

    balance the computation load across the replicas, but this is not

    easy to do.<br>
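    <br>
    To see which temperatures the replicas actually run at, here is a
    minimal sketch. It assumes the "exponential distribution" in the
    original post means a constant ratio between neighbouring
    temperatures; the function name geometric_ladder is made up for
    illustration, and only the end points and replica counts come from
    the post.<br>
    <pre>
```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced by a constant ratio."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


# End points and replica counts as stated in the original post.
ladder_42 = geometric_ladder(276.0, 515.0, 42)
ladder_24 = geometric_ladder(298.0, 420.0, 24)

for name, ladder in (("42 replicas", ladder_42), ("24 replicas", ladder_24)):
    ratio = ladder[1] / ladder[0]
    print(f"{name}: {ladder[0]:.1f} K to {ladder[-1]:.1f} K, ratio {ratio:.4f}")
```
</pre>
    Under this assumption both ladders have a neighbour ratio of about
    1.015, so the 42-replica run mainly extends the same spacing to
    lower and higher temperatures.<br>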

    <span lang="EN-US"> </span>

    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"

      type="cite">

      <table border="0" cellpadding="0" cellspacing="0">

        <tbody>

          <tr>

            <td style="font: inherit;" valign="top">

              <p class="MsoNormal"><i style=""><span lang="EN-US">What

                    are the values of nstlist and<a

                      moz-do-not-send="true" name="OLE_LINK12">

                      nstcalcenergy</a>?</span></i></p>

              <p class="MsoNormal"><span lang="EN-US">Previously,
                  nstlist=5 and nstcalcenergy=1.</span></p>

              <p class="MsoNormal"><span style="font-size: 11pt;
                  font-family: NimbusRomNo9L-Regu;" lang="EN-US">Thank
                  you for
                  pointing this out. I checked the manual again; it says
                  that this option affects the
                  performance of parallel simulations because
                  calculating energies requires global
                  communication between all processes. So I have set
                  this option to -1 this time.
                  This should be one reason for the low parallel
                  efficiency.</span></p>

              <p class="MsoNormal"><span style="font-size: 11pt;
                  font-family: NimbusRomNo9L-Regu;" lang="EN-US">After
                  I changed to nstcalcenergy=-1, I found a
                  3% improvement in efficiency compared with
                  nstcalcenergy=1.</span></p>

            </td>

          </tr>

        </tbody>

      </table>

    </blockquote>

    <br>

    Yep. nstpcouple and nsttcouple also influence this.<br>
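    <br>
    One way to check how often this communication really happens is to
    divide the step count by the call count in the time breakdown quoted
    further down. A rough sketch, assuming the "Number" column counts
    calls and that the Force row is called once per step; the variable
    names are only for illustration.<br>
    <pre>
```python
# Call counts from the "Number" column of the time breakdown quoted
# later in this message (one replica of the 42-replica run).
steps = 2201            # Force row; assumed to be one call per MD step
comm_energies = 442     # "Comm. energies" row
neighbor_search = 442   # "Neighbor search" row

energy_interval = steps / comm_energies
search_interval = steps / neighbor_search
print(f"energy communication about every {energy_interval:.1f} steps")
print(f"neighbour search about every {search_interval:.1f} steps")
```
</pre>
    Both intervals come out at about 5 steps, which matches nstlist=5.<br>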

    <span lang="EN-US"> <br>

    </span>

    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"

      type="cite">

      <table border="0" cellpadding="0" cellspacing="0">

        <tbody>

          <tr>

            <td style="font: inherit;" valign="top">

              <p style="font-style: italic;" class="MsoNormal"><span

                  lang="EN-US">Take a look at the execution time

                  breakdown

                  at the end of the .log files, and do so for more than

                  one replica. With the

                  current implementation, every simulation has to

                  synchronize and communicate

                  every handful of steps, which means that large scale

                  parallelism won't work

                  efficiently unless you have<a moz-do-not-send="true"

                    name="OLE_LINK17"> fast network hardware</a> that

                  is dedicated to your job. This effect shows up in the

                  "Rest" row of

                  the time breakdown. With <a moz-do-not-send="true"

                    name="OLE_LINK14"></a><a moz-do-not-send="true"

                    name="OLE_LINK13"><span style="">Infiniband</span></a>,

                  I'd expect you should

                  only be losing about 10% of the run time total. The

                  30-fold loss you have upon

                  going from 24-&gt;42 replicas keeping 4 CPUs/replica

                  suggests some other

                  contribution, however.</span></p>

              <p class="MsoNormal"><span lang="EN-US"> </span></p>

              <p class="MsoNormal"><span lang="EN-US">I checked the time
                  breakdown in the log
                  files for short REMD simulations. For the REMD
                  simulation with 168 cores for 42
                  replicas, as you see below, the “Rest” row makes up a
                  surprisingly high <b style=""><u>96.6%</u></b> of
                  the time for one of the
                  replicas. The value is at almost the same level for
                  the other replicas. For
                  the REMD simulation with 96 cores for 24 replicas, the
                  “Rest” row takes up about
                  24%. I was also aware of your post: </span></p>

              <p class="MsoNormal"><span lang="EN-US"><a

                    moz-do-not-send="true"

                    href="http://www.mail-archive.com/gmx-users@gromacs.org/msg37507.html">http://www.mail-archive.com/gmx-users@gromacs.org/msg37507.html</a></span></p>

              <p class="MsoNormal"><span lang="EN-US">As you suggested,
                  such a big loss should be
                  ascribed to other factors. Do you think the
                  network hardware is to blame, or
                  are there other reasons? Any suggestions would
                  be greatly appreciated.<br>
                </span></p>

            </td>

          </tr>

        </tbody>

      </table>

    </blockquote>

    <br>

    I expect the load imbalance across replicas is partly to blame. Look

    at the sum of Force + PME mesh (in seconds) across the generalized

    ensemble. That's where the simulation work is all done, and I expect

    your low-temperature replicas are doing much more work than your

    high-temperature replicas. Unfortunately 4.5.3 doesn't allow the

    user to know enough detail here. Future versions of GROMACS will -

    work in progress.<br>
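    <br>
    A minimal sketch of that comparison, assuming each replica's log
    ends with a table laid out like the one quoted below (row name,
    Nodes, optional Number, G-Cycles, Seconds, %). The function name
    parse_breakdown and the SAMPLE text are made up for illustration;
    the numbers in SAMPLE are copied from the quoted table.<br>
    <pre>
```python
def parse_breakdown(text):
    """Map each row name in a time-breakdown table to its Seconds value."""
    seconds = {}
    for line in text.splitlines():
        tokens = line.split()
        # The row name ends at the first all-digit token (the Nodes column).
        first_num = next((i for i, t in enumerate(tokens) if t.isdigit()), None)
        if not first_num:
            continue  # header, rule line, or a line with no row name
        numbers = tokens[first_num:]
        if len(numbers) >= 3:
            try:
                # Assumed column order: ..., G-Cycles, Seconds, %.
                seconds[" ".join(tokens[:first_num])] = float(numbers[-2])
            except ValueError:
                continue  # not a table row after all
    return seconds


SAMPLE = """\
 Computing:         Nodes     Number     G-Cycles    Seconds       %
 Force                  4       2201      175.303       83.5     2.0
 PME mesh               4       2201       30.314       14.4     0.3
 Rest                   4                8426.029     4012.4    96.6
 Total                  4                8726.270     4155.4   100.0
"""

rows = parse_breakdown(SAMPLE)
work = rows["Force"] + rows["PME mesh"]
print(f"Force + PME mesh: {work:.1f} s of {rows['Total']:.1f} s total")
```
</pre>
    Running the same function over the breakdown from every replica's
    log and comparing the Force + PME mesh sums would show how uneven
    the real work is across the ensemble.<br>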

    <br>

    Strictly, though, your rate-limiting lowest-temperature replica in
    the 24-replica regime should take an amount of time comparable to
    that of the lowest in the 42-replica regime (a 22 K difference is
    not that significant), and similar to that of the same system run
    outside a replica-exchange simulation. Your reported data are not
    consistent with that, so I think your jobs are also experiencing
    differing degrees of network or filesystem contention at different
    times. Your sysadmins can comment on that.<br>
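    <br>
    The size of the discrepancy can be put in numbers. A minimal
    sketch that uses only the step counts from the table in the
    original post; the dictionary layout and variable names are just
    for illustration.<br>
    <pre>
```python
# Keys are (replicas, CPUs); values are MD steps completed in
# 15 minutes, as reported in the original post.
steps_per_15min = {
    (24, 96): 58015,
    (42, 42): 865,
    (42, 84): 1175,
    (42, 168): 1875,
    (42, 336): 2855,
}

window_s = 15 * 60
for (replicas, cpus), steps in steps_per_15min.items():
    print(f"{replicas} replicas, {cpus} CPUs: {window_s / steps:.3f} s/step")

# Both of these runs use 4 CPUs per replica.
slowdown = steps_per_15min[(24, 96)] / steps_per_15min[(42, 168)]
print(f"slowdown at 4 CPUs/replica: {slowdown:.1f}x")
```
</pre>
    The last line corresponds to the roughly 30-fold loss mentioned
    earlier for 4 CPUs per replica.<br>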

    <br>

    Mark<br>

    <br>

    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"

      type="cite">

      <table border="0" cellpadding="0" cellspacing="0">

        <tbody>

          <tr>

            <td style="font: inherit;" valign="top">

              <p class="MsoNormal"><span lang="EN-US"> </span></p>

              <pre>
 Computing:         Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
 Domain decomp.         4        442        2.604        1.2     0.0
 DD comm. load          4          6        0.001        0.0     0.0
 Comm. coord.           4       2201        1.145        0.5     0.0
 Neighbor search        4        442       14.964        7.1     0.2
 Force                  4       2201      175.303       83.5     2.0
 Wait + Comm. F         4       2201        1.245        0.6     0.0
 PME mesh               4       2201       30.314       14.4     0.3
 Write traj.            4         11       17.346        8.3     0.2
 Update                 4       2201        2.004        1.0     0.0
 Constraints            4       2201       26.593       12.7     0.3
 Comm. energies         4        442       28.722       13.7     0.3
 Rest                   4                8426.029     4012.4    96.6
-----------------------------------------------------------------------
 Total                  4                8726.270     4155.4   100.0
</pre>

              <br>

              <br>

              Qiong<br>

              <br>

              On 7/02/2011 9:52 PM, Qiong Zhang wrote:

              <blockquote type="cite">

                <table border="0" cellpadding="0" cellspacing="0">

                  <tbody>

                    <tr>

                      <td style="font: inherit;" valign="top">

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US">Dear all gmx-users,</span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US"> </span></p>

                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">I have recently been testing
                            REMD simulations. I was running them on a
                            supercomputer system based on AMD Opteron
                            12-core (2.1 GHz) processors, using
                            GROMACS 4.5.3.</span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US"> </span></p>

                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">I have a system of 5172 atoms,
                            of which 138 atoms belong to the solute and
                            the rest are water molecules. An exponential
                            distribution of temperatures was generated,
                            ranging from 276 to 515 K for a total of 42
                            replicas or from 298 to 420 K for a total of
                            24 replicas, ensuring that the exchange ratio
                            between all adjacent replicas is about 0.25.
                            Replica exchange was attempted every
                            0.5 ps. The integration step size was 2 fs.</span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US"> </span></p>

                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">For the above system, when REMD
                            is run over 24 replicas, the
                            simulation speed is reasonably fast.
                            However, when REMD is run over 42
                            replicas, the simulation is extremely
                            slow. Please see the following table for the
                            speeds.<br>
                          </span></p>

                        <pre>
----------------------------------------------------------------------------
Replica number    CPU number    Speed (steps per 15 minutes)
            24            96    58015
            42            42      865
            42            84     1175
            42           168     1875
            42           336     2855
----------------------------------------------------------------------------
</pre>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US"> </span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US">The command line for the mdrun

                            is:</span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US">aprun -n (CPU number here)

                            mdrun_d -s md.tpr -multi (replica number

                            here) -replex 250</span></p>

                        <p class="yiv1366269415MsoNormal"><span

                            lang="EN-US"> </span></p>

                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">My questions are:<br>
                          </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">1) Why is REMD with 42
                            replicas so slow for the same system? <br>
                          </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">2) In what ways can I
                            improve the efficiency?<br>
                          </span></p>

                      </td>

                    </tr>

                  </tbody>

                </table>

              </blockquote>

              <br>

              What's the network hardware? Can other machine load

              influence your network performance?<br>

              <br>

              Are the systems in the NVT ensemble? Use diff to check that
              the .mdp files differ only in the ways you think they do.<br>

              <br>

              What are the values of nstlist and nstcalcenergy?<br>

              <br>

              Take a look at the execution time breakdown at the end of

              the .log files, and do so for more than one replica. With

              the current implementation, every simulation has to

              synchronize and communicate every handful of steps, which

              means that large scale parallelism won't work efficiently

              unless you have fast network hardware that is dedicated to

              your job. This effect shows up in the "Rest" row of the

              time breakdown. With Infiniband, I'd expect you should

              only be losing about 10% of the run time total. The

              30-fold loss you have upon going from 24-&gt;42 replicas

              keeping 4 CPUs/replica suggests some other contribution,

              however.<br>

              <br>

              Mark</td>

          </tr>

        </tbody>

      </table>

      <br>

    </blockquote>

    <br>

  </body>

</html>