<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#ffffff">
    On 8/02/2011 1:48 AM, Qiong Zhang wrote:
    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"
      type="cite">
      <table border="0" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td style="font: inherit;" valign="top"><br>
              Hi Mark,<br>
              <br>
              Many thanks for your fast response!<br>
              <br>
              <p class="MsoNormal"><i style=""><span lang="EN-US">What's
                    the network hardware? Can other machine load
                    influence your network
                    performance?</span></i></p>
              <p class="MsoNormal"><span lang="EN-US">The supercomputer
                  system is based on the
                  Cray Gemini interconnect technology. I suppose this is
                  fast network hardware...</span></p>
              <p class="MsoNormal"><br>
                <span lang="EN-US"></span></p>
              <p class="MsoNormal"><i style=""><span lang="EN-US">Are
                    the systems in the NVT ensemble? Use diff to check
                    the .mdp files differ only
                    how you think they do.</span></i></p>
              <p class="MsoNormal"><span lang="EN-US">The systems are in
                  the NPT ensemble. I saw some
                  discussions on the mailing list suggesting that the NPT
                  ensemble is superior to the NVT ensemble
                  for REMD. The .mdp files differ only in the
                  temperature.</span></p>
            </td>
          </tr>
        </tbody>
      </table>
    </blockquote>
    <br>
    Maybe so, but under NPT the density varies with T, and so with
    replica. This means the size of neighbour lists varies, and the cost
    of the computation (PME or not) varies. The generalized ensemble is
    limited by the progress of the slowest replica. If using PME, in
    theory, you can juggle the contribution of the various terms to
    balance the computation load across the replicas, but this is not
    easy to do.<br>
    <span lang="EN-US"> </span>
    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"
      type="cite">
      <table border="0" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td style="font: inherit;" valign="top">
              <p class="MsoNormal"><i style=""><span lang="EN-US">What
                    are the values of nstlist and<a
                      moz-do-not-send="true" name="OLE_LINK12">
                      nstcalcenergy</a>?</span></i></p>
              <p class="MsoNormal"><span lang="EN-US">Previously,
                  nstlist=5</span>, <a moz-do-not-send="true"
                  name="OLE_LINK15"><span
                    lang="EN-US">nstcalcenergy</span></a><span
                  lang="EN-US">=1</span></p>
              <span style=""></span>
              <p class="MsoNormal"><span style="font-size: 11pt;
                  font-family: NimbusRomNo9L-Regu;" lang="EN-US">Thank
                  you for
                  pointing this out. I checked the manual again and saw
                  that this option affects the
                  performance of parallel simulations because
                  calculating energies requires global
                  communication between all processes. So I have set
                  this option to -1 this time.
                  This should be one reason for the low parallel
                  efficiency.</span></p>
              <p class="MsoNormal"><span style="font-size: 11pt;
                  font-family: NimbusRomNo9L-Regu;" lang="EN-US">And
                  after I
                  set </span><span style="background: none repeat
                  scroll 0% 0% rgb(255, 255, 255);
                  -moz-background-inline-policy: continuous;"
                  lang="EN-US">nstcalcenergy=</span><span lang="EN-US">-1,
                  I</span><span style="font-size: 11pt; font-family:
                  NimbusRomNo9L-Regu;" lang="EN-US"> found
                  a 3% improvement in efficiency compared with
                </span><span lang="EN-US">nstcalcenergy=1.</span></p>
            </td>
          </tr>
        </tbody>
      </table>
    </blockquote>
    <br>
    Yep. nstpcouple and nsttcouple also influence this.<br>
    <span lang="EN-US"> <br>
    </span>
    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"
      type="cite">
      <table border="0" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td style="font: inherit;" valign="top">
              <p style="font-style: italic;" class="MsoNormal"><span
                  lang="EN-US">Take a look at the execution time
                  breakdown
                  at the end of the .log files, and do so for more than
                  one replica. With the
                  current implementation, every simulation has to
                  synchronize and communicate
                  every handful of steps, which means that large scale
                  parallelism won't work
                  efficiently unless you have<a moz-do-not-send="true"
                    name="OLE_LINK17"> fast network hardware</a> that
                  is dedicated to your job. This effect shows up in the
                  "Rest" row of
                  the time breakdown. With <a moz-do-not-send="true"
                    name="OLE_LINK14"></a><a moz-do-not-send="true"
                    name="OLE_LINK13"><span style="">Infiniband</span></a>,
                  I'd expect you should
                  only be losing about 10% of the run time total. The
                  30-fold loss you have upon
                  going from 24-&gt;42 replicas keeping 4 CPUs/replica
                  suggests some other
                  contribution, however.</span></p>
              <p class="MsoNormal"><span lang="EN-US"> </span></p>
              <p class="MsoNormal"><span lang="EN-US">I checked the time
                  breakdown in the log
                  files for short REMD simulations. For the REMD
                  simulation with 168 cores for 42
                  replicas, as you see below, the “Rest” row makes up a
                  surprisingly high <b style=""><u>96.6%</u></b> of
                  the time for one of the
                  replicas. The value is at almost the same level for
                  the other replicas. For
                  the REMD simulation with 96 cores for 24 replicas, the
                  “Rest” row takes up about
                  24%. I was also aware of your post: </span></p>
              <p class="MsoNormal"><span lang="EN-US"><a
                    moz-do-not-send="true"
                    href="http://www.mail-archive.com/gmx-users@gromacs.org/msg37507.html">http://www.mail-archive.com/gmx-users@gromacs.org/msg37507.html</a></span></p>
              <p class="MsoNormal"><span lang="EN-US">As you suggested,
                  such a big loss should be
                  ascribed to other factors. Do you think the
                  network hardware is to blame, or
                  are there other reasons? Any suggestions would
                  be greatly appreciated.<br>
                </span></p>
            </td>
          </tr>
        </tbody>
      </table>
    </blockquote>
    <br>
    I expect the load imbalance across replicas is partly to blame. Look
    at the sum of Force + PME mesh (in seconds) across the generalized
    ensemble. That's where the simulation work is all done, and I expect
    your low-temperature replicas are doing much more work than your
    high-temperature replicas. Unfortunately 4.5.3 doesn't report enough
    detail for the user to see this directly. Future versions of GROMACS
    will; that is work in progress.<br>
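    <br>
    A minimal Python sketch of that check, assuming the cycle accounting
    table at the end of each .log file has the same column layout as the
    one you posted (Nodes, Number, G-Cycles, Seconds, %). The function
    names and the regular expression are only illustrative and are not
    part of GROMACS:<br>
<pre>
```python
import re

# Matches the "Force" and "PME mesh" rows of the time breakdown table,
# assuming the column order Nodes, Number, G-Cycles, Seconds, %.
ROW = re.compile(
    r"^\s*(Force|PME mesh)\s+\d+\s+\d+\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s*$"
)


def force_plus_pme_seconds(log_text):
    """Return the summed Seconds of the Force and PME mesh rows."""
    total = 0.0
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            total += float(match.group(2))
    return total


def per_replica_work(log_paths):
    """Map each .log file path to its Force + PME mesh seconds."""
    work = {}
    for path in log_paths:
        with open(path) as handle:
            work[path] = force_plus_pme_seconds(handle.read())
    return work


# Rows copied from the breakdown posted in this thread.
sample = """
 Force                  4       2201      175.303       83.5     2.0
 PME mesh               4       2201       30.314       14.4     0.3
 Rest                   4                8426.029     4012.4    96.6
"""
print(round(force_plus_pme_seconds(sample), 1))  # 97.9
```
</pre>
    Calling per_replica_work() on the list of per-replica .log files
    (whatever they are named on your system) gives one number per
    replica to compare. A wide spread between the low-temperature and
    high-temperature replicas would point to load imbalance.<br>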
    <br>
    Strictly, though, your rate-limiting lowest-temperature replica in
    the 24-replica regime should take an amount of time comparable to
    that of the lowest in the 42-replica regime (a 22 K difference is
    not that significant), and similar to the time for the same system
    run outside a replica-exchange simulation. Your reported data is not
    consistent with that, so I think your jobs are also experiencing
    differing degrees of network or filesystem contention at different
    times. Your sysadmins can comment on that.<br>
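    <br>
    For reference, the arithmetic behind those comparisons can be
    sketched in Python using only the numbers posted in this thread. The
    dictionary layout and variable names are illustrative:<br>
<pre>
```python
# Throughput reported in this thread:
# (replicas, total CPUs) -> MD steps completed per 15 minutes.
steps_per_15min = {
    (24, 96): 58015,
    (42, 42): 865,
    (42, 84): 1175,
    (42, 168): 1875,
    (42, 336): 2855,
}

# Slowdown at a fixed 4 CPUs per replica when going from 24 to 42 replicas.
ratio = steps_per_15min[(24, 96)] / steps_per_15min[(42, 168)]
print(f"slowdown 24 to 42 replicas: {ratio:.1f}x")  # 30.9x

# Gain from doubling the CPUs per replica from 4 to 8 in the 42-replica run.
gain = steps_per_15min[(42, 336)] / steps_per_15min[(42, 168)]
print(f"gain from doubling CPUs: {gain:.2f}x")  # 1.52x

# Share of the Seconds column taken by the "Rest" row in the posted breakdown.
rest_seconds, total_seconds = 4012.4, 4155.4
rest_fraction = rest_seconds / total_seconds
print(f"Rest share: {100 * rest_fraction:.1f}%")  # 96.6%
```
</pre>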
    <br>
    Mark<br>
    <br>
    <blockquote cite="mid:442280.57183.qm@web53804.mail.re2.yahoo.com"
      type="cite">
      <table border="0" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td style="font: inherit;" valign="top">
              <p class="MsoNormal"><span lang="EN-US"> </span></p>
<pre>
 Computing:        Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
 Domain decomp.        4        442        2.604        1.2     0.0
 DD comm. load         4          6        0.001        0.0     0.0
 Comm. coord.          4       2201        1.145        0.5     0.0
 Neighbor search       4        442       14.964        7.1     0.2
 Force                 4       2201      175.303       83.5     2.0
 Wait + Comm. F        4       2201        1.245        0.6     0.0
 PME mesh              4       2201       30.314       14.4     0.3
 Write traj.           4         11       17.346        8.3     0.2
 Update                4       2201        2.004        1.0     0.0
 Constraints           4       2201       26.593       12.7     0.3
 Comm. energies        4        442       28.722       13.7     0.3
 Rest                  4                8426.029     4012.4    96.6
-----------------------------------------------------------------------
 Total                 4                8726.270     4155.4   100.0
</pre>
              <br>
              <br>
              Qiong<br>
              <br>
              On 7/02/2011 9:52 PM, Qiong Zhang wrote:
              <blockquote type="cite">
                <table border="0" cellpadding="0" cellspacing="0">
                  <tbody>
                    <tr>
                      <td style="font: inherit;" valign="top">
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">Dear all gmx-users,</span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US"> </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">I have recently been testing REMD
                            simulations. I was running simulations on a
                            supercomputer system based
                            on AMD Opteron 12-core (2.1 GHz)
                            processors. GROMACS version 4.5.3 was
                            used.</span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US"> </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">I have a system of 5172 atoms,
                            of which 138 atoms belong to the solute and the
                            rest are water molecules. An exponential
                            distribution of temperatures was generated,
                            ranging from 276 to 515 K over a total of 42
                            replicas or from 298 to 420 K over a total of 24
                            replicas, ensuring that the exchange ratio
                            between all adjacent replicas is about 0.25.
                            Replica exchange was attempted every
                            0.5 ps. The integration time step was 2 fs.</span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US"> </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">For the above system, when REMD
                            is simulated over 24 replicas, the
                            simulation speed is reasonably fast.
                            However, when REMD is simulated over 42
                            replicas, the simulation speed is awfully
                            slow. Please see the following table for the
                            speed.<br>
                          </span></p>
<pre>
----------------------------------------------------------------------------
Replica number    CPU number    Speed (steps per 15 minutes)
            24            96    58015
            42            42      865
            42            84     1175
            42           168     1875
            42           336     2855
----------------------------------------------------------------------------
</pre>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US"> </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">The command line for the mdrun
                            is:</span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">aprun -n (CPU number here)
                            mdrun_d -s md.tpr -multi (replica number
                            here) -replex 250</span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US"> </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">My questions are :<br>
                          </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">1) Why is REMD with 42
                            replicas so slow for the same system? <br>
                          </span></p>
                        <p class="yiv1366269415MsoNormal"><span
                            lang="EN-US">2) How can I
                            improve the efficiency?<br>
                          </span></p>
                      </td>
                    </tr>
                  </tbody>
                </table>
              </blockquote>
              <br>
              What's the network hardware? Can other machine load
              influence your network performance?<br>
              <br>
              Are the systems in the NVT ensemble? Use diff to check the
              .mdp files differ only how you think they do.<br>
              <br>
              What are the values of nstlist and nstcalcenergy?<br>
              <br>
              Take a look at the execution time breakdown at the end of
              the .log files, and do so for more than one replica. With
              the current implementation, every simulation has to
              synchronize and communicate every handful of steps, which
              means that large scale parallelism won't work efficiently
              unless you have fast network hardware that is dedicated to
              your job. This effect shows up in the "Rest" row of the
              time breakdown. With Infiniband, I'd expect you should
              only be losing about 10% of the run time total. The
              30-fold loss you have upon going from 24-&gt;42 replicas
              keeping 4 CPUs/replica suggests some other contribution,
              however.<br>
              <br>
              Mark</td>
          </tr>
        </tbody>
      </table>
      <br>
    </blockquote>
    <br>
  </body>
</html>