<div dir="ltr">OK, thanks Berk. Szilard and I have each thought of some minor documentation fixes that would also be nice to get in.<div><br></div><div>I suggest we try to get all of that in so we can shift focus away from release-4-6, with the 5.0 beta only weeks away now!<div>
<br></div><div>Mark</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Nov 8, 2013 at 3:30 PM, Berk Hess <span dir="ltr">&lt;<a href="mailto:hess@kth.se" target="_blank">hess@kth.se</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <div>Hi,<br>
      <br>
      I think I found the source of the problem, now I have to come up
      with a solution.<br>
      It seems you have a harmonic potential between two atoms at a
      distance of 2.9 nm. This is much longer than the pair-list range.
      So one (non-local) atom ends up beyond the non-local search grid.
      I thought I had accounted for such cases, but apparently not.<br>
      We can probably simply round down the index to the maximum, but
      that means we can&#39;t check for other cases, such as without DD,
      where atoms can&#39;t be beyond the grid.<br>
      The fix will be very simple and quick, but I need to think it over
      a bit more deeply.<br>
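      <br>
      As a rough sketch of the round-down idea, using the numbers from
      the fatal error quoted below: the helper names and the max_index
      parameter are hypothetical and this is not the actual
      nbnxn_search.c code. It only shows how the index 89 arises and
      what clamping it to 16*4 would do.<br>

```c
#include <assert.h>

/* Hypothetical helper (not GROMACS code): sort-grid index of coordinate x
 * on a 1D grid that starts at h0 and has inverse bin width invh. */
static int raw_bin_index(double x, double h0, double invh)
{
    return (int)((x - h0) * invh);
}

/* Hypothetical helper: same index, clamped to the range [0, max_index].
 * Assumption: max_index plays the role of the "16*4" bound in the error. */
static int clamped_bin_index(double x, double h0, double invh, int max_index)
{
    int i;

    assert(max_index >= 0);
    i = raw_bin_index(x, h0, invh);
    if (i < 0)
    {
        i = 0;
    }
    if (i > max_index)
    {
        i = max_index;
    }
    return i;
}
```

      With x = 11.764535, h0 = 10.229600 and invh = 58.394176 the raw
      index is 89, which clamping maps to 16*4 = 64. The cost is the one
      mentioned above: once the index is clamped, an atom that really
      should not be beyond the grid is no longer detected.<br>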
      <br>
      Cheers,<br>
      <br>
      Berk<div><div class="h5"><br>
      <br>
      On 11/08/2013 02:58 PM, Carsten Kutzner wrote:<br>
    </div></div></div><div><div class="h5">
    <blockquote type="cite">
      
      <div>Hi Berk,<br>
        <br>
        On 11/08/2013 02:48 PM, Berk Hess wrote:<br>
      </div>
      <blockquote type="cite">
        <div>Hi,<br>
          <br>
          I assume this is with GPUs.<br>
          If you run in a debugger, break on exit, can you tell me which
          sort_atoms call this comes from?<br>
          <br>
        </div>
      </blockquote>
      It comes from sort_columns_supersub:<br>
      <br>
      <br>
      Breakpoint 1, 0x00007ffff5d17920 in exit () from /lib64/libc.so.6<br>
      (gdb) where<br>
      #0  0x00007ffff5d17920 in exit () from /lib64/libc.so.6<br>
      #1  0x0000000000a60558 in quit_gmx (msg=0x7fffee092290 &quot;\n&quot;, &#39;-&#39;
      &lt;repeats 55 times&gt;, &quot;\nProgram mdrun, VERSION
      4.6.4-dev-20131107-ba8232e\nSource code file:
      /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,

      l&quot;...) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:284<br>
      #2  0x0000000000a61609 in _gmx_error (key=0x185ecde &quot;fatal&quot;,
      msg=0x7fffee094af0 &quot;(int)((x[74522][x]=11.764535 -
      10.229600)*58.394176) = 89, not in 0 - 16*4\n&quot;, file=0x1836ad8
      &quot;/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c&quot;,

      line=609) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:774<br>
      #3  0x0000000000a61054 in gmx_fatal (f_errno=0, file=0x1836ad8
      &quot;/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c&quot;,

      line=609, fmt=0x1836c80 &quot;(int)((x[%d][%c]=%f - %f)*%f) = %d, not
      in 0 - %d*%d\n&quot;) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:509<br>
      #4  0x00000000004e91db in sort_atoms (dim=0, Backwards=0,
      a=0x7ffff212e920, n=2, x=0x7ffff148a650, h0=10.2296,
      invh=58.3941765, n_per_h=16, sort=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:609<br>
      #5  0x00000000004ebcee in sort_columns_supersub
      (nbs=0x7ffff057f2a0, dd_zone=1, grid=0x7ffff06d6f60, a0=61048,
      a1=74523, atinfo=0x7ffff0c51840, x=0x7ffff148a650,
      nbat=0x7ffff05d3bb0, cxy_start=10, cxy_end=15,
      sort_work=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1394<br>
      #6  0x00000000004ecf09 in calc_cell_indices.omp_fn.1
      (.omp_data_i=0x7ffff516f670) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1651<br>
      #7  0x00007ffff62a50ba in ?? () from /usr/lib64/libgomp.so.1<br>
      #8  0x00007ffff796de0e in start_thread () from
      /lib64/libpthread.so.0<br>
      #9  0x00007ffff5dc42cd in clone () from /lib64/libc.so.6<br>
      <br>
      <blockquote type="cite">
        <div> On how many MPI ranks is this?<br>
          If I can easily run this, could you mail me the tpr and the
          run settings?<br>
        </div>
      </blockquote>
      mdrun_threads -ntmpi 2 -s in.tpr -v -gpu_id 00<br>
      <br>
      Best,<br>
      Carsten<br>
      <br>
      <blockquote type="cite">
        <div> <br>
          Cheers,<br>
          <br>
          Berk<br>
          <br>
          On 11/08/2013 02:30 PM, Carsten Kutzner wrote:<br>
        </div>
        <blockquote type="cite">
          <div>Hi,<br>
            <br>
            using a just checked-out 4.6 branch compiled with debug
            checks I get<br>
            <br>
            -------------------------------------------------------<br>
            Program mdrun, VERSION 4.6.4-dev-20131107-ba8232e<br>
            Source code file:
            /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,



            line: 609<br>
            <br>
            Fatal error:<br>
            (int)((x[74522][x]=11.764535 - 10.229600)*58.394176) = 89,
            not in 0 - 16*4<br>
            <br>
            For more information and tips for troubleshooting, please
            check the GROMACS<br>
            website at <a href="http://www.gromacs.org/Documentation/Errors" target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>
            -------------------------------------------------------<br>
            <br>
            Carsten<br>
            <br>
            <br>
            On 11/08/2013 02:00 PM, Berk Hess wrote:<br>
          </div>
          <blockquote type="cite">
            <div>On 11/08/2013 01:44 PM, Mark
              Abraham wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr"><br>
                <div class="gmail_extra"><br>
                  <br>
                  <div class="gmail_quote">On Fri, Nov 8, 2013 at 12:58
                    PM, Carsten Kutzner <span dir="ltr">&lt;<a href="mailto:ckutzne@gwdg.de" target="_blank">ckutzne@gwdg.de</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi

                      Mark, hi Berk,<br>
                      <div><br>
                        On Nov 7, 2013, at 6:48 PM, Berk Hess &lt;<a href="mailto:hess@kth.se" target="_blank">hess@kth.se</a>&gt;
                        wrote:<br>
                        <br>
                        &gt; Hi Carsten,<br>
                        &gt;<br>
                        &gt; After how many steps does this happen?<br>
                      </div>
                      this happens immediately at startup.<br>
                      <div><br>
                        &gt; Could you run with a debug build (or
                        without NDEBUG defined)?<br>
                        &gt; I added a lot of checks, not done with
                        NDEBUG, in the fix for the issue you linked.<br>
                      </div>
                      Will do that now.<br>
                      <div><br>
                        &gt; On 11/07/2013 06:27 PM, Mark Abraham wrote:<br>
                        &gt;&gt; Unclear. 6583c94 is one of your
                        commits. Some very recent stuff has been playing
                        with nstlist and rlist (safely, or so we
                        thought.) Can you reproduce with mainstream
                        release-4-6?<br>
                      </div>
                      This is basically mainstream 4-6, since in my
                      commit I only changed the default behavior of<br>
                      appending to no.<br>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>Right. What&#39;s the mainstream parent commit? I
                      was going to release 4.6.4 today - if you&#39;re based
                      off the current tip then maybe we shouldn&#39;t. If
                      you&#39;re based off code a month back then we know
                      the problem, if any, is of longer standing.</div>
                  </div>
                </div>
              </div>
            </blockquote>
            This is 4.6.4-dev which seems to include my fix for the
            previous issue, so this issue is surely present in the
            current 4-6-release branch. It must be due to a somewhat
            exotic condition, since this code is widely used and we
            haven&#39;t had other reports.<br>
            <br>
            I think it should be easy to track this down with all the
            debug checks in the code.<br>
            And if Carsten can send me his system and the conditions to
            reproduce it, I can also help with debugging.<br>
            <br>
            Cheers,<br>
            <br>
            Berk<br>
            <blockquote type="cite">
              <div dir="ltr">
                <div class="gmail_extra">
                  <div class="gmail_quote">
                    <div><br>
                    </div>
                    <div>Mark</div>
                    <div><br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <span><font color="#888888"><br>
                          Carsten<br>
                        </font></span>
                      <div>
                        <div><br>
                          &gt;&gt;<br>
                          &gt;&gt; Mark<br>
                          &gt;&gt;<br>
                          &gt;&gt;<br>
                          &gt;&gt; On Thu, Nov 7, 2013 at 5:18 PM,
                          Carsten Kutzner &lt;<a href="mailto:ckutzne@gwdg.de" target="_blank">ckutzne@gwdg.de</a>&gt;




                          wrote:<br>
                          &gt;&gt; Hi,<br>
                          &gt;&gt;<br>
                          &gt;&gt; we have a 120k atom system that
                          crashes with<br>
                          &gt;&gt;<br>
                          &gt;&gt;
                          ------------------------------------------------------<br>
                          &gt;&gt; Program mdrun_mpi, VERSION
                          4.6.4-dev-20131015-6583c94<br>
                          &gt;&gt; Source code file:
                          /home/c/gromacs/src/mdlib/nbnxn_search.c,
                          line: 685<br>
                          &gt;&gt;<br>
                          &gt;&gt; Software inconsistency error:<br>
                          &gt;&gt; Lost particles while sorting<br>
                          &gt;&gt; For more information and tips for
                          troubleshooting, please check the GROMACS<br>
                          &gt;&gt; website at <a href="http://www.gromacs.org/Documentation/Errors" target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>
                          &gt;&gt;
                          -------------------------------------------------------<br>
                          &gt;&gt;<br>
                          &gt;&gt; if run with &gt;= 2 MPI processes on
                          a GPU and small values for nstlist. On my
                          workstation,<br>
                          &gt;&gt; nstlist = 34 and larger works,
                          whereas nstlist &lt;= 33 leads to the above
                          problem.<br>
                          &gt;&gt;<br>
                          &gt;&gt; Another system (60k atoms) does not
                          produce this problem, so system size seems<br>
                          &gt;&gt; to matter as well.<br>
                          &gt;&gt;<br>
                          &gt;&gt; Looks like an old ghost:<br>
                          &gt;&gt;<br>
                          &gt;&gt; <a href="http://redmine.gromacs.org/issues/1153" target="_blank">http://redmine.gromacs.org/issues/1153</a><br>
                          &gt;&gt;<br>
                          &gt;&gt;<br>
                          &gt;&gt; Should I file a redmine issue?<br>
                          &gt;&gt;<br>
                          &gt;&gt; Carsten<br>
                          &gt;&gt;<br>
                          &gt;&gt;<br>
                          &gt;&gt; --<br>
                          &gt;&gt; gmx-developers mailing list<br>
                          &gt;&gt; <a href="mailto:gmx-developers@gromacs.org" target="_blank">gmx-developers@gromacs.org</a><br>
                          &gt;&gt; <a href="http://lists.gromacs.org/mailman/listinfo/gmx-developers" target="_blank">http://lists.gromacs.org/mailman/listinfo/gmx-developers</a><br>
                          &gt;&gt; Please don&#39;t post (un)subscribe
                          requests to the list. Use the www interface or
                          send it to <a href="mailto:gmx-developers-request@gromacs.org" target="_blank">gmx-developers-request@gromacs.org</a>.<br>
                          &gt;&gt;<br>
                          &gt;&gt;<br>
                          &gt;&gt;<br>
                          &gt;<br>
                          <br>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
              <br>
              <fieldset></fieldset>
              <br>
            </blockquote>
            <br>
            <br>
            <fieldset></fieldset>
            <br>
          </blockquote>
          <br>
          <br>
          <fieldset></fieldset>
          <br>
        </blockquote>
        <br>
        <br>
        <fieldset></fieldset>
        <br>
      </blockquote>
      <br>
      <br>
      <fieldset></fieldset>
      <br>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>