<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hi Berk,<br>
      <br>
      On 11/08/2013 02:48 PM, Berk Hess wrote:<br>
    </div>
    <blockquote cite="mid:527CEBB4.9080106@kth.se" type="cite">
      <div class="moz-cite-prefix">Hi,<br>
        <br>
        I assume this is with GPUs.<br>
        If you run in a debugger and break on exit, can you tell me
        which sort_atoms call this comes from?<br>
        <br>
      </div>
    </blockquote>
    It comes from sort_columns_supersub:<br>
    <br>
    <br>
    Breakpoint 1, 0x00007ffff5d17920 in exit () from /lib64/libc.so.6<br>
    (gdb) where<br>
    #0&nbsp; 0x00007ffff5d17920 in exit () from /lib64/libc.so.6<br>
    #1&nbsp; 0x0000000000a60558 in quit_gmx (msg=0x7fffee092290 "\n", '-'
    &lt;repeats 55 times&gt;, "\nProgram mdrun, VERSION
    4.6.4-dev-20131107-ba8232e\nSource code file:
    /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,
    l"...) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:284<br>
    #2&nbsp; 0x0000000000a61609 in _gmx_error (key=0x185ecde "fatal",
    msg=0x7fffee094af0 "(int)((x[74522][x]=11.764535 -
    10.229600)*58.394176) = 89, not in 0 - 16*4\n", file=0x1836ad8
    "/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c",
    line=609) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:774<br>
    #3&nbsp; 0x0000000000a61054 in gmx_fatal (f_errno=0, file=0x1836ad8
    "/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c",
    line=609, fmt=0x1836c80 "(int)((x[%d][%c]=%f - %f)*%f) = %d, not in
    0 - %d*%d\n") at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:509<br>
    #4&nbsp; 0x00000000004e91db in sort_atoms (dim=0, Backwards=0,
    a=0x7ffff212e920, n=2, x=0x7ffff148a650, h0=10.2296,
    invh=58.3941765, n_per_h=16, sort=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:609<br>
    #5&nbsp; 0x00000000004ebcee in sort_columns_supersub (nbs=0x7ffff057f2a0,
    dd_zone=1, grid=0x7ffff06d6f60, a0=61048, a1=74523,
    atinfo=0x7ffff0c51840, x=0x7ffff148a650, nbat=0x7ffff05d3bb0,
    cxy_start=10, cxy_end=15, sort_work=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1394<br>
    #6&nbsp; 0x00000000004ecf09 in calc_cell_indices.omp_fn.1
    (.omp_data_i=0x7ffff516f670) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1651<br>
    #7&nbsp; 0x00007ffff62a50ba in ?? () from /usr/lib64/libgomp.so.1<br>
    #8&nbsp; 0x00007ffff796de0e in start_thread () from
    /lib64/libpthread.so.0<br>
    #9&nbsp; 0x00007ffff5dc42cd in clone () from /lib64/libc.so.6<br>
    <br>
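    A minimal sketch of the arithmetic behind the failed check, written
    in Python rather than the C of nbnxn_search.c and using only the
    numbers printed in the fatal error. The names sort_bin, in_range and
    oversize are hypothetical, and reading "0 - 16*4" as an inclusive
    range from 0 to 64 is an assumption taken from the message text, not
    from the source:<br>
    <pre>
```python
# Values copied from the fatal error message and the sort_atoms frame.
x_coord = 11.764535   # x[74522][x]
h0 = 10.229600        # offset subtracted from the coordinate
invh = 58.394176      # scaling factor applied to the difference
n_per_h = 16          # the "16" in "16*4"
oversize = 4          # the "4" in "16*4"; its meaning is an assumption


def sort_bin(coord, offset, inv_width):
    """Scale the shifted coordinate and truncate, like a C (int) cast."""
    return int((coord - offset) * inv_width)


def in_range(zi, per_h, factor):
    """Assumed check: zi must lie between 0 and per_h*factor inclusive."""
    return zi >= 0 and per_h * factor >= zi


zi = sort_bin(x_coord, h0, invh)
print(zi, in_range(zi, n_per_h, oversize))
```
    </pre>
    This prints 89 False: the atom sits about 1.53 above h0, whereas 64
    bins of width 1/58.394176 cover only about 1.10, so under these
    assumptions the coordinate lies outside the range this sort_atoms
    call expects.<br>
    <br>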
    <blockquote cite="mid:527CEBB4.9080106@kth.se" type="cite">
      <div class="moz-cite-prefix"> On how many MPI ranks is this?<br>
        If I can easily run this, could you mail me the tpr and the run
        settings?<br>
      </div>
    </blockquote>
    mdrun_threads -ntmpi 2 -s in.tpr -v -gpu_id 00<br>
    <br>
    Best,<br>
    Carsten<br>
    <br>
    <blockquote cite="mid:527CEBB4.9080106@kth.se" type="cite">
      <div class="moz-cite-prefix"> <br>
        Cheers,<br>
        <br>
        Berk<br>
        <br>
        On 11/08/2013 02:30 PM, Carsten Kutzner wrote:<br>
      </div>
      <blockquote cite="mid:527CE75D.1050509@gwdg.de" type="cite">
        <div class="moz-cite-prefix">Hi,<br>
          <br>
          Using a freshly checked-out 4.6 branch compiled with debug
          checks, I get:<br>
          <br>
          -------------------------------------------------------<br>
          Program mdrun, VERSION 4.6.4-dev-20131107-ba8232e<br>
          Source code file:
          /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,
          line: 609<br>
          <br>
          Fatal error:<br>
          (int)((x[74522][x]=11.764535 - 10.229600)*58.394176) = 89, not
          in 0 - 16*4<br>
          <br>
          For more information and tips for troubleshooting, please
          check the GROMACS<br>
          website at <a moz-do-not-send="true"
            class="moz-txt-link-freetext"
            href="http://www.gromacs.org/Documentation/Errors">http://www.gromacs.org/Documentation/Errors</a><br>
          -------------------------------------------------------<br>
          <br>
          Carsten<br>
          <br>
          <br>
          On 11/08/2013 02:00 PM, Berk Hess wrote:<br>
        </div>
        <blockquote cite="mid:527CE062.3020200@kth.se" type="cite">
          <div class="moz-cite-prefix">On 11/08/2013 01:44 PM, Mark
            Abraham wrote:<br>
          </div>
          <blockquote
cite="mid:CAMNuMAS9x3=btqto7URjCjB6tMdTwtHV6Ye7eCZ21wJmWpb1Ag@mail.gmail.com"
            type="cite">
            <div dir="ltr"><br>
              <div class="gmail_extra"><br>
                <br>
                <div class="gmail_quote">On Fri, Nov 8, 2013 at 12:58
                  PM, Carsten Kutzner <span dir="ltr">&lt;<a
                      moz-do-not-send="true"
                      href="mailto:ckutzne@gwdg.de" target="_blank">ckutzne@gwdg.de</a>&gt;</span>
                  wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
                    Mark, hi Berk,<br>
                    <div class="im"><br>
                      On Nov 7, 2013, at 6:48 PM, Berk Hess &lt;<a
                        moz-do-not-send="true" href="mailto:hess@kth.se">hess@kth.se</a>&gt;
                      wrote:<br>
                      <br>
                      &gt; Hi Carsten,<br>
                      &gt;<br>
                      &gt; After how many steps does this happen?<br>
                    </div>
                    This happens immediately at startup.<br>
                    <div class="im"><br>
                      &gt; Could you run with a debug build (or without
                      NDEBUG defined)?<br>
                      &gt; I added a lot of checks, not done with
                      NDEBUG, in the fix for the issue you linked.<br>
                    </div>
                    Will do that now.<br>
                    <div class="im"><br>
                      &gt; On 11/07/2013 06:27 PM, Mark Abraham wrote:<br>
                      &gt;&gt; Unclear. 6583c94 is one of your commits.
                      Some very recent stuff has been playing with
                      nstlist and rlist (safely, or so we thought.) Can
                      you reproduce with mainstream release-4-6?<br>
                    </div>
                    This is basically mainstream 4-6, since my commit
                    only changed the default appending behavior to "no".<br>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>Right. What's the mainstream parent commit? I was
                    going to release 4.6.4 today - if you're based off
                    the current tip then maybe we shouldn't. If you're
                    based off code a month back then we know the
                    problem, if any, is of longer standing.</div>
                </div>
              </div>
            </div>
          </blockquote>
          This is 4.6.4-dev, which seems to include my fix for the
          previous issue, so this issue is surely present in the current
          4-6-release branch. It must be due to a somewhat exotic
          condition, since this code is widely used and we haven't had
          other reports.<br>
          <br>
          I think it should be easy to track this down with all the
          debug checks in the code.<br>
          And if Carsten can send me his system and the conditions to
          reproduce it, I can also help with debugging.<br>
          <br>
          Cheers,<br>
          <br>
          Berk<br>
          <blockquote
cite="mid:CAMNuMAS9x3=btqto7URjCjB6tMdTwtHV6Ye7eCZ21wJmWpb1Ag@mail.gmail.com"
            type="cite">
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div><br>
                  </div>
                  <div>Mark</div>
                  <div><br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span
                      class="HOEnZb"><font color="#888888"><br>
                        Carsten<br>
                      </font></span>
                    <div class="HOEnZb">
                      <div class="h5"><br>
                        &gt;&gt;<br>
                        &gt;&gt; Mark<br>
                        &gt;&gt;<br>
                        &gt;&gt;<br>
                        &gt;&gt; On Thu, Nov 7, 2013 at 5:18 PM, Carsten
                        Kutzner &lt;<a moz-do-not-send="true"
                          href="mailto:ckutzne@gwdg.de">ckutzne@gwdg.de</a>&gt;
                        wrote:<br>
                        &gt;&gt; Hi,<br>
                        &gt;&gt;<br>
                        &gt;&gt; we have a 120k atom system that crashes
                        with<br>
                        &gt;&gt;<br>
                        &gt;&gt;
                        ------------------------------------------------------<br>
                        &gt;&gt; Program mdrun_mpi, VERSION
                        4.6.4-dev-20131015-6583c94<br>
                        &gt;&gt; Source code file:
                        /home/c/gromacs/src/mdlib/nbnxn_search.c, line:
                        685<br>
                        &gt;&gt;<br>
                        &gt;&gt; Software inconsistency error:<br>
                        &gt;&gt; Lost particles while sorting<br>
                        &gt;&gt; For more information and tips for
                        troubleshooting, please check the GROMACS<br>
                        &gt;&gt; website at <a moz-do-not-send="true"
                          href="http://www.gromacs.org/Documentation/Errors"
                          target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>
                        &gt;&gt;
                        -------------------------------------------------------<br>
                        &gt;&gt;<br>
                        &gt;&gt; if run with &gt;= 2 MPI processes on a
                        GPU and small values for nstlist. On my
                        workstation,<br>
                        &gt;&gt; nstlist = 34 and larger works, whereas
                        nstlist &lt;= 33 leads to the above problem.<br>
                        &gt;&gt;<br>
                        &gt;&gt; Another system (60k atoms) does not
                        produce this problem, so system size seems<br>
                        &gt;&gt; to matter as well.<br>
                        &gt;&gt;<br>
                        &gt;&gt; Looks like an old ghost:<br>
                        &gt;&gt;<br>
                        &gt;&gt; <a moz-do-not-send="true"
                          href="http://redmine.gromacs.org/issues/1153"
                          target="_blank">http://redmine.gromacs.org/issues/1153</a><br>
                        &gt;&gt;<br>
                        &gt;&gt;<br>
                        &gt;&gt; Should I file a redmine issue?<br>
                        &gt;&gt;<br>
                        &gt;&gt; Carsten<br>
                        &gt;&gt;<br>
                        &gt;&gt;<br>
                        &gt;&gt; --<br>
                        &gt;&gt; gmx-developers mailing list<br>
                        &gt;&gt; <a moz-do-not-send="true"
                          href="mailto:gmx-developers@gromacs.org">gmx-developers@gromacs.org</a><br>
                        &gt;&gt; <a moz-do-not-send="true"
                          href="http://lists.gromacs.org/mailman/listinfo/gmx-developers"
                          target="_blank">http://lists.gromacs.org/mailman/listinfo/gmx-developers</a><br>
                        &gt;&gt; Please don't post (un)subscribe
                        requests to the list. Use the www interface or
                        send it to <a moz-do-not-send="true"
                          href="mailto:gmx-developers-request@gromacs.org">gmx-developers-request@gromacs.org</a>.<br>
                        &gt;&gt;<br>
                        &gt;&gt;<br>
                        &gt;&gt;<br>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br>
              </div>
            </div>
            <br>
            <br>
          </blockquote>
          <br>
          <br>
          <br>
        </blockquote>
        <br>
        <br>
        <br>
      </blockquote>
      <br>
      <br>
      <br>
    </blockquote>
    <br>
  </body>
</html>