<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I think I can upload a fix around
18:00.<br>
<br>
Cheers,<br>
<br>
Berk<br>
<br>
On 11/08/2013 03:37 PM, Mark Abraham wrote:<br>
</div>
<blockquote
cite="mid:CAMNuMAQky9dDuLiMoNZ+_uvHmkAGdwFPxYNY1GYCh0PHDKmgSg@mail.gmail.com"
type="cite">
<div dir="ltr">OK, thanks Berk. Szilard and I each have thought of
some minor documentation things that would be nice to fix also.
<div><br>
</div>
<div>I suggest we try to get all that in so we can shift focus
away from release-4-6, with the 5.0 beta only weeks away now!!
<div>
<br>
</div>
<div>Mark</div>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Fri, Nov 8, 2013 at 3:30 PM, Berk
Hess <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hess@kth.se" target="_blank">hess@kth.se</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I think I found the source of the problem, now I have to
come up with a solution.<br>
It seems you have a harmonic potential between two atoms
at a distance of 2.9 nm. This is much longer than the
pair-list range. So one (non-local) atom ends up beyond
the non-local search grid. I thought I had accounted for
such cases, but apparently not.<br>
We can probably simply round down the index to the
maximum, but that means we can't check for other cases,
such as without DD, where atoms can't be beyond the
grid.<br>
The fix will be very simple and quick, but I need to
think it over a bit more.<br>
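The proposed fix has two parts. For non-local atoms, an out-of-range sort index is rounded to the last valid bin. Where atoms cannot legitimately be off the grid, such as runs without DD, the strict check stays. Below is a minimal Python sketch of that policy under stated assumptions. It is not the GROMACS implementation: the function name sort_bin, the clamp flag and the bin layout (nbins bins starting at h0, each of width 1/invh) are hypothetical and chosen only for illustration.

```python
from typing import Optional


def sort_bin(x: float, h0: float, invh: float, nbins: int, clamp: bool) -> Optional[int]:
    """Hypothetical helper (not GROMACS code): map coordinate x to a sort bin.

    Assumes the grid starts at h0 and has nbins bins of width 1/invh.
    int() truncates toward zero, like the C cast (int) in the error message.
    """
    zi = int((x - h0) * invh)
    if zi in range(nbins):
        return zi
    if not clamp:
        # Strict mode (e.g. no domain decomposition): every atom should be on
        # the grid, so an out-of-range index signals a real inconsistency.
        return None
    # Lenient mode (e.g. a non-local DD zone): the partner of a long bonded
    # interaction may lie past the grid, so round to the nearest valid bin.
    return min(max(zi, 0), nbins - 1)


# Values taken from the fatal error reported in this thread.
print(sort_bin(11.764535, 10.2296, 58.394176, 16 * 4, clamp=False))  # None
print(sort_bin(11.764535, 10.2296, 58.394176, 16 * 4, clamp=True))   # 63
```

The flag keeps the debug check useful: clamping is applied only where off-grid atoms are expected, so genuine errors elsewhere are still caught.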
<br>
Cheers,<br>
<br>
Berk
<div>
<div class="h5"><br>
<br>
On 11/08/2013 02:58 PM, Carsten Kutzner wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<div>Hi Berk,<br>
<br>
On 11/08/2013 02:48 PM, Berk Hess wrote:<br>
</div>
<blockquote type="cite">
<div>Hi,<br>
<br>
I assume this is with GPUs.<br>
If you run it in a debugger and break on exit, can you
tell me which sort_atoms call this comes from?<br>
<br>
</div>
</blockquote>
It comes from sort_columns_supersub:<br>
<br>
<br>
Breakpoint 1, 0x00007ffff5d17920 in exit () from
/lib64/libc.so.6<br>
(gdb) where<br>
#0 0x00007ffff5d17920 in exit () from
/lib64/libc.so.6<br>
#1 0x0000000000a60558 in quit_gmx
(msg=0x7fffee092290 "\n", '-' <repeats 55
times>, "\nProgram mdrun, VERSION
4.6.4-dev-20131107-ba8232e\nSource code file:
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,
l"...) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:284<br>
#2 0x0000000000a61609 in _gmx_error (key=0x185ecde
"fatal", msg=0x7fffee094af0
"(int)((x[74522][x]=11.764535 -
10.229600)*58.394176) = 89, not in 0 - 16*4\n",
file=0x1836ad8
"/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c",
line=609) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:774<br>
#3 0x0000000000a61054 in gmx_fatal (f_errno=0,
file=0x1836ad8
"/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c",
line=609, fmt=0x1836c80 "(int)((x[%d][%c]=%f -
%f)*%f) = %d, not in 0 - %d*%d\n") at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/gmxlib/gmx_fatal.c:509<br>
#4 0x00000000004e91db in sort_atoms (dim=0,
Backwards=0, a=0x7ffff212e920, n=2,
x=0x7ffff148a650, h0=10.2296, invh=58.3941765,
n_per_h=16, sort=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:609<br>
#5 0x00000000004ebcee in sort_columns_supersub
(nbs=0x7ffff057f2a0, dd_zone=1, grid=0x7ffff06d6f60,
a0=61048, a1=74523, atinfo=0x7ffff0c51840,
x=0x7ffff148a650, nbat=0x7ffff05d3bb0, cxy_start=10,
cxy_end=15, sort_work=0x7ffff10bb2b0) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1394<br>
#6 0x00000000004ecf09 in calc_cell_indices.omp_fn.1
(.omp_data_i=0x7ffff516f670) at
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c:1651<br>
#7 0x00007ffff62a50ba in ?? () from
/usr/lib64/libgomp.so.1<br>
#8 0x00007ffff796de0e in start_thread () from
/lib64/libpthread.so.0<br>
#9 0x00007ffff5dc42cd in clone () from
/lib64/libc.so.6<br>
<br>
<blockquote type="cite">
<div> On how many MPI ranks is this?<br>
If I can easily run this, could you mail me the
tpr and the run settings?<br>
</div>
</blockquote>
mdrun_threads -ntmpi 2 -s in.tpr -v -gpu_id 00<br>
<br>
Best,<br>
Carsten<br>
<br>
<blockquote type="cite">
<div> <br>
Cheers,<br>
<br>
Berk<br>
<br>
On 11/08/2013 02:30 PM, Carsten Kutzner wrote:<br>
</div>
<blockquote type="cite">
<div>Hi,<br>
<br>
using a just checked-out 4.6 branch compiled
with debug checks I get<br>
<br>
-------------------------------------------------------<br>
Program mdrun, VERSION
4.6.4-dev-20131107-ba8232e<br>
Source code file:
/home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,
line: 609<br>
<br>
Fatal error:<br>
(int)((x[74522][x]=11.764535 -
10.229600)*58.394176) = 89, not in 0 - 16*4<br>
<br>
For more information and tips for
troubleshooting, please check the GROMACS<br>
website at <a moz-do-not-send="true"
href="http://www.gromacs.org/Documentation/Errors"
target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>
-------------------------------------------------------<br>
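The numbers in the fatal error above can be checked by hand. The short Python sketch below recomputes the index that the check rejected. It also estimates how far atom 74522 lies past the sorted range, assuming that the printed limit 16*4 marks the far edge of that range and that coordinates are in nm. The variable names are illustrative only.

```python
# Numbers copied from the fatal error message above.
x = 11.764535     # x coordinate of atom 74522 (nm, assumed)
h0 = 10.2296      # lower edge of the sorted range (nm, assumed)
invh = 58.394176  # inverse bin width (1/nm, assumed)
limit = 16 * 4    # upper bound printed as "0 - 16*4"

zi = int((x - h0) * invh)   # 1.534935 * 58.394176 = 89.63..., truncated to 89
covered_nm = limit / invh   # distance spanned by 64 bins, about 1.096 nm
beyond_nm = (x - h0) - covered_nm

print(f"zi = {zi}, limit = {limit}")
print(f"bins span {covered_nm:.3f} nm; atom lies {beyond_nm:.3f} nm past them")
```

Under these assumptions the atom sits roughly 0.44 nm beyond the last bin. That fits an atom pulled in by a long-range bonded interaction rather than by normal pair-search geometry.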
<br>
Carsten<br>
<br>
<br>
On 11/08/2013 02:00 PM, Berk Hess wrote:<br>
</div>
<blockquote type="cite">
<div>On 11/08/2013 01:44 PM, Mark Abraham
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Fri, Nov 8,
2013 at 12:58 PM, Carsten Kutzner <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:ckutzne@gwdg.de"
target="_blank">ckutzne@gwdg.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Hi Mark, hi
Berk,<br>
<div><br>
On Nov 7, 2013, at 6:48 PM, Berk
Hess <<a moz-do-not-send="true"
href="mailto:hess@kth.se"
target="_blank">hess@kth.se</a>>
wrote:<br>
<br>
> Hi Carsten,<br>
><br>
> After how many steps does
this happen?<br>
</div>
This happens immediately at startup.<br>
<div><br>
> Could you run with a debug
build (or without NDEBUG defined)?<br>
> I added a lot of checks, not
done with NDEBUG, in the fix for
the issue you linked.<br>
</div>
Will do that now.<br>
<div><br>
> On 11/07/2013 06:27 PM, Mark
Abraham wrote:<br>
>> Unclear. 6583c94 is one
of your commits. Some very recent
stuff has been playing with
nstlist and rlist (safely, or so
we thought.) Can you reproduce
with mainstream release-4-6?<br>
</div>
This is basically mainstream 4-6,
since in my commit I only changed
the default behavior of<br>
appending to no.<br>
</blockquote>
<div><br>
</div>
<div>Right. What's the mainstream
parent commit? I was going to
release 4.6.4 today - if you're
based off the current tip then maybe
we shouldn't. If you're based off
code a month back then we know the
problem, if any, is of longer
standing.</div>
</div>
</div>
</div>
</blockquote>
This is 4.6.4-dev which seems to include my
fix for the previous issue, so this issue is
surely present in the current release-4-6
branch. It must be due to a somewhat exotic
condition, since this code is widely used and
we haven't had other reports.<br>
<br>
I think it should be easy to track this down
with all the debug checks in the code.<br>
And if Carsten can send me his system and the
conditions to reproduce it, I can also help
with debugging.<br>
<br>
Cheers,<br>
<br>
Berk<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Mark</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <span><font
color="#888888"><br>
Carsten<br>
</font></span>
<div>
<div><br>
>><br>
>> Mark<br>
>><br>
>><br>
>> On Thu, Nov 7, 2013 at
5:18 PM, Carsten Kutzner <<a
moz-do-not-send="true"
href="mailto:ckutzne@gwdg.de"
target="_blank">ckutzne@gwdg.de</a>>
wrote:<br>
>> Hi,<br>
>><br>
>> we have a 120k atom
system that crashes with<br>
>><br>
>>
------------------------------------------------------<br>
>> Program mdrun_mpi,
VERSION
4.6.4-dev-20131015-6583c94<br>
>> Source code file:
/home/c/gromacs/src/mdlib/nbnxn_search.c,
line: 685<br>
>><br>
>> Software inconsistency
error:<br>
>> Lost particles while
sorting<br>
>> For more information
and tips for troubleshooting,
please check the GROMACS<br>
>> website at <a
moz-do-not-send="true"
href="http://www.gromacs.org/Documentation/Errors"
target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>
>>
-------------------------------------------------------<br>
>><br>
>> if run with >= 2 MPI
processes on a GPU and small
values for nstlist. On my
workstation,<br>
>> nstlist = 34 and larger
works, whereas nstlist <= 33
leads to the above problem.<br>
>><br>
>> Another system (60k
atoms) does not produce this
problem, so system size seems<br>
>> to matter as well.<br>
>><br>
>> Looks like an old
ghost:<br>
>><br>
>> <a
moz-do-not-send="true"
href="http://redmine.gromacs.org/issues/1153"
target="_blank">http://redmine.gromacs.org/issues/1153</a><br>
>><br>
>><br>
>> Should I file a redmine
issue?<br>
>><br>
>> Carsten<br>
>><br>
>><br>
>> --<br>
>> gmx-developers mailing
list<br>
>> <a
moz-do-not-send="true"
href="mailto:gmx-developers@gromacs.org"
target="_blank">gmx-developers@gromacs.org</a><br>
>> <a
moz-do-not-send="true"
href="http://lists.gromacs.org/mailman/listinfo/gmx-developers"
target="_blank">http://lists.gromacs.org/mailman/listinfo/gmx-developers</a><br>
>> Please don't post
(un)subscribe requests to the
list. Use the www interface or
send it to <a
moz-do-not-send="true"
href="mailto:gmx-developers-request@gromacs.org"
target="_blank">gmx-developers-request@gromacs.org</a>.<br>
>><br>
>><br>
>><br>
><br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
</div>
<br>
<br>
</blockquote>
<br>
</body>
</html>