<div>Two comments about the discussion:</div><div><br></div><div>1) I agree that buffered output (kernel buffers, not application buffers) should not affect I/O; if it does, that should be filed as a bug against the OS. Perhaps someone can write a short test program that tries to reproduce this: write to a file from one node, kill the test program on that node, and then immediately write to the same file from another node.</div>
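The proposed test could be sketched roughly as follows. This is only a local sketch under my own assumptions (file names, timings, and the single-machine setup are made up); on a real cluster the two writers would run on different nodes against the shared filesystem.

```shell
#!/bin/sh
# Sketch of the proposed buffering test. Both "nodes" run on one machine
# here; on a cluster, run them on two different nodes against a shared
# (e.g. NFS) file. All names and timings are illustrative.
TESTFILE="${TMPDIR:-/tmp}/buffer_test_$$.dat"
: > "$TESTFILE"

# "Node 1": append numbered lines until killed mid-run
( i=0
  while :; do
    echo "node1 line $i" >> "$TESTFILE"
    i=$((i+1))
  done ) &
WRITER=$!
sleep 1
kill -9 "$WRITER"        # simulate the job being killed on node 1
wait "$WRITER" 2>/dev/null

# "Node 2": immediately writes to the same file
echo "node2 takeover" >> "$TESTFILE"

# If buffered writes were lost or torn, the tail of the file would be
# garbled instead of ending cleanly with the second writer's line
LAST=$(tail -n 1 "$TESTFILE")
echo "$LAST"
rm -f "$TESTFILE"
```

If the last line comes back torn or missing after the kill, that would point at the kind of OS/filesystem buffering problem described above.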
<div><br></div><div>2) We do lock files, but only the log file. The idea is that we only need to guarantee that the set of output files is accessed by one application at a time, and since every run opens its log file, locking the log is sufficient. This seems safe, but if anyone sees a way for the trajectory to be opened without the log file also being opened, please file a bug.</div>
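As a rough illustration of that scheme (not mdrun's actual implementation; flock(1), the file names, and the timings here are stand-ins I chose for the sketch):

```shell
#!/bin/sh
# Sketch of "lock only the log file" using util-linux flock(1).
# The real mdrun locking is internal; names and timings here are made up.
LOG="${TMPDIR:-/tmp}/run_demo.log"
: > "$LOG"

# First "run" takes an exclusive advisory lock on the log and holds it
( flock -n 9 && sleep 3 ) 9>>"$LOG" &
HOLDER=$!
sleep 1

# A second run must fail to lock the log and back off, which protects
# the whole file set, because any run writing the other files would
# also have to open (and lock) the log
if flock -n "$LOG" -c true; then
  SECOND=started
else
  SECOND=refused
fi
echo "second run: $SECOND"
wait "$HOLDER"
rm -f "$LOG"
```

Because every writer also opens the log, refusing to start when the log lock is held guards the trajectory and the rest of the file set as well.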
<div><br></div><div>Roland</div><br><div class="gmail_quote">On Sun, Jun 5, 2011 at 10:13 AM, Mark Abraham <span dir="ltr"><<a href="mailto:Mark.Abraham@anu.edu.au">Mark.Abraham@anu.edu.au</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div text="#000000" bgcolor="#ffffff"><div><div></div><div class="h5">
On 5/06/2011 11:08 PM, Francesco Oteri wrote:
<blockquote type="cite">
Dear Dimitar, <br>
I'm following the debate regarding:<br>
<span style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px"><br>
<div>
<div><font size="1"><br>
</font></div>
</div>
</span>
<blockquote type="cite">
<div class="gmail_quote">
<div class="gmail_quote">The point was not "why" I was getting
the restarts, but the fact itself that I was getting restarts
close together in time, as I stated in my first post. I actually
don't know whether jobs are deleted or suspended; I thought that a
job returned to the queue would basically start from the beginning
when later moved to an empty slot, so I don't understand the
difference from that perspective.<br>
</div>
</div>
</blockquote>
<br>
In your second mail you say:<br>
<br>
<span style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px">
<div>Submitted by:</div>
<div>========================</div>
<div><font size="1">ii=1</font></div>
<div><font size="1">ifmpi="mpirun -np $NSLOTS"</font></div>
<div><font size="1">--------</font></div>
<div><font size="1"> if [ ! -f run${ii}-i.tpr ];then</font></div>
<div>
<div><font size="1"> cp run${ii}.tpr run${ii}-i.tpr </font></div>
<div><font size="1"> tpbconv -s run${ii}-i.tpr -until
200000 -o run${ii}.tpr </font></div>
<div><font size="1"> fi</font></div>
<div><font size="1"><br>
</font></div>
<div><font size="1"> k=`ls md-${ii}*.out | wc -l`</font></div>
<div><font size="1"> outfile="md-${ii}-$k.out"</font></div>
<div><font size="1"> if [[ -f run${ii}.cpt ]]; then</font></div>
<div><font size="1"> </font></div>
<div><font size="1"> <b> $ifmpi `which mdrun` </b>-s
run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0
> $outfile 2>&1 </font></div>
<div><font size="1"><br>
</font></div>
<div><font size="1"> fi</font></div>
</div>
<div>=========================<br>
<br>
<br>
If I understand correctly, you are submitting the SERIAL mdrun.
This means that multiple instances of mdrun are running at the
same time.<br>
Each instance of mdrun is INDEPENDENT of the others. Therefore
each instance (i.e. each CPU) writes its own checkpoint file, and
they are all written at the same time.</div>
</span> </blockquote>
<br></div></div>
Good thought, but Dimitar's stdout excerpts from early in the thread
do indicate the presence of multiple execution threads. Dynamic load
balancing gets turned on, and the DD is 4x2x1 for his 8 processors.
Conventionally, and by default in the installation process, the
MPI-enabled binaries get an "_mpi" suffix, but it isn't enforced -
or enforceable :-)<br>
<br>
Mark<br>
</div>
<br>--<br>
gmx-users mailing list <a href="mailto:gmx-users@gromacs.org">gmx-users@gromacs.org</a><br>
<a href="http://lists.gromacs.org/mailman/listinfo/gmx-users" target="_blank">http://lists.gromacs.org/mailman/listinfo/gmx-users</a><br>
Please search the archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/Search" target="_blank">http://www.gromacs.org/Support/Mailing_Lists/Search</a> before posting!<br>
Please don't post (un)subscribe requests to the list. Use the<br>
www interface or send it to <a href="mailto:gmx-users-request@gromacs.org">gmx-users-request@gromacs.org</a>.<br>
Can't post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists" target="_blank">http://www.gromacs.org/Support/Mailing_Lists</a><br></blockquote></div><br><br clear="all"><br>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>
865-241-1537, ORNL PO BOX 2008 MS6309<br>