<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
On 5/06/2011 11:08 PM, Francesco Oteri wrote:
<blockquote cite="mid:4DEB7FDD.8040302@gmail.com" type="cite">
Dear Dimitar, <br>
I'm following the debate regarding:<br>
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div class="gmail_quote">The point was not "why" I was getting
the restarts, but the fact itself that I was getting
restarts close in time, as I stated in my first post. I
actually also don't know whether jobs are deleted or
suspended. I've thought that a job returned back to the
queue will basically start from the beginning when later
moved to an empty slot ... so don't understand the
difference from that perspective.<br>
</div>
</div>
</blockquote>
<br>
In your second mail you say:<br>
<br>
<span style="border-collapse: collapse; font-family:
arial,sans-serif; font-size: 13px;">
<div>Submitted by:</div>
<div>========================</div>
<div><font size="1">ii=1</font></div>
<div><font size="1">ifmpi="mpirun -np $NSLOTS"</font></div>
<div><font size="1">--------</font></div>
<div><font size="1"> if [ ! -f run${ii}-i.tpr ];then</font></div>
<div>
<div><font size="1"> cp run${ii}.tpr run${ii}-i.tpr </font></div>
<div><font size="1"> tpbconv -s run${ii}-i.tpr -until
200000 -o run${ii}.tpr </font></div>
<div><font size="1"> fi</font></div>
<div><font size="1"><br>
</font></div>
<div><font size="1"> k=`ls md-${ii}*.out | wc -l`</font></div>
<div><font size="1"> outfile="md-${ii}-$k.out"</font></div>
<div><font size="1"> if [[ -f run${ii}.cpt ]]; then</font></div>
<div><font size="1"> </font></div>
<div><font size="1"> <b> $ifmpi `which mdrun` </b>-s
run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0
&gt; $outfile 2&gt;&amp;1 </font></div>
<div><font size="1"><br>
</font></div>
<div><font size="1"> fi</font></div>
</div>
<div>=========================<br>
<br>
<br>
If I understand correctly, you are submitting the SERIAL mdrun.
This means that multiple instances of mdrun are running at the
same time.<br>
Each instance of mdrun is an INDEPENDENT instance. Therefore
checkpoint files, one for each instance (i.e. one for each
CPU), are written at the same time.</div>
</span> </blockquote>
<br>
Good thought, but Dimitar's stdout excerpts from early in the thread
do indicate the presence of multiple execution threads in a single
run: dynamic load balancing gets turned on, and the domain
decomposition (DD) is 4x2x1 for his 8 processors.
Conventionally, and by default in the installation process, the
MPI-enabled binaries get an "_mpi" suffix, but it isn't enforced -
or enforceable :-) So a binary called plain "mdrun" can still be
MPI-enabled.<br>
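<br>
As a rough sketch of the arithmetic behind that: with -npme 0 there are no separate PME processes, so the cells of the DD grid should multiply out to the number of processes in the run. The helper names dd_ranks and dd_matches_slots below are made up for illustration and are not part of GROMACS; the grid string and slot count are simply the values quoted in this thread.<br>

```shell
#!/bin/bash
# Hypothetical helper: multiply the cells of a domain decomposition grid
# written as "AxBxC" (for example the 4x2x1 mentioned above).
dd_ranks() {
    local grid=$1 product=1 n
    local IFS=x            # split the grid string on the letter x
    for n in $grid; do
        product=$((product * n))
    done
    echo "$product"
}

# Hypothetical check: does the grid account for every slot requested?
# Returns 0 when the product of the grid equals the slot count.
dd_matches_slots() {
    [ "$(dd_ranks "$1")" -eq "$2" ]
}

# Values taken from this thread: a 4x2x1 grid on 8 processors.
if dd_matches_slots 4x2x1 8; then
    echo "4x2x1 covers 8 processes: one parallel run, not 8 serial ones"
fi
```

Eight independent serial runs would not report a domain decomposition at all, which is why a grid that accounts for all 8 slots points to a single parallel job.<br>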
<br>
Mark<br>
</body>
</html>