<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
On 5/06/2011 5:42 PM, Dimitar Pachov wrote:
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite"><br>
<br>
<div class="gmail_quote">On Sun, Jun 5, 2011 at 2:14 AM, Mark
Abraham <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:Mark.Abraham@anu.edu.au">Mark.Abraham@anu.edu.au</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div text="#000000" bgcolor="#ffffff">
<div class="im"> On 5/06/2011 12:31 PM, Dimitar Pachov
wrote:
<blockquote type="cite">As I said, the queue is like this:
you submit the job, it finds an empty node, it goes
there, however seconds later another user with
higher privileges on that particular node submits a job,
his job kicks out my job, mine goes on the queue again,
it finds another empty node, goes there, then another
user with high privileges on that node submits a job,
which consequently kicks out my job again, and the cycle
repeats itself ... theoretically, it could continue
forever, depending on how many and where the empty nodes
are, if any.</blockquote>
</div>
<div class="im"> <br>
</div>
You've said that *now* - but previously you've said nothing
about why you were getting lots of restarts. In my
experience, PBS queues suspend jobs rather than deleting
them, in order that resources are not wasted. Apparently
other places do things this way. I think that this
information is highly relevant to explaining your
observations.
<div class="im"><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>The point was not "why" I was getting the restarts, but the
fact that I was getting restarts close together in time, as I
stated in my first post. I also don't actually know whether
jobs are deleted or suspended. I had thought that a job
returned to the queue would basically start from the
beginning when later moved to an empty slot ... so I don't
understand the difference from that perspective.</div>
</div>
</blockquote>
<br>
It's the difference between a process being killed, and a process
being allowed to survive but temporarily without access to the CPU.
Operating systems routinely share the CPU over multiple execution
threads. Job suspension just adapts that idea.<br>
<br>
Also, different UNIX signals are interpreted differently by the
GROMACS signal handler. It respects hard kills, but it cooperates
with gentler kills by updating the checkpoint file at the next
neighbour-search step, IIRC. Perhaps your PBS is making excessive
use of hard kills - if it used gentler signals, you would still make
some progress even when you only get a minute of CPU time...<br>
<br>
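To illustrate what I mean by "cooperating" with a gentle kill, here is a minimal Python sketch. It is not GROMACS code: the names run, request_stop and safe_every are made up for illustration, and the "safe point" merely stands in for the next neighbour-search step. The handler only records the request, and the main loop saves its state at the next safe point. A hard kill (SIGKILL) cannot be caught at all, so no handler like this ever gets to run. This assumes a UNIX-like system.<br>
<pre>
```python
import os
import signal

stop_requested = False


def request_stop(signum, frame):
    """Record the request; the main loop decides when it is safe to stop."""
    global stop_requested
    stop_requested = True


def run(n_steps, safe_every, on_step=None):
    """Run up to n_steps and return the step of the final checkpoint (0 if none).

    A stop request is honoured only at multiples of safe_every, which is a
    hypothetical stand-in for "the next neighbour-search step".
    """
    global stop_requested
    stop_requested = False
    signal.signal(signal.SIGTERM, request_stop)
    checkpoint_step = 0
    for step in range(1, n_steps + 1):
        if on_step is not None:
            on_step(step)  # placeholder for the real work of one step
        if stop_requested and step % safe_every == 0:
            checkpoint_step = step  # a real program would write its state here
            break
    return checkpoint_step


def send_term_at_step_3(step):
    """Demo helper: deliver a gentle kill to ourselves during step 3."""
    if step == 3:
        os.kill(os.getpid(), signal.SIGTERM)
```
</pre>
Calling run(100, 10, send_term_at_step_3) stops at the first safe point after the signal arrives, not at step 3 itself.<br>
<br>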
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div text="#000000" bgcolor="#ffffff">
<div class="im"> <br>
<blockquote type="cite">
<div class="gmail_quote">
<div>These many restarts suggest that the queue was
full of relatively short jobs run by users with
high privileges. Technically, I cannot see why the
same processes should be running simultaneously
because at any instant my job runs only on one node,
or it stays in the queuing list. <br>
</div>
</div>
</blockquote>
<br>
</div>
I/O can be buffered such that the termination of the process
and the completion of its I/O are asynchronous. Perhaps it
*shouldn't* be that way, but this is a problem for the
administrators of your cluster to address. They know how the
file system works. If the next job executes before the old
one has finished output, then I think the symptoms you
observe might be possible.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Yes, this is true, and I believe the timing of when the
buffer is fully flushed is crucial to a possible
explanation of the observed behavior. However, this bottleneck
has been known for a long time, so I expected people had
thought about it before confidently making -append the
default. That's all.</div>
</div>
</blockquote>
<br>
Judging by the frequency of people reporting problems, most people
don't encounter the kind of "file system latency leading to race
condition" problem I think that you're seeing. Some might see it
and just work around it, as you say. Others may simply not have
the combination of file system and compute resource management that
you have to work with.<br>
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div text="#000000" bgcolor="#ffffff"> <br>
Note that there is nothing GROMACS can do about that, unless
somehow GROMACS can apply a lock in the first mdrun that is
respected by your file system such that a subsequent mdrun
cannot open the same file until all pending I/O has
completed. I'd expect proper HPC file systems do that
automatically, but I don't really know.
<div>
<div class="h5"><br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I am not an expert, nor do I know the GROMACS code, but
could one have an option to specify a period after
its initial start during which GROMACS is prohibited from
writing any files, i.e. some kind of suspension and/or waiting
period? <br>
</div>
</div>
</blockquote>
<br>
One could delay some/all output initialization until the first
write, but it would probably make the code rather messier. GROMACS
does check that the state of the output files makes sense, by
computing checksums and comparing them with those stored in the
checkpoint file. One
has to draw a line somewhere. If the contents of those files might
be changed by another process, then efficient MD is simply
impossible. Also, people would complain that they spent 15
minutes on their 1024-processor simulation before it died when the
lack of write permission for the checkpoint filename got noticed.
Perhaps not that exact scenario, but something similar could arise.<br>
<br>
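The idea behind that checksum test can be sketched in a few lines of Python. This is only an illustration of the principle, not the GROMACS implementation: the choice of SHA-256, the file name and the function names are all my assumptions. A digest is recorded when the checkpoint is written, and appending is refused if the file no longer matches it.<br>
<pre>
```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def safe_to_append(path, recorded_digest):
    """True only if the file still matches what was recorded at checkpoint time."""
    return file_digest(path) == recorded_digest


# Demo: a late write from an "old" process invalidates the recorded digest.
with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "output.log")  # hypothetical output file
    with open(out, "wb") as f:
        f.write(b"frame 1\n")
    recorded = file_digest(out)  # what a checkpoint would store
    unchanged = safe_to_append(out, recorded)
    with open(out, "ab") as f:
        f.write(b"late write from the old process\n")
    changed = safe_to_append(out, recorded)
```
</pre>
Note that such a check can only detect a mismatch at the moment it runs. It cannot see output that is still sitting in a buffer somewhere and lands afterwards.<br>
<br>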
You can emulate this yourself by calling "sleep 10s" before mdrun
and seeing whether that's long enough to solve the latency issue in your
case.<br>
<br>
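If a fixed sleep feels too crude, a variation is to wait until the output files have stopped changing size before starting mdrun. Here is a rough Python sketch. The function name and all the default timings are my own invention, and it assumes that the sizes reported on the new node are up to date, which a caching network file system may not guarantee.<br>
<pre>
```python
import os
import time


def wait_until_quiet(paths, quiet_seconds=10.0, poll_seconds=1.0, timeout=300.0):
    """Block until none of the files has changed size for quiet_seconds.

    Returns True if the files went quiet, False if the timeout elapsed first.
    Missing files are treated as having size -1.
    """
    def sizes():
        return [os.path.getsize(p) if os.path.exists(p) else -1 for p in paths]

    deadline = time.monotonic() + timeout
    last = sizes()
    quiet_since = time.monotonic()
    while deadline > time.monotonic():
        if time.monotonic() - quiet_since >= quiet_seconds:
            return True
        time.sleep(poll_seconds)
        current = sizes()
        if current != last:
            # Something was written; restart the quiet period.
            last = current
            quiet_since = time.monotonic()
    return False
```
</pre>
A job script would call this on the files from the previous run and only launch mdrun once it returns True.<br>
<br>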
It seems to me that this kind of file locking ought to be the
responsibility of the file system. Allowing a new process to access
a file when there's buffered output pending seems wrong. It just
asks for this kind of race condition to arise. (Assuming my theory
is sound...)<br>
<br>
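For what it's worth, this is the sort of locking I have in mind, sketched in Python with an advisory lock from the standard fcntl module (UNIX only). The helper name try_lock is made up. Whether such a lock is honoured between different nodes depends on the file system and how it is mounted, so treat that part as an assumption to check with your administrators.<br>
<pre>
```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking advisory lock on path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Demo: the second attempt fails while the first holder keeps the file open.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "md.log")  # hypothetical output file
first = try_lock(demo_path)
second = try_lock(demo_path)
first.close()  # closing the file releases the lock
third = try_lock(demo_path)
third.close()
```
</pre>
Being advisory, this only helps if every writer takes the lock, and it still says nothing about buffered output that has not yet reached the disk.<br>
<br>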
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div>I am also wondering about the checkpoint timing - the
default is 15 min, but what would be the minimum? Since I have
not tested it, what would happen if I specify 0.001 min, for
example?</div>
</div>
</blockquote>
<br>
I/O takes time, and checkpointing requires global communication to
prepare for it. Doing it more often than one needs to do it is
wasteful. Your situation sounds so volatile that checkpointing every
30s is probably sound. On a BlueGene, about the only reason to
checkpoint is a power outage. One size can't fit all.<br>
<br>
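The trade-off is easy to put rough numbers on. The little Python model below is only a back-of-envelope sketch: it assumes a hard kill, a fixed cost per checkpoint (the 2 s in the example is invented), and that everything since the last completed checkpoint is lost.<br>
<pre>
```python
def retained_progress(slot_seconds, interval_seconds, checkpoint_cost_seconds):
    """Seconds of work saved to disk if the job is hard-killed after
    slot_seconds, checkpointing after every interval_seconds of work.

    Work done after the last completed checkpoint is lost.
    """
    cycle = interval_seconds + checkpoint_cost_seconds
    completed_checkpoints = int(slot_seconds // cycle)
    return completed_checkpoints * interval_seconds


def overhead_fraction(interval_seconds, checkpoint_cost_seconds):
    """Fraction of wall time spent writing checkpoints in an uninterrupted run."""
    return checkpoint_cost_seconds / (interval_seconds + checkpoint_cost_seconds)


# One minute on a node, hypothetical 2 s per checkpoint:
every_30s = retained_progress(60, 30, 2)     # one checkpoint completes
every_15min = retained_progress(60, 900, 2)  # no checkpoint completes
```
</pre>
With these assumed numbers, a 30 s interval keeps 30 s of each one-minute slot at the price of about 6% overhead, while the 15 minute default keeps nothing from such a slot but costs well under 1% on a machine where jobs are never interrupted.<br>
<br>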
<blockquote
cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div text="#000000" bgcolor="#ffffff">
<div>
<div class="h5"><br>
</div>
</div>
Words are open to interpretation. Communicating well
requires that you consider the impact of your words on your
reader. You want people who can address the problem to want
to help. You don't want them to feel defensive about the
situation - whether you think that would be an over-reaction
or not.
<div class="im"><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I got your point(s). However, I respectfully disagree with
some of them. First, I believe the information one's sentences
convey is much more important than exactly how
they are written.</div>
</div>
</blockquote>
<br>
The content is very important. Terse and informative is often much
better than waffling vagueness. However, given a range of
presentations with the same content, why not choose a presentation
that improves the chance of achieving the objective?<br>
<br>
Mark<br>
</body>
</html>