<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

    On 5/06/2011 5:42 PM, Dimitar Pachov wrote:

    <blockquote

      cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"

      type="cite"><br>

      <br>

      <div class="gmail_quote">On Sun, Jun 5, 2011 at 2:14 AM, Mark

        Abraham <span dir="ltr">&lt;<a moz-do-not-send="true"

            href="mailto:Mark.Abraham@anu.edu.au">Mark.Abraham@anu.edu.au</a>&gt;</span>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          <div text="#000000" bgcolor="#ffffff">

            <div class="im"> On 5/06/2011 12:31 PM, Dimitar Pachov

              wrote:

              <blockquote type="cite">As I said, the queue works like this:
                you submit the job, it finds an empty node and goes there;
                however, seconds later another user with higher privileges
                on that particular node submits a job, his job kicks out
                mine, mine goes back on the queue, finds another empty node
                and goes there, then another user with high privileges on
                that node submits a job, which again kicks out my job, and
                the cycle repeats itself ... theoretically, it could
                continue forever, depending on how many empty nodes there
                are (if any) and where they are.</blockquote>

            </div>

            <div class="im"> <br>

            </div>

            You've said that *now* - but previously you said nothing
            about why you were getting lots of restarts. In my
            experience, PBS queues suspend jobs rather than deleting
            them, so that resources are not wasted. Apparently other
            places do things the way you describe. I think that this
            information is highly relevant to explaining your
            observations.

            <div class="im"><br>

            </div>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>The point was not "why" I was getting the restarts, but the
          fact itself that I was getting restarts close together in
          time, as I stated in my first post. I also don't actually know
          whether jobs are deleted or suspended. I thought that a job
          returned to the queue would basically start from the
          beginning when later moved to an empty slot ... so I don't
          understand the difference from that perspective.</div>

      </div>

    </blockquote>

    <br>

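    It's the difference between a process being killed and a process
    being allowed to survive, but temporarily without access to the CPU.
    A suspended job resumes exactly where it stopped, so no work is lost;
    a killed job can only restart from its last checkpoint. Operating
    systems routinely share the CPU among multiple execution threads.
    Job suspension just adapts that idea.<br>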
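    As a minimal illustration of that difference, assuming a POSIX system
    and Python 3, the sketch below uses a short Python child process as a
    hypothetical stand-in for a queued job. SIGSTOP and SIGCONT suspend and
    resume it, after which it finishes normally, whereas SIGKILL ends it on
    the spot. A real queue system wraps this in its own suspend/resume
    machinery, which is not shown here.<br>
    <pre>
```python
import signal
import subprocess
import sys
import time

def run_child(signals_to_send):
    """Start a toy 'job' (a Python process that sleeps 1 s), send it the
    given signals 0.1 s apart, and return the exit status Popen reports."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
    time.sleep(0.1)  # give the child a moment to start
    for sig in signals_to_send:
        child.send_signal(sig)
        time.sleep(0.1)
    return child.wait()

# Suspended, then resumed: the job keeps its state and exits normally.
suspended = run_child([signal.SIGSTOP, signal.SIGCONT])
# Hard-killed: the job is gone; Popen reports the negated signal number.
killed = run_child([signal.SIGKILL])
print(suspended, killed)
```
    </pre>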

    <br>

    Also, the GROMACS signal handler treats different UNIX signals
    differently. A hard kill (SIGKILL) cannot be caught, so mdrun simply
    dies, but it cooperates with gentler kills (e.g. SIGTERM) by
    updating the checkpoint file at the next neighbour-search step,
    IIRC. Perhaps your PBS is making excessive use of hard kills - if it
    didn't, you would still make some progress even when you only get a
    minute of CPU time...<br>
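    The cooperative pattern can be sketched as a toy Python loop. This is
    not GROMACS code: the names run_md, on_term and nstlist are
    hypothetical, and "writing a checkpoint" is only recorded in a list.
    The handler for the gentle kill merely sets a flag, and the loop acts
    on it at the next step that is a multiple of the neighbour-search
    interval. SIGKILL cannot be handled this way at all, which is why a
    hard kill loses everything since the last checkpoint.<br>
    <pre>
```python
import signal

stop_requested = False  # set asynchronously by the signal handler

def on_term(signum, frame):
    # Gentle kill: only record the request; the MD loop decides when to stop.
    global stop_requested
    stop_requested = True

def run_md(n_steps, nstlist, signal_at_step):
    """Toy MD loop (hypothetical). Returns (step stopped at, checkpoint steps)."""
    global stop_requested
    stop_requested = False
    previous = signal.signal(signal.SIGTERM, on_term)
    checkpoints = []
    try:
        for step in range(n_steps):
            if step == signal_at_step:
                # Stand-in for the queue system sending SIGTERM to the job.
                signal.raise_signal(signal.SIGTERM)
            # ... integrate one step here ...
            if step % nstlist == 0 and stop_requested:
                checkpoints.append(step)  # stand-in for writing the checkpoint
                return step, checkpoints
        return n_steps, checkpoints
    finally:
        signal.signal(signal.SIGTERM, previous)  # restore the old handler

print(run_md(100, 10, 23))
```
    </pre>
    With a request arriving at step 23 and a neighbour-search interval of
    10, the toy loop stops and "checkpoints" at step 30 rather than dying
    mid-step.<br>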

    <br>

    <blockquote

      cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          <div text="#000000" bgcolor="#ffffff">

            <div class="im"> <br>

              <blockquote type="cite">

                <div class="gmail_quote">

                  <div>So many restarts suggest that the queue was
                    full of relatively short jobs run by users with
                    high privileges. Technically, I cannot see why the
                    same processes should be running simultaneously,
                    because at any instant my job runs on only one node
                    or stays in the queue. <br>

                  </div>

                </div>

              </blockquote>

              <br>

            </div>

            I/O can be buffered such that the termination of the process

            and the completion of its I/O are asynchronous. Perhaps it

            *shouldn't* be that way, but this is a problem for the

            administrators of your cluster to address. They know how the

            file system works. If the next job executes before the old

            one has finished output, then I think the symptoms you

            observe might be possible.<br>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>Yes, this is true, and I believe the timing of when the
          buffer is fully flushed is crucial to any possible
          explanation of the observed behavior. However, this bottleneck
          has been known for a long time, so I expected people would
          have thought about it before confidently making -append the
          default. That's all.</div>

      </div>

    </blockquote>

    <br>

    Judging by the frequency of problem reports, most people don't
    encounter the kind of "file system latency leading to race
    condition" problem that I think you're seeing. Some might see it
    and just work around it, as you say. Others simply don't have the
    combination of file system and compute resource management that
    you have to work with.<br>


    <blockquote

      cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          <div text="#000000" bgcolor="#ffffff"> <br>

            Note that there is nothing GROMACS can do about that, unless

            somehow GROMACS can apply a lock in the first mdrun that is

            respected by your file system such that a subsequent mdrun

            cannot open the same file until all pending I/O has

            completed. I'd expect proper HPC file systems do that

            automatically, but I don't really know.

            <div>

              <div class="h5"><br>

              </div>

            </div>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>I am not an expert, nor do I know the Gromacs code, but
          could there be an option to specify a period after start-up
          during which Gromacs is prohibited from writing any output
          files, i.e. some kind of suspension and/or waiting period? <br>

        </div>

      </div>

    </blockquote>

    <br>

    One could delay some or all output initialization until the first
    write, but it would probably make the code rather messier. GROMACS
    does check that the state of the output files makes sense, by
    computing checksums and comparing them with those stored in the
    checkpoint file. One has to draw a line somewhere: if the contents
    of those files might be changed by another process, then efficient
    MD is simply impossible. Also, there would be people complaining
    that they spent 15 minutes on their 1024-processor simulation before
    it died because the lack of write permission for the checkpoint file
    was only noticed then. Perhaps not that exact scenario, but similar
    ones could arise.<br>

    <br>

    You can emulate this yourself by calling "sleep 10s" before mdrun
    and seeing whether that is long enough to solve the latency issue in
    your case.<br>
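    If the job script is driven from Python rather than a shell, the same
    experiment could look like the minimal sketch below. The function name
    delayed_start and the 10 s default are illustrative assumptions, and
    whether any fixed delay is long enough depends entirely on the file
    system in question.<br>
    <pre>
```python
import subprocess
import time

def delayed_start(cmd, delay_s=10.0):
    """Wait delay_s seconds, then run cmd (a list of arguments) and return
    its exit status. Mirrors `sleep 10s` followed by mdrun in a shell script."""
    time.sleep(delay_s)  # may give pending output from a killed predecessor time to land
    return subprocess.run(cmd).returncode

# Illustrative use in a job script (arguments are an example only):
#   status = delayed_start(["mdrun", "-cpi", "state.cpt", "-append"], delay_s=10)
```
    </pre>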

    <br>

    It seems to me that this kind of file locking ought to be the
    responsibility of the file system. Allowing a new process to access
    a file while buffered output to it is still pending seems wrong. It
    just invites this kind of race condition. (Assuming my theory is
    sound...)<br>
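    For reference, POSIX-like systems already offer advisory locking that
    cooperating programs can opt into. The sketch below, assuming Linux and
    Python's fcntl module, shows a second opener being refused an exclusive
    flock() while the first still holds it. It is only advisory (programs
    that never ask for the lock are not stopped), it does nothing about
    output still buffered in the kernel or on a file server, and flock()
    behaviour on network file systems varies, so whether this would help on
    a given cluster is an open question. The helper try_lock and the file
    name are hypothetical.<br>
    <pre>
```python
import fcntl
import os
import tempfile

def try_lock(path):
    """Open path for appending and try to take an exclusive advisory lock
    without blocking. Return the open file on success, or None if another
    open file description already holds the lock."""
    f = open(path, "a")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "traj.trr")  # hypothetical output file name
    first = try_lock(path)   # e.g. the old process, still writing
    second = try_lock(path)  # e.g. the new process, starting up
    print(first is not None, second is None)
    first.close()            # closing the file releases the lock
```
    </pre>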

    <br>

    <blockquote

      cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <div>I am also wondering about the checkpoint timing - the

          default is 15 min, but what would be the minimum? Since I have

          not tested it, what would happen if I specify 0.001 min, for

          example?</div>

      </div>

    </blockquote>

    <br>

    I/O takes time, and checkpointing requires global communication to
    prepare for it, so doing it more often than necessary is wasteful.
    Your situation sounds so volatile that checkpointing every 30 s
    (mdrun -cpt 0.5, since the interval is given in minutes) is probably
    sound. On a BlueGene, about the only reason to checkpoint is a power
    outage. One size can't fit all.<br>

    <br>

    <blockquote

      cite="mid:BANLkTinHcv9nTtK8O2LTvie_NTuDBb8w3g@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          <div text="#000000" bgcolor="#ffffff">

            <div>

              <div class="h5"><br>

              </div>

            </div>

            Words are open to interpretation. Communicating well

            requires that you consider the impact of your words on your

            reader. You want people who can address the problem to want

            to help. You don't want them to feel defensive about the

            situation - whether you think that would be an over-reaction

            or not.

            <div class="im"><br>

            </div>

          </div>

        </blockquote>

        <div><br>

        </div>

        <div>I got your point(s). However, I respectfully disagree with
          some of them. First, I believe the information one's sentences
          convey is much more important than exactly how they are
          written.</div>

      </div>

    </blockquote>

    <br>

    The content is very important. Terse and informative is often much

    better than waffling vagueness. However, given a range of

    presentations with the same content, why not choose a presentation

    that improves the chance of achieving the objective?<br>

    <br>

    Mark<br>

  </body>

</html>