<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'><div style="text-align: left;">Hi,<br></div><br>
Weird.<br>
The only thing I can think of is that the time_t data type is somehow<br>
incompatible with other types.<br>
I will mail you a modified source file so you can try whether that fixes the problem.<br><br>
Berk<br><br>
<hr id="stopSpelling">> Subject: RE: [gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??<br>
> From: st01397@student.uib.no<br>
> To: gmx-users@gromacs.org<br>
> Date: Wed, 1 Oct 2008 14:41:19 +0200<br>
> CC: gmx3@hotmail.com<br>
> <br>
> Hi again Berk,<br>
> I know that this particular run used no more than 1:40 hours (I was<br>
> following it), but I am not able to cough up the complete log, as it was<br>
> accidentally overwritten by a new run.<br>
> <br>
> I do, however, have the same phenomenon in a shorter annealing trial. I<br>
> enclose the entire log in this mail, and show excerpts below.<br>
> <br>
> My startup script for this run looked like this:<br>
> ------------------------------<br>
> #!/bin/bash<br>
> #PBS -A fysisk<br>
> #PBS -N pmf_hydanneal_anneal2<br>
> #PBS -o pmf_hydanneal.o<br>
> #PBS -e pmf.hydanneal.err<br>
> #PBS -l walltime=1:00:00,mppwidth=50,mppnppn=4<br>
> cd /work/bjornss/pmf/structII/hydrate_annealing/anneal2<br>
> source $HOME/gmx_latest_250908/bin/GMXRC<br>
> <br>
> aprun -n 50 parmdrun -s topol.tpr -maxh 1 -npme 18<br>
> exit $?<br>
> --------------------------<br>
> <br>
> Now this should stop after 0.99 hours = 59:24.<br>
> <br>
> But as you can see:<br>
> <br>
> <br>
> ----------------------------------------------<br>
> head md.log<br>
> Log file opened on Mon Sep 29 20:11:42 2008<br>
> Host: nid00039 pid: 16507 nodeid: 0 nnodes: 50<br>
> The Gromacs distribution was built Mon Sep 29 13:25:26 CEST 2008 by<br>
> bjornss@nid00163 (Linux 2.6.16.54-0.2.5-ss x86_64)<br>
> <br>
> <br>
> <br>
> :-) G R O M A C S (-:<br>
> <br>
> Groningen Machine for Chemical Simulation<br>
> <br>
> :-) VERSION 4.0_rc1 (-:<br>
> <br>
> 
---------------------------------------------<br>
> tail md.log -n 300 (excerpt)<br>
> <br>
> Step 518975: Run time exceeded 0.990 hours, will terminate the run<br>
> <br>
> ............................<br>
> ...<br>
> Parallel run - timing based on wallclock.<br>
> <br>
> NODE (s) Real (s) (%)<br>
> Time: 1426.000 1426.000 100.0<br>
> 23:46<br>
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)<br>
> Performance: 100.149 29.098 242.356 0.099<br>
> Finished mdrun on node 0 Mon Sep 29 20:35:28 2008<br>
> --------------------------<br>
> <br>
> That is, I got about 40% of the allotted walltime here as well.<br>
> Peculiarly, 1:35 / 4:00 (sexagesimally) ~ 40%. That is, the ratio<br>
> between scheduled walltime and actually obtained run time is about the<br>
> same in both cases.<br>
> <br>
> Regards<br>
> Bjørn<br>
> <br>
> <br>
> On Wed, 2008-10-01 at 13:25 +0200, Berk Hess wrote:<br>
> > Hi,<br>
> > <br>
> > The Cray XT4 has a torus network, but you don't get access to it as a<br>
> > torus.<br>
> > You will get assigned processors which can be anywhere in the machine,<br>
> > and they are usually never in a nice cube; there are always some<br>
> > missing.<br>
> > Therefore software such as Gromacs cannot make use of proper Cartesian<br>
> > (torus) communication as one can, for instance, on a Blue Gene.<br>
> > <br>
> > I have no clue about the wallclock issue.<br>
> > Can you find out if the run took 1:35 or 4 hours?<br>
> > The start time is somewhere at the beginning of the log file.<br>
> > <br>
> > Berk<br>
> > <br>
> > <br>
> > ______________________________________________________________________<br>
> <br></body>
</html>