> From: zhao0139@ntu.edu.sg
> To: gmx-users@gromacs.org
> Date: Tue, 6 Apr 2010 20:45:37 +0800
> Subject: [gmx-users] Re: load imbalance
>
> > > > On 6/04/2010 5:39 PM, lina wrote:
> > > > > Hi everyone,
> > > > >
> > > > > Here is the result of an mdrun performed on 16 CPUs. I am not
> > > > > clear about it. Was it caused by using MPI, or by something else?
> > > > >
> > > > > Writing final coordinates.
> > > > >
> > > > > Average load imbalance: 1500.0 %
> > > > > Part of the total run time spent waiting due to load imbalance: 187.5 %
> > > > > Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
> > > > > X 0 % Y 0 %
> > > > >
> > > > > NOTE: 187.5 % performance was lost due to load imbalance
> > > > > in the domain decomposition.
> > > >
> > > > You ran an inefficient but otherwise valid computation. Check out the
> > > > manual section on domain decomposition to learn why it was inefficient,
> > > > and whether you can do better.
> > > >
> > > > Mark
> > >
> > > I searched for the keyword "decomposition" in the Gromacs manual and
> > > found no match. Are you sure it is there? Thanks anyway, but could you
> > > make the answer more solution-oriented, so that I can follow it easily?
> > >
> > > Thanks and regards,
> > >
> > > lina
> >
> > This looks strange.
> > You have 1 core doing something and 15 cores doing nothing: with 16
> > cores, an average imbalance of 1500 % means the busiest core carries
> > 16 times the average load, i.e. essentially all of it.
> > Do you only have one small molecule?
> > How many steps did this simulation run?
> >
> > Berk
>
> I do not think only 1 core was doing something while the other 15 cores
> were doing nothing.
>
> Below are the timings on 8 CPUs and on 16 CPUs; I ran the job twice to
> compare the results.
>
> 8 CPUs:
>
>         Parallel run - timing based on wallclock.
>
>                NODE (s)   Real (s)      (%)
>        Time:  52292.000  52292.000    100.0
>                        14h31:32
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    523.244     19.720     16.523      1.453
> Finished mdrun on node 0 Tue Apr 6 05:09:47 2010
>
> 16 CPUs:
>
>         Parallel run - timing based on wallclock.
>
>                NODE (s)   Real (s)      (%)
>        Time:  96457.000  96457.000    100.0
>                       1d02h47:37
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    283.696     10.701      8.957      2.679
> Finished mdrun on node 0 Mon Apr 5 01:36:18 2010
>
> Thanks and regards,
>
> lina

At first I did not notice that 16 CPUs are twice as slow as 8.
Are you really sure you did not mix the two up?
With the timings swapped, the numbers would make perfect sense.
If not, there is a problem with your 16-CPU simulation.

What load imbalance is reported for the 8-CPU run?

Berk
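
For readers hitting a similar problem: mdrun exposes command-line switches for
steering the domain decomposition. Below is a minimal sketch, not taken from
the thread itself; the MPI-enabled binary name (mdrun_mpi), the input file
(topol.tpr), and the 4x2x2 grid are placeholders to adapt to your own setup.

    # Run on 16 ranks with dynamic load balancing forced on and an
    # explicit 4x2x2 decomposition grid (4*2*2 = 16 domains).
    # -npme 0 keeps all 16 ranks as particle-particle ranks, so the
    # rank count matches the grid.
    mpirun -np 16 mdrun_mpi -s topol.tpr -dlb yes -npme 0 -dd 4 2 2

    # Pull the imbalance summary from each run's log (default: md.log)
    # to compare the 8- and 16-CPU cases:
    grep -A 1 "Average load imbalance" md.log

If forcing -dlb on and choosing the grid by hand does not help, the -rdd and
-rcon limits mentioned in the log note above are the next things to inspect.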