<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
Load balancing problems I can understand, but why would it take
longer in absolute time? I would have thought that some nodes would
simply be sitting idle, but this should not cause an increase in the
overall simulation time (15x at that!).<br>
<br>
There must be some extra communication?<br>
<br>
I agree with Justin that this seems like a strange thing to do, but
still I think that there must be some underlying coding issue
(probably one that only exists because of a reasonable assumption
that nobody would annihilate the largest part of their system).<br>
<br>
Chris.<br>
<br>
<br>
<pre>Luca Bellucci wrote:
><i> Hi Chris,
</i>><i> thanks for the suggestions.
</i>><i> There was a mistake in my previous mail:
</i>><i> couple-moltype = SOL (for solvent) and not "Protein_Chain_P".
</i>><i> Now the load imbalance seems reasonable, because
</i>><i> the water box is large (~9.0 nm).
</i>
Now your outcome makes a lot more sense. You're decoupling all of the solvent?
I don't see how that is going to be physically stable or terribly meaningful,
but it explains your performance loss. You're annihilating a significant number
of interactions (probably the vast majority of all the nonbonded interactions in
the system), which I would expect would cause continuous load balancing issues.
-Justin
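
To see where the extra time goes, the G-cycle counts from the two timing
tables quoted below can be compared row by row. This is only a rough
sketch: the numbers are copied from Luca's output, while the dictionary
layout and the slowdown() helper are illustrative names, not part of
GROMACS.

```python
# G-cycle counts copied from the two timing tables quoted below,
# stored as (free_energy = yes, free_energy = no).
g_cycles = {
    "Neighbor search": (436.569, 136.479),
    "Force": (34241.576, 1141.115),
    "PME mesh": (4190.758, 484.581),
    "PME redist. X/F": (3479.771, 92.204),
    "Total": (39004.531, 1862.552),
}


def slowdown(with_fe, without_fe):
    """Ratio of the cost with free energy on to the cost with it off."""
    return with_fe / without_fe


ratios = {name: slowdown(*pair) for name, pair in g_cycles.items()}

for name, ratio in ratios.items():
    print(f"{name:18s} {ratio:6.1f}x")

# Wallclock times in seconds, from the two "Time:" lines.
wall_ratio = slowdown(1828.385, 87.309)
print(f"{'Wallclock':18s} {wall_ratio:6.1f}x")
```

With these inputs the Force row comes out at about 30x, PME redist. X/F
at about 38x, and the total at about 21x. That is consistent with most of
the loss sitting in the nonbonded force calculation, with some extra cost
in PME redistribution, but the ratios alone do not identify the cause.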
><i> However, the problem exists and the performance loss is very high, so I have
</i>><i> redone the calculations with these commands:
</i>><i>
</i>><i> grompp -f md.mdp -c ../Run-02/confout.gro -t ../Run-02/state.cpt \
</i>><i>        -p ../topo.top -n ../index.ndx -o md.tpr -maxwarn 1
</i>><i>
</i>><i> mdrun -s md.tpr -o md
</i>><i>
</i>><i> this is part of the md.mdp file:
</i>><i>
</i>><i> ; Run parameters
</i>><i> ; define = -DPOSRES
</i>><i> integrator        = md               
</i>><i> nsteps                = 1000        
</i>><i> dt                = 0.002               
</i>><i> [..]
</i>><i> free_energy = yes ; /no
</i>><i> init_lambda = 0.9
</i>><i> delta_lambda = 0.0
</i>><i> couple-moltype = SOL ; solvent water
</i>><i> couple-lambda0 = vdw-q
</i>><i> couple-lambda1 = none
</i>><i> couple-intramol= yes
</i>><i>
</i>><i> Result for free energy calculation
</i>><i> Computing:         Nodes  Number    G-Cycles   Seconds      %
</i>><i> -----------------------------------------------------------------------
</i>><i> Domain decomp.         8     126      22.050       8.3    0.1
</i>><i> DD comm. load          8      15       0.009       0.0    0.0
</i>><i> DD comm. bounds        8      12       0.031       0.0    0.0
</i>><i> Comm. coord.           8    1001      17.319       6.5    0.0
</i>><i> Neighbor search        8     127     436.569     163.7    1.1
</i>><i> Force                  8    1001   34241.576   12840.9   87.8
</i>><i> Wait + Comm. F         8    1001      19.486       7.3    0.0
</i>><i> PME mesh               8    1001    4190.758    1571.6   10.7
</i>><i> Write traj.            8       7       1.827       0.7    0.0
</i>><i> Update                 8    1001      12.557       4.7    0.0
</i>><i> Constraints            8    1001      26.496       9.9    0.1
</i>><i> Comm. energies         8    1002      10.710       4.0    0.0
</i>><i> Rest                   8              25.142       9.4    0.1
</i>><i> -----------------------------------------------------------------------
</i>><i> Total                  8           39004.531   14627.1  100.0
</i>><i> -----------------------------------------------------------------------
</i>><i> -----------------------------------------------------------------------
</i>><i> PME redist. X/F        8    3003    3479.771    1304.9    8.9
</i>><i> PME spread/gather      8    4004     277.574     104.1    0.7
</i>><i> PME 3D-FFT             8    4004     378.090     141.8    1.0
</i>><i> PME solve              8    2002      55.033      20.6    0.1
</i>><i> -----------------------------------------------------------------------
</i>><i>         Parallel run - timing based on wallclock.
</i>><i>
</i>><i>                NODE (s)   Real (s)      (%)
</i>><i>        Time:   1828.385   1828.385    100.0
</i>><i>                        30:28
</i>><i>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
</i>><i> Performance:      3.115      3.223      0.095    253.689
</i>><i>
</i>><i> I then switched off only the free_energy keyword and reran the calculation.
</i>><i> This is the result:
</i>><i> Computing:         Nodes  Number    G-Cycles   Seconds      %
</i>><i> -----------------------------------------------------------------------
</i>><i> Domain decomp.         8      77      10.975       4.1    0.6
</i>><i> DD comm. load          8       1       0.001       0.0    0.0
</i>><i> Comm. coord.           8    1001      14.480       5.4    0.8
</i>><i> Neighbor search        8      78     136.479      51.2    7.3
</i>><i> Force                  8    1001    1141.115     427.9   61.3
</i>><i> Wait + Comm. F         8    1001      17.845       6.7    1.0
</i>><i> PME mesh               8    1001     484.581     181.7   26.0
</i>><i> Write traj.            8       5       1.221       0.5    0.1
</i>><i> Update                 8    1001       9.976       3.7    0.5
</i>><i> Constraints            8    1001      20.275       7.6    1.1
</i>><i> Comm. energies         8     992       5.933       2.2    0.3
</i>><i> Rest                   8              19.670       7.4    1.1
</i>><i> -----------------------------------------------------------------------
</i>><i> Total                  8            1862.552     698.5  100.0
</i>><i> -----------------------------------------------------------------------
</i>><i> -----------------------------------------------------------------------
</i>><i> PME redist. X/F        8    2002      92.204      34.6    5.0
</i>><i> PME spread/gather      8    2002     192.337      72.1   10.3
</i>><i> PME 3D-FFT             8    2002     177.373      66.5    9.5
</i>><i> PME solve              8    1001      22.512       8.4    1.2
</i>><i> -----------------------------------------------------------------------
</i>><i>         Parallel run - timing based on wallclock.
</i>><i>
</i>><i>                NODE (s)   Real (s)      (%)
</i>><i>        Time:     87.309     87.309    100.0
</i>><i>                        1:27
</i>><i>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
</i>><i> Performance:    439.731     23.995      1.981     12.114
</i>><i> Finished mdrun on node 0 Mon Apr 4 16:52:04 2011
</i>><i>
</i>><i> Luca        
</i>><i>
</i>><i>
</i>><i>
</i>><i>
</i>>><i> If we accept your text at face value, then the simulation slowed down
</i>>><i> by a factor of about 15 (1500%), far more than the 16% load imbalance
</i>>><i> could explain.
</i>>><i>
</i>>><i> Please let us know which version of GROMACS you are using, and cut and
</i>>><i> paste the commands that you used to run it (so we can verify that you ran
</i>>><i> on the same number of processors) and a diff of the .mdp files (so that
</i>>><i> we can verify that you ran for the same number of steps).
</i>>><i>
</i>>><i> You might be correct about the slowdown, but let's rule out some other
</i>>><i> more obvious problems first.
</i>>><i>
</i>>><i> Chris.
</i>>><i>
</i>>><i> -- original message --
</i>>><i>
</i>>><i>
</i>>><i> Dear all,
</i>>><i> when I run a single free energy simulation,
</i>>><i> I notice a loss of performance with respect to
</i>>><i> normal MD:
</i>>><i>
</i>>><i> free_energy = yes
</i>>><i> init_lambda = 0.9
</i>>><i> delta_lambda = 0.0
</i>>><i> couple-moltype = Protein_Chain_P
</i>>><i> couple-lambda0 = vdw-q
</i>>><i> couple-lambda1 = none
</i>>><i> couple-intramol= yes
</i>>><i>
</i>>><i> Average load imbalance: 16.3 %
</i>>><i> Part of the total run time spent waiting due to load imbalance: 12.2 %
</i>>><i> Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
</i>>><i> X 0 %
</i>>><i> Time: 1852.712 1852.712 100.0
</i>>><i>
</i>>><i> free_energy = no
</i>>><i> Average load imbalance: 2.7 %
</i>>><i> Part of the total run time spent waiting due to load imbalance: 1.7 %
</i>>><i> Time: 127.394 127.394 100.0
</i>>><i>
</i>>><i> It seems that the loss of performance is due in part to the load
</i>>><i> imbalance in the domain decomposition; however, I tried changing
</i>>><i> these keywords without benefit.
</i>>><i> Any comment is welcome.
</i></pre>
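<pre>
Chris asked above whether both runs covered the same number of steps.
One rough way to check is to recompute ns/day from the step count, the
time step and the wallclock time, and compare it with the reported
figures. The sketch below assumes 1001 steps for both runs, taken from
the call count in the "Force" row of each table (presumably nsteps = 1000
plus step 0); ns_per_day() is an illustrative helper, not a GROMACS tool.

```python
SECONDS_PER_DAY = 86400.0


def ns_per_day(n_steps, dt_ps, wall_seconds):
    """Simulated nanoseconds per day of wallclock time."""
    simulated_ns = n_steps * dt_ps / 1000.0  # convert ps to ns
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


# dt = 0.002 ps comes from the quoted md.mdp; the wallclock times come
# from the two "Time:" lines. The step count of 1001 is an assumption.
fe_on = ns_per_day(1001, 0.002, 1828.385)
fe_off = ns_per_day(1001, 0.002, 87.309)

print(f"free_energy = yes: {fe_on:.3f} ns/day")
print(f"free_energy = no:  {fe_off:.3f} ns/day")
```

Under that assumption the recomputed values round to the reported 0.095
and 1.981 ns/day, which suggests that both runs did cover the same number
of steps and that the slowdown is not an artifact of different run
lengths.
</pre>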
</body>
</html>