<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
Hi,<br>
what do you think about putting this option in the LAM/MPI configure:<br>
tcp-short=524288, to use 512 KB?<br>
Is that right?<br>
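For reference, a rough sketch of what I mean at LAM build time (the flag name and syntax are assumptions; check ./configure --help for your LAM version):<br>
<pre wrap="">./configure --with-tcp-short=524288   # 524288 bytes = 512 KB (assumed flag name)
make
make install
</pre>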
<br>
<br>
David wrote:<br>
<blockquote type="cite"
 cite="mid:1062624213.4898.9.camel@h28n2fls34o1123.telia.com">
<pre wrap="">On Thu, 2003-09-04 at 01:52, Osmany Guirola Cruz wrote:
</pre>
<blockquote type="cite">
<pre wrap="">No, I have dual PIII 933 MHz machines coupled by TCP/IP.
It is 100 Mbit.
My cluster has a switch; I have 32 dual machines in a sub-net, and only one
machine is on my network (the PBS server).
I run a simulation with 9500 water molecules (SOL) and a 129-residue
protein.
No, I haven't run the gromacs benchmarks. How could I do that?
</pre>
</blockquote>
<pre wrap=""><!---->Download them from gromacs.org...
I have done the test on a switched 100 Mbit/s network with dual 800
MHz P3s, up to 10 nodes (i.e. 20 CPUs).
</pre>
<blockquote type="cite">
<pre wrap="">I forgot something: my simulations with a cut-off are faster than with PME.
</pre>
</blockquote>
<pre wrap=""><!---->
To use the dual processors efficiently you have to select another LAM
option (rpi=usysv or rpi=sysv).
Now the real problem performance-wise is PME. In the current 3.1.4
version PME does not behave well at all in parallel. On a Scali network
I use at most 4 dual Xeon nodes for my runs which have 30000 waters.
Since your system is smaller, performance will be even worse. Note that
the gromacs scaling benchmark is done with a (twin-range) cut-off rather
than PME. If you can live with a cut-off (and after all, the GROMOS96
force field was developed for use with a cut-off) you could maybe scale
to somewhat more processors:
nstlist = 5
rlist = 0.9
rcoulomb = 1.4
rvdw = 1.4
See how far you can go with that. Furthermore you want to control how
PBS/LAM allocates your processors. The communication is on a ring
topology in principle, so if you have two dual processor nodes
N0-p0, N0-p1, N1-p0, N1-p1
you want the jobs to be allocated in this order (to use the shared
memory communication) rather than
N0-p0, N1-p0, N0-p1, N1-p1
In the first example two of the four communications use shared memory,
in the other example none of them do.
</pre>
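A rough sketch of both points together (LAM syntax differs between versions, and the mdrun_mpi binary name is an assumption; check the lamboot and mpirun man pages for your installation):<br>
<pre wrap=""># hostfile: list each dual node once with cpu=2, so consecutive MPI ranks
# land on the same node and can use the shared-memory path
#   node00 cpu=2
#   node01 cpu=2
lamboot hostfile
# select the shared-memory RPI module and run on all scheduled CPUs ("C")
mpirun -ssi rpi usysv C mdrun_mpi -v -s topol.tpr
</pre>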
<blockquote type="cite">
<pre wrap="">I really need help; I have 32 machines and only use one for my
simulations :-(
David wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On Wed, 2003-09-03 at 22:16, Osmany Guirola Cruz wrote:
</pre>
<blockquote type="cite">
<pre wrap="">This is not the first time I have asked this question: how do I get
gromacs to work well with LAM on my Linux cluster? A simulation on one
machine runs faster than on two machines. My last step was to compile the
LAM/MPI source with the option tcp-short=524288 (512 KB), and nothing changed.
PLEASE HELP MEEEEEEEEEEEEEEEEEEEE
</pre>
</blockquote>
<pre wrap="">I'll just assume you have single-processor machines coupled by a TCP/IP
network; is that correct?
Is it 10 Mbit/s, 100 Mbit/s or better?
Do you have a switch between the machines or a hub?
How large is your system to simulate?
Did you try to reproduce the gromacs benchmarks?
</pre>
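Running the benchmarks goes roughly like this (GROMACS 3.x syntax; the file names are assumptions, use whatever comes in the benchmark archive from gromacs.org):<br>
<pre wrap=""># pre-process the benchmark system for 4 nodes, then run it in parallel
grompp -np 4 -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
mpirun -np 4 mdrun_mpi -v -s topol.tpr
</pre>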
</blockquote>
</blockquote>
</blockquote>
<br>
</body>
</html>