<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <title></title>

</head>

<body>

Hi,<br>

What do you think about passing this option to the LAM/MPI configure script:<br>

tcp-short=524288, to use 512 KB<br>

Is that right?<br>

<br>

<br>

David wrote:<br>

<blockquote type="cite"

 cite="mid1062624213.4898.9.camel@h28n2fls34o1123.telia.com">

  <pre wrap="">On Thu, 2003-09-04 at 01:52, Osmany Guirola Cruz wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">No, I have dual PIII 933 MHz machines coupled by TCP/IP.

It is 100 Mbit.

My cluster has a switch. I have 32 dual-processor machines in a subnet, and only one

machine is on my network (PBS).

I am running a simulation with 9500 water molecules (SOL) and 129 protein

residues.

No, I have not run the GROMACS benchmarks. HOW COULD I DO THAT?

    </pre>

  </blockquote>

<pre wrap="">Download them from gromacs.org...

I have done the test on a switched 100 Mbit/s network with dual 800

MHz P3s, up to 10 nodes (i.e. 20 CPUs). 

  </pre>

  <blockquote type="cite">

    <pre wrap="">I forgot something: my simulations with a cut-off take less time than those with PME.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

To use the dual processors efficiently you have to select another LAM

option (rpi=usysv or rpi=sysv).

Now the real performance problem is PME. In the current 3.1.4

version PME does not scale well at all in parallel. On a Scali network

I use at most 4 dual Xeon nodes for my runs, which have 30000 waters.

Since your system is smaller, performance will be even worse. Note that

the gromacs scaling benchmark is done with a (twin-range) cut-off rather

than PME. If you can live with a cut-off (and after all, the GROMOS96

force field was developed for use with a cut-off) you could maybe scale

to somewhat more processors:

nstlist  = 5

rlist    = 0.9

rcoulomb = 1.4

rvdw     = 1.4

See how far you can go with that. Furthermore you want to control how

PBS/LAM allocates your processors. The communication is on a ring

topology in principle, so if you have two dual processor nodes

N0-p0, N0-p1, N1-p0, N1-p1

you want the jobs to be allocated in this order (to use the shared

memory communication) rather than

N0-p0, N1-p0, N0-p1, N1-p1

In the first example two of the four communications use shared memory;

in the second example none of them do.

  </pre>
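
  <pre wrap="">
To illustrate the ring-topology argument above, here is a minimal Python
sketch that counts how many neighbour links in the ring stay inside one
node for a given rank ordering. It is not part of LAM or GROMACS; the
function name and the list-of-node-names representation are assumptions
made purely for illustration.

```python
def count_shared_memory_links(placement):
    """Count ring neighbours that sit on the same node.

    placement[i] is the name of the node hosting rank i (hypothetical
    representation). Rank i talks to rank (i + 1) % n, so the last rank
    wraps around to rank 0.
    """
    n = len(placement)
    return sum(
        1 for i in range(n)
        if placement[i] == placement[(i + 1) % n]
    )


# Rank order N0-p0, N0-p1, N1-p0, N1-p1: ranks are filled node by node.
by_node = ["N0", "N0", "N1", "N1"]
# Rank order N0-p0, N1-p0, N0-p1, N1-p1: ranks alternate between nodes.
round_robin = ["N0", "N1", "N0", "N1"]

print(count_shared_memory_links(by_node))      # 2
print(count_shared_memory_links(round_robin))  # 0
```

The same function can be applied to a longer list of node names to check
whatever ordering PBS/LAM actually hands out before starting a long run.
  </pre>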

  <blockquote type="cite">

    <pre wrap="">I really need help: I have 32 machines and can use only one for my

simulations :-( 

David wrote:

    </pre>

    <blockquote type="cite">

      <pre wrap="">On Wed, 2003-09-03 at 22:16, Osmany Guirola Cruz wrote:

      </pre>

      <blockquote type="cite">

        <pre wrap="">This is not the first time I have asked the same question: how do I make GROMACS work well with LAM on my Linux cluster? A simulation on one machine takes less time than on two machines. My last step was to compile the LAM/MPI source with the option tcp-short=524288 (512 KB), and nothing changed.

PLEASE HELPMEEEEEEEEEEEEEEEEEEEEEE

        </pre>

      </blockquote>

      <pre wrap="">I'll just assume you have single-processor machines coupled by a TCP/IP

network, is that correct?

Is it 10 Mbit/s, 100 Mbit/s or better?

Do you have a switch between the machines or a hub?

How large is your system to simulate?

Did you try to reproduce the gromacs benchmarks?

      </pre>


    </blockquote>

  </blockquote>

</blockquote>

<br>

</body>

</html>