Hi,

how many CPUs are you trying to use? How big is your system? What kind of interconnect? Since you use Condor, it is probably a fairly slow interconnect, so you can't expect the run to scale to many CPUs. If you want to use many CPUs for MD, you need a faster interconnect.
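
A quick way to check where the time goes is the accounting at the end of the GROMACS log (the -g file from your md.sh below): if most of the wall time sits in the communication entries rather than in the force calculation, the network is the limit, not the CPUs. A minimal sketch, assuming the log path from your md.sh and that the run has finished (the accounting is only written at the end):

# rough check on the head node (path taken from md.sh below); the last lines
# of the log hold the cycle/time accounting table and the ns/day summary
tail -n 40 /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log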

Roland

2010/4/2 Hsin-Lin Chiang <jiangsl@phys.sinica.edu.tw>:
Hi,

Does anyone here use GROMACS, LAM, and Condor together?
I use GROMACS with LAM/MPI on a Condor system.
Every time I submit the parallel job, I get nodes that are already occupied, and the performance of each CPU is below 10%.
How should I change the scripts?
Below are the submit script and the two executable scripts.

condor_mpi:
----
#!/bin/bash
Universe = parallel
Executable = ./lamscript
machine_count = 2
output = md_$(NODE).out
error = md_$(NODE).err
log = md.log
arguments = /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md.sh
+WantIOProxy = True
should_transfer_files = yes
when_to_transfer_output = on_exit
Queue
-------
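
For reference, I submit this with condor_submit and then check which execute hosts the procs were matched to; condor_q -run lists the remote host of each running proc, which shows whether the two slots really landed on separate, idle machines:

condor_submit condor_mpi   # submit the parallel-universe job
condor_q -run              # lists the remote host of each running proc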

lamscript:
-------
#!/bin/sh

_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS
_CONDOR_REMOTE_SPOOL_DIR=$_CONDOR_REMOTE_SPOOL_DIR

SSHD_SH=`condor_config_val libexec`
SSHD_SH=$SSHD_SH/sshd.sh

CONDOR_SSH=`condor_config_val libexec`
CONDOR_SSH=$CONDOR_SSH/condor_ssh

# Set this to the bin directory of your LAM installation.
# It must also be set in your .cshrc file, so the remote side can find it!
export LAMDIR=/stathome/jiangsl/soft/lam-7.1.4
export PATH=${LAMDIR}/bin:${PATH}
export LD_LIBRARY_PATH=/lib:/usr/lib:$LAMDIR/lib:.:/opt/intel/compilers/lib

. $SSHD_SH $_CONDOR_PROCNO $_CONDOR_NPROCS

# If this is not the head node, just sleep forever to let the sshds run.
if [ $_CONDOR_PROCNO -ne 0 ]
then
        wait
        sshd_cleanup
        exit 0
fi

EXECUTABLE=$1
shift

# The binary is copied but its executable flag is cleared,
# so this script has to restore it.
chmod +x $EXECUTABLE

# To allow multiple LAM jobs on a single machine,
# we have to give each one a reasonably unique session suffix.
export LAM_MPI_SESSION_SUFFIX=$$
export LAMRSH=$CONDOR_SSH

# When a job is killed by the user, this script receives SIGTERM.
# Catch it and clean up the LAM environment.
finalize()
{
sshd_cleanup
lamhalt
exit
}
trap finalize TERM

CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
export CONDOR_CONTACT_FILE
# The second field in the contact file is the machine name
# that condor_ssh knows how to use. Note that this used to
# say "sort -n +0 ...", but the -n option is now deprecated.
sort < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines

# Start the LAM environment.
# For older versions of LAM you may need to remove "-ssi boot rsh".
lamboot -ssi boot rsh -ssi rsh_agent "$LAMRSH -x" machines

if [ $? -ne 0 ]
then
        echo "lamscript error booting lam"
        exit 1
fi

mpirun C -ssi rpi usysv -ssi coll_smp 1 $EXECUTABLE $@ &

CHILD=$!
TMP=130
while [ $TMP -gt 128 ] ; do
        wait $CHILD
        TMP=$?
done

# Clean up files.
sshd_cleanup
/bin/rm -f machines

# Clean up LAM.
lamhalt

exit $TMP
----
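
If the job keeps landing on hosts that are already busy, one small, purely diagnostic addition right after the lamboot line would be to log the machines Condor handed out and the nodes LAM actually booted (lamnodes is a standard LAM/MPI command):

# optional debugging, placed after lamboot in lamscript:
echo "machines handed out by Condor:" ; cat machines
echo "nodes booted by LAM:" ; lamnodes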

md.sh:
----
#!/bin/sh
# running GROMACS
/stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
 -s /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.tpr \
 -e /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.edr \
 -o /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.trr \
 -g /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log \
 -c /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.gro
-----
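
As an aside, since all the files share the 200ns base name, the five file options could be collapsed into mdrun's -deffnm option; a sketch assuming the same directory layout:

#!/bin/sh
# equivalent sketch of md.sh: -deffnm makes 200ns.tpr/.trr/.edr/.log/.gro
# in the md directory the default file names
/stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
 -deffnm /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns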

Hsin-Lin
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-request@gromacs.org.
Can't post? Read http://www.gromacs.org/mailing_lists/users.php

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309