Hi,

How many CPUs are you trying to use? How big is your system? What kind of interconnect? Since you use Condor, it is probably a fairly slow interconnect, so you can't expect it to scale to many CPUs. If you want to use many CPUs for MD, you need a faster interconnect.
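A quick way to see where it stops scaling is to run a short benchmark at increasing process counts and compare the ns/day reported at the end of each log. This is only a sketch using the mdrun binary from your scripts; bench.tpr would be a short test run you prepare yourself, and it assumes LAM is already booted on the CPUs you want to test:

for NP in 1 2 4 8; do
    mpirun -np $NP /stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
        -s bench.tpr -deffnm bench_np$NP
    grep Performance bench_np$NP.log
done

If ns/day stops improving after a handful of processes, the interconnect rather than the scripts is the limit.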
Roland

2010/4/2 Hsin-Lin Chiang <jiangsl@phys.sinica.edu.tw>:
Hi,

Does someone here use GROMACS, LAM, and Condor together?
I use GROMACS with LAM/MPI on a Condor system.
Every time I submit a parallel job, I get a node that is already occupied, and the performance of each CPU is below 10%.
How should I change the scripts?
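(As a minimal check of what the job actually got, using standard tools, where <node> is a placeholder for a host listed for the running job:

condor_q -run        # shows which hosts the running job was matched to
ssh <node> uptime    # the load average shows whether something else is already using that host
)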
Below are the submit script and the two executable scripts.

condor_mpi:
----
#!/bin/bash
Universe = parallel
Executable = ./lamscript
machine_count = 2
output = md_$(NODE).out
error = md_$(NODE).err
log = md.log
arguments = /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md.sh
+WantIOProxy = True
should_transfer_files = yes
when_to_transfer_output = on_exit
Queue
-------

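For reference, a small illustration of what this submit file produces: in the parallel universe Condor starts one instance of the executable on each node, and $(NODE) expands to the node number, so with machine_count = 2 you get:

md_0.out  md_0.err   (node 0, where lamscript runs mpirun)
md_1.out  md_1.err   (node 1, which only keeps its sshd alive, as the lamscript below shows)
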
lamscript:

-------
#!/bin/sh

_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS
_CONDOR_REMOTE_SPOOL_DIR=$_CONDOR_REMOTE_SPOOL_DIR

SSHD_SH=`condor_config_val libexec`
SSHD_SH=$SSHD_SH/sshd.sh

CONDOR_SSH=`condor_config_val libexec`
CONDOR_SSH=$CONDOR_SSH/condor_ssh

# Set this to the bin directory of your lam installation.
# This also must be in your .cshrc file, so the remote side
# can find it!
export LAMDIR=/stathome/jiangsl/soft/lam-7.1.4
export PATH=${LAMDIR}/bin:${PATH}
export LD_LIBRARY_PATH=/lib:/usr/lib:$LAMDIR/lib:.:/opt/intel/compilers/lib

. $SSHD_SH $_CONDOR_PROCNO $_CONDOR_NPROCS

# If this is not the head node, just sleep forever, to let the
# sshds run
if [ $_CONDOR_PROCNO -ne 0 ]
then
    wait
    sshd_cleanup
    exit 0
fi

EXECUTABLE=$1
shift

# The binary is copied but the executable flag is cleared,
# so the script has to take care of this.
chmod +x $EXECUTABLE

# To allow multiple lam jobs running on a single machine,
# we have to give a somewhat unique value.
export LAM_MPI_SESSION_SUFFIX=$$
export LAMRSH=$CONDOR_SSH

# When a job is killed by the user, this script will get SIGTERM.
# The script has to catch it and clean up the lam environment.
finalize()
{
    sshd_cleanup
    lamhalt
    exit
}
trap finalize TERM

CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
export CONDOR_CONTACT_FILE
# The second field in the contact file is the machine name
# that condor_ssh knows how to use. Note that this used to
# say "sort -n +0 ...", but the -n option is now deprecated.
sort < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines

# Start the lam environment.
# For older versions of lam you may need to remove the -ssi boot rsh line.
lamboot -ssi boot rsh -ssi rsh_agent "$LAMRSH -x" machines

if [ $? -ne 0 ]
then
    echo "lamscript error booting lam"
    exit 1
fi

mpirun C -ssi rpi usysv -ssi coll_smp 1 $EXECUTABLE "$@" &

CHILD=$!
TMP=130
while [ $TMP -gt 128 ] ; do
    wait $CHILD
    TMP=$?
done

# clean up files
sshd_cleanup
/bin/rm -f machines

# clean up lam
lamhalt

exit $TMP
----

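For reference, a sketch of what the boot step above works with: after the awk step the "machines" file is just a list of hostnames taken from the contact file, one per Condor node, e.g. (hypothetical names):

node23.cluster
node57.cluster

LAM counts one CPU per line unless a cpu=N suffix is given, and mpirun C then starts one copy of the executable on every CPU in that boot schema.
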
md.sh

----
#!/bin/sh
# running GROMACS
/stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
-s /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.tpr \
-e /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.edr \
-o /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.trr \
-g /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log \
-c /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.gro
-----


Hsin-Lin
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309