<HTML>
<HEAD>
<META content="text/html; charset=big5" http-equiv=Content-Type>
<META content="OPENWEBMAIL" name=GENERATOR>
</HEAD>
<BODY bgColor=#ffffff>
<pre>Hi,
<br />
<br />I tried to use 4 and 8 CPUs.
<br />There are about 6000 atoms in my system.
<br />The interconnect between our nodes is 1 Gb Ethernet, not optical fiber.
<br />
<br />I'm sorry for my poor English; I didn't express my question well.
<br />Every time I submit a parallel job, the nodes assigned to me are already at 100% load,
<br />and the CPU share available to me is less than 10%.
<br />I think there is something wrong with my submit script or executable scripts,
<br />which I posted in my previous message.
<br />How should I correct my scripts?
<br />
<br />Hsin-Lin
<br /></pre>
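One likely reason for landing on already-loaded nodes is that nothing in the condor_mpi submit file below steers the match away from busy machines. As a hedged sketch (LoadAvg is a standard machine-ClassAd attribute, but verify the names on your pool with condor_status -long), constraints like these could be added to the submit file:

```
# Hypothetical additions to the condor_mpi submit file:
# refuse machines whose load average is already high, and
# prefer the least-loaded machines among those that remain.
Requirements = (LoadAvg < 0.3)
Rank         = -LoadAvg
```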
<br /><font size="2">>
Hi,
<br />>
<br />>
how many CPUs are you trying to use? How big is your system? What kind of
<br />>
interconnect? Since you use Condor, it is probably a fairly slow interconnect,
<br />>
so you can't expect it to scale to many CPUs. If you want to use many CPUs
<br />>
for MD, you need a faster interconnect.
<br />>
<br />>
Roland
<br />>
<br />>
2010/4/2 Hsin-Lin Chiang <jiangsl@phys.sinica.edu.tw>
<br />>
<br />>
>  Hi,
<br />>
>
<br />>
> Does anyone use GROMACS, LAM, and Condor together here?
<br />>
> I use GROMACS with LAM/MPI on a Condor system.
<br />>
> Every time I submit a parallel job,
<br />>
> I get nodes that are already occupied, and the performance of each CPU is
<br />>
> below 10%.
<br />>
> How should I change the script?
<br />>
> Below is one submit script and two executable scripts.
<br />>
>
<br />>
> condor_mpi:
<br />>
> ----
<br />>
> #!/bin/bash
<br />>
> Universe = parallel
<br />>
> Executable = ./lamscript
<br />>
> machine_count = 8
<br />>
> output = md_$(NODE).out
<br />>
> error = md_$(NODE).err
<br />>
> log = md.log
<br />>
> arguments = /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md.sh
<br />>
> +WantIOProxy = True
<br />>
> should_transfer_files = yes
<br />>
> when_to_transfer_output = on_exit
<br />>
> Queue
<br />>
> -------
<br />>
>
<br />>
> lamscript:
<br />>
> -------
<br />>
> #!/bin/sh
<br />>
>
<br />>
> _CONDOR_PROCNO=$_CONDOR_PROCNO
<br />>
> _CONDOR_NPROCS=$_CONDOR_NPROCS
<br />>
> _CONDOR_REMOTE_SPOOL_DIR=$_CONDOR_REMOTE_SPOOL_DIR
<br />>
>
<br />>
> SSHD_SH=`condor_config_val libexec`
<br />>
> SSHD_SH=$SSHD_SH/sshd.sh
<br />>
>
<br />>
> CONDOR_SSH=`condor_config_val libexec`
<br />>
> CONDOR_SSH=$CONDOR_SSH/condor_ssh
<br />>
>
<br />>
> # Set this to the bin directory of your lam installation
<br />>
> # This also must be in your .cshrc file, so the remote side
<br />>
> # can find it!
<br />>
> export LAMDIR=/stathome/jiangsl/soft/lam-7.1.4
<br />>
> export PATH=${LAMDIR}/bin:${PATH}
<br />>
> export LD_LIBRARY_PATH=/lib:/usr/lib:$LAMDIR/lib:.:/opt/intel/compilers/lib
<br />>
>
<br />>
>
<br />>
> . $SSHD_SH $_CONDOR_PROCNO $_CONDOR_NPROCS
<br />>
>
<br />>
> # If not the head node, just sleep forever, to let the
<br />>
> # sshds run
<br />>
> if [ $_CONDOR_PROCNO -ne 0 ]
<br />>
> then
<br />>
>                 wait
<br />>
>                 sshd_cleanup
<br />>
>                 exit 0
<br />>
> fi
<br />>
>
<br />>
> EXECUTABLE=$1
<br />>
> shift
<br />>
>
<br />>
> # The binary is copied, but the executable flag is cleared,
<br />>
> # so the script has to take care of this.
<br />>
> chmod +x $EXECUTABLE
<br />>
>
<br />>
> # to allow multiple lam jobs running on a single machine,
<br />>
> # we have to give it a somewhat unique value
<br />>
> export LAM_MPI_SESSION_SUFFIX=$$
<br />>
> export LAMRSH=$CONDOR_SSH
<br />>
> # When a job is killed by the user, this script will get SIGTERM.
<br />>
> # It has to catch it and clean up the
<br />>
> # LAM environment.
<br />>
> finalize()
<br />>
> {
<br />>
> sshd_cleanup
<br />>
> lamhalt
<br />>
> exit
<br />>
> }
<br />>
> trap finalize TERM
<br />>
>
<br />>
> CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
<br />>
> export CONDOR_CONTACT_FILE
<br />>
> # The second field in the contact file is the machine name
<br />>
> # that condor_ssh knows how to use. Note that this used to
<br />>
> # say "sort -n +0 ...", but -n option is now deprecated.
<br />>
> sort < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines
<br />>
>
<br />>
> # start the lam environment
<br />>
> # For older versions of lam you may need to remove the -ssi boot rsh line
<br />>
> lamboot -ssi boot rsh -ssi rsh_agent "$LAMRSH -x" machines
<br />>
>
<br />>
> if [ $? -ne 0 ]
<br />>
> then
<br />>
>         echo "lamscript error booting lam"
<br />>
>         exit 1
<br />>
> fi
<br />>
>
<br />>
> mpirun C -ssi rpi usysv -ssi coll_smp 1 $EXECUTABLE $@ &
<br />>
>
<br />>
> CHILD=$!
<br />>
> TMP=130
<br />>
> while [ $TMP -gt 128 ] ; do
<br />>
>         wait $CHILD
<br />>
>         TMP=$?;
<br />>
> done
<br />>
>
<br />>
> # clean up files
<br />>
> sshd_cleanup
<br />>
> /bin/rm -f machines
<br />>
>
<br />>
> # clean up lam
<br />>
> lamhalt
<br />>
>
<br />>
> exit $TMP
<br />>
> ----
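The while loop near the end of lamscript relies on the shell convention that a process killed by signal N exits with status 128+N, so any status above 128 means "interrupted by a signal, keep waiting". A minimal standalone illustration of that convention (not part of the original script):

```shell
#!/bin/sh
# Demonstrate the >128 exit-status convention the wait loop depends on:
# a process terminated by signal N exits with status 128+N.
sh -c 'kill -TERM $$'    # child sends itself SIGTERM (signal 15)
status=$?
echo "$status"           # 128 + 15 = 143 on bash/dash
[ "$status" -gt 128 ] && echo "killed by a signal"
```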
<br />>
>
<br />>
> md.sh
<br />>
> ----
<br />>
> #!/bin/sh
<br />>
> #running GROMACS
<br />>
> /stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
<br />>
> -s /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.tpr \
<br />>
> -e /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.edr \
<br />>
> -o /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.trr \
<br />>
> -g /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log \
<br />>
> -c /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.gro
<br />>
> -----
<br />>
>
<br />>
>
<br />>
> Hsin-Lin
<br />
</font>
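For reference, the machines file that lamboot consumes in the quoted lamscript is derived from Condor's contact file by keeping only the hostname column. A self-contained illustration with made-up hostnames:

```shell
#!/bin/sh
# How lamscript builds its machines file: Condor's contact file has
# lines like "<node#> <hostname> <port> ...", and only the hostname
# column (field 2) is kept. The hostnames below are hypothetical.
printf '1 node02.example.org 4050\n0 node01.example.org 4050\n' > contact
sort < contact | awk '{print $2}' > machines
cat machines    # node01.example.org, then node02.example.org
rm -f contact machines
```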
</BODY>
</HTML>