<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'><div style="text-align: left;">Hi,<br><br>I have to correct myself.<br><br>The PME fix is in the 4.0 rc2 release, but was not in CVS.<br>Now it is in CVS as well.<br><br>So you can try changing the line, or you can download 4.0 rc2.<br><br>Berk<br></div><br><br><hr id="EC_stopSpelling">From: gmx3@hotmail.com<br>To: gmx-users@gromacs.org<br>Subject: RE: [gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??<br>Date: Tue, 30 Sep 2008 10:10:15 +0200<br><br>
<div style="text-align: left;">Hi,<br><br>In 4.0 rc1 there is a bug in PME.<br>Erik mailed that this has been fixed in 4.0 rc2, but actually this is not the case.<br>The fix was in the head branch of CVS, but not in the release tree.<br>I have committed the fix now.<br></div><br>Could you check if the crash is due to this?<br>After line 1040 of src/mdlib/pme.c, p0++; has to be added:<br><pre>
if ((kx > 0) || (ky > 0)) {
    kzstart = 0;
} else {
    kzstart = 1;
    p0++;
}
</pre><br>Berk.<br><br><br><hr id="EC_stopSpelling">> Subject: RE: [gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??<br>> From: st01397@student.uib.no<br>> To: gmx-users@gromacs.org<br>> Date: Mon, 29 Sep 2008 19:50:57 +0200<br>> <br>> The only error message I can find is the rather cryptic:<br>> <br>> NOTE: Turning on dynamic load balancing<br>> <br>> _pmii_daemon(SIGCHLD): PE 4 exit signal Killed<br>> [NID 1412]Apid 159787: initiated application termination<br>> <br>> There are no errors apart from that.<br>> This may not be very helpful, but I googled this particular error and<br>> came up with another massively parallel code: Gadget2, also doing domain<br>> decomposition, and this link:<br>> <br>> http://www.mpa-garching.mpg.de/gadget/gadget-list/0213.html<br>> <br>> Furthermore, I can now report that this error is endemic in all my sims<br>> using harmonic position restraints in GROMACS 4.0_beta1 and GMX<br>> 4.0_rc1. <br>> (I have yet to check if it remains an issue without restraints, but I<br>> strongly suspect it does: I did some earlier sims on a gmx version<br>> downloaded from CVS on 20/08/08 when the DD scheme was just barely<br>> implemented, seeing similar unexplained crashes.) <br>> <br>> I thus have some reason to think it has to do with the new<br>> domain-decomposition implementation.<br>> <br>> About core dumps. 
I will talk to our HPC staff, and get back to you with<br>> something more substantial, I hope.<br>> I guess I could recompile gmx for the Totalview debugger, and give you<br>> some debugging information from that. Would this be helpful? <br>> <br>> Would it be helpful to give you diagnostics from running mdrun<br>> verbosely or with the -debug flag?<br>> <br>> If you think it beneficial I can also provide the config.log.<br>> <br>> My config script is really quite minimal:<br>> ------------------------<br>> #!/bin/bash<br>> export LDFLAGS="-lsci"<br>> export CFLAGS="-march=barcelona -O3"<br>> <br>> ./configure --prefix=$HOME/gmx_latest_290908 --disable-fortran<br>> --enable-mpi --without-x --without-xml --with-external-lapack<br>> --with-external-blas --program-prefix=par CC=cc MPICC=cc<br>> ---------------------------------<br>> <br>> I am using fftw-3.1.1, the gcc-4.2.0 quadcore-edition compiler,<br>> Cray's optimized XT LibSci 10.3.0 blas/lapack routines,<br>> and Cray's optimized MPI library (based on MPICH2, I believe).<br>> <br>> I will get back to you with more soon.<br>> <br>> Regards and thanks,<br>> Bjørn<br>> <br>> > <br>> > <br>> > Can you produce core dump files?<br>> > <br>> > Berk<br>> > <br>> <br>> > > PBS .o: <br>> > > Application 159316 exit codes: 137<br>> > > Application 159316 exit signals: Killed<br>> > > Application 159316 resources: utime 0, stime 0<br>> > > --------------------------------------------------<br>> > > Begin PBS Epilogue hexagon.bccs.uib.no<br>> > > Date: Mon Sep 29 12:32:54 CEST 2008<br>> > > Job ID: 65643.nid00003<br>> > > Username: bjornss<br>> > > Group: bjornss<br>> > > Job Name: pmf_hydanneal_heatup_400K<br>> > > Session: 10156<br>> > > Limits: walltime=05:00:00<br>> > > Resources:<br>> > > cput=00:00:00,mem=4940kb,vmem=22144kb,walltime=00:20:31<br>> > > Queue: batch<br>> > > Account: fysisk<br>> > > Base login-node: login5<br>> > > End PBS Epilogue Mon Sep 29 12:32:54 CEST 2008<br>> > > <br>> > > PBS .err:<br>> > > 
_pmii_daemon(SIGCHLD): PE 0 exit signal Killed<br>> > > [NID 702]Apid 159316: initiated application termination.<br>> > > <br>> > > As proper electrostatics is crucial to my modeling I am using PME<br>> > which<br>> > > comprises a large part of my calculation cost: 35-50%<br>> > > In the most extreme case, I use the following startup-script<br>> > > <br>> > > run.pbs:<br>> > > <br>> > > #!/bin/bash<br>> > > #PBS -A fysisk<br>> > > #PBS -N pmf_hydanneal_heatup_400K<br>> > > #PBS -o pmf_hydanneal.o<br>> > > #PBS -e pmf.hydanneal.err<br>> > > #PBS -l walltime=5:00:00,mppwidth=40,mppnppn=4<br>> > > <br>> > > cd /work/bjornss/pmf/structII/hydrate_annealing/heatup_400K<br>> > > source $HOME/gmx_latest_290908/bin/GMXRC<br>> > > <br>> > > aprun -n 40 parmdrun -s topol.tpr -maxh 5 -npme 20<br>> > > exit $?<br>> > > <br>> > > <br>> > > Now, apart from a significant reduction in the system dipole moment,<br>> > > there are no large changes in the system, nor significant<br>> > translations<br>> > > of the molecules in the box.<br>> > > <br>> > > I enclose the md.log and my parameter file. 
The run-topology<br>> > > (topol.tpr)<br>> > > can be found at:<br>> > > <br>> > > http://drop.io/mdanneal<br>> > > <br>> > > if anyone wants to try to replicate the crash on their local<br>> > > cluster,<br>> > > they are welcome.<br>> > > If the error persists after such trials, I am willing<br>> > > to<br>> > > post a bug on Bugzilla.<br>> > > <br>> > > <br>> > > If more information is needed, I will try to provide it upon request.<br>> > > <br>> > > <br>> > > Regards and thanks for bothering<br>> > > <br>> > > -- <br>> > > ---------------------<br>> > > Bjørn Steen Saethre <br>> > > PhD-student<br>> > > Theoretical and Energy Physics Unit<br>> > > Institute of Physics and Technology<br>> > > Allegt, 41<br>> > > N-5020 Bergen<br>> > > Norway<br>> > > <br>> > > Tel(office) +47 55582869 <br>> > > <br>> > > <br>> > <br>> > <br>> > ______________________________________________________________________<br>> > Express yourself instantly with MSN Messenger! MSN Messenger<br>> > _______________________________________________<br>> > gmx-users mailing list gmx-users@gromacs.org<br>> > http://www.gromacs.org/mailman/listinfo/gmx-users<br>> > Please search the archive at http://www.gromacs.org/search before posting!<br>> > Please don't post (un)subscribe requests to the list. Use the <br>> > www interface or send it to gmx-users-request@gromacs.org.<br>> > Can't post? Read http://www.gromacs.org/mailing_lists/users.php<br>> <br>> _______________________________________________<br>> gmx-users mailing list gmx-users@gromacs.org<br>> http://www.gromacs.org/mailman/listinfo/gmx-users<br>> Please search the archive at http://www.gromacs.org/search before posting!<br>> Please don't post (un)subscribe requests to the list. Use the <br>> www interface or send it to gmx-users-request@gromacs.org.<br>> Can't post? Read http://www.gromacs.org/mailing_lists/users.php<br><br><hr>Express yourself instantly with MSN Messenger! 
<a href="http://clk.atdmt.com/AVE/go/onm00200471ave/direct/01/" target="_blank">MSN Messenger</a>
</body>
</html>