Justin,<div><br></div><div>I suspect the interaction kernel is not working correctly on your PowerPC machine. I infer that from two observations: 1) The force appears to be zero (see the minimization output). 2) When you use the all-vs-all kernel, for which there is no PowerPC-specific version, GROMACS automatically falls back to the C kernel, and then it works.</div>
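<div>As a quick way to spot the first symptom in other logs, here is a minimal Python sketch. It assumes the energy minimization summary uses the "Maximum force" and "Norm of force" lines shown in the quoted md.log excerpts; the function name <code>em_forces_look_broken</code> is hypothetical and not part of GROMACS.</div>

```python
import math
import re

# Matches plain/scientific floats as well as "nan" and "inf" (as printed in md.log).
_FLOAT = r"([-+]?(?:nan|inf|\d+\.?\d*(?:[eE][-+]?\d+)?))"


def em_forces_look_broken(log_text):
    """Return True if the EM summary shows a zero maximum force
    or a non-finite maximum force / force norm.

    Hypothetical helper for illustration; it only inspects the two
    summary lines and assumes the wording seen in the quoted logs.
    """
    fmax = re.search(r"Maximum force\s*=\s*" + _FLOAT, log_text, re.IGNORECASE)
    fnorm = re.search(r"Norm of force\s*=\s*" + _FLOAT, log_text, re.IGNORECASE)
    if fmax is None or fnorm is None:
        raise ValueError("no EM force summary found in log text")

    fmax_val = float(fmax.group(1))
    fnorm_val = float(fnorm.group(1))
    return (
        fmax_val == 0.0
        or not math.isfinite(fmax_val)
        or not math.isfinite(fnorm_val)
    )
```

<div>Applied to the two summaries quoted below, this would flag the PowerPC run and pass the Intel/AMD64 run. It says nothing about why the forces are wrong, only that they look implausible.</div>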


<br>Which kernel are you using? The log file should say; look for &quot;Configuring single precision IBM Power6-specific Fortran kernels&quot; or &quot;Testing Altivec/VMX support&quot;.<div><br></div><div>You can also check in config.h whether GMX_POWER6 and/or GMX_PPC_ALTIVEC is set. I suggest you compile with one or both of them deactivated and see whether that solves the problem. Doing so will also make the run slower, so if this is indeed the cause, you will probably want to figure out why the fastest kernel doesn&#39;t work correctly in order to get good performance.<br>
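<div>The two checks above can be sketched in Python as follows. This is only an illustration: the log strings and macro names are the ones mentioned in this message, while the function names are hypothetical, and the sketch assumes config.h marks a disabled option the usual autoconf way (a commented-out <code>#undef</code>).</div>

```python
import re

# Log strings and macro names as quoted in this message.
KERNEL_LOG_MARKERS = (
    "Configuring single precision IBM Power6-specific Fortran kernels",
    "Testing Altivec/VMX support",
)
KERNEL_MACROS = ("GMX_POWER6", "GMX_PPC_ALTIVEC")


def kernel_markers_in_log(log_text):
    """Return the kernel-related marker strings that occur in the log text."""
    return [marker for marker in KERNEL_LOG_MARKERS if marker in log_text]


def defined_kernel_macros(config_h_text):
    """Return the kernel macros that are actively #defined in config.h.

    Assumption: a disabled option appears as '/* #undef NAME */',
    which this pattern deliberately does not match.
    """
    found = []
    for macro in KERNEL_MACROS:
        pattern = r"^\s*#\s*define\s+" + re.escape(macro) + r"\b"
        if re.search(pattern, config_h_text, re.MULTILINE):
            found.append(macro)
    return found
```

<div>To use it, read md.log and config.h into strings and pass them to the two functions; an empty result from both would suggest that neither PowerPC-specific kernel is in play.</div>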


<div><br></div><div>Roland</div><div><br><br><div class="gmail_quote">On Mon, Sep 27, 2010 at 4:59 PM, Justin A. Lemkul <span dir="ltr">&lt;<a href="mailto:jalemkul@vt.edu">jalemkul@vt.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

Hi All,<br>

<br>

I&#39;m hoping I might get some tips in tracking down the source of an issue that appears to be hardware-specific, leading to crashes in my system.  The failures are occurring on our supercomputer (Mac OS X 10.3, PowerPC).  Running the same .tpr file on my laptop (Mac OS X 10.5.8, Intel Core2Duo) and on another workstation (Ubuntu 10.04, AMD64) produces identical results.  I suspect the problem stems from unsuccessful energy minimization, which then leads to a crash when running full MD.  All jobs were run in parallel on two cores.  The supercomputer does not support threading, so MPI is invoked using MPICH-1.2.5 (native MPI implementation on the cluster).<br>


<br>

<br>

Details as follows:<br>

<br>

EM md.log file: successful run (Intel Core2Duo or AMD64)<br>

<br>

Steepest Descents converged to Fmax &lt; 1000 in 7 steps<br>

Potential Energy  = -4.8878180e+04<br>

Maximum force     =  8.7791553e+02 on atom 5440<br>

Norm of force     =  1.1781271e+02<br>

<br>

<br>

EM md.log file: unsuccessful run (PowerPC)<br>

<br>

Steepest Descents converged to Fmax &lt; 1000 in 1 steps<br>

Potential Energy  = -2.4873273e+04<br>

Maximum force     =  0.0000000e+00 on atom 0<br>

Norm of force     =            nan<br>

<br>

<br>

MD invoked from the minimized structure generated on my laptop or AMD64 runs successfully (at least for a few hundred steps in my test), but the MD on the PowerPC cluster fails immediately:<br>

<br>

           Step           Time         Lambda<br>

              0        0.00000        0.00000<br>

<br>

   Energies (kJ/mol)<br>

            U-B    Proper Dih.  Improper Dih.      CMAP Dih.GB Polarization<br>

    7.93559e+03    9.34958e+03    2.24036e+02   -2.47750e+03   -7.83599e+04<br>

          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential<br>

    7.70042e+03    9.94520e+04   -1.17168e+04   -5.79783e+04   -2.55780e+04<br>

    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd<br>

            nan            nan            nan    0.00000e+00            nan<br>

  Constr.2 rmsd<br>

            nan<br>

<br>

DD  step 9 load imb.: force  3.0%<br>

<br>

<br>

-------------------------------------------------------<br>

Program mdrun_4.5.1_mpi, VERSION 4.5.1<br>

Source code file: nsgrid.c, line: 601<br>

<br>

Range checking error:<br>

Explanation: During neighborsearching, we assign each particle to a grid<br>

based on its coordinates. If your system contains collisions or parameter<br>

errors that give particles very high velocities you might end up with some<br>

coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot<br>

put these on a grid, so this is usually where we detect those errors.<br>

Make sure your system is properly energy-minimized and that the potential<br>

energy seems reasonable before trying again.<br>

Variable ind has value 7131. It should have been within [ 0 .. 7131 ]<br>

<br>

For more information and tips for troubleshooting, please check the GROMACS<br>

website at <a href="http://www.gromacs.org/Documentation/Errors" target="_blank">http://www.gromacs.org/Documentation/Errors</a><br>

-------------------------------------------------------<br>

<br>

It seems as if the crash really shouldn&#39;t be happening, if the value range is inclusive.<br>

<br>

Running with all-vs-all kernels works, but the performance is horrendously slow (&lt;300 ps per day for a 7131-atom system) so I am attempting to use long cutoffs (2.0 nm) as others on the list have suggested.<br>

<br>

Details of the installations and .mdp files are appended below.<br>

<br>

-Justin<br>

<br>

=== em.mdp ===<br>

; Run parameters<br>

integrator      = steep         ; EM<br>

emstep      = 0.005<br>

emtol       = 1000<br>

nsteps      = 50000<br>

nstcomm         = 1<br>

comm_mode   = angular       ; non-periodic system<br>

; Bond parameters<br>

constraint_algorithm    = lincs<br>

constraints             = all-bonds<br>

continuation    = no            ; starting up<br>

; required cutoffs for implicit<br>

nstlist         = 1<br>

ns_type         = grid<br>

rlist           = 2.0<br>

rcoulomb        = 2.0<br>

rvdw            = 2.0<br>

; cutoffs required for qq and vdw<br>

coulombtype     = cut-off<br>

vdwtype     = cut-off<br>

; temperature coupling<br>

tcoupl          = no<br>

; Pressure coupling is off<br>

Pcoupl          = no<br>

; Periodic boundary conditions are off for implicit<br>

pbc                 = no<br>

; Settings for implicit solvent<br>

implicit_solvent    = GBSA<br>

gb_algorithm        = OBC<br>

rgbradii            = 2.0<br>

<br>

<br>

=== md.mdp ===<br>

<br>

; Run parameters<br>

integrator      = sd            ; velocity Langevin dynamics<br>

dt                  = 0.002<br>

nsteps          = 2500000               ; 5000 ps (5 ns)<br>

nstcomm         = 1<br>

comm_mode   = angular       ; non-periodic system<br>

; Output parameters<br>

nstxout         = 0             ; nst[xvf]out = 0 to suppress useless .trr output<br>

nstvout         = 0<br>

nstfout         = 0<br>

nstlog      = 5000          ; 10 ps<br>

nstenergy   = 5000          ; 10 ps<br>

nstxtcout   = 5000          ; 10 ps<br>

; Bond parameters<br>

constraint_algorithm    = lincs<br>

constraints             = all-bonds<br>

continuation    = no            ; starting up<br>

; required cutoffs for implicit<br>

nstlist         = 10<br>

ns_type         = grid<br>

rlist           = 2.0<br>

rcoulomb        = 2.0<br>

rvdw            = 2.0<br>

; cutoffs required for qq and vdw<br>

coulombtype     = cut-off<br>

vdwtype     = cut-off<br>

; temperature coupling<br>

tc_grps         = System<br>

tau_t           = 1.0   ; inverse friction coefficient for Langevin (ps)<br>

ref_t           = 310<br>

; Pressure coupling is off<br>

Pcoupl          = no<br>

; Generate velocities is on<br>

gen_vel         = yes           <br>

gen_temp        = 310<br>

gen_seed        = 173529<br>

; Periodic boundary conditions are off for implicit<br>

pbc                 = no<br>

; Free energy must be off to use all-vs-all kernels<br>

; default, but just for the sake of being pedantic<br>

free_energy = no<br>

; Settings for implicit solvent<br>

implicit_solvent    = GBSA<br>

gb_algorithm        = OBC<br>

rgbradii            = 2.0<br>

<br>

<br>

=== Installation commands for the cluster ===<br>

<br>

$ ./configure --prefix=/home/rdiv1001/gromacs-4.5 CPPFLAGS=&quot;-I/home/rdiv1001/fftw-3.0.1-osx/include&quot; LDFLAGS=&quot;-L/home/rdiv1001/fftw-3.0.1-osx/lib&quot; --disable-threads --without-x --program-suffix=_4.5.1_s<br>


<br>

$ make<br>

<br>

$ make install<br>

<br>

$ make distclean<br>

<br>

$ ./configure --prefix=/home/rdiv1001/gromacs-4.5 CPPFLAGS=&quot;-I/home/rdiv1001/fftw-3.0.1-osx/include&quot; LDFLAGS=&quot;-L/home/rdiv1001/fftw-3.0.1-osx/lib&quot; --disable-threads --without-x --program-suffix=_4.5.1_mpi --enable-mpi CXXCPP=&quot;/nfs/compilers/mpich-1.2.5/bin/mpicxx -E&quot;<br>


<br>

$ make mdrun<br>

<br>

$ make install-mdrun<br>

<br>

<br>

-- <br>

========================================<br>

<br>

Justin A. Lemkul<br>

Ph.D. Candidate<br>

ICTAS Doctoral Scholar<br>

MILES-IGERT Trainee<br>

Department of Biochemistry<br>

Virginia Tech<br>

Blacksburg, VA<br>

jalemkul[at]<a href="http://vt.edu" target="_blank">vt.edu</a> | (540) 231-9080<br>

<a href="http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin" target="_blank">http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin</a><br>

<br>

========================================<br><font color="#888888">


</font></blockquote></div><br><br clear="all"><br>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov">cmb.ornl.gov</a><br>865-241-1537, ORNL PO BOX 2008 MS6309<br>

</div></div>