[gmx-users] Simulation runs on iMac but explodes on cluster

Mark Abraham Mark.Abraham at anu.edu.au
Wed Jul 13 10:22:39 CEST 2011


On 13/07/2011 5:35 PM, Luke Goodsell wrote:
> Hi,
>
> As the subject suggests, I have a simulation that runs correctly on my 
> iMac, but fails when I try to run it on a cluster, and I am hoping 
> someone may be able to suggest which things to try first to resolve 
> the issue.
>
> Background:
> The simulation proceeds perfectly well on the iMac (OS X 10.5) without 
> error/warning. On the cluster, it begins producing multiple LINCS 
> warnings at step 14555 (of 7500000) and then segfaults after step 
> 14556 with:

Perhaps your simulation is intrinsically unstable, and you simply haven't 
been unlucky enough to see it blow up on the iMac yet. See 
http://www.gromacs.org/Documentation/Terminology/Blowing_Up
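
As a rough illustration of why the same input can behave differently on two
machines, here is a toy Python sketch. It is not GROMACS code: the logistic
map and the function names are stand-ins chosen only for illustration.
Floating-point addition is not associative, so a different architecture,
compiler or summation order can change the last bits of a result, and a
chaotic system then amplifies that difference until the two trajectories are
unrelated. A marginally stable system can therefore survive on one machine
and fail on another.

```python
# Toy illustration only; nothing here is taken from GROMACS.

def sums_in_two_orders(a, b, c):
    """Add the same three numbers with two different groupings."""
    return (a + b) + c, a + (b + c)


def logistic_trajectory(x0, steps, r=3.9):
    """Iterate the logistic map x -> r*x*(1-x), a simple chaotic system."""
    x = x0
    trajectory = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


def first_divergence(traj_a, traj_b, tol=1e-3):
    """Return the first step at which two trajectories differ by more than tol."""
    for step, (xa, xb) in enumerate(zip(traj_a, traj_b)):
        if abs(xa - xb) > tol:
            return step
    return None


if __name__ == "__main__":
    left, right = sums_in_two_orders(0.1, 0.2, 0.3)
    print("same sum, different grouping, equal?", left == right)

    # Two starting points that differ only far below any physical precision.
    reference = logistic_trajectory(0.4, 500)
    perturbed = logistic_trajectory(0.4 + 1e-12, 500)
    print("trajectories separate at step", first_divergence(reference, perturbed))
```

The point is only that bitwise-identical trajectories should not be expected
across platforms. What matters is whether the system stays stable on both.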

Mark

> [node-005:13244] *** Process received signal ***
> [node-005:13244] Signal: Segmentation fault (11)
> [node-005:13244] Signal code: Address not mapped (1)
> [node-005:13244] Failing at address: 0x2aaab1380520
> [node-005:13244] [ 0] /lib64/libpthread.so.0 [0x2aaaac402b10]
> [node-005:13244] [ 1] mdrun_mpi(nb_kernel410_x86_64_sse+0xa65) [0x947e25]
> [node-005:13244] [ 2] mdrun_mpi(do_nonbonded+0x780) [0x8ce890]
> [node-005:13244] [ 3] mdrun_mpi(do_force_lowlevel+0x308) [0x6842b8]
> [node-005:13244] [ 4] mdrun_mpi(do_force+0xc59) [0x6f7c19]
> [node-005:13244] [ 5] mdrun_mpi(do_md+0x5785) [0x626f75]
> [node-005:13244] [ 6] mdrun_mpi(mdrunner+0xa07) [0x61e8a7]
> [node-005:13244] [ 7] mdrun_mpi(main+0x1363) [0x62c5f3]
> [node-005:13244] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaac62d994]
> [node-005:13244] [ 9] mdrun_mpi(__gxx_personality_v0+0x479) [0x44b659]
> [node-005:13244] *** End of error message ***
>
> Things I have tried:
> * Both MPI and non-MPI versions on cluster (same result)
> * Harmonising FFTW - configured and compiled fftw3 from same source 
> using same configuration and ensured correct library was included 
> during configure step
> * Checking the Reproducibility documentation
> * Searching the archives - I didn't find anything that described a 
> similar problem.
>
> Things I think may be involved:
> * Different architectures - i686 vs x86_64 - don't know how to test 
> for this
> * Different BLAS/LAPACK libraries - I believe GROMACS uses vecLib 
> on OS X; maybe I could compile without external BLAS/LAPACK and see if 
> this makes a difference
> * Some other unknown problem
>
> I've now spent more than 2 weeks trying to diagnose this problem 
> and don't seem to be making progress. Could anyone suggest the most 
> likely cause of this significant difference in output, and what I 
> could do to test or fix it?
>
> Any help is greatly appreciated.
>
> Luke
>



