[gmx-users] problem with mpirun

huifang liu huifangliu1985 at gmail.com
Tue Sep 2 12:37:18 CEST 2008


Hi, Gromacs users,

   The command "grompp_ompi -np 6 -f pr_10_200.mdp -c
after_em_newton_1000.gro -p all.top -o pr_10_100.tpr -po
mdrun_pr_10_100.mdp" runs normally. But when I run the next command, "mpirun
-np 6 mdrun_ompi -s pr_10_100.tpr -c after_pr_10_100.gro -o
after_pr_10_100.trr -e after_pr_10_100.edr -g after_pr_10_100.log -v", it
gives the following output:

Wrote pdb files with previous and current coordinates
step 0
[node1:13598] *** Process received signal ***
[node1:13598] Signal: Segmentation fault (11)
[node1:13598] Signal code: Address not mapped (1)
[node1:13598] Failing at address: 0x2cc3f38
[node1:13600] *** Process received signal ***
[node1:13600] Signal: Segmentation fault (11)
[node1:13600] Signal code: Address not mapped (1)
[node1:13600] Failing at address: 0x6063518
[node1:13602] *** Process received signal ***
[node1:13602] Signal: Segmentation fault (11)
[node1:13602] Signal code: Address not mapped (1)
[node1:13602] Failing at address: 0xc6519968
[node1:13599] *** Process received signal ***
[node1:13599] Signal: Segmentation fault (11)
[node1:13599] Signal code: Address not mapped (1)
[node1:13599] Failing at address: 0x55dabc8
[node1:13601] *** Process received signal ***
[node1:13601] Signal: Segmentation fault (11)
[node1:13601] Signal code: Address not mapped (1)
[node1:13601] Failing at address: 0x21ae2148
[node1:13598] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13598] [ 1] mdrun_ompi(inl3100+0x248) [0x524b58]
[node1:13598] [ 2] mdrun_ompi(do_fnbf+0xfe7) [0x4a3d97]
[node1:13598] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13598] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13598] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13598] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13598] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13598] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13598] [ 9] mdrun_ompi [0x412e8a]
[node1:13598] *** End of error message ***
[node1:13600] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13600] [ 1] mdrun_ompi(inl3120+0x4a7) [0x525bb7]
[node1:13600] [ 2] mdrun_ompi(do_fnbf+0xe96) [0x4a3c46]
[node1:13600] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13600] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13600] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13600] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13600] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13600] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13600] [ 9] mdrun_ompi [0x412e8a]
[node1:13600] *** End of error message ***
[node1:13602] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13602] [ 1] mdrun_ompi(inl3120+0x4a7) [0x525bb7]
[node1:13602] [ 2] mdrun_ompi(do_fnbf+0xe96) [0x4a3c46]
[node1:13602] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13602] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13602] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13602] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13602] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13602] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13602] [ 9] mdrun_ompi [0x412e8a]
[node1:13602] *** End of error message ***
[node1:13599] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13599] [ 1] mdrun_ompi(inl3100+0x248) [0x524b58]
[node1:13599] [ 2] mdrun_ompi(do_fnbf+0xfe7) [0x4a3d97]
[node1:13599] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13599] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13599] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13599] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13599] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13599] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13599] [ 9] mdrun_ompi [0x412e8a]
[node1:13599] *** End of error message ***
[node1:13601] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13601] [ 1] mdrun_ompi(inl3120+0x4a7) [0x525bb7]
[node1:13601] [ 2] mdrun_ompi(do_fnbf+0xe96) [0x4a3c46]
[node1:13601] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13601] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13601] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13601] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13601] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13601] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13601] [ 9] mdrun_ompi [0x412e8a]
[node1:13601] *** End of error message ***
[node1:13603] *** Process received signal ***
[node1:13603] Signal: Segmentation fault (11)
[node1:13603] Signal code: Address not mapped (1)
[node1:13603] Failing at address: 0x2bb7c08
[node1:13603] [ 0] /lib64/tls/libpthread.so.0 [0x3078e0c5b0]
[node1:13603] [ 1] mdrun_ompi(inl3100+0x248) [0x524b58]
[node1:13603] [ 2] mdrun_ompi(do_fnbf+0xfe7) [0x4a3d97]
[node1:13603] [ 3] mdrun_ompi(force+0x120) [0x4432f0]
[node1:13603] [ 4] mdrun_ompi(do_force+0xb8b) [0x471afb]
[node1:13603] [ 5] mdrun_ompi(do_md+0x139f) [0x426fdf]
[node1:13603] [ 6] mdrun_ompi(mdrunner+0xb9c) [0x42a6dc]
[node1:13603] [ 7] mdrun_ompi(main+0x1dd) [0x42aabd]
[node1:13603] [ 8] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307851c3fb]
[node1:13603] [ 9] mdrun_ompi [0x412e8a]
[node1:13603] *** End of error message ***
[node1:13595] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[node1:13595] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[node1:13595] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 1 with PID 13599 on node node1 exited on signal 11 (Segmentation fault).
3 additional processes aborted (not shown)
[node1:13595] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[node1:13595] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
   I don't know where the problem is or how to solve it. I get the same
error even when I run both commands on only one node. Can you help me?

Thanks!
Liu, Huifang.