<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 14/05/2012 3:52 PM, Anirban wrote:
<blockquote
cite="mid:CAJqxE7eey64Ffh_s2c=HuAZOyFBO3vs+3PhtoJa1cNYXoHSEPw@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div class="gmail_quote">Hi ALL,<br>
<br>
I am trying to simulate a membrane protein system with the
CHARMM36 force field in GROMACS 4.5.5 on a parallel cluster
running MPI. The system consists of around 117,000 atoms. The
job runs fine on 5 nodes (5 x 12 = 60 cores) using mpirun and
gives proper output, but whenever I try to submit it on more
than 5 nodes, the job gets killed with the following error:<br>
</div>
</div>
</blockquote>
<br>
That's likely an issue with the configuration of your MPI
system, or your hardware, or both. Do check your .log file for
evidence of an unsuitable DD partition, though the fact that
"turning on dynamic load balancing" was printed suggests the DD
partitioning worked OK.<br>
<br>
Mark<br>
<br>
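If the DD/PME balance does turn out to be the problem, GROMACS 4.5 lets you tune the number of dedicated PME ranks; a minimal sketch of things to try (the rank counts and the file name topol.tpr below are illustrative placeholders, not values taken from this thread):<br>
<br>
```shell
# Sketch only: rank counts and file names are placeholders.

# Let g_tune_pme search for a good number of dedicated PME ranks
# at a given total rank count:
g_tune_pme -np 72 -s topol.tpr

# Or set the number of PME-dedicated ranks explicitly:
mpirun -np 72 mdrun -npme 16 -s topol.tpr -deffnm run

# Then check the end of the .log file for the DD grid report and
# the average PME mesh/force load to judge the PP/PME balance.
```
<br>
That said, a hard MPI_Sendrecv failure at a particular node count is more often the MPI stack or interconnect than GROMACS itself, so it is worth testing another MPI program at 6+ nodes too.<br>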
<blockquote
cite="mid:CAJqxE7eey64Ffh_s2c=HuAZOyFBO3vs+3PhtoJa1cNYXoHSEPw@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div class="gmail_quote">
<br>
----------------------------------------------------------------------<br>
<br>
starting mdrun 'Protein'<br>
50000000 steps, 100000.0 ps.<br>
<br>
NOTE: Turning on dynamic load balancing<br>
<br>
Fatal error in MPI_Sendrecv: Other MPI error<br>
Fatal error in MPI_Sendrecv: Other MPI error<br>
Fatal error in MPI_Sendrecv: Other MPI error<br>
<br>
=====================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= EXIT CODE: 256<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
=====================================================================================<br>
[proxy:0:0@cn034] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed<br>
[proxy:0:0@cn034] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error
status<br>
[proxy:0:0@cn034] main (./pm/pmiserv/pmip.c:214): demux engine
error waiting for event<br>
.<br>
.<br>
.<br>
----------------------------------------------------------------------<br>
<br>
Why is this happening? Is it related to DD and PME? How to
solve it? Any suggestion is welcome.<br>
Sorry for re-posting.<br>
<br>
<br>
Thanks and regards,<br>
<br>
Anirban<br>
</div>
<br>
</div>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
</blockquote>
<br>
</body>
</html>