[gmx-users] running g_tune_pme on stampede

Mark Abraham mark.j.abraham at gmail.com
Sat Dec 6 09:18:40 CET 2014


On Sat, Dec 6, 2014 at 12:16 AM, Kevin Chen <fch6699 at gmail.com> wrote:

> Hi,
>
> Has anybody tried g_tune_pme on Stampede before? It appears Stampede only
> supports ibrun, not mpirun -np style launching. So I assume one could
> launch g_tune_pme under MPI with a command like this (without an -np
> option):
>
> ibrun g_tune_pme -s cutoff.tpr -launch
>

You should be trying to run mdrun from g_tune_pme in parallel, not trying
to run g_tune_pme itself in parallel. Make sure you've read g_tune_pme -h
to find out which environment variables and command-line options you should
be setting.
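
For reference, g_tune_pme itself runs serially and repeatedly launches a
parallel mdrun; it picks up the MPI launcher and the mdrun binary from the
MPIRUN and MDRUN environment variables described in g_tune_pme -h. On a
machine like Stampede one might therefore try something along these lines
(a sketch only: the mdrun_mpi name and the exact ibrun behaviour are
site-specific assumptions, so check the local documentation first):

    export MPIRUN=ibrun
    export MDRUN=$(which mdrun_mpi)   # the MPI-enabled mdrun binary
    g_tune_pme -np 32 -s cutoff.tpr -launch

Whether ibrun accepts the process-count arguments that g_tune_pme appends
to the launcher command is something to verify on the machine itself.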

> Unfortunately, it failed. Any suggestion is welcome!
>

More information than "it failed" is needed to get a useful suggestion.

Mark


> Thanks in advance
>
> Kevin Chen
>
>
>
>
>
>
> -----Original Message-----
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Friday, December 5, 2014 12:54 PM
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] multinode issue
>
> On second thought (and after a quick googling), it _seems_ that this is
> an issue caused by the following:
> - the OpenMP runtime gets initialized outside mdrun, and its threads (or
> just the master thread) get their affinity set there;
> - mdrun then executes its sanity check, at which point
> omp_get_num_procs() reports 1 CPU, most probably because the master
> thread is bound to a single core.
>
> This alone should not be a big deal as long as the affinity settings get
> correctly overridden in mdrun. However, it can have the ugly side-effect
> that, if mdrun's affinity setting gets disabled (either because mdrun
> detects the externally set affinities and backs off, or because not all
> cores/hardware threads are used), all compute threads will inherit the
> previously set affinity and multiple threads will end up running on the
> same core.
>
> Note that this warning should typically not cause a crash, but it is
> telling you that something is not quite right, so it may be best to start
> by eliminating it (hints: I_MPI_PIN for Intel MPI, -cc for Cray's aprun,
> --cpu-bind for Slurm).
>
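
For concreteness, a minimal sketch of the pinning controls hinted at above;
which one applies depends on the MPI stack and launcher in use, so treat
these as starting points rather than a recipe:

    # Intel MPI: either disable its pinning and let mdrun set affinities,
    export I_MPI_PIN=off
    # or give each rank a pin domain wide enough for its OpenMP threads
    export I_MPI_PIN_DOMAIN=omp

    # Slurm:      srun --cpu-bind=none ...
    # Cray aprun: aprun -cc none ...
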
> Cheers,
> --
> Szilárd
>
>
> On Fri, Dec 5, 2014 at 7:35 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
> > I don't think this is a sysconf issue. As you seem to have 16-core (hw
> > thread?) nodes, it looks like sysconf returned the correct value
> > (16), but the OpenMP runtime actually returned 1. This typically means
> > that the OpenMP runtime was initialized outside mdrun and for some
> > reason (which I'm not sure about) it returns 1.
> >
> > My guess is that your job scheduler is multi-threading aware and by
> > default assumes 1 core/hardware thread per rank so you may want to set
> > some rank depth/width option.
> >
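
As an illustration of such a scheduler-level setting (assuming LSF, which
the original poster mentioned; the exact options vary between sites), one
might spell out the rank layout explicitly in the job script:

    #BSUB -n 32                  # total MPI ranks
    #BSUB -R "span[ptile=16]"    # 16 ranks per node

and then launch mdrun as usual. If the scheduler also sets a narrow
per-rank CPU affinity, widen or disable it as discussed above.
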
> > --
> > Szilárd
> >
> >
> > On Fri, Dec 5, 2014 at 1:37 PM, Éric Germaneau <germaneau at sjtu.edu.cn>
> > wrote:
> >> Thank you Mark,
> >>
> >> Yes this was the end of the log.
> >> I tried another input and got the same issue:
> >>
> >>    Number of CPUs detected (16) does not match the number reported by
> >>    OpenMP (1).
> >>    Consider setting the launch configuration manually!
> >>    Reading file yukuntest-70K.tpr, VERSION 4.6.3 (single precision)
> >>    [16:node328] unexpected disconnect completion event from [0:node299]
> >>    Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
> >>    internal ABORT - process 16
> >>
> >> Actually, I'm running some tests for our users; I'll talk with the
> >> admin about how to get the machine to return information to the
> >> standard sysconf() routine in the usual way.
> >> Thank you,
> >>
> >>            Éric.
> >>
> >>
> >> On 12/05/2014 07:38 PM, Mark Abraham wrote:
> >>>
> >>> On Fri, Dec 5, 2014 at 9:15 AM, Éric Germaneau
> >>> <germaneau at sjtu.edu.cn>
> >>> wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> I use impi and when I submit a job (via LSF) to more than one node
> >>>> I get the following message:
> >>>>
> >>>>     Number of CPUs detected (16) does not match the number reported by
> >>>>     OpenMP (1).
> >>>>
> >>> That suggests this machine has not been set up to return information
> >>> to the standard sysconf() routine in the usual way. What kind of
> >>> machine is this?
> >>>
> >>>>     Consider setting the launch configuration manually!
> >>>>
> >>>>     Reading file test184000atoms_verlet.tpr, VERSION 4.6.2 (single
> >>>>     precision)
> >>>>
> >>> I hope that's just a 4.6.2-era .tpr, but nobody should be using
> >>> 4.6.2 mdrun because there was a bug in only that version affecting
> >>> precisely these kinds of issues...
> >>>
> >>>>     [16:node319] unexpected disconnect completion event from
> >>>>     [11:node328]
> >>>>
> >>>>     Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
> >>>>     internal ABORT - process 16
> >>>>
> >>>> I submit doing
> >>>>
> >>>>     mpirun -np 32 -machinefile nodelist $EXE -v -deffnm $INPUT
> >>>>
> >>>> The machinefile looks like this
> >>>>
> >>>>     node328:16
> >>>>     node319:16
> >>>>
> >>>> I'm running the release 4.6.7.
> >>>> I do not set anything about OpenMP for this job; I'd like to have
> >>>> 32 MPI processes.
> >>>>
> >>>> Using one node it works fine.
> >>>> Any hints here?
> >>>>
> >>> Everything seems fine. What was the end of the .log file? Can you
> >>> run another MPI test program the same way?
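
If the goal is strictly 32 MPI ranks with a single OpenMP thread each, one
thing worth trying is to make that explicit on the command line, e.g. (with
mdrun_mpi standing in for whatever $EXE points to):

    mpirun -np 32 -machinefile nodelist mdrun_mpi -ntomp 1 -v -deffnm $INPUT

so that mdrun does not have to infer a thread count from the misreported
OpenMP core count.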
> >>>
> >>> Mark
> >>>
> >>>
> >>>>                                                               Éric.
> >>>>
> >>>> --
> >>>> Éric Germaneau (???), Specialist
> >>>> Center for High Performance Computing Shanghai Jiao Tong University
> >>>> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> >>>> M:germaneau at sjtu.edu.cn P:+86-136-4161-6480
> >>>> W:http://hpc.sjtu.edu.cn
> >>
> >> --
> >> Éric Germaneau (???), Specialist
> >> Center for High Performance Computing Shanghai Jiao Tong University
> >> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> >> Email:germaneau at sjtu.edu.cn Mobi:+86-136-4161-6480
> >> http://hpc.sjtu.edu.cn

