[gmx-users] Assistance needed running gromacs 4.6.3 on Blue Gene/P

Mark Abraham mark.j.abraham at gmail.com
Thu Mar 13 00:32:36 CET 2014


The real problem cannot be diagnosed without a valid stack trace from the
core file. No GROMACS developers have access to such a machine, so any
resolution depends on you providing that information.

Mark
On Mar 12, 2014 10:44 PM, "arrow50311" <linxingcheng50311 at gmail.com> wrote:

> Is there any follow-up to this question?
>
> I ran into exactly the same problem on Blue Gene/P.
>
> Could anyone offer any help?
>
> Thank you,
>
>
> Prentice Bisbal wrote
> > Mark,
> >
> > Since I was working with 4.6.2, I built 4.6.3 to see if this was the
> > result of a bug in 4.6.2. It isn't: I get the same error with 4.6.3, but
> > that is the version I'll be working with from now on, since it's the
> > latest. Since the problem occurs with both versions, I might as well try
> > to fix it in the latest version, right?
> >
> > I compiled 4.6.3 with the following options to include debugging
> > information:
> >
> > cmake .. \
> > -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
> >    -DBUILD_SHARED_LIBS=OFF \
> >    -DGMX_MPI=ON \
> >    -DCMAKE_C_FLAGS="-O0 -g -qstrict -qarch=450 -qtune=450" \
> >    -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.3 \
> >    -DGMX_CPU_ACCELERATION=None \
> >    -DGMX_THREAD_MPI=OFF \
> >    -DGMX_OPENMP=OFF \
> >    -DGMX_DEFAULT_SUFFIX=ON \
> >    -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
> >     2>&1 | tee cmake.log
> >
> > For qarch, I removed the 'd' from the end, so that the double-FPU isn't
> > used, which can cause problems if the data isn't aligned correctly. The
> > -qstrict makes sure certain optimizations aren't performed. It should be
> > superfluous with optimization levels below 3, but I threw it in just
> > to be safe, and set -O0. (Of course, I think -g turns off all
> > optimizations, anyway.)
> >
> > On the BG/P, I had to install FFTW3 separately, and that wasn't
> > installed with debugging active, so there are no symbols for FFTW.
> >
> > One of my coworkers wrote a script that converts BG/P core files to
> > stack traces. In all the core files I've looked at so far (9 out of 64),
> > the stack ends in a vfprintf call. For example:
> >
> > -------------------------------------------------------------
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/resolv/res_init.c:414
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/wgenops.c:419
> > /scratch/pbisbal/build/gromacs-4.6.3/src/gmxlib/nonbonded/nb_kernel_c/nb_kernel_ElecRFCut_VdwBhamSh_GeomW4P1_c.c:673
> > ??:0
> > /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/sys/dcmf/../ccmi/executor/Broadcast.h:83
> > /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/reduce/reduce_algorithms.c:69
> > /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/bcast/bcast_algorithms.c:227
> > /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:779
> > /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:762
> > /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:374
> > /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/calcmu.c:88
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/mdrun.c:113
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1492
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> > ../stdio-common/printf_fphex.c:335
> > ../stdio-common/printf_fphex.c:452
> > ??:0
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > -----------------------------------------------------------------
> >
> > Another node with a different stack looks like this:
> >
> > ---------------------------------------------------------------
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/genops.c:982
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/string/memcpy.c:159
> > /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/ns.c:423
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1646
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> > /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> > ../stdio-common/printf_fphex.c:335
> > ../stdio-common/printf_fphex.c:452
> > ??:0
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> > ---------------------------------------------------------------
> >
> > All the stacks look like one of these two.
> >
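> > (In case it helps anyone else hitting this: a minimal sketch of that kind
> > of core-to-trace conversion is below. It assumes the BG/P lightweight core
> > files are plain text with a +++STACK/---STACK section listing hex return
> > addresses; the paths, markers, and script name here are illustrative, not
> > the actual script, and you may need the BG/P toolchain's addr2line rather
> > than the front end's.)
> >
> > -----------------------------------------------------------------
> > #!/bin/bash
> > # Hypothetical helper (core2trace.sh): resolve the saved return addresses
> > # in a BG/P lightweight core file to file:line pairs with addr2line.
> > # Usage: ./core2trace.sh core.0
> > BINARY=/scratch/bgapps/gromacs-4.6.3/bin/mdrun_mpi   # must be built with -g
> > ADDR2LINE=addr2line   # or powerpc-bgp-linux-addr2line from the toolchain
> > CORE="$1"
> >
> > # Pull out the stack section, keep only the hex addresses, and let
> > # addr2line map each one back to a source file and line number.
> > sed -n '/+++STACK/,/---STACK/p' "$CORE" \
> >   | grep -o '0x[0-9a-fA-F]\+' \
> >   | xargs "$ADDR2LINE" -e "$BINARY"
> > -----------------------------------------------------------------
> >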
> > Is any of this information useful? My coworker, who has a lot of
> > experience developing for Blue Gene/Ps, says this looks like an I/O
> > problem, but he doesn't have the time to dig into the Gromacs source
> > code for us. I'm willing to do some digging, but some guidance from
> > someone who knows the code well would be very helpful.
> >
> > Prentice
> >
> >
> >
> > On 08/06/2013 08:19 PM, Mark Abraham wrote:
> >> That all looks fine so far. The core file processor won't help unless
> >> you've compiled with -g. Hopefully cmake -DCMAKE_BUILD_TYPE=Debug will
> >> do that, but I haven't actually checked that it really works. If not, you
> >> might have to hack cmake/Platform/BlueGeneP-static-XL-C.cmake.
> >>
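> >> (For example, something like the following, added to the existing
> >> configure line; untested on BG/P, as noted:)
> >>
> >> cmake .. \
> >>   -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
> >>   -DCMAKE_BUILD_TYPE=Debug \
> >>   ... (remaining options as in the original configure line)
> >>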
> >> Anyway, if you can compile with -g, then the core file will tell us in
> >> what function it is dying, which might help locate the problem.
> >>
> >> Mark
> >>
> >> On Tue, Aug 6, 2013 at 11:43 PM, Prentice Bisbal
> >> <prentice.bisbal@> wrote:
> >>> Dear GMX-users,
> >>>
> >>> I need some assistance running Gromacs 4.6.3 on a Blue Gene/P. Although I
> >>> have a background in Chemistry, I'm an experienced professional HPC admin
> >>> who's relatively new to supporting Blue Genes and Gromacs. My first Gromacs
> >>> user is having trouble running Gromacs on our BG/P. His jobs die and dump
> >>> core, with no obvious signs (not to me, at least) of where the problem lies.
> >>>
> >>> I compiled Gromacs 4.6.3 with the following options:
> >>>
> >>>
> ------------------------------------------snip-------------------------------------------
> >>>
> >>> cmake .. \
> >>> -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
> >>>    -DBUILD_SHARED_LIBS=OFF \
> >>>    -DGMX_MPI=ON \
> >>>    -DCMAKE_C_FLAGS="-O3 -qarch=450d -qtune=450" \
> >>>    -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.2 \
> >>>    -DGMX_CPU_ACCELERATION=None \
> >>>    -DGMX_THREAD_MPI=OFF \
> >>>    -DGMX_OPENMP=OFF \
> >>>    -DGMX_DEFAULT_SUFFIX=ON \
> >>>    -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
> >>>     2>&1 | tee cmake.log
> >>>
> >>>
> ------------------------------------------snip-------------------------------------------
> >>>
> >>> When one of my users submits a job, it dumps core. My scheduler is
> >>> LoadLeveler, and I used this JCF file to replicate the problem. I added
> >>> the
> >>> '-debug 1' flag after searching the gmx-users archives:
> >>>
> >>>
> ------------------------------------------snip-------------------------------------------
> >>>
> >>> #!/bin/bash
> >>> # @ job_name = xiang
> >>> # @ job_type = bluegene
> >>> # @ bg_size = 64
> >>> # @ class = small
> >>> # @ wall_clock_limit = 01:00:00,00:50:00
> >>> # @ error = job.$(Cluster).$(Process).err
> >>> # @ output = job.$(Cluster).$(Process).out
> >>> # @ environment = COPY_ALL;
> >>> # @ queue
> >>>
> >>> source /scratch/bgapps/gromacs-4.6.2/bin/GMXRC.bash
> >>>
> >>>
> ------------------------------------------snip-------------------------------------------
> >>>
> >>> /bgsys/drivers/ppcfloor/bin/mpirun \
> >>>     /scratch/bgapps/gromacs-4.6.2/bin/mdrun_mpi \
> >>>     -pin off -deffnm sbm-b_dyn3 -v -dlb yes -debug 1
> >>>
> >>> The stderr file shows this at the bottom, which isn't too helpful:
> >>>
> >>>
> ------------------------------------------snip-------------------------------------------
> >>>
> >>> Reading file sbm-b_dyn3.tpr, VERSION 4.6.2 (single precision)
> >>>
> >>> Will use 48 particle-particle and 16 PME only nodes
> >>> This is a guess, check the performance at the end of the log file
> >>> Using 64 MPI processes
> >>>
> >>> <Aug 06 17:25:55.303879> BE_MPI (ERROR): The error message in the job record
> >>> is as follows:
> >>> <Aug 06 17:25:55.303940> BE_MPI (ERROR):   "killed with signal 6"
> >>>
> >>>
> -----------------------------------------snip-----------------------------------------------
> >>>
> >>> I have a bunch of core files which I can analyze with the IBM Core file
> >>> processor, and I also have a bunch of debug files from mdrun. I went
> >>> through about 12 of the 64, and didn't see anything that looked like an error.
> >>>
> >>> Can anyone offer me any suggestions of what to look for, or additional
> >>> debugging steps I can take? Please keep in mind I'm the system
> >>> administrator
> >>> and not an expert user of Gromacs, so I'm not sure if the inputs are
> >>> correct, or correct for my BG/P configuration. Any help will be
> >>> greatly appreciated.
> >>>
> >>> Thanks,
> >>> Prentice
> >>>