[gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration

Szilárd Páll szilard.pall at cbr.su.se
Sun Jun 9 21:43:41 CEST 2013


On Wed, Jun 5, 2013 at 4:35 PM, João Henriques
<joao.henriques.32353 at gmail.com> wrote:
> Just to wrap up this thread, it does work when the mpirun is properly
> configured. I knew it had to be my fault :)
>
> Something like this works like a charm:
> mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v

That is indeed the correct way to launch the simulation: this way you'll have
two ranks per node, each using a different GPU. However, coming back to your
initial (non-working) launch config, if you want to run 4 ranks x 4 threads
per node instead, you'll have to assign two ranks to each GPU (the -gpu_id
string lists, in rank order, one GPU id per PP rank on a node):
mpirun -npernode 4 mdrun_mpi -ntomp 4 -gpu_id 0011 -deffnm md -v

If OpenMP multi-threading scaling is what limits performance - which is often
the case on AMD, and I've seen it matter even on a single Intel node - the
above layout will help compared to 2 ranks x 8 threads each.
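
If you want to check which layout wins on your hardware, a pair of short
benchmark runs is enough. For instance (as far as I remember all of these
flags are in 4.6's mdrun: -resethway restarts the timing counters halfway
through so the initial load balancing doesn't skew the numbers, and
-noconfout skips writing the final coordinates):
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -maxh 0.1 -resethway -noconfout
mpirun -npernode 4 mdrun_mpi -ntomp 4 -gpu_id 0011 -deffnm md -maxh 0.1 -resethway -noconfout
Then compare the ns/day reported at the end of md.log for each run.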

I'd like to point out one more thing which is important when you run on more
than just a node or two. GPU-accelerated runs don't automatically switch to
using separate PME ranks - mostly because it's very hard to pick good settings
for splitting the cores between PP and PME ranks. However, from around two to
three nodes onward you will typically get better performance by using separate
PME ranks.

You should experiment with dedicating part of the cores (half is usually a
decent choice) to PME, e.g. by running 2 PP + 1 PME or 2 PP + 2 PME ranks per
node; see the sketch below.
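
As a rough sketch for your nodes (16 cores + 2 GPUs each; the thread counts
are guesses, so treat them only as a starting point), 2 PP + 2 PME ranks per
node could look like:
mpirun -npernode 4 mdrun_mpi -npme 4 -ntomp 4 -gpu_id 01 -deffnm md -v
Here -npme sets the total number of PME-only ranks across the whole run (so
two per node with -npernode 4), and the -gpu_id string only maps the PP ranks
on each node, hence just two digits.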

> Thank you Mark and Szilárd for your invaluable expertise.

Welcome!

--
Szilárd

>
> Best regards,
> João Henriques
>
>
> On Wed, Jun 5, 2013 at 4:21 PM, João Henriques <
> joao.henriques.32353 at gmail.com> wrote:
>
>> Ok, thanks once again. I will do my best to overcome this issue.
>>
>> Best regards,
>> João Henriques
>>
>>
>> On Wed, Jun 5, 2013 at 3:33 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>>
>>> On Wed, Jun 5, 2013 at 2:53 PM, João Henriques <
>>> joao.henriques.32353 at gmail.com> wrote:
>>>
>>> > Sorry to keep bugging you guys, but even after considering all you
>>> > suggested and reading the bugzilla thread Mark pointed out, I'm still
>>> > unable to make the simulation run over multiple nodes.
>>> > *Here is a template of a simple submission over 2 nodes:*
>>> >
>>> > --- START ---
>>> > #!/bin/sh
>>> > #
>>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> > #
>>> > # Job name
>>> > #SBATCH -J md
>>> > #
>>> > # No. of nodes and no. of processors per node
>>> > #SBATCH -N 2
>>> > #SBATCH --exclusive
>>> > #
>>> > # Time needed to complete the job
>>> > #SBATCH -t 48:00:00
>>> > #
>>> > # Add modules
>>> > module load gcc/4.6.3
>>> > module load openmpi/1.6.3/gcc/4.6.3
>>> > module load cuda/5.0
>>> > module load gromacs/4.6
>>> > #
>>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> > #
>>> > grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
>>> > mpirun -np 4 mdrun_mpi -gpu_id 01 -deffnm md -v
>>> > #
>>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> > --- END ---
>>> >
>>> > *Here is an extract of the md.log:*
>>> >
>>> > --- START ---
>>> > Using 4 MPI processes
>>> > Using 4 OpenMP threads per MPI process
>>> >
>>> > Detecting CPU-specific acceleration.
>>> > Present hardware specification:
>>> > Vendor: GenuineIntel
>>> > Brand:  Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>> > Family:  6  Model: 45  Stepping:  7
>>> > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
>>> > pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
>>> > tdt x2apic
>>> > Acceleration most likely to fit this hardware: AVX_256
>>> > Acceleration selected at GROMACS compile time: AVX_256
>>> >
>>> >
>>> > 2 GPUs detected on host en001:
>>> >   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>>> >   #1: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>>> >
>>> >
>>> > -------------------------------------------------------
>>> > Program mdrun_mpi, VERSION 4.6
>>> > Source code file:
>>> > /lunarc/sw/erik/src/gromacs/gromacs-4.6/src/gmxlib/gmx_detect_hardware.c,
>>> > line: 322
>>> >
>>> > Fatal error:
>>> > Incorrect launch configuration: mismatching number of PP MPI processes
>>> > and GPUs per node.
>>> >
>>>
>>> "per node" is critical here.
>>>
>>>
>>> > mdrun_mpi was started with 4 PP MPI processes per node, but you provided
>>> > 2 GPUs.
>>> >
>>>
>>> ...and here. As far as mdrun_mpi can tell from the MPI system, there are
>>> only MPI ranks on this one node.
>>>
>>> > For more information and tips for troubleshooting, please check the GROMACS
>>> > website at http://www.gromacs.org/Documentation/Errors
>>> > -------------------------------------------------------
>>> > --- END ---
>>> >
>>> > As you can see, gmx is having trouble understanding that there's a second
>>> > node available. Note that since I did not specify -ntomp, it assigned 4
>>> > threads to each of the 4 MPI processes (filling all 16 available CPUs *on
>>> > one node*).
>>> > For the exact same submission, if I do set "-ntomp 8" (since I have 4 MPI
>>> > procs * 8 OpenMP threads = 32 CPUs total on the 2 nodes), I get a warning
>>> > telling me that I'm hyperthreading, which can only mean that *gmx is
>>> > assigning all processes to the first node once again*.
>>> > Am I doing something wrong, or is there some problem with gmx-4.6? I guess
>>> > it can only be my fault, since I've never seen anyone else complaining
>>> > about the same issue here.
>>> >
>>>
>>> Assigning MPI processes to nodes is a matter of configuring your MPI. GROMACS
>>> just follows the rank placement it gets from the MPI system - hence the
>>> oversubscription. If you assign two MPI processes to each node, then things
>>> should work.
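>>>
>>> As a sketch only (assuming your SLURM + Open MPI setup; the exact flags
>>> differ between sites), request two tasks per node in the batch script and
>>> tell mpirun the same:
>>>
>>> #SBATCH -N 2
>>> #SBATCH --ntasks-per-node=2
>>> mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v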
>>>
>>> Mark
>>
>>
>>
>> --
>> João Henriques
>>
>
>
>
> --
> João Henriques


