[gmx-users] GPU low performance

Carmen Di Giovanni cdigiova at unina.it
Fri Feb 27 11:04:03 CET 2015


I report the changes made to improve the performance of a molecular dynamics
run on a protein of 1925 atoms, running on an NVIDIA Tesla K20 GPU:



- To limit the number of cores used in the calculation and to pin the threads
  for better performance (options -ntomp 16 and -pin on):

  "gmx_mpi mdrun ... -ntomp 16 -pin on"

  where -ntomp is the number of OpenMP threads.



- The GPU application clock frequency was increased from the default 705 MHz to
  758 MHz using the NVIDIA management tool (nvidia-smi).


- To avoid wasting runtime calculating energies every step, in the .mdp file:

  nstcalcenergy = -1



The performance is now about 7 ns/day, compared with 2 ns/day before these
changes.
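
Putting these together, a minimal sketch of the whole setup, based only on the
commands discussed in this thread (the persistence-mode step and the run name
prod_20ns come from the messages quoted below; adapt them to your own files):

  nvidia-smi -pm 1            # keep the clock settings across driver unloads
  nvidia-smi -ac 2600,758     # application clocks: MEM 2600 MHz, SM 758 MHz
  gmx_mpi mdrun -deffnm prod_20ns -ntomp 16 -pin on

and in the .mdp file:

  nstcalcenergy = -1          ; calculate energies only when needed, not every step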

Carmen





----- Original Message ----- 
From: "Szilárd Páll" <pall.szilard at gmail.com>
To: "Carmen Di Giovanni" <cdigiova at unina.it>
Cc: "Discussion list for GROMACS users" <gmx-users at gromacs.org>
Sent: Friday, February 20, 2015 1:25 AM
Subject: Re: [gmx-users] GPU low performance


Please consult the manual and wiki.


--
Szilárd


On Thu, Feb 19, 2015 at 6:44 PM, Carmen Di Giovanni <cdigiova at unina.it> 
wrote:
>
> Szilard,
> about:
>
> Fatal error
> 1) Setting the number of thread-MPI threads is only supported with
> thread-MPI
> and Gromacs was compiled without thread-MPI
> For more information and tips for troubleshooting, please check the 
> GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
> The error quite clearly explains that you're trying to use mdrun's
> built-in thread-MPI parallelization, but you have a binary that does
> not support it. Use the MPI launching syntax instead.
>
> Can you help me with the MPI launching syntax? What is the suitable
> command?

A previous poster has already pointed you to the "Acceleration and
parallelization" page, which I believe describes the matter in detail.
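
For illustration only, a minimal sketch of that launching syntax with an
MPI-enabled binary, assuming an Open MPI mpirun and mirroring the -ntmpi 8 /
-gpu_id command that fails further below:

  mpirun -np 8 gmx_mpi mdrun -deffnm nvt -gpu_id 00001111

Here the launcher's -np option takes over the role of mdrun's -ntmpi, which is
only available in the thread-MPI build.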

>
>
> 2) Have you looked at the performance table at the end of the log?
> You are wasting a large amount of runtime calculating energies every
> step and this overhead comes in multiple places in the code - one of
> them being the non-timed code parts which typically take <3%.
>
>
> How can I reduce the runtime spent calculating the energies every step?
> Do I need to modify something in the mdp file?

This is discussed thoroughly in the manual; you should be looking for
the nstcalcenergy option.
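
For illustration, the relevant .mdp line might look like the following (the -1
value matches the summary at the top of this email and lets mdrun calculate
energies only when needed rather than every step):

  nstcalcenergy = -1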

>
> Thank you in advance
>
> Carmen
> --
> Carmen Di Giovanni, PhD
> Dept. of Pharmaceutical and Toxicological Chemistry
> "Drug Discovery Lab"
> University of Naples "Federico II"
> Via D. Montesano, 49
> 80131 Naples
> Tel.: ++39 081 678623
> Fax: ++39 081 678100
> Email: cdigiova at unina.it
>
>
>
> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>
>> On Thu, Feb 19, 2015 at 11:32 AM, Carmen Di Giovanni <cdigiova at unina.it>
>> wrote:
>>>
>>> Dear Szilárd,
>>>
>>> 1) the output of command nvidia-smi -ac 2600,758 is
>>>
>>> [root at localhost test_gpu]# nvidia-smi -ac 2600,758
>>> Applications clocks set to "(MEM 2600, SM 758)" for GPU 0000:03:00.0
>>>
>>> Warning: persistence mode is disabled on this device. This settings will
>>> go
>>> back to default as soon as driver unloads (e.g. last application like
>>> nvidia-smi or cuda application terminates). Run with [--help | -h] 
>>> switch
>>> to
>>> get more information on how to enable persistence mode.
>>
>>
>> run nvidia-smi -pm 1 if you want to avoid that.
>>
>>> Setting applications clocks is not supported for GPU 0000:82:00.0.
>>> Treating as warning and moving on.
>>> All done.
>>>
>>> ----------------------------------------------------------------------------
>>> 2) I decreased nstlist to 20.
>>> However, when I run the command:
>>>  gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
>>> it gives me a fatal error:
>>>
>>> GROMACS:      gmx mdrun, VERSION 5.0
>>> Executable:   /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>> Library dir:  /opt/SW/gromacs-5.0/share/top
>>> Command line:
>>>   gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
>>>
>>>
>>> Back Off! I just backed up nvt.log to ./#nvt.log.8#
>>> Reading file nvt.tpr, VERSION 5.0 (single precision)
>>> Changing nstlist from 10 to 40, rlist from 1 to 1.097
>>>
>>>
>>> -------------------------------------------------------
>>> Program gmx_mpi, VERSION 5.0
>>> Source code file: /opt/SW/gromacs-5.0/src/programs/mdrun/runner.c, line:
>>> 876
>>>
>>> Fatal error:
>>> Setting the number of thread-MPI threads is only supported with
>>> thread-MPI
>>> and Gromacs was compiled without thread-MPI
>>> For more information and tips for troubleshooting, please check the
>>> GROMACS
>>> website at http://www.gromacs.org/Documentation/Errors
>>> -------------------------------------------------------
>>
>>
>> The error quite clearly explains that you're trying to use mdrun's
>> built-in thread-MPI parallelization, but you have a binary that does
>> not support it. Use the MPI launching syntax instead.
>>
>>> Halting program gmx_mpi
>>>
>>> gcq#223: "Jesus Not Only Saves, He Also Frequently Makes Backups." 
>>> (Myron
>>> Bradshaw)
>>>
>>>
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>> with errorcode -1.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>> -------------------------------------------------------------------------
>>>
>>>
>>> 4) I don't understand how I can reduce the "Rest" time
>>
>>
>> Have you looked at the performance table at the end of the log?
>> You are wasting a large amount of runtime calculating energies every
>> step and this overhead comes in multiple places in the code - one of
>> them being the non-timed code parts which typically take <3%.
>>
>> Cheers,
>> --
>> Szilard
>>
>>
>>>
>>> Carmen
>>>
>>>
>>>
>>> --
>>> Carmen Di Giovanni, PhD
>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>> "Drug Discovery Lab"
>>> University of Naples "Federico II"
>>> Via D. Montesano, 49
>>> 80131 Naples
>>> Tel.: ++39 081 678623
>>> Fax: ++39 081 678100
>>> Email: cdigiova at unina.it
>>>
>>>
>>>
>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>
>>>> Please keep the mails on the list.
>>>>
>>>> On Wed, Feb 18, 2015 at 6:32 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>>> wrote:
>>>>>
>>>>>
>>>>> nvidia-smi -q -g 0
>>>>>
>>>>> ==============NVSMI LOG==============
>>>>>
>>>>> Timestamp                           : Wed Feb 18 18:30:01 2015
>>>>> Driver Version                      : 340.24
>>>>>
>>>>> Attached GPUs                       : 2
>>>>> GPU 0000:03:00.0
>>>>>     Product Name                    : Tesla K20c
>>>>
>>>>
>>>> [...
>>>>>
>>>>>
>>>>>     Clocks
>>>>>         Graphics                    : 705 MHz
>>>>>         SM                          : 705 MHz
>>>>>         Memory                      : 2600 MHz
>>>>>     Applications Clocks
>>>>>         Graphics                    : 705 MHz
>>>>>         Memory                      : 2600 MHz
>>>>>     Default Applications Clocks
>>>>>         Graphics                    : 705 MHz
>>>>>         Memory                      : 2600 MHz
>>>>>     Max Clocks
>>>>>         Graphics                    : 758 MHz
>>>>>         SM                          : 758 MHz
>>>>>         Memory                      : 2600 MHz
>>>>
>>>>
>>>>
>>>> This is the relevant part I was looking for. The Tesla K20c supports
>>>> setting a so-called application clock, which essentially means that
>>>> you can bump its clock frequency using the NVIDIA management tool
>>>> nvidia-smi from the default 705 MHz to 758 MHz.
>>>>
>>>> Use the command:
>>>> nvidia-smi -ac 2600,758
>>>>
>>>> This should give you another 7% or so (I didn't remember the correct
>>>> max clock before, that's why I guessed 5%).
>>>>
>>>> Cheers,
>>>> Szilard
>>>>
>>>>>     Clock Policy
>>>>>         Auto Boost                  : N/A
>>>>>         Auto Boost Default          : N/A
>>>>>     Compute Processes
>>>>>         Process ID                  : 19441
>>>>>             Name                    : gmx_mpi
>>>>>             Used GPU Memory         : 110 MiB
>>>>>
>>>>> [carmendigi at localhost test_gpu]$
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Carmen Di Giovanni, PhD
>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>> "Drug Discovery Lab"
>>>>> University of Naples "Federico II"
>>>>> Via D. Montesano, 49
>>>>> 80131 Naples
>>>>> Tel.: ++39 081 678623
>>>>> Fax: ++39 081 678100
>>>>> Email: cdigiova at unina.it
>>>>>
>>>>>
>>>>>
>>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>>
>>>>>> As I suggested above please use pastebin.com or similar!
>>>>>> --
>>>>>> Szilárd
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 18, 2015 at 6:09 PM, Carmen Di Giovanni
>>>>>> <cdigiova at unina.it>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dear Szilárd, it's not possible to attach the full log file to the
>>>>>>> forum mail because it is too big.
>>>>>>> I will send it to your private email address.
>>>>>>> Thank you in advance
>>>>>>> Carmen
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Carmen Di Giovanni, PhD
>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>> "Drug Discovery Lab"
>>>>>>> University of Naples "Federico II"
>>>>>>> Via D. Montesano, 49
>>>>>>> 80131 Naples
>>>>>>> Tel.: ++39 081 678623
>>>>>>> Fax: ++39 081 678100
>>>>>>> Email: cdigiova at unina.it
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>>>>
>>>>>>>> We need a *full* log file, not parts of it!
>>>>>>>>
>>>>>>>> You can try running with "-ntomp 16 -pin on" - it may be a bit
>>>>>>>> faster to not use HyperThreading.
>>>>>>>> --
>>>>>>>> Szilárd
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni
>>>>>>>> <cdigiova at unina.it>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Justin,
>>>>>>>>> the problem is evident for all calculations.
>>>>>>>>> This is the log file  of a recent run:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Log file opened on Mon Dec 22 16:28:00 2014
>>>>>>>>> Host: localhost.localdomain  pid: 8378  rank ID: 0  number of
>>>>>>>>> ranks:
>>>>>>>>> 1
>>>>>>>>> GROMACS:    gmx mdrun, VERSION 5.0
>>>>>>>>>
>>>>>>>>> GROMACS is written by:
>>>>>>>>> Emile Apol         Rossen Apostolov   Herman J.C. Berendsen Par
>>>>>>>>> Bjelkmar
>>>>>>>>> Aldert van Buuren  Rudi van Drunen    Anton Feenstra     Sebastian
>>>>>>>>> Fritsch
>>>>>>>>> Gerrit Groenhof    Christoph Junghans Peter Kasson       Carsten
>>>>>>>>> Kutzner
>>>>>>>>> Per Larsson        Justin A. Lemkul   Magnus Lundborg    Pieter
>>>>>>>>> Meulenhoff
>>>>>>>>> Erik Marklund      Teemu Murtola      Szilard Pall       Sander
>>>>>>>>> Pronk
>>>>>>>>> Roland Schulz      Alexey Shvetsov    Michael Shirts     Alfons
>>>>>>>>> Sijbers
>>>>>>>>> Peter Tieleman     Christian Wennberg Maarten Wolf
>>>>>>>>> and the project leaders:
>>>>>>>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>>>>>>>
>>>>>>>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>>>>>>>> Copyright (c) 2001-2014, The GROMACS development team at
>>>>>>>>> Uppsala University, Stockholm University and
>>>>>>>>> the Royal Institute of Technology, Sweden.
>>>>>>>>> check out http://www.gromacs.org for more information.
>>>>>>>>>
>>>>>>>>> GROMACS is free software; you can redistribute it and/or modify it
>>>>>>>>> under the terms of the GNU Lesser General Public License
>>>>>>>>> as published by the Free Software Foundation; either version 2.1
>>>>>>>>> of the License, or (at your option) any later version.
>>>>>>>>>
>>>>>>>>> GROMACS:      gmx mdrun, VERSION 5.0
>>>>>>>>> Executable:   /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>>>>>>>> Library dir:  /opt/SW/gromacs-5.0/share/top
>>>>>>>>> Command line:
>>>>>>>>>   gmx_mpi mdrun -deffnm prod_20ns
>>>>>>>>>
>>>>>>>>> Gromacs version:    VERSION 5.0
>>>>>>>>> Precision:          single
>>>>>>>>> Memory model:       64 bit
>>>>>>>>> MPI library:        MPI
>>>>>>>>> OpenMP support:     enabled
>>>>>>>>> GPU support:        enabled
>>>>>>>>> invsqrt routine:    gmx_software_invsqrt(x)
>>>>>>>>> SIMD instructions:  AVX_256
>>>>>>>>> FFT library:        fftw-3.3.3-sse2
>>>>>>>>> RDTSCP usage:       enabled
>>>>>>>>> C++11 compilation:  disabled
>>>>>>>>> TNG support:        enabled
>>>>>>>>> Tracing support:    disabled
>>>>>>>>> Built on:           Thu Jul 31 18:30:37 CEST 2014
>>>>>>>>> Built by:           root at localhost.localdomain [CMAKE]
>>>>>>>>> Build OS/arch:      Linux 2.6.32-431.el6.x86_64 x86_64
>>>>>>>>> Build CPU vendor:   GenuineIntel
>>>>>>>>> Build CPU brand:    Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>>>> Build CPU family:   6   Model: 62   Stepping: 4
>>>>>>>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt
>>>>>>>>> lahf_lm
>>>>>>>>> mmx
>>>>>>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
>>>>>>>>> sse2
>>>>>>>>> sse3
>>>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>>>> C compiler:         /usr/bin/cc GNU 4.4.7
>>>>>>>>> C compiler flags:    -mavx   -Wno-maybe-uninitialized -Wextra
>>>>>>>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith
>>>>>>>>> -Wall
>>>>>>>>> -Wno-unused -Wunused-value -Wunused-parameter
>>>>>>>>> -fomit-frame-pointer
>>>>>>>>> -funroll-all-loops  -Wno-array-bounds  -O3 -DNDEBUG
>>>>>>>>> C++ compiler:       /usr/bin/c++ GNU 4.4.7
>>>>>>>>> C++ compiler flags:  -mavx   -Wextra
>>>>>>>>> -Wno-missing-field-initializers
>>>>>>>>> -Wpointer-arith -Wall -Wno-unused-function   -fomit-frame-pointer
>>>>>>>>> -funroll-all-loops  -Wno-array-bounds  -O3 -DNDEBUG
>>>>>>>>> Boost version:      1.55.0 (internal)
>>>>>>>>> CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
>>>>>>>>> compiler
>>>>>>>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>>>>>>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0,
>>>>>>>>> V6.0.1
>>>>>>>>> CUDA compiler
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
>>>>>>>>> ;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>>>>>>>> CUDA driver:        6.50
>>>>>>>>> CUDA runtime:       6.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>>>>>>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and
>>>>>>>>> scalable
>>>>>>>>> molecular simulation
>>>>>>>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
>>>>>>>>> H.
>>>>>>>>> J.
>>>>>>>>> C.
>>>>>>>>> Berendsen
>>>>>>>>> GROMACS: Fast, Flexible and Free
>>>>>>>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> E. Lindahl and B. Hess and D. van der Spoel
>>>>>>>>> GROMACS 3.0: A package for molecular simulation and trajectory
>>>>>>>>> analysis
>>>>>>>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>>>>>>>> GROMACS: A message-passing parallel molecular dynamics
>>>>>>>>> implementation
>>>>>>>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For optimal performance with a GPU nstlist (now 10) should be
>>>>>>>>> larger.
>>>>>>>>> The optimum depends on your CPU and GPU resources.
>>>>>>>>> You might want to try several nstlist values.
>>>>>>>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>>>>>>>
>>>>>>>>> Input Parameters:
>>>>>>>>>    integrator                     = md
>>>>>>>>>    tinit                          = 0
>>>>>>>>>    dt                             = 0.002
>>>>>>>>>    nsteps                         = 10000000
>>>>>>>>>    init-step                      = 0
>>>>>>>>>    simulation-part                = 1
>>>>>>>>>    comm-mode                      = Linear
>>>>>>>>>    nstcomm                        = 1
>>>>>>>>>    bd-fric                        = 0
>>>>>>>>>    ld-seed                        = 1993
>>>>>>>>>    emtol                          = 10
>>>>>>>>>    emstep                         = 0.01
>>>>>>>>>    niter                          = 20
>>>>>>>>>    fcstep                         = 0
>>>>>>>>>    nstcgsteep                     = 1000
>>>>>>>>>    nbfgscorr                      = 10
>>>>>>>>>    rtpi                           = 0.05
>>>>>>>>>    nstxout                        = 2500
>>>>>>>>>    nstvout                        = 2500
>>>>>>>>>    nstfout                        = 0
>>>>>>>>>    nstlog                         = 2500
>>>>>>>>>    nstcalcenergy                  = 1
>>>>>>>>>    nstenergy                      = 2500
>>>>>>>>>    nstxout-compressed             = 500
>>>>>>>>>    compressed-x-precision         = 1000
>>>>>>>>>    cutoff-scheme                  = Verlet
>>>>>>>>>    nstlist                        = 40
>>>>>>>>>    ns-type                        = Grid
>>>>>>>>>    pbc                            = xyz
>>>>>>>>>    periodic-molecules             = FALSE
>>>>>>>>>    verlet-buffer-tolerance        = 0.005
>>>>>>>>>    rlist                          = 1.285
>>>>>>>>>    rlistlong                      = 1.285
>>>>>>>>>    nstcalclr                      = 10
>>>>>>>>>    coulombtype                    = PME
>>>>>>>>>    coulomb-modifier               = Potential-shift
>>>>>>>>>    rcoulomb-switch                = 0
>>>>>>>>>    rcoulomb                       = 1.2
>>>>>>>>>    epsilon-r                      = 1
>>>>>>>>>    epsilon-rf                     = 1
>>>>>>>>>    vdw-type                       = Cut-off
>>>>>>>>>    vdw-modifier                   = Potential-shift
>>>>>>>>>    rvdw-switch                    = 0
>>>>>>>>>    rvdw                           = 1.2
>>>>>>>>>    DispCorr                       = No
>>>>>>>>>    table-extension                = 1
>>>>>>>>>    fourierspacing                 = 0.135
>>>>>>>>>    fourier-nx                     = 128
>>>>>>>>>    fourier-ny                     = 128
>>>>>>>>>    fourier-nz                     = 128
>>>>>>>>>    pme-order                      = 4
>>>>>>>>>    ewald-rtol                     = 1e-05
>>>>>>>>>    ewald-rtol-lj                  = 0.001
>>>>>>>>>    lj-pme-comb-rule               = Geometric
>>>>>>>>>    ewald-geometry                 = 0
>>>>>>>>>    epsilon-surface                = 0
>>>>>>>>>    implicit-solvent               = No
>>>>>>>>>    gb-algorithm                   = Still
>>>>>>>>>    nstgbradii                     = 1
>>>>>>>>>    rgbradii                       = 2
>>>>>>>>>    gb-epsilon-solvent             = 80
>>>>>>>>>    gb-saltconc                    = 0
>>>>>>>>>    gb-obc-alpha                   = 1
>>>>>>>>>    gb-obc-beta                    = 0.8
>>>>>>>>>    gb-obc-gamma                   = 4.85
>>>>>>>>>    gb-dielectric-offset           = 0.009
>>>>>>>>>    sa-algorithm                   = Ace-approximation
>>>>>>>>>    sa-surface-tension             = 2.092
>>>>>>>>>    tcoupl                         = V-rescale
>>>>>>>>>    nsttcouple                     = 10
>>>>>>>>>    nh-chain-length                = 0
>>>>>>>>>    print-nose-hoover-chain-variables = FALSE
>>>>>>>>>    pcoupl                         = No
>>>>>>>>>    pcoupltype                     = Semiisotropic
>>>>>>>>>    nstpcouple                     = -1
>>>>>>>>>    tau-p                          = 0.5
>>>>>>>>>    compressibility (3x3):
>>>>>>>>>       compressibility[    0]={ 0.00000e+00,  0.00000e+00,
>>>>>>>>> 0.00000e+00}
>>>>>>>>>       compressibility[    1]={ 0.00000e+00,  0.00000e+00,
>>>>>>>>> 0.00000e+00}
>>>>>>>>>       compressibility[    2]={ 0.00000e+00,  0.00000e+00,
>>>>>>>>> 0.00000e+00}
>>>>>>>>>    ref-p (3x3):
>>>>>>>>>       ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>       ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>    refcoord-scaling               = No
>>>>>>>>>    posres-com (3):
>>>>>>>>>       posres-com[0]= 0.00000e+00
>>>>>>>>>       posres-com[1]= 0.00000e+00
>>>>>>>>>       posres-com[2]= 0.00000e+00
>>>>>>>>>    posres-comB (3):
>>>>>>>>>       posres-comB[0]= 0.00000e+00
>>>>>>>>>       posres-comB[1]= 0.00000e+00
>>>>>>>>>       posres-comB[2]= 0.00000e+00
>>>>>>>>>    QMMM                           = FALSE
>>>>>>>>>    QMconstraints                  = 0
>>>>>>>>>    QMMMscheme                     = 0
>>>>>>>>>    MMChargeScaleFactor            = 1
>>>>>>>>> qm-opts:
>>>>>>>>>    ngQM                           = 0
>>>>>>>>>    constraint-algorithm           = Lincs
>>>>>>>>>    continuation                   = FALSE
>>>>>>>>>    Shake-SOR                      = FALSE
>>>>>>>>>    shake-tol                      = 0.0001
>>>>>>>>>    lincs-order                    = 4
>>>>>>>>>    lincs-iter                     = 1
>>>>>>>>>    lincs-warnangle                = 30
>>>>>>>>>    nwall                          = 0
>>>>>>>>>    wall-type                      = 9-3
>>>>>>>>>    wall-r-linpot                  = -1
>>>>>>>>>    wall-atomtype[0]               = -1
>>>>>>>>>    wall-atomtype[1]               = -1
>>>>>>>>>    wall-density[0]                = 0
>>>>>>>>>    wall-density[1]                = 0
>>>>>>>>>    wall-ewald-zfac                = 3
>>>>>>>>>    pull                           = no
>>>>>>>>>    rotation                       = FALSE
>>>>>>>>>    interactiveMD                  = FALSE
>>>>>>>>>    disre                          = No
>>>>>>>>>    disre-weighting                = Conservative
>>>>>>>>>    disre-mixed                    = FALSE
>>>>>>>>>    dr-fc                          = 1000
>>>>>>>>>    dr-tau                         = 0
>>>>>>>>>    nstdisreout                    = 100
>>>>>>>>>    orire-fc                       = 0
>>>>>>>>>    orire-tau                      = 0
>>>>>>>>>    nstorireout                    = 100
>>>>>>>>>    free-energy                    = no
>>>>>>>>>    cos-acceleration               = 0
>>>>>>>>>    deform (3x3):
>>>>>>>>>       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>>>>>>>    simulated-tempering            = FALSE
>>>>>>>>>    E-x:
>>>>>>>>>       n = 0
>>>>>>>>>    E-xt:
>>>>>>>>>       n = 0
>>>>>>>>>    E-y:
>>>>>>>>>       n = 0
>>>>>>>>>    E-yt:
>>>>>>>>>       n = 0
>>>>>>>>>    E-z:
>>>>>>>>>       n = 0
>>>>>>>>>    E-zt:
>>>>>>>>>       n = 0
>>>>>>>>>    swapcoords                     = no
>>>>>>>>>    adress                         = FALSE
>>>>>>>>>    userint1                       = 0
>>>>>>>>>    userint2                       = 0
>>>>>>>>>    userint3                       = 0
>>>>>>>>>    userint4                       = 0
>>>>>>>>>    userreal1                      = 0
>>>>>>>>>    userreal2                      = 0
>>>>>>>>>    userreal3                      = 0
>>>>>>>>>    userreal4                      = 0
>>>>>>>>> grpopts:
>>>>>>>>>    nrdf:      869226
>>>>>>>>>    ref-t:         300
>>>>>>>>>    tau-t:         0.1
>>>>>>>>> annealing:          No
>>>>>>>>> annealing-npoints:           0
>>>>>>>>>    acc:            0           0           0
>>>>>>>>>    nfreeze:           N           N           N
>>>>>>>>>    energygrp-flags[  0]: 0
>>>>>>>>> Using 1 MPI process
>>>>>>>>> Using 32 OpenMP threads
>>>>>>>>>
>>>>>>>>> Detecting CPU SIMD instructions.
>>>>>>>>> Present hardware specification:
>>>>>>>>> Vendor: GenuineIntel
>>>>>>>>> Brand:  Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>>>> Family:  6  Model: 62  Stepping:  4
>>>>>>>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx 
>>>>>>>>> msr
>>>>>>>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp 
>>>>>>>>> sse2
>>>>>>>>> sse3
>>>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>>>> SIMD instructions most likely to fit this hardware: AVX_256
>>>>>>>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2 GPUs detected on host localhost.localdomain:
>>>>>>>>>   #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat:
>>>>>>>>> compatible
>>>>>>>>>   #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC:  no, stat:
>>>>>>>>> compatible
>>>>>>>>>
>>>>>>>>> 1 GPU auto-selected for this run.
>>>>>>>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi 
>>>>>>>>> started
>>>>>>>>> with
>>>>>>>>> less
>>>>>>>>>       PP MPI process per node than GPUs available.
>>>>>>>>>       Each PP MPI process can use only one GPU, 1 GPU per node 
>>>>>>>>> will
>>>>>>>>> be
>>>>>>>>> used.
>>>>>>>>>
>>>>>>>>> Will do PME sum in reciprocal space for electrostatic 
>>>>>>>>> interactions.
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. 
>>>>>>>>> G.
>>>>>>>>> Pedersen
>>>>>>>>> A smooth particle mesh Ewald method
>>>>>>>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>> Will do ordinary reciprocal space Ewald sum.
>>>>>>>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>>>>>>>> Cut-off's:   NS: 1.285   Coulomb: 1.2   LJ: 1.2
>>>>>>>>> System total charge: -0.012
>>>>>>>>> Generated table with 1142 data points for Ewald.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>> Generated table with 1142 data points for LJ6.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>> Generated table with 1142 data points for LJ12.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>> Generated table with 1142 data points for 1-4 COUL.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>> Generated table with 1142 data points for 1-4 LJ6.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>> Generated table with 1142 data points for 1-4 LJ12.
>>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>>
>>>>>>>>> Using CUDA 8x8 non-bonded kernels
>>>>>>>>>
>>>>>>>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald
>>>>>>>>> -1.000e-05
>>>>>>>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04
>>>>>>>>> size:
>>>>>>>>> 1536
>>>>>>>>>
>>>>>>>>> Removing pbc first time
>>>>>>>>> Pinning threads with an auto-selected logical core stride of 1
>>>>>>>>>
>>>>>>>>> Initializing LINear Constraint Solver
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M.
>>>>>>>>> Fraaije
>>>>>>>>> LINCS: A Linear Constraint Solver for molecular simulations
>>>>>>>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>> The number of constraints is 5913
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> S. Miyamoto and P. A. Kollman
>>>>>>>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms
>>>>>>>>> for
>>>>>>>>> Rigid
>>>>>>>>> Water Models
>>>>>>>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>> Center of mass motion removal mode is Linear
>>>>>>>>> We have the following groups for center of mass motion removal:
>>>>>>>>>   0:  rest
>>>>>>>>>
>>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>>> G. Bussi, D. Donadio and M. Parrinello
>>>>>>>>> Canonical sampling through velocity rescaling
>>>>>>>>> J. Chem. Phys. 126 (2007) pp. 014101
>>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>>
>>>>>>>>> There are: 434658 Atoms
>>>>>>>>>
>>>>>>>>> Constraining the starting coordinates (step 0)
>>>>>>>>>
>>>>>>>>> Constraining the coordinates at t0-dt (step 0)
>>>>>>>>> RMS relative constraint deviation after constraining: 3.67e-05
>>>>>>>>> Initial temperature: 300.5 K
>>>>>>>>>
>>>>>>>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>>>>>>>>            Step           Time         Lambda
>>>>>>>>>               0        0.00000        0.00000
>>>>>>>>>
>>>>>>>>>    Energies (kJ/mol)
>>>>>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14
>>>>>>>>> Coulomb-14
>>>>>>>>>     9.74139e+03    4.34956e+03    2.97359e+03   -1.93107e+02
>>>>>>>>> 8.05534e+04
>>>>>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
>>>>>>>>> Kinetic
>>>>>>>>> En.
>>>>>>>>>     1.01340e+06   -7.13271e+06    2.01361e+04   -6.00175e+06
>>>>>>>>> 1.09887e+06
>>>>>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)
>>>>>>>>> Constr.
>>>>>>>>> rmsd
>>>>>>>>>    -4.90288e+06   -4.90288e+06    3.04092e+02    1.70897e+02
>>>>>>>>> 2.16683e-05
>>>>>>>>>
>>>>>>>>> step   80: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>>>> 6279.0
>>>>>>>>> M-cycles
>>>>>>>>> step  160: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>>>> 6962.2
>>>>>>>>> M-cycles
>>>>>>>>> step  240: timed with pme grid 100 100 100, coulomb cutoff 1.463:
>>>>>>>>> 8406.5
>>>>>>>>> M-cycles
>>>>>>>>> step  320: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>>>> 6424.0
>>>>>>>>> M-cycles
>>>>>>>>> step  400: timed with pme grid 120 120 120, coulomb cutoff 1.219:
>>>>>>>>> 6369.1
>>>>>>>>> M-cycles
>>>>>>>>> step  480: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>>>> 7309.0
>>>>>>>>> M-cycles
>>>>>>>>> step  560: timed with pme grid 108 108 108, coulomb cutoff 1.355:
>>>>>>>>> 7521.2
>>>>>>>>> M-cycles
>>>>>>>>> step  640: timed with pme grid 104 104 104, coulomb cutoff 1.407:
>>>>>>>>> 8369.8
>>>>>>>>> M-cycles
>>>>>>>>>               optimal pme grid 128 128 128, coulomb cutoff 1.200
>>>>>>>>>            Step           Time         Lambda
>>>>>>>>>            2500        5.00000        0.00000
>>>>>>>>>
>>>>>>>>>    Energies (kJ/mol)
>>>>>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14
>>>>>>>>> Coulomb-14
>>>>>>>>>     9.72545e+03    4.33046e+03    2.98087e+03   -1.95794e+02
>>>>>>>>> 8.05967e+04
>>>>>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
>>>>>>>>> Kinetic
>>>>>>>>> En.
>>>>>>>>>     1.01293e+06   -7.13110e+06    2.01689e+04   -6.00057e+06
>>>>>>>>> 1.08489e+06
>>>>>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)
>>>>>>>>> Constr.
>>>>>>>>> rmsd
>>>>>>>>>    -4.91567e+06   -4.90300e+06    3.00225e+02    1.36173e+02
>>>>>>>>> 2.25998e-05
>>>>>>>>>
>>>>>>>>>            Step           Time         Lambda
>>>>>>>>>            5000       10.00000        0.00000
>>>>>>>>>
>>>>>>>>> ............
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you in advance
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>>> "Drug Discovery Lab"
>>>>>>>>> University of Naples "Federico II"
>>>>>>>>> Via D. Montesano, 49
>>>>>>>>> 80131 Naples
>>>>>>>>> Tel.: ++39 081 678623
>>>>>>>>> Fax: ++39 081 678100
>>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What's your exact command?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A full .log file would be even better; it would tell us 
>>>>>>>>>> everything
>>>>>>>>>> we
>>>>>>>>>> need
>>>>>>>>>> to know :)
>>>>>>>>>>
>>>>>>>>>> -Justin
>>>>>>>>>>
>>>>>>>>>>> Have you reviewed this page:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>>>>>>>
>>>>>>>>>>> James "Wes" Barnett
>>>>>>>>>>> Ph.D. Candidate
>>>>>>>>>>> Chemical and Biomolecular Engineering
>>>>>>>>>>>
>>>>>>>>>>> Tulane University
>>>>>>>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>>>>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
>>>>>>>>>>> Carmen
>>>>>>>>>>> Di
>>>>>>>>>>> Giovanni <cdigiova at unina.it>
>>>>>>>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>>>>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>>>>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>>>>>>>
>>>>>>>>>>> I post the message of an MD run:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance
>>>>>>>>>>> causes
>>>>>>>>>>>        performance loss, consider using a shorter cut-off and a
>>>>>>>>>>> finer
>>>>>>>>>>> PME
>>>>>>>>>>> grid.
>>>>>>>>>>>
>>>>>>>>>>> How can I solve this problem?
>>>>>>>>>>> Thank you in advance
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>>>>> "Drug Discovery Lab"
>>>>>>>>>>> University of Naples "Federico II"
>>>>>>>>>>> Via D. Montesano, 49
>>>>>>>>>>> 80131 Naples
>>>>>>>>>>> Tel.: ++39 081 678623
>>>>>>>>>>> Fax: ++39 081 678100
>>>>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>>>>>>>> After a minimization on a protein of 1925 atoms this is the
>>>>>>>>>>>>> message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Minimization is a poor indicator of performance.  Do a real MD
>>>>>>>>>>>> run.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance
>>>>>>>>>>>>> causes
>>>>>>>>>>>>> performance loss.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>>>>>>>> (steps/hour)
>>>>>>>>>>>>> Performance: 8480.2
>>>>>>>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can I improve the performance?
>>>>>>>>>>>>> At the moment I haven't found complete information on the forum to
>>>>>>>>>>>>> solve this problem.
>>>>>>>>>>>>> The log file is attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The list does not accept attachments.  If you wish to share a
>>>>>>>>>>>> file,
>>>>>>>>>>>> upload it to a file-sharing service and provide a URL.  The 
>>>>>>>>>>>> full
>>>>>>>>>>>> .log is quite important for understanding your hardware,
>>>>>>>>>>>> optimizations, and seeing full details of the performance
>>>>>>>>>>>> breakdown.
>>>>>>>>>>>>  But again, base your assessment on MD, not EM.
>>>>>>>>>>>>
>>>>>>>>>>>> -Justin
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ==================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>>>>
>>>>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>>>>> School of Pharmacy
>>>>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>>>>> 20 Penn St.
>>>>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>>>>
>>>>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>>>>
>>>>>>>>>>>> ==================================================
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ==================================================
>>>>>>>>>>
>>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>>
>>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>>> School of Pharmacy
>>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>>> 20 Penn St.
>>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>>
>>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>>
>>>>>>>>>> ==================================================
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>




