<p dir="ltr">Hi,</p>
<p dir="ltr">Are you trying to run several of these at once? If so, you need to manage the details more carefully: three 8-core runs launched with pinning and different GPU IDs will all pin to the same cores by default, leaving 16 of the 24 cores idle. See the examples at <a href="http://manual.gromacs.org/documentation/2016.2/user-guide/mdrun-performance.html">http://manual.gromacs.org/documentation/2016.2/user-guide/mdrun-performance.html</a>. Or better, use the MPI-enabled mdrun to run a multi-simulation and let it get the details right. </p>
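<p dir="ltr">As a rough illustration of getting those details right by hand, each concurrent run can be given its own block of cores via -pinoffset as well as its own GPU ID (the commands are only printed here; the binary name, thread count, and -deffnm names are illustrative, not taken from the log above):</p>

```shell
#!/bin/sh
# Sketch: print launch commands for three concurrent 8-thread runs on a
# 24-core node, each pinned to its own block of cores and its own GPU.
# Without -pinoffset, all runs would pin starting at core 0 and overlap.
for i in 0 1 2; do
    echo "gmx mdrun -nt 8 -pin on -pinoffset $((i * 8)) -gpu_id $i -deffnm run_$i"
done
```

<p dir="ltr">Each command would then be started in the background (or under a job scheduler), so run 0 uses cores 0-7, run 1 cores 8-15, and run 2 cores 16-23, with no cores left idle or oversubscribed.</p>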
<p dir="ltr">Mark</p>
<br><div class="gmail_quote"><div dir="ltr">On Wed, 22 Feb 2017 17:54 Igor Leontyev &lt;<a href="mailto:ileontyev@ucdavis.edu">ileontyev@ucdavis.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> &gt;<br class="gmail_msg">
 &gt; What CPU vs GPU time per step gets reported at the end of the log<br class="gmail_msg">
 &gt; file?<br class="gmail_msg">
<br class="gmail_msg">
Thank you, Berk, for the prompt response. Here is my log file, which<br class="gmail_msg">
provides all the details.<br class="gmail_msg">
<br class="gmail_msg">
=================================================<br class="gmail_msg">
Host: compute-0-113.local  pid: 12081  rank ID: 0  number of ranks:  1<br class="gmail_msg">
                       :-) GROMACS - gmx mdrun, 2016.2 (-:<br class="gmail_msg">
<br class="gmail_msg">
                             GROMACS is written by:<br class="gmail_msg">
...........................................................<br class="gmail_msg">
<br class="gmail_msg">
GROMACS:      gmx mdrun, version 2016.2<br class="gmail_msg">
Executable:<br class="gmail_msg">
/home/leontyev/programs/bin/gromacs/gromacs-2016.2/bin/gmx_avx2_gpu<br class="gmail_msg">
Data prefix:  /home/leontyev/programs/bin/gromacs/gromacs-2016.2<br class="gmail_msg">
Working dir:<br class="gmail_msg">
/share/COMMON2/MDRUNS/GROMACS/MUTATIONS/PROTEINS/coc-Flu_A-B_LIGs/MDRUNS/InP/fluA/Output_test/6829_6818_9/Gromacs.571690<br class="gmail_msg">
Command line:<br class="gmail_msg">
   gmx_avx2_gpu mdrun -nb gpu -gpu_id 3 -pin on -nt 8 -s<br class="gmail_msg">
6829_6818-liq_0.tpr -e<br class="gmail_msg">
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.edr -dhdl<br class="gmail_msg">
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.xvg -o<br class="gmail_msg">
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.trr -x<br class="gmail_msg">
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.xtc -cpo<br class="gmail_msg">
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.cpt -c<br class="gmail_msg">
6829_6818-liq_0.gro -g 6829_6818-liq_0.log<br class="gmail_msg">
<br class="gmail_msg">
GROMACS version:    2016.2<br class="gmail_msg">
Precision:          single<br class="gmail_msg">
Memory model:       64 bit<br class="gmail_msg">
MPI library:        thread_mpi<br class="gmail_msg">
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)<br class="gmail_msg">
GPU support:        CUDA<br class="gmail_msg">
SIMD instructions:  AVX2_256<br class="gmail_msg">
FFT library:        fftw-3.3.4-sse2-avx<br class="gmail_msg">
RDTSCP usage:       enabled<br class="gmail_msg">
TNG support:        enabled<br class="gmail_msg">
Hwloc support:      disabled<br class="gmail_msg">
Tracing support:    disabled<br class="gmail_msg">
Built on:           Mon Feb 20 18:26:54 PST 2017<br class="gmail_msg">
Built by:           <a href="mailto:leontyev@cluster01.interxinc.com" class="gmail_msg" target="_blank">leontyev@cluster01.interxinc.com</a> [CMAKE]<br class="gmail_msg">
Build OS/arch:      Linux 2.6.32-642.el6.x86_64 x86_64<br class="gmail_msg">
Build CPU vendor:   Intel<br class="gmail_msg">
Build CPU brand:    Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz<br class="gmail_msg">
Build CPU family:   6   Model: 45   Stepping: 7<br class="gmail_msg">
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf mmx msr<br class="gmail_msg">
nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3<br class="gmail_msg">
sse4.1 sse4.2 ssse3 tdt x2apic<br class="gmail_msg">
C compiler:         /share/apps/devtoolset-1.1/root/usr/bin/gcc GNU 4.7.2<br class="gmail_msg">
C compiler flags:    -march=core-avx2   -static-libgcc -static-libstdc++<br class="gmail_msg">
   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast<br class="gmail_msg">
C++ compiler:       /share/apps/devtoolset-1.1/root/usr/bin/g++ GNU 4.7.2<br class="gmail_msg">
C++ compiler flags:  -march=core-avx2    -std=c++0x   -O3 -DNDEBUG<br class="gmail_msg">
-funroll-all-loops -fexcess-precision=fast<br class="gmail_msg">
CUDA compiler:      /share/apps/cuda-8.0/bin/nvcc nvcc: NVIDIA (R) Cuda<br class="gmail_msg">
compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on<br class="gmail_msg">
Sun_Sep__4_22:14:01_CDT_2016;Cuda compilation tools, release 8.0, V8.0.44<br class="gmail_msg">
CUDA compiler<br class="gmail_msg">
flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;;-Xcompiler;,-march=core-avx2,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;<br class="gmail_msg">
<br class="gmail_msg">
CUDA driver:        8.0<br class="gmail_msg">
CUDA runtime:       8.0<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
Running on 1 node with total 24 cores, 24 logical cores, 4 compatible GPUs<br class="gmail_msg">
Hardware detected:<br class="gmail_msg">
   CPU info:<br class="gmail_msg">
     Vendor: Intel<br class="gmail_msg">
     Brand:  Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz<br class="gmail_msg">
     Family: 6   Model: 63   Stepping: 2<br class="gmail_msg">
     Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf<br class="gmail_msg">
mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp<br class="gmail_msg">
sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic<br class="gmail_msg">
     SIMD instructions most likely to fit this hardware: AVX2_256<br class="gmail_msg">
     SIMD instructions selected at GROMACS compile time: AVX2_256<br class="gmail_msg">
<br class="gmail_msg">
   Hardware topology: Basic<br class="gmail_msg">
     Sockets, cores, and logical processors:<br class="gmail_msg">
       Socket  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [<br class="gmail_msg">
7] [   8] [   9] [  10] [  11]<br class="gmail_msg">
       Socket  1: [  12] [  13] [  14] [  15] [  16] [  17] [  18] [<br class="gmail_msg">
19] [  20] [  21] [  22] [  23]<br class="gmail_msg">
   GPU info:<br class="gmail_msg">
     Number of GPUs detected: 4<br class="gmail_msg">
     #0: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat:<br class="gmail_msg">
compatible<br class="gmail_msg">
     #1: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat:<br class="gmail_msg">
compatible<br class="gmail_msg">
     #2: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat:<br class="gmail_msg">
compatible<br class="gmail_msg">
     #3: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat:<br class="gmail_msg">
compatible<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
For optimal performance with a GPU nstlist (now 10) should be larger.<br class="gmail_msg">
The optimum depends on your CPU and GPU resources.<br class="gmail_msg">
You might want to try several nstlist values.<br class="gmail_msg">
Changing nstlist from 10 to 40, rlist from 0.9 to 0.932<br class="gmail_msg">
<br class="gmail_msg">
Input Parameters:<br class="gmail_msg">
    integrator                     = sd<br class="gmail_msg">
    tinit                          = 0<br class="gmail_msg">
    dt                             = 0.001<br class="gmail_msg">
    nsteps                         = 10000<br class="gmail_msg">
    init-step                      = 0<br class="gmail_msg">
    simulation-part                = 1<br class="gmail_msg">
    comm-mode                      = Linear<br class="gmail_msg">
    nstcomm                        = 100<br class="gmail_msg">
    bd-fric                        = 0<br class="gmail_msg">
    ld-seed                        = 1103660843<br class="gmail_msg">
    emtol                          = 10<br class="gmail_msg">
    emstep                         = 0.01<br class="gmail_msg">
    niter                          = 20<br class="gmail_msg">
    fcstep                         = 0<br class="gmail_msg">
    nstcgsteep                     = 1000<br class="gmail_msg">
    nbfgscorr                      = 10<br class="gmail_msg">
    rtpi                           = 0.05<br class="gmail_msg">
    nstxout                        = 10000000<br class="gmail_msg">
    nstvout                        = 10000000<br class="gmail_msg">
    nstfout                        = 0<br class="gmail_msg">
    nstlog                         = 20000<br class="gmail_msg">
    nstcalcenergy                  = 100<br class="gmail_msg">
    nstenergy                      = 1000<br class="gmail_msg">
    nstxout-compressed             = 5000<br class="gmail_msg">
    compressed-x-precision         = 1000<br class="gmail_msg">
    cutoff-scheme                  = Verlet<br class="gmail_msg">
    nstlist                        = 40<br class="gmail_msg">
    ns-type                        = Grid<br class="gmail_msg">
    pbc                            = xyz<br class="gmail_msg">
    periodic-molecules             = false<br class="gmail_msg">
    verlet-buffer-tolerance        = 0.005<br class="gmail_msg">
    rlist                          = 0.932<br class="gmail_msg">
    coulombtype                    = PME<br class="gmail_msg">
    coulomb-modifier               = Potential-shift<br class="gmail_msg">
    rcoulomb-switch                = 0.9<br class="gmail_msg">
    rcoulomb                       = 0.9<br class="gmail_msg">
    epsilon-r                      = 1<br class="gmail_msg">
    epsilon-rf                     = inf<br class="gmail_msg">
    vdw-type                       = Cut-off<br class="gmail_msg">
    vdw-modifier                   = Potential-shift<br class="gmail_msg">
    rvdw-switch                    = 0.9<br class="gmail_msg">
    rvdw                           = 0.9<br class="gmail_msg">
    DispCorr                       = EnerPres<br class="gmail_msg">
    table-extension                = 1<br class="gmail_msg">
    fourierspacing                 = 0.12<br class="gmail_msg">
    fourier-nx                     = 42<br class="gmail_msg">
    fourier-ny                     = 42<br class="gmail_msg">
    fourier-nz                     = 40<br class="gmail_msg">
    pme-order                      = 6<br class="gmail_msg">
    ewald-rtol                     = 1e-05<br class="gmail_msg">
    ewald-rtol-lj                  = 0.001<br class="gmail_msg">
    lj-pme-comb-rule               = Geometric<br class="gmail_msg">
    ewald-geometry                 = 0<br class="gmail_msg">
    epsilon-surface                = 0<br class="gmail_msg">
    implicit-solvent               = No<br class="gmail_msg">
    gb-algorithm                   = Still<br class="gmail_msg">
    nstgbradii                     = 1<br class="gmail_msg">
    rgbradii                       = 1<br class="gmail_msg">
    gb-epsilon-solvent             = 80<br class="gmail_msg">
    gb-saltconc                    = 0<br class="gmail_msg">
    gb-obc-alpha                   = 1<br class="gmail_msg">
    gb-obc-beta                    = 0.8<br class="gmail_msg">
    gb-obc-gamma                   = 4.85<br class="gmail_msg">
    gb-dielectric-offset           = 0.009<br class="gmail_msg">
    sa-algorithm                   = Ace-approximation<br class="gmail_msg">
    sa-surface-tension             = 2.05016<br class="gmail_msg">
    tcoupl                         = No<br class="gmail_msg">
    nsttcouple                     = 5<br class="gmail_msg">
    nh-chain-length                = 0<br class="gmail_msg">
    print-nose-hoover-chain-variables = false<br class="gmail_msg">
    pcoupl                         = Parrinello-Rahman<br class="gmail_msg">
    pcoupltype                     = Isotropic<br class="gmail_msg">
    nstpcouple                     = 5<br class="gmail_msg">
    tau-p                          = 0.5<br class="gmail_msg">
    compressibility (3x3):<br class="gmail_msg">
       compressibility[    0]={ 5.00000e-05,  0.00000e+00,  0.00000e+00}<br class="gmail_msg">
       compressibility[    1]={ 0.00000e+00,  5.00000e-05,  0.00000e+00}<br class="gmail_msg">
       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  5.00000e-05}<br class="gmail_msg">
    ref-p (3x3):<br class="gmail_msg">
       ref-p[    0]={ 1.01325e+00,  0.00000e+00,  0.00000e+00}<br class="gmail_msg">
       ref-p[    1]={ 0.00000e+00,  1.01325e+00,  0.00000e+00}<br class="gmail_msg">
       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.01325e+00}<br class="gmail_msg">
    refcoord-scaling               = All<br class="gmail_msg">
    posres-com (3):<br class="gmail_msg">
       posres-com[0]= 0.00000e+00<br class="gmail_msg">
       posres-com[1]= 0.00000e+00<br class="gmail_msg">
       posres-com[2]= 0.00000e+00<br class="gmail_msg">
    posres-comB (3):<br class="gmail_msg">
       posres-comB[0]= 0.00000e+00<br class="gmail_msg">
       posres-comB[1]= 0.00000e+00<br class="gmail_msg">
       posres-comB[2]= 0.00000e+00<br class="gmail_msg">
    QMMM                           = false<br class="gmail_msg">
    QMconstraints                  = 0<br class="gmail_msg">
    QMMMscheme                     = 0<br class="gmail_msg">
    MMChargeScaleFactor            = 1<br class="gmail_msg">
qm-opts:<br class="gmail_msg">
    ngQM                           = 0<br class="gmail_msg">
    constraint-algorithm           = Lincs<br class="gmail_msg">
    continuation                   = false<br class="gmail_msg">
    Shake-SOR                      = false<br class="gmail_msg">
    shake-tol                      = 0.0001<br class="gmail_msg">
    lincs-order                    = 12<br class="gmail_msg">
    lincs-iter                     = 1<br class="gmail_msg">
    lincs-warnangle                = 30<br class="gmail_msg">
    nwall                          = 0<br class="gmail_msg">
    wall-type                      = 9-3<br class="gmail_msg">
    wall-r-linpot                  = -1<br class="gmail_msg">
    wall-atomtype[0]               = -1<br class="gmail_msg">
    wall-atomtype[1]               = -1<br class="gmail_msg">
    wall-density[0]                = 0<br class="gmail_msg">
    wall-density[1]                = 0<br class="gmail_msg">
    wall-ewald-zfac                = 3<br class="gmail_msg">
    pull                           = false<br class="gmail_msg">
    rotation                       = false<br class="gmail_msg">
    interactiveMD                  = false<br class="gmail_msg">
    disre                          = No<br class="gmail_msg">
    disre-weighting                = Conservative<br class="gmail_msg">
    disre-mixed                    = false<br class="gmail_msg">
    dr-fc                          = 1000<br class="gmail_msg">
    dr-tau                         = 0<br class="gmail_msg">
    nstdisreout                    = 100<br class="gmail_msg">
    orire-fc                       = 0<br class="gmail_msg">
    orire-tau                      = 0<br class="gmail_msg">
    nstorireout                    = 100<br class="gmail_msg">
    free-energy                    = yes<br class="gmail_msg">
    init-lambda                    = -1<br class="gmail_msg">
    init-lambda-state              = 0<br class="gmail_msg">
    delta-lambda                   = 0<br class="gmail_msg">
    nstdhdl                        = 100<br class="gmail_msg">
    n-lambdas                      = 13<br class="gmail_msg">
    separate-dvdl:<br class="gmail_msg">
        fep-lambdas =   FALSE<br class="gmail_msg">
       mass-lambdas =   FALSE<br class="gmail_msg">
       coul-lambdas =   TRUE<br class="gmail_msg">
        vdw-lambdas =   TRUE<br class="gmail_msg">
     bonded-lambdas =   TRUE<br class="gmail_msg">
  restraint-lambdas =   FALSE<br class="gmail_msg">
temperature-lambdas =   FALSE<br class="gmail_msg">
all-lambdas:<br class="gmail_msg">
        fep-lambdas =            0           0           0           0<br class="gmail_msg">
         0           0           0           0           0           0<br class="gmail_msg">
         0           0           0<br class="gmail_msg">
       mass-lambdas =            0           0           0           0<br class="gmail_msg">
         0           0           0           0           0           0<br class="gmail_msg">
         0           0           0<br class="gmail_msg">
       coul-lambdas =            0        0.03         0.1         0.2<br class="gmail_msg">
       0.3         0.4         0.5         0.6         0.7         0.8<br class="gmail_msg">
       0.9        0.97           1<br class="gmail_msg">
        vdw-lambdas =            0        0.03         0.1         0.2<br class="gmail_msg">
       0.3         0.4         0.5         0.6         0.7         0.8<br class="gmail_msg">
       0.9        0.97           1<br class="gmail_msg">
     bonded-lambdas =            0        0.03         0.1         0.2<br class="gmail_msg">
       0.3         0.4         0.5         0.6         0.7         0.8<br class="gmail_msg">
       0.9        0.97           1<br class="gmail_msg">
  restraint-lambdas =            0           0           0           0<br class="gmail_msg">
         0           0           0           0           0           0<br class="gmail_msg">
         0           0           0<br class="gmail_msg">
temperature-lambdas =            0           0           0           0<br class="gmail_msg">
         0           0           0           0           0           0<br class="gmail_msg">
         0           0           0<br class="gmail_msg">
    calc-lambda-neighbors          = -1<br class="gmail_msg">
    dhdl-print-energy              = potential<br class="gmail_msg">
    sc-alpha                       = 0.1<br class="gmail_msg">
    sc-power                       = 1<br class="gmail_msg">
    sc-r-power                     = 6<br class="gmail_msg">
    sc-sigma                       = 0.3<br class="gmail_msg">
    sc-sigma-min                   = 0.3<br class="gmail_msg">
    sc-coul                        = true<br class="gmail_msg">
    dh-hist-size                   = 0<br class="gmail_msg">
    dh-hist-spacing                = 0.1<br class="gmail_msg">
    separate-dhdl-file             = yes<br class="gmail_msg">
    dhdl-derivatives               = yes<br class="gmail_msg">
    cos-acceleration               = 0<br class="gmail_msg">
    deform (3x3):<br class="gmail_msg">
       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}<br class="gmail_msg">
       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}<br class="gmail_msg">
       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}<br class="gmail_msg">
    simulated-tempering            = false<br class="gmail_msg">
    E-x:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    E-xt:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    E-y:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    E-yt:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    E-z:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    E-zt:<br class="gmail_msg">
       n = 0<br class="gmail_msg">
    swapcoords                     = no<br class="gmail_msg">
    userint1                       = 0<br class="gmail_msg">
    userint2                       = 0<br class="gmail_msg">
    userint3                       = 0<br class="gmail_msg">
    userint4                       = 0<br class="gmail_msg">
    userreal1                      = 0<br class="gmail_msg">
    userreal2                      = 0<br class="gmail_msg">
    userreal3                      = 0<br class="gmail_msg">
    userreal4                      = 0<br class="gmail_msg">
grpopts:<br class="gmail_msg">
    nrdf:     6332.24     62.9925     18705.8<br class="gmail_msg">
    ref-t:      298.15      298.15      298.15<br class="gmail_msg">
    tau-t:           1           1           1<br class="gmail_msg">
annealing:          No          No          No<br class="gmail_msg">
annealing-npoints:           0           0           0<br class="gmail_msg">
    acc:                   0           0           0<br class="gmail_msg">
    nfreeze:           N           N           N<br class="gmail_msg">
    energygrp-flags[  0]: 0<br class="gmail_msg">
<br class="gmail_msg">
Using 1 MPI thread<br class="gmail_msg">
Using 8 OpenMP threads<br class="gmail_msg">
<br class="gmail_msg">
1 GPU user-selected for this run.<br class="gmail_msg">
Mapping of GPU ID to the 1 PP rank in this node: 3<br class="gmail_msg">
<br class="gmail_msg">
Will do PME sum in reciprocal space for electrostatic interactions.<br class="gmail_msg">
<br class="gmail_msg">
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++<br class="gmail_msg">
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.<br class="gmail_msg">
Pedersen<br class="gmail_msg">
A smooth particle mesh Ewald method<br class="gmail_msg">
J. Chem. Phys. 103 (1995) pp. 8577-8592<br class="gmail_msg">
-------- -------- --- Thank You --- -------- --------<br class="gmail_msg">
<br class="gmail_msg">
Will do ordinary reciprocal space Ewald sum.<br class="gmail_msg">
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald<br class="gmail_msg">
Cut-off&#39;s:   NS: 0.932   Coulomb: 0.9   LJ: 0.9<br class="gmail_msg">
Long Range LJ corr.: &lt;C6&gt; 3.6183e-04<br class="gmail_msg">
System total charge, top. A: 7.000 top. B: 7.000<br class="gmail_msg">
Generated table with 965 data points for Ewald.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Generated table with 965 data points for LJ6.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Generated table with 965 data points for LJ12.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Generated table with 965 data points for 1-4 COUL.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Generated table with 965 data points for 1-4 LJ6.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Generated table with 965 data points for 1-4 LJ12.<br class="gmail_msg">
Tabscale = 500 points/nm<br class="gmail_msg">
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05<br class="gmail_msg">
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
Using GPU 8x8 non-bonded kernels<br class="gmail_msg">
<br class="gmail_msg">
Using Lorentz-Berthelot Lennard-Jones combination rule<br class="gmail_msg">
<br class="gmail_msg">
There are 21 atoms and 21 charges for free energy perturbation<br class="gmail_msg">
Removing pbc first time<br class="gmail_msg">
Pinning threads with an auto-selected logical core stride of 1<br class="gmail_msg">
<br class="gmail_msg">
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++<br class="gmail_msg">
S. Miyamoto and P. A. Kollman<br class="gmail_msg">
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid<br class="gmail_msg">
Water Models<br class="gmail_msg">
J. Comp. Chem. 13 (1992) pp. 952-962<br class="gmail_msg">
-------- -------- --- Thank You --- -------- --------<br class="gmail_msg">
<br class="gmail_msg">
Intra-simulation communication will occur every 5 steps.<br class="gmail_msg">
Initial vector of lambda components:[     0.0000     0.0000     0.0000<br class="gmail_msg">
   0.0000     0.0000     0.0000     0.0000 ]<br class="gmail_msg">
Center of mass motion removal mode is Linear<br class="gmail_msg">
We have the following groups for center of mass motion removal:<br class="gmail_msg">
   0:  rest<br class="gmail_msg">
<br class="gmail_msg">
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++<br class="gmail_msg">
N. Goga and A. J. Rzepiela and A. H. de Vries and S. J. Marrink and H. J. C.<br class="gmail_msg">
Berendsen<br class="gmail_msg">
Efficient Algorithms for Langevin and DPD Dynamics<br class="gmail_msg">
J. Chem. Theory Comput. 8 (2012) pp. 3637--3649<br class="gmail_msg">
-------- -------- --- Thank You --- -------- --------<br class="gmail_msg">
<br class="gmail_msg">
There are: 11486 Atoms<br class="gmail_msg">
<br class="gmail_msg">
Constraining the starting coordinates (step 0)<br class="gmail_msg">
<br class="gmail_msg">
Constraining the coordinates at t0-dt (step 0)<br class="gmail_msg">
RMS relative constraint deviation after constraining: 0.00e+00<br class="gmail_msg">
Initial temperature: 291.365 K<br class="gmail_msg">
<br class="gmail_msg">
Started mdrun on rank 0 Wed Feb 22 02:11:02 2017<br class="gmail_msg">
            Step           Time<br class="gmail_msg">
               0        0.00000<br class="gmail_msg">
<br class="gmail_msg">
    Energies (kJ/mol)<br class="gmail_msg">
            Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.<br class="gmail_msg">
     2.99018e+03    4.09043e+03    5.20416e+03    4.32600e+01    2.38045e+02<br class="gmail_msg">
           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)<br class="gmail_msg">
     2.04778e+03    1.45523e+04    1.59846e+04   -2.41317e+03   -1.92125e+05<br class="gmail_msg">
    Coul. recip. Position Rest.      Potential    Kinetic En.   Total Energy<br class="gmail_msg">
     1.58368e+03    2.08367e-09   -1.47804e+05    3.03783e+04   -1.17425e+05<br class="gmail_msg">
     Temperature Pres. DC (bar) Pressure (bar)      dVcoul/dl       dVvdw/dl<br class="gmail_msg">
     2.91118e+02   -3.53694e+02   -3.01252e+02    4.77627e+02    1.41810e+01<br class="gmail_msg">
     dVbonded/dl<br class="gmail_msg">
    -2.15074e+01<br class="gmail_msg">
<br class="gmail_msg">
step   80: timed with pme grid 42 42 40, coulomb cutoff 0.900: 391.6<br class="gmail_msg">
M-cycles<br class="gmail_msg">
step  160: timed with pme grid 36 36 36, coulomb cutoff 1.043: 595.7<br class="gmail_msg">
M-cycles<br class="gmail_msg">
step  240: timed with pme grid 40 36 36, coulomb cutoff 1.022: 401.1<br class="gmail_msg">
M-cycles<br class="gmail_msg">
step  320: timed with pme grid 40 40 36, coulomb cutoff 0.963: 318.8<br class="gmail_msg">
M-cycles<br class="gmail_msg">
step  400: timed with pme grid 40 40 40, coulomb cutoff 0.938: 349.9<br class="gmail_msg">
M-cycles<br class="gmail_msg">
step  480: timed with pme grid 42 40 40, coulomb cutoff 0.920: 319.9<br class="gmail_msg">
M-cycles<br class="gmail_msg">
               optimal pme grid 40 40 36, coulomb cutoff 0.963<br class="gmail_msg">
            Step           Time<br class="gmail_msg">
           10000       10.00000<br class="gmail_msg">
<br class="gmail_msg">
Writing checkpoint, step 10000 at Wed Feb 22 02:11:41 2017<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
    Energies (kJ/mol)<br class="gmail_msg">
            Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.<br class="gmail_msg">
     2.99123e+03    4.14451e+03    5.19572e+03    2.56045e+01    2.74109e+02<br class="gmail_msg">
           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)<br class="gmail_msg">
     2.01371e+03    1.45326e+04    1.55974e+04   -2.43903e+03   -1.88805e+05<br class="gmail_msg">
    Coul. recip. Position Rest.      Potential    Kinetic En.   Total Energy<br class="gmail_msg">
     1.26353e+03    7.39689e+01   -1.45132e+05    3.14390e+04   -1.13693e+05<br class="gmail_msg">
     Temperature Pres. DC (bar) Pressure (bar)      dVcoul/dl       dVvdw/dl<br class="gmail_msg">
     3.01283e+02   -3.61306e+02    1.35461e+02    3.46732e+02    1.03533e+01<br class="gmail_msg">
     dVbonded/dl<br class="gmail_msg">
    -1.08537e+01<br class="gmail_msg">
<br class="gmail_msg">
        &lt;======  ###############  ==&gt;<br class="gmail_msg">
        &lt;====  A V E R A G E S  ====&gt;<br class="gmail_msg">
        &lt;==  ###############  ======&gt;<br class="gmail_msg">
<br class="gmail_msg">
        Statistics over 10001 steps using 101 frames<br class="gmail_msg">
<br class="gmail_msg">
    Energies (kJ/mol)<br class="gmail_msg">
            Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.<br class="gmail_msg">
     3.01465e+03    4.25438e+03    5.23249e+03    3.47157e+01    2.59375e+02<br class="gmail_msg">
           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)<br class="gmail_msg">
     2.02486e+03    1.45795e+04    1.58085e+04   -2.42589e+03   -1.89788e+05<br class="gmail_msg">
    Coul. recip. Position Rest.      Potential    Kinetic En.   Total Energy<br class="gmail_msg">
     1.28411e+03    6.08802e+01   -1.45660e+05    3.09346e+04   -1.14726e+05<br class="gmail_msg">
     Temperature Pres. DC (bar) Pressure (bar)      dVcoul/dl       dVvdw/dl<br class="gmail_msg">
     2.96448e+02   -3.57435e+02    3.32252e+01    4.36060e+02    1.77368e+01<br class="gmail_msg">
     dVbonded/dl<br class="gmail_msg">
    -1.82384e+01<br class="gmail_msg">
<br class="gmail_msg">
           Box-X          Box-Y          Box-Z<br class="gmail_msg">
     4.99607e+00    4.89654e+00    4.61444e+00<br class="gmail_msg">
<br class="gmail_msg">
    Total Virial (kJ/mol)<br class="gmail_msg">
     1.00345e+04    5.03211e+01   -1.17351e+02<br class="gmail_msg">
     4.69630e+01    1.04021e+04    1.73033e+02<br class="gmail_msg">
    -1.16637e+02    1.75781e+02    1.01673e+04<br class="gmail_msg">
<br class="gmail_msg">
    Pressure (bar)<br class="gmail_msg">
     7.67740e+01   -1.32678e+01    3.58518e+01<br class="gmail_msg">
    -1.22810e+01   -2.15571e+01   -5.79828e+01<br class="gmail_msg">
     3.56420e+01   -5.87931e+01    4.44585e+01<br class="gmail_msg">
<br class="gmail_msg">
       T-Protein          T-LIG          T-SOL<br class="gmail_msg">
     2.98707e+02    2.97436e+02    2.95680e+02<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
        P P   -   P M E   L O A D   B A L A N C I N G<br class="gmail_msg">
<br class="gmail_msg">
  PP/PME load balancing changed the cut-off and PME settings:<br class="gmail_msg">
            particle-particle                    PME<br class="gmail_msg">
             rcoulomb  rlist            grid      spacing   1/beta<br class="gmail_msg">
    initial  0.900 nm  0.932 nm      42  42  40   0.119 nm  0.288 nm<br class="gmail_msg">
    final    0.963 nm  0.995 nm      40  40  36   0.128 nm  0.308 nm<br class="gmail_msg">
  cost-ratio           1.22             0.82<br class="gmail_msg">
  (note that these numbers concern only part of the total PP and PME load)<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
        M E G A - F L O P S   A C C O U N T I N G<br class="gmail_msg">
<br class="gmail_msg">
  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels<br class="gmail_msg">
  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table<br class="gmail_msg">
  W3=SPC/TIP3p  W4=TIP4p (single or pairs)<br class="gmail_msg">
  V&amp;F=Potential and force  V=Potential only  F=Force only<br class="gmail_msg">
<br class="gmail_msg">
  Computing:                               M-Number         M-Flops  % Flops<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  NB Free energy kernel                20441.549154       20441.549     0.3<br class="gmail_msg">
  Pair Search distance check             289.750448        2607.754     0.0<br class="gmail_msg">
  NxN Ewald Elec. + LJ [F]             78217.065728     5162326.338    85.6<br class="gmail_msg">
  NxN Ewald Elec. + LJ [V&amp;F]             798.216192       85409.133     1.4<br class="gmail_msg">
  1,4 nonbonded interactions              55.597769        5003.799     0.1<br class="gmail_msg">
  Calc Weights                           344.614458       12406.120     0.2<br class="gmail_msg">
  Spread Q Bspline                     49624.481952       99248.964     1.6<br class="gmail_msg">
  Gather F Bspline                     49624.481952      297746.892     4.9<br class="gmail_msg">
  3D-FFT                               36508.030372      292064.243     4.8<br class="gmail_msg">
  Solve PME                               31.968000        2045.952     0.0<br class="gmail_msg">
  Shift-X                                  2.882986          17.298     0.0<br class="gmail_msg">
  Bonds                                   21.487804        1267.780     0.0<br class="gmail_msg">
  Angles                                  38.645175        6492.389     0.1<br class="gmail_msg">
  Propers                                 58.750116       13453.777     0.2<br class="gmail_msg">
  Impropers                                4.270427         888.249     0.0<br class="gmail_msg">
  RB-Dihedrals                             0.445700         110.088     0.0<br class="gmail_msg">
  Pos. Restr.                              0.900090          45.005     0.0<br class="gmail_msg">
  Virial                                  23.073531         415.324     0.0<br class="gmail_msg">
  Update                                 114.871486        3561.016     0.1<br class="gmail_msg">
  Stop-CM                                  1.171572          11.716     0.0<br class="gmail_msg">
  Calc-Ekin                               45.966972        1241.108     0.0<br class="gmail_msg">
  Constraint-V                           187.108062        1496.864     0.0<br class="gmail_msg">
  Constraint-Vir                          18.717354         449.216     0.0<br class="gmail_msg">
  Settle                                  62.372472       20146.308     0.3<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Total                                                 6028896.883   100.0<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G<br class="gmail_msg">
<br class="gmail_msg">
On 1 MPI rank, each using 8 OpenMP threads<br class="gmail_msg">
<br class="gmail_msg">
  Computing:          Num   Num      Call    Wall time         Giga-Cycles<br class="gmail_msg">
                      Ranks Threads  Count      (s)         total sum    %<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Neighbor search        1    8        251       0.530          9.754   1.4<br class="gmail_msg">
  Launch GPU ops.        1    8      10001       0.509          9.357   1.3<br class="gmail_msg">
  Force                  1    8      10001      10.634        195.662  27.3<br class="gmail_msg">
  PME mesh               1    8      10001      22.173        407.991  57.0<br class="gmail_msg">
  Wait GPU local         1    8      10001       0.073          1.338   0.2<br class="gmail_msg">
  NB X/F buffer ops.     1    8      19751       0.255          4.690   0.7<br class="gmail_msg">
  Write traj.            1    8          3       0.195          3.587   0.5<br class="gmail_msg">
  Update                 1    8      20002       1.038         19.093   2.7<br class="gmail_msg">
  Constraints            1    8      20002       0.374          6.887   1.0<br class="gmail_msg">
  Rest                                           3.126         57.513   8.0<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Total                                         38.906        715.871 100.0<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Breakdown of PME mesh computation<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  PME spread/gather      1    8      40004      19.289        354.929  49.6<br class="gmail_msg">
  PME 3D-FFT             1    8      40004       2.319         42.665   6.0<br class="gmail_msg">
  PME solve Elec         1    8      20002       0.518          9.538   1.3<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
<br class="gmail_msg">
  GPU timings<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Computing:                         Count  Wall t (s)      ms/step       %<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Pair list H2D                        251       0.023        0.090     1.1<br class="gmail_msg">
  X / q H2D                          10001       0.269        0.027    12.5<br class="gmail_msg">
  Nonbonded F kernel                  9700       1.615        0.166    75.0<br class="gmail_msg">
  Nonbonded F+ene k.                    50       0.014        0.273     0.6<br class="gmail_msg">
  Nonbonded F+prune k.                 200       0.039        0.196     1.8<br class="gmail_msg">
  Nonbonded F+ene+prune k.              51       0.016        0.323     0.8<br class="gmail_msg">
  F D2H                              10001       0.177        0.018     8.2<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
  Total                                          2.153        0.215   100.0<br class="gmail_msg">
-----------------------------------------------------------------------------<br class="gmail_msg">
<br class="gmail_msg">
Average per-step force GPU/CPU evaluation time ratio: 0.215 ms/3.280 ms<br class="gmail_msg">
= 0.066<br class="gmail_msg">
For optimal performance this ratio should be close to 1!<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
NOTE: The GPU has &gt;25% less load than the CPU. This imbalance causes<br class="gmail_msg">
       performance loss.<br class="gmail_msg">
<br class="gmail_msg">
                Core t (s)   Wall t (s)        (%)<br class="gmail_msg">
        Time:      311.246       38.906      800.0<br class="gmail_msg">
                  (ns/day)    (hour/ns)<br class="gmail_msg">
Performance:       22.210        1.081<br class="gmail_msg">
=================================================<br class="gmail_msg">
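<br class="gmail_msg">
[Editor's note: the accounting above shows a single run using 1 MPI rank x 8 OpenMP threads with the GPU mostly idle. A minimal sketch of how several such runs might be packed onto one node with explicit thread pinning; the core offsets, GPU ids, and file names below are placeholder assumptions, not values taken from this run.]<br class="gmail_msg">

```shell
# Hypothetical sketch: three independent 8-thread mdrun jobs on one
# multi-core node with multiple GPUs. Each job is pinned to its own
# set of cores (-pin on, -pinoffset) and assigned its own GPU
# (-gpu_id); the offsets and ids here are placeholders for illustration.
gmx mdrun -deffnm run1 -ntomp 8 -pin on -pinoffset 0  -gpu_id 0 &
gmx mdrun -deffnm run2 -ntomp 8 -pin on -pinoffset 8  -gpu_id 1 &
gmx mdrun -deffnm run3 -ntomp 8 -pin on -pinoffset 16 -gpu_id 2 &
wait
```

Alternatively, an MPI-enabled mdrun running a multi simulation (the -multi option in this version) lets mdrun handle the placement details itself.<br class="gmail_msg">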
<br class="gmail_msg">
On 2/22/2017 1:04 AM, Igor Leontyev wrote:<br class="gmail_msg">
&gt; Hi.<br class="gmail_msg">
&gt; I am having a hard time accelerating free energy (FE) simulations on<br class="gmail_msg">
&gt; my high-end GPU. I am not sure whether this is normal for my smaller<br class="gmail_msg">
&gt; systems or whether I am doing something wrong.<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; The efficiency of GPU acceleration seems to decrease with the system<br class="gmail_msg">
&gt; size, right? Typical system sizes in FE simulations are 32x32x32 A^3<br class="gmail_msg">
&gt; (~3.5K atoms) in water and about 60x60x60 A^3 (~25K atoms) in protein.<br class="gmail_msg">
&gt; A larger MD box is rarely needed in FE simulations.<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; For my system (11K atoms), running on 8 CPU cores with a GTX 1080<br class="gmail_msg">
&gt; GPU, I am getting only up to a 50% speedup. GPU utilization during the<br class="gmail_msg">
&gt; simulation is only 1-2%. Does that sound right? (I am using the current<br class="gmail_msg">
&gt; gmx version 2016.2 and CUDA driver 8.0; on request I will attach<br class="gmail_msg">
&gt; log files with all the details.)<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; BTW, regarding how much the perturbed interactions cost: in my case, a<br class="gmail_msg">
&gt; simulation with &quot;free_energy = no&quot; runs about TWICE as fast.<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; Igor<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt;&gt; On 2/13/17, 1:32 AM,<br class="gmail_msg">
&gt;&gt; &quot;<a href="mailto:gromacs.org_gmx-developers-bounces@maillist.sys.kth.se" class="gmail_msg" target="_blank">gromacs.org_gmx-developers-bounces@maillist.sys.kth.se</a> on behalf of<br class="gmail_msg">
&gt;&gt; Berk Hess&quot; &lt;<a href="mailto:gromacs.org_gmx-developers-bounces@maillist.sys.kth.se" class="gmail_msg" target="_blank">gromacs.org_gmx-developers-bounces@maillist.sys.kth.se</a> on<br class="gmail_msg">
&gt;&gt; behalf of <a href="mailto:hess@kth.se" class="gmail_msg" target="_blank">hess@kth.se</a>&gt; wrote:<br class="gmail_msg">
&gt;&gt;<br class="gmail_msg">
&gt;&gt;     That depends on what you mean with this.<br class="gmail_msg">
&gt;&gt;     With free energy enabled, all non-perturbed non-bonded interactions can run on<br class="gmail_msg">
&gt;&gt;     the GPU. The perturbed ones currently cannot. For a large system<br class="gmail_msg">
&gt;&gt; with a<br class="gmail_msg">
&gt;&gt;     few perturbed atoms this is no issue. For smaller systems the<br class="gmail_msg">
&gt;&gt;     free-energy kernel can be the limiting factor. I think there is a<br class="gmail_msg">
&gt;&gt; lot of<br class="gmail_msg">
&gt;&gt;     gain to be had in making the extremely complex CPU free-energy kernel<br class="gmail_msg">
&gt;&gt;     faster. Initially I thought SIMD would not help there. But since any<br class="gmail_msg">
&gt;&gt;     perturbed i-particle will have perturbed interactions with all<br class="gmail_msg">
&gt;&gt; j&#39;s, this<br class="gmail_msg">
&gt;&gt;     will help a lot.<br class="gmail_msg">
&gt;&gt;<br class="gmail_msg">
&gt;&gt;     Cheers,<br class="gmail_msg">
&gt;&gt;<br class="gmail_msg">
&gt;&gt;     Berk<br class="gmail_msg">
&gt;&gt;<br class="gmail_msg">
&gt;&gt;     On 2017-02-13 01:08, Michael R Shirts wrote:<br class="gmail_msg">
&gt;&gt;     &gt; What&#39;s the current state of the free energy code on GPUs, and what<br class="gmail_msg">
&gt;&gt; are the roadblocks?<br class="gmail_msg">
&gt;&gt;     &gt;<br class="gmail_msg">
&gt;&gt;     &gt; Thanks!<br class="gmail_msg">
&gt;&gt;     &gt; ~~~~~~~~~~~~~~~~<br class="gmail_msg">
&gt;&gt;     &gt; Michael Shirts<br class="gmail_msg">
--<br class="gmail_msg">
Gromacs Developers mailing list<br class="gmail_msg">
<br class="gmail_msg">
* Please search the archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List" rel="noreferrer" class="gmail_msg" target="_blank">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a> before posting!<br class="gmail_msg">
<br class="gmail_msg">
* Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists" rel="noreferrer" class="gmail_msg" target="_blank">http://www.gromacs.org/Support/Mailing_Lists</a><br class="gmail_msg">
<br class="gmail_msg">
* For (un)subscribe requests visit<br class="gmail_msg">
<a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers" rel="noreferrer" class="gmail_msg" target="_blank">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a> or send a mail to <a href="mailto:gmx-developers-request@gromacs.org" class="gmail_msg" target="_blank">gmx-developers-request@gromacs.org</a>.<br class="gmail_msg">
<br class="gmail_msg">
</blockquote></div>