[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)

Krzysztof Kolman krzysztof.kolman at gmail.com
Tue Nov 6 10:55:22 CET 2018


Dear Gromacs Users,

I just wanted to add some additional information. After a restart, the
simulation crashed again (segmentation fault) after the same interval,
which is about 12 h and 22500000 steps (so I am now at 45000000 steps out
of 50000000). I think this observation indicates that the crash is not
caused by an unstable simulation but by some kind of software issue.
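
As a quick sanity check on these step counts, here is a minimal Python
sketch that converts them to simulated time. It assumes dt = 0.002 ps and
nsteps = 50000000 as listed in the log quoted below; the function name
steps_to_ns is made up for illustration and is not part of GROMACS.

```python
# Timestep from the quoted log (dt = 0.002 ps), stored as whole
# femtoseconds so the arithmetic stays exact.
DT_FS = 2
NSTEPS_TOTAL = 50_000_000  # nsteps from the quoted log


def steps_to_ns(steps: int) -> float:
    """Convert an MD step count to simulated time in nanoseconds."""
    return steps * DT_FS / 1_000_000  # 1 ns = 1,000,000 fs


first_crash = 22_500_000   # last step logged before the first crash
second_crash = 45_000_000  # step reached before the second crash

for label, step in [("first crash", first_crash),
                    ("second crash", second_crash)]:
    print(f"{label}: step {step} = {steps_to_ns(step):.1f} ns")

# Steps (and simulated time) still missing to finish the run.
remaining = NSTEPS_TOTAL - second_crash
print(f"remaining: {remaining} steps = {steps_to_ns(remaining):.1f} ns")
```

The value for the first crash agrees with the log, which reports a time of
45000 ps at step 22500000.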

Kind regards,
Krzysztof

On Mon, 5 Nov 2018 at 21:12, Krzysztof Kolman <krzysztof.kolman at gmail.com>
wrote:

> Dear Gromacs Users,
>
> I have a problem with my Gromacs 2018.3 installation: it keeps crashing
> with a segmentation fault after quite long simulation times (more than
> 12 h wall clock). It is hard for me to tell why, because there is no
> information apart from the segmentation fault message. Please find below
> a shortened output from the log file:
> Command line:
>   gmx mdrun -v -deffnm md_0_1
>
> GROMACS version:    2018.3
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        CUDA
> SIMD instructions:  AVX2_256
> FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
> RDTSCP usage:       enabled
> TNG support:        enabled
> Hwloc support:      disabled
> Tracing support:    disabled
> Built on:           2018-10-17 19:53:24
> Built by:           kolman at kolman-B85-HD3 [CMAKE]
> Build OS/arch:      Linux 4.15.0-36-generic x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> Build CPU family:   6   Model: 60   Stepping: 3
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
> intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler:         /usr/bin/gcc-6 GNU 6.4.0
> C compiler flags:    -march=core-avx2     -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler:       /usr/bin/g++-6 GNU 6.4.0
> C++ compiler flags:  -march=core-avx2    -std=c++11   -O3 -DNDEBUG
> -funroll-all-loops -fexcess-precision=fast
> CUDA compiler:      /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
> Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
> CUDA compiler
> flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
> ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> CUDA driver:        9.10
> CUDA runtime:       9.10
>
>
> Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
> Hardware detected:
>   CPU info:
>     Vendor: Intel
>     Brand:  Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
>     Family: 6   Model: 60   Stepping: 3
>     Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel
> lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>   Hardware topology: Basic
>     Sockets, cores, and logical processors:
>       Socket  0: [   0   4] [   1   5] [   2   6] [   3   7]
>   GPU info:
>     Number of GPUs detected: 1
>     #0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC:  no, stat:
> compatible
> ...
>
> Input Parameters:
>    integrator                     = md
>    tinit                          = 0
>    dt                             = 0.002
>    nsteps                         = 50000000
>    init-step                      = 0
>    simulation-part                = 1
>    comm-mode                      = Linear
>    nstcomm                        = 100
>    bd-fric                        = 0
>    ld-seed                        = -105855329
>    emtol                          = 10
>    emstep                         = 0.01
>    niter                          = 20
>    fcstep                         = 0
>    nstcgsteep                     = 1000
>    nbfgscorr                      = 10
>    rtpi                           = 0.05
>    nstxout                        = 500000
>    nstvout                        = 500000
>    nstfout                        = 0
>    nstlog                         = 500000
>    nstcalcenergy                  = 100
>    nstenergy                      = 50000
>    nstxout-compressed             = 50000
>    compressed-x-precision         = 1000
>    cutoff-scheme                  = Verlet
>    nstlist                        = 10
>    ns-type                        = Grid
>    pbc                            = xyz
>    periodic-molecules             = false
>    verlet-buffer-tolerance        = 0.005
>    rlist                          = 1
>    coulombtype                    = PME
>    coulomb-modifier               = Potential-shift
>    rcoulomb-switch                = 0
>    rcoulomb                       = 1
>    epsilon-r                      = 1
>    epsilon-rf                     = inf
>    vdw-type                       = Cut-off
>    vdw-modifier                   = Potential-shift
>    rvdw-switch                    = 0
>    rvdw                           = 1
>    DispCorr                       = EnerPres
>    table-extension                = 1
>    fourierspacing                 = 0.118
>    fourier-nx                     = 52
>    fourier-ny                     = 52
>    fourier-nz                     = 52
>    pme-order                      = 4
>    ewald-rtol                     = 1e-05
>    ewald-rtol-lj                  = 0.001
>    lj-pme-comb-rule               = Geometric
>    ewald-geometry                 = 0
>    epsilon-surface                = 0
>    implicit-solvent               = No
>    gb-algorithm                   = Still
>    nstgbradii                     = 1
>    rgbradii                       = 1
>    gb-epsilon-solvent             = 80
>    gb-saltconc                    = 0
>    gb-obc-alpha                   = 1
>    gb-obc-beta                    = 0.8
>    gb-obc-gamma                   = 4.85
>    gb-dielectric-offset           = 0.009
>    sa-algorithm                   = Ace-approximation
>    sa-surface-tension             = 2.05016
>    tcoupl                         = V-rescale
>    nsttcouple                     = 10
>    nh-chain-length                = 0
>    print-nose-hoover-chain-variables = false
>    pcoupl                         = Parrinello-Rahman
>    pcoupltype                     = Isotropic
>    nstpcouple                     = 10
>    tau-p                          = 1
>    compressibility (3x3):
>       compressibility[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
>       compressibility[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
>       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
>    ref-p (3x3):
>       ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
>       ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
>       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
>    refcoord-scaling               = COM
>    posres-com (3):
>       posres-com[0]= 0.00000e+00
>       posres-com[1]= 0.00000e+00
>       posres-com[2]= 0.00000e+00
>    posres-comB (3):
>       posres-comB[0]= 0.00000e+00
>       posres-comB[1]= 0.00000e+00
>       posres-comB[2]= 0.00000e+00
>    QMMM                           = false
>    QMconstraints                  = 0
>    QMMMscheme                     = 0
>    MMChargeScaleFactor            = 1
> qm-opts:
>    ngQM                           = 0
>    constraint-algorithm           = Lincs
>    continuation                   = true
>    Shake-SOR                      = false
>    shake-tol                      = 0.0001
>    lincs-order                    = 4
>    lincs-iter                     = 1
>    lincs-warnangle                = 30
>    nwall                          = 0
>    wall-type                      = 9-3
>    wall-r-linpot                  = -1
>    wall-atomtype[0]               = -1
>    wall-atomtype[1]               = -1
>    wall-density[0]                = 0
>    wall-density[1]                = 0
>    wall-ewald-zfac                = 3
>    pull                           = false
>    awh                            = false
>    rotation                       = false
>    interactiveMD                  = false
>    disre                          = No
>    disre-weighting                = Conservative
>    disre-mixed                    = false
>    dr-fc                          = 1000
>    dr-tau                         = 0
>    nstdisreout                    = 100
>    orire-fc                       = 0
>    orire-tau                      = 0
>    nstorireout                    = 100
>    free-energy                    = no
>    cos-acceleration               = 0
>    deform (3x3):
>       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>    simulated-tempering            = false
>    swapcoords                     = no
>    userint1                       = 0
>    userint2                       = 0
>    userint3                       = 0
>    userint4                       = 0
>    userreal1                      = 0
>    userreal2                      = 0
>    userreal3                      = 0
>    userreal4                      = 0
>    applied-forces:
>      electric-field:
>        x:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        y:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        z:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
> grpopts:
>    nrdf:     7859.43     33729.6
>    ref-t:         300         300
>    tau-t:         0.1         0.1
> annealing:          No          No
> annealing-npoints:           0           0
>    acc:            0           0           0
>    nfreeze:           N           N           N
>    energygrp-flags[  0]: 0
>
> Changing nstlist from 10 to 100, rlist from 1 to 1.148
>
> Using 1 MPI thread
> Using 8 OpenMP threads
>
> 1 GPU auto-selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
>   PP:0,PME:0
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Will do PME sum in reciprocal space for electrostatic interactions.
> ...
> Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
> Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
> Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size:
> 1073
>
> Long Range LJ corr.: <C6> 3.3459e-04
> Generated table with 1074 data points for Ewald.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for LJ6.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for LJ12.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Using GPU 8x8 nonbonded short-range kernels
>
> Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
>   outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
>   inner list: updated every  12 steps, buffer 0.002 nm, rlist 1.002 nm
> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would
> be:
>   outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
>   inner list: updated every  12 steps, buffer 0.050 nm, rlist 1.050 nm
>
> Using Lorentz-Berthelot Lennard-Jones combination rule
>
>
> Initializing LINear Constraint Solver
> The number of constraints is 3840
>
> There are: 20736 Atoms
>
> Started mdrun on rank 0 Sun Nov  4 23:01:29 2018
>            Step           Time
>               0        0.00000
>
>    Energies (kJ/mol)
>             U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     7.80480e+03    5.27100e+03    8.63175e+01    4.08652e+03    4.83769e+03
>         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
>     3.63164e+04   -2.90354e+03   -3.22530e+05    1.96307e+03   -2.65067e+05
>     Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
>     5.18776e+04   -2.13190e+05   -2.13177e+05    3.00053e+02   -2.32857e+02
>  Pressure (bar)   Constr. rmsd
>    -5.67996e+01    9.57285e-06
>
> step  200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8
> M-cycles
> step  400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2
> M-cycles
> step  600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9
> M-cycles
> step  800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0
> M-cycles
> step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8
> M-cycles
> step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1
> M-cycles
> step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0
> M-cycles
> step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5
> M-cycles
> step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3
> M-cycles
>               optimal pme grid 48 48 48, coulomb cutoff 1.045
>
> Last checkpoint:
>
> Writing checkpoint, step 22388100 at Mon Nov  5 08:31:29 2018
>
>
>            Step           Time
>        22500000    45000.00000
>
>    Energies (kJ/mol)
>             U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     7.74565e+03    5.28043e+03    5.63610e+01    3.87191e+03    4.35044e+03
>         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
>     3.61122e+04   -2.92965e+03   -3.24570e+05    1.59058e+03   -2.68492e+05
>     Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
>     5.16199e+04   -2.16872e+05   -3.11535e+05    2.98562e+02   -2.37059e+02
>  Pressure (bar)   Constr. rmsd
>     4.08107e+01    9.30833e-06
>
>
> Thank you in advance for any help. Please let me know if any additional
> information is needed.
>
> Best regards,
> Krzysztof

