Log file opened on Tue Jun 17 15:20:44 2014
Host: cn004  pid: 2788  nodeid: 0  nnodes: 32
GROMACS:    mdrun_mpi, VERSION 5.0-rc1-dev-20140520-2ae5141

GROMACS is written by:
Emile Apol, Rossen Apostolov, Herman J.C. Berendsen, Par Bjelkmar,
Aldert van Buuren, Rudi van Drunen, Anton Feenstra, Sebastian Fritsch,
Gerrit Groenhof, Christoph Junghans, Peter Kasson, Carsten Kutzner,
Per Larsson, Justin A. Lemkul, Magnus Lundborg, Pieter Meulenhoff,
Erik Marklund, Teemu Murtola, Szilard Pall, Sander Pronk,
Roland Schulz, Alexey Shvetsov, Michael Shirts, Alfons Sijbers,
Peter Tieleman, Christian Wennberg and Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      mdrun_mpi, VERSION 5.0-rc1-dev-20140520-2ae5141
Executable:   /home/X/Programme/gromacs/gromacs_50rc1_intel1214_mkl_openmp_nogpu_sse41/bin/mdrun_mpi
Library dir:  /home/X/Programme/gromacs/gromacs_50rc1_intel1214_mkl_openmp_nogpu_sse41/share/gromacs/top
Command line:
  mdrun_mpi -s ../Ap10k.tpr -dlb yes -tunepme -v -deffnm n4-ppn8

Gromacs version:    VERSION 5.0-rc1-dev-20140520-2ae5141
GIT SHA1 hash:      2ae5141b982519bc2abab15db2f29dbd3fee8415
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  SSE4.1
FFT library:        Intel MKL
RDTSCP usage:       enabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Wed Jun 11 11:04:14 MDT 2014
Built by:           X@lattice [CMAKE]
Build OS/arch:      Linux 2.6.32-358.18.1.el6.x86_64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
Build CPU family:   6   Model: 26   Stepping: 5
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler:         /global/software/openmpi/openmpi-1.6.5/bin/mpicc Intel 12.1.0.20120410
C compiler flags:   -msse4.1 -mkl=sequential -std=gnu99 -w3 -wd111 -wd177 -wd181 -wd193 -wd271 -wd304 -wd383 -wd424 -wd444 -wd522 -wd593 -wd869 -wd981 -wd1418 -wd1419 -wd1572 -wd1599 -wd2259 -wd2415 -wd2547 -wd2557 -wd3280 -wd3346 -ip -funroll-all-loops -alias-const -ansi-alias -O3 -DNDEBUG
C++ compiler:       /global/software/openmpi/openmpi-1.6.5/bin/mpiCC Intel 12.1.0.20120410
C++ compiler flags: -msse4.1 -w3 -wd111 -wd177 -wd181 -wd193 -wd271 -wd304 -wd383 -wd424 -wd444 -wd522 -wd593 -wd869 -wd981 -wd1418 -wd1419 -wd1572 -wd1599 -wd2259 -wd2415 -wd2547 -wd2557 -wd3280 -wd3346 -wd1782 -ip -funroll-all-loops -alias-const -ansi-alias -O3 -DNDEBUG
Boost version:      1.48.0 (external)

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing nstlist from 10 to 25, rlist from 1 to 1.024
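With the Verlet cut-off scheme, mdrun may raise nstlist and pad the pair-list cut-off with a buffer so that the verlet-buffer-tolerance requested in the input (0.005 kJ/mol/ps, listed below) still holds over the longer list lifetime. A minimal sketch of the arithmetic behind the line above, using only values printed in this log; the tolerance-driven buffer estimate itself lives inside mdrun and is not reproduced here:

```python
# All values below are copied from this log; nothing is computed by mdrun's
# actual buffer-estimation routine, which is considerably more involved.
rcoulomb = 1.0    # nm, interaction cut-off
rlist    = 1.024  # nm, pair-list cut-off chosen by mdrun
nstlist  = 25     # steps between pair-list rebuilds (raised from 10)
dt       = 0.002  # ps per step

buffer_nm = rlist - rcoulomb  # 0.024 nm of padding around the cut-off
lifetime  = nstlist * dt      # 0.05 ps a pair list must stay valid
print(buffer_nm, lifetime)
```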
Input Parameters:
   integrator                     = md
   nsteps                         = 10000
   init-step                      = 0
   cutoff-scheme                  = Verlet
   ns-type                        = Grid
   nstlist                        = 25
   ndelta                         = 2
   nstcomm                        = 100
   comm-mode                      = Linear
   nstlog                         = 0
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstcalcenergy                  = 100
   nstenergy                      = 0
   nstxout-compressed             = 0
   init-t                         = 0
   delta-t                        = 0.002
   x-compression-precision        = 1000
   fourierspacing                 = 0.12
   nkx                            = 96
   nky                            = 96
   nkz                            = 80
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 1e-05
   ewald-geometry                 = 0
   epsilon-surface                = 0
   optimize-fft                   = FALSE
   lj-pme-comb-rule               = Geometric
   ePBC                           = xyz
   bPeriodicMols                  = FALSE
   bContinuation                  = FALSE
   bShakeSOR                      = FALSE
   etc                            = Berendsen
   bPrintNHChains                 = FALSE
   nsttcouple                     = 10
   epc                            = Berendsen
   epctype                        = Isotropic
   nstpcouple                     = 10
   tau-p                          = 1
   ref-p (3x3):
      ref-p[0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref-p[2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   compress (3x3):
      compress[0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
      compress[1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
      compress[2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   verlet-buffer-tolerance        = 0.005
   rlist                          = 1.024
   rlistlong                      = 1.024
   nstcalclr                      = 10
   rtpi                           = 0.05
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1
   vdwtype                        = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 1.2
   rvdw                           = 1
   epsilon-r                      = 1
   epsilon-rf                     = 1
   tabext                         = 1
   implicit-solvent               = No
   gb-algorithm                   = Still
   gb-epsilon-solvent             = 80
   nstgbradii                     = 1
   rgbradii                       = 2
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.092
   DispCorr                       = No
   bSimTemp                       = FALSE
   free-energy                    = no
   nwall                          = 0
   wall-type                      = 9-3
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = no
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Equal
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 1.25
   nstdisreout                    = 100
   orires-fc                      = 0
   orires-tau                     = 0
   nstorireout                    = 100
   dihre-fc                       = 0
   em-stepsize                    = 0.01
   em-tol                         = 1e-06
   niter                          = 100
   fc-stepsize                    = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   ConstAlg                       = Lincs
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-warnangle                = 30
   lincs-iter                     = 1
   bd-fric                        = 0
   ld-seed                        = 1993
   cos-accel                      = 0
   deform (3x3):
      deform[0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   grpopts:
      nrdf:      163634
      ref-t:        300
      tau-t:        0.1
      anneal:        No
      ann-npoints:    0
      acc:            0  0  0
      nfreeze:        N  N  N
      energygrp-flags[0]: 0
      efield-x:   n = 0
      efield-xt:  n = 0
      efield-y:   n = 0
      efield-yt:  n = 0
      efield-z:   n = 0
      efield-zt:  n = 0
   eSwapCoords                    = no
   bQMMM                          = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   scalefactor                    = 1
   qm-opts:
      ngQM                        = 0

Initializing Domain Decomposition on 32 nodes
Dynamic load balancing: yes
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.424 nm, LJ-14, atoms 4910 4912
  multi-body bonded interactions: 0.416 nm, Proper Dih., atoms 3727 3731
Minimum cell size due to bonded interactions: 0.457 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.23
Will use 24 particle-particle and 8 PME only nodes
This is a guess, check the performance at the end of the log file
Using 8 separate PME nodes, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 24 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 10 Y 9 Z 9
Domain decomposition grid 8 x 3 x 1, separate PME nodes 8
PME domain decomposition: 8 x 1 x 1
Interleaving PP and PME nodes
This is a particle-particle only node

Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 4 groups of on average 6.0 processes

Using 32 MPI processes
Using 1 OpenMP thread per MPI process
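Two of the numbers above can be re-derived from values printed in this log: the PME-rank guess follows from the relative PME load estimate, and the maximum cell counts follow from the box lengths (the averages reported near the end of this run; the initial box is close) divided by the minimum initial cell size. A rough consistency check, not mdrun's full heuristic, which also weighs grid divisibility and communication cost:

```python
import math

nranks, pme_load = 32, 0.23        # from "Guess for relative PME load: 0.23"
print(nranks * pme_load)           # 7.36; mdrun settles on 8 PME ranks,
                                   # leaving 24 PP ranks for the 8 x 3 x 1 grid

box      = (10.789, 10.185, 9.552) # nm, average Box-X/Y/Z from the statistics
min_cell = 1.025                   # nm, minimum initial DD cell size above
print([math.floor(L / min_cell) for L in box])  # [10, 9, 9] -> "X 10 Y 9 Z 9"
```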
Detecting CPU SIMD instructions.
Present hardware specification:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
Family:  6  Model: 26  Stepping:  5
Features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
SIMD instructions most likely to fit this hardware: SSE4.1
SIMD instructions selected at GROMACS compile time: SSE4.1

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's: NS: 1.024 Coulomb: 1 LJ: 1
System total charge: 0.000
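The Gaussian width printed above is the reciprocal of the Ewald splitting coefficient beta, which GROMACS picks so that erfc(beta * rcoulomb) equals ewald-rtol (1e-05 here). A minimal bisection sketch under that assumption, which reproduces 1/beta ≈ 0.3202 nm:

```python
import math

rc, rtol = 1.0, 1e-5        # rcoulomb (nm) and ewald-rtol from the input above

lo, hi = 0.0, 10.0          # bracket for beta in nm^-1
for _ in range(60):         # plain bisection on erfc(beta*rc) - rtol
    beta = 0.5 * (lo + hi)
    if math.erfc(beta * rc) > rtol:
        lo = beta
    else:
        hi = beta

print(1.0 / beta)           # ~0.32016 nm, matching "Gaussian width (1/beta)"
```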
Generated table with 1012 data points for Ewald. Tabscale = 500 points/nm
Generated table with 1012 data points for LJ6. Tabscale = 500 points/nm
Generated table with 1012 data points for LJ12. Tabscale = 500 points/nm
Generated table with 1012 data points for 1-4 COUL. Tabscale = 500 points/nm
Generated table with 1012 data points for 1-4 LJ6. Tabscale = 500 points/nm
Generated table with 1012 data points for 1-4 LJ12. Tabscale = 500 points/nm

Using SSE4.1 4x4 non-bonded kernels
Using full Lennard-Jones parameter combination matrix
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 6.60e-04 size: 3069

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 22285
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Linking all bonded interactions to atoms
There are 140698 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The maximum number of communication pulses is: X 1 Y 1
The minimum size for domain decomposition cells is 1.024 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.76 Y 0.30
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.024 nm
            two-body bonded interactions  (-rdd)   1.024 nm
          multi-body bonded interactions  (-rdd)   0.882 nm
  atoms separated by up to 5 constraints  (-rcon)  1.024 nm

Making 2D domain decomposition grid 8 x 3 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

There are: 81743 Atoms
Charge group distribution at step 0:
3280 3372 3220 3307 3278 3176 3366 3642 3452 3406 3771 3578
3342 3793 3456 3268 3764 3593 3119 3521 3372 3242 3183 3242

Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 2.92e-06
Initial temperature: 301.847 K

Started mdrun on node 0 Tue Jun 17 15:20:45 2014

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.          LJ-14
    5.36318e+04    1.72954e+04    1.37537e+04    5.80913e+03    2.73783e+04
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
    9.80264e+04    4.92496e+04   -1.56493e+06    1.19681e+04   -1.28782e+06
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    2.05439e+05   -1.08238e+06    3.01997e+02    4.11986e+02    0.00000e+00

DD  step 24  vol min/aver 1.000  load imb.: force 19.5%  pme mesh/force 0.884

DD  step 9999  vol min/aver 0.823  load imb.: force 1.3%  pme mesh/force 0.871

           Step           Time         Lambda
          10000       20.00000        0.00000

Writing checkpoint, step 10000 at Tue Jun 17 15:22:09 2014

   Energies (kJ/mol)
          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.          LJ-14
    5.22332e+04    1.76540e+04    1.34630e+04    5.62137e+03    2.61599e+04
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
    9.79558e+04    4.79139e+04   -1.56511e+06    1.16560e+04   -1.29245e+06
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    2.03886e+05   -1.08857e+06    2.99714e+02    1.52009e+01    1.58182e-05
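The printed temperatures are consistent with equipartition, T = 2 E_kin / (N_df k_B), using the nrdf value from the grpopts block above. A quick check, with k_B in GROMACS units and the kinetic energies copied from the step 0 and step 10000 blocks:

```python
kB  = 0.0083144621   # Boltzmann constant, kJ/(mol K)
ndf = 163634         # nrdf from the grpopts block

for ekin in (2.05439e+05, 2.03886e+05):  # Kinetic En. at steps 0 and 10000
    print(2 * ekin / (ndf * kB))         # ~302.0 K and ~299.7 K, as printed
```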
	<======  ###############  ==>
	<====  A V E R A G E S  ====>
	<==  ###############  ======>

	Statistics over 10001 steps using 101 frames

   Energies (kJ/mol)
          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.          LJ-14
    5.24395e+04    1.80111e+04    1.35980e+04    5.85244e+03    2.63650e+04
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
    9.79536e+04    4.76414e+04   -1.56536e+06    1.17114e+04   -1.29178e+06
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    2.04117e+05   -1.08767e+06    3.00054e+02   -6.02678e+00    0.00000e+00

          Box-X          Box-Y          Box-Z
    1.07888e+01    1.01845e+01    9.55230e+00

   Total Virial (kJ/mol)
    6.88206e+04   -1.19355e+02    8.17106e+01
   -1.20462e+02    6.98046e+04   -1.18888e+02
    7.78129e+01   -1.25994e+02    6.60646e+04

   Pressure (bar)
    2.57487e+00    3.58913e+00   -4.95436e-01
    3.62404e+00   -3.07913e+01    4.51229e+00
   -3.72254e-01    4.73721e+00    1.01361e+01

	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                            M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check        10720.993402       96488.941     0.7
 NxN QSTab Elec. + LJ [F]         160437.566576     6577940.230    45.5
 NxN QSTab Elec. + LJ [V&F]         1637.132568       96590.822     0.7
 NxN LJ [F]                        15456.687360      510070.683     3.5
 NxN LJ [V&F]                        157.630016        6778.091     0.0
 NxN QSTab Elec. [F]              118905.979424     4042803.300    28.0
 NxN QSTab Elec. [V&F]              1212.816008       49725.456     0.3
 1,4 nonbonded interactions          231.583156       20842.484     0.1
 Calc Weights                       2452.535229       88291.268     0.6
 Spread Q Bspline                  52320.751552      104641.503     0.7
 Gather F Bspline                  52320.751552      313924.509     2.2
 3D-FFT                           287447.801906     2299582.415    15.9
 Solve PME                            92.169216        5898.830     0.0
 Reset In Box                         32.778943          98.337     0.0
 CG-CoM                               32.860686          98.582     0.0
 Angles                              283.638361       47651.245     0.3
 Propers                             111.121111       25446.734     0.2
 Impropers                            54.645464       11366.257     0.1
 RB-Dihedrals                         62.336233       15397.050     0.1
 Virial                               82.905823        1492.305     0.0
 Stop-CM                               8.337786          83.378     0.0
 P-Coupling                          817.511743        4905.070     0.0
 Calc-Ekin                           327.135486        8832.658     0.1
 Lincs                               336.291243       20177.475     0.1
 Lincs-Mat                          5113.275384       20453.102     0.1
 Constraint-V                       1321.398682       10571.189     0.1
 Constraint-Vir                       98.593344        2366.240     0.0
 Settle                              216.315902       69870.036     0.5
-----------------------------------------------------------------------------
 Total                                            14452388.189   100.0
-----------------------------------------------------------------------------
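The % Flops column is simply each row's M-Flops share of the total; for the two largest entries above:

```python
total = 14452388.189                       # Total M-Flops from the table
for mflops in (6577940.230, 2299582.415):  # NxN QSTab Elec. + LJ [F], 3D-FFT
    print(round(100 * mflops / total, 1))  # 45.5 and 15.9, matching the table
```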
    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 102915.8
 av. #atoms communicated per step for LINCS:  2 x 17759.8

 Average load imbalance: 1.3 %
 Part of the total run time spent waiting due to load imbalance: 1.0 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %
 Average PME mesh/force load: 0.864
 Part of the total run time spent waiting due to PP/PME imbalance: 2.8 %

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 24 MPI ranks doing PP, and on 8 MPI ranks doing PME

 Computing:            Num   Num      Call    Wall time     Giga-Cycles
                       Nodes Threads  Count      (s)       total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         24    1         401      1.590        86.478    1.4
 DD comm. load          24    1         400      0.004         0.234    0.0
 DD comm. bounds        24    1         402      0.046         2.525    0.0
 Send X to PME          24    1       10001      0.321        17.448    0.3
 Neighbor search        24    1         401      2.969       161.545    2.6
 Comm. coord.           24    1        9600      1.006        54.706    0.9
 Force                  24    1       10001     67.068      3648.836   59.4
 Wait + Comm. F         24    1       10001      1.973       107.335    1.7
 PME mesh *              8    1       10001     60.029      1088.629   17.7
 PME wait for PP *                              24.661       447.224    7.3
 Wait + Recv. PME F     24    1       10001      0.354        19.256    0.3
 NB X/F buffer ops.     24    1       29201      0.754        40.995    0.7
 Write traj.            24    1           1      0.032         1.759    0.0
 Update                 24    1       10001      0.579        31.524    0.5
 Constraints            24    1       10001      6.941       377.613    6.1
 Comm. energies         24    1        2001      0.422        22.985    0.4
 Rest                                            0.642        34.922    0.6
-----------------------------------------------------------------------------
 Total                                          84.700      6144.213  100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME nodes, the walltime column actually sums to
    twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F         8    1       20002      3.204        58.108    0.9
 PME spread/gather       8    1       20002     21.830       395.895    6.4
 PME 3D-FFT              8    1       20002     23.167       420.145    6.8
 PME 3D-FFT Comm.        8    1       20002      6.549       118.762    1.9
 PME solve Elec          8    1       10001      5.231        94.865    1.5
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:     2703.769       84.700     3192.2
                 (ns/day)    (hour/ns)
Performance:       20.403        1.176
Finished mdrun on node 0 Tue Jun 17 15:22:09 2014
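The closing performance figures follow directly from the step count, the 2 fs time step, and the wall time in the table above; Core t / Wall t likewise reflects the 32 ranks (3192.2 % ≈ 32 ranks x 100 %). A quick re-derivation:

```python
steps, dt = 10001, 0.002         # steps completed and ps per step
wall      = 84.700               # s, total wall time from the table above
ns        = steps * dt / 1000.0  # ~0.020 ns of trajectory produced

print(ns * 86400.0 / wall)       # ~20.4 ns/day, matching "Performance"
print(wall / 3600.0 / ns)        # ~1.18 hour/ns
print(2703.769 / wall * 100)     # ~3192 %, the core/wall utilisation
```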