[gmx-users] Restarting crashed simulation

Mark Abraham mark.j.abraham at gmail.com
Sat Nov 18 23:55:27 CET 2017


Hi,

It looks like you made a typo: mdrun writes its checkpoint to state.cpt by
default, but you passed -cpi stat.cpt, which does not exist, so mdrun treated
this as a new run (that is what the warning in your log says). You may also
have multiple mdrun processes running, in which case the actual output files
end up among the backups labelled with # characters.
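
For example (a minimal sketch, assuming your first run wrote its output under
the default names, so the checkpoint is state.cpt), you could check what is
actually on disk and then restart from the real checkpoint; with -cpi, mdrun
appends to the existing output files by default:

  # see which checkpoint and backup files exist in the run directory
  ls -l state.cpt state_prev.cpt \#*\#

  # restart from the latest checkpoint, continuing md.log, the trajectory, etc.
  mpirun -np 64 mdrun_mpi -s md.tpr -cpi state.cpt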

Mark

On Fri, 17 Nov 2017 19:37 Ali Ahmed <aa5635737 at gmail.com> wrote:

> Hello GROMACS users
> My MD simulation crashed, so I restarted it from the point at which the
> checkpoint was written, using this command on 64 processors: mpirun -np
> 64 mdrun_mpi -s md.tpr -cpi stat.cpt
>
> After a few days there were no output files in the folder, such as
> output.gro, and I got the following:
> _______________________________________________
> Command line:
>   mdrun_mpi -s md.tpr -cpi stat.cpt
>
> Warning: No checkpoint file found with -cpi option. Assuming this is a new
> run.
>
>
> Back Off! I just backed up md.log to ./#md.log.2#
>
> Running on 4 nodes with total 64 cores, 64 logical cores
>   Cores per node:           16
>   Logical cores per node:   16
> Hardware detected on host compute-2-27.local (the node of MPI rank 0):
>   CPU info:
>     Vendor: Intel
>     Brand:  Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
>     SIMD instructions most likely to fit this hardware: AVX_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
>
>   Hardware topology: Basic
>
> Reading file md.tpr, VERSION 2016.3 (single precision)
> Changing nstlist from 10 to 40, rlist from 1 to 1.003
>
> Will use 48 particle-particle and 16 PME only ranks
> This is a guess, check the performance at the end of the log file
> Using 64 MPI processes
> Using 1 OpenMP thread per MPI process
>
> Non-default thread affinity set probably by the OpenMP library,
> disabling internal thread affinity
> WARNING: This run will generate roughly 50657 Mb of data
>
> starting mdrun 'Molecular Dynamics'
> 25000000 steps,  50000.0 ps.
>
> step 888000 Turning on dynamic load balancing, because the performance loss
> due to load imbalance is 8.7 %.
> step 930400 Turning off dynamic load balancing, because it is degrading
> performance.
> step 1328000 Turning on dynamic load balancing, because the performance
> loss due to load imbalance is 3.4 %.
> step 1328800 Turning off dynamic load balancing, because it is degrading
> performance.
> step 1336000 Turning on dynamic load balancing, because the performance
> loss due to load imbalance is 3.4 %.
> step 1338400 Turning off dynamic load balancing, because it is degrading
> performance.
> step 1340000 Will no longer try dynamic load balancing, as it degraded
> performance.
> Writing final coordinates.
>  Average load imbalance: 13.2 %
>  Part of the total run time spent waiting due to load imbalance: 7.5 %
>  Average PME mesh/force load: 1.077
>  Part of the total run time spent waiting due to PP/PME imbalance: 4.1 %
>
> NOTE: 7.5 % of the available CPU time was lost due to load imbalance
>       in the domain decomposition.
>       You might want to use dynamic load balancing (option -dlb.)
>
>
>                Core t (s)   Wall t (s)        (%)
>        Time: 26331875.601   411435.556     6400.0
>                          4d18h17:15
>                  (ns/day)    (hour/ns)
> Performance:       10.500        2.286
> _____________________________________________________________
>
> Any advice or suggestion would be helpful.
>
> Thanks in advance

