<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type" />
<title></title>
</head>
<body>
<p style="margin: 0px;"><span> </span></p>
<div style="margin: 5px 0px 5px 0px;">
On January 25, 2011 at 3:24 PM "Justin A. Lemkul" &lt;jalemkul@vt.edu&gt; wrote:<br />
<br />
><br />
><br />
> TJ Mustard wrote:<br />
> ><br />
> ><br />
> > <br />
> ><br />
> ><br />
> > On January 25, 2011 at 2:08 PM Mark Abraham &lt;Mark.Abraham@anu.edu.au&gt; wrote:<br />
> ><br />
> >> On 26/01/2011 5:50 AM, TJ Mustard wrote:<br />
> >>><br />
> >>> Hi all,<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> I am running MD/FEP on a protein-ligand system with gromacs 4.5.3 and<br />
> >>> FFTW 3.2.2.<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> My iMac will run the job (over 4000 steps, until I killed it) with 4 fs<br />
> >>> time steps. (I am using heavy hydrogens.)<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> Once I put this on our group's AMD cluster, the jobs fail even with 2 fs<br />
> >>> steps (with thousands of LINCS errors).<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> We have recompiled the cluster's GROMACS 4.5.3 build, with no change.<br />
> >>> I know the system is the same, since I copied the job from the server<br />
> >>> to my machine to rerun it.<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> What is going on? Why can one machine run a job perfectly and the<br />
> >>> other cannot? I also know there is adequate memory on both machines.<br />
> >>><br />
> >><br />
> >> You've posted this before, and I made a number of diagnostic<br />
> >> suggestions. What did you learn?<br />
> >><br />
> >> Mark<br />
> ><br />
> > Mark and all,<br />
> ><br />
> > <br />
> ><br />
> > First, thank you for all your help. What you suggested last time helped<br />
> > considerably with our jobs/calculations. I have learned that using the<br />
> > standard mdp settings allows my heavy-H 4 fs jobs to run on my iMac<br />
> > (Intel), and I have made these my new standard for future jobs. We chose<br />
> > the smaller 0.8 nm PME/cut-off based on other papers/tutorials, but now<br />
> > we understand why we need the standard settings. What I now see as our<br />
> > problem is that our machines differ in some way we cannot account for.<br />
> > If I am blind to my error, please show me. I just don't understand why<br />
> > one computer works while the other does not. We have recompiled<br />
> > single-precision GROMACS 4.5.3 on our cluster, and still have this<br />
> > problem.<br />
> ><br />
><br />
> I know the feeling all too well.  PowerPC jobs crash instantly on our cluster,<br />
> despite working beautifully on our lab machines.  There's a bug report about<br />
> that one, but I haven't heard anything about AMD failures.  It remains a<br />
> possibility that something beyond your control is going on.  To explore a bit<br />
> further:<br />
><br />
> 1. Do the systems in question crash immediately (i.e., step zero) or do they run<br />
> for some time?<br />
>
</div>
<p style="margin: 0px;">Step 0, every time.</p>
<p style="margin: 0px;"> </p>
<div style="margin: 5px 0px 5px 0px;">
> 2. If they give you even a little bit of output, you can analyze which energy<br />
> terms, etc go haywire with the tips listed here:<br />
>
</div>
<p style="margin: 0px;">All I have seen from these runs are LINCS warnings and "water molecule could not be settled" errors.</p>
<p style="margin: 0px;"> </p>
<p style="margin: 0px;">But I will check this out right now, and follow up by email if I find anything suspicious.</p>
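<p style="margin: 0px;">In the meantime, following the per-step output tip from that page, I plan to rerun one failing window with these output-frequency overrides in the .mdp (everything else unchanged), so the frames leading up to the crash are actually on disk:</p>
<pre style="font-family: monospace; margin: 5px 0px 5px 0px;">
; Diagnostic overrides: write every step so the coordinates and
; energies just before the crash are saved for inspection.
nstxout     = 1    ; coordinates every step
nstvout     = 1    ; velocities every step
nstfout     = 1    ; forces every step
nstenergy   = 1    ; energies every step
nstlog      = 1    ; log output every step
</pre>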
<p style="margin: 0px;"> </p>
<div style="margin: 5px 0px 5px 0px;">
> http://www.gromacs.org/Documentation/Terminology/Blowing_Up#Diagnosing_an_Unstable_System<br />
><br />
> That would help in tracking down any potential bug or error.<br />
><br />
> 3. Is it just the production runs that are crashing, or everything?  If EM isn't<br />
> even working, that smells even buggier.
</div>
<p style="margin: 0px;">Good question; we have seen some odd behavior here. Sometimes the cluster gives us segmentation faults, and the same job will then sometimes fail on our iMacs too, and sometimes run fine there. Strange, I know! If EM starts on the cluster, it will finish. Where we have issues is in position-restrained (PR) MD, plain MD, and MD/FEP. It does not matter whether FEP is on or off in an MD run (although we are using the sd integrator for these MD/FEP runs).</p>
<p style="margin: 0px;"> </p>
<div style="margin: 5px 0px 5px 0px;">
><br />
> 4. Are the compilers the same on the iMac vs. AMD cluster?
</div>
<p style="margin: 0px;">No: the iMac uses GCC 4.4.4 (x86_64-apple-darwin10), while the cluster uses GCC 4.1.2 (x86_64-redhat-linux).</p>
<p style="margin: 0px;">I just did a quick yum search, and there does not seem to be a newer GCC available for the cluster. We know you are moving to CMake, but we have yet to get it working on our cluster.</p>
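<p style="margin: 0px;">For completeness, here is a quick way to compare the two toolchains side by side (illustrative commands; the /proc/cpuinfo line is Linux-only, so it applies to the cluster):</p>
<pre style="font-family: monospace; margin: 5px 0px 5px 0px;">
# Run on both the iMac and a cluster node, then diff the results:
gcc -v 2&gt;&amp;1 | tail -n 1       # compiler version and target triplet
uname -s -m                     # OS and architecture
grep -m 1 flags /proc/cpuinfo   # supported SIMD instruction sets (Linux)
</pre>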
<p style="margin: 0px;"> </p>
<p style="margin: 0px;">Thank you,</p>
<p style="margin: 0px;">TJ Mustard</p>
<p style="margin: 0px;"> </p>
<div style="margin: 5px 0px 5px 0px;">
><br />
> -Justin<br />
><br />
> > <br />
> ><br />
> > Now, I understand that my iMac works, but it has only 2 CPUs while the<br />
> > cluster has 320. Since we are running our jobs through Bennett acceptance<br />
> > ratio (BAR) FEP with 21 lambda windows, using just one 2-CPU machine<br />
> > would take too long, especially since we wish to start<br />
> > pseudo-high-throughput drug testing.<br />
> ><br />
> > <br />
> ><br />
> > <br />
> ><br />
> > In my .mdp files now, the only changes are:<br />
> ><br />
> > (the default setting is on the right of the ";")<br />
> ><br />
> > <br />
> ><br />
> > <br />
> ><br />
> > define                   =     ; =<br />
> ><br />
> > ; RUN CONTROL PARAMETERS<br />
> > integrator               = sd    ; = md<br />
> > ; Start time and timestep in ps<br />
> > tinit                    = 0    ; = 0<br />
> > dt                       = 0.004    ; = 0.001<br />
> > nsteps                   = 750000       ; = 0 (this one depends on the<br />
> > window and particular part of our job)<br />
> ><br />
> > ; OUTPUT CONTROL OPTIONS<br />
> > ; Output frequency for coords (x), velocities (v) and forces (f)<br />
> > nstxout                  = 10000    ; = 100 (to save on disk space)<br />
> > nstvout                  = 10000    ; = 100<br />
> ><br />
> > <br />
> ><br />
> > ; OPTIONS FOR ELECTROSTATICS AND VDW<br />
> > ; Method for doing electrostatics<br />
> > coulombtype              = PME    ; = Cutoff<br />
> > rcoulomb-switch          = 0    ; = 0<br />
> > rcoulomb                 = 1  ; = 1<br />
> > ; Relative dielectric constant for the medium and the reaction field<br />
> > epsilon_r                = 1    ; = 1<br />
> > epsilon_rf               = 1    ; = 1<br />
> > ; Method for doing Van der Waals<br />
> > vdw-type                 = Cut-off    ; = Cut-off<br />
> > ; cut-off lengths       <br />
> > rvdw-switch              = 0    ; = 0<br />
> > rvdw                     = 1  ; = 1<br />
> > ; Spacing for the PME/PPPM FFT grid<br />
> > fourierspacing           = 0.12    ; = 0.12<br />
> > ; EWALD/PME/PPPM parameters<br />
> > pme_order                = 4    ; = 4<br />
> > ewald_rtol               = 1e-05    ; = 1e-05<br />
> > ewald_geometry           = 3d    ; = 3d<br />
> > epsilon_surface          = 0    ; = 0<br />
> > optimize_fft             = yes    ; = no<br />
> ><br />
> > <br />
> ><br />
> > ; OPTIONS FOR WEAK COUPLING ALGORITHMS<br />
> > ; Temperature coupling <br />
> > tcoupl                   = v-rescale    ; = No<br />
> > nsttcouple               = -1    ; = -1<br />
> > nh-chain-length          = 10    ; = 10<br />
> > ; Groups to couple separately<br />
> > tc-grps                  = System    ; =<br />
> > ; Time constant (ps) and reference temperature (K)<br />
> > tau-t                    = 0.1    ; =<br />
> > ref-t                    = 300    ; =<br />
> > ; Pressure coupling     <br />
> > Pcoupl                   = Parrinello-Rahman    ; = No<br />
> > Pcoupltype               = Isotropic<br />
> > nstpcouple               = -1    ; = -1<br />
> > ; Time constant (ps), compressibility (1/bar) and reference P (bar)<br />
> > tau-p                    = 1    ; = 1<br />
> > compressibility          = 4.5e-5    ; =<br />
> > ref-p                    = 1.0    ; =<br />
> ><br />
> > <br />
> ><br />
> > ; OPTIONS FOR BONDS   <br />
> > constraints              = all-bonds    ; = none<br />
> > ; Type of constraint algorithm<br />
> > constraint-algorithm     = Lincs    ; = Lincs<br />
> ><br />
> > <br />
> ><br />
> > ; Free energy control stuff<br />
> > free-energy              = yes    ; = no<br />
> > init-lambda              = 0.00       ; = 0<br />
> > delta-lambda             = 0    ; = 0<br />
> > foreign_lambda           =        0.05 ; =<br />
> > sc-alpha                 = 0.5    ; = 0<br />
> > sc-power                 = 1.0    ; = 0<br />
> > sc-sigma                 = 0.3    ; = 0.3<br />
> > nstdhdl                  = 1    ; = 10<br />
> > separate-dhdl-file       = yes    ; = yes<br />
> > dhdl-derivatives         = yes    ; = yes<br />
> > dh_hist_size             = 0    ; = 0<br />
> > dh_hist_spacing          = 0.1    ; = 0.1<br />
> > couple-moltype           = LGD    ; =<br />
> > couple-lambda0           = vdw-q    ; = vdw-q<br />
> > couple-lambda1           = none    ; = vdw-q<br />
> > couple-intramol          = no     ;    = no<br />
> ><br />
> > <br />
> ><br />
> > <br />
> ><br />
> > Some of these settings change for position-restrained MD and energy minimization.<br />
> ><br />
> > <br />
> ><br />
> > All of these settings have come from tutorials, papers, or other people's<br />
> > advice.<br />
> ><br />
> > <br />
> ><br />
> > If it would be advantageous I can post my entire energy minimization,<br />
> > positional restraint, md, and FEP mdp files.<br />
> ><br />
> > <br />
> ><br />
> > Thank you,<br />
> ><br />
> > TJ Mustard<br />
> ><br />
> > <br />
> ><br />
> > <br />
> ><br />
> >>> <br />
> >>><br />
> >>> Below is my command sequence:<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> echo<br />
> >>> ==============================================================================================================================<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-grompp -f em.mdp -c RNAP-C_b4em.gro -p RNAP-C.top -o<br />
> >>> RNAP-C_em.tpr<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f em.mdp -c<br />
> >>> RNAP-C_b4em.gro -p RNAP-C.top -o RNAP-C_em.tpr<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-mdrun -v -s RNAP-C_em.tpr -c RNAP-C_after_em.gro -g<br />
> >>> emlog.log -cpo state_em.cpt -nt 2<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_em.tpr<br />
> >>> -c RNAP-C_after_em.gro -g emlog.log -cpo state_em.cpt -nt 2<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-grompp -f pr.mdp -c RNAP-C_after_em.gro -p RNAP-C.top -o<br />
> >>> RNAP-C_pr.tpr<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f pr.mdp -c<br />
> >>> RNAP-C_after_em.gro -p RNAP-C.top -o RNAP-C_pr.tpr<br />
> >>> echo g453s-mdrun -v -s RNAP-C_pr.tpr -e pr.edr -c RNAP-C_after_pr.gro<br />
> >>> -g prlog.log -cpo state_pr.cpt -nt 2 -dhdl dhdl-pr.xvg<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_pr.tpr<br />
> >>> -e pr.edr -c RNAP-C_after_pr.gro -g prlog.log -cpo state_pr.cpt -nt 2<br />
> >>> -dhdl dhdl-pr.xvg<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-grompp -f md.mdp -c RNAP-C_after_pr.gro -p RNAP-C.top -o<br />
> >>> RNAP-C_md.tpr<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f md.mdp -c<br />
> >>> RNAP-C_after_pr.gro -p RNAP-C.top -o RNAP-C_md.tpr<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-mdrun -v -s RNAP-C_md.tpr -o RNAP-C_md.trr -c<br />
> >>> RNAP-C_after_md.gro -g md.log -e md.edr -cpo state_md.cpt -nt 2 -dhdl<br />
> >>> dhdl-md.xvg<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_md.tpr<br />
> >>> -o RNAP-C_md.trr -c RNAP-C_after_md.gro -g md.log -e md.edr -cpo<br />
> >>> state_md.cpt -nt 2 -dhdl dhdl-md.xvg<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-grompp -f FEP.mdp -c RNAP-C_after_md.gro -p RNAP-C.top -o<br />
> >>> RNAP-C_fep.tpr<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f FEP.mdp -c<br />
> >>> RNAP-C_after_md.gro -p RNAP-C.top -o RNAP-C_fep.tpr<br />
> >>> date >>RNAP-C.joblog<br />
> >>> echo g453s-mdrun -v -s RNAP-C_fep.tpr -o RNAP-C_fep.trr -c<br />
> >>> RNAP-C_after_fep.gro -g fep.log -e fep.edr -cpo state_fep.cpt -nt 2<br />
> >>> -dhdl dhdl-fep.xvg<br />
> >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_fep.tpr<br />
> >>> -o RNAP-C_fep.trr -c RNAP-C_after_fep.gro -g fep.log -e fep.edr -cpo<br />
> >>> state_fep.cpt -nt 2 -dhdl dhdl-fep.xvg<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> <br />
> >>><br />
> >>> I can post my .mdp files, but I do not think they are the problem,<br />
> >>> since I know the job works on my personal iMac.<br />
> >>><br />
> >>> <br />
> >>><br />
> >>> Thank you,<br />
> >>><br />
> >>> TJ Mustard<br />
> >>> Email: mustardt@onid.orst.edu<br />
> >>><br />
> >><br />
> > <br />
> ><br />
> > TJ Mustard<br />
> > Email: mustardt@onid.orst.edu<br />
> ><br />
><br />
> --<br />
> ========================================<br />
><br />
> Justin A. Lemkul<br />
> Ph.D. Candidate<br />
> ICTAS Doctoral Scholar<br />
> MILES-IGERT Trainee<br />
> Department of Biochemistry<br />
> Virginia Tech<br />
> Blacksburg, VA<br />
> jalemkul[at]vt.edu | (540) 231-9080<br />
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin<br />
><br />
> ========================================<br />
> --<br />
> gmx-users mailing list    gmx-users@gromacs.org<br />
> http://lists.gromacs.org/mailman/listinfo/gmx-users<br />
> Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!<br />
> Please don't post (un)subscribe requests to the list. Use the<br />
> www interface or send it to gmx-users-request@gromacs.org.<br />
> Can't post? Read http://www.gromacs.org/Support/Mailing_Lists<br />
>
</div>
<p style="margin: 0px;"> </p>
<p style="font-family: monospace; white-space: nowrap; margin: 5px 0px 5px 0px;">TJ Mustard<br />
Email: mustardt@onid.orst.edu</p>
</body>
</html>