Hi,

On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas <renatoffs@gmail.com> wrote:
> Do you think that the "NODE" and "Real" time difference could be
> attributed to some compilation problem in mdrun-gpu? Although I'm
> asking this, I didn't get any error during the compilation.

It is very odd that these are different for your system. What operating system and compiler do you use?

Is HAVE_GETTIMEOFDAY set in src/config.h?

I attached a small test program which uses the two different timers used for NODE and Real time. You can compile it with "cc time.c -o time" and run it with ./time. Do you get roughly the same time twice with the test program, or do you see the same discrepancy as with GROMACS?
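
For anyone reading the archive without the attachment: a minimal sketch of this kind of timer comparison could look like the program below. It is purely illustrative, not the actual time.c; it assumes the NODE-style timer is process CPU time from clock() and the Real-style timer is wall-clock time from gettimeofday().

/* timer_check.c - hypothetical sketch, not the attached time.c.
 * Burns a few seconds of CPU and reports the elapsed time measured
 * two ways: process CPU time (clock()) and wall-clock time
 * (gettimeofday()). For a single-threaded run on a healthy system
 * the two values should agree closely. */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval  wall0, wall1;
    clock_t         cpu0, cpu1;
    volatile double x = 0.0;   /* volatile so the loop is not optimized away */
    long            i;

    gettimeofday(&wall0, NULL);
    cpu0 = clock();

    /* Plain busy loop: roughly a few seconds of CPU work */
    for (i = 0; i < 1000000000L; i++)
    {
        x += 1e-9*(double)i;
    }

    cpu1 = clock();
    gettimeofday(&wall1, NULL);

    printf("CPU  time: %.3f s\n", (double)(cpu1 - cpu0)/CLOCKS_PER_SEC);
    printf("Wall time: %.3f s\n",
           (wall1.tv_sec - wall0.tv_sec) + 1e-6*(wall1.tv_usec - wall0.tv_usec));

    return 0;
}

If the two printed values already differ by roughly 3x for such a trivial program, that would point to the system's timer support rather than to anything in GROMACS.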

Roland

Thanks,

Renato

2010/10/22 Szilárd Páll <szilard.pall@cbr.su.se>:
> Hi Renato,
>
> First of all, what you're seeing is pretty normal, especially since you
> have a CPU that is crossing the border of insane :) Why is it normal?
> The PME algorithms are simply not very well suited for current GPU
> architectures. With an ill-suited algorithm you won't be able to see
> the speedups you can often see in other application areas - even more
> so as you're comparing to Gromacs on an i7 980X. For more info +
> benchmarks see the Gromacs-GPU page:
> http://www.gromacs.org/gpu
>
> However, there is one strange thing you also pointed out. The fact
> that the "NODE" and "Real" time in your mdrun-gpu timing summary are
> not the same, but deviate by 3x, is _very_ unusual. I've run
> mdrun-gpu on quite a wide variety of hardware but I've never seen
> those two counters deviate. It might be an artifact from the cycle
> counters used internally behaving in an unusual way on your CPU.
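
As a generic illustration of how such a cycle-counter artifact can arise (this is not GROMACS code): if raw time-stamp counter readings are converted to seconds using an assumed, fixed clock rate, any mismatch between the assumed and the effective rate (frequency scaling, Turbo Boost) shows up directly as a wrong time.

/* cycles_demo.c - generic illustration, not GROMACS code.
 * Converts a raw x86 TSC difference to seconds with an assumed fixed
 * clock rate; if the effective rate differs from the assumed one, the
 * derived seconds are off by the same factor. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    const double assumed_hz = 3.33e9;   /* assumption: nominal i7 980X clock */
    uint64_t     c0, c1;

    c0 = read_tsc();
    sleep(2);                           /* ~2 s of wall-clock time */
    c1 = read_tsc();

    printf("cycles elapsed: %llu -> %.3f s at the assumed rate (expect ~2.0)\n",
           (unsigned long long)(c1 - c0), (double)(c1 - c0)/assumed_hz);
    return 0;
}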
>
> One other thing I should point out is that you would be better off
> using the standard mdrun which in 4.5 by default has thread-support
> and therefore will run on a single cpu/node without MPI!
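
For reference, a single-node threaded run along those lines (assuming the standard mdrun binary built with the default thread-MPI support in 4.5, no mpirun needed) would be something like:

mdrun -nt 6 -s topol.tpr -v >& out &

where -nt sets the number of threads; omitting it lets mdrun pick the thread count automatically.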
>
> Cheers,
> --
> Szilárd
>
> On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas <renatoffs@gmail.com> wrote:
>> Hi gromacs users,
>>
>> I have installed the latest version of gromacs (4.5.1) on an i7 980X
>> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
>> MPI version. I also compiled the GPU-accelerated version of gromacs.
>> Then I did a 2 ns simulation of a small system (11042 atoms) to compare
>> the performance of mdrun-gpu vs mdrun_mpi. The results that I got are
>> below:
>>
>> ############################################
>> My *.mdp is:
>>
>> constraints = all-bonds
>> integrator = md
>> dt = 0.002 ; ps !
>> nsteps = 1000000 ; total 2000 ps.
>> nstlist = 10
>> ns_type = grid
>> coulombtype = PME
>> rvdw = 0.9
>> rlist = 0.9
>> rcoulomb = 0.9
>> fourierspacing = 0.10
>> pme_order = 4
>> ewald_rtol = 1e-5
>> vdwtype = cut-off
>> pbc = xyz
>> epsilon_rf = 0
>> comm_mode = linear
>> nstxout = 1000
>> nstvout = 0
>> nstfout = 0
>> nstxtcout = 1000
>> nstlog = 1000
>> nstenergy = 1000
>> ; Berendsen temperature coupling is on in four groups
>> tcoupl = berendsen
>> tc-grps = system
>> tau-t = 0.1
>> ref-t = 298
>> ; Pressure coupling is on
>> Pcoupl = berendsen
>> pcoupltype = isotropic
>> tau_p = 0.5
>> compressibility = 4.5e-5
>> ref_p = 1.0
>> ; Generate velocites is on at 298 K.
>> gen_vel = no
>>
>> ########################
>> RUNNING GROMACS ON GPU
>>
>> mdrun-gpu -s topol.tpr -v >& out &
>>
>> Here is a part of the md.log:
>>
>> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
>> .
>> .
>> .
>> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>>  Write traj.            1       1021      106.075       31.7     0.2
>>  Rest                   1                64125.577    19178.6    99.8
>> -----------------------------------------------------------------------
>>  Total                  1                64231.652    19210.3   100.0
>> -----------------------------------------------------------------------
>>
>>                NODE (s)   Real (s)      (%)
>>        Time:   6381.840  19210.349     33.2
>>                        1h46:21
>>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
>> Performance:      0.000      0.001     27.077      0.886
>>
>> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
>>
>> ########################
>> RUNNING GROMACS ON MPI
>>
>> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
>>
>> Here is a part of the md.log:
>>
>> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
>>
>> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>>  Domain decomp.         3     100001     1452.166      434.7     0.6
>>  DD comm. load          3      10001        0.745        0.2     0.0
>>  Send X to PME          3    1000001      249.003       74.5     0.1
>>  Comm. coord.           3    1000001      637.329      190.8     0.3
>>  Neighbor search        3     100001     8738.669     2616.0     3.5
>>  Force                  3    1000001    99210.202    29699.2    39.2
>>  Wait + Comm. F         3    1000001     3361.591     1006.3     1.3
>>  PME mesh               3    1000001    66189.554    19814.2    26.2
>>  Wait + Comm. X/F       3               60294.513     8049.5    23.8
>>  Wait + Recv. PME F     3    1000001      801.897      240.1     0.3
>>  Write traj.            3       1015       33.464       10.0     0.0
>>  Update                 3    1000001     3295.820      986.6     1.3
>>  Constraints            3    1000001     6317.568     1891.2     2.5
>>  Comm. energies         3     100002       70.784       21.2     0.0
>>  Rest                   3                2314.844      693.0     0.9
>> -----------------------------------------------------------------------
>>  Total                  6              252968.148    75727.5   100.0
>> -----------------------------------------------------------------------
>> -----------------------------------------------------------------------
>>  PME redist. X/F        3    2000002     1945.551      582.4     0.8
>>  PME spread/gather      3    2000002    37219.607    11141.9    14.7
>>  PME 3D-FFT             3    2000002    21453.362     6422.2     8.5
>>  PME solve              3    1000001     5551.056     1661.7     2.2
>> -----------------------------------------------------------------------
>>
>> Parallel run - timing based on wallclock.
>>
>>                NODE (s)   Real (s)      (%)
>>        Time:  12621.257  12621.257    100.0
>>                        3h30:21
>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>> Performance:    388.633     28.773     13.691      1.753
>> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
>>
>> ######################################
>> Comparing the performance values for the two simulations, I saw that in
>> "numeric terms" the simulation using the GPU gave (for example) ~27
>> ns/day, while with MPI this value is approximately half (13.7 ns/day).
>> However, when I compared the times at which each simulation
>> started/finished, the MPI simulation took 211 minutes while the GPU
>> simulation took 320 minutes to finish.
>>
>> My questions are:
>>
>> 1. Why do the performance values show better results for the GPU?
>>
>> 2. Why was the simulation running on the GPU 109 min. slower than on 6
>> cores, since my video card is a GTX 480 with 480 GPU cores? I was
>> expecting that the GPU would accelerate the simulations greatly.
>>
>> Does anyone have some idea?
>>
>> Thanks,
>>
>> Renato

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309