On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas <renatoffs@gmail.com> wrote:
> Hi gromacs users,
>
> I have installed the latest version of GROMACS (4.5.1) on an i7 980X
> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled
> its MPI version. I also compiled the GPU-accelerated version of
> GROMACS. Then I ran a 2 ns simulation of a small system (11042 atoms)
> to compare the performance of mdrun-gpu vs. mdrun_mpi. The results I
> got are below:
>
> ############################################
> My *.mdp is:
>
> constraints      = all-bonds
> integrator       = md
> dt               = 0.002    ; ps !
> nsteps           = 1000000  ; total 2000 ps
> nstlist          = 10
> ns_type          = grid
> coulombtype      = PME
> rvdw             = 0.9
> rlist            = 0.9
> rcoulomb         = 0.9
> fourierspacing   = 0.10
> pme_order        = 4
> ewald_rtol       = 1e-5
> vdwtype          = cut-off
> pbc              = xyz
> epsilon_rf       = 0
> comm_mode        = linear
> nstxout          = 1000
> nstvout          = 0
> nstfout          = 0
> nstxtcout        = 1000
> nstlog           = 1000
> nstenergy        = 1000
> ; Berendsen temperature coupling is on
> tcoupl           = berendsen
> tc-grps          = system
> tau-t            = 0.1
> ref-t            = 298
> ; Pressure coupling is on
> Pcoupl           = berendsen
> pcoupltype       = isotropic
> tau_p            = 0.5
> compressibility  = 4.5e-5
> ref_p            = 1.0
> ; Velocity generation is off
> gen_vel          = no
>
> ########################
> RUNNING GROMACS ON GPU
>
> mdrun-gpu -s topol.tpr -v >& out &
>
> Here is a part of the md.log:
>
> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
> ...
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> Computing:      Nodes   Number    G-Cycles    Seconds      %
> -------------------------------------------------------------
> Write traj.         1     1021     106.075       31.7    0.2
> Rest                1            64125.577    19178.6   99.8
> -------------------------------------------------------------
> Total               1            64231.652    19210.3  100.0
> -------------------------------------------------------------
>
>                NODE (s)   Real (s)      (%)
> Time:          6381.840  19210.349     33.2
>                        1h46:21
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:      0.000      0.001     27.077      0.886
>
> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
>
> ########################
> RUNNING GROMACS ON MPI
>
> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
>
> Here is a part of the md.log:
>
> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
>
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> Computing:           Nodes   Number     G-Cycles    Seconds      %
> --------------------------------------------------------------------
> Domain decomp.           3   100001     1452.166      434.7    0.6
> DD comm. load            3    10001        0.745        0.2    0.0
> Send X to PME            3  1000001      249.003       74.5    0.1
> Comm. coord.             3  1000001      637.329      190.8    0.3
> Neighbor search          3   100001     8738.669     2616.0    3.5
> Force                    3  1000001    99210.202    29699.2   39.2
> Wait + Comm. F           3  1000001     3361.591     1006.3    1.3
> PME mesh                 3  1000001    66189.554    19814.2   26.2
> Wait + Comm. X/F         3             60294.513     8049.5   23.8
> Wait + Recv. PME F       3  1000001      801.897      240.1    0.3
> Write traj.              3     1015       33.464       10.0    0.0
> Update                   3  1000001     3295.820      986.6    1.3
> Constraints              3  1000001     6317.568     1891.2    2.5
> Comm. energies           3   100002       70.784       21.2    0.0
> Rest                     3              2314.844      693.0    0.9
> --------------------------------------------------------------------
> Total                    6            252968.148    75727.5  100.0
> --------------------------------------------------------------------
> --------------------------------------------------------------------
> PME redist. X/F          3  2000002     1945.551      582.4    0.8
> PME spread/gather        3  2000002    37219.607    11141.9   14.7
> PME 3D-FFT               3  2000002    21453.362     6422.2    8.5
> PME solve                3  1000001     5551.056     1661.7    2.2
> --------------------------------------------------------------------
>
> Parallel run - timing based on wallclock.
>
>                NODE (s)   Real (s)      (%)
> Time:         12621.257  12621.257    100.0
>                        3h30:21
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    388.633     28.773     13.691      1.753
>
> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
>
> ######################################
> Comparing the performance values for the two simulations, I saw that
> in "numeric terms" the simulation using the GPU gave (for example)
> ~27 ns/day, while with MPI this value is approximately half
> (13.7 ns/day).
> However, when I compared the times at which each simulation started
> and finished, the MPI simulation took 211 minutes while the GPU
> simulation took 320 minutes to finish.
>
> My questions are:
>
> 1. Why did I get better results with the GPU in the performance values?

Your CPU version can probably be optimized a bit. You should use HT and run on 12 threads. Make sure PME/PP is balanced, and use the best rlist/fourierspacing ratio. Also, your PME accuracy is rather high; make sure you need that (a fourier spacing of 0.11 should be accurate enough for an rlist of 0.9). Your PME nodes spent 23% of their time waiting on the PP nodes.
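For a starting point, something like the following (just a sketch; the flags are from memory, so check g_tune_pme -h). In the .mdp, coarsen the PME grid to shift load from the PME to the PP nodes:

fourierspacing = 0.11    ; was 0.10; ~0.11 should suffice for rlist = 0.9

Then let g_tune_pme (shipped with 4.5) benchmark several PP/PME splits on all 12 hardware threads and launch the fastest one it finds:

g_tune_pme -np 12 -s topol.tpr -launch

It writes a summary of the tested splits (perf.out by default), where you can also check whether the PME wait time goes away.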
> 2. Why was the simulation running on the GPU 109 min slower than on 6
> cores, given that my video card is a GTX 480 with 480 GPU cores? I was
> expecting that the GPU would greatly accelerate the simulations.

The output you posted says the GPU version was faster (running for only 106 min of node time). The CPU cores are much more powerful; I would expect them to be about as fast as the GPU.
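To make the numbers concrete (taking the 2 ns trajectory and the times from your two logs):

GPU, from NODE time:    2 ns / 6381.8 s  * 86400 s/day ≈ 27.1 ns/day
GPU, from real time:    2 ns / 19210.3 s * 86400 s/day ≈  9.0 ns/day
MPI, wall clock:        2 ns / 12621.3 s * 86400 s/day ≈ 13.7 ns/day

So the GPU log's 27 ns/day is computed from node time, while the parallel run reports wall-clock-based numbers; measured by the wall clock, the 6-core run was the faster of the two.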
Roland