<div dir="ltr"><div dir="ltr"><div>Hi Guido,</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 1, 2021 at 5:34 PM Guido Giuntoli &lt;<a href="mailto:guido.giuntoli@huawei.com" target="_blank">guido.giuntoli@huawei.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div lang="EN-US">

<div>

<p class="MsoNormal">Hi,<u></u><u></u></p>

<p class="MsoNormal"><u></u>&nbsp;<u></u></p>

<p class="MsoNormal">I am searching for representative input cases for GROMACS for performance evaluation.<u></u><u></u></p>

<p class="MsoNormal"><u></u>&nbsp;<u></u></p>

<p class="MsoNormal">The selection criteria are, in priority order:<u></u><u></u></p>

<p style="margin-left:38.25pt">

<u></u><span>1.<span style="font:7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

</span></span><u></u>inputs that are good representatives for the GROMACS user community<u></u><u></u></p>

<p style="margin-left:38.25pt">

<u></u><span>2.<span style="font:7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

</span></span><u></u>desirable to run on a single node (approx. 96 ARM cores, CPU-only) in no more than 1 hour</p></div></div></blockquote><div>Note that a molecular dynamics simulation consists of many consecutive iterations that typically have the same average computational cost over time. Because iterations are short (milliseconds or less), reliable benchmarking can generally be done in a relatively short wall-time. Benchmarks a few minutes long are typically sufficient, provided that no external factors with longer timescales affect performance, e.g. system temperature stabilization and the related clock throttling.<br></div><div><br></div><div>The run length can be controlled from the command line, see</div><div><a href="https://manual.gromacs.org/current/user-guide/mdrun-features.html#controlling-the-length-of-the-simulation">https://manual.gromacs.org/current/user-guide/mdrun-features.html#controlling-the-length-of-the-simulation</a></div><div><br></div><div>A typical command-line invocation for a 5-minute benchmark with counter resetting is:</div><div>gmx mdrun -nsteps -1 -maxh 0.083 -resethway<br></div><div>Here -nsteps -1 removes the step limit, -maxh 0.083 stops the run after roughly 5 minutes, and -resethway resets the performance counters halfway through the run.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div><p style="margin-left:38.25pt"><u></u><u></u></p>

<p style="margin-left:38.25pt">

<span>3.<span style="font:7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

</span></span>makes good use of the memory system and exploits the SIMD units well (NEON and possibly SVE)</p></div></div></blockquote><div>Molecular dynamics simulations, in particular those of interest to a large part of the GROMACS community, run in the strong-scaling regime, where all working sets typically fit in the L3 (and often the L2) cache. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div><p style="margin-left:38.25pt"><u></u><u></u></p>

<p class="MsoNormal">Currently I have two input cases in mind:<u></u><u></u></p>

<p><u></u><span>1.<span style="font:7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

The “ion channel” case, which belongs to the Unified European Applications Benchmark Suite (UEABS). This input seems to be a good representative of protein and polymer simulations in industry (<a href="https://hpc.nih.gov/apps/gromacs/" target="_blank">https://hpc.nih.gov/apps/gromacs/</a>).

<u></u><u></u></p>

<p><u></u><span>2.<span style="font:7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

</span></span><u></u>The “<b>nonbonded-benchmark</b>”, which is integrated into GROMACS. This benchmark would allow analyzing different problem sizes and internal kernels (<a href="https://manual.gromacs.org/documentation/current/onlinehelp/gmx-nonbonded-benchmark.html" target="_blank">https://manual.gromacs.org/documentation/current/onlinehelp/gmx-nonbonded-benchmark.html</a>).</p></div></div></blockquote><div>Those are reasonable, but given my point above regarding strong scaling, do make sure you focus on the right parallelization regime, so that the benchmark is representative of real-world use. Unless you run in a heterogeneous setup (CPU+GPU), the main regime of interest is 50-1000 atoms/core, with per-iteration times on the order of milliseconds. Hence, for a 96-core server, the relevant use cases fall in the range of about 5000-100000 atoms (96 cores at 50-1000 atoms/core gives 4800-96000). The &quot;ion channel&quot; case is at the larger end of this range. <br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div><p><u></u><u></u></p>

<p class="MsoNormal">I would appreciate some guidance or feedback about these two input cases, as well as suggestions for others to consider. Are there good representatives of what the community normally uses when running GROMACS? Are there special parameters (such as time steps or problem size) to take into account before concluding that the simulations are good examples of normal GROMACS executions?</p></div></div></blockquote><div><br></div><div><div>You can find further test cases here: <a href="https://zenodo.org/record/3893789">https://zenodo.org/record/3893789</a></div><div><br></div><div>Cheers,</div><div>--</div><div>Szilárd</div><br></div></div></div>