Hi again devs,<div><br></div><div>We&#39;ve got our fortnightly teleconference scheduled again this Wednesday. Thinking of topics to discuss has been a bit challenging - they can be neither so vague that we can&#39;t decide anything nor so detailed that only a few people can make useful input. Suggestions are most welcome!</div>

<div><br></div><div>So far I&#39;ve come up with</div><div><br></div><div>1. replacing rvec with something more friendly to C++</div><div>2. coding strategy for whatever will replace do_md()</div><div><br></div><div>I&#39;ve put some initial thoughts on these together, which you can find below. If someone can identify other suitable topics, do speak up.</div>

<div><br></div><div>Details will be the same as last time </div><div>* a Google Hangout will be run by the <a href="mailto:mark.abraham@scilifelab.se">mark.abraham@scilifelab.se</a> account. Please mail that account from the Google account with which you might want to connect, so that I can have you in the Circle before the meeting is due to start</div>

<div>* start 6pm, end 6:30pm Stockholm time, Wed 20 Feb (should be during working hours for Americans)</div><div>* if there&#39;s interest in continued discussion, perhaps on implementation details, those people can continue on after 6:30 </div>

<div>* please use the best quality hardware and connection you reasonably can (not on your laptop at the local cafe, or with your kids screaming at you). Know how to mute yourself, or we might have to drop you!</div><div>

* I&#39;ll issue the hangout invitation shortly after 5pm if you want to test your connection or setup</div><div>* I&#39;ll post a summary after the meeting of what was discussed/decided/whatever</div><div><br></div><div>

People who haven&#39;t attended before are welcome. We had 10 connections last time and things were pretty good, so the technology seems to scale reasonably well. If you&#39;re new, please let me know which part(s) of the meeting programme are of interest to you so I can help manage discussion suitably.</div>

<div><br></div><div>If you can&#39;t attend, please feel free to contribute in this thread, or email me, etc.</div><div><br></div><div>Cheers,</div><div><br></div><div>Mark</div><div>GROMACS development manager</div><div>

<br></div><div>Thoughts for Wed 20 Feb</div><div>========</div><div><br></div><div><b id="internal-source-marker_0.01790935220196843" style="font-family:Times;font-size:medium;font-weight:normal"><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">1. planning for an internal coordinate format</span><br>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">can’t keep using rvec</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">rvec can’t be put into STL containers (need copy constructor, etc.)</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">rvec guarantees we can’t use aligned loads anywhere (important for leveraging SIMD possibilities)</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">makes using RAII harder</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">probably makes writing const-correct code harder</span></li></ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">we want to be able to use STL containers when that makes code writing, review and maintenance easier</span></li><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">we need to be able to get flat C arrays of atom coordinates with no overhead for compute kernels</span></li><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">straightforward suggestion: switch to using an RVec class with a 4-tuple of reals and use them for x, y, z and q</span></li><ul style="margin-top:0pt;margin-bottom:0pt">

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">in many places q won’t be used</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">16-byte alignment for free (opportunities for compiler auto-SIMD)</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">perhaps 4/3 increase in cache traffic where q is not being used</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">std::vector&lt; std::vector&lt;real&gt; &gt; doesn’t map to a flat C array - need to write/find a “tuple” class that lets the compiler know what is going on, so that std::vector&lt; tuple&lt;real,4&gt; &gt; ends up as a flat C array of xyzqxyzqxyzq...</span></li>

</ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">separate vectors for x, y, z and q could be useful because that would help avoid the swizzling (group kernels) and coordinate copying (Verlet kernels) that currently occurs</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">downside is that x, y, and z are normally used together, so a naive approach pretty much guarantees we need 3 cache lines for each point... if we don’t re-use that data a few times, that could kill us</span></li>

</ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">internally use some kind of “packed rvec” laid out xxxxyyyyzzzz(qqqq) and have some kind of intelligent object that we can use just like we use rvec now, e.g. coords[3][YY] magically returns the 8th element of xxxxyyyyzzzz</span></li>

<li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><b id="internal-source-marker_0.01790935220196843" style="font-family:Times;font-size:medium;font-weight:normal"><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">the needs of mdrun and analysis tools are different, and we can perhaps explore different implementations for each - but a common interface would be highly desirable</span></b></li>

<li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">ideally we would not commit in 2013 to an internal representation that we might regret in the future... how can we plan to be flexible?</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">run-time polymorphism, e.g. have the coordinate representation classes share a common base with virtual functions - probably too slow, and we don’t want to store the virtual function tables</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">code versioning - ugh</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">bury our heads in the sand - we might get lucky and never want to change our coordinate representation</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">compile-time polymorphism, e.g. mdrun&lt;RVec&gt; vs mdrun&lt;PackedRVec,4&gt;</span></li><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">might also allow a more elegant implementation of double- vs mixed-precision</span></li><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">code bloat if we want binaries that can run on any x86 CPU and different CPUs want different packings</span></li><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">compile-time bloat if compiling more than one such representation, as a lot of routines would now be parameterized</span></li></ul></ul></ul><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">2. planning for do_md()</span><br>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><a href="http://redmine.gromacs.org/issues/1137"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://redmine.gromacs.org/issues/1137</span></a><span style="vertical-align:baseline;white-space:pre-wrap"> discusses some thoughts about how we might like to make the integrator more awesome</span></li>

<li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Main loop inside do_md() is currently ~1300 lines, mostly with heavily nested conditionality</span></li>

<li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Currently, the need to pass lots of arguments to and from the functions it calls limits our ability to change anything, else we could probably break it into</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">ManageSpecialCases()</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">DoNeighbourSearching()</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">CalculateForces()</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">DoFirstUpdate()</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">WriteTrajectories()</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">DoSecondUpdate()</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">WriteEnergies()</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">MoreManagementOfSpecialCases()</span></li><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">PrepareForNextIteration()</span></li></ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">In C++, being able to construct an MDLoop object that contains (lots of) objects that already have their own “constant” data will mean we only need to pass to methods of those objects any remaining control values for the current operation</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">could state information be passed by letting the MDLoop own that data and having the object implementing the strategy ask for what it needs?</span></li>

</ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Those objects will have a lot of inter-relationships, so probably need a common interface for (say) thermostat algorithms so that (say) the MDLoop update method knows it can just call (say) the thermostat object’s method and the result will be correct, whether there’s a barostat involved, or not</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">easily done with an (abstract?) base class and overriding virtual functions</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">however, that kind of *dynamic-binding* run-time polymorphism is overkill - likely any simulation knows before it gets into the main loop that it’s only ever going to call (say) AndersenThermostat’s methods</span></li>

<li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">the overhead from such function calls is probably not a big deal - this loop is always going to be heavily dominated by CalculateForces()</span></li>

<li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">inheritance can maximise code re-use</span></li></ul><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline">

<span style="vertical-align:baseline;white-space:pre-wrap">can be done by having function pointers that get set up correctly in the MDLoop constructor (i.e. “static” run-time polymorphism, as dictated by the .tpr)</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">this might lead to code duplication?</span></li>

<li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">might lead to the current kind of conditional-heavy code, because it is now the coder’s job to choose the right code path, but hopefully only in construction</span></li>

</ul><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">could be done with compile-time polymorphism (i.e. templates)</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:square;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">lots of duplicated object code because of the explosion of templated possibilities</span></li>

</ul></ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Need to bear in mind that probably this pretty front end will be queueing up work requests that will be dynamically dispatched to available hardware (obviously the dispatcher will focus on hardware that has the right data locality). That seems OK to Mark:</span></li>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">we need an interface that makes it reasonably easy to see that the physics of our algorithm should be working</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">how the work gets done *should* be somewhat opaque to MDLoop</span></li>

<li dir="ltr" style="list-style-type:circle;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">separating the two makes for future extensibility and customizability</span></li>

</ul><li dir="ltr" style="list-style-type:disc;font-size:15px;font-family:Arial;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">perhaps a good way to start to get a handle on what kinds of objects and relationships we need is to make an ideal flowchart for a plausible subset of mdrun functionality, and see what data has to be known where. Perhaps Michael can sketch something for us that illustrates what the algorithmic requirements of a “full Trotter decomposition framework” would be. (But probably not in time for this week!)</span></li>

</ul></b></div>
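<div><br></div><div>A possible sketch for topic 1, under stated assumptions: real is taken to be float, and RVec4, PackedCoords, AtomRef and QQ are hypothetical names used only for illustration, not existing GROMACS types. It shows the 4-tuple idea, where std::vector&lt;RVec4&gt; is laid out as xyzqxyzq..., alongside a “packed” container laid out xxxxyyyyzzzz that can still be indexed as coords[atom][dim].</div>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: real is float (mixed precision); GROMACS picks this at build time.
typedef float real;
enum { XX = 0, YY = 1, ZZ = 2, QQ = 3 }; // QQ is a hypothetical name for the charge slot

// Option A: one 4-tuple per atom, stored as x, y, z, q.
struct RVec4
{
    real v[4];
    real&       operator[](std::size_t d)       { return v[d]; }
    const real& operator[](std::size_t d) const { return v[d]; }
};
// No padding, so std::vector<RVec4> occupies one contiguous run of reals.
static_assert(sizeof(RVec4) == 4 * sizeof(real), "RVec4 must not be padded");

// Option B: packs of `width` atoms stored as xxxxyyyyzzzz, indexed like rvec.
class PackedCoords
{
public:
    static constexpr std::size_t width = 4;

    // Storage is rounded up to a whole number of packs and zero-filled.
    explicit PackedCoords(std::size_t natoms)
        : natoms_(natoms), data_(3 * width * ((natoms + width - 1) / width), real(0))
    {
    }

    // Lightweight handle returned by coords[atom]; coords[atom][dim] then
    // resolves to the matching slot inside that atom's pack.
    class AtomRef
    {
    public:
        AtomRef(real* pack, std::size_t lane) : pack_(pack), lane_(lane) {}
        real& operator[](std::size_t dim) const { return pack_[dim * width + lane_]; }

    private:
        real*       pack_;
        std::size_t lane_;
    };

    AtomRef operator[](std::size_t atom)
    {
        assert(atom < natoms_);
        return AtomRef(&data_[3 * width * (atom / width)], atom % width);
    }

    // Flat C array for compute kernels.
    const real* flat() const { return data_.data(); }

private:
    std::size_t       natoms_;
    std::vector<real> data_;
};
```

<div>With a pack width of 4, coords[3][YY] resolves to offset 1*4 + 3 = 7 in the first pack, i.e. the 8th element of xxxxyyyyzzzz. Treating a vector of RVec4 as a flat array of reals relies on the struct having no padding, which the static_assert checks; whether the storage is also 16-byte aligned depends on the allocator.</div>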
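<div><br></div><div>A minimal sketch for topic 2, again under assumptions: Thermostat, NoThermostat, ScalingThermostat and RunSettings are hypothetical stand-ins (RunSettings plays the role of whatever would be read from the .tpr), and the force calculation and update steps are omitted. It illustrates an MDLoop that owns its state and picks the thermostat once, in the constructor, behind a common abstract interface.</div>

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

typedef float real; // assumption: mixed precision

// Hypothetical common interface for thermostat algorithms.
class Thermostat
{
public:
    virtual ~Thermostat() {}
    // Adjust the velocities in place for one step.
    virtual void apply(std::vector<real>& velocities) = 0;
};

// Used when the run has no temperature coupling.
class NoThermostat : public Thermostat
{
public:
    void apply(std::vector<real>&) override {}
};

// Toy stand-in: rescales by a fixed factor held as its own "constant" data.
class ScalingThermostat : public Thermostat
{
public:
    explicit ScalingThermostat(real factor) : factor_(factor) {}
    void apply(std::vector<real>& velocities) override
    {
        for (real& v : velocities)
        {
            v *= factor_;
        }
    }

private:
    real factor_;
};

// Hypothetical run settings, standing in for what the .tpr would dictate.
struct RunSettings
{
    bool useThermostat;
    real scalingFactor;
    int  nsteps;
};

class MDLoop
{
public:
    MDLoop(const RunSettings& settings, std::vector<real> velocities)
        : nsteps_(settings.nsteps), velocities_(std::move(velocities))
    {
        assert(settings.nsteps >= 0);
        // The only branch on the algorithm choice lives in construction.
        if (settings.useThermostat)
        {
            thermostat_.reset(new ScalingThermostat(settings.scalingFactor));
        }
        else
        {
            thermostat_.reset(new NoThermostat);
        }
    }

    void run()
    {
        for (int step = 0; step != nsteps_; ++step)
        {
            // Force calculation, updates and output are omitted in this sketch.
            thermostat_->apply(velocities_);
        }
    }

    const std::vector<real>& velocities() const { return velocities_; }

private:
    int                         nsteps_;
    std::vector<real>           velocities_;
    std::unique_ptr<Thermostat> thermostat_;
};
```

<div>The loop body contains no conditionals on the algorithm choice; swapping the virtual call for a function pointer or a template parameter would change how apply() is bound, not the shape of run().</div>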