<p dir="ltr">Hi,</p>
<p dir="ltr">There is a real performance difference, but it's smaller than I first thought. We should write a few line test program to measure the actual overhead and look at understandable assembly.<br>
In any case, we should use a pointer from .data() in loops. This is automatically enforced by all current kernel functions, which take either rvec *of real * as arguments.</p>
<p dir="ltr">Cheers,</p>
<p dir="ltr">Berk</p>
<div class="gmail_quote">On Mar 23, 2016 3:09 PM, Berk Hess &lt;hess@kth.se&gt; wrote:<br type='attribution'><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div>
    <div>Hi,<br />
      <br />
      It could be that I was wrong. I can&#39;t reproduce the performance
      difference in this particular loop. It might have been caused by
      the extra vector copy (but I am pretty sure I checked without as
      well). So it looks like there is no issue with
      std::vector&lt;RVec&gt;<br />
      <br />
      Szilard observation of performance degradation between my vsite
      patch with and without std::vector is still there though. We&#39;ll
      track down where this comes from.<br />
      <br />
      Cheers,<br />
      <br />
      Berk<br />
      <br />
      On 2016-03-23 14:46, Berk Hess wrote:<br />
    </div>
    <blockquote>
      
      <div>On 2016-03-23 14:18, Mark Abraham
        wrote:<br />
      </div>
      <blockquote>
        <div dir="ltr">Hi,<br />
          <br />
          <div class="elided-text">
            <div dir="ltr">On Wed, Mar 23, 2016 at 1:56 PM Berk Hess
              &lt;<a href="mailto:hess&#64;kth.se">hess&#64;kth.se</a>&gt; wrote:<br />
            </div>
            <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>Hi,<br />
                  <br />
                  It&#39;s impractical and bad for performance to go from
                  AoS to SoA.<br />
                </div>
              </div>
            </blockquote>
            <div><br />
            </div>
            <div>Sure, but that&#39;s what we do every MD step on typical
              CPUs. We do DD communication on XYZXYZXYZXYZ and then
              transpose them to XXXXYYYYZZZZ to use in short-ranged
              kernels. If we did DD communication on XXXXX....YYYYY.....
              ZZZZZZ.... then we still have gather-scatter overhead, but
              we don&#39;t have transposition in the mix.</div>
            <div><br />
            </div>
            <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div> The question is now if we can fix this issue or if
                  we should always try to use .data() in inner-loops</div>
              </div>
            </blockquote>
            <div><br />
            </div>
            <div>I think you&#39;re missing my point... we&#39;ve always done a
              cast from rvec * to real * to pass to inner loops. :-)
              Using data() is a similar idea - hence my suggestion of
              as_real_array(theVectorOfRVecs).</div>
          </div>
        </div>
      </blockquote>
      No. The bonded, vsite and constraint code uses rvec and we&#39;d like
      to keep this in most cases to use rvec operations.<br />
      <blockquote>
        <div dir="ltr">
          <div class="elided-text">
            <div><br />
            </div>
            <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div> or if we should completely avoid
                  std::vector&lt;RVec&gt; in performance sensitive code
                  and use good old C pointers with nasty manual
                  allocation.<br />
                </div>
              </div>
            </blockquote>
            <div><br />
            </div>
            <div>Allocation is irrelevant to how one uses it. One must
              have a real * to be sure the compiler isn&#39;t confused. To
              get that from a vector, we use data(). Whether the
              allocation of the memory is managed by std::vector or snew
              is immaterial.</div>
          </div>
        </div>
      </blockquote>
      That&#39;s true. But we need a way of avoiding coders from using [] on
      std::vector&lt;RVec&gt; in performance sensitive code. Or we could
      consider not allowing that anywhere, which is at least possible (I
      just discussed this with Mark).<br />
      <br />
      Berk<br />
      <blockquote>
        <div dir="ltr">
          <div class="elided-text">
            <div><br />
            </div>
            <div>Mark</div>
            <div><br />
            </div>
            <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div> Cheers,<br />
                  <br />
                  Berk</div>
              </div>
              <div>
                <div><br />
                  <br />
                  On 2016-03-23 13:43, Mark Abraham wrote:<br />
                </div>
              </div>
              <div>
                <blockquote>
                  <div dir="ltr">Hi,
                    <div><br />
                    </div>
                    <div>RVec was intended as a convenience feature for
                      people writing high-level, rather than low-level
                      code. I would indeed not advocate using it in
                      performance-sensitive mdrun code without checking
                      for such performance effects.</div>
                    <div><br />
                    </div>
                    <div>I think the real issue is closer to that the
                      use of *rvec* generally confuses compilers because
                      annotations about pointers do indeed get lost, and
                      perhaps [3] in real [3][] is not as close to being
                      part of the type in the compiler implementation as
                      we might expect. (For a long time, we&#39;ve used very
                      little rvec close to the short-ranged kernels. The
                      first thing we do with the rvec * args to the SIMD
                      bondeds is to cast them to real *. I think this is
                      just as hard to police as Berk&#39;s concern about not
                      using std::vector&lt;RVec&gt;::operator[] naively
                      - the problem isn&#39;t the &#34;fault&#34; of std::vector or
                      its use.)</div>
                    <div><br />
                    </div>
                    <div>The use of rvec * engenders some transposing
                      that might be avoided if we had SoA rather than
                      AoS layout in mdrun. A single rvec is likely to
                      have all three coordinates on the same cache line,
                      but if one is anyway often using nearby atoms the
                      caching behaviour of three separate real * arrays
                      should be rather similar. One still tends to have
                      gather/scatter overhead, however.</div>
                    <div><br />
                    </div>
                    <div>Mark</div>
                    <div><br />
                    </div>
                  </div>
                  <br />
                  <div class="elided-text">
                    <div dir="ltr">On Wed, Mar 23, 2016 at 11:24 AM Berk
                      Hess &lt;<a href="mailto:hess&#64;kth.se">hess&#64;kth.se</a>&gt;


                      wrote:<br />
                    </div>
                    <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div>
                        <div>No.<br />
                          using the RVec * pointer obtained through
                          .data() already fixes the issue. We don&#39;t even
                          need as_rvec_array().<br />
                          <br />
                          The real issue is that it&#39;s near impossible to
                          enforce this and detect vector access of RVecs
                          if we start using std::vector everywhere
                          through the code.<br />
                          <br />
                          Cheers,<br />
                          <br />
                          Berk</div>
                      </div>
                      <div>
                        <div><br />
                          <br />
                          On 2016-03-23 11:19, Mark Abraham wrote:<br />
                        </div>
                      </div>
                      <div>
                        <blockquote>
                          <div dir="ltr">Hi,
                            <div><br />
                            </div>
                            <div>If so, then we might want a helper
                              function like as_rvec_array, but
                              as_real_array instead.</div>
                            <div><br />
                            </div>
                            <div>Mark</div>
                          </div>
                          <br />
                          <div class="elided-text">
                            <div dir="ltr">On Wed, Mar 23, 2016 at 10:51
                              AM Berk Hess &lt;<a href="mailto:hess&#64;kth.se">hess&#64;kth.se</a>&gt;
                              wrote:<br />
                            </div>
                            <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div>
                                <div>On 2016-03-23 10:50, Erik Lindahl
                                  wrote:<br />
                                </div>
                                <blockquote> Hi,
                                  <div><br />
                                  </div>
                                  <div>I haven’t followed the discussion
                                    in detail, but a long time ago I
                                    remember having simliar issues in
                                    the kernels when using a list[] of
                                    rvec[] (in plain-old-c, no c&#43;&#43;)
                                    instead of extracting the pointer
                                    and handling the multiply-by-3
                                    manually. Could it be something
                                    similar here, e.g. that the compiler
                                    things it does not know enough about
                                    RVec rather than something going
                                    from with the outer list?</div>
                                </blockquote>
                              </div>
                              <div>
                                That would be my guess. The index used
                                in the same loop comes from a vector as
                                well and doesn&#39;t seem to affect
                                performance.</div>
                              <div><br />
                                <br />
                                Berk</div>
                              <div><br />
                                <blockquote>
                                  <div><br />
                                  </div>
                                  <div>Cheers,</div>
                                  <div><br />
                                  </div>
                                  <div>Erik</div>
                                  <div><br />
                                  </div>
                                  <div>
                                    <div>
                                      <blockquote>
                                        <div>On 23 Mar 2016, at 10:44,
                                          Berk Hess &lt;<a href="mailto:hess&#64;kth.se"></a><a href="mailto:hess&#64;kth.se">hess&#64;kth.se</a>&gt;


                                          wrote:</div>
                                        <br />
                                        <div>
                                          <div>
                                            <div>On 2016-03-23 10:42,
                                              Mark Abraham wrote:<br />
                                            </div>
                                            <blockquote>
                                              <div dir="ltr">Hi,<br />
                                                <br />
                                                <div class="elided-text">
                                                  <div dir="ltr">On Wed,
                                                    Mar 23, 2016 at 9:44
                                                    AM Berk Hess &lt;<a href="mailto:hess&#64;kth.se"></a><a href="mailto:hess&#64;kth.se">hess&#64;kth.se</a>&gt;
                                                    wrote:<br />
                                                  </div>
                                                  <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br />
                                                    <br />
                                                    Luckily Szilard does
                                                    thorough testing and
                                                    noticed a
                                                    performance<br />
                                                    degradation in
                                                    change set 25 of <a href="https://gerrit.gromacs.org/#/c/5232/"></a><a href="https://gerrit.gromacs.org/#/c/5232/">https://gerrit.gromacs.org/#/c/5232/</a>
                                                    The<br />
                                                    only signifcant
                                                    change with respect
                                                    to previous sets is
                                                    replacing C<br />
                                                    pointers by
                                                    std::vector. I
                                                    traced the
                                                    performance
                                                    difference back to a<br />
                                                    single loop, which
                                                    must have become
                                                    several factors
                                                    slower to explain<br />
                                                    the time difference.
                                                    I get the
                                                    performance back
                                                    when replacing the<br />
                                                    vector by a pointer
                                                    extracted with
                                                    .data(), see below.
                                                    I looked at the<br />
                                                    assembly code from
                                                    gcc 5.3.1 and the
                                                    vector case
                                                    generated 200 extra<br />
                                                    instructions, which
                                                    makes it difficult
                                                    to see what the
                                                    actual difference<br />
                                                    is. The pointer case
                                                    uses a lot of vmovss
                                                    and vaddss, which
                                                    the vector<br />
                                                    one does much less,
                                                    but this is only
                                                    serial SIMD
                                                    instruction. I
                                                    thought<br />
                                                    that [] in vector
                                                    might does bounds
                                                    checking,</blockquote>
                                                  <div><br />
                                                  </div>
                                                  <div>Definitely not in
                                                    release builds.</div>
                                                  <div> </div>
                                                  <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                    but it seems it does
                                                    not.<br />
                                                    Can anyone explain
                                                    why the vector case
                                                    can be so slow?</blockquote>
                                                  <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                    If this is a general
                                                    issue (with RVec or
                                                    more?), we need to
                                                    always extra<br />
                                                    a pointer with
                                                    .data() for use in
                                                    all inner-loops.
                                                    This is pretty<br />
                                                    annoying and
                                                    difficult to
                                                    enforce.<br />
                                                    <br />
                                                    Cheers,<br />
                                                    <br />
                                                    Berk<br />
                                                    <br />
                                                                       
                                                        const
                                                    std::vector&lt;RVec&gt;
                                                    f_foreign &#61;<br />
idt_foreign-&gt;force<br />
                                                  </blockquote>
                                                  <div><br />
                                                  </div>
                                                  <div>This does a copy
                                                    of the vector, and
                                                    doesn&#39;t seem to be
                                                    in any version of
                                                    this patch in
                                                    gerrit. Is this what
                                                    you meant to write?</div>
                                                </div>
                                              </div>
                                            </blockquote>
                                            I tried this. But my
                                            original &#34;vectorized&#34; patch
                                            set took a pointer from
                                            idt_foreign and did not copy
                                            the vector, that gives the
                                            same, slow, performance.<br />
                                            <br />
                                            Berk<br />
                                            <blockquote>
                                              <div dir="ltr">
                                                <div class="elided-text">
                                                  <div><br />
                                                  </div>
                                                  <div>Mark</div>
                                                  <div><br />
                                                  </div>
                                                  <blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                    or<br />
                                                                       
                                                        const RVec     
                                                             *f_foreign 
                                                     &#61;<br />
idt_foreign-&gt;force.data();<br />
                                                    <br />
                                                                       
                                                         int natom &#61;
                                                    atomList-&gt;atom.size();<br />
                                                                       
                                                         for (int i &#61; 0;
                                                    i &lt; natom; i&#43;&#43;)<br />
                                                                       
                                                         {<!-- --><br />
                                                                       
                                                             int ind &#61;
                                                    atomList-&gt;atom[i];<br />
                                                                       
                                                           
                                                     rvec_inc(f[ind],
                                                    f_foreign[ind]);<br />
                                                                       
                                                         }<br />
                                                    --<br />
                                                    Gromacs Developers
                                                    mailing list<br />
                                                    <br />
                                                    * Please search the
                                                    archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List"></a><a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a>
                                                    before posting!<br />
                                                    <br />
                                                    * Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists"></a><a href="http://www.gromacs.org/Support/Mailing_Lists">http://www.gromacs.org/Support/Mailing_Lists</a><br />
                                                    <br />
                                                    * For (un)subscribe
                                                    requests visit<br />
                                                    <a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers"></a><a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a>
                                                    or send a mail to <a href="mailto:gmx-developers-request&#64;gromacs.org"></a><a href="mailto:gmx-developers-request&#64;gromacs.org">gmx-developers-request&#64;gromacs.org</a>.<br />
                                                  </blockquote>
                                                </div>
                                              </div>
                                              <br />
                                              <fieldset></fieldset>
                                              <br />
                                            </blockquote>
                                            <br />
                                          </div>
                                          -- <br />
                                          Gromacs Developers mailing
                                          list<br />
                                          <br />
                                          * Please search the archive at
                                          <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a>
                                          before posting!<br />
                                          <br />
                                          * Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists"></a><a href="http://www.gromacs.org/Support/Mailing_Lists">http://www.gromacs.org/Support/Mailing_Lists</a><br />
                                          <br />
                                          * For (un)subscribe requests
                                          visit<br />
                                          <a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a>
                                          or send a mail to <a href="mailto:gmx-developers-request&#64;gromacs.org"></a><a href="mailto:gmx-developers-request&#64;gromacs.org">gmx-developers-request&#64;gromacs.org</a>.</div>
                                      </blockquote>
                                    </div>
                                    <br />
                                  </div>
                                  <br />
                                  <fieldset></fieldset>
                                  <br />
                                </blockquote>
                                <br />
                              </div>
                              --<br />
                              Gromacs Developers mailing list<br />
                              <br />
                              * Please search the archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List"></a><a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a>
                              before posting!<br />
                              <br />
                              * Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists"></a><a href="http://www.gromacs.org/Support/Mailing_Lists">http://www.gromacs.org/Support/Mailing_Lists</a><br />
                              <br />
                              * For (un)subscribe requests visit<br />
                              <a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a>
                              or send a mail to <a href="mailto:gmx-developers-request&#64;gromacs.org"></a><a href="mailto:gmx-developers-request&#64;gromacs.org">gmx-developers-request&#64;gromacs.org</a>.</blockquote>
                          </div>
                          <br />
                          <fieldset></fieldset>
                          <br />
                        </blockquote>
                        <br />
                      </div>
                      --<br />
                      Gromacs Developers mailing list<br />
                      <br />
                      * Please search the archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List"></a><a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a>
                      before posting!<br />
                      <br />
                      * Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists">http://www.gromacs.org/Support/Mailing_Lists</a><br />
                      <br />
                      * For (un)subscribe requests visit<br />
                      <a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a>
                      or send a mail to <a href="mailto:gmx-developers-request&#64;gromacs.org">gmx-developers-request&#64;gromacs.org</a>.</blockquote>
                  </div>
                  <br />
                  <fieldset></fieldset>
                  <br />
                </blockquote>
                <br />
              </div>
              --<br />
              Gromacs Developers mailing list<br />
              <br />
              * Please search the archive at <a href="http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List">http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List</a>
              before posting!<br />
              <br />
              * Can&#39;t post? Read <a href="http://www.gromacs.org/Support/Mailing_Lists">http://www.gromacs.org/Support/Mailing_Lists</a><br />
              <br />
              * For (un)subscribe requests visit<br />
              <a href="https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers">https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers</a>
              or send a mail to <a href="mailto:gmx-developers-request&#64;gromacs.org">gmx-developers-request&#64;gromacs.org</a>.</blockquote>
          </div>
        </div>
        <br />
        <fieldset></fieldset>
        <br />
      </blockquote>
      <br />
      <br />
      <fieldset></fieldset>
      <br />
    </blockquote>
    <br />
  </div>

</blockquote></div>