Hi,<br><br><div class="gmail_quote">On Tue, Apr 17, 2012 at 9:48 AM, Erik Lindahl <span dir="ltr">&lt;<a href="mailto:erik@kth.se" target="_blank">erik@kth.se</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><br>

On Apr 17, 2012, at 3:18 PM, Roland Schulz wrote:<br>

<br>

&gt; E.g. for parallelization the issue is very similar to the one for portability. Supporting domain decomposition makes it more difficult for everyone and everyone has to make sure that they don&#39;t break it. And it is only included because it is essential to Gromacs and used by almost everyone.<br>


<br>

</div>Right - and that&#39;s of course something we don&#39;t want to push down just on the few people working with parallelization :-) We don&#39;t have automated tests for it yet, but when we have more functional tests the idea is that we should automatically reject patches that break parallel runs!<br>


</blockquote><div>Yes. But we only do it for parallelization because the majority (in this case probably everyone) agrees that it is important. We wouldn&#39;t accept a feature that is as time-consuming for every developer as parallelization is but only useful to a small minority. :-)</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I simply don&#39;t buy the argument that just because these 1132 lines are not perfect (they obviously aren&#39;t) portability doesn&#39;t matter at all and we might as well include 10 megabytes of additional source code where we have no control of the portability.<br>


</blockquote><div>I didn&#39;t say portability isn&#39;t important at all. All I&#39;m saying is that portability shouldn&#39;t be treated as a Boolean. In practice portability is, like any other metric, a scale. And the decision to support 99.9% of platforms instead of 99.5% should be a matter of cost-benefit analysis, just as adding a new feature is.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>

&gt; But I think that &quot;fancy&quot; IO is also an optional feature. I agree that it is a very important feature and it has many disadvantages if the same format is not used everywhere. But it is also non-essential. And at that point it should become a matter of cost-benefit and not a matter of principle. I.e. how many people benefit from features made possible by HDF5 (e.g. because limited developer time wouldn&#39;t allow them without HDF5) versus how much of a pain it is to the few people who have to live with XTC (and conversion). And one very important factor in that cost-benefit analysis is the ratio of users.<br>


<br>

</div>But now you are moving the goal-posts!  The aim of the present TNG-based project was NOT &quot;fancy&quot; IO, but a new default simple portable Gromacs trajectory format that (1) includes headers for atom names and stuff, (2) is a small free library that can easily be contributed to other codes so they can read/write our files, and (3) enables better compression.<br>


</blockquote><div>What I meant by &quot;fancy&quot; IO was that it is optional. These 3 things aren&#39;t required to run a simulation on an exotic platform (e.g. Kei) and to be able to analyze the results (after potentially converting).</div>


<div>

 </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It would of course be nice if this format also allowed efficient parallel IO and advanced slicing, but that has never been the primary goal of the file format project, in particular not if it starts to come in conflict with the aims above.<br>


</blockquote><div>As I said before, parallel IO isn&#39;t the issue. (Simple) parallel writing is easier without HDF5. Parallel reading (for analysis) is possible as long as the format is seekable (this can easily be added even to XTC by creating a 2nd file with the index). </div>
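<div><br></div><div>To make the index idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the length-prefixed frame layout is a toy format (not the real XTC layout), and write_frames, build_index and read_frame are hypothetical names, not Gromacs functions. It only shows that a 2nd file of frame offsets is enough to jump straight to any frame.</div>
<pre>
```python
import struct

# Toy frame layout (an assumption, NOT the real XTC layout):
# a 4-byte big-endian payload length followed by the payload bytes.
HEADER = struct.Struct("!I")
OFFSET = struct.Struct("!Q")  # one 8-byte offset per frame in the index file


def write_frames(path, payloads):
    """Write each payload as one length-prefixed frame."""
    with open(path, "wb") as f:
        for payload in payloads:
            f.write(HEADER.pack(len(payload)))
            f.write(payload)


def build_index(traj_path, index_path):
    """Scan the trajectory once and store the start offset of every frame."""
    count = 0
    with open(traj_path, "rb") as traj, open(index_path, "wb") as idx:
        while True:
            start = traj.tell()
            header = traj.read(HEADER.size)
            if len(header) != HEADER.size:
                break  # end of file (or a truncated header)
            (length,) = HEADER.unpack(header)
            idx.write(OFFSET.pack(start))
            traj.seek(length, 1)  # skip the payload without reading it
            count += 1
    return count


def read_frame(traj_path, index_path, n):
    """Return the payload of frame n via the index, without a linear scan."""
    with open(index_path, "rb") as idx:
        idx.seek(n * OFFSET.size)
        (start,) = OFFSET.unpack(idx.read(OFFSET.size))
    with open(traj_path, "rb") as traj:
        traj.seek(start)
        (length,) = HEADER.unpack(traj.read(HEADER.size))
        return traj.read(length)
```
</pre>
<div>With such an index, every analysis process can open the trajectory on its own and call read_frame for just the frames assigned to it, which is all that simple parallel reading needs.</div>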


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Having said that, we just discussed things here in the lab, and one alternative could be to have a simple built-in HDF5 implementation that can write correct headers for 1-3 dimensional arrays so our normal files are HDF5-compliant when written on a single node. This should be possible to do in ~100k of source code. If there is no external HDF5 library present, this will be the only alternative supported, and you will not be able to use e.g. parallel IO - but the file format will work.<br>


</blockquote><div><br></div><div>Option 1) Up to 100k lines that we have to write and support. And the code can only use the subset of HDF5 that this implementation supports. </div><div>Option 2) Users on very exotic platforms have to keep using XTC and convert their files in post-production (only if they want to benefit from the HDF5 advantages in analysis).</div>


<div><br></div><div>I really don&#39;t see how Option 1 could win in any reasonable cost-benefit analysis. :-)</div><div><br></div><div>BTW: All of HDF5 is 135k lines (according to sloccount, excluding the C++, HL, and Fortran bindings). And HDF5 has all OS-dependent functions (IO, threads, ..) abstracted. Thus only a small part (18 files, 9300 lines in total, including the respective headers and the abstraction layer itself) has any #ifdef for Windows. Only those files would need to be touched to add support for an OS that is not POSIX, Windows, or VMS. It is even possible to write our own low-level file layer (<a href="http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html" target="_blank">http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html</a>), which could be based on futil.c, to have our own OS abstraction.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The caveat is what happens to the physical file format when HDF5 writes parallel IO? Will this result in a file with different properties that is difficult for us to read with a naive implementation? </blockquote><div>No problem. HDF5 parallel IO doesn&#39;t produce different formats. It writes in standard chunks (which would need to be supported anyhow for block compression and fast seek).</div>


<div><br></div><div>Roland</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


</blockquote></div><br><br clear="all"><div><br></div>-- <br>ORNL/UT Center for Molecular Biophysics <a href="http://cmb.ornl.gov" target="_blank">cmb.ornl.gov</a><br><a href="tel:865-241-1537" value="+18652411537" target="_blank">865-241-1537</a>, ORNL PO BOX 2008 MS6309<br>