[Kst] Fwd: NetCDF Data Compression (Packing) via signed 16-Bit-Integers (type SHORT INT)
Rix, Patrick
Patrick.Rix at repower.de
Mon Mar 4 15:37:06 UTC 2013
(The post did not appear on the mailing list, so I am re-sending it without the test data.)
Hi Nicolas,
thank you for your reply. I agree with you: NetCDF is a very flexible and transparent format that has been widely used for many years and is thus well proven.
In particular, it can store (descriptive) metadata together with measurement / simulation data in the same file, not to mention the many third-party tools and the interfaces to a lot of programming languages.
Links: NetCDF - Conventions / Best Practices
--------------------------------------------
--> see section "Packed Data Values"
<http://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html>
There you can find the formula I used in my last mail.
<http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html>
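The formulas from the earlier mail (quoted further down) can be sketched in a few lines of Python. This is only an illustration of those formulas, not the code used by NCO or any NetCDF library. The function names 'pack' and 'unpack', the sample series and the handling of a constant series are assumptions made for this example; the resolution value 65000 is the example value from the earlier mail.

```python
# Sketch of the packing formulas from the earlier mail (not NCO's implementation).
SHORT_RESOLUTION = 65000  # example 'DataTypeResolution_short' from the mail


def pack(values, resolution=SHORT_RESOLUTION):
    """Return (shorts, scale_factor, add_offset) for a list of floats."""
    x_min, x_max = min(values), max(values)
    x_range = x_max - x_min
    # Assumption: a constant series gets scale_factor 1.0 to avoid dividing by zero.
    scale_factor = x_range / resolution if x_range > 0 else 1.0
    add_offset = x_min + 0.5 * x_range
    shorts = [round((x - add_offset) / scale_factor) for x in values]
    return shorts, scale_factor, add_offset


def unpack(shorts, scale_factor, add_offset):
    """[Eq.1]: X(ti) = I_short(ti) * scale_factor + add_offset."""
    return [i * scale_factor + add_offset for i in shorts]


series = [0.0, 1.5, -2.25, 10.0, 3.75]  # hypothetical sample data
shorts, scale_factor, add_offset = pack(series)
restored = unpack(shorts, scale_factor, add_offset)
max_error = max(abs(a - b) for a, b in zip(series, restored))
```

With these formulas every packed value stays inside the signed short range, and the round-trip error per sample is at most about half of 'scale_factor'.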
Regarding your questions:
The structure of each variable remains unchanged; only the data type is modified. That is, the number of time steps (samples) NT of a time-dependent variable stays the same. Likewise, whether the original float variable contains a scalar (1D: X(t)), a vector (2D: V(k,t), k=1,2,3) or field data per time step (3D: A(i,j,t)), the packed variable's data are organized identically. During data compression the "header" section of _each_ packed variable is extended by the two extra (float) attributes 'scale_factor' and 'add_offset', which are required later for the back-transformation into float values.
To pick up your example: if you have two scalar (1D) time series X(t) and Y(t) of type float (4 bytes), each having e.g. NT=1000 time steps, these require 8000 bytes in total. After packing to (2-byte) short ints, both variables still contain NT time steps but require only 4000 bytes in total, plus 16 bytes for the 4 floats of the two sets of 'scale_factor' & 'add_offset'.
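The arithmetic of this example can be checked with a small Python sketch. It counts only the raw data bytes and the attribute values; the remaining file overhead (attribute names, dimensions and other header information) is deliberately ignored, and the function names are made up for this illustration.

```python
FLOAT_BYTES = 4         # size of a NetCDF float
SHORT_BYTES = 2         # size of a NetCDF short
ATTRS_PER_VARIABLE = 2  # 'scale_factor' and 'add_offset', stored as floats


def float_size(n_vars, n_steps):
    """Data bytes needed for unpacked float variables."""
    return n_vars * n_steps * FLOAT_BYTES


def packed_size(n_vars, n_steps):
    """Data bytes for packed short variables plus their two float attributes."""
    data = n_vars * n_steps * SHORT_BYTES
    attributes = n_vars * ATTRS_PER_VARIABLE * FLOAT_BYTES
    return data + attributes


unpacked_bytes = float_size(2, 1000)       # X(t) and Y(t), NT = 1000
packed_bytes = packed_size(2, 1000)        # same series as short ints
ratio = packed_bytes / unpacked_bytes
```

For long time series the fixed 8 bytes per variable become negligible, so the ratio approaches one half.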
To access / inquire a variable's attributes, a set of 'nc_get_att_<type>' functions is available in the C interface (see:
<https://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005fget_005fatt_005f-type.html>), or one can use the NcAtt class of the C++ interface.
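On the reading side, the logic could look roughly like the following Python sketch. The dictionary 'attributes' is a hypothetical stand-in for whatever the attribute inquiry functions return for one variable; it is not part of any NetCDF API. The sketch assumes that a missing 'scale_factor' is treated as 1 and a missing 'add_offset' as 0, and that a variable without either attribute is passed through unchanged.

```python
def unpack_if_packed(raw_values, attributes):
    """Apply [Eq.1] only when the variable carries packing attributes.

    'attributes' is a hypothetical dict of the variable's attributes,
    e.g. {"scale_factor": 0.5, "add_offset": 10.0}.
    """
    if "scale_factor" not in attributes and "add_offset" not in attributes:
        # Not packed: hand the stored values back unchanged.
        return list(raw_values)
    # Assumption: a missing attribute falls back to the neutral value.
    scale_factor = attributes.get("scale_factor", 1.0)
    add_offset = attributes.get("add_offset", 0.0)
    return [v * scale_factor + add_offset for v in raw_values]


plain = unpack_if_packed([1, 2], {})
packed = unpack_if_packed([-2, 0, 2], {"scale_factor": 0.5, "add_offset": 10.0})
```

A reader written this way would not need a configuration switch, since unpacked files pass through untouched.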
I agree that the implementation should not be that difficult, though so far I have not succeeded in compiling KST-plot. If you could point me to the right place in the source code, that would be helpful, so that I can try to assist.
I attached a ZIP archive 'test_data.zip' containing 4 files with an EXAMPLE:
test_data_FLOAT.nc     binary NetCDF file with floating point values
test_data_SHORT.nc     binary NetCDF file with signed short integers
test_data_FLOAT.cdf    ascii text file with floating point values
test_data_SHORT.cdf    ascii text file with signed short integers
If you compare the *.cdf files, you can see the two extra attributes ('varname:scale_factor', 'varname:add_offset') for each variable and the data type of the variables changing from float to short (except for the time variable).
You can use the tool 'ncgen' to generate binary *.nc files from the *.cdf text files, e.g. ncgen -o test_data_FLOAT_bin.nc test_data_FLOAT.cdf
With the tool 'ncdump' you can generate the *.cdf text file from the binary *.nc files: ncdump test_data_SHORT.nc > test_data_SHORT_txt.cdf
The two tools 'ncgen' and 'ncdump' are usually included in the C/C++ NetCDF-library package. I can send you a Windows executable in case you don't have them.
If you have difficulty inspecting / plotting the packed short int binary *.nc file, the Java-based tool 'ncBrowse' might help; it is a browser for *.nc files with limited plotting functionality. Linux and Windows binaries can be downloaded from <http://www.epic.noaa.gov/java/ncBrowse/>
BTW:
..packing and unpacking can be done with the tool 'ncpdq' of the NCO-toolbox (NetCDF-Operators page: <http://nco.sourceforge.net/>).
Along with the source code one can also download binaries for Linux or Windows from the NCO-page.
The command for PACKING (data compression float/double --> signed short int) is:
ncpdq.exe --pack_policy all_new --pack_map hgh_sht --fl_fmt=classic --overwrite test_data_FLOAT.nc test_data_SHORTINT.nc
..same but shorter:
ncpdq.exe -P all_new -M hgh_sht -3 -O test_data_FLOAT.nc test_data_SHORTINT.nc
The command for UNPACKING (back-transformation signed short int --> float) is:
ncpdq.exe --unpack test_data_SHORT.nc test_data_FLOATVALS.nc
or ncpdq.exe -U test_data_SHORT.nc test_data_FLOATVALS.nc
Best regards,
Patrick.
-----Original Message-----
From: Nicolas Brisset [mailto:nicolas.brisset at free.fr]
Sent: Sunday, 3 March 2013 23:02
To: kst at kde.org
Cc: Rix, Patrick
Subject: Re: [Kst] NetCDF Data Compression (Packing) via signed 16-Bit-Integers (type SHORT INT)
Hi Patrick,
Thanks for your message. I hope you enjoy using Kst with netCDF, which is a pretty nice format.
I have implemented the netCDF datasource so I can tell you: the convention is not taken into account as of now. It should be fairly easy to add, but could you provide a sample file so that I can cross-check the code once implemented? Also, I would need to understand how to retrieve all the variable names. It is not clear to me from your mail whether each 2-byte int variable contains one data series, or if you have e.g. two data-series packed into a 4-byte int?
I think the change should be pretty trivial to implement and if it's a clearly documented convention, we could probably apply it unconditionally, the other option being to add a configuration dialog, as the ASCII source has.
If you know a bit of C, I could point you to the right places in the code so that you can try your luck at making the change. You can then provide me a patch that I'll integrate. Otherwise I'll do it, but these days I'm unfortunately not as responsive as I'd like since I have very challenging deadlines in my work and a busy personal life. However, this change sounds small enough that I should be able to do it in a reasonable time.
Best regards,
Nicolas
----- Original Message -----
> From: "Patrick Rix" <Patrick.Rix at repower.de>
> To: kst at kde.org
> Sent: Thursday, 28 February 2013 11:14:07
> Subject: [Kst] NetCDF Data Compression (Packing) via signed 16-Bit-Integers (type SHORT INT)
>
>
>
>
>
> Dear KST-Users & Developers,
>
>
>
> I'm a newbie to KST-plot and I've got a question concerning the
> interface for reading data in binary NetCDF format.
>
> I found KST when searching for a tool for plotting time series result
> data of MBDyn -multibody-simulations of our wind turbines.
>
> MBDyn can use either ascii text format or binary NetCDF format (i.e.
> float numbers) for its output files.
>
> I found that plotting of scalar (1D) time series of data type float
> (4 bytes) works well in KST, so I was wondering whether support for
> packed / compressed data could also be added (if not already
> implemented) to the NetCDF interface, as indicated by [Eq.1] below?
>
>
>
> When simulating wind turbines we usually have to deal with a huge
> amount of data from a lot of simulations arising from parameter
> variations.
>
> Fortunately there is a way to pack the data by using a simple
> transformation into 2-byte integer numbers of data type 'signed
> short int'. This results in a small loss of information (i.e.
> precision), but the 65,536 different levels of the signed short int
> type (ranging from -32,768 to 32,767) are usually sufficient for
> capturing the data range of a simulated float time series, while the
> packed data require only half the disk space of the (4-byte)
> floating point values.
>
>
>
> I found that this sort of simple packing algorithm seems to be part
> of the "commonly used" NetCDF conventions and is integrated in
> some NetCDF tools:
>
> If a (time dependent, scalar 1D-) variable has the special
> attributes 'scale_factor' and 'add_offset' (refer to:
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html
> ), these indicate that the variable has been packed to short int
> format.
>
> The original float values 'X(ti)' of a time series at time index 'ti'
> can be retrieved back by applying its 'scale_factor' and
> 'add_offset' attributes (saved in the NetCDF file for each variable)
> to the short integers 'I_short(ti)' :
>
>
>
> [Eq.1] X(ti) = I_short(ti) * scale_factor + add_offset
>
>
>
> The values for 'scale_factor' and 'add_offset' (which are floating
> point numbers) are derived during the packing procedure by scanning
> the float time series 'X(ti)' for its extreme values 'Xmax' and
> 'Xmin', which define the data range of the time series: Xrange = Xmax -
> Xmin
>
>
>
> scale_factor = Xrange / DataTypeResolution_short
>
>
>
> add_offset = Xmin + 0.5 * Xrange
>
>
>
> with 'DataTypeResolution_short' being the resolution of data type
> short int, e.g.
>
>
>
> DataTypeResolution_short = 65000
>
>
>
> The integer numbers I_short(ti) were calculated by
>
>
>
> I_short(ti) = round( ( X(ti) - add_offset ) / scale_factor )
>
>
>
>
>
>
>
> --> Does anybody know if something similar has already been implemented
> in KST-plot
>
> or
>
> --> (if not) what would be the best and easiest way to implement the
> back-transformation according to [Eq.1]?
>
>
>
> Any help on that will be warmly appreciated.
>
>
>
> With kind regards
>
> Patrick Rix
> Loads & System Simulation
>
>
> REpower Systems SE
> TechCenter
> Albert-Betz-Str. 1
> D - 24783 Osterrönfeld
>
>
> Tel.: +49 - (0) - 4331 - 1313 - 9408
> Fax: +49 - (0) - 4331 - 1313 - 9954
>
>
> e-mail: p.rix at repower.de
> Internet: www.repower.de
>
>
>
>
>
> This e-mail contains confidential and/or privileged information. If
> you are not the intended recipient (or have received this e-mail in
> error) please notify the sender immediately and delete this e-mail.
> Any unauthorised copying, disclosure or distribution of the material
> in this e-mail is strictly forbidden. As you know, the security of
> e-mail transmissions can not be guaranteed. E-mails can be misused
> to be written or modified under false names. For that reason, we ask
> you to understand the necessity for us to rule out the legal
> obligation of the above statement, for your protection and ours.
> This regulation is only invalid if we have concluded a special
> written agreement with you about the compliance with security and
> encryption standards.
>
> REpower Systems SE · Sitz: Hamburg · Vorstand: Andreas Nauen
> (Vorsitz), Matthias Schubert, Marcus A. Wassenberg, Vinod R. Tanti ·
> Aufsichtsratsvorsitzender: Tulsi Tanti · Registergericht: AG Hamburg
> (Mitte) HRB Nr.: 118644
>
> _______________________________________________
> Kst mailing list
> Kst at kde.org
> https://mail.kde.org/mailman/listinfo/kst
>