Change to tarball generation?

Michael Pyne mpyne at kde.org
Thu May 24 00:10:17 UTC 2012


On Wednesday, May 23, 2012 19:40:52 Allen Winter wrote:
> This whole thread is confusing me.
> 
> Maybe a command line would help?
> 
> Is this correct?
> % tar cvf kdefoo-x.y.z.tar <files>
> % xz kdefoo-x.y.z.tar
> => resulting in kdefoo-x.y.z.tar.xz

That's fine.

> if not, please tell us what a command line should be
> 
> I take it from mpyne's original posting that:
> % tar Jcvf kdefoo-x.y.z.tar.xz <files>
> isn't the way to go??

That's actually fine too, as it turns out.

As an example of what does cause trouble, try:

$ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
$ pixz kdefoo-x.y.z.tar
# resulting in kdefoo-x.y.z.tar.xz

Because pixz is parallelized, it works on whole blocks of data at a time and,
as far as I can tell, makes no special provision for the last bits of
compressed data being smaller than the block size.

With a normal tar file the decompressed data you get is:

0--------------------------------*  (where * is end of data and end of file)

With a pixz-encoded tar file the decompressed data you get is:

0--------------------------------*x$  (* is end of data, x is garbage, $ is end of file)

When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will 
still work fine: tar knows exactly where the data should really end and will 
stop decompressing when it needs to.
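
If you want to see tar ignoring bytes past the end of its data, here is a
rough sketch. Note the assumptions: the appended "trailing junk" is only a
stand-in for the 'x$' bytes (it is not what pixz really writes), no compressor
is involved, and the README file is made up for the demo.

```shell
# Sketch only: the appended junk stands in for the 'x$' bytes.
workdir=$(mktemp -d)
mkdir "$workdir/kdefoo-x.y.z"
echo "hello" > "$workdir/kdefoo-x.y.z/README"

# Build an ordinary tar archive of the directory.
tar cf "$workdir/plain.tar" -C "$workdir" kdefoo-x.y.z

# Copy it and append bytes after tar's end-of-archive marker.
cp "$workdir/plain.tar" "$workdir/padded.tar"
printf 'trailing junk' >> "$workdir/padded.tar"

# tar stops at the end-of-archive marker, so both listings should match.
plain_listing=$(tar tf "$workdir/plain.tar")
padded_listing=$(tar tf "$workdir/padded.tar")
echo "$padded_listing"

rm -rf "$workdir"
```

The two listings should be identical, since tar never needs to look at the
appended bytes.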

When you run a pipeline like "xz --decompress kdefoo-x.y.z.tar.xz | tar xf -" 
though, there's no way to tell xz to stop decompressing early. It tries to 
write all the decompressed data to the pipe. tar still knows exactly where to 
stop, and does so at the '*', not the '$', and closes its input (a pipe!) 
early.

When xz tries to write the 'x$' (garbage data) part of the decompressed
output, it is writing to a now-broken pipe, which kills xz with SIGPIPE.
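
The same effect can be reproduced without xz or tar at all. In this sketch
'yes' plays the part of xz (a writer that keeps going) and 'head -n 1' plays
the part of tar (a reader that stops early). Both stand-ins are my assumption
for illustration, not the real tools.

```shell
# Stand-ins (assumptions for illustration): 'yes' plays xz, which keeps
# writing, and 'head -n 1' plays tar, which stops reading early.
status_file=$(mktemp)

# Record the writer's exit status in a file, since a plain pipeline only
# reports the status of its last command.
{ yes 'x$'; echo "$?" > "$status_file"; } | head -n 1 > /dev/null

writer_status=$(cat "$status_file")
rm -f "$status_file"

# A status above 128 usually means "killed by signal (status - 128)".
# SIGPIPE is signal 13 on Linux, so 141 is the typical result there.
echo "writer exit status: $writer_status"
```

The reader exits cleanly, but the writer reports failure, which is exactly
what a script checking every stage of the pipeline will trip over.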

Scripts that drive automated extraction through a pipeline just see that an
error occurred and therefore abort. This has affected a couple of source-based
distributions, and it is annoying even for people extracting manually, who
then have to figure out that their tarball actually extracted correctly.
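
For scripts, one way to sidestep the pipeline is to let tar read the archive
file itself, as in the "tar xfJ" case above. This is only a sketch: the
function name extract_tarball is hypothetical, and it assumes a tar (GNU tar
or bsdtar) that detects compression on its own when reading a regular file.

```shell
# Hypothetical helper; the name extract_tarball is made up for this sketch.
# tar reads the archive file directly instead of sitting at the end of an
# "xz --decompress ... | tar xf -" pipeline.
extract_tarball() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Assumes tar detects the compression itself when reading a file;
    # "tar xJf" would name xz explicitly instead.
    tar xf "$archive" -C "$dest"
}
```

It would be called as "extract_tarball kdefoo-x.y.z.tar.xz some-build-dir",
and the script then only has tar's own exit status to check.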

So the problem is limited to parallelizing compressors that take advantage of
the allowance to write garbage data past the end of the real data and still
have the decompressor "figure it out". It seems pretty implausible to me that
a parallelizing compressor would always do this; perhaps it only occurs when
the compressor is run with tar (e.g. tar cJf) instead of as a separate step?

I hope this makes more sense.

Regards,
 - Michael Pyne