Change to tarball generation?
Michael Pyne
mpyne at kde.org
Thu May 24 00:10:17 UTC 2012
On Wednesday, May 23, 2012 19:40:52 Allen Winter wrote:
> This whole thread is confusing me.
>
> Maybe a command line would help?
>
> Is this correct?
> % tar cvf kdefoo-x.y.z.tar <files>
> % xz kdefoo-x.y.z.tar
> => resulting in kdefoo-x.y.z.tar.xz
That's fine.
> if not, please tell us what a command line should be
>
> I take it from mpyne's original posting that:
> % tar Jcvf kdefoo-x.y.z.tar.xz <files>
> isn't the way to go??
That's actually fine too, as it turns out.
As an example of the case that does cause trouble, try:
$ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
$ pixz kdefoo-x.y.z.tar
# resulting in kdefoo-x.y.z.tar.xz
Because pixz is parallelized, it works on whole blocks of data at a time and,
as far as I can tell, makes no special provision for the last bits of
compressed data being smaller than the block size.
With a normal tar file the decompressed data you get is:
0--------------------------------* (where * is end of data and end of file)
With a pixz-encoded tar file the decompressed data you get is:
0--------------------------------*x$ (* is end of data, x is trailing garbage, $ is end of file)
When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will
still work fine: tar knows exactly where the data should really end and will
stop decompressing when it needs to.
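
To illustrate why a tar reader is unaffected, here is a minimal sketch in
Python (standard library only). It does not use pixz; it merely simulates the
situation in the diagram by appending 100 arbitrary bytes after the end of a
small tar archive before xz-compressing it. The function names, the member
name, and the 100-byte padding are all made up for the illustration.

```python
import io
import lzma
import tarfile


def build_padded_tarball(trailing: bytes = b"x" * 100):
    """Return (tar_bytes, xz_bytes); xz_bytes decompresses to tar_bytes + trailing."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="kdefoo-x.y.z/README")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    tar_bytes = buf.getvalue()
    # Simulated garbage after the tar end-of-archive marker (an assumption,
    # not real pixz output).
    return tar_bytes, lzma.compress(tar_bytes + trailing)


def list_members(xz_bytes: bytes):
    """Read the archive the way 'tar xJf' would: decompress and parse together."""
    with tarfile.open(fileobj=io.BytesIO(xz_bytes), mode="r:xz") as tar:
        return tar.getnames()


tar_bytes, xz_bytes = build_padded_tarball()
extra = len(lzma.decompress(xz_bytes)) - len(tar_bytes)
print("members:", list_members(xz_bytes))
print("bytes past the end of the archive:", extra)
```

The reader stops at the end-of-archive marker, so the extra bytes are never
looked at, even though a standalone decompressor would still emit them.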
When you run a pipeline like "xz --decompress kdefoo-x.y.z.tar.xz | tar xf -"
though, there's no way to tell xz to stop decompressing early. It tries to
write all the decompressed data to the pipe. tar still knows exactly where to
stop, and does so at the '*', not the '$', and closes its input (a pipe!)
early.
When xz tries to write the 'x$' (garbage data) part of the decompressed
output, it writes to a now-broken pipe, which kills xz with SIGPIPE.
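
The mechanism itself can be reproduced without xz or tar. The sketch below
(Python, POSIX systems only) uses a hypothetical stand-in writer that plays
the role of the decompressor and a reader that, like tar, takes what it needs
and then closes the pipe. Python ignores SIGPIPE by default, so the stand-in
restores the default handling to behave like an ordinary command-line tool.

```python
import signal
import subprocess
import sys

# Stand-in for the decompressor (hypothetical, not xz): write forever with
# default SIGPIPE handling restored.
WRITER = """
import signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    sys.stdout.buffer.write(b"x" * 65536)
"""


def writer_exit_status(bytes_to_read: int = 512) -> int:
    """Run the writer, read a little from its output, then close the pipe early."""
    proc = subprocess.Popen([sys.executable, "-c", WRITER],
                            stdout=subprocess.PIPE)
    proc.stdout.read(bytes_to_read)  # the reader takes what it needs...
    proc.stdout.close()              # ...and closes its end of the pipe
    return proc.wait()               # negative value means "killed by signal"


status = writer_exit_status()
print("writer killed by SIGPIPE:", status == -signal.SIGPIPE)
```

The writer did nothing wrong, yet its exit status reports a failure, which is
exactly what a script watching the pipeline sees.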
Scripts trying to drive automated extraction of that data using a pipeline
just see that an error occurred, and will therefore abort. This has affected a
couple of source-based distributions, and it is annoying even for people
extracting manually, who have to figure out that their tarball actually
extracted correctly.
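
One way a script could cope, sketched here in Python under the assumption
that it can see the exit status of each stage separately, is to trust tar's
verdict and tolerate a decompressor that was killed by SIGPIPE. The helper
name and the policy are illustrative only; simply letting tar do the
decompression (tar xJf) avoids the issue altogether.

```python
import signal

# How "killed by SIGPIPE" shows up (POSIX only):
SUBPROCESS_SIGPIPE = -signal.SIGPIPE   # Python subprocess convention
SHELL_SIGPIPE = 128 + signal.SIGPIPE   # shell convention, 141 on Linux


def extraction_succeeded(decompressor_rc: int, tar_rc: int) -> bool:
    """Hypothetical policy: tar must succeed; the decompressor may die of SIGPIPE."""
    if tar_rc != 0:
        return False
    return decompressor_rc in (0, SUBPROCESS_SIGPIPE, SHELL_SIGPIPE)


print(extraction_succeeded(SHELL_SIGPIPE, 0))
print(extraction_succeeded(1, 0))
```

Any other non-zero status from the decompressor, or any failure from tar, is
still treated as a real error.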
So the problem is limited to parallelizing compressors that take advantage of
the allowance to write garbage data past the end of a file and still have the
decompressor "figure it out". It seems pretty implausible to me that a
parallelizing compressor would always do this. Perhaps this only occurs when
the compressor is run with tar (e.g. tar cJf) instead of as a separate step?
I hope this makes more sense.
Regards,
- Michael Pyne