Supporting the MAFF web archive format, based on ZIP

Paolo Amadini paolo.02.prg at amadzone.org
Mon Mar 22 11:00:20 GMT 2010


On 21/03/2010 23.31, Matthias Grimrath wrote:
> My comments on the MAFF specs:
> 
> Pro:

I agree with everything :-)

> Con:
> IMO the scope of the webarchiver should be rather narrow
> 1) it should only archive webpages. MAFF seems to allow PNG and
>    SVG as well.

I see your point. In fact, I don't know how many people actually use
MAFF in this way. However, what I found out is that when the support
for reading the two mandatory content types for the main document
(HTML and XHTML) was implemented in Mozilla, supporting other built-in
content types (SVG, JPEG and PNG) required no additional lines of code.

> Besides wrapping one PNG file in a ZIP archive does not make much
> sense to me.

For a lone PNG, all the archive adds is the original URL in the
metadata, which is not very useful. For SVG, however, you can embed
the referenced raster images, which is useful.

> 2) it should only archive one webpage at a time. I am referring to
>    the "extended" conformance level that allows multiple webpages
>    to be archived.

While having multiple pages in the same archive is a unique feature
of MAFF, it seems to be widely appreciated, and because of this it
is part of the base specs. I often use it to save a bunch of related
pages together.

> The primary reason is that it is easy to understand and sticks to the 
> Unix principle: do one job and do it well. Right now the webarchiver
> "freezes" the currently visible webpage and does nothing else. It is 
> an easy and obvious concept.

I agree with this point of view, but I think that the obviousness is
a characteristic of the user interface, rather than the file format,
and derives from current user habits. In a tabbed browser (which I
think most modern browsers are), I find it quite intuitive that I
can save and restore multiple tabs in a file.

Maybe the user interface of Konqueror simply isn't as suitable for such
use as that of Firefox. As I'm not familiar with it and can't say
whether supporting multiple pages in a file would make sense, for now
I'm fine with Konqueror reading only the first page in an archive (the
first directory name in alphabetical order).
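
As a rough illustration of the "first page" rule above, here is a minimal
Python sketch. It is not Konqueror or Mozilla code. It assumes that each
saved page lives in its own top-level directory of the ZIP file, and that
"alphabetical order" can be approximated by a plain string sort. The
function name and the sample entry names are hypothetical.

```python
import io
import zipfile


def first_page_entries(archive):
    """Return the ZIP entry names that belong to the first page.

    `archive` is a path or a file-like object. The "first page" is taken
    to be the top-level directory that sorts first (assumption: a plain
    string sort stands in for "alphabetical order").
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

    # Top-level directory names are the part before the first "/".
    top_dirs = sorted({name.split("/", 1)[0] for name in names if "/" in name})
    if not top_dirs:
        return []

    prefix = top_dirs[0] + "/"
    # Keep every entry under that directory, skipping the directory entry itself.
    return [name for name in names if name.startswith(prefix) and name != prefix]


if __name__ == "__main__":
    # Build a small two-page archive in memory (entry names are made up).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("page_b/index.html", "<html>second</html>")
        zf.writestr("page_a/index.html", "<html>first</html>")
    buf.seek(0)
    print(first_page_entries(buf))
```

A reader that only supports single pages could use such a list to locate
the main document and ignore every other directory in the archive.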

> What this means is I am going to work on supporting a subset of the 
> "basic" conformance level: archived HTML files with metadata.

I think that, for archive generation, Konqueror can stick with any
subset you developers feel comfortable with. For reading, the only
crucial archive types are single-page HTML and XHTML, and further
developments can be based on user feedback.

Besides, if you think that some features can be useful even if you
don't have the time to work on them, I might be able to contribute
to their development in the future.

Thanks for your insightful feedback, it is much appreciated.

Regards,
Paolo




More information about the kfm-devel mailing list