[Okular-devel] md5 hash for annotation file name

Thu Sep 11 00:05:18 CEST 2008

On Wednesday, 10 September 2008, Albert Astals Cid wrote:
> On Wednesday, 10 September 2008, Markus Grabner wrote:
> > 	Hi!
> >
> >     It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to
> > use a hash function to determine the name of the annotation file created
> > by okular. The attached patch implements this behaviour (thanks Ivo for
> > pointing me to QCryptographicHash - I looked for such a thing but somehow
> > missed it).
> >
> > It works nicely in several ways:
> > *) Annotations stay associated with the file after renaming it.
> > *) It also works for non-local URLs (http://...) since we don't need to
> > care about mapping the URL to some valid file name.
> > *) Annotations stay associated with the file after downloading it from
> > the web and opening a local copy (possibly under a different name).
>
> It does not work nicely in several ways:
>  *) MD5 sucks, use SHA1
I don't see any serious security threat from using a weak hash function at
this point. All an attacker could do is create a modified file for which the
same annotations are displayed as for the file the annotations were
initially created for.
I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and
SHA1, so these are the natural candidates.
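
To make the idea concrete, here is a minimal sketch in Python of naming the
annotation file after a hash of the document's content. It is not the code
from the patch: hashlib stands in for QCryptographicHash, and the function
name annotation_file_name and the ".xml" suffix are assumptions made for
illustration only.

```python
import hashlib


def annotation_file_name(content: bytes, algorithm: str = "md5") -> str:
    """Return a file name derived only from the document's content.

    Hypothetical helper: the name and the ".xml" suffix are assumptions.
    """
    # The digest depends on the bytes alone, so renaming the document or
    # fetching it from another URL gives the same annotation file name.
    digest = hashlib.new(algorithm, content).hexdigest()
    return digest + ".xml"


if __name__ == "__main__":
    data = b"%!PS-Adobe-3.0\n(Hello) show\n"
    print(annotation_file_name(data, "md5"))
    print(annotation_file_name(data, "sha1"))
```

Switching between MD5 and SHA1 only changes the algorithm argument, so the
choice of hash can be made independently of the rest of the logic.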

>  *) Reading the whole file sucks, I don't want the 100MB of my pdf file to
> be piped through a hash, it'll probably take *some* time
I just tried it on my ancient AMD64 2GHz machine and measured the following
computing times for a 500MB file:
MD4: 1.3 seconds
MD5: 2 seconds
SHA1: 4 seconds
Loading the file from a local hard disk takes considerably longer, so I'm not
very concerned about the hash computation time. However, the "readAll()"
definitely has to be replaced by reading smaller chunks and processing them
sequentially; it was only there for the "proof of concept".
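
Reading in chunks could look roughly like the following Python sketch. It
only illustrates the technique and is not the patch itself: hashlib replaces
QCryptographicHash, and the name hash_stream and the 64 KiB chunk size are
assumptions.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the patch


def hash_stream(stream, algorithm: str = "md5",
                chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary stream without loading it into memory at once."""
    hasher = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        hasher.update(chunk)  # feed the hash incrementally
    return hasher.hexdigest()


if __name__ == "__main__":
    data = b"x" * 200_000
    # The chunked digest equals the digest of the whole buffer.
    print(hash_stream(io.BytesIO(data)) == hashlib.md5(data).hexdigest())
```

The result is identical to hashing the whole buffer in one call, but memory
use is bounded by the chunk size rather than by the file size.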

> so reading at most 1MB would be much better imho.
If an annotation refers to a typo on the last page of a huge document, and 
this gets fixed, the same annotation would still be displayed for the 
corrected file if the correction appears after the portion of the file for 
which the hash value is computed (at least for uncompressed formats such as 
PostScript). BTW, the current implementation in okular has the same problem 
since changing a single character in a PostScript file usually doesn't change 
its size.
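
A small Python sketch of this failure case, under assumed values: the
1024-byte limit, the sample data, and the helper names prefix_hash and
full_hash are all made up for illustration. A same-size fix near the end of
the file is invisible both to a size check and to a hash over the first
bytes only, while a hash over the whole file detects it.

```python
import hashlib


def prefix_hash(data: bytes, limit: int) -> str:
    """Hash only the first `limit` bytes (hypothetical helper)."""
    return hashlib.md5(data[:limit]).hexdigest()


def full_hash(data: bytes) -> str:
    """Hash the complete content (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Two uncompressed "documents" that differ only by a typo fix at the end.
original = b"%!PS\n" + b"x" * 2000 + b"teh end\n"
corrected = b"%!PS\n" + b"x" * 2000 + b"the end\n"

same_size = len(original) == len(corrected)
same_prefix = prefix_hash(original, 1024) == prefix_hash(corrected, 1024)
same_full = full_hash(original) == full_hash(corrected)

if __name__ == "__main__":
    print(same_size, same_prefix, same_full)
```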

	Kind regards,
		Markus


-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner