How to Hack on KDEPrint pre-filters (was: "searchable pdf files")

Kurt Pfeifle k1pfeifle at gmx.net
Wed Mar 22 20:05:41 CET 2006


Sorry, I had to cut my first reply short because I was supposed to 
go to a meeting (which has now been cancelled). So here are a few more 
thoughts...

On Wednesday 22 March 2006 12:08, Johannes Maier wrote:
> Hi at all,
> 
> I'm wondering how to produce searchable pdf files with kprinter. 

I told you already that in principle you *can* do it, and that a
better success rate is likely with...

 a) ...newer versions of Ghostscript
 b) ...correct settings when using "Print to PDF"

I recommend playing with these settings and comparing the results
they give (select the PDF printer; click "Properties"; select the
"Driver Settings" tab):

 * Embed all fonts (enabled|disabled)
 * Never use "Embed subset", always use "Embed complete font"

I also recommend updating to the newest Ghostscript version you can
find; when it comes to PDF processing, Ghostscript has seen a hell of
a lot of improvements since 8.15!


You should be aware of this background info:
--------------------------------------------

kprinter does not do the main work -- its PDF creating "virtual 
printer" is just a GUI frontend to Ghostscript. Ghostscript does the
main work to create the PDF for kprinter. kprinter merely calls 
Ghostscript's "pdfwrite" device (with parameters that you configure
via the GUI I outlined above). "pdfwrite" needs PostScript files as
input to create the PDFs.
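
To get a feel for what such a call looks like, here is a minimal
sketch of a direct "pdfwrite" invocation. It is not copied from
kprinter's ps2pdf.xml; the function name "ps_to_pdf" and the "GS"
variable (to override the Ghostscript binary) are made up for this
illustration, and I am assuming that the font settings in the GUI
correspond to the EmbedAllFonts and SubsetFonts parameters.

```shell
#!/bin/sh
# Sketch: convert a PostScript file to PDF with Ghostscript's pdfwrite
# device, embedding complete fonts instead of subsets.
# "ps_to_pdf" and "GS" are hypothetical names used only in this example.
ps_to_pdf() {
    src=$1   # PostScript input file
    dst=$2   # PDF output file
    "${GS:-gs}" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -dEmbedAllFonts=true -dSubsetFonts=false \
        -sOutputFile="$dst" "$src"
}
```

Usage, assuming a file "test.ps" created via "Print to PostScript":

  ps_to_pdf test.ps test.pdf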

Now some PostScript files are just not good enough to make searchable
PDFs from. They may contain the shapes that look like characters
(on screen or on paper) not as "glyphs" from a particular font, but
only as bitmaps. And bitmap fonts embedded in a PDF are evil for
searchability.

You should check this by doing a "Print to PostScript" first; then use
a different tool (Acrobat Distiller on Windows?) to see if the
PostScript is good enough. Or try a utility like "pstotext" to see if
it is possible to extract text contents from your file.
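
A rough way to automate that last check is sketched below. The
helper "has_text" is a hypothetical name, and the "at least one run
of three letters" threshold is an arbitrary assumption of mine, not
a rule from any tool. It only looks at whatever text the extractor
prints.

```shell
#!/bin/sh
# Sketch: succeed if the text on stdin contains at least one run of
# three or more letters, i.e. the extractor found real characters.
# "has_text" is a hypothetical helper name for this example.
has_text() {
    grep -q '[[:alpha:]]\{3,\}'
}
```

Usage, assuming "pstotext" is installed and "test.ps" is your file:

  pstotext test.ps | has_text && echo "looks searchable"

If nothing is printed, the file most likely carries its characters
as bitmaps only.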

So, first rule: never use a bitmapped font in an application you use 
to create a new document.


About font embedding:
---------------------

Usually, embedding the fonts gives you more fidelity of the document
across different media (how it looks on screen and how it looks on
paper). If you do *not* embed the font into the PostScript, but only
a reference to the font's name, the next device that has to render or
process the file may not have the original font at its disposal, and
will therefore try to use a replacement font (which may look *very*
different from the original one).

My "play with the 'embed all fonts' settings" advice given above is 
fuzzy for that very reason: it *could* be the case that by forcing
font embedding you in fact force a bitmapped font to be embedded (if
you had chosen such a font in your document-creating application, say
kword), while by not embedding it, the next device handling the file
may choose a replacement font that is not bitmapped, and maybe
searchable...

We all know what a powerful framework and foundation Qt has provided
over the years for all of KDE's achievements. Unfortunately, when it
comes to printing and fonts, Qt up to version 3.x was less than
optimal. The PostScript generated by Qt is Level 1 only (PS Level 2
and 3 are not supported).

I hear that this is greatly improved in Qt-4.x, and that this will 
also eventually show in KOffice, KWord and more applications (which to 
this day are struggling with font fidelity in print and output...).

Anyway, here is how you configure Qt-based applications for font 
embedding:

 1. start "qtconfig"
 2. go to the "Printer" tab
 3. check the "Enable Font embedding" box
 4. enter all directories where your system holds its fonts into 
    the "Font Paths" listing.

You do not need to add the fonts which are known to the X server.
You can find out which font paths the X server (and thus Qt) 
already knows by using this command:

  "grep FontPath /etc/X11/xorg.conf /etc/X11/XF86Config*"

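If you want just the directories, without the surrounding config
syntax, something like the following sketch may help. The function
name "list_font_paths" is made up for this example; it assumes one
quoted path per FontPath line, skips commented-out lines, and drops
a trailing ":unscaled" marker.

```shell
#!/bin/sh
# Sketch: print the unique FontPath entries found in the given X server
# config files. Missing files are silently ignored.
# "list_font_paths" is a hypothetical helper name for this example.
list_font_paths() {
    cat "$@" 2>/dev/null |
        sed -n 's/^[[:space:]]*FontPath[[:space:]]*"\([^"]*\)".*/\1/p' |
        sed 's/:unscaled$//' |
        LC_ALL=C sort -u
}
```

Usage:

  list_font_paths /etc/X11/xorg.conf

Any font directory on your system that does not show up in this list
is a candidate for the "Font Paths" listing in qtconfig.
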
> The printing  
> of pdf files generally works fine, but they all seem to consist of pictures. 
> So it isn't possible to search for information inside the pdf files for 
> example with acroread. How can I change this?

For quite some time I have intended to come up with an improved
"PDF Printer" that takes advantage of the newer Ghostscript
capabilities, and at the same time to write a detailed HOWTO/Tutorial
that can serve as an inspiration to create more such virtual printers.

This is not too difficult, in principle. Thanks to Michael Goffioul's
original work on the kprinter framework, you are able to add all kinds
of different filters, prefilters and virtual printers to KDEPrint.

However, it looks like I do not have enough time to tackle it any time
soon. So, volunteers to the front!

For a look at the current setup of the PDF Printer, follow these steps:

 1. click "System Options" (2nd button from left, bottom of kprinter 
    dialog)
 2. click "Commands" in left column
 3. select "PostScript to PDF Converter" from dropdown list
 4. click on the "Edit command" icon
 5. click on the "Edit command" button

You've now arrived at a GUI that lets you customize the 
PDF Printer (simplify it, or add new features supported by Ghostscript).

However, my recommendation is to leave this as it is (it works :-) and
just study how it works. Then go and create a new "virtual printer" 
based on that. (You only have to click on the "New command" icon 
instead of the "Edit command" one in step "4." above...)

What you see now is the GUI representation of an XML file named 
"ps2pdf.xml". That file is located in directory

  "$(kde-config --prefix)/share/apps/kdeprint/filters/"

alongside its companion .desktop file. You'll also see all the other
prefilters of KDEPrint at that place.

So if you want to hack on the PDF Printer, all you do is hack on these 
files. If you prefer a text editor (instead of the above explained GUI),
feel free :-)

You could just create copies of these 2 (.xml + .desktop) files under 
a new name inside your $HOME/.kde/share/apps/kdeprint/filters/ directory
and hack away.
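
The copying could be scripted roughly like this. It is only a sketch:
"clone_filter" and the new name "myps2pdf" are hypothetical, and I am
assuming the copies still refer to the old name internally, so you
will probably have to adjust the name entries inside both files by
hand afterwards.

```shell
#!/bin/sh
# Sketch: copy a KDEPrint filter (its .xml and .desktop file) from one
# directory to another under a new base name. The originals stay
# untouched. "clone_filter" is a hypothetical helper name.
clone_filter() {
    src_dir=$1   # directory holding the existing filter files
    dst_dir=$2   # directory to receive the copies (created if missing)
    old=$3       # existing filter base name, e.g. ps2pdf
    new=$4       # base name for the copy
    mkdir -p "$dst_dir" || return 1
    for ext in xml desktop; do
        cp "$src_dir/$old.$ext" "$dst_dir/$new.$ext" || return 1
    done
}
```

Usage:

  clone_filter "$(kde-config --prefix)/share/apps/kdeprint/filters" \
               "$HOME/.kde/share/apps/kdeprint/filters" ps2pdf myps2pdf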

> Thanks in advance,
> Johannes

Cheers,
Kurt

