Help regarding project

Sebastian Sauer mail at dipe.org
Thu Dec 22 16:34:38 GMT 2011


On 12/22/2011 09:44 AM, Panks wrote:
>
>
>
>     Great, many thanks for sharing your progress. For poppler
>     you may like to have a look at
>     http://people.freedesktop.org/~aacid/docs/qt4/ and for
>     implementations using it
>     http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html (
>     http://quickgit.kde.org/index.php?p=okular.git&a=summary ).
>
>     For the initial skeleton, meaning the very first code to start a
>     PDF importer with, I could lend a hand to get it done. We could
>     start by creating a branch in our git and adding a
>     calligra/filters/words/pdfimport directory, then copy over the
>     Ascii filter, rename it, adapt the CMakeLists.txt, link against
>     libpoppler and write the first lines of code that use libpoppler,
>     so we have some first code that extracts content from a PDF and
>     writes it into an ODT. You can ping me on IRC or write a mail to
>     get started on this :-)
>
>
> Hello Sebastian,
>
> I made a few modifications to the code on my system. I created a new
> directory pdfimport inside calligra/filters/words. I copied the import
> files, the CMake file and the .desktop file from the ascii directory and
> renamed them to pdfimport.
> This is my CMakeLists.txt - http://paste.kde.org/176486/
> and this is the word_pdf_import.desktop file - http://paste.kde.org/176498/
> I added the line
> > add_subdirectory( pdfimport )
> to CMakeLists.txt in the calligra/filters/words directory. I tried building
> the code after this without modifying pdfimport.cpp and pdfimport.h
> much (the code in them was the same as in asciiimport.cpp and
> asciiimport.h). The build was successful, but I didn't see any change in
> the filters after launching Calligra Words: the 'Open Document' window
> still wasn't showing PDF documents, nor was there any PDF entry in
> the filter drop-down list. So, which changes do I need to make, and
> in which files, to at least make PDF files visible in the 'Open
> Document' dialog and have it accept them?
>

That all looks correct. Did you run "kbuildsycoca4" so the new desktop
file is properly picked up?

In the past it was also necessary to define the proper library name in
PdfImport.cpp. So something like:

K_PLUGIN_FACTORY(PdfImportFactory, registerPlugin<PdfImport>();)
K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport", "calligrafilters"))

Not sure if that is still needed, but it certainly cannot hurt.

> And, second thing: I was going through the code of asciiimport.cpp. In
> that code the input file is passed to a QTextStream object and the
> appropriate codec is set on the object:
>     QTextStream stream(&in);
>     stream.setCodec(codec);
>
> After that, the lines are appended to the document using a QString:
>
>     QString line = stream.readLine();
>     bodyWriter->addTextSpan(line);
>
>
> whereas with poppler there is no such straightforward option to get
> the text line by line, I think.

Correct. Text files are simple compared to PDF files. The latter can have
formatting (bold, italic, underline, different font sizes, font colors,
etc.) and even images. Our target would be to carry all of that over, but
step by step. We can start with simple things like the pure text and
some basic formatting and later move on to e.g. images.

> One method I could think of was to go through the PDF pages one by one
> and use the
> QString text(const QRectF &rect, TextLayout)
> function to get the text within a rectangle into a QString. But in
> this case, what value of rect should I pass to the function? And apart
> from this, what other methods can I use to fetch the text out of a PDF
> using poppler? Please give some suggestions.
>

It looks as if poppler Qt is not enough for us to do anything more than
extract the pure plain text :-(
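
For that plain-text case, the question about the rect value can be illustrated with a small model. As far as the poppler Qt4 documentation for Page::text() goes, a null rect (width and height both zero, as with a default-constructed QRectF) returns all text on the page. The sketch below only mimics that behaviour with standard C++; Rect, Word and textInRect are hypothetical stand-ins for illustration and are not poppler types or calls.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for QRectF: origin plus size, in PDF points.
struct Rect {
    double x, y, w, h;
    // Mirrors the "null rectangle" idea: width and height are both zero.
    bool isNull() const { return w == 0.0 && h == 0.0; }
};

// Hypothetical stand-in for one word on a page with its bounding box.
struct Word {
    std::string text;
    Rect box;
};

// True if the two rectangles overlap in both dimensions.
bool intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// Collect the words whose boxes overlap rect, separated by single spaces.
// A null rect selects every word on the page.
std::string textInRect(const std::vector<Word>& page, const Rect& rect) {
    std::string out;
    for (const Word& word : page) {
        if (rect.isNull() || intersects(word.box, rect)) {
            if (!out.empty()) {
                out += ' ';
            }
            out += word.text;
        }
    }
    return out;
}
```

With a model like this, passing Rect{0, 0, 0, 0} yields the whole page, while a smaller rectangle yields only the words it touches. Whether the real API behaves the same in every detail would need to be checked against the poppler version in use.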

What we would ideally like to have is something like
http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h
. So our own OutputDev that, unlike the ArthurOutputDev, does not render
by drawing with a QPainter but produces proper ODF instead.

poppler ships with
http://cgit.freedesktop.org/poppler/poppler/tree/utils/ which is a nice
showcase of how to output to an HTML file. I guess that's a good starting
point. We could first investigate what would be needed to create our own
OdtOutputDevice and then just create it :-)
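
To make the OdtOutputDevice idea a bit more concrete, here is a minimal sketch under stated assumptions. It does not derive from poppler's OutputDev and uses no poppler calls; TextRun, the method names and the style names ("TBold" and so on) are all hypothetical. It only shows the shape of the idea: the device is fed runs of text with basic formatting and emits ODF text markup (text:p and text:span) rather than painting.

```cpp
#include <string>

// Hypothetical description of one run of text as a PDF backend might report it.
struct TextRun {
    std::string text;
    bool bold;
    bool italic;
};

// Escape the characters that are special in XML text content.
std::string escapeXml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += c; break;
        }
    }
    return out;
}

// Sketch of an output device that produces ODF body markup instead of drawing.
class OdtOutputDevice {
public:
    void startParagraph() { m_body += "<text:p>"; }
    void endParagraph() { m_body += "</text:p>\n"; }

    // Plain runs are written as-is; formatted runs are wrapped in a text:span.
    // The style names are assumptions; a real document would also have to
    // define them in its automatic styles.
    void addRun(const TextRun& run) {
        const std::string text = escapeXml(run.text);
        if (!run.bold && !run.italic) {
            m_body += text;
            return;
        }
        std::string style = "T";
        if (run.bold) style += "Bold";
        if (run.italic) style += "Italic";
        m_body += "<text:span text:style-name=\"" + style + "\">" + text + "</text:span>";
    }

    const std::string& body() const { return m_body; }

private:
    std::string m_body;
};
```

A real device would receive its text through the OutputDev callbacks and would also need to track fonts, positions and page breaks. Keeping the ODF writing behind a small interface like this would let the plain-text step come first and images later.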

May I suggest committing early and often? It would really rock if you
could create a branch for our work and commit what you have so far
(it doesn't need to compile or work) with something like:

# create a branch based on master
git checkout -b filter-words-pdfimport-panks master
# add your new filter
git add filters/words/pdfimport
# commit everything
git commit -a
# and push the branch upstream (assuming the remote is named "origin")
git push -u origin filter-words-pdfimport-panks

Hope the above steps work. git is rather tricky sometimes, if not all
the time :-/



More information about the calligra-devel mailing list