Help regarding project

Panks pankajxdx at gmail.com
Mon Dec 26 06:42:26 GMT 2011


On Thu, Dec 22, 2011 at 10:04 PM, Sebastian Sauer <mail at dipe.org> wrote:

> On 12/22/2011 09:44 AM, Panks wrote:
>
>
>
>  Very nice, and many thanks for sharing your progress. For poppler you may
> like to have a look at http://people.freedesktop.org/~aacid/docs/qt4/ and,
> for an implementation using it,
> http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html (
> http://quickgit.kde.org/index.php?p=okular.git&a=summary ).
>
> For the initial skeleton, meaning the very first code to start a
> PDF importer with, I could lend a hand to get it done. We could start by
> creating a branch in our git, adding a
> calligra/filters/words/pdfimport directory, copying over the Ascii filter,
> renaming it, adapting the CMakeLists.txt, linking against libpoppler, and
> writing the first lines of code that use libpoppler to extract content
> from a PDF and write it into an ODT. You can ping me on IRC or write a
> mail to get started on this :-)
>
>
> Hello Sebastian,
>
>  I made a few modifications to the code on my system. I created a new
> directory pdfimport inside calligra/filters/words. I copied the import files,
> the CMake file and the .desktop file from the ascii directory and renamed
> them to pdfimport.
>  This is my CMakeLists.txt - http://paste.kde.org/176486/
>  and this is the word_pdf_import.desktop file - http://paste.kde.org/176498/
> I added the line
> > add_subdirectory( pdfimport )
> to the CMakeLists.txt in the calligra/filters/words directory. I tried
> building the code after this without modifying pdfimport.cpp and
> pdfimport.h much (the code in them was the same as in asciiimport.cpp and
> asciiimport.h). The build was successful, but I didn't see any change in the
> filters after launching Calligra Words: the 'Open Document' window still
> didn't show PDF documents, nor was there a PDF entry in the drop-down
> list of filters. So, what changes do I need to make, and in which
> files, to at least make PDF files visible in the 'Open Document' dialog and
> have it accept them?
>
>
> That all looks correct. Did you run "kbuildsycoca4" so that the new desktop
> file is properly picked up?
>
> It used to be necessary to also define the proper library name in
> PdfImport.cpp, with something like:
>
> K_PLUGIN_FACTORY(PdfImportFactory, registerPlugin<PdfImport>();)
> K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport", "calligrafilters"))
>
> I am not sure whether that is still needed, but it certainly cannot hurt.
>
>    And a second thing: I was going through the code of asciiimport.cpp. In
> that code the input file is passed to a QTextStream object and the
> appropriate codec is set on that object:
>     QTextStream stream(&in);
>     stream.setCodec(codec);
>
>  After that, the lines are read into a QString and appended to the
> document:
>
>     QString line = stream.readLine();
>     bodyWriter->addTextSpan(line);
>
>
>  With poppler, however, there is no such straightforward option to get the
> text line by line, I think.
>
>
> Correct. Text files are simple compared to PDF files. The latter can have
> formatting (bold, italic, underline, different font sizes, font colors,
> etc.) and even images. Our target would be to take all of that over, but
> step by step. We can start with simple things like the pure text and some
> basic formatting, and later go on to e.g. images.
>
>   One method I could think of is to go through the PDF pages one by one and
> use the
> QString text(const QRectF &rect, TextLayout)
>  function to get the text within a rectangle into a QString. But in that
> case, what value of rect should I pass to the function? And apart from this,
> what other methods can I use to fetch the text out of a PDF using poppler?
> Please give some suggestions.
>
>
> It looks as if poppler Qt is not enough for us to do anything more than
> extracting the pure plain text :-(
>
> What we would ideally like to have is something like
> http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h
> That is, our own OutputDev which, unlike the ArthurOutputDev, does not render
> by drawing with a QPainter but produces proper ODF instead.
>
> poppler ships with http://cgit.freedesktop.org/poppler/poppler/tree/utils/
> which is a nice showcase of how to output to an HTML file. I guess that's a
> good starting point. We could first investigate what would be needed to
> create our own OdtOutputDevice and then just create it :-)
>
> May I suggest committing early and often? It would really rock if you
> could create a branch for our work and commit what you have so far (it
> doesn't need to compile or work) with something like:
>
> # create the branch from master
> git checkout -b filter-words-pdfimport-panks master
> # add your new filter
> git add filters/words/pdfimport
> # commit everything
> git commit -a
> # and push the branch upstream (assuming the remote is named "origin")
> git push -u origin filter-words-pdfimport-panks
>
> I hope the above steps work. git is rather tricky sometimes, if not all
> the time :-/
>
>

Hello Sebastian :-)

Sorry for the late reply. College reopens next week, so I have a few
assignments to deal with this week.
Anyway, I got that skeleton working: it now shows PDF files in the 'Open
Document' window, and I have pushed it to KDE git too.
I went roughly through that OutputDev file once. Can you please give me
some hints on what I should look at or do next?
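
One possible direction for the line-by-line question, sketched under
assumptions: the Qt4 binding's Poppler::Page::textList() returns positioned
Poppler::TextBox objects (each with text() and boundingBox()), so lines could
be approximated by grouping boxes with similar vertical positions. The sketch
below is plain C++ with a hypothetical WordBox struct standing in for
Poppler::TextBox and a hypothetical groupIntoLines() helper; the
half-box-height tolerance is an arbitrary assumption, not something poppler
prescribes.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for a positioned word, e.g. the text() and
// boundingBox() of a Poppler::TextBox. Coordinates are in points.
struct WordBox {
    double x;       // left edge
    double y;       // top edge (grows downwards)
    double height;  // box height
    std::string text;
};

// Group words into text lines. Words whose top edges differ by at most
// half a box height (an assumed tolerance) end up on the same line.
std::vector<std::string> groupIntoLines(std::vector<WordBox> words)
{
    // Order the words top-to-bottom first.
    std::stable_sort(words.begin(), words.end(),
                     [](const WordBox &a, const WordBox &b) { return a.y < b.y; });

    // Split into rows wherever a word sits clearly below the row's first word.
    std::vector<std::vector<WordBox>> rows;
    for (const WordBox &w : words) {
        if (rows.empty() ||
            std::fabs(w.y - rows.back().front().y) > 0.5 * w.height) {
            rows.emplace_back();
        }
        rows.back().push_back(w);
    }

    // Within each row, order left-to-right and join with single spaces.
    std::vector<std::string> lines;
    for (std::vector<WordBox> &row : rows) {
        std::sort(row.begin(), row.end(),
                  [](const WordBox &a, const WordBox &b) { return a.x < b.x; });
        std::string line;
        for (const WordBox &w : row) {
            if (!line.empty())
                line += ' ';
            line += w.text;
        }
        lines.push_back(line);
    }
    return lines;
}
```

Each returned string could then be handed to bodyWriter->addTextSpan(line) the
same way the Ascii filter does, though multi-column layouts and formatting
would need more than this.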

Thank you,

Pankaj
UG Student | Dept. of Computer Science and Engineering
IIT Madras, Chennai, India