<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
On 12/22/2011 09:44 AM, Panks wrote:
<blockquote
cite="mid:CAKi+3nrhRXF6HnDwJex-oaqOv9o1eK-_ZUSttzuZgtk1TyohoQ@mail.gmail.com"
type="cite">
<div>
<blockquote>
<div>
<div>
<div><br>
<br>
</div>
</div>
That's great. Thanks a lot for sharing your progress. For
poppler you may like to have a look at <a
moz-do-not-send="true"
href="http://people.freedesktop.org/%7Eaacid/docs/qt4/">http://people.freedesktop.org/~aacid/docs/qt4/</a>
and, for implementations using it, at <a moz-do-not-send="true"
href="http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html">http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html</a>
( <a moz-do-not-send="true"
href="http://quickgit.kde.org/index.php?p=okular.git&amp;a=summary">http://quickgit.kde.org/index.php?p=okular.git&amp;a=summary</a>
).<br>
<br>
For the initial skeleton, meaning the very first code to
start a PDF importer with, I could lend a helping hand
to get it done. We could start by creating a branch in our
git repository, adding a calligra/filters/words/pdfimport directory,
copying over the ASCII filter, renaming it, adapting the
CMakeLists.txt, linking against libpoppler and writing the
first lines of code that use libpoppler, so we get a first look at
code that extracts content from a PDF and writes it into an
ODT. You can ping me on IRC or write a mail to get started
on this :-)<br>
<br>
</div>
</blockquote>
</div>
<br>
<div>Hello Sebastian,
<div><br>
</div>
<div>I made a few modifications to the code on my system: I
created a new directory pdfimport inside <span>calligra/filters/words. I
copied the import files, the CMake file and the .desktop file from the ascii
directory and renamed them to pdfimport.</span></div>
<div>This is my CMakeLists.txt: <a moz-do-not-send="true"
href="http://paste.kde.org/176486/">http://paste.kde.org/176486/</a>
</div>
<div>and this is the word_pdf_import.desktop file: <a
moz-do-not-send="true" href="http://paste.kde.org/176498/">http://paste.kde.org/176498/</a>
</div>
<div>I added the line</div>
<div>&gt; add_subdirectory( pdfimport )</div>
<div>to the CMakeLists.txt in the <span>calligra/filters/words
directory. </span><span>I tried building the code after this
without modifying pdfimport.cpp and pdfimport.h much
(the code in them was the same as in asciiimport.cpp
and asciiimport.h). The build was successful, but I didn't see
any change in the filters after launching calligrawords: the
'Open Document' window still wasn't showing PDF
documents, nor was there a PDF entry in the drop-down
list of filters. So what changes do I need to make,
and in which files, to at least make PDF files visible in the
'Open Document' dialog and have it accept them?</span></div>
<div> <br>
</div>
</div>
</blockquote>
<br>
Looks all correct. Did you run "kbuildsycoca4" so that the new
.desktop file is properly picked up?<br>
<br>
Back then it was also necessary to define the proper library name
in pdfimport.cpp, with something like:<br>
<br>
K_PLUGIN_FACTORY(PdfImportFactory,
registerPlugin&lt;PdfImport&gt;();)<br>
K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport",
"calligrafilters"))<br>
<br>
I'm not sure that is still needed, but it certainly can't hurt.<br>
<br>
<blockquote
cite="mid:CAKi+3nrhRXF6HnDwJex-oaqOv9o1eK-_ZUSttzuZgtk1TyohoQ@mail.gmail.com"
type="cite">
<div>
<div> </div>
<div>And second, I was going through the code of
asciiimport.cpp. In that code <span>the input file is
passed to a QTextStream object and the appropriate codec is set
on the object:</span></div>
<div>
<div> QTextStream stream(&amp;in);</div>
<div> stream.setCodec(codec);</div>
<div><br>
</div>
<div>and after that the lines are read into a QString and
appended to the document:</div>
</div>
<blockquote>
<div>
<div>QString line = stream.readLine();</div>
<div>bodyWriter->addTextSpan(line);</div>
</div>
</blockquote>
<div><br>
</div>
whereas with poppler there is no such straightforward way to
get the text line by line, I think.</div>
</blockquote>
<br>
Correct. Text files are simple compared to PDF files. The latter can
have formatting (bold, italic, underline, different font sizes,
font colors, etc.) and even images. Our target would be to carry
all of that over, but step by step. We can start with simple things
like the pure text and some basic formatting and later move on to e.g.
images.<br>
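<br>
Just to illustrate what "basic formatting" involves on the ODF side: every distinct combination of font properties needs an automatic style that the text spans can point to. Below is a minimal sketch of that bookkeeping in plain C++; the TextFormat struct, the StyleRegistry class and the T1, T2, ... naming scheme are hypothetical, and the values would have to be filled from whatever font information we manage to get out of poppler:<br>
<br>
```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical formatting record for one run of PDF text (not a poppler type).
struct TextFormat {
    bool bold;
    bool italic;
    double fontSizePt;

    // Strict ordering so TextFormat can be used as a std::map key.
    bool operator<(const TextFormat &o) const {
        return std::tie(bold, italic, fontSizePt) <
               std::tie(o.bold, o.italic, o.fontSizePt);
    }
};

// Hands out one automatic style name (T1, T2, ...) per distinct format
// and returns the existing name when a format repeats.
class StyleRegistry {
public:
    std::string styleFor(const TextFormat &f) {
        auto it = names_.find(f);
        if (it != names_.end())
            return it->second;
        std::string name = "T" + std::to_string(names_.size() + 1);
        names_.emplace(f, name);
        return name;
    }

    // Serialises the collected formats as ODF <style:style> elements,
    // meant for the office:automatic-styles section of content.xml.
    std::string automaticStylesXml() const {
        std::ostringstream out;
        for (const auto &entry : names_) {
            const TextFormat &f = entry.first;
            out << "<style:style style:name=\"" << entry.second
                << "\" style:family=\"text\"><style:text-properties"
                << " fo:font-weight=\"" << (f.bold ? "bold" : "normal") << "\""
                << " fo:font-style=\"" << (f.italic ? "italic" : "normal") << "\""
                << " fo:font-size=\"" << f.fontSizePt << "pt\"/></style:style>\n";
        }
        return out.str();
    }

    std::size_t size() const { return names_.size(); }

private:
    std::map<TextFormat, std::string> names_;
};
```
A span would then be written as &lt;text:span text:style-name="T2"&gt;...&lt;/text:span&gt;. Keying on the exact font size is a simplification; sizes coming from real PDFs may need rounding first.<br>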
<blockquote
cite="mid:CAKi+3nrhRXF6HnDwJex-oaqOv9o1eK-_ZUSttzuZgtk1TyohoQ@mail.gmail.com"
type="cite">
<div>
<div>One method I could think of was to go through the PDF pages one
by one and use the
<div>QString text(const QRectF &amp;rect, TextLayout)</div>
<div>function to get the text within a rectangle into a
QString, but in that case what value of rect should I pass
to the function? And apart from this, what other methods can I
use to fetch the text out of a PDF using poppler? Please give
me some suggestions.
<div> <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
It looks like the poppler Qt bindings are not enough for us to do anything
more than extract the pure plain text :-(<br>
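<br>
That said, a line-by-line reading similar to the readLine() loop in asciiimport.cpp can probably be approximated. Page::textList() in poppler-qt4 returns the words of a page as TextBox objects, each with a bounding box (and according to the poppler-qt4 API docs, passing a null QRectF to Page::text() returns all text on the page). Below is a minimal sketch of the grouping step; it uses a hypothetical WordBox struct in place of Poppler::TextBox so it compiles on its own, and the half-a-word-height tolerance is an arbitrary assumption that ignores multi-column layouts:<br>
<br>
```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for one word as poppler reports it: the text plus
// its bounding box in page coordinates (y grows downwards).
struct WordBox {
    std::string text;
    double x, y, width, height;
};

// Groups words into lines of text. Words are sorted by vertical centre; a new
// line starts when a word's centre is more than half its height away from the
// centre of the first word of the current line. Each line is then sorted
// left-to-right and joined with single spaces.
std::vector<std::string> groupIntoLines(std::vector<WordBox> words) {
    auto centre = [](const WordBox &w) { return w.y + w.height / 2.0; };
    std::sort(words.begin(), words.end(),
              [&](const WordBox &a, const WordBox &b) { return centre(a) < centre(b); });

    std::vector<std::vector<WordBox>> lines;
    double lineCentre = 0.0;
    for (const WordBox &w : words) {
        if (lines.empty() || std::fabs(centre(w) - lineCentre) > w.height / 2.0) {
            lines.emplace_back();
            lineCentre = centre(w);
        }
        lines.back().push_back(w);
    }

    std::vector<std::string> result;
    for (auto &line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const WordBox &a, const WordBox &b) { return a.x < b.x; });
        std::string text;
        for (const WordBox &w : line) {
            if (!text.empty())
                text += ' ';
            text += w.text;
        }
        result.push_back(text);
    }
    return result;
}
```
For the words "Hello" and "world" at nearly the same height, followed by "Second" and "line" further down the page, this yields the two lines "Hello world" and "Second line", which could then be handed to bodyWriter->addTextSpan() the same way the ascii filter does.<br>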
<br>
What we would ideally like to have is something like
<a class="moz-txt-link-freetext" href="http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h">http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h</a>,
i.e. our own OutputDev that, unlike the ArthurOutputDev, does not
render by drawing with a QPainter but produces proper ODF
instead.<br>
<br>
poppler ships with
<a class="moz-txt-link-freetext" href="http://cgit.freedesktop.org/poppler/poppler/tree/utils/">http://cgit.freedesktop.org/poppler/poppler/tree/utils/</a>, which (with
pdftohtml) is a nice showcase of how to output to an HTML file. I guess that's a good
starting point. We could first investigate what would be needed to
create our own OdtOutputDevice and then just create it :-)<br>
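<br>
To make the OdtOutputDevice idea a bit more concrete: the converter walks through the PDF and feeds text into a device, and the ODT device turns those calls into ODF paragraphs. The sketch below is hypothetical throughout: OutputDevice, startPage/addText/endLine and OdtTextDevice are made-up names, not poppler's OutputDev virtual functions (those would come out of studying TextOutputDev.h and the HTML output code in utils/). It only builds the &lt;text:p&gt; elements that would go inside &lt;office:text&gt; in content.xml, and shows the XML escaping that any real writer needs:<br>
<br>
```cpp
#include <sstream>
#include <string>

// Hypothetical callback interface a PDF walker could drive; NOT poppler's OutputDev.
class OutputDevice {
public:
    virtual ~OutputDevice() = default;
    virtual void startPage(int pageNumber) = 0;
    virtual void addText(const std::string &text, const std::string &styleName) = 0;
    virtual void endLine() = 0;
};

// Escapes the characters that are special in XML character data and attributes.
std::string xmlEscape(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Collects each line of text as one ODF paragraph made of styled spans.
class OdtTextDevice : public OutputDevice {
public:
    // Page breaks are ignored in this sketch.
    void startPage(int) override {}

    void addText(const std::string &text, const std::string &styleName) override {
        current_ << "<text:span text:style-name=\"" << xmlEscape(styleName) << "\">"
                 << xmlEscape(text) << "</text:span>";
    }

    // Closes the current line as a <text:p> and starts a fresh one.
    void endLine() override {
        body_ << "<text:p>" << current_.str() << "</text:p>\n";
        current_.str("");
        current_.clear();
    }

    std::string body() const { return body_.str(); }

private:
    std::ostringstream current_;
    std::ostringstream body_;
};
```
Writing "Fish &amp; Chips" with style T1 and then ending the line produces one &lt;text:p&gt; containing a single &lt;text:span&gt;, with the ampersand escaped as &amp;amp;. Mapping each PDF line to its own paragraph is a simplification; proper paragraph detection would come later.<br>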
<br>
May I suggest committing early and often? It would really rock
if you could create a branch for our work and commit what you have so
far (it doesn't need to compile or work yet) with something like:<br>
<br>
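# create the branch from master<br>
git checkout -b filter-words-pdfimport-panks master<br>
# add your new filter<br>
git add filters/words/pdfimport<br>
# commit everything, including the modified filters/words/CMakeLists.txt<br>
git commit -a<br>
# and push the new branch upstream (-u makes later plain "git push" calls work)<br>
git push -u origin filter-words-pdfimport-panks<br>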
<br>
I hope the above steps work. git is rather tricky sometimes, if not all
the time :-/<br>
<br>
</body>
</html>