Could you help me with parsing .DOC files?

Yuriy Kardapolov clotofdarkness at hotmail.com
Thu Apr 14 11:55:28 BST 2011


Hello,


Project background:

I need to read .doc files in ASP.NET for our project (a converter).
I have downloaded Microsoft's documentation on the MS Word file format,
but it is very convoluted and only describes the individual MS Word structures.
I can read the compound file format (OLE2) and extract any stream from it, such as "WordDocument", "1Table", "0Table", etc.
I can get the text from the "WordDocument" stream; as far as I know, it holds all the text of the document.
I have also downloaded wvWare 2, but I can't compile it.
What I want to know is how to parse .DOC files and get the text formatting, such as font name, color, size, boldness, etc.
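
To make the starting point concrete, here is a minimal sketch (in Python rather than .NET, for brevity) of decoding the first few header fields of the "WordDocument" stream, assuming the stream bytes have already been extracted from the compound file. The offsets follow the FibBase layout as I understand it from Microsoft's [MS-DOC] documentation, where the two table streams are named "1Table" and "0Table". The function name parse_fib_base and the sample bytes are purely illustrative and not taken from wvWare or any other library.

```python
import struct

FIB_MAGIC = 0xA5EC  # wIdent value that marks a Word binary file


def parse_fib_base(word_document: bytes) -> dict:
    """Decode a few FibBase fields from the start of the WordDocument stream."""
    if len(word_document) < 12:
        raise ValueError("stream too short to hold the fields read here")
    # Six little-endian uint16 values: wIdent, nFib, unused, lid, pnNext, flags.
    w_ident, n_fib, _unused, lid, _pn_next, flags = struct.unpack_from(
        "<6H", word_document, 0
    )
    if w_ident != FIB_MAGIC:
        raise ValueError("not a Word binary document")
    return {
        "nFib": n_fib,                      # file format version number
        "lid": lid,                         # language identifier
        "encrypted": bool(flags & 0x0100),  # fEncrypted bit
        # fWhichTblStm bit: selects which table stream the document uses.
        "table_stream": "1Table" if flags & 0x0200 else "0Table",
    }


# Hypothetical 12-byte header, built by hand only to exercise the function.
sample = struct.pack("<6H", 0xA5EC, 0x00C1, 0, 0x0409, 0, 0x0200)
info = parse_fib_base(sample)
```

Only the table stream named in the result is the live one for that document, so it would be the stream to open next when looking for anything beyond the raw text.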

My question:
Could you advise me on how to read the text formatting? Which structures should I read for that in my .NET project?

Any advice or suggestions would be very helpful!

Thank you in advance,



Kardapolov Yuriy





>I'm not really working on wvWare anymore. Actually, the code has been copied
>into the Calligra office suite (http://www.calligra-suite.org) repository, and
>that is where people are really working on the filter. It might be better to
>contact them with your questions.

-Benjamin

On Thursday 14 April 2011 04:23:26 you wrote:
> Hello Benjamin Cail,
>  
> My questions:
> 
> 1) Is it possible to compile wvWare 2 in MS Visual C++ 6? Is there any
> other way to compile wvWare 2 on Windows so that I can debug and trace
> the code?
> 
> 2) Could you advise me on how to read the text formatting? Which
> structures should I read for that?
> 
> Any advice or suggestions would be very helpful!
> 
> Thank you in advance,
> 
> Kardapolov Yuriy
 		 	   		  

