Could you help me with parsing .DOC files?

Yuriy Kardapolov clotofdarkness at hotmail.com
Thu Apr 14 11:46:11 BST 2011


Hello,


Project background:

I need to read .doc files in ASP.NET for our project (a converter).
I have downloaded Microsoft's documentation on the MS Word file format,
but it is very convoluted and only describes the individual MS Word structures, not how they fit together.
I can read the compound file format (OLE2) and extract any stream from it, such as "WordDocument", "1Table", "0Table", etc.
I can get text from the "WordDocument" stream. As far as I know, it holds all the text of the document.
I have also downloaded wvWare 2 but can't compile it.
What I want to know is how to parse .DOC files and get text formatting such as font name, color, size, boldness, etc.
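
To make the starting point above concrete, here is a minimal Python sketch (Python rather than .NET, purely for illustration) of reading the header at the start of the "WordDocument" stream to decide which table stream applies. The offsets and bit masks reflect my reading of the FibBase layout in the Microsoft documentation and should be checked against it. The specification names the table streams "1Table" and "0Table". parse_fib_base is a hypothetical helper name, not part of any library, and the stream bytes are assumed to have been extracted from the OLE2 container already.

```python
import struct

WORD_MAGIC = 0xA5EC  # wIdent value expected at offset 0 of "WordDocument"


def parse_fib_base(word_document: bytes) -> dict:
    """Read a few FibBase fields from the raw "WordDocument" stream.

    Hypothetical helper for illustration; offsets are assumptions taken
    from the FibBase layout and should be verified against the spec.
    """
    if len(word_document) < 12:
        raise ValueError("WordDocument stream too short for a FIB header")

    # wIdent and nFib are the first two little-endian 16-bit fields.
    w_ident, n_fib = struct.unpack_from("<HH", word_document, 0)
    if w_ident != WORD_MAGIC:
        raise ValueError("not a Word binary file (bad wIdent)")

    # The 16-bit flag field sits at offset 0x0A.
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)

    return {
        "nFib": n_fib,
        "fEncrypted": bool(flags & 0x0100),
        # fWhichTblStm selects between the two table streams.
        "table_stream": "1Table" if flags & 0x0200 else "0Table",
    }
```

The formatting structures are referenced from the header and stored in the selected table stream, so picking the right one is presumably the first step before anything else can be read.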

My question:
Could you advise me on how to read text formatting? Which structures should I read for that in my .NET project?


Any advice or suggestions would be very helpful!

Thank you in advance,



Kardapolov Yuriy
 		 	   		  


More information about the calligra-devel mailing list