Read *.doc
Blitz3D Forums / Blitz3D Beginners Area
Is it possible to read a Word document file into Blitz and get the text it contains? I know how to open and read files, but I wonder how to do it with Word files!
Hm... IIRC there's no LoadDoc(file$,flags) command yet :) I guess you'll have to find out how the Word file format works and write your own .doc importer.
You can find more info about the .doc file format at Wotsit's Format. Hope this helps :)
An alternative is to convert the Word doc into a plain text file (.txt), for example with Save As in Word, which strips out all the header info a Word doc contains...
But then again, I wonder why Agamer didn't use Notepad or something in the first place... If he really wants the bold/italic/underline/font/color information, then converting to .txt doesn't help much here.
RTF is an alternative (and easier to code for, since it's just plain-text tags such as \b for bold and \par for a paragraph break).
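Because RTF is plain text with tags, getting the words out mostly means dropping the control words and braces. Below is a minimal sketch, written in Python rather than Blitz so it can be run and checked on its own; the same logic ports to a ReadByte loop. It assumes a simple RTF file: it drops control words, turns \par and \line into newlines, decodes \'hh escapes as Windows-1252, and skips whole groups for the font table, color table, style sheet and a few other destinations. The SKIP_DESTINATIONS list and the function name rtf_to_text are my own illustrative choices, not exhaustive or official, and Unicode \uN escapes are not handled.

```python
import re

# Groups whose contents are formatting data, not body text.
# This list is an assumption for illustration and is not exhaustive.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "header", "footer"}

# One alternative per RTF token type:
#   \word[N][space]  control word   |  \'hh  hex-escaped byte
#   \x               control symbol |  { or } group delimiter
#   CR/LF (ignored by RTF)          |  a run of ordinary text
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"
    r"|\\'([0-9a-fA-F]{2})"
    r"|\\([^a-zA-Z])"
    r"|([{}])"
    r"|([\r\n])"
    r"|([^\\{}\r\n]+)"
)


def rtf_to_text(rtf):
    """Return the visible text of an RTF string, dropping all formatting."""
    out = []
    stack = []    # skip flag of each enclosing group
    skip = False  # True while inside a destination we don't want
    for m in TOKEN.finditer(rtf):
        word, _arg, hexcode, symbol, brace, _newline, text = m.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif not skip and word in ("par", "line"):
                out.append("\n")
            elif not skip and word == "tab":
                out.append("\t")
        elif hexcode is not None:
            if not skip:
                out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol is not None:
            if symbol == "*":
                skip = True  # \* marks an optional destination; skip its group
            elif not skip and symbol in "\\{}":
                out.append(symbol)
            elif not skip and symbol in "\r\n":
                out.append("\n")  # a backslash before a line break acts like \par
        elif text is not None and not skip:
            out.append(text)
    return "".join(out)
```

For example, rtf_to_text(r"{\rtf1 Hello \b bold\b0 \par}") gives "Hello bold\n". The group stack is what makes skipping work: when a skipped group such as the font table closes, the previous skip state comes back.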
RTF is tags? Nice :) That's gonna come in handy, hehe.
Yeah, I'm using it in a program I'm writing at the moment. It already uses .txt, but it still needs to be able to read .doc.
Part of the problem with .doc is that Microsoft never fully released the specifications for Word files, which is why third-party word processors like OpenOffice, WordPerfect, etc. all have minor problems with certain documents. It is probably a lot easier to use RTF, which is what Microsoft used with Write/WordPad and MS Word 2.0. It's a much simpler markup language and has much more complete documentation. Anyway, for pretty much any file format description, this is the place to go: http://www.wotsit.org (hundreds upon hundreds of file format documents can be found there). Great resource.
Thanks, but I can't find the file format for MS Word 2000 and above.
Huh, that's a lot of file formats. Why didn't anybody tell me about that site before I started cracking various formats? Anyway, pretty useful :)
To my knowledge, Microsoft never released the file format for Word 2000/2003. They don't want people to be able to open those documents with other programs; they want those people to buy Word as well. Microsoft simply has too much to lose if other programs like OpenOffice can read and write Word documents flawlessly. OpenOffice is free. Would you buy Microsoft Office for $$extortion$$ if you could get a completely free, legal alternative that does the same thing? No, and Microsoft knows that too, hence they simply don't release the specifications for their document formats anymore. Any info on Word 2000/2003 you'll find has been obtained by people painstakingly trying to reverse engineer the document format... and it's still not perfect. Bottom line: I don't think a Word 2000/2003 document viewer in Blitz is a realistic expectation... or in *any* language any time soon, for that matter.
Oh, I don't want to view it or keep the font/bold/italic/size settings. All I want to do is read the text; someone must have done that before. My program can import plain Notepad text, but it would be nice to import documents too.
If you just want the text from a Word doc, try this (it's easier to show you the code than to type in the explanation):

Graphics 800,600
SetBuffer BackBuffer()

; Open the file to read (I just copied and pasted this code into a Word document)
filein% = ReadFile("C:\My Documents\Blitz test.doc")
If filein = 0 Then RuntimeError "Could not open the .doc file"

; Loop until we reach the end of the file
While Not Eof(filein)
    GetByte% = ReadByte(filein)
    ; Count the bytes as you read them in
    count = count + 1
    ; The Word doc header is 1,536 bytes, so just skip past it
    ; BTW, the header length is the same for both Word 97 and Word 2000
    If count > 1536 Then
        ; Chr$(13) (carriage return) ends a paragraph, so print the current line and reset Word$
        If GetByte = 13 Then
            Text 0,spacing,Word$
            spacing = spacing + 15
            Word$ = ""
        End If
        ; If it's a printable ASCII character, add it to Word$
        If GetByte > 31 And GetByte < 128 Then
            Word$ = Word$ + Chr$(GetByte)
        End If
    End If
Wend

CloseFile filein
Flip
WaitKey()
End

The straggler characters at the end can be ignored, or I'll leave it to you to figure out the rest (because I don't know how). As you can see in the code, the header data is skipped and every printable ASCII character is appended to the Word$ variable. I haven't checked whether the "footer" data is the same length in every document, but if it is, that may be a way to eliminate the stray characters at the end of the text.

Andy
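The same header-skipping filter, sketched in Python so the logic can be tested outside Blitz. It adds one safety check that is not in Andy's code: Word 97/2000 .doc files are stored as OLE2 compound files, which always start with the bytes D0 CF 11 E0 A1 B1 1A E1, so anything else (say, a text file renamed to .doc) is rejected instead of being silently mangled. The 1,536-byte offset is the value observed in this thread, not a documented constant, and doc_lines is a hypothetical name. Unlike the Blitz version, it also keeps a last line that has no CR after it.

```python
# First 8 bytes of every OLE2 compound file, which is the container
# Word 97/2000 .doc files are stored in.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Header size taken from Andy's Blitz code; an observed value, not a spec constant.
TEXT_OFFSET = 1536


def doc_lines(data):
    """Split the bytes after the header into lines of printable ASCII."""
    if not data.startswith(OLE2_SIGNATURE):
        raise ValueError("not an OLE2 compound file, so probably not a Word 97/2000 .doc")
    lines = []
    current = []
    for b in data[TEXT_OFFSET:]:
        if b == 13:             # CR: Word's paragraph mark ends the line
            lines.append("".join(current))
            current = []
        elif 31 < b < 128:      # printable ASCII, same filter as the Blitz code
            current.append(chr(b))
    if current:                 # keep a last line that has no CR after it
        lines.append("".join(current))
    return lines
```

Usage: doc_lines(open("Blitz test.doc", "rb").read()) returns a list of strings, one per paragraph, including the trailing junk lines this filter still lets through.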
With this I still get 3 lines of jumbled characters at the end.
Yeah, I know, that's what I said in my previous post. Let me think about it for a while (you can too!). There's bound to be a way around this.
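One possible way around the trailing junk, offered as an untested heuristic rather than anything taken from the .doc format documentation: assume the text is followed by a stretch of zero bytes (padding) before the binary formatting data, and stop reading at the first run of several zeros. The run length of 4 (ZERO_RUN) is an assumption; it is long enough that text stored as 16-bit Unicode, which has a single zero byte between ASCII letters, is not cut off early. The sketch is in Python so it can be checked on its own, and doc_text is a hypothetical name. In Blitz, the equivalent is a counter of consecutive zero ReadByte values that exits the While loop when it reaches 4.

```python
# Header size taken from the Blitz code in this thread (observed, not from a spec).
TEXT_OFFSET = 1536

# Assumption: this many zero bytes in a row means the text is over. ASCII text
# stored as 16-bit Unicode has single zeros between letters, so 4 avoids cutting it.
ZERO_RUN = 4


def doc_text(data, zero_run=ZERO_RUN):
    """Return header-skipped text, stopping at the first run of zero bytes."""
    body = data[TEXT_OFFSET:]
    end = body.find(bytes(zero_run))  # bytes(n) is n zero bytes
    if end != -1:
        body = body[:end]
    chars = []
    for b in body:
        if b == 13:                   # Word's paragraph mark becomes a newline
            chars.append("\n")
        elif 31 < b < 128:            # printable ASCII only
            chars.append(chr(b))
    return "".join(chars)
```

If a document turns out to have no such padding, the function simply behaves like the original filter and returns everything after the header, junk included.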