on 08-01-2005 3:41 PM
Hello,
I am looking for a way to convert Word files to PDF using Java. I am familiar with the Jakarta POI project, but unfortunately it doesn't offer this option. Does anyone have sample code for this?
Roy
Hi Roy,
this is a hard task. Reading Word is only part of the problem: POI's Word API is somewhere between alpha and beta, not as good as the XLS API, and at the moment not under further development.
A good place to look for libraries that read and write different file formats is http://schmidt.devlib.org/java/document-libraries.html
Anyhow, when I had to convert Word into plain text, I used StarOffice and its Java API (not very comfortable, but possible). This is the route I would consider (you'd need StarOffice installed on the server). Maybe exporting to RTF first would get you further, I don't know...
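As a rough illustration of the "office suite on the server" route, here is a minimal sketch that shells out to a headless office process instead of using the Java API mentioned above. It assumes a LibreOffice install (a later descendant of StarOffice) whose `soffice` binary is on the PATH; the class and method names are made up for this example, and none of this was tested against StarOffice itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class OfficeConvertSketch {
    /** Builds the command line only; nothing is executed here. */
    static List<String> buildCommand(Path input, Path outDir) {
        // Assumption: "soffice" (LibreOffice) is installed and on the PATH.
        return List.of(
            "soffice", "--headless",
            "--convert-to", "pdf",
            "--outdir", outDir.toString(),
            input.toString());
    }

    /** Runs the conversion and returns the exit code of the office process. */
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
            .inheritIO()   // show the converter's own output on our console
            .start();
        return process.waitFor();
    }
}
```

Something like `OfficeConvertSketch.convert(Path.of("letter.doc"), Path.of("out"))` would then be expected to leave the PDF in the output directory, provided the office suite can open the document at all.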
As said, a hard task...
Hope it helps
Detlev
Hey Detlev,
I think I found a good library for this; it is called iText: http://www.lowagie.com/iText/docs.html.
I will investigate it and let you know...
Cheers,
Roy
Hi Roy,
from the FAQ:
> Can I convert WORD doc-files or RTF to PDF using iText?
>
> No, iText is only able to generate RTF.
> It doesn't do RTF or Word-doc parsing. Try Apache POI.
In any case, it would be a very hard task to map each Word feature manually onto PDF creation.
I don't think there is a comfortable way other than really "printing" the PDF (of course, this should happen automatically, so the question is whether StarOffice or a third-party tool provides an API for this).
Hope it helps
Detlev
Dear all,
I am looking for a Java component that will allow me to upload a .doc and convert it to .txt on the fly. In other words, the user clicks on my upload button, selects a .doc file, and sends it. This wonder component converts the file to .txt and stores it in my designated place.
I already have the upload working.
Is there any such marvel of a component out there that will do the trick?
thanks
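The storage half of this request can be sketched with the standard library alone. In the sketch below the actual .doc reading is left behind a hypothetical `TextExtractor` interface (for example, something built on POI or an office suite would go there); the class, interface, and method names are all invented for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UploadToTextSketch {
    /** Hypothetical hook: whatever library actually reads the .doc plugs in here. */
    interface TextExtractor {
        String extract(InputStream doc) throws IOException;
    }

    /** Maps "C:\docs\Report.DOC" to "Report.txt"; other names just get ".txt" appended. */
    static String txtNameFor(String uploadedName) {
        // Drop any client-side directory part, whichever separator was used.
        int cut = Math.max(uploadedName.lastIndexOf('/'), uploadedName.lastIndexOf('\\'));
        String name = uploadedName.substring(cut + 1);
        String base = name.toLowerCase().endsWith(".doc")
            ? name.substring(0, name.length() - 4)
            : name;
        return base + ".txt";
    }

    /** Extracts the text from the upload and writes it into targetDir as UTF-8. */
    static Path store(InputStream upload, String uploadedName,
                      Path targetDir, TextExtractor extractor) {
        try {
            String text = extractor.extract(upload);
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(txtNameFor(uploadedName));
            return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The upload handler would call `store(...)` with the request's input stream and the designated directory; swapping the extractor later does not touch the storage code.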
Hello Roy,
You can explore the following options:
1). Use an open-source Java-COM interop tool such as Jawin or Jacob to read MS Word documents and use Word's features. I have used Jawin to manipulate Word files, and it works for me. It gave complete access to all MS Word features. For conversion to PDF, you will have to install a PDF conversion plug-in in Word, which you can then access programmatically. It's a little messy initially, but as soon as you get the hang of it, it's very simple and exciting.
You can find Jawin at: http://jawinproject.sourceforge.net/
Its documentation has an example of creating an MS PowerPoint file from Java. You can use it as a basis for your MS Word implementation.
2). If you are using MS Office 2003 or later, you can give MS-WML (a Microsoft XML format for Word documents) a shot. You can extract XML from MS Word, and then you'll need to figure out how a PDF can be created from this XML file. You can check the Microsoft site for further details. You can also check out the following link about WML: http://www.javaworld.com/javaworld/jw-07-2004/jw-0712-officeml.html
Also, the MS link: http://msdn.microsoft.com/office/understanding/word/codesamples/default.aspx?pull=/library/en-us/odc_wd2003_ta/html/odc_wdalpine.asp
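To give a feel for option 2, here is a minimal sketch that pulls the plain text out of a Word 2003 XML document using only the JDK's DOM parser. It assumes the usual layout of paragraphs (`w:p`) containing text runs (`w:t`) in the Word 2003 namespace; the class name is made up, formatting, tables, and nested text boxes are ignored, and turning the result into PDF is still a separate step.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WordMlTextSketch {
    // Namespace used by Word 2003 XML documents.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    /** Returns the text of every paragraph, one line per paragraph. */
    static String extractText(String wordMl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since uploads are untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(wordMl.getBytes(StandardCharsets.UTF_8)));

            StringBuilder out = new StringBuilder();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                // Concatenate all text runs that belong to this paragraph.
                NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    out.append(runs.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse Word XML", e);
        }
    }
}
```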
Hope this solves your problem.
Cheers,
Rahul
This solution may not be appropriate, but if you have either Adobe Acrobat or one of several PDF-generating libraries installed, they work as a printer driver. You can start Word and, via its automation interface, have it print the document using the PDF printer driver. This will give you a PDF file.
Hi Roy, hi Chris,
this is roughly the mechanism I had in mind with StarOffice (StarOffice rather than Word, to have the Java API at hand, even if, as already said, the API for accessing such internals is anything but intuitive).
So the aim would be to (programmatically) "press" the "Print" / PDF generator button (or, which is the StarOffice idea, to simulate all this).
Also, "printing" as PostScript into a certain subdirectory would do it if Adobe Acrobat is fully installed (and watching this directory); Acrobat will then convert the PS into PDF on the fly.
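On the Java side, the watched-directory idea mostly comes down to waiting for the result to show up. The sketch below polls for the PDF; it assumes the converter writes a file with the same base name and a `.pdf` extension into a known output directory, which depends on how the watched folder is configured. The class and method names are invented for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Optional;

class WatchedFolderSketch {
    /**
     * Polls outDir until baseName + ".pdf" exists or the timeout elapses.
     * Returns the PDF path, or empty if it never appeared.
     */
    static Optional<Path> waitForPdf(Path outDir, String baseName, Duration timeout) {
        // Assumption: the converter keeps the base name and only swaps the extension.
        Path expected = outDir.resolve(baseName + ".pdf");
        long start = System.nanoTime();
        while (true) {
            if (Files.isRegularFile(expected)) {
                return Optional.of(expected);
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return Optional.empty();
            }
            try {
                Thread.sleep(200);   // poll a few times per second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

A file that exists is not necessarily complete, so a real setup would probably also want to check that its size has stopped changing before handing it on.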
Hope it helps
Detlev