Looking for Java code to convert Word files to PDF

Former Member
0 Kudos

Hello,

I am looking for a way to convert Word files to PDF using Java. I am familiar with the Jakarta POI project, but unfortunately it doesn't offer this option. Does anyone have sample code for that?

Roy

Accepted Solutions (1)

detlev_beutner
Active Contributor
0 Kudos

Hi Roy,

this is a hard task. Reading Word is only part of the problem (POI's Word API is somewhere between alpha and beta; it's not as good as the XLS API and, at the moment, not under further development).

A good place to look for libraries that read and write different file formats is http://schmidt.devlib.org/java/document-libraries.html

Anyhow, when I had to convert Word into plain text, I used StarOffice and its Java API (not very comfortable, but possible). This is the route I would consider (you'd need StarOffice installed on the server). Maybe exporting to RTF first would get you further, I don't know...
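For readers who want a starting point, here is a minimal sketch of the "office suite on the server" route. It does not use the StarOffice Java API described above. Instead it swaps in the headless command line offered by LibreOffice (`soffice --headless --convert-to`), which is an assumption about what is installed on the server. The class name `OfficeConverter`, the two-minute timeout, and the assumption that `soffice` is on the PATH are all illustrative.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: shells out to a locally installed LibreOffice.
class OfficeConverter {

    /** Builds the command line; assumes "soffice" is on the PATH. */
    static List<String> buildCommand(Path input, Path outDir, String targetFormat) {
        return List.of(
                "soffice",
                "--headless",                     // run without a GUI
                "--convert-to", targetFormat,     // e.g. "pdf"
                "--outdir", outDir.toString(),
                input.toString());
    }

    /** Derives the output file name: same base name, new extension. */
    static String outputName(Path input, String targetFormat) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base + "." + targetFormat;
    }

    /** Runs the conversion and returns the path where the result is expected. */
    static Path convert(Path input, Path outDir, String targetFormat)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir, targetFormat))
                .inheritIO()
                .start();
        // The timeout is an arbitrary choice for this sketch.
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out for " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException("soffice exited with code " + process.exitValue());
        }
        return outDir.resolve(outputName(input, targetFormat));
    }
}
```

A call such as `OfficeConverter.convert(Paths.get("report.doc"), Paths.get("out"), "pdf")` should leave `out/report.pdf` behind if the office suite can open the file. How faithfully the layout survives depends entirely on the office suite, not on this code.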

As said, a hard task...

Hope it helps

Detlev

Former Member
0 Kudos

Hey Detlev,

I think I found a good library for that. It is called iText: http://www.lowagie.com/iText/docs.html

I will investigate it and let you know...

Cheers,

Roy

detlev_beutner
Active Contributor
0 Kudos

Hi Roy,

from the FAQ:

> Can I convert WORD doc-files or RTF to PDF using iText?

>

> No, iText is only able to generate RTF.

> It doesn't do RTF or Word-doc parsing. Try Apache POI.

In any case, it would be a very hard task to map every Word feature manually onto PDF creation.

I don't think there is a comfortable way other than really "printing" the PDF (of course, this should happen automatically, so the question is whether StarOffice or a third-party tool provides an API for this functionality).

Hope it helps

Detlev

Former Member
0 Kudos

Yep, I just saw it too.

Well, I'll try looking for a free web service that does this, and then I'll simply pass it the Word file and get the PDF back. I found such web services, but unfortunately no free ones...

Roy

Answers (3)

Former Member
0 Kudos

Dear all,

I am looking for a Java component that will allow me to upload a .doc and convert it to .txt on the fly. In other words, the user clicks on my upload button, selects a .doc file, and sends it. This wonder component converts the file to .txt and stores it in my designated place.

I already have the upload working.

Is there any such marvel of a component out there that will do the trick?

thanks

detlev_beutner
Active Contributor
0 Kudos

Hi Saurabh,

first, welcome to SDN!

Second: For a new issue, please open a new thread.

Third: This, for example, is easy using StarOffice as mentioned before (that is what I used it for).

Best regards

Detlev

Former Member
0 Kudos

Hello Roy,

You can explore the following options:

1). Use an open-source Java-COM interop tool such as Jawin or Jacob to read Word files and use the various features of MS Word. I have used Jawin to manipulate Word files, and it works for me; it gave complete access to all MS Word features. For conversion to PDF, you will have to install a PDF conversion plug-in in Word, which you can then access programmatically. It's a little messy initially, but as soon as you get the hang of it, it's very simple and exciting.

You can find Jawin at: http://jawinproject.sourceforge.net/

Its documentation has an example of creating an MS PowerPoint file from Java. You can use it as a basis for your MS Word implementation.

2). If you are using MS Office 2003 or later, you can give MS WML (a Microsoft XML format for Word documents) a shot. You can extract XML from MS Word, and then you'll need to figure out how a PDF can be created from this XML file. You can check the Microsoft site for further details. You can also check out the following link about WML: http://www.javaworld.com/javaworld/jw-07-2004/jw-0712-officeml.html

Also, the MS link: http://msdn.microsoft.com/office/understanding/word/codesamples/default.aspx?pull=/library/en-us/odc_wd2003_ta/html/odc_wdalpine.asp
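To make the second option a bit more concrete, here is a rough sketch of the first half only: pulling the paragraph text out of a Word 2003 XML file with the XML parser that ships with Java. It assumes the Word 2003 WordprocessingML namespace and the `w:p` / `w:t` element names, and the class name `WordMlText` is made up for illustration. Turning the extracted content into a PDF is a separate step (for example with a PDF-writing library) and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts plain text from Word 2003 XML (WordML).
class WordMlText {

    // Assumed namespace of the Word 2003 XML format.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    /** Returns the text of every paragraph, one paragraph per line. */
    static String extractText(String wordMl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since uploaded files are untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(wordMl.getBytes(StandardCharsets.UTF_8)));

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // A paragraph is split into runs; each run carries its text in w:t.
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

This ignores formatting, tables, images, and everything else that makes Word-to-PDF conversion hard, so treat it as a way to get a feel for the XML rather than as a converter.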

Hope this solves your problem.

Cheers,

Rahul

Former Member
0 Kudos

This solution may not be appropriate, but if you have either Adobe or one of several PDF-generating libraries installed, they work as a printer driver. You can start Word and, via its automation interface, have it print the document using the PDF printer driver. This will give you a PDF file.

Former Member
0 Kudos

Hey Chris,

Can you please elaborate on this?

Roy

detlev_beutner
Active Contributor
0 Kudos

Hi Roy, hi Chris,

this is more or less the mechanism I had in mind with StarOffice (using StarOffice rather than Word in order to have the Java API at hand, even if, as already said, the API for accessing such internals is anything but intuitive).

So the aim would be to (programmatically) "press" the "print" button / PDF generator button (or, that's the StarOffice idea, to simulate all this).

Also, "printing" to PostScript into a certain subdirectory would do it if Adobe Acrobat is fully installed (and watching this directory); Acrobat will then convert the PS into PDF on the fly.
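The watched-directory idea could look roughly like this on the Java side. This is only a sketch under assumptions: the PostScript file already exists, the input and output directories are whatever the PDF tool has been configured to watch and write to, and the class name `WatchedFolderSubmit` and the polling interval are invented for illustration. The code simply drops the file into the watched directory and polls until a PDF with the same base name shows up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;

// Hypothetical helper: hands a PostScript file to a watched folder and waits for the PDF.
class WatchedFolderSubmit {

    static Path submitAndWait(Path psFile, Path inDir, Path outDir, Duration timeout)
            throws IOException, InterruptedException {
        String name = psFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        // Assumption: the converter keeps the base name and only changes the extension.
        Path expectedPdf = outDir.resolve(base + ".pdf");

        // Drop the PostScript file into the directory the converter is watching.
        Files.copy(psFile, inDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);

        // Poll until the PDF appears or the timeout expires.
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (Files.isRegularFile(expectedPdf) && Files.size(expectedPdf) > 0) {
                return expectedPdf;
            }
            Thread.sleep(250);
        }
        throw new IOException("No PDF appeared at " + expectedPdf + " within " + timeout);
    }
}
```

A non-empty file is not proof that the converter has finished writing, so a production version would need a stronger completion check. Polling keeps the sketch short; `java.nio.file.WatchService` would be an alternative.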

Hope it helps

Detlev