Solved: Read PDF file to XML via Java mapping

vinaymittal · 10-16-2014

Hi,

The scenario is File to Proxy. I have to read the content (all text) of a PDF file, and I have written the following code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package; in PDFBox 2.x it is org.apache.pdfbox.text.PDFTextStripper

class ReadPdf
{
    public static void main(String args[])
    {
        PDDocument pd = null;
        BufferedWriter wr = null;
        try {
            File input = new File("original.pdf");    // the PDF file from which you want to extract text
            File output = new File("SampleText.txt"); // the text file where the extracted text is stored
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages()); // prints the number of pages
            System.out.println(pd.isEncrypted());      // prints true if the PDF is encrypted
            pd.save("CopyOfOriginal.pdf");              // creates a copy called "CopyOfOriginal.pdf"

            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1); // start extracting from page 1
            stripper.setEndPage(1);   // extract up to and including page 1

            wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output), "UTF-8"));
            stripper.writeText(pd, wr);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // close() also flushes the writer; closing in finally releases both resources even after an error
            try { if (wr != null) wr.close(); } catch (IOException e) { e.printStackTrace(); }
            try { if (pd != null) pd.close(); } catch (IOException e) { e.printStackTrace(); }
        }
    }
}

It works. I have modified it to run as a Java mapping:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package; in PDFBox 2.x it is org.apache.pdfbox.text.PDFTextStripper

import com.sap.aii.mapping.api.AbstractTransformation;
import com.sap.aii.mapping.api.StreamTransformationException;
import com.sap.aii.mapping.api.TransformationInput;
import com.sap.aii.mapping.api.TransformationOutput;

public class PdftoXml extends AbstractTransformation
{
    public void transform(TransformationInput in, TransformationOutput out) throws StreamTransformationException
    {
        PDDocument pd = null;
        try {
            // Get the payload as an InputStream and pass it to PDDocument.load() to read the PDF from the stream
            InputStream is = in.getInputPayload().getInputStream();
            pd = PDDocument.load(is);

            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1); // start extracting from page 1
            stripper.setEndPage(1);   // extract up to and including page 1
            String str = stripper.getText(pd);

            String content[] = str.split("\\r?\\n"); // split on both \n and \r\n line endings
            if (content.length < 4) {
                throw new StreamTransformationException("Expected at least 4 lines of text on page 1, found " + content.length);
            }

            StringBuilder result = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            result.append("<ns0:MTPdf xmlns:ns0=\"urn:mmm-com:pi:Vinay:10\">");
            result.append("<field1>").append(content[0]).append("</field1>");
            result.append("<field2>").append(content[1]).append("</field2>");
            result.append("<field3>").append(content[2]).append("</field3>");
            result.append("<field4>").append(content[3]).append("</field4>");
            result.append("</ns0:MTPdf>");

            OutputStream os = out.getOutputPayload().getOutputStream();
            os.write(result.toString().getBytes("UTF-8")); // writing to output
        }
        catch (StreamTransformationException e)
        {
            throw e;
        }
        catch (Exception e)
        {
            // Rethrow instead of printStackTrace() so the mapping fails visibly instead of silently producing an empty payload
            throw new StreamTransformationException("PDF to XML conversion failed: " + e.getMessage(), e);
        }
        finally
        {
            if (pd != null) {
                try { pd.close(); } catch (IOException e) { /* nothing more to do */ }
            }
        }
    }
}
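Building the XML by string concatenation has two weak spots: a line containing & or < produces a payload that is not well-formed XML, and fixed indexes such as content[3] break when the page has fewer lines. Below is a minimal sketch of a pure-Java helper that could take over the string building. It assumes the target message type MTPdf defines field1 to field4 and that blank lines carry no data; the class name PdfTextToXml and its methods are hypothetical, not part of PDFBox or the SAP mapping API. Because it has no PDFBox or SAP dependency, it can be tested locally with a plain main().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts extracted PDF text into the MTPdf XML payload
public class PdfTextToXml {

    // Escape the characters that are not allowed as-is in XML element content
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Split on \n or \r\n, trim each line and drop blank lines (assumption: blank lines carry no data)
    public static List<String> nonBlankLines(String text) {
        List<String> result = new ArrayList<String>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Build the payload with one <fieldN> element per line, capped at maxFields elements
    public static String toXml(String text, int maxFields) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<ns0:MTPdf xmlns:ns0=\"urn:mmm-com:pi:Vinay:10\">");
        List<String> lines = nonBlankLines(text);
        int count = Math.min(lines.size(), maxFields);
        for (int i = 0; i < count; i++) {
            int n = i + 1;
            xml.append("<field").append(n).append('>')
               .append(escapeXml(lines.get(i)))
               .append("</field").append(n).append('>');
        }
        xml.append("</ns0:MTPdf>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Sample text standing in for the result of stripper.getText(pd)
        System.out.println(toXml("Invoice 42\r\nAcme & Sons\n\nTotal < 100", 4));
    }
}
```

Inside transform(), the payload could then be written with something like out.getOutputPayload().getOutputStream().write(PdfTextToXml.toXml(str, 4).getBytes("UTF-8")); capping at four fields keeps the output within the assumed field1 to field4 structure, and missing lines simply produce fewer elements instead of an exception.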

I am using the third-party Apache API "PDFBox". Where should I import this API in the ESR for my Java mapping to work?

former_member181985 · 10-17-2014

Hi Vinay,

The external API jar files should be part of your Java development archive, under the root folder.

You could also use the approach from my blog to test your Java mapping code directly from the interface/operation mapping.
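If the PDFBox classes are not packaged with the mapping, a test run typically fails with a java.lang.NoClassDefFoundError once the code first touches a PDFBox class. A small diagnostic sketch that checks whether a class is visible to the current class loader, for example org.apache.pdfbox.pdmodel.PDDocument; the class ClasspathCheck is hypothetical and not part of PDFBox or the SAP API.

```java
// Hypothetical diagnostic: reports whether a class can be loaded by this class's loader
public class ClasspathCheck {

    public static boolean isAvailable(String className) {
        try {
            // initialize = false: only locate and load the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // e.g. the class itself was found but one of its dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String pdfBoxClass = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(pdfBoxClass + " available: " + isAvailable(pdfBoxClass));
    }
}
```

The same check could be called at the start of transform() to throw a StreamTransformationException with a clear message instead of letting the error surface deeper in the PDF parsing.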

Best Regards,

Praveen Gujjeti

former_member184720 · 10-16-2014

You just need to add those jars to the project root folder (in Eclipse/NWDS):

Right-click the project folder (root) -> Import -> General (Archive File) -> select your jar file.
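Importing a jar through General > Archive File extracts its contents into the project, so after export the PDFBox packages should sit next to your mapping class in the archive you upload to the ESR. Below is a minimal sketch, using only java.util.zip, that checks an exported archive before the upload: it reports whether a given class is packaged and which jar files sit directly at the archive root, covering both packaging styles mentioned in this thread. The class ArchiveCheck and the default file name PdftoXml.jar are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical pre-upload check for an exported mapping archive
public class ArchiveCheck {

    // True if the archive contains the .class entry for the fully qualified class name
    public static boolean containsClass(File archive, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        ZipFile zip = new ZipFile(archive);
        try {
            return zip.getEntry(entryName) != null;
        } finally {
            zip.close();
        }
    }

    // Names of .jar files stored directly at the archive root (no folder prefix)
    public static List<String> rootJars(File archive) throws IOException {
        List<String> jars = new ArrayList<String>();
        ZipFile zip = new ZipFile(archive);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".jar") && name.indexOf('/') < 0) {
                    jars.add(name);
                }
            }
        } finally {
            zip.close();
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default name; pass the real archive path as the first argument
        File archive = new File(args.length > 0 ? args[0] : "PdftoXml.jar");
        System.out.println("PDDocument class packaged: "
                + containsClass(archive, "org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println("Jars at archive root: " + rootJars(archive));
    }
}
```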
