Can plain text be extracted from a Word doc in OfficeControl?

Former Member
0 Kudos

Hi Everyone,

I am working on a task to embed MS Word in Web Dynpro, using the SAP UI element OfficeControl.

The link is: http://help.sap.com/saphelp_nw70ehp1/helpdata/en/d1/af8841349e1909e10000000a155106/frameset.htm

I have read the demo source code for OfficeControl in package "SIOS".

OfficeControl can create, open, save and close an MS Word document, which is very good!

The content of the Word document in OfficeControl is stored in XString format (it usually includes plain text, different fonts, styles, colors, or images).

Can I extract the plain text from the content of the Word document in OfficeControl?

Any solutions or suggestions are welcome.

Thanks in advance!

Best Regards,

Derek Zhao

Accepted Solutions (0)

Answers (1)

Madhu2004
Active Contributor
0 Kudos

hi,

XSTRING data can be converted to string data using the function modules ECATT_CONV_XSTRING_TO_STRING or HR_KR_XSTRING_TO_STRING

Regards,

Madhu

Former Member
0 Kudos

When I call the function module ECATT_CONV_XSTRING_TO_STRING, I encounter a run-time error in system SD7:

Error occurred during character conversion

The termination type was: ERROR_MESSAGE_STATE

The ABAP call hierarchy was:

Function: ECATT_CONV_XSTRING_TO_STRING of program SAPLECATT_REUSE

My source code is as follows:

DATA lo_el_context TYPE REF TO if_wd_context_element.
DATA ls_context TYPE wd_this->Element_context.
DATA lv_datas TYPE wd_this->Element_context-datas. "-- XString variable
DATA lv_datas_string TYPE wd_this->Element_context-datas_string. "-- String variable

lo_el_context = wd_context->get_element( ).
IF lo_el_context IS INITIAL.
ENDIF.

lo_el_context->get_attribute(
  EXPORTING
    name = `DATAS`
  IMPORTING
    value = lv_datas ).

CALL FUNCTION 'ECATT_CONV_XSTRING_TO_STRING'
  EXPORTING
    IM_XSTRING = lv_datas
    " IM_ENCODING = 'UTF-8'
  IMPORTING
    EX_STRING = lv_datas_string.

When I call the function module "ECATT_CONV_XSTRING_TO_STRING", I get a run-time error.

Can anyone give me a suggestion?

Many thanks in advance!

Madhu2004
Active Contributor
0 Kudos

Can you try SCMS_XSTRING_TO_BINARY and SCMS_BINARY_TO_STRING?

Former Member
0 Kudos

When I use the FMs "SCMS_XSTRING_TO_BINARY" and "SCMS_BINARY_TO_STRING", the XString is converted to a String, but I get garbled output such as:

ÐÏ#ࡱ#á################>###þÿ #####################################þÿÿÿ########

Can I convert it to normal ASCII text?

Thank you!

Below is my source code:

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    BUFFER = lv_datas
  IMPORTING
    OUTPUT_LENGTH = lv_length
  TABLES
    binary_tab = itab.

CALL FUNCTION 'SCMS_BINARY_TO_STRING'
  EXPORTING
    input_length = lv_length
*   mimetype = 'text/plain; charset=utf-8'
  IMPORTING
    text_buffer = lv_datas_string
    output_length = lv_data_len
  TABLES
    binary_tab = itab.

lo_el_context = wd_context->get_element( ).
IF lo_el_context IS INITIAL.
ENDIF.

lo_el_context->set_attribute(
  name = `DATAS_STRING`
  value = lv_datas_string ).

Former Member
0 Kudos

Hi

Please try with FM "HR_KR_XSTRING_TO_STRING" . It did work for me.

Cheers,

Aditya.

Former Member
0 Kudos

Thank you.

I used the function module "HR_KR_XSTRING_TO_STRING", but I still get garbled output such as: 饉#胥#################>#################################################

The input XString is: D0CF11E0A1B11AE1000000000000000000000000000000003E000300FEFF0900060000000000000000000000010000000200000000000000001000000400000001000000FEFFFFFF0000000003000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Actually, it is an MS Word document that contains only the text "Hello World!"; the XString above is the content I get for it.

I use this source code:

CALL FUNCTION 'HR_KR_XSTRING_TO_STRING'
  EXPORTING
    from_codepage = '8500'
    in_xstring = lv_datas "-- the input XString; it actually represents the text "Hello World!"
  IMPORTING
    out_string = lv_datas_string. "-- the output String; should be "Hello World!", but the result is garbled

What am I missing?

thomas_jung
Developer Advocate
0 Kudos

You can't simply convert the XSTRING of a Word document to a STRING. The internal format of the Word document isn't clear text. It is Microsoft's proprietary binary format.
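A quick way to confirm this is to look at the first bytes of the content. The XString posted earlier in this thread starts with D0CF11E0A1B11AE1, which is the signature of the OLE2 compound file container used by the legacy binary .doc format. The following is a minimal Python sketch, run outside the SAP system, that classifies content by its leading bytes; `sniff_format` and its return labels are hypothetical names chosen for illustration and are not part of any SAP or Microsoft API.

```python
# Signatures found at the start of formats that Word can save.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .doc container
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a ZIP archive
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data: bytes) -> str:
    """Guess the container format from the leading bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2-binary"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # Text formats may start with a UTF-8 byte order mark or whitespace.
    head = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    head = head.lstrip()
    if head.startswith(b"{\\rtf"):
        return "rtf"
    if head.startswith(b"<"):
        return "xml"
    return "unknown"


# First 16 bytes of the XString posted earlier in this thread.
sample = bytes.fromhex("D0CF11E0A1B11AE10000000000000000")
print(sniff_format(sample))  # prints "ole2-binary"
```

Only the "rtf" and "xml" cases are text that a character conversion can turn into something readable; the other two are containers that have to be unpacked first.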

Former Member
0 Kudos

Thank you very much, Mr. Jung.

Do you know the Developer's E-mail of SIOS package?

I want to write an e-mail and ask them some questions.

ChrisPaine
Active Contributor
0 Kudos

If the MS Word document were stored in DOCX format as opposed to DOC, you could read it, although you'd have to unwrap the XML from the archive that is the DOCX format and then parse the XML, in a very similar way to the xlsx2abap code.

So you can't extract data from an MS Word .doc file, but that's not the only format that MS Word (and the OfficeControl) can work with, and some of the other formats do allow for a path to data extraction.

Chris

thomas_jung
Developer Advocate
0 Kudos

>Do you know the Developer's E-mail of SIOS package?

Nope, nor is that really protocol. If you have an issue with the interfaces then you should enter an internal support ticket.

Former Member
0 Kudos

Hi Chris,

I am very glad to get your constructive reply, thank you.

How can I save the content of the MS Word document in OfficeControl in XML or DOCX format?

In: (System) SMV --> (package) SIOS --> (Dynpro Component) IOS_TEST_HELLOWORLD_MS,

I run the Web Dynpro application, input some contents such as "Hello World!" in MS Word in OfficeControl.

When I click the "savedocument" button, the process source code is as follows:

DATA error_savedocument TYPE REF TO if_wd_context_element.
DATA error_savedocument_stru TYPE wdr_ext_attribute_pointer.

error_savedocument = wd_context->get_lead_selection( ).
error_savedocument_stru-attribute_name = 'error_savedocument'.
error_savedocument_stru-element = error_savedocument.

TRY.
    wd_this->document->savedocument( errorinformation = error_savedocument_stru ).
  CATCH cx_ios_document INTO refexp.
    " Error handling is not shown in this excerpt.
ENDTRY.

I think that after I click the "savedocument" button, OfficeControl doesn't actually save the content of the MS Word document in any format (neither .docx nor XML).

The content of the MS Word document in OfficeControl is mapped to the context attribute DATAS (type XString), but DATAS isn't a parameter of the method call:

wd_this->document->savedocument( errorinformation = error_savedocument_stru )

In other words, OfficeControl just displays MS Word in the web page, but doesn't really save the content of the document as a file (.docx or XML).

What do you think?

Many thanks in advance.

Best Regards,

Derek

ChrisPaine
Active Contributor
0 Kudos

Hello,

If you pass in an alternatively formatted document when you create the OfficeControl (populate the document source property with a "template"), then it should "save" the document in the same format.

NB: if the user creates a new document and then "saves" it, you have no control over the format in which they "save" the document.

Former Member
0 Kudos

Hello Chris,

Thank you very much!

But how can I "populate the document source property with a 'template'"?

I go to (package) SIOS --> (Dynpro Component) IOS_TEST_HELLOWORLD_MS --> (View) view_helloworld -->

(Layout) OfficeControl,

I can't find a property named "document source property" in OfficeControl,

nor can I find it in class "CL_WD_OFFICE_CONTRO".

Can you give me some advice?

Thanks a lot.

Best Regards,

Derek

ChrisPaine
Active Contributor
0 Kudos

Property dataSource of the UI element:

http://help.sap.com/saphelp_nw70ehp1/helpdata/en/8e/128b41b4b3b55fe10000000a1550b0/frameset.htm

It is the same one in which you receive the "saved" document.

Alternatively (I wouldn't do this, since it complicates things, but for completeness): interface IF_IOS_DOCUMENT has a method SET_DOCUMENTURL, which allows you to set the document to be loaded into the control.

Former Member
0 Kudos

Hi Chris,

Your advice helped me a lot, thank you.

I uploaded an XML-format Word template to the server as a MIME object.

When OfficeControl starts in Web Dynpro, it automatically opens the XML-format template.

The first time, I read the XString-type context attribute bound to the content of the Word document and converted it to a string. I got the XML-format content, which is great!

However, after that first time, when I enter any new content in MS Word in Web Dynpro, no matter whether I press "Ctrl + S" or click the "savedocument" button, converting the XString context attribute to a string gives garbled output (whereas the first time it was good plain text).

I use the function modules ECATT_CONV_XSTRING_TO_STRING (good the first time, dump afterwards) and SCMS_XSTRING_TO_BINARY / SCMS_BINARY_TO_STRING (good the first time, garbled output afterwards).

My Demo source code is in: (system) SMV --> (local object) zhaode --> (Dynpro Component) ztest_office_control

The core source code is:

CLEAR itab.

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    BUFFER = lv_datas
  IMPORTING
    OUTPUT_LENGTH = lv_length
  TABLES
    binary_tab = itab.

CALL FUNCTION 'SCMS_BINARY_TO_STRING'
  EXPORTING
    input_length = lv_length
    mimetype = 'text/plain; charset=utf-8'
  IMPORTING
    text_buffer = lv_datas_string
    output_length = lv_data_len
  TABLES
    binary_tab = itab.

Can you give me some advice?

Best Regards,

Derek

ChrisPaine
Active Contributor
0 Kudos

>My Demo source code is in: (system) SMV --> (local object) zhaode --> (Dynpro Component) ztest_office_contro

Much as I'm flattered by the implication that I might have access to SAP internal systems, this is not the case. A long time ago I did have a C user ID, but that was a long time ago...

As for the rest, I'm sorry, I'm really not sure where to go. It is my understanding that the control should not overwrite the format of the document that it is working with, so a document provided in XML format should be saved in XML format. This works like standard MS Word: if you provide a document in RTF, then even if you give it a .DOC extension, it is saved back in RTF.

Try it yourself: create a document in WordPad, save it as RTF (test.rtf), rename the file to a .doc extension (test.doc), and open it in MS Word. Make some changes and save it.

You should see that the document is still human readable (well, just about: MS Word adds a load of junk formatting into the file too).

The routines you are using to convert from XSTRING to string are fine. It may be that the file is being saved in an encoding other than UTF-8, potentially UTF-16 or UTF-32. Have you checked that?
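One way to check the encoding is to look for a byte order mark (BOM) at the start of the saved content. The following is a minimal Python sketch, run outside the SAP system, that picks a decoder from the BOM and falls back to UTF-8 when none is present; `decode_with_bom` is a hypothetical helper name. It only helps if the content is text at all: if the control has written a binary container, no choice of encoding will make it readable.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE mark (FF FE 00 00)
# begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode text using the encoding its BOM indicates (hypothetical helper)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # These codecs consume the BOM and use it to pick the byte order.
            return data.decode(encoding)
    return data.decode(default)


saved = "<?xml version='1.0'?>".encode("utf-16")  # Python prepends a BOM here
print(decode_with_bom(saved))  # prints "<?xml version='1.0'?>"
```

If the fallback decode raises a UnicodeDecodeError, that is a hint that the content is either in a different code page without a BOM or not text in the first place.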

Other than that - I'm sorry I'm really not sure how to help any further.

Chris