Changing the Collation without rebuilding the DB

Former Member
0 Kudos

Hi Gurus,

I am looking for an easy way to change the collation of my SQLA10 DB from 'windows-1252' to 'UTF-8'.

I know that I can do it by unloading, creating a new DB with 'UTF-8', and reloading the data into the new DB.

I have a few challenges in doing so and am trying to avoid this cumbersome process:

1. The number of databases that need to be changed - around 1000.

2. I found that this is not an easy and straightforward process, and I faced a lot of errors while loading the data, e.g. right truncation of string data and foreign key errors.

3. The cost of supporting the users if any error occurs - as the number of DBs is huge, any errors during the migration may cost a lot of time.

I need to know if anyone has faced a similar kind of issue and has a quick solution for it.

The reason I want to change the collation to 'UTF-8' is to support non-English characters in the DB; I also have to export the data into XML files.

Any help will be appreciated.

Thanks

Ramendra

Accepted Solutions (0)

Answers (2)

former_member194571
Active Participant
0 Kudos

Hi Ramendra,

There may be a fundamentally different approach to your issue, but I don't know if that's possible in v.10. By the way, I think it's somewhat odd to run a system that you can't afford to rebuild on software that has been unsupported for such a long time.

Anyway, why don't you change those columns intended to accept international characters to [long ]n[var]char? This should be possible while the database is running and provides the extra benefit that you don't have to recalculate column max lengths. Remember that [var]char max length is specified in bytes, so a char (20) column in a Unicode-encoded database will not be able to store an arbitrary 20 characters. The max length of n[var]char columns is specified in characters.
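A minimal sketch of such an in-place change, assuming a hypothetical table `customers` with `city` and `notes` columns (names invented for illustration; verify the ALTER syntax against your SQL Anywhere version):

```sql
-- Convert a byte-limited varchar column to nvarchar so its
-- maximum length is counted in characters, not bytes.
ALTER TABLE customers ALTER city NVARCHAR(40);

-- Long text columns can be converted to long nvarchar the same way.
ALTER TABLE customers ALTER notes LONG NVARCHAR;
```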

HTH

Volker

Former Member
0 Kudos

Hi Volker

Thanks for looking into my problem and sharing the options.

Regarding your question 1-  I think it's somewhat odd to run a system that you can't afford to rebuild on a software that has been unsupported for such a long time.

Reply - I do understand your point, and we are working on migrating to V16, but as the volume is huge, it's slow and a bit difficult to manage the old sites. I am looking for an easy way to do it with minimal impact.

Regarding your question 2- Anyway, why don't you change those columns intended to accept international characters to [long ]n[var]char?

Reply - I did try that, and it saves the data perfectly, but we also rely on extracting the data into XML, and with the 'windows-1252' DB collation, FOR XML EXPLICIT throws a control-character error. This is the reason I am thinking of changing the collation to 'UTF-8'.

If the XML extracts can work with international characters without changing the collation, I am more than happy to do it, and this task will be less painful.

Regarding your suggestion - This should be possible while the database is running and provides the extra benefit that you don't have to recalculate column max lengths. Remember that [var]char max length is specified in bytes, so a char (20) in a unicode encoded database will not be able to store arbitrary 20 characters in such a column. The max length of n[var]char columns is specified in characters.

Reply - Thanks for sharing this point. I will verify the impact, but if there is a guideline you know of, it would be helpful.

Thanks

Ramendra

former_member194571
Active Participant
0 Kudos

Hi Ramendra,

I can't help you with your XML issues, but for your final question, I've got a set of suggestions.

1. The easy approach to the char-length / byte-length issue is to specify the max column length in characters [c char (32 char)]. This is possible in recent versions, but I don't know if support goes back as far as v.10. It works for column lengths up to 8191 characters; everything beyond that must be declared as long varchar. The result is, just like with the nchar family of data types, that the system will grant the maximum string length in characters, with a variable storage footprint in bytes per character. You may be able to achieve this with a search & replace against your create-database SQL script (or reload.sql from dbunload).
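The character-length declaration described above looks like this (hypothetical table; verify that your SQL Anywhere version accepts the CHAR length qualifier):

```sql
-- Length declared in characters, not bytes: up to 32 characters
-- fit regardless of how many bytes each character needs in UTF-8.
CREATE TABLE cities (
    id   INTEGER PRIMARY KEY,
    name CHAR(32 CHAR)
);
```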

2. The trickier approach is to multiply the length of columns by a factor x. The maximum number of bytes per character in UTF-8 is four, but if you know of restrictions on the eligible characters, you may reduce this to three (which is used by SAP applications, afaik) or even two. To make sure that the old maximum length in characters is not exceeded, you'd need a check constraint using the LENGTH or CHAR_LENGTH function. The maximum length for [var]char columns continues to be 32767 [bytes], so you're in a similar situation as in option 1 if your old length exceeds 8191 / 10922 / 16383. For this approach you'll need a generator script against the system tables, or manual changes against your create-database script.
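A sketch of option 2 with hypothetical names, assuming the old definition was `name VARCHAR(20)` in a single-byte database and a worst case of four bytes per character:

```sql
-- Widen the byte length (20 characters * 4 bytes worst case) ...
ALTER TABLE cities ALTER name VARCHAR(80);

-- ... and enforce the old limit in characters with a check constraint.
ALTER TABLE cities ADD CONSTRAINT ck_name_len
    CHECK (CHAR_LENGTH(name) <= 20);
```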

HTH

Volker

VolkerBarth
Active Participant
0 Kudos

As to the XML issue (which seems to be the reason for the database collation change):

Have you tried using the built-in csconvert() to convert between CHAR and NCHAR (thereby hopefully eliminating the need for a collation change)?
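For reference, a hedged sketch of such a call (table and column names are hypothetical; check the csconvert() documentation for the charset labels valid on your server):

```sql
-- Convert a windows-1252 CHAR value to UTF-8 on the way out.
SELECT CSCONVERT(mytext, 'utf-8', 'windows-1252')
    FROM mytable;
```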

The built-in XML data type is a CHAR data type (which might come as a surprise; at least it did for me), as is discussed here:

http://sqlanywhere-forum.sap.com/questions/26652/whats-the-difference-between-the-xml-and-long-varch...

former_member194571
Active Participant
0 Kudos

My understanding of XML (and I'm not an expert on this) is that characters beyond ASCII (127) are represented by an escape sequence, so while they can be included in the document, they are not included in the XML syntax representation.

VolkerBarth
Active Participant
0 Kudos

Hm, AFAIK, that's a matter of the actual encoding, as specified in the XML declaration or via a BOM marker. Usually, nearly all Unicode characters can be used.

Former Member
0 Kudos

Hi Volker

Thanks for the suggestion.

I tried your first approach (in the source DB) and then did the migration.

This also helped me get rid of the 'right truncation' error during migration.

Now I am getting another error while decrypting the data:

Database error -851: Select Error: SQLSTATE = 08001 [Sybase][ODBC Driver][SQL Anywhere]Decryption error: Input must be a multiple of 16 bytes in length for AES

Although I still need to go through the details and find the root cause, I thought I'd share it.

Thanks

Ramendra

Former Member
0 Kudos

Thanks Volker for your suggestion.

I tried it with my 'windows-1252' collation DB.

I did the following things:

1. Converted the column data type from 'long varchar' to 'long nvarchar' to hold non-English characters.

2. Used e.g. CSCONVERT( mytext, 'cp936' ) in the FOR XML EXPLICIT query.

Although it didn't throw the 'control character' error, the output was also not the actual data I have in the DB.

e.g. I converted 'Sydney' to Simplified Chinese '悉尼', which is saved perfectly in the DB.

But the output I got in the XML is "z6TE4Q==", which is not what I need.

I will continue my research and request you to share your thoughts.

Regards

Ramendra

VolkerBarth
Active Participant
0 Kudos

Hm, my idea was to do it the other way around, i.e. create the "SELECT ... FOR XML..." query as usual and then convert the result set via csconvert() to the desired encoding.

Note, the return type of a FOR XML query is the XML data type, which is a CHAR-based type, as stated in my first posting, and which can therefore not encode all Unicode characters when using a single-byte charset for CHAR. Using csconvert() within the FOR XML query will therefore not solve the problem.

When dealing with return types, I'd recommend the builtin exprtype() function or the sa_describe_query() procedure, here for a simple query from the SA demo database:

    select exprtype('select * from products where ID >= 600 for xml auto', 1); -- returns XML

    call sa_describe_query('select * from products where ID >= 600 for xml auto'); -- returns xml as domain_name

Here's a snippet of code to share my idea - as the sample data do not contain non-English characters, I can't tell whether it will work but I hope you get the idea.

    begin
        declare result xml;
        declare resultAsNchar long nvarchar;
        set result = (select * from products where ID >= 600 for xml auto);
        -- that may or may not require an explicit call of csconvert(xml, 'char_charset', 'nchar_charset');
        set resultAsNchar = result;
    end;

Former Member
0 Kudos

Thanks Volker

I just tried this option too, but this time it gives me the same 'control character in XML output' error, SQLCODE -911.

But the same suggestion gives me perfect output in a UTF-8 DB.

VolkerBarth
Active Participant
0 Kudos

Ah, I see.

Is there a chance to build the XML result "manually", i.e. by concatenating the results of a normal query (i.e. without FOR XML) with the according tags, thereby using a return data type of long nvarchar (or varbinary), so it is not restricted to the CHAR data type?
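A minimal sketch of such a manual build, assuming a hypothetical table `cities(id, name)` (tag names invented for illustration; a real document would also need XML-escaping of the values):

```sql
-- Concatenate tags around a plain query result; the cast keeps the
-- whole expression nvarchar, so it is not limited to the CHAR type.
SELECT '<city id="' || id || '">'
       || CAST(name AS LONG NVARCHAR)
       || '</city>'
    FROM cities
    ORDER BY id;
```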

(Admittedly, I don't know how complex your XML results are compared to the effort of changing the database collation, so I'm limited to guesswork here.)

It might be helpful if you could share a "real" sample as part of another question here (or on the other forum)...

Former Member
0 Kudos

Hi Volker

The XML we are generating is pretty complex, and it may not be possible to handle it that way.

I am really thankful for your valuable suggestions, but we have created a giant and have to handle it either with non-English characters or without them.

If SAP could add something to handle this as part of enhancements for XML, it would make life easy.

For example: with NCHAR conversion of a column, I can now store and read the output in ISQL, but not in XML.

If XML could do the same, it would be simple to manage.

Thanks for all your help.

Regards,

Ramendra

Message was edited by: Ramendra Singh

former_member186998
Contributor
0 Kudos

Hi Ramendra,

I recommend the following methods.

https://wiki.scn.sap.com/wiki/display/SQLANY/Changing+the+Collation+and+Codepage+When+Rebuilding+an+...


I do not know of anyone who has tried a different method.


Thanks,

Atsushi

Former Member
0 Kudos

Hi Atsushi,

Thanks for sharing the link.

I tried these steps, but as I explained, in my situation it's pretty difficult to follow them to change the collation.

Regards,

Ramendra

former_member186998
Contributor
0 Kudos

Hi Ramendra,

I understood your explanation.

However, you cannot change the collation in place, as explained in the following manual:


Changing a database from one collation to another

Changing a database to another collation requires a rebuild of the database. Collations are chosen at database creation time and cannot be changed.

http://dcx.sap.com/index.html#1101/en/dbadmin_en11/changing-collation-natlang.html

Thanks,

Atsushi

Former Member
0 Kudos

Thanks Atsushi for clarifying.

I have 2 tasks to perform:

1. Migrate from SQLA10 to SQLA16

2. Change the collation to 'UTF-8' in the new DB

Although by using the below command I am able to migrate successfully from SQLA10 to SQLA16, the collation stays the same.

"dbunload -v -c "UID=DBA;PWD=sql;DBF=C:\ASCII.db" -an "C:\UTF-08.db" -ap 32768 -ea None -ii"

My wish is that SQLA offered a feature to change the collation during migration as well; it would make our life easy.

Something like this:

"dbunload -v -c "UID=DBA;PWD=sql;DBF=C:\ASCII.db" -an "C:\UTF-08.db" -z "UTF8BIN" -zn "UTF8BIN" -ap 32768 -ea None -ii"

where -z and -zn are currently not options of dbunload but of dbinit.exe.

Otherwise it's a nightmare to do this migration on 1000 DBs.

I hope this explains my frustration, and I wish that some day SQLA will have such a feature.

Regards

Ramendra

former_member186998
Contributor
0 Kudos

Hi Ramendra,

With this method....

For example:

dbinit -dba dba,sql -z UTF8BIN -zn UTF8BIN c:\mynewdata.db

dbunload -c "UID=xxx;PWD=zzz;DBF=c:\original.db" -ac "UID=dba;PWD=sql;DBF=c:\mynewdata.db"

You can create a batch file from these commands.
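For the 1000-database case, the two commands above could be wrapped in a batch file along these lines (a sketch only: the list file, credentials, and naming scheme are assumptions, and the tools must be on the PATH):

```bat
@echo off
rem Rebuild each database listed in dblist.txt into a UTF-8 copy.
rem One full DB path per line in dblist.txt; DBA credentials are placeholders.
for /f "usebackq delims=" %%D in ("dblist.txt") do (
    dbinit -dba dba,sql -z UTF8BIN -zn UTF8BIN "%%~dpnD_utf8.db"
    dbunload -c "UID=dba;PWD=sql;DBF=%%D" -ac "UID=dba;PWD=sql;DBF=%%~dpnD_utf8.db"
)
```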

dbunload -ac option:

Connects to an existing database and reloads the data directly into it, combining the operations of unloading a database and reloading the results into an existing database.

http://dcx.sap.com/index.html#sa160/en/dbadmin/dbunload.html

Thanks,

Atsushi

Former Member
0 Kudos

Thanks Atsushi for your inputs.

I just tried it, and I am getting the same error.

Steps:

1. Created a new DB using:

dbinit -p 32768 -z "UTF8BIN" -zn "UTF8BIN" -pd -t "UTF-08.log" "C:\UTF-08.db"

2. Executed the dbunload command:

dbunload -c "UID=DBA;PWD=sql;DBF=C:\ASCII.db" -ac "UID=DBA;PWD=sql;DBF=C:\UTF-08.db"

3. Got the attached error.

But if I run the below command instead, it works fine:

dbunload -v -c "UID=DBA;PWD=sql;DBF=C:\ASCII.db" -an "C:\UTF-08.db" -ap 32768 -ea None -ii

It looks to me that the changed collation in the NEW DB is creating problems while loading the data from the OLD to the NEW DB.

To prove my theory:

I created a NEW DB using the below command (with the same collation as the OLD DB):

dbinit -z "1252LATIN1" -zn "UCA" -t "UTF-08.log" "C:\UTF-08.db"

then followed step 2, and it worked fine.

I also tried changing ENCODING 'windows-1252' to ENCODING 'UTF-8' in the LOAD TABLE commands, but the result is the same.

former_member186998
Contributor
0 Kudos

Hi Ramendra,

I think this problem will be difficult to resolve.

If you have an official account for support, you should ask this question using that account. (You will need to provide repro steps for this phenomenon.)

You may obtain useful information.

I am sorry that I couldn't help you.

Thanks,

Atsushi

Former Member
0 Kudos

Hi Atsushi,

I do have an official account for support.

I will write up the repro steps and share them in the support ticket.

Thanks again for your help.

Regards

Ramendra

Former Member
0 Kudos

Hi Atsushi,

Thanks for sharing your inputs.

I gave it another try by switching off string_rtruncation and was able to complete the unload/reload successfully with the 'UTF-8' collation.

set option public.string_rtruncation = 'Off'

But referring to Volker's point, I think I should also consider the actual storage size of the char data types.

I will verify the overall impact before migrating.

Regards

Ramendra