cancel
Showing results for 
Search instead for 
Did you mean: 

Performance problems after Update from 7.6.00.37 to 7.6.03.15

thomasschulz2
Participant
0 Kudos

Hi,

after the Update from 7.6.00.37 to 7.6.03.15 in our live system we noticed a lot of performance problems. We have tested the new version before on our test system, but there don't noticed effects like this.

We have 2 identical systems (Opteron 64 bit, openSuse 10.2) with log shipping between them. We updated first the standby system, switched from online to standby (with copy of cold log) and started the new server as online system. After that we run a complete backup (runtime: 1 hour) for starting a new backup history and for activating autolog. Then

With the update we changed USE_OPEN_DIRECT to YES, but the performance of the system was very slow afterwards. After the backup it remains at a high load average (> 10, previous system had about 2-4), with nearly 100% of CPU usage for the db kernel process.

Next day we switched USE_OPEN_DIRECT back to NO. The system first runs better, but periodically rises up to a load average of 6 and slow down the performance of various applications (somebody says about 10 times slower). Here we also noticed a high usage (now 200-300%) of the db kernel process.

Our questions are:

1. Has something basically changed from 7.6.00.37 to 7.6.03.15, so that our various applications (JDBC, ODBC and Perl/SQLDBC partially on old linux systems with drivers from 7.5.00.23) don't reach same performance as before?

2. Are there any other (new) parameters, which can help? Maybe reducing MAXCPU from 4 to 3 for reserving capacities for the system (there is only one maxdb instance running)?

3. Is the a possibility to switch back to 7.6.00.37 (only for worst case)?

I have made some first steps with x_cons, but don't see any anomalies on the first look.

Regards,

Thomas

Accepted Solutions (0)

Answers (1)

Answers (1)

lbreddemann
Active Contributor
0 Kudos

Hi Thomas,

> With the update we changed USE_OPEN_DIRECT to YES, but the performance of the system was very slow afterwards. After the backup it remains at a high load average (> 10, previous system had about 2-4), with nearly 100% of CPU usage for the db kernel process.

Setting this parameter to YES is usually a good idea - but I don't see the relation to your problems.

> Next day we switched USE_OPEN_DIRECT back to NO. The system first runs better, but periodically rises up to a load average of 6 and slow down the performance of various applications (somebody says about 10 times slower). Here we also noticed a high usage (now 200-300%) of the db kernel process.

What is it about this parameter that lets you think it may be the cause for your problems?

> Our questions are:

>

> 1. Has something basically changed from 7.6.00.37 to 7.6.03.15, so that our various applications (JDBC, ODBC and Perl/SQLDBC partially on old linux systems with drivers from 7.5.00.23) don't reach same performance as before?

Yes - of course. Changes are what Patches are all about!

> 2. Are there any other (new) parameters, which can help? Maybe reducing MAXCPU from 4 to 3 for reserving capacities for the system (there is only one maxdb instance running)?

Maybe it helps, maybe not - as long as there is just guessing going on - nobody will know!

> 3. Is the a possibility to switch back to 7.6.00.37 (only for worst case)?

No - this is not supported and will lead to problems that will make you forget perfomance questions...

> I have made some first steps with x_cons, but don't see any anomalies on the first look.

Ok, looking into what the system does when it uses CPU is a first good step.

But what would be "anomalies" to you?

Do you use DBAnalyzer? If not -> activate it!

Does it gives you any warnings?

What about TIME_MEASUREMENT? Is it activated on the system? If not -> activate it!

What parameters have changed due to the patch installations (check the parameter history file)?

What queries take longer now? What is the execution plan of them?

BTW: how exactly do the tests look like that you've done on the testsystem?

Was the testsystem a 1:1 copy of the productive machine before the upgrade test?

How did you test the system performance with multiple parallel users?

KR Larr

thomasschulz2
Participant
0 Kudos

>

> > Next day we switched USE_OPEN_DIRECT back to NO. The system first runs better, but periodically rises up to a load average of 6 and slow down the performance of various applications (somebody says about 10 times slower). Here we also noticed a high usage (now 200-300%) of the db kernel process.

>

> What is it about this parameter that lets you think it may be the cause for your problems?

After changing it back to NO, the system runs better (lower load average) than with YES (but much slower than with old version!)

> > Our questions are

> >

> > 1. Has something basically changed from 7.6.00.37 to 7.6.03.15, so that our various applications (JDBC, ODBC and Perl/SQLDBC partially on old linux systems with drivers from 7.5.00.23) don't reach same performance as before?

>

> Yes - of course. Changes are what Patches are all about!

Are there any known problems with updating from 7.6.00.37 to 7.6.03.15?

> > I have made some first steps with x_cons, but don't see any anomalies on the first look.

>

> Ok, looking into what the system does when it uses CPU is a first good step.

> But what would be "anomalies" to you?

Good question! I don't really know.

> Do you use DBAnalyzer? If not -> activate it!

> Does it gives you any warnings?

> What about TIME_MEASUREMENT? Is it activated on the system? If not -> activate it!

OK, that will be our next steps.

> What parameters have changed due to the patch installations (check the parameter history file)?

>

> What queries take longer now? What is the execution plan of them?

It seems to happen for all selects on tables with a lot of rows (>10.000, partially without indexes because automatically generated). With the old version we had no problems with missing indexes or the general performance. Unfortunatly it is very difficult to extract some sql statements out of the JBoss applications. But even simple queries (without any join) runs slower, when the load average rises over 4-5.

> BTW: how exactly do the tests look like that you've done on the testsystem?

Usage over 6 weeks with our JDBC development environment (JBoss), backup and restore with various combinations of USE_OPEN_DIRECT and USE_OPEN_DIRECT_FOR_BACKUP.

> Was the testsystem a 1:1 copy of the productive machine before the upgrade test?

No - smaller hardware (32 bit), only 20% of data of the live system, few db users and applications.

> How did you test the system performance with multiple parallel users?

Only while permanent development with the 2-3 developers and some parallel tests of backup/restore. Unfortunately no tests with many users/applications.

>

> KR Larr

An UPDATE STATISTICS over all db users seems to change nothing. At the moment the system remains markedly slow and we are searching for reasons and solutions. Another attempt will be the change of MAXCPU from 4 to 3.

Regards,

Thomas

lbreddemann
Active Contributor
0 Kudos

>

> > > Next day we switched USE_OPEN_DIRECT back to NO. The system first runs better, but

> >

> > What is it about this parameter that lets you think it may be the cause for your problems?

> After changing it back to NO, the system runs better (lower load average) than with YES (but much slower than with old version!)

Hmm... that is really odd. When using USE_OPEN_DIRECT there is actually less work to do for he operating system.

> > > Our questions are

> > >

> > > 1. Has something basically changed from 7.6.00.37 to 7.6.03.15, so that our various applications (JDBC, ODBC and Perl/SQLDBC partially on old linux systems with drivers from 7.5.00.23) don't reach same performance as before?

> >

> > Yes - of course. Changes are what Patches are all about!

>

> Are there any known problems with updating from 7.6.00.37 to 7.6.03.15?

Well of course there are bugs that have been found inbetween the release of both versions, but I am not aware of something like the performance killer.

We will have to check this in detail here.

> > > I have made some first steps with x_cons, but don't see any anomalies on the first look.

> >

> > Ok, looking into what the system does when it uses CPU is a first good step.

> > But what would be "anomalies" to you?

>

> Good question! I don't really know.

Well - then I guess 'looking at the system' won't bring you far...

> > Do you use DBAnalyzer? If not -> activate it!

> > Does it gives you any warnings?

> > What about TIME_MEASUREMENT? Is it activated on the system? If not -> activate it!

>

> OK, that will be our next steps.

Great - let's see the warnings you get.

Let us also see the DB Parameters you set.

> > What parameters have changed due to the patch installations (check the parameter history file)?

> >

> > What queries take longer now? What is the execution plan of them?

>

> It seems to happen for all selects on tables with a lot of rows (>10.000, partially without indexes because automatically generated). With the old version we had no problems with missing indexes or the general performance. Unfortunatly it is very difficult to extract some sql statements out of the JBoss applications. But even simple queries (without any join) runs slower, when the load average rises over 4-5.

Hmm... the question here is still, if the execution plans are good enough to meet your expectations.

E.g. for tables that you access via the primary key it actually doesn't matter how many rows a table has (not for MaxDB at least).

> > BTW: how exactly do the tests look like that you've done on the testsystem?

> Usage over 6 weeks with our JDBC development environment (JBoss), backup and restore with various combinations of USE_OPEN_DIRECT and USE_OPEN_DIRECT_FOR_BACKUP.

Sounds like the I/O is your most suspect aspect for overall system performance...

> > Was the testsystem a 1:1 copy of the productive machine before the upgrade test?

> No - smaller hardware (32 bit), only 20% of data of the live system, few db users and applications.

>

> > How did you test the system performance with multiple parallel users?

> Only while permanent development with the 2-3 developers and some parallel tests of backup/restore. Unfortunately no tests with many users/applications.

Ok - so this is next to no testing at all when it comes to performance.

> An UPDATE STATISTICS over all db users seems to change nothing. At the moment the system remains markedly slow and we are searching for reasons and solutions. Another attempt will be the change of MAXCPU from 4 to 3.

Why do you want to do that? Have you observed any threads that don't get a CPU because all 4 cores are used by the MaxDB kernel?

regards,

Lars

thomasschulz2
Participant
0 Kudos

Hi Lars,

sorry for the long time without any answers, but in face of our actual situation we tried and discussed a lot of solutions to get our systems running with the old performance.

After some tries with MAXCPU = 3 (clear reduction of cpu load, but also bad performance) and USE_OPEN_DIRECT (same effects as described in my first reports) without any positive effect we noticed, that some select statements with simple joins needs very long time for execution, although an EXPLAIN told us, that the indexes will be used and the costs should be very low.

Example (simple abstractions from real tables, the relevant ids are primary keys or have an index):

select * from tab1, tab2
where tab1.id1 = tab2.id1
and tab2.id2 = 100

=> runs very slow (some minutes!)

select * from tab1
where id1 in (select id1 from tab2 where id2 = 100)

=> runs very fast (0.3 sec)

As a last trial I changed some support parameters in out test db, tested it with an slow example and find the solution in OPTIMIZE_OPERATOR_JOIN = NO! After changing it in our live system, the systems runs with old performance and an constant load average under 1 for now about 2 hours.

Our problems apparently were caused by a change from version 7.6.00.37 to 7.6.03.15 in calculating the execution plans. What disadvantages do we have to await with OPTIMIZE_OPERATOR_JOIN = NO?

Regards,

Thomas

lbreddemann
Active Contributor
0 Kudos

Hi Thomas,

of course you can continue to follow a try-and-error approach to solve your issue.

Good luck with that.

I specifically prompted to deliver information, like the DBAnalyzer warnings, the execution plans of the actual queries etc.

With the information you've provided I could also take a deep look into a glass bowl as well ...

Therefore I don't want to spend more time on guessing - it doesn't help you or me.

To get an overview over the changes between the two builds of MaxDB you may want to use the "My Release Notes" feature on http://maxdb.sap.com/webpts.

Information on the parameter meaning can be found in the [MaxDB Wiki|https://www.sdn.sap.com/irj/sdn/wiki?path=/display/maxdb/main], the [documentation|http://maxdb.sap.com/documentation/] and the [Internals Course Materials|http://maxdb.sap.com/training/].

KR

Lars

Melanie
Advisor
Advisor
0 Kudos

Hello Thomas,

we do not know any general problems regarding parameter OPTIMIZE_OPERATOR_JOIN in version 7.6.03.15. So we would be interested in finding the cause of the problems on your system. To be able to do that we would need:

- the table definitions including existing primary keys and indexes

- the statistics values for all relevant tables - which sample size did you use to get the statistics?

- the execution plan (for both parameter settings)

- the 'explain sequence' command (for both parameter settings)

- if possible it would be good to have the table content as well.

Then we can try to reproduce the problem on our systems.

Please let me know if you can provide this data.

Thanks, Melanie

thomasschulz2
Participant
0 Kudos

Hi Lars,

>

> Hi Thomas,

>

> of course you can continue to follow a try-and-error approach to solve your issue.

> Good luck with that.

>

> I specifically prompted to deliver information, like the DBAnalyzer warnings, the execution plans of the actual queries etc.

>

> With the information you've provided I could also take a deep look into a glass bowl as well ...

> Therefore I don't want to spend more time on guessing - it doesn't help you or me.

>

sorry about don't delivering the needed informations to you. As you can imagine we had a lot of stress and searched in various directions for a quick solution. I understand, that you can't help us without this information and now will try to catch this up.

Because the usage of the analyzer is new for me, I needed some time to start it. I activated it on thursday with 2 different configurations for about 10 minutes and can provide the results from that. How can I submit them to the forum? I found no way to attach it to this forum post. Is the way as described under "How to contribute" the right way?

Unfortunately I forgot the activate TIME MEASUREMENT. I hope that this does not have any negative effects for the results.

I'm not sure how to get the execution plan. Is the output of an EXPLAIN enough?

> To get an overview over the changes between the two builds of MaxDB you may want to use the "My Release Notes" feature on http://maxdb.sap.com/webpts.

>

I extracted the differences from PTS already in march and found some interesting changes for us. If I look now into the list, patch no. 1128545 sounds a little bit like our problem.

> Information on the parameter meaning can be found in the [MaxDB Wiki|https://www.sdn.sap.com/irj/sdn/wiki?path=/display/maxdb/main], the [documentation|http://maxdb.sap.com/documentation/] and the [Internals Course Materials|http://maxdb.sap.com/training/].

>

The materials from the Internals Course are already completely downloaded on my workstation and found it very helpful in some situations. Until now I unfortunately don't found the time to work through it intensively.

> KR

> Lars

Regards,

Thomas

thomasschulz2
Participant
0 Kudos

Hello Melanie

>

> Hello Thomas,

>

> we do not know any general problems regarding parameter OPTIMIZE_OPERATOR_JOIN in version 7.6.03.15. So we would be interested in finding the cause of the problems on your system. To be able to do that we would need:

> - the table definitions including existing primary keys and indexes

> - the statistics values for all relevant tables - which sample size did you use to get the statistics?

> - the execution plan (for both parameter settings)

> - the 'explain sequence' command (for both parameter settings)

> - if possible it would be good to have the table content as well.

> Then we can try to reproduce the problem on our systems.

>

> Please let me know if you can provide this data.

> Thanks, Melanie

I'm very interested in helping to explore the problems that occured at our system. I try to collect all the data and then send it to you. I think that I can reproduce the effect with our test system, so that I don't have to switch our live system again and can also test other configurations if needed.

Therefore I have here the same two questions as in my previous answer to Lars about how to get the "execution plan" and how to submit the data.

Regards,

Thomas

Melanie
Advisor
Advisor
0 Kudos

Hello Thomas,

as long as the collected data does not exceed 4 MB you can send it to me via email.

Maybe we should continue the discussion about this via email and only post the results in this thread.

The 'execution plan' is the result of the EXPLAIN statement (EXPLAIN SELECT ...).

For the EXPLAIN SEQUENCE command just add these two words in front of your SQL Statement and execute it.

Thanks,

Melanie

Melanie
Advisor
Advisor
0 Kudos

Hello,

these are the results of the analysis:

the problem is caused by a bug in the optimizer which will be solved with PTS

http://maxdb.sap.com/webpts?wptsdetail=yes&ErrorType=0&ErrorID=1156104

Instead of the key column an index was used for the joining of the two tables.

Setting parameter OPTIMIZE_OPERATOR_JOIN to NO would not be necessary.

Instead, the used index could have been disabled or the access strategy could have been set with hints.

Best regards,

Melanie

thomasschulz2
Participant
0 Kudos

Hello Melanie,

I've now tested new community version 7.6.06.03 and can approve that the bug is fixed.

Regards,

Thomas