Stored procedure runs extremely slow in certain scenarios
In our batch scheduler, we call the same stored procedure (let's call it "my_proc") twice.
batch job 1a: execute my_proc("dinosaurs")
batch job 1b: execute my_proc("humans")
Now I list it as 1a and 1b because they can run in any sequence (normally 1a always runs first, followed by 1b 2 minutes later in reality).
When 1a runs first, followed by 1b, the result is:
1) "dinosaurs" ends up dealing with about 30k records and runs in about 6 seconds
2) "humans" ends up dealing with about 180k records and runs in about 20 minutes
It's been this way for a long time, so we tried a couple things to improve the situation to confusing results:
Experiment 1) Recompiled the stored procedure and then executed batch job 1b, so that it'd store the query plan for this first run and it finished in about 35 seconds, which is a vast improvement from 20 minutes. Next, batch job 1a ran that night in its normal 6 seconds. About 24 hours later, we let the batch run on its own so 1a runs first in 6 seconds, but 1b runs in its normal 20 minutes again!
Experiment 2) Force job 1b to always run first, followed by 1a. This resulted in job 1b always running in 35 seconds, followed by 1a running in 6 seconds!
Now one solution we have is to force a dependency so that 1a depends on 1b before execution, however we'd like to decouple these jobs as they really shouldn't depend on each other from a data perspective. Does anyone know what the problem is and what the possible solutions may be?
One theory I have is that the procedure cache is coming into play here, but, if that is the case, I don't know how to fix this unless I create a new stored procedure maybe with the same name? (Before you ask me to try this, we can't replicate this in a test environment as they always run fast with the same data set for some reason, trying various combination of execution sequences. Production is where we are experiencing this performance issue.)
Thank you in advance!
Mark A Parsons replied
When a proc is executed Sybase will attempt to re-use any query plans that still reside in cache. If no such plans reside in cache then the proc is (re)compiled and a new set of query plans is generated/used.
Over time if a proc is not used its query plans can be flushed from cache, thus requiring a (re)compilation the next time the proc runs. (This would explain what you saw in Experiment #1, ie, after 24 hours the query plans were flushed from cache so 1a generated a new set of plans when it ran, followed by 1b re-using a 'bad' set of plans.)
If you don't plan on running this proc very often (eg, thousands of times a day) you can tell Sybase to recompile the proc when it's run. You'll incur a small amount of cpu/compilation overhead each time the proc is compiled, but said compilation time is likely much smaller than the savings you'll get from making sure 1b always gets a 'good' query plan.
You have a couple (relatively) easy options for forcing the proc to recompile when it's run:
1 - explicitly force a recompilation when you execute the proc; this example assumes 'isql' command line (I'm not sure how you'd add the 'with recompile' clause to your application):
execute my_proc "humans" with recompile
NOTE: If you do run this proc a lot, you could supply the 'with recompile' clause for just those instances where you know you need to generate a new/good set of query plans.
2 - force a recompilation every time the proc is executed
create proc my_proc (@arg varchar(100)) with recompile as ... snip ...
As for why test always runs fast but prod shows slowness, could be a few different issues ...
1 - different index structures in the 2 environments
2 - columns with different datatypes in the 2 environments
3 - different volume or distribution of data in the 2 environments
4 - different sets of stats in the 2 environments (eg, test has had 'update index stats' run, while prod had 'update stats' run)
5 - test always has a copy of the fast/1b query plan in cache
Addressing any/all of these issues may allow 1b to run fast all the time in prod (without the 'with recompile' clause), but that'll require digging into the 2 environments to see where the performance difference is coming from ...