on 08-05-2016 7:30 PM
Version : Adaptive Server Enterprise/16.0 SP02 PL02/EBF 25318 SMP/P/RS6000/AIX 7.1/ase160sp02plx/2492/64-bit/FBO/Sun Nov 22 18:44:08 2015
OS : AIX xxxxxx 1 7 00CDEF534C00
We have upgraded ASE from 15.5 to ASE 16.
As per the sysmon report, after the upgrade the "I/O Busy" figure under Kernel Utilization has increased roughly five-fold compared to version 15.5 over the same duration.
Is this a known issue? Most of the time I/O Busy is now showing high.
Sysmon in ASE 15.5: 12.1 % (before the upgrade the average was 10-20%). It only used to go high during DB backups.
Engine Busy Utilization CPU Busy I/O Busy Idle
------------------------ -------- -------- --------
Engine 0 39.2 % 23.0 % 37.9 %
Engine 1 39.4 % 19.4 % 41.2 %
Engine 2 38.4 % 15.9 % 45.7 %
Engine 3 38.7 % 15.7 % 45.6 %
Engine 4 37.6 % 16.4 % 46.0 %
Engine 5 34.8 % 16.7 % 48.5 %
Engine 6 36.9 % 17.4 % 45.8 %
Engine 7 34.7 % 13.4 % 51.9 %
Engine 8 37.7 % 16.7 % 45.6 %
Engine 9 33.2 % 19.2 % 47.6 %
Engine 10 40.7 % 8.1 % 51.2 %
------------------------ -------- -------- --------
Summary Total 411.2 % 181.8 % 907.0 %
Average 27.4 % 12.1 % 60.5 %
ASE 16: 72.2 % -- most of the time it is 70-90%.
Engine Utilization (Tick %) User Busy System Busy I/O Busy Idle
------------------------- ------------ ------------ ---------- ----------
ThreadPool : syb_default_pool
Engine 0 26.7 % 0.2 % 72.8 % 0.3 %
Engine 1 32.8 % 0.0 % 67.1 % 0.2 %
Engine 2 25.4 % 0.5 % 73.8 % 0.3 %
Engine 3 28.2 % 0.5 % 71.2 % 0.2 %
Engine 4 27.5 % 0.2 % 72.0 % 0.3 %
Engine 5 31.5 % 0.0 % 68.5 % 0.0 %
Engine 6 30.1 % 0.2 % 69.7 % 0.0 %
Engine 7 26.7 % 0.3 % 72.7 % 0.3 %
Engine 8 28.7 % 0.3 % 70.8 % 0.2 %
Engine 9 23.7 % 0.3 % 75.6 % 0.3 %
Engine 10 27.8 % 0.3 % 71.8 % 0.0 %
Engine 11 25.5 % 0.5 % 73.8 % 0.2 %
Engine 12 29.3 % 0.5 % 70.0 % 0.2 %
Engine 13 23.4 % 0.0 % 76.6 % 0.0 %
Engine 14 24.5 % 0.0 % 75.3 % 0.2 %
Engine 15 25.9 % 0.5 % 73.6 % 0.0 %
------------------------- ------------ ------------ ---------- ----------
Pool Summary Total 437.7 % 4.3 % 1155.4 % 2.6 %
Average 27.4 % 0.3 % 72.2 % 0.2 %
Thanks
Ajay Pandey
https://scn.sap.com/thread/3764504
Guys, can you take a look at this link? It seems there is a change request (CR #757246) to fix this in the sysmon report.
Hi
Check this query
select top 10 * from master..monSpinlockActivity order by Contention desc
and read
Spinlocks and CPU usage in SAP ASE - SAP ASE - SCN Wiki
Hans
Spinlock monitoring is not configured, but from other MDA tables (monProcess and monWaitEventInfo) we are seeing a lot of processes waiting. Not sure whether this is a spinlock or a cache issue.
Event 124: wait for mass read to finish when getting page
Event 124 occurs when a process attempts to perform a physical read but another process has already performed the read request (this also counts as a “cache miss”).
A wait caused by a physical read (most likely a cache miss), which occurs when Adaptive Server does not find a page in the data cache and must read it from disk. The number of Waits is the number of physical reads that occurred because of a cache miss. Use the monSysWaits.WaitTime value to derive I/O response times.
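As a small illustration of that derivation (the row values below are invented for illustration, and you should confirm the WaitTime unit in your release's documentation; it is commonly reported in milliseconds):

```python
# Hypothetical monSysWaits rows as (WaitEventID, Waits, WaitTime).
# Events 29 (buffer read) and 124 (mass read) are the ones discussed above.
# All numbers here are made up; real values come from querying monSysWaits.
rows = [(29, 120000, 960000), (124, 45000, 540000)]

for event, waits, wait_time in rows:
    # Average wait per physical read, in whatever unit WaitTime uses (ms here).
    avg = wait_time / waits
    print(f"event {event}: {waits} waits, avg {avg:.1f} ms/wait")
```

Sampling the table twice and diffing Waits/WaitTime over the interval gives a rate rather than a since-boot cumulative figure, which is usually what you want when comparing before and after an upgrade.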
Before you get too worried about increased physical io, have you compared your overall throughput performance? If throughput has greatly improved, increased physical io may not be a bad thing - it could indicate that ASE has become more efficient at processing queries and so is processing more transactions per minute and so simply hitting more cases where a physical io is needed per minute.
An increase in the two wait events listed above seems consistent with an increased rate of physical io.
-bret
Hi Ajay,
Forget the i/o busy comparison to the earlier version, it is meaningless on the threaded kernel.
A few years ago I tried to come up with a way of massaging the i/o busy percentage into a useful metric in terms of determining whether a server is i/o bound. The conclusion was you simply can't. There was no predictable correlation between the amount of i/o and the legacy i/o busy percentages.
There is no concept of a user engine thread being i/o busy (that is apart from synchronous i/o which you almost certainly won't be using).
You need to look at the number of i/os over a given period, and whether there are any polls returning max events. This combined with the thread utilisation of the DiskController are the key metrics.
If you compare the actual number of i/os over a given period, that will give you something more comparable.
In practice, out of our 100 or so production servers, only 2 of them have very small periods where an extra disk i/o thread is needed.
This is from our wiki site.
And a little more:
...
This method is still sound, but there is a key difference when it comes to @@io_busy. In the old process mode kernel, I/O was issued by an engine itself; therefore, when the OS sent an interrupt every tick, an engine would register as I/O busy if it was in the scheduler with I/O pending. This I/O busy was counted against that engine (as it should be, of course).
So say you had 10 engines and you were capturing 10 seconds of activity, i.e. 100 ticks per engine. If engines 1 and 2 each registered I/O for 98 of those ticks and CPU busy for 2 (no idle), and all the other engines were 50% CPU, 50% idle, this would equate to:
10 × 100 ticks = 1000 ticks to be accounted for.
196 ticks i/o
404 ticks cpu busy
400 ticks idle
So over the period, we have 19.6% io busy, 40.4% cpu, 40% idle
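The process-mode accounting above can be sketched as a quick calculation (a toy model using the illustrative tick counts from the example, not real sysmon data):

```python
# Process-mode kernel: each engine's ticks are classified as i/o, cpu, or idle.
# Figures are the illustrative ones from the example above: 10 engines,
# 100 ticks each; engines 1-2 are 98 i/o + 2 cpu ticks, the rest 50/50 cpu/idle.
engines = [{"io": 98, "cpu": 2, "idle": 0}] * 2 + \
          [{"io": 0, "cpu": 50, "idle": 50}] * 8

total = sum(sum(e.values()) for e in engines)   # 1000 ticks to account for
io = sum(e["io"] for e in engines)              # 196
cpu = sum(e["cpu"] for e in engines)            # 404
idle = sum(e["idle"] for e in engines)          # 400

print(f"i/o busy {io/total:.1%}, cpu {cpu/total:.1%}, idle {idle/total:.1%}")
# i/o busy 19.6%, cpu 40.4%, idle 40.0%
```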
The I/Os in the threaded kernel, whilst initially issued by an engine thread, will actually be returned via the disk I/O thread in syb_system_pool. For this reason the concept of I/O busy for an engine thread is a misnomer.
The real key point to understand with threaded kernel diagnostics is that if there is ANY outstanding I/O being waited on by the I/O thread, then ANY engine that is 'officially' idle will be reported as I/O busy. The theory is that the alternative would have been to report zero I/O busy, which would arguably have been worse: those outstanding I/Os could be for a task that could land on any engine's run queue, so the reasoning is that the engine is only idle because the I/O hasn't come back yet.
Anyway, what this means in terms of diagnostics is that for the situation above the io_busy could report significantly higher.
Let's say there was outstanding I/O on three-quarters of the ticks.
Now, the 2 engines that were 98% I/O busy before are in theory now 98% idle, but of course they wouldn't be: they would now likely be doing a little more CPU work, as the task(s) requiring the I/O can be processed anywhere.
So let's say they are now (an entirely arbitrary figure for the purposes of illustration) 50% CPU. That leaves 50% idle, but we have I/O outstanding 75% of the time, so that's 37.5% I/O busy and 12.5% idle.
The other 8 engines were 50% idle, so they are now 50% CPU, 37.5% I/O busy, and 12.5% idle (the same as the first two engines).
So now, we have:-
500 ticks cpu busy
375 ticks i/o
125 ticks idle
So now the overall utilisation is:-
37.5% i/o busy, 50% cpu, and 12.5% idle.
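Sticking with the same toy model, the threaded-kernel reclassification can be sketched like this: any tick an engine would have reported as idle counts as I/O busy whenever there is outstanding I/O anywhere in the server (75% of the time in the example, an arbitrary illustrative fraction):

```python
# Threaded kernel: all 10 engines are 50% cpu; each engine's remaining "idle"
# ticks are reported as i/o busy whenever any i/o is outstanding server-wide.
TICKS = 100            # ticks captured per engine
ENGINES = 10
IO_OUTSTANDING = 0.75  # illustrative fraction of ticks with outstanding i/o

cpu = 50 * ENGINES                 # 500 ticks cpu busy
idle_raw = (TICKS - 50) * ENGINES  # 500 ticks nominally idle
io = idle_raw * IO_OUTSTANDING     # 375 ticks reported as i/o busy
idle = idle_raw - io               # 125 ticks reported as idle

total = TICKS * ENGINES
print(f"cpu {cpu/total:.1%}, i/o busy {io/total:.1%}, idle {idle/total:.1%}")
# cpu 50.0%, i/o busy 37.5%, idle 12.5%
```

The point of the sketch is that the engines did exactly the same CPU work in both scenarios, yet the reported I/O busy figure more than doubled purely because of how idle ticks are reclassified.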
If you see high I/O busy for engine utilisation via sp_sysmon or indeed via the global variables then yes the indication is that the task or tasks are still heavily i/o bound. However, if there is a bottleneck, you need to dig a little deeper to see where it is.
Actually, your wiki needs some correction - engines *still* submit the aio_read()/aio_write() calls. The only thing the engines do not do is the AIO polling for completion and subsequent run queue manipulation. That is what the disk task does - and the only thing it does.
IO Busy being reported by the engines is the relative percentage of time spent doing the IO calls to issue the reads/writes. There can be a number of factors that influence this - the famous Solaris bug was one such that could drive IO busy to 100%. We have seen others - and in particular when folks insist on using journaled file systems in which the journaling is for more than the metadata (journaled file systems for DBMS storage is useless) or when the file system isn't tuned very well (e.g. we have seen issues on AIX without CIO being forced or when the ASE page size is smaller than the filesystem block size). Linux is especially atrocious as the IO scheduler can get involved....and its rather lightweight desktop tuned IO subsystem can be rather anemic unless aggressively tuned.
Having said all of that, CPU %s in monEngine and sp_sysmon (engine area) are computed and can be a bit off due to the fuzziness of trying to interpret values from the ticks. monThread is the best way to go, but it lacks ticks for IO - merely busy, idle, sleep.
As far as some of the other points, WaitEvents are not spinlocks - so high WaitEvents around normal reads (29) and APF's (124) would be in line with the IO busy percent.
What I would suggest is that the person investigate:
1) device attributes (make sure dsync is OFF and directio is ON)
2) check if using raw devices or filesystems. In either case, make sure the OS tuning is done with respect to number of AIO's permitted, fastpath enabled, etc.
3) If filesystems, make sure that:
a) filesystem block size is smaller or same size as ASE page size
b) file system is mounted with force cio option
c) check on JFS tuning - normally JFS on AIX is not an issue other than (a) and (b) but sysadmins might have done something weird
d) check HW - there is an issue with AIX when LPARS are using VIO servers and the VIO has iSCSI enabled (I think it was this) and no iSCSI in use.....
However, I also agree with Marcin - this could be a problem of a lot more query issues and resulting cache misses due to not using indexes or using different indexes than previously.
"Actually, your wiki needs some correction - engines *still* submit the aio_read()/aio_write() calls. The only thing the engines do not do is the AIO polling for completion and subsequent run queue manipulation. That is what the disk task does - and the only thing it does"
No real modification needed with respect to that, I have made it quite clear as you can see from these sections taken directly from my post. I am not sure how much more this could agree with what you've just said 🙂
The DiskController thread is entirely dedicated to polling for I/O completions.
The I/Os in the threaded kernel, whilst initially issued by an engine thread, will actually be returned via the disk I/O thread in syb_system_pool.
An actual I/O is still issued by a task running on an engine thread at a particular instant (which for the last I/O may have been a different thread), but it is key to understand that this 'source' thread is irrelevant: it gives no indication of how busy a particular engine is with I/O, because when an I/O completes the task can be scheduled to run on any other engine thread.
I guess it was the opening statement, "Disk I/O is processed via the DiskController....", that is a bit misleading. It is correct later, but unless you are already aware of the distinction, someone can be a bit confused. I only mention this because I have seen it a lot: despite it being made clear later in docs, etc., the earlier mention of "disk I/O being done by the DiskController" seems to be the only thing that sticks in people's minds. As a result, I always start out by making it clear that the engines actually submit the IOs and that the only thing the disk task does is the polling - and try to avoid any statements about disk tasks "handling" the IOs.
It made sense from an engineering perspective to stress the disk task, as read/write requests normally take minuscule time whereas the disk polling was always a heavier load (especially with non-blocking async polling vs. the threaded kernel's blocking poll).
However, given that the individual is showing 100% CPU (30% user and 70% IO), unlike others I don't believe there is no problem here, or that this is just normal threaded-kernel behavior showing 100% CPU. I think it needs looking into. The first thing I would do is get a real picture of true CPU usage from monThread. If it shows 100%, then there definitely is an issue. If it shows 50%, then there is an issue with the way IO Busy is being computed for engines in that person's release.
I think maybe you misread his sp_sysmon output due to the columns not being aligned; it is 27% CPU and 72% I/O. Not sure where 100% CPU has come from?
It is quite common (in fact fairly standard) for the idle to be very low now compared to the old process mode, it is just the inverse of the whole problem with the i/o busy.
We'll end up with the bad formatting again but I don't think you are correct unless we are talking about different things. We are talking here about engine utilisation aren't we?
When an engine is idle and there is any outstanding I/O, this will be registered as I/O busy across all engines. It isn't the relative percentage of time spent actually issuing the read/write calls. This is the whole situation I explained above: the 'new' idle is effectively idle + I/O busy. For 95% of our business day, User + I/O will be approaching 100%; this is expected for a system doing any reasonable amount of I/O and CPU-bound work together.
Hi.
I think sysmon is not good enough to compare performance between ASE versions.
You should check in the monitoring tables (MDA) which processes perform the most physical I/O operations and find the corresponding SQL / execution plans. Maybe it is an optimizer issue or just a bad query plan. Did you modify any configuration parameters between 15.5 and 16.0 (especially those responsible for optimizer behaviour, e.g. "optimizer level")?
Best Regards.
--
Marcin Najs
Did you change the kernel mode from "process" in 15.5 to "threaded" in 16.0 ?
If so, the comparison of 15.5 Sysmon vis a vis 16.0 Sysmon may not be as expected.
Check the P and T manual section(s) for Sysmon for threaded kernel mode and start interpreting the figures in that light.
HTH
Avinash