on 04-22-2015 3:52 PM
Hi folks,
I just got back to work after a three-week break and started monitoring HANA system administration as I normally would. I noticed that both the Index Server/CPU and Index Server/System CPU KPIs stopped updating several days ago in the admin console under Performance > Load. We are currently on revision 82. For example, the CPU graph goes up and down every day until April 16, when it suddenly stops graphing. I tried different computers and different revisions of HANA Studio and still get no chart, so I know it's not specific to my computer or some interface glitch. My guess was that this data is updated by the statistics server (now integrated into the indexserver process), and I can indeed see its threads in the Performance > Threads tab, so I'm not clear why this stopped updating. Any clues?
Thanks,
-Patrick
Screenshot attached.
This information is collected by the nameserver and is based on the nameserver_history.trc file. To pinpoint the problem, I would first do the following:
- Check whether nameserver_history.trc is still growing
- Check whether other performance figures (e.g. ping time) are also not showing any changes
- Check in the threads overview whether a nameserver thread is permanently stuck
- Check whether specific SAP HANA parameter settings could explain the change in the normal load graph behavior
- Check whether the host itself is really still busy (and not, for example, completely idle after a failover)
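The first check can be scripted. A minimal sketch, assuming it runs on the HANA host with read access to the trace directory and that you pass the full path to nameserver_history.trc; the function name `trace_file_is_growing` is hypothetical. If the file is rotated between samples its size may drop, so a single "not growing" result is worth re-checking by hand.

```python
import os
import time


def trace_file_is_growing(path, interval_s=60.0, samples=2, sleep=time.sleep):
    """Return True if the file size grows between any two consecutive samples.

    `sleep` is injectable so the check can be tested without real waiting.
    """
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            sleep(interval_s)  # wait before taking the next size sample
    return any(later > earlier for earlier, later in zip(sizes, sizes[1:]))
```

For example, `trace_file_is_growing("<trace directory>/nameserver_history.trc", interval_s=60)` takes two samples one minute apart.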
Thanks, guys. Martin, I'm looking for errors in the nameserver trace files, but I'm not sure they are related yet, as I see similar errors in older trace files, such as:
trex.... read from channel failed
trexnet....reading failed with timeout
indexserver not answering (I did not see this one in older traces, though)
Amerjit, I tried your select and was indeed able to read the file and get a result back.
-Patrick
Without knowing the context it is hard to confirm that the trace entries are related, but the main task of the nameserver in this context is to contact the indexserver to get the load data. So if the indexserver is not answering, that could explain some of the missing data.
In any case, I would be worried if the indexserver does not answer a nameserver request. How often do you find this message in the trace?
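To answer how often these messages appear, the matching trace lines can be counted per day. A minimal sketch, assuming each trace line carries a YYYY-MM-DD date stamp (lines without one are grouped under "unknown"); the search strings are fragments of the messages quoted in this thread, and the names `PATTERNS` and `count_errors_by_date` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical search strings (lowercase), taken from the messages quoted above.
PATTERNS = (
    "read from channel failed",
    "reading failed with timeout",
    "indexserver not answering",
)
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def count_errors_by_date(lines, patterns=PATTERNS):
    """Count lines containing each pattern, keyed by (date, pattern)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()  # case-insensitive match against lowercase patterns
        for pattern in patterns:
            if pattern in lowered:
                match = DATE_RE.search(line)
                day = match.group(0) if match else "unknown"
                counts[(day, pattern)] += 1
    return counts
```

Because it accepts any iterable of lines, an open trace file can be passed directly, e.g. `count_errors_by_date(open(path, errors="replace"))`.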
Unfortunately, today we had major performance issues and were forced to restart production. The restart fixed the performance issue and, no surprise, it fixed the charting issue too! I wouldn't be surprised if the two events were somehow related and something funky was going on with the indexserver, but we have now lost the opportunity to troubleshoot further. Anyway, thanks for your help, everyone!
-Patrick
Thanks, guys. OK, I have looked at nameserver_history.trc and it is indeed growing. Ping time is also still updating. I'm not seeing any indication of the host being too busy. Looking into parameter settings next...
-Patrick
Okay, this indicates that the nameserver history collection itself is still working, but for some reason it has problems retrieving some of the information. I doubt that this is related to SAP HANA parameters. Instead, I would do the following:
- Check the nameserver trace files for related error messages
- Check which figures of the load graph are no longer displayed properly. If they are OS-related figures such as CPU and host memory, the nameserver may have a problem retrieving this information via OS calls.
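If it is the OS-level figures that stop, one way to confirm that the OS itself still reports CPU activity is to sample /proc/stat directly on the host. A minimal sketch for Linux, assuming the standard /proc/stat layout in which the first line is the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical, and this only cross-checks the OS, not how the nameserver itself collects the figures.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # user, nice, system, idle, iowait, irq, softirq, steal; guest time is
    # already counted in user/nice, so later columns are ignored.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values)
    return total - idle, total


def cpu_busy_percent(before, after):
    """CPU busy percentage between two samples of the aggregate 'cpu' line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0  # no time elapsed between samples
    return 100.0 * (busy1 - busy0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate 'cpu' line on Linux)."""
    with open(path) as f:
        return f.readline()
```

Taking `read_cpu_line()` twice a few seconds apart and passing both lines to `cpu_busy_percent` gives a host CPU figure to compare against what the load graph last showed.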
Hello Patrick,
As you've already determined, the information is being written to the nameserver trace file.
In addition to what Martin has already suggested, could you try this from Studio to see whether you can actually read the file:
select content from PUBLIC.M_TRACEFILE_CONTENTS where host=? and file_name=?
host = <your host>
file_name = nameserver_history.trc
The above statement should at least prove that you are able to read the file.
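The same read test can be run outside Studio through any Python DB-API 2.0 connection that uses `?` placeholders. A minimal sketch: the function name `read_trace_file` is hypothetical, and the connection is assumed to come from SAP's hdbcli driver (for example `dbapi.connect(address=..., port=..., user=..., password=...)`) with a user allowed to read the monitoring view.

```python
QUERY = (
    "select content from PUBLIC.M_TRACEFILE_CONTENTS "
    "where host = ? and file_name = ?"
)


def read_trace_file(conn, host, file_name="nameserver_history.trc"):
    """Return the CONTENT values of one trace file as a list of strings."""
    cursor = conn.cursor()
    try:
        cursor.execute(QUERY, (host, file_name))  # bind host and file name
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()
```

An empty list means the view returned nothing for that host and file name, which is worth distinguishing from a permission error raised by `execute`.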
Cheers,
Amerjit
Hey Patrick,
I realised afterwards that maybe I wasn't clear enough.
Let's try this.
1. select count(*) from PUBLIC.M_TRACEFILE_CONTENTS where host=? and file_name=?
Run the above periodically to see if the count value increments.
2. On the OS, run tail -f on the indexserver trace file.
3. Go to the Load tab, set the refresh interval to 15 seconds, and watch what happens in the window running the tail command. Do you see trexnet errors or any other errors?
4. See if the count value from step (1) has increased.
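Steps 1 and 4 amount to sampling the row count at an interval and checking whether it rises. A minimal sketch, assuming a DB-API 2.0 connection with `?` placeholders (such as one from hdbcli); all function names are hypothetical, and the 15-second default simply mirrors the refresh interval in step 3.

```python
import time

COUNT_QUERY = (
    "select count(*) from PUBLIC.M_TRACEFILE_CONTENTS "
    "where host = ? and file_name = ?"
)


def make_count_fn(conn, host, file_name="nameserver_history.trc"):
    """Return a zero-argument function that runs the step 1 count query."""
    def get_count():
        cursor = conn.cursor()
        try:
            cursor.execute(COUNT_QUERY, (host, file_name))
            return cursor.fetchone()[0]
        finally:
            cursor.close()
    return get_count


def poll_counts(get_count, polls=4, interval_s=15.0, sleep=time.sleep):
    """Call get_count() `polls` times, `interval_s` seconds apart; return the values."""
    values = []
    for i in range(polls):
        values.append(get_count())
        if i < polls - 1:
            sleep(interval_s)
    return values


def is_incrementing(values):
    """True if any sample is larger than the one before it."""
    return any(later > earlier for earlier, later in zip(values, values[1:]))
```

Keeping the query, the polling loop and the check separate lets the loop be reused with any other counter, and lets the check be tested without a database.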
It's not easy to advise you, as for certain things you just have to be in front of the screen looking at various logs until something you see triggers a thought.
Let us know how you get on.
Cheers,
Amerjit
Hi Patrick, was an Embedded Statistics Server (ESS) migration done on this system at any time?
Can you please check whether some statistics collection objects are in a disabled state, using the query below?
select * from "_SYS_STATISTICS"."STATISTICS_SCHEDULE" where status = 'Disabled' and statusreason = 'timeout'
You can resolve this by re-enabling the collectors with the following update statement:
update "_SYS_STATISTICS"."STATISTICS_SCHEDULE" set status = 'Idle' where status = 'Disabled' and statusreason = 'timeout'
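The check-then-fix sequence from this reply can be wrapped so that the update only runs after you have seen which collectors it would change. A minimal sketch, assuming a DB-API 2.0 connection (such as one from hdbcli) with a user allowed to modify `_SYS_STATISTICS`; the function name and the dry-run default (`apply=False`) are hypothetical design choices, while the two statements are the ones given in the reply. Since this modifies the statistics server's schedule table, check SAP's guidance for your revision before applying it.

```python
SELECT_DISABLED = (
    'select * from "_SYS_STATISTICS"."STATISTICS_SCHEDULE" '
    "where status = 'Disabled' and statusreason = 'timeout'"
)
RESET_DISABLED = (
    'update "_SYS_STATISTICS"."STATISTICS_SCHEDULE" '
    "set status = 'Idle' "
    "where status = 'Disabled' and statusreason = 'timeout'"
)


def reset_timed_out_collectors(conn, apply=False):
    """List collectors disabled by timeout; reset them to 'Idle' only if apply=True."""
    cursor = conn.cursor()
    try:
        cursor.execute(SELECT_DISABLED)
        disabled = cursor.fetchall()  # rows as they were before any update
        if apply and disabled:
            cursor.execute(RESET_DISABLED)
            conn.commit()
        return disabled
    finally:
        cursor.close()
```

Calling it once without `apply` shows what would change; calling it again with `apply=True` performs the update and returns the same rows for the record.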
Let me know if it helps.