Performance/Load/CPU stopped charting

patrickbachmann
Active Contributor
0 Kudos

Hi folks,

I just got back to work after a three-week break and went to monitor HANA system administration as I normally would. Looking at the admin console/performance/load tab, I noticed that both the Index Server/CPU and Index Server/System CPU KPIs stopped updating several days ago. We are currently on REV 82. For example, I can see the CPU graph go up and down every day until April 16, when it suddenly just stops graphing. I tried different computers and different revisions of HANA Studio and still got no chart, so I know it's not specific to my computer or some sort of interface glitch. My guess was that this is updated by the statistics server (now integrated into the indexserver process), and I can indeed see these threads in the performance/thread tab, so I'm not clear why this stopped updating. Any clues?

Thanks,

-Patrick

Attached screenshot;

Accepted Solutions (1)

Former Member
0 Kudos

This information is collected by the nameserver and is based on the nameserver_history.trc file. To pinpoint the problem, I would first do the following:

- Check if nameserver_history.trc is still growing

- Check if other performance figures (e.g. ping time) are also not showing any changes

- Check in the threads overview if a nameserver thread is permanently stuck

- Check if there are specific SAP HANA parameter settings that could explain that the normal load graph behavior is changed

- Check if the host itself is really still busy (not that there was a failover and now it is completely idle)
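
The first check in the list above can be scripted at the OS level. Below is a minimal sketch in Python, assuming you have read access to the trace directory. The function name `is_file_growing` and the sampling interval are illustrative only and not part of any SAP tooling, and the path to nameserver_history.trc depends on your installation.

```python
import os
import time


def is_file_growing(path, interval_seconds=60.0):
    """Return True if the file at `path` got larger during the interval.

    Hypothetical helper: it only compares two size samples, so a trace
    file that was rotated or truncated in between would report False.
    """
    size_before = os.path.getsize(path)
    time.sleep(interval_seconds)
    size_after = os.path.getsize(path)
    return size_after > size_before
```

Call it with the full path to nameserver_history.trc on the affected host and an interval long enough for at least one new history entry to be written.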

Answers (3)

patrickbachmann
Active Contributor
0 Kudos

Thanks guys.  Martin, I'm looking for errors in the nameserver trace files, but I'm not sure yet that they are related, as I see similar errors in older trace files, such as:

trex.... read from channel failed

trexnet....reading failed with timeout

indexserver not answering (I did not see this one in older traces though)

Amerjit, I tried your select and indeed was able to read and get result back.

-Patrick

Former Member
0 Kudos

Without knowing the context it is hard to confirm that the trace entries are linked, but the main task of the nameserver in this context is to contact the indexserver in order to get the load data. So if the indexserver is not answering, that could explain some of the missing data.

In any case I would be worried if the indexserver doesn't answer a nameserver request. How often do you find this information in the trace?

patrickbachmann
Active Contributor
0 Kudos

I'm seeing this error several times per day.

lbreddemann
Active Contributor
0 Kudos

Hi Patrick,

I think this should be checked via a support incident.

Make sure to link this thread to it, as it appears that the preliminary analysis here is pointing in the right direction regarding the cause of the problem.

Cheers, Lars

patrickbachmann
Active Contributor
0 Kudos

Unfortunately, today we had major performance issues and were forced to restart production.  It fixed the performance issue and, no surprise, it fixed the charting issue too!  I wouldn't be surprised if the two events were somehow related and there was something funky going on with indexserver, but we have now lost the opportunity to troubleshoot further.  Anyway, thanks for your help everyone!

-Patrick

patrickbachmann
Active Contributor
0 Kudos

Thanks guys.  Ok I have looked at nameserver_history.trc and it is indeed growing.  Also PING TIME is indeed updating.  I'm not seeing any indication of host being too busy.  Looking into parameter settings next...

-Patrick

Former Member
0 Kudos

Okay, this indicates that the nameserver history collection itself is still working, but for some reason it has problems retrieving some of the information. I doubt that this is related to SAP HANA parameters. Instead I would do the following:

- Check the nameserver trace files for related error messages

- Check which figures of the load graph are no longer displayed properly. If it's e.g. OS related figures like CPU and host memory, there may be a problem for nameserver to retrieve this information via OS calls.
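
To get a rough idea of how often such messages appear, the trace lines can be scanned for known fragments. This is a minimal sketch in Python, assuming the trace lines contain the message fragments quoted elsewhere in this thread verbatim. The names `PATTERNS` and `count_trace_errors` are hypothetical, and no particular trace line format is assumed beyond plain text.

```python
from collections import Counter

# Message fragments quoted in this thread; extend the list as needed.
PATTERNS = [
    "read from channel failed",
    "reading failed with timeout",
    "indexserver not answering",
]


def count_trace_errors(lines, patterns=PATTERNS):
    """Count how many lines contain each pattern (case-insensitive)."""
    counts = Counter({p: 0 for p in patterns})
    for line in lines:
        lowered = line.lower()
        for p in patterns:
            if p.lower() in lowered:
                counts[p] += 1
    return counts
```

Any iterable of strings works as input, so an open file handle on a nameserver trace file can be passed directly. Comparing the counts for a recent trace file with those for an older one shows whether a message is new or has always been there.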

Former Member
0 Kudos

Hello Patrick,

You've already determined that the information is being written to the nameserver trace file.

In addition to what Martin has already suggested, could you try this from Studio to see if you can actually read the file?

select content from PUBLIC.M_TRACEFILE_CONTENTS where host=? and file_name=?

host = <your host>

file_name = nameserver_history.trc

The above statement should at least prove that you are able to read the file.

Cheers,

Amerjit

Former Member
0 Kudos

Hey Patrick,

I realised afterwards that maybe I wasn't clear enough.

Let's try this.

1. select count(*) from PUBLIC.M_TRACEFILE_CONTENTS where host=? and file_name=?

Run the above periodically to see if the count value increments.

2. On the OS, run a tail -f on the indexserver trace file.

3. Go to the load tab, set the refresh interval to 15 secs, and then see what happens in the window with the tail command. (Do you see trexnet errors or any other errors?)

4. See if the count value from step (1) has increased.
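
Steps 1 and 4 amount to sampling the row count several times and checking whether it rises. Here is a minimal sketch in Python, assuming you supply `get_count` yourself, for example as a small function that runs the select count(*) statement through whatever SQL client you use. `get_count`, `sample_counts`, and `is_increasing` are hypothetical names, not a HANA API.

```python
import time


def sample_counts(get_count, samples=4, interval_seconds=15.0):
    """Call get_count() `samples` times, pausing between calls."""
    values = []
    for i in range(samples):
        values.append(get_count())
        if i < samples - 1:
            time.sleep(interval_seconds)
    return values


def is_increasing(values):
    """True if the last sample is larger than the first."""
    return len(values) >= 2 and values[-1] > values[0]
```

The 15-second default mirrors the refresh interval suggested in step 3, so the samples line up with the load tab refreshes.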

It's not easy to advise you as for certain things you just have to be in front of the screen looking at various logs and something you see triggers a thought.

Let us know how you get on.

Cheers,

Amerjit

patrickbachmann
Active Contributor
0 Kudos

I had our administrator, who has access at the OS level, look into your questions, and he replied with this:

Count is increasing 2-4 .. no load graph and no changes in indexserver trace file.

Former Member
0 Kudos

Hi Patrick, was there an Embedded Statistics Server (ESS) migration done on this system at any time?

Can you please check whether some statistics collection objects are in a disabled state, using the query below?

select * from "_SYS_STATISTICS"."STATISTICS_SCHEDULE" where status = 'Disabled' and statusreason = 'timeout'

You can resolve this by updating the collectors with the update statement below:

update "_SYS_STATISTICS"."STATISTICS_SCHEDULE" set status = 'Idle' where status = 'Disabled' and statusreason = 'timeout'

Let me know if it helps

Former Member
0 Kudos

Hmm, probably my previous update wasn't precise enough. To state it clearly: The load graph is not related to the statistics server, it's related to the nameserver. So there is no need to start statistics server investigations.

Former Member
0 Kudos

Hi Martin,

I did not notice your comment before.

I did some tests to confirm your point about nameserver_history.trc, so it is not necessary to investigate in the direction of statistics server data collection.

Thank you for your input.

patrickbachmann
Active Contributor
0 Kudos

To answer your question: yes, we did the statistics migration, so we no longer have a standalone statistics service; rather, it runs under indexserver (process type statistics thread).

-Patrick