on 10-13-2014 4:28 AM
Hi everyone,
I have a question about log backups. I want to trigger this alert in my system.
The information for this alert is as follows:
| Check name | Runtime of the log backups currently running |
|---|---|
| User Action | Investigate why the log backup runs for so long, and resolve the issue. |
| Threshold | 30 s, 300 s, 900 s |
| Tests | YES |
| Stakeholder | Backup & Recovery |
| Source of information | system view SYS.M_BACKUP_CATALOG |
| Motivation | Inform about problems with log backup. |
| Interval | 60 s |
| Internal Name | BACKUP_LONG_LOG_BACKUP |
| ID | 65 |
| Description | Determines whether the most recent log backup finishes within the given time. |
| Alert Text | A log backup with ID <id> has been running for longer than <seconds> seconds. |
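The source of information listed above can be queried directly. A hedged sketch of how to check recent log backup runtimes (column names as documented for SYS.M_BACKUP_CATALOG; verify them against your revision):

```sql
-- Sketch: runtime of the most recent log backups, newest first.
-- Assumes the documented columns of SYS.M_BACKUP_CATALOG.
SELECT TOP 20
       BACKUP_ID,
       STATE_NAME,
       SECONDS_BETWEEN(UTC_START_TIME, UTC_END_TIME) AS RUNTIME_S
  FROM SYS.M_BACKUP_CATALOG
 WHERE ENTRY_TYPE_NAME = 'log backup'
 ORDER BY UTC_START_TIME DESC;
```

If RUNTIME_S never comes near the 30 s warning threshold, the alert will not fire.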
However, my test system doesn't have much data, so the log backups are tiny and usually take less than 1 second. Hence this alert never shows up on my system.
I tried importing row tables into the system to increase the data volume, and the log backups grew, but the alert was still not triggered.
I also moved the log backup files to an NFS server, hoping network latency would increase the log backup runtime. Unfortunately, that didn't work either.
The attached pictures show the details of my test and my system.
I really don't know what to do to trigger this alert, so I don't know how to resolve it either. So now I'm seeking your help.
Looking forward to your kind replies.
Thank you all very much in advance.
Best regards,
Geo
Message was edited by: Tom Flanagan
Hi Geo,
Usually I don't want an alert popping up 😉
Do you perhaps have a production system showing this alert, and you want to reproduce it in a DEV landscape?
The alert just says that there is a backup that is still running and has not finished yet; if that is the case, you need to check where it is being raised.
Best regards, Fernando Da Rós
Hi Fernando,
Because I'm currently writing KBAs, I need to trigger these alerts on my own system and then find the method to resolve them.
However, the systems I own are all test systems that don't show many alerts, so it's a little hard to trigger this one. I don't know in detail how to make a log backup run longer than the configured threshold, so I came here to share and ask for help.
Thanks.
Best regards.
Geo
Hmm... this approach seems a little backwards to me.
First you want to induce a situation that triggers a specific alert, and then you want to find ways to stop the alert.
Well, easy: stop inducing the situation in the first place.
What I'm saying is: you will most likely end up with the correct solution for your artificial alert, but with a wrong (i.e. non-working) solution for whatever would usually cause the alert to appear.
For example, you could easily go and create, let's say, a "lock being kept too long" kind of alert.
The "solution" would likely be to kill the lock-holding transaction.
However, in real life this is barely ever the root cause. Something made the transaction run a long time. Maybe some other transaction blocked it. Maybe some internal coding was inefficient. But this will be overlooked, since the only concern was to switch off the alarm again.
My 2 cts on this approach.
Lars
Hi Geo,
I agree with Lars; a KBA based on faked root causes will only help the user see the same alert screenshot, since you can't predict what will actually happen for them.
And as a matter of fact, the best answer is already inside the alert: "Take a look to see why the backup is taking so long." Maybe the network, maybe locks, maybe writing to tape.
I believe in living KBAs, so that over time support can enhance them, taking from customer messages what usually happens and how to solve it.
Best regards, Fernando Da Rós
Hi Lars,
I agree with you to some degree.
However, first, I think the process of triggering the alert is also the process of solving it. And although this solution may not match real life in many situations, it's not impossible, so it has some value.
Undeniably, as Fernando says, a KBA based on faked root causes only helps the user see the same alert screenshot, since you can't predict what will happen for them. But writing this KBA is my task, and I must do it. So what I can do is try my best to make the solution as close as possible to what customers need.
And sincere thanks to you and Fernando for your kind replies.
PS: my thoughts on this alert
Normally there are 2 reasons for this alert:
Usually, if the system has been running for a fairly long time, maybe half a year, a year or longer, then because of automatic log backups the backup catalog becomes very large, and backup performance is bad.
For this reason, it usually covers 3 situations:
Best regards,
Geo
Really?
The time required for a log backup is a function of the size of the backup catalog file?
The first thing that comes to my mind here is "Why would that be the case?".
The log backup history file, just like the other log files, is an append-only file. We always write to the end of it.
So the actual size of the file surely shouldn't matter.
Anyhow - I did not mean to discourage you from writing KBAs or anything.
Keep going!
- Lars
Hi Geo,
Ok, understood your points.
If I were you, I would do two things:
1) Create a view that abuses parallelism, doing joins against joins, to burn the system.
2) Reduce the thresholds. Follow Configuration -> statisticsserver.ini -> statisticsserver_monitor_BACKUP_LONG_LOG_BACKUP_RUNNING -> warning 1
currently: BACKUP_LONG_LOG_BACKUP[] > THRESHOLD_BACKUP_LONG_LOG_BACKUP_RUNNING_WARNING_LEVEL_1
Maybe TEMPORARILY change it to: BACKUP_LONG_LOG_BACKUP[] > 1.
I don't know whether this will work, but if it does and this is in seconds, you'll get the chance to grow tired of seeing this alert.
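A sketch of how such a threshold change could be made via SQL rather than the Studio dialog. The section and key names are taken from the configuration path above and are an assumption; check your statisticsserver.ini for the exact names before running anything:

```sql
-- Sketch, for a TEST system only: lower the warning threshold so the
-- alert fires even for very short log backups. Section/key names are
-- assumptions based on the path above; verify them first, and revert
-- the change afterwards.
ALTER SYSTEM ALTER CONFIGURATION ('statisticsserver.ini', 'SYSTEM')
  SET ('statisticsserver_monitor_BACKUP_LONG_LOG_BACKUP_RUNNING',
       'warning') = '1'
  WITH RECONFIGURE;
```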
Regards, Fernando Da Rós
Every log backup doesn't just back up a single log; it also backs up the catalog. If the catalog size is 1 MB or 5 MB, it will not massively impact the runtime. But if the catalog size is e.g. 200 MB, it can have a big impact, because the log sizes can then be smaller than the catalog (e.g. 64 MB for the statisticsserver log, 8 MB for the xsengine log). I have seen a system with a backup catalog of 23 GB (and 54 million records). You can imagine that each individual log backup took ages. So I agree with Geo that the catalog size is an important factor for log backup performance.
The easiest way to slow down the log backup via a large backup catalog is to force an error at the beginning of the log backup (e.g. due to an erroneously set mountpoint). SAP HANA will then permanently try to start the log backup and fail. Every failure is recorded in the backup catalog, so after some hours you will have millions of entries. If you do this, make sure you are at least on revision 74.01, because with older revisions the cleanup of the catalog takes ages due to an inefficient record-wise deletion strategy.
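To see whether the catalog has already bloated this way, one could count its entries per type and state; a hedged sketch against SYS.M_BACKUP_CATALOG (millions of failed 'log backup' rows would confirm the scenario Martin describes):

```sql
-- Sketch: size of the backup catalog by entry type and state.
-- A huge number of failed log backup entries points to catalog bloat.
SELECT ENTRY_TYPE_NAME,
       STATE_NAME,
       COUNT(*) AS ENTRIES
  FROM SYS.M_BACKUP_CATALOG
 GROUP BY ENTRY_TYPE_NAME, STATE_NAME
 ORDER BY ENTRIES DESC;
```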
Hi Fernando,
I strongly agree with you. I used this approach to trigger the alert, and it worked. I have attached a picture of my result.
And Martin makes the same point as me. According to my test and his experience with this alert, it indicates my thinking is right.
I changed the interval of the log backup to 0, which means logs are backed up only when a log segment is full or when services are restarted (e.g. .jpg, 655.jpg). Then we can trigger the alert (e.g. 657.jpg & 658.jpg). The default log segment size is 1024 MB globally, 64 MB for the nameserver and 64 MB for the statistics server. So in 658.jpg we can see the runtime increase along with the size of the log backup. Of course, at the same time I changed the thresholds to 1 s, 2 s, 3 s.
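The interval change described above would be done via the log_backup_timeout_s parameter; a sketch, assuming the standard global.ini [persistence] section (a value of 0 disables the timer, so logs are backed up only when a segment fills or a service restarts):

```sql
-- Sketch: disable the periodic log backup timer on a TEST system so
-- each log backup has to move a full segment (larger, slower backups).
-- Revert on any system you care about.
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('persistence', 'log_backup_timeout_s') = '0'
  WITH RECONFIGURE;
```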
As for the catalog test, I think it works the same way, and Martin has given a very detailed explanation of it; it's great.
Finally, thanks for all your help with this. I was inspired by talking with all of you.
Thank you all very much.
Best regards,
Geo
Hi Lars,
Of course, I see. It's fine.
We both aim to solve the problem and learn more about the technology. Only by sharing all our opinions with each other can we improve, can SAP HANA become better, and can we make the world better.
I'm glad to share my question and views with you, and thanks for your kind replies.
Thank you.
PS: I have given my view in my reply to Fernando. It's a pleasure to discuss the problem with all of you.
Best regards,
Geo
Gosh - forgot about that one...
Ok, yes, creating large log_backup_0_0_0_0* files will add to the overall time taken for the backup.
Not sure though, why this is implemented as a sequential action.
I think it would be easier to start copying the backup catalog into the file together with the actual log/data backup and, once both IO jobs are finished, to simply update both backup catalogs via append.
Anyway, we've got to work with what we have right now, I guess.
Thanks for pointing this out Martin!
- Lars
Thank you Geo, Martin and Lars.
And I ended up learning something new 😉
The pleasure was mine too