How to trigger alert 65, "Runtime of the log backups currently running"?

Former Member

Hi everyone,

I have an issue with the log backups. I want to trigger this alert in my system.

The details of this alert are as follows:


Check name

Runtime of the log backups currently running

User Action

Investigate why the log backup runs for too long, and resolve the issue.

Threshold

30 s, 300 s, 900 s

Tests

YES

Stake Holder

Backup & Recovery

Source of information

system view SYS.M_BACKUP_CATALOG

Motivation

Inform about problems with log backup.

Interval

60 s

Internal Name

BACKUP_LONG_LOG_BACKUP

ID

65

Description

Determines whether or not the most recent log backup terminates in the given time.

Alert Text

A log backup with ID <id> has been running for longer than <seconds> seconds.
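Reading the definition above, the check boils down to a threshold comparison: every 60 s, each log backup that is still running according to SYS.M_BACKUP_CATALOG is compared against the three levels (30 s, 300 s, 900 s). A minimal Python sketch of just that comparison follows. The `RunningLogBackup` record and its fields are hypothetical, and it assumes `<seconds>` in the alert text is the exceeded threshold; this is not the statistics server's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple

# Threshold levels from the alert definition (seconds).
DEFAULT_THRESHOLDS_S = (30, 300, 900)


@dataclass
class RunningLogBackup:
    # Hypothetical record; field names are illustrative, not HANA column names.
    backup_id: int
    elapsed_s: float


def alert_level(elapsed_s: float, thresholds: Sequence[float] = DEFAULT_THRESHOLDS_S) -> int:
    """Return 0 if no threshold is exceeded, otherwise the highest level (1-based) exceeded."""
    level = 0
    for i, limit in enumerate(thresholds, start=1):
        if elapsed_s > limit:
            level = i
    return level


def check(backups: Iterable[RunningLogBackup],
          thresholds: Sequence[float] = DEFAULT_THRESHOLDS_S) -> Iterator[Tuple[int, str]]:
    """Yield (level, alert text) for every running log backup above a threshold."""
    for b in backups:
        level = alert_level(b.elapsed_s, thresholds)
        if level:
            limit = thresholds[level - 1]
            yield level, (f"A log backup with ID {b.backup_id} has been running "
                          f"for longer than {limit} seconds.")


if __name__ == "__main__":
    running = [RunningLogBackup(1001, 0.4), RunningLogBackup(1002, 45.0)]
    for level, text in check(running):
        print(level, text)
    # A sub-second backup never fires with the defaults, but can with lowered levels.
    print(alert_level(0.4), alert_level(2.5, (1, 2, 3)))
```

With the default levels, a sub-second backup like the ones in the test system never reaches level 1, which is why lowering the thresholds, as suggested later in the thread, is what makes the alert reproducible.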

However, my test system doesn't hold much data, so the log backups are very small and usually take less than 1 second. Hence this alert never shows up in my system.

I tried importing a row table into the system to increase the data volume. The log backups grew, but the alert was still not triggered.

I also moved the log backup location to an NFS server, hoping the network would slow the log backups down. Unfortunately, that didn't work either.

The attached pictures show my test results and my system information.

I really don't know what to do to trigger it, so I don't know how to fix it either. That's why I'm asking for your help.

I look forward to your kind reply.

Thank you all very much in advance.

Best regards,

Geo

Message was edited by: Tom Flanagan

Accepted Solutions (1)

former_member182114
Active Contributor

Hi Geo,

Usually I don't want an alert popping up 😉

Do you maybe have a production system showing this alert that you want to reproduce in your DEV landscape?

The alert just says that a log backup is running and hasn't finished yet. If that is the case, you need to check where it is being raised.

Best regards, Fernando Da Rós

Former Member

Hi Fernando,

Because I'm currently writing KBAs, I need to trigger these alerts in my own system and then find a way to resolve them.

However, all the systems I own are test systems that don't show many alerts, so it's a little hard to trigger this one. I don't know exactly how to make a log backup run longer than the configured threshold, so I came here to share the problem and ask for help.

Thanks.

Best regards.

Geo

lbreddemann
Active Contributor

hmm... this approach seems a little backwards to me.

First you want to induce a situation that triggers a specific alert and then you want to find ways to stop the alert.

Well, easy: stop inducing the situation in the first place.

What I'm saying is: you will most likely end up with the correct solution for your artificial alert, but with a wrong (i.e. non-working) solution for whatever usually causes the alert to appear.

For example, you could easily go and create, let's say, a "lock held too long" kind of alert.

The "solution" would likely be to kill the lock-holding transaction.

However, in real life this is hardly ever the root cause. Something made the transaction run for a long time. Maybe another transaction blocked it. Maybe some internal code was inefficient. But this will be overlooked, since the only concern was to switch the alarm off again.

My 2 cents on this approach.

Lars

former_member182114
Active Contributor

Hi Geo,

I agree with Lars. A KBA based on faked root causes will only be useful for showing the user the same alert screenshot, since you can't predict what will actually happen on his system.

And as a matter of fact, the best answer is already inside the alert: "Take a look to see why the backup is taking so long." Maybe the network, maybe locks, maybe writing to tape.

I believe in living KBAs that support staff can enhance over time, drawing on customer messages about what usually happens and how it gets solved.

Best regards, Fernando Da Rós

Former Member

Hi Lars,

I agree with you to some degree.

However, first, I think the process of triggering the alert is the same as the process of solving it. And although this solution may be the wrong one for many real-life situations, it's not impossible that it applies, so it has some value.

Undeniably, as Fernando says, a KBA based on faked root causes will only be useful for showing the user the same alert screenshot, since you can't predict what will happen on his system. But writing the KBA is my task, and I must do it. So what I can do is try my best to make the solution as close as possible to what customers need.

And sincere thanks to you and Fernando for your kind replies.

PS: my thoughts on the alert

Normally there are two reasons for this alert:

  • The backup catalog is too large.

          If the system has been running for a long time (half a year,
          a year or longer), automatic log backups make the backup
          catalog very large, and backup performance degrades.

  • The log segment is too large.

           This usually comes down to one of three situations:

    1. Parameter configuration problem
    2. Huge workload problem
    3. I/O problem

Best regards,

Geo

Former Member

Hi Fernando,

First of all, thanks for your kind guidance.

I think both you and Lars are right, and I have given my thoughts in my reply to Lars.

I sincerely hope to become good friends with you and Lars.

Best regards,

Geo


lbreddemann
Active Contributor

Really?

The time required for a log backup is a function of the size of the backup catalog file?

The first thing that comes to my mind here is "Why would that be the case?".

The log backup history file, just like the other log files, is an append-only file. We always write to the end of it.

So, the actual size of the file surely shouldn't matter.

Anyhow - I did not mean to discourage you from writing KBAs or anything.

Keep going!

- Lars

former_member182114
Active Contributor

Hi Geo,

Ok. Understood your points.

If I were you I would do two things:

1) Create a view that abuses parallelism, doing joins against joins, to burn the system's resources

2) Reduce the thresholds. Go to Configuration -> statisticsserver.ini -> statisticsserver_monitor_BACKUP_LONG_LOG_BACKUP_RUNNING -> warning 1

currently: BACKUP_LONG_LOG_BACKUP[] > THRESHOLD_BACKUP_LONG_LOG_BACKUP_RUNNING_WARNING_LEVEL_1

Maybe you could TEMPORARILY change it to: BACKUP_LONG_LOG_BACKUP[] > 1.

I don't know if this will work, but if it does and the unit is seconds, you'll get a chance to get tired of seeing this alert.

Regards, Fernando Da Rós

Former Member

Every log backup doesn't only back up a single log; it also backs up the catalog. If the catalog size is 1 MB or 5 MB, it will not massively impact the runtime. But if the catalog size is e.g. 200 MB, it can have a big impact, because the logs can then be smaller than the catalog (e.g. 64 MB for the statisticsserver log, 8 MB for the xsengine log). I have seen a system with a backup catalog of 23 GB (and 54 million records). You can imagine that each individual log backup took ages. So I agree with Geo that the catalog size is an important factor in log backup performance.

The easiest way to slow down the log backup with a large backup catalog is to force an error at the beginning of the log backup (e.g. with an erroneously set mount point). SAP HANA will then keep trying to start the log backup and keep failing. Every failure is recorded in the backup catalog, so after some hours you will have millions of entries. If you do this, make sure you are at least on revision 74.01, because with older revisions the cleanup of the catalog takes ages due to an inefficient record-wise deletion strategy.
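Martin's argument can be put into rough numbers. The sketch below assumes each log backup writes the log and then a copy of the backup catalog, one after the other, at one fixed throughput; the 100 MB/s value is an arbitrary assumption for illustration, not a measured SAP HANA figure. The sizes are the ones quoted above.

```python
# Crude model: log and catalog copy are written sequentially at a fixed throughput.
ASSUMED_THROUGHPUT_MB_S = 100.0  # illustrative assumption only


def log_backup_runtime_s(log_mb: float, catalog_mb: float,
                         throughput_mb_s: float = ASSUMED_THROUGHPUT_MB_S) -> float:
    """Estimated runtime of one log backup including the catalog copy."""
    return (log_mb + catalog_mb) / throughput_mb_s


def catalog_share(log_mb: float, catalog_mb: float) -> float:
    """Fraction of the written bytes that belong to the catalog copy."""
    return catalog_mb / (log_mb + catalog_mb)


if __name__ == "__main__":
    xsengine_log_mb = 8.0
    for catalog_mb in (1.0, 200.0, 23 * 1024.0):  # 1 MB, 200 MB and 23 GB catalogs
        t = log_backup_runtime_s(xsengine_log_mb, catalog_mb)
        share = catalog_share(xsengine_log_mb, catalog_mb)
        print(f"catalog {catalog_mb:>8.0f} MB -> {t:7.2f} s ({share:.0%} catalog)")
```

Under this model the 23 GB catalog alone pushes an 8 MB xsengine log backup to roughly 236 s, past the 30 s level, while the log itself accounts for a negligible share; real throughput and per-record overhead will differ.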

Former Member

Hi Fernando,

I completely agree with you. I used this approach to trigger the alert, and it worked. I have attached a picture of my result.

And Martin shares my point of view. My test and his experience with this alert both indicate that my thinking was right.

I changed the log backup interval to 0, which means logs are backed up only when a log segment is full or when services are restarted (e.g. .jpg,655.jpg). Then the alert can be triggered (e.g. 657.jpg & 658.jpg). The default log segment sizes are 1024 MB globally, 64 MB for the nameserver and 64 MB for the statistics server. So in 658.jpg we can see the runtime increase as a result of the changed log backup setting. Of course, at the same time I changed the thresholds to 1 s, 2 s and 3 s.
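Why setting the interval to 0 helps can be sketched as follows: a log backup is cut when its segment is full or when the backup interval expires, whichever comes first, so with the interval at 0 every backup carries a full segment. In the Python sketch below, the log generation rate, the 900 s interval used for comparison and the 200 MB/s throughput are assumptions chosen for illustration; only the 1024 MB and 64 MB segment sizes and the 1 s/2 s/3 s thresholds come from the post.

```python
def backup_size_mb(segment_mb: float, log_rate_mb_s: float, timeout_s: float) -> float:
    """Approximate log volume in one log backup.

    timeout_s == 0 models 'back up only when the segment is full';
    otherwise the backup is cut at the timeout or when the segment fills,
    whichever comes first.
    """
    if timeout_s == 0:
        return segment_mb
    return min(segment_mb, log_rate_mb_s * timeout_s)


def levels_exceeded(runtime_s: float, thresholds=(1, 2, 3)) -> int:
    """Number of threshold levels the runtime exceeds (0 = no alert)."""
    return sum(runtime_s > t for t in thresholds)


if __name__ == "__main__":
    log_rate_mb_s = 0.05      # assumed log generation rate of a small test system
    throughput_mb_s = 200.0   # assumed backup target throughput
    for segment_mb in (1024, 64):
        for timeout_s in (900, 0):
            size = backup_size_mb(segment_mb, log_rate_mb_s, timeout_s)
            runtime = size / throughput_mb_s
            print(f"segment {segment_mb} MB, interval {timeout_s} s: "
                  f"{size:.0f} MB, {runtime:.2f} s, level {levels_exceeded(runtime)}")
```

Under these assumptions only the 1024 MB segments with the interval at 0 reach the lowered thresholds, which is consistent with both changes being needed; real workloads will give different numbers.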


As for the catalog test, I think it works the same way. And Martin has given a very detailed explanation, which is great.


Finally, thanks to all of you for your help with this. I was inspired by talking with all of you.

Thanks and thank you very much.



Best regards,

Geo

Former Member

Hi Martin,

Of course. I agree with you.

I have now succeeded in triggering this alert and have given my view in my reply to Fernando. Before that, Hendrik had emailed me and told me about your thoughts, which was very helpful indeed.

Thanks a lot.

Best regards,

Geo

Former Member

Hi Lars,

Of course, I understand. It's fine.

We both aim to solve the problem and learn more about the technology. Only by sharing our opinions with others can we improve, make SAP HANA better, and make the world better.

I'm glad to have shared my question and views with you, and thanks for your kind reply.

Thank you.

PS: I have given my view in my reply to Fernando. It was a pleasure to discuss the problem with all of you.

Best regards,

Geo

lbreddemann
Active Contributor

Gosh - forgot about that one...

Ok, yes, creating large log_backup_0_0_0_0* files will add to the overall time taken for the backup.

Not sure, though, why this is implemented as a sequential action.

I think it would be easier to start copying the backup catalog into the file together with the actual log/data backup and, once both I/O jobs are finished, simply update both backup catalogs via append.
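Lars's suggestion can be compared with the sequential behaviour using simple arithmetic. The sketch below treats the backup write and the catalog copy as two I/O jobs with given durations; it ignores the final append to the catalogs and any contention between the two jobs, so it is an idealised bound on the benefit, not a prediction for SAP HANA. The durations used are illustrative.

```python
def sequential_runtime_s(backup_write_s: float, catalog_copy_s: float) -> float:
    """Catalog copy starts only after the backup write has finished."""
    return backup_write_s + catalog_copy_s


def overlapped_runtime_s(backup_write_s: float, catalog_copy_s: float) -> float:
    """Both I/O jobs start together; the backup ends when the slower one finishes.

    The closing append to the backup catalogs is ignored in this idealised model.
    """
    return max(backup_write_s, catalog_copy_s)


def saving_s(backup_write_s: float, catalog_copy_s: float) -> float:
    """Time saved by overlapping; equals the shorter of the two jobs."""
    return (sequential_runtime_s(backup_write_s, catalog_copy_s)
            - overlapped_runtime_s(backup_write_s, catalog_copy_s))


if __name__ == "__main__":
    # Illustrative durations in seconds: (log write, catalog copy).
    for log_s, cat_s in ((0.08, 2.0), (1.0, 1.0), (0.08, 235.0)):
        print(f"log {log_s} s, catalog {cat_s} s: "
              f"sequential {sequential_runtime_s(log_s, cat_s):.2f} s, "
              f"overlapped {overlapped_runtime_s(log_s, cat_s):.2f} s")
```

In this model overlapping saves exactly the shorter of the two jobs, so it helps most when the log write and the catalog copy take similar time; with a very large catalog, the catalog copy still dominates the runtime.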

Anyway, we've got to work with what we have right now, I guess.

Thanks for pointing this out, Martin!

- Lars

Answers (2)

Jochinnabathini
Contributor

This message was moderated.

former_member182114
Active Contributor

Thank you Geo, Martin and Lars.

and I ended up learning something new 😉

The pleasure was mine too