
Question about Agents on the Fly and shared NAS mounts

bernie_krause
Participant
0 Kudos

OK, so we're starting to look at moving to an Agents on the Fly configuration, and on the system where I've tested it I very much like the results. A brief conversation with our Unix guy, however, raised a question about how our NAS mounts are configured and whether this will work with AotF.

We have /usr/sap mounted locally on our hosts, and then we have /<sid>/ and everything under it on a NAS mount that is shared between hosts. We have up to 4 hosts (a primary/secondary pair and then a second failover set) that all share the NAS. Since the AotF agent is installed on the shared mount, how does it pick up the new hostname if the primary fails and everything moves to the secondary or failover servers? These are active/passive, so only one server at a time is actively using the connection.
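
For anyone trying to picture the layout, a rough check like this is what I mean by "local vs. NAS" (just a sketch, Linux only, using /proc/mounts; the paths are placeholders for our actual layout, not anything SAP-specific):

```python
#!/usr/bin/env python3
"""Rough sketch (Linux only): report whether a path sits on a local or a
network filesystem by walking /proc/mounts. The paths below are
placeholders for our layout."""

import os

NETWORK_FS = {"nfs", "nfs4", "cifs"}  # treat these as "NAS"

def mount_for(path):
    """Return (mountpoint, fstype) of the longest mountpoint prefix of path."""
    best = ("/", "unknown")
    with open("/proc/mounts") as f:
        for line in f:
            _, mountpoint, fstype = line.split()[:3]
            if path.startswith(mountpoint) and len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype)
    return best

for p in ["/usr/sap", "/FSS"]:  # "/FSS" stands in for the /<sid>/ path
    mountpoint, fstype = mount_for(os.path.realpath(p))
    kind = "NAS/shared" if fstype in NETWORK_FS else "local"
    print(f"{p}: mounted at {mountpoint} ({fstype}) -> {kind}")
```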

When the Agent is initially installed, it records the hostname of the host it is installed on, but when the NAS flips to another server, that hostname no longer matches the agent. Do we simply need to create profiles for all possible hosts in the profile directory? But then how does runtime.properties fit into the picture, since it only contains one hostname?
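
Just to illustrate the mismatch I mean (a sketch only; the path and the property key below are assumed placeholders, not the real names SAP uses in runtime.properties):

```python
#!/usr/bin/env python3
"""Sketch of the mismatch described above: runtime.properties written at
install time carries one hostname, which stops matching after the NAS
flips to another server. Path and key are assumed placeholders."""

import socket

PROPS = "/usr/sap/SMD/J98/SMDAgent/configuration/runtime.properties"  # assumed path
HOST_KEY = "smd.agent.hostname"  # placeholder key, not verified

def read_props(path):
    """Minimal key=value parser for a Java-style properties file."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
    return props

installed_host = read_props(PROPS).get(HOST_KEY, "<missing>")
current_host = socket.gethostname()
if installed_host != current_host:
    print(f"mismatch: agent was installed for {installed_host}, "
          f"but we are now running on {current_host}")
```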

I know we've had problems with the host_profile being corrupted because it is installed on a shared drive - does AotF suffer from similar issues? Or do we need to install the agents on a truly local drive?

So many questions... lol

Thanks.
Bernie Krause

Accepted Solutions (0)

Answers (1)

bxiv
Active Contributor
0 Kudos

To paint a picture in my mind you have the following:

ServerA - /usr/sap is local - /SID/ is a soft link to NAS1

ServerB - /usr/sap is local - /SID/ is a soft link to NAS1

If this is correct, are ServerA and B set up in a cluster?

I think the issue is that you are installing the DAA on the NAS; it should be local to the server/system. There is really not much benefit to having it sit on a NAS for redundancy or HA, as it can easily be reinstalled. The real PITA is within SolMan, having to re-associate things to the correct agent(s).

bernie_krause
Participant
0 Kudos

Billy - you're correct, that's how we have it configured. Unfortunately I have zero influence over that; someone decided long ago that this was the way to go.

So does AotF even work or make sense in this scenario? Or would simply installing the agent on all the hosts set up the correct profiles so that they would switch over automatically, or does that not work? I'm trying to get our Basis team to install at the host level, but now I'm not sure it buys us anything.

Thanks.

Bernie

After further discussion internally, it looks like this really won't work. Since there is only one runtime.properties file, there is no way to correctly identify the host server from the configuration. We would end up with far more configuration headaches than we currently have. So our options are to continue installing agents on our logical VIPs, or to convince someone to give us a permanent local mount on each server to install the agents (and the host agents, which keep getting corrupted due to shared-mount failover issues).

grr.....

bxiv
Active Contributor
0 Kudos

AotF does make sense, as you have multiple SIDs that you need to account for without having to install multiple SMD agents on both systems.

Have you thought about monitoring the failover system of the cluster? If the SMD agent is on a NAS share and clustered to move with the resources, you can't monitor the other OS via SolMan (which also means no reporting on performance); having both would help ensure a failover won't fall on its face because a service died prior to the cluster move.

I am willing to bet that your infrastructure team will come back with something along the lines of "We already monitor the systems to know if they're available or not." That is a valid point, but the counterargument is that you cannot provide a complete picture of the environment via SolMan without SMD agents reporting the status and health of the system(s); I have not come across a way to tie a third-party monitoring system's metrics into SolMan's metrics.

Perhaps your EWAs reaching your CIO/CTO with blank data in the OS info section would be a good starting point.

bernie_krause
Participant
0 Kudos

We do actually have failover monitoring in place currently - because we installed against the logical VIP, the VIP follows the failover, and the cluster has startup scripts that "should" start the agent on the failover app server without requiring any reconfiguration. However, those scripts have not proven to be very reliable, and neither has our network. The clustering software panics quickly and often and causes weird disconnects with the agents - I was hoping that moving to host-based agents would be more stable, not to mention that they pick up the logical VIPs running on the host beautifully. Normally we have 3+ app servers per SID, with the primary app server having a domain fssxxx as well as an app server VIP fssxxxap00 defined - we've not had agents assigned to both in the past, and AotF picked both up and made managed system config much simpler.

And since we need to do that for Oracle RAC anyhow, it would be a consistent install. Oh well, back to the drawing board.

Don't get me started on the infrastructure team; they only monitor after the fact (maybe, if we yell loud enough). Nor on our CIO, as no one knows when he is actually supposed to start... rofl

bernie_krause
Participant
0 Kudos

OK, going to rehash this some more.

Given 4 clustered hosts a/b/c/d

all sharing /usr/sap/SMD

would installing 4 instances (98/97/96/95), each uniquely tied to one host, work as an AotF scenario? It would seem to me to meet the qualifications for AotF - unique instances, nothing shared. Each instance would be auto-started with its host, so there should be no contention.
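
In rough pseudo-terms the idea is something like this (just a sketch; the hostnames and the start mechanism are placeholders for whatever the Basis team actually uses at boot):

```python
#!/usr/bin/env python3
"""Sketch of the idea above: four SMD instances (98/97/96/95) on the shared
/usr/sap/SMD, each dedicated to one cluster host, so a host only ever
starts "its" instance. Hostnames and the start command are placeholders."""

import socket

# one instance number per cluster node (hosts a/b/c/d from the example)
INSTANCE_BY_HOST = {"hosta": "98", "hostb": "97", "hostc": "96", "hostd": "95"}

host = socket.gethostname().split(".")[0]
instance = INSTANCE_BY_HOST.get(host)
if instance is None:
    raise SystemExit(f"{host} has no dedicated SMD instance configured")

# placeholder for however the instance actually gets auto-started at boot
print(f"{host}: would start only SMD instance {instance}")
```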

The bigger problem to me seems to be the fact that we're also sharing the hostctrl mount, so only one host agent can run at a time. Do the host agents always need to be running on all hosts for the failover to work correctly?

more questions coming.. 

Former Member
0 Kudos

Have you read the following wiki about Diagnostics Agent configuration in an HA environment:

Diagnostics Agent and HA Support - SAP Solution Manager Setup - SCN Wiki? It contains all the required information about the installation strategy in an HA environment, with the AotF concept explained.

bxiv
Active Contributor
0 Kudos

This would account for different folders on the NAS being created for the SMD agents, but you would need to ensure that the NAS connection wouldn't migrate upon a cluster issue (not sure whether your cluster setup for the NAS is a shared disk with other SIDs or each SID has its own NAS "disk").

I have Windows at my disposal, so I can tell you that it installs hostctrl to C:\ with no prompt within the installer to set a destination; since you mentioned a Unix guy earlier, I assume that is your OS, and I'm not sure where hostctrl is installed by default there.

Do they mount the hostctrl folder on you also!? One could speculate that if hostctrl is mounted and the mounted connection is unique per node, your hostctrl would be different for each cluster node. You would run the risk of OScol data loss if there were a network connection problem to the mount point and the service died (speculation), and I'm not sure whether the service would start back up on its own once the mount point issue was resolved.

bernie_krause
Participant
0 Kudos

Many times. And no, it does not contain all the required information, not even close.

bernie_krause
Participant
0 Kudos

Unix is our platform, correct. We had a smaller SAP installation on Windows with none of these shared-mount issues, but that instance is going away.

The shared NAS connection would migrate, which is the problem (actually not "migrate" so much as just being there when the failover happened). Additionally (as you guessed), the hostctrl folder is also shared and migrates, which causes no end of headaches with corrupted host_profile files. I think this is the first thing I'm going to need to address, probably with a symbolic link to a local server location, since as you said it installs to /usr/sap/hostctrl without any prompts (unless there's some way to override the default?). The hostctrl agents actually start up properly sometimes on failover, but in a lot of cases the host_profile has all the header information stripped out of it, so the host agent cannot start properly. This is a known SAP issue with shared files for host agents, but someone went ahead and did it that way anyhow.

With the shared NAS mount: if we had separate SMD instances installed from each host to that mount, would the correct agent start when the failover occurs? The runtime.properties of each would have the hostname embedded. Yes, all agents would be on a shared drive, and I realize that a single instance of an agent cannot be shared between different hosts, but if we had 4 instances they would be uniquely defined to each server in the failover environment. Is that sufficient? Or do they have to be installed as separate SIDs as indicated in the wiki (DAA, DAB, etc.)? If installed as separate SIDs, then the instance numbers could be the same within the agent SIDs.

From my POV, what the wiki needs is more of an explanation of what actually needs to be running for this process to work. Does the host agent need to be running on each host at all times? Does it start up when the failover happens? I would think that it would be running at all times on all hosts. But that isn't clear. And in our case, crucial.
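
For context, a rough way to spot the stripped-header symptom might look like the check below (a sketch only; the "expected" keys are assumed examples, and diffing against a known-good copy of your own host_profile would be better):

```python
#!/usr/bin/env python3
"""Quick sanity check for the symptom described above: after a failover the
shared host_profile sometimes comes back with its header stripped, so the
host agent won't start. Expected keys are assumed examples only."""

EXPECTED_KEYS = ["SAPSYSTEMNAME", "DIR_PROFILE"]   # placeholder entries
PROFILE = "/usr/sap/hostctrl/exe/host_profile"     # default Unix location

try:
    with open(PROFILE) as f:
        text = f.read()
except OSError as err:
    raise SystemExit(f"cannot read {PROFILE}: {err}")

missing = [k for k in EXPECTED_KEYS if k not in text]
if not text.strip():
    print("host_profile is empty - almost certainly corrupted")
elif missing:
    print(f"host_profile is missing expected entries: {missing}")
else:
    print("host_profile looks intact (by this rough check)")
```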

Former Member
0 Kudos

The hostctrl agents actually start up properly sometimes on failover, but in a lot of cases the host_profile has all the header information stripped out of it, so the host agent cannot start properly. This is a known SAP issue with shared files for host agents, but someone went ahead and did it that way anyhow.

About the SAP Host Agent installation recommendation (from help.sap.com):

In high availability (HA) environments, SAP recommends installing the SAP Host Agent locally on every cluster node (host), because the installation procedure places the SAP Host Agent files into the SAP system-independent directory path /usr/sap/hostctrl. Make sure that this path is a local file system on every host of a high availability environment. Installing the SAP Host Agent into a clustered file system is not supported.


With the shared NAS mount: if we had separate SMD instances installed from each host to that mount, would the correct agent start when the failover occurs?

In the AotF scenario you need to install a Diagnostics Agent on each physical/virtual host and enable the AotF feature. The agents must not be part of the failover scenario, so it makes sense to install them locally on each physical/virtual host. From the wiki: "Whenever a new (Logical) hostname is visible on the underlying (Physical or Virtual) host, the Diagnostics Agent (installed initially) will automatically create one additional Agent On-the-fly. However, when this Logical hostname is no longer associated with that underlying (Physical or Virtual) host, the Diagnostics Agent will stop and remove the associated Agent On-the-fly again."
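
To make the quoted behaviour concrete, here is a rough illustration (not SAP's implementation, just the general idea; the VIP names are placeholders from this thread) of how a host can tell which logical hostnames are currently bound to it:

```python
#!/usr/bin/env python3
"""Illustration of the quoted on-the-fly behaviour, not SAP's actual code:
a logical hostname's address can only be bound by a socket if that IP is
currently assigned to a local interface. VIP names are placeholders."""

import socket

LOGICAL_HOSTNAMES = ["fssxxx", "fssxxxap00"]  # placeholder VIPs

def is_local(name):
    """True if the name resolves to an address assigned to this host."""
    try:
        ip = socket.gethostbyname(name)
    except socket.gaierror:
        return False
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((ip, 0))  # binding only works for locally assigned addresses
        return True
    except OSError:
        return False
    finally:
        probe.close()

for vip in LOGICAL_HOSTNAMES:
    if is_local(vip):
        print(f"{vip}: on this host -> an on-the-fly agent would be created")
    else:
        print(f"{vip}: not on this host -> any on-the-fly agent would be removed")
```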


Or do they have to be installed as separate SIDs as indicated in the wiki (DAA, DAB, etc.)?

From comments on wiki:

Q. Hello experts, unfortunately I do not clearly understand. Could we use the same DAA SID on node A and node B? If no, why?

A. Yes, you can use the same SID for the Agents (on node A and node B).

The present proposal - to use different SIDs when installing the Diagnostics Agents on the different physical hosts that are part of a cluster group - is only a hint, to offer a way to visually distinguish, in the Agent Administration UI, the Agents On-the-fly running on node A from those running on node B. But this is not mandatory.


Does the host agent need to be running on each host at all times?  Does it start up when the failover happens? 

The SAP Host Agent is started automatically when the host is booted.

bxiv
Active Contributor
0 Kudos

The best way I have found to think of SMD and hostctrl agents is that they need to be unique per host, stay with the host, and always be running.

I would also speculate that you could change the hostctrl profile and service file to point to a different folder location, but I would also be willing to bet that it would cause problems with updates down the road, and that SAP would tell you to set it back the way they want if you call in with an issue.

Also, how does OScol look from the SAP GUI if you compare a system with hostctrl working against one where it isn't?

bernie_krause
Participant
0 Kudos

Billy - between hacking away at it on some test servers and your conversation points (and reading/rereading/reinterpreting the wiki about 100x), I think I'm starting to see the light.

I'm going to work with our Unix guys to get a local mount for the host agent - that's point 1, without which nothing else works. Should have done that a while ago; the corrupted host_profile files were getting to be a royal PITA anyhow.

Then get one SMD agent instance per host installed on our shared mount. Each one will be installed from the appropriate host with a different instance number under the same SID, but this should fulfill the requirement of uniqueness of the SMD/host relationship. The shared mount is always mounted to each host, not moved during failover, so this should work as well. The only time this would break is if the NAS completely failed, in which case we're hosed anyhow.

Once that's done I can start each host agent on each host, along with the corresponding SMD agent, and they should stay running. Hopefully. lol Thanks! (and Roman too.. 😉 )
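
Once the pieces are in place, a tiny per-host sanity check along these lines (again just a sketch; the host-to-instance mapping and paths are placeholders for our setup) could confirm each host sees what it should:

```python
#!/usr/bin/env python3
"""Sketch of a per-host sanity check for the plan above: hostctrl on a local
filesystem, and this host's dedicated SMD instance directory visible on the
shared mount. Paths and the host->instance mapping are placeholders."""

import os
import socket

INSTANCE_BY_HOST = {"hosta": "98", "hostb": "97", "hostc": "96", "hostd": "95"}
NETWORK_FS = {"nfs", "nfs4", "cifs"}

def fstype_of(path):
    """Filesystem type of the mount containing 'path' (Linux /proc/mounts)."""
    best_mp, best_type = "/", "unknown"
    with open("/proc/mounts") as f:
        for line in f:
            _, mp, fstype = line.split()[:3]
            if path.startswith(mp) and len(mp) >= len(best_mp):
                best_mp, best_type = mp, fstype
    return best_type

host = socket.gethostname().split(".")[0]
instance = INSTANCE_BY_HOST.get(host, "??")

hostctrl_local = fstype_of(os.path.realpath("/usr/sap/hostctrl")) not in NETWORK_FS
smd_dir = f"/usr/sap/SMD/J{instance}"  # assumed instance directory name
print(f"hostctrl on local filesystem: {hostctrl_local}")
print(f"{smd_dir} present on shared mount: {os.path.isdir(smd_dir)}")
```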

bxiv
Active Contributor
0 Kudos

Something to think about once you get things in working order: perhaps post a doc/blog on your experience(s) to help others demystify the management software with unique setups.

I know when I first came into monitoring I was confused between CCMS/hostctrl/sapccm4x/sapccmsr/SMD; it wasn't until E2E100 that things clicked together for me. Lots of SCN reading on the older generation of monitoring, but not a lot of info on the newer methods.

bernie_krause
Participant
0 Kudos

I had already thought about doing that, if and when I get this all fixed. So much missing information and unclear "official" direction... So many different possible configurations.

Wonder of wonders, I may even be getting a local mount for all our host agents. Will wonders never cease.