SQL30081N connection errors when DNS server goes off-line
We currently have a strange problem with some (but not all) database connections when one of our DNS servers goes offline.
We have multiple DNS servers configured in the /etc/resolv.conf file, and at least one of these servers is always available to the AIX server and SAP system. If we take down a single DNS server for maintenance, the following errors start to occur in some of our SAP systems:
Database error -30081 requires database administrator to intervene
Database error -30081 at INS access to table SWT_LOGCAT
> SQL30081N A communication error has been detected.
> Communication protocol being used: "TCP/IP". Communication
> API being used: "SOCKETS". Location where the error was
> detected: "10.26.4.41". Communication function detecting the
> error: "selectForRecvTimeout". Protocol specific error
> code(s): "78", "", "". SQLSTATE=08001
Only SAP systems with a patch level equal to or greater than Kernel 7.01 patch 55 (DBSL patch 53) get the problem. All other systems (on the same AIX host!) which have 7.00 kernels, and even one with a 7.01 patch 23 kernel, do not get the problem at all.
To make matters more confusing, we adjusted the /etc/resolv.conf file on the production server so that it no longer referenced the DNS server that needed maintenance. Out of four systems on that host, the one running a 7.01 patch 55 kernel continued to have SQL30081 errors while the DNS host was down, until the SAP system was restarted. So it took an SAP restart to pick up an OS DNS change....
We don't believe there is an underlying DNS problem, since the problem only occurs with kernels at or above 7.01 patch 55. I'm wondering if anyone else has had similar problems, or happens to have a similar mix of kernels and might be able to try to replicate it.
DB2 V9.5 FP5
Kernel 7.01 patch 55 and above = bad
Frank-Martin Haas replied
The SQL30081N error is caused by a timeout of a connection request. We introduced a default timeout for connection requests in a DBSL patch (see note 1364372).
The main reason for this patch was to prevent a work process from being blocked for a long time while trying to open a connection to a server that may currently be unreachable. This has caused trouble, for example, in Solution Manager systems that monitor a large number of remote databases.
The default timeout currently implemented is 5 seconds. I have to admit that this setting has been too aggressive, since we have seen a number of customer messages reporting SQL30081 errors in the past. Therefore we are currently considering changing the default to 20-30 seconds in one of the next DBSL patches.
If your DNS server goes offline, a connection request may take longer than 5 seconds because the DB2 client is not able to resolve the database host name during this time. You can work around your problem by increasing the connection timeout via the SAP profile variable (see note 1364372).
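To check whether name resolution alone can eat up the 5-second budget while a DNS server is down, a small timing script run on the application server may help. This is only a rough sketch and not part of the DBSL or the DB2 client: the helper names `time_lookup` and `exceeds_timeout` are made up for illustration, and `localhost` is a placeholder that you would replace with your database host name.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Time a single name lookup of `host`.

    Returns (elapsed_seconds, error); error is None if the lookup succeeded.
    `resolver` is injectable so the helper can be exercised without a network.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.monotonic() - start, error


def exceeds_timeout(elapsed, timeout_s=5.0):
    """True if the lookup alone took longer than the connection timeout."""
    return elapsed > timeout_s


# "localhost" is a placeholder; substitute the database host name here.
elapsed, error = time_lookup("localhost")
print(f"lookup took {elapsed:.3f}s, error={error}, "
      f"over 5s timeout: {exceeds_timeout(elapsed)}")
```

Running this once with all DNS servers up and once during the maintenance window should show whether the lookup time moves past 5 seconds, and whether a larger value such as 30 seconds would leave enough headroom.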
You may try 30 seconds first. Let me know if this helps.
P.S.: Feedback from other customers regarding the connection timeout is also welcome to determine a good default value.