
HWCCT hardware check for storage hangs

Former Member
0 Kudos

Hello Experts,

HWCCT version: 110, the latest version.

installation file: HWCCT_110_0-20011536.sar

When I run the filesystem check JSON script, it gets stuck and hangs. An error message is printed in red, as shown below.

./hwval -V -f fs-check.json

Server loop running answering on 0.0.0.0:33639

Checking SSH configuration for blades

Skipping check for myHANA

Starting up clients

Host myHANA is alive on port 39419

Running test 1/2 ...

Configuring blades for test FilesystemTest.DataVolumeIO

All clients configured

Preparing test FilesystemTest.DataVolumeIO on clients

All clients prepared

Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s

Error in resolving the additional parameter for fsperf

'parameter'

Error in resolving the additional parameter for fsperf

'parameter'

====== It was stuck here for more than 1 hour, then I pressed CTRL+C to force quit. ======

^C

Aborting.....!

Error in resolving the additional parameter for fsperf

'parameter'

Exception SystemExit: SystemExit() in <module 'threading' from '/hana/shared/hwcct/lib/Python/lib/python2.7/threading.pyc'> ignored

myHANA:/hana/shared/hwcct # Exception KeyboardInterrupt in <module 'threading' from '/hana/shared/hwcct/lib/Python/lib/python2.7/threading.pyc'> ignored


Then I checked SAP Note 2161344 (HWCCT patch note) and replaced FilesystemTest.py with the latest attached file as well, but I still got the same error as above.


Can you provide any insight on how to move further? If you can share a working JSON template, it would be deeply appreciated!


Regards,


Linda Wang

Accepted Solutions (1)


0 Kudos

Hello all,

this is a bug in the Python script.

If you do not pass any additional parameter via the JSON config file, it throws this error message.

You can ignore it, because the script runs anyway, or you can add an additional parameter with a default value, something like this:

----------------------------------------------------------------------------------------------------------------------------------------

   {
   "package": "FilesystemTest",
   "test_timeout": 0,
   "id": 4,
   "config":
      {"mount":
         {"servername":["/hana/data/SID"]
         },
       "parameter":{"async_read_submit":"off"},
       "duration":"short"
      },
   "class": "DataVolumeIO"
   },

----------------------------------------------------------------------------------------------------------------------------------------
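If you have several tests in one config file, the same workaround can be applied in one go. The following is only a rough sketch, assuming Python 3 and a config laid out like the examples in this thread; the helper name add_default_parameter is made up for illustration, and the default value is simply the one from the snippet above.

```python
import copy
import json

# Default taken from the example above; adjust to whatever you actually need.
DEFAULT_PARAMETER = {"async_read_submit": "off"}


def add_default_parameter(config):
    """Return a copy of an hwval config dict in which every FilesystemTest
    entry has a "parameter" object inside its "config" section."""
    patched = copy.deepcopy(config)
    for test in patched.get("tests", []):
        if test.get("package") != "FilesystemTest":
            continue
        # setdefault leaves an existing "parameter" entry untouched.
        test.setdefault("config", {}).setdefault("parameter", dict(DEFAULT_PARAMETER))
    return patched


# Example input shaped like the test entry above ("servername" is a placeholder).
example = {
    "blades": ["servername"],
    "tests": [
        {
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 4,
            "config": {"mount": {"servername": ["/hana/data/SID"]}, "duration": "short"},
            "class": "DataVolumeIO",
        }
    ],
}

print(json.dumps(add_default_parameter(example), indent=2))
```

The function works on a copy, so the original config stays unchanged until you decide to write the patched version back to the file.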

On the other hand, the script really can run for a very long time. It may take more than 1 hour, as it creates big files and performs a number of operations on them.

On one of my test servers it took 1.5 hours 🙂

To see whether it runs in principle, you can do a small manual test with the following command (ensure that you have 1 MB of free space in the filesystem):

/hana/shared/hwcct/lib/fsperf -i random -o verbose -m all -f 1M -b 16K /tmp
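If you want to script this small test, a sketch along the following lines might help. It assumes Python 3; the fsperf path and options are copied from the command above, while the helper names has_free_space and fsperf_command are my own and not part of HWCCT.

```python
import shutil

# Path and options copied from the manual command above.
FSPERF = "/hana/shared/hwcct/lib/fsperf"


def has_free_space(path, needed_bytes=1024 * 1024):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


def fsperf_command(target_dir):
    """Build the argument list for the small manual fsperf test."""
    return [FSPERF, "-i", "random", "-o", "verbose", "-m", "all",
            "-f", "1M", "-b", "16K", target_dir]


target = "/tmp"
print(" ".join(fsperf_command(target)))
```

Before starting the run you would call has_free_space(target) and then pass the list from fsperf_command to subprocess.run; the sketch only prints the command so that nothing is executed by accident.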

When running the real test with the hwval program, you can follow the execution by checking the processes running under the user that executes hwval.

Something like: ps -fu<sidadm>
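To avoid reading the whole process list by eye, the ps output can be filtered for fsperf. This is a minimal sketch, assuming Python 3 and a Linux ps; the function names and the sample process lines are invented for illustration and are not real HWCCT output.

```python
import subprocess


def find_fsperf_lines(ps_output):
    """Return the lines of `ps -f` style output that mention fsperf."""
    lines = ps_output.splitlines()
    # Skip the header line, keep only processes whose line contains "fsperf".
    return [line for line in lines[1:] if "fsperf" in line]


def running_fsperf(user):
    """Run `ps -f -u <user>` and return the fsperf lines, if any."""
    result = subprocess.run(["ps", "-f", "-u", user],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return find_fsperf_lines(result.stdout)


# Made-up sample output, only to show the filtering.
SAMPLE = """UID        PID  PPID  C STIME TTY          TIME CMD
root      4100  4000  0 10:00 pts/0    00:00:01 ./hwval -V -f fs-check.json
root      4242  4100 97 10:01 pts/0    00:12:31 /hana/shared/hwcct/lib/fsperf
"""

for line in find_fsperf_lines(SAMPLE):
    print(line)
```

On a real system you would call running_fsperf with the user that started hwval; an empty list while hwval still appears to be running would suggest that no I/O test is actually in progress.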

Hope this will help you.

Regards,

Robert

Former Member
0 Kudos

Hi Robert,

You are correct!! Thanks for sharing with us and it helps a lot.

Regards,

Linda

Answers (4)


markus_fehling
Discoverer
0 Kudos

Dear  SAP Team,

I am facing the same issue on LoP; I downloaded the package today.

When I use the hostname, I get the same error message:

Error in resolving the additional parameter for fsperf

'parameter'

And then the test does nothing ... hangs ....

By using "localhost", the script runs through in one or two seconds, but does not do any IO testing ....

>>>

./hwval -V -f V7000.json

Server loop running answering on 0.0.0.0:50365

Checking SSH configuration for blades

Skipping check for localhost

Starting up clients

Host localhost is alive on port 34380

Running test 1/2 ...

Configuring blades for test FilesystemTest.DataVolumeIO

All clients configured

Preparing test FilesystemTest.DataVolumeIO on clients

All clients prepared

Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s

All clients finished test FilesystemTest.DataVolumeIO.runTest

Cleaning up test FilesystemTest.DataVolumeIO on clients

All clients finished cleanup

Getting results for test FilesystemTest.DataVolumeIO on clients

All clients finished

Test Process Summary:

---------------------

Host                         configured       prepared         runTest       gotresults       cleanedup

localhost:34380                  OK              OK              OK              OK              OK

Running test 2/2 ...

Configuring blades for test FilesystemTest.LogVolumeIO

All clients configured

Preparing test FilesystemTest.LogVolumeIO on clients

All clients prepared

Starting test FilesystemTest.LogVolumeIO.runTest on clients with timeout 0s

All clients finished test FilesystemTest.LogVolumeIO.runTest

Cleaning up test FilesystemTest.LogVolumeIO on clients

All clients finished cleanup

Getting results for test FilesystemTest.LogVolumeIO on clients

All clients finished

Test Process Summary:

---------------------

Host                         configured       prepared         runTest       gotresults       cleanedup

localhost:34380                  OK              OK              OK              OK              OK

Stopping clients

<<<

json file:

>>>

{
"report_id":"plnx01",
"use_hdb":false,
"blades":["localhost"],
"tests": [{
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 1,
            "config": {"mount":{"plnx01":["/data"]
                                },
                        "duration":"short"
                      },
            "class": "DataVolumeIO"
        },
        {
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 2,
            "config": {"mount":{"plnx01":["/log"]
                                },
                        "duration":"short"
                      },
            "class": "LogVolumeIO"
        }
]
}

<<<
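One thing that may be worth checking in a config like this is whether the host names used under "mount" match the entries in "blades". Whether hwval actually requires them to match is an assumption on my side and is not confirmed anywhere in this thread. The sketch below (Python 3, helper name check_mount_hosts invented for illustration) only reports such differences, plus mount values that are not written as a list as in the accepted answer.

```python
def check_mount_hosts(config):
    """Report mount entries that look inconsistent with the "blades" list.

    Assumption: each host key under "mount" should also appear in "blades",
    and the mount paths should be given as a list.
    """
    blades = set(config.get("blades", []))
    problems = []
    for test in config.get("tests", []):
        mount = test.get("config", {}).get("mount", {})
        for host, paths in mount.items():
            if host not in blades:
                problems.append('test id %s: mount host "%s" is not in "blades"'
                                % (test.get("id"), host))
            if not isinstance(paths, list):
                problems.append('test id %s: mount paths for "%s" are not a list'
                                % (test.get("id"), host))
    return problems


# The config from this post, reduced to the relevant keys.
posted_config = {
    "blades": ["localhost"],
    "tests": [
        {"id": 1, "config": {"mount": {"plnx01": ["/data"]}, "duration": "short"}},
        {"id": 2, "config": {"mount": {"plnx01": ["/log"]}, "duration": "short"}},
    ],
}

for problem in check_mount_hosts(posted_config):
    print(problem)
```

An empty result does not prove the config is valid; it only means these two simple checks found nothing.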

Thanks, Markus Fehling@IBM

markus_fehling
Discoverer
0 Kudos

Update:

I changed the hostname to the fully qualified name,

from "plnx01" to "plnx01.isicc.de.ibm.com",

then the error message disappeared and the test ran through, but still within one or two seconds;

no I/O test is done, and the report is empty.

I played around with additional parameters, but they have no effect:

"result_methods":[

   "consolidateResults",

   "formatResults"

]

"number": "42"

"sid": "ANA"

Former Member
0 Kudos

This message was moderated.

Former Member
0 Kudos

There is no result in the report with localhost.

Former Member
0 Kudos

It worked: I did not see the parameter error again after I changed the hostname from crm03 to localhost. But I don't know why. Can anybody help explain why it works like this?


Also, I want to know: does this HWCCT check report only reflect the current system status? Say the I/O is slow only sometimes; should I run the storage test while the slow I/O is occurring?

/hana/shared/hwcct # ./hwval -V -f fs-check.json

Server loop running answering on 0.0.0.0:21342

Checking SSH configuration for blades

Skipping check for localhost

Starting up clients

Host localhost is alive on port 18144

Running test 1/2 ...

Configuring blades for test FilesystemTest.DataVolumeIO

All clients configured

Preparing test FilesystemTest.DataVolumeIO on clients

All clients prepared

Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s

All clients finished test FilesystemTest.DataVolumeIO.runTest

Cleaning up test FilesystemTest.DataVolumeIO on clients

All clients finished cleanup

Getting results for test FilesystemTest.DataVolumeIO on clients

All clients finished

Test Process Summary:

---------------------

Host                         configured       prepared         runTest       gotresults       cleanedup  

localhost:18144                  OK              OK              OK              OK              OK      

Before:

"blades":["crm03"],
"tests": [{
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 1,
            "config": {"mount":{"crm03":"/hana/data/"
                                },
                        "duration":"short"
                      },
            "class": "DataVolumeIO"
        },
        {
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 2,
            "config": {"mount":{"crm03":"/hana/log/"
                                },
                        "duration":"short"
                      },
            "class": "LogVolumeIO"
        }
]
}

After:

"blades":["localhost"],
"tests": [{
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 1,
            "config": {"mount":{"localhost":"/hana/data/"
                                },
                        "duration":"short"
                      },
            "class": "DataVolumeIO"
        },
        {
            "package": "FilesystemTest",
            "test_timeout": 0,
            "id": 2,
            "config": {"mount":{"localhost":"/hana/log/"
                                },
                        "duration":"short"
                      },
            "class": "LogVolumeIO"
        }
]
}

Thanks,

Linda Wang

former_member183326
Active Contributor
0 Kudos

Hello Linda

This issue may need an incident opened for it.

I could be wrong but this may be a bug or an issue on the OS/Hardware side of things.

Maybe someone else can clarify this?

I'm not sure if this is an issue at DB level, but if it is, then taking 2-3 runtime dumps while the error is occurring would be a good start. Again, though, I can't be sure whether this is DB related or not.

KR

Michael

Former Member
0 Kudos

Hi Michael,

Thanks for the reply. I have two additional questions:

1. Where can I find the dumps or more information while the above error is occurring? What I can confirm is that the DB and the other HANA services are running. May I ask, does the storage check access the database during the test? You mentioned it could be DB related.

2. Generally, how long does the storage check take? Could one or two hours be expected in some cases due to slow I/O?

Thanks again for any comments!

BR,

Linda