on 12-01-2015 3:34 PM
Hello Experts,
HWCCT version: 110, the latest version.
installation file: HWCCT_110_0-20011536.sar
When I run the filesystem check JSON script, it gets stuck and hangs. An error message is printed in red, as shown below.
./hwval -V -f fs-check.json
Server loop running answering on 0.0.0.0:33639
Checking SSH configuration for blades
Skipping check for myHANA
Starting up clients
Host myHANA is alive on port 39419
Running test 1/2 ...
Configuring blades for test FilesystemTest.DataVolumeIO
All clients configured
Preparing test FilesystemTest.DataVolumeIO on clients
All clients prepared
Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s
Error in resolving the additional parameter for fsperf
'parameter'
Error in resolving the additional parameter for fsperf
'parameter'
====== It was stuck here for more than 1 hour, then I pressed CTRL+C to force quit. ======
^C
Aborting.....!
Error in resolving the additional parameter for fsperf
'parameter'
Exception SystemExit: SystemExit() in <module 'threading' from '/hana/shared/hwcct/lib/Python/lib/python2.7/threading.pyc'> ignored
myHANA:/hana/shared/hwcct # Exception KeyboardInterrupt in <module 'threading' from '/hana/shared/hwcct/lib/Python/lib/python2.7/threading.pyc'> ignored
Then I checked SAP Note 2161344 (HWCCT patch note) and replaced FilesystemTest.py with the latest attached file, but I still got the same error as above.
Can you provide any insight on how to proceed? If you could share a working JSON template, it would be deeply appreciated!
Regards,
Linda Wang
Hello all,
this is a bug in the Python script.
If you do not pass any additional parameter via the JSON config file, it throws this error message.
You can ignore it, because the script runs anyway, or you can add an additional parameter with a default value, something like this:
----------------------------------------------------------------------------------------------------------------------------------------
{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 4,
"config":
{"mount":
{"servername":["/hana/data/SID"]
},
"parameter":{"async_read_submit":"off"},
"duration":"short"
},
"class": "DataVolumeIO"
},
----------------------------------------------------------------------------------------------------------------------------------------
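A minimal sketch of how that default could be added to an existing config before passing it to hwval. It assumes the file layout shown in this thread (a top-level "tests" list whose entries carry "package" and "config"); the helper name add_default_parameter is hypothetical, and whether "off" is a sensible value for async_read_submit depends on the storage under test.

```python
import json

# Assumption: default taken from the JSON fragment in the reply above.
DEFAULT_PARAMETER = {"async_read_submit": "off"}


def add_default_parameter(config):
    """Return a copy of an hwval-style config in which every FilesystemTest
    entry has a "parameter" block. Existing blocks are left untouched."""
    patched = json.loads(json.dumps(config))  # deep copy via JSON round trip
    for test in patched.get("tests", []):
        if test.get("package") != "FilesystemTest":
            continue  # only the filesystem tests are discussed in this thread
        test_config = test.setdefault("config", {})
        test_config.setdefault("parameter", dict(DEFAULT_PARAMETER))
    return patched


def patch_file(source_path, target_path):
    """Read a config file, add the default parameter, write the result."""
    with open(source_path) as handle:
        config = json.load(handle)
    with open(target_path, "w") as handle:
        json.dump(add_default_parameter(config), handle, indent=2)
```

For example, patch_file("fs-check.json", "fs-check-patched.json") would leave the original file as it is and write the patched copy next to it.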
On the other hand, the script might really run for a long time. It can run for more than 1 hour, as it creates big files and performs operations on them.
On one of my test servers it took 1.5 hours 🙂
To check whether it works in principle, you can run a small manual test with the following command (ensure that you have 1 MB of free space in the filesystem):
/hana/shared/hwcct/lib/fsperf -i random -o verbose -m all -f 1M -b 16K /tmp
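The same manual test could be wrapped in a small script that checks the free space first. This is only a sketch: it assumes a separate Python 3 interpreter (not the Python 2.7 bundled with HWCCT), the function names are hypothetical, and the fsperf path and flags are copied verbatim from the command above without interpreting them.

```python
import shutil
import subprocess

# Path as given in the command above; adjust if HWCCT is installed elsewhere.
FSPERF = "/hana/shared/hwcct/lib/fsperf"


def build_fsperf_command(target_dir, file_size="1M", block_size="16K"):
    """Assemble the manual fsperf call; flags are copied from the post."""
    return [FSPERF, "-i", "random", "-o", "verbose", "-m", "all",
            "-f", file_size, "-b", block_size, target_dir]


def has_free_space(target_dir, required_bytes=1024 * 1024):
    """True if the filesystem holding target_dir has enough free bytes."""
    return shutil.disk_usage(target_dir).free >= required_bytes


def run_smoke_test(target_dir="/tmp"):
    """Run the small manual test and return the fsperf exit code."""
    if not has_free_space(target_dir):
        raise RuntimeError("less than 1 MB free in " + target_dir)
    # Raises FileNotFoundError if fsperf is not present at FSPERF.
    return subprocess.run(build_fsperf_command(target_dir)).returncode
```

Calling run_smoke_test("/tmp") on the HANA host reproduces the command above; a quick return with exit code 0 suggests that fsperf itself works on that filesystem.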
While the real test is running via the hwval program, you can check its progress by listing the processes of the user who started hwval.
Something like: ps -fu<sidadm>
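A rough sketch of automating that check. It assumes Python 3.7 or later and that the worker processes show up with "fsperf" in their command line, which the error message and the library path in this thread suggest but do not confirm; both function names are hypothetical.

```python
import subprocess


def filter_processes(ps_output, keyword="fsperf"):
    """Return the lines of `ps -f` output that mention keyword.

    The first line is the column header and is skipped.
    """
    lines = ps_output.splitlines()
    return [line for line in lines[1:] if keyword in line]


def running_fsperf(user):
    """List the processes of `user` with `ps -fu` and keep the fsperf lines."""
    # ps returns a non-zero exit code when the user has no processes,
    # so the return code is deliberately not checked here.
    result = subprocess.run(["ps", "-fu", user],
                            capture_output=True, text=True)
    return filter_processes(result.stdout)
```

If running_fsperf("<sidadm>") (with the real user name filled in) keeps returning at least one line while hwval appears to hang, the test is probably still doing I/O rather than being stuck.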
Hope this will help you.
Regards,
Robert
Dear SAP Team,
I am facing the same issue on LoP; I downloaded the package today.
When using the hostname, I get the same error message:
Error in resolving the additional parameter for fsperf
'parameter'
And then the test does nothing, it just hangs.
When using "localhost", the script runs through in one or two seconds but does not do any IO testing.
>>>
./hwval -V -f V7000.json
Server loop running answering on 0.0.0.0:50365
Checking SSH configuration for blades
Skipping check for localhost
Starting up clients
Host localhost is alive on port 34380
Running test 1/2 ...
Configuring blades for test FilesystemTest.DataVolumeIO
All clients configured
Preparing test FilesystemTest.DataVolumeIO on clients
All clients prepared
Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s
All clients finished test FilesystemTest.DataVolumeIO.runTest
Cleaning up test FilesystemTest.DataVolumeIO on clients
All clients finished cleanup
Getting results for test FilesystemTest.DataVolumeIO on clients
All clients finished
Test Process Summary:
---------------------
Host configured prepared runTest gotresults cleanedup
localhost:34380 OK OK OK OK OK
Running test 2/2 ...
Configuring blades for test FilesystemTest.LogVolumeIO
All clients configured
Preparing test FilesystemTest.LogVolumeIO on clients
All clients prepared
Starting test FilesystemTest.LogVolumeIO.runTest on clients with timeout 0s
All clients finished test FilesystemTest.LogVolumeIO.runTest
Cleaning up test FilesystemTest.LogVolumeIO on clients
All clients finished cleanup
Getting results for test FilesystemTest.LogVolumeIO on clients
All clients finished
Test Process Summary:
---------------------
Host configured prepared runTest gotresults cleanedup
localhost:34380 OK OK OK OK OK
Stopping clients
<<<
json file:
>>>
{
"report_id":"plnx01",
"use_hdb":false,
"blades":["localhost"],
"tests": [{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 1,
"config": {"mount":{"plnx01":["/data"]
},
"duration":"short"
},
"class": "DataVolumeIO"
},
{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 2,
"config": {"mount":{"plnx01":["/log"]
},
"duration":"short"
},
"class": "LogVolumeIO"
}
]
}
<<<
Thanks, Markus Fehling@IBM
Update:
I changed the hostname to the fully qualified name,
from "plnx01" to "plnx01.isicc.de.ibm.com".
Then the error message disappeared and the test ran through, but still within one or two seconds;
no IO test is done and the report is empty.
I played around with additional parameters, but they have no effect:
"result_methods":[
"consolidateResults",
"formatResults"
]
"number": "42"
"sid": "ANA"
There is no result in the report with localhost.
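One thing that stands out in the configs posted in this thread is that the host names under "mount" do not always match the entries in "blades", and that some mount values are plain strings while others are lists. Whether hwval silently skips the I/O test in those cases is an assumption, not something confirmed here, but a small consistency check is cheap. A minimal sketch, with the hypothetical helper name check_mounts:

```python
def check_mounts(config):
    """Return warnings about blade/mount inconsistencies in an hwval-style
    config. An empty list means nothing suspicious was found."""
    warnings = []
    blades = set(config.get("blades", []))
    for test in config.get("tests", []):
        mounts = test.get("config", {}).get("mount", {})
        label = "test id {}".format(test.get("id"))
        for host, paths in mounts.items():
            if host not in blades:
                warnings.append(
                    "{}: mount host {!r} is not listed in blades".format(label, host))
            if not isinstance(paths, list):
                warnings.append(
                    "{}: mount paths for {!r} are not a list".format(label, host))
        for blade in sorted(blades - set(mounts)):
            warnings.append(
                "{}: blade {!r} has no mount entry".format(label, blade))
    return warnings
```

Applied to the V7000.json shown above, this would report for each test that "plnx01" is not listed in blades and that "localhost" has no mount entry, which may be worth ruling out before looking further.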
It worked: I did not see the parameter error again after I changed the hostname from crm03 to localhost. But I don't know why. Can anybody explain why it behaves like this?
Also, does the HWCCT check report only reflect the current system status? If I/O is slow only occasionally, should I run the storage test while the slow I/O is occurring?
/hana/shared/hwcct # ./hwval -V -f fs-check.json
Server loop running answering on 0.0.0.0:21342
Checking SSH configuration for blades
Skipping check for localhost
Starting up clients
Host localhost is alive on port 18144
Running test 1/2 ...
Configuring blades for test FilesystemTest.DataVolumeIO
All clients configured
Preparing test FilesystemTest.DataVolumeIO on clients
All clients prepared
Starting test FilesystemTest.DataVolumeIO.runTest on clients with timeout 0s
All clients finished test FilesystemTest.DataVolumeIO.runTest
Cleaning up test FilesystemTest.DataVolumeIO on clients
All clients finished cleanup
Getting results for test FilesystemTest.DataVolumeIO on clients
All clients finished
Test Process Summary:
---------------------
Host configured prepared runTest gotresults cleanedup
localhost:18144 OK OK OK OK OK
Before:
"blades":["crm03"],
"tests": [{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 1,
"config": {"mount":{"crm03":"/hana/data/"
},
"duration":"short"
},
"class": "DataVolumeIO"
},
{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 2,
"config": {"mount":{"crm03":"/hana/log/"
},
"duration":"short"
},
"class": "LogVolumeIO"
}
]
}
After:
"blades":["localhost"],
"tests": [{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 1,
"config": {"mount":{"localhost":"/hana/data/"
},
"duration":"short"
},
"class": "DataVolumeIO"
},
{
"package": "FilesystemTest",
"test_timeout": 0,
"id": 2,
"config": {"mount":{"localhost":"/hana/log/"
},
"duration":"short"
},
"class": "LogVolumeIO"
}
]
}
Thanks,
Linda Wang
Hello Linda
This issue may need an incident opened for it.
I could be wrong, but this may be a bug or an issue on the OS/hardware side of things.
Maybe someone else can clarify this?
I'm not sure whether this is an issue at DB level, but if it is, then taking 2-3 runtime dumps while the error is occurring would be a good start. Again, I can't be sure whether this is DB related or not.
KR
Michael
Hi Michael,
Thanks for the reply. I have two additional questions:
1. Where can I find the dumps or more information while the above error is occurring? What I can confirm is that the DB and other HANA services are running. Does the storage check test access the database during the process? You mentioned it could be DB related.
2. Generally, how long does the storage check take? Could one or two hours be expected in some cases due to slow I/O?
Thanks again for any comments!
BR,
Linda