I have to index a repository of 20,000 documents. Indexing the first 1,000 documents took 5 hours. If it continues at this rate, the whole run will take about 100 hours.
What could be causing such poor performance? The portal and TREX run on different servers.
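For reference, the estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the indexing rate stays constant, and the function name `estimate_indexing` is made up for illustration.

```python
def estimate_indexing(total_docs, docs_done, hours_elapsed):
    """Extrapolate total and remaining indexing time from progress so far.

    Assumes a constant indexing rate, which may not hold in practice.
    """
    rate = docs_done / hours_elapsed           # documents per hour
    total_hours = total_docs / rate            # time for the whole repository
    remaining_hours = total_hours - hours_elapsed
    seconds_per_doc = 3600 / rate              # average time spent per document
    return rate, total_hours, remaining_hours, seconds_per_doc


# Figures from the post: 1,000 of 20,000 documents indexed in 5 hours.
rate, total_hours, remaining_hours, seconds_per_doc = estimate_indexing(20000, 1000, 5)
print(f"{rate:.0f} docs/hour, {seconds_per_doc:.0f} s per document")
print(f"{total_hours:.0f} hours in total, {remaining_hours:.0f} hours remaining")
```

At 200 documents per hour, each document takes about 18 seconds on average. That is long enough to suggest the time goes into waiting and retrying rather than into the indexing itself.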
In the portal server's trace file I can see the following message:
#1#com.sap.engine.services.servlets_jsp.server.exceptions.WebIllegalStateException: The stream has already been taken by method [getOutputStream()].
In the TREX admin's trace I can see several messages like:
 2008-10-16 19:23:47.312 e preprocessor Preprocessor.cpp(00941) : HTTP-GET failed for URL http:// <file name> with Errorcode -5 , but HTTP-HEAD worked, trying again
 2008-10-16 19:23:47.421 e HTTPData Preprocessor.cpp(04944) : HTTPGET: Stop retries after 5 rounds, skipping
 2008-10-16 19:23:47.421 e preprocessor Preprocessor.cpp(00951) : HTTPHEAD failed for URL http:// : <file name> Errorcode -5 , Message Reader::readHeaderSkip100 failed, url=http://<file name>
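To see how many documents are affected, the failures in the trace can be tallied. The sketch below is a rough illustration in Python, not a TREX tool: the regular expression and the "Stop retries" check are assumptions based only on the lines quoted above, and a real trace file may use other message variants. The name `count_failures` is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the trace excerpt; matches "HTTP-GET failed for URL"
# and "HTTPHEAD failed for URL" (with or without the hyphen).
FAIL_RE = re.compile(r"(HTTP-?(?:GET|HEAD)) failed for URL")


def count_failures(lines):
    """Count GET failures, HEAD failures, and skipped documents in trace lines."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            # Normalise "HTTP-GET" and "HTTPGET" to one key.
            counts[match.group(1).replace("-", "")] += 1
        elif "Stop retries" in line:
            counts["skipped"] += 1
    return counts


# The trace lines quoted in the post, used here as sample input.
sample = [
    " 2008-10-16 19:23:47.312 e preprocessor Preprocessor.cpp(00941) : HTTP-GET failed for URL http:// <file name>",
    "with Errorcode -5 , but HTTP-HEAD worked, trying again",
    " 2008-10-16 19:23:47.421 e HTTPData Preprocessor.cpp(04944) : HTTPGET: Stop retries after 5 rounds, skipping",
    " 2008-10-16 19:23:47.421 e preprocessor Preprocessor.cpp(00951) : HTTPHEAD failed for URL http:// : <file name>",
    "Errorcode -5 , Message Reader::readHeaderSkip100 failed, url=http://<file name>",
]
counts = count_failures(sample)
print(dict(counts))
```

On a real trace, the function could be fed the lines of the trace file instead of `sample`. If the "skipped" count is close to the number of documents processed, most of the 5 hours was probably spent on retries.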
The TREX server has 16GB RAM.
What can be done to improve the performance?
Thanks and Regards,
Raymond HENG replied
I'm not sure whether it will help, but the guide below gives recommendations for configuring search and classification (using TREX 6.1) for efficient indexing. It covers fast initial indexing of large data sets, fast updating of indexes, and fast index replication in distributed TREX systems.
1) How to Configure TREX 6.1 for Efficient Indexing
Hope that helps.