What are crawlers (task/process) and how do they take part during indexing?

Former Member

Hi,

I'm new to the TREX search engine. I want to know what crawler tasks are, and how and when they perform their duties after a data source is assigned to a newly created index. In other words, how are they related to search indexes?

Regards

Nitin Mishra

Accepted Solutions (1)

Former Member

Hi,

The crawler service allows crawlers to collect resources located in internal or external repositories, for example, for indexing purposes. A crawler returns the resources and the hierarchical or net-like structures of the respective repositories.

Services and applications that need repositories to be crawled (for example, the index management service) request a crawler from the crawler service.
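A rough sketch of that request flow may help. The interfaces below are invented purely for illustration (they are not the real KM/com.sapportals.wcm API), but they show the division of labour: a consumer such as the index management service asks the crawler service for a crawler, the crawler walks the repository, and every resource it returns is handed on for indexing.

// Illustration only: all type and method names here are made up, not the actual KM API.
interface ResourceHandler {
    void handleResource(String resourceUri);                  // e.g. hand the URI on to TREX
}

interface Crawler {
    void crawl(String startFolder, ResourceHandler handler);  // walk the repository structure
}

interface CrawlerService {
    Crawler getCrawler(String crawlerProfile);                // consumers request a crawler here
}

class IndexUpdater {
    // Conceptually what the index management service does: request a crawler,
    // let it walk the repository, and forward every resource it returns for indexing.
    void updateIndex(CrawlerService crawlerService) {
        Crawler crawler = crawlerService.getCrawler("standard");
        crawler.crawl("/documents", uri -> System.out.println("index: " + uri));
    }
}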

http://help.sap.com/saphelp_nw2004s/helpdata/en/fb/38ef207d0a47ee9dc08deeed855392/frameset.htm

Patricio.

Answers (2)


Hi Nitin,

Let me add an important architectural fact:

Architecturally, the Crawler Service is part of KM. Thus, its Java processes run on your Portal Server.

Architecturally, the Crawler Service is not part of the TREX engine. The TREX engine merely receives the lists of objects to be indexed from the Crawler Service of KM through the Index Management Service of KM (IMS).

Regards, Karsten

Former Member

Hi,

The crawler only searches Web sites or parts of Web sites that are not protected by robot instructions. Robot instructions are part of the Internet standards; they allow Web site owners to permit or forbid the crawling of their sites or parts thereof.
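For illustration, robot instructions are usually given in a robots.txt file at the root of the Web site (this is the generic robots exclusion standard, not anything SAP-specific). A file such as

User-agent: *
Disallow: /internal/

forbids all crawlers from the /internal/ area of the site, while an empty Disallow line would permit crawling of the whole site.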

Depending on the type of repository, you may have to set up a crawler and a schedule.

For Web repositories, the index is updated using a crawler. If you are assigning a Web repository to an index for the first time, it is indexed immediately. You then need to regularly schedule the crawler so that the index is updated.

For hierarchical repositories, the index is updated by using events. Therefore, it is not absolutely necessary that the crawler be started at regular intervals. However, you can start the crawler at regular intervals in order to make changes in the index for which no event is triggered. This can be the case if documents have been created, changed, or deleted directly in the file system without using Knowledge Management.
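A small sketch of the two update paths just described (all names are invented for illustration, not actual KM classes):

// Illustration only: hypothetical names, not the real KM API.
class IndexUpdatePaths {

    // Hierarchical repositories: KM raises an event when a document changes,
    // and only that resource needs to be re-indexed.
    void onRepositoryEvent(String changedResourceUri) {
        reindex(changedResourceUri);
    }

    // Scheduled crawl: also picks up changes made directly in the file system,
    // for which no KM event was ever raised.
    void scheduledCrawl(Iterable<String> allResourceUris) {
        for (String uri : allResourceUris) {
            reindex(uri);   // re-check everything the crawler returns
        }
    }

    void reindex(String uri) {
        System.out.println("update index entry for " + uri);
    }
}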

Regards,

Ganesh N