site crawler
1.3. Tutorial
1.3.1. Configuration of crawler
When the crawler extension is installed, its backend administration can be accessed through the Web > Info module.

This view lists the entries that can be submitted for crawling. Initially there are none, because you first need to configure how the page tree is crawled. This is done using Page TSconfig. For the page "Testsite", enter this configuration in the TSconfig field:
tx_crawler.crawlerCfg.paramSets {
  language = &L=[|_TABLE:pages_language_overlay;_FIELD:sys_language_uid]
  language.procInstrFilter = tx_indexedsearch_reindex, tx_cachemgm_recache
  language.baseUrl = http://localhost:8888/typo3/dummy_4.0/

  mininews = &L=[|_TABLE:pages_language_overlay;_FIELD:sys_language_uid]&tx_mininews_pi1[showUid]=[_TABLE:tx_mininews_news]
  mininews.procInstrFilter = tx_indexedsearch_reindex
  mininews.cHash = 1
  mininews.baseUrl = http://localhost:8888/typo3/dummy_4.0/
}
This code contains two "sets", namely "language" and "mininews". The result is displayed in the "Start Crawling" screen:
Each set describes variations of the URL for each page. The "language" set checks whether a page has translations and, if so, offers to visit the page both with and without the L GET variable.
The "mininews" set works the same way: it looks up mininews items on the page and, if any are found, generates one URL to be crawled for each mininews item that exists. This is combined with the L parameter, so each news display is visited once for each language.
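The way a parameter set multiplies into URLs can be illustrated with a small Python sketch. This is not the crawler's own code; it only mimics the idea that every value found for one parameter is combined with every value found for the others. The function name `expand_param_set`, the `?id=` prefix, and the sample uids are assumptions made for illustration.

```python
from itertools import product


def expand_param_set(page_id, param_values):
    """Return one query string per combination of parameter values.

    param_values maps a GET parameter name to the list of values found
    for the page (for example language uids or mininews record uids).
    """
    names = list(param_values)
    urls = []
    # product() yields every combination of one value per parameter.
    for combo in product(*(param_values[name] for name in names)):
        query = "".join(f"&{name}={value}" for name, value in zip(names, combo))
        urls.append(f"?id={page_id}{query}")
    return urls


# Hypothetical data: page 5 has the blank default value plus one
# translation (uid 1) for L, and two mininews records (uids 10 and 11).
urls = expand_param_set(
    5,
    {"L": ["", 1], "tx_mininews_pi1[showUid]": [10, 11]},
)
for url in urls:
    print(url)
```

With two language values and two news records, the sketch yields four URLs, which matches the "one visit per news item per language" behaviour described above.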
In addition, you can set the "baseUrl" for the request and specify whether a cHash value should be calculated (for "mininews" this is necessary for the page to be indexed or cached).
Summary: The configuration describes which parameter variations each page should be visited with during crawling. In other words, it defines the URLs to visit.
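As a rough illustration of what a base URL contributes, the following Python sketch joins the "baseUrl" value from the configuration with a relative request. How the crawler itself combines the two is not described here; the helper name `absolute_url` and the relative path `index.php?id=5&L=1` are assumptions, and the joining behaviour shown is that of Python's `urljoin`.

```python
from urllib.parse import urljoin


def absolute_url(base_url, relative):
    """Resolve a relative request against the configured base URL."""
    return urljoin(base_url, relative)


# baseUrl taken from the configuration above; the relative part is hypothetical.
full = absolute_url("http://localhost:8888/typo3/dummy_4.0/", "index.php?id=5&L=1")
print(full)

# Without the trailing slash, urljoin replaces the last path segment.
short = absolute_url("http://localhost:8888/typo3/dummy_4.0", "index.php?id=5&L=1")
print(short)
```

In this sketch the trailing slash on the base URL matters: with it, the relative path is appended below `dummy_4.0/`; without it, the last path segment is replaced.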
