The Indexed Search system extension is the engine that indexes content. It also provides a frontend plugin that lets visitors search that content and see the results. Indexed Search provides two major elements to TYPO3:
1. Indexing: An indexing engine which indexes TYPO3 pages on the fly as they are rendered by TYPO3’s frontend. Indexing a page means that all words from the page (or from specifically defined areas of the page) are registered, counted, weighted and finally inserted into a database table of words. A second table is then filled with relation records between the word table and the page.
2. Searching: A plugin you can insert on your website, which allows website users to search for information on your website. When a search is run, the plugin first checks the word table to see whether the word exists. If it does, all pages which have a relation to that word are considered for the search result display. The search results are ordered based on factors such as where on the page the word was found or the frequency of the word on the page.
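The word table and relation table described above can be pictured with a small Python sketch. This is only an illustration of the idea, not TYPO3's actual schema or weighting: the class name `ToyIndex` and its methods are hypothetical, and ranking uses word frequency alone.

```python
import re
from collections import Counter, defaultdict


class ToyIndex:
    """Illustrative word index; not TYPO3's real tables or ranking."""

    def __init__(self):
        # "Word table": maps each known word to a numeric word id.
        self.words = {}
        # "Relation table": word id -> {page id: frequency on that page}.
        self.relations = defaultdict(dict)

    def index_page(self, page_id, text):
        # Register and count every word found on the page.
        counts = Counter(re.findall(r"\w+", text.lower()))
        for word, freq in counts.items():
            word_id = self.words.setdefault(word, len(self.words) + 1)
            self.relations[word_id][page_id] = freq

    def search(self, word):
        # Look the word up first; unknown words give no results.
        word_id = self.words.get(word.lower())
        if word_id is None:
            return []
        pages = self.relations[word_id]
        # Most frequent first, page id as a tie-breaker.
        return sorted(pages, key=lambda page: (-pages[page], page))
```

A page that mentions the search word more often is listed earlier. The real engine also takes into account where on the page the word was found, which this sketch leaves out.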
This article gives step-by-step instructions on how to install and configure these extensions so that your TYPO3 content is indexed efficiently.
Configuring Server
If you want to index external documents referenced on your web pages in addition to standard text elements, you will have to make sure you have properly installed a few third-party binaries:
Configuring Indexed Search and Crawler
Log in to the TYPO3 backend, go to Admin Tools > Extension Manager and find the extension Indexed Search Engine. Go to the extension's configuration section. Make sure the paths to the PDF parsers, unzip, WORD parser, EXCEL parser, POWERPOINT parser and RTF parser all contain /usr/bin/
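Before entering /usr/bin/ in the extension configuration, it can help to confirm that the parser binaries are really there. The following Python sketch does that check. The tool names in `ASSUMED_PARSERS` are assumptions based on commonly used parser utilities, and the function name is hypothetical. Replace the list with the binaries you actually installed.

```python
import os

# Assumed tool names; adjust this list to match your own installation.
ASSUMED_PARSERS = ["pdftotext", "pdfinfo", "unzip", "catdoc", "xlhtml", "ppthtml", "unrtf"]


def missing_parsers(directory="/usr/bin/", names=ASSUMED_PARSERS):
    """Return the names that are not executable files inside `directory`."""
    missing = []
    for name in names:
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_parsers():
        print(f"missing: {name}")
```

If the script prints nothing, every binary in the list was found in the directory.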
Make sure that indexing of content is not performed automatically when a page is shown in the frontend, and enable the option to use the crawler to index external files.
The crawler requires a backend user named _cli_crawler. Go to SYSTEM > Backend Users and create this backend user with a random password. This user must not be an administrator and should not be a member of any backend user group.
TypoScript Setup
Open your TypoScript template and add the following lines.
config.index_enable = 1
config.index_externals = 1
How the Crawler Works
The crawler mainly performs two jobs:
1. Generate the URLs of the pages to be processed (with any GET parameter required, e.g., “L” for language or “tx_ttnews[tt_news]” to show the details of a tt_news record) and enqueue them for processing by the other job;
2. Process the queue of URLs and take the appropriate action (in our case, invoke Indexed Search to index the page or the document).
When generating URLs, the crawler is automatically able to crawl your website and enqueue the different pages (with /index.php?id=…). But if your site is multilingual, you will have to tell it to generate variations for each and every page (with /index.php?id=…&L=0 and /index.php?id=…&L=1, for instance). When a link to a document is encountered while indexing the content of a page, Indexed Search will not index it right away but will instead add it to the queue of pages and documents to be indexed (because the option “Use crawler extension to index external files” was ticked in the Indexed Search configuration).
Crawler Configuration
We can start with a basic crawler configuration which allows the whole page tree to be indexed.
Step 1: Go to Web > List
Step 2: Select the root page of your site
Step 3: Create a new record of type “Crawler Configuration”, which is under the section “Site Crawler”
Now we can use this configuration to index our website. Configure the scheduler to run different crawler tasks.
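As an illustration of the crawler's two jobs described in this section, here is a minimal Python sketch. It is not the crawler extension's code. The function names are hypothetical, and the handler stands in for whatever action is taken on each URL (in our case, indexing).

```python
from collections import deque
from urllib.parse import urlencode


def generate_urls(page_ids, languages=(0,)):
    """Job 1: build one URL per page and per language variation."""
    urls = []
    for page_id in page_ids:
        for lang in languages:
            urls.append("/index.php?" + urlencode({"id": page_id, "L": lang}))
    return urls


def process_queue(queue, handler):
    """Job 2: take URLs off the queue until it is empty.

    The handler returns any newly discovered URLs (for example links to
    documents). They are appended to the queue rather than handled at once.
    """
    processed = []
    seen = set(queue)
    while queue:
        url = queue.popleft()
        for found in handler(url):
            if found not in seen:
                seen.add(found)
                queue.append(found)
        processed.append(url)
    return processed
```

Calling `generate_urls([1, 2], languages=(0, 1))` yields four URLs, two per page. Passing a `deque` of those URLs to `process_queue` mirrors how a document link found on a page is indexed later, in its own queue entry, rather than immediately.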
Adding Search Plugin To a Page
Select the page in which you want to integrate the search option. Create a new content element and, under the ‘Plugin’ tab, select ‘Indexed Search’.
Indexing News Articles
Suppose we have a latest-news list section which contains news teasers and a link to the detail page. The detail page contains a tt_news plugin whose output mode is SINGLE. As such, this plugin expects a GET parameter in the URL: &tx_ttnews[tt_news]=(id)
Our test configuration:
We want the crawler to dynamically generate a list of URLs with the additional tx_ttnews[tt_news] parameter when it crawls page #35.
Crawler Configuration
We are creating this configuration for the subtree of page #35.
Step 1: Go to the TYPO3 backend, Web > List
Step 2: Click on page #35
Step 3: Create a new record of type “Crawler Configuration”, which is under the section “Site Crawler”
The _TABLE field in the configuration defines the lookup table (tt_news here), and _PID defines the ID of the news storage folder (#19 here). While creating the crawler configuration, tick “Append cHash”. Otherwise you will end up with the first news item being indexed N times because of TYPO3's caching mechanism.
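The effect of the lookup and of the cache hash can be sketched in Python. This is a toy model under stated assumptions, not how the crawler or TYPO3's cache is implemented. The news uids are made up, the function names are hypothetical, and the "cache" is a plain dictionary that shows only why cached output has to vary with the URL parameters.

```python
from urllib.parse import parse_qs, urlencode


def news_detail_urls(page_id, news_uids):
    """Build one detail-page URL per news record uid."""
    return [
        "/index.php?" + urlencode({"id": page_id, "tx_ttnews[tt_news]": uid})
        for uid in news_uids
    ]


def render_cached(cache, url, vary_on_params):
    """Toy page cache keyed by page id, optionally also by the query string."""
    query = url.partition("?")[2]
    page_id = parse_qs(query)["id"][0]
    # Without parameter-dependent keys, every URL of the page shares one entry.
    key = (page_id, query) if vary_on_params else page_id
    return cache.setdefault(key, f"content for {url}")
```

With `vary_on_params=False`, every news URL for page 35 returns the content cached for the first one, which is the "N times the first news" problem. With `vary_on_params=True`, each URL gets its own entry, which is the role the appended cHash plays.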