The system extension Indexed Search is the engine that actually indexes content and provides a frontend plugin that lets visitors search for content and displays the results. The indexing search engine provides two major elements to TYPO3:
1. Indexing: an indexing engine which indexes TYPO3 pages on the fly as they are rendered by TYPO3's frontend. Indexing a page means that all words from the page (or from specifically defined areas on the page) are registered, counted, weighted and finally inserted into a database table of words. Another table is then filled with relation records between the word table and the page.
2. Searching: a plugin you can insert on your website which allows visitors to search for information on your site. When searching, the plugin first looks in the word table to see whether the word exists; if it does, all pages which have a relation to that word are considered for the search result display. The search results are ordered based on factors such as where on the page the word was found and how often it occurs there.
This article gives you step-by-step instructions on how to install and configure these extensions to index your TYPO3 content efficiently.
Configuring the Server
If you want to index external documents referenced on your web pages in addition to standard text elements, you have to make sure a few third-party binaries are properly installed:
Configuring Indexed Search and Crawler
Log in to the TYPO3 backend, go to Admin Tools > Extension Manager and find the extension Indexed Search Engine. Open its configuration section and make sure the paths to the PDF parser, unzip, Word parser, Excel parser, PowerPoint parser and RTF parser all contain /usr/bin/.
Also make sure that "Indexing of content is not performed automatically when showing a page in the frontend" is enabled and that "Use crawler extension to index external files" is ticked.
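Depending on the TYPO3 version, these Extension Manager settings boil down to plain key/value pairs in the extension configuration of indexed_search. The snippet below is only a sketch: the path values mirror the instructions above, but the key names (taken from the classic configuration template) are assumptions you should verify against your own configuration screen.
# External parser binaries (key names assumed from the classic indexed_search configuration template)
pdftools = /usr/bin/
unzip = /usr/bin/
catdoc = /usr/bin/
xlhtml = /usr/bin/
ppthtml = /usr/bin/
unrtf = /usr/bin/
# Hand indexing over to the crawler (checkbox labels quoted above; key names assumed)
disableFrontendIndexing = 1
useCrawlerForExternalFiles = 1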
The crawler requires a backend user named _cli_crawler. Go to SYSTEM > Backend Users and create this backend user with a random password. This user must not be an administrator and should not be a member of any backend user group.
TypoScript Setup
Open your TypoScript template and add the following lines.
config.index_enable = 1
config.index_externals = 1
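Two further, optional settings from the TypoScript reference can be useful in the same template; the values below are only examples:
# Also index the meta tags (keywords/description) of each page
config.index_metatags = 1
# Number of characters kept as the description of an indexed page (example value)
config.index_descrLgd = 200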
How the Crawler Works
The crawler performs two main jobs:
1. Generate the URLs of the pages to be processed (with any GET parameters required, e.g. "L" for the language or "tx_ttnews[tt_news]" to show the detail view of a tt_news record) and enqueue them for processing by the other job.
2. Process the queue of URLs and take the appropriate action (in our case, invoke Indexed Search to index the page or the document).
When generating URLs, the crawler is able to crawl your website on its own and enqueue the different pages (as /index.php?id=…). If your site is multilingual, however, you have to tell it to generate variations for each page (for instance /index.php?id=…&L=0 and /index.php?id=…&L=1, as sketched below). When a link to a document is encountered while indexing the content of a page, Indexed Search does not index it right away but adds it to the queue of pages and documents to be indexed (because the option "Use crawler extension to index external files" was ticked in the Indexed Search configuration).
Crawler Configuration
Let us start with a basic crawler configuration which allows the whole page tree to be indexed:
Step 1: Go to Web > List.
Step 2: Select the root page of your site.
Step 3: Create a new record of type "Crawler Configuration", which is found under the section "Site Crawler".
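For a multilingual site, the language variation mentioned above is expressed in the record's "Configuration" field. A minimal sketch for two languages (0 and 1), assuming the processing instruction "Re-index page [tx_indexedsearch_reindex]" is ticked in the same record so that the generated URLs are actually handed to Indexed Search:
# Value of the "Configuration" field: expand the L parameter to 0 and 1
&L=[0-1]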
Now we can use this configuration to index our website. Configure the scheduler to run the different crawler tasks (one to build the queue, one to process it).
Adding Search Plugin To a Page
Select the page on which you want to integrate the search option. Create a new content element and, under the tab "Plugin", select "Indexed Search".
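If you prefer TypoScript over a content element, the classic (pibase) version of the plugin can also be copied into your own rendering object. This is only a sketch assuming the old plugin.tx_indexedsearch object and an arbitrary lib.searchContent path; newer TYPO3 versions ship an Extbase-based plugin with a different setup:
# Render the Indexed Search form and results wherever lib.searchContent is used in your template
lib.searchContent = COA
lib.searchContent.10 < plugin.tx_indexedsearch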
Indexing News Articles
Suppose we have a latest-news list section which contains news teasers and links to a detail page. The detail page contains a tt_news plugin whose output mode is SINGLE. As such, this plugin expects a GET parameter in the URL: &tx_ttnews[tt_news]=(id). Our test configuration:
We want the crawler to dynamically generate a list of URLs with the additional tx_ttnews[tt_news] parameter when it crawls page #35.
Crawler Configuration
We are creating this configuration for the subtree of page #35.
Step 1: In the TYPO3 backend, go to Web > List.
Step 2: Click on page #35.
Step 3: Create a new record of type "Crawler Configuration", which is found under the section "Site Crawler".
The _TABLE part of the configuration defines the lookup table (tt_news here), and _PID defines the news storage folder ID (#19 here). While creating the crawler configuration, tick "Append cHash"; otherwise you will end up with the first news record indexed N times due to TYPO3's caching mechanism.
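Putting it together, the "Configuration" field of this record contains a parameter expansion along the following lines; tt_news and the storage folder #19 come from our example and will differ in your installation:
# Generate one detail-page URL per tt_news record stored in folder 19
&tx_ttnews[tt_news]=[_TABLE:tt_news;_PID:19]
Combined with "Append cHash", every generated URL gets its own cache entry, which is why each news record is indexed separately instead of the first one N times.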