
Setup Search Module

JuliusSenkus edited this page Sep 24, 2015 · 14 revisions

Installing the Lucene Search Module

Install the module with the package manager command: install-package BetterCms.Module.LuceneSearch.

The module will install with two workers, executed as asynchronous background processes that won't block the web application:

  • Index source watcher: This worker scans the Better CMS "pages" table and adds new pages to the indexing queue.
  • Indexing robot: This worker reads the list of pages from the indexing queue and crawls the specified URLs. New pages are crawled first, followed by failed pages and then already-crawled pages.

Use these parameters to configure the Lucene search module:

  • LuceneWebSiteUrl: web site URL (prefix, which will be added to scraping URLs)
  • LuceneFileSystemDirectory: Lucene files directory
  • LucenePagesWatcherFrequency: time span that defines how often the worker looks for newly created pages. Set to 00:00:00 to disable the new pages watcher.
  • LuceneIndexerPageFetchTimeout: page fetching timeout (how long the system waits for a page to respond). Default value: 00:01:00 (1 minute)
  • LuceneIndexerFrequency: time span that defines how often the content indexer re-indexes page content. Set to 00:00:00 to disable the indexer.
  • LuceneMaxPagesPerQuery: maximum number of re-indexed pages per query. Default value: 1000
  • LucenePageExpireTimeout: indexed page expire timeout.
  • LuceneDisableStopWords: disables stop words such as ["a", "the", "of", ...] when indexing the content.
  • LuceneSearchForPartOfWords: if set to true, searches within words will be performed (similar to LIKE %query% in SQL)
  • LuceneIndexPrivatePages: if set to true, private pages are also indexed and searched (authorization is required)
  • LuceneExcludedIds: a list of HTML element ids that should be skipped and not indexed
  • LuceneExcludedClasses: a list of HTML classes that should be skipped and not indexed
  • LuceneExcludedNodes: a list of HTML nodes that should be skipped and not indexed (skipped by default: "noscript", "script", "button", "style")
  • LuceneAuthorizationUrl: authorization URL (where user credentials are sent using the POST method). May be the same URL as the login form (for example, /login/).
  • LuceneAuthorizationForm: the authorization form's POST parameters and their values, e.g. LuceneAuthorizationForm.UserName, LuceneAuthorizationForm.Password, LuceneAuthorizationForm.CustomField

Example:

  <search>
    <add key="LuceneWebSiteUrl" value="http://bettercms.sandbox.mvc4.local/" />
    <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
    <add key="LuceneIndexerFrequency" value="00:05:00" />
    <add key="LuceneIndexerPageFetchTimeout" value="00:01:00" />
    <add key="LucenePagesWatcherFrequency" value="00:05:00" />
    <add key="LuceneMaxPagesPerQuery" value="1000" />
    <add key="LucenePageExpireTimeout" value="00:00:00" />
    <add key="LuceneDisableStopWords" value="true" />
    <add key="LuceneSearchForPartOfWords" value="true" />
    <add key="LuceneIndexPrivatePages" value="true" />
    <add key="LuceneAuthorizationUrl" value="http://bettercms.sandbox.mvc4.local/login" />
    <add key="LuceneAuthorizationForm.UserName" value="admin" />
    <add key="LuceneAuthorizationForm.Password" value="admin" />
    <add key="LuceneAuthorizationForm.RememberMe" value="true" />
  </search>
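
For a site with no private pages, a smaller configuration may be enough. The following is a minimal sketch, not a tested configuration: it uses only keys from the list above, the URL and directory values are placeholders, and it assumes that omitted keys fall back to their defaults (the list above documents defaults only for LuceneIndexerPageFetchTimeout and LuceneMaxPagesPerQuery).

```xml
<search>
  <!-- Placeholder values: replace with your own site URL and index directory -->
  <add key="LuceneWebSiteUrl" value="http://www.example.com/" />
  <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
  <!-- Look for newly created pages every 10 minutes -->
  <add key="LucenePagesWatcherFrequency" value="00:10:00" />
  <!-- Indexer frequency of 10 minutes; 00:00:00 would disable the indexer -->
  <add key="LuceneIndexerFrequency" value="00:10:00" />
  <!-- Private pages are not indexed, so no LuceneAuthorization* keys are set -->
  <add key="LuceneIndexPrivatePages" value="false" />
</search>
```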

Lucene Module Logging

The Lucene workers can log to a separate log file. To do so, use the Lucene search module namespace, LuceneSearchModule, as the logger name in the log configuration files.

The following example writes the Lucene search module's messages to the bettercms.search.log file and everything else to the bettercms.log file:

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <targets>
    [...]
    <target name="log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.log" archiveFileName="${basedir}/logs/error_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />    
    [...]
    <target name="search_log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.search.log" archiveFileName="${basedir}/logs/search_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />    
    [...]
  </targets>
  <rules>
    <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Trace" final="true" />    
    [...]
    <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
  </rules>
</nlog>
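
In the rules above, final="true" stops further rule processing for events that match the first rule, which is why the Lucene module's messages do not also reach bettercms.log. If Trace output from the workers is too verbose, one possible variation keeps Info and above in the search log and discards the rest. This is a sketch based on general NLog rule semantics, not on anything specific to Better CMS; it assumes that a final rule with no writeTo target discards the events it matches.

```xml
<rules>
  <!-- Info and above from the Lucene module go to the search log only -->
  <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Info" final="true" />
  <!-- Assumption: a final rule without writeTo discards the remaining Trace/Debug events -->
  <logger name="LuceneSearchModule" maxlevel="Debug" final="true" />
  <!-- Everything else goes to the main log -->
  <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
```

Without the second rule, the module's Trace and Debug events would not match the first rule, so they would fall through to the catch-all rule and land in bettercms.log.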

Installing Google Search Module

Install the module with the package manager command: install-package BetterCms.Module.GoogleSiteSearch.

To enable Google Site Search, create a Google Site Search account if you don't have one already (you can register here). Google Site Search is a paid service; pricing is available here.

Google search is done by using a URL query, such as: https://www.googleapis.com/customsearch/v1?key={0}&cx={1} (read more here). These parameters can be set within the cms.config file's search section:

  • GoogleSiteSearchApiKey: Your Google API key (key in the URL).
  • GoogleSiteSearchEngineKey: Search engine's ID (cx in the URL).

Example:

  <search>
    <add key="GoogleSiteSearchApiKey" value="[BETTERCMS_GOOGLE_SEARCH_API_KEY]" />
    <add key="GoogleSiteSearchEngineKey" value="[BETTERCMS_GOOGLE_SEARCH_ENGINE_KEY]" />
  </search>

Using Search Module Widgets

When the BetterCms.Module.GoogleSiteSearch or BetterCms.Module.LuceneSearch module is installed, the main search module BetterCms.Module.Search is also installed as a reference module. It creates two widgets within the Search category: Search input form and Search results.

Details for setting up these widgets are discussed here.

Installing Search Module API

To use the search module API methods, install the BetterCms.Module.Search.Api module with the package manager command: install-package BetterCms.Module.Search.Api.
