Setup Search Module
Install the Lucene search module with the package manager command:
install-package BetterCms.Module.LuceneSearch
The module will install with two workers, executed as asynchronous background processes that won't block the web application:
- Index source watcher: This worker scans the Better CMS "pages" table and adds new pages to the indexing queue.
- Indexing robot: This worker reads the list of pages from the indexing queue and crawls the specified URLs. New pages are crawled first, followed by failed pages and then already-crawled pages.
Use these parameters to configure the Lucene search module:
- LuceneWebSiteUrl: web site URL (prefix, which will be added to scraping URLs)
- LuceneFileSystemDirectory: Lucene files directory
- LucenePagesWatcherFrequency: frequency time span, how often the worker should look for newly created pages. Set to 00:00:00 to disable the new pages watcher.
- LuceneIndexerPageFetchTimeout: page fetching timeout (how long the system will wait for a page to respond). Default value: 00:01:00 (1 minute)
- LuceneIndexerFrequency: frequency time span, how often the content indexer should re-index a page's content. Set to 00:00:00 to disable the indexer.
- LuceneMaxPagesPerQuery: maximum number of re-indexed pages per query. Default value: 1000
- LucenePageExpireTimeout: indexed page expire timeout.
- LuceneDisableStopWords: disables stop words such as ["a", "the", "of", ...] when indexing the content.
- LuceneSearchForPartOfWords: if set to true, searches within words will be performed (similar to LIKE %query% in SQL)
- LuceneIndexPrivatePages: if set to true, searches within private pages will be performed (authorization is required)
- LuceneExcludedIds: a list of HTML ids that should be skipped and not indexed
- LuceneExcludedClasses: a list of HTML classes that should be skipped and not indexed
- LuceneExcludedNodes: a list of HTML nodes that should be skipped and not indexed (skipped by default: "noscript", "script", "button", "style")
- LuceneAuthorizationUrl: authorization URL (where user credentials are sent using the POST method). May be the same URL as the login form (for example, /login/).
- LuceneAuthorizationForm: the authorization form's POST parameters with values, e.g. LuceneAuthorizationForm.UserName, LuceneAuthorizationForm.Password, LuceneAuthorizationForm.CustomField
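The three exclusion parameters might be set as in the sketch that follows. It assumes the list values are comma-separated, which this page does not state, and the id and class names are hypothetical placeholders.

```xml
<search>
  <!-- Assumption: list values are comma-separated; verify against the module before relying on this -->
  <!-- "sidebar", "footer-links" and "no-index" are hypothetical names used only for illustration -->
  <add key="LuceneExcludedIds" value="sidebar,footer-links" />
  <add key="LuceneExcludedClasses" value="no-index" />
  <!-- The node names listed above as skipped by default -->
  <add key="LuceneExcludedNodes" value="noscript,script,button,style" />
</search>
```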
Example:
<search>
<add key="LuceneWebSiteUrl" value="http://bettercms.sandbox.mvc4.local/" />
<add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
<add key="LuceneIndexerFrequency" value="00:05:00" />
<add key="LuceneIndexerPageFetchTimeout" value="00:01:00" />
<add key="LucenePagesWatcherFrequency" value="00:05:00" />
<add key="LuceneMaxPagesPerQuery" value="1000" />
<add key="LucenePageExpireTimeout" value="00:00:00" />
<add key="LuceneDisableStopWords" value="true" />
<add key="LuceneSearchForPartOfWords" value="true" />
<add key="LuceneIndexPrivatePages" value="true" />
<add key="LuceneAuthorizationUrl" value="http://bettercms.sandbox.mvc4.local/login" />
<add key="LuceneAuthorizationForm.UserName" value="admin" />
<add key="LuceneAuthorizationForm.Password" value="admin" />
<add key="LuceneAuthorizationForm.RememberMe" value="true" />
</search>
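For a site without private pages, a smaller configuration may be enough. The sketch below uses only parameters described above; the URL and directory values are placeholders. It assumes that the authorization settings can be left out when LuceneIndexPrivatePages is false and that omitted parameters fall back to their defaults.

```xml
<search>
  <!-- Placeholder values: replace with the real site URL and index directory -->
  <add key="LuceneWebSiteUrl" value="http://www.example.com/" />
  <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
  <!-- Re-index page content every 30 minutes -->
  <add key="LuceneIndexerFrequency" value="00:30:00" />
  <!-- 00:00:00 disables the new pages watcher, as described in the parameter list -->
  <add key="LucenePagesWatcherFrequency" value="00:00:00" />
  <!-- Public pages only, so no LuceneAuthorization* keys are set (assumption) -->
  <add key="LuceneIndexPrivatePages" value="false" />
</search>
```

With the watcher disabled, newly created pages are not added to the indexing queue, so this setting is best suited to temporarily pausing discovery.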
Messages from the Lucene workers can be written to a separate log file. To do so, use the Lucene search module namespace LuceneSearchModule in the log configuration files.
The following example logs all information to the bettercms.log file and the Lucene search module's information to the bettercms.search.log file:
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<targets>
[...]
<target name="log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.log" archiveFileName="${basedir}/logs/error_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />
[...]
<target name="search_log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.search.log" archiveFileName="${basedir}/logs/search_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />
[...]
</targets>
<rules>
<logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Trace" final="true" />
[...]
<logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
</nlog>
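As a variation, the search log can be limited to warnings and errors. The sketch below assumes the search_log_file and log_file targets from the example above and changes only the minlevel value. NLog applies final only when a rule matches, so search messages below Warn are not stopped by the first rule and should still reach the catch-all rule.

```xml
<rules>
  <!-- Warn and above from the search module go only to the search log -->
  <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Warn" final="true" />
  <!-- Everything else, including search messages below Warn, goes to the main log -->
  <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
```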
Install the Google Site Search module with the package manager command:
install-package BetterCms.Module.GoogleSiteSearch
To enable Google Site search, create a Google Site Search account if you haven't already (can be registered here). Google Site Search is a paid service; pricing is available here.
Google search is done by using a URL query, such as https://www.googleapis.com/customsearch/v1?key={0}&cx={1} (read more here). These parameters can be set within the search section of the cms.config file:
- GoogleSiteSearchApiKey: your Google API key (key in the URL).
- GoogleSiteSearchEngineKey: the search engine's ID (cx in the URL).
Example:
<search>
<add key="GoogleSiteSearchApiKey" value="[BETTERCMS_GOOGLE_SEARCH_API_KEY]" />
<add key="GoogleSiteSearchEngineKey" value="[BETTERCMS_GOOGLE_SEARCH_ENGINE_KEY]" />
</search>
When the BetterCms.Module.GoogleSiteSearch or BetterCms.Module.LuceneSearch module is installed, the main search module BetterCms.Module.Search is also installed as a reference module. It creates two widgets within the Search category: Search input form and Search results.
Details for setting up these widgets are discussed here.
To use the search module's API methods, install the BetterCms.Module.Search.Api module with the package manager command:
install-package BetterCms.Module.Search.Api