This repository contains Bulgarian ispell (`affix` and `dict`) and stopword dictionaries for full text search in PostgreSQL.

The ispell dictionary files (`bulgarian.affix` and `bulgarian.dict`) were created by the bgOffice/БГ Офис project for use in OpenOffice and are licensed under LGPL 3.0.

This repository contains versions of those files with minor changes that make them compatible with the format PostgreSQL expects. The original ispell files (`bulgarian.aff` and `bulgarian.dic`) can be downloaded from http://bgoffice.sourceforge.net/ispell/index.html

The stop word list used in this repository (`bulgarian.stop`) is a modified version of the list published by Prof. Jacques Savoy in the article "Searching strategies for the Bulgarian language" (the list is in Table A.1).

- Copy the three files `bulgarian.affix`, `bulgarian.dict` and `bulgarian.stop` to your `$SHAREDIR/tsearch_data/` directory (e.g. `C:\Program Files\PostgreSQL\12\share\tsearch_data`). You can determine what your `$SHAREDIR` is by running `pg_config --sharedir`.
- Execute the following SQL script:

      CREATE TEXT SEARCH CONFIGURATION bulgarian (COPY = simple);

      CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
          TEMPLATE = ispell,
          DictFile = bulgarian,
          AffFile = bulgarian,
          StopWords = bulgarian
      );

      CREATE TEXT SEARCH DICTIONARY bulgarian_simple (
          TEMPLATE = pg_catalog.simple,
          STOPWORDS = bulgarian
      );

      ALTER TEXT SEARCH CONFIGURATION bulgarian
          ALTER MAPPING FOR asciiword, asciihword, hword, hword_part, word
          WITH bulgarian_ispell, bulgarian_simple;

- Make sure it's working by running a full text search query.
A query like this one: `SELECT to_tsvector('bulgarian', 'текстовете');` should output only the base of the word (`текст`): `"'текст':1"`
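
On Unix-like systems, the copy step above could be scripted roughly as follows. This is only a sketch: the function name `install_bulgarian_dicts` and its two arguments are illustrative assumptions, not something this repository ships. It also assumes the three files sit together in one source directory and that you have write access to the target directory.

```shell
#!/bin/sh
# Sketch only: copy the three dictionary files into a tsearch_data directory.
# The function name and its arguments are hypothetical, chosen for illustration.

# install_bulgarian_dicts SRC_DIR DEST_DIR
#   SRC_DIR  - directory holding bulgarian.affix, bulgarian.dict, bulgarian.stop
#   DEST_DIR - target tsearch_data directory
install_bulgarian_dicts() {
    src_dir=$1
    dest_dir=$2

    # Stop early if the target directory does not exist.
    if [ ! -d "$dest_dir" ]; then
        echo "target directory not found: $dest_dir" >&2
        return 1
    fi

    for name in bulgarian.affix bulgarian.dict bulgarian.stop; do
        # Stop if one of the three expected files is missing.
        if [ ! -f "$src_dir/$name" ]; then
            echo "missing source file: $src_dir/$name" >&2
            return 1
        fi
        cp "$src_dir/$name" "$dest_dir/$name" || return 1
    done
}
```

From the repository root, a call might look like `install_bulgarian_dicts . "$(pg_config --sharedir)/tsearch_data"`. Depending on how PostgreSQL was installed, this may need elevated privileges.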