|
|
# This file contains a list of stemmers to include in the distribution.# The format is a set of space separated lines - on each line:# First item is name of stemmer.# Second item is comma separated list of character sets.# Third item is comma separated list of names to refer to the stemmer by.## Lines starting with a #, or blank lines, are ignored.
# List all the main algorithms for each language, in UTF-8, and also with# the most commonly used encoding.
arabic UTF_8 arabic,ar,aradanish UTF_8 danish,da,dandutch UTF_8 dutch,nl,dut,nldenglish UTF_8 english,en,engfinnish UTF_8 finnish,fi,finfrench UTF_8 french,fr,fre,fragerman UTF_8 german,de,ger,deugreek UTF_8 greek,el,gre,ellhindi UTF_8 hindi,hi,hinhungarian UTF_8 hungarian,hu,hunindonesian UTF_8 indonesian,id,inditalian UTF_8 italian,it,italithuanian UTF_8 lithuanian,lt,litnepali UTF_8 nepali,ne,nepnorwegian UTF_8 norwegian,no,norportuguese UTF_8 portuguese,pt,porromanian UTF_8 romanian,ro,rum,ronrussian UTF_8 russian,ru,russerbian UTF_8 serbian,sr,srpspanish UTF_8 spanish,es,esl,spaswedish UTF_8 swedish,sv,swetamil UTF_8 tamil,ta,tamturkish UTF_8 turkish,tr,tur
# Also include the traditional porter algorithm for english.# The porter algorithm is included in the libstemmer distribution to assist# with backwards compatibility, but for new systems the english algorithm# should be used in preference.porter UTF_8 porter english
# Some other stemmers in the snowball project are not included in the standard# distribution. To compile a libstemmer with them in, add them to this list,# and regenerate the distribution. (You will need a full source checkout for# this.) They are included in the snowball website as curiosities, but are not# intended for general use, and use of them is is not fully supported. These# algorithms are:## german2 - This is a slight modification of the german stemmer.#german2 UTF_8,ISO_8859_1 german2 german## kraaij_pohlmann - This is a different dutch stemmer.#kraaij_pohlmann UTF_8,ISO_8859_1 kraaij_pohlmann dutch## lovins - This is an english stemmer, but fairly outdated, and# only really applicable to a restricted type of input text# (keywords in academic publications).#lovins UTF_8,ISO_8859_1 lovins english
|