Open Source Information Retrieval systems
- http://www.xapian.org/ - Open Source Probabilistic Information Retrieval library
- http://ils.unc.edu/tera/ - TeraScale Retrieval Project (applies IR techniques to large-scale datasets)
Sources:
(currently: 2321 documents for "Stemming", 64 for "Stemming AND spanish")
A stemmer is a program... for morphological reduction.
=> Experimental science: invent an algorithm and test it.
Google search
- http://ir.iit.edu/~abdur/research/conflation/conflation.html: Corpus based KSTEM and Porter
http://ir.iit.edu/~abdur/research/conflation/AIRE-Stemming-System.html: "Pickens [9] later expanded that research by examining the effects of using a combination of kstem and porter with co-occurrence information on precision/recall metrics and found a statistical improvement." (also more background)
Software
Only freely available software, preferably GPL:
- http://snowball.tartarus.org/ : Snowball is a small string-handling programming language that makes stemming algorithms easier to implement. It can generate code in ANSI C and Java.
- http://www.xapian.org The Xapian Project: includes stemmers for many languages
- http://www.tartarus.org/~martin/PorterStemmer/ Official site of the Porter stemmer (English). Many implementations available.
- Official site of the Lancaster (Paice/Husk) stemming algorithm (including bibliography)
- Muscat Stemmers: open.muscat.com moved to apr smartlogic. The stemmers used to be available for free.
- SWISH-E contains stemmer(s?) too.
Articles
- A. Honrado, R. Leon, R. O'Donnel, D. Sinclair: A Word Stemming Algorithm for the Spanish Language
- Angel F. Zazo Rodríguez...: Term expansion using stemming and thesauri in spanish
Conferences:
- Text REtrieval Conference (TREC): http://trec.nist.gov/ since 1993
- CLEF: Cross-Language Evaluation Forum: http://clef.iei.pi.cnr.it:2002/ since 2000?
- Symposium on String Processing Information Retrieval (SPIRE): since 1994?
What is stemming?
- More information in books?
- http://alarcos.inf-cr.uclm.es/doc/ARI/trans/Tema4.pdf
- http://www.inf.udec.cl/~andrea/cursos/retrieval/texto.pdf
Porter stemming algorithm
(easy to adapt to different languages)
[C](VC)^m[V]
- C: a sequence of one or more consonants
- VC: a vowel sequence followed by a consonant sequence; m counts how many times this pair repeats
- http://www.tartarus.org/~martin/PorterStemmer/
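The consonant/vowel form above can be made concrete by computing m for a given word. The following is a minimal sketch in Python, assuming English vowels (a, e, i, o, u) and Porter's rule that "y" counts as a vowel when it follows a consonant. The function names `is_consonant`, `cv_pattern` and `measure` are chosen here for illustration and are not taken from any particular implementation.

```python
VOWELS = set("aeiou")  # assumption: English; another language needs its own vowel set


def is_consonant(word, i):
    """Return True if word[i] is a consonant in Porter's sense."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant (as in "by").
        return i == 0 or not is_consonant(word, i - 1)
    return True


def cv_pattern(word):
    """Collapse a lowercase word into alternating runs, e.g. 'trouble' -> 'CVCV'."""
    pattern = []
    for i in range(len(word)):
        kind = "C" if is_consonant(word, i) else "V"
        if not pattern or pattern[-1] != kind:
            pattern.append(kind)
    return "".join(pattern)


def measure(word):
    """Return m, the number of VC pairs in the form [C](VC)^m[V]."""
    return cv_pattern(word.lower()).count("VC")
```

With this sketch, `measure("tree")` gives 0, `measure("trouble")` gives 1 and `measure("private")` gives 2. The stemmer's suffix rules use m as a condition, so that a suffix is only removed when the remaining stem is long enough.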
n=2: digram. Similarity index: IS_{a,b} = 2 · (number of shared digrams) / [(number of digrams in word a) + (number of digrams in word b)].
If the index of two words exceeds a threshold => they are treated as the same word
- take a word => canonical form
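The digram similarity index above can be sketched in a few lines of Python. This version assumes that each distinct digram is counted once per word (a set), which the notes do not specify. The threshold value 0.6 and the function names are hypothetical choices for illustration only.

```python
def digrams(word):
    """Return the set of distinct character digrams of a word."""
    word = word.lower()
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """2 * shared digrams / (digrams in a + digrams in b)."""
    da, db = digrams(a), digrams(b)
    if not da and not db:
        return 0.0  # words shorter than two letters have no digrams
    return 2 * len(da & db) / (len(da) + len(db))


def same_word(a, b, threshold=0.6):
    """Treat two words as variants of one word if the index exceeds the threshold.

    The default threshold is an assumed value, not one given in the notes.
    """
    return dice_similarity(a, b) > threshold
```

For "statistics" and "statistical" the two words have 7 and 8 distinct digrams and share 6 of them, so the index is 2 · 6 / (7 + 8) = 0.8 and `same_word` returns True under the assumed threshold.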
word bigrams ("home run"), character bigrams
- Phrase recognition:
- Statistical
- Part of speech tagging
- Syntactic parsing (parse tree)
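As an illustration of the statistical option in the list above, the following Python sketch extracts word bigrams and character bigrams and keeps word bigrams that recur in a small collection as phrase candidates. Using a raw frequency cutoff is an assumption made here for simplicity; the parameter `min_count` and all function names are hypothetical.

```python
from collections import Counter


def word_bigrams(tokens):
    """Pairs of adjacent words, e.g. ('home', 'run')."""
    return list(zip(tokens, tokens[1:]))


def character_bigrams(word):
    """Pairs of adjacent characters within one word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def frequent_phrases(sentences, min_count=2):
    """Return word bigrams that occur at least min_count times across sentences."""
    counts = Counter()
    for sentence in sentences:
        # Whitespace tokenization only; punctuation handling is left out of this sketch.
        counts.update(word_bigrams(sentence.lower().split()))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}


sentences = [
    "He hit a home run",
    "The home run won the game",
    "She ran home",
]
phrases = frequent_phrases(sentences)
```

In this toy input only "home run" occurs twice, so it is the single phrase candidate. Part-of-speech tagging or syntactic parsing would filter candidates by grammatical structure rather than by frequency.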