It is built on top of the open-source Lucene package (http://jakarta.apache.org/lucene/). Though written in Java, it can be used from command-line shells and performs well that way (current uses include Perl CGI scripts calling lucegene).
It includes common text search features: Boolean queries, phrases, word stemming, fuzzy and field range searches, and relevance ranking. Lucene is comparable to the index/search methods used by web-indexing systems such as Glimpse, Excite, AltaVista, and Google.
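As a rough illustration of the index/search idea behind Boolean queries, here is a toy inverted index in plain Java. It is not Lucene's or LuceGene's code and uses none of their classes; the class name ToyInvertedIndex, its methods, and the sample documents are all hypothetical, and stemming, phrases, fuzzy matching and ranking are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy sketch only: a hypothetical class, not part of Lucene or LuceGene.
public class ToyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one document under every term it contains.
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Boolean AND: documents that contain every query term.
    public SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Boolean OR: documents that contain at least one query term.
    public SortedSet<Integer> searchAny(String query) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String term : tokenize(query)) {
            result.addAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.add(1, "white eye color gene");
        idx.add(2, "eye development in Daphnia");
        idx.add(3, "kinase gene family");
        System.out.println(idx.searchAll("eye gene")); // [1]
        System.out.println(idx.searchAny("eye gene")); // [1, 2, 3]
    }
}
```

A real engine such as Lucene also stores term positions (for phrase queries) and term statistics (for relevance ranking), which this sketch omits.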
LuceGene additions include
    Data input adaptors for HTML; XML (e.g. MedLine); FlyBase flatfile; Biosequences (GenBank, EMBL, etc.)
    Basic output formats for XML, HTML via XSLT, Text, Spreadsheet
    Numeric Range search (** added April 2004)

It has been tested with 100,000s of
    FlyBase Genes, References, Game and Chado XML annotations
    euGenes gene summaries & Daphnia Medline, Sequences, HTML documents

Lucene is used by LuceGene unchanged, but LuceGene adds Lucene class overrides for biology data.
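Numeric range search needs special handling in an index that compares terms as strings, because "9" sorts after "10" in string order. How LuceGene implements it is not described here. One common workaround, shown below as a sketch under that assumption, is to zero-pad numbers to a fixed width before indexing so that string order matches numeric order. The class PaddedNumber, its methods, and the width of 19 digits are hypothetical choices for illustration.

```java
// Sketch only: a hypothetical helper, not LuceGene's numeric range code.
public class PaddedNumber {
    // 19 digits is enough for any non-negative long (assumed width).
    static final int WIDTH = 19;

    // Encode a non-negative number as a fixed-width, zero-padded term.
    static String pad(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("only non-negative values are supported");
        }
        return String.format("%0" + WIDTH + "d", n);
    }

    // String comparison of padded terms, standing in for a term range query.
    static boolean inRange(String paddedTerm, long low, long high) {
        return paddedTerm.compareTo(pad(low)) >= 0
            && paddedTerm.compareTo(pad(high)) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "10", so a string range gives wrong answers.
        System.out.println("9".compareTo("10") > 0);          // true
        // Padded: string order now agrees with numeric order.
        System.out.println(pad(9).compareTo(pad(10)) < 0);    // true
        System.out.println(inRange(pad(1500), 1000, 20000));  // true
        System.out.println(inRange(pad(999), 1000, 20000));   // false
    }
}
```

Negative and fractional values would need a different encoding; they are outside this sketch.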
See the cvs.sourceforge.net repository for gmod/lucegene,
or alternatively
http://eugenes.org:8081/gmod/lucegene/
LuceGene is also available as part of the ARGOS genome database
replication system at
rsync://eugenes.org/argos/common/java/lucegene/