LuceGene is an open-source document/object search and retrieval system
tuned for bioinformatics text databases and documents.
It is similar in concept to SRS (Sequence Retrieval System), a widely used,
commercially successful bioinformatics program.
It is built with the open-source Lucene package.
Lucene provides common text search features: Boolean operators, phrases, word
stemming, fuzzy and field-range searches, and relevance ranking. It supports
many kinds of data field structure. It is comparable to
web-indexing systems such as Excite, AltaVista, and Google.
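To make the Boolean and phrase search features above concrete, here is a minimal sketch in Python of a positional inverted index. It is an illustration only and is not how Lucene or LuceGene is implemented: the class name TinyIndex, its methods, and the three sample records are all hypothetical, and stemming, fuzzy matching, range queries and relevance ranking are left out.

```python
from collections import defaultdict


class TinyIndex:
    """Toy positional inverted index (hypothetical; not the Lucene API)."""

    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(dict)
        self.doc_ids = set()

    def add(self, doc_id, text):
        """Index one document: lowercase the text and split on whitespace."""
        self.doc_ids.add(doc_id)
        for pos, term in enumerate(text.lower().split()):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def docs(self, term):
        """Set of document IDs that contain the term."""
        return set(self.postings.get(term.lower(), {}))

    def all_of(self, *terms):
        """Boolean AND: documents containing every term."""
        result = set(self.doc_ids)
        for term in terms:
            result &= self.docs(term)
        return result

    def any_of(self, *terms):
        """Boolean OR: documents containing at least one term."""
        result = set()
        for term in terms:
            result |= self.docs(term)
        return result

    def none_of(self, *terms):
        """Boolean NOT: documents containing none of the terms."""
        return self.doc_ids - self.any_of(*terms)

    def phrase(self, text):
        """Documents where the words occur consecutively and in order."""
        terms = text.lower().split()
        if not terms:
            return set()
        hits = set()
        for doc_id in self.all_of(*terms):
            for start in self.postings[terms[0]][doc_id]:
                if all(start + i in self.postings[t][doc_id]
                       for i, t in enumerate(terms)):
                    hits.add(doc_id)
                    break
        return hits


# Made-up sample records, for illustration only.
index = TinyIndex()
index.add("d1", "alcohol dehydrogenase from Drosophila melanogaster")
index.add("d2", "aldehyde dehydrogenase from Homo sapiens")
index.add("d3", "Drosophila alcohol tolerance study")

both = index.all_of("alcohol", "drosophila")    # Boolean AND
exact = index.phrase("alcohol dehydrogenase")   # phrase query
```

Storing word positions alongside document IDs is what lets the same structure answer both Boolean and phrase queries.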
LuceGene adds these bio-data methods to Lucene:
Indexing adaptors for formats such as XML, PDF documents, biosequences,
spreadsheets, HTML, and others.
Configurations for bio-data include UniProt/Swiss-Prot, FASTA
and GenBank sequences, BIND protein interactions, NCBI Gene Expression Omnibus,
BLAST output tables, and MEDLINE.
Support for batch-list look-ups and searches is included, useful for data miners.
Web applications offer paged search results, batch downloads,
search refinement and search-linking among data libraries.
Web Services support for data mining is included with a SOAP interface.
Output support includes field selection and formats such as
spreadsheet, XML, HTML via XSLT, and others.
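Three of the ideas in the list above (a format adaptor, a batch-list look-up, and output with field selection) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not LuceGene code: the function names parse_fasta, batch_lookup and to_tab_delimited, the field names id, description and sequence, and the sample records are all hypothetical.

```python
import io


def parse_fasta(handle):
    """Yield one dict of fields per FASTA record read from a text handle."""
    record = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if record is not None:
                yield record
            # Header: first token is the ID, the rest is the description.
            header = line[1:].split(None, 1)
            record = {
                "id": header[0] if header else "",
                "description": header[1] if len(header) > 1 else "",
                "sequence": "",
            }
        elif record is not None and line:
            # Sequence lines are concatenated without the line breaks.
            record["sequence"] += line
    if record is not None:
        yield record


def batch_lookup(records_by_id, id_list):
    """Return records for the listed IDs, in list order, skipping unknown IDs."""
    return [records_by_id[i] for i in id_list if i in records_by_id]


def to_tab_delimited(records, fields):
    """Render only the selected fields as tab-delimited text with a header row."""
    rows = ["\t".join(fields)]
    for rec in records:
        rows.append("\t".join(str(rec.get(f, "")) for f in fields))
    return "\n".join(rows)


# Made-up sample data, for illustration only.
SAMPLE = """>seq1 example protein A
MKTAYIAK
QRQISFVK
>seq2 example protein B
MSDNE
>seq3
MAAAK
"""

records = {rec["id"]: rec for rec in parse_fasta(io.StringIO(SAMPLE))}
hits = batch_lookup(records, ["seq3", "missing", "seq1"])
table = to_tab_delimited(hits, ["id", "description"])
```

Turning each record into a flat dictionary of named fields means the same look-up and output code can serve any input format, as long as each format has its own adaptor.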
LuceGene is speedy with big data sets:
Searching the UniProt library of 1.7 million sequences
with LuceGene is closely comparable to SRS in speed and content.
Gene annotation object search and retrieval with LuceGene is 10x to 20x
faster than using a Postgres Chado database.
LuceGene has been tested and works well with millions of documents from
genome sequence, annotation and literature databases.
Demo & Screenshots
A demonstration server is available at
http://eugenes.org/demolucegene/
FlyBase Search preview
http://preview.flybase.net/lucegene/
euGenes genome search
http://eugenes.org/lucegene/
Daphnia/wFleaBase search
http://wfleabase.org/search/
Requirements
LuceGene requires Java version 1.4 or later to compile and run.
A Java/JSP web server such as Jakarta Tomcat is used for the web application.
Jakarta Lucene software is included with this package, as are other
required Java libraries.
Documentation
LuceGene Readme
INSTALL.txt for demo webapp use
Indexing methods overview
Talk slides on Argos/LuceGene, Sept 2003 (PowerPoint, PDF)
Downloads
Current distribution files are at
SourceForge and
http://eugenes.org/gmod/lucegene/
lucegene.war : web application archive
lucegene-*-src.jar : sources, documents, configurations
lucegene_demo*.zip : sample data for lucegene.war
Contact
email: lucegene AT eugenes.org
Current developers: Don Gilbert, Paul Poole, and others