Generic Model Organism Database Construction Set

LuceGene: Document/Object Search and Retrieval for Genome Databases

Description
LuceGene is an open-source document/object search and retrieval system specially tuned for bioinformatics text databases and documents. It is part of the GMOD (Generic Model Organism Database) project, http://www.gmod.org/lucegene/, and is also available at http://eugenes.org:8081/gmod/lucegene/. LuceGene is similar in concept to SRS (Sequence Retrieval System), a widely used and commercially successful bioinformatics program.

It is built on top of the open-source Lucene package, http://jakarta.apache.org/lucene/. Though written in Java, it can be used from command-line shells and performs well that way (current uses include Perl CGI scripts calling LuceGene).

It includes common text search features: boolean, phrase, fuzzy, and field range searches, word stemming, and relevance ranking. Lucene is comparable to the index/search methods used by web-indexing systems such as Glimpse, Excite, AltaVista, and Google.
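As a rough illustration of how a term index supports boolean searches, here is a minimal Java sketch of an inverted index. It is not Lucene's implementation and leaves out stemming, phrases, fuzzy matching, and ranking. The class name TinyInvertedIndex and the sample documents are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class for illustration; not part of Lucene or LuceGene.
class TinyInvertedIndex {
    // Maps each lower-cased term to the sorted set of document ids containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and record each term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Document ids for a single term (empty set if the term is unknown).
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Boolean AND: intersection of the two posting sets.
    Set<Integer> and(String a, String b) {
        Set<Integer> result = new TreeSet<>(lookup(a));
        result.retainAll(lookup(b));
        return result;
    }

    // Boolean OR: union of the two posting sets.
    Set<Integer> or(String a, String b) {
        Set<Integer> result = new TreeSet<>(lookup(a));
        result.addAll(lookup(b));
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "alcohol dehydrogenase gene in Drosophila");
        index.add(2, "Daphnia gene summary");
        index.add(3, "Drosophila reference list");
        System.out.println(index.and("gene", "drosophila")); // [1]
        System.out.println(index.or("daphnia", "reference")); // [2, 3]
    }
}
```

Because each term maps directly to the documents that contain it, a boolean query touches only the posting sets for its terms rather than scanning every document.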

LuceGene additions include
  • Data input adaptors for HTML; XML (e.g. MedLine); FlyBase flatfile; Biosequences (GenBank, EMBL, etc.)
  • Basic output formats for XML, HTML via XSLT, Text, Spreadsheet
  • Numeric Range search (** added April 2004)

It has been tested with 100,000s of
  • FlyBase Genes, References, Game and Chado XML annotations
  • euGenes gene summaries & Daphnia Medline, Sequences, HTML documents

Lucene is used by LuceGene un-changed, but LuceGene adds Lucene class overrides for biology data.
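The numeric range search listed among the LuceGene additions matters because a text index compares terms as strings, so "950" sorts after "12000". One common workaround, sketched below, is to zero-pad numbers to a fixed width before indexing so that string order matches numeric order. This is an assumption about the general technique, not a description of how LuceGene implements it. The class PaddedNumberRange, the gene ids, and the positions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Lucene or LuceGene.
class PaddedNumberRange {
    // Left-pad a non-negative number with zeros to a fixed width so that
    // lexicographic (string) order equals numeric order.
    static String encode(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value needs more than " + width + " digits");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Return the ids whose padded term lies in [lo, hi], both ends inclusive.
    // Simplification: the sorted map holds one id per term.
    static List<String> range(TreeMap<String, String> termToId, long lo, long hi, int width) {
        return new ArrayList<>(
            termToId.subMap(encode(lo, width), true, encode(hi, width), true).values());
    }

    public static void main(String[] args) {
        int width = 9;
        TreeMap<String, String> index = new TreeMap<>();
        index.put(encode(950, width), "geneA");     // made-up start positions
        index.put(encode(12000, width), "geneB");
        index.put(encode(250000, width), "geneC");
        System.out.println(range(index, 1000, 300000, width)); // [geneB, geneC]
    }
}
```

The sorted map stands in for a sorted term dictionary: once terms are padded, a numeric range query becomes an ordinary term range scan.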

Demo & Screenshots

Public services using LuceGene (Apr 2004)

Apollo Service notes:

Game XML object retrieval using Lucene is 10x to 20x faster than generating the objects from the Postgres Chado db (Pg slows down more the larger the object set/region). Using LuceGene, one gets a gene query result in 10 to 20 seconds (much of that time is network transfer), compared to 3 to 10 minutes with Postgres. A 20 MB Game XML message for a cytologic band took 66 seconds using Lucene (mostly transfer time) but took 20 minutes calling Postgres.

Requirements

Documentation

Contact
lucegene AT eugenes.org
Current developers: Don Gilbert, Paul Poole, and others

Downloads
These alpha distribution files are currently available:
  • lucegene-1.2-src.jar : sources, documents, configuration for base lucegene software with indexing methods for biology data
  • lucegene.war : binary distribution, for webapp (Tomcat) uses
  • lusearch-1.2-src.jar : source, etc. for search web app (currently configured for eugenes.org)
  • lusearch.war : binary distribution for search web app

    See the cvs.sourceforge.net repository for gmod/lucegene, or alternatively http://eugenes.org:8081/gmod/lucegene/
    LuceGene is also available as part of the ARGOS genome database replication system at rsync://eugenes.org/argos/common/java/lucegene/
