Index of /gmod/genomeview-package2008

      Name                    Last modified       Size  Description

      Parent Directory        10-Mar-2008 14:13   -
      databases                                   -
      ggb169-package.tgz                          -

Genome viewing package


This package has genome data sets from several projects, and the GBrowse software to view them, in a complete, configured bundle.


  Fetch and run:
  1.  curl | gtar -zxf -
  2.  cd ggb169; java -jar start.jar &
  3.  View at http://localhost:8080/gbrowse/

  Add Daphnia pulex genome dataset
    rsync -L -auv rsync:// databases/
    cp databases/daphnia_pulex/daphnia_pulex.conf  conf/gbrowse.conf/
  View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/daphnia_pulex/

  Add DrosMel genome dataset
    rsync -L -auv rsync:// databases/
    cp databases/drosmel/drosmel5dg.conf  conf/gbrowse.conf/
  View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/drosmel5dg/ 
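
In both examples above, the view URL ends with the name of the .conf file that was copied into conf/gbrowse.conf/. A minimal sketch of that pattern follows. The view_url helper is hypothetical (it is not part of the package), and it assumes the URL path always matches the .conf base name, as it does for the two data sets shown.

```sh
#!/bin/sh
# Hypothetical helper (not part of the package): derive the GBrowse view URL
# from a data set's .conf file name, following the pattern shown above.
view_url() {
    conf_file=$1
    port=${2:-8080}                       # 8080 is the port used in this README
    name=$(basename "$conf_file" .conf)   # drosmel5dg.conf -> drosmel5dg
    printf 'http://localhost:%s/gbrowse/cgi-bin/gbrowse/%s/\n' "$port" "$name"
}

view_url databases/drosmel/drosmel5dg.conf
view_url databases/daphnia_pulex/daphnia_pulex.conf
```

The optional second argument is only useful if you have changed the web server port from its default.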


Ready to run GBrowse software package (see below):

Data sets for this package include: 
    daphnia_pulex (daphnia genome data from 
    nasonia  (wasp gene predictions, homology, EST)
    tribcas (tribolium; basic gene set from NCBI genomes)
    drosmel : DrosMel rel 5.5 genome data and Affymetrix transcriptome databases
     (lucene and wiggle data files).

Both rsync and ftp can be used with the same URL paths. See also here for notes and updates.


This package uses a 'java' installation approach, where all needed software is packaged together in a way that doesn't require additional installs, and can be run in place.

There are caveats to this; in particular, the GD graphics library must be installed on your computer. You may need to add other compiled Perl libraries depending on the version of your computer system.

If you run this on a Mac OSX 10.5-Intel or a Solaris10-Intel computer, the GD graphics library is included in the package, in the folders lib/darwin-thread-multi-2level/auto/GD/ and lib/i86pc-solaris/auto/GD/

NOTE: The prebuilt GD library for Mac OSX 10.5-Intel does not work on Mac OS 10.4; there you will need to install the GD library from other packages.

See for tutorials on installing and using GBrowse in the recommended way, which generally involves installing packages in one unix (or mswin) system path (e.g. /usr/local/...)


Given these prerequisites, install and run the package as follows:

  1. Copy this package and unpack
    ftp .
    gtar -zxf ggb169-package.tgz  # creates folder ggb169
    cd ggb169
  2. Start the Jetty embedded web server
    java -jar start.jar &
    (or double click start.jar on MacOSX to start)
  3. View on your computer in a web browser at http://localhost:8080/gbrowse/

Current contents

    bin/              : scripts to process genome data
    cgi-bin/          : gbrowse web programs
    conf/gbrowse.conf/: configurations for new data sets, features
    databases/        : add new data here (gff, wiggle, lucene indices of data)
    htdocs/gbrowse/   : web files
    lib/              : GBrowse, Bioperl and java (lucene, jetty) library files
    etc/              : jetty.xml and webdefault.xml configuration
    logs/             : web logs
    start.jar         : jetty web server startup program
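
If the package does not start, a quick check that the unpacked folder matches the listing above can rule out an incomplete download. This is a sketch only; check_layout is a hypothetical helper, and the entry names are taken from the listing above.

```sh
#!/bin/sh
# Hypothetical helper: list the expected top-level entries that are missing
# from an unpacked package folder. Prints nothing if all are present.
check_layout() {
    root=$1
    for entry in bin cgi-bin conf/gbrowse.conf databases htdocs/gbrowse \
                 lib etc logs start.jar; do
        [ -e "$root/$entry" ] || echo "missing: $entry"
    done
}

# Check the folder given as the first argument, or ggb169 by default.
check_layout "${1:-ggb169}"
```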


Adding pre-made databases

You can add databases that others have built, such as this one for DrosMel genome + transcriptome data. Fetch the data files into the databases/ folder, then copy out the included gbrowse.conf file for this data set:

  rsync -auv rsync:// databases/
  #old# rsync -auv rsync://  databases/drosmel/
  cp databases/drosmel/drosmel5dg.conf conf/gbrowse.conf/

This is a largish data set (5 GB currently, of which about 4.5 GB is dense wiggle data). You can preview the data more quickly using "rsync {above} --exclude='*.wig'", then fetch the rest by rerunning without the --exclude.

There may be other genome data sets available, and/or updates over time, that you can install with rsync. To preview updates, add '-n' to the rsync command above (-n or --dry-run: only list the files that would be updated). Check at for notes.
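
The three ways of running rsync described above (dry run, without the wiggle files, and everything) can be sketched as one small helper that prints the command to run. RSYNC_URL is a placeholder, since the real address is not shown here, and rsync_cmd is a hypothetical name; only the rsync options themselves (-a, -u, -v, -n, --exclude) are standard.

```sh
#!/bin/sh
# RSYNC_URL is a placeholder; this README does not show the real address.
RSYNC_URL=${RSYNC_URL:-rsync://example.org/path/to/drosmel}

# Hypothetical helper: print the rsync command for one of three modes.
#   preview : dry run, only lists files that would be transferred (-n)
#   light   : fetch everything except the large *.wig files
#   full    : fetch everything
rsync_cmd() {
    case $1 in
        preview) echo "rsync -auv -n $RSYNC_URL databases/" ;;
        light)   echo "rsync -auv --exclude='*.wig' $RSYNC_URL databases/" ;;
        full)    echo "rsync -auv $RSYNC_URL databases/" ;;
        *)       echo "usage: rsync_cmd preview|light|full" >&2; return 1 ;;
    esac
}

rsync_cmd preview
rsync_cmd light
rsync_cmd full
```

Printing the command rather than running it lets you inspect it first; paste the line you want into your shell once RSYNC_URL is set to the real address.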

Adding your own data

  * add any GFF v.3 data using the GBrowse lucene adaptor, as below, or use a MySQL database.

Lucene is Java-based, and all the software it needs is included here (lucene version 2), provided Java version 1.4 or 1.5 is on your system. Lucene indices are platform independent, so you can share the full databases/xxx/ file set with other users. Lucene and MySQL databases are about equally fast.

  * use the wiggle2gff scripts to turn genome-wide high density data such
  as tile array signals into .wiggle data + locations in gff.

The wiggle data files created by wiggle[23] are not platform-independent. They depend on byte order (little- or big-endian). I believe they can be shared across operating systems with the same byte order. E.g. I used the same wiggle data files on Mac-Intel and Solaris-Intel systems, but could not share them between Mac-Intel and Mac-PPC.

  * add new feature types to conf/gbrowse.conf/{mygbrowse.conf} configuration
  and view.
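
Because wiggle data files depend on byte order, it can help to check the byte order of each machine before sharing them. This is a minimal sketch using the standard printf and od tools; byte_order is a hypothetical helper name. It writes the bytes 01 00 00 00 and reads them back as one 4-byte integer, which is 1 only on a little-endian machine.

```sh
#!/bin/sh
# Report this machine's byte order by reading the bytes 01 00 00 00
# back as one 4-byte integer: the value 1 means little-endian.
byte_order() {
    value=$(printf '\001\000\000\000' | od -An -td4 | tr -d ' \n')
    if [ "$value" = "1" ]; then echo little; else echo big; fi
}

byte_order
```

Intel machines report little; PPC machines report big.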

Sample Yeast Chr1

Create sample database using lucene adaptor

  perl -Ilib bin/ --java lib/java/ --create \
  --data databases/yeast/chr1  \
  --fasta htdocs/gbrowse/databases/yeast_chr1/*.fa \

By default lucene_bulk_load_gff also writes 'mygbrowse.conf'; use --conf newname.conf to choose another name. Edit this file to suit and copy it into conf/gbrowse.conf/ to view the results on the web.

  cp mygbrowse.conf conf/gbrowse.conf/testyeast.conf

The lucene adaptor allows multiple data indices in one configuration, so you can add new data without combining everything into one index. E.g. this lists 4 database directories of features on the same drosmel genome: genome, chrs, wig0, tf0

    db_adaptor    = Bio::DB::GFF
    db_args       = -adaptor lucene
       -dsn ../databases/drosmel/genome;chrs;wig0;tf0
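
The -dsn value above is the base folder plus the first index name, followed by the remaining index names separated by ';'. A small sketch that assembles it when you add another index; make_dsn is a hypothetical helper and needs at least one index name.

```sh
#!/bin/sh
# Hypothetical helper: build the -dsn value from a base folder and
# one or more index names.
make_dsn() {
    base=$1; shift
    dsn=$base/$1; shift
    for index in "$@"; do
        dsn="$dsn;$index"
    done
    echo "$dsn"
}

make_dsn ../databases/drosmel genome chrs wig0 tf0
# prints ../databases/drosmel/genome;chrs;wig0;tf0
```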

You can also add to an existing lucene index; remove the '--create' flag: --java lib/java/ --data mylucenedatabase new.gff

Adding Affymetrix wiggle data

This step and the next assume you have available the transcriptome expression array data for DrosMel modENCODE from See the script bin/ that uses these basic steps

  # 1. signal to .wiggle data file + main gff
  perl -Ilib bin/ -log -format affy -span 38 -base databases/drosmel/$dname \
      $sg/$gr*_${bw} > databases/drosmel/$dname-$gr.gff
  # 2. correct main .gff
  cat databases/drosmel/$dname-*.gff | \
  perl -ne's/^chr//; s,wigfile=databases,wigfile=../databases,; print if(/^\S/);' |\
  sort -k1,1 | uniq > databases/drosmel/$dname.gff
  # 3. load gff into lucene database
  perl -Ilib bin/ --java lib/java/ --create \
     --data databases/drosmel/$dname databases/drosmel/$dname.gff
  Note the mygbrowse.conf can be edited and put into
  a common conf/gbrowse.conf/drosmel.conf
  But see instead bin/ that may better write these signal feature configs

Adding Affymetrix transfrag data

  set tf=$sc/dmel5/modenc/38bp-arrays/transfrags/
  set tdir=bandwidth50_maxgap90_minrun90
  set dname=tf50_90_90
  unzip $tf/$
  mkdir databases/drosmel/$dname
  # 1. convert .bed to .gff
  perl -Ilib bin/ -asgff -format=affy -base=databases/drosmel/$dname $tdir/*.bed
  # 2. fix chromosome ref names to match flybase chr names (no 'chr')
  perl -pi -e's/^chr//;'  databases/drosmel/$dname/*.gff
  # 3. load gff into lucene database
  perl -Ilib bin/ --java lib/java/ --create \
      --data databases/drosmel/$dname databases/drosmel/$dname/*.gff
  See bin/ to write transfrag feature configs to conf/gbrowse.conf/drosmel.conf


  If the above URLs fail to show, read through the web logs in logs/
  1. If you have no web pages, likely jetty failed.
     See etc/jetty.readme.txt for details on this.
  2. If you see the /gbrowse/ static web pages, but not
  the cgi-bin/gbrowse/ pages, likely the perl libraries
  are incomplete and log messages can be helpful.
  See also the help documents in htdocs/gbrowse/
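
The two cases above can be told apart from the command line by fetching the static page and the CGI page and comparing their HTTP status codes. This is a rough sketch: diagnose and http_code are hypothetical helpers, the data set name in the example URL is a placeholder, and treating any code other than 200 as a failure is a simplification.

```sh
#!/bin/sh
# Hypothetical helper: map the HTTP status codes of the static page and the
# CGI page to the two cases described above.
diagnose() {
    static_code=$1
    cgi_code=$2
    if [ "$static_code" != "200" ]; then
        echo "no static pages: jetty probably failed, see etc/jetty.readme.txt"
    elif [ "$cgi_code" != "200" ]; then
        echo "static pages only: perl libraries probably incomplete, check logs/"
    else
        echo "ok"
    fi
}

# Fetch only the status code of a URL; curl prints 000 if it cannot connect.
http_code() {
    curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Example use against a locally running package:
#   diagnose "$(http_code http://localhost:8080/gbrowse/)" \
#            "$(http_code http://localhost:8080/gbrowse/cgi-bin/gbrowse/drosmel5dg/)"
diagnose 000 000
diagnose 200 500
```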

Install Options

  You may want to change web configurations in 
    etc/jetty.xml, etc/webdefault.xml
  such as port # 8080, or access controls.

Apache web config

You can run this package in another web server like Apache. Here is what I used, with appropriate paths to the package files.

  ## apache (version 1) configuration for port-based virtual host
  Listen 8091
  <VirtualHost _default_:8091>
  DocumentRoot /bio/argos/common/perl/gbrowsetest/htdocs
  # ServerName
  # ServerAdmin
  ScriptAlias /gbrowse/cgi-bin/ "/bio/argos/common/perl/gbrowsetest/cgi-bin/"
  Alias /gbrowse /bio/argos/common/perl/gbrowsetest/htdocs/gbrowse
  <Directory "/bio/argos/common/perl/gbrowsetest/">
    Options Indexes FollowSymLinks MultiViews
    Order Allow,Deny
    Allow from all
    Deny from  env=is_nasty env=is_robot
  </Directory>
  ## probably don't need such as this if Jetty package worked right
  # SetEnv PERL5LIB "/bio/argos/common/system-local/perl/lib/:/bio/argos/common/perl/lib/"
  </VirtualHost>


  Don Gilbert,
  2008 March
