

Genome viewing package

ABOUT

This package bundles genome data sets from several projects with the GBrowse software needed to view them, complete and already configured.



SYNOPSIS

  Fetch and run:
  1.  curl ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz | gtar -zxf -
  2.  cd ggb169; java -jar start.jar &
  3.  View at http://localhost:8080/gbrowse/

  Add Daphnia pulex genome dataset
    rsync -L -auv rsync://eugenes.org/eugenes/gbrowse/databases/daphnia_pulex databases/
    cp databases/daphnia_pulex/daphnia_pulex.conf  conf/gbrowse.conf/
  View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/daphnia_pulex/

  Add DrosMel genome dataset
    rsync -L -auv rsync://eugenes.org/eugenes/gbrowse/databases/drosmel databases/
    cp databases/drosmel/drosmel5dg.conf  conf/gbrowse.conf/
  View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/drosmel5dg/ 


PACKAGE URLS

Ready to run GBrowse software package (see below):

    ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz

Data sets for this package include:

    ftp://eugenes.org/eugenes/gbrowse/databases/ 
    daphnia_pulex (daphnia genome data from wfleabase.org)
    nasonia  (wasp gene predictions, homology, EST)
    tribcas (tribolium; basic gene set from NCBI genomes)
    drosmel : DrosMel rel 5.5 genome data and Affymetrix transcriptome databases
     (lucene and wiggle data files).

Both rsync and ftp can be used with the same URL paths. For notes and updates, see also

  http://eugenes.org/gmod/genomeview-package2008/


REQUIREMENTS

This package uses a 'java' installation approach, where all needed software is packaged together in a way that doesn't require additional installs, and can be run in place.

There are caveats to this. In particular, the GD graphics library must be installed on your computer. You may also need to add other compiled Perl libraries, depending on the version of your computer system.

If you run this on a Mac OSX 10.5-Intel or a Solaris10-Intel computer, the GD graphics library is included in the package, in the folders lib/darwin-thread-multi-2level/auto/GD/ and lib/i86pc-solaris/auto/GD/

NOTE: The prebuilt GD library for Mac OSX 10.5-Intel does not work on Mac OS 10.4; there you will need to install the GD library from other packages.

See http://gmod.org/ for tutorials on installing and using GBrowse in the recommended way, which generally involves installing packages in one unix (or mswin) system path (e.g. /usr/local/...)


INSTALL

Given the prerequisites above, install and run this package as follows:

  1. Copy this package and unpack
    ftp  ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz .
    gtar -zxf ggb169-package.tgz  # creates folder ggb169
    cd ggb169
  2. Start the Jetty embedded web server
    java -jar start.jar &
    (or double click start.jar on MacOSX to start)
  3. View on your computer in a web browser
    http://localhost:8080/gbrowse/
    http://localhost:8080/gbrowse/cgi-bin/gbrowse/
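
Jetty can take a few seconds to start, so the URLs in step 3 may fail at first. The following is a sketch of one way to wait for the server from the command line. It assumes curl is installed; the function name wait_for_url and the default of 30 tries are choices made for this example, not part of the package.

```shell
# wait_for_url URL [TRIES]: poll URL about once per second until it answers.
# Returns 0 as soon as the URL responds, 1 if it never does.
wait_for_url() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # -s: quiet, -f: treat HTTP errors (4xx/5xx) as failure,
    # --max-time 5: give up on a single request after 5 seconds
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage, after "java -jar start.jar &":
#   wait_for_url http://localhost:8080/gbrowse/ && echo "GBrowse is up"
```

If the function gives up, the web logs described under PROBLEMS AND OPTIONS are the next place to look.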

Current contents

    bin/              : scripts to process genome data
    cgi-bin/          : gbrowse web programs
    conf/gbrowse.conf/: configurations for new data sets, features
    databases/        : add new data here (gff, wiggle, lucene indices of data)
    htdocs/gbrowse/   : web files
    lib/              : GBrowse, Bioperl and java (lucene, jetty) library files
    etc/              : jetty.xml and webdefault.xml configuration
    logs/             : web logs
    start.jar         : jetty web server startup program


ADDING GENOME DATA

Adding pre-made databases

You can add databases that others have built, such as this one for DrosMel genome + transcriptome data. Fetch the data files into the databases/ folder, then copy the gbrowse configuration file included with this data set into conf/gbrowse.conf/:

  rsync -auv rsync://eugenes.org/eugenes/gbrowse/databases/drosmel databases/
  #old# rsync -auv rsync://eugenes.org/eugenes/genomes/dmel5/gbrowse/  databases/drosmel/
  cp databases/drosmel/drosmel5dg.conf conf/gbrowse.conf/

This is a largish data set (5 GB currently, of which about 4.5 GB is dense wiggle data). You can preview the data more quickly by first running the rsync above with --exclude='*.wig' added, then fetch everything by running it again without the --exclude.

There may be other genome data sets available, and/or updates over time, that you can install with rsync. To see what would be updated, add '-n' to the above rsync (-n or --dry-run: only show the files to update). Check http://eugenes.org/gmod/genomeview-package2008/ for notes.
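
The rsync variants described above (dry run, preview without wiggle files, full fetch) can be wrapped in one small helper. This is a sketch only: the names fetch_dataset, RSYNC and BASE are made up for this example. The rsync options themselves (-a, -u, -v, -n, --exclude) are standard.

```shell
# RSYNC can be overridden (e.g. RSYNC=echo) to print the arguments
# instead of contacting the server.
RSYNC=${RSYNC:-rsync}
BASE=rsync://eugenes.org/eugenes/gbrowse/databases

# fetch_dataset NAME [extra rsync options...]
# Copies (or updates) one data set into the local databases/ folder.
fetch_dataset() {
  name=$1
  shift
  "$RSYNC" -auv "$@" "$BASE/$name" databases/
}

# 1. List what would be transferred, without copying anything:
#      fetch_dataset drosmel -n
# 2. Fetch everything except the large wiggle files:
#      fetch_dataset drosmel --exclude='*.wig'
# 3. Fetch the rest later:
#      fetch_dataset drosmel
```

Because rsync only transfers what is missing or changed, step 3 does not copy again the files already fetched in step 2.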

Adding your own data

  * add any GFF v.3 data using the GBrowse lucene adaptor, as below, or use a MySQL database.

Lucene is java-based, and all of its software is included here (lucene version 2); it needs Java version 1.4 or 1.5 on your system. Lucene indices are platform independent, so you can share the full databases/xxx/ file set with other users. Lucene and MySQL databases are about equally fast.

  * use the wiggle2gff scripts to turn genome-wide high density data such
  as tile array signals into .wiggle data + locations in gff.

The wiggle data files created by wiggle[23]gff3.pl are not platform-independent. They depend on byte order (little/big-endian). I believe they can be shared across operating systems with the same byte order. E.g. I used the same wiggle data files on Mac-Intel and Solaris-Intel systems, but could not share them between Mac-Intel and Mac-PPC.

  * add new feature types to conf/gbrowse.conf/{mygbrowse.conf} configuration
  and view.

Sample Yeast Chr1

Create sample database using lucene adaptor

  perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
  --data databases/yeast/chr1  \
  --fasta htdocs/gbrowse/databases/yeast_chr1/*.fa \
  htdocs/gbrowse/databases/yeast_chr1/*.gff

By default lucene_bulk_load_gff.pl also writes a configuration file named 'mygbrowse.conf'; use --conf newname.conf to choose another name. Edit this file to suit, then copy it into conf/gbrowse.conf/ to view the results on the web.

  cp mygbrowse.conf conf/gbrowse.conf/testyeast.conf

The lucene adaptor allows multiple data indices in configuration, so you can add new data without combining as one index. E.g. this lists 4 database directories of features on the same drosmel genome: genome, chrs, wig0, tf0

    db_adaptor    = Bio::DB::GFF
    db_args       = -adaptor lucene
       -dsn ../databases/drosmel/genome;chrs;wig0;tf0

You can also add to an existing lucene index by removing the '--create' flag:

  lucene_bulk_load_gff.pl --java lib/java/ --data mylucenedatabase new.gff

Adding Affymetrix wiggle data

This step and the next assume you have available the transcriptome expression array data for DrosMel modENCODE from transcriptome.affymetrix.com. See the script bin/affysig2wig.sh, which uses these basic steps:

  # 1. signal to .wiggle data file + main gff
  perl -Ilib bin/wiggle3gff3.pl -log -format affy -span 38 -base databases/drosmel/$dname \
      $sg/$gr*_${bw}.sig.gr > databases/drosmel/$dname-$gr.gff
  # 2. correct main .gff
  cat databases/drosmel/$dname-*.gff | \
  perl -ne's/^chr//; s,wigfile=databases,wigfile=../databases,; print if(/^\S/);' |\
  sort -k1,1 | uniq > databases/drosmel/$dname.gff
  # 3. load gff into lucene database
  perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
     --data databases/drosmel/$dname databases/drosmel/$dname.gff
  # Note: the generated mygbrowse.conf can be edited and merged into
  # a common conf/gbrowse.conf/drosmel.conf, but see instead
  # bin/gff2conf.pl, which may write these signal feature configs better.

Adding Affymetrix transfrag data

  set tf=$sc/dmel5/modenc/38bp-arrays/transfrags/
  set tdir=bandwidth50_maxgap90_minrun90
  set dname=tf50_90_90
  unzip $tf/$tdir.zip
  mkdir databases/drosmel/$dname
  # 1. convert .bed to .gff
  perl -Ilib bin/wiggle3gff3.pl -asgff -format=affy -base=databases/drosmel/$dname $tdir/*.bed
  # 2. fix chromosome ref names to match flybase chr names (no 'chr')
  perl -pi -e's/^chr//;'  databases/drosmel/$dname/*.gff
  # 3. load gff into lucene database
  perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
      --data databases/drosmel/$dname databases/drosmel/$dname/*.gff
  # See bin/gff2conf.pl to write transfrag feature configs to conf/gbrowse.conf/drosmel.conf


PROBLEMS AND OPTIONS

  If the above URLs fail to show, read through the web logs
  logs/*.stderrout.log
  1. If you have no web pages, likely jetty failed.
     See etc/jetty.readme.txt for details on this.
  2. If you see the /gbrowse/ static web pages, but not
     the cgi-bin/gbrowse/ pages, likely the perl libraries
     are incomplete and log messages can be helpful.
  See also the help documents in htdocs/gbrowse/
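
For case 2, Perl reports a missing library with a message of the form "Can't locate Some/Module.pm in @INC ...". The sketch below pulls the module names out of the web logs. It assumes these messages end up in logs/*.stderrout.log; the function name missing_perl_modules is made up for this example.

```shell
# missing_perl_modules LOGDIR: list Perl modules that the logs report as missing.
# Converts file-style names (Bio/Graphics) to module names (Bio::Graphics)
# and prints each module once.
missing_perl_modules() {
  cat "$1"/*.stderrout.log 2>/dev/null |
    sed -n "s/.*Can't locate \([^ ]*\)\.pm in @INC.*/\1/p" |
    sed 's,/,::,g' |
    sort -u
}

missing_perl_modules logs
```

An empty result means no such message was found; the logs may still contain other errors worth reading.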

Install Options

  You may want to change web configurations in 
    etc/jetty.xml, etc/webdefault.xml
  such as the port number (8080) or access controls.

Apache web config

You can run this package in another web server like Apache. Here is what I used, with appropriate paths to the package files.

  ## apache (version 1) configuration for port-based virtual host
  Listen 8091
  <VirtualHost _default_:8091>
  DocumentRoot /bio/argos/common/perl/gbrowsetest/htdocs
  # ServerName   insects.eugenes.org
  # ServerAdmin  eugenes@eugenes.org
  ScriptAlias /gbrowse/cgi-bin/ "/bio/argos/common/perl/gbrowsetest/cgi-bin/"
  Alias /gbrowse /bio/argos/common/perl/gbrowsetest/htdocs/gbrowse
  <Directory "/bio/argos/common/perl/gbrowsetest/">
    Options Indexes FollowSymLinks MultiViews
    Order Allow,Deny
    Allow from all
    Deny from  env=is_nasty env=is_robot
  </Directory>
  ## you probably don't need something like this if the Jetty package worked right
  # SetEnv PERL5LIB "/bio/argos/common/system-local/perl/lib/:/bio/argos/common/perl/lib/"
  </VirtualHost>


AUTHOR

  Don Gilbert, gilbertd@indiana.edu
  2008 March
