This package has genome data sets from several projects and GBrowse software to view it in a complete, configured bundle.
Fetch and run:
    1. curl ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz | gtar -zxf -
    2. cd ggb169; java -jar start.jar &
    3. View at http://localhost:8080/gbrowse/
Add Daphnia pulex genome dataset
    rsync -L -auv rsync://eugenes.org/eugenes/gbrowse/databases/daphnia_pulex databases/
    cp databases/daphnia_pulex/daphnia_pulex.conf conf/gbrowse.conf/
    View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/daphnia_pulex/

Add DrosMel genome dataset
    rsync -L -auv rsync://eugenes.org/eugenes/gbrowse/databases/drosmel databases/
    cp databases/drosmel/drosmel5dg.conf conf/gbrowse.conf/
    View at http://localhost:8080/gbrowse/cgi-bin/gbrowse/drosmel5dg/
Ready to run GBrowse software package (see below):
ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz
Data sets for this package include:
ftp://eugenes.org/eugenes/gbrowse/databases/
    daphnia_pulex (daphnia genome data from wfleabase.org)
    nasonia (wasp gene predictions, homology, EST)
    tribcas (tribolium; basic gene set from NCBI genomes)
    drosmel : DrosMel rel 5.5 genome data and Affymetrix transcriptome
              databases (lucene and wiggle data files).
Both rsync and ftp can be used with the same URL paths. For notes and updates, see also
http://eugenes.org/gmod/genomeview-package2008/
This package uses a 'java' installation approach, where all needed software is packaged together in a way that doesn't require additional installs, and can be run in place.
There are caveats to this; in particular, the GD graphics library must be installed on your computer. You may need to add other compiled Perl libraries, depending on the version of your computer system.
If you run this on a Mac OSX 10.5-Intel or a Solaris10-Intel computer, this GD graphics library is included in the package, in folders
    lib/darwin-thread-multi-2level/auto/GD/
    lib/i86pc-solaris/auto/GD/
NOTE: This prebuilt library for Mac OSX 10.5-Intel does not work on Mac OS 10.4; there you will need to install the GD library from other packages. See http://gmod.org/ for tutorials on installing and using GBrowse in the recommended way, which generally involves installing packages in one unix (or mswin) system path (e.g. /usr/local/...)
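The prerequisites above (Java, Perl, and the Perl GD module) can be checked before starting the server. The following is a minimal sketch, not part of the package: the function names are made up for illustration, and it assumes it is run from inside the unpacked ggb169 folder so that -Ilib points at the bundled libraries, as in the perl commands elsewhere in this file.

```shell
# Succeed if a command is found on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Succeed if perl can load the named module, also searching ./lib
# (the package's bundled library folder).
have_perl_module() { perl -Ilib -M"$1" -e 1 >/dev/null 2>&1; }

# Print one line per missing prerequisite; return non-zero if any is missing.
check_prereqs() {
  rc=0
  have_cmd java || { echo "missing: java"; rc=1; }
  have_cmd perl || { echo "missing: perl"; rc=1; }
  have_perl_module GD || { echo "missing: Perl GD module"; rc=1; }
  return $rc
}
```

Calling check_prereqs prints nothing when all three are found.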
Given these prerequisites, the package installs and runs as follows:
1. Copy this package and unpack
    ftp ftp://eugenes.org/eugenes/gbrowse/ggb169-package.tgz .
    gtar -zxf ggb169-package.tgz   # creates folder ggb169
    cd ggb169
2. Start the Jetty embedded web server
    java -jar start.jar &
    (or double click start.jar on MacOSX to start)
3. View on your computer in a web browser
    http://localhost:8080/gbrowse/
    http://localhost:8080/gbrowse/cgi-bin/gbrowse/
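Step 3 can also be checked from the command line. This is a sketch under assumptions: the helper names are hypothetical, and the URL pattern for a data set is inferred from the examples in this file, where a configuration named drosmel5dg.conf is viewed at .../cgi-bin/gbrowse/drosmel5dg/.

```shell
# Base URL of the bundled Jetty server; port 8080 is the package default.
GBROWSE_BASE=${GBROWSE_BASE:-http://localhost:8080/gbrowse}

# Print the browser URL for a data set, named after its .conf file.
gbrowse_url() {
  echo "$GBROWSE_BASE/cgi-bin/gbrowse/$1/"
}

# Print the HTTP status code returned for a URL (curl prints 000 if it
# could not connect at all).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, http_status "$GBROWSE_BASE/" tests the static pages, and http_status "$(gbrowse_url drosmel5dg)" tests the cgi-bin side once that data set is installed.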
    bin/              : scripts to process genome data
    cgi-bin/          : gbrowse web programs
    conf/gbrowse.conf/: configurations for new data sets, features
    databases/        : add new data here (gff, wiggle, lucene indices of data)
    htdocs/gbrowse/   : web files
    lib/              : GBrowse, Bioperl and java (lucene, jetty) library files
    etc/              : jetty.xml and webdefault.xml configuration
    logs/             : web logs
    start.jar         : jetty web server startup program
You can add databases others have built, such as this one for DrosMel genome + transcriptome data. Fetch the data files into the databases/ folder, then copy the included gbrowse .conf file for this data set into conf/gbrowse.conf:
    rsync -auv rsync://eugenes.org/eugenes/gbrowse/databases/drosmel databases/
    #old# rsync -auv rsync://eugenes.org/eugenes/genomes/dmel5/gbrowse/ databases/drosmel/
    cp databases/drosmel/drosmel5dg.conf conf/gbrowse.conf
This is a largish data set (5 GB currently, about 4.5 GB of it as dense wiggle data). You can preview the data more quickly using ``rsync {above} --exclude='*.wig''', then add the rest by rerunning without the --exclude.
There may be other genome data sets available, and/or updates over time, that you can install with rsync. To view updates, use '-n' with the above rsync (-n or --dry-run : show only files to update). Check http://eugenes.org/gmod/genomeview-package2008/ for notes.
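The rsync variants described above (preview without wiggle files, dry run, full fetch) differ only in their options. As an illustrative sketch, a small helper can print each command for review before running it; rsync_cmd and DATA_URL are hypothetical names, while the URL and the -auv, -n and --exclude options are the ones given in this file.

```shell
# rsync source for this package's data sets.
DATA_URL=rsync://eugenes.org/eugenes/gbrowse/databases

# Print (do not run) the rsync command for one data set.
# Usage: rsync_cmd NAME [extra rsync options...]
rsync_cmd() {
  name=$1; shift
  opts="$*"
  echo "rsync -auv${opts:+ $opts} $DATA_URL/$name databases/"
}
```

For example, rsync_cmd drosmel -n prints the dry-run command, rsync_cmd drosmel "--exclude='*.wig'" prints the quick preview, and rsync_cmd drosmel prints the full fetch. Pipe the output to sh to run it.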
* add any GFF v.3 using GBrowse lucene adaptor, as below, or use a MySQL database.
Lucene is java-based, and all software is included here (lucene version 2) given a Java version of 1.4 or 1.5 on your system. Lucene indices are platform independent so you can share the full database/xxx/ file set with other users. Lucene and mysql databases are about equally fast.
* use the wiggle2gff scripts to turn genome-wide high density data such as tile array signals into .wiggle data + locations in gff.
The wiggle data files created by wiggle[23]gff3.pl are not platform-independent. They depend on byte-order (little/big-endian). I believe they can be shared across operating systems with the same byte-order. E.g. I used the same wiggle data files on Mac-Intel and Solaris-Intel systems, but not on Mac-Intel and Mac-PPC.
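To see which byte-order a given machine uses before sharing wiggle files, something like the following can be run on each system. It is a generic sketch (the function name is made up) that relies only on printf and od: it writes the two bytes 01 00 and looks at how od reads them as one 16-bit number.

```shell
# Print this machine's byte order: little-endian, big-endian, or unknown.
byte_order() {
  # Bytes 0x01 0x00 read as one 2-byte unit in host order:
  # 0001 on little-endian hosts, 0100 on big-endian hosts.
  word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
  case $word in
    0001) echo little-endian ;;
    0100) echo big-endian ;;
    *)    echo unknown ;;
  esac
}
```

Two systems that print the same answer should, by the reasoning above, be able to share the same wiggle data files.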
* add new feature types to conf/gbrowse.conf/{mygbrowse.conf} configuration and view.
Create sample database using lucene adaptor
    perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
      --data databases/yeast/chr1 \
      --fasta htdocs/gbrowse/databases/yeast_chr1/*.fa \
      htdocs/gbrowse/databases/yeast_chr1/*.gff
By default lucene_bulk_load_gff also writes 'mygbrowse.conf'; use --conf newname.conf to choose another name. Edit this file to suit and copy it into conf/gbrowse.conf/ to view results on the web.
cp mygbrowse.conf conf/gbrowse.conf/testyeast.conf
The lucene adaptor allows multiple data indices in configuration, so you can add new data without combining as one index. E.g. this lists 4 database directories of features on the same drosmel genome: genome, chrs, wig0, tf0
    db_adaptor = Bio::DB::GFF
    db_args = -adaptor lucene -dsn ../databases/drosmel/genome;chrs;wig0;tf0
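A missing index folder in such a list is easy to overlook. The sketch below checks that each listed index exists; it assumes, from the example above, that the names after the first ';' are sibling folders of the first one, and check_lucene_dsn is a hypothetical helper, not a package script. Pass the -dsn value as a path that is valid from your current directory.

```shell
# Check that every index folder named in a lucene -dsn value exists.
# Usage: check_lucene_dsn 'databases/drosmel/genome;chrs;wig0;tf0'
check_lucene_dsn() {
  dsn=$1
  base=${dsn%/*}     # folder part, e.g. databases/drosmel
  names=${dsn##*/}   # index list, e.g. genome;chrs;wig0;tf0
  rc=0
  old_ifs=$IFS; IFS=';'
  for n in $names; do
    [ -d "$base/$n" ] || { echo "missing index: $base/$n"; rc=1; }
  done
  IFS=$old_ifs
  return $rc
}
```

The function prints nothing and returns 0 when all index folders are present.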
You can also add to an existing lucene index; remove the '--create' flag:
    lucene_bulk_load_gff.pl --java lib/java/ --data mylucenedatabase new.gff
This step and the next assume you have available the transcriptome expression array data for DrosMel modENCODE from transcriptome.affymetrix.com. See the script bin/affysig2wig.sh, which uses these basic steps:
    # 1. signal to .wiggle data file + main gff
    perl -Ilib bin/wiggle3gff3.pl -log -format affy -span 38 -base databases/drosmel/$dname \
      $sg/$gr*_${bw}.sig.gr > databases/drosmel/$dname-$gr.gff

    # 2. correct main .gff
    cat databases/drosmel/$dname-*.gff | \
      perl -ne's/^chr//; s,wigfile=databases,wigfile=../databases,; print if(/^\S/);' |\
      sort -k1,1 | uniq > databases/drosmel/$dname.gff

    # 3. load gff into lucene database
    perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
      --data databases/drosmel/$dname databases/drosmel/$dname.gff
Note the mygbrowse.conf can be edited and put into a common conf/gbrowse.conf/drosmel.conf. But see instead bin/gff2conf.pl, which may write these signal feature configs better.
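The "correct main .gff" step in the signal conversion above can be tried on a small input before running it on real data. The sketch below is a sed/grep equivalent of that perl one-liner, swapped in only for illustration (it is not the package's script): it strips a leading 'chr' from reference names, rewrites the wigfile= path to ../databases, drops lines that start with whitespace or are empty, then sorts and removes duplicates. The function name is hypothetical.

```shell
# Read GFF on stdin, write the corrected GFF on stdout.
fix_wig_gff() {
  sed -e 's/^chr//' -e 's,wigfile=databases,wigfile=../databases,' |
    grep '^[^[:space:]]' |
    sort -k1,1 | uniq
}
```

For example, with a made-up feature line (fields other than wigfile= are invented for this test):

    printf 'chr2L\taffy\tsignal\t1\t38\t.\t.\t.\twigfile=databases/drosmel/x.wig\n' | fix_wig_gff

the output line should begin with 2L and point at wigfile=../databases/drosmel/x.wig.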
    set tf=$sc/dmel5/modenc/38bp-arrays/transfrags/
    set tdir=bandwidth50_maxgap90_minrun90
    set dname=tf50_90_90

    unzip $tf/$tdir.zip
    mkdir databases/drosmel/$dname

    # 1. convert .bed to .gff
    perl -Ilib bin/wiggle3gff3.pl -asgff -format=affy -base=databases/drosmel/$dname $tdir/*.bed

    # 2. fix chromosome ref names to match flybase chr names (no 'chr')
    perl -pi -e's/^chr//;' databases/drosmel/$dname/*.gff

    # 3. load gff into lucene database
    perl -Ilib bin/lucene_bulk_load_gff.pl --java lib/java/ --create \
      --data databases/drosmel/$dname databases/drosmel/$dname/*.gff
See bin/gff2conf.pl to write transfrag feature configs to conf/gbrowse.conf/drosmel.conf
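Before loading transfrag GFF into lucene, it can help to confirm that the chromosome-name fix took effect, since reference names that still start with 'chr' will not match the flybase names. A minimal check, with a hypothetical function name:

```shell
# Count GFF lines whose reference name still starts with 'chr'.
# Usage: count_chr_refs file.gff [more.gff ...]
count_chr_refs() {
  # grep -c exits non-zero when the count is 0; '|| true' keeps that
  # from looking like an error to the caller.
  cat "$@" | grep -c '^chr' || true
}
```

A result of 0 for databases/drosmel/$dname/*.gff means every reference name has been converted.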
If the above URLs fail to show, read through the web logs in logs/*.stderrout.log
1. If you have no web pages, likely jetty failed. See etc/jetty.readme.txt for details on this.
2. If you see the /gbrowse/ static web pages, but not the cgi-bin/gbrowse/ pages, likely the perl libraries are incomplete and log messages can be helpful. See also the help documents in htdocs/gbrowse/
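For the second case, Perl reports a module it cannot load with a message of the form "Can't locate GD.pm in @INC ...". Assuming such messages from the cgi programs end up in the logs named above, the following sketch lists each missing module once; the function name is made up for illustration.

```shell
# List Perl modules reported as missing in the web logs.
# Usage: missing_perl_modules [logdir]   (default: logs)
missing_perl_modules() {
  logdir=${1:-logs}
  sed -n "s/.*Can't locate \([^ ]*\) in @INC.*/\1/p" \
    "$logdir"/*.stderrout.log | sort -u
}
```

An empty result suggests the problem lies elsewhere, and the logs are worth reading in full.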
You may want to change web configurations in etc/jetty.xml and etc/webdefault.xml, such as the port number (8080) or access controls.
You can run this package in another web server like Apache. Here is what I used, with appropriate paths to the package files.
    ## apache (version 1) configuration for port-based virtual host
    Listen 8091
    <VirtualHost _default_:8091>
      DocumentRoot /bio/argos/common/perl/gbrowsetest/htdocs
      # ServerName insects.eugenes.org
      # ServerAdmin eugenes@eugenes.org

      ScriptAlias /gbrowse/cgi-bin/ "/bio/argos/common/perl/gbrowsetest/cgi-bin/"

      Alias /gbrowse /bio/argos/common/perl/gbrowsetest/htdocs/gbrowse
      <Directory "/bio/argos/common/perl/gbrowsetest/">
        Options Indexes FollowSymLinks MultiViews
        Order Allow,Deny
        Allow from all
        Deny from env=is_nasty env=is_robot
      </Directory>

      ## probably dont need such as this if Jetty package worked right
      # SetEnv PERL5LIB "/bio/argos/common/system-local/perl/lib/:/bio/argos/common/perl/lib/"

    </VirtualHost>
Don Gilbert, gilbertd@indiana.edu 2008 March