
About GFF files

The .gff and .fasta files located in the files download area of the GMOD web site contain feature and DNA information for the model organisms Drosophila, C. elegans, and yeast. They are designed to be loaded into the Generic Genome Browser (GBrowse) for browsing. You can think of them as a starter kit for your own genome browser.

These files are *not* necessarily kept up to date, but are imported from the model organism databases at irregular intervals. You are strongly advised to generate your own versions of these files if you want the most current data.

To assist in updating, the GBrowse distribution comes with several scripts for converting the data downloaded from the model organism databases into .gff format. These are:

    Import C. elegans annotations from WormBase
    Import S. cerevisiae annotations from SGD
    Import D. melanogaster annotations from FlyBase

Here is a brief description of the process for importing these files:

  1. WormBase

    The GFF files distributed at WormBase are actually usable as is. The script adds some useful information to the GFF files, most notably the positions of genetically mapped genes. However, you will need the Ace module (available at ) to use it.


    Go to and download the current.gff3.gz file that you find there. Put it into a local directory named "wormbase_orig".

    While you're there, go to and download the current.dna.fa.gz file that you find there. Put it into wormbase_orig too.


    Create a new directory called "wormbase_new".


    Convert the WormBase GFF files into gbrowse GFF files:

   wormbase_orig > wormbase_new/wormbase.gff

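    Move the DNA files to wormbase_new. The command below matches only uncompressed .fa files, so gunzip the download first if it is still compressed: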

            mv wormbase_orig/*.fa wormbase_new

    Load everything -- see the GBrowse instructions for how this works:

            -d elegans -f wormbase_new wormbase_new/wormbase.gff

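As a rough sketch of the file staging in the WormBase steps above, something like the following could be used. Downloading and conversion are left out, since the source URLs and the script name are not given here. The function name stage_wormbase is hypothetical, and the gunzip step is an assumption based on the .gz names of the downloads.

```shell
# Hypothetical helper: prepare the two WormBase working directories.
# $1 = directory holding the downloads, $2 = directory for loadable files.
stage_wormbase() {
    orig="$1"
    new="$2"
    mkdir -p "$orig" "$new"

    # Uncompress any gzipped downloads in place (skipped if there are none).
    for f in "$orig"/*.gz; do
        [ -e "$f" ] || continue
        gunzip -f "$f"
    done

    # Move the uncompressed FASTA files into the new directory.
    for f in "$orig"/*.fa; do
        [ -e "$f" ] || continue
        mv "$f" "$new"/
    done
}

stage_wormbase wormbase_orig wormbase_new
```

The GFF file stays in wormbase_orig, which is where the conversion step expects to find it.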
  2. FlyBase

    The FlyBase files are maintained in a Berkeley database called GadFly. They must be processed before they can be used in gbrowse.


    Go to and download the files named RELEASEXXgff.2L.tar.gz, RELEASEXXgff.3L.tar.gz and so on, where XX corresponds to the latest release. These are annotation files.

    Go to and get the file na_arms.dros.RELEASEXX.Z. This contains the sequence in FASTA format. Make sure to use the same release number as the annotation files!
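One way to guard against mixing releases is to compare the release numbers embedded in the two file names. This is only a sketch: the helper name same_release is made up, and it assumes XX is a plain decimal number in file names of the form shown above.

```shell
# Hypothetical helper: succeed only if both file names carry the same
# release number.
# $1 = annotation archive name, e.g. RELEASE2gff.2L.tar.gz
# $2 = sequence file name,      e.g. na_arms.dros.RELEASE2.Z
same_release() {
    # sed -n ... p prints the captured number only when the name matches.
    ann=$(printf '%s\n' "$1" | sed -n 's/^RELEASE\([0-9][0-9]*\)gff\..*$/\1/p')
    seq=$(printf '%s\n' "$2" | sed -n 's/^na_arms\.dros\.RELEASE\([0-9][0-9]*\)\.Z$/\1/p')
    [ -n "$ann" ] && [ "$ann" = "$seq" ]
}
```

For example, `same_release RELEASE2gff.2L.tar.gz na_arms.dros.RELEASE2.Z || echo "release mismatch"` prints nothing when the two names agree.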


    Unpack the annotation files to yield a directory named after the release, e.g. RELEASE2, containing a directory named after the chromosome arm. Do this for each archive, so that the release directory ends up containing one subdirectory per chromosome arm (for example RELEASE2/2L, RELEASE2/3L, and so on).
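The repeated unpacking can be done in a loop. The following is a minimal sketch, assuming the archives sit in the current directory and follow the RELEASEXXgff.<arm>.tar.gz naming described above. The function name unpack_release is hypothetical.

```shell
# Hypothetical helper: unpack every per-arm archive of one release.
# $1 = release prefix, e.g. RELEASE2
unpack_release() {
    release="$1"
    for archive in "$release"gff.*.tar.gz; do
        # Skip the unexpanded pattern when no archive matches.
        [ -e "$archive" ] || continue
        # Decompress to stdout and let tar extract from stdin.
        gzip -dc "$archive" | tar -xf -
    done
}
```

Calling `unpack_release RELEASE2` should leave one subdirectory per chromosome arm, provided each archive stores its files under the release directory as described.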


    Run the script to convert into GBrowse GFF format:

            ./RELEASE2 > fly.gff

    Run the following script to put the fly FASTA files into a loadable format:

       uncompress -c na_arms.dros.RELEASEXX.Z  | \
            perl -pe 's/^>Chromosome_arm_(\S+)/>$1/' > fly.fa
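
To illustrate what the header rewrite is meant to achieve, here is a small sketch that uses sed instead of perl, purely for demonstration. It assumes the FASTA headers look like ">Chromosome_arm_2L", which is inferred from the pattern in the command above. The function name rename_fly_headers and the sample record are made up.

```shell
# Hypothetical helper: strip the "Chromosome_arm_" prefix from FASTA headers
# so that each sequence is named after its arm (2L, 3L, ...).
rename_fly_headers() {
    sed 's/^>Chromosome_arm_\([^[:space:]][^[:space:]]*\)/>\1/'
}

# Demo on a made-up two-line record; sequence lines pass through unchanged.
printf '>Chromosome_arm_2L\nACGT\n' | rename_fly_headers
# prints:
# >2L
# ACGT
```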

    Run the GFF loader:

            -d fly -f fly.fa fly.gff

  3. SGD (yeast)

    Go to and download the files

    Go to and download all the .fsa files.


    Run the script to create a loadable GFF file, redirecting its output to yeast.gff.

    Run the following script to put the FASTA files into a loadable format:

       perl -pe 's/>.+chromosome=(\w+)/>$1/' *.fsa > yeast.fa
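
As an illustration of the intended effect, the sketch below uses sed on a made-up header. It assumes each .fsa header contains a "chromosome=NAME" field, as the pattern above implies. The sample header and the function name rename_yeast_headers are hypothetical. Unlike the perl command, this variant also drops whatever follows the chromosome name on the header line.

```shell
# Hypothetical helper: reduce each FASTA header to ">NAME", where NAME is
# taken from a "chromosome=NAME" field in the original header.
rename_yeast_headers() {
    sed 's/^>.*chromosome=\([A-Za-z0-9_][A-Za-z0-9_]*\).*$/>\1/'
}

# Demo on a made-up record; sequence lines pass through unchanged.
printf '>ref|X| [org=Saccharomyces cerevisiae] [chromosome=IV]\nACGT\n' |
    rename_yeast_headers
# prints:
# >IV
# ACGT
```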

    Run the GFF loader:

            -d yeast -f yeast.fa yeast.gff


File formats and paths change all the time. These recipes worked as of 11/07/02, but are not guaranteed for the future!
