
FlyBase Detailed Reference for Data Fields

See also this short table of all fields
Field key and label
Description: -- detailed explanation of field data
Brief help: -- short help as shown in reports
Data classes: -- which FlyBase data sections the field is used in
Star code: -- old field key
Java class: -- name of Java class that handles this field

ARGS    Annotated ref. sequence
Description: Thumbnail sketch of gene structure and the DNA sequence accession that was used as the reference sequence. A click on the graphic brings up the detailed annotated reference sequence. Reference sequences are based on sequence generated by the Drosophila Genome Projects, where possible. When no genome project sequence is available, the reference sequence is based on published genomic sequence information. Experimental evidence from published literature and from the DNA databanks (DDBJ/EMBL/GenBank) is placed on the reference sequence. Features shown on the annotated reference gene sequence include exons, mRNA, coding sequence, transposon insertion sites, aberration breakpoints, rescue constructs, mutations, and regulatory elements.
Brief help: none
Data classes: Gene
Java class: meow.genes.AnnoRefSeq (field)    Inherits from:

CEL    Cellular location
Description: This field, which uses a controlled vocabulary, describes the subcellular localization of the gene product, e.g., nucleus, mitochondrion, plasma membrane.

Brief help: none
Data classes: Gene, HUgn, MGgn, SGgn
Java class: meow.genes.CellLocation (field)    Inherits from:

CHR    Chromosome
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.Chromosome (field)    Inherits from:

CLA    Class of gene
Description: This field holds the class of the genetic element, if other than a gene (e.g., mitochondrial gene, transposon, non-coding RNA gene), using a controlled vocabulary. Allowed values are:
Brief help: none
Data classes: CEgn, Gene, HUgn, SGgn
Java class: meow.genes.GeneClass (field)    Inherits from:

DBA    DNA/RNA accessions
Description: In these fields FlyBase stores pointers to nucleic acid sequence data, usually in the form of EMBL/GenBank/DDBJ/NCBI accession (AC) numbers. If a sequence has been published but is not yet in one of these data banks, a brief journal reference is given instead (the full reference will be found in References). FlyBase receives data from the three nucleic acid sequence databases daily.

FlyBase is also cross-referenced to a number of other sequence databases. These are stored in the *g line (if nucleic acid) or *m line (if protein). Database codes (external-databases.txt) and database versions (versions.txt) are listed in the Allied-data/External-databases section. The EMBL/NCBI/DDBJ sequence accession numbers have no code prefix.
Syntax: *g <database_code/>accession_number
e.g., *g X12345
*g EPD/23023

If the nucleic acid sequence accession includes coding regions, then each coding region has a unique PID number. These are appended to the nucleic acid sequence accession number, following a semicolon, e.g.,
*g U42989; g1150983

Note that a sequence record may carry more than one PID, for two reasons. First, the EBI and NCBI often assign PID numbers independently to the same object; second, a single gene may have more than one protein product (as the result, for example, of alternative splicing).
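The *g syntax above can be illustrated with a small parser. This is a minimal sketch, not the code behind meow.genes.DBAccessions; the names parse_g_line and SeqAccession are hypothetical, and it assumes one accession per *g line with any PIDs following after semicolons. It does not handle the brief journal references used for sequences not yet in the data banks.

```python
from typing import List, NamedTuple, Optional


class SeqAccession(NamedTuple):
    database: Optional[str]  # None means EMBL/GenBank/DDBJ (no code prefix)
    accession: str
    pids: List[str]          # protein IDs appended after semicolons, if any


def parse_g_line(line: str) -> SeqAccession:
    """Parse one '*g' line, e.g. '*g U42989; g1150983' (hypothetical helper)."""
    if not line.startswith("*g "):
        raise ValueError(f"not a *g line: {line!r}")
    # The first semicolon-separated part is the accession; the rest are PIDs.
    parts = [p.strip() for p in line[3:].split(";")]
    ref, pids = parts[0], [p for p in parts[1:] if p]
    # A '/' separates an optional database code from the accession number.
    if "/" in ref:
        database, accession = ref.split("/", 1)
    else:
        database, accession = None, ref
    return SeqAccession(database, accession, pids)
```

For example, parse_g_line("*g U42989; g1150983") yields no database code, the accession U42989 and the single PID g1150983, while "*g EPD/23023" yields the code EPD.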

Brief help: none
Data classes: HUgn, MGgn, ZFgn
Java class: meow.genes.DBAccessions (field)    Inherits from:

DBL    Database accessions
Description: none
Brief help: none
Data classes: CEgn, Gene, HUgn, MGgn, ZFgn
Java class: meow.genes.DatabaseAcc (field)    Inherits from:

DID    Ref. Database
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.DatabaseID (field)    Inherits from:

DT    Date
Description: Dating of records and updates. All gene records have two date fields. The first, 'Date entered', is the date a gene record was entered into the Sybase tables. The second, 'Last updated', is the date the record was last updated. When a record is first entered, the two dates are the same. The 'zero' date of all records then extant was 16 May 1994. FlyBase dates are represented as dd mmm yy, where dd is the day, mmm is the three-letter abbreviation of the month, and yy is the last two digits of the year (e.g., 01 Jul 94).
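The date format can be read and written with Python's standard datetime module. This is a sketch under stated assumptions: month names are English abbreviations (as in the default C locale), two-digit years follow Python's own pivot (69-99 become 1969-1999, 00-68 become 2000-2068), and the helper dates_consistent assumes the invariant that no record predates the zero date and that 'Last updated' is never earlier than 'Date entered'.

```python
from datetime import date, datetime

# 'Zero' date assigned to all records extant when dating began.
ZERO_DATE = date(1994, 5, 16)


def parse_flybase_date(text: str) -> date:
    """Parse a FlyBase date such as '01 Jul 94' (dd mmm yy)."""
    # %b expects English month abbreviations (true in the default C locale).
    # %y follows Python's rule: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    return datetime.strptime(text.strip(), "%d %b %y").date()


def format_flybase_date(value: date) -> str:
    """Format a date back into the dd mmm yy form."""
    return value.strftime("%d %b %y")


def dates_consistent(entered: str, last_updated: str) -> bool:
    """Check ZERO_DATE <= 'Date entered' <= 'Last updated' (assumed invariant)."""
    first, last = parse_flybase_date(entered), parse_flybase_date(last_updated)
    return ZERO_DATE <= first <= last
```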

Brief help: none
Data classes: Gene
Java class: meow.genes.Dated (field)    Inherits from:

ENZ    Function
Description: If a gene product is an enzyme, the enzyme name and the corresponding Enzyme Commission number (EC number) are given in the *F field. Enzyme names and EC numbers are taken from the ENZYME database (Bairoch, 1996, Nucleic Acids Research 24: 221-222). If an enzyme is not included in this database, the most appropriate name is used. A gene may have more than one *F field if its product catalyzes more than one reaction (an obvious example is rudimentary).
Syntax: *F enzyme_name == EC_number
e.g. *F serine-protease == EC

Controlled terms may be modified by free text following the \ symbol, e.g.:
*F enzyme_name <== EC_number> \?
*F enzyme_name <== EC_number> \like
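The *F syntax, including the optional EC number and the free-text modifier after the \ symbol, can be split apart as below. This is a minimal sketch, not the code behind meow.genes.Enzyme; parse_f_line and EnzymeEntry are hypothetical names, and the alcohol dehydrogenase line used in the check is only an illustrative input.

```python
from typing import NamedTuple, Optional


class EnzymeEntry(NamedTuple):
    name: str
    ec_number: Optional[str]  # absent when no EC number is given
    qualifier: Optional[str]  # free text after '\', e.g. '?' or 'like'


def parse_f_line(line: str) -> EnzymeEntry:
    """Split a '*F' line into name, EC number and qualifier (hypothetical helper)."""
    if not line.startswith("*F "):
        raise ValueError(f"not a *F line: {line!r}")
    body, qualifier = line[3:], None
    # Free text after the backslash modifies the controlled term.
    if "\\" in body:
        body, rest = body.split("\\", 1)
        qualifier = rest.strip()
    # '==' separates the enzyme name from the optional EC number.
    name, sep, ec = body.partition("==")
    return EnzymeEntry(name.strip(), ec.strip() if sep else None, qualifier)
```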

Brief help: none
Data classes: CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.Enzyme (field)    Inherits from:

FNC    Process
Description: The terms used in this field are from a controlled vocabulary which aims to be descriptive of the function(s) of gene products. See Reference manual section B.8 for details on gene function. Controlled terms may be modified by free text following the \ symbol, e.g.:
*d controlled_term \?
*d controlled_term \like

Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn
Java class: meow.genes.GeneFunction (field)    Inherits from:

GENR    Gene Record
Description: Gene record, contains all the fields and subrecords needed to describe the gene.
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.GeneStore (record)    Inherits from:

GPD    Gene product
Description: none
Brief help: none
Data classes: Gene
Java class: meow.genes.Product (field)    Inherits from:

HG    Similar genes
Description: Cross-reference to non-Drosophila homolog(s)/analogs. This field is used to indicate genes in foreign species (i.e., species other than members of the family Drosophilidae) that are said to be 'similar' to genes in Drosophila. The data for this field come from two sources: publications and other databases. The data are very heterogeneous, since different authors use different criteria for deciding upon the similarity (or 'homology') between a Drosophila gene product and one in a foreign species. By and large these criteria are structural, that is to say they are based upon sequence comparison; on occasion, however, they may be functional, being based on the ability of a gene from Drosophila to complement a mutation in a gene in a foreign species (or vice versa).

Authors often only report that a Drosophila gene is similar (or 'homologous') to a gene from a broad group, e.g., mammals or vertebrates. On other occasions, they report similarity with a gene from a relatively poorly studied organism, e.g., sheep. FlyBase attempts to make the link with genes known from human or mouse, and will only report a match to another vertebrate if either this cannot be done, or such a link is clearly more appropriate.

These links are intended to help users find out more about the genes in foreign species that Drosophila genes are said to resemble.

Links are usually established to genes of Drosophila melanogaster. If FlyBase knows about homologous genes in other drosophilids then the links to genes in foreign species are automatically propagated to these other drosophilid genes.
Syntax: HG species == <foreign_species>; gene == <gene_symbol> [(<synonym_symbol>)]; database:ID[; database:ID] [[Reference]]

Wherever possible the gene symbols used in this field have been checked with the appropriate database, and the field includes the unique identifier used by that database. Gene symbols that have not been checked with another genetic database are enclosed within single quotation marks. In the case of human gene symbols FlyBase uses only those approved by the HUGO Nomenclature Committee. The [Reference] field is only used when the source of the link is not in the FlyBase bibliography. The default is a MEDLINE ID, although a 'mini-reference' or Mouse Genome Database identifier (MGD:JNUM) may be used as a temporary expedient.

Database abbreviations are:

Those marked with an asterisk appear to have no public gene identifier numbers.

If a foreign species database is not available, or a gene symbol cannot be verified, then the gene symbol is enclosed within single quotation marks '<symbol>'. In these cases a cross-reference is given to the appropriate SWISS-PROT (SWP:) record or, failing that, the EMBL (EMBL:) record.
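The HG syntax and the quoting convention for unverified symbols can be captured in a small parser. This is a sketch assuming the line layout shown in the Syntax entry, with the optional reference written in square brackets at the end; it is not the code behind meow.genes.Homologue, the names parse_hg_line and HomologLink are hypothetical, and the symbols and identifiers in the checks (ABC1, SWP:P00000 and so on) are made-up placeholders.

```python
import re
from typing import List, NamedTuple, Optional


class HomologLink(NamedTuple):
    species: str
    symbol: str
    synonym: Optional[str]
    verified: bool            # False when the symbol was given in single quotes
    xrefs: List[str]          # 'database:ID' cross-references, in order
    reference: Optional[str]  # trailing [Reference], if present


_GENE_PART = re.compile(r"gene == (\S+)(?: \(([^)]*)\))?")
_TRAILING_REF = re.compile(r"\s*\[([^\]]*)\]\s*$")


def parse_hg_line(line: str) -> HomologLink:
    """Parse an HG line (hypothetical helper; assumes the syntax shown above)."""
    if not line.startswith("HG "):
        raise ValueError(f"not an HG line: {line!r}")
    body, reference = line[3:], None
    ref_match = _TRAILING_REF.search(body)
    if ref_match:  # optional [Reference] at the end of the line
        reference, body = ref_match.group(1), body[: ref_match.start()]
    parts = [p.strip() for p in body.split(";") if p.strip()]
    key, _, species = parts[0].partition("==")
    if key.strip() != "species":
        raise ValueError(f"expected 'species ==': {parts[0]!r}")
    gene = _GENE_PART.fullmatch(parts[1])
    if gene is None:
        raise ValueError(f"expected 'gene ==': {parts[1]!r}")
    symbol, synonym = gene.group(1), gene.group(2)
    # Symbols not checked against another database are quoted: '<symbol>'.
    verified = not (len(symbol) > 2 and symbol[0] == symbol[-1] == "'")
    if not verified:
        symbol = symbol[1:-1]
    return HomologLink(species.strip(), symbol, synonym, verified, parts[2:], reference)
```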

Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.Homologue (field)    Inherits from:

ID    Database ID
Description: Each gene, allele or other data record in FlyBase has a unique identifier number (see section F.1. of FlyBase Reference Manual F: Links To and From FlyBase). The primary identifier number is in the ID(*z) field, secondary identifier numbers are in ID2(*y) fields.
Syntax: ID FBgn_integer
e.g., ID FBgn0001234
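A primary identifier line can be validated and round-tripped as follows. This sketch assumes, from the single example above, that the integer is zero-padded to seven digits after the FBgn prefix; parse_id_line and format_fbgn are hypothetical helpers, not part of meow.genes.FBid.

```python
import re

# Assumption: 'FBgn' followed by exactly seven digits, as in FBgn0001234.
_FBGN = re.compile(r"FBgn(\d{7})")


def parse_id_line(line: str) -> int:
    """Return the integer part of an 'ID FBgn...' line (hypothetical helper)."""
    key, _, value = line.strip().partition(" ")
    match = _FBGN.fullmatch(value.strip())
    if key != "ID" or match is None:
        raise ValueError(f"not a valid ID line: {line!r}")
    return int(match.group(1))


def format_fbgn(number: int) -> str:
    """Zero-pad a number back into the FBgn form."""
    return f"FBgn{number:07d}"
```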

Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.FBid (field)    Inherits from:

MAP    Map location
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn
Java class: meow.genes.GeneLoc (field)    Inherits from:

NAM    Full name
Description: This is the full name of the gene or allele. FlyBase takes a minimalist definition of a gene. As an example, Notch is regarded as a gene, but facet, Confluens, split etc. are not. These phenotypically distinct allelic forms that have, in the past, been named as if they were genetic loci are included as gene synonyms.

FlyBase is not entirely consistent in the way directly duplicated genes are handled. For example, the five HSP70-encoding genes at Hsp70A and Hsp70B and the five larval cuticle protein-encoding genes at 44D are all listed independently, whereas the five major histone protein coding regions, tandemly repeated at the base of 2L, are each listed as a separate gene but only once, rather than once per tandem copy.
Syntax: *e <Nnnn>name
e.g., *e bobbed
*e DhydMinos

Some loci have been identified only by molecular methods and have not been mapped; such loci are included in the genes file. Other "loci" included in this file have not been genetically mapped or characterized but are assumed to exist on the basis of, for example, a purified protein. Some loci have been impossible to name in any logical way due to a lack of data. As a temporary expedient these are named anon-*, where the * indicates a code. These loci will be renamed as more data become available.

Both the European and Berkeley Drosophila Genome Projects are now generating a considerable number of STS sequences. These all appear in the nucleic acid sequence data archives and in the NCBI dbSTS database. These short sequences are routinely matched against the universe of public sequence data and often have 'significant' matches to genes identified in species other than Drosophila. Such matches are clues that similar genes may occur in D. melanogaster. For this reason, STS sequences with significant matches are identified as 'genes' in this file and have the temporary name ESTSn (for STS sequences from the European project) or BSTSn (for those from Berkeley), where n is the code used by the Genome Project (e.g., ESTS100F7T, BSTSDm0092). The nature of the most 'significant' match is indicated in the *d or *F (function or enzyme) fields. STS sequences that match known Drosophila genes are linked to the relevant gene record by their accession numbers in the GenBank/EMBL/DDBJ and dbSTS data archives. STS sequences that have no matches whatsoever are linked only to their parental clone in the clones tables. All STSs with matches are likewise linked to their parental clones in these tables.
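The temporary naming schemes described above (anon-*, ESTSn, BSTSn) can be recognized by prefix. This is a heuristic sketch only: temporary_name_kind is a hypothetical helper and the labels it returns are illustrative, and a permanent name that happened to start with one of these prefixes would be misclassified.

```python
from typing import Optional, Tuple

# Temporary-name prefixes described above (labels are illustrative).
_TEMPORARY_PREFIXES = (
    ("ESTS", "European Drosophila Genome Project STS"),
    ("BSTS", "Berkeley Drosophila Genome Project STS"),
    ("anon-", "anonymous locus"),
)


def temporary_name_kind(name: str) -> Optional[Tuple[str, str]]:
    """Return (kind, code) if a gene name uses a temporary prefix, else None."""
    for prefix, kind in _TEMPORARY_PREFIXES:
        if name.startswith(prefix) and len(name) > len(prefix):
            # The remainder is the code assigned by the project or curator.
            return kind, name[len(prefix):]
    return None
```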

FlyBase includes data from all species of the family Drosophilidae.

Brief help: none
Data classes: Gene, HUgn, MGgn, ZFgn
Java class: meow.genes.GeneName (field)    Inherits from:

ORG    Organism
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.Species (field)    Inherits from:

PAC    Protein accessions
Description: This field stores pointers to protein sequence data, usually in the form of SWISS-PROT/TREMBL/PIR protein sequence databank accession (AC) numbers. Because accession numbers from different databases can clash, the AC numbers are prefixed "SWP/", "TREMBL/" or "PIR/".

These fields are also used for cross-references between FlyBase and structural data on Drosophila proteins held in PDB (Protein Data Bank, Brookhaven), the NRL_3D databank and the G protein-coupled receptor database (GCRDb). These records have the prefixes PDB/, NRL_3D/ and GCR/ respectively. Cross-references to the 'factors' table of the TRANSFAC database (E. Wingender, J. Biotechnol. 35:273-280, 1994) have the prefix TF/.
Syntax: *m database_code/accession_number
e.g. *m SWP/P12428
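Because every *m entry carries a database prefix, a parser can map the prefix to its source and reject unknown ones. This is a minimal sketch, not the code behind meow.genes.ProtAcc; parse_m_line and PROTEIN_DB_PREFIXES are hypothetical names, and the table holds only the prefixes listed above.

```python
from typing import Dict, Tuple

# Prefixes listed above for *m lines, mapped to the resource each denotes.
PROTEIN_DB_PREFIXES: Dict[str, str] = {
    "SWP": "SWISS-PROT",
    "TREMBL": "TrEMBL",
    "PIR": "PIR",
    "PDB": "Protein Data Bank",
    "NRL_3D": "NRL_3D",
    "GCR": "G protein-coupled receptor database (GCRDb)",
    "TF": "TRANSFAC factors table",
}


def parse_m_line(line: str) -> Tuple[str, str]:
    """Split '*m SWP/P12428' into ('SWISS-PROT', 'P12428') (hypothetical helper)."""
    if not line.startswith("*m "):
        raise ValueError(f"not a *m line: {line!r}")
    code, sep, accession = line[3:].strip().partition("/")
    # The prefix is required: bare accessions could clash across databases.
    if not sep or code not in PROTEIN_DB_PREFIXES:
        raise ValueError(f"unknown or missing database prefix: {line!r}")
    return PROTEIN_DB_PREFIXES[code], accession
```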

Brief help: none
Data classes: ZFgn
Java class: meow.genes.ProtAcc (field)    Inherits from:

PDOM    Protein domains
Description: Description of the structural features of gene products. These are derived from the PROSITE database (Bairoch et al., 1996, Nucleic Acids Research 24: 189-196) if the data are available therein. If not, then a similar term is invented (to be replaced once the gene product is included in PROSITE). Terms invented by FlyBase always end in the word 'protein' (rather than 'domain', 'signature', 'site', 'profile', 'repeat' or 'pattern').

Syntax for PROSITE cross references:
Prosite_number == Prosite_accession_name
e.g., PS00018 == EF-hand calcium-binding domain.

Controlled terms may be modified by free text following the \ symbol, e.g.:
controlled_term \?
controlled_term \like
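The two kinds of domain entries (PROSITE cross-references and FlyBase-invented terms ending in 'protein') plus the \ modifier can be distinguished as below. This sketch assumes PROSITE accessions have the form PS followed by five digits, as in PS00018, and treats a trailing full stop as sentence punctuation; parse_domain_entry and DomainTerm are hypothetical names, and "example protein" is a placeholder term.

```python
import re
from typing import NamedTuple, Optional

# Assumption: PROSITE accessions look like 'PS' plus five digits (e.g. PS00018).
_PROSITE_XREF = re.compile(r"(PS\d{5})\s*==\s*(.+)")


class DomainTerm(NamedTuple):
    prosite_id: Optional[str]  # None for terms invented by FlyBase
    term: str
    qualifier: Optional[str]   # free text after the '\' symbol, e.g. '?' or 'like'


def parse_domain_entry(entry: str) -> DomainTerm:
    """Parse a protein-domain entry (hypothetical helper)."""
    text, _, qualifier_text = entry.partition("\\")
    text, qualifier = text.strip(), qualifier_text.strip() or None
    match = _PROSITE_XREF.fullmatch(text)
    if match:
        # Drop a trailing full stop, treating it as punctuation (assumption).
        return DomainTerm(match.group(1), match.group(2).rstrip("."), qualifier)
    # Without a PROSITE number the term must be a FlyBase-invented one.
    if not text.endswith("protein"):
        raise ValueError(f"invented terms should end in 'protein': {entry!r}")
    return DomainTerm(None, text, qualifier)
```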

Brief help: none
Data classes: Gene, ZFgn
Java class: meow.genes.ProtDomain (field)    Inherits from:

PHI    Phenotypic info.
Description: Mutant phenotype. This holds the phenotypic description of the mutant allele. See also PHC (Phenotypic class), PHM (Phenotype manifest in).
This field is free text, except for Mode of assay, which is mandatory for all alleles that have "MU in vitro construct". The possible entries in the Mode of assay field are:
In transgenic Drosophila
Whole-organism transient assay
Drosophila cell culture
In transgenic Drosophila (allele of one species in genome of another)
Whole-organism transient assay (allele from one species assayed in another)
Cell-free system
Yeast assay
Xenopus oocyte

Brief help: none
Data classes: ATgn, CEgn, SGgn
Java class: meow.genes.PhenotypicInfo (field)    Inherits from:

PHP    Phenotypic info.
Description: This field holds phenotypic information about a gene (or, as explained above, about its mutant alleles in some cases). This field is free text and, by and large, has not yet been standardized with respect to its vocabulary. [Free text.]

Brief help: none
Data classes: CEgn, HUgn, SGgn
Java class: meow.genes.PhenotypeP (field)    Inherits from:

PRD    Protein data
Description: none
Brief help: none
Data classes: CEgn, HUgn, MGgn
Java class: meow.genes.ProtData (field)    Inherits from:

REAB    Summary
Description: none
Brief help: none
Data classes: CEgn, HUgn
Java class: (field)    Inherits from:

RETE    Table Entry
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: (field)    Inherits from:

RPA    Ref. protein
Description: none
Brief help: none
Data classes: ATgn, CEgn, Gene, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.RefProtein (field)    Inherits from: meow.genes.ProtAcc

RSQ    Ref. sequence
Description: none
Brief help: none
Data classes: Gene, HUgn, MGgn, ZFgn
Java class: meow.genes.RefSequence (field)    Inherits from: meow.genes.DBAccessions

SYM    Symbol
Description: This is the standard abbreviation (gene symbol) for the name of the gene. In the genes file, gene records are sorted alphabetically. The order of precedence is: all-Greek symbols (in alphabetical order); symbols that begin with a number (in numerical order, secondarily sorted on suffix, e.g., 1, 2, 2a, 2b, 3); then symbols that begin with a letter, with lower case taking precedence over upper case and numerals taking precedence over letters (e.g., b, B, b1, ba).
Syntax: SYM <Nnnn>\symbol
e.g., SYM bb
SYM Dhyd\Minos
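The precedence rules for sorting gene records can be expressed as a sort key. This is a minimal sketch, not FlyBase's own sorting code; symbol_sort_key is a hypothetical name, and it assumes Greek letters are stored as Unicode characters and that punctuation (for example ':' or '-') simply sorts by its ASCII position, which the description does not cover.

```python
import re
from typing import Tuple

LetterKey = Tuple[str, Tuple[bool, ...]]


def _is_all_greek(symbol: str) -> bool:
    # Assumption: Greek letters are stored as Unicode code points (U+0370..U+03FF).
    return bool(symbol) and all("\u0370" <= ch <= "\u03ff" for ch in symbol)


def _letter_key(text: str) -> LetterKey:
    # Primary: compare case-insensitively; in ASCII, digits sort before letters.
    # Tiebreak: at equal spelling, lower case sorts before upper case.
    return text.lower(), tuple(ch.isupper() for ch in text)


def symbol_sort_key(symbol: str) -> Tuple[int, int, LetterKey]:
    """Sort key for gene symbols following the precedence described above."""
    if _is_all_greek(symbol):
        return (0, 0, _letter_key(symbol))
    leading_number = re.match(r"(\d+)(.*)", symbol)
    if leading_number:  # numeric order first, then the suffix
        return (1, int(leading_number.group(1)), _letter_key(leading_number.group(2)))
    return (2, 0, _letter_key(symbol))


symbols = ["ba", "B", "b1", "b", "3", "2b", "1", "2a", "2", "\u03b1"]
# expected order: α, 1, 2, 2a, 2b, 3, b, B, b1, ba
print(sorted(symbols, key=symbol_sort_key))
```

The case-insensitive primary comparison with a lower-case-first tiebreak is what places B between b and b1, matching the b, B, b1, ba example; a plain character-by-character comparison would put b1 before B.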

Nnnn is an abbreviation for the species. The default species is D. melanogaster, in which case there is no species abbreviation. If a gene is from another species of drosophilid then this is indicated by Nnnn, where N is normally the initial letter of the genus, and nnn are normally the first three letters of the specific epithet. A list of species abbreviations is in the Nomenclature section of FlyBase.

Genes encoded by the mitochondrial genome all have the prefix Nnnnmt:. The D. melanogaster gene encoding the cytochrome oxidase subunit II is, therefore, mt:CoII, the D. simulans gene encoding the mitochondrial proline tRNA is Dsimmt:tRNA:P. The record MT:DNA is used for data concerning the mitochondrial genome and its products that cannot be assigned to any single mitochondrial gene. The symbol mt:ori is used for the non-coding A+T rich region of the mitochondrial origin of replication.
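The default species abbreviation and the prefixes for SYM values and mitochondrial genes can be composed as below. This is a sketch of the "normal" rule only (genus initial plus the first three letters of the specific epithet); the authoritative abbreviations are those in the Nomenclature section and may differ. The helper names are hypothetical, the backslash placement follows the SYM syntax given earlier, and the mitochondrial form follows the Dsimmt:tRNA:P example literally.

```python
def species_prefix(genus: str, epithet: str) -> str:
    """Default Nnnn abbreviation; real ones come from the Nomenclature list."""
    if genus == "Drosophila" and epithet == "melanogaster":
        return ""  # default species: no prefix
    return genus[0].upper() + epithet[:3].lower()


def sym_value(prefix: str, symbol: str) -> str:
    """Compose a SYM value as <Nnnn>\\symbol; no backslash without a prefix."""
    return f"{prefix}\\{symbol}" if prefix else symbol


def mito_symbol(prefix: str, gene: str) -> str:
    """Mitochondrial genes take the prefix Nnnnmt:, e.g. mt:CoII."""
    return f"{prefix}mt:{gene}"
```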

FlyBase includes data on artificial gene constructs, for example fusions between different genes. Fusion genes are named using the gene symbols of their components separated by a double colon, e.g., Antp::Scr. The components are listed in alphabetical order. When a component of a construct is from a species other than D. melanogaster then its symbol is prefixed by Nnnn to indicate the species of origin. For example the LexA protein from E. coli has the symbol Ecol. A list of the species abbreviations used is to be found in the Nomenclature section of FlyBase.
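The fusion-naming rule (components joined by a double colon, listed alphabetically) reduces to a sort and a join. This sketch assumes case-insensitive alphabetical order, which the text does not specify; fusion_symbol is a hypothetical helper, and species-prefixed components are simply passed in already prefixed.

```python
def fusion_symbol(*components: str) -> str:
    """Join component symbols with '::' in alphabetical order, e.g. Antp::Scr."""
    if len(components) < 2:
        raise ValueError("a fusion needs at least two components")
    # Assumption: case-insensitive alphabetical order; FlyBase's exact collation may differ.
    return "::".join(sorted(components, key=str.lower))
```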

Some 'foreign' genes very frequently used in constructs are regarded as being 'honorary genes' of Drosophila and the species prefix is omitted. Examples include lacZ for the beta-galactosidase gene of E. coli and FRT for the S. cerevisiae FLP recombinase target. To see a list of 'honorary genes', use the Genes complex query form with the "Class" switched from the default "all" to "honorary_gene". Highlight both "foreign_gene" and "honorary_gene" (ctrl+click to add choices) to see a list of all 'foreign' genes.

Brief help: none
Data classes: ATgn, CEgn, Gene, FBgo, HUgn, MGgn, SGgn, ZFgn
Java class: meow.genes.GeneSymbol (field)    Inherits from:

SYN    Synonyms
Description: Synonyms. As mentioned above, FlyBase takes a very liberal view of synonyms, and the table gene-synonyms.txt in the Genes section is provided as a tool to help identify the name and symbol that FlyBase uses for each gene or allele. In Genes these data are kept in the SYN field, for both gene and allele synonyms. See Reference manual section B.2 for more on synonyms.
Syntax: SYN synonym_symbol: synonym name <text, e.g. a reference>
e.g., SYN ho: heldout
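A lookup in the spirit of gene-synonyms.txt can be built from SYN lines. This is a minimal sketch, not the way FlyBase generates that table: parse_syn_line and build_synonym_index are hypothetical helpers, the record symbols geneA and geneB in the check are placeholders, and it assumes the first ': ' separates the synonym symbol from the rest of the line. Because synonyms are used liberally, one synonym may point to several current symbols, so the index maps to a list; keys are case-sensitive because symbols such as b and B denote different genes.

```python
from typing import Dict, Iterable, List, Tuple


def parse_syn_line(line: str) -> Tuple[str, str]:
    """Split 'SYN ho: heldout' into ('ho', 'heldout') (hypothetical helper)."""
    if not line.startswith("SYN "):
        raise ValueError(f"not a SYN line: {line!r}")
    # Any trailing note (e.g. a reference) stays attached to the name.
    symbol, sep, name = line[4:].partition(": ")
    if not sep:
        raise ValueError(f"missing ': ' separator: {line!r}")
    return symbol.strip(), name.strip()


def build_synonym_index(
    records: Iterable[Tuple[str, Iterable[str]]]
) -> Dict[str, List[str]]:
    """Map each current symbol or synonym to the current symbol(s) it may denote."""
    index: Dict[str, List[str]] = {}
    for current_symbol, syn_lines in records:
        keys = [current_symbol] + [parse_syn_line(line)[0] for line in syn_lines]
        for key in keys:
            targets = index.setdefault(key, [])
            if current_symbol not in targets:
                targets.append(current_symbol)
    return index
```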
Brief help: none
Data classes: ATgn, CEgn, Gene, FBgo, HUgn, SGgn
Java class: meow.genes.Synonyms (field)    Inherits from:

URL    Database URL
Description: none
Brief help: none
Data classes: HUgn
Java class: meow.genes.URLField (field)    Inherits from:

euGenes uses Argos: A Replicable Genome infOrmation System