euGenes: Summary of eukaryote genomes
July 2002
| Organism | Genes reported [1] | Located on genome | Predicted genes | Protein available | Homology [2] | GO data | Genome kilobases | Genome features [3] |
| Fruitfly | 25,728 | 51% | 35% | 52% | 61% | 27% | 116,109 | 100,309 |
| Human | 53,210 | 84% | 61% | 87% | 39% | 38% | 3,118,900 | 258,835 |
| Mouse | 36,433 | -- | -- | 25% | 94% | 22% | -- | -- |
| Mosquito | 12,687 | 100% | 91% | 99% | 66% | -- | 231,408 | 161,565 |
| Weed | 28,129 | 100% | -- | 88% | 43% | 35% | 117,429 | 84,320 |
| Worm | 22,705 | 90% | 78% | 85% | 30% | 29% | 100,270 | 244,675 |
| Yeast | 7,222 | 92% | 32% | 83% | 34% | 89% | 12,156 | 13,876 |
| Zebrafish | 1,583 | -- | -- | 66% | 89% | -- | -- | -- |
Historical: euGenes summary table from July 2001
| Organism | Genes reported [1] | Located on genome | Homology [2] | GO data | Genome kilobases | Genome features |
| Fruitfly | 23,649 | 56% | 44% | 31% | 116,094 | 41,570 |
| Human | 37,049 | 66% | 76% | -- | 3,310,005 | 1,575,667 |
| Mouse | 28,210 | -- | 88% | 20% | -- | -- |
| Weed | 26,819 | 100% | 18% | 14% | 116,702 | 54,053 |
| Worm | 21,881 | 100% | 27% | 27% | 100,090 | 207,478 |
| Yeast | 7,226 | 90% | 30% | 88% | 12,155 | 13,594 |
| Zebrafish | 1,221 | -- | 87% | -- | -- | -- |
Table columns:
- (a) Genes reported - count of genes available in euGenes data set from source databases (see note [1]).
- (b) Located on genome - percent of reported genes that have been located on the full genome.
- (c) Predicted genes - percent of reported genes which are identified as predicted or computed, i.e. lacking experimental verification as yet.
- (d) Protein available - percent of reported genes for which a protein sequence or coding sequence translation from DNA is available. See the * Reference proteins data file in each organism page for these.
- (e) Homology - percent of reported genes with significant protein homology to any other protein in the euGenes data set or in the additional test organisms. Find details listed in this homology summary table. See note [2].
- (f) GO data - Percent of reported genes with GeneOntology term associations
- (g) Genome kilobases - total kilobases of DNA in genome, which can include counts of as yet unsequenced regions (NNN). See * Feature tables and DNA for genome for each organism.
- (h) Genome features - total number of genome feature annotations. Find details listed in this feature summary table. See note [3].
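The percent columns can be converted back into approximate gene counts. The following is a minimal Python sketch and not part of euGenes: the names `COLUMNS_2002`, `parse_cell`, `parse_row`, and `approx_count` are hypothetical, and the two row strings are copied from the July 2002 table. It assumes each percentage is taken relative to the Genes reported column, so it should not be applied to the Homology column, whose base is the available protein coding sequences (note [2]). Because the table percentages are whole numbers, the results are only estimates.

```python
# Hypothetical column names for the July 2002 table, in table order.
COLUMNS_2002 = [
    "genes_reported", "located_pct", "predicted_pct", "protein_pct",
    "homology_pct", "go_pct", "genome_kb", "genome_features",
]


def parse_cell(cell):
    """Return an int for a count or percent cell, or None for '--'."""
    text = cell.strip().replace(",", "").rstrip("%")
    return None if text == "--" else int(text)


def parse_row(line, columns=COLUMNS_2002):
    """Split one pipe-delimited table row into (organism, {column: value})."""
    cells = line.strip().strip("|").split("|")
    organism = cells[0].strip()
    values = [parse_cell(cell) for cell in cells[1:]]
    return organism, dict(zip(columns, values))


def approx_count(row, pct_column):
    """Estimate a gene count from a percent column; None if data is missing."""
    pct = row[pct_column]
    if pct is None:
        return None
    return round(row["genes_reported"] * pct / 100)


yeast_name, yeast = parse_row(
    "| Yeast | 7,222 | 92% | 32% | 83% | 34% | 89% | 12,156 | 13,876 |"
)
mouse_name, mouse = parse_row(
    "| Mouse | 36,433 | -- | -- | 25% | 94% | 22% | -- | -- |"
)

print(yeast_name, "located on genome (approx.):", approx_count(yeast, "located_pct"))
print(mouse_name, "located on genome (approx.):", approx_count(mouse, "located_pct"))
```

Missing cells are returned as `None` rather than zero, so that "not yet available" is not confused with a measured value of 0%.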
"--" indicates data not yet available.
Notes:
[1] Reported gene counts are not yet the true number of genes in
these organisms. They count valid gene records extracted from
genome project sources. This number will converge on the true gene
count as the genome project annotations become more accurate.
Factors that make these counts differ from the true gene count
include orphan gene records (from older research that cannot be
confirmed to exist on the genome), prediction artifacts, unmerged
predicted and experimental records, and unfinished sequencing gaps.
That is, some genes reported here do not truly exist, and some
genes that exist are not yet reported here.
In particular, the Fruitfly reported gene count is likely larger
than the true gene number, owing to extensive orphan gene records.
The true number is likely close to the Protein available count
of 13,559 (July 2002).
[2] Percent homology is the number of protein coding sequences
showing significant homology to any organism in the test set,
relative to the available protein coding sequences for an organism.
Test organisms include these 8 euGenes organisms plus the Rat, Rice
and E. coli proteome sets.
[3] Significant changes in counts of genome features (2001 to
2002) reflect (a) different sources for feature annotations and (b)
different collation of features. Hopefully the newer data
represent better feature identifications.
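To see how large the changes mentioned in note [3] are, the Genome features columns of the two tables can be compared directly. The following is a small Python sketch under stated assumptions: the dictionaries `features_2001` and `features_2002` and the helper `percent_change` are hypothetical names, and the values are copied from the July 2001 and July 2002 tables for the organisms that have a count in both years.

```python
# Genome feature counts copied from the two summary tables.
features_2001 = {
    "Fruitfly": 41570,
    "Human": 1575667,
    "Weed": 54053,
    "Worm": 207478,
    "Yeast": 13594,
}
features_2002 = {
    "Fruitfly": 100309,
    "Human": 258835,
    "Weed": 84320,
    "Worm": 244675,
    "Yeast": 13876,
}


def percent_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Only organisms present in both tables can be compared.
changes = {
    organism: percent_change(features_2001[organism], features_2002[organism])
    for organism in features_2001
    if organism in features_2002
}

for organism, change in sorted(changes.items()):
    print(f"{organism}: {change:+.1f}%")
```

The Fruitfly count more than doubles while the Human count falls by more than four fifths, which fits the note's explanation that the sources and collation of features changed rather than the genomes themselves.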
Send comments to us at
eugenes@iubio.bio.indiana.edu
euGenes uses Argos: A Replicable Genome infOrmation System