
euGenes: Summary of eukaryote genomes
July 2002

Organism   Genes         Located    Predicted  Protein    Homology  GO     Genome     Genome
           reported [1]  on genome  genes      available  [2]       data   kilobases  features [3]

Fruitfly   25,728        51%        35%        52%        61%       27%    116,109    100,309
Human      53,210        84%        61%        87%        39%       38%    3,118,900  258,835
Mouse      36,433        --         --         25%        94%       22%    --         --
Mosquito   12,687        100%       91%        99%        66%       --     231,408    161,565
Weed       28,129        100%       --         88%        43%       35%    117,429    84,320
Worm       22,705        90%        78%        85%        30%       29%    100,270    244,675
Yeast      7,222         92%        32%        83%        34%       89%    12,156     13,876
Zebrafish  1,583         --         --         66%        89%       --     --         --


Historical: euGenes summary table from July 2001

Organism   Genes         Located    Homology  GO     Genome     Genome
           reported [1]  on genome  [2]       data   kilobases  features

Fruitfly   23,649        56%        44%       31%    116,094    41,570
Human      37,049        66%        76%       --     3,310,005  1,575,667
Mouse      28,210        --         88%       20%    --         --
Weed       26,819        100%       18%       14%    116,702    54,053
Worm       21,881        100%       27%       27%    100,090    207,478
Yeast      7,226         90%        30%       88%    12,155     13,594
Zebrafish  1,221         --         87%       --     --         --

Table columns:

"--" indicates data not yet available.

Notes:

[1] Reported gene counts are not yet the true number of genes in these organisms. They count valid gene records extracted from genome project sources, and will converge on the true gene counts as genome project annotations become more accurate. Factors that make them differ from the true counts include orphan gene records (from older research, which cannot be confirmed to exist on the genome), prediction artifacts, predicted and experimental records that have not yet been merged, and unfinished sequencing gaps. That is, some genes reported here do not truly exist, and some genes that do exist are not yet reported here.

In particular, the Fruitfly reported gene count is likely larger than the true gene number, owing to extensive orphan gene records. The true number is likely close to the Protein available count of 13,559 (July 2002).

[2] Percent homology is the number of protein coding sequences showing significant homology to any organism in the test set, expressed as a percentage of the available protein coding sequences for an organism. Test organisms include these 8 euGenes organisms, plus the Rat, Rice and E. coli proteome sets.
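
As an illustration of the ratio described in note [2], a minimal Python sketch follows. The function and parameter names are hypothetical and are not part of euGenes. The criterion for "significant" homology is not specified here, so the sketch takes the count of sequences with a significant match as an input.

```python
def percent_homology(n_with_homolog: int, n_available: int) -> float:
    """Percent of available protein coding sequences with a significant
    homolog in the test set (the ratio described in note [2])."""
    if n_available <= 0:
        raise ValueError("n_available must be positive")
    return 100.0 * n_with_homolog / n_available


def estimated_homolog_count(n_available: int, percent: float) -> int:
    """Rough inverse: approximate how many sequences a table percentage
    corresponds to. The table values are rounded, so this is an estimate."""
    return round(n_available * percent / 100)


# Fruitfly, July 2002: 13,559 proteins available, 61% homology (from the table).
fly_estimate = estimated_homolog_count(13559, 61)
```

Because the table percentages are rounded to whole numbers, counts recovered this way are only approximate.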

[3] Significant changes in counts of genome features (2001 to 2002) reflect (a) different sources for feature annotations; (b) different collating of features. Hopefully the newer data represent better feature identifications.
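
To see the size of the changes that note [3] refers to, the Genome features columns of the two tables can be compared directly. The sketch below is only an illustration: the feature counts are copied from the July 2001 and July 2002 tables, while the dictionary and function names are hypothetical.

```python
# Genome feature counts copied from the July 2001 and July 2002 tables.
# Organisms with "--" (data not yet available) are omitted.
FEATURES_2001 = {
    "Fruitfly": 41570,
    "Human": 1575667,
    "Weed": 54053,
    "Worm": 207478,
    "Yeast": 13594,
}
FEATURES_2002 = {
    "Fruitfly": 100309,
    "Human": 258835,
    "Mosquito": 161565,
    "Weed": 84320,
    "Worm": 244675,
    "Yeast": 13876,
}


def feature_change(before: dict, after: dict) -> dict:
    """Percent change in feature count for organisms present in both years."""
    return {
        org: 100.0 * (after[org] - before[org]) / before[org]
        for org in before
        if org in after
    }


changes = feature_change(FEATURES_2001, FEATURES_2002)
for org, pct in sorted(changes.items()):
    print(f"{org:10s} {pct:+.1f}%")
```

Mosquito has no 2001 entry, so it is left out of the comparison. As note [3] says, large swings reflect changes in annotation sources and collation, not changes in the genomes themselves.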


Send comments to us at eugenes@iubio.bio.indiana.edu
euGenes uses Argos: A Replicable Genome infOrmation System