euGenes: Homologous Genes

euGenes: Homologous Genes Summary Table

Genes with one or more similar genes, or homologs, in other organisms.
Each cell gives the number of genes with homologs and their percent of the total protein sequences available for the source organism.
           Homologs >
Source     Fruitfly   Human      Mouse      Mosquito   Weed       Worm       Yeast      Zebrafish  Rat        Rice       E. coli    | Any        N genes
Fruitfly   --         5624 44%   4302 33%   7470 58%   2427 19%   4096 32%   1772 13%   826 6%     2392 18%   989 7%     442 3%     | 7888 61%   12728
Human      8708 19%   --         15159 33%  9400 20%   4534 9%    6750 14%   2842 6%    3421 7%    7674 16%   1593 3%    608 1%     | 17892 39%  45767
Mouse      4424 48%   8463 92%   --         4296 46%   2072 22%   3550 38%   1428 15%   1663 18%   4266 46%   876 9%     344 3%     | 8652 94%   9186
Mosquito   7288 61%   5137 43%   3737 31%   --         2359 19%   3822 32%   1565 13%   812 6%     2217 18%   992 8%     413 3%     | 7834 66%   11868
Weed       3783 15%   4468 18%   3382 13%   3690 15%   --         3543 14%   2939 11%   366 1%     2070 8%    7886 32%   1076 4%    | 10732 43%  24521
Worm       4266 22%   4918 26%   3808 20%   4428 23%   2426 12%   --         1817 9%    839 4%     2241 11%   996 5%     456 2%     | 5684 30%   18893
Yeast      1508 25%   1623 26%   1251 20%   1452 24%   1601 26%   1376 22%   --         179 2%     832 13%    697 11%    440 7%     | 2059 34%   6031
Zebrafish  548 55%    852 86%    829 84%    514 52%    131 13%    417 42%    85 8%      --         625 63%    68 6%      46 4%      | 876 89%    982
Rat        3484 52%   6213 93%   5708 85%   3415 51%   1578 23%   3031 45%   1210 18%   1810 27%   --         0 0%       0 0%       | 6413 96%   6674
Rice       979 10%    1092 11%   784 8%     1297 14%   4577 49%   1038 11%   889 9%     118 1%     0 0%       --         0 0%       | 4601 49%   9226
E. coli    325 7%     373 8%     266 6%     326 7%     516 12%    311 7%     354 8%     42 1%      0 0%       0 0%       --         | 667 15%    4174

Genes with one or more similar genes, or paralogs, in same organism.
           Fruitfly   Human      Mouse      Mosquito   Weed       Worm       Yeast      Zebrafish  Rat        Rice       E. coli
Paralogs   4708 36%   18602 40%  4912 53%   5232 44%   18138 73%  10071 53%  1939 32%   697 70%    5470 81%   5868 63%   1194 28%

Gene homologies of Rat, Rice, and E. coli are provided for comparison. They are not part of the euGenes data set.

Source (rows): organism whose proteins are the BLAST queries
Homologs (columns): organism whose sequences in the BLAST database match with expectation value (E-value) <= 1e-30
Any column: source organism proteins that have one or more homologs in any other organism
N genes column: number of protein sequences available for the source organism

Methods:
This is a computed gene homology or similarity using reference protein sequences identified by the source databases. A BLAST of all these sequences is computed, each sequence against all others, using specific parameters given below.

The sequences used in these calculations are found in the "Reference proteins" data files in each organism's folder. This summary is determined from the "Homologous genes table" data files also in each folder.

The counts in this table are the number of available genes for an organism (the N genes column) and the number of those genes that have one or more significant homologs in each other organism. Each percentage is the count of genes with any homolog (one or several) in another organism, divided by the total available genes in the source organism (x 100).
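
The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the euGenes code: the function name is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the table (for example 4302 of 12728 is 33.8%, shown as 33%).

```python
def percent_with_homologs(n_with_homolog: int, n_total: int) -> int:
    """Whole-number percent of genes with at least one homolog.

    Truncates toward zero (integer division); this is an assumption
    that happens to reproduce the percentages shown in the table.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_with_homolog * 100 // n_total


# Fruitfly row of the table: 12728 genes in total.
print(percent_with_homologs(5624, 12728))  # Human column -> 44
print(percent_with_homologs(4302, 12728))  # Mouse column -> 33
```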

The homology summary table is best read row by row. To find what percent of all fruitfly genes have any homologs in C. elegans, look at the Fruitfly row and find the Worm column. For the reverse question, the Worm row and Fruitfly column give the percent of all worm genes that have any fruitfly homologs.
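
The same row-then-column lookup can be written as a nested mapping. The sketch below copies only four cells from the table; the variable and function names are hypothetical and chosen for illustration.

```python
# A few cells copied from the summary table:
# table[source_row][homolog_column] = (count, percent)
table = {
    "Fruitfly": {"Worm": (4096, 32), "Human": (5624, 44)},
    "Worm": {"Fruitfly": (4266, 22), "Human": (4918, 26)},
}


def homolog_percent(source: str, target: str) -> int:
    """Percent of `source` genes that have any homolog in `target`."""
    _count, percent = table[source][target]
    return percent


print(homolog_percent("Fruitfly", "Worm"))  # fruitfly genes with worm homologs -> 32
print(homolog_percent("Worm", "Fruitfly"))  # worm genes with fruitfly homologs -> 22
```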

Note the asymmetry in these counts and percents. In the table, 5624 of the 12728 fruitfly genes (44%) show some homology to human genes, whereas 8708 of the 45767 human genes (19%) show some homology to fruitfly genes. Both counts come from the same gene pairs; each one counts only the genes on its own side of those pairs, against its own organism's total.
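
A toy example may help show why the two directions differ even though they come from the same gene pairs. The gene names and totals below are made up for illustration and are not taken from the euGenes data.

```python
# Hypothetical homologous gene pairs: (gene in organism A, gene in organism B).
pairs = [
    ("a1", "b1"),
    ("a1", "b2"),
    ("a1", "b3"),
    ("a2", "b3"),
]

# Hypothetical total gene counts per organism.
TOTAL_A = 4
TOTAL_B = 10


def genes_with_homologs(gene_pairs):
    """Count distinct genes on each side of the same set of pairs."""
    a_genes = {a for a, _ in gene_pairs}
    b_genes = {b for _, b in gene_pairs}
    return len(a_genes), len(b_genes)


n_a, n_b = genes_with_homologs(pairs)
print(n_a, n_a * 100 // TOTAL_A)  # 2 genes of A, 50 percent
print(n_b, n_b * 100 // TOTAL_B)  # 3 genes of B, 30 percent
```

Four pairs give 2 genes (50%) on one side and 3 genes (30%) on the other, because one gene can match several partners and the totals differ.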

Please note that the summary you find here is based on a specific set of data and homology/similarity (BLAST) calculation parameters. Someone else may publish a different set of values, which would be equally valid, using a different set of data and options for determining "homology". The euGenes summary tries to stay up-to-date by using reference sequences identified by genome databases, and by matching current gene information from these databases with their current sequences.

These values are all moving targets: new data from genome sequencing projects will change them. In the euGenes summary in particular, the fruitfly, worm (C. elegans) and yeast data are based on full genome sequences, while human, mouse, weed, and fish are based only on partial sequencing, which may skew values such as percent homology higher than these organisms actually have.


Homology computations:
blastall -v 10 -b 10 -m 9 -a 4 -p blastp -e 1e-30 -d org1/refprot.fasta -i org2/refprot.fasta
Homologous genes for fly computed 27-June-2002 with BLASTP 2.2.2
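
As a rough sketch of how counts like those in the table could be derived from such a run: the -m 9 option produces tab-separated output with comment lines starting with "#", where the query id is the first field and the E-value is the eleventh. The function below counts distinct query sequences (from the -i file) that have at least one hit in the database (the -d file). The function name and the sample ids are hypothetical, and this is not the euGenes pipeline itself.

```python
def queries_with_hits(lines, max_evalue=1e-30):
    """Return the set of query ids with at least one hit at or below max_evalue.

    Assumes the standard 12-column tabular layout of `blastall -m 9`:
    query id, subject id, % identity, alignment length, mismatches,
    gap openings, q. start, q. end, s. start, s. end, e-value, bit score.
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Made-up sample output; ids and scores are illustrative only.
sample = [
    "# BLASTP 2.2.2",
    "# Fields: Query id, Subject id, % identity, ...",
    "geneA\tsubj1\t80.00\t200\t40\t0\t1\t200\t1\t200\t1e-50\t190",
    "geneA\tsubj2\t70.00\t180\t54\t0\t1\t180\t5\t184\t1e-40\t160",
    "geneB\tsubj3\t35.00\t90\t58\t1\t10\t99\t3\t92\t1e-10\t60",
]
print(sorted(queries_with_hits(sample)))  # ['geneA']
```

In practice the lines would come from an open file handle on the BLAST output, and the size of the returned set divided by the number of query sequences gives the percentage.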


Send comments to us at eugenes@iubio.bio.indiana.edu
euGenes uses Argos: A Replicable Genome infOrmation System