Precomputed sequence similarities
KEGG SSDB (Sequence Similarity DataBase) contains information about amino acid sequence similarities among all protein-coding genes in the complete genomes, computationally generated from the GENES database in KEGG.
All possible pairwise genome comparisons are performed with the SSEARCH program, and gene pairs with a Smith-Waterman similarity score of 100 or more are entered in SSDB, together with information about best hits and bidirectional best hits (best-best hits).
SSDB is thus a huge weighted, directed graph, which can be used to search for orthologs and paralogs, and also for conserved gene clusters when positional correlations on the chromosome are taken into account.
The relationship of gene x in genome A and gene y in genome B is defined as follows:
forward best: x is compared against all genes in genome B and y is found as top-scoring
reverse best: y is compared against all genes in genome A and x is found as top-scoring
best-best: both of these relationships hold
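The three relationships can be illustrated with a minimal Python sketch. It is not the SSDB implementation: the function names `best_hits` and `relationship`, the dictionary layout, and the toy gene IDs are assumptions made for illustration, and scores are assumed to be stored once per direction. Only the threshold of 100 comes from the description above.

```python
SCORE_THRESHOLD = 100  # minimum Smith-Waterman score for a pair to be stored


def best_hits(scores, genome_of):
    """Map (query gene, target genome) to the top-scoring (target gene, score).

    scores    -- dict {(query, target): score}, one entry per direction (assumed layout)
    genome_of -- dict {gene: genome label}
    """
    best = {}
    for (query, target), score in scores.items():
        if score < SCORE_THRESHOLD:
            continue  # pairs below the threshold are not entered
        key = (query, genome_of[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return best


def relationship(x, y, best, genome_of):
    """Classify the pair (x, y) as best-best, forward best, reverse best, or none."""
    forward = best.get((x, genome_of[y]), (None, 0))[0] == y
    reverse = best.get((y, genome_of[x]), (None, 0))[0] == x
    if forward and reverse:
        return "best-best"
    if forward:
        return "forward best"
    if reverse:
        return "reverse best"
    return "none"


# Toy data: two genes in genome A, two genes in genome B (hypothetical IDs).
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "b1"): 250, ("a1", "b2"): 120,
    ("a2", "b1"): 180,
    ("b1", "a1"): 250, ("b1", "a2"): 90,   # 90 is below the threshold
    ("b2", "a1"): 120,
}
best = best_hits(scores, genome_of)
print(relationship("a1", "b1", best, genome_of))
print(relationship("a2", "b1", best, genome_of))
print(relationship("a1", "b2", best, genome_of))
```

In this toy data, a1 and b1 pick each other, a2 picks b1 without being picked back, and b2 picks a1 without being picked back, so the three calls cover all three cases.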
Orthologs and paralogs
In order to speed up the search, SSDB is organized as a collection of "GFIT tables" containing selected information that is useful for identifying possible orthologs and paralogs.
This includes not only the score and the direction of best hits, but also the margin, which is the score difference between the best hit and the second best hit.
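As a rough illustration of the margin, the sketch below ranks the hits of one query gene against one genome and reports the difference between the top two scores. The function name `best_hit_with_margin` and the input layout are hypothetical. Returning `None` as the margin when only one hit exists is also an assumption, since the text does not say how that case is handled.

```python
def best_hit_with_margin(hits):
    """Return (best gene, best score, margin) for one query against one genome.

    hits -- list of (target gene, score) tuples (assumed layout)
    The margin is the best score minus the second-best score. With a single
    hit there is no second-best score, so the margin is None (assumption).
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    best_gene, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else None
    return best_gene, best_score, margin


# Hypothetical hits of one query gene against one genome.
print(best_hit_with_margin([("b1", 250), ("b2", 120), ("b3", 300)]))
```

A large margin suggests that the best hit stands out clearly from the other candidates in that genome, whereas a small margin indicates close competitors, for example recent paralogs.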
Search orthologs: (enter keggid in the form of org:gene, e.g., syn:sll1452)
Search paralogs: (enter keggid)
Conserved gene clusters
SSDB can be used to search efficiently for a conserved gene cluster containing the query gene.
First, the query gene and its best-best hits are considered as an initial cluster.
Second, neighboring genes on both sides of the chromosome are included in the cluster as long as they are also best-best hits.
Third, gapped genes are included in the cluster if they are forward best hits.
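The three steps can be sketched as follows for a single compared genome. This is one simplified reading of the procedure, not the SSDB algorithm. The function name `extend_cluster`, the `hit_type` dictionary, and the `max_gap` limit are all assumptions. In particular, a "gapped gene" is interpreted here as a forward best hit lying between two best-best members, and the position of the hits on the other genome's chromosome is not checked.

```python
def extend_cluster(gene_order, query, hit_type, max_gap=1):
    """Grow a cluster around `query` along the chromosome of the query genome.

    gene_order -- gene IDs in chromosomal order
    hit_type   -- dict {gene: "best-best" | "forward"}; genes with no hit are absent
    max_gap    -- maximum run of consecutive gapped genes (hypothetical parameter)
    """
    start = gene_order.index(query)
    if hit_type.get(query) != "best-best":
        return [query]  # step 1 needs a best-best hit for the query itself
    members = {start}
    for step in (-1, 1):  # step 2: extend to both sides
        pos = start + step
        pending = []  # forward-best genes waiting for a best-best gene beyond them
        while 0 <= pos < len(gene_order):
            kind = hit_type.get(gene_order[pos])
            if kind == "best-best":
                members.update(pending)  # step 3: accept the bridged gapped genes
                members.add(pos)
                pending = []
            elif kind == "forward" and len(pending) < max_gap:
                pending.append(pos)
            else:
                break  # no hit, or the gap is too long
            pos += step
    return [gene_order[i] for i in sorted(members)]


# Hypothetical chromosome with seven genes; g4 is the query.
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
hit_type = {"g2": "best-best", "g3": "forward", "g4": "best-best",
            "g5": "best-best", "g6": "forward"}
print(extend_cluster(gene_order, "g4", hit_type))
```

Here g3 is kept because it is bridged by the best-best gene g2, while g6 is dropped because no best-best gene follows it.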
Search conserved gene clusters: (enter keggid)
Precomputed sequence motifs
SSDB also contains precomputed Pfam and PROSITE motif patterns for all protein-coding genes.
Search motifs: (enter keggid in the form of org:gene, e.g., eco:b0002)
Search common motifs: (enter multiple keggids, e.g., eco:b0002 eco:b3940 eco:b4024)
Search sequences with given motifs: (enter one or more motif identifiers, e.g., pf:DnaJ ps:DNAJ_2)
Last updated: January 9, 2009