KEGG Overview
1. KEGG Databases
KEGG is a database of biological systems, consisting of genetic building blocks of genes and proteins (KEGG GENES), chemical building blocks of both endogenous and exogenous substances (KEGG LIGAND), molecular wiring diagrams of interaction and reaction networks (KEGG PATHWAY), and hierarchies and relationships of various biological objects (KEGG BRITE). KEGG provides a reference knowledge base for linking genomes to biological systems and also to environments by the processes of PATHWAY mapping and BRITE mapping.
| Database | Content | Source |
| PATHWAY | Molecular interaction and reaction networks for metabolism, various cellular processes, and human diseases | Manually entered from published materials |
| BRITE | Functional hierarchies representing our knowledge on various aspects of biological systems | Manually entered from published materials |
| GENES | KEGG ORTHOLOGY (KO): Ortholog groups based on PATHWAY and BRITE | Manually defined by KEGG |
| GENES | GENES: Gene catalogs of complete genomes with manual annotation | Generated from RefSeq and other public resources with reannotation by KEGG |
| GENES | DGENES: Gene catalogs of draft genomes with automatic annotation | Generated from RefSeq and other public resources with reannotation by KEGG |
| GENES | EGENES: Gene catalogs (consensus contigs) of EST data with automatic annotation | Generated from RefSeq and other public resources with reannotation by KEGG |
| GENES | GENOME: Genome maps and organism information | Generated from RefSeq and other public resources with reannotation by KEGG |
| GENES | SSDB: Sequence similarities with best-hit information for identifying ortholog/paralog clusters and conserved gene clusters | Computationally derived from GENES by pairwise genome comparisons of all protein-coding genes |
| LIGAND | COMPOUND: Chemical compounds | Manually entered from published materials |
| LIGAND | DRUG: Drugs approved in the U.S. and Japan | Manually entered from published materials |
| LIGAND | GLYCAN: Glycans | Manually entered from published materials |
| LIGAND | REACTION: Chemical reactions | Manually entered from published materials |
| LIGAND | RPAIR: Chemical structure transformation patterns | Manually entered from published materials |
| LIGAND | ENZYME: Enzyme nomenclature | Generated from ExplorEnz enzyme database with annotation by KEGG |
2. KEGG Objects
KEGG is a computer representation of biological systems. It is based on the concept of a graph for representing and manipulating various KEGG objects from the molecular level to higher levels. Mathematically, a graph is a set of nodes (KEGG objects) and edges (biological relationships). Each KEGG object (database entry) is given a unique identifier, as shown below.
| Release | Database | Object Identifier |
| 1995 | KEGG PATHWAY | map number |
| 1995 | KEGG GENES | locus_tag / GeneID |
| 1995 | KEGG ENZYME | EC number |
| 1995 | KEGG COMPOUND | C number |
| 2000 | KEGG GENOME | organism code / T number |
| 2001 | KEGG REACTION | R number |
| 2002 | KEGG ORTHOLOGY | K number |
| 2003 | KEGG GLYCAN | G number |
| 2004 | KEGG RPAIR | A number |
| 2005 | KEGG BRITE | br number |
| 2005 | KEGG DRUG | D number |
| 2007 | KEGG MODULE | M number |
| 2007 | KEGG DISEASE | H number |
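As a rough illustration of how the identifier prefixes in the table above can be used, the following Python sketch guesses which database an identifier belongs to. The exact shapes are assumptions made for this example (a prefix followed by five digits, and a four-field dotted EC number); they are not an official KEGG grammar. GENES identifiers and organism codes are left out because they have no single fixed shape. The function name `classify` is hypothetical.

```python
import re

# Assumed identifier shapes, for illustration only (not an official grammar).
PATTERNS = [
    ("KEGG PATHWAY", re.compile(r"map\d{5}")),
    ("KEGG BRITE", re.compile(r"br\d{5}")),
    ("KEGG ENZYME", re.compile(r"(?:EC\s*)?\d+(?:\.(?:\d+|-)){3}")),
    ("KEGG GENOME", re.compile(r"T\d{5}")),
    ("KEGG COMPOUND", re.compile(r"C\d{5}")),
    ("KEGG REACTION", re.compile(r"R\d{5}")),
    ("KEGG ORTHOLOGY", re.compile(r"K\d{5}")),
    ("KEGG GLYCAN", re.compile(r"G\d{5}")),
    ("KEGG RPAIR", re.compile(r"A\d{5}")),
    ("KEGG DRUG", re.compile(r"D\d{5}")),
    ("KEGG MODULE", re.compile(r"M\d{5}")),
    ("KEGG DISEASE", re.compile(r"H\d{5}")),
]


def classify(identifier):
    """Return the database name whose assumed pattern matches, else None."""
    for database, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return database
    return None


if __name__ == "__main__":
    for example in ["C00031", "K00001", "map00010", "1.1.1.1", "unknown"]:
        print(example, "->", classify(example))
```

Matching on the whole string with `fullmatch` keeps a longer token that merely starts with a known prefix from being misclassified.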
KEGG objects are linked to/from major life science databases. KEGG objects are also part of the Web; they can be found by Web search engines.
| Graph | Node | Edge | Search and Analysis |
| KEGG | KEGG object | Biological relationship | KEGG |
| Integrated database | Entry | Cross-reference link | DBGET, Entrez, SRS, etc. |
| Web | Web page | Hyperlink | Google, etc. |
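The graph view described in this section, with KEGG objects as nodes and biological relationships as edges, can be sketched with a small adjacency structure. This is only a minimal illustration: the class name `ObjectGraph`, the relation labels, and the identifiers (`K99991`, `R99991`, `C99991`, `C99992`) are made up for the example and do not refer to real KEGG entries.

```python
from collections import defaultdict


class ObjectGraph:
    """Undirected graph with labeled edges between object identifiers."""

    def __init__(self):
        # node -> set of (neighbor, relation) pairs
        self._adjacency = defaultdict(set)

    def add_edge(self, a, b, relation):
        """Record a relationship between two objects, in both directions."""
        self._adjacency[a].add((b, relation))
        self._adjacency[b].add((a, relation))

    def neighbors(self, node, relation=None):
        """Return sorted neighbors, optionally filtered by relation label."""
        pairs = self._adjacency.get(node, ())
        return sorted(n for n, r in pairs if relation is None or r == relation)


if __name__ == "__main__":
    graph = ObjectGraph()
    # Hypothetical identifiers and relation labels, for illustration only.
    graph.add_edge("K99991", "R99991", "catalyzes")
    graph.add_edge("R99991", "C99991", "substrate")
    graph.add_edge("R99991", "C99992", "product")
    print(graph.neighbors("R99991"))
    print(graph.neighbors("R99991", relation="product"))
```

The same structure would apply to the other two rows of the table above if entries or Web pages were used as nodes and cross-reference links or hyperlinks as edges.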
3. Network Hierarchy
The molecular interaction/reaction network is the most distinctive data object in KEGG; it is stored as a collection of pathway maps (graphical diagrams) in the PATHWAY database. Reflecting the map resolution, KEGG PATHWAY is organized as a hierarchy. The top two levels of the current hierarchy are as follows.
| First Level | Second Level |
| Metabolism | Carbohydrate Metabolism; Energy Metabolism; Lipid Metabolism; Nucleotide Metabolism; Amino Acid Metabolism; Metabolism of Other Amino Acids; Glycan Biosynthesis and Metabolism; Biosynthesis of Polyketides and Nonribosomal Peptides; Metabolism of Cofactors and Vitamins; Biosynthesis of Secondary Metabolites; Xenobiotics Biodegradation and Metabolism |
| Genetic Information Processing | Transcription; Translation; Sorting and Degradation; Replication and Repair |
| Environmental Information Processing | Membrane Transport; Signal Transduction; Signaling Molecules and Interaction |
| Cellular Processes | Cell Motility; Cell Growth and Death; Cell Communication; Endocrine System; Immune System; Nervous System; Sensory System; Development; Behavior |
| Human Diseases | Cancers; Immune Disorders; Neurodegenerative Diseases; Metabolic Disorders; Infectious Diseases |
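The two levels listed above map naturally onto a nested data structure. The sketch below stores them as a Python dictionary and looks up the first-level category of a given second-level category. The names `HIERARCHY` and `first_level_of` are hypothetical, and only the two levels shown in the table are modeled.

```python
# Top two levels of the PATHWAY hierarchy, copied from the table above.
HIERARCHY = {
    "Metabolism": [
        "Carbohydrate Metabolism",
        "Energy Metabolism",
        "Lipid Metabolism",
        "Nucleotide Metabolism",
        "Amino Acid Metabolism",
        "Metabolism of Other Amino Acids",
        "Glycan Biosynthesis and Metabolism",
        "Biosynthesis of Polyketides and Nonribosomal Peptides",
        "Metabolism of Cofactors and Vitamins",
        "Biosynthesis of Secondary Metabolites",
        "Xenobiotics Biodegradation and Metabolism",
    ],
    "Genetic Information Processing": [
        "Transcription",
        "Translation",
        "Sorting and Degradation",
        "Replication and Repair",
    ],
    "Environmental Information Processing": [
        "Membrane Transport",
        "Signal Transduction",
        "Signaling Molecules and Interaction",
    ],
    "Cellular Processes": [
        "Cell Motility",
        "Cell Growth and Death",
        "Cell Communication",
        "Endocrine System",
        "Immune System",
        "Nervous System",
        "Sensory System",
        "Development",
        "Behavior",
    ],
    "Human Diseases": [
        "Cancers",
        "Immune Disorders",
        "Neurodegenerative Diseases",
        "Metabolic Disorders",
        "Infectious Diseases",
    ],
}


def first_level_of(second_level):
    """Return the first-level category containing second_level, else None."""
    for first_level, second_levels in HIERARCHY.items():
        if second_level in second_levels:
            return first_level
    return None


if __name__ == "__main__":
    print(first_level_of("Signal Transduction"))
```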
4. Network Reconstruction
The integration of pathway information and genomic information was first achieved in KEGG through EC numbers.
Once the EC numbers were correctly assigned to enzyme genes in the genome, organism-specific pathways could be generated automatically by matching against the networks of EC numbers (enzymes) in the reference metabolic pathways.
However, to incorporate non-metabolic pathways and to overcome various problems inherent in the enzyme nomenclature, a new scheme based on ortholog IDs was introduced to replace the EC numbers.
KO (KEGG Orthology) is a further extension of ortholog IDs based on not only the pathway maps but also the BRITE functional hierarchies, most notably classifications of protein families.
| Identifier | Purpose |
| EC number | Mapping enzyme genes to metabolic pathways |
| Ortholog ID | Mapping genes to both metabolic and regulatory pathways |
| KO | Mapping genes to both pathways and BRITE hierarchies |
Thus, under the current KO system, the KO identifiers (K numbers) are placed at the fourth (lowest) level of the network hierarchy, of which the top two levels are shown above, or at the lowest level of a BRITE hierarchy.
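The matching step described in this section can be sketched as a simple set lookup: a reference pathway is treated as a set of node identifiers (EC numbers or K numbers), and an organism-specific pathway is whatever subset of those nodes has at least one annotated gene. This is a minimal illustration under stated assumptions, not KEGG's actual procedure. The function name `reconstruct_pathway`, the gene names, and the K numbers are all made up for the example.

```python
def reconstruct_pathway(reference_nodes, gene_annotations):
    """Map each reference node found in the organism to its annotated genes.

    reference_nodes: set of identifiers (EC numbers or K numbers) that make
        up a reference pathway.
    gene_annotations: dict of gene name -> list of identifiers assigned to
        that gene.
    """
    present = {}
    for gene, identifiers in gene_annotations.items():
        for identifier in identifiers:
            if identifier in reference_nodes:
                present.setdefault(identifier, []).append(gene)
    return {node: sorted(genes) for node, genes in present.items()}


if __name__ == "__main__":
    # Hypothetical reference pathway and annotations, for illustration only.
    reference = {"K90001", "K90002", "K90003"}
    annotations = {
        "geneA": ["K90001"],
        "geneB": ["K90001", "K90004"],
        "geneC": ["K90003"],
    }
    organism_pathway = reconstruct_pathway(reference, annotations)
    missing = reference - set(organism_pathway)
    print(organism_pathway)
    print("nodes without an annotated gene:", sorted(missing))
```

Because the function only compares identifiers, the same code works whether genes are annotated with EC numbers or with K numbers; the choice of identifier determines which kinds of pathways can be covered.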
5. BRITE Functional Hierarchy
The BRITE database is a collection of hierarchical text files and binary relation files.
It is intended to supplement the PATHWAY database in two ways.
One is to computerize higher-level knowledge that cannot easily be represented as molecular interaction/reaction networks, using a hierarchically structured vocabulary.
The other is to integrate our knowledge about the genomic space (K numbers) with different types of knowledge in the chemical space (C/D/G/R/A numbers in the LIGAND database).
The BRITE collection is currently categorized as follows.
| Top Category | Second Category |
| Genes and Proteins | Network hierarchy; Protein families |
| Compounds and Reactions | Compounds; Reactions; Compound interactions |
| Drugs and Diseases | Drugs; Diseases |
| Cells and Organisms | Organisms |
Last updated: June 10, 2008