KEGG Overview

1. KEGG Databases

KEGG is a database of biological systems, consisting of genetic building blocks of genes and proteins (KEGG GENES), chemical building blocks of both endogenous and exogenous substances (KEGG LIGAND), molecular wiring diagrams of interaction and reaction networks (KEGG PATHWAY), and hierarchies and relationships of various biological objects (KEGG BRITE). KEGG provides a reference knowledge base for linking genomes to biological systems and also to environments by the processes of PATHWAY mapping and BRITE mapping.

Database | Content | Source
PATHWAY | Molecular interaction and reaction networks for metabolism, various cellular processes, and human diseases | Manually entered from published materials
BRITE | Functional hierarchies representing our knowledge on various aspects of biological systems | Manually entered from published materials
GENES | KEGG ORTHOLOGY (KO): Ortholog groups based on PATHWAY and BRITE | Manually defined by KEGG
      | GENES: Gene catalogs of complete genomes with manual annotation | Generated from RefSeq and other public resources with reannotation by KEGG
      | DGENES: Gene catalogs of draft genomes with automatic annotation |
      | EGENES: Gene catalogs (consensus contigs) of EST data with automatic annotation |
      | GENOME: Genome maps and organism information |
      | SSDB: Sequence similarities with best-hit information for identifying ortholog/paralog clusters and conserved gene clusters | Computationally derived from GENES by pairwise genome comparisons of all protein-coding genes
LIGAND | COMPOUND: Chemical compounds | Manually entered from published materials
       | DRUG: Drugs approved in the U.S. and Japan |
       | GLYCAN: Glycans |
       | REACTION: Chemical reactions |
       | RPAIR: Chemical structure transformation patterns |
       | ENZYME: Enzyme nomenclature | Generated from ExplorEnz enzyme database with annotation by KEGG

2. KEGG Objects

KEGG is a computer representation of biological systems. It is based on the concept of a graph for representing and manipulating various KEGG objects, from the molecular level to higher levels. Mathematically, a graph is a set of nodes (KEGG objects) and edges (biological relationships). Each KEGG object (database entry) is given a unique identifier, as shown below.

Release | Database | Object Identifier
1995 | KEGG PATHWAY | map number
     | KEGG GENES | locus_tag / GeneID
     | KEGG ENZYME | EC number
     | KEGG COMPOUND | C number
2000 | KEGG GENOME | organism code / T number
2001 | KEGG REACTION | R number
2002 | KEGG ORTHOLOGY | K number
2003 | KEGG GLYCAN | G number
2004 | KEGG RPAIR | A number
2005 | KEGG BRITE | br number
     | KEGG DRUG | D number
2007 | KEGG MODULE | M number
     | KEGG DISEASE | H number
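
Because each database has its own identifier pattern, the database an entry belongs to can often be guessed from the identifier alone. The following is a minimal Python sketch of that idea. The table gives only the prefixes, so the five-digit width used here, the dotted four-field form of EC numbers, and the function name `guess_database` are assumptions made for illustration. KEGG GENES identifiers (locus_tag / GeneID) and organism codes have no fixed prefix and are not covered.

```python
import re

# Prefixes come from the identifier table; the five-digit width is an
# assumption of this sketch and is not stated in the table.
_PATTERNS = [
    (re.compile(r"^map\d{5}$"), "KEGG PATHWAY"),
    (re.compile(r"^br\d{5}$"), "KEGG BRITE"),
    # EC numbers: four dot-separated fields, "-" allowed for unspecified ones.
    (re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(\d+|-)$"), "KEGG ENZYME"),
    (re.compile(r"^C\d{5}$"), "KEGG COMPOUND"),
    (re.compile(r"^T\d{5}$"), "KEGG GENOME"),
    (re.compile(r"^R\d{5}$"), "KEGG REACTION"),
    (re.compile(r"^K\d{5}$"), "KEGG ORTHOLOGY"),
    (re.compile(r"^G\d{5}$"), "KEGG GLYCAN"),
    (re.compile(r"^A\d{5}$"), "KEGG RPAIR"),
    (re.compile(r"^D\d{5}$"), "KEGG DRUG"),
    (re.compile(r"^M\d{5}$"), "KEGG MODULE"),
    (re.compile(r"^H\d{5}$"), "KEGG DISEASE"),
]


def guess_database(identifier):
    """Return the database name suggested by the identifier, or None."""
    for pattern, database in _PATTERNS:
        if pattern.match(identifier):
            return database
    return None


if __name__ == "__main__":
    for example in ("map00010", "C00031", "1.1.1.1", "K00001", "b0001"):
        print(example, "->", guess_database(example))
```

An ordered list of patterns is used rather than a single prefix lookup so that multi-letter prefixes (`map`, `br`) and EC numbers can be handled alongside the single-letter cases.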

KEGG objects are linked to/from major life science databases. KEGG objects are also part of the Web; they can be found by Web search engines.

Graph | Node | Edge | Search and Analysis
KEGG | KEGG object | Biological relationship | KEGG
Integrated database | Entry | Cross-reference link | DBGET, Entrez, SRS, etc.
Web | Web page | Hyperlink | Google, etc.
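
The graph view described in this section can be sketched as a small adjacency structure in which nodes are object identifiers and each edge carries the relationship it represents. This is only an illustration of the concept, not how KEGG stores its data. The class name `ObjectGraph` and all node and edge names below are placeholders invented for the example.

```python
from collections import defaultdict


class ObjectGraph:
    """Undirected graph: nodes are identifiers, edges carry a relationship label."""

    def __init__(self):
        # node -> {neighbor: relationship label}
        self._adj = defaultdict(dict)

    def add_edge(self, a, b, relationship):
        """Link two objects in both directions with the given relationship."""
        self._adj[a][b] = relationship
        self._adj[b][a] = relationship

    def neighbors(self, node):
        """Return the objects directly linked to `node`, sorted by name."""
        return sorted(self._adj.get(node, {}))

    def relationship(self, a, b):
        """Return the relationship label between two objects, or None."""
        return self._adj.get(a, {}).get(b)


graph = ObjectGraph()
# Placeholder nodes and edges, invented for illustration only.
graph.add_edge("gene_a", "K_x", "assigned to ortholog group")
graph.add_edge("K_x", "map_y", "appears in pathway map")
graph.add_edge("C_z", "map_y", "appears in pathway map")

if __name__ == "__main__":
    print(graph.neighbors("map_y"))
    print(graph.relationship("gene_a", "K_x"))
```

The same structure fits the other two rows of the table: entries joined by cross-reference links, or Web pages joined by hyperlinks.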

3. Network Hierarchy

The molecular interaction/reaction network is the most distinctive data object in KEGG; it is stored as a collection of pathway maps (graphical diagrams) in the PATHWAY database. Reflecting the map resolution, KEGG PATHWAY is organized in a hierarchy. The top two levels of the current hierarchy are as follows.

First Level | Second Level
Metabolism | Carbohydrate Metabolism
           | Energy Metabolism
           | Lipid Metabolism
           | Nucleotide Metabolism
           | Amino Acid Metabolism
           | Metabolism of Other Amino Acids
           | Glycan Biosynthesis and Metabolism
           | Biosynthesis of Polyketides and Nonribosomal Peptides
           | Metabolism of Cofactors and Vitamins
           | Biosynthesis of Secondary Metabolites
           | Xenobiotics Biodegradation and Metabolism
Genetic Information Processing | Transcription
                               | Translation
                               | Sorting and Degradation
                               | Replication and Repair
Environmental Information Processing | Membrane Transport
                                     | Signal Transduction
                                     | Signaling Molecules and Interaction
Cellular Processes | Cell Motility
                   | Cell Growth and Death
                   | Cell Communication
                   | Endocrine System
                   | Immune System
                   | Nervous System
                   | Sensory System
                   | Development
                   | Behavior
Human Diseases | Cancers
               | Immune Disorders
               | Neurodegenerative Diseases
               | Metabolic Disorders
               | Infectious Diseases
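
As a rough illustration, the two levels listed above map naturally onto a nested dictionary. The sketch below holds only a subset of the categories. The variable name `PATHWAY_HIERARCHY` and the function `first_level_of` are hypothetical and are not part of any KEGG tool.

```python
# A subset of the two-level hierarchy listed above, keyed by first level.
PATHWAY_HIERARCHY = {
    "Metabolism": [
        "Carbohydrate Metabolism",
        "Energy Metabolism",
        "Lipid Metabolism",
    ],
    "Genetic Information Processing": [
        "Transcription",
        "Translation",
    ],
    "Human Diseases": [
        "Cancers",
        "Immune Disorders",
    ],
}


def first_level_of(second_level):
    """Return the first-level category containing `second_level`, or None."""
    for first, seconds in PATHWAY_HIERARCHY.items():
        if second_level in seconds:
            return first
    return None


if __name__ == "__main__":
    print(first_level_of("Translation"))
```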

4. Network Reconstruction

The integration of pathway information and genomic information was first achieved in KEGG through EC numbers. Once EC numbers were correctly assigned to the enzyme genes in a genome, organism-specific pathways could be generated automatically by matching them against the networks of EC numbers (enzymes) in the reference metabolic pathways. However, to incorporate non-metabolic pathways and to overcome various problems inherent in the enzyme nomenclature, a new scheme based on ortholog IDs was introduced to replace the EC numbers. KO (KEGG Orthology) is a further extension of the ortholog IDs, based not only on the pathway maps but also on the BRITE functional hierarchies, most notably the classifications of protein families.

Identifier | Purpose
EC number | Mapping enzyme genes to metabolic pathways
Ortholog ID | Mapping genes to both metabolic and regulatory pathways
KO | Mapping genes to both pathways and BRITE hierarchies

Thus, under the current KO system, the KO identifiers (K numbers) are placed at the fourth (lowest) level in the network hierarchy shown above, or at the lowest level of the BRITE hierarchy.
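
The matching step behind this reconstruction can be sketched in a few lines of Python: a reference pathway is treated as a set of node identifiers (EC numbers or K numbers), and an organism-specific pathway is the subset of those nodes to which at least one gene in the genome has been assigned. This is a simplified sketch of the idea, not KEGG's actual procedure. The function name `reconstruct_pathway` and all identifiers in the example data are placeholders.

```python
def reconstruct_pathway(reference_nodes, gene_annotation):
    """Map each reference node found in the genome to the genes assigned to it.

    reference_nodes: set of identifiers (EC numbers or K numbers) in a
        reference pathway.
    gene_annotation: dict mapping a gene name to the list of identifiers
        assigned to that gene.
    """
    found = {}
    for gene, identifiers in gene_annotation.items():
        for identifier in identifiers:
            if identifier in reference_nodes:
                found.setdefault(identifier, []).append(gene)
    return found


# Placeholder data, invented for illustration only.
reference_pathway = {"K_a", "K_b", "K_c"}
genome = {
    "gene1": ["K_a"],
    "gene2": ["K_c", "K_z"],  # K_z is not part of this reference pathway
    "gene3": ["K_a"],
}

organism_pathway = reconstruct_pathway(reference_pathway, genome)
# Reference nodes with no assigned gene in this genome.
missing = reference_pathway - set(organism_pathway)

if __name__ == "__main__":
    print(organism_pathway)
    print(missing)
```

Nothing in the function depends on the identifier type, so the same matching applies whether the reference network is expressed in EC numbers, ortholog IDs, or K numbers.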

5. BRITE Functional Hierarchy

The BRITE database is a collection of hierarchical text files and binary relation files. It is intended to supplement the PATHWAY database in two ways. One is to computerize higher-level knowledge that cannot easily be represented as molecular interaction/reaction networks, in terms of the hierarchically structured vocabulary. The other is to integrate our knowledge about the genomic space (K numbers) with different types of knowledge in the chemical space (C/D/G/R/A numbers in the LIGAND database). The BRITE collection is currently categorized as follows.

Top Category | Second Category
Genes and Proteins | Network hierarchy
                   | Protein families
Compounds and Reactions | Compounds
                        | Reactions
                        | Compound interactions
Drugs and Diseases | Drugs
                   | Diseases
Cells and Organisms | Organisms
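
To make the two kinds of BRITE files more concrete, the sketch below reads a hierarchical text listing into root-to-entry paths and looks up partners in a list of binary relations. The file formats are not described in this overview, so the line format assumed here (a leading letter A, B, C, ... giving the depth, followed by the entry name) is hypothetical, as are the function names and the relation data.

```python
def parse_hierarchy(lines):
    """Return the root-to-entry path for every non-empty line.

    Assumed (hypothetical) format: the first character is the depth
    letter (A = top level, B = second level, ...) and the rest of the
    line is the entry name.
    """
    paths = []
    stack = []  # names of the current ancestors, one per depth
    for line in lines:
        if not line.strip():
            continue
        depth = ord(line[0]) - ord("A")
        name = line[1:].strip()
        del stack[depth:]  # drop entries at this depth or deeper
        stack.append(name)
        paths.append(tuple(stack))
    return paths


def related(identifier, relations):
    """Return the sorted partners of `identifier` in (left, right) pairs."""
    return sorted(right for left, right in relations if left == identifier)


# Category names are taken from the table above.
hierarchy_lines = [
    "A Genes and Proteins",
    "B Protein families",
    "A Drugs and Diseases",
    "B Drugs",
    "B Diseases",
]
# Placeholder binary relations linking genomic and chemical identifiers.
binary_relations = [("K_a", "C_x"), ("K_a", "D_y"), ("K_b", "C_x")]

if __name__ == "__main__":
    for path in parse_hierarchy(hierarchy_lines):
        print(" > ".join(path))
    print(related("K_a", binary_relations))
```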

Last updated: June 10, 2008
Copyright 1995-2008 Kanehisa Laboratories