KEGG Overview

1. KEGG Databases

KEGG is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information as shown below. Genomic and chemical information represents the molecular building blocks of life in the genomic and chemical spaces, respectively, and systems information represents functional aspects of the biological systems, such as the cell and the organism, that are built from the building blocks. KEGG has been widely used as a reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

Category | Database | Content
Systems information | KEGG PATHWAY | Pathway maps for metabolism and other cellular processes, as well as human diseases; manually created from published materials
Systems information | KEGG BRITE | Functional hierarchies (ontologies) representing our knowledge on various aspects of biological systems; manually created from published materials
Systems information | KEGG MODULE | Tighter functional units for pathways and complexes; manually defined
Systems information | KEGG DISEASE | List of disease genes and molecules; manually entered from published materials
Systems information | KEGG DRUG | Chemical structures and associated information of approved drugs in Japan, USA, and Europe; manually entered from published materials
Systems information | KEGG EDRUG | Chemical components and associated information of crude drugs and other natural products; manually entered from published materials
Genomic information | KEGG ORTHOLOGY | KEGG Orthology (KO) groups based on PATHWAY and BRITE; manually defined
Genomic information | KEGG GENOME | Genome maps and organism information; generated from RefSeq and other public resources
Genomic information | KEGG GENES | Gene catalogs of complete genomes with manual annotation; generated from RefSeq and other public resources
Genomic information | KEGG SSDB | Sequence similarity scores and best-hit relations; computationally derived from GENES by pairwise genome comparisons of all protein-coding genes
Genomic information | KEGG DGENES | Gene catalogs of draft genomes with automatic annotation; generated from web resources
Genomic information | KEGG EGENES | Gene catalogs (consensus contigs) of EST data with automatic annotation; generated from dbEST
Genomic information | KEGG MGENES | Gene catalogs of metagenomes with automatic annotation; generated from NCBI resources
Chemical information | KEGG COMPOUND | Chemical compounds; manually entered from published materials
Chemical information | KEGG GLYCAN | Glycans; manually entered from published materials
Chemical information | KEGG REACTION | Chemical reactions; manually defined from ENZYME and PATHWAY
Chemical information | KEGG RPAIR | Chemical structure transformation patterns; manually defined from REACTION
Chemical information | KEGG RCLASS | Reaction classes defined by chemical structure transformation patterns of main reactant pairs; generated from RPAIR with annotation
Chemical information | KEGG ENZYME | Enzyme nomenclature; generated from ExplorEnz with annotation by KEGG

2. KEGG Objects

KEGG is a computer representation of biological systems. It is based on the concept of a graph for the representation and manipulation of KEGG objects from the molecular to higher levels. Mathematically, a graph is a set of nodes (KEGG objects) and edges (biological relationships). Each KEGG object (database entry) is given a unique identifier, as shown below.

Release | Database | Object Identifier
1995 | KEGG PATHWAY | map number
1995 | KEGG GENES | locus_tag / GeneID
1995 | KEGG ENZYME | EC number
1995 | KEGG COMPOUND | C number
2000 | KEGG GENOME | organism code / T number
2001 | KEGG REACTION | R number
2002 | KEGG ORTHOLOGY | K number
2003 | KEGG GLYCAN | G number
2004 | KEGG RPAIR | RP number
2005 | KEGG BRITE | br number
2005 | KEGG DRUG | D number
2007 | KEGG MODULE | M number
2008 | KEGG DISEASE | H number
2010 | KEGG EDRUG | E number
2010 | KEGG RCLASS | RC number
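
Because each database has its own identifier prefix, the database an identifier belongs to can often be inferred from its shape. The following is a minimal sketch of that idea in Python, not part of KEGG itself. It assumes each prefixed identifier is the prefix followed by five digits (for example C00031), which the table above does not state. The function name classify_identifier is hypothetical, and free-form identifiers (locus_tag, GeneID, organism code) are deliberately not handled.

```python
import re

# Assumed shapes: prefix + five digits. The digit count is an assumption
# of this sketch; the table above only names the prefixes.
PATTERNS = [
    ("KEGG PATHWAY", re.compile(r"map\d{5}")),
    ("KEGG ENZYME", re.compile(r"\d+\.\d+\.\d+\.\d+")),
    ("KEGG COMPOUND", re.compile(r"C\d{5}")),
    ("KEGG GENOME", re.compile(r"T\d{5}")),
    ("KEGG REACTION", re.compile(r"R\d{5}")),
    ("KEGG ORTHOLOGY", re.compile(r"K\d{5}")),
    ("KEGG GLYCAN", re.compile(r"G\d{5}")),
    ("KEGG RPAIR", re.compile(r"RP\d{5}")),
    ("KEGG BRITE", re.compile(r"br\d{5}")),
    ("KEGG DRUG", re.compile(r"D\d{5}")),
    ("KEGG MODULE", re.compile(r"M\d{5}")),
    ("KEGG DISEASE", re.compile(r"H\d{5}")),
    ("KEGG EDRUG", re.compile(r"E\d{5}")),
    ("KEGG RCLASS", re.compile(r"RC\d{5}")),
]


def classify_identifier(identifier):
    """Return the database name whose pattern matches the whole identifier, or None."""
    for database, pattern in PATTERNS:
        # fullmatch keeps "RC00001" from being mistaken for an R number.
        if pattern.fullmatch(identifier):
            return database
    return None


print(classify_identifier("C00031"))   # KEGG COMPOUND
print(classify_identifier("RC00001"))  # KEGG RCLASS
```

Matching the whole string rather than a prefix means the order of the patterns does not matter, so new identifier types can be appended without re-sorting the list.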

KEGG objects are linked to/from major life science databases. KEGG objects are also part of the Web; they can be found by Web search engines.

Graph | Node | Edge | Search and Analysis
KEGG | KEGG object | Biological relationship | KEGG
Integrated database | Entry | Cross-reference link | DBGET, Entrez, SRS, etc.
Web | Web page | Hyperlink | Google, etc.
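
The graph view described in this section, with KEGG objects as nodes and biological relationships as edges, can be sketched with a small adjacency structure. This is an illustrative Python sketch only: the class name ObjectGraph, the node labels, and the relationship strings are all hypothetical placeholders and do not correspond to real KEGG entries or to how KEGG stores its data.

```python
from collections import defaultdict


class ObjectGraph:
    """Undirected graph: nodes are identifiers, edges carry a relationship label."""

    def __init__(self):
        # node -> {neighbor -> relationship label}
        self._adjacent = defaultdict(dict)

    def add_edge(self, node_a, node_b, relationship):
        self._adjacent[node_a][node_b] = relationship
        self._adjacent[node_b][node_a] = relationship

    def neighbors(self, node):
        # .get avoids creating an empty entry for unknown nodes.
        return sorted(self._adjacent.get(node, {}))

    def relationship(self, node_a, node_b):
        return self._adjacent.get(node_a, {}).get(node_b)


graph = ObjectGraph()
# Placeholder identifiers; these edges are illustrative, not real KEGG data.
graph.add_edge("gene_1", "K_example", "assigned to ortholog group")
graph.add_edge("K_example", "map_example", "appears on pathway map")
graph.add_edge("C_example", "map_example", "appears on pathway map")

print(graph.neighbors("map_example"))  # ['C_example', 'K_example']
```

The same structure covers the other rows of the table: replace the nodes with database entries or Web pages, and the edge labels with cross-reference links or hyperlinks.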

3. Network Hierarchy

The molecular interaction/reaction network is the most distinctive data object in KEGG. It is stored as a collection of pathway maps (graphical diagrams) in the PATHWAY database. Reflecting the map resolution, KEGG PATHWAY is organized as a hierarchy. The top two levels of the current hierarchy are as follows.

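First Level | Second Level
Metabolism | Carbohydrate Metabolism
Metabolism | Energy Metabolism
Metabolism | Lipid Metabolism
Metabolism | Nucleotide Metabolism
Metabolism | Amino Acid Metabolism
Metabolism | Metabolism of Other Amino Acids
Metabolism | Glycan Biosynthesis and Metabolism
Metabolism | Metabolism of Cofactors and Vitamins
Metabolism | Metabolism of Terpenoids and Polyketides
Metabolism | Biosynthesis of Other Secondary Metabolites
Metabolism | Xenobiotics Biodegradation and Metabolism
Genetic Information Processing | Transcription
Genetic Information Processing | Translation
Genetic Information Processing | Folding, Sorting and Degradation
Genetic Information Processing | Replication and Repair
Environmental Information Processing | Membrane Transport
Environmental Information Processing | Signal Transduction
Environmental Information Processing | Signaling Molecules and Interaction
Cellular Processes | Transport and Catabolism
Cellular Processes | Cell Motility
Cellular Processes | Cell Growth and Death
Cellular Processes | Cell Communication
Organismal Systems | Immune System
Organismal Systems | Endocrine System
Organismal Systems | Circulatory System
Organismal Systems | Excretory System
Organismal Systems | Nervous System
Organismal Systems | Sensory System
Organismal Systems | Development
Organismal Systems | Environmental Adaptation
Human Diseases | Cancers
Human Diseases | Immune System Diseases
Human Diseases | Neurodegenerative Diseases
Human Diseases | Cardiovascular Diseases
Human Diseases | Metabolic Diseases
Human Diseases | Infectious Diseases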
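
The two-level hierarchy listed above maps naturally onto a nested structure. The sketch below, in Python, holds a subset of those categories in a dictionary and looks up the first-level category of a given second-level category. The names PATHWAY_HIERARCHY and first_level_of are hypothetical, and this is only one possible in-memory form, not KEGG's storage format.

```python
# A subset of the two-level hierarchy from the table above.
PATHWAY_HIERARCHY = {
    "Metabolism": [
        "Carbohydrate Metabolism",
        "Energy Metabolism",
        "Lipid Metabolism",
    ],
    "Genetic Information Processing": [
        "Transcription",
        "Translation",
    ],
    "Cellular Processes": [
        "Cell Motility",
        "Cell Growth and Death",
    ],
}


def first_level_of(second_level, hierarchy=PATHWAY_HIERARCHY):
    """Return the first-level category containing second_level, or None."""
    for first_level, children in hierarchy.items():
        if second_level in children:
            return first_level
    return None


print(first_level_of("Translation"))  # Genetic Information Processing
```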

4. Network Reconstruction

The integration of pathway information and genomic information was first achieved in KEGG through EC numbers. Once EC numbers were correctly assigned to the enzyme genes in a genome, organism-specific pathways could be generated automatically by matching them against the networks of EC numbers (enzymes) in the reference metabolic pathways. However, in order to incorporate non-metabolic pathways and to overcome various problems inherent in the enzyme nomenclature, a new scheme based on ortholog IDs was introduced to replace the EC numbers. KO (KEGG Orthology) is a further extension of the ortholog IDs, based not only on the pathway maps but also on the BRITE functional hierarchies, most notably the classifications of protein families.

Period | Identifier | Mapping | Assignment
1995-1999 | EC number | Metabolic pathways | Domain based
2000-2002 | Ortholog ID | Metabolic and regulatory pathways | Domain based
2003- | KO | Pathways and BRITE hierarchies | Gene based

Thus, under the current KO system, the KO identifiers (K numbers) are placed at the fourth (lowest) level in the network hierarchy shown above, or at the lowest level of the BRITE hierarchy.
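
The matching step described in this section, where identifiers assigned to an organism's genes are compared against the nodes of a reference pathway, can be sketched as follows. This is a simplified Python illustration under stated assumptions, not KEGG's actual procedure: the function names reconstruct_pathway and coverage are hypothetical, a pathway is reduced to a plain set of node identifiers, each gene carries exactly one identifier, and the identifiers and gene names in the example are placeholders.

```python
def reconstruct_pathway(reference_nodes, gene_annotations):
    """Map an organism's annotated genes onto a reference pathway.

    reference_nodes: set of identifiers (EC numbers or K numbers) in the reference pathway.
    gene_annotations: dict mapping gene ID -> identifier assigned to that gene.
    Returns a dict mapping each reference node to a sorted list of matching genes.
    """
    organism_pathway = {node: [] for node in reference_nodes}
    for gene, identifier in gene_annotations.items():
        # Genes whose identifier is not on this pathway are ignored.
        if identifier in organism_pathway:
            organism_pathway[identifier].append(gene)
    for genes in organism_pathway.values():
        genes.sort()
    return organism_pathway


def coverage(organism_pathway):
    """Fraction of reference nodes that have at least one gene assigned."""
    if not organism_pathway:
        return 0.0
    present = sum(1 for genes in organism_pathway.values() if genes)
    return present / len(organism_pathway)


# Placeholder data; identifiers and gene names are hypothetical.
reference = {"K_a", "K_b", "K_c", "K_d"}
annotations = {"gene1": "K_a", "gene2": "K_a", "gene3": "K_c", "gene4": "K_z"}

result = reconstruct_pathway(reference, annotations)
print(result["K_a"])     # ['gene1', 'gene2']
print(coverage(result))  # 0.5
```

Nodes left with an empty gene list correspond to steps of the reference pathway for which the organism has no annotated gene.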

5. BRITE Functional Hierarchy

The BRITE database is a collection of hierarchical text files and binary relation files. It is intended to supplement the PATHWAY database in two ways. One is to computerize higher-level knowledge that cannot easily be represented as molecular interaction/reaction networks, in terms of the hierarchically structured vocabulary. The other is to integrate our knowledge about the genomic space (K numbers) with different types of knowledge in the chemical space (C/D/G/R/RP/EC numbers in the LIGAND database). The BRITE collection is currently categorized as follows.

Top Category | Second Category
Genes and Proteins | Network hierarchy
Genes and Proteins | Protein families
Compounds and Reactions | Compounds
Compounds and Reactions | Reactions
Compounds and Reactions | Compound interactions
Drugs and Diseases | Drugs
Drugs and Diseases | Diseases
Cells and Organisms | Organisms

Last updated: July 1, 2010
Copyright 1995-2010 Kanehisa Laboratories