Protein Data Bank

The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. This data, typically obtained by X-ray crystallography or NMR spectroscopy, is submitted by biologists from around the world, is released into the public domain, and can be accessed for free.
The structural data can be used to visualize the biomolecules with appropriate software, such as RasMol, Chime or a VRML plugin. The PDB website also provides resources on education, structural genomics and related software.
As of 2002, the database contained about 18,000 structures and was growing by about 2,000 to 3,000 new structures per year. Data are stored in mmCIF (macromolecular Crystallographic Information File), a format developed specifically for this purpose. Note that the database records the three-dimensional position of every atom in a large biomolecule. Researchers interested only in sequence data, i.e. the list of amino acids making up a particular protein or the list of nucleotides making up a particular nucleic acid, should instead use the much larger sequence databases, such as Swiss-Prot and those of the International Nucleotide Sequence Database Collaboration.
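To make the difference between structural and sequence data concrete, the following Python sketch reads a heavily simplified mmCIF _atom_site loop and recovers a one-letter amino-acid sequence from the residues that have an alpha-carbon (CA) atom. The column names follow the mmCIF _atom_site category, but real files contain many more columns, quoted values and several categories per data block; this parser assumes unquoted, whitespace-separated values with one atom per line and is not a substitute for a full mmCIF library. The coordinates in SAMPLE are made up.

```python
from dataclasses import dataclass

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass
class Atom:
    atom_name: str  # e.g. "CA" for the alpha carbon
    residue: str    # three-letter residue name, e.g. "MET"
    seq_id: int     # residue position in the chain
    x: float
    y: float
    z: float


def parse_atom_site(text: str) -> list[Atom]:
    """Parse a simplified _atom_site loop (no quoted values, one row per line)."""
    columns: list[str] = []
    atoms: list[Atom] = []
    in_loop = False
    for ln in (raw.strip() for raw in text.splitlines()):
        if not ln:
            continue
        if ln == "loop_":
            in_loop, columns = True, []
        elif in_loop and ln.startswith("_atom_site."):
            # Collect column names such as "Cartn_x" in declaration order.
            columns.append(ln.split(".", 1)[1])
        elif in_loop and columns and not ln.startswith(("_", "#")):
            row = dict(zip(columns, ln.split()))
            atoms.append(Atom(
                atom_name=row["label_atom_id"],
                residue=row["label_comp_id"],
                seq_id=int(row["label_seq_id"]),
                x=float(row["Cartn_x"]),
                y=float(row["Cartn_y"]),
                z=float(row["Cartn_z"]),
            ))
    return atoms


def sequence_from_atoms(atoms: list[Atom]) -> str:
    """One-letter sequence of residues that have a CA atom, in seq_id order."""
    residues = {a.seq_id: a.residue for a in atoms if a.atom_name == "CA"}
    return "".join(THREE_TO_ONE.get(residues[i], "X") for i in sorted(residues))


# Made-up toy data in a reduced _atom_site layout.
SAMPLE = """
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  MET 1 0.000 0.000 0.000
ATOM 2 CA MET 1 1.458 0.000 0.000
ATOM 3 N  GLY 2 2.500 1.200 0.000
ATOM 4 CA GLY 2 3.800 1.300 0.400
ATOM 5 CA LYS 3 5.100 2.600 0.100
#
"""

atoms = parse_atom_site(SAMPLE)
print(len(atoms), sequence_from_atoms(atoms))  # prints: 5 MGK
```

The derived sequence only covers residues that were actually modelled in the structure; residues missing from the coordinates drop out silently, one more reason to use a dedicated sequence database for sequence work.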
Each structure published in the PDB receives a four-character alphanumeric identifier, its PDB ID. A PDB ID identifies an entry, not a molecule, and should not be used as an identifier for biomolecules: the same molecule is often represented by several entries, each with its own PDB ID, for example structures determined in different environments or conformations.
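The distinction between entries and molecules can be illustrated with a small Python sketch. It checks identifiers against the classic four-character pattern (a digit from 1 to 9 followed by three letters or digits, as in 4HHB) and groups entries by the molecule they describe, so one molecule can map to several IDs. The IDs and molecule labels in the example are hypothetical, and the function names are illustrative only.

```python
import re
from collections import defaultdict

# Classic PDB ID pattern: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(candidate: str) -> bool:
    """True if the whole string matches the classic four-character pattern."""
    return PDB_ID_PATTERN.fullmatch(candidate) is not None


def entries_by_molecule(entries: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (pdb_id, molecule) pairs; one molecule may map to many IDs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for pdb_id, molecule in entries:
        if not is_pdb_id(pdb_id):
            raise ValueError(f"not a PDB ID: {pdb_id!r}")
        groups[molecule].append(pdb_id.upper())
    return dict(groups)


# Hypothetical entries: the IDs and labels are made up for illustration.
entries = [
    ("1AAA", "protein X"),  # e.g. ligand-free form
    ("2BBB", "protein X"),  # e.g. same protein in another conformation
    ("3CCC", "protein Y"),
]
print(entries_by_molecule(entries))
# prints: {'protein X': ['1AAA', '2BBB'], 'protein Y': ['3CCC']}
```

Keying a lookup table on the molecule rather than on the PDB ID keeps all experimental structures of that molecule together.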
When a biologist submits structure data for a protein or nucleic acid, PDB staff review and annotate the entry, and the data are then automatically checked for plausibility. The source code of this validation software is freely available. The main database accepts only experimentally determined structures, not theoretically predicted ones.
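The PDB's actual validation software is far more thorough than anything shown here; the Python sketch below only illustrates what an automatic plausibility check can look like. The three checks (occupancy between 0 and 1, a non-negative B-factor, and no two atoms closer than an assumed 0.5 Å), the AtomRecord type and the function name are hypothetical choices for illustration, not the rules used by the PDB.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class AtomRecord:
    serial: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def plausibility_problems(atoms: list[AtomRecord], min_distance: float = 0.5) -> list[str]:
    """Hypothetical checks (not the PDB's rules); an empty list means none failed."""
    problems: list[str] = []
    for a in atoms:
        if not 0.0 <= a.occupancy <= 1.0:
            problems.append(f"atom {a.serial}: occupancy {a.occupancy} outside [0, 1]")
        if a.b_factor < 0.0:
            problems.append(f"atom {a.serial}: negative B-factor {a.b_factor}")
    # O(n^2) pairwise distance check: fine for a sketch, slow for large structures.
    for a, b in combinations(atoms, 2):
        d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        if d < min_distance:
            problems.append(f"atoms {a.serial} and {b.serial}: only {d:.2f} Å apart")
    return problems


atoms = [
    AtomRecord(1, 0.0, 0.0, 0.0, 1.0, 12.5),
    AtomRecord(2, 1.5, 0.0, 0.0, 1.0, 13.0),
    AtomRecord(3, 1.6, 0.1, 0.0, 1.2, -2.0),  # deliberately implausible atom
]
problems = plausibility_problems(atoms)
for p in problems:
    print(p)
```

The pairwise distance check is quadratic in the number of atoms and ignores alternate conformations, which can legitimately place atoms very close together; a production validator would use a spatial index and respect alternate-location flags.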
Various funding agencies and scientific journals now require scientists to submit their structure data to the PDB.
Founded in 1971 at Brookhaven National Laboratory, the Protein Data Bank was transferred in 1998 to the Research Collaboratory for Structural Bioinformatics (RCSB), which is composed of Rutgers University, the University of Wisconsin-Madison, the National Institute of Standards and Technology (NIST) and the San Diego Supercomputer Center. Funding comes from the National Science Foundation, the Department of Energy, the National Library of Medicine and the National Institute of General Medical Sciences. The European Bioinformatics Institute in the UK and the Institute for Protein Research in Japan also collect, process and submit data files.