Any questions on the material from the previous lecture?
Chapter 4: Protein Structure
After learning the sequence of a protein, the next step is understanding how the linear
chain folds up in three-dimensional space to give a functional structure.
From the success of the genome project, we can infer the sequences of
most proteins in the cell, perhaps 30,000 in total. However, knowledge of their
detailed three-dimensional structure lags far behind. 3D structure determination
is considerably more difficult than sequencing.
In fact, solving this folding problem, predicting how a chain of known sequence folds into its three-dimensional structure, is considered to be the holy grail of molecular biology.
Chapter 4 introduces some of the techniques for determining protein structure.
It then covers some general principles, including the conformation of residues, structural
motifs and families, and folding dynamics.
Proteomics
The study of the complete set of proteins in a cell is an emerging field called proteomics.
Even a simple bacterium such as E. coli has over 4000 different proteins in its proteome,
only a portion of which can be seen in the following 2D PAGE gel:
Section 4.1: Levels of Protein Structure
The complexity of proteins can be better understood by thinking about them
at multiple levels of detail. There are four levels of structure that are used for describing them:
Primary structure (amino acid sequence)
Secondary structure (α-helices and β-sheets)
Tertiary structure (structure of an entire polypeptide chain)
Quaternary structure (arrangement of multiple subunits)
Secondary Structure
What is secondary structure?
Secondary structure refers to regularities in local conformations of the chain, maintained by hydrogen
bonds between groups of the peptide backbone.
The most prevalent kinds of secondary structure are α-helices and β-sheets.
These are also the most recognizable because of their repetitive nature.
Loops and turns are additional types of secondary structure. They are non-repetitive
elements that connect α-helices and β-sheets.
Tertiary Structure
What is tertiary structure?
Tertiary structure refers to the spatial arrangement of an entire polypeptide chain.
Elements of tertiary structure include compact globular units called domains.
In addition to the hydrogen bonding of local secondary structure, tertiary structure is
determined by noncovalent interactions between side chains, including those between the surfaces of adjacent domains. It is also
constrained by covalent disulfide bonds, if any are present.
Quaternary Structure
What is quaternary structure?
Quaternary structure describes proteins that are formed by the noncovalent association of
separate polypeptide chains called subunits.
Many proteins are built from multiple subunits, which can be either identical or distinct from each other.
The protein shown above is a tetramer (4 subunits).
The tetramer is composed of two distinct types of subunits, an α subunit and a β subunit.
It can also be thought of as a symmetrical arrangement of two αβ dimers.
Section 4.2: Techniques for Determining 3D Structure
3D structure determination refers to a set of techniques that reveal the geometrical shapes
of biomolecules. In many cases, these techniques are capable of very high resolution,
yielding the precise relative positions of every atom within a molecule.
This exact determination of structure has been possible for nearly a century for simple
inorganic crystals. For the much more complex biomolecules, however, structure determination was
first achieved only around 50 years ago.
Recent decades have shown an increasing rate of progress, with over 30,000 structures now
available in the Protein Data Bank, the global public repository for biomolecular structures.
The two principal techniques for 3D structure determination are x-ray crystallography and
NMR spectroscopy.
X-Ray Crystallography
X-ray crystallography is the most powerful method for producing detailed atomic-resolution models of biomolecules.
This method accounts for the majority of the structures submitted to the Protein Data Bank.
It is capable of very high resolution, with models resolved in the range of 1.5 to 3 Å being common.
Note: a higher-resolution model corresponds to a smaller value in Å, and vice versa.
Around 3 Å is generally the lowest resolution at which a detailed atomic model can be built. At lower resolution (larger Å values), normally only the
path of the main-chain backbone can be identified.
The method involves growing crystals of the protein, recording their x-ray diffraction pattern, and performing computations on that pattern to obtain a model of the actual structure.
Producing well-diffracting crystals is often the most difficult part of the process, and is usually the limiting
factor.
Crystallization involves the orderly, periodic packing of protein molecules into a solid from a purified solution:
X-Ray Diffraction
Once properly diffracting crystals have been produced, they are cooled to a low temperature and exposed to
a high-intensity beam of x-rays.
The x-rays that are used have wavelengths (typically about 1 to 1.5 Å) comparable to the distances between bonded atoms.
Consequently, a diffraction pattern is produced by the constructive and destructive interference of
the waves scattered by the periodic array of atoms in the crystal.
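The geometry of this interference is summarized by Bragg's law, nλ = 2d sin θ, which relates the x-ray wavelength λ and the diffraction angle θ to the spacing d between planes of the crystal lattice. Bragg's law is not stated in the lecture; the minimal Python sketch below simply applies it, assuming a copper Kα wavelength of 1.5418 Å (a common laboratory source) and a few arbitrary detector angles.

```python
import math

def bragg_spacing(wavelength_angstrom: float, two_theta_deg: float, order: int = 1) -> float:
    """Lattice-plane spacing d (in Å) from Bragg's law: n * lambda = 2 * d * sin(theta).

    two_theta_deg is the angle between the incident and diffracted beams,
    so the Bragg angle theta is half of it.
    """
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength_angstrom / (2.0 * math.sin(theta))

# Assumed source: copper K-alpha radiation, about 1.5418 Å.
wavelength = 1.5418
for two_theta in (10.0, 30.0, 60.0):
    d = bragg_spacing(wavelength, two_theta)
    print(f"2-theta = {two_theta:5.1f} deg  ->  d = {d:.2f} Å")
```

Because d shrinks as the angle grows, reflections recorded farther from the beam center carry finer structural detail; the smallest spacing recorded sets the resolution of the data.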
Model Construction
Once a diffraction pattern has been recorded, a computation is done to infer the positions of the atoms from the pattern of diffracted waves.
Each spot in the diffraction pattern corresponds to a diffracted wave with an amplitude and a phase. The measured intensity of a spot reveals the amplitude, but the phase is not recorded.
One of the major difficulties in the computation is therefore to determine the phases of the diffracted waves (the 'phase problem').
Phase information can be obtained by comparing the native pattern with the distinct diffraction patterns produced when heavy metal atoms, such as tungsten or mercury, have been co-crystallized with the protein.
Once the phases have been obtained, the computation produces a 3D map of the electron density.
The positions of the atoms can then be inferred from the electron density.
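A one-dimensional toy model makes the phase problem concrete. The hedged sketch below (NumPy assumed; the 64-point "unit cell" and three Gaussian "atoms" are invented for illustration) computes complex structure factors with a discrete Fourier transform, then rebuilds the density by Fourier synthesis twice: once with the true phases and once with every phase set to zero. Both reconstructions use exactly the same amplitudes.

```python
import numpy as np

# Hypothetical 1D "unit cell" sampled at 64 points with three Gaussian "atoms".
n = 64
x = np.arange(n)
positions = [10, 25, 45]
density = sum(np.exp(-0.5 * ((x - p) / 1.5) ** 2) for p in positions)

# Structure factors: each is a complex number with an amplitude and a phase.
F = np.fft.fft(density)
amplitudes = np.abs(F)   # what measured intensities (proportional to |F|^2) give us
phases = np.angle(F)     # what the diffraction experiment does not record

# Fourier synthesis with the correct phases reproduces the density...
with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real
# ...whereas using the same amplitudes with all phases set to zero does not.
without_phases = np.fft.ifft(amplitudes).real

print("max error with correct phases:", np.max(np.abs(with_phases - density)))
print("max error with zero phases:   ", np.max(np.abs(without_phases - density)))
```

Real crystallographic calculations are three-dimensional, but the principle is the same: without phase estimates, such as those from heavy-atom derivatives, the electron density map cannot be computed from the amplitudes alone.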
NMR
NMR (Nuclear Magnetic Resonance) spectroscopy is the second most important technique for providing high-resolution 3D molecular structures.
It has provided approximately 15% of the structures in the Protein Data Bank (PDB).
Limitations: currently, it can only handle smaller molecules (approximately 40 kDa or less).
Advantages: it can provide a more dynamic picture of molecules, under conditions (in solution) that are closer to physiological states than those of
x-ray crystallography.
Principles of NMR
NMR relies upon the magnetic dipole moments of the nuclei of certain isotopes, such as ¹H, ¹³C and ¹⁵N. These dipoles occupy discrete energy levels or 'spin' states:
An NMR machine provides a strong external magnetic field that increases the energy difference between these levels.
It then applies an external electromagnetic (radio-frequency) signal to induce a transition from one spin state to the other.
The frequency of the external radiation required to produce this resonance reflects
the local chemical environment of the nuclear dipole.
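The resonance frequency grows in proportion to the applied field: ν = (γ / 2π) B₀, where γ is the gyromagnetic ratio of the isotope. This relation is not spelled out in the lecture; the sketch below uses standard approximate values of γ/2π in MHz per tesla and an assumed field of 14.1 T, which is why an instrument at that field is usually called a "600 MHz" spectrometer (named for its ¹H frequency).

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla.
# 15N has a negative ratio; only the magnitude matters for the frequency here.
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316}

def resonance_frequency_mhz(nucleus: str, field_tesla: float) -> float:
    """Larmor frequency |gamma / 2pi| * B0, in MHz."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[nucleus]) * field_tesla

field = 14.1  # tesla, assumed example field strength
for nucleus in ("1H", "13C", "15N"):
    print(f"{nucleus}: {resonance_frequency_mhz(nucleus, field):.1f} MHz")
```

Doubling the field doubles every resonance frequency, which reflects the larger energy gap between spin states in a stronger magnet.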
One-Dimensional NMR
Local electron density around a nucleus creates a local magnetic field that opposes the externally applied magnetic field.
This shielding effect changes the frequency at which the magnetic resonance occurs, and the change is measured
as a chemical shift.
A recording of the shifted frequencies yields a one-dimensional NMR spectrum:
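Because the shielding scales with the applied field, chemical shifts are conventionally reported in parts per million (ppm) relative to a reference compound rather than in hertz: δ = (ν − ν_ref) / ν_ref × 10⁶. A minimal sketch, assuming hypothetical signal frequencies, shows that one shift in ppm corresponds to different offsets in Hz on instruments with different fields:

```python
def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift delta in ppm: (nu - nu_ref) / nu_ref * 1e6."""
    return (nu_hz - nu_ref_hz) / nu_ref_hz * 1e6

# Hypothetical signals: 4800 Hz above the reference at 600 MHz,
# and 7200 Hz above the reference at 900 MHz.
shift_600 = chemical_shift_ppm(600e6 + 4800.0, 600e6)
shift_900 = chemical_shift_ppm(900e6 + 7200.0, 900e6)
print(f"{shift_600:.2f} ppm at 600 MHz, {shift_900:.2f} ppm at 900 MHz")
```

Both signals have the same chemical shift, so spectra recorded on different instruments can be compared directly on the ppm scale.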
2D NMR
The 1D NMR spectrum can identify various kinds of chemical bonds and local structure, but it is not
sufficient to determine the 3D structure of proteins.
A refinement of the NMR technique involves the Nuclear Overhauser Effect (NOE), which is a short-range, through-space
magnetic interaction between nuclei that are close together (typically within about 5 Å).
Measurement of this additional magnetization effect allows the determination of distances between pairs of
protons:
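The NOE between two protons falls off roughly as the inverse sixth power of their separation, I ∝ r⁻⁶. Under the common simplifying assumption that this holds exactly (the isolated spin-pair approximation), an unknown distance can be calibrated against a proton pair of known separation. In the minimal sketch below, the 1.8 Å reference distance and the peak intensities are hypothetical values chosen for illustration.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate a proton-proton distance from NOE intensities, assuming I ∝ r**-6.

    r = r_ref * (I_ref / I) ** (1/6)
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

REF_DISTANCE = 1.8    # Å, hypothetical calibration pair
REF_INTENSITY = 64.0  # arbitrary units
for peak in (64.0, 8.0, 1.0):
    r = noe_distance(peak, REF_INTENSITY, REF_DISTANCE)
    print(f"intensity {peak:5.1f} -> {r:.2f} Å")
```

Because of the sixth root, a twofold error in peak intensity changes the estimated distance by only about 12% (2^(1/6) ≈ 1.12).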
NMR Distance Constraints
The distances between proton pairs form a set of geometrical constraints for determining 3D structure.
The constraints can be applied manually in building a molecular model.
They can also be used as external 'folding influences' (restraints) in molecular dynamics simulations.
The distances that are measured are only approximate, however, because of noise in the signal and
fluctuations in structure across the population of molecules in a measured sample.
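Because the measured distances are approximate, they are often expressed as upper bounds rather than exact targets. The sketch below, with hypothetical atom names, coordinates, and a hypothetical 0.5 Å tolerance, checks a candidate model against such upper-bound restraints and reports the ones it violates. It illustrates the idea of a restraint check, not any particular structure-calculation program.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Restraint = Tuple[str, str, float]  # (atom_a, atom_b, upper bound in Å)

def restraint_violations(coords: Dict[str, Coord],
                         restraints: List[Restraint],
                         tolerance: float = 0.5) -> List[Tuple[str, str, float, float]]:
    """Return (atom_a, atom_b, distance, upper_bound) for each restraint that the
    model exceeds by more than `tolerance` Å (a hypothetical default)."""
    violations = []
    for atom_a, atom_b, upper in restraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper + tolerance:
            violations.append((atom_a, atom_b, distance, upper))
    return violations

# Hypothetical model coordinates (Å) and NOE-derived upper bounds.
model = {"HA1": (0.0, 0.0, 0.0), "HN2": (3.0, 0.0, 0.0), "HB3": (0.0, 6.0, 0.0)}
noe_restraints = [("HA1", "HN2", 3.5), ("HA1", "HB3", 5.0)]
result = restraint_violations(model, noe_restraints)
print(result)
```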
Multiple NMR Structures
Typically, a family of structures will be produced that satisfy the constraints:
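One common way to summarize how closely such a family of structures agrees is the root-mean-square deviation (RMSD) of each model from the average structure. The NumPy sketch below assumes the models have already been superimposed onto a common frame (a real analysis would first superimpose them, for example with the Kabsch algorithm), and its two-model, two-atom ensemble is invented for illustration.

```python
import numpy as np

def rmsd_to_mean(ensemble: np.ndarray) -> np.ndarray:
    """RMSD of each model from the mean structure, in the coordinates' units.

    ensemble has shape (n_models, n_atoms, 3) and is assumed to be pre-superimposed.
    """
    mean_structure = ensemble.mean(axis=0)                    # (n_atoms, 3)
    sq_dev = ((ensemble - mean_structure) ** 2).sum(axis=2)   # (n_models, n_atoms)
    return np.sqrt(sq_dev.mean(axis=1))

# Invented ensemble (Å): the two models differ only in the position of atom 2.
ensemble = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.4, 0.0]],
])
per_model = rmsd_to_mean(ensemble)
print(per_model)
```

A small spread indicates that the constraints define the structure tightly; regions where the models diverge are poorly determined or flexible.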
Section 4.3: Protein Residue Conformation
The fundamentals of protein structure begin with a look at the conformations of residues in the polypeptide chain.
We are familiar by now with the general structure of an individual amino acid. When it is linked
to another residue, a new unit is formed, the peptide group:
This group, shown in blue above, includes the carbonyl C and its two substituents,
the carbonyl O and the α-carbon. The other half consists of the amino N and its
two substituents, the H and the α-carbon of the next residue in the chain.
The conformation of the peptide group, together with the relative orientations of adjacent groups, provides the
primary geometric parameters that determine the conformation of the overall protein chain.
Planarity of the Peptide Bond
Because of resonance, the peptide bond between the amino N and carbonyl C has partial double-bond character:
Consequently, rotation about this bond is restricted, and the atoms connected to the peptide bond all lie in the same plane:
Cis and Trans Configurations
The planar peptide bond can exist in two configurations, trans and cis:
In the trans configuration, the carbonyl O and amino H are on opposite sides of the bond.
In the cis configuration, they are on the same side.
In the cis configuration, the α-carbons of adjacent residues and their side chains are brought close together, creating a greater potential for steric clashes.
Consequently, the trans configuration is favored for most types of residues; the main exception is the peptide bond preceding proline, which adopts the cis configuration more often than other peptide bonds.
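The configuration of each peptide bond can be read from its torsion angle ω, defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1): ω is close to 180° for trans and close to 0° for cis. The minimal sketch below classifies ω values by whichever of the two is nearer on the circle; the 90° cutoff and the sample angles are simplifying assumptions for illustration, not a standard validation criterion.

```python
def classify_omega(omega_deg: float) -> str:
    """Classify a peptide bond as 'cis' (omega near 0°) or 'trans' (omega near 180°)."""
    wrapped = (omega_deg + 180.0) % 360.0 - 180.0  # map the angle into [-180, 180)
    return "cis" if abs(wrapped) < 90.0 else "trans"

# Hypothetical omega values, in degrees.
for omega in (179.2, -178.0, 3.5, -5.0):
    print(f"omega = {omega:7.1f} -> {classify_omega(omega)}")
```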
Parameters of Protein Conformation
In contrast to the peptide bond, each residue has two single bonds in the chain backbone that can rotate relatively freely.
These are the bond between the amino N and the α-carbon and the bond between the α-carbon and the
carbonyl C:
Rotation about these bonds is measured by two torsion angles: φ (phi) for the N-Cα bond and ψ (psi) for the Cα-C bond.
These two angles are consequently the main parameters for describing the conformation of the polypeptide chain.
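A torsion angle is defined by four consecutive atoms: φ by C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). Given atomic coordinates, it can be computed from vector dot and cross products. The NumPy sketch below is one standard formulation (a signed angle between −180° and 180°, positive when, viewed along the central bond, the near bond must turn clockwise to eclipse the far one); the coordinates are made up rather than taken from a real protein.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed torsion angle, in degrees, about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1                      # bond from p1 back to p0
    b1 = p2 - p1                      # central bond
    b2 = p3 - p2                      # bond from p2 on to p3
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Made-up coordinates with the central bond along the z axis.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # outer atoms eclipsed
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # far atom turned a quarter turn
```

To obtain φ and ψ for residue i, the same function would be called with the four backbone atoms listed above.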
Protein Conformational Space
With rotation about two angles per residue, the main chain of a protein can assume a huge number of possible conformations.
For a typical protein of 300 residues, there are 600 such angles. If we consider only two orientations per angle, this gives:
$2^{2 \times 300} = 2^{600} \approx 10^{180}$ possible conformations!
Because each angle can adopt many more than just two orientations, the actual number of conformations is much larger still.
This large conformational space is one of the reasons why the 'Protein Folding' problem is so difficult.
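The arithmetic is easy to check and to vary. The sketch below is only a back-of-the-envelope count: it assumes every combination of angle values is counted, with no steric restrictions, and the "three orientations per angle" case is an arbitrary illustration of how fast the number grows.

```python
import math

def log10_conformations(n_residues: int, orientations_per_angle: int,
                        angles_per_residue: int = 2) -> float:
    """log10 of orientations_per_angle ** (angles_per_residue * n_residues)."""
    return angles_per_residue * n_residues * math.log10(orientations_per_angle)

for k in (2, 3):
    log_n = log10_conformations(300, k)
    exponent = math.floor(log_n)
    mantissa = 10 ** (log_n - exponent)
    print(f"{k} orientations per angle: about {mantissa:.1f} x 10^{exponent} conformations")

# Python integers are exact, so the full count can also be checked directly.
print(len(str(2 ** 600)), "digits in 2**600")
```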
Ramachandran Diagrams
The φ and ψ angles can typically assume many more than two orientations, but not all possible combinations occur.
Ramachandran observed that steric clashes between atoms of the backbone and side chains constrain which combinations of φ and ψ angles can occur.
A plot of φ and ψ angles shows where favorable and unfavorable regions are found:
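As a rough illustration, the sketch below assigns a (φ, ψ) pair to the nearest of a few idealized secondary-structure conformations, using approximate textbook angle values and measuring differences around the circle so that +179° and −179° count as close. This nearest-reference rule is a simplification chosen for illustration: it does not reproduce the actual favorable and unfavorable regions of a Ramachandran diagram, which are derived from steric contact criteria or from the angles observed in known structures.

```python
import math
from typing import Tuple

# Approximate (phi, psi) values in degrees for idealized conformations.
REFERENCE_CONFORMATIONS = {
    "right-handed alpha helix": (-57.0, -47.0),
    "antiparallel beta sheet": (-139.0, 135.0),
    "parallel beta sheet": (-119.0, 113.0),
    "left-handed alpha helix": (57.0, 47.0),
}

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi: float, psi: float) -> Tuple[str, float]:
    """Return the closest reference conformation and its distance in (phi, psi) space."""
    best_name, best_dist = "", math.inf
    for name, (ref_phi, ref_psi) in REFERENCE_CONFORMATIONS.items():
        dist = math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

print(nearest_conformation(-60.0, -45.0))
print(nearest_conformation(-130.0, 130.0))
```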
Questions
Questions about the material covered today?
References
The Protein Data Bank
http://www.rcsb.org/pdb/
(Main repository for 3D structural data for biomolecules)