Thursday, August 09, 2007

Classification of Protein Structure

Structures of proteins: In all modern-day organisms, proteins play a wide variety of roles in the cell. At the molecular level, they are responsible for performing all the mechanical work done by your muscles (as far as is known until now), in addition to catalyzing nearly all the biochemical reactions inside the cell [1]. The immediate question that begs to be answered is: "How are these complex molecules able to perform this work? How are they so specific in what they do and in the reactions that they catalyze?" While the answers to these questions are nontrivial and are the subject of research in more than half the biophysics labs worldwide, the common theory in the scientific world is that the function of a protein is determined by its structure [2]. This statement is only partially accurate. The reasoning behind it is that a protein functions because certain atoms or functional groups in the amino acids that make up its active site adopt a particular 3-dimensional configuration, and these functional groups are then able to catalyze the reaction. The specificity of the reaction comes from this specific 3-dimensional configuration: the protein is able to catalyze the reaction only when the substrate is able to interact with the functional groups in a certain manner. While it is true that the structure does determine how a protein will go about performing its function, these structures are static pictures of a molecule which is otherwise in motion [3]. In addition to the global motion of the molecule, there are relative motions of the atoms that make up the protein, which lead to slightly different configurations of the important functional groups. Hence proteins are not completely specific, nor is their function completely dependent on structure alone.

Classification of Protein Structure: A typical protein consists of thousands of atoms, and in order to make sense of the structure of the protein, it is necessary to simplify it. There are more than 30000 structures in the Protein Data Bank [4], and looking at each structure individually would be horrendous. Hence, one needs to come up with a classification scheme for protein structure.

One way of simplifying it is to break it into parts called the secondary structure of the protein. There are various courses and books [1,2] that explain secondary structure, but for our discussion it is sufficient to know that certain configurations called alpha helices and beta sheets are common structural motifs found in nearly all proteins. While these secondary structures do help in understanding the local structure of proteins, they give very little insight into the chemistry that the protein is able to perform or into its active site.

A second and more meaningful attempt at classification of protein structure would be to find certain common structural motifs that can exist independently and classify the proteins based on these structural motifs. For example, a protein can be multifunctional, but each function can be carried out independently by a different part of the protein even if you split the parts up. It makes sense to split these multifunctional proteins up based on what function each part performs and, if you can find the same structure/function motif in different proteins, to club them together as a single group. The part of a protein that can maintain its structure and function independently is called a domain [5]. Quite often domains of one shape combine with domains of very different shapes to form quite different proteins (very much like building blocks that can come together in different configurations to give walls of different shapes) [6].

To give meaning to the classification scheme, it would also help to know which proteins perform related functions (for example, perform the same reaction on different substrates) or have active sites in the same region of the protein structure. For this purpose, it is better to form groups of more closely related structures that perform similarly. Great minds have always argued that evolution should be the guiding principle while studying biology, and it does make sense to distinguish proteins that have a common evolutionary origin from those that have achieved the same structure independently (also called convergent evolution).

There are various databases that divide proteins into individual domains and divide these domains up hierarchically into evolutionarily related groups. These databases include SCOP (Structural Classification Of Proteins) [7] (manually divided), CATH (Class, Architecture, Topology, and Homologous superfamily) [8], and FSSP (Families of Structurally Similar Proteins) [9] (automatically performed). However, these databases are often flawed, and corrections to them are often suggested in the literature. Part of the problem is that it is very difficult to say whether a similarity in structures occurred due to homology (evolutionarily related) or convergence (evolutionarily independent origins). The trivial relationships are those that are apparent in the sequences of the two proteins. When two proteins have very similar sequences (measured by the number of times they have the same amino acid or a slightly related amino acid in the same position), they are taken to be related, and statistics based on extreme value distributions can be used to estimate how likely it is that such similarity arose by chance rather than from a common origin [11]. However, structure remains conserved (does not vary much) far longer than sequence, and below a certain sequence identity it is very difficult to prove that there is a relationship between the two proteins without a structure [10].
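To make the sequence comparison above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the toy sequences and the parameters mu and lam are made up for this example and do not come from any of the databases mentioned. The first function counts identical residues in two already-aligned sequences; the second evaluates the tail of a Gumbel (type I extreme value) distribution, which is the kind of distribution such score statistics are usually fitted to.

```python
import math


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap columns that hold the same residue.

    Both inputs are assumed to be already aligned, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def gumbel_pvalue(score: float, mu: float, lam: float) -> float:
    """P(S >= score) for a Gumbel distribution with location mu and scale
    parameter lam. In practice mu and lam have to be fitted to the scores
    of unrelated sequences; here they are simply passed in.
    """
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


# Toy example with hypothetical sequences and parameters.
print(percent_identity("ACDE-F", "ACDQGF"))
print(gumbel_pvalue(score=50.0, mu=30.0, lam=0.3))
```

A small p-value says that a score this high would rarely be seen between unrelated sequences, which is the statistical argument for a common origin.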

Other problems are related to the process by which structures are obtained (X-ray crystallography or NMR spectroscopy). These methods are inherently noisy because of various problems ranging from Heisenberg's uncertainty principle to the crystallization conditions and the substrates that interact with the protein. So there is never a completely correct structural alignment (that is, a one-to-one matching of which residues in the two structures overlap each other), and this also causes minor problems in the classification procedure.

But the most important problem is the level at which to classify structures. While domains are the most commonly used level of classification (because a domain is basically independent), domains might not have been the basic level at which proteins were constructed during evolution. Rather, small subdomain-level structural units called structural words [12] or foldons [13] (because they could be independent folding units) could be the smallest level of protein structure that evolved from the RNA world. The theory is that these foldons could have come together to form various different domains, which then evolved further to form proteins with different functions.

[1] - Biochemistry by Stryer.
[2] - Introduction to Protein Structure by Branden and Tooze.
[3] - A perspective on enzyme catalysis by Stephen Benkovic and Sharon Hammes-Schiffer.
[4] - RCSB Protein Data Bank.
[5] - Domains.
[6] - Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination by Gordana Apic, Wolfgang Huber & Sarah A. Teichmann.
[7] - SCOP.
[8] - CATH.
[9] - FSSP.
[10] - How far divergent evolution goes in proteins - Murzin.
[11] - Maximum Likelihood Fitting of Extreme Value Distributions - Eddy.
[12] - On the evolution of protein folds - Lupas, Ponting, and Russell.
[13] - Foldons, Protein Structural Modules, and Exons by Anna Panchenko, Z. Luthey-Schulten, and P.G. Wolynes.

Wednesday, August 08, 2007

About Rainbows

In this post, what I would like to do is illustrate scientific methodology and scientific curiosity in the context of the simple natural phenomenon called rainbows that we are all familiar with. I chose this system only because we all think we understand it and because the physics involved is simple ray optics that we all learnt in school at some point (and of course it is pretty, as in the picture alongside). Now, the first step in scientific methodology is the collection and categorization of the facts that we want an explanation for. In the context of rainbows, I want to be able to explain the following facts that I have established by watching rainbows in the sky. Apart from the obvious one about the colors, they are:

1. Rainbows are seen when there is sun and rain (incipient or actual). That is why it is called a rainbow.

2. When I stand facing the rainbow, the sun is always behind me. I never see a rainbow on the same side of the sky as the sun.

3. The rainbow is a bow.

The next step is to look into my knowledge bank from the past and see what I already know that would be useful for explaining the above facts. And I have to do this piece by piece. Now, I remember something about seeing dispersion, the breaking up of white light into its constituent colors when the light moves from one medium to another. I have seen it with a glass of water held appropriately in bright sunlight, with the prisms I played with when I was young, and so on. Yes? So, I begin my quest to understand a rainbow by quantifying this vague notion in my head [1].

Willebrord Snellius and the one and only Rene Descartes figured this out for us 400 years ago. They found that if a monochromatic (just jargon for one-color) light ray is incident on the interface between two media (say air and water), then the light is refracted (jargon for “bent”): if the incoming ray makes an angle ui with the normal (the perpendicular) to the interface, then the outgoing ray comes out at an angle [4] ur = arcsin((n1/n2) sin(ui)) to that normal, where n1 and n2 are properties of the two media in question called the refractive indices of the materials (again it is just a name, I could have called the property Karthik or Pradeep, but for the sake of conformity I call it by the name already given to it). Don’t worry about the formula if it looks complicated to you. Think of it as follows. If someone told you that they shone light at the interface of two media of given refractive indices, you can just tap some keys on your calculator and know where to put your eye or your camera so that you can see the refracted ray. So much for that. But how does this explain dispersion? The key is that the properties n1 and n2 depend not only on what the medium in question is (i.e., water, air, glass, etc.) but also on the color of the light in question. Different colors will have different values of n1/n2. So even if they all come in at the same angle ui, as in the case of sunlight, they will come out at different angles, and hence I will be able to see all the different colors. So that is why I am able to see different colors in a rainbow: because there is air and water involved. As an aside, also note that the above paragraph tells us that the fact that a straw in a water glass looks bent and the colors of the rainbow come from the same underlying physical equation! Cool, isn’t it? This is another aspect of scientific methodology, i.e., link together as many apparently disparate facts as possible as arising from one underlying phenomenon.
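The "tap some keys on your calculator" step can be sketched in a few lines of Python. This is only an illustration: the function name is made up, angles are taken to be measured from the normal to the interface (the usual convention), and the refractive indices of water for red and violet light are approximate textbook values that I am assuming here.

```python
import math


def refraction_angle_deg(incidence_deg: float, n1: float, n2: float) -> float:
    """Angle of the refracted ray, in degrees from the normal, from Snell's
    law n1*sin(ui) = n2*sin(ur). Light travels from medium 1 into medium 2.
    """
    s = (n1 / n2) * math.sin(math.radians(incidence_deg))
    if abs(s) > 1.0:
        # arcsin is undefined here: the formula has no solution.
        raise ValueError("no refracted ray for this angle of incidence")
    return math.degrees(math.asin(s))


# Approximate refractive indices (assumed values, for illustration only).
N_AIR = 1.000
N_WATER_RED = 1.331
N_WATER_VIOLET = 1.344

# Same angle of incidence for both colors, as with sunlight on water.
for color, n_water in (("red", N_WATER_RED), ("violet", N_WATER_VIOLET)):
    angle = refraction_angle_deg(60.0, N_AIR, n_water)
    print(f"{color}: refracted at {angle:.2f} degrees from the normal")
```

The two colors come out at slightly different angles even though they went in together, which is all that dispersion is.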

Wait a minute, this cannot be right. What I said above cannot be the whole truth. Why is that? It is because of fact 2 above. The sun is on the opposite side from the rainbow. So I cannot possibly be seeing bent light; I must be seeing reflected light that bounced off something. So, what did I miss? What I missed is hidden in that messy formula in the previous paragraph. Recall that the sine function takes values from -1 to 1. So if n1/n2 is bigger than 1, that equation cannot be satisfied for every value of the angle of incidence. What is wrong here? Clearly I can shine light at whatever angle I wish, so placing a restriction on ui makes no sense. So, ask again, what did we do wrong? What we did wrong was to assume that there is always a refracted ray, i.e., a ray that goes into medium 2. What the “impossible to satisfy” equation above tells us is that beyond a particular angle all the light will be reflected back into medium 1 if n1/n2 is bigger than 1. This phenomenon is called “total internal reflection”. If medium 1 is water and medium 2 is air in the earlier picture, then n1 is bigger than n2 and light incident at large angles will be reflected back into the water. So, in the context of the rainbow, what is happening is along the lines of the figure shown below. The light from the sun enters the raindrop, gets refracted at the front edge of the drop, travels through the drop, gets internally reflected at the back edge of the drop (i.e., the back edge of the drop is acting like a mirror) and then comes back out of the front edge again. And this is the light that you and I on earth see as the rainbow. So we have established that we need refraction and total internal reflection to account for the colors and for the fact that the rainbow is on the opposite side from the sun with respect to the observer (jargon for you and me).
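The "particular angle" beyond which the equation has no solution is easy to compute: it is the angle at which (n1/n2) sin(ui) reaches exactly 1. Here is a small sketch, assuming angles measured from the normal to the interface and an approximate refractive index of 1.33 for water; the function names are made up for this illustration.

```python
import math


def critical_angle_deg(n1: float, n2: float):
    """Smallest angle of incidence (degrees from the normal) at which light
    going from medium 1 into medium 2 is totally reflected. Returns None
    when n1 <= n2, because then a refracted ray always exists.
    """
    if n1 <= n2:
        return None
    return math.degrees(math.asin(n2 / n1))


def is_totally_reflected(incidence_deg: float, n1: float, n2: float) -> bool:
    """True when Snell's law has no solution for this angle of incidence."""
    critical = critical_angle_deg(n1, n2)
    return critical is not None and incidence_deg > critical


# Water to air, with an assumed refractive index of 1.33 for water.
print(critical_angle_deg(1.33, 1.00))
print(is_totally_reflected(60.0, 1.33, 1.00))
# Air to water: there is no critical angle at all.
print(critical_angle_deg(1.00, 1.33))
```

Going from air into water there is never a problem, which matches the observation that the restriction only shows up when n1/n2 is bigger than 1.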

Still with me? Just hang on for a little bit more. We only have one fact remaining that we have yet to explain: the fact that the rainbow is indeed a bow. Again the answer lies in the discussion earlier. We just have to tease it out. Let us do this by first noting that the picture in the previous paragraph is clearly an oversimplification. What is really happening is more like the picture below. Light rays from the sun hit the drop, and they are reflected and refracted at each interface. And you are standing in such a position that you get only one of a total of four outgoing rays from the drop. So the amount of light that is reaching you is a pretty small fraction of the light that fell on the drop. That is why we made such a big deal about the total internal reflection thing earlier, for it cuts out one of the outgoing rays and increases the intensity (brightness) of the one we get to see. Secondly, the reflected light is diffuse. What does this mean? The sun is far enough away that all the light coming from the sun can be thought of as parallel rays. If the interface at hand were flat, then all the reflected/refracted rays would be in the same direction, yes? (Just generalize the picture in the first part of the discussion to many rays to see this.) But our interface is a spherical water drop. So, even though the incoming light is all in the same direction, the outgoing light is going to be all over the place. And my eye is a pretty small hole in the scheme of things, and I am only going to get a ray or so of the reflected light, not enough to see anything [2]. But I do see the rainbow. How?

This part is slightly messier to state, so bear with me. Let us revisit the picture in the paragraph on total internal reflection for a moment. Since the sun’s light is all parallel, the angle of incidence is going to change depending on where on the sphere the light hits. The angle uf at which the light comes out to the observer depends on the angle of incidence as uf = ui - ud, where ud is called the angle of deviation (just another name). Now, clearly, by repeated application of Snell’s law, I can express this angle of deviation as a function of the angle of incidence, right? The details are unimportant for us. So let us just say ud = f(ui) for some known f. In order that I see as much light as possible, I need uf to change as little as possible when ui changes, yes? Which is of course the same as saying that ud, or f, must change as little as possible. Now, I remember from some calculus class I took ages ago that a function is “stationary”, i.e., changes as little as possible, near the points at which it takes its minimum or maximum value. Do you remember this as well? So, I am most likely to see enough light to make out my rainbow when ud is a minimum (you can easily convince yourself that you have to be on the moon or something to see the region when ud is a maximum). For a drop of rain water and for red light, this turns out to be a position such that your eye is located at an angle of 42 degrees to the direction of sunlight (look at the picture to see what I mean). Now, I clearly cannot change where the sun is or where the water is. So I just see all those water drops that make this angle with my eye. All such drops lie on a cone around the line running from the sun through my eye, and the part of that cone I can see above the ground is an arc. And voilà! It is a bow!
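For those who do care about the details, here is a rough numerical check of the 42 degrees. It is a sketch under assumptions that are not spelled out in the text: I use the standard textbook expression for the total deviation of a ray that is refracted into a spherical drop, reflected once inside and refracted out again, namely 180 + 2*ui - 4*ur degrees with sin(ui) = n*sin(ur), and approximate refractive indices for red and violet light in water. The function names are made up for this illustration.

```python
import math


def deviation_deg(incidence_deg: float, n: float) -> float:
    """Total deviation (degrees) of a ray after refraction into a spherical
    drop, one internal reflection and refraction out again. n is the
    refractive index of the drop relative to air.
    """
    ui = math.radians(incidence_deg)
    ur = math.asin(math.sin(ui) / n)  # Snell's law at the front surface
    return math.degrees(math.pi + 2.0 * ui - 4.0 * ur)


def rainbow_angle_deg(n: float, steps: int = 9000) -> float:
    """Angle between the incoming sunlight direction and the returning ray
    at minimum deviation, found by scanning angles of incidence 0..90.
    """
    min_deviation = min(deviation_deg(k * 90.0 / steps, n)
                        for k in range(steps + 1))
    return 180.0 - min_deviation


# Approximate refractive indices of water (assumed values).
print("red:", round(rainbow_angle_deg(1.331), 1))
print("violet:", round(rainbow_angle_deg(1.344), 1))
```

Near the minimum, many different angles of incidence send light out in almost the same direction, which is why the bow is bright there. The slightly different answers for red and violet are what spread the colors across the bow.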

Phew! We are done. We succeeded in explaining all the things we set out to explain. But just to throw a wrench in the works, let me point out why you should not be happy yet. I can think of a hundred reasons, but let me state the first couple that come to my mind. In all of the above, I thought of light as a straight line (ray optics). But I remember somebody telling me light is made of photons, little blobs of energy. I even remember learning that light is a wave, just like the wave I can make in a string by oscillating it. WTF? How is it a straight line, a blob and a wave? On a totally different front (a front on which I don’t know the answer), I “know” that I see VIBGYOR when I see a rainbow. Hey! But white light is a “continuous” mixture of wavelengths (colors). So is what this VIBGYOR business is telling me the degree of resolution in the cones of my retina? On the same note, what is it in the processing of images in my brain that leads me to see rainbows around light bulbs when I am drunk or sleepy but not otherwise? That is scientific curiosity for you, and there is more than enough stimulus for it from the world around us to keep me occupied for the rest of my days! [3]

[1] You can do the simplest of things, go to wikipedia and read this and this.

[2] It is over and beyond my patience levels to make a picture illustrating this. So I recommend that you go and play with this Java applet to see for yourself that this is true.

[3] Apologies on the length of this post. I “cross my heart and hope to die” when I say my future posts will be way shorter!

[4] After I uploaded everything in blogger I see that it has made all my theta's into u's. So the u in the text corresponds to theta in the images.