Scientific curiosity: Classification of Protein Structure

Structures of proteins: In all modern-day organisms, proteins play a wide variety of roles in the cell. At the molecular level, they are responsible for performing all the mechanical work done by your muscles (as far as is currently known), in addition to catalyzing nearly all the biochemical reactions inside the cell [1]. The immediate question that begs to be answered is: "How are these complex molecules able to perform this work? How are they so specific in what they do and in the reactions that they catalyze?" While the answers to these questions are nontrivial and the subject of research in a large number of biophysics labs worldwide, the prevailing view in the scientific community is that the function of a protein is determined by its structure [2]. This statement is only partially accurate. The reasoning behind it is that a protein functions because it adopts a particular 3-dimensional arrangement of certain atoms or functional groups in the amino acids that make up its active site, and these functional groups are then able to catalyze the reaction. Specificity comes from the same 3-dimensional arrangement: the functional groups can catalyze the reaction only when the substrate interacts with them in a particular manner. While it is true that the structure determines how a protein goes about performing its function, these structures are static pictures of a molecule that is otherwise in motion [3]. In addition to the global motion of the molecule, the atoms that make up the protein move relative to one another, producing slightly different configurations of the important functional groups. Hence proteins are neither completely specific, nor is their function determined by structure alone.

Classification of Protein Structure: A typical protein consists of several thousand atoms, and to make sense of its structure it is necessary to simplify it. There are more than 30000 structures in the Protein Data Bank [4], and examining each structure individually would be impractical. Hence, one needs a classification scheme for protein structures.
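One common practical simplification, assumed here rather than taken from the text above, is to keep a single atom per residue, the alpha carbon (CA), so that each residue becomes one point in space. The sketch below assumes input in the fixed-column PDB text format; `ca_trace` is a hypothetical helper name, and it deliberately ignores insertion codes and all but the first alternate location.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]


def ca_trace(pdb_lines: Iterable[str], chain: str = "A") -> List[Tuple[int, str, Coord]]:
    """Reduce one chain of an all-atom PDB file to its C-alpha trace.

    Returns (residue number, residue name, (x, y, z)) for every CA atom of
    `chain`, in file order. Relies on the fixed-column ATOM record layout.
    """
    trace = []
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM (e.g. calcium ions, also named "CA"), headers, etc.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only blank or first ('A') alternate locations
        res_num = int(line[22:26])  # insertion codes (column 27) are ignored
        res_name = line[17:20]
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((res_num, res_name, xyz))
    return trace
```

Called on the lines of a PDB file, for example `ca_trace(open(path))` with a hypothetical `path`, it returns one point per residue instead of every atom of that residue.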

One way of simplifying the structure is to break it into local elements called the secondary structure of the protein. There are various courses and books [1,2] that explain secondary structure, but for our discussion it is sufficient to know that certain configurations called alpha helices and beta sheets are common structural motifs found in nearly all proteins. While these secondary structures do help in understanding the local structure of proteins, they give very little insight into the chemistry that a protein performs or into its active site.
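Secondary structure is usually summarized as a per-residue string of one-letter codes. As a sketch, assuming the codes produced by the DSSP program (H, G, I for helices; E, B for strands and bridges; everything else treated as coil), the snippet below collapses such a string into helix/strand/coil segments and composition. The three-state mapping is one common convention rather than the only one, and the example string is invented.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# One common 3-state reduction of DSSP codes (an assumption; conventions vary).
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    """Map each DSSP code to H (helix), E (strand) or C (coil)."""
    return "".join(THREE_STATE.get(c, "C") for c in dssp)


def segments(dssp: str) -> List[Tuple[str, int, int]]:
    """Return (state, start, end) runs, 0-based and end-exclusive."""
    out, pos = [], 0
    for state, run in groupby(to_three_state(dssp)):
        n = len(list(run))
        out.append((state, pos, pos + n))
        pos += n
    return out


def composition(dssp: str) -> Dict[str, float]:
    """Fraction of residues in each of the three states."""
    s = to_three_state(dssp)
    return {k: s.count(k) / len(s) for k in "HEC"}


# Invented 20-residue assignment: one helix followed by two strands.
ss = "-HHHHHHHHT--EEEEE-EE"
print(segments(ss))
# [('C', 0, 1), ('H', 1, 9), ('C', 9, 12), ('E', 12, 17), ('C', 17, 18), ('E', 18, 20)]
print(composition(ss))  # {'H': 0.4, 'E': 0.35, 'C': 0.25}
```

A summary like this says where the helices and strands are, but, as noted above, nothing about what chemistry the protein performs.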

A second and more meaningful attempt at classifying protein structure would be to find common structural motifs that can exist independently and to classify proteins based on these motifs. For example, a protein can be multifunctional, with each function carried out by a different part of the protein, and each part can still perform its function even if the protein is split up. It therefore makes sense to split such multifunctional proteins according to the functions they perform and, if the same structure/function motif is found in different proteins, to group those motifs together. The part of a protein that can maintain its structure and function independently is called a domain [5]. Quite often, domains of one shape combine with domains of very different shapes to form quite different proteins (very much like building blocks that can be put together in various configurations to give walls of different shapes) [6].
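Treating domains as building blocks suggests describing each multi-domain protein by its ordered list of domains and asking which domains tend to occur next to each other, in the spirit of the domain-pair analysis in [6]. The sketch below counts adjacent domain pairs across a set of proteins; the protein and domain names are made up for illustration, and real domain assignments would come from a domain database.

```python
from collections import Counter
from typing import Dict, List


def domain_pairs(architectures: Dict[str, List[str]]) -> Counter:
    """Count unordered pairs of domains that are adjacent along a chain.

    `architectures` maps a protein name to its domains in N- to C-terminal
    order. Pairs are stored as sorted tuples, so (X, Y) and (Y, X) count together.
    """
    counts: Counter = Counter()
    for domains in architectures.values():
        for a, b in zip(domains, domains[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical proteins assembled from three reusable domain "blocks".
proteins = {
    "protA": ["D1", "D2", "D3"],
    "protB": ["D3", "D2"],
    "protC": ["D1"],
    "protD": ["D2", "D1", "D2"],
}
print(domain_pairs(proteins).most_common())
# [(('D1', 'D2'), 3), (('D2', 'D3'), 2)]
```

Ignoring domain order within a pair is a design choice here; keeping ordered pairs would instead distinguish N-terminal from C-terminal arrangements.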

To give meaning to the classification scheme, it would also help to know which proteins perform related functions (for example, the same reaction on different substrates) or have active sites in the same region of the protein structure. To capture this, it is better to form groups of closely related structures that function similarly. Great minds have long argued that evolution should be the guiding principle in studying biology, and it does make sense to distinguish proteins that share a common evolutionary origin from those that have arrived at the same structure independently (convergent evolution).
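One simple way to turn pairwise judgements of common ancestry into groups is transitive (single-linkage) grouping: if A is judged homologous to B and B to C, all three end up in one group. The sketch below assumes those pairwise judgements are already available (for example from sequence statistics) and uses a union-find structure; the protein names are hypothetical. Transitive grouping is also a known source of errors, since a single spurious link can merge two unrelated groups.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_homologs(proteins: Iterable[str],
                   related_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Single-linkage grouping with a union-find (disjoint-set) structure.

    Every name in `related_pairs` is assumed to appear in `proteins`.
    """
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


# Hypothetical evidence: P1~P2 and P2~P3 look homologous; P4 has no such link.
groups = group_homologs(["P1", "P2", "P3", "P4"], [("P1", "P2"), ("P2", "P3")])
print(sorted(sorted(g) for g in groups))  # [['P1', 'P2', 'P3'], ['P4']]
```

P1 and P3 end up together even though no direct link between them was given, which is exactly the transitivity that makes this approach both useful and risky.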

There are several databases that divide proteins into individual domains and then group these domains hierarchically into evolutionarily related sets. These include SCOP (Structural Classification Of Proteins) [7] (classified manually), CATH (Class, Architecture, Topology, and Homologous superfamily) [8], and FSSP (Families of Structurally Similar Proteins) [9] (classified automatically). However, these databases are often flawed, and corrections to them are frequently suggested in the literature. Part of the problem is that it is very difficult to say whether a similarity between structures arose through homology (evolutionary relatedness) or convergence (independent evolutionary origins). The trivial relationships are those that are apparent in the sequences of the two proteins. When two proteins have very similar sequences (measured by how often they have the same amino acid, or a chemically similar one, at the same position of a sequence alignment), they are related, and statistics based on extreme value distributions can be used to estimate how likely it is that an alignment score that high would arise by chance between unrelated sequences [11]. However, structure is conserved much more strongly than sequence, and below a certain sequence identity it is very difficult to demonstrate a relationship between two proteins without comparing their structures [10].
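The two sequence-level quantities in the paragraph above can be sketched as follows, under stated assumptions. Percent identity is computed over an already-aligned pair of sequences, counting only columns where neither sequence has a gap (other conventions divide by the alignment length or the shorter sequence). The significance of a score is taken from a Gumbel (type I extreme value) distribution; the parameters `mu` and `lam` used here are placeholders, whereas in practice they are fitted to scores of unrelated sequences, for example by maximum likelihood as in [11].

```python
import math


def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def gumbel_pvalue(score: float, mu: float, lam: float) -> float:
    """P(S >= score) under a Gumbel distribution with location mu and scale 1/lam.

    Equals 1 - exp(-exp(-lam * (score - mu))); expm1 keeps it accurate for tiny values.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


# Invented toy alignment: 8 gap-free columns, 6 of them identical.
print(percent_identity("MKV-LLATQ", "MRVELLSTQ"))  # 75.0
# Placeholder parameters: a score of 60 when unrelated pairs typically score ~30.
print(f"{gumbel_pvalue(60, mu=30, lam=0.3):.2e}")  # 1.23e-04
```

A small value is taken as evidence that the two sequences share a common origin; as the paragraph notes, a large value does not rule it out, because structure can stay similar long after sequence similarity has faded.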

Other problems are related to the methods by which structures are obtained (X-ray crystallography or NMR spectroscopy). These methods are inherently noisy, for reasons ranging from fundamental limits such as Heisenberg's uncertainty principle to the crystallization conditions and the substrates that interact with the protein. As a result, no structural alignment (a one-to-one mapping of which residues in one structure overlap which residues in the other) is ever completely correct, and this also causes minor problems in the classification procedure.
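Once a residue-to-residue correspondence has been chosen, one way to score how well two structures agree without first superimposing them is to compare their internal distance matrices; distance-matrix alignment methods such as DALI, which was used to build FSSP, rest on the same idea. The sketch below computes a distance RMSD (dRMSD) over C-alpha coordinates and shows that even small random coordinate errors give a nonzero score. The coordinates and the noise level are invented for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def drmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance RMSD between two equally long, already-corresponded point lists.

    Compares every intra-structure pair distance, so the result does not depend
    on how either structure is rotated or translated.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists of at least 2 points")
    total, n_pairs = 0.0, 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            d = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += d * d
            n_pairs += 1
    return math.sqrt(total / n_pairs)


def jitter(coords: Sequence[Coord], sigma: float, rng: random.Random) -> List[Coord]:
    """Add independent Gaussian noise of standard deviation sigma to every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]


# Invented 5-point "C-alpha trace", consecutive points 3.8 units apart.
trace = [(3.8 * k, 0.0, 0.0) for k in range(5)]
shifted = [(x + 10.0, y - 4.0, z) for x, y, z in trace]  # a pure translation
noisy = jitter(trace, sigma=0.3, rng=random.Random(0))

print(round(drmsd(trace, shifted), 9))  # 0.0: rigid motion leaves internal distances unchanged
print(drmsd(trace, noisy) > 0)          # True: coordinate noise alone makes the copies disagree
```

Because the score depends on the chosen correspondence, a different residue mapping between the same two structures can give a different value, which is one way the alignment uncertainty described above feeds into classification.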

But the most important problem is the level at which to classify structures. While domains are the most commonly used level of classification (because a domain is essentially independent), domains might not have been the basic units from which proteins were built during evolution. Rather, small subdomain-level structural units called structural words [12] or foldons [13] (because they could be independent folding units) could have been the smallest units of protein structure to evolve from the RNA world. The theory is that these foldons could have come together to form various different domains, which then evolved further to form proteins with different functions.

References:
[1] - Biochemistry by Stryer.
[2] - Introduction to Protein Structure by Branden and Tooze.
[3] - A perspective on enzyme catalysis by Stephen Benkovic and Sharon Hammes-Schiffer.
[4] - RCSB Protein Data Bank.
[5] - Domains.
[6] - Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination by Gordana Apic, Wolfgang Huber and Sarah A. Teichmann.
[7] - SCOP.
[8] - CATH.
[9] - FSSP.
[10] - How far divergent evolution goes in proteins by Murzin.
[11] - Maximum Likelihood Fitting of Extreme Value Distributions by Eddy.
[12] - On the evolution of protein folds by Lupas, Ponting and Russell.
[13] - Foldons, Protein Structural Modules, and Exons by Anna Panchenko, Z. Luthey-Schulten and P.G. Wolynes.

15 comments:

CuriousCat said...: Hey BaL, I have a very limited "theoretical physics" point of view understanding of proteins. That is to say, I have thought about the landscape theories and worried about pathways to native structure in the most abstract of settings, probably not even remotely close to the real world you are talking about. So when you say "would be to find certain functional elements of a protein and find it in different proteins and classify these functional elements" I am not able to see if you are talking chemistry (i.e., a particular sequence of atoms leading to a unique functional structure) or physics (in that a wide range of possible chemical structures that lead to a generic functional structure). Could you clarify? Is my question making sense?; 8/10/2007 7:04 AM
Born a Libran said...: @CC: Actually, they have done it from a physics point of view as well as from a chemistry point of view. In the physics point of view, you would be looking for the same overall structure or fold of a protein and very similar active sites (the regions where the chemistry occurs) but not necessarily the same chemistry. In the chemistry point of view, they look at the active site alone and just try to classify the proteins based on the range of possible chemical reactions that can take place or have been observed to take place naturally. In this post, I was concentrating on the physics side. There are possible implications of that: having a common evolutionary origin could help us understand how they have changed from the common ancestor to become specific to their substrates. Another possible implication (never conclusively proved) is that they have very similar folding funnels and may have certain elements conserved for their folding. If you want more info, I have loads of refs for it. Just shoot me an email.; 8/10/2007 8:52 AM
Born a Libran said...: @CC: In that particular part of the post, I meant do we do the classification based on the whole protein or parts of the protein. Some proteins are able to perform multiple functions. So do we treat these as 1 structural unit or should the parts of the protein that perform each function be treated independently. I hope I am clearer now.; 8/10/2007 9:07 AM
CuriousCat said...: I see what you are saying BaL...let me mull some things over and refresh my memory on things I thought about years ago and then come back and bother you with more targeted questions..; 8/10/2007 1:27 PM
Wavefunction said...: Just one comment; while Heisenberg's uncertainty principle does apply to all forms of spectroscopy, I think it should be noted that there are problems much more serious than that hindering NMR spectroscopy, and Heisenberg is not really the major source of practical problems, although it is a limit. Even if the Heisenberg principle pointed to a much finer line width than what it does, overlapping resonances would still be the major cause of problems.; 8/18/2007 8:31 AM
Born a Libran said...: @ashutosh: Actually, I was trying to talk about X-ray crystallography here. But I agree with the rest of your comment. I did mention some of the major problems I know about but NMR does have lower resolution than X-ray crystallography in my opinion...; 8/19/2007 12:52 PM

Thursday, August 09, 2007
