The Emergence Observatory » The Recipe Inside The Recipe

Oct 29 2011

The Recipe Inside The Recipe

Published by Peter D'Adamo at 12:50 pm
Under Complexity | Generative Medicine | Information Theory

An insignificant percentage of the total amount of DNA is devoted to actual gene function. The most common protein recipe in the human genome is not even for a human protein, but rather for an enzyme called reverse transcriptase, commonly used by viruses to copy themselves and an essential part of the toolbox used by the AIDS virus. Reverse transcriptase genes account for about 1-2% of the entire junk DNA in the human genome, which may not sound like much, but then again remember that the actual genes that account for you only amount to about 3% of the genome. Humans have about 23,000 genes, which is certainly more than most fungi (around 6,000) and many worms (around 19,000) but fewer than some fish (around 40,000) and most plants (around 60,000).

As with most techniques, it’s not what you have, but rather what you do with it.

You’d think that the job was simple enough: string some nucleotides into a few codons, and away you go. But no, it has to be difficult! I remember when I first had Cable TV installed in my house, it was advertised as being commercial-free, and for a while it was. However, gradually more and more commercials have been added to the Cable Program Roster, to the point where it is hard to tell the difference between Pay or Cable TV and Commercial TV, other than the fact that you pay for one and not the other. Genes are fond of running commercials during their broadcasts.

In geneticalese we call these commercials introns and the programs exons.

Messenger RNA (mRNA) is usually primped before it is shot out of the nucleus. The primping usually involves taking out all the introns and reconnecting the exons, just as if you had paused the VCR during commercials as you were recording the Super Bowl.

I can still taste the chocolate.

When the sequencing of the haploid human genome was completed, it turned up far fewer genes than had been expected. However, the case has been advanced that this shortfall is offset by alternative splicing, a process by which exons of the precursor RNA produced by transcription of a gene are reconnected in multiple ways during the RNA splicing that produces mRNA. The resulting different mRNAs may be translated into any of several different forms of the same protein (protein isoforms) or a variety of glycoproteins with different attached glycans (polysaccharides). Thus a single gene may code for multiple proteins. Alternative splicing greatly increases the diversity of proteins that can be encoded by the genome, and in humans it is estimated that over 80% of genes are alternatively spliced.

Just exactly how much DNA makes up a gene? One common definition, advanced by Richard Dawkins, is that a gene “is any portion of chromosomal material small enough to last for a large number of generations.” However, this definition has utility only with regard to evolution. Many geneticists use the concept of a cistron interchangeably with the term “gene.” The most common definition of a cistron is “a section of DNA that contains the genetic code for a single polypeptide and functions as a hereditary unit.” Sounds like a gene, doesn’t it? However, using the terms interchangeably, although common, is not correct. Why? Because as a result of some recent research, it appears that some cistrons can encode more than one protein.

To understand how this is possible, it is necessary to understand the deeper workings of the cistron, which, it turns out, are rather complex. Like genes, cistrons contain “meaningful” information: the sequences of bases that code for amino acids. As we’ve learned, these are called exons. However, dispersed in the cistron are chunks of additional base sequences that don’t appear to do anything at all, called introns. Now imagine that a recipe for chocolate chip cookies calls for mixing two cups of flour, one tablespoon of chopped peanuts, one cup of butter, one cup of chocolate chips, one cup of sugar and two eggs. The way information is contained in DNA is exactly like the way that the information needed for making your cookie dough is contained here: (1)

Mix two cups of flour, one cross related two cups tablespoon of chopped bag element peanuts, one cup of case honest butter, one cup of penguin green chocolate flint chips, one walking spoon cup of nail bank sugar and two canvas eggs.
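To see the primping in action, here is a minimal Python sketch that treats the garbled recipe as unspliced RNA and snips out the commercials. The list of "introns" is my own assumption, worked out by comparing the garbled text against the clean recipe; a real spliceosome recognizes signals in the sequence itself rather than consulting a lookup list, and the names `PRE_RECIPE`, `INTRONS` and `splice_out` are made up for the illustration.

```python
# The garbled recipe from the post, playing the role of unspliced RNA.
PRE_RECIPE = (
    "Mix two cups of flour, one cross related two cups tablespoon of "
    "chopped bag element peanuts, one cup of case honest butter, one cup of "
    "penguin green chocolate flint chips, one walking spoon cup of nail bank "
    "sugar and two canvas eggs."
)

# Assumption: these filler phrases are the "introns". They were chosen by
# comparing the garbled text with the clean recipe, not by any biological rule.
INTRONS = [
    "cross related two cups",
    "bag element",
    "case honest",
    "penguin green",
    "flint",
    "walking spoon",
    "nail bank",
    "canvas",
]


def splice_out(transcript, introns):
    """Delete the first occurrence of each intron, plus its trailing space."""
    for intron in introns:
        transcript = transcript.replace(intron + " ", "", 1)
    return transcript


COOKIE_RECIPE = splice_out(PRE_RECIPE, INTRONS)
print(COOKIE_RECIPE)
```

With the filler removed, what is left over is the chocolate chip recipe described above, word for word.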

A process known as alternative splicing has been identified by which the spliceosomes in different cells can do different things with the same pre-mRNA, thereby generating two or more different proteins (called isoforms) from the same code of pre-mRNA. In other words, the same block of information can produce two different outcomes, two different protein products. In humans, over 80% of genes are alternatively spliced, which may help explain why the total number of genes in our genome is rather on the low side. For example, our chocolate chip cookie recipe, hidden inside the gibberish of cookie introns, also has inside of it a recipe for peanut butter cookies:

Mix two cups of flour, two cups peanut butter, one cup of sugar and two eggs
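The two-recipes-in-one idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the function name `splice` and both exon lists are mine, the exon boundaries were picked by hand so that the output matches the recipes in the post, and joining exons with a space is a convenience of English text rather than anything RNA does. The one rule the sketch does enforce is that exons must be picked up in the order in which they occur in the transcript.

```python
# The same garbled recipe, standing in for a single precursor transcript.
PRE_RECIPE = (
    "Mix two cups of flour, one cross related two cups tablespoon of "
    "chopped bag element peanuts, one cup of case honest butter, one cup of "
    "penguin green chocolate flint chips, one walking spoon cup of nail bank "
    "sugar and two canvas eggs."
)


def splice(transcript, exons):
    """Join the chosen exons, requiring that they occur in order in the transcript."""
    cursor = 0
    for exon in exons:
        found = transcript.find(exon, cursor)
        if found == -1:
            raise ValueError(f"exon not found in order: {exon!r}")
        cursor = found + len(exon)
    return " ".join(exons)


# Hypothetical exon choices: two different "cells" reading the same transcript.
CHOCOLATE_CHIP_EXONS = [
    "Mix two cups of flour, one", "tablespoon of chopped",
    "peanuts, one cup of", "butter, one cup of",
    "chocolate", "chips, one", "cup of", "sugar and two", "eggs.",
]
PEANUT_BUTTER_EXONS = [
    "Mix two cups of flour,", "two cups", "peanut",
    "butter,", "one cup of", "sugar and two", "eggs",
]

CHOCOLATE_CHIP = splice(PRE_RECIPE, CHOCOLATE_CHIP_EXONS)
PEANUT_BUTTER = splice(PRE_RECIPE, PEANUT_BUTTER_EXONS)
print(CHOCOLATE_CHIP)
print(PEANUT_BUTTER)
```

Notice that the peanut butter version picks up "two cups", a phrase the chocolate chip version discards as filler. One cell's commercial is another cell's program.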

The vast majority of reverse transcriptase coding in junk DNA probably has little to do with retroviruses such as HIV or Feline Leukemia Virus. Rather, the reverse transcriptase is probably there as a leftover of certain types of “jumping genes” called reverse transposons (retrotransposons).

Much of our junk DNA is repetitive; certain patterns of Cs, Ts, Gs and As just repeat themselves. These chunks are usually about 50-100 bases long, and the number of them spread across the chromosomes varies considerably from person to person.

Genes differ widely from each other. The gene for insulin, a relatively small gene, is 1700 base pairs long which, in a railroad analogy, would produce a stretch of insulin railroad track 1700 sleepers long. Figuring the sleepers as being about two feet apart, a stretch of railroad track gene sufficient to code for insulin would be a little longer than half a mile. At the other end of the spectrum is the gene involved in a common type of muscular dystrophy. It is two million base pairs long. If it were a railroad line, this length of gene track would stretch from New York City to Chicago.
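For anyone who wants to check the railroad arithmetic, here is a quick back-of-the-envelope sketch in Python. It uses only the figures given above (one sleeper per base pair, sleepers two feet apart) plus the standard 5,280 feet in a mile; the function name `track_miles` is just a label for the illustration.

```python
FEET_PER_MILE = 5280
SLEEPER_SPACING_FT = 2  # from the post: sleepers about two feet apart


def track_miles(base_pairs, spacing_ft=SLEEPER_SPACING_FT):
    """Length in miles of a track with one sleeper per base pair."""
    return base_pairs * spacing_ft / FEET_PER_MILE


INSULIN_MILES = track_miles(1_700)        # the insulin gene
DYSTROPHY_MILES = track_miles(2_000_000)  # the muscular dystrophy gene

print(f"Insulin track: {INSULIN_MILES:.2f} miles")
print(f"Muscular dystrophy track: {DYSTROPHY_MILES:.0f} miles")
```

The insulin track works out to roughly 0.64 miles, and the two-million-sleeper track to a bit under 760 miles, which is in the right range for a New York City to Chicago run, depending on the route taken.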