Sunday, November 16, 2008

Human-Chimp DNA Similarities: Truth, Distortions, Lies

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy (Shakespeare)

And for every minute thing in heaven and earth, Hamlet,
An infinitude of lies and distortions regarding it await (Me)
**************************************
If falsehood, like truth, had but one face, we would be more on equal terms. For we would consider the contrary of what the liar said to be certain. But the opposite of truth has a hundred thousand faces and an infinite field. (Michel Eyquem de Montaigne)
**************************************

So many percentages are floated around regarding human/chimp DNA similarities. Despite the essentially non-existent contribution of creationists to biological science, google "human chimpanzee DNA similarities" and you'll be bombarded with creationist tracts. For every fact-based statement about biological reality, you'll find creationists offering a near-infinitude of lies and distortions.

DNA can be compared to a book, written with only four letters. If your task is to assign a percentage value to the similarities between two books, how would you do it? It's tricky! If you simply try to line up two otherwise identical books, letter for letter, the deletion of a single letter on page 1 of one text will result in a very low similarity score. How will you treat alternative spellings? In such a case, no meaning is lost...shouldn't a text riddled with alternative spellings receive a higher similarity score than a text riddled with spelling errors? If, oddly, one version tells exactly the same story two times, is it really fair to say that this version differs from the other by 50%?
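To see how badly a naive letter-for-letter comparison behaves, here is a minimal sketch in Python. The sequence and the function names `positional_identity` and `aligned_identity` are made up for illustration, and the standard library's `difflib.SequenceMatcher` is only a stand-in for real alignment software, not what geneticists actually use.

```python
from difflib import SequenceMatcher


def positional_identity(a: str, b: str) -> float:
    """Line the two texts up from the first letter and count matching positions."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def aligned_identity(a: str, b: str) -> float:
    """Let difflib find matching blocks first, then count the letters inside them."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


# Hypothetical 108-letter "book" written in the four-letter DNA alphabet.
original = "ACGTTGCAAGCTTAGGCTAACGTTAGC" * 4
# The same book with a single letter deleted near the start.
mutated = original[:3] + original[4:]

print(f"positional: {positional_identity(original, mutated):.2f}")
print(f"aligned:    {aligned_identity(original, mutated):.2f}")
```

The positional score collapses because everything after the deletion is shifted by one letter, while the aligned score stays close to 1. Any sane comparison method has to allow for that kind of shift.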

Needless to say, good, intelligent, earnest, truth-seeking scientists have devised all sorts of measures to compare DNA strands. Needless to say, creationists accuse these scientists of various anti-god conspiracies. Needless to say, given a choice between two different comparison methods, the creationists will opt for the method that offers the lower percentage.

What are these percentages?

99.7%: This is the percentage of similarity in protein coding regions when "synonymous" substitutions are ignored. Imagine two cookbooks. Much of the two books is drivel...expositions on the author's intimate relationship with fennel, for example. However, a portion of each book actually offers recipes that you can try out. A taste-test of consumers finds that the end products of the two books are 99.7% similar.

98.4%-98.8%: These are the most commonly cited measures of human-chimp similarity. Most of the relevant studies here simply seek to match up letters to letters (as opposed to words to words, or paragraphs to paragraphs). Mary-Claire King's 1973 study of chimp/human differences in a handful of proteins extrapolated a 99% similarity, and the figure hasn't changed much since. Since better than 80% of mutational events involve letter/letter substitutions, the 98.5% figure is probably the one that best gives a sense of the degree of divergence between humans and chimps.

95%: This is the 2002 similarity estimate arrived at by Roy Britten at Caltech. His method of calculating similarities between books differs from that of most other researchers, though all parties are using the same books. Britten includes deletions or insertions of words, sentences, and paragraphs in his work, a somewhat controversial tactic. To quote Carl Zimmer:

Suppose a stretch of our DNA 6,000 base pairs long disappeared a million years ago. Britten would count that as 6,000 separate changes, yet other geneticists would count it as a single evolutionary event.

And isn't the counting of evolutionary events the truly important measure when we consider the divergence of humans and chimps?
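The two bookkeeping styles in Zimmer's example can be sketched in a few lines of Python. The event list below is invented (ten single-letter substitutions plus the one 6,000-base deletion from the quote), and `count_differences` is a hypothetical helper, not anything taken from Britten's paper.

```python
def count_differences(events):
    """Return (per_base, per_event) tallies for a list of (kind, length) events.

    per_base counts every affected base as its own change;
    per_event counts each mutation once, however long it is.
    """
    per_base = sum(length for _kind, length in events)
    per_event = len(events)
    return per_base, per_event


# Hypothetical history: ten point substitutions and one 6,000-base deletion.
events = [("substitution", 1)] * 10 + [("deletion", 6000)]

per_base, per_event = count_differences(events)
print(f"changes counted per base:  {per_base}")
print(f"changes counted per event: {per_event}")
```

Under these assumptions the per-base tally is dominated almost entirely by the single deletion, which is why the choice of counting method moves the headline percentage so much.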

83%: This is the percentage of coding sequences on human-chimp chromosome 22 whose corresponding proteins show any differences at all. To simplify, imagine two cookbooks. The proportions of ingredients in 83% of the recipes differ very slightly (say, one part in 300), with the end result being that consumers usually can't even taste a difference. The remaining 17% of recipes are exactly the same. Shall we conclude that the two cookbooks are "83% different"?

76%: For various reasons, including the fact that the chimpanzee genome was not fully sequenced at the time of the study in question, scientists chose to "align" 76% (2400 megabases out of 3100) of the human and chimp genomes and then proceed to ask questions based on this alignment. See the 48.6% figure below for a book-based analogy. The figure, of course, has been propagated by agenda-bearers (in particular, one Richard Buggs) as an example of the progressive lowering of human-chimp homology estimates. Never mind that the same paper goes on to reaffirm previous estimates of homology, based on the 76% of usable, high-quality sequences.

To look at it another way, one can imagine a study where scientists choose to compare human/human DNA. Due to budget constraints, only 10% of the sequences are deemed of high enough quality for comparison. Do we conclude that humans are only related to each other by 10%?

Buggs argues that 76% is a conservative, purely scientific derivation, but that's bunk: it's like reading the first 76% of two books, finding they're extremely similar, and then stating there's a 50/50 chance that the remaining pages will diverge in content.

48.6%: This refers to Fujiyama's 2002 comparison of chimp and human DNA. In addition to the fact that Fujiyama employed stringent comparison requirements, the human genome sequence was not completed until 2003, meaning that Fujiyama was working with a draft. It's as if you have two similar books, but one is incomplete. You cut out text from the finished book and try to match it to text in the other. The degree to which the two books match up will then largely depend on the extent to which they're finished. The difficulty of comparison is increased if one book is being written by constantly adding sentences in random locations, as opposed to tacking successive paragraphs onto the end of the last.

Does the above mean that Fujiyama was wasting his time? No. He simply continued his analysis and made inferences based on the text sequences that did line up reasonably well (the ones with minimal "valid alignments"). In those cases, the letter-for-letter similarity came in at 98.77%, confirming earlier studies.

29%: To simplify a bit, assume two books of 20,000 paragraphs. If 29% of all paragraphs are precisely the same, and most paragraphs only differ in one or two letters, would you conclude the two books are hugely different? That's what the creoids do after "reading" 2005's monumental "Initial sequence of the chimpanzee genome and comparison with the human genome".

25%: If your books contain four letters, with each letter randomly appearing about 25% of the time, chance dictates that any two letters will line up 25% of the time. A fool will then exclaim that no two books can ever differ by more than 75%. Never mind that no credible comparison methods are this simplistic. A single letter may match up 25% of the time, but a run of two consecutive letters will only match up 6.25% of the time, a run of three only about 1.6% of the time, and so on.
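The arithmetic is easy to check. This is a small sketch assuming four equally likely letters that are independent of one another, which real genomes only approximate. The helper name `chance_match_probability` is made up for the example.

```python
def chance_match_probability(run_length: int, alphabet_size: int = 4) -> float:
    """Probability that a run of `run_length` letters matches purely by chance,
    assuming equally likely, independent letters."""
    return (1 / alphabet_size) ** run_length


for k in (1, 2, 3, 10):
    print(f"run of {k:2d} letters: {chance_match_probability(k):.7%}")
```

By the time the run is ten letters long, a chance match is already rarer than one in a million, so the 25% "floor" has nothing to do with how real comparisons work.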

6.4%: This is a recent estimate of difference in "copy number" between humans and chimps. To simplify, assume that one book has duplicates of 6.4% of its paragraphs and the other hasn't. If the two books were previously thought to be 98% similar, does this new information now mean that we can subtract 6.4% from 98% and get a revised estimate of 91.6%? Not really. It's a simple matter to copy and paste a paragraph to a new location in your text, but it takes some serious effort to write a new paragraph. To make the fallacy even more obvious, consider two books that are precisely the same, except for the fact that one book duplicates all paragraphs 100 times. Is it really fair to say they are only similar by 1%?
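The 100-copies thought experiment can be run directly. In this sketch the "books" are invented lists of paragraph labels, and both scoring functions are hypothetical illustrations. The second is plain Jaccard similarity over distinct paragraphs, not a measure taken from the copy-number paper.

```python
def length_based_similarity(book_a, book_b):
    """Distinct shared paragraphs divided by the length of the longer book."""
    shared = len(set(book_a) & set(book_b))
    return shared / max(len(book_a), len(book_b))


def content_based_similarity(book_a, book_b):
    """Jaccard similarity over distinct paragraphs, ignoring how often each is repeated."""
    a, b = set(book_a), set(book_b)
    return len(a & b) / len(a | b)


# Hypothetical 50-paragraph book, and a version that repeats every paragraph 100 times.
book = [f"paragraph {i}" for i in range(50)]
duplicated = book * 100

print(f"length-based:  {length_based_similarity(book, duplicated):.0%}")
print(f"content-based: {content_based_similarity(book, duplicated):.0%}")
```

The length-based score says the two books are 1% similar and the content-based score says 100%, even though not one new paragraph was written. Neither number is "the" similarity. The point is only that copy number can't simply be subtracted from a letter-for-letter figure.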

I'm not critiquing the paper per se. However, if humans find some comfort in the 6.4% figure, mice should take deep solace...they differ from their dirty, unrefined cousins, the rats, by a full 10% in copy number.

0%: This is the frequency with which the creobots, when given a choice between two homology measurements, choose the larger. It's also tempting to offer this percentage as the contribution of these ninnies to biological understanding, but that would be generous...I'm not being facetious when I say that a negative number would be most appropriate.
