## The link between information and entropy on Azimuth

Mathematical physicist extraordinaire John Baez digs into Shannon entropy and coding over at Azimuth:

So, I want to understand Shannon’s theorems and their proofs—especially because they clarify the relation between information and entropy, two concepts I’d like to be an expert on. It’s sort of embarrassing that I don’t already know this stuff! But I thought I’d post some preliminary remarks anyway, in case you too are trying to learn this stuff, or in case you can help me.
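For anyone else trying to learn this stuff, here is a minimal Python sketch of the quantity everything hangs on: the Shannon entropy of a discrete probability distribution, `H = -sum(p * log2(p))`, measured in bits. The function name `shannon_entropy` is our own choice for illustration. Nothing here comes from Baez’s post.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits.

    Outcomes with p == 0 are skipped, since p * log2(p) -> 0 as p -> 0.
    """
    total = sum(probs)
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A fair coin toss carries 1 bit; a uniform pick among the 4 DNA bases carries 2 bits.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.25] * 4))   # 2.0
```

A certain outcome (`[1.0]`) has zero entropy, and any biased coin falls somewhere between 0 and 1 bit.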

## What’s cooler than information and entropy?

The answer is, of course, not much. A subject near and dear to my heart: “Evolution, Entropy, and Information,” over at Cosmic Variance, which references John Baez’s great series on information geometry (latest entry here).

For a project I’m working on, I’ve lately been thinking about the various ways your DNA can be said to contain information, and how we can use information content to read DNA sequences. Information theory has, of course, long been used to find functional sequences in DNA (see this classic paper). This is what we call bioinformatics, but there are also more physical reasons to use information theory to understand biology. There are deep relationships between information, thermodynamics, and computation (PDF), which we can use to understand how the cell, as a thermodynamic system, processes the information contained in DNA.
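One common way to make “information content” concrete for DNA is to align known binding sites for a protein and measure, at each position, how far the base distribution falls below the 2-bit maximum for four bases. This is the idea behind sequence logos. The sketch below is a minimal version under stated assumptions: the aligned sites are made up for illustration (not real binding sites), and it skips the small-sample correction used in practice, so a handful of sites will overstate the information.

```python
import math
from collections import Counter

def position_information(sites):
    """Per-position information content (bits) of equal-length aligned sites.

    Information at a position = 2 - H, where H is the Shannon entropy of the
    observed base frequencies and 2 bits is the maximum for four bases.
    No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - h)
    return info

# Hypothetical aligned sites, for illustration only.
sites = ["GATTAC", "GATTAC", "GACTAC", "GATTAG"]
per_position = position_information(sites)
print([round(x, 2) for x in per_position])
print("total:", round(sum(per_position), 2), "bits")
```

Fully conserved positions score the full 2 bits; positions where one site in four differs score about 1.19 bits.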

There will be more on this in the future, but in the meantime, go check out the links.

## Energy and information (or lack thereof) in biological thinking

Eric Smith, “Thermodynamics of Natural Selection” (PDF):

The two paradigms dominating biological theory are the machine-like functioning of componentry (increasingly elaborated in molecular biology) (Alberts, 2002), and the Darwinian framework for understanding the stochastic dynamics of death and reproduction (Gould, 2002; Lewontin, 1974). The representation of biological processes as machines is often by way of models, which represent control flow and causation, and for which the goal is to conceptually or quantitatively reproduce typical observed behaviors (mechanisms of binding, Stormo and Fields, 1998, transcription or translation, Berman et al., 2006, cell cycling, Novak et al., 2001, regulation of cell division, Tyson et al., 2002 or metabolic pathways, Holter et al., 2001, etc.). Energy naturally appears in these contexts as an input, as a quantitative constraint, or as a medium of control. However, models constructed for the purpose of illustrating causality often diminish the importance of the incursion of error at all levels of organization and the consequent energetic costs of systemic error correction, and so are not suited to composition into a system-level description of either emergence or stability. At the other extreme, Darwinian selection is a purely informational theory, concerned with emergence and stabilization through statistical processes. Yet, for lack of a comprehensive theory of individual function, models of the dynamics resulting from selection inevitably take for granted (Hartl and Clark, 1997) the platform of physiology, growth, development, and reproduction, decoupling the problem of information input from energetic constraints on the mechanisms by which it occurs.

## How to find your way in E. coli without stopping for directions

One of the keys to success in life is to regulate your genes properly. Genes are regulated by transcription factor proteins, which have to navigate their way around the genome and bind particular DNA targets. The problem is that there are only a few correct targets and the genome is large. So an obvious question is, why don’t transcription factors get lost? Do they stop and ask for directions? Where is the information for genome navigation coming from?

The answer to this question is still being worked out for eukaryotes, but it has been solved for E. coli. Peter von Hippel and Otto Berg largely figured out the answer in their classic 1986 paper “On the specificity of DNA protein interactions.” E. coli’s solution for making gene regulation manageable is simple and elegant, because this bacterium has the virtue of possessing a small genome. Let’s take a look at how genome navigation works in a bacterium.
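A simplified way to see why genome size matters: treat a single transcription factor as bound either to its one specific site or to one of `N` nonspecific positions, with the specific site more favorable by `delta_e` (in units of kT). The Boltzmann probability of finding it on the specific site is then `p = 1 / (1 + N * exp(-delta_e))`. This two-state sketch is our own simplification for illustration, not the full von Hippel and Berg treatment. It ignores unbound protein and sequence-dependent nonspecific energies, and the energy values below are hypothetical.

```python
import math

def specific_occupancy(n_nonspecific, delta_e_kt):
    """Probability a single protein sits on its one specific site rather than
    on any of n_nonspecific competing positions.

    delta_e_kt: how much more favorable the specific site is, in units of kT
    (epsilon_nonspecific - epsilon_specific). Free (unbound) protein is ignored.
    """
    return 1.0 / (1.0 + n_nonspecific * math.exp(-delta_e_kt))

N_ECOLI = 4.6e6  # roughly the E. coli genome length in base pairs
for delta in (10, 15, 20):  # hypothetical energy gaps in kT
    print(delta, round(specific_occupancy(N_ECOLI, delta), 3))
```

Occupancy crosses 50% near `delta_e = ln(N)`, about 15 kT for a genome of this size. A genome 1000 times larger would need roughly `ln(1000)`, about 7 kT, of extra discrimination to reach the same occupancy, which is one way to read the virtue of a small genome.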

## How DNA is like a Magnet

Now that we have piles and piles of widely available genome sequence, one of our main tasks as biologists is to figure out how to read what’s in there. Protein-coding sequences have been relatively easy to read ever since the genetic code was worked out. Non-coding regulatory sequences – enhancers and promoters – are much more difficult to interpret. Usually our first task is to identify the individual binding sites for gene-regulating proteins in these sequences. But then what? Well, most people stop there, happy to have identified the necessary parts of the gene-regulating machinery, but many of us are interested in learning the underlying logic by which this machinery operates – we want to learn the grammar of regulatory DNA. The question is, how does a particular combination of regulatory binding sites give rise to a particular pattern of gene expression? In my biased opinion, this is the real secret of life – how your cells read information in your DNA in order to turn on the right genes in the right place at the right time.
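Identifying individual binding sites usually means scanning a sequence with a position weight matrix. Here is a minimal sketch of a log-odds scan, assuming a uniform 25% background, a pseudocount of 1, and forward-strand scanning only. The sites, the test sequence, and the score threshold are all hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix (bits) from equal-length aligned sites."""
    n = len(sites)
    pwm = []
    for i in range(len(sites[0])):
        column = {}
        for b in BASES:
            count = sum(1 for s in sites if s[i] == b)
            freq = (count + pseudocount) / (n + 4 * pseudocount)
            column[b] = math.log2(freq / background)
        pwm.append(column)
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring at or
    above threshold. The sequence must contain only A, C, G, T."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits

# Hypothetical sites and sequence, for illustration only.
sites = ["GATTAC", "GATTAC", "GACTAC", "GATTAG"]
pwm = build_log_odds(sites)
print(scan("CCGATTACGGGATTAGCC", pwm, threshold=5.0))
```

A real scan would also check the reverse complement and pick the threshold from a background score distribution, not a fixed number.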

So, how do we read non-coding, regulatory DNA? One approach that has proven very useful was developed in the 1920s to understand the physics of magnets. No, I’m not talking about the pseudoscience of biomagnets; I’m talking about Ising models.
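To give a flavor of the analogy, here is a minimal sketch in the lattice-gas form of the Ising model. Each binding site is like a spin that is either occupied (1) or empty (0). Each occupied site contributes a binding weight, and each pair of adjacent occupied sites contributes a cooperativity factor. Summing over all 2^n configurations gives the partition function, from which we get the average occupancy. The site weights and cooperativity values are hypothetical, and brute-force enumeration is only practical for a handful of sites.

```python
import itertools

def config_weight(config, site_weights, coop):
    """Statistical weight of one occupancy configuration.

    config: tuple of 0/1 (empty/occupied) for each site, like Ising spins.
    site_weights: Boltzmann weight of each occupied site (e.g. [TF]/K_d).
    coop: factor applied to each pair of adjacent occupied sites
    (coop > 1 is cooperative, coop < 1 is competitive).
    """
    w = 1.0
    for occupied, k in zip(config, site_weights):
        if occupied:
            w *= k
    for a, b in zip(config, config[1:]):
        if a and b:
            w *= coop
    return w

def mean_occupancy(site_weights, coop):
    """Average number of occupied sites, by enumerating every configuration."""
    z = 0.0      # partition function
    total = 0.0  # occupancy-weighted sum
    for config in itertools.product((0, 1), repeat=len(site_weights)):
        w = config_weight(config, site_weights, coop)
        z += w
        total += w * sum(config)
    return total / z

weights = [0.5, 0.5, 0.5]  # hypothetical per-site weights
for coop in (1.0, 10.0):
    print(coop, round(mean_occupancy(weights, coop), 3))
```

With no cooperativity, each site is occupied a third of the time (mean occupancy 1.0). A cooperativity factor of 10 pushes the same three sites toward filling together (about 2.44). This kind of nonlinear, combination-dependent response is what makes these models attractive for thinking about regulatory grammar.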