On models and misunderstandings

Folks commonly misunderstand what the term ‘model’ means in science, particularly folks whose theological or ideological view of the world leads them to attack mainstream conclusions in climate science or evolutionary biology. This confused comment attacking climate models is fairly typical:

Extrapolation is not fact. It is estimate. And the accuracy is in the eye of the beholder. So if they [the North Carolina legislature] want to legislate HOW to estimate, it is far less controversial than you make it sound. You base estimates on past experience, not models, which is what climate change is really based on, not fact.

This person lacks a coherent notion of what extrapolation, estimate, model, and fact mean in science. Reading the comment in context, this person seems to be defending the idea that a linear fit to your data, used to make predictions, is “extrapolation” from past experience rather than a model, and that it is a more reliable way to do science than using a model. To be fair, this confusion is common, and in my experience the role of models in science is not generally taught well in schools. So let’s talk about the role of models in science.

1) Science is based on models. That includes all science. Models express our beliefs about causal relationships in the systems we study. Any time you deal with causal relationships you generally use a model to capture those relationships in some organized form: the attraction and repulsion between atoms, relationships between reactant concentrations and reaction rates, the diffusion of perfume in a room, the relationships between temperature, ice sheet area, and sea level, the gravitational attraction between masses. All of these are dealt with using models. Those pretty pictures of protein and DNA structures you see in science magazines are models, based on approximations of the behavior of chemical bonds and data from X-ray diffraction experiments. Supply and demand. Volcanoes. The carbon cycle. We use models to study them.

2) Generally when you apply mathematical equations to your data, those equations are your model.* Most (all?) relationships that you’re taught to think of as ‘fundamental laws’ are models, but they happen to be extremely general and typically highly exact, and they usually have a stellar evidential track record. But they are models nonetheless, including Newton’s and Maxwell’s laws. Fundamental laws don’t perfectly express exact fundamental relationships; they are human constructs that represent our best but inexact understanding of what we believe are fundamental relationships.

Scientists make predictions from models. Any extrapolation from your data is based on a model; a linear model like y = mx + b is no less a model than a non-linear model. There is no reason to assume that linear models are a better, a more honest, or a more accurate representation of the real relationship between your variables. In fact, given how non-linear the world tends to be, linear models are usually a less than optimal choice, but scientists like to work with them because they are much more mathematically tractable than non-linear models.
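To make that concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are made up (a quantity that grows 30% per time step), and the helper names `fit_line`, `predict_linear`, and `predict_exponential` are hypothetical. It fits the same six points with a straight line and with an exponential curve, then extrapolates both well beyond the data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up data for illustration: growth of 30% per time step.
xs = list(range(6))
ys = [1.3 ** x for x in xs]

# Model A: assume a linear relationship, y = m*x + b.
m, b = fit_line(xs, ys)


def predict_linear(x):
    return m * x + b


# Model B: assume exponential growth, y = a * exp(k*x).
# Taking logs gives log(y) = log(a) + k*x, so the same line fit applies.
k, log_a = fit_line(xs, [math.log(y) for y in ys])


def predict_exponential(x):
    return math.exp(log_a + k * x)


# Extrapolate far outside the observed range (x = 0..5).
x_new = 15
print("linear model:     ", predict_linear(x_new))
print("exponential model:", predict_exponential(x_new))
print("generating rule:  ", 1.3 ** x_new)
```

Both predictions are extrapolations from "past experience," and both depend entirely on which relationship you assumed. Here the straight line badly undershoots because the made-up data were generated by an exponential rule; with different data the roles could easily be reversed.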

Models very often involve simplification. The real world is complex, but many complexities can be conveniently ignored. (This is particularly true of fundamental laws, which are very idealized representations of the world, particularly the macroscopic world.) The most informative models capture the most relevant, important, and interesting causal relationships of your system of interest. Models that try to capture every complexity become unwieldy and less useful.

3) Models are not the same thing as computer simulations. I occasionally hear climate science derisively denounced because it is based on “computer simulations.” You can use a computer to simulate the behavior of your model, but not all simulations are based on good models. I could build an awesome game physics engine (well, actually I couldn’t, but someone else could), one that accurately simulates the behavior of exploding objects and shattering glass on a modern urban battlefield, without basing the internal workings of that engine on anything resembling our known understanding of physics. In that case, I would have created a cool simulation, but not a genuine physical model.

Scientific simulations, however, are based on models, on quantitative expressions of what we believe to be the key causal relationships. Often our models get so complex that we can’t wrap our minds around them just by looking at the equations, and so we use computers to see how those models behave. Such simulations are not inherently less valid than simple linear models, but in practice it is harder to successfully find good model parameters for complex models. That means the predictions of complex models can be more uncertain. (On the other hand, you can also run into some pretty serious problems with parameter estimation for linear models when you use them to represent complex, non-linear phenomena.)

To sum up, this is how the process works in just about every branch of science:

Step 1: You formulate a model to represent your best understanding of how something works, of the underlying causal relationships. That model doesn’t have to be mathematical, but it often is.

Step 2: You try to match your model to your data, by fitting or training your model on this data. This step lets you plug some real numbers into your model, and you can see if your model behaves at all like the system you are studying.

Step 3: Once you’ve trained your model, you use your model to make predictions or extrapolations. These can be predictions about the future, such as the sea level rise as ice sheets melt, or about areas that haven’t been studied yet, such as the behavior of a genetic variant in a different cell type or treatment outcome of a different patient population.

Step 4: Very important, but too often skipped: check your predictions against reality. If your predictions are about the future, you may have to wait a bit. The success of your predictions will increase your confidence in your model, or cause you to revise it.
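The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are invented numbers, the model is a simple straight line, and `fit_line` is a hypothetical helper, not a reference to any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Invented (time, measurement) pairs, for illustration only.
observations = [
    (0, 2.1), (1, 3.9), (2, 6.2), (3, 7.8), (4, 10.1),
    (5, 12.0), (6, 13.8), (7, 16.1),
]

# Step 1: formulate a model. Here we assume y = m*x + b.

# Step 2: fit the model to the data we have "so far".
train = observations[:5]
m, b = fit_line([x for x, _ in train], [y for _, y in train])

# Step 3: use the fitted model to predict points we did not fit on.
held_out = observations[5:]
predictions = [m * x + b for x, _ in held_out]

# Step 4: check the predictions against what was actually observed.
errors = [p - y for p, (_, y) in zip(predictions, held_out)]
rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5

for (x, y), p in zip(held_out, predictions):
    print(f"x={x}: predicted {p:.2f}, observed {y:.2f}")
print(f"root-mean-square error: {rmse:.3f}")
```

Holding back the later observations stands in for "waiting a bit" in Step 4. A small error on points the model never saw should raise your confidence in it; a large one is a signal to go back and revise Step 1.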

Good scientists use models. Bad scientists use models. There are bad models and good models, but all fields of science depend on models. There is no such thing as a model-free science that is somehow less speculative and closer to the data.

* Obvious exceptions are definitions, like ∆G° = -RT ln K (if you don’t consider this a definition, then at least accept that it can be derived from the definitions of G and K). I would also consider descriptive statistics an exception.

Author: Mike White

Genomes, Books, and Science Fiction
