FoGT vs MMH: Climate Models vs Weather Models, Null Hypotheses vs Bullshit
03/09/10 19:43 Filed in: Gavin Kirsch | WoS
By Gavin Kirsch, FoGT Chief Climate Modeler
A manuscript to be published in popular online weather comic Atmospheric Science Letters by our Friends Ross McKitrick and Steve McIntyre, along with some guy called Herman, has lately been generating some buzz in certain parts of the blogosphere (although not as much as a way funnier one by another couple of statisticians who manage to conclude both that 1997-2006 was 80% likely to have been the warmest 10 years in the last 1000 and that Michael Mann was wrong). These three amigos, MMH - who shouldn’t be confused with MBH, but the H gives them a hint of warmalarmist familiarity and the M&M jokes were getting old - use ‘new’ methods to re-roast an old chestnut (or rather, marshmallow): the comparison between observed and modeled temperature trends in the tropical troposphere.
Non-braindamaged readers might recall that this topic was the subject of some acrimony a couple of years back. Friends of Science hero David Douglass and others cheekily took the mean trend (and only the mean trend) of a number of climate model mean trends and pretended to be amazed when it turned out different from the mean trend of a bunch of observations that were known to be problematic but were nonetheless not assigned uncertainties. IPCC hit-squad Ben Santer and the Gang of 16 responded by showing that a more thoughtful consideration of errors not only made models and observations overlap but also allowed a range of model trends wide enough to please everybody (or ultimately, nobody). In their new paper, MMH have re-examined the models and (updated) observations and have approached model-prediction uncertainty using a couple of techniques they found in econometrics textbooks. While, like Santer et al., they found no model-observation disagreement over the period 1979-1999, they concluded that for the period 1979-2009 the observations tracked significantly below the model outputs.
While econometric approaches can fairly be criticized on the basis that economists, unlike marmots, are completely useless at predicting anything, the latter result has nonetheless led to some anxious hand-wringing in the warmalarmist community. Certainly, a couple of potential problems with the analysis have been identified; and although it’s standard practice, the use of the ensemble mean itself has been criticized on the not unreasonable grounds that there’s no statistical (let alone philosophical) basis for assuming the mean of a distribution of models of an unknown reality should be any closer to that reality than any individual model. FoGT statisticians conclude that these attacks are simply the last desperate cries of corrupt world-government scientists who’ve lost the argument. We do, however, find a previously unseen blunder in the analysis of the three amigos that when corrected illustrates the true significance of their result: they failed to recognize that compared to the climate model set, the observation set is extremely data-limited; and to enable equivalent statistical analyses to be applied to both datasets, many alternative observations need to be generated.
The reason for this is straightforward. The climate system is very complicated, featuring many cyclical processes, themselves internally variable, interacting with each other in different ways to produce even larger variations on short timescales from the daily to the decadal. While climate models attempt to incorporate the processes that lead to these variations, the complexity of interactions among those processes means that no model can predict their exact timing and amplitude. Individual climate models give different results, or ‘realizations’ of short-term variation each time they’re run, and different climate models of course behave differently from each other, so when many runs of many models are put together, the result is a graph of the sort shown below (from a prominent warmalarmist website).

Now, what many people fail to recognize is that the possible variation in the actual climate [what some might call the ‘weather’] is analogous to the variation in the models. Our lack of ability to account for and perfectly quantify every important variable in the global weather system at any one time is very much like a random ‘initialization’ of a climate model run: it permits many different combinations and permutations to happen. This is why the weather, and even major short-term climatic events like ENSO, can only be predicted a short time in advance and with limited precision, and why meteorologists are derided almost as much as economists. In contrast to the multiple outputs that climate models generate, however, we only get to experience one ‘realization’ of the weather - and for comparing actual climatic variation, short term and long, to climate models, this is a problem. Unimaginative people have approached this conundrum by attempting to refine or statistically constrain the climate models. Visionaries that we are, we at FoGT have taken the opposite approach: we have generated multiple realizations of the actual climate. We call this innovation the Weather Model.

While the exact processes involved in making this model are the subject of an upcoming patent application, the principle is very simple: we took the input data - in this case, the 1979-1999 tropical lower troposphere temperature anomaly series from the satellites belonging to our Friend Dr Roy Spencer - and shifted that data up to 60 months in 12 month increments either way. These shifts were designed to reflect the 3-5 year variation in the ENSO cycle, which our Friends McLean, de Freitas, and Carter recently showed to be responsible for the majority of short-term global temperature variation. We regard this (FoGT-1_E5) as a simple, ‘first generation’ weather model. In fact, astute readers might detect that trends drawn from the ‘weather realizations’ comprising it are astoundingly similar to trends taken from different time intervals in a 25.5 to 31.5 year window on the actual observations. Future updates will add sophistication, varying the period and amplitude within each realization, but it can be seen that even this simple version imitates the climate models quite well - notice how the amplitude of short-term variability is similar on both graphs - and it gives us 11 realizations, rather than just one, to compare to the 23 climate model outputs analyzed by MMH.
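For readers who would rather not wait for the patent, the shifting principle can be sketched in a few lines. This is only one possible reading of "shifted": it slides a fixed-length period of interest along a longer monthly series and fits a straight line to each window. The function names, the `pad` argument, and the synthetic series are all hypothetical, and nothing here reproduces the actual FoGT-1_E5 procedure or the real satellite data.

```python
import math


def shift_offsets(max_shift=60, step=12):
    """Shifts in months, either way, including the unshifted case."""
    return list(range(-max_shift, max_shift + 1, step))


def ols_slope(y):
    """Least-squares slope of y against its index 0..n-1 (units per step)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def weather_realization_trends(series, pad, offsets):
    """Trend of each window obtained by sliding the period of interest.

    Assumption: `series` carries `pad` extra months on either side of the
    period of interest, so every shifted window stays inside the data.
    """
    n = len(series) - 2 * pad
    return {s: ols_slope(series[pad + s: pad + s + n]) for s in offsets}


# Synthetic stand-in for a monthly anomaly series: a small linear trend
# plus a 48-month oscillation. Purely illustrative, not real observations.
months = range(372 + 2 * 60)
synthetic = [0.001 * t + 0.2 * math.sin(2 * math.pi * t / 48) for t in months]

trends = weather_realization_trends(synthetic, pad=60, offsets=shift_offsets())
print(len(trends), "realizations")  # shifts of -60..+60 in steps of 12 give 11
for shift, slope in trends.items():
    print(f"shift {shift:+4d} months: {slope * 120:+.3f} per decade")
```

The point of the sketch is simply that the same underlying series yields a different straight-line trend for every choice of window.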
For their analysis, MMH took each climate model output and a number of observational datasets (including the satellite one we used for FoGT-1_E5), determined linear trends for each using their econometric methods, and then compared those trends and their 2-sigma uncertainties. In a critical step, they summarized the behavior of the climate models by doing exactly what our Friends Douglass et al. were so rudely treated for: they took the mean trend of the 23 model trends, estimated the uncertainty of just that mean trend, and used that as the climate model ‘prediction’. The effect of this is shown in the graph below.

The coloured lines are individual model trends from their Table 1, and the black dashed and dotted lines are the mean model trend and +/- twice its ‘standard error’ from their Table 2. Exactly how their ‘standard error’ was determined we are not told, and the fact that it excludes the vast majority of the data supposedly used to create it does seem rather odd, but hey, these guys are professional economists and statisticians, so who are we to argue. McIntyre, on his famous website ClimateIdiot, justifies this procedure with an appropriately schoolboy analogy (which critics might counter with the observation that the composition of the mythical ‘median school’ would be a hopeless predictor of the composition of any randomly-selected real school, but we note with satisfaction that the critics on ClimateIdiot haven’t).
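Since MMH's procedure is not described here, the following is only a guess at why such a band can look so narrow: if it is something like the textbook standard error of the mean (sample standard deviation divided by the square root of the number of trends), it shrinks as more models are added, while the spread of the individual trends does not. The 23 trend values below are invented for illustration and are not taken from their Table 1.

```python
import math
import statistics


def mean_and_two_se(trends):
    """Mean of the trends and the mean +/- 2 standard errors of that mean."""
    m = statistics.mean(trends)
    se = statistics.stdev(trends) / math.sqrt(len(trends))
    return m, m - 2 * se, m + 2 * se


def count_inside(trends, lo, hi):
    """Number of individual trends that fall within [lo, hi]."""
    return sum(1 for t in trends if lo <= t <= hi)


# 23 made-up, evenly spaced trends (hypothetical units: degrees per decade).
made_up_trends = [0.10 + 0.01 * i for i in range(23)]

mean, lo, hi = mean_and_two_se(made_up_trends)
inside = count_inside(made_up_trends, lo, hi)
print(f"mean {mean:.3f}, band {lo:.3f} to {hi:.3f}, "
      f"{inside} of {len(made_up_trends)} trends inside")
```

With these invented numbers only 5 of the 23 trends sit inside the band, which is what one would expect from an uncertainty on the mean rather than on the individual members.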
Obviously, if this is the statistically correct way to represent the climate model data, we should represent the observations the same way before comparing them. Thanks to our Weather Model, we can now, for the first time, do this. The plot below is analogous to the one above. The coloured lines are the linear trends from the weather realizations shown in our first graph, the black dashed line is the mean trend, and the dotted lines are the +/- 2x ‘standard error’ that we estimated, in the absence of knowledge of the three amigos’ procedure, by assuming it encompasses the same proportion of the data spread as their climate model error envelope: about 16%.
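The 16% rule can be sketched as follows, under one assumed reading: the full width of the +/- 2x "standard error" band is set to 16% of the range between the lowest and highest trend, centred on the mean trend. The function names and the 11 trend values are hypothetical and stand in for the real weather realizations.

```python
def spread_envelope(trends, fraction=0.16):
    """Mean trend and a band whose total width is `fraction` of the range.

    Assumption: "proportion of the data spread" means max minus min.
    """
    mean = sum(trends) / len(trends)
    half_width = fraction * (max(trends) - min(trends)) / 2
    return mean, mean - half_width, mean + half_width


def is_outside(trend, lo, hi):
    """True if a single trend falls outside the band."""
    return trend < lo or trend > hi


# 11 made-up realization trends (hypothetical units: degrees per decade).
made_up_trends = [0.05 + 0.01 * i for i in range(11)]

mean, lo, hi = spread_envelope(made_up_trends)
rejected = [t for t in made_up_trends if is_outside(t, lo, hi)]
print(f"mean {mean:.3f}, band {lo:.3f} to {hi:.3f}, "
      f"{len(rejected)} of {len(made_up_trends)} trends outside")
```

A band this narrow will, by construction, leave most individual trends outside it, including the lowest one.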

Readers will immediately notice that the bold pink line, which represents the actual weather over the period of interest (start 1979 to end 2009) is slightly lower than the trend determined by MMH; we suspect the difference is because we used more sophisticated techniques (i.e., the latest version of Excel). Ours is obviously superior because it’s cooler, but for our purposes here the small difference is not significant. Despite the overlap of the higher end of the individual weather trends with the lower end of the individual climate model trends, the means of each dataset are significantly different from each other in terms of their defined ‘standard errors’, thereby confirming the MMH analysis. Even though the Spencer satellite series has the coolest trend of their four observation sets (see their Figure 2), the mean trends of Weather Models built from the others would also be significantly different from the mean climate model trend. We are pleased to be able to confirm the three amigos’ work, and look forward to a Friendly mention on ClimateIdiot soon.
Now, the unimaginative types who would view our Weather Model as just a collection of moving snapshots of the same data series might consider it unjustifiable that trends derived from snapshots up to only 5 years either side of the chosen start and end dates of the 31-year period of interest are declared statistically invalid by our definition of ‘standard error’. Well, don’t blame us. This was exactly the procedure chosen by MMH for their climate model analysis, and as we’ve said, they’re the professionals and we want to compare apples to apples. Those unimaginative types might also notice that not only is the 1979-2009 line outside our ensemble mean uncertainties and must therefore be among the trends that are discarded, it’s also just about the lowest of all the weather realizations. They might therefore wonder aloud whether the three amigos’ choice of the 1979-2009 interval was cherry-picked. Such musings would constitute a very crude attempt at diversion.
So far, so good. But because the mean trends we and MMH have compared were constructed from time series having very considerable internal variability, the curious might be tempted to ask, is there a way to compare variability, rather than so-called trend, and what might be learned from such an exercise? FoGT’s research department has uncovered an obscure (i.e., it didn’t appear on a blog) paper by a couple of obscure (i.e., they’re not bloggers) climate scientists who did just that: Easterling & Wehner 2009. These two worthies took exactly the opposite approach to us and MMH by comparing the observed record to the outputs of individual climate model runs.

The graph on the left, Easterling & Wehner’s Figure 1, shows the NCDC surface temperature record from 1974 to 2009. Highlighted are two sub-decadal intervals that show no significant trend. The graph on the right, their Figure 2, shows 100 years of future output from a single run of the ECHAM5 climate model, forced by the evil IPCC’s A2 emissions scenario (‘business as usual’). Again, highlighted are two intervals (including, oddly enough, 2001-2010) showing no significant trend. Inspection of the graph shows it would be possible to choose others of similar lengths, thereby demonstrating that this climate model produces short-term variability in the presence of a long-term trend.
What happens when all the climate models are analysed for variability on this sort of timescale?

The graph above (Easterling & Wehner’s Figure 3) shows the probability density functions of trends of any 10 year period in the NCDC observations and in all the climate models in the IPCC CMIP3 collection. The pre-industrial ‘control’ model (purple) is symmetrical about zero, showing an equal probability of negative and positive decadal trends in the absence of forcing, consistent with no long-term warming or cooling trend. The black (observations) and green (simulations with 20th century natural and so-called ‘anthropogenic’ forcings, e.g. plant food) curves are offset a bit to the right, suggesting a tendency for more positive than negative decades and an overall positive long-term trend. The blue and red curves, compiling simulations for the first half and the whole of the 21st century respectively, are offset further to the right but still have non-zero probabilities of negative decadal trends. For 2001-2050, this probability is about 10%; and using a similar approach, a blogger showed the probability for 2001-2007 was about 15%. We’re sure even the Friends of Science would admit that 15% is not zero and that the result could be consistent with even their advanced analysis of the observations.
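The idea behind such a distribution can be sketched with a toy series: fit a straight line to every 10-year window and count how many of the slopes are negative. This is an illustration only, not Easterling & Wehner's method or data; the helper names and the synthetic annual series (a steady rise plus an 11-year wobble) are assumptions made for the example.

```python
import math


def ols_slope(y):
    """Least-squares slope of y against its index 0..n-1 (units per step)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def window_trends(annual, window=10):
    """Trend of every consecutive `window`-year stretch of an annual series."""
    return [ols_slope(annual[i:i + window])
            for i in range(len(annual) - window + 1)]


def fraction_negative(trends):
    """Share of trends that are below zero."""
    return sum(1 for t in trends if t < 0) / len(trends)


# Synthetic 100-year series: long-term warming plus short-term variability.
synthetic = [0.02 * t + 0.15 * math.sin(2 * math.pi * t / 11)
             for t in range(100)]

trends = window_trends(synthetic)
print(f"{len(trends)} ten-year windows, "
      f"{fraction_negative(trends):.0%} with a negative trend")
```

Even a series that rises steadily over the century can contain individual decades with a flat or negative fitted trend, which is the behaviour the curves above summarize.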
Given such a result, what can we conclude about comparisons of climate models with observations over a selected ~30 year period? Obviously, the particular period analyzed is very important. If it contains an interval of about 10 years of no or negative trend, comparing it to an averaged trend of a large ensemble of climate models with similar multidecadal trends but interannual to decadal trends that don’t correspond with each other is as misguided as expecting that individual moving snapshots of the actual observations should have the same trend. Our Friends MMH did this explicitly when they formulated their null hypothesis that the average model trend should be equivalent to the trend in observations over a 30-year period and assumed that trends in both models and observations should be described by single straight lines over that period. This could have been simple naivety, or it could, given their record and the possibility of cherry-picking we noted above, be given the far more satisfying (to us deniers) interpretation of bullshit.
On the latter question, we’ll let our readers decide. Meantime, we’ll be writing to Atmospheric Science Letters informing them of our replication of the three amigos’ result and offering them the opportunity to be the first peer-reviewed journal to publish research that uses our new climatological innovation, the Weather Model. We’re sure they’ll jump at the chance; we’ll let you know.



