Keenan Confirmed!!! Met Position Laid To Utter Waste!!

Editor Note: This post is written by Hank. Anyone familiar with Hank’s style and demeanor knows he’s not given to bombastic statements. However, anyone who reads this post can plainly see how entirely inept our climate science community is. The point isn’t so much that Keenan is correct, although on this point he most assuredly is. The point is that the Met and the other groups of climate science nutters have been entirely wrong... for years!! The stubborn inability of the cli-sci community to accept proper criticism, over decades, has rendered them useless for science advancement and has relegated them to nothing more than political advocacy. This particular episode is a damning example of just that. - James “Suyts” Sexton

Guest post by Hank

The Battle of the Models – Keenan vs. Met

The latest contestants to enter the climate debate stage are mathematician Douglas J. Keenan and the UK’s Meteorological Office or Met for short. On November 8, 2012, Lord Donoughue, a member of the UK Parliament posed the question:

To ask Her Majesty’s Government … whether they consider a rise in global temperature of 0.8 degrees Celsius since 1880 to be significant. [HL3050].

The Met Office answered, “the temperature rise since about 1880 is statistically significant,” meaning that the temperature rise since 1880 is attributable to human-caused global warming.

Keenan challenged the Met Office’s reply to Parliament on the basis that the model chosen by the Met’s chief scientist, Dr. Julia Slingo, was not suited to answer the question. Keenan offered up his own choice of model that he claims is much better suited. As you might imagine, a series of back-and-forth opinion editorials and commentaries ensued, which sparked a rather heated debate.


Douglas Keenan – Editor of

Much has already been written about the tone of the debate, Met’s failure to provide specifics on their claims of statistical significance, and of Keenan’s brazen incitements of Met’s Chief Executive Officer, John Hirst and leading scientist Dr. Slingo. It is the ruckus between Keenan and the Met that has captured most people’s interest and is the basis of most blog articles on the matter.

Being a numbers guy, I’m less interested in the tone of the debate and more interested in the models that the Met and Keenan espouse. I’ll lay out the strengths and weaknesses of the two models and offer a closing assessment of which model is more suited to determining the statistical significance of the temperature rise of 0.8 degrees C since 1880.

Statistical significance is a somewhat nebulous term when it comes to non-stationary time series data such as the temperature record. Most measurements of processes in nature will show statistically significant results if the time series is long enough simply because Mother Nature does many cyclical or curvilinear things in big ways. The following graphic illustrates this.


The Pacific Decadal Oscillation is a natural cycle with significant trends

In medical research, statistical significance is usually determined by comparing a control group to an experimental group. However, in climate research there is no control group, making the testing for statistical significance a more mathematically tangled affair, which sparks much disagreement among scientists on how to best do it.

The central debate between Keenan and the Met Office revolves around which model provides a better “goodness of fit” (GOF) to the temperature record and has the best predictive skills from which to infer statistical significance. Rather than focus on the question of statistical significance in post-industrial temperatures, which I’ll leave to Met and Keenan to debate, this article takes an objective look at the two models they espouse and why one model is better suited to determine statistical significance.

The Met uses the AR(1) model. They claim their model shows statistical significance in current temperature trends, meaning recent temperature trends are being caused by something other than the natural variances we’ve seen in the temperature swings. Keenan’s charge is that AR(1) is not robust enough to make such determination. He espouses the ARIMA(3,1,0) model which he claims is more robust. He claims his model shows today’s temperature trends are not statistically significant and more likely caused by natural variability.

Decomposing The Models

Both models begin with the letters AR, meaning that both models employ Auto-Regression, a stochastic process that combines a weighted sum of previous measurements in the time series with a white noise error term to produce a predictive output for the next measurement. In the case of Met’s AR(1) model, the “1” means it is a “first-order autoregression” model. In practical terms, first order means the temperature from January of last year has a direct impact on January’s temperature for this year but the temperature from January two years ago or earlier has no direct relationship to this year.
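For readers who like to see the mechanics, here is a minimal Python sketch of a first-order autoregression. It is an illustration only, not the Met's code: the coefficient phi = 0.6, the noise level, the seed, and the function names are all assumed for the example.

```python
import random

def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t], where e[t] is Gaussian white noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        # Only the immediately preceding value enters the equation directly.
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def lag1_autocorr(x):
    """Sample correlation between each value and the one before it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) * (v - mean) for v in x)
    return num / den

series = simulate_ar1(5000, phi=0.6)
print(round(lag1_autocorr(series), 2))  # should land near the assumed phi
```

With a long enough simulated series, the lag-1 correlation recovers a value close to the phi that generated it, which is all "first order" means here.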

The use of the AR(1) model is premised on the assumption that changes in temperatures have only first order relationships as described above. That’s a rather bold and unjustified assumption in my opinion. There are known natural cycles and events, lasting from a few years to many decades, which affect the earth’s temperature. The notion that temperatures more than one year back can have some effect on today’s temperature seems intuitive to me. This is one of Keenan’s criticisms of the AR(1) model – a criticism that I agree with.

The ARIMA(3,1,0) model proposed by Keenan is parameterized to use third-order autoregression as indicated by the value three (3). Keenan’s model assumes the three previous temperature measurements and the white noise term contribute to the current temperature measurement. If the sum of the three previous temperature measurements is positive, the output will resemble a low pass filter with the high frequency part of the white noise decreased in amplitude. We’ll see that low pass filtering effect when we start looking at comparative graphs of AR(1) vs. ARIMA(3,1,0) model output.

There are three parts to the model. AR (AutoRegression) has been discussed above. The “I” (Integrative) term indicates that the model differences the data to change it from a non-stationary to a stationary structure, making it more suitable for autoregression. Keenan’s model applies one (1) order of differencing. The MA (Moving Average) term indicates the model applies a moving average to its output. Because Keenan’s value is zero (0), no moving average is applied to the model’s output.
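The three parts can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Keenan's fitted model: the three coefficients in phi and the toy temperature values are hypothetical, and the helper names are mine.

```python
def difference(x):
    """The "I" step with d = 1: replace the series by its year-to-year changes."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def arima310_one_step(x, phi):
    """One-step forecast for a driftless ARIMA(3,1,0).

    The AR(3) part is applied to the differenced series, and the predicted
    change is then added back onto the last observed level. There is no MA
    part because q = 0.
    """
    d = difference(x)
    # phi[0] weights the most recent change, phi[1] the one before, and so on.
    predicted_change = sum(p * d[-(i + 1)] for i, p in enumerate(phi))
    return x[-1] + predicted_change

# Hypothetical anomalies and coefficients, for illustration only.
temps = [0.10, 0.15, 0.12, 0.20, 0.18]
phi = (-0.4, -0.3, -0.2)
print(round(arima310_one_step(temps, phi), 3))
```

Because the forecast is built on the last observed level rather than on a fixed mean or trend line, the model follows the series wherever it wanders.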

Keenan’s model is flexible and more adaptive but considerably more complex. This is the chief criticism of Met regarding Keenan’s choice of model. I’ll be exploring that criticism shortly.


The Battle of the Met’s AR(1) and Keenan’s ARIMA(3,1,0) Models

To determine “the better model,” there are a number of objective measures we can look at to assess each model’s parameterization, goodness of fit to observed temperatures, and predictive skills. In Keenan’s supplementary material, he explains that he tested the Met’s AR(1) and his preferred ARIMA(3,1,0) models against the GISS temperature series. In Met’s response to Parliament on Keenan’s criticisms, they used the HadCRUT4 temperature series.

I personally don’t think which temperature series is used to test the models matters. If it does, then we have other systematic issues to deal with that go beyond the choice of models. For the purpose of this article, I’ll use the latest GISS GHCNv3 temperature series to compare the two models. All of my statistical analysis is done using the statistical package R with the “forecast” library – a package designed for AR and ARIMA modeling.

Let the Analysis Begin!

Test for Statistical Significance – I performed appropriate tests to determine if the results I was getting were, themselves, statistically significant and could be relied upon. Again, I’m not testing the statistical significance of the temperature series but rather I’m testing the reliability of the tests I’m performing on the models. This is standard practice in mathematics. In most cases an ANOVA test was performed. In all cases, I tested at the 95% confidence level (alpha = 0.05, so a result counts as significant when p < 0.05). The tests I performed on both models were as follows:

Sigma^2 – When we compare the fit of the models to the data, we look for the smaller variance of the residual (error) of the model. The residual represents the portion of the data the model can’t explain. By “can’t explain,” I mean the model fails to account for. The lower the residual, the better the model fits the data and explains it. The variance of the residual (the square of its standard deviation) is called Sigma^2.

Model                    Sigma^2 Value   Winner
Met’s AR(1)              102.8           ✗
Keenan’s ARIMA(3,1,0)    84.6            ✓

Akaike Information Criterion test with correction (AICc) – AICc is a method of selecting a model from a set of candidate models. The best model is one that minimizes the Kullback-Leibler distance between the model and the data. Kullback-Leibler distance measures the difference between the probability distribution of the model and the actual temperature series distribution. The model that has the best GOF (or least divergence between the two distributions) is the better model, so the lower AICc value wins.

Model                    AICc Value   Winner
Met’s AR(1)              1002.06      ✗
Keenan’s ARIMA(3,1,0)    969.15       ✓

Bayesian Information Criterion (BIC) – The chief criticism Met levies against Keenan’s choice of model is that the greater number of parameters used in his chosen model makes it unnecessarily complicated and prone to error. The BIC test is similar to the AICc test but penalizes over-parameterization (being unnecessarily complicated for the data under analysis) more heavily. It is a better test when comparing models with differing orders of parameterization as in our testing. As with AICc, the lower value wins.

Model                    BIC Value   Winner
Met’s AR(1)              1010.54     ✗
Keenan’s ARIMA(3,1,0)    980.36      ✓
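To make the parameter penalty concrete, here is a small Python sketch of the textbook AICc and BIC formulas. The log-likelihoods, parameter counts, and sample size are hypothetical numbers chosen for illustration. They are not the values behind the tables above, and software packages differ on exactly which parameters they count in k.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits: the richer model has two more parameters and a higher likelihood.
n = 50
simple = {"loglik": -100.0, "k": 2}
richer = {"loglik": -95.0, "k": 4}

for name, m in (("simple", simple), ("richer", richer)):
    print(name, round(aicc(m["loglik"], m["k"], n), 2), round(bic(m["loglik"], m["k"], n), 2))
```

Each extra parameter costs ln(n) points under BIC but only about 2 under AIC, so once n is larger than about 7, BIC is the harsher judge of complexity. A more complex model that still wins on BIC has earned its extra parameters.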

Coefficient of Determination (R^2) and Residual (1-R^2) – The R^2 measure tells us what percentage of the variance in the temperature trend can be explained by the model. It is a measure of how well the model is designed and parameterized to account for the variance and trend in the data. The residual of the coefficient, 1-R^2, tells us what percentage of the variance is due to residual (error) and is not accounted for by the model. The higher the R^2 and the lower the 1-R^2, the better the model is designed and parameterized.

Model                    R^2      1-R^2    Winner
Met’s AR(1)              87.68%   12.32%   ✗
Keenan’s ARIMA(3,1,0)    90.10%   9.90%    ✓
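Both the residual variance (Sigma^2) and R^2 come from the same ingredients: the observed values and the model's fitted values. The Python sketch below shows the arithmetic on four made-up data points. The numbers and function names are assumptions for illustration, and dividing the squared error by n (rather than by a degrees-of-freedom figure) is a simplification.

```python
def residual_variance(observed, fitted):
    """Mean squared residual: the part of the data the model fails to account for."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    return sum(r * r for r in residuals) / len(residuals)

def r_squared(observed, fitted):
    """Share of the variance in the observations that the fitted values explain."""
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # unexplained
    sst = sum((o - mean) ** 2 for o in observed)               # total
    return 1.0 - sse / sst

# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, fitted)
print(round(residual_variance(observed, fitted), 3), round(r2, 3), round(1.0 - r2, 3))
```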

F Ratio – The F ratio is part of the ANOVA test of the model’s performance. It tells us the ratio of the variability of the trend explained by the model to the unexplained variability of the trend, each divided by the degrees of freedom (df) of autoregression used by the model. The higher the F ratio, the more useful the model is to analyze the temperature data.

Model                    F Ratio   Winner
Met’s AR(1)              255.980   ✗
Keenan’s ARIMA(3,1,0)    383.561   ✓

Lag Correlations – The lag correlations tell us how well the parameterizations of the models perform their autocorrelation to the time series data. There are two objective measures of lag correlation: ACF (residual AutoCorrelation Function) and PACF (residual Partial AutoCorrelation Function). Collectively, these two measures determine how well the parameters to be used for the “AR”, “I”, and “MA” functions perform for the respective models. They can also be used to test for the best parameters to be used. ACF and PACF are best analyzed visually in graph form.


I could write a book about how to interpret ACF and PACF. So, to keep it simple we’ll look at one important aspect of the above graphs. Met’s AR(1) model is on the left side with ACF at the top and PACF at the bottom. Keenan’s ARIMA(3,1,0) model is on the right. Notice that each graph has dotted blue lines. This is referred to as the cutoff band (an approximate 95% significance bound). When a model’s parameterization is optimum, its lag values will remain inside the cutoff. A special note is in order here. In R, a Lag 0 line is drawn for the ACF graph that extends well outside of the cutoff lines. This is not a problem with the model: the autocorrelation of any series with itself at lag 0 is always exactly 1, so that line can be ignored.

Note that Met’s AR(1) model makes excursions outside of the cutoff lines. Keenan’s ARIMA(3,1,0) model’s lag values remain inside the cutoff lines, indicating that Keenan’s model autocorrelates better to the time series data.
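For the curious, the ACF and its cutoff band are easy to compute by hand. The Python sketch below uses simulated white noise in place of real model residuals, and the band of 1.96 divided by the square root of n is the usual large-sample approximation for a 95% bound. The seed, sample size, and function names are assumptions for illustration.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def cutoff(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)

# Stand-in residuals: pure white noise, which a well-specified model should resemble.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

r = acf(noise, 10)
band = cutoff(len(noise))
outside = [k for k in range(1, 11) if abs(r[k]) > band]
print("lag 0:", r[0])
print("lags outside the band:", outside)
```

Lag 0 always comes out as 1 because a series is perfectly correlated with itself, so only lags 1 and up say anything about the model. Even pure noise will poke outside a 95% band about one time in twenty, so a single small excursion is not damning, but several large ones are.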

Diebold-Mariano Test – This tests the null hypothesis that the two models perform equally well against the same time series data. The test compares the residuals (error) of each model. The results of the Diebold-Mariano test are as follows:

DM = -2.4648

Forecast horizon = 1

Loss function power = 2

p-value = 0.01499

alternative hypothesis: two.sided

The Diebold-Mariano test is statistically significant (p = 0.015), meaning we reject the null hypothesis that both models perform equally. The DM value is negative, indicating that Keenan’s ARIMA(3,1,0) model performed better than Met’s AR(1).
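The idea behind the statistic can be sketched in Python. This is a simplified version under stated assumptions: a forecast horizon of 1, squared-error loss (loss function power = 2), and a normal approximation for the p-value. R's implementation differs in its details, so this sketch will not reproduce the numbers above exactly, and the error values below are made up.

```python
import math

def diebold_mariano(e1, e2):
    """Simplified DM test for horizon 1 with squared-error loss.

    Returns (statistic, two-sided p-value). A negative statistic means the
    first set of errors is smaller on average than the second.
    """
    d = [a * a - b * b for a, b in zip(e1, e2)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / n
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, normal approximation
    return stat, p_value

# Made-up forecast errors, for illustration only.
errors_a = [0.1, -0.2, 0.1, 0.3, -0.1, 0.2]
errors_b = [0.5, -0.6, 0.4, 0.7, -0.5, 0.3]
print(diebold_mariano(errors_a, errors_b))
```

The sign of the statistic depends on which model's errors are passed first, so it only says which model did better once you know the order of the arguments.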

The Box-Ljung Test for Residual Dependency – The Box-Ljung test is a diagnostic tool used to test the null hypothesis that the model does not exhibit a lack of fit to the time series. The test is applied to the residuals of a time series after fitting an ARIMA(p, d, q) model to the data. The test examines autocorrelations of the residuals for dependency. If the autocorrelations are very small, we conclude that the model does not exhibit significant lack of fit.

This is somewhat inverse logic and a little bit difficult to follow. Ideally, you want the test to fail to reject the null hypothesis (a p-value > 0.05), meaning the residuals show no significant autocorrelation and the model has captured the structure in the data (I’m oversimplifying here but hopefully you mathy people get the point).

Model                    Box-Ljung p-value   Winner
Met’s AR(1)              p = 0.006451        ✗
Keenan’s ARIMA(3,1,0)    p = 0.8287          ✓
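Here is a bare-bones Python sketch of the Box-Ljung calculation. It makes two simplifying assumptions: the degrees of freedom are set equal to the number of lags (a fitted ARIMA model would normally have its parameter count subtracted), and the chi-square tail probability uses a closed form that only works for an even number of degrees of freedom. The residuals are synthetic and deliberately autocorrelated.

```python
import math

def ljung_box_q(residuals, lags):
    """Q = n(n+2) * sum over k of r_k^2 / (n-k), with r_k the lag-k autocorrelation."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Synthetic residuals with obvious leftover structure (a slow sine wave).
bad_residuals = [math.sin(i / 5.0) for i in range(100)]
q = ljung_box_q(bad_residuals, 10)
print(q, chi2_sf_even(q, 10))
```

Residuals with leftover structure produce a large Q and a tiny p-value, which is the pattern seen for AR(1) in the table. A model whose residuals look like white noise produces a small Q and a large p-value.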

The Proof Is In the Tasting of the Pudding

So far, I’ve thrown out a bunch of model tests and made my best attempt to explain what they mean. If you’ve reached this place in the article, congratulations, you managed to stay awake, LOL! For those who found the above boring, stick around. It’s going to get more fun from here.

The proof is in the tasting of the pudding. Both the Met’s AR(1) and Keenan’s ARIMA(3,1,0) models are time series predictive models, meaning they are mathematical crystal balls that can tell the future, given some hint of the past.

This is what the models render when we plug all of the temperature data into them:


The actual GISS GHCNv3 temperature trend is the black line in the graphic. Met’s AR(1) model output is in red. Keenan’s ARIMA(3,1,0) model output is in green. If you study this graph closely, you might notice something odd towards the current timeline in the graph. Here, let me show you:


One of the criticisms of the AR(1) model is that its ability to follow a non-stationary trend falls off as the time series gets longer. Above, you can see that up until 1960, the AR(1) model tracks the actual temperature trend fairly well. However, as we approach the year 1990 and forward, it starts to drift negative in relation to the actual trend. It is a “drifting” model whereas Keenan’s ARIMA(3,1,0) is driftless.

Earlier in this article, I mentioned that the ARIMA(3,1,0) model uses third-order autoregression. In mostly positive or mostly negative trends it tends to act somewhat as a low pass filter, meaning it favors the lower frequencies of the trend and tends to be less reactive to the high frequency variability components of the trend. You can see that the green line of the ARIMA(3,1,0) model is less variable than the AR(1) line. The key performance indicator of these models is how well they predict the trend and not so much how well they account for high frequency components in the trend.

Now, we’re going to see how well the two models serve as crystal balls. In this next test, I removed the years 2002 through 2012 from the GISS GHCNv3 temperature series. I intentionally selected 2002 as my truncation date because the years from 1997 through the end of 2001 provide a “hint” that the rate of global warming was beginning to stagnate. I wanted both models to have equal opportunity to get the hint.
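The mechanics of this kind of holdout test can be sketched in Python. Everything here is a toy under loud assumptions: the series is invented (a steady rise followed by a flat stretch), the two forecasters are simple stand-ins rather than the fitted AR(1) and ARIMA(3,1,0) models, and phi = 0.6 is arbitrary. The series shape was chosen to show the procedure, not to prove anything about the real models.

```python
def holdout_mae(series, n_holdout, forecaster):
    """Hide the last n_holdout points, forecast them, and score the forecast."""
    train, actual = series[:-n_holdout], series[-n_holdout:]
    predicted = forecaster(train, n_holdout)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n_holdout

def last_level_forecast(train, horizon):
    """Driftless stand-in: carry the last observed level forward."""
    return [train[-1]] * horizon

def mean_reverting_forecast(train, horizon, phi=0.6):
    """Stationary AR(1) stand-in: each step decays toward the training mean."""
    mean = sum(train) / len(train)
    preds, last = [], train[-1]
    for _ in range(horizon):
        last = mean + phi * (last - mean)
        preds.append(last)
    return preds

# Invented series: 100 rising points, then 11 flat points to be held out.
ramp = [0.01 * t for t in range(100)]
toy_series = ramp + [ramp[-1]] * 11

print(holdout_mae(toy_series, 11, last_level_forecast))
print(holdout_mae(toy_series, 11, mean_reverting_forecast))
```

The point of the design is that the forecaster never sees the held-out years, so the score reflects prediction rather than curve fitting.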

The following two graphics overlay the model’s predictions over the actual temperature trend which includes the 2002 through 2012 actual temperature anomalies.

First up… Keenan’s ARIMA(3,1,0) model prediction for the years 2002 through 2012:


And now, Met’s AR(1) model’s prediction of temperatures from 2002 to 2012:


In both graphics, the dark blue lines are the model’s temperature prediction. The shaded bands around the prediction lines are prediction intervals. Because a higher confidence level always produces a wider band, the narrower light blue bands are the 90% interval and the still wider gray bands are the 95% interval.

It is obvious that Keenan’s ARIMA(3,1,0) model took the hint I provided and modeled the temperatures from 2002 to 2012 with an accuracy of close to 0.05 deg. C. Met’s AR(1) model completely missed the hint and assumed that temperatures would follow the momentary down trend that was occurring at the truncation of the temperature series.

An interesting surprise to me was that Keenan’s ARIMA(3,1,0) model actually predicted the two peaks and one trough in temperature trends during the prediction period, albeit somewhat muted as expected. You can see the two small peaks and single trough in the dark blue line in the ARIMA(3,1,0) model prediction. Well done!

Is There a Better Model?

A model can always be improved upon. We hear that from climatologists all the time, right? Well, actually it’s true. Referring back to my discussion of ACF and PACF, I played around with the ARIMA model class and found a set of parameters that performed slightly better than Keenan’s ARIMA(3,1,0) model. My model was the ARIMA(0,1,1).

When I say slightly better, I mean my ARIMA(0,1,1) model failed several tests for significance in performance against Keenan’s ARIMA(3,1,0) model, meaning that while some of my key performance metrics were ever so slightly better, they weren’t better enough to be of any interest outside of the fun I had playing with it. Here’s a graph of my model compared to Met’s and Keenan’s.


The black line is the actual temperature trend. Red is Met’s AR(1) model. Purple is Keenan’s ARIMA(3,1,0) model, and green is my ARIMA(0,1,1) model. As you can see, my model tracked Keenan’s almost identically and was not significantly different in performance. I ran the 2002 to 2012 predictive test on my model and, while I got a linear trend similar to Keenan’s, mine failed to predict the peaks and troughs because it lacked autoregression to learn patterns from the past.

Which Model Won?

Forgetting about my uninteresting model, we’re going to come full circle to the central debate: which model is better for determining statistical significance in answer to Parliament’s question?

Rather than putting my dog in the fight, I’ll let Met answer the question. The Met opens their response to Her Majesty’s Government with the following:

This briefing paper has been produced to provide background information relating to analyses undertaken in response to a series of Parliamentary Questions on the use of statistical models to assess the global temperature record, and to address misleading ideas currently appearing in various online discussions.

The closing phrase above, “misleading ideas currently appearing in various online discussions,” obviously refers to Keenan (he who shall not be named). But in a breathtakingly contradictory statement, Met concludes that Keenan’s critique of their choice of the AR(1) model holds merit:

The results show that the linear trend model with first-order autoregressive noise [Met’s model] is less likely to emulate the global surface temperature timeseries than the driftless third-order autoregressive integrated model [Keenan’s model]. … This provides some evidence against the use of a linear trend model with first-order autoregressive noise [Met’s model] for the purpose of emulating the statistical properties of instrumental records of global average temperatures, as would be expected from physical understanding of the climate system.

Oops! Talk about an awkward moment. Then in a face saving move, the Met closes with this amazing statement:

These results have no bearing on our understanding of the climate system or of its response to human influences such as greenhouse gas emissions and so the Met Office does not base its assessment of climate change over the instrumental record on the use of these statistical models.

Translation: “We don’t need no stinking statistical models to determine statistical significance. We can determine it by eyeballing the data and you’ll just have to take our word for it.”

What Met has done is state to Parliament that the rise in global temperature of 0.8 degrees Celsius since 1880 is statistically significant, while conceding that Keenan is right in his criticism that the model they use is useless in making that determination. But no worries, they don’t need statistical models to determine statistical significance.

Do you have any idea how incredibly stupid that sounds to a statistician? Statistical significance can only be determined by using statistical tests. That’s why the term “statistically” is in front of “significant.” That was the whole point of Parliament’s question.

I’m off to the grocery store to buy more popcorn. The ongoing debate between Keenan and the Met Office isn’t over yet. It will be interesting to watch how it unfolds.


105 Responses to Keenan Confirmed!!! Met Position Laid To Utter Waste!!

  1. kim2ooo says:

    Reblogged this on Climate Ponderings and commented:
    Thank you Mr Hank! 🙂
    Ya gotta love Suyts Space.

  2. DirkH says:

    “These results have no bearing on our understanding of the climate system”

    Why am I not surprised.
    “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”
    – Upton Sinclair

    Coming from signal processing, I would say that AR(1) can be represented by a one pole digital filter, while ARIMA(3,1,0) can be represented by a three pole filter. This has dramatic implications; a one pole low pass filter has a steepness of 6 dB/octave while a three pole filter has up to 18 dB/octave. That means it cuts off high frequency noise much more sharply from the cutoff frequency upwards. The higher the frequency, the more it is dampened, and this dampening increases three times as fast with frequency for the 18 dB/octave filter.

    And if that gives the better fit, so be it. It’s not like a 3 pole filter is terribly complex. It’s still far simpler than any GCM, and has far fewer parameters than a GCM that you could tune to overfit a curve.

    • suyts says:

      Yes, apparently, they understand our climate system sans appropriate metrics.

    • DirkH says:

      One would have to build a PLL-style or more generally Kalman filter like architecture around the simple lowpass filters I mentioned to arrive at computational equivalents of the predictive models. Just for completeness.

      • kim2ooo says:

        Hey! I know what a PLL is and how it works.

        • kim2ooo says:

          My Brother taught me – Amateur Radio

        • DirkH says:

          A PLL is a special example of a Kalman filter.

        • kim2ooo says:

          My Brother taught me how to “slip” a Phase Lock Loop [ PLL ] on a Yaesu to gain locked 27 MHz bands

        • kim2ooo says:

          Thanks for the UTUBE

        • HankH says:

          Kim, were you playing around in the upper portion of the 27 MHz band? I’ll bet you were running single sideband 😉

        • kim2ooo says:

          Lower side band 🙂

        • HankH says:

          My rig was a Drake TR-4C with one of those Astatic D104 “chicken choker” mics. Neutralize the finals and change the mic battery every so often and the rig was good to CQ DX…

          And yes, after PLL went digital, I confess to building one or two binary adders with frequency readout to replace the selection dial on the radio that bypassed all the good frequencies. That and extending the range of the analog frequency offset to get away from those pesky upper sidebanders.

        • kim2ooo says:

          My Brother has / had a Silver Eagle D104.

          Transceiver was 10 watts out into 1500 amp… with moonraker dual phased directional antenna…directed by a ham4 rotor. Shure mic – swr meters and freq counter. 🙂


        • HankH says:

          A SSB note you might find interesting. I got a waiver from the FCC to run an experimental ACSB (amplitude companding side band) test for business mobile communications where a sub audio frequency was inserted, allowing the receiver to auto-tune then notch filter it out. It got nearly double the range as FM but was too sideband sounding to have commercial appeal. Thus, closed a very brief chapter in radio history.

    • HankH says:

      You’re right, there is some low pass filtering taking place, more so in the ARIMA(3,1,0) model. The filtering occurs only when all of the lag measurements are either positive or negative. If you have, say one positive measurement followed by a negative measurement, then the model does not filter. So, think of it as sort of a selective filtering process.

    • HankH says:

      I should mention also that the filtering doesn’t take place as an input function of the model nor at the integrative term. It occurs as a net result at the output of the autocorrelation term.

  3. Stephen Richards says:

    I just love that analysis but why oh why have the crimatologists not done it. The beauty of maths is that it lends itself to rigid and definitive testing. Yes you have to make assumptions here and there to find an end point but hell it’s the best language with which to prove your point within reasonable bounds.

    • suyts says:

      Indeed. Welcome Stephen. But, yes, why haven’t they, especially when many voices were screaming that they were doing it wrong? I’d say they don’t care about facts. They don’t care about the numbers. They’re only interested in advocacy.

    • kim2ooo says:

      Because they practice arbitrary – inference.

    • HankH says:

      Yes, that is the beauty of math. It allows for assumptions while expressing concepts in rigid terms with bounds.

      Interestingly, Keenan wasn’t the first to criticize the AR(1) model. He was the first, to my knowledge, to turn it into a public debate and offer up his own model that, in my opinion, is superior.

      Dr. Gavin Schmidt, a chief proponent of AGW, criticized the AR(1) model:

      The results in S07, both for annual and monthly data sets, show a distinctly increasing ratio as a function of lag; the estimated time scale is, for low lag, roughly proportional to the lag. This casts severe doubt on the applicability of equation (1), which is the basis for the estimate of the time constant. The monthly data result (Figure 7 in S07) is in clear contradiction to the AR(1) model.

      He and others critical of the AR(1) model were ignored by the IPCC, leading other organizations to follow the IPCC lead and use it anyway.

  4. tckev says:

    Excellent work Hank, and well explained. It’s been a while since I touched such things.
    From what I read of Keenan’s work he has the upper hand in the math and you have shown it. The Met Office has been hiding behind Parliamentary obfuscation and wordy distractions.
    As you say more popcorn to be ordered as this story has legs.

    • HankH says:

      Thanks tckev. Keenan does indeed have the upper hand. I find Met’s response to Keenan’s criticisms somewhat breathtaking in how they weasel around giving direct answers.

  5. cdquarles says:

    Fascinating. Nevertheless, even given IR active gases in the atmosphere, I find it puzzling that people want to discount the ‘weather’ system, which we see every day. Contrast that with the putative ‘climate’ system, which can’t or hasn’t been shown to really exist apart from the weather.

    • HankH says:

      I find it puzzling that people want to discount the ‘weather’ system, which we see every day.

      That’s because they would rather believe their computer models (virtual reality) than look out the window (actual reality). The two worlds are very different.

  6. Martin A says:

    “In the case of Met’s AR(1) model, the “1” means it is a “first-order autoregression” model. In practical terms, first order means the temperature from January of last year has a direct impact on January’s temperature for this year but the temperature from January two years ago or earlier has no direct relationship to this year.”

    Like another commenter, I also come from signal processing and agree that a first-order autoregression model is equivalent to a one-pole digital filter. Therefore I think the above could usefully be reworded to clarify, as a one-pole filter’s current output is affected directly by *all* of the previous input values, but with a geometrically diminishing factor, since it only “remembers” its previous output value.

    • HankH says:

      Thanks for your comment Martin. While you’re right in terms of resultant behavior of AR(1), the filtering behavior becomes much more complex and adaptive once you go beyond first order. That, and what filtering occurs is a resultant output rather than an actual term in the models. As such, I resisted the temptation to describe it as unit pole filter.

      The original draft of the article was far more technical, making the article too lengthy. This final version is about 2/3 the length of my first draft. To cut it down to size, I had to choose in many cases a more abbreviated, pragmatic presentation over highly technical discussion.

      Being a research scientist, it is sometimes difficult for me to reduce intricacies down to simple statements. There are times I could do better as is the case you point out.

    • DirkH says:

      That depends on whether it’s an IIR or a FIR filter. From the definition of ARIMA
      “The ARIMA model can be viewed as a “cascade” of two models. The first is non-stationary:”

      Y_t = ( 1-L)^d X_t (see wikipedia for less mutilated formula)

      where X_t is the time series and Y_t the output of this first model one sees that this “moving average” part of the model is nonrecursive ( Y(t-1 etc is not used as input to compute Y(t) ).

      So it’s a FIR filter.

      • DirkH says:

        ..hmm, but what they call autoregressive forecasting introduces a recursion into the future relying on previously forecast values. Ok; that’s what would be equivalent to the loop of a Kalman filter. We get a recursive behaviour but with a FIR filter as part of its core. As Hank says, it gets a little complicated.

      • Martin A says:

        AR = IIR
        MA = FIR

      • HankH says:

        Keep in mind MA doesn’t play a role in either of the compared models due to the zero parameter for the MA term.

        • cdquarles says:

          The key that I took from this is the ‘I’, or integral term. It is also nearly tautological that this January’s overall weather will be similar to last January’s, given orbital mechanics and the fact that the main power source’s net received power is determined by said orbital mechanics plus the realized local weather. There will be some variation around the mean because of ‘random’ weather; but the bounds are fairly fixed (that is, for a specified location, certain realized weather states are possible and some are not).

        • HankH says:

          True, the primary bounds are fairly fixed out to last January and several Januaries before. Then superimposed on the fixed bounds are weather variability and longer term cycles like ENSO, PDO and other “rhythms” of multi-annual and multi-decadal periodicity, which serve to make the temperature series non-stationary in terms of a mean.

          When a time series is non-stationary then its autocorrelations will be positive out to several lags. In such a case the model needs to be differenced using the integrative (“I”) term. The key here is you want to keep autocorrelation in forecast errors as low as possible. Adding the “I” term, typically using one order of differencing for most time series, drives the autocorrelation more negative, which is what we want if it is positive.

          The integrative term uses no independent variable and thus, has no knowledge of last January’s value. It is concerned only with the difference in Y (the dependent variable) between lags and seeks to keep Y centered around a stationary mean. In doing so, it minimizes autocorrelation residual.

          What is interesting to note is if you over-difference “I”, you will drive the model into oscillation where each successive value in the ACF plot will alternate between a positive and negative value. This gives some insight into how the “I” term works.
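
          The effect of differencing on autocorrelation can be checked on a toy series. The sketch below assumes a simulated random walk (not a temperature record) and looks only at the lag-1 autocorrelation; the helper names are made up for the example.

```python
import random

def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def difference(series):
    """One order of differencing: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]

# Simulated non-stationary series: a random walk built from Gaussian steps.
rng = random.Random(42)
walk, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

once = difference(walk)    # one order of differencing
twice = difference(once)   # one order too many for this series

print("raw series       :", round(lag1_autocorr(walk), 3))
print("differenced once :", round(lag1_autocorr(once), 3))
print("differenced twice:", round(lag1_autocorr(twice), 3))
```

          For a series like this, the raw autocorrelation should sit close to +1, one difference should bring it near zero, and the second difference should push it clearly negative, which is the signature of over-differencing.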

        • DirkH says:

          What Martin wanted to say is that a simple MA without an autoregressive forecasting (AR) around it would be a FIR filter. Anything AR is comparable with an IIR insofar as a loopback with lags is involved.

        • HankH says:

          Agreed. I was brain locked on the models under test.

          My experience with digital signal processing is relatively limited. Most of my practical signal experience has been in wave propagation theory (resonant cavities, waveguides, antennas and such) where most of the “processing” is in tuned resonance rather than active filtering.

  7. Pingback: These items caught my eye – 9 June 2013 | grumpydenier

  8. copernicus34 says:

    Excellent Hank, just top notch. So so hypocritical for the “scientific” community to harp on the skeptics about the practice of bad science and here they won’t consider the math when it doesn’t suit their worldview. Science is the total package; you don’t get to see certain physics truths and come to conclusions without considerations from other fields of study. Trying to explain this to people who claim to latch on to “science” as their answer to everything is indicative of the lack of pentium class ‘brainology’ of the political consensus crowd. After all, admitting that statistics and statistical methods do in fact have a play in this would pretty much confirm that McIntyre was right all along regarding his destruction of Mikey Mann’s hockey stick; and as we know, they can’t have that.

    Bravo Hank

    • HankH says:

      Thank you copernicus. One of the chief taunts, if that’s what I can call it, is that skeptics are ignorant of science. Several studies have shown that skeptics are far less ignorant of the science than our alarmist counterparts. Many of the people who visit and comment here at Suyts Space are engineers, leading journal-published researchers, mathematicians, and such. The host, James, and the readership here, like yourself, are very literate in science and the points of debate.

      It is because we understand science and raise questions about methods and how stats are being applied that we are skeptical of the outlandish claims.

    • suyts says:

      Indeed, Bravo Hank!!!

  9. Pingback: Bothering the Met Office over stats Suyts style | Tallbloke's Talkshop

  10. michael hart says:

    The question to parliament is described as being “0.8 degrees Celsius”.
    The paragraph that begins with “Being a numbers guy…” refers to “0.08 degrees”. Is this a typo?

  11. DaveG says:

    Great work Hank, it’s a keeper and a great reference piece.

  12. Walter Royal says:

    I have just spent several hours reading and re-reading this and with the comments I think I have a grasp of what is happening. My question is this. Has anyone actually used ARIMA to predict the next few years’ temps so the predictions can be compared to reality as it happens or did I miss something and that is something that can’t be done?

    • HankH says:

      Hello Walter. The test I did using the truncated time series was, in a sense, turning back the clock to 2002, letting the models predict, then come back to 2012 to see if they got it right.

      ARIMA is most useful in short term forecasts and is very commonly used in economic modeling for that purpose. It is not a good model for forecasting much beyond maybe ten or so data points beyond the time series. How far forward its forecasting skill is useful is very much dependent upon the variability of the time series. The more variable, the less useful it is in longer forecasting.
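
      The “turn back the clock” test described above can be sketched as a simple holdout check: hide the last few points, forecast them from the rest, and score the miss. The forecaster below is a hypothetical stand-in (it just extends the average step of the training data), not either of the models compared in the article, and the numbers are made up.

```python
def drift_forecast(train, horizon):
    """Stand-in forecaster: extend the average step seen in the training data."""
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + drift * h for h in range(1, horizon + 1)]

def holdout_mae(series, holdout, forecaster):
    """Fit on all but the last `holdout` points, then return the mean absolute error."""
    train, actual = series[:-holdout], series[-holdout:]
    predicted = forecaster(train, holdout)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / holdout

# Made-up annual anomalies for illustration only, not a real temperature record.
anomalies = [0.10, 0.14, 0.09, 0.18, 0.22, 0.19, 0.27, 0.25, 0.31, 0.30, 0.36, 0.33]
print(holdout_mae(anomalies, holdout=4, forecaster=drift_forecast))
```

      Any model that can produce a forecast from a training list could be passed in as `forecaster`, which makes it easy to score two candidates on the same withheld stretch.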

      • suyts says:

        LOL, sorry Hank, I started writing a response and got distracted before I posted the “reply”.

        • HankH says:

          No worries, James. I appreciate your stepping in especially when the popcorn machine says it’s ready and I need to go fill my bowl. 😆

        • suyts says:

          You know, Hank, you should be an instructor somewhere. The clarity you brought to this issue is something very few could have done. And, no one else did.

          I don’t know how many people this will touch, but, it’s something which most people should see.

      • Walter Royal says:

        Thank you Hank. I understand what you did with that and I was impressed to say the least. My question was prompted by the desire to influence the less educated with ongoing proof. Many would rationalize your results as a parlor trick and would leave it at that and refuse to look further. But, if a certain trend was predicted and that prediction was borne out then that would be a totally different matter. I was trying to find out if that type of thing was possible.

        • HankH says:

          Let’s assume for discussion purposes I designed a model that predicted something ten years into the future. Then we watched as time progressed. And let’s assume the model got it right. Was the model right by chance or due to its predictive skill? How would we know?

          In economics, in particular, people have lost their shirts betting on a model that got it right once or twice then failed miserably on the next tick. I’m not aware of any models that demonstrate good predictive skills with natural or stochastic processes. Even the GCM model predictions of a decade ago are turning out to be a disappointment.

          Alas, I have little to no confidence that a model with good predictive skill will be built in my lifetime and possibly never.

    • suyts says:

      Walter, it’s an excellent question. Although, it isn’t the point of the post. Please allow me to digress.

      The question is whether or not our temperature trends are unusual. While this post doesn’t answer the question, it tells us that the Met, and the IPCC, and GISS, do not answer the question, either. And yet, the nutters swear by the interpretations of the organizations.

      I haven’t asked Hank whether or not this model is a good predictive model, so, I can’t speak for him. But, I maintain, there is no model which can accurately predict our temps. Our system is simply too complex to do so.

      We must remember this is simply an evaluation of the numbers, not what is physically happening. But, that’s the point. While the Met said they don’t need statistics to evaluate whether or not the changes are statistically significant, which is an absurdity all by itself, it implies that their knowledge is sufficient to understand what is happening without the application of math. Which is beyond absurd. It is simply impossible to do so. It’s impossible with math. Without math, it’s simply advocacy. Scientists, they are, killing math one process at a time.

      I guess what I’m saying is, we don’t know what happened in the past, so there’s no way to try and understand what is going to happen in the future based upon what we know today. It isn’t denialism, it is math.

      • Latitude says:

        we don’t know what happened in the past


      • Walter Royal says:

        Thanks James. I see what you’re saying. I had hoped that this might be a tool to further take the AGW construct apart. In a way I suppose it is but just not the one I had hoped for.

      • suyts says:

        @ Walter, it is a great tool to deconstruct the nutters. But, to prove we know how the climate is, requires things we’ve always stated are impossible to know.

        This has always been a problem…… a fallacious argument of the nutters. They provided an explanation for things which required no explanation. But, they put the impetus on skeptics to provide an explanation for the irrelevant changes in their records from which they draw significance, albeit, as we can see, not statistical significance.

        It’s too chaotic. The thing we can show is that we can’t know and, more importantly, that the nutters can’t know. Their certitude is a product of a mental illness, not math, not science, but, advocacy.

  13. Latitude says:


    I just got in….so I had to speed read….
    Gonna print it out and read it in bed tonight


  14. copernicus34 says:

    Hank for Suyts Space 2013 MVP

  15. Latitude says:

    James…Hank…..this would be an excellent tie in article to the top one at WUWT right now…

  16. suyts says:

    Weird….. views are up, but, the nutters aren’t commenting.

    • HankH says:

      Maybe it was something I said????

    • suyts says:

      LOL! Perhaps. I think they’re begging people who have a rudimentary knowledge of math to rebut. …… we know most nutters haven’t the ability to grasp the concept of numbers much less advanced mathematics.

      It won’t stop them. So, the Met and IPCC don’t know how to properly model their stupidity…. doesn’t mean they’re wrong, you science denier!!!!

  17. Jim Masterson says:

    Interesting post Hank–well done!

    Unfortunately, I’m not a statistician, so some parts of your post go over my head.

    The central debate between Keenan and the Met Office revolves around which model provides a better “goodness of fit” (GOF) to the temperature record and has the best predictive skills from which to infer statistical significance.

    The problem is the surface temperature record isn’t valid (you can’t average temperatures) and has been manipulated to show a spurious temperature rise.


    • HankH says:

      That’s certainly a big part of it, isn’t it? That’s one of the several reasons I didn’t put my dog in this fight by speaking to whether I found statistical significance in the temperature series.

      • Jim Masterson says:

        Along the same line with the manipulated temp record:

        The Met Office answered, “the temperature rise since about 1880 is statistically significant,” meaning that the temperature rise since 1880 is attributable to human caused global warming.

        The temperature rise may be attributable to humans, but it isn’t due to GHG emissions.


        • HankH says:

          True, and there’s loss of statistical power from the 1800s, when there were fewer thermometers and unstandardized methods of measurement, plus UHI, changes in paint composition for Stevenson screens, improvements in the calibration and accuracy of modern thermometers (if you can believe it, as they adjust today’s temperatures as much as those of 50 to 100 years ago), and other human-caused systematic errors.

  18. Ivan says:

    I know I’m not the first one to say so, but anyway..
    Thank you very much for the post, Hank.
    It was one of your posts that first brought me to Suyts Space and you have never failed to entertain.
    Having a degree in math, I may hold a slight advantage over most people in understanding your posts, but this advantage is largely offset by English, which is not my first language (as you might have noticed by now). Still, I can say, your posts are always very interesting in the technicalities, yet so easy to understand, that I believe anyone who has the will to delve into it a bit, will understand your arguments. The posts are accurate, entertaining and you are always very careful in presenting your conclusions.
    I certainly hope you can make a lot more guest posts at Suyts’.

    • HankH says:

      Hello Ivan. Your English is excellent and thank you for your compliment.

      I think it is fair to say there will be future articles that James will generously allow me to share with readers here, most likely in areas of mathematics, medicine, and technology, which are my areas of interest and published work.

      I personally know that many of the regulars who comment here at Suyts Space are mathematicians like yourself, engineers, researchers, published scientists, or trained in other technical, economic, and political disciplines. Then there’s folks with the wisdom of the sages who contribute here too. Being weak in some of those areas, I learn a lot here. Thanks for being a part of the community and commenting.

  19. michael hart says:

    Thanks Hank (and Suyts). I think it’s a helpful contribution, and highly readable for a post addressing statistics.

    One question: The AR1 drift. Is this referring to amplitude, because the phases don’t look as if they are changing much (to my eye)?

    • HankH says:

      To answer your question requires a very technical discussion. I’ll try to simplify at the possible expense of being incomplete in describing the terms.

      A characteristic of an AR(1) model is that when there is a non-zero value of a temperature measurement along successive points in the time series, the model will tend to dampen with an averaged damping rate of a1 per time step. a1 is the region of the time series where the polynomial (the changing temperature curve, if you will), is seen by the model to have a pair of conjugate roots and therefore one solution. When the polynomial of the time series departs from this a1 region, it takes on two real solutions. Given that AR(1) is limited by its 1 degree of freedom due to its parameterization of 1, its accumulative dampening becomes more evident in these regions.

      Here is a link to a paper where you can see similar behavior.

      Go to figure 5.3 (next to the last page). The orange line is the actual data. The black line is the AR(1) model output. The authors make no effort to call out this behavior but it can be plainly seen.

      • HankH says:

        That was the long answer. The short answer is your eyes are seeing it right. The phases aren’t changing much, nor is the variability, just the tracking is off because the AR(1) model starts to show its shortcomings in that region of the data, further evidence that it is unsuited for use in a temperature time series.

  20. philjourdan says:

    While a 0.8 degree rise in temperature is statistically significant, the problem still is that the cause is unknown. While I understand the differences in the models, and the accuracies, I still have a problem with the statement:

    To ask Her Majesty’s Government … whether they consider a rise in global temperature of 0.8 degrees Celsius since 1880 to be significant. [HL3050].

    The Met Office answered, “the temperature rise since about 1880 is statistically significant,” meaning that the temperature rise since 1880 is attributable to human caused global warming.

    I cannot buy that. I have seen lots of evidence that warming is occurring. I have yet to see any evidence of the cause.

    • HankH says:

      You’ve well stated most skeptics’ positions. There is plenty of evidence of the warming, however, not so much lately. What we are skeptical of is the stated climate sensitivity that keeps getting down-rated with every new study, proving in hindsight that our skepticism was reasonable and warranted. I personally am open minded on the matter but side with you. I need to see evidence of the cause. Unvalidated and now failing models do little to impress me.

    • cdquarles says:

      Phil, I think a better statement is that there is some (the lots is a bit debatable) evidence that it has warmed, particularly in the bigger cities. At least for my neck of the woods, the ‘warming’ stopped in the 30s. There has been a slight ‘cooling’ here. Comparing what my grandparents said about the weather here to what I have personally seen, I’d say that the weather is ‘better’, being reasonably moist and mild. When the Rossby wave patterns are steep and the upper air (tropospheric, mostly) cold, we get more storms, with some of them becoming severe. The big tornado outbreaks in my lifetime are 74 and 2011, with lesser outbreaks in most years. There are years with no deaths, but I can’t think of a year that didn’t have at least one EF3 somewhere in the state.

  21. Pingback: LOL, Does NY Times Gillis Read Suyts? | suyts space

  22. Lord Bernard Donoughue says:

    Thank you Hank. As the Labour Peer who (with great statistical help from Doug Keenan) placed the 6 Parliamentary Questions, 5 of which the Met Office crudely tried to avoid, I found your post superb. Cannot say I understood it all, but it was brilliantly clear. Pity you cannot get a serious response from the Met Office or from the Department. I will persist with Questions which try to expose the shaky foundations to this hugely expensive programme which is affecting the world’s energy and climate policies.
    Lord Bernard Donoughue

    • suyts says:

      Lord Donoughue,

      Thank you very much for dropping by. I apologize for the wait. I get busy at times and forget to check the bin for pending comments. As a note, now that you are approved for commenting, you’ll no longer have to wait before the comment is posted.

      Please, please, please, keep us updated if there are further developments. We’ve a few Brit nationals who pop by from time to time. But, more than that, we’re all involved in the fight against such expense and assault on energy use.

      My best,

      James Sexton

      • philjourdan says:

        Got a big one! You are becoming more recognized!

        • DirkH says:

          Yep, as if Labour wouldn’t happily erect a gazillion wind turbines if they currently were the paladins of the EU commission responsible for the British protectorate.

        • HankH says:

          I enjoy the good folks that hang out here at suyts space. I learn a lot from others here on subjects I’m weak in. That’s plenty for me 😉

    • HankH says:

      Lord Donoughue,

      Likewise, my apology for a delayed reply. I just returned from business travel and finally had an opportunity to access the Internet.

      Thank you for your efforts in pressing this issue. I think it important that great claims should require great justification, which seemed somewhat lacking in the Met’s response. And, as evidenced in my testing of the models, Doug Keenan is clearly correct in his criticism of the Met’s (and IPCC’s) use of the AR(1) model.

      I will intently be following this issue. A very kind thank you for your kind words and comment.

      Kindest regards,

      Hank Hancock

  23. tckev says:

    I note with interest that the Met Office, with all their expensive technology, is unable to give a reasonably good forecast for this weekend. Yet they insist that when using this same technology they can predict, with remarkable accuracy, decades into the future. Until they can give valid, accurate, and verifiable forecasts for at least 1 month ahead they should be called to task for wasting taxpayers’ money.

    • HankH says:

      I recall hearing someone say that whatever they predict, most folks in the UK plan for the opposite. 😀

      • Me says:

        😆 Only the ones that don’t fall for their bullshit, and don’t have anything invested in the green energy, because they are smart enough to know better.

      • tckev says:

        For instance, a record-breaking hot summer forecast last year (mostly wet), followed by no snow for winter (predicted around September last year). At the start of the year they forecast again a record-breakingly hot summer; so far most of the UK is barely into double figures.
        Yes, a totally random guess would have been more accurate. But that’s OK because the poor tax-paying saps in the UK are paying for this joke outfit.

        • Me says:

          I see the same here. Saps are being hornswoggled where there is money at hand to be swindled.

        • DirkH says:

          If the state were responsible for the Sahara we’d have a shortage of sand in 5 years. – Milton Friedman

  24. Jim Masterson says:

    On the Number Watch Forum, Brad Tittle posted this recently:

    W.M. Briggs doesn’t like statistical significance.


    • HankH says:

      I enjoy Briggs’s sometimes off-the-beaten-path writings. There are times where I agree with him in principle but would have personally stated it another way.

      Statistical significance is a statement of probability built upon some line drawn in the statistical sand. If you have a p-value of 0.049 (the statistic) with an alpha of 0.05 (the line drawn), your results are deemed to be on the ragged edge of being statistically significant. But what that really means is that, if there were no real effect, a result at least this extreme would still turn up by chance nearly 5% of the time. There are many circumstances in life where a 5% chance of a false alarm is too high a risk to take.

      What many don’t understand is statistical significance “lends evidence” to a conclusion but is not proof of the conclusion.
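
      To put a number on the “line drawn in the sand”, the sketch below runs many tests on pure noise and counts how often they cross alpha = 0.05 anyway. It assumes a simple two-sided z-test with known unit variance on simulated data; the function names, trial count, and sample size are arbitrary choices for the example.

```python
import math
import random

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of tests declared 'significant' when there is no real effect at all."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # true mean is 0, sigma is 1
        if two_sided_p_value(z) < alpha:
            hits += 1
    return hits / trials

print(two_sided_p_value(1.96))   # the z-score that sits right at the 0.05 line
print(false_positive_rate())     # how often pure noise crosses that line
```

      The second number should land in the neighborhood of 0.05, which is all that a “significant at 5%” label guarantees: it limits how often noise alone gets through, and says nothing about what caused a real effect.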

  25. Pingback: A Nice Thanks To Hank —- From Across The Pond | suyts space

  26. Pingback: Belcher Belches Blathering BS | suyts space
