Are energy budget TCR estimates biased low, as Richardson et al (2016) claim?

Originally a guest post on Jul 12, 2016 – 10:08 AM at Climate Audit

Introduction and Summary

In a recently published paper (REA16),[1] Mark Richardson et al. claim that recent observation-based energy budget estimates of the Earth’s transient climate response (TCR) are biased substantially low, with the true value some 24% higher. This claim is based purely on simulations by CMIP5 climate models. As I shall show, observational evidence points to any bias actually being small. Moreover, the related claims made by Kyle Armour, in an accompanying “news & views” opinion piece,[2] fall apart upon examination.

The main claim in REA16 is that, in models, surface air-temperature warming over 1861-2009 is 24% greater than would be recorded by HadCRUT4 because it preferentially samples slower-warming regions and water warms less than air. About 15 percentage points of this excess result from masking to HadCRUT4v4 geographical coverage. The remaining 9 percentage points are due to HadCRUT4 blending air and sea surface temperature (SST) data, and arise partly from water warming less than air over the open ocean and partly from changes in sea ice redistributing air and water measurements.

REA16 infer an observation-based best estimate for TCR of 1.66°C, 24% higher than the value of 1.34°C if based on HadCRUT4v4. Since the scaling factor used is based purely on simulations by CMIP5 models, rather than on observations, the estimate is only valid if those simulations realistically reproduce the spatiotemporal pattern of actual warming for both SST and near-surface air temperature (tas), and changes in sea-ice cover. It is clear that they fail to do so. For instance, the models simulate fast warming, and retreating sea-ice, in the sparsely observed southern high latitudes. The available evidence indicates that, on the contrary, warming in this region has been slower than average, pointing to the bias due to sparse observations over it being in the opposite direction to that estimated from model simulations. Nor is there good observational evidence that air over the open ocean warms faster than SST. Therefore, the REA16 model-based bias figure cannot be regarded as realistic for observation-based TCR estimates.

It should also be noted that the 1.66°C TCR estimate ignores the fact that the method used overestimates canonical CMIP5 model TCRs (those per AR5 WG1 Table 9.5) by ~5% (Supplementary Information, page 4). Including this scaling factor along with the temperature measurement scaling factor reduces the estimate to 1.57°C (Supplementary Table 11).
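As a rough check on the arithmetic in the preceding paragraphs, the following minimal Python sketch applies the quoted scaling factors to the HadCRUT4v4-based estimate. The variable names are illustrative, and the inputs are the rounded figures given in the text, so the second result comes out at about 1.58°C rather than exactly the 1.57°C of Supplementary Table 11, which presumably reflects unrounded inputs.

```python
# Figures quoted in the text; variable names are illustrative only.
tcr_hadcrut4 = 1.34          # TCR (deg C) based on HadCRUT4v4
masking_pp = 15              # percentage points attributed to coverage masking
blending_pp = 9              # percentage points attributed to air/SST blending
method_overestimate = 1.05   # difference method overstates canonical model TCRs by ~5%

# Combined model-based temperature scaling factor (1.24).
temperature_scale = 1 + (masking_pp + blending_pp) / 100

tcr_scaled = tcr_hadcrut4 * temperature_scale          # REA16 headline estimate
tcr_both_factors = tcr_scaled / method_overestimate    # also allowing for the ~5%

print(f"temperature scaling only: {tcr_scaled:.2f} deg C")
print(f"both scaling factors:     {tcr_both_factors:.2f} deg C")
```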

Relevant details of and peculiarities in REA16

REA16 focus on energy-budget TCR estimates using the ratio of the changes in global temperature and in forcing, measuring both changes as the difference between the mean over an early baseline period and the mean over a recent final period. They refer to this variously as the difference method and as the Otto et al.[3] method; it was introduced over a decade earlier by Gregory et al.[4] and copied by both Otto et al. (2013) and Lewis and Curry (2015).[5] The primary baseline and final periods used by REA16 are 1861–80 and 2000–09, almost matching those used in the best-constrained Otto et al. estimate. Lewis and Curry, taking longer 1859–82 base and 1995-2011 final periods, obtained the same 1.33°C best estimate for TCR as Otto et al., using the same HadCRUT4v2 global temperature dataset.
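The difference method described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in any of the cited papers: the function names are hypothetical, the 3.71 W/m² forcing for doubled CO₂ is assumed here as the scaling constant, and the temperature and forcing series are synthetic, constructed so that the answer is known in advance.

```python
from statistics import mean

F_2XCO2 = 3.71  # W/m^2 forcing from doubled CO2 (assumed scaling constant)

def period_mean(series, start, end):
    """Mean of a {year: value} series over the inclusive range start..end."""
    return mean(v for y, v in series.items() if start <= y <= end)

def tcr_difference_method(temp, forcing, base, final, f2x=F_2XCO2):
    """TCR = F_2x * (change in temperature) / (change in forcing),
    with both changes taken between base-period and final-period means."""
    d_temp = period_mean(temp, *final) - period_mean(temp, *base)
    d_forcing = period_mean(forcing, *final) - period_mean(forcing, *base)
    return f2x * d_temp / d_forcing

# Synthetic data only: a linear forcing ramp and a temperature response that is
# exactly proportional to it, with a built-in TCR of 1.33 deg C.
years = range(1861, 2010)
forcing = {y: 0.015 * (y - 1861) for y in years}
temp = {y: (1.33 / F_2XCO2) * forcing[y] for y in years}

tcr = tcr_difference_method(temp, forcing, base=(1861, 1880), final=(2000, 2009))
print(f"{tcr:.2f}")
```

With real data the result depends on how well the base and final periods are matched, which is why the choice of periods matters so much in what follows.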

REA16 estimate the TCR of each CMIP5 model by comparing its global warming with forcing estimated in the same way as in Otto et al., using model-specific data where available and multimodel mean forcing otherwise. The method is somewhat circular, since forcing for each model is calculated each year as the product of its estimated climate feedback parameter and its simulated global warming, adjusted by the change in its radiative imbalance (heat uptake). Each model’s climate feedback parameter is derived by regressing the model’s radiative imbalance response against its global temperature response over the 150 years following an abrupt quadrupling of CO₂ concentration.
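To make the circularity concrete, here is a minimal Python sketch of the two steps just described: deriving a feedback parameter by regression from an abrupt 4×CO₂ run, then diagnosing forcing from simulated warming and heat uptake. The function names and all numbers are hypothetical; the abrupt-run data are synthetic and perfectly linear, which real model output is not.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def feedback_parameter(temp_response, imbalance_response):
    """Under the linear assumption N = F - lambda * T, the regression slope
    of imbalance on temperature is -lambda."""
    return -ols_slope(temp_response, imbalance_response)

def diagnosed_forcing(lam, d_temp, d_imbalance):
    """Forcing = feedback parameter * warming + change in radiative imbalance."""
    return lam * d_temp + d_imbalance

# Synthetic 150-year abrupt 4xCO2 run with a built-in lambda of 1.2 W/m^2/K.
temps_4x = [4.0 * (1 - 0.97 ** t) for t in range(1, 151)]
imbalance_4x = [7.4 - 1.2 * T for T in temps_4x]

lam = feedback_parameter(temps_4x, imbalance_4x)
# Hypothetical historical changes: 0.8 K warming, 0.6 W/m^2 heat uptake.
forcing = diagnosed_forcing(lam, d_temp=0.8, d_imbalance=0.6)
print(f"lambda = {lam:.2f} W/m^2/K, diagnosed forcing = {forcing:.2f} W/m^2")
```

Note that the same simulated warming enters both the diagnosed forcing and the numerator of the TCR ratio, which is the sense in which the method is somewhat circular.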

In model historical simulations the weighted average period from when each forcing increment arose to 2000–09 is only ~30 years, not 150 years. Accordingly, the forcing estimation method relies upon a model exhibiting a fairly linear climate response, and hence having a climate feedback parameter (and an effective climate sensitivity) that does not vary with time (in addition to having a temperature response that is proportional to forcing). In this context, the statement in REA16 that they do not calculate equilibrium climate sensitivity (ECS) “to avoid the assumption of linear climate response” is peculiar: they have already made this assumption in deriving model forcings.

Although REA16 is based on simulations by all CMIP5 models for which relevant data are available, the weighting given to each model in determining the median estimates that are given varies over a range of ten to one. That is because, unlike for most IPCC model-based estimates, each available model-simulation – rather than each model – is given an equal weighting. Whilst only one simulation is included for most models, almost 60% of the simulations that determine the median estimates come from the 25% of models with four or more simulation runs.

REA16 do not appear to state the estimated median TCR applicable to the 84 historical-RCP8.5 CMIP5 simulations used. Dividing the primary periods tas-only difference method figure of 1.98°C per Supplementary Table 6 by 1.05 to allow for the stated overestimation by the difference method implies a median estimate for true TCR of 1.89°C. Back-calculating TCR from the difference method bias values in Supplementary Table 5 instead gives an estimate of 1.90°C. The figures are rather higher than the median TCR of 1.80°C that I calculate to apply to the subset of 68 simulations by models for which the canonical TCR is known.

There seem to be inconsistencies in REA16 between different estimates of the bias resulting from use of the difference method and blended air and SST temperature data. The top RH panel of Supplementary Figure 4 shows that the median TCR estimate when doing so, with 2000–09 as the final decade, is ~2.00°C. This is a 5% overestimate of the apparent actual value of ~1.90°C rather than (as stated in Supplementary Table 5) an underestimate of 8%. Moreover, contrary to Supplementary Figure 4, Supplementary Table 6 gives a median TCR estimate in this case of 1.81°C, implying an underestimate of 4%, not 8%. Something appears to be wrong here.
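The percentages in the last two paragraphs can be reproduced with a short Python sketch using only the rounded figures quoted in the text (variable names are illustrative). Because the inputs are rounded, the Table 6 underestimate comes out between 4% and 5% depending on whether 1.89°C or 1.90°C is taken as the true median TCR; either way it is well short of 8%.

```python
# Rounded figures quoted in the text (deg C).
tas_only_difference_estimate = 1.98   # Supplementary Table 6, tas-only
method_overestimate = 1.05            # stated ~5% overestimation
apparent_true_tcr = 1.90              # back-calculated from Supplementary Table 5
fig4_blended_estimate = 2.00          # Supplementary Figure 4, top RH panel
table6_blended_estimate = 1.81        # Supplementary Table 6

implied_true_tcr = tas_only_difference_estimate / method_overestimate

fig4_bias = fig4_blended_estimate / apparent_true_tcr - 1
table6_bias_vs_190 = table6_blended_estimate / apparent_true_tcr - 1
table6_bias_vs_189 = table6_blended_estimate / implied_true_tcr - 1

print(f"implied true median TCR: {implied_true_tcr:.2f} deg C")
print(f"Figure 4 bias:           {fig4_bias:+.1%}")
print(f"Table 6 bias:            {table6_bias_vs_189:+.1%} to {table6_bias_vs_190:+.1%}")
```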

REA16 also claim that energy budget TCR estimates are sensitive to analysis period(s), particularly when using a trend method. However, Supplementary Figure 4 shows that the chosen difference method provides stable estimation of model TCRs provided that the final decade has, like the 1861–80 base period, low volcanic forcing; that is, for decades ending in the late 2000s or later. As discussed in some detail in LC15, sensitivity estimation using an energy budget difference method is sensitive to variations between the base and final periods in volcanic forcing, due to its very low apparent efficacy, so periods with matching volcanism should be used. The sensitivity, shown in Supplementary Table 6, of TCR estimation using the difference method to choice of base period when using a 2000–09 final period is explicable primarily by poor matching of volcanic forcing when base periods other than 1861–80 are used. Good matching of the mean state of the Atlantic Multidecadal Oscillation (AMO) between the base and final period is also necessary for reliable observational estimation of TCR.

The effect of blending air and SST data

I question whether using SST as a proxy for tas over the open ocean has caused any downward bias in estimation of TCR in the real climate system, or even (to any significant extent) in CMIP5 models.

The paper REA16 primarily cite to support faster warming of tas over open water than SST,[6] which is also model-based, attributes this effect to the thermal inertia of the ocean causing a lag in ocean warming. This argument appears to be unsound. Another paper,[7] which they also cite, instead derives an equilibrium air – sea surface warming differential from a theoretical model based on an assumed relative humidity height profile, with thermal inertia playing no role. This is a better argument. However, it depends on the assumed relative humidity profile being realistic, which it may not be. The first paper cited notes (caveating that observational uncertainties are considerable) that models do not match observed changes in subtropical relative humidity or in global precipitation.

For CMIP5 models, REA16 state that the tas vs SST warming differential is about 9% on the RCP8.5 scenario and is broadly consistent between models historically and over the 21st century. However, the differential I calculate is far smaller than that. I compared the increases in tas and ‘ts’ between the means for the first two decades of the RCP8.5 simulation (2006–25) and the last two decades of the 21st century, using ensemble mean data for each of the 36 CMIP5 models for which data were available. CMIP5 variable ‘ts’ is surface temperature, stated to be SST for the open ocean and skin temperature elsewhere. The excess of the global mean increase in tas over that in ts, averaged across all models, was only 2%. Whilst ts is not quite the same as tas over land and sea ice, there is little indication from a latitudinal analysis that the comparison is biased by any differences in their behaviour over land and sea ice. Consistent with this, Figures 2 and S2 of Cowtan et al. 2015[8] (which use respectively tas and ts over land and sea ice) show very similar changes over time (save in the case of one model). Accordingly, I conclude that the stated 9% differential greatly overstates the mean difference in model warming between tas and blended air-sea temperatures. To a large extent that is because the 9% figure also includes an effect, when anomaly temperatures are used, from changes in sea ice extent. However, Figure 2 of Cowtan et al. 2015 shows, based on essentially the same set of CMIP5 RCP8.5 simulations as REA16 and excluding sea-ice related effects, a mean differential of ~5% (range 1%–7%), over double the 2% I estimate.

So, models exhibit a range of behaviours. What do observations show? Unfortunately, there is limited evidence as to whether and to what extent differential air-sea surface warming occurs in the real climate system. However, in the deep tropics, where the theoretical effects on the surface energy budget of temperature-driven changes in evaporation and water vapour are particularly strong, there is a near quarter century record of both SST and tas from the Tropical Atmosphere Ocean array of fixed buoys in the Pacific Ocean. With averages over the full array extent based on a minimum of 40% valid data points, tas and SST data are available for 1993–2015. The trend increase in SST over that period is 0.078°C/decade, considerably higher than the 0.047°C/decade for tas, not lower. If the required minimum is reduced to 20%, trends can be calculated over 1992–2015, for which they are 0.029°C/decade for SST, and 1.5% higher at 0.030°C/decade for tas. This evidence, although limited both spatially and temporally, does not suggest that tas increases faster than SST.
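The role of the minimum-valid-data threshold can be illustrated with a minimal Python sketch. It is not the processing actually applied to the buoy data: the function names are hypothetical and the readings are synthetic (five buoys reporting identical anomalies that rise at 0.03°C/decade, with four of the five missing in 1992). It shows only how lowering the threshold from 40% to 20% brings an extra year into the trend calculation.

```python
def array_mean(readings, min_valid_fraction):
    """Average one time step across buoys; None marks a missing reading.
    Returns None if too few buoys reported."""
    valid = [r for r in readings if r is not None]
    if len(valid) < min_valid_fraction * len(readings):
        return None
    return sum(valid) / len(valid)

def trend_per_decade(years, series):
    """Least-squares trend in units per decade, skipping missing (None) values."""
    pts = [(y, v) for y, v in zip(years, series) if v is not None]
    n = len(pts)
    mean_y = sum(y for y, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((y - mean_y) ** 2 for y, _ in pts)
    sxy = sum((y - mean_y) * (v - mean_v) for y, v in pts)
    return 10 * sxy / sxx

# Synthetic anomalies (deg C): 5 buoys, 1992-2015, rising 0.003 deg C per year.
years = list(range(1992, 2016))
readings = [[0.003 * (y - 1992)] * 5 for y in years]
readings[0] = [0.0, None, None, None, None]   # only 1 of 5 buoys valid in 1992

series_40 = [array_mean(r, 0.40) for r in readings]   # 1992 excluded
series_20 = [array_mean(r, 0.20) for r in readings]   # 1992 included

print(series_40[0], series_20[0])
print(f"{trend_per_decade(years, series_40):.3f}", f"{trend_per_decade(years, series_20):.3f}")
```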

The effect of sea ice changes

The separation in REA16 of the effect of masking from that of sea ice changes on blending air and water temperature changes is somewhat artificial, since HadCRUT4 has limited coverage in areas where sea ice occurs. However, I will follow the REA16 approach. Their model-based estimate of the effect of sea ice changes appears to be ~4%, the difference between the 9 percentage points bias due to blending and the 5 percentage points (per Cowtan et al. 2015) due purely to the use of SST rather than tas for the open ocean. Changes in sea ice make a difference only when temperatures are measured as anomalies relative to a reference period; however, I can find no mention in REA16 of what reference period is used.

CMIP5 models have generally simulated decreases in sea ice extent since 1900, accelerating over recent decades, around both poles (AR5 WG1 Figure 9.42). In reality, Antarctic sea ice extent has increased, not decreased, over the satellite era. Its behaviour prior to 1979 is unknown. On the other hand, since ~2005 Arctic sea ice has declined more rapidly than per mean CMIP5 projections. Differences in air temperatures above affected sea ice in the two regions, and the use of widely varying model weightings in REA16, complicate the picture. It is difficult to tell to what extent REA16’s implicit 4 percentage point estimate is biased. Nevertheless, based on sea ice data from 1979 on and unrealistically high long term warming by CMIP5 models in high southern latitudes (as discussed below), it seems to me likely to be an overestimate for changes between the baseline 1861–80 and final 2000–09 periods used in REA16.

The effect of masking to HadCRUT4 coverage

I turn now to the claims about incomplete, and changing, data coverage biasing down HadCRUT4 warming by 15 percentage points. The reduction in global warming from masking to HadCRUT4 coverage is based on fast CMIP5 model historical period warming in southern high latitudes as well as northern; see REA16 Supplementary Fig. 6, LH panel and Supplementary Table 8. But this is the opposite of what has happened; high southern latitudes have warmed more slowly than average, over the period for which data are available.

Based on HadCRUT4 data with a minimum of 20% grid cells with data, warming over 60S–90S averaged 0.05°C/decade from 1934 to 2015. The trend is similar using a 10% or 25% minimum; higher minima result in no pre-WW2 data. This trend is much lower than the 0.08°C/decade global mean trend over the period. For the larger 50S–90S region a trend over 1880–2015 can be calculated, at 0.03°C/decade, if a minimum of 15% of valid data points is accepted. Again, this is much lower than the global mean trend of 0.065°C/decade over the same period. An infilled spatial plot of warming since 1960 per BEST (http://berkeleyearth.org/wp-content/uploads/2015/03/land-and-ocean-trend-comparison-map-large.png) likewise shows slower than average warming in southern high latitudes. And UAH (v6.0beta5) and RSS (v03_3) lower-troposphere datasets show very low warming south of 60S over 1979–2015: respectively 0.01 and –0.02°C/decade.

It follows that the real effect of masking to HadCRUT4 coverage over the historical period is, in the southern extra-tropics, almost certainly the opposite of that simulated by CMIP5 models. Therefore, in the real world the global effect of masking is likely to be far smaller than the ~15% bias claimed by REA16.

In an article earlier this year updating the Lewis and Curry results,[9] I addressed the key claims about the effects of masking to HadCRUT4 coverage made in Cowtan et al. 2015 and repeated in REA16, writing:

“It has been claimed (apparently based on HadCRUT4v1) that incomplete coverage of high-latitude zones in the HadCRUT4 dataset biases down its estimate of recent rates of increase in GMST [Cowtan and Way 2014].[10] Representation in the Arctic has improved in subsequent versions of HadCRUT4. Even for HadCRUT4v2, used in [Lewis and Curry], the increase in GMST over the period concerned actually exceeds the area-weighted average of increases for ten separate latitude zones, so underweighting of high-latitude zones does not seem to cause a downwards bias. The issue appears to relate more to interpolation over sea ice than to coverage over land and open ocean in high latitudes.

The possibility of coverage bias in HadCRUT4 has since been independently examined by ECMWF using their well-regarded ERA-Interim reanalysis dataset. They found no reduction in that dataset’s 1979-2014 trend in 2 m near-surface air temperature when the globally-complete coverage was reduced to match that of HadCRUT4v4.[11] Since the ERA-interim reanalysis combines observations from multiple sources and of multiple atmospheric variables, based on a model that is well-proven for weather forecasting, it should in principle provide a more reliable infilling of areas where surface data [are] very sparse, such as high-latitude zones, than mechanistic methods such as kriging. Moreover, during the early decades of the HadCRUT4 record (which includes the 1859-1882 base period) data [were] sparse over much of the globe, and global infilling may introduce significant errors.”

Thus, the claim by Cowtan and Way (2014) that the ERA-interim analysis shows a rapidly increasing cold bias in HadCRUT4 after 1998 does not apply to HadCRUT4v4 over the longer post 1978 period. Focussing first on this period, the performance of the ERA-Interim and six other reanalyses in the Arctic was examined by Lindsay et al.[12] Although the accuracy of reanalyses in the fast warming but sparsely observed Arctic region has been questioned, the authors found that ERA-interim had a very high correlation with monthly temperature anomalies at 449 Arctic land stations. They reckoned ERA-interim to be the most accurate reanalysis for surface air temperature both in absolute terms and as to (post 1979) trend.

Lindsay et al. found GISTEMP to have a higher post-1978 trend in the Arctic than ERA-interim, but GISTEMP uses a crude interpolation and extrapolation based infilling method. Moreover, the ERA-interim version used by ECMWF to investigate possible coverage bias differs from the main dataset. It incorporates a homogeneity adjustment to its post 2001 SST data that significantly increases its temperature trend over that of the main ERA-interim reanalysis. Taking account of that might well eliminate the Arctic trend shortfall compared with GISTEMP. Certainly, over 1979-2015 both the adjusted ERA-interim and HadCRUT4v4 datasets showed a slightly higher trend in global temperature (of respectively 0.166 and 0.165 °C/decade) than did GISTEMP (0.162°C/decade).

Another recent study, Dodd et al.,[13] stated that “ERA-Interim has been found to be consistent with independent observations of Arctic [surface air temperatures] and provides realistic estimates of Arctic temperatures and temperature trends that outperform, or are comparable to, other currently available reanalyses for all areas of the Arctic so far investigated.” In her PhD thesis, Dodd also noted that “The issues arising from using drifting platforms in this study highlight the difficulty of investigating [surface air temperatures] over Arctic sea ice.” All this suggests that mechanistic infilling methods are unlikely to outperform the ERA-interim reanalysis in the Arctic, or indeed the Antarctic.

Prior to 1979, there is very little evidence as to the actual effects of incomplete observational coverage, or of blending air and SST measurement, on estimated trends in global temperature. However, there are two well known long-term surface temperature datasets that are based (on a decadal timescale upwards) on air temperature over the ocean as well as land, and which moreover infill to obtain complete or near complete global coverage: NOAAv4.01 and GISTEMP. Cowtan et al. (2015) accept that the new NOAA data set “incorporates adjustments to SSTs to match night-time marine air temperatures and so may be more comparable to model air temperatures”. GISTEMP uses the NOAAv4.01 SST data set (ERSST4). Both NOAAv4.01 and GISTEMP show almost identical changes in mean GMST to that per HadCRUT4v4 from 1880–1899, the first two decades they cover, to 1995–2015, the final period used in the update of Lewis and Curry. This suggests that any downwards bias in TCR estimation arising from use of HadCRUT4v4 is likely to be very small. Moreover, whilst some downwards bias in HadCRUT4v4 warming may exist, there are also possible sources of upwards bias, particularly over land, such as the effects of urbanisation and of destabilisation by greenhouse gases of the night-time boundary layer.

A way to resolve some of the uncertainties arising from poor early observational coverage

It is doubtful that any method of global infilling of temperatures based on the limited observational coverage available in the second half of the 19th century or (to a decreasing extent) during the first half of the 20th century is very reliable.

Fortunately, there is no need to use the full historical period when estimating TCR. Uncertainty regarding ocean heat uptake in the middle of the historical period, although a problem for ECS estimation, is not relevant to TCR estimation. Lewis and Curry gave an estimate of TCR based on changes from 1930–50 to 1995–2011, periods that were well matched for mean volcanic activity and AMO state, and which delineate a period over which forcing approximated a 70-year ramp. That TCR estimate was 1.33°C, the same as the primary TCR estimate using 1859–82 as the base period. Updating the final period to 1995–2015 and using HadCRUT4v4 left the estimate using the 1930–50 base period unchanged at 1.33°C. The infilling of HadCRUT4 by Cowtan and Way is prone to lesser error when using a 1930–50 base period rather than 1859–82 (or 1861–80 as in REA16), since observational coverage was less sparse during 1930–50. Accordingly, estimating TCR using an infilled temperature dataset makes more sense when the later base period is used.

So does use of the infilled Cowtan and Way dataset increase the 1930–50 to 1995–2015 TCR estimate by anything like 15%, the coverage bias for CMIP5 models reported in REA16 for the full historical period? No. The bias is an insignificant 3%, with TCR estimated at 1.37°C. Small additional biases, discussed above, from changes in sea ice and differences in warming rates of SST and air just above the open ocean (which it appears the Cowtan and Way dataset does not adjust for) might push up the bias marginally. However, ~80% of the total warming involved occurred after 1979, and as noted earlier since 1979 the trend in HadCRUT4v4 matches that in the (adjusted) ERA-interim dataset, which estimates purely surface air temperature, not a blend with SST, and has complete coverage. That suggests the bias from estimating TCR from 1930–50 to 1995–2015 using HadCRUT4v4 data is very minor, and that observation based estimates of TCR of ~1.33°C need to be revised up by, at most, a small fraction of the 24% claimed in REA16.
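For scale, the 3% figure and its relation to the model-based coverage bias can be checked with a few lines of Python, using only the rounded estimates quoted above (variable names are illustrative):

```python
tcr_hadcrut4v4 = 1.33        # deg C, 1930-50 to 1995-2015, HadCRUT4v4
tcr_cowtan_way = 1.37        # deg C, same periods, infilled Cowtan and Way data
model_coverage_bias = 0.15   # REA16 model-based masking bias, full historical period

infilling_bias = tcr_cowtan_way / tcr_hadcrut4v4 - 1
share_of_model_bias = infilling_bias / model_coverage_bias

print(f"infilling raises TCR by {infilling_bias:.1%}")
print(f"that is about {share_of_model_bias:.0%} of the model-based 15%")
```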

Claims by Kyle Armour

In an opinion piece related to REA16 in the same issue of Nature Climate Change, “Climate sensitivity on the rise”, Kyle Armour made three claims:

That, as a result of REA16’s findings, observation-based estimates of climate sensitivity and TCR must be revised upwards by 24%.

That the findings in Marvel et al (2015)[14] about various other types of forcing having differing effects on global temperature from CO₂ (different efficacies) call for multiplying observational estimates of climate sensitivity and TCR by a further factor of 1.30.

That a robust behaviour in models of apparent (effective) climate sensitivity being lower in the early years after a forcing is imposed than subsequently, rather than remaining constant, requires multiplying estimates of climate sensitivity by a further factor of ~1.25 in order to convert what they actually estimate (effective climate sensitivity) to ECS.

I will show that each of these claims is very wrong. Taking them in turn:

REA16’s findings are purely model based and do not reflect behaviour in the real climate system. There is little evidence for any major bias when TCR is estimated using observed changes from early in the historical period to the recent past, but limited observational coverage in the early part makes it difficult to quantify bias. However, TCR can also validly be estimated from observed warming since 1930–1950, most of which occurred during the well observed post-1978 satellite era. Doing so produces an identical TCR estimate to when using the long period, and any downwards bias in the estimate appears to be very small. An adjustment factor in the range 1.01x to 1.05x, not 1.24x, appears warranted.

As I have pointed out elsewhere,[15] Marvel et al has a number of serious faults, only two of which have to date been corrected.[16] Nonetheless, for what it is worth, after correcting those two errors Marvel et al.’s primary (iRF) estimate of the effect on global temperature of the mix of forcings acting during the historical period is the same as if the forcing had been, as per the definition of TCR, solely due to CO₂. That is, historical forcing has an estimated transient efficacy of 1.0 (actually 0.99). That would, ignoring the other problems with Marvel et al., justify a multiplicative adjustment to TCR estimates of 1.01x, not 1.30x.

It is not true that increasing effective sensitivity is a “robust” feature of models. Among CMIP5 models, the shortfall of climate sensitivity estimated using the first 35 years’ data following an abrupt CO₂ increase (roughly corresponding to the weighted average duration of forcing increments over the historical period), compared to that estimated using the standard 150 year regression method, is negligible (2% or less) for six models; for three of those the short period estimate is actually higher. The average shortfall over all CMIP5 models for which I have data is only 7%. Moreover, there is little evidence that the principal causes of estimated ECS exceeding multidecadal effective climate sensitivity in many CMIP5 models (in particular, weakening of the Pacific Walker circulation) are occurring in the real world. So any adjustment to observational estimates of climate sensitivity on account of effective climate sensitivity being, in many models, below ECS (a) does not appear to be well supported by observations; and (b) if based on the average behaviour of CMIP5 models, should be 1.08x rather than 1.25x.
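The adjustment factors in the three responses above follow from simple reciprocals, as this minimal Python sketch shows. The variable names are illustrative. Multiplying Armour's three factors together is an extra illustrative step not taken in the text, and the third factor applies only to climate sensitivity, not to TCR.

```python
# Armour's three multiplicative factors, as quoted above.
armour_factors = {
    "temperature measurement bias (REA16)": 1.24,
    "forcing efficacy (Marvel et al)": 1.30,
    "effective vs equilibrium sensitivity": 1.25,
}

# Figures from the responses above.
historical_efficacy = 0.99     # transient efficacy after the two corrections
mean_model_shortfall = 0.07    # average 35-year vs 150-year shortfall, CMIP5

# An efficacy below 1 scales estimates up by its reciprocal.
efficacy_factor = 1 / historical_efficacy
# A 7% shortfall means the estimate is 93% of ECS, so scale by 1/0.93.
sensitivity_factor = 1 / (1 - mean_model_shortfall)

armour_combined = 1.0
for factor in armour_factors.values():
    armour_combined *= factor

print(f"efficacy adjustment:     {efficacy_factor:.2f}x (vs 1.30x)")
print(f"sensitivity adjustment:  {sensitivity_factor:.2f}x (vs 1.25x)")
print(f"Armour factors combined: {armour_combined:.2f}x")
```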

Nicholas Lewis

References

[1] Mark Richardson, Kevin Cowtan, Ed Hawkins and Martin Stolpe. Reconciled climate response estimates from climate models and the energy budget of Earth. Nature Clim Chng (2016) doi:10.1038/nclimate3066

[2] Kyle Armour. Projection and prediction: Climate sensitivity on the rise. Nature Clim Chng (2016) doi:10.1038/nclimate3079

[3] Otto, A. et al. Energy budget constraints on climate response. Nature Geosci. 6, 415-416 (2013).

[4] Gregory, J. M., Stouffer, R. J., Raper, S. C. B., Stott, P. A. & Rayner, N. A. An Observationally Based Estimate of the Climate Sensitivity. J. Clim. 15, 3117–3121 (2002).

[5] Lewis, N. & Curry, J. A. The implications for climate sensitivity of AR5 forcing and heat uptake estimates. Clim. Dynam. 45, 1009–1023 (2015).

[6] Richter, I. & Xie, S.-P. Muted precipitation increase in global warming simulations: a surface evaporation perspective. J. Geophys. Res. 113, D24118 (2008).

[7] Ramanathan, V. The role of ocean-atmosphere interactions in the CO2 climate problem. J. Atmos. Sci. 38, 918–930 (1981).

[8] Cowtan, K. et al. Robust comparison of climate models with observations using blended land air and ocean sea surface temperatures. Geophys. Res. Lett. 42, 6526–6534 (2015).

[9] https://nicholaslewis.org/wp-content/uploads/2016/04/ar5_ebstudy_update_article1b.pdf

[10] Cowtan, K. & Way, R. G. Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. Q. J. R. Meteorol. Soc. 140, 1935–1944 (2014).

[11] See http://www.ecmwf.int/en/about/media-centre/news/2015/ecmwf-releases-global-reanalysis-data-2014-0. The data graphed in the final figure shows the same 1979-2014 trend whether or not coverage is reduced to match HadCRUT4.

[12] Lindsay, R et al. Evaluation of Seven Different Atmospheric Reanalysis Products in the Arctic. J Clim 27, 2588–2606 (2014)

[13] Dodd, MA, C Merchant, NA Rayner and CP Morice. An Investigation into the Impact of using Various Techniques to Estimate Arctic Surface Air Temperature Anomalies. J Clim 28, 1743–1763 (2015).

[14] Kate Marvel, Gavin A. Schmidt, Ron L. Miller and Larissa S. Nazarenko, et al.: Implications for climate sensitivity from the response to individual forcings. Nature Climate Change DOI: 10.1038/NCLIMATE2888 (2015).

[15] https://nicholaslewis.org/appraising-marvel-et-al-implications-of-forcing-efficacies-for-climate-sensitivity-estimates/

[16] https://nicholaslewis.org/marvel-et-al-giss-did-omit-land-use-forcing/
