library(zoo)
library(MBC)
library(dplyr)
library(lubridate)
library(EnvStats)Bias correction of climate models
1 Load necessary libraries
2 Load preprocessed data
As described in the database section, preprocessed datasets can be efficiently reloaded using the readRDS() function. In the code chunk below, we reload the observational and climate projection dataset that was previously prepared and exported. The structure of the R objects is fully preserved: the observed data are stored in a dataframe, while the climate projection data are organized as a nested list, ensuring seamless reuse in subsequent analyses.
hist_obs = readRDS("output_preprocessed/processed_obsdata_lez.rds")
names(hist_obs) = c("Date", "PR", "T", "PET", "Q")
head(hist_obs) Date PR T PET Q
1 1977-01-01 2.85 6.8425 0.75 14.561798
2 1977-01-02 0.00 4.6035 0.90 11.649438
3 1977-01-03 0.00 3.3415 1.55 7.766292
4 1977-01-04 1.70 2.9455 2.25 5.824719
5 1977-01-05 6.40 3.9835 1.00 4.853933
6 1977-01-06 5.95 4.8015 0.70 3.994787
Although data have been prepared and stored for three climate models, only the CNRM model is used here for demonstration purposes. Readers interested in further exploration may test the other models as an optional exercise.m_proj = readRDS(“output_preprocessed/processed_climatedata_lez.rds”) climdataCNRM = clim_proj[[“climdataCNRM”]]
lapply(climdataCNRM, function(df) head(df, 5))
clim_proj = readRDS("output_preprocessed/processed_climatedata_lez.rds")
climdataCNRM = clim_proj[["climdataCNRM"]]
lapply(climdataCNRM, function(df) head(df, 5))$hist
Date PR TMAX TMIN
1 1977-01-01 0 6.4188 -4.5737
2 1977-01-02 0 6.5757 -4.8144
3 1977-01-03 0 7.9662 -2.2713
4 1977-01-04 0 6.7198 -3.4396
5 1977-01-05 0 7.9568 -3.7383
$rcp45
Date PR TMAX TMIN
1 2070-01-01 0.0000000 2.5586 -6.2192
2 2070-01-02 1.1682000 4.4052 -2.5317
3 2070-01-03 0.0019633 9.8453 -1.0744
4 2070-01-04 0.1963200 8.2983 -1.3186
5 2070-01-05 3.1049000 4.6736 -2.7440
$rcp85
Date PR TMAX TMIN
1 2070-01-01 0.290380 10.0550 1.87530
2 2070-01-02 3.892100 9.8889 0.77607
3 2070-01-03 0.623500 12.8670 5.34700
4 2070-01-04 0.000000 15.0420 4.20750
5 2070-01-05 0.080232 15.0380 1.70360
3 Calculate potential evapotranspiration (PET)
Before applying bias correction, we first compute the PET from the climate model data. We will use the Hargreaves and Samani equation (Hargreaves & Samani, 1985), a widely adopted temperature-based method that estimates PET from daily temperature extremes and extraterrestrial radiation. Calculating PET at this stage ensures that both precipitation and evapotranspiration are consistently represented, which is essential for subsequent hydrological modelling and bias-correction procedures. To simplify the process, we have provided a ready-to-use function named calc_hargreaves(...) available in the helper functions folder of this practical’s repository. The chunk below source this function:
github_repo = paste0("https://raw.githubusercontent.com/",
"diopBachir/tp_modhydroclimat/main/")
source(file.path(github_repo, "helper_functions/calc_hargreaves.R"))# Apply the calc_hargreaves() function to the 'hist', 'rcp45', and 'rcp85'
climdataCNRM = lapply(climdataCNRM,
function(df)
{
df$PET = calc_hargreaves(df, lat = 44.1958)
df[, c("Date", "PR", "PET")]
})
lapply(climdataCNRM, function(df) head(df, 5))$hist
Date PR PET
1 1977-01-01 0 0.6549132
2 1977-01-02 0 0.6684382
3 1977-01-03 0 0.7041511
4 1977-01-04 0 0.6641879
5 1977-01-05 0 0.7342219
$rcp45
Date PR PET
1 2070-01-01 0.0000000 0.4991833
2 2070-01-02 1.1682000 0.5232178
3 2070-01-03 0.0019633 0.7814048
4 2070-01-04 0.1963200 0.7076990
5 2070-01-05 3.1049000 0.5511199
$rcp85
Date PR PET
1 2070-01-01 0.290380 0.7171000
2 2070-01-02 3.892100 0.7403796
3 2070-01-03 0.623500 0.7864600
4 2070-01-04 0.000000 0.9676216
5 2070-01-05 0.080232 1.0305621
4 Bias correction of climate model data
Bias-correction techniques are commonly applied to reduce systematic discrepancies between climate model simulations and observations, thereby making climate projections suitable for hydrological applications. A wide range of methods has been developed to correct biases in climate variables simulated by climate models. In this practical, we will apply the MBC method. The Multivariate Bias Correction is designed to calibrate and apply multivariate bias-correction algorithms to climate model simulations involving multiple climate variables simultaneously (Cannon et al., 2015; Cannon, 2016, 2018).
Both MBC and CDF-t aim to reduce biases in climate model outputs, but they differ in scope and complexity. MBC handles multiple variables simultaneously, accounting for inter-variable correlations, whereas CDF-t focuses on single-variable distributions, making it simpler but limited to univariate correction. In practice, the choice of method depends on the variables of interest and the need to preserve multivariate dependence structures in impact studies.
4.1 Applying the MBCn algorithm
We will use the MBCn (N-dimensional Multivariate Bias Correction) algorithm, from the MBC package, to perform bias correction. The MBCn method is based on a probability density function transformation, making it a powerful approach for correcting biases in climate model outputs (see Cannon (2018) for methodological details).
Unlike traditional univariate bias-correction methods that adjust each variable independently, MBCn corrects multiple variables simultaneously, such as precipitation and potential evapotranspiration (PET), while explicitly preserving their multivariate dependence structure. This property is particularly important for hydrological modelling, where interactions between climate variables directly influence soil moisture dynamics, evapotranspiration, and runoff generation.
4.1.1 Accounting for seasonality
To properly account for seasonality, the bias-correction procedure is applied separately for each calendar month. The data are first split into monthly subsets (e.g., all Januaries together, all Februaries together, and so on). The MBCn algorithm is then applied independently to each monthly “slice”, after which the corrected data are recombined into continuous time series.
This seasonal stratification ensures that statistical relationships learned for summer rainfall are not inadvertently applied to winter precipitation, thereby improving the physical consistency and reliability of the bias-corrected climate variables.
4.1.2 Apply bias correction
The helper function seasonalMBCn(...) sourced in the chunk below provides a clean wrapper around the MBCn() routine. It automatically extracts date components, applies the multivariate bias correction to historical and future simulations, and returns bias-corrected datasets that are ready for subsequent hydrological modelling.
source(file.path(github_repo, "helper_functions/seasonalMBCn.R"))# correct RCP45
MBCnSeasonalRCP45 = seasonalMBCn(hist_obs,
climdataCNRM[["hist"]],
climdataCNRM[["rcp45"]])
# correct RCP85
MBCnSeasonalRCP85 = seasonalMBCn(hist_obs,
climdataCNRM[["hist"]],
climdataCNRM[["rcp85"]])
# extract bias corrected data
histCNRM_MBCn_adj = MBCnSeasonalRCP45[['hist']]
rcp45CNRM_MBCn_adj = MBCnSeasonalRCP45[['proj']]
rcp85CNRM_MBCn_adj = MBCnSeasonalRCP85[['proj']]4.1.3 Visualizing bias correction results
To visualize the bias correction results, we plot the the Empirical Cumulative Distribution Functions (ECDFs) to validate the effectiveness of the MBCn algorithm.
Show the code
# variale.s to plot
vars = c("PR", "PET")
# plot layout
par(mfrow = c(1, 2), mar = c(5, 4, 3, 2))
for (v in vars) {
adj_v = paste0(v, "_adj")
obs = na.omit(hist_obs[[v]])
raw = na.omit(histCNRM_MBCn_adj[[v]])
adj = na.omit(histCNRM_MBCn_adj[[adj_v]])
if (v == "PR") {
obs = obs[obs > 0.1]
raw = raw[raw > 0.1]
adj = adj[adj > 0.1]
}
lims = range(c(obs, raw, adj))
qqPlot(x = obs, y = raw,
plot.type = "Q-Q",
qq.line.type = "0-1",
add.line = TRUE,
points.col = "#ffb04e",
line.col = "gray50",
cex = 1.2,
pch = 1,
cex.main = 1.3,
cex.lab = 1.5,
cex.axis = 1.4,
xlim = lims, ylim = lims,
main = paste("MBCn - ", ifelse(v=="PR", "Precip.", "PET")),
xlab = "Observations [mm.d]",
ylab = "RCM data [mm.d]")
coords_adj = qqPlot(x = obs, y = adj, plot.it = FALSE)
points(coords_adj$x, coords_adj$y,
col = "#5aa4ff",
pch = 1, cex = 1.2)
legend("topleft",
legend = c("Obs. vs Raw RCM", "Obs. vs Bias-corrected RCM"),
col = c("#ffb04e", "#5aa4ff"),
pch = 1,
bty = "o",
cex = 1.3,
pt.lwd = 2,
pt.cex = 1.5)
}
To better understand the seasonal patterns of precipitation and potential evapotranspiration, we compute the mean daily cycle over all years. Unlike a standard calendar year, we align the daily cycle with the hydrological year, starting on September 1st. This allows us to capture the seasonal dynamics of wet and dry periods consistently. To reduce day-to-day variability and highlight the underlying seasonal signal, the resulting daily series are further smoothed using a centered 30-day rolling mean before visualization.
Show the code
# Helper: compute hydrological day of year (1 = 1 Sept)
hydro_doy = function(dates)
{
# Shift the year so that 1 Sept = day 1
doy = yday(dates - months(8))
return(doy)
}
# Daily climatology based on hydrological year
daily_climatology_hydro = function(df, vars = c("PR_adj", "PET_adj"))
{
df %>%
mutate(HydroDOY = hydro_doy(Date)) %>%
group_by(HydroDOY) %>%
summarise(across(all_of(vars), ~mean(.x, na.rm = TRUE))) %>%
ungroup()
}
rollm = function(x) rollmean(x, k = 30, fill = NA, align = "right")
par(mfcol = c(1,2), mar = c(5, 5, 2, 1))
# scenarios = c("Hist", "RCP4.5", "RCP8.5")
scenarios = "Hist"
vars = c("PR", "PET")
cols = c("black", "red", "blue") # Observed, Raw, Bias-corrected
raw_list = list(
"Hist" = histCNRM_MBCn_adj
# "RCP4.5" = rcp45CNRM_MBCn_adj,
# "RCP8.5" = rcp85CNRM_MBCn_adj
)
obs_list = list(
"Hist" = hist_obs
# "RCP4.5" = hist_obs,
# "RCP8.5" = hist_obs
)
for(v in vars) {
for(scn in scenarios) {
# Observed, Raw, Bias-corrected
obs_val = daily_climatology_hydro(obs_list[[scn]], vars = v)[[v]]
raw_val = daily_climatology_hydro(raw_list[[scn]], vars = v)[[v]]
adj_val = daily_climatology_hydro(raw_list[[scn]], vars = paste0(v, "_adj"))[[paste0(v, "_adj")]]
# Apply rolling mean
obs_val = rollm(obs_val)
raw_val = rollm(raw_val)
adj_val = rollm(adj_val)
# Axis limits
x_lim = c(1, 365)
y_lim = range(c(obs_val, raw_val, adj_val), na.rm = TRUE)
# Plot panel
plot(1:365, obs_val, type="l", lwd=2, col=cols[1], ylim=y_lim,
xlab="Hydro day of year",
ylab=ifelse(v=="PR", "Precip", "PET"),
main=paste(ifelse(v=="PR", "Precip", "PET"), "-", scn),
cex.lab = 1.5, # axis labels
cex.axis = 1.3, # axis numbers
cex.main = 1.5)
lines(1:365, raw_val, col=cols[2], lwd=1, lty=2)
lines(1:365, adj_val, col=cols[3], lwd=1, lty=3)
# Legend only for top-left panel (PR-Hist)
if(v == "PR" && scn == "Hist") {
legend("topright", legend=c("Observed", "Raw Model", "Bias-corrected"),
col=cols, lwd=c(2,1.5,1.5), lty=c(1,2,3), bty="o", cex=1.2)
}
}
}
As in the previous sections, the bias-corrected climate datasets are saved using the saveRDS() function. This approach preserves the internal structure of the R objects (lists and data frames) and facilitates efficient reloading and reuse in subsequent processing steps.
# Create an output directory if it doesn't exist
if (!dir.exists("output_biascorrection")) dir.create("output_biascorrection")
# Save the bias-corrected climate data while preserving object structure
saveRDS(list("histCNRM_adj" = histCNRM_MBCn_adj,
"rcp45CNRM_adj" = rcp45CNRM_MBCn_adj,
"rcp85CNRM_adj" = rcp85CNRM_MBCn_adj),
file="output_biascorrection/biascorrectedCNRM-MBCn.rds")4.2 Exercise
Now that we have successfully bias-corrected the CNRM dataset, it is time to apply the same rigor to the second Regional Climate Model (RCM) in our database : the SMHI-MPI model.
clim_proj = readRDS("output_preprocessed/processed_climatedata_lez.rds")
climdata_SMHI_MPI = clim_proj[["climdataSHMI_MPI"]]
# lapply(climdata_SMHI_MPI, function(df) head(df, 5))# Apply the calc_hargreaves() function to the 'hist', 'rcp45', and 'rcp85'
climdata_SMHI_MPI = lapply(climdata_SMHI_MPI,
function(df)
{
df$PET = calc_hargreaves(df, lat = 44.1958)
df[, c("Date", "PR", "PET")]
})
# lapply(climdata_SMHI_MPI, function(df) head(df, 5))With the SMHI-MPI projections loaded and the PET calculated, below are your tasks:
Execute the Correction: Run the
seasonalMBCn(...)function across your variable matrix (Precipitation and PET).Validation: Use the provided visualization suite to generate comparative plots.
Analysis: Observe how the distribution of the corrected SMHI-MPI data aligns with the reference observations compared to the raw model output.
Save results: save the bias-corrected data to
output_biascorrection/biascorrectedSMHI-MPI.rds.
Once the seasonalMBCn(...) function has finished processing, your output visualizations should demonstrate the effectiveness of the correction. Specifically, you are looking for the “collapse” of the model’s systematic errors toward the historical observation baseline.
5 What bias correction does NOT do
The days used in climate models often do not correspond exactly to the calendar days of the real world. This is not an error, but rather a deliberate mathematical simplification that makes the extensive computations required for climate projections more manageable. As a result, different climate models use different calendars, relying on specialized calendars, such as the 360-day calendar or the 365-day (no-leap) calendar. Only a few climate models follow the standard (Gregorian) calendar, with 365 days in non-leap years and 366 days in leap years. Therefore, as done with the data used in this practical, it is recommended to convert climate model output to a standard Gregorian calendar before applying bias correction.