--- title: "FEH statistical analysis" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick-start-guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction This quick guide provides an overview of the core FEH functions that can be used to generate extreme flow estimates from single-site or pooling (gauged & ungauged) approaches. It is assumed that the reader has knowledge of the theory of these approaches; this guide is intended to demonstrate how to use the UKFE package rather than to explain the FEH statistical method. The following table provides the key functions for implementing the FEH methods. ```{r echo = FALSE, results = 'asis'} library(knitr) kable(data.frame(Function = c("GetCDs/CDsXML", "GetAM/AMImport", "QuickResults", "QMED", "DonAdj", "Pool", "H2", "DiagPlots", "Zdists", "PoolEst", "Uncertainty", "GenLog/GEV/GenPareto/Gumbel/Kappa3", "POTextract or POTt", "AnnualStat"), Purpose = c("Get catchment descriptors using the NRFA gauge reference number or import them from XML files downloaded from the FEH Web Service", "Get annual maximum data from sites suitable for pooling or from AM files", "Provides default pooling results (gauged, ungauged & fake ungauged), directly from catchment descriptors", "Estimates QMED from catchment descriptors with the option of zero, one or two donors", "Provides a list of donor candidates based on proximity to the study catchment", "Creates a pooling group with the option of excluding one or more sites", "The heterogeneity measure for pooling groups", "Provides a number of diagnostic plots to compare the sites in a pooling group", "The Zdist statistic to determine the distribution with the best fit to the pooling group", "Flood estimates from the pooling group", "Quantifies uncertainty for gauged (enhanced single-site) and ungauged pooling groups", "Four functions for each distribution to fit the distribution and obtain flood frequency estimates and growth factors", "Functions for extracting peaks over threshold data", "Extract annual statistics. The default is maximum, but any statistic can be used such as sum or mean."))) ``` We need to load the package before we can use any of the functions: ```{r setup} library(UKFE) ``` ## Overview and assumptions of the FEH statistical method This section provides a brief overview of the FEH statistical method, and its core assumptions. Examples of how to apply the method can be found later. The FEH statistical method comes under the broader term of 'regional frequency analysis' and employs the "index flood" procedure for estimating peak flows at a single location on a river line. The procedure has two main steps, which are multiplied together for a final result: * Estimate the index flood (in the FEH method, the index flood is QMED - the median annual maximum flow). * Calculate the growth curve (this provides growth factors as a function of return period and is a standardised model of the AMAX distribution). Multiplying the index flood by the growth curve scales the standardised growth factor to produce the flow estimate for the catchment of interest. When the catchment is gauged, the index flood is estimated as the median of the observed annual maximum sample (assuming suitable data quality). When the catchment is ungauged, the index flood is estimated using a multiple regression equation, using catchment descriptors as predictors. The growth curve is estimated by pooling gauged data from catchments which are similar to the catchment of interest. A statistical distribution is assumed. Then, the average parameters of the fitted distribution (L-moment ratios) from the AMAX samples in the pooling group are used to estimate the growth curve. There are numerous choices and possible adjustments which can be made for each step. The UKFE package, in general, provides default settings with no adjustments; however, these can be changed by the user. There are several key underlying assumptions in FEH statistical analysis. As stated in Hosking and Wallis (1997), the index-flood procedure makes the following assumptions: * Observations at any given site are identically distributed. * Observations at any given site are serially independent. * Observations at different sites are independent. * Frequency distributions at different sites are identical apart from a scale factor. * The mathematical form of the regional growth curve is correctly specified. The first of these leads to the assumption of stationarity (i.e. observations of past flood events are assumed to represent the behaviour of future events). Where this is not the case, non-stationary methods can be applied. For example, non-stationary distributions can be fitted using other software such as the `nonstat` R package (Hecht and Zitzmann, 2025). The QMED regression equation was calibrated using an approach similar to the generalised least squares method. This allows for covariance of the model residuals (errors). Firstly, the observed QMEDs (on which it was calibrated) were logarithmically transformed. The catchment descriptors used were also transformed. With this in mind, the assumptions of the QMED equation are as follows: * The relationship between the transformed independent variables (predictors) and the logarithmically transformed dependent variable (QMED) is linear. * Observations (observed QMEDs used in calibration) are independent of each other. * Model and sampling errors are independent of each other. * Model and sampling errors are normally distributed and have a mean of zero. * Predictor variables are independent. * The cross-correlation of model errors can be described by the distance between catchment centroids, and the form of the associated correlation matrix is known for the calibration process. ## Getting catchment descriptors One of the first steps in any FEH analysis is to obtain catchment descriptors (CDs). See the 'Data input' article for information on functions for importing CDs. ## Quick results Once you have some catchment descriptors, you can start to use many of the FEH based functions. The full process, covered shortly, is encompassed in the `QuickResults()` function. This provides results from a default pooling group for gauged or ungauged catchments, automatically accounts for urbanisation, and provides a two donor transfer if `gauged = FALSE`. The default distribution is the generalised logistic distribution. Further details and assumptions are provided in the function's help file. First, we'll obtain some CDs for a site suitable for pooling, allowing us to conduct both a gauged estimate and an as-if ungauged estimate (i.e., treating the gauged site as if no flow record were available, using only catchment descriptors). The following code then provides estimates and plots for both the gauged and as-if ungauged cases. For the gauged case, the plot shows the default gauged growth curve for site 55002. The results from the default enhanced single-site analysis at gauge 55002 are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF). Two additional columns quantify the 95% confidence interval of the peak flow estimates. The second element is the estimated L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide distribution parameters (location, scale and shape for the default generalised logistic distribution). The third provides distribution parameters for the growth curve, and the fourth provides distribution parameters for the frequency curve. ```{r fig.alt="Plot showing growth curves from FEH QuickResults calculations. Pooled estimates are shown as a solid line, single-site estimates as a dashed green line, and observed flood data as blue circles. The x-axis shows the logistic reduced variate and the y-axis shows Q/QMED, that is, flow divided by the median annual flood. A scale for the return period is displayed at the bottom right. The plot illustrates how well the pooled and single-site growth curves map onto the observed flood data. Neither curve matches all observations exactly, but both provide a reasonable fit overall. Both curves follow the expected growth-curve shape, with increasing discharge at higher return periods."} # Get catchment descriptors for NRFA site 55002 and save to an object called CDs.55002 CDs.55002 <- GetCDs(55002) # Obtain estimates and plots for the gauged scenario using default settings QuickResults(CDs.55002, gauged = TRUE) ``` The fake ungauged quick results estimate removes the pooled site of interest from the process and implements an ungauged assessment. Here, the plot shows the default fake ungauged growth curve for site 55002. The same outputs are again printed to the console, but this time they provide the 68% confidence interval. ```{r fig.alt="Plot showing how multiple single-site growth curves are pooled to form the default ‘fake ungauged’ growth curve. The x-axis shows the logistic reduced variate and the y-axis shows Q/QMED, that is, flow divided by the median annual flood. Several solid black lines represent growth curves from individual sites, and a red dashed line represents the pooled growth curve with the target site excluded. All curves follow the expected growth-curve shape, with increasing discharge at higher return periods. The pooled curve lies within the spread of the individual site curves and, as expected, closely resembles their average, illustrating how pooling smooths variability to produce a representative ungauged growth curve."} # Obtain estimates and plots for the fake ungauged scenario using default settings QuickResults(CDs.55002, FUngauged = TRUE) ``` The default setting for the `QuickResults()` function is to treat the site as ungauged, in which case, for a catchment with no gauge, the code can simply be run with only the first argument (the CDs) provided. If the site is not urban and the CDs are from a site suitable for pooling, the site itself will be included (so it won't be truly ungauged unless we use `FUngauged = TRUE`). Note that if the default ungauged is used, then ESS will not be applied, even if the gauge of interest is at the top of the pooling group. The gauge will be included but won't be weighted according to the ESS method. The QMED is estimated as if ungauged; however, because the CDs are the same as those of the closest donor (which are derived from the gauged site), the QMED will end up being the same as the observed one. ## Estimating QMED There is a `QMED()` function which provides the ungauged QMED (median annual maximum flow) estimate from catchment descriptors. There is an option to adjust this using one or two donor sites. The simplest case is to provide catchment descriptors with no donor transfer: ```{r} # Estimate QMED from the catchment descriptors for site 55002 QMED(CDs.55002) ``` By default, this gives rural QMED results. If the site is urbanised (has an URBEXT2000 value greater than 0.03), there will be a warning and a suggestion to change the `UrbAdj` argument to TRUE. To undertake a donor transfer (with the aim of improving the estimate), a list of donor site candidates is necessary. For this, there is the `DonAdj()` function, which provides the closest *N* gauges (default of 10) to the site (geographically by catchment centroid) that are suitable for QMED, along with catchment descriptors. It also provides the donor adjusted result beside each one in the last column. ```{r} # Identify donor catchments based on the catchment descriptors for site 55002 DonAdj(CDs.55002) ``` You can apply two donors by selecting the two favoured candidates from the list and including a vector of the site references in the `Don2` argument of the `QMED()` function. The following, then, provides a two-donor adjusted QMED (the associated weights are also provided in the output): ```{r} # Estimate QMED from the catchment descriptors for site 55002 using gauges with NRFA # IDs 55007 and 55016 as donor sites QMED(CDs.55002, Don2 = c(55007, 55016)) ``` The user should be aware of the following specifics about the UKFE implementation of QMED estimation. However, note that there is an inherent flexibility in 'R', so any code can be edited. * The `QMED()` function is written to use a maximum of two donors. Note that the `QMEDDonEq()` function can be used, and a number of donors could be used in turn - the user would need to choose the donor weights. * The `DonAdj()` function only provides candidate donor sites from the NRFA "suitable for QMED" dataset. However, the `QMEDDonEq()` can be applied, which is entirely flexible regarding inputs. * By default, the `QMED()` function uses the URBEXT2000 catchment descriptor without an urban expansion. There is an argument within the function called `uef`, which is set to `FALSE`. This can be set to `TRUE` for the urban expansion to be applied for the present year. There is also the `UEF()` function, which can be applied independently, and for any year. * Urban donors should be avoided as far as possible. However, if an urban donor is deemed to be the best choice, the ungauged QMED estimate of the donor (or donors) can be urban adjusted by setting the `DonUrbAdj` argument to `TRUE`. ## Creating a pooling group To create a pooling group, the `Pool()` function can be used. If it is for a gauged site, the site will be automatically included in the group, unless it is urban. If you wish to include the urban site, the `iug` argument can be set to `TRUE`. In which case, it is also standard practice to set the `DeUrb` argument to `TRUE`, to de-urbanise the subject site so that the estimate can undergo urban adjustment afterwards. To create and view the pooling group: ```{r} # Create a pooling group based on the catchment descriptors for site 55002 and # save to an object called Pool.55002 Pool.55002 <- Pool(CDs = CDs.55002) # View the pooling group Pool.55002 ``` You may now wish to assess the group for heterogeneity and see which distribution fits best. The `H2()` function can be used to check for heterogeneity. ```{r, echo = FALSE} set.seed(5) ``` ```{r} # Check for heterogeneity H2(Pool.55002) ``` This group appears to be homogeneous, but if a group is heterogeneous, you may wish to use the `DiagPlots()` function to check for any outliers etc. You can also look at the discordancy measures, which can be found in the second to last column of the pooling group created. ```{r fig.alt="Diagnostic plot consisting of 10 panels used to aid review of catchment descriptors. The first seven panels show histograms of all catchments, with each of the ten sites marked by small crosses, for the following variables: Area, SAAR, PROPWET, FARL, FPEXT, BFIHOST19, and URBEXT2000. Two scatter plots display L-skew versus L-CV and L-skew versus L-kurtosis. The final panel shows the ten catchment locations as points on a UK outline, plotted by Eastings and Northings. Together, the plots allow comparison of these sites against the distribution of all catchments. The ten sites appear generally central and unremarkable across descriptor space, indicating they are fairly typical relative to the wider catchment set."} # Create diagnostic plots to aid review of the pooling group DiagPlots(Pool.55002) ``` Let's pretend that site 8010 and site 76017 are to be removed (because they have the highest discordancy - in reality, a more thorough analysis would be undertaken). To remove the sites, we recreate the pooling group and use the `exclude` option. The function automatically adds the next site(s) with the lowest similarity distance measure if the total number of years of AMAX in the group has dropped below 500 (or whatever we set as the `N` argument in the `Pool()` function, which has a default of 500). ```{r} # Recreate the pooling group, excluding NRFA sites 8010 and 76017 Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017)) # View the new pooling group Pool.55002 # Check for heterogeneity H2(Pool.55002) ``` ### Choosing a distribution. Once you are happy with the pooling group, you can check for the best distribution. The `Zdists()` function can be used to do this. This compares the Generalised Extreme Value (GEV), Generalised logistic (GenLog), Gumbel and Kappa3 distributions. The method behind it is outlined in Hosking and Wallis (1997). Note that it is not the same as the method detailed in Science Report SC050050 (Kjeldsen *et al*., 2008). The difference is described in the function's help file and is necessary to encompass the two-parameter Gumbel distribution. ```{r} # Calculate goodness-of-fit measure Zdists(Pool.55002) ``` The output is a list, where the first element contains the Z-scores for each distribution, and the second element states which distribution has the best fit. In this case, the Kappa3 distribution appears to fit best. The L-CV and L-skew for each site are provided in the pooling group and are derived from the NRFA AMAX data. If the user has further information, for example, an extra year of data to provide a new annual maximum, the L-CV and L-skew can be calculated using the `Lcv()` & `LSkew()` functions. The user's data can be pasted in or read into the 'R' environment. Data can be read in using a function such as `read.csv()`, or a column of data can be pasted in from Excel by running the following code and then pressing "Ctrl+V" in the console (assuming you have already copied the required data): ```{r, eval = FALSE} MyData <- scan() ``` This creates a numeric vector called `MyData` that contains your flow data. The `Lcv()` & `LSkew()` functions take a numeric vector of flows as their only input, so you can then use `Lcv(MyData)` and `LSkew(MyData)` to derive the L-moment ratios. The relevant results can be updated in the pooling group by applying the `LRatioChange()` function. For example, if you wanted a new pooling group with updated L-moment ratios for site 55002, the following code could be used: ```{r} # Update the pooling group to use user-supplied L-CV and L-skew values for site 55002 UpdatedPool <- LRatioChange(Pool.55002, SiteID = 55002, lcv = 0.187, lskew = 0.164) # View the updated pooling group UpdatedPool ``` ## Final estimates (the growth curve and results) To derive the final estimates, the `PoolEst()` function can be applied to the pooling group. It is necessary to provide an input for the `QMED` argument. Therefore, we will first obtain a QMED estimate (in this case, we will use the data available from the NRFA). To estimate QMED, we can either use the `GetAM()` function and assess the data manually, or we can get it more directly from the `QMEDData` data frame embedded within the package (these are calculated as the median of the AMAX data (rejected years excluded) from the NRFA Peak Flow Dataset). The following code provides the latter. ```{r} # Extract QMED from the QMEDdata data frame within the UKFE package GetQMED(55002) ``` We can use functions within functions so we can include the above in `PoolEst()`: ```{r} # Derive the growth curve and final flow estimates based on the pooling group and QMED PoolEst(Pool.55002, QMED = GetQMED(55002), gauged = TRUE) ``` The results are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q), growth factor estimates (GF), and factorial standard error. The latter is calculated using methods detailed in *Circulation* edition 152 (the British Hydrological Society magazine) and you can use it to calculate confidence intervals. The second element is the estimated pooled L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide parameters (location, scale, and shape) for the standardised and non-standardised distributions, respectively. If the RP argument has been supplied by the user rather than left as the default, then only the first two elements are returned. The user should be aware of the following specifics about how UKFE employs the pooled method: * The default QMED FSE is set to 1.55, as recommended in *Statistical procedures for flood frequency estimation* (*Flood Estimation Handbook*, Volume 3, Section 13.9.2). This value represents the factorial standard error associated with the combined model and sampling uncertainty of the QMED model. You can update FSE if donor catchments are applied or if the QMED is estimated from a gauged sample. * The default distribution applied is the generalised logistic distribution. * The final FSE comprises the FSE associated with QMED estimation and growth curve estimation. The FSE associated with the growth curve quantifies sampling error (aleatoric uncertainty) only. The FSE associated with QMED includes model uncertainty. In general, the uncertainty associated with the modelling process or the hydrometric data collection is not included in the FSE. Many choices are made during the process, and deriving a range of results from a range of justifiable choices is a way to quantify some of the epistemic uncertainty. * By default, urban adjustment for the L-CV and L-skew is not applied. If the user thinks these are necessary, they can set the `UrbAdj` argument to `TRUE`. The `EVPool()` function can be used to plot a growth curve (which should look like the one from the `QuickResults()` example above). ```{r fig.alt="Plot showing growth curves from FEH QuickResults calculations. Pooled estimates are shown as a solid line, single-site estimates as a dashed green line, and observed flood data as blue circles. The x-axis shows the logistic reduced variate and the y-axis shows Q/QMED, that is, flow divided by the median annual flood. A scale for the return period is displayed at the bottom right. The plot illustrates how well the pooled and single-site growth curves map onto the observed flood data. Neither curve matches all observations exactly, but both provide a reasonable fit overall. Both curves follow the expected growth-curve shape, with increasing discharge at higher return periods."} # Plot the growth curve EVPool(Pool.55002, gauged = TRUE) ``` For the **ungauged catchment**, the same process can be followed. First, we'll exclude the site from our pooling group. We will assume we went through a rigorous pooling group analysis and concluded all was well: ```{r fig.alt="Plot showing how multiple single-site growth curves are pooled to form the default ‘fake ungauged’ growth curve. The x-axis shows the logistic reduced variate and the y-axis shows Q/QMED, that is, flow divided by the median annual flood. Several solid black lines represent growth curves from individual sites, and a red dashed line represents the pooled growth curve with the target site excluded. All curves follow the expected growth-curve shape, with increasing discharge at higher return periods. The pooled curve lies within the spread of the individual site curves and, as expected, closely resembles their average, illustrating how pooling smooths variability to produce a representative ungauged growth curve."} # Recreate the pooling group, excluding NRFA gauge 55002 to make it an ungauged analysis PoolUG.55002 <- Pool(CDs.55002, exclude = 55002) # View the new pooling group PoolUG.55002 # Estimate QMED from the catchment descriptors with two donors (note that the QMED # value needs to be extracted using `$QMEDs.adj` as there are three elements to the # output when two donors are used and the PoolEst function expects just the QMED value) CDsQmed.55002 <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj # Use the pooling group and the ungauged QMED estimate to provide the pooled estimates Results55002 <- PoolEst(PoolUG.55002, QMED = CDsQmed.55002) # Print the results Results55002 # Plot a growth curve for the pooling group EVPool(PoolUG.55002) ``` The same results are provided as for the gauged case. ## Single site analysis The quickest way to get results for a single sample are the `GEV()`, `GenLog()`, `Kappa3()`, `Gumbel()` and `GenPareto()` functions. For example, the following code can be used to obtain a 75-year return period estimate for an AMAX sample called `MyAMAX` using the generalised logistic distribution: ```{r, eval = FALSE} GenLogAM(MyAMAX, RP = 75) ``` An annual maximum series can be obtained for sites suitable for pooling using the `GetAM()` or `AMImport()` function (see the 'Data input' article). We will retrieve and plot the AMAX for our site 55002: ```{r fig.alt="Bar chart of annual maximum river flow. The x-axis shows years, and the y-axis shows peak flow in cubic meters per second. Each bar represents the highest flow in that year. The flows vary from year to year, with several notably high peaks in recent years."} # Extract the AMAX data for NRFA site 55002 AM.55002 <- GetAM(55002) # View the head of the AMAX series head(AM.55002) # Plot the AMAX data AMplot(AM.55002) ``` The `AMplot()` function returns a bar plot of the AMAX series. There are four functions for each of the GEV, GenLog, Gumbel, Kappa3, and GenPareto distributions. In the function names below, `XXX` should be replaced by the distribution prefix (`GEV`, `GenLog`, `Gumbel`, `Kapp3`, or `GenPareto`). For the Generalised Pareto distribution, the first function is named `GenParetoPOT()` instead of `GenParetoAM()`. * `XXXAM()` - estimates return levels or return periods directly from an AMAX (or POT for GenPareto) series. * `XXXEst()` - estimates return levels or return periods from user-supplied parameters. * `XXXGF()` - provides the growth factor as a function of L-CV, L-skew, and RP (the Gumbel version only requires the L-CV and RP). * `XXXPars` - estimates distribution parameters from an AMAX or POT series (using L-moments, or MLE in some cases). We can look at the fit of the distributions using the `ERPlot()` or `EVPlot()` functions. By default, `ERPlot()` compares the percentage difference between simulated results with the observed for each rank of the data. Another option (see the `ERType` argument) compares the simulated flows for each rank of the sample with the observed flows of the same rank. For both plots, 500 simulated samples are used. The function's help file provides more information, including how to compare the distribution estimated from pooling with the observed data. This is an updated version of that described in Hammond (2019). The `EVPlot()` function plots the extreme value frequency curve or growth curve with observed sample points. The uncertainty is quantified by bootstrapping. The following code generates an extreme rank plot and an extreme value plot using the GEV distribution: ```{r fig.alt="Extreme rank frequency curve illustrating uncertainty around the estimated growth curve. The x-axis shows the rank and the y-axis shows the percent difference. Observed data are shown as blue circles, plotting along the 0% line. The central modelled curve, shown as a solid line, tracks closely to the observations, while dashed lines represent the 90% bootstrapped confidence interval. The uncertainty bounds remain narrow through the central portion of the curve and expand slightly toward the lower and upper ranks."} # Generate an extreme rank plot for the AMAX data for NRFA site 55002 and the GEV # distribution and set a user-defined title (using the 'main' argument) ERPlot(AM.55002$Flow, dist = "GEV", main = "Extreme rank plot for NRFA site 55002: GEV") ``` ```{r fig.alt="Extreme rank frequency curve illustrating uncertainty around the estimated growth curve. The x-axis shows the logistic reduced variate and the y-axis shows Q/QMED, that is, flow divided by the median annual flood. Observed data are shown as blue circles, forming a concave-up increasing curve. The central modelled estimate, shown as a solid line, tracks these observations closely. Dashed lines represent the 90% bootstrapped confidence interval, which remains tight across the centre of the distribution and widens toward the extremes."} # Generate an extreme value plot for the AMAX data for NRFA site 55002 and the GEV # distribution and set a user-defined title (using the 'Title' argument) EVPlot(AM.55002$Flow, dist = "GEV", Title = "Extreme value plot for NRFA site 55002: GEV") ``` As an example, we can estimate the 100-year flow. Then, estimate the return period of the maximum observed flow. *Make sure to select the "Flow" column*. ```{r} # Estimate the 100-year flow directly from the AMAX data for NRFA site 55002 using # the GEV distribution GEVAM(AM.55002$Flow, RP = 100) # Estimate the return period of the maximum observed flow directly from the AMAX # data using the GEV distribution GEVAM(AM.55002$Flow, q = max(AM.55002$Flow)) ``` We can also estimate the parameters of the GEV from the data, using the default L-moments, and estimate quantiles as a function of return period (and vice versa) from user input parameters. ```{r} # Estimate the parameters of the GEV distribution from the AMAX data for NRFA site 55002 GEVPars55002 <- GEVPars(AM.55002$Flow) # View the parameters GEVPars55002 # Estimate the 75-year flow using these GEV parameters GEVEst(loc = GEVPars55002$Loc, scale = GEVPars55002$Scale, shape = GEVPars55002$Shape, RP = 75) ``` If we derive L-moments for the site, we can then estimate growth factors. The `LMoments()` function does this. We can also calculate L-CV and L-skew separately using the `Lcv()` and `LSkew()` functions. ```{r} # Estimate the 75-year GEV growth factor from the L-moments for the AMAX data for # NRFA site 55002 GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75) ``` We can estimate the 75-year flow by combining this latter result with QMED (the median of the AMAX record). ```{r} # Estimate the 75-year flow by multiplying the growth factor by the median AMAX flow GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75) * median(AM.55002$Flow) ``` This gives a different 75-year flow estimate from estimating it directly using the GEV parameters because here we are incorporating QMED from the data. ## Peaks over threshold A peaks over threshold (POT) series can also be obtained from a time series using the `POTextract()` function. A daily sample of the Thames at Kingston flow is included in the package for example purposes (the data frame `ThamesPQ`). The `POTextract()` function can be used on a single vector or a data frame, with dates in the first column and the time series of interest in the second. In the `ThamesPQ` data frame, the Thames flow is in the third column and the date in the first. Therefore, the POT series can be extracted as follows: ```{r fig.alt="Time series plot of flow showing extraction of peaks over threshold events. The x-axis shows time and the y-axis shows flow magnitude. Red circles mark exceedances above the blue threshold line. A green line indicates div, which defines independence between peaks; in this case it is set to the mean. Exceedances occur throughout the record, with a lull around 2005, followed by a marked increase in frequency afterwards."} # Extract a POT series from the Thames at Kingston daily mean flow data, selecting # the first and third columns from the ThamesPQ data frame which contain the date # and flow data respectively. The threshold is set as the 90th percentile. POT.Thames <- POTextract(ThamesPQ[, c(1, 3)], thresh = 0.9) # View the output POT.Thames ``` This produces a figure showing the daily mean flow data with peaks over the threshold shown as red circles. The blue line is the threshold, and the green line is the mean flow. Independent peaks are chosen above the threshold. If peaks are separated by a drop below the mean flow (green line), they are considered independent. More details can be found in the function's help file (including other options for defining independence and how to rename axis labels and the plot title). Note that you can also use the `POTt()` function for extracting peaks over threshold. It works significantly faster, but only has the option of separating peaks by the number of timesteps. The POT approach is similar to the AMAX approach, except that we need to convert from the POT RP scale to the annual RP scale. To do so, the functions have a peaks per year (ppy) argument. The `POTextract()` function provides the number of peaks per year (in this case, 1.867). To estimate the associated 100-year daily mean flow peak, the GenPareto function can be used as follows: ```{r} # Estimate the 100-year flow for the Thames at Kingston using the extracted POT series GenParetoPOT(POT.Thames$peak, ppy = 1.867, RP = 100) ``` ## Uncertainty Methods for quantifying aleatoric (sampling) uncertainty are provided here for three estimation methods: single-site, enhanced single-site (or gauged pooling) and ungauged pooling. The uncertainty is predicated on forming a sampling distribution for the statistic of interest, and in the three cases (single site, gauged pooling, and ungauged pooling), it is done using bootstrapping. In the pooling approaches (gauged and ungauged), the whole pooling group is bootstrapped, and the weighted estimates provided on every iteration. ### Single-site Extract an annual maximum sample with the following code. ```{r} # Extract the AMAX for NRFA site 55002 AM.55002 <- GetAM(55002) ``` The uncertainty for the 100-year flow can be estimated using the `Bootstrap()` function. The 95% confidence intervals are provided by default, but this can be adjusted using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column. Note that the `Bootstrap()` function can also be applied for any function/statistic, and the sampling distribution developed can be returned. ```{r} # Estimate uncertainty for the GEV-estimated 100-year flow Bootstrap(AM.55002$Flow, Stat = GEVAM, RP = 100) ``` ### Ungauged pooling The `Uncertainty()` function can be used for the pooled estimates to quantify the sampling uncertainty. The estimated QMED and the pooling group are required. Furthermore, the factorial standard error needs to be provided for the ungauged case, using the `fseQMED` argument (the default is 1.55, as recommended in *Statistical procedures for flood frequency estimation* (*Flood Estimation Handbook*, Volume 3, Section 13.9.2)). Note that by default the generalised logistic distribution is selected for the estimates, but this can be changed. The following example to estimate uncertainty for the ungauged pooled estimates for NRFA site 55002 uses the catchment descriptors object created earlier (`CDs.55002`). This provides the 95% confidence interval by default, and this can be changed using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column. There is an option to plot the flood frequency curve (return level plot), which includes the confidence interval. ```{r fig.alt="Return level plot showing estimated flow values for a range of return periods. The x-axis shows the return period on a logarithmic scale, and the y-axis shows flow. The central estimate is shown as a solid black line that is nearly straight, sloping upward. Dashed black lines represent the confidence interval: the lower bound lies close to horizontal at around 250, rising only slightly with increasing return period, while the upper bound increases much more rapidly, forming a steep concave-up curve. Together, the lines illustrate how design flows increase with return period and the asymmetric uncertainty in these estimates."} # Estimate QMED for site 55002 from catchment descriptors using two donors QMEDEst <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj # Create an ungauged pooling group for site 55002 PoolUG.55002 <- Pool(CDs.55002, exclude = 55002) # Estimate the uncertainty of the pooling results Uncertainty(PoolUG.55002, qmed = QMEDEst) ``` ### Enhanced single-site The enhanced single-site case is similar to the above, but the `Gauged` argument is set to `TRUE`, and there is no need to set `qmed`; it is derived from the observed AMAX if `Gauged = TRUE`. It is important to note that when `Gauged` equals `TRUE`, the top site in the pooling group is assumed to be the "gauged" site of interest. This will happen by default because the similarity distance measure should be zero - assuming the that "gauged" site is within the `NRFAData` data frame. The following example uses the gauged pooling group we previously derived: ```{r} # Create a gauged pooling group for site 55002 Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017)) ``` The uncertainty for a range of return levels can then be quantified as follows: ```{r fig.alt="Return level plot showing estimated flow values for a range of return periods. The x-axis shows the return period on a logarithmic scale, and the y-axis shows flow. The central estimate is shown as a solid black line, which forms a fairly straight diagonal with a moderate slope, slightly shallower than a one-to-one (x = y) line. Dashed black lines represent the confidence interval and diverge from the central estimate, forming a cone-shaped band that widens as the return period increases. The plot illustrates how design flows increase with return period and highlights the growing uncertainty at longer return periods."} # Estimate the uncertainty of the pooling results Uncertainty(Pool.55002, Gauged = TRUE) ``` ## Exporting results For use outside of 'R', outputs can be saved as objects and then written to CSV files. For example, to save the peak flow estimates from earlier: ```{r, eval = FALSE} # Save the peak flow estimates to an object called 'Results.55002' Results.55002 <- PoolEst(PoolUG.55002, QMED = CDsQmed.55002)[[1]] # Write to csv write.csv(Results.55002, "my/file/path/Results55002.csv", row.names = FALSE) ```