| Type: | Package | 
| Title: | Package for Environmental Statistics, Including US EPA Guidance | 
| Version: | 3.1.0 | 
| Date: | 2025-04-17 | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | MASS, ggplot2, nortest | 
| Suggests: | lattice, qcc, sp, boot, tinytest, covr, Hmisc | 
| Description: | Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>). | 
| License: | GPL (≥ 3) | 
| URL: | https://github.com/alexkowa/EnvStats, https://alexkowa.github.io/EnvStats/ | 
| LazyLoad: | yes | 
| LazyData: | yes | 
| NeedsCompilation: | no | 
| Packaged: | 2025-04-24 05:09:48 UTC; alex | 
| Author: | Steven P. Millard [aut], Alexander Kowarik [cre] | 
| Maintainer: | Alexander Kowarik <alexander.kowarik@statistik.gv.at> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-04-24 12:00:07 UTC | 
Trichloroethylene Concentrations Before and After Remediation
Description
Trichloroethylene (TCE) concentrations (mg/L) at 10 groundwater monitoring wells before and after remediation.
Usage
data(ACE.13.TCE.df)
Format
A data frame with 20 observations on the following 3 variables.
- TCE.mg.per.L
- TCE concentrations 
- Well
- a factor indicating the well number 
- Period
- a factor indicating the period (before vs. after remediation) 
Source
USACE. (2013). Environmental Quality - Environmental Statistics. Engineer Manual EM 200-1-16, 31 May 2013. Department of the Army, U.S. Army Corps of Engineers, Washington, D.C. 20314-1000, p. M-10. https://www.publications.usace.army.mil/Portals/76/Publications/EngineerManuals/EM_200-1-16.pdf.
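One possible way to use these data (a sketch only; the Period factor is assumed to have levels "Before" and "After", and observations are assumed to be ordered by well within each period):
  # Summarize TCE by period and run a paired Wilcoxon signed rank test,
  # pairing the before and after measurements by well.
  data(ACE.13.TCE.df)
  before <- with(ACE.13.TCE.df, TCE.mg.per.L[Period == "Before"])
  after  <- with(ACE.13.TCE.df, TCE.mg.per.L[Period == "After"])
  summary(before)
  summary(after)
  wilcox.test(before, after, paired = TRUE)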
Randomly sampled measurements of an analyte in soil samples.
Description
Analyte concentrations (µg/g) in 11 discrete environmental soil samples.
Usage
    BJC.2000.df
    data(BJC.2000.df)
Format
A data frame with 11 observations on the following 4 variables.
- Analyte.char
- Character vector indicating analyte concentrations. Nondetects indicated with the letter U after the measure (e.g., 0.10U) 
- Analyte
- numeric vector indicating analyte concentration. 
- Censored
- logical vector indicating censoring status. 
- Detect
- numeric vector of 0s (nondetects) and 1s (detects) indicating censoring status. 
Source
BJC. (2000). Improved Methods for Calculating Concentrations Used in Exposure Assessments. BJC/OR-416, Prepared by the Lockheed Martin Energy Research Corporation. Prepared for the U.S. Department of Energy Office of Environmental Management. Bechtel Jacobs Company, LLC. January, 2000. https://rais.ornl.gov/documents/bjc_or416.pdf.
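A brief usage sketch (base R only): tabulate the censoring status and summarize the detected analyte concentrations.
  data(BJC.2000.df)
  # Number of censored (nondetect) and uncensored (detect) observations
  with(BJC.2000.df, table(Censored))
  # Summary of the detected concentrations only
  with(BJC.2000.df, summary(Analyte[!Censored]))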
Lead concentration in soil samples.
Description
Lead (Pb) concentrations (mg/kg) in 29 discrete environmental soil samples from a site suspected to be contaminated with lead.
Usage
    Beal.2010.Pb.df
    data(Beal.2010.Pb.df)
Format
A data frame with 29 observations on the following 3 variables.
- Pb.char
- Character vector indicating lead concentrations. Nondetects indicated with the less-than sign (e.g., <1) 
- Pb
- numeric vector indicating lead concentration. 
- Censored
- logical vector indicating censoring status. 
Source
Beal, D. (2010). A Macro for Calculating Summary Statistics on Left Censored Environmental Data Using the Kaplan-Meier Method. Paper SDA-09, presented at Southeast SAS Users Group 2010, September 26-28, Savannah, GA. https://analytics.ncsu.edu/sesug/2010/SDA09.Beal.pdf.
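A usage sketch in the spirit of the source paper: Kaplan-Meier (nonparametric) estimates of the mean and standard deviation for these left-censored lead concentrations, assuming the EnvStats function enparCensored is available in the installed version.
  data(Beal.2010.Pb.df)
  # Nonparametric (Kaplan-Meier) estimates of the mean and SD for left-censored data
  with(Beal.2010.Pb.df, enparCensored(Pb, Censored))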
Benthic Data from Monitoring Program in Chesapeake Bay
Description
Benthic data from a monitoring program in the Chesapeake Bay, Maryland, covering July 1984 - December 1991.
Usage
Benthic.df
Format
A data frame with 585 observations on the following 7 variables.
- Site.ID
- Site ID 
- Stratum
- Stratum Number (101-131) 
- Latitude
- Latitude (degrees North) 
- Longitude
- Longitude (negative values; degrees West) 
- Index
- Benthic Index (between 1 and 5) 
- Salinity
- Salinity (ppt) 
- Silt
- Silt Content (% clay in soil) 
Details
Data from the Long Term Benthic Monitoring Program of the Chesapeake Bay. The data consist of measurements of benthic characteristics and a computed index of benthic health for several locations in the bay. Sampling methods and designs of the program are discussed in Ranasinghe et al. (1992).
The data represent observations collected at 585 separate point locations (sites). The sites are divided into 31 different strata, numbered 101 through 131, each strata consisting of geographically close sites of similar degradation conditions. The benthic index values range from 1 to 5 on a continuous scale, where high values correspond to healthier benthos. Salinity was measured in parts per thousand (ppt), and silt content is expressed as a percentage of clay in the soil with high numbers corresponding to muddy areas.
The United States Environmental Protection Agency (USEPA) established an initiative for the Chesapeake Bay in partnership with the states bordering the bay in 1984. The goal of the initiative is the restoration (abundance, health, and diversity) of living resources to the bay by reducing nutrient loadings, reducing toxic chemical impacts, and enhancing habitats. USEPA's Chesapeake Bay Program Office is responsible for implementing this initiative and has established an extensive monitoring program that includes traditional water chemistry sampling, as well as collecting data on living resources to measure progress towards meeting the restoration goals.
Sampling benthic invertebrate assemblages has been an integral part of the Chesapeake Bay monitoring program due to their ecological importance and their value as biological indicators. The condition of benthic assemblages is a measure of the ecological health of the bay, including the effects of multiple types of environmental stresses. Nevertheless, regional-scale assessment of ecological status and trends using benthic assemblages are limited by the fact that benthic assemblages are strongly influenced by naturally variable habitat elements, such as salinity, sediment type, and depth. Also, different state agencies and USEPA programs use different sampling methodologies, limiting the ability to integrate data into a unified assessment. To circumvent these limitations, USEPA has standardized benthic data from several different monitoring programs into a single database, and from that database developed a Restoration Goals Benthic Index that identifies whether benthic restoration goals are being met.
Source
Ranasinghe, J.A., L.C. Scott, and R. Newport. (1992). Long-term Benthic Monitoring and Assessment Program for the Maryland Portion of the Bay, Jul 1984-Dec 1991. Report prepared for the Maryland Department of the Environment and the Maryland Department of Natural Resources by Versar, Inc., Columbia, MD.
Examples
  attach(Benthic.df)
  # Show station locations
  #-----------------------
  dev.new()
  plot(Longitude, Latitude, 
      xlab = "-Longitude (Degrees West)",
      ylab = "Latitude",
      main = "Sampling Station Locations")
  # Scatterplot matrix of benthic index, salinity, and silt
  #--------------------------------------------------------
  dev.new()
  pairs(~ Index + Salinity + Silt, data = Benthic.df)
  # Contour and perspective plots based on loess fit
  # showing only predicted values within the convex hull
  # of station locations
  #-----------------------------------------------------
  library(sp)
  loess.fit <- loess(Index ~ Longitude * Latitude,
      data=Benthic.df, normalize=FALSE, span=0.25)
  lat <- Benthic.df$Latitude
  lon <- Benthic.df$Longitude
  Latitude <- seq(min(lat), max(lat), length=50)
  Longitude <- seq(min(lon), max(lon), length=50)
  predict.list <- list(Longitude=Longitude,
      Latitude=Latitude)
  predict.grid <- expand.grid(predict.list)
  predict.fit <- predict(loess.fit, predict.grid)
  index.chull <- chull(lon, lat)
  inside <- point.in.polygon(point.x = predict.grid$Longitude, 
      point.y = predict.grid$Latitude, 
      pol.x = lon[index.chull], 
      pol.y = lat[index.chull])
  predict.fit[inside == 0] <- NA
  dev.new()
  contour(Longitude, Latitude, predict.fit,
      levels=seq(1, 5, by=0.5), labcex=0.75,
      xlab="-Longitude (degrees West)",
      ylab="Latitude (degrees North)")
  title(main=paste("Contour Plot of Benthic Index",
      "Based on Loess Smooth", sep="\n"))
  dev.new()
  persp(Longitude, Latitude, predict.fit,
      xlim = c(-77.3, -75.9), ylim = c(38.1, 39.5), zlim = c(0, 6), 
      theta = -45, phi = 30, d = 0.5,
      xlab="-Longitude (degrees West)",
      ylab="Latitude (degrees North)",
      zlab="Benthic Index", ticktype = "detailed")
  title(main=paste("Surface Plot of Benthic Index",
      "Based on Loess Smooth", sep="\n"))
  detach("Benthic.df")
  rm(loess.fit, lat, lon, Latitude, Longitude, predict.list,
      predict.grid, predict.fit, index.chull, inside)
Abstract: Castillo and Hadi (1994)
Description
Detailed abstract of the manuscript: 
Castillo, E., and A. Hadi. (1994). Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution. Environmetrics 5, 417–432.
Details
Abstract 
Castillo and Hadi (1994) introduce a new way to estimate the parameters and 
quantiles of the generalized extreme value distribution (GEVD) 
with parameters location=\eta, scale=\theta, and 
shape=\kappa.  The estimator is based on a two-stage procedure using 
order statistics,  denoted here by “TSOE”, which stands for 
two-stage order-statistics estimator.  Castillo and Hadi (1994) compare the TSOE 
to the maximum likelihood estimator (MLE; Jenkinson, 1969; Prescott and Walden, 1983) 
and probability-weighted moments estimator (PWME; 
Hosking et al., 1985).
Castillo and Hadi (1994) note that for some samples the likelihood may not have 
a local maximum, and also when \kappa > 1 the likelihood can be made 
infinite so the MLE does not exist.  They also note, as do 
Hosking et al. (1985), that when \kappa \le -1, 
the moments and probability-weighted moments of the GEVD do not exist, hence 
neither does the PWME.  (Hosking et al., however, claim that in practice the 
shape parameter usually lies between -1/2 and 1/2.)  On the other hand, the 
TSOE exists for all values of \kappa.
Based on computer simulations, Castillo and Hadi (1994) found that the 
performance (bias and root mean squared error) of the TSOE is comparable to the 
PWME for values of \kappa in the range -1/2 \le \kappa \le 1/2.  
They also found that the TSOE is superior to the PWME for large values of 
\kappa.  Their results, however, are based on using the PWME computed 
using the approximation given in equation (14) of Hosking et al. (1985, p.253).  
The true PWME is computed using equation (12) of Hosking et al. (1985, p.253).  
Hosking et al. (1985) introduced the approximation as a matter of computational 
convenience, and noted that it is valid in the range -1/2 \le \kappa \le 1/2.  
If Castillo and Hadi (1994) had used the true PWME for values of \kappa 
larger than 1/2, they probably would have gotten very different results for the 
PWME.  (Note: the function egevd with method="pwme" uses 
the exact equation (12) of Hosking et al. (1985), not the approximation (14)).
Castillo and Hadi (1994) suggest using the bootstrap or jackknife to obtain 
variance estimates and confidence intervals for the distribution parameters 
based on the TSOE.
More Details
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a generalized extreme value distribution with 
parameters location=\eta, scale=\theta, and 
shape=\kappa with cumulative distribution function F.  
Also, let x(1), x(2), \ldots, x(n) denote the ordered values of 
\underline{x}.
First Stage 
Castillo and Hadi (1994) propose as initial estimates of the distribution 
parameters the solutions to the following set of simultaneous equations based 
on just three observations from the total sample of size n:
F[x(1); \eta, \theta, \kappa] = p_{1,n}
F[x(j); \eta, \theta, \kappa] = p_{j,n}
F[x(n); \eta, \theta, \kappa] = p_{n,n} \;\;\;\; (1)
where 2 \le j \le n-1, and 
p_{i,n} = \hat{F}[x(i); \eta, \theta, \kappa]
denotes the i'th plotting position for a sample of size n; that is, a 
nonparametric estimate of the value of F at x(i).  Typically, 
plotting positions have the form:
p_{i,n} = \frac{i-a}{n+b} \;\;\;\; (2)
where b > -a > -1.  In their simulation studies, Castillo and Hadi (1994) 
used a=0.35, b=0.
Since j is arbitrary in the above set of equations (1), denote the solutions 
to these equations by: 
\hat{\eta}_j, \hat{\theta}_j, \hat{\kappa}_j
There are thus n-2 sets of estimates.
Castillo and Hadi (1994) show that the estimate of the shape parameter, \kappa, 
is the solution to the equation:
\frac{x(j) - x(n)}{x(1) - x(n)} = \frac{1 - A_{jn}^\kappa}{1 - A_{1n}^\kappa} \;\;\;\; (3)
where
A_{ik} = C_i / C_k \;\;\;\; (4)
C_i = -log(p_{i,n}) \;\;\;\; (5)
Castillo and Hadi (1994) show how to easily solve equation (3) using the method of bisection.
Once the estimate of the shape parameter is obtained, the other estimates are given by:
\hat{\theta}_j = \frac{\hat{\kappa}_j [x(1) - x(n)]}{(C_n)^{\hat{\kappa}_j} - (C_1)^{\hat{\kappa}_j}} \;\;\;\; (6)
\hat{\eta}_j = x(1) - \frac{\hat{\theta}_j [1 - (C_1)^{\hat{\kappa}_j}]}{\hat{\kappa}_j} \;\;\;\; (7)
Second Stage 
Apply a robust function to the n-2 sets of estimates obtained in the 
first stage.  Castillo and Hadi (1994) suggest using either the median or the 
least median of squares (using a column of 1's as the predictor variable; 
see the help file for lmsreg in the package MASS).  Using 
the median, for example, the final distribution parameter estimates are 
given by: 
\hat{\eta} = Median(\hat{\eta}_2, \hat{\eta}_3, \ldots, \hat{\eta}_{n-1})
\hat{\theta} = Median(\hat{\theta}_2, \hat{\theta}_3, \ldots, \hat{\theta}_{n-1})
\hat{\kappa} = Median(\hat{\kappa}_2, \hat{\kappa}_3, \ldots, \hat{\kappa}_{n-1})
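The two-stage procedure is straightforward to sketch in code. The function below is illustrative only (it is not the EnvStats implementation; see egevd with method="tsoe" for that). It uses the plotting positions with a=0.35, b=0 mentioned above, and rgevd is the EnvStats random number generator for the GEVD.
  tsoe.sketch <- function(x, a = 0.35, b = 0) {
    x <- sort(x)
    n <- length(x)
    p <- ((1:n) - a) / (n + b)          # plotting positions, equation (2)
    C <- -log(p)                        # equation (5)
    est <- sapply(2:(n - 1), function(j) {
      # First stage, equation (3): the left side minus the right side is
      # increasing in kappa, so a bracketing (bisection-type) search works.
      f <- function(kappa) {
        lhs <- (x[j] - x[n]) / (x[1] - x[n])
        if (kappa == 0)                 # removable singularity at kappa = 0
          return(lhs - log(C[j] / C[n]) / log(C[1] / C[n]))
        lhs - (1 - (C[j] / C[n])^kappa) / (1 - (C[1] / C[n])^kappa)
      }
      kappa <- uniroot(f, lower = -5, upper = 5, extendInt = "upX")$root
      theta <- kappa * (x[1] - x[n]) / (C[n]^kappa - C[1]^kappa)   # equation (6)
      eta   <- x[1] - theta * (1 - C[1]^kappa) / kappa             # equation (7)
      c(eta = eta, theta = theta, kappa = kappa)
    })
    # Second stage: apply a robust function (here the median) over j = 2, ..., n-1
    apply(est, 1, median)
  }
  set.seed(123)
  x <- rgevd(50, location = 10, scale = 2, shape = 0.2)
  # For a well-behaved sample the estimates should be near the true values
  tsoe.sketch(x)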
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.
Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Office, Geneva.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Prescott, P., and A.T. Walden. (1983). Maximum Likelihood Estimation of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computing and Simulation 16, 241–250.
See Also
Generalized Extreme Value Distribution, egevd, 
Hosking et al. (1985).
The Chi Distribution
Description
Density, distribution function, quantile function, and random generation for the chi distribution.
Usage
  dchi(x, df)
  pchi(q, df)
  qchi(p, df)
  rchi(n, df)
Arguments
| x | vector of (positive) quantiles. | 
| q | vector of (positive) quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If length(n) is larger than 1, then length(n) random values are returned. | 
| df | vector of (positive) degrees of freedom (> 0). Non-integer values are allowed. | 
Details
Elements of x, q, p, or df that are missing will 
cause the corresponding elements of the result to be missing.
The chi distribution with n degrees of freedom is the distribution of the 
positive square root of a random variable having a 
chi-squared distribution with n degrees of freedom.
The chi density function is given by:
f(x, \nu) = g(x^2, \nu) 2x,    x > 0
where g(x,\nu) denotes the density function of a chi-square random variable 
with \nu degrees of freedom.
Value
density (dchi), probability (pchi), quantile (qchi), or 
random sample (rchi) for the chi distribution with df degrees of freedom.
Note
The chi distribution takes on positive real values.  It is important because 
for a sample of n observations from a normal distribution, 
the sample standard deviation multiplied by the square root of the degrees of 
freedom \nu and divided by the true standard deviation follows a chi 
distribution with \nu degrees of freedom.  The chi distribution is also 
used in computing exact prediction intervals for the next k observations 
from a normal distribution (see predIntNorm).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Chisquare, Normal, predIntNorm, 
Probability Distributions and Random Numbers.
Examples
  # Density of a chi distribution with 4 degrees of freedom, evaluated at 3:
  dchi(3, 4) 
  #[1] 0.1499715
  #----------
  # The 95'th percentile of a chi distribution with 10 degrees of freedom:
  qchi(.95, 10) 
  #[1] 4.278672
  #----------
  # The cumulative distribution function of a chi distribution with 
  # 5 degrees of freedom evaluated at 3:
  pchi(3, 5) 
  #[1] 0.8909358
  #----------
  # A random sample of 2 numbers from a chi distribution with 7 degrees of freedom. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rchi(2, 7) 
  #[1] 3.271632 2.035179
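  #----------
  # A quick sketch: the density relation given in the Details section can be
  # checked directly against the base R chi-square density.
  x <- c(0.5, 1, 2, 3)
  cbind(dchi = dchi(x, df = 4), via.chisq = 2 * x * dchisq(x^2, df = 4))
  # The two columns should agree (up to floating point error).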
Data Frame Summarizing Available Probability Distributions and Estimation Methods
Description
Data frame summarizing information about available probability distributions in R and the EnvStats package, and which distributions have associated functions for estimating distribution parameters.
Usage
Distribution.df
Format
A data frame with 35 rows corresponding to 35 different available probability distributions, and 25 columns containing information associated with these probability distributions.
- Name
- a character vector containing the name of the probability distribution (see the column labeled Name in the table below). 
- Type
- a character vector indicating the type of distribution (see the column labeled Type in the table below). Possible values are "Finite Discrete", "Discrete", "Continuous", and "Mixed". 
- Support.Min
- a character vector indicating the minimum value the random variable can assume (see the column labeled Range in the table below). The reason this is a character vector instead of a numeric vector is because some distributions have a lower bound that depends on the value of a distribution parameter. For example, the minimum value for a Uniform distribution is given by the value of the parameter min. 
- Support.Max
- a character vector indicating the maximum value the random variable can assume (see the column labeled Range in the table below). The reason this is a character vector instead of a numeric vector is because some distributions have an upper bound that depends on the value of a distribution parameter. For example, the maximum value for a Uniform distribution is given by the value of the parameter max. 
- Estimation.Method(s)
- a character vector indicating the names of the methods available to estimate the distribution parameter(s) (see the column labeled Estimation Method(s) in the table below). Possible values include "mle" (maximum likelihood), "mme" (method of moments), "mmue" (method of moments based on the unbiased estimate of variance), "mvue" (minimum variance unbiased), "qmle" (quasi-mle), etc., or some combination of these. In cases where an estimator is more than one kind, a slash (/) is used to denote all methods covered by the single estimator. For example, for the Binomial distribution, the sample proportion is the maximum likelihood, method of moments, and minimum variance unbiased estimator, so this method is denoted as "mle/mme/mvue". See the help files for the specific function listed under Estimating Distribution Parameters for an explanation of each of these estimation methods. 
- Quantile.Estimation.Method(s)
- a character vector indicating the names of the methods available to estimate the distribution quantiles. For many distributions, these are the same as Estimation.Method(s). See the help files for the specific function listed under Estimating Distribution Quantiles for an explanation of each of these estimation methods. 
- Prediction.Interval.Method(s)
- a character vector indicating the names of the methods available to create prediction intervals. See the help files for the specific function listed under Prediction Intervals for an explanation of each of these estimation methods. 
- Singly.Censored.Estimation.Method(s)
- a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I singly-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods. 
- Multiply.Censored.Estimation.Method(s)
- a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I multiply-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods. 
- Number.parameters
- a numeric vector indicating the number of parameters associated with the distribution (see the column labeled Parameters in the table below). 
- Parameter.1
- the columns labeled Parameter.1, Parameter.2, ..., Parameter.5 are character vectors containing the names of the distribution parameters (see the column labeled Parameters in the table below). If a distribution has n parameters and n < 5, then the columns labeled Parameter.n+1, ..., Parameter.5 are empty. For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3, Parameter.4, and Parameter.5 are empty. 
- Parameter.2
- see Parameter.1 
- Parameter.3
- see Parameter.1 
- Parameter.4
- see Parameter.1 
- Parameter.5
- see Parameter.1 
- Parameter.1.Min
- the columns labeled Parameter.1.Min, Parameter.2.Min, ..., Parameter.5.Min are character vectors containing the minimum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below). The reason these are character vectors instead of numeric vectors is because some parameters have a lower bound of 0 but must be strictly bigger than 0 (e.g., the parameter sd for the Normal distribution), in which case the lower bound is .Machine$double.eps, which may vary from machine to machine. Also, some parameters have a lower bound that depends on the value of another parameter. For example, the parameter max for a Uniform distribution is bounded below by the value of the parameter min. If a distribution has n parameters and n < 5, then the columns labeled Parameter.n+1.Min, ..., Parameter.5.Min have the missing value code (NA). For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3.Min, Parameter.4.Min, and Parameter.5.Min have NAs in them. 
- Parameter.2.Min
- see Parameter.1.Min 
- Parameter.3.Min
- see Parameter.1.Min 
- Parameter.4.Min
- see Parameter.1.Min 
- Parameter.5.Min
- see Parameter.1.Min 
- Parameter.1.Max
- the columns labeled Parameter.1.Max, Parameter.2.Max, ..., Parameter.5.Max are character vectors containing the maximum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below). The reason these are character vectors instead of numeric vectors is because some parameters have an upper bound that depends on the value of another parameter. For example, the parameter min for a Uniform distribution is bounded above by the value of the parameter max. If a distribution has n parameters and n < 5, then the columns labeled Parameter.n+1.Max, ..., Parameter.5.Max have the missing value code (NA). For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3.Max, Parameter.4.Max, and Parameter.5.Max have NAs in them. 
- Parameter.2.Max
- see Parameter.1.Max 
- Parameter.3.Max
- see Parameter.1.Max 
- Parameter.4.Max
- see Parameter.1.Max 
- Parameter.5.Max
- see Parameter.1.Max 
Details
The table below summarizes the probability distributions available in
R and EnvStats.  For each distribution, there are four
associated functions for computing density values, percentiles, quantiles,
and random numbers.  The form of the names of these functions is
dabb, pabb, qabb, and
rabb, where abb is the abbreviated name of the
distribution (see table below).  These functions are described in the
help file with the name of the distribution (see the first column of the
table below).  For example, the help file for Beta describes the
behavior of dbeta, pbeta, qbeta,
and rbeta.
For most distributions, there is also an associated function for
estimating the distribution parameters, and the form of the names of
these functions is eabb, where abb is the
abbreviated name of the distribution (see table below).  All of these
functions are listed in the help file
Estimating Distribution Parameters.  For example,
the function ebeta estimates the shape parameters of a
Beta distribution based on a random sample of observations from
this distribution.
For some distributions, there are functions to estimate distribution
parameters based on Type I censored data.  The form of the names of
these functions is eabbSinglyCensored for
singly censored data and eabbMultiplyCensored for
multiply censored data.  All of these functions are listed under the heading
Estimating Distribution Parameters in the help file
Censored Data.
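For example, one way to look up the entries for a few distributions is to index Distribution.df directly (a sketch; the row names are assumed to be the abbreviations shown in Table 1a below, and the column names are as listed in the Format section above):
  # Name, type, and available estimation methods for three distributions
  Distribution.df[c("norm", "lnorm", "gamma"),
      c("Name", "Type", "Estimation.Method(s)")]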
Table 1a. Available Distributions: Name, Abbreviation, Type, and Range
| Name | Abbreviation | Type | Range | 
| Beta | beta | Continuous | [0, 1] | 
| Binomial | binom | Finite | [0, size] | 
| | | Discrete | (integer) | 
| Cauchy | cauchy | Continuous | (-\infty, \infty) | 
| Chi | chi | Continuous | [0, \infty) | 
| Chi-square | chisq | Continuous | [0, \infty) | 
| Exponential | exp | Continuous | [0, \infty) | 
| Extreme | evd | Continuous | (-\infty, \infty) | 
| Value | |||
| F | f | Continuous | [0, \infty) | 
| Gamma | gamma | Continuous | [0, \infty) | 
| Gamma | gammaAlt | Continuous | [0, \infty) | 
| (Alternative) | |||
| Generalized | gevd | Continuous | (-\infty, \infty) | 
| Extreme | | | for shape = 0 | 
| Value | | | | 
| | | | (-\infty, location + \frac{scale}{shape}] | 
| | | | for shape > 0 | 
| | | | [location + \frac{scale}{shape}, \infty) | 
| | | | for shape < 0 | 
| Geometric | geom | Discrete | [0, \infty) | 
| (integer) | |||
| Hypergeometric | hyper | Finite | [0, min(k,m)] | 
| | | Discrete | (integer) | 
| Logistic | logis | Continuous | (-\infty, \infty) | 
| Lognormal | lnorm | Continuous | (0, \infty) | 
| Lognormal | lnormAlt | Continuous | (0, \infty) | 
| (Alternative) | |||
| Lognormal | lnormMix | Continuous | (0, \infty) | 
| Mixture | |||
| Lognormal | lnormMixAlt | Continuous | (0, \infty) | 
| Mixture | |||
| (Alternative) | |||
| Three- | lnorm3 | Continuous | [threshold, \infty) | 
| Parameter | |||
| Lognormal | |||
| Truncated | lnormTrunc | Continuous | [min, max] | 
| Lognormal | |||
| Truncated | lnormTruncAlt | Continuous | [min, max] | 
| Lognormal | |||
| (Alternative) | |||
| Negative | nbinom | Discrete | [0, \infty) | 
| Binomial | | | (integer) | 
| Normal | norm | Continuous | (-\infty, \infty) | 
| Normal | normMix | Continuous | (-\infty, \infty) | 
| Mixture | |||
| Truncated | normTrunc | Continuous | [min, max] | 
| Normal | |||
| Pareto | pareto | Continuous | [location, \infty) | 
| Poisson | pois | Discrete | [0, \infty) | 
| (integer) | |||
| Student's t | t | Continuous | (-\infty, \infty) | 
| Triangular | tri | Continuous | [min, max] | 
| Uniform | unif | Continuous | [min, max] | 
| Weibull | weibull | Continuous | [0, \infty) | 
| Wilcoxon | wilcox | Finite | [0, m n] | 
| Rank Sum | | Discrete | (integer) | 
| Zero-Modified | zmlnorm | Mixed | [0, \infty) | 
| Lognormal | |||
| (Delta) | |||
| Zero-Modified | zmlnormAlt | Mixed | [0, \infty) | 
| Lognormal | |||
| (Delta) | |||
| (Alternative) | |||
| Zero-Modified | zmnorm | Mixed | (-\infty, \infty) | 
| Normal | |||
Table 1b. Available Distributions: Name, Parameters, Parameter Default Values, Parameter Ranges, Estimation Method(s)
| Name | Parameter(s) | Default Value(s) | Parameter Range(s) | Estimation Method(s) | 
| Beta | shape1 | | (0, \infty) | mle, mme, mmue | 
| | shape2 | | (0, \infty) | | 
| | ncp | 0 | (0, \infty) | | 
| Binomial | size | | [0, \infty) | mle/mme/mvue | 
| | prob | | [0, 1] | | 
| Cauchy | location | 0 | (-\infty, \infty) | | 
| | scale | 1 | (0, \infty) | | 
| Chi | df | | (0, \infty) | | 
| Chi-square | df | | (0, \infty) | | 
| | ncp | 0 | (-\infty, \infty) | | 
| Exponential | rate | 1 | (0, \infty) | mle/mme | 
| Extreme | location | 0 | (-\infty, \infty) | mle, mme, mmue, pwme | 
| Value | scale | 1 | (0, \infty) | | 
| F | df1 | | (0, \infty) | | 
| | df2 | | (0, \infty) | | 
| | ncp | 0 | (0, \infty) | | 
| Gamma | shape | | (0, \infty) | mle, bcmle, mme, mmue | 
| | scale | 1 | (0, \infty) | | 
| Gamma | mean | | (0, \infty) | mle, bcmle, mme, mmue | 
| (Alternative) | cv | 1 | (0, \infty) | | 
| Generalized | location | 0 | (-\infty, \infty) | mle, pwme, tsoe | 
| Extreme | scale | 1 | (0, \infty) | | 
| Value | shape | 0 | (-\infty, \infty) | | 
| Geometric | prob | | (0, 1) | mle/mme, mvue | 
| Hypergeometric | m | | [0, \infty) | mle, mvue | 
| | n | | [0, \infty) | | 
| | k | | [1, m+n] | | 
| Logistic | location | 0 | (-\infty, \infty) | mle, mme, mmue | 
| | scale | 1 | (0, \infty) | | 
| Lognormal | meanlog | 0 | (-\infty, \infty) | mle/mme, mvue | 
| | sdlog | 1 | (0, \infty) | | 
| Lognormal | mean | exp(1/2) | (0, \infty) | mle, mme, mmue, | 
| (Alternative) | cv | sqrt(exp(1)-1) | (0, \infty) | mvue, qmle | 
| Lognormal | meanlog1 | 0 | (-\infty, \infty) | | 
| Mixture | sdlog1 | 1 | (0, \infty) | | 
| | meanlog2 | 0 | (-\infty, \infty) | | 
| | sdlog2 | 1 | (0, \infty) | | 
| | p.mix | 0.5 | [0, 1] | | 
| Lognormal | mean1 | exp(1/2) | (0, \infty) | | 
| Mixture | cv1 | sqrt(exp(1)-1) | (0, \infty) | | 
| (Alternative) | mean2 | exp(1/2) | (0, \infty) | | 
| | cv2 | sqrt(exp(1)-1) | (0, \infty) | | 
| | p.mix | 0.5 | [0, 1] | | 
| Three- | meanlog | 0 | (-\infty, \infty) | lmle, mme, | 
| Parameter | sdlog | 1 | (0, \infty) | mmue, mmme, | 
| Lognormal | threshold | 0 | (-\infty, \infty) | royston.skew, | 
| | | | | zero.skew | 
| Truncated | meanlog | 0 | (-\infty, \infty) | | 
| Lognormal | sdlog | 1 | (0, \infty) | | 
| | min | 0 | [0, max) | | 
| | max | Inf | (min, \infty) | | 
| Truncated | mean | exp(1/2) | (0, \infty) | | 
| Lognormal | cv | sqrt(exp(1)-1) | (0, \infty) | | 
| (Alternative) | min | 0 | [0, max) | | 
| | max | Inf | (min, \infty) | | 
| Negative | size | | [1, \infty) | mle/mme, mvue | 
| Binomial | prob | | (0, 1] | | 
| | mu | | (0, \infty) | | 
| Normal | mean | 0 | (-\infty, \infty) | mle/mme, mvue | 
| | sd | 1 | (0, \infty) | | 
| Normal | mean1 | 0 | (-\infty, \infty) | | 
| Mixture | sd1 | 1 | (0, \infty) | | 
| | mean2 | 0 | (-\infty, \infty) | | 
| | sd2 | 1 | (0, \infty) | | 
| | p.mix | 0.5 | [0, 1] | | 
| Truncated | mean | 0 | (-\infty, \infty) | | 
| Normal | sd | 1 | (0, \infty) | | 
| | min | -Inf | (-\infty, max) | | 
| | max | Inf | (min, \infty) | | 
| Pareto | location | | (0, \infty) | lse, mle | 
| | shape | 1 | (0, \infty) | | 
| Poisson | lambda | | (0, \infty) | mle/mme/mvue | 
| Student's t | df | | (0, \infty) | | 
| | ncp | 0 | (-\infty, \infty) | | 
| Triangular | min | 0 | (-\infty, max) | | 
| | max | 1 | (min, \infty) | | 
| | mode | 0.5 | (min, max) | | 
| Uniform | min | 0 | (-\infty, max) | mle, mme, mmue | 
| | max | 1 | (min, \infty) | | 
| Weibull | shape | | (0, \infty) | mle, mme, mmue | 
| | scale | 1 | (0, \infty) | | 
| Wilcoxon | m | | [1, \infty) | | 
| Rank Sum | n | | [1, \infty) | | 
| Zero-Modified | meanlog | 0 | (-\infty, \infty) | mvue | 
| Lognormal | sdlog | 1 | (0, \infty) | | 
| (Delta) | p.zero | 0.5 | [0, 1] | | 
| Zero-Modified | mean | exp(1/2) | (0, \infty) | mvue | 
| Lognormal | cv | sqrt(exp(1)-1) | (0, \infty) | | 
| (Delta) | p.zero | 0.5 | [0, 1] | | 
| (Alternative) | | | | | 
| Zero-Modified | mean | 0 | (-\infty, \infty) | mvue | 
| Normal | sd | 1 | (0, \infty) | | 
| | p.zero | 0.5 | [0, 1] | | 
Source
The EnvStats package.
References
Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York. https://link.springer.com/book/10.1007/978-1-4614-8456-1.
Concentrations in Exhibit 2 of 2002d USEPA Guidance Document
Description
Concentrations (µg/L) from an exposure unit.
Usage
data(EPA.02d.Ex.2.ug.per.L.vec)
Format
a numeric vector of concentrations (µg/L)
Source
USEPA. (2002d). Calculating Upper Confidence Limits for Exposure Point Concentrations at Hazardous Waste Sites. OSWER 9285.6-10, December 2002. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., p. 9.
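A minimal sketch in keeping with the topic of the guidance document (upper confidence limits for exposure point concentrations): a 95% Student's-t upper confidence limit for the mean of these data, computed in base R.
  data(EPA.02d.Ex.2.ug.per.L.vec)
  x <- EPA.02d.Ex.2.ug.per.L.vec
  n <- length(x)
  # 95% upper confidence limit for the mean (Student's t)
  mean(x) + qt(0.95, df = n - 1) * sd(x) / sqrt(n)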
Concentrations in Exhibit 4 of 2002d USEPA Guidance Document
Description
Concentrations (mg/kg) from an exposure unit.
Usage
data(EPA.02d.Ex.4.mg.per.kg.vec)
Format
a numeric vector of concentrations (mg/kg)
Source
USEPA. (2002d). Calculating Upper Confidence Limits for Exposure Point Concentrations at Hazardous Waste Sites. OSWER 9285.6-10, December 2002. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., p. 11.
Concentrations in Exhibit 6 of 2002d USEPA Guidance Document
Description
Concentrations (mg/kg) from an exposure unit.
Usage
data(EPA.02d.Ex.6.mg.per.kg.vec)
Format
a numeric vector of concentrations (mg/kg)
Source
USEPA. (2002d). Calculating Upper Confidence Limits for Exposure Point Concentrations at Hazardous Waste Sites. OSWER 9285.6-10, December 2002. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., p. 13.
Concentrations in Exhibit 9 of 2002d USEPA Guidance Document
Description
Concentrations (mg/L) from an exposure unit.
Usage
data(EPA.02d.Ex.9.mg.per.L.vec)
Format
a numeric vector of concentrations (mg/L)
Source
USEPA. (2002d). Calculating Upper Confidence Limits for Exposure Point Concentrations at Hazardous Waste Sites. OSWER 9285.6-10, December 2002. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., p. 16.
Nickel Concentrations from Example 10-1 of 2009 USEPA Guidance Document
Description
Nickel concentrations (ppb) from four wells (five observations per well). The Guidance Document uses the label “Year” instead of “Well”; this is corrected in the Errata.
Usage
EPA.09.Ex.10.1.nickel.df
Format
A data frame with 20 observations on the following 3 variables.
- Month
- a numeric vector indicating the month the sample was taken 
- Well
- a factor indicating the well number 
- Nickel.ppb
- a numeric vector of nickel concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery, Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C., p.10-12.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
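An illustrative sketch (the particular analysis shown here is not taken from the guidance document): check the pooled nickel measurements for normality on the original and log scales.
  data(EPA.09.Ex.10.1.nickel.df)
  shapiro.test(EPA.09.Ex.10.1.nickel.df$Nickel.ppb)
  shapiro.test(log(EPA.09.Ex.10.1.nickel.df$Nickel.ppb))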
Arsenic Concentrations from Example 11-1 of 2009 USEPA Guidance Document
Description
Arsenic concentrations (ppb) at six wells (four observations per well).
Usage
EPA.09.Ex.11.1.arsenic.df
Format
A data frame with 24 observations on the following 3 variables.
- Arsenic.ppb
- a numeric vector of arsenic concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.11-3.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
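An illustrative sketch (not necessarily the analysis performed in the guidance document): a one-way analysis of variance comparing mean arsenic concentrations across the six wells.
  data(EPA.09.Ex.11.1.arsenic.df)
  summary(aov(Arsenic.ppb ~ Well, data = EPA.09.Ex.11.1.arsenic.df))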
Carbon Tetrachloride Concentrations from Example 12-1 of 2009 USEPA Guidance Document
Description
Carbon tetrachloride (CCL4) concentrations (ppb) at five background wells (four measures at each well).
Usage
EPA.09.Ex.12.1.ccl4.df
Format
A data frame with 20 observations on the following 2 variables.
- Well
- a factor indicating the well number 
- CCL4.ppb
- a numeric vector of CCL4 concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.12-3.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Naphthalene Concentrations from Example 12-4 of 2009 USEPA Guidance Document
Description
Naphthalene concentrations (ppb) at five background wells (five quarterly measures at each well).
Usage
EPA.09.Ex.12.4.naphthalene.df
Format
A data frame with 25 observations on the following 3 variables.
- Quarter
- a numeric vector indicating the quarter the sample was taken 
- Well
- a factor indicating the well number 
- Naphthalene.ppb
- a numeric vector of naphthalene concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.12-12.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Iron Concentrations from Example 13-1 of 2009 USEPA Guidance Document
Description
Dissolved iron (Fe) concentrations (ppm) at six upgradient wells (four quarterly measures at each well).
Usage
EPA.09.Ex.13.1.iron.df
Format
A data frame with 24 observations on the following 4 variables.
- Month
- a numeric vector indicating the month the sample was taken 
- Year
- a numeric vector indicating the year the sample was taken 
- Well
- a factor indicating the well number 
- Iron.ppm
- a numeric vector of iron concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.13-3.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Manganese Concentrations from Example 14-1 of 2009 USEPA Guidance Document
Description
Manganese concentrations (ppm) at four background wells (eight quarterly measures at each well).
Usage
EPA.09.Ex.14.1.manganese.df
Format
A data frame with 32 observations on the following 3 variables.
- Quarter
- a numeric vector indicating the quarter the sample was taken 
- Well
- a factor indicating the well number 
- Manganese.ppm
- a numeric vector of manganese concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.14-5.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Alkalinity Measures from Example 14-3 of 2009 USEPA Guidance Document
Description
Alkalinity measures (mg/L) collected from leachate at a solid waste landfill during a four and a half year period.
Usage
EPA.09.Ex.14.3.alkalinity.df
Format
A data frame with 54 observations on the following 2 variables.
- Date
- a Date object indicating the date of collection 
- Alkalinity.mg.per.L
- a numeric vector of alkalinity measures (mg/L) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.14-14.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
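A simple trend-check sketch for these leachate measurements, using base R's Kendall correlation test against sampling date (EnvStats also provides kendallTrendTest for this kind of analysis):
  data(EPA.09.Ex.14.3.alkalinity.df)
  with(EPA.09.Ex.14.3.alkalinity.df,
      cor.test(as.numeric(Date), Alkalinity.mg.per.L, method = "kendall"))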
Arsenic Concentrations from Example 14-4 of 2009 USEPA Guidance Document
Description
Sixteen quarterly measures of arsenic concentrations (ppb).
Usage
EPA.09.Ex.14.4.arsenic.df
Format
A data frame with 16 observations on the following 4 variables.
- Sample.Date
- a factor indicating the month and year of collection 
- Month
- a factor indicating the month of collection 
- Year
- a factor indicating the year of collection 
- Arsenic.ppb
- a numeric vector of arsenic concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.14-18.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Analyte Concentrations from Example 14-8 of 2009 USEPA Guidance Document
Description
Monthly unadjusted and adjusted analyte concentrations over a 3-year period. Adjusted concentrations are computed by subtracting the monthly mean and adding the overall mean.
Usage
EPA.09.Ex.14.8.df
Format
A data frame with 36 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Year
- a numeric vector indicating the year of collection 
- Unadj.Conc
- a numeric vector of unadjusted concentrations 
- Adj.Conc
- a numeric vector of adjusted concentrations 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.14-32.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
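The adjustment described above is easy to reproduce; the sketch below subtracts the monthly mean from each unadjusted value, adds back the overall mean, and compares the result with the stored Adj.Conc column.
  data(EPA.09.Ex.14.8.df)
  adj <- with(EPA.09.Ex.14.8.df,
      Unadj.Conc - ave(Unadj.Conc, Month) + mean(Unadj.Conc))
  head(cbind(adj, EPA.09.Ex.14.8.df$Adj.Conc))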
Manganese Concentrations from Example 15-1 of 2009 USEPA Guidance Document
Description
Manganese concentrations (ppb) at five background wells (five measures at each well).
Usage
EPA.09.Ex.15.1.manganese.df
Format
A data frame with 25 observations on the following 5 variables.
- Sample
- a numeric vector indicating the sample number (1-5) 
- Well
- a factor indicating the well number 
- Manganese.Orig.ppb
- a character vector of the original manganese concentrations (ppb) 
- Manganese.ppb
- a numeric vector of manganese concentrations with non-detects coded to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.15-10.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Sulfate Concentrations from Example 16-1 of 2009 USEPA Guidance Document
Description
Sulfate concentrations (ppm) at one background well and one downgradient well (eight quarterly measures at each well).
Usage
EPA.09.Ex.16.1.sulfate.df
Format
A data frame with 16 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Year
- a factor indicating the year of collection 
- Well.type
- a factor indicating the well type (background vs. downgradient) 
- Sulfate.ppm
- a numeric vector of sulfate concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.16-6.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
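An illustrative two-sample sketch (the specific test chosen here is for illustration, not necessarily the one used in the guidance document): compare sulfate concentrations at the background and downgradient wells.
  data(EPA.09.Ex.16.1.sulfate.df)
  t.test(Sulfate.ppm ~ Well.type, data = EPA.09.Ex.16.1.sulfate.df)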
Benzene Concentrations from Example 16-2 of 2009 USEPA Guidance Document
Description
Benzene concentrations (ppb) at one background and one downgradient well (eight monthly measures at each well).
Usage
EPA.09.Ex.16.2.benzene.df
Format
A data frame with 16 observations on the following 3 variables.
- Month
- a factor indicating the month of collection 
- Well.type
- a factor indicating the well type (background vs. downgradient) 
- Benzene.ppb
- a numeric vector of benzene concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.16-9.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Copper Concentrations from Example 16-4 of 2009 USEPA Guidance Document
Description
Copper concentrations (ppb) at two background wells and one compliance well (six measures at each well).
Usage
EPA.09.Ex.16.4.copper.df
Format
A data frame with 18 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Copper.ppb
- a numeric vector of copper concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.16-19.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Tetrachloroethylene Concentrations from Example 16-5 of 2009 USEPA Guidance Document
Description
Tetrachloroethylene (PCE) concentrations (ppb) at one background well and one compliance well.
Usage
EPA.09.Ex.16.5.PCE.df
Format
A data frame with 14 observations on the following 4 variables.
- Well.type
- a factor indicating the well type, with levels Background and Compliance 
- PCE.Orig.ppb
- a character vector of original PCE concentrations (ppb) 
- PCE.ppb
- a numeric vector of PCE concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.16-22.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Log-transformed Lead Concentrations from Example 17-1 of 2009 USEPA Guidance Document
Description
Log-transformed lead concentrations (ppb) at two background and four compliance wells (four quarterly measures at each well).
Usage
EPA.09.Ex.17.1.loglead.df
Format
A data frame with 24 observations on the following 4 variables.
- Month
- a factor indicating the month of collection; 1 = Jan, 2 = Apr, 3 = Jul, 4 = Oct 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- LogLead
- a numeric vector of log-transformed lead concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-7.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Toluene Concentrations from Example 17-2 of 2009 USEPA Guidance Document
Description
Toluene concentrations (ppb) at two background and three compliance wells (five monthly measures at each well).
Usage
EPA.09.Ex.17.2.toluene.df
Format
A data frame with 25 observations on the following 6 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Toluene.ppb.orig
- a character vector of original toluene concentrations (ppb) 
- Toluene.ppb
- a numeric vector of toluene concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-13.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Chrysene Concentrations from Example 17-3 of 2009 USEPA Guidance Document
Description
Chrysene concentrations (ppb) at two background and three compliance wells (four monthly measures at each well).
Usage
EPA.09.Ex.17.3.chrysene.df
Format
A data frame with 20 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Chrysene.ppb
- a numeric vector of chrysene concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-17.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Log-transformed Chrysene Concentrations from Example 17-3 of 2009 USEPA Guidance Document
Description
Log-transformed chrysene concentrations (ppb) at two background and three compliance wells (four monthly measures at each well).
Usage
EPA.09.Ex.17.3.log.chrysene.df
Format
A data frame with 20 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Log.Chrysene.ppb
- a numeric vector of log-transformed chrysene concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-18.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Copper Concentrations from Example 17-4 of 2009 USEPA Guidance Document
Description
Copper concentrations (ppb) at three background and two compliance wells (eight monthly measures at the background wells, four monthly measures at the compliance wells).
Usage
EPA.09.Ex.17.4.copper.df
Format
A data frame with 40 observations on the following 6 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Copper.ppb.orig
- a character vector of original copper concentrations (ppb) 
- Copper.ppb
- a numeric vector of copper concentrations with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-21.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Chloride Concentrations from Example 17-5 of 2009 USEPA Guidance Document
Description
Chloride concentrations (ppm) collected over a five-year period at a solid waste landfill.
Usage
EPA.09.Ex.17.5.chloride.df
Format
A data frame with 19 observations on the following 4 variables.
- Date
- a Date object indicating the date of collection 
- Chloride.ppm
- a numeric vector of chloride concentrations (ppm) 
- Elapsed.Days
- a numeric vector indicating the number of days since January 1, 2002 
- Residuals
- a numeric vector of residuals from a linear regression trend fit 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-26.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
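Because the data frame already stores residuals from a linear trend fit, a natural sketch is to refit a simple linear regression of chloride on elapsed days and compare the residuals (the exact model used in the guidance document is assumed, not verified, to be of this form):
  data(EPA.09.Ex.17.5.chloride.df)
  fit <- lm(Chloride.ppm ~ Elapsed.Days, data = EPA.09.Ex.17.5.chloride.df)
  summary(fit)
  # Compare with the stored residuals
  head(cbind(residuals(fit), EPA.09.Ex.17.5.chloride.df$Residuals))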
Sulfate Concentrations from Example 17-6 of 2009 USEPA Guidance Document
Description
Sulfate concentrations (ppm) collected over several years. 
The date of collection is simply indicated by month and year of collection.
The column Date is a Date object where the day of the month has been arbitrarily set to 1.
Usage
EPA.09.Ex.17.6.sulfate.dfFormat
A data frame with 23 observations on the following 6 variables.
- Sample.No
- a numeric vector indicating the sample number 
- Year
- a numeric vector indicating the year of collection 
- Month
- a numeric vector indicating the month of collection 
- Sampling.Date
- a numeric vector indicating the year and month of collection 
- Date
- a Date object indicating the date of collection, where the day of the month is arbitrarily set to 1 
- Sulfate.ppm
- a numeric vector of sulfate concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-33.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Sodium Concentrations from Example 17-7 of 2009 USEPA Guidance Document
Description
Sodium concentrations (ppm) collected over several years. The sample dates are recorded as the year of collection (2-digit format) plus a fractional part indicating when during the year the sample was collected.
Usage
EPA.09.Ex.17.7.sodium.dfFormat
A data frame with 10 observations on the following 2 variables.
- Year
- a numeric vector indicating the year of collection (a fractional number) 
- Sodium.ppm
- a numeric vector of sodium concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-36.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Arsenic Concentrations from Example 18-1 of 2009 USEPA Guidance Document
Description
Arsenic concentrations (ppb) in a single well at a solid waste landfill. Four observations per year over four years. Years 1-3 are the background period and Year 4 is the compliance period.
Usage
EPA.09.Ex.18.1.arsenic.dfFormat
A data frame with 16 observations on the following 3 variables.
- Year
- a factor indicating the year of collection 
- Sampling.Period
- a factor indicating the sampling period (background vs. compliance) 
- Arsenic.ppb
- a numeric vector of arsenic concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.18-10.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Chrysene Concentrations from Example 18-2 of 2009 USEPA Guidance Document
Description
Chrysene concentrations (ppb) at two background wells and one compliance well (four monthly measures at each well).
Usage
EPA.09.Ex.18.2.chrysene.dfFormat
A data frame with 12 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Chrysene.ppb
- a numeric vector of chrysene concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.18-15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Trichloroethylene Concentrations from Example 18-3 of 2009 USEPA Guidance Document
Description
Trichloroethylene (TCE) concentrations (ppb) at three background wells and one compliance well. Six monthly measures at each background well, three monthly measures at the compliance well.
Usage
EPA.09.Ex.18.3.TCE.dfFormat
A data frame with 24 observations on the following 6 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- TCE.ppb.orig
- a character vector of original TCE concentrations (ppb) 
- TCE.ppb
- a numeric vector of TCE concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.18-19.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Xylene Concentrations from Example 18-4 of 2009 USEPA Guidance Document
Description
Xylene concentrations (ppb) at three background wells and one compliance well. Eight monthly measures at each background well; three monthly measures at the compliance well.
Usage
EPA.09.Ex.18.4.xylene.dfFormat
A data frame with 32 observations on the following 6 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Xylene.ppb.orig
- a character vector of original xylene concentrations (ppb) 
- Xylene.ppb
- a numeric vector of xylene concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.18-22.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Sulfate Concentrations from Example 19-1 of 2009 USEPA Guidance Document
Description
Sulfate concentrations (mg/L) at four background wells.
Usage
EPA.09.Ex.19.1.sulfate.dfFormat
A data frame with 25 observations on the following 7 variables.
- Well
- a factor indicating the well number 
- Month
- a numeric vector indicating the month of collection 
- Day
- a numeric vector indicating the day of the month of collection 
- Year
- a numeric vector indicating the year of collection 
- Date
- a Date object indicating the date of collection 
- Sulfate.mg.per.l
- a numeric vector of sulfate concentrations (mg/L) 
- log.Sulfate.mg.per.l
- a numeric vector of log-transformed sulfate concentrations (mg/L) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.19-17.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Chloride Concentrations from Example 19-2 of 2009 USEPA Guidance Document
Description
Chloride concentrations (mg/L) at 10 compliance wells at a solid waste landfill. One year of quarterly measures at each well.
Usage
EPA.09.Ex.19.2.chloride.dfFormat
A data frame with 40 observations on the following 2 variables.
- Well
- a factor indicating the well number 
- Chloride.mg.per.l
- a numeric vector of chloride concentrations (mg/L) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.19-19.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Mercury Concentrations from Example 19-5 of 2009 USEPA Guidance Document
Description
Mercury concentrations (ppb) at four background and two compliance wells.
Usage
EPA.09.Ex.19.5.mercury.dfFormat
A data frame with 36 observations on the following 6 variables.
- Event
- a factor indicating the time of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
- Mercury.ppb.orig
- a character vector of original mercury concentrations (ppb) 
- Mercury.ppb
- a numeric vector of mercury concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.19-33.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Nickel Concentrations from Example 20-1 of 2009 USEPA Guidance Document
Description
Nickel concentrations (ppb) at a single well. Eight monthly measures during the background period and eight monthly measures during the compliance period.
Usage
EPA.09.Ex.20.1.nickel.dfFormat
A data frame with 16 observations on the following 4 variables.
- Month
- a factor indicating the month of collection 
- Year
- a factor indicating the year of collection 
- Period
- a factor indicating the period (baseline vs. compliance) 
- Nickel.ppb
- a numeric vector of nickel concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.20-4.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Aldicarb Concentrations from Example 21-1 of 2009 USEPA Guidance Document
Description
Aldicarb concentrations (ppb) at three compliance wells (four monthly measures at each well).
Usage
EPA.09.Ex.21.1.aldicarb.dfFormat
A data frame with 12 observations on the following 3 variables.
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Aldicarb.ppb
- a numeric vector of aldicarb concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-4.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Benzene Concentrations from Example 21-2 of 2009 USEPA Guidance Document
Description
Benzene concentrations (ppb) collected at a landfill that previously handled smelter waste and is now undergoing remediation efforts.
Usage
EPA.09.Ex.21.2.benzene.dfFormat
A data frame with 8 observations on the following 4 variables.
- Month
- a numeric vector indicating the month of collection 
- Benzene.ppb.orig
- a character vector of original benzene concentrations (ppb) 
- Benzene.ppb
- a numeric vector of benzene concentrations (ppb) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-7.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Beryllium Concentrations from Example 21-5 of 2009 USEPA Guidance Document
Description
Beryllium concentrations (ppb) at one well (four years of quarterly measures).
Usage
data(EPA.09.Ex.21.5.beryllium.df)Format
A data frame with 16 observations on the following 3 variables.
- Year
- a factor indicating the year of collection 
- Quarter
- a factor indicating the quarter of collection 
- Beryllium.ppb
- a numeric vector of beryllium concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-18.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Nitrate Concentrations from Example 21-6 of 2009 USEPA Guidance Document
Description
Nitrate concentrations (mg/L) at a well used for drinking water.
Usage
EPA.09.Ex.21.6.nitrate.dfFormat
A data frame with 12 observations on the following 5 variables.
- Sampling.Date
- a character vector indicating the sampling date 
- Date
- a Date object indicating the sampling date 
- Nitrate.mg.per.l.orig
- a character vector of original nitrate concentrations (mg/L) 
- Nitrate.mg.per.l
- a numeric vector of nitrate concentrations (mg/L) with nondetects set to their detection limit 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-22.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Trichloroethylene Concentrations from Example 21-7 of 2009 USEPA Guidance Document
Description
Trichloroethylene (TCE) concentrations (ppb) at a site undergoing remediation.
Usage
EPA.09.Ex.21.7.TCE.dfFormat
A data frame with 10 observations on the following 2 variables.
- Month
- a numeric vector indicating the month of collection 
- TCE.ppb
- a numeric vector of TCE concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-26.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Vinyl Chloride Concentrations from Example 22-1 of 2009 USEPA Guidance Document
Description
Vinyl Chloride (VC) concentrations (ppb) during detection monitoring for two compliance wells. Four years of quarterly measures at each well. Compliance monitoring began with Year 2 of the sampling record.
Usage
EPA.09.Ex.22.1.VC.dfFormat
A data frame with 32 observations on the following 5 variables.
- Year
- a factor indicating the year of collection 
- Quarter
- a factor indicating the quarter of collection 
- Period
- a factor indicating the period (background vs. compliance) 
- Well
- a factor indicating the well number 
- VC.ppb
- a numeric vector of VC concentrations (ppb) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.22-6.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Specific Conductance from Example 22-2 of 2009 USEPA Guidance Document
Description
Specific conductance (\mumho) collected over several years at two wells at a hazardous waste facility. 
Usage
EPA.09.Ex.22.2.Specific.Conductance.dfFormat
A data frame with 43 observations on the following 3 variables.
- Well
- a factor indicating the well number 
- Date
- a Date object indicating the date of collection 
- Specific.Conductance.umho
- a numeric vector of specific conductance (\mumho) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.22-11.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Sulfate Concentrations from Example 6-3 of 2009 USEPA Guidance Document
Description
Sulfate concentrations (ppm) at two background wells (five quarterly measures at each well).
Usage
EPA.09.Ex.6.3.sulfate.dfFormat
A data frame with 10 observations on the following 4 variables.
- Month
- a numeric vector indicating the month the observation was taken 
- Year
- a numeric vector indicating the year the observation was taken 
- Well
- a factor indicating the well number 
- Sulfate.ppm
- a numeric vector of sulfate concentrations (ppm) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-20.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Arsenic concentrations from Example 7.1 of 2009 USEPA Guidance Document
Description
Arsenic concentrations (\mug/L) at a single well, consisting of:
8 historical observations, 
4 future observations for Case 1, and 
4 future observations for Case 2.
Usage
EPA.09.Ex.7.1.arsenic.dfFormat
A data frame with 16 observations on the following 2 variables.
- Data.Source
- a factor with levels Historical, Case.1, and Case.2 
- Arsenic.ug.per.l
- a numeric vector of arsenic concentrations (\mug/L) 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.7-26.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Trichloroethene concentrations in Table 9.1 of 2009 USEPA Guidance Document
Description
Time series of trichloroethene (TCE) concentrations (mg/L) taken at 2 separate 
wells.  Some observations are annotated with a data qualifier of U (nondetect) 
or J (estimated detected concentration). 
Usage
EPA.09.Table.9.1.TCE.dfFormat
A data frame with 30 observations on the following 5 variables.
- Date.Collected
- a factor indicating the date of collection 
- Date
- a Date object indicating the date of collection 
- Well
- a factor indicating the well number 
- TCE.mg.per.L
- a numeric vector of TCE concentrations (mg/L) 
- Data.Qualifier
- a factor indicating the data qualifier 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.9-3.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
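A minimal sketch (not part of the original table) of deriving a logical censoring indicator from the U data qualifier:
  # Illustration only: flag observations qualified with "U" as nondetects.
  data(EPA.09.Table.9.1.TCE.df)
  tce <- EPA.09.Table.9.1.TCE.df
  tce$Censored <- tce$Data.Qualifier == "U"
  table(tce$Well, tce$Censored)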
Arsenic, Mercury and Strontium Concentrations in Table 9-3 of 2009 USEPA Guidance Document
Description
Arsenic, mercury, and strontium concentrations (mg/L) from a single well 
collected approximately quarterly.  Nondetects are indicated by the 
data qualifier U.
Usage
EPA.09.Table.9.3.dfFormat
A data frame with 15 observations on the following 8 variables.
- Date.Collected
- a factor indicating the date of collection 
- Date
- a Date object indicating the date of collection 
- Arsenic.mg.per.L
- a numeric vector of arsenic concentrations (mg/L) 
- Arsenic.Data.Qualifier
- a factor indicating the data qualifier for arsenic 
- Mercury.mg.per.L
- a numeric vector of mercury concentrations (mg/L) 
- Mercury.Data.Qualifier
- a factor indicating the data qualifier for mercury 
- Strontium.mg.per.L
- a numeric vector of strontium concentrations 
- Strontium.Data.Qualifier
- a factor indicating the data qualifier for strontium 
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.9-13.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Nickel Concentrations in Table 9-4 of 2009 USEPA Guidance Document
Description
Nickel concentrations (ppb) from a single well.
Usage
EPA.09.Table.9.4.nickel.vecFormat
a numeric vector of nickel concentrations (ppb)
Source
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.9-18.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Aldicarb Concentrations from 1989 USEPA Guidance Document
Description
Aldicarb concentrations (ppb) at three compliance wells (four monthly samples at each well).
Usage
EPA.89b.aldicarb1.dfFormat
A data frame with 12 observations on the following 3 variables.
- Aldicarb
- Aldicarb concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.6-4.
Aldicarb Concentrations from 1989 USEPA Guidance Document
Description
Aldicarb concentrations (ppm) at three compliance wells (four monthly samples at each well).
Usage
EPA.89b.aldicarb2.dfFormat
A data frame with 12 observations on the following 3 variables.
- Aldicarb
- Aldicarb concentrations (ppm) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.6-13.
Benzene Concentrations from 1989 USEPA Guidance Document
Description
Benzene concentrations (ppm) at one background and five compliance wells (four monthly samples for each well).
Usage
EPA.89b.benzene.dfFormat
A data frame with 24 observations on the following 6 variables.
- Benzene.orig
- a character vector of the original observations 
- Benzene
- a numeric vector with <1 observations coded as 1 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.5-18.
Cadmium Concentrations from 1989 USEPA Guidance Document
Description
Cadmium concentrations (mg/L) at one set of background and one set of compliance wells. Nondetects reported as "BDL". Detection limit not given.
Usage
EPA.89b.cadmium.dfFormat
A data frame with 88 observations on the following 4 variables.
- Cadmium.orig
- a character vector of the original cadmium observations (mg/L) 
- Cadmium
- a numeric vector with BDL coded as 0 
- Censored
- a logical vector indicating which observations are censored 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.8-6.
Chlordane Concentrations from 1989 USEPA Guidance Document
Description
Chlordane concentrations (ppm) in 24 water samples. Two possible phases: dissolved (18 observations) and immiscible (6 observations).
Usage
EPA.89b.chlordane1.dfFormat
A data frame with 24 observations on the following 2 variables.
- Chlordane
- Chlordane concentrations (ppm) 
- Phase
- a factor indicating the phase (dissolved vs. immiscible) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.4-8.
Chlordane Concentrations from 1989 USEPA Guidance Document
Description
Chlordane concentrations (ppb) at one background and one compliance well. Observations taken during four separate months over two years. Four replicates taken for each “month/year/well type” combination.
Usage
data(EPA.89b.chlordane2.df)Format
A data frame with 32 observations on the following 5 variables.
- Chlordane
- Chlordane concentration (ppb) 
- Month
- a factor indicating the month of collection 
- Year
- a numeric vector indicating the year of collection (85 or 86) 
- Replicate
- a factor indicating the replicate number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.5-27.
EDB Concentrations from 1989 USEPA Guidance Document
Description
EDB concentrations (ppb) at three compliance wells (four monthly samples at each well).
Usage
EPA.89b.edb.dfFormat
A data frame with 12 observations on the following 3 variables.
- EDB
- EDB concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.6-6.
Lead Concentrations from 1989 USEPA Guidance Document
Description
Lead concentrations (ppm) at two background and four compliance wells (four monthly samples for each well).
Usage
EPA.89b.lead.dfFormat
A data frame with 24 observations on the following 4 variables.
- Lead
- Lead concentrations (ppm) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.5-23.
Log-transformed Lead Concentrations from 1989 USEPA Guidance Document
Description
Log-transformed lead concentrations (\mug/L) at two background and four 
compliance wells (four monthly samples for each well). 
Usage
EPA.89b.loglead.dfFormat
A data frame with 24 observations on the following 4 variables.
- LogLead
- Natural logarithm of lead concentrations (\mug/L) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.5-11.
Manganese Concentrations from 1989 USEPA Guidance Document
Description
Manganese concentrations at six monitoring wells (four monthly samples for each well).
Usage
EPA.89b.manganese.dfFormat
A data frame with 24 observations on the following 3 variables.
- Manganese
- Manganese concentrations 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.4-19.
Sulfate Concentrations from 1989 USEPA Guidance Document
Description
Sulfate concentrations (mg/L).  Nondetects reported as <1450. 
Usage
data(EPA.89b.sulfate.df)Format
A data frame with 24 observations on the following 3 variables.
- Sulfate.orig
- a character vector of original sulfate concentration (mg/L) 
- Sulfate
- a numeric vector of sulfate concentrations with <1450 coded as 1450 
- Censored
- a logical vector indicating which observations are censored 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.8-9.
T-29 Concentrations from 1989 USEPA Guidance Document
Description
T-29 concentrations (ppm) at two compliance wells (four monthly samples at each well, four replicates within each month). Detection limit is not given.
Usage
EPA.89b.t29.dfFormat
A data frame with 32 observations on the following 6 variables.
- T29.orig
- a character vector of the original T-29 concentrations (ppm) 
- T29
- a numeric vector of T-29 concentrations with <? coded as 0 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Replicate
- a factor indicating the replicate number 
- Well
- a factor indicating the well number 
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.6-10.
Total Organic Carbon Concentrations from 1989 USEPA Guidance Document
Description
Numeric vector containing total organic carbon (TOC) concentrations (mg/L).
Usage
EPA.89b.toc.vecFormat
A numeric vector with 19 elements containing TOC concentrations (mg/L).
Source
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.8-13.
Arsenic Concentrations from 1992 USEPA Guidance Document
Description
Arsenic concentrations (ppm) at six monitoring wells (four monthly samples for each well).
Usage
EPA.92c.arsenic1.dfFormat
A data frame with 24 observations on the following 3 variables.
- Arsenic
- Arsenic concentrations (ppm) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.21.
Arsenic Concentrations from 1992 USEPA Guidance Document
Description
Arsenic concentrations (ppb) at three background wells and one compliance well 
(six monthly samples for each well; first four missing at compliance well).  Nondetects 
reported as <5. 
Usage
EPA.92c.arsenic2.dfFormat
A data frame with 24 observations on the following 6 variables.
- Arsenic.orig
- a character vector of original arsenic concentrations (ppb) 
- Arsenic
- a numeric vector of arsenic concentrations with <5 coded as 5 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.60.
Arsenic Concentrations from 1992 USEPA Guidance Document
Description
Arsenic concentrations at one background and one compliance monitoring well. Three years of observations for background well, two years of observations for compliance well, four samples per year for each well.
Usage
EPA.92c.arsenic3.dfFormat
A data frame with 20 observations on the following 3 variables.
- Arsenic
- a numeric vector of arsenic concentrations 
- Year
- a factor indicating the year of collection 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
Benzene Concentrations from 1992 USEPA Guidance Document
Description
Benzene concentrations (ppb) at six background wells 
(six monthly samples for each well).  Nondetects reported as <2. 
Usage
EPA.92c.benzene1.dfFormat
A data frame with 36 observations on the following 5 variables.
- Benzene.orig
- a character vector of original benzene concentrations (ppb) 
- Benzene
- a numeric vector of benzene concentrations with <2 coded as 2 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.36.
Benzene Concentrations from 1992 USEPA Guidance Document
Description
Benzene concentrations (ppb) at one background and one compliance well. Four observations per month for each well. Background well sampled in months 1, 2, and 3; compliance well sampled in months 4 and 5.
Usage
EPA.92c.benzene2.dfFormat
A data frame with 20 observations on the following 3 variables.
- Benzene
- a numeric vector of benzene concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.56.
Carbon Tetrachloride Concentrations from 1992 USEPA Guidance Document
Description
Carbon tetrachloride (CCL4) concentrations (ppb) at five wells (four monthly samples at each well).
Usage
EPA.92c.ccl4.dfFormat
A data frame with 20 observations on the following 3 variables.
- CCL4
- a numeric vector of carbon tetrachloride concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.80.
Chrysene Concentrations from 1992 USEPA Guidance Document
Description
Chrysene concentrations (ppb) at five compliance wells (four monthly samples for each well).
Usage
EPA.92c.chrysene.dfFormat
A data frame with 20 observations on the following 3 variables.
- Chrysene
- a numeric vector of chrysene concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.52.
Copper Concentrations from 1992 USEPA Guidance Document
Description
Copper concentrations (ppb) at two background wells and one compliance well (six monthly samples for each well).
Usage
EPA.92c.copper1.dfFormat
A data frame with 18 observations on the following 4 variables.
- Copper
- a numeric vector of copper concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.47.
Copper Concentrations from 1992 USEPA Guidance Document
Description
Copper concentrations (ppb) at three background and two compliance wells 
(eight monthly samples for each well; first four missing at compliance wells).  
Nondetects reported as <5. 
Usage
EPA.92c.copper2.dfFormat
A data frame with 40 observations on the following 6 variables.
- Copper.orig
- a character vector of original copper concentrations (ppb) 
- Copper
- a numeric vector of copper concentrations with <5 coded as 5 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.55.
Log-transformed Nickel Concentrations from 1992 USEPA Guidance Document
Description
Log-transformed nickel concentrations (ppb) at four monitoring wells (five monthly samples for each well).
Usage
EPA.92c.lognickel1.dfFormat
A data frame with 20 observations on the following 3 variables.
- LogNickel
- a numeric vector of log-transformed nickel concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.15.
Nickel Concentrations from 1992 USEPA Guidance Document
Description
Nickel concentrations (ppb) at four monitoring wells (five monthly samples for each well).
Usage
EPA.92c.nickel1.dfFormat
A data frame with 20 observations on the following 3 variables.
- Nickel
- a numeric vector of nickel concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.7.
Nickel Concentrations from 1992 USEPA Guidance Document
Description
Nickel concentrations (ppb) at a monitoring well (eight months of samples, two samples for each sampling occasion).
Usage
EPA.92c.nickel2.dfFormat
A data frame with 16 observations on the following 3 variables.
- Nickel
- a numeric vector of nickel concentrations (ppb) 
- Month
- a factor indicating the month of collection 
- Sample
- a factor indicating the sample (replicate) number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.78.
Toluene Concentrations from 1992 USEPA Guidance Document
Description
Toluene concentrations (ppb) at two background and three compliance wells 
(five monthly samples at each well).  Nondetects reported as <5. 
Usage
EPA.92c.toluene.dfFormat
A data frame with 25 observations on the following 6 variables.
- Toluene.orig
- a character vector of original toluene concentrations (ppb) 
- Toluene
- a numeric vector of toluene concentrations with <5 coded as 5 
- Censored
- a logical vector indicating which observations are censored 
- Month
- a factor indicating the month of collection 
- Well
- a factor indicating the well number 
- Well.type
- a factor indicating the well type (background vs. compliance) 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.43.
Zinc Concentrations from 1992 USEPA Guidance Document
Description
Zinc concentrations (ppb) at five background wells 
(eight samples for each well).  Nondetects reported as <7. 
Usage
EPA.92c.zinc.dfFormat
A data frame with 40 observations on the following 5 variables.
- Zinc.orig
- a character vector of original zinc concentrations (ppb) 
- Zinc
- a numeric vector of zinc concentrations with <7 coded as 7 
- Censored
- a logical vector indicating which observations are censored 
- Sample
- a factor indicating the sample number 
- Well
- a factor indicating the well number 
Source
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C. p.30.
Chromium Concentrations from 1992 USEPA Guidance Document
Description
Chromium concentrations (mg/kg) in soil samples collected randomly over a Superfund site.
Usage
EPA.92d.chromium.dfFormat
A data frame with 15 observations on the following variable.
- Cr
- a numeric vector of chromium concentrations (mg/kg) 
Source
USEPA. (1992d). Supplemental Guidance to RAGS: Calculating the Concentration Term. Publication 9285.7-081, May 1992. Intermittent Bulletin, Volume 1, Number 1. Office of Emergency and Remedial Response, Hazardous Site Evaluation Division, OS-230. Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Chromium Concentrations from 1992 USEPA Guidance Document
Description
Chromium concentrations (mg/kg) in soil samples collected randomly over a Superfund site.
Usage
EPA.92d.chromium.vecFormat
A numeric vector with 15 observations.
Source
USEPA. (1992d). Supplemental Guidance to RAGS: Calculating the Concentration Term. Publication 9285.7-081, May 1992. Intermittent Bulletin, Volume 1, Number 1. Office of Emergency and Remedial Response, Hazardous Site Evaluation Division, OS-230. Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Lead Concentrations from 1994 USEPA Guidance Document
Description
Lead concentrations (mg/Kg) in soil samples at a reference area and a 
cleanup area.  Nondetects reported as <39.  There are 14 observations 
for each area. 
Usage
EPA.94b.lead.dfFormat
A data frame with 28 observations on the following 4 variables.
- Lead.orig
- a character vector of original lead concentrations (mg/Kg) 
- Lead
- a numeric vector of lead concentrations with <39 coded as 39 
- Censored
- a logical vector indicating which observations are censored 
- Area
- a factor indicating the area (cleanup vs. reference) 
Source
USEPA. (1994b). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C. pp.6.20–6.21.
1,2,3,4-Tetrachlorobenzene Concentrations from 1994 USEPA Guidance Document
Description
1,2,3,4-Tetrachlorobenzene (TcCB) concentrations (ppb) in soil samples at a 
reference area and a cleanup area.  There are 47 observations for the reference area 
and 77 for the cleanup area.  There is only one nondetect in the dataset (it's in the 
cleanup area), and it is reported as ND.  Here it is assumed the nondetect is 
less than the smallest reported value, which is 0.09 ppb.  Note that on page 6.23 of 
USEPA (1994b), a value of 25.5 for the Cleanup Unit was erroneously omitted. 
Usage
EPA.94b.tccb.dfFormat
A data frame with 124 observations on the following 4 variables.
- TcCB.orig
- a character vector with the original tetrachlorobenzene concentrations (ppb) 
- TcCB
- a numeric vector of TcCB concentrations with ND coded as 0.09 
- Censored
- a logical vector indicating which observations are censored 
- Area
- a factor indicating the area (cleanup vs. reference) 
Source
USEPA. (1994b). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C. pp.6.22-6.25.
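A minimal sketch (not from the guidance document) comparing the two areas:
  # Illustration only: summarize TcCB concentrations separately for the
  # reference and cleanup areas.
  data(EPA.94b.tccb.df)
  with(EPA.94b.tccb.df, tapply(TcCB, Area, summary))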
Calibration Data for Cadmium at Mass 111
Description
Calibration data for cadmium at mass 111 (ng/L; method 1638 ICPMS) that appeared in Gibbons et al. (1997b) and were provided to them by the U.S. EPA.
Usage
EPA.97.cadmium.111.dfFormat
A data frame with 35 observations on the following 2 variables.
- Cadmium
- Observed concentration of cadmium (ng/L) 
- Spike
- “True” concentration of cadmium taken from a standard (ng/L) 
Source
Gibbons, R.D., D.E. Coleman, and R.F. Maddalone. (1997b). Response to Comment on "An Alternative Minimum Level Definition for Analytical Quantification". Environmental Science and Technology, 31(12), 3729–3731.
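Calibration data such as these are typically used to fit a calibration line of observed concentration on the spiked ("true") concentration; a minimal ordinary least squares sketch (not taken from Gibbons et al.) is:
  # Illustration only: ordinary least squares calibration fit.
  data(EPA.97.cadmium.111.df)
  summary(lm(Cadmium ~ Spike, data = EPA.97.cadmium.111.df))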
The Extreme Value (Gumbel) Distribution
Description
Density, distribution function, quantile function, and random generation for the (largest) extreme value distribution.
Usage
  devd(x, location = 0, scale = 1)
  pevd(q, location = 0, scale = 1)
  qevd(p, location = 0, scale = 1)
  revd(n, location = 0, scale = 1)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size. If length(n) is larger than 1, then length(n) random values are returned. | 
| location | vector of location parameters. | 
| scale | vector of positive scale parameters. | 
Details
Let X be an extreme value random variable with parameters 
location=\eta and scale=\theta.  
The density function of X is given by:
f(x; \eta, \theta) = \frac{1}{\theta} e^{-(x-\eta)/\theta} exp[-e^{-(x-\eta)/\theta}]
where -\infty < x, \eta < \infty and \theta > 0.
The cumulative distribution function of X is given by:
F(x; \eta, \theta) = exp[-e^{-(x-\eta)/\theta}]
The p^{th} quantile of X is given by:
x_{p} = \eta - \theta log[-log(p)]
The mode, mean, variance, skew, and kurtosis of X are given by:
Mode(X) = \eta
E(X) = \eta + \epsilon \theta
Var(X) = \theta^2 \pi^2 / 6
Skew(X) = \sqrt{\beta_1} = 1.139547
Kurtosis(X) = \beta_2 = 5.4
where \epsilon denotes Euler's constant, 
which is equivalent to -digamma(1).
Value
density (devd), probability (pevd), quantile (qevd), or 
random sample (revd) for the extreme value distribution with 
location parameter(s) determined by location and scale 
parameter(s) determined by scale.
Note
There are three families of extreme value distributions.  The one 
described here is the Type I, also called the Gumbel extreme value 
distribution or simply Gumbel distribution.  The name 
“extreme value” comes from the fact that this distribution is 
the limiting distribution (as n approaches infinity) of the 
greatest value among n independent random variables each 
having the same continuous distribution.
The Gumbel extreme value distribution is related to the 
exponential distribution as follows. 
Let Y be an exponential random variable 
with parameter rate=\lambda.  Then X = \eta - (1/\lambda) log(\lambda Y) 
has an extreme value distribution with parameters 
location=\eta and scale=1/\lambda.
The distribution described above and used by devd, pevd, 
qevd, and revd is the largest extreme value 
distribution.  The smallest extreme value distribution is the limiting 
distribution (as n approaches infinity) of the smallest value among 
n independent random variables each having the same continuous distribution. 
If X has a largest extreme value distribution with parameters 
location=\eta and scale=\theta, then 
Y = -X has a smallest extreme value distribution with parameters 
location=-\eta and scale=\theta.  The smallest 
extreme value distribution is related to the 
Weibull distribution as follows.  
Let Y be a Weibull random variable with parameters 
shape=\beta and scale=\alpha.  Then X = log(Y) 
has a smallest extreme value distribution with parameters location=log(\alpha) 
and scale=1/\beta.
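A small simulation sketch (not part of the original help file) illustrating the Weibull relationship stated above: the mean of -log(Y) should match the mean of a largest extreme value variate with location -log(\alpha) and scale 1/\beta.
  # Illustration only: the mean of a largest extreme value variate with
  # location = -log(alpha) and scale = 1/beta is
  # -log(alpha) + (Euler's constant)/beta = -log(alpha) - digamma(1)/beta.
  set.seed(1)
  alpha <- 3; beta <- 2
  y <- rweibull(100000, shape = beta, scale = alpha)
  mean(-log(y))
  -log(alpha) - digamma(1) / beta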
The extreme value distribution has been used extensively to model the distribution of streamflow, flooding, rainfall, temperature, wind speed, and other meteorological variables, as well as material strength and life data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
eevd, GEVD, 
Probability Distributions and Random Numbers.
Examples
  # Density of an extreme value distribution with location=0, scale=1, 
  # evaluated at 0.5:
  devd(.5) 
  #[1] 0.3307043
  #----------
  # The cdf of an extreme value distribution with location=1, scale=2, 
  # evaluated at 0.5:
  pevd(.5, 1, 2) 
  #[1] 0.2769203
  #----------
  # The 25'th percentile of an extreme value distribution with 
  # location=-2, scale=0.5:
  qevd(.25, -2, 0.5) 
  #[1] -2.163317
  #----------
  # Random sample of 4 observations from an extreme value distribution with 
  # location=5, scale=2. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  revd(4, 5, 2) 
  #[1] 9.070406 7.669139 4.511481 5.903675
The Empirical Distribution Based on a Set of Observations
Description
Density, distribution function, quantile function, and random generation for the empirical distribution based on a set of observations
Usage
  demp(x, obs, discrete = FALSE, density.arg.list = NULL)
  pemp(q, obs, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), 
    plot.pos.con = 0.375) 
  qemp(p, obs, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), 
    plot.pos.con = 0.375)
  remp(n, obs)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size. If length(n) is larger than 1, then length(n) random values are returned. | 
| obs | numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| discrete | logical scalar indicating whether the assumed parent distribution of the observations is discrete (discrete=TRUE) or continuous (discrete=FALSE). The default value is discrete=FALSE. | 
| density.arg.list | list with arguments to the R density function. The default value is density.arg.list=NULL. | 
| prob.method | character string indicating what method to use to compute the empirical probabilities. Possible values are "emp.probs" (empirical probabilities) and "plot.pos" (plotting positions). The default value is prob.method="emp.probs" when discrete=TRUE and prob.method="plot.pos" when discrete=FALSE. | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant. The default value is plot.pos.con=0.375. | 
Details
Let x_1, x_2, \ldots, x_n denote a random sample of n observations 
from some unknown probability distribution (i.e., the elements of the argument 
obs), and let x_{(i)} denote the i^{th} order statistic, that is, 
the i^{th} largest observation, for i = 1, 2, \ldots, n.
Estimating Density 
The function demp computes the empirical probability density function.  If 
the observations are assumed to come from a discrete distribution, the probability 
density (mass) function is estimated by:
\hat{f}(x) = \widehat{Pr}(X = x) = \frac{\sum^n_{i=1} I_{[x]}(x_i)}{n}
where I is the indicator function: 
I_{[x]}(y) = \begin{cases} 1 & \mbox{if } y = x \\ 0 & \mbox{if } y \ne x \end{cases}
That is, the estimated probability of observing the value x is simply the 
observed proportion of observations equal to x.
If the observations are assumed to come from a continuous distribution, the 
function demp calls the R function density to compute the 
estimated density based on the values specified in the argument obs, 
and then uses linear interpolation to estimate the density at the values 
specified in the argument x.  See the R help file for 
density for more information on how the empirical density is 
computed in the continuous case.
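A minimal usage sketch (assuming EnvStats is loaded) contrasting the two cases described above:
  # Illustration only: empirical density of a small sample, treated as
  # continuous (kernel smoothing via density()) and as discrete (observed
  # proportions).
  set.seed(47)
  obs <- rlnorm(20)
  demp(1, obs)                          # continuous case: smoothed density at x = 1
  demp(obs[1], obs, discrete = TRUE)    # discrete case: proportion equal to obs[1]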
Estimating Probabilities 
The function pemp computes the estimated cumulative distribution function 
(cdf), also called the empirical cdf (ecdf).  If the observations are assumed to 
come from a discrete distribution, the value of the cdf evaluated at the i^{th} 
order statistic is usually estimated by: 
\hat{F}[x_{(i)}] = \widehat{Pr}(X \le x_{(i)}) = \hat{p}_i = 
    \frac{\sum^n_{j=1} I_{(-\infty, x_{(i)}]}(x_j)}{n}
where:
I_{(-\infty, x]}(y) = \begin{cases} 1 & \mbox{if } y \le x \\ 0 & \mbox{if } y > x \end{cases}
(D'Agostino, 1986a).  That is, the estimated value of the cdf at the i^{th} 
order statistic is simply the observed proportion of observations less than or 
equal to the i^{th} order statistic.  This estimator is sometimes called the 
“empirical probabilities” estimator and is intuitively appealing.  
The function pemp uses the above equations to compute the empirical cdf when 
prob.method="emp.probs".
For any general value of x, when the observations are assumed to come from a 
discrete distribution, the value of the cdf is estimated by:  
\hat{F}(x) = \begin{cases} 0 & \mbox{if } x < x_{(1)} \\ \hat{p}_i & \mbox{if } x_{(i)} \le x < x_{(i+1)} \\ 1 & \mbox{if } x \ge x_{(n)} \end{cases}
The function pemp uses the above equation when discrete=TRUE.
If the observations are assumed to come from a continuous distribution, the value 
of the cdf evaluated at the i^{th} order statistic is usually estimated by: 
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{n - 2a + 1}
where a denotes the plotting position constant and 0 \le a \le 1 
(Cleveland, 1993, p.18; D'Agostino, 1986a, pp.8,25).  The estimators defined by 
the above equation are called plotting positions and are used to construct 
probability plots.  The function pemp uses the above equation when 
prob.method="plot.pos".
For any general value of x, the value of the cdf is estimated by linear 
interpolation: 
\hat{F}(x) = \begin{cases} \hat{p}_1 & \mbox{if } x < x_{(1)} \\ (1 - r)\hat{p}_i + r\hat{p}_{i+1} & \mbox{if } x_{(i)} \le x < x_{(i+1)} \\ \hat{p}_n & \mbox{if } x \ge x_{(n)} \end{cases}
where
r = \frac{x - x_{(i)}}{x_{(i+1)} - x_{(i)}}
(Chambers et al., 1983).  The function pemp uses the above two equations 
when discrete=FALSE.
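For example (a minimal sketch), evaluating pemp at the order statistics with the default plotting position constant reproduces the plotting positions defined above:
  # Minimal sketch: pemp at the sorted observations should reproduce the
  # plotting positions (i - a)/(n - 2a + 1) with a = 0.375.
  set.seed(3) 
  obs <- rgamma(20, shape = 4, scale = 5) 
  n <- length(obs) 
  (1:n - 0.375) / (n - 2 * 0.375 + 1) 
  pemp(sort(obs), obs, plot.pos.con = 0.375)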
Estimating Quantiles 
The function qemp computes the estimated quantiles based on the observed 
data.  If the observations are assumed to come from a discrete distribution, the 
p^{th} quantile is usually estimated by: 
\hat{x}_p = \begin{cases} x_{(1)} & \mbox{if } p \le \hat{p}_1 \\ x_{(i)} & \mbox{if } \hat{p}_{i-1} < p \le \hat{p}_i \\ x_{(n)} & \mbox{if } p > \hat{p}_n \end{cases}
The function qemp uses the above equation when discrete=TRUE.
If the observations are assumed to come from a continuous distribution, the 
p^{th} quantile is usually estimated by linear interpolation: 
\hat{x}_p = \begin{cases} x_{(1)} & \mbox{if } p \le \hat{p}_1 \\ (1 - r)x_{(i-1)} + r x_{(i)} & \mbox{if } \hat{p}_{i-1} < p \le \hat{p}_i \\ x_{(n)} & \mbox{if } p > \hat{p}_n \end{cases}
where
r = \frac{p - \hat{p}_{i-1}}{\hat{p}_i - \hat{p}_{i-1}}
The function qemp uses the above two equations when discrete=FALSE.
Generating Random Numbers From the Empirical Distribution 
The function remp simply calls the R function sample to 
sample the elements of obs with replacement.
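In other words (a minimal sketch; exact agreement assumes remp makes a single call to sample with the same random number stream):
  # remp(n, obs) is essentially sample(obs, size = n, replace = TRUE):
  obs <- 1:10 
  set.seed(42) 
  remp(5, obs) 
  set.seed(42) 
  sample(obs, size = 5, replace = TRUE)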
Value
density (demp), probability (pemp), quantile (qemp), or 
random sample (remp) for the empirical distribution based on the data 
contained in the vector obs.
Note
The function demp lets you perform nonparametric density estimation.  
The function pemp computes the value of the empirical cumulative 
distribution function (ecdf) for user-specified quantiles.  The ecdf is a 
nonparametric estimate of the true cdf (see ecdfPlot).  The 
function qemp computes nonparametric estimates of quantiles 
(see the help files for eqnpar and quantile).  
The function remp lets you sample a set of observations with replacement, 
which is often done while bootstrapping or performing some other kind of 
Monte Carlo simulation.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11–16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7–62.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley and Sons, New York.
Sheather, S. J. and Jones, M. C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society B, 683–690.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Wegman, E.J. (1972). Nonparametric Probability Density Estimation. Technometrics 14, 533-546.
See Also
density, approx, epdfPlot, 
ecdfPlot, cdfCompare, qqplot, 
eqnpar, quantile, sample, 
simulateVector, simulateMvMatrix.
Examples
  # Create a set of 100 observations from a gamma distribution with 
  # parameters shape=4 and scale=5. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(3) 
  obs <- rgamma(100, shape=4, scale=5)
  # Now plot the empirical distribution (with a histogram) and the true distribution:
  dev.new()
  hist(obs, col = "cyan", xlim = c(0, 65), freq = FALSE, 
    ylab = "Relative Frequency") 
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE) 
  box()
  # Now plot the empirical distribution (based on demp) with the 
  # true distribution:
  x <- qemp(p = seq(0, 1, len = 100), obs = obs) 
  y <- demp(x, obs) 
  dev.new()
  plot(x, y, xlim = c(0, 65), type = "n", 
    xlab = "Value of Random Variable", 
    ylab = "Relative Frequency") 
  lines(x, y, lwd = 2, col = "cyan") 
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
  # Alternatively, you can create the above plot with the function 
  # epdfPlot:
  dev.new()
  epdfPlot(obs, xlim = c(0, 65), epdf.col = "cyan", 
    xlab = "Value of Random Variable", 
    main = "Empirical and Theoretical PDFs")
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
  
  # Clean Up
  #---------
  rm(obs, x, y)
Internal EnvStats Objects
Description
Internal EnvStats objects
Details
These are not to be called by the user. They have been exported to allow advanced users to see their structure.
Atmospheric Environmental Conditions in New York City
Description
Daily measurements of ozone concentration, wind speed, temperature, and solar radiation in New York City for 153 consecutive days between May 1 and September 30, 1973.
Usage
  Environmental.df
  Air.df
Format
The data frame Environmental.df has 153 observations on the following 4 variables.
- ozone
- Average ozone concentration (of hourly measurements) in parts per billion. 
- radiation
- Solar radiation (from 08:00 to 12:00) in langleys. 
- temperature
- Maximum daily temperature in degrees Fahrenheit. 
- wind
- Average wind speed (at 07:00 and 10:00) in miles per hour. 
Row names are the dates the data were collected.
The data frame Air.df is the same as Environmental.df except that the
column ozone is the cube root of average ozone concentration.
Details
Data on ozone (ppb), solar radiation (langleys), temperature (degrees Fahrenheit), and wind speed (mph)
for 153 consecutive days between May 1 and September 30, 1973.  These data are a superset of the data 
contained in the data frame environmental in the package lattice.
Source
Chambers et al. (1983), pp. 347-349.
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, 395pp.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
Cleveland, W.S. (1994). The Elements of Graphing Data. Revised Edition. Hobart Press, Summit, New Jersey, 297pp.
Examples
# Scatterplot matrix
pairs(Environmental.df)
pairs(Air.df)
# Time series plot for ozone
attach(Environmental.df)
dates <- as.Date(row.names(Environmental.df), format = "%m/%d/%Y")
plot(dates, ozone, type = "l", 
    xlab = "Time (Year = 1973)", ylab = "Ozone (ppb)",
    main = "Time Series Plot of Daily Ozone Measures")
detach("Environmental.df")
rm(dates)
Euler's Constant
Description
Explanation of Euler's Constant.
Details
Euler's Constant, here denoted \epsilon, is a real-valued number that can 
be defined in several ways.  Johnson et al. (1992, p. 5) use the definition: 
\epsilon = \lim_{n \to \infty}[1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - log(n)]
and note that it can also be expressed as
\epsilon = -\Psi(1)
where \Psi() is the digamma function 
(Johnson et al., 1992, p.8).
The value of Euler's Constant, to 10 decimal places, is 0.5772156649.
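A quick numerical check in R (a minimal sketch using base R only):
  # Euler's constant equals -digamma(1); partial sums of the defining limit
  # converge to the same value (slowly, with error of order 1/(2n)).
  -digamma(1) 
  #[1] 0.5772157 
  n <- 10^6 
  sum(1 / (1:n)) - log(n) 
  # approximately 0.577216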
The expression for the mean of a 
Type I extreme value (Gumbel) distribution involves Euler's 
constant; hence Euler's constant is used to compute the method of moments 
estimators for this distribution (see eevd).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, pp.4-8.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Extreme Value Distribution, eevd.
EnvStats Functions Listed by Category
Description
Hyperlink list of EnvStats functions by category.
Details
EnvStats Functions for Calibration
Description
The EnvStats functions listed below are useful for performing calibration and inverse prediction to determine the concentration of a chemical based on a machine signal.
Details
| Function Name | Description | 
| anovaPE | Compute lack-of-fit and pure error ANOVA table for a | 
| linear model. | |
| calibrate | Fit a calibration line or curve. | 
| detectionLimitCalibrate | Determine detection limit based on using a calibration | 
| line (or curve) and inverse regression. | |
| inversePredictCalibrate | Predict concentration using a calibration line (or curve) | 
| and inverse regression. | |
| pointwise | Pointwise confidence limits for predictions. | 
| predict.lm | Predict method for linear model fits. | 
EnvStats Functions for Censored Data
Description
The EnvStats functions listed below are useful for dealing with Type I censored data.
Details
Data Transformations
| Function Name | Description | 
| boxcoxCensored | Compute values of an objective for Box-Cox Power | 
| transformations, or compute optimal transformation, | |
| for Type I censored data. | |
| print.boxcoxCensored | Print an object of class "boxcoxCensored". | 
| plot.boxcoxCensored | Plot an object of class "boxcoxCensored". | 
Estimating Distribution Parameters
| Function Name | Description | 
| egammaCensored | Estimate shape and scale parameters for a gamma distribution | 
| based on Type I censored data. | |
| egammaAltCensored | Estimate mean and CV for a gamma distribution | 
| based on Type I censored data. | |
| elnormCensored | Estimate parameters for a lognormal distribution (log-scale) | 
| based on Type I censored data. | |
| elnormAltCensored | Estimate parameters for a lognormal distribution (original scale) | 
| based on Type I censored data. | |
| enormCensored | Estimate parameters for a Normal distribution based on Type I | 
| censored data. | |
| epoisCensored | Estimate parameter for a Poisson distribution based on Type I | 
| censored data. | |
| enparCensored | Estimate the mean and standard deviation nonparametrically. | 
| gpqCiNormSinglyCensored | Generate the generalized pivotal quantity used to construct a | 
| confidence interval for the mean of a Normal distribution based | |
| on Type I singly censored data. | |
| gpqCiNormMultiplyCensored | Generate the generalized pivotal quantity used to construct a | 
| confidence interval for the mean of a Normal distribution based | |
| on Type I multiply censored data. | |
| print.estimateCensored | Print an object of class "estimateCensored". | 
Estimating Distribution Quantiles
| Function Name | Description | 
| eqlnormCensored | Estimate quantiles of a Lognormal distribution (log-scale) | 
| based on Type I censored data, and optionally construct | |
| a confidence interval for a quantile. | |
| eqnormCensored | Estimate quantiles of a Normal distribution | 
| based on Type I censored data, and optionally construct | |
| a confidence interval for a quantile. | |
All of the functions for computing quantiles (and associated confidence intervals) for complete (uncensored) 
data are listed in the help file Estimating Distribution Quantiles.  All of these functions, with 
the exception of eqnpar, will accept an object of class 
"estimateCensored".  Thus, you may estimate 
quantiles (and construct approximate confidence intervals) for any distribution for which:
- There exists a function to estimate distribution parameters using censored data (see the section Estimating Distribution Parameters above). 
- There exists a function to estimate quantiles for that distribution based on complete data (see the help file Estimating Distribution Quantiles). 
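Combining these two pieces, a minimal sketch with simulated left-censored normal data (the data, censoring level, and quantile below are arbitrary): enormCensored estimates the distribution parameters, and the resulting object of class "estimateCensored" is then passed to eqnorm:
  # Minimal sketch: estimate parameters from left-censored data, then estimate the
  # 90th percentile (with a confidence interval) from the "estimateCensored" object.
  set.seed(58) 
  x <- rnorm(20, mean = 10, sd = 2) 
  censored <- x < 8 
  x[censored] <- 8 
  est <- enormCensored(x, censored) 
  eqnorm(est, p = 0.9, ci = TRUE)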
Nonparametric estimates of quantiles (and associated confidence intervals) can be constructed from censored 
data as long as the order statistics used in the results are above all left-censored observations or below 
all right-censored observations.  See the help file for eqnpar for more information and 
examples.  
Goodness-of-Fit Tests
| Function Name | Description | 
| gofTestCensored | Perform a goodness-of-fit test based on Type I left- or | 
| right-censored data. | |
| print.gofCensored | Print an object of class "gofCensored". | 
| plot.gofCensored | Plot an object of class "gofCensored". | 
Hypothesis Tests
| Function Name | Description | 
| twoSampleLinearRankTestCensored | Perform two-sample linear rank tests based on | 
| censored data. | |
| print.htestCensored | Printing method for object of class | 
| "htestCensored". | |
Plotting Probability Distributions
| Function Name | Description | 
| cdfCompareCensored | Plot two cumulative distribution functions based on Type I | 
| censored data. | |
| ecdfPlotCensored | Plot an empirical cumulative distribution function based on | 
| Type I censored data. | |
| ppointsCensored | Compute plotting positions for Type I censored data. | 
| qqPlotCensored | Produce quantile-quantile (Q-Q) plots, also called probability | 
| plots, based on Type I censored data. | |
Prediction and Tolerance Intervals
| Function Name | Description | 
| gpqTolIntNormSinglyCensored | Generate the generalized pivotal quantity used to construct a | 
| tolerance interval for a Normal distribution based | |
| on Type I singly censored data. | |
| gpqTolIntNormMultiplyCensored | Generate the generalized pivotal quantity used to construct a | 
| tolerance interval for a Normal distribution based | |
| on Type I multiply censored data. | |
| tolIntLnormCensored | Tolerance interval for a lognormal distribution (log-scale) | 
| based on Type I censored data. | |
| tolIntNormCensored | Tolerance interval for a Normal distribution based on Type I | 
| censored data. | |
All of the functions for computing prediction and tolerance intervals for complete (uncensored) 
data are listed in the help files Prediction Intervals and Tolerance Intervals.  
All of these functions, with the exceptions of predIntNpar and tolIntNpar, 
will accept an object of class "estimateCensored".  Thus, you 
may construct approximate prediction or tolerance intervals for any distribution for which:
- There exists a function to estimate distribution parameters using censored data (see the section Estimating Distribution Parameters above). 
- There exists a function to create a prediction or tolerance interval for that distribution based on complete data (see the help files Prediction Intervals and Tolerance Intervals). 
Nonparametric prediction and tolerance intervals can be constructed from censored 
data as long as the order statistics used in the results are above all left-censored observations or below 
all right-censored observations.  See the help files for predIntNpar, 
predIntNparSimultaneous, and tolIntNpar for more information and examples.
EnvStats Functions for Data Transformations
Description
The EnvStats functions listed below are useful for deciding on data transformations.
Details
| Function Name | Description | 
| boxcox | Compute values of an objective for Box-Cox transformations, or | 
| compute optimal transformation based on raw observations | |
| or residuals from a linear model. | |
| boxcoxTransform | Apply a Box-Cox Power transformation to a set of data. | 
| plot.boxcox | Plotting method for an object of class "boxcox". | 
| plot.boxcoxLm | Plotting method for an object of class "boxcoxLm". | 
| print.boxcox | Printing method for an object of class "boxcox". | 
| print.boxcoxLm | Printing method for an object of class "boxcoxLm". | 
EnvStats Functions for Estimating Distribution Parameters
Description
The EnvStats functions listed below are useful for estimating distribution parameters and optionally constructing confidence intervals.
Details
| Function Name | Description | 
| ebeta | Estimate parameters of a Beta distribution | 
| ebinom | Estimate parameter of a Binomial distribution | 
| eexp | Estimate parameter of an Exponential distribution | 
| eevd | Estimate parameters of an Extreme Value distribution | 
| egamma | Estimate shape and scale parameters of a Gamma distribution | 
| egammaAlt | Estimate mean and CV parameters of a Gamma distribution | 
| egevd | Estimate parameters of a Generalized Extreme Value distribution | 
| egeom | Estimate parameter of a Geometric distribution | 
| ehyper | Estimate parameter of a Hypergeometric distribution | 
| elogis | Estimate parameters of a Logistic distribution | 
| elnorm | Estimate parameters of a Lognormal distribution (log-scale) | 
| elnormAlt | Estimate parameters of a Lognormal distribution (original scale) | 
| elnorm3 | Estimate parameters of a Three-Parameter Lognormal distribution | 
| enbinom | Estimate parameter of a Negative Binomial distribution | 
| enorm | Estimate parameters of a Normal distribution | 
| enpar | Estimate Mean, Standard Deviation, and Standard Error Nonparametrically | 
| epareto | Estimate parameters of a Pareto distribution | 
| epois | Estimate parameter of a Poisson distribution | 
| eunif | Estimate parameters of a Uniform distribution | 
| eweibull | Estimate parameters of a Weibull distribution | 
| ezmlnorm | Estimate parameters of a Zero-Modified Lognormal (Delta) | 
| distribution (log-Scale) | |
| ezmlnormAlt | Estimate parameters of a Zero-Modified Lognormal (Delta) | 
| distribution (original Scale) | |
| ezmnorm | Estimate parameters of a Zero-Modified Normal distribution | 
EnvStats Functions for Estimating Distribution Quantiles
Description
The EnvStats functions listed below are useful for estimating distribution quantiles and, for some functions, optionally constructing confidence intervals for a quantile.
Details
| Function Name | Description | 
| eqbeta | Estimate quantiles of a Beta distribution. | 
| eqbinom | Estimate quantiles of a Binomial distribution. | 
| eqexp | Estimate quantiles of an Exponential distribution. | 
| eqevd | Estimate quantiles of an Extreme Value distribution. | 
| eqgamma | Estimate quantiles of a Gamma distribution | 
| using the Shape and Scale Parameterization, and optionally | |
| construct a confidence interval for a quantile. | |
| eqgammaAlt | Estimate quantiles of a Gamma distribution | 
| using the mean and CV Parameterization, and optionally | |
| construct a confidence interval for a quantile. | |
| eqgevd | Estimate quantiles of a Generalized Extreme Value distribution. | 
| eqgeom | Estimate quantiles of a Geometric distribution. | 
| eqhyper | Estimate quantiles of a Hypergeometric distribution. | 
| eqlogis | Estimate quantiles of a Logistic distribution. | 
| eqlnorm | Estimate quantiles of a Lognormal distribution (log-scale), | 
| and optionally construct a confidence interval for a quantile. | |
| eqlnorm3 | Estimate quantiles of a Three-Parameter Lognormal distribution. | 
| eqnbinom | Estimate quantiles of a Negative Binomial distribution. | 
| eqnorm | Estimate quantiles of a Normal distribution, | 
| and optionally construct a confidence interval for a quantile. | |
| eqpareto | Estimate quantiles of a Pareto distribution. | 
| eqpois | Estimate quantiles of a Poisson distribution, | 
| and optionally construct a confidence interval for a quantile. | |
| equnif | Estimate quantiles of a Uniform distribution. | 
| eqweibull | Estimate quantiles of a Weibull distribution. | 
| eqzmlnorm | Estimate quantiles of a Zero-Modified Lognormal (Delta) | 
| distribution (log-scale). | |
| eqzmlnormAlt | Estimate quantiles of a Zero-Modified Lognormal (Delta) | 
| distribution (original scale). | |
| eqzmnorm | Estimate quantiles of a Zero-Modified Normal distribution. | 
EnvStats Functions for Goodness-of-Fit Tests
Description
The EnvStats functions listed below are useful for performing goodness-of-fit tests for user-specified probability distributions.
Details
Goodness-of-Fit Tests
| Function Name | Description | 
| gofTest | Perform a goodness-of-fit test for a specified probability distribution. | 
| The resulting object is of class "gof" unless the test is the | |
| two-sample Kolmogorov-Smirnov test, in which case the resulting | |
| object is of class "gofTwoSample". | |
| plot.gof | S3 class method for plotting an object of class "gof". | 
| print.gof | S3 class method for printing an object of class "gof". | 
| plot.gofTwoSample | S3 class method for plotting an object of class "gofTwoSample". | 
| print.gofTwoSample | S3 class method for printing an object of class "gofTwoSample". | 
| gofGroupTest | Perform a goodness-of-fit test to determine whether data in a set of groups | 
| appear to all come from the same probability distribution | |
| (with possibly different parameters for each group). | |
| The resulting object is of class "gofGroup". | |
| plot.gofGroup | S3 class method for plotting an object of class "gofGroup". | 
| print.gofGroup | S3 class method for printing an object of class "gofGroup". | 
Tests for Outliers
| Function Name | Description | 
| rosnerTest | Perform Rosner's test for outliers assuming a normal (Gaussian) distribution. | 
| print.gofOutlier | S3 class method for printing an object of class "gofOutlier". | 
Choose a Distribution
| Function Name | Description | 
| distChoose | Choose best fitting distribution based on goodness-of-fit tests. | 
| print.distChoose | S3 class method for printing an object of class "distChoose". | 
EnvStats Functions for Hypothesis Tests
Description
The EnvStats functions listed below are useful for performing hypothesis tests not already built into R. See Power and Sample Size Calculations for a list of functions you can use to perform power and sample size calculations based on various hypothesis tests.
Details
For goodness-of-fit tests, see Goodness-of-Fit Tests.
| Function Name | Description | 
| chenTTest | Chen's modified one-sided t-test for skewed | 
| distributions. | |
| kendallTrendTest | Nonparametric test for monotonic trend | 
| based on Kendall's tau statistic (and | |
| optional confidence interval for slope). | |
| kendallSeasonalTrendTest | Nonparametric test for monotonic trend | 
| within each season based on Kendall's tau | |
| statistic (and optional confidence interval | |
| for slope). | |
| oneSamplePermutationTest | Fisher's one-sample randomization | 
| (permutation) test for location. | |
| quantileTest | Two-sample rank test to detect a shift in | 
| a proportion of the “treated” population. | |
| quantileTestPValue | Compute p-value associated with a specified | 
| combination of m, n, r, and k | |
| for the quantile test. | |
| Useful for determining r and k for a | |
| given significance level \alpha. | |
| serialCorrelationTest | Test for the presence of serial correlation. | 
| signTest | One- or paired-sample sign test on the | 
| median. | |
| twoSampleLinearRankTest | Two-sample linear rank test to detect a | 
| shift in the “treated” population. | |
| twoSamplePermutationTestLocation | Two-sample or paired-sample randomization | 
| (permutation) test for location. | |
| twoSamplePermutationTestProportion | Randomization (permutation) test to compare | 
| two proportions (Fisher's exact test). | |
| varTest | One-sample test on variance or two-sample | 
| test to compare variances. | |
| varGroupTest | Test for homogeneity of variance among two | 
| or more groups. | |
| zTestGevdShape | Estimate the shape parameter of a | 
| Generalized Extreme Value distribution and | |
| test the null hypothesis that the true | |
| value is equal to 0. | |
EnvStats Functions for Monte Carlo Simulation and Risk Assessment
Description
The EnvStats functions listed below are useful for performing Monte Carlo simulations and risk assessment.
Details
| Function Name | Description | 
| Empirical | Empirical distribution based on a set of observations. | 
| simulateVector | Simulate a vector of random numbers from a specified theoretical | 
| probability distribution or empirical probability distribution | |
| using either Latin hypercube sampling or simple random sampling. | |
| simulateMvMatrix | Simulate a multivariate matrix of random numbers from specified | 
| theoretical probability distributions and/or empirical probability | |
| distributions based on a specified rank correlation matrix, using | |
| either Latin hypercube sampling or simple random sampling. | |
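A minimal sketch of simulateVector (the distribution and parameter values below are arbitrary; the argument names are as documented in the simulateVector help file):
  # Latin hypercube sample of size 10 from a lognormal distribution
  # parameterized by mean and CV.
  simulateVector(10, distribution = "lnormAlt", 
    param.list = list(mean = 10, cv = 1), 
    sample.method = "LHS", seed = 47)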
EnvStats Functions for Plotting Probability Distributions
Description
The EnvStats functions listed below are useful for plotting probability distributions.
Details
| Function Name | Description | 
| cdfCompare | Plot two cumulative distribution functions with the same x-axis | 
| in order to compare them. | |
| cdfPlot | Plot a cumulative distribution function. | 
| ecdfPlot | Plot empirical cumulative distribution function. | 
| epdfPlot | Plot empirical probability density function. | 
| pdfPlot | Plot probability density function. | 
| qqPlot | Produce a quantile-quantile (Q-Q) plot, also called a probability plot. | 
| qqPlotGestalt | Plot several Q-Q plots from the same distribution in order to | 
| develop a Gestalt of Q-Q plots for that distribution. | |
EnvStats Functions for Creating Plots Using the ggplot2 Package
Description
The EnvStats functions listed below are useful for creating plots with the ggplot2 package.
Details
| Function Name | Description | 
| geom_stripchart | Adaptation of the EnvStats function stripChart, | 
| used to create a strip plot using functions from the package | |
| ggplot2. | |
| stat_n_text | Add text indicating the sample size | 
| to a ggplot2 plot. | |
| stat_mean_sd_text | Add text indicating the mean and standard deviation | 
| to a ggplot2 plot. | |
| stat_median_iqr_text | Add text indicating the median and interquartile range | 
| to a ggplot2 plot. | |
| stat_test_text | Add text indicating the results of a hypothesis test | 
| comparing groups to a ggplot2 plot. | |
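A minimal sketch of geom_stripchart (assuming ggplot2 is installed; the built-in iris data set is used only as a stand-in for grouped data):
  library(ggplot2) 
  # Strip plot of petal length by species with the default geom_stripchart annotations.
  ggplot(iris, aes(x = Species, y = Petal.Length, color = Species)) + 
    geom_stripchart() + 
    theme(legend.position = "none")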
EnvStats Functions for Power and Sample Size Calculations
Description
The EnvStats functions listed below are useful for power and sample size calculations.
Details
Confidence Intervals
| Function Name | Description | 
| ciTableProp | Confidence intervals for binomial proportion, or | 
| difference between two proportions, following Bacchetti (2010) | |
| ciBinomHalfWidth | Compute the half-width of a confidence interval for a | 
| Binomial proportion or the difference between two proportions. | |
| ciBinomN | Compute the sample size necessary to achieve a specified | 
| half-width of a confidence interval for a Binomial proportion or | |
| the difference between two proportions. | |
| plotCiBinomDesign | Create plots for a sampling design based on a confidence interval | 
| for a Binomial proportion or the difference between two proportions. | |
| ciTableMean | Confidence intervals for mean of normal distribution, or | 
| difference between two means, following Bacchetti (2010) | |
| ciNormHalfWidth | Compute the half-width of a confidence interval for the mean of a | 
| Normal distribution or the difference between two means. | |
| ciNormN | Compute the sample size necessary to achieve a specified half-width | 
| of a confidence interval for the mean of a Normal distribution or | |
| the difference between two means. | |
| plotCiNormDesign | Create plots for a sampling design based on a confidence interval | 
| for the mean of a Normal distribution or the difference between | |
| two means. | |
| ciNparConfLevel | Compute the confidence level associated with a nonparametric | 
| confidence interval for a percentile. | |
| ciNparN | Compute the sample size necessary to achieve a specified | 
| confidence level for a nonparametric confidence interval for | |
| a percentile. | |
| plotCiNparDesign | Create plots for a sampling design based on a nonparametric | 
| confidence interval for a percentile. | 
Hypothesis Tests
| Function Name | Description | 
| aovN | Compute the sample sizes necessary to achieve a | 
| specified power for a one-way fixed-effects analysis | |
| of variance test. | |
| aovPower | Compute the power of a one-way fixed-effects analysis of | 
| variance test. | |
| plotAovDesign | Create plots for a sampling design based on a one-way | 
| analysis of variance. | |
| propTestN | Compute the sample size necessary to achieve a specified | 
| power for a one- or two-sample proportion test. | |
| propTestPower | Compute the power of a one- or two-sample proportion test. | 
| propTestMdd | Compute the minimal detectable difference associated with | 
| a one- or two-sample proportion test. | |
| plotPropTestDesign | Create plots involving sample size, power, difference, and | 
| significance level for a one- or two-sample proportion test. | |
| tTestAlpha | Compute the Type I Error associated with specified values | 
| for power, sample size(s), and scaled MDD for a one- or | |
| two-sample t-test. | |
| tTestN | Compute the sample size necessary to achieve a specified | 
| power for a one- or two-sample t-test. | |
| tTestPower | Compute the power of a one- or two-sample t-test. | 
| tTestScaledMdd | Compute the scaled minimal detectable difference | 
| associated with a one- or two-sample t-test. | |
| plotTTestDesign | Create plots for a sampling design based on a one- or | 
| two-sample t-test. | |
| tTestLnormAltN | Compute the sample size necessary to achieve a specified | 
| power for a one- or two-sample t-test, assuming lognormal | |
| data. | |
| tTestLnormAltPower | Compute the power of a one- or two-sample t-test, assuming | 
| lognormal data. | |
| tTestLnormAltRatioOfMeans | Compute the minimal or maximal detectable ratio of means | 
| associated with a one- or two-sample t-test, assuming | |
| lognormal data. | |
| plotTTestLnormAltDesign | Create plots for a sampling design based on a one- or | 
| two-sample t-test, assuming lognormal data. | |
| linearTrendTestN | Compute the sample size necessary to achieve a specified | 
| power for a t-test for linear trend. | |
| linearTrendTestPower | Compute the power of a t-test for linear trend. | 
| linearTrendTestScaledMds | Compute the scaled minimal detectable slope for a t-test | 
| for linear trend. | |
| plotLinearTrendTestDesign | Create plots for a sampling design based on a t-test for | 
| linear trend. | |
Prediction Intervals
Normal Distribution Prediction Intervals
| Function Name | Description | 
| predIntNormHalfWidth | Compute the half-width of a prediction | 
| interval for a normal distribution. | |
| predIntNormK | Compute the required value of K for | 
| a prediction interval for a Normal | |
| distribution. | |
| predIntNormN | Compute the sample size necessary to | 
| achieve a specified half-width for a | |
| prediction interval for a Normal | |
| distribution. | |
| plotPredIntNormDesign | Create plots for a sampling design | 
| based on the width of a prediction | |
| interval for a Normal distribution. | |
| predIntNormTestPower | Compute the probability that at least | 
| one future observation (or mean) | |
| falls outside a prediction interval | |
| for a Normal distribution. | |
| plotPredIntNormTestPowerCurve | Create plots for a sampling | 
| design based on a prediction interval | |
| for a Normal distribution. | |
| predIntNormSimultaneousTestPower | Compute the probability that at | 
| least one set of future observations | |
| (or means) violates the given rule | |
| based on a simultaneous prediction | |
| interval for a Normal distribution. | |
| plotPredIntNormSimultaneousTestPowerCurve | Create plots for a sampling design | 
| based on a simultaneous prediction | |
| interval for a Normal distribution. | |
Lognormal Distribution Prediction Intervals
| Function Name | Description | 
| predIntLnormAltTestPower | Compute the probability that at least | 
| one future observation (or geometric | |
| mean) falls outside a prediction | |
| interval for a lognormal distribution. | |
| plotPredIntLnormAltTestPowerCurve | Create plots for a sampling design | 
| based on a prediction interval for a | |
| lognormal distribution. | |
| predIntLnormAltSimultaneousTestPower | Compute the probability that at least | 
| one set of future observations (or | |
| geometric means) violates the given | |
| rule based on a simultaneous | |
| prediction interval for a lognormal | |
| distribution. | |
| plotPredIntLnormAltSimultaneousTestPowerCurve | Create plots for a sampling design | 
| based on a simultaneous prediction | |
| interval for a lognormal distribution. | |
Nonparametric Prediction Intervals
| Function Name | Description | 
| predIntNparConfLevel | Compute the confidence level associated with | 
| a nonparametric prediction interval. | |
| predIntNparN | Compute the required sample size to achieve | 
| a specified confidence level for a | |
| nonparametric prediction interval. | |
| plotPredIntNparDesign | Create plots for a sampling design based on | 
| the confidence level and sample size of a | |
| nonparametric prediction interval. | |
| predIntNparSimultaneousConfLevel | Compute the confidence level associated with | 
| a simultaneous nonparametric prediction | |
| interval. | |
| predIntNparSimultaneousN | Compute the required sample size for a | 
| simultaneous nonparametric prediction | |
| interval. | |
| plotPredIntNparSimultaneousDesign | Create plots for a sampling design based on | 
| a simultaneous nonparametric prediction | |
| interval. | |
| predIntNparSimultaneousTestPower | Compute the probability that at least one | 
| set of future observations violates the | |
| given rule based on a nonparametric | |
| simultaneous prediction interval. | |
| plotPredIntNparSimultaneousTestPowerCurve | Create plots for a sampling design based on | 
| a simultaneous nonparametric prediction | |
| interval. | |
Tolerance Intervals
| Function Name | Description | 
| tolIntNormHalfWidth | Compute the half-width of a tolerance | 
| interval for a normal distribution. | |
| tolIntNormK | Compute the required value of K for a | 
| tolerance interval for a Normal distribution. | |
| tolIntNormN | Compute the sample size necessary to achieve a | 
| specified half-width for a tolerance interval | |
| for a Normal distribution. | |
| plotTolIntNormDesign | Create plots for a sampling design based on a | 
| tolerance interval for a Normal distribution. | |
| tolIntNparConfLevel | Compute the confidence level associated with a | 
| nonparametric tolerance interval for a specified | |
| sample size and coverage. | |
| tolIntNparCoverage | Compute the coverage associated with a | 
| nonparametric tolerance interval for a specified | |
| sample size and confidence level. | |
| tolIntNparN | Compute the sample size required for a nonparametric | 
| tolerance interval with a specified coverage and | |
| confidence level. | |
| plotTolIntNparDesign | Create plots for a sampling design based on a | 
| nonparametric tolerance interval. | |
EnvStats Functions for Prediction Intervals
Description
The EnvStats functions listed below are useful for computing prediction intervals and simultaneous prediction intervals. See Power and Sample Size for a list of functions useful for computing power and sample size for a design based on a prediction interval width, or a design based on a hypothesis test for future observations falling outside of a prediction interval.
Details
| Function Name | Description | 
| predIntGamma, | Prediction interval for the next k | 
| predIntGammaAlt | observations or next set of k means for a | 
| Gamma distribution. | |
| predIntGammaSimultaneous, | Construct a simultaneous prediction interval for the | 
| predIntGammaAltSimultaneous | next r sampling occasions based on a | 
| Gamma distribution. | |
| predIntLnorm, | Prediction interval for the next k | 
| predIntLnormAlt | observations or geometric means from a | 
| Lognormal distribution. | |
| predIntLnormSimultaneous, | Construct a simultaneous prediction interval for the | 
| predIntLnormAltSimultaneous | next r sampling occasions based on a | 
| Lognormal distribution. | |
| predIntNorm | Prediction interval for the next k observations | 
| or means from a Normal (Gaussian) distribution. | |
| predIntNormK | Compute the value of K for a prediction interval | 
| for a Normal distribution. | |
| predIntNormSimultaneous | Construct a simultaneous prediction interval for the | 
| next r sampling occasions based on a | |
| Normal distribution. | |
| predIntNormSimultaneousK | Compute the value of K for a simultaneous | 
| prediction interval for the next r sampling | |
| occasions based on a Normal distribution. | |
| predIntNpar | Nonparametric prediction interval for the next k | 
| of K observations. | |
| predIntNparSimultaneous | Construct a nonparametric simultaneous prediction | 
| interval for the next r sampling occasions. | |
| predIntPois | Prediction interval for the next k observations | 
| or sums from a Poisson distribution. | |
EnvStats Functions for Printing and Plotting Objects of Various S3 Classes
Description
The EnvStats functions listed below are printing and plotting methods for various S3 classes.
Details
Printing Methods
| Function Name | Description | 
| print.boxcox | Print an object that inherits from class "boxcox". | 
| print.boxcoxCensored | Print an object that inherits from class | 
| "boxcoxCensored". | |
| print.boxcoxLm | Print an object that inherits from class "boxcoxLm". | 
| print.estimate | Print an object that inherits from class "estimate". | 
| print.estimateCensored | Print an object that inherits from class | 
| "estimateCensored". | |
| print.gof | Print an object that inherits from class "gof". | 
| print.gofCensored | Print an object that inherits from class "gofCensored". | 
| print.gofGroup | Print an object that inherits from class "gofGroup". | 
| print.gofTwoSample | Print an object that inherits from class | 
| "gofTwoSample". | |
| print.htest | Print an object that inherits from class "htest". | 
| print.htestCensored | Print an object that inherits from class | 
| "htestCensored". | |
| print.permutationTest | Print an object that inherits from class | 
| "permutationTest". | |
| print.summaryStats | Print an object that inherits from class | 
| "summaryStats". | |
Plotting Methods
| Function Name | Description | 
| plot.boxcox | Plot an object that inherits from class "boxcox". | 
| plot.boxcoxCensored | Plot an object that inherits from class "boxcoxCensored". | 
| plot.boxcoxLm | Plot an object that inherits from class "boxcoxLm". | 
| plot.gof | Plot an object that inherits from class "gof". | 
| plot.gofCensored | Plot an object that inherits from class "gofCensored". | 
| plot.gofGroup | Plot an object that inherits from class "gofGroup". | 
| plot.gofTwoSample | Plot an object that inherits from class "gofTwoSample". | 
| plot.permutationTest | Plot an object that inherits from class "permutationTest". | 
EnvStats Probability Distributions and Random Numbers
Description
Listed below are all of the probability distributions available in R and EnvStats. Distributions with a description in bold are new ones that are part of EnvStats. For each distribution, there are functions for generating: values for the probability density function, values for the cumulative distribution function, quantiles, and random numbers.
The data frame Distribution.df contains information about 
all of these probability distributions.
Details
| Distribution Abbreviation | Description | 
| beta | Beta distribution. | 
| binom | Binomial distribution. | 
| cauchy | Cauchy distribution. | 
| chi | Chi distribution. | 
| chisq | Chi-squared distribution. | 
| exp | Exponential distribution. | 
| evd | Extreme value distribution. | 
| f | F-distribution. | 
| gamma | Gamma distribution. | 
| gammaAlt | Gamma distribution parameterized with mean and CV. | 
| gevd | Generalized extreme value distribution. | 
| geom | Geometric distribution. | 
| hyper | Hypergeometric distribution. | 
| logis | Logistic distribution. | 
| lnorm | Lognormal distribution. | 
| lnormAlt | Lognormal distribution parameterized with mean and CV. | 
| lnormMix | Mixture of two lognormal distributions. | 
| lnormMixAlt | Mixture of two lognormal distributions | 
| parameterized by their means and CVs. | |
| lnorm3 | Three-parameter lognormal distribution. | 
| lnormTrunc | Truncated lognormal distribution. | 
| lnormTruncAlt | Truncated lognormal distribution | 
| parameterized by mean and CV. | |
| nbinom | Negative binomial distribution. | 
| norm | Normal distribution. | 
| normMix | Mixture of two normal distributions. | 
| normTrunc | Truncated normal distribution. | 
| pareto | Pareto distribution. | 
| pois | Poisson distribution. | 
| t | Student's t-distribution. | 
| tri | Triangular distribution. | 
| unif | Uniform distribution. | 
| weibull | Weibull distribution. | 
| wilcox | Wilcoxon rank sum distribution. | 
| zmlnorm | Zero-modified lognormal (delta) distribution. | 
| zmlnormAlt | Zero-modified lognormal (delta) distribution | 
| parameterized with mean and CV. | |
| zmnorm | Zero-modified normal distribution. | 
In addition, the functions evNormOrdStats and 
evNormOrdStatsScalar compute expected values of order statistics 
from a standard normal distribution.
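For each abbreviation in the table above there are four functions following the usual R d/p/q/r naming convention. For example, for the truncated normal distribution (abbreviation normTrunc), a minimal sketch with arbitrary parameter values:
  # Density, cdf, quantile, and random generation for a normal distribution
  # truncated to the interval [0, 2].
  dnormTrunc(0.5, mean = 0, sd = 1, min = 0, max = 2) 
  pnormTrunc(0.5, mean = 0, sd = 1, min = 0, max = 2) 
  qnormTrunc(0.75, mean = 0, sd = 1, min = 0, max = 2) 
  rnormTrunc(3, mean = 0, sd = 1, min = 0, max = 2)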
EnvStats Functions for Summary Statistics and Plots
Description
The EnvStats functions listed below create summary statistics and plots.
Details
Summary Statistics 
R comes with several functions for computing summary statistics, including 
mean, var, median, range, 
quantile, and summary.  The following functions in 
EnvStats complement these R functions. 
| Function Name | Description | 
| cv | Coefficient of variation | 
| geoMean | Geometric mean | 
| geoSD | Geometric standard deviation | 
| iqr | Interquartile range | 
| kurtosis | Kurtosis | 
| lMoment | L-moments | 
| pwMoment | Probability-weighted moments | 
| skewness | Skew | 
| summaryFull | Extensive summary statistics | 
| summaryStats | Summary statistics | 
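For example (a minimal sketch with simulated lognormal data):
  # A few of the summary statistics listed above, applied to simulated data.
  set.seed(23) 
  x <- rlnorm(30, meanlog = 1, sdlog = 0.5) 
  cv(x) 
  geoMean(x) 
  geoSD(x) 
  summaryStats(x)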
Summary Plots 
R comes with several functions for creating plots to summarize data, including 
hist, barplot, boxplot, 
dotchart, stripchart, and numerous others.  
The help file Plotting Probability Distributions lists several EnvStats functions useful for producing summary plots as well.
In addition, the EnvStats function stripChart is a modification 
of stripchart that allows you to include summary statistics 
on the plot itself.
Finally, the help file Plotting Using ggplot2 lists 
several EnvStats functions for adding information to plots produced with the 
ggplot function, including the function geom_stripchart, 
which is an adaptation of the EnvStats function stripChart.  
EnvStats Functions for Tolerance Intervals
Description
The EnvStats functions listed below are useful for computing tolerance intervals. See Power and Sample Size for a list of functions useful for computing power and sample size for a design based on a tolerance interval width.
Details
| Function Name | Description | 
| tolIntGamma, | Tolerance interval for a Gamma distribution. | 
| tolIntGammaAlt | |
| tolIntLnorm, | Tolerance interval for a lognormal distribution. | 
| tolIntLnormAlt | |
| tolIntNorm | Tolerance interval for a Normal (Gaussian) distribution. | 
| tolIntNormK | Compute the constant K for a Normal (Gaussian) | 
| tolerance interval. | |
| tolIntNpar | Nonparametric tolerance interval. | 
| tolIntPois | Tolerance interval for a Poisson distribution. | 
EnvStats Functions for Trend Analysis
Description
See Hypothesis Tests.
The Generalized Extreme Value Distribution
Description
Density, distribution function, quantile function, and random generation for the generalized extreme value distribution.
Usage
  dgevd(x, location = 0, scale = 1, shape = 0)
  pgevd(q, location = 0, scale = 1, shape = 0)
  qgevd(p, location = 0, scale = 1, shape = 0)
  rgevd(n, location = 0, scale = 1, shape = 0)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If length(n) is larger than 1, then length(n) random values are returned. | 
| location | vector of location parameters. | 
| scale | vector of positive scale parameters. | 
| shape | vector of shape parameters. | 
Details
Let X be a generalized extreme value random variable with parameters 
location=\eta, scale=\theta, and shape=\kappa.  
When the shape parameter \kappa = 0, the generalized extreme value distribution 
reduces to the extreme value distribution.  When the shape parameter 
\kappa \ne 0, the cumulative distribution function of X is given by:
F(x; \eta, \theta, \kappa) = exp\{-[1 - \kappa(x-\eta)/\theta]^{1/\kappa}\}
where -\infty < \eta, \kappa < \infty and \theta > 0.  
When \kappa > 0, the range of x is:
-\infty < x \le \eta + \theta/\kappa
and when \kappa < 0 the range of x is:
\eta + \theta/\kappa \le x < \infty
The p^th quantile of X is given by:
x_{p} = \eta + \frac{\theta \{1 - [-log(p)]^{\kappa}\}}{\kappa}
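As a quick check of this formula (a minimal sketch using the same values as the qgevd example below):
  # Closed-form quantile for kappa != 0, with p = 0.9, eta = -2, theta = 0.5,
  # kappa = -0.25; this should agree with qgevd(0.9, -2, 0.5, -0.25).
  p <- 0.9; eta <- -2; theta <- 0.5; kappa <- -0.25 
  eta + theta * (1 - (-log(p))^kappa) / kappa 
  #[1] -0.4895683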
Value
density (dgevd), probability (pgevd), quantile (qgevd), or 
random sample (rgevd) for the generalized extreme value distribution with 
location parameter(s) determined by location, scale parameter(s) 
determined by scale, and shape parameter(s) determined by shape.
Note
Two-parameter extreme value distributions (EVD) have been applied extensively since the 1930s to several fields of study, including the distributions of hydrological and meteorological variables, human lifetimes, and strength of materials. The three-parameter generalized extreme value distribution (GEVD) was introduced by Jenkinson (1955) to model annual maximum and minimum values of meteorological events. Since then, it has been used extensively in the hydrological and meteorological fields.
The three families of EVDs are all special kinds of GEVDs.  When the shape 
parameter \kappa = 0, the GEVD reduces to the 
Type I extreme value (Gumbel) distribution.  (The function 
zTestGevdShape allows you to test the null hypothesis that the shape 
parameter is equal to 0.)  When \kappa > 0, the GEVD is the same as the Type II 
extreme value distribution, and when \kappa < 0 it is the same as the 
Type III extreme value distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Jenkinson, A.F. (1955). The Frequency Distribution of the Annual Maximum (or Minimum) of Meteorological Events. Quarterly Journal of the Royal Meteorological Society, 81, 158–171.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
egevd, zTestGevdShape, EVD, 
Probability Distributions and Random Numbers.
Examples
  # Density of a generalized extreme value distribution with 
  # location=0, scale=1, and shape=0, evaluated at 0.5: 
  dgevd(.5) 
  #[1] 0.3307043
  #----------
  # The cdf of a generalized extreme value distribution with 
  # location=1, scale=2, and shape=0.25, evaluated at 0.5: 
  pgevd(.5, 1, 2, 0.25) 
  #[1] 0.2795905
  #----------
  # The 90'th percentile of a generalized extreme value distribution with 
  # location=-2, scale=0.5, and shape=-0.25: 
  qgevd(.9, -2, 0.5, -0.25) 
  #[1] -0.4895683
  #----------
  # Random sample of 4 observations from a generalized extreme value 
  # distribution with location=5, scale=2, and shape=1. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rgevd(4, 5, 2, 1) 
  #[1] 6.738692 6.473457 4.446649 5.727085
The Gamma Distribution (Alternative Parameterization)
Description
Density, distribution function, quantile function, and random generation 
for the gamma distribution with parameters mean and cv.
Usage
  dgammaAlt(x, mean, cv = 1, log = FALSE)
  pgammaAlt(q, mean, cv = 1, lower.tail = TRUE, log.p = FALSE)
  qgammaAlt(p, mean, cv = 1, lower.tail = TRUE, log.p = FALSE)
  rgammaAlt(n, mean, cv = 1)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If length(n) is larger than 1, then length(n) random values are returned. | 
| mean | vector of (positive) means of the distribution of the random variable. | 
| cv | vector of (positive) coefficients of variation of the random variable. | 
| log, log.p | logical; if TRUE, probabilities/densities p are returned as log(p). | 
| lower.tail | logical; if TRUE (the default), probabilities are P[X \le x]; otherwise, P[X > x]. | 
Details
Let X be a random variable with a gamma distribution with parameters 
shape=\alpha and scale=\beta.  The relationship 
between these parameters and the mean (mean=\mu) and coefficient 
of variation (cv=\tau) of this distribution is given by:
\alpha = \tau^{-2} \;\;\;\;\;\; (1)
\beta = \mu/\alpha \;\;\;\;\;\; (2)
\mu = \alpha\beta \;\;\;\;\;\; (3)
\tau = \alpha^{-1/2} \;\;\;\;\;\; (4)
Thus, the functions dgammaAlt, pgammaAlt, qgammaAlt, and 
rgammaAlt call the R functions dgamma, 
pgamma, qgamma, and rgamma, 
respectively, using the values for the shape and scale parameters 
given by:  shape <- cv^-2, scale <- mean/shape.
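A minimal sketch of this reparameterization (the values below are arbitrary):
  # shape = cv^(-2) and scale = mean/shape, so dgammaAlt and dgamma should agree.
  mean.val <- 10; cv.val <- 2 
  shape <- cv.val^(-2)        # 0.25 
  scale <- mean.val / shape   # 40 
  dgammaAlt(7, mean = mean.val, cv = cv.val) 
  dgamma(7, shape = shape, scale = scale)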
Value
dgammaAlt gives the density, pgammaAlt gives the distribution function, 
qgammaAlt gives the quantile function, and rgammaAlt generates random 
deviates. 
Invalid arguments will result in return value NaN, with a warning. 
Note
The gamma distribution takes values on the positive real line.  Special cases of 
the gamma are the exponential distribution and the 
chi-square distribution.  Applications of the gamma include 
life testing, statistical ecology, queuing theory, inventory control and 
precipitation processes.  A gamma distribution starts to resemble a normal 
distribution as the shape parameter \alpha tends to infinity or 
the cv parameter \tau tends to 0.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) discourage using the assumption of a lognormal distribution for some types of environmental data and recommend instead assessing whether the data appear to fit a gamma distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
See Also
GammaDist, egammaAlt,  
Probability Distributions and Random Numbers.
Examples
  # Density of a gamma distribution with parameters mean=10 and cv=2, 
  # evaluated at 7:
  dgammaAlt(7, mean = 10, cv = 2) 
  #[1] 0.02139335
  #----------
  # The cdf of a gamma distribution with parameters mean=10 and cv=2, 
  # evaluated at 12:
  pgammaAlt(12, mean = 10, cv = 2) 
  #[1] 0.7713307
  #----------
  # The 25'th percentile of a gamma distribution with parameters 
  # mean=10 and cv=2:
  qgammaAlt(0.25, mean = 10, cv = 2) 
  #[1] 0.1056871
  #----------
  # A random sample of 4 numbers from a gamma distribution with 
  # parameters mean=10 and cv=2. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(10) 
  rgammaAlt(4, mean = 10, cv = 2) 
  #[1] 3.772004230 1.889028078 0.002987823 8.179824976
Alkalinity Data from Gibbons et al. (2009)
Description
Alkalinity concentrations (mg/L) in groundwater.
Usage
data(Gibbons.et.al.09.Alkilinity.vec)
Format
A numeric vector with 27 elements.
Source
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley & Sons, Hoboken. Table 5.5, p. 107.
Vinyl Chloride Data from Gibbons et al. (2009)
Description
Vinyl chloride concentrations (\mug/L) in groundwater from upgradient 
background monitoring wells.
Usage
data(Gibbons.et.al.09.Vinyl.Chloride.vec)
Format
A numeric vector with 34 elements.
Source
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley & Sons, Hoboken. Table 4.3, p. 87.
Ethylene Thiourea Dose-Response Data
Description
These data are the results of an experiment in which different groups of rats were exposed to different concentration levels of ethylene thiourea (ETU), which is a decomposition product of a certain class of fungicides that can be found in treated foods (Graham et al., 1975; Rodricks, 1992, p.133). In this experiment, the outcome of concern was the number of rats that developed thyroid tumors.
Usage
Graham.et.al.75.etu.df
Format
A data frame with 6 observations on the following 4 variables.
- dose
- a numeric vector of dose (ppm/day) of ETU. 
- tumors
- a numeric vector indicating number of rats that developed thyroid tumors. 
- n
- a numeric vector indicating the number of rats in the dose group. 
- proportion
- a numeric vector indicating proportion of rats that developed thyroid tumors. 
Source
Graham, S.L., K.J. Davis, W.H. Hansen, and C.H. Graham. (1975). Effects of Prolonged Ethylene Thiourea Ingestion on the Thyroid of the Rat. Food and Cosmetics Toxicology, 13(5), 493–499.
References
Rodricks, J.V. (1992). Calculated Risks: The Toxicity and Human Health Risks of Chemicals in Our Environment. Cambridge University Press, New York, p.133.
Adjusted Alpha Levels to Compute Confidence Intervals for the Mean of a Gamma Distribution
Description
Adjusted alpha levels to compute confidence intervals for the mean of a gamma distribution, as presented in Table 2 of Grice and Bain (1980).
Usage
data("Grice.Bain.80.mat")Format
A matrix of dimensions 5 by 7, with the first dimension indicating the sample size (between 5 and Inf), and the second dimension indicating the assumed significance level associated with the confidence interval (between 0.005 and 0.25). The assumed confidence level is 1 - assumed significance level.
Details
See Grice and Bain (1980) and the help file for egamma 
for more information.  The data in this matrix are used when 
the function egamma is called with ci.method="chisq.adj".
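A minimal sketch of where this matrix comes into play (the data below are simulated and arbitrary):
  # The adjusted alpha levels are used internally when egamma is called with
  # ci.method = "chisq.adj", as noted above.
  set.seed(250) 
  x <- rgamma(10, shape = 2, scale = 3) 
  egamma(x, ci = TRUE, ci.method = "chisq.adj")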
Source
Grice, J.V., and L.J. Bain. (1980). Inferences Concerning the Mean of the Gamma Distribution. Journal of the American Statistical Association 75, 929-933.
References
Grice, J.V., and L.J. Bain. (1980). Inferences Concerning the Mean of the Gamma Distribution. Journal of the American Statistical Association 75, 929-933.
USEPA. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Examples
  # Look at Grice.Bain.80.mat
  Grice.Bain.80.mat
  #         alpha.eq.005 alpha.eq.01 alpha.eq.025 alpha.eq.05 alpha.eq.075
  #n.eq.5         0.0000      0.0000       0.0010      0.0086       0.0234
  #n.eq.10        0.0003      0.0015       0.0086      0.0267       0.0486
  #n.eq.20        0.0017      0.0046       0.0159      0.0380       0.0619
  #n.eq.40        0.0030      0.0070       0.0203      0.0440       0.0685
  #n.eq.Inf       0.0050      0.0100       0.0250      0.0500       0.0750
  #         alpha.eq.10 alpha.eq.25
  #n.eq.5        0.0432      0.2038
  #n.eq.10       0.0724      0.2294
  #n.eq.20       0.0866      0.2403
  #n.eq.40       0.0934      0.2453
  #n.eq.Inf      0.1000      0.2500
Example of Multiply Left-censored Data from Literature
Description
Made-up multiply left-censored data.  There are 9 observations out of a total of 18 
that are reported as <DL, where DL denotes a detection limit.  There are 
2 distinct detection limits.
Usage
Helsel.Cohn.88.app.b.df
Format
A data frame with 18 observations on the following 3 variables.
- Conc.orig
- a character vector of original observations 
- Conc
- a numeric vector of observations with censored values coded to censoring levels 
- Censored
- a logical vector indicating which values are censored 
Source
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004, Appendix B.
Silver Concentrations From An Interlab Comparison
Description
Silver concentrations (mg/L) from an interlab comparison. There are 34 observations out of a total of 56 that are reported as <DL, where DL denotes a detection limit. There are 12 distinct detection limits.
Usage
Helsel.Cohn.88.silver.df
Format
A data frame with 56 observations on the following 4 variables.
- Ag.orig
- a character vector of original silver concentrations (mg/L) 
- Ag
- a numeric vector with nondetects coded to the detection limit 
- Censored
- a logical vector indicating which observations are censored 
- log.Ag
- the natural logarithm of Ag 
Source
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
References
Janzer, V.J. (1986). Report of the U.S. Geological Survey's Analytical Evaluation Program–Standard Reference Water Samples M6, M94, T95, N16, P8, and SED3. Technical Report, Branch of Quality Assurance, U.S. Geological Survey, Arvada, CO.
Paired Counts of Mayfly Nymphs Above and Below Industrial Outfalls
Description
Counts of mayfly nymphs at low flow in 12 small streams. In each stream, counts were recorded above and below industrial outfalls.
Usage
data(Helsel.Hirsch.02.Mayfly.df)
Format
A data frame with 24 observations on the following 3 variables.
- Mayfly.Count
- Number of mayfly nymphs counted 
- Stream
- a factor indicating the stream number 
- Location
- a factor indicating the location of the count (above vs. below) 
Source
Helsel, D.R., and R.M. Hirsch. (2002). Statistical Methods in Water Resources Research. Techniques of Water Resources Investigations, Book 4, Chapter A3. U.S. Geological Survey, 139–140. https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf.
Abstract: Hosking et al. (1985)
Description
Detailed abstract of the manuscript: 
 
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985).  Estimation of the 
Generalized Extreme-Value Distribution by the Method of Probability-Weighted 
Moments.  Technometrics 27(3), 251–261.
Details
Abstract 
Hosking et al. (1985) use the method of probability-weighted moments, 
introduced by Greenwood et al. (1979), to estimate the parameters of the 
generalized extreme value distribution (GEVD) with parameters 
location=\eta, scale=\theta, and 
shape=\kappa.  Hosking et al. (1985) derive the asymptotic 
distributions of the probability-weighted moment estimators (PWME), and compare 
the asymptotic and small-sample statistical properties (via computer simulation) 
of the PWME with maximum likelihood estimators (MLE) and Jenkinson's (1969) 
method of sextiles estimators (JSE).  They also compare the statistical 
properties of quantile estimators (which are based on the distribution parameter 
estimators).  Finally, they derive a test of the null hypothesis that the 
shape parameter is zero, and assess its performance via computer simulation.
Hosking et al. (1985) note that when \kappa \le -1, the moments and 
probability-weighted moments of the GEVD do not exist.  They also note that in 
practice the shape parameter usually lies between -1/2 and 1/2.
Hosking et al. (1985) found that the asymptotic efficiency of the PWME 
(the limit as the sample size approaches infinity of the ratio of the 
variance of the MLE divided by the variance of the PWME) tends to 0 as the 
shape parameter approaches 1/2 or -1/2.  For values of \kappa within the 
range [-0.2, 0.2], however, the efficiency of the estimator of location 
is close to 100%, and the efficiencies of the estimators of scale and shape 
are greater than 70%.  
Hosking et al. (1985) found that the asymptotic efficiency of the PWME is poor for 
\kappa outside the range [-0.2, 0.2].
For the small sample results, Hosking et al. (1985) considered several possible 
forms of the PWME (see equations (9)-(12) below).  The best overall results were 
given by the plotting-position PWME defined by equations (10) and (12) with 
a=0.35 and b=0.
Small sample results for estimating the parameters show that for n \ge 50 
all three methods give almost identical results.  For n < 50 the results 
for the different estimators are a bit different, but not dramatically so.  The 
MLE tends to be slightly less biased than the other two methods.  For estimating 
the shape parameter, the MLE has a slightly larger standard deviation, and the 
PWME has consistently the smallest standard deviation.
Small sample results for estimating large quantiles show that for n \ge 100 
all three methods are comparable.  For n < 100 the PWME and JSE are 
comparable and in general have much smaller standard deviations than the MLE.  
All three methods are very inaccurate for estimating large quantiles in small 
samples, especially when \kappa < 0.
Hosking et al. (1985) derive a test of the null hypothesis H_0: \kappa=0 
based on the PWME of \kappa. The test is performed by computing the 
statistic:
z = \frac{\hat{\kappa}_{pwme}}{\sqrt{0.5663/n}} \;\;\;\; (1)
and comparing z to a standard normal distribution (see 
zTestGevdShape).  Based on computer simulations using the 
plotting-position PWME, they found that a sample size of n \ge 25 ensures 
an adequate normal approximation.  They also found this test has power comparable 
to the modified likelihood-ratio test, which was found by Hosking (1984) to be 
the best overall test of H_0: \kappa=0 of the thirteen tests he considered.
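Equation (1) is simple enough to compute by hand; the following minimal sketch uses made-up values for the estimated shape parameter and the sample size and is not a substitute for zTestGevdShape:
  # Test statistic from equation (1); kappa.hat and n are illustrative values
  kappa.hat <- 0.15
  n <- 40
  z <- kappa.hat / sqrt(0.5663 / n)
  2 * pnorm(-abs(z))    # two-sided p-value for H0: kappa = 0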
More Details
Probability-Weighted Moments and Parameters of the GEVD 
The definition of a probability-weighted moment, introduced by 
Greenwood et al. (1979), is as follows.  Let X denote a random variable 
with cdf F, and let x(p) denote the p'th quantile of the 
distribution.  Then the ijk'th probability-weighted moment is given by:
M(i, j, k) = E[X^i F^j (1 - F)^k] = \int^1_0 [x(F)]^i F^j (1 - F)^k \, dF \;\;\;\; (2)
where i, j, and k are real numbers.
Hosking et al. (1985) set
\beta_j = M(i, j, 0) \;\;\;\; (3)
and Greenwood et al. (1979) show that
\beta_j = \frac{1}{j+1} E[X_{j+1:j+1}] \;\;\;\; (4)
where
E[X_{j+1:j+1}]
denotes the expected value of the j+1'th order statistic (i.e., the maximum) 
in a sample of size j+1.  Hosking et al. (1985) show that if X has a 
GEVD with parameters location=\eta, scale=\theta, and 
shape=\kappa, where \kappa \ne 0, then
\beta_j = \frac{1}{j+1} \{\eta + \frac{\theta [1 - (j+1)^{-\kappa} \Gamma(1+\kappa)]}{\kappa} \} \;\;\;\; (5)
for \kappa > -1, where \Gamma() denotes the 
gamma function.  Thus,
\beta_0 = \eta + \frac{\theta [1 - \Gamma(1+\kappa)]}{\kappa} \;\;\;\; (6)
2\beta_1 - \beta_0 = \frac{\theta [\Gamma(1+\kappa)] (1 - 2^{-\kappa})}{\kappa} \;\;\;\; (7)
\frac{3\beta_2 - \beta_0}{2\beta_1 - \beta_0} = \frac{1 - 3^{-\kappa}}{1 - 2^{-\kappa}} \;\;\;\; (8)
Estimating Distribution Parameters 
Using the results of Landwehr et al. (1979), Hosking et al. (1985) show that 
given a random sample of n values from some arbitrary distribution, an 
unbiased, distribution-free, and parameter-free estimator of the 
probability-weighted moment \beta_j = M(i, j, 0) defined above is given by:
b_j = \frac{1}{n} \sum^n_{i=j+1} x_{i,n} \frac{{i-1 \choose j}}{{n-1 \choose j}} \;\;\;\; (9)
where the quantity x_{i,n} denotes the i'th order statistic in the 
random sample of size n.  Hosking et al. (1985) note that this estimator is 
closely related to U-statistics (Hoeffding, 1948; Lehmann, 1975, pp. 362-371). 
An alternative “plotting position” estimator is given by:
\hat{\beta}_j[p_{i,n}] = \frac{1}{n} \sum^n_{i=1} p^j_{i,n} x_{i,n} \;\;\;\; (10)
where
p_{i,n} = \hat{F}(x_{i,n}) \;\;\;\; (11)
denotes the plotting position of the i'th order statistic in the random 
sample of size n, that is, a distribution-free estimate of the cdf of 
X evaluated at the i'th order statistic.  Typically, plotting 
positions have the form:
p_{i,n} = \frac{i-a}{n+b} \;\;\;\; (12)
where b > -a > -1.  For this form of plotting position, the 
plotting-position estimators in (10) are asymptotically equivalent to the 
U-statistic estimators in (9).
Although the unbiased and plotting position estimators are asymptotically equivalent (Hosking, 1990), Hosking and Wallis (1995) recommend using the unbiased estimator for almost all applications because of its superior performance in small and moderate samples.
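For concreteness, the two estimators in equations (9) and (10) can be written directly in R.  This is a minimal sketch (the function names are illustrative and x is any numeric sample), not the implementation used by EnvStats:
  # Unbiased estimator b_j of beta_j from equation (9)
  pwm.unbiased <- function(x, j) {
    x <- sort(x)
    n <- length(x)
    i <- (j + 1):n
    sum(x[i] * choose(i - 1, j) / choose(n - 1, j)) / n
  }

  # Plotting-position estimator from equations (10)-(12), with a = 0.35, b = 0
  pwm.pp <- function(x, j, a = 0.35, b = 0) {
    x <- sort(x)
    n <- length(x)
    p <- ((1:n) - a) / (n + b)
    mean(p^j * x)
  }

  # For j = 0 both estimators reduce to the sample mean
  set.seed(23)
  x <- rnorm(30)
  c(pwm.unbiased(x, 0), pwm.pp(x, 0), mean(x))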
Using equations (6)-(8) above, i.e., the three equations involving 
\beta_0, \beta_1, and \beta_2, Hosking et al. (1985) define 
the probability-weighted moment estimators of 
\eta, \theta, and \kappa as the solutions to these three 
simultaneous equations, with the values of the probability-weighted moments 
replaced by their estimated values (using either the unbiased or plotting-position 
estimators in (9) and (10) above).  Hosking et al. (1985) note that the third 
equation (equation (8)) must be solved iteratively for the PWME of \kappa.  
Using the unbiased estimators of the PWMEs to solve for \kappa, the PWMEs 
of \eta and \theta are given by:
\hat{\eta}_{pwme} = b_0 + \frac{\hat{\theta}_{pwme} [\Gamma(1 + \hat{\kappa}_{pwme}) - 1]}{\hat{\kappa}_{pwme}} \;\;\;\; (13)
\hat{\theta}_{pwme} = \frac{(2b_1 - b_0)\hat{\kappa}_{pwme}}{\Gamma(1 + \hat{\kappa}_{pwme}) (1 - 2^{-\hat{\kappa}_{pwme}})} \;\;\;\; (14)
Hosking et al. (1985) show that when the unbiased estimates of the PWMEs are used 
to estimate the probability-weighted moments, the estimates of \theta and 
\kappa satisfy the feasibility criteria
\hat{\theta}_{pwme} > 0; \, \hat{\kappa}_{pwme} > -1
almost surely.
Hosking et al. (1985) show that the asymptotic distribution of the PWME is 
multivariate normal with mean equal to (\eta, \theta, \kappa), and they 
derive the formula for the asymptotic variance-covariance matrix as:
V_{\hat{\eta}, \hat{\theta}, \hat{\kappa}} = \frac{1}{n} G V_{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2} G^T \;\;\;\; (15)
where
V_{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2}
denotes the variance-covariance matrix of the estimators of the probability-weighted 
moments defined in either equation (9) or (10) above (recall that these two 
estimators are asymptotically equivalent), and the matrix G is defined by:
G_{i1} = \frac{\partial \eta}{\partial \beta_{i-1}}, \, G_{i2} = \frac{\partial \theta}{\partial \beta_{i-1}}, \, G_{i3} = \frac{\partial \kappa}{\partial \beta_{i-1}} \;\;\;\; (16)
for i = 1, 2, 3.  Hosking et al. (1985) provide formulas for the matrix
V_{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2}
in Appendix C of their manuscript.  Note that there is a typographical error in 
equation (C.11) (Jon Hosking, personal communication, 1996).  In the second line 
of this equation, the quantity -(r+s)^{-k} should be replaced with 
-(r+s)^{-2k}.
The matrix G in equation (16) is not easily computed.  Its inverse, however, 
is easy to compute and then can be inverted numerically (Jon Hosking, 1996, 
personal communication). The inverse of G is given by:
G^{-1}_{i1} = \frac{\partial \beta_{i-1}}{\partial \eta}, \, G^{-1}_{i2} = \frac{\partial \beta_{i-1}}{\partial \theta}, \, G^{-1}_{i3} = \frac{\partial \beta_{i-1}}{\partial \kappa} \;\;\;\; (17)
and by equation (5) above it can be shown that:
\frac{\partial \beta_j}{\partial \eta} = \frac{1}{j+1} \;\;\;\; (18)
\frac{\partial \beta_j}{\partial \theta} =\frac{1 - (j+1)^{-\kappa}\Gamma(1+\kappa)}{(j+1)\kappa} \;\;\;\; (19)
\frac{\partial \beta_j}{\partial \kappa} = \frac{\theta}{j+1} \{ \frac{(j+1)^{-\kappa}[log(j+1)\Gamma(1+\kappa)-\Gamma^{'}(1+\kappa)]}{\kappa} - \frac{1 - (j+1)^{-\kappa}\Gamma(1+\kappa)}{\kappa^2} \} \;\;\;\; (20)
for i = 1, 2, 3.
Estimating Distribution Quantiles 
If X has a GEVD with parameters location=\eta, 
scale=\theta, and shape=\kappa, where \kappa \ne 0, 
then the p'th quantile of the distribution is given by:
x(p) = \eta + \frac{\theta \{1 - [-log(p)]^{\kappa} \}}{\kappa} \;\;\;\; (21)
(0 \le p \le 1).  Given estimated values of the location, scale, and shape 
parameters, the p'th quantile of the distribution is estimated as:
\hat{x}(p) = \hat{\eta} + \frac{\hat{\theta} \{1 - [-log(p)]^{\hat{\kappa}} \}}{\hat{\kappa}} \;\;\;\; (22)
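Equations (21) and (22) translate directly into code; a minimal sketch for \kappa \ne 0 (the function name is illustrative and this is not the EnvStats quantile function):
  # p'th quantile of the GEVD from equation (21), valid for kappa != 0
  qgevd.sketch <- function(p, eta, theta, kappa) {
    eta + theta * (1 - (-log(p))^kappa) / kappa
  }
  # e.g., the 0.99 quantile of a GEVD with eta = 0, theta = 1, kappa = 0.1
  qgevd.sketch(0.99, eta = 0, theta = 1, kappa = 0.1)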
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal Distribution. Annals of Mathematical Statistics 19, 293–325.
Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.
Hosking, J.R.M. (1990).  L-Moments:  Analysis and Estimation of 
Distributions Using Linear Combinations of Order Statistics.  Journal of 
the Royal Statistical Society, Series B 52(1), 105–124.
Hosking, J.R.M., and J.R. Wallis (1995).  A Comparison of Unbiased and 
Plotting-Position Estimators of L Moments.  Water Resources 
Research 31(8), 2019–2025.
Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Office, Geneva.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, pp.4-8.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, 457pp.
See Also
Generalized Extreme Value Distribution, egevd.
Fecal Coliform Data from the Illinois River
Description
Lin and Evans (1980) reported fecal coliform measures (organisms per 100 ml) from the 
Illinois River taken between 1971 and 1976.  The object Lin.Evans.80.df is a 
small subset of these data that were reported by Helsel and Hirsch (1992, p.162).
Usage
Lin.Evans.80.df
Format
A data frame with 24 observations on the following 2 variables.
- Fecal.Coliform
- a numeric vector of fecal coliform measure (organisms per 100 ml). 
- Season
- an ordered factor indicating the season of collection 
Source
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, p.162.
References
Lin, S.D., and R.L. Evans. (1980). Coliforms and fecal streptococcus in the Illinois River at Peoria, 1971-1976. Illinois State Water Survey Report of Investigations No. 93. Urbana, IL, 28pp.
The Three-Parameter Lognormal Distribution
Description
Density, distribution function, quantile function, and random generation 
for the three-parameter lognormal distribution with parameters meanlog, 
sdlog, and threshold.
Usage
  dlnorm3(x, meanlog = 0, sdlog = 1, threshold = 0)
  plnorm3(q, meanlog = 0, sdlog = 1, threshold = 0)
  qlnorm3(p, meanlog = 0, sdlog = 1, threshold = 0)
  rlnorm3(n, meanlog = 0, sdlog = 1, threshold = 0)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| meanlog | vector of means of the distribution of the random variable on the log scale.  
The default is  | 
| sdlog | vector of (positive) standard deviations of the random variable on the log scale.  
The default is  | 
| threshold | vector of thresholds of the random variable on the log scale.  The default 
is  | 
Details
The three-parameter lognormal distribution is simply the usual two-parameter lognormal distribution with a location shift.
Let X be a random variable with a three-parameter lognormal distribution 
with parameters meanlog=\mu, sdlog=\sigma, and 
threshold=\gamma.  Then the random variable Y = X - \gamma 
has a lognormal distribution with parameters 
meanlog=\mu and sdlog=\sigma.  Thus, 
-   dlnorm3 calls dlnorm using the arguments x = x - threshold, meanlog = meanlog, sdlog = sdlog
-  plnorm3 calls plnorm using the arguments q = q - threshold, meanlog = meanlog, sdlog = sdlog
-  qlnorm3 calls qlnorm using the arguments p = p, meanlog = meanlog, sdlog = sdlog and then adds threshold to the result.
-  rlnorm3 calls rlnorm using the arguments n = n, meanlog = meanlog, sdlog = sdlog and then adds threshold to the result.
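These relationships are easy to check numerically.  The following sketch uses the same parameter values as the Examples section below, so each pair of calls should return the same value:
  # dlnorm3 is a shifted dlnorm
  dlnorm3(10.5, meanlog = 1, sdlog = 2, threshold = 10)
  dlnorm(10.5 - 10, meanlog = 1, sdlog = 2)

  # plnorm3 is a shifted plnorm
  plnorm3(9, meanlog = 2, sdlog = 3, threshold = 5)
  plnorm(9 - 5, meanlog = 2, sdlog = 3)

  # qlnorm3 adds the threshold to the qlnorm quantile
  qlnorm3(0.5, meanlog = 2, sdlog = 3, threshold = 20)
  qlnorm(0.5, meanlog = 2, sdlog = 3) + 20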
The threshold parameter \gamma affects only the location of the 
three-parameter lognormal distribution; it has no effect on the variance 
or the shape of the distribution.
Denote the mean, variance, and coefficient of variation of Y = X - \gamma by:
E(Y) = \theta
Var(Y) = \eta^2
CV(Y) = \tau = \eta/\theta
Then the mean, variance, and coefficient of variation of X are given by:
E(X) = \theta + \gamma
Var(X) = \eta^2
CV(X) = \frac{\eta}{\theta + \gamma} = \frac{\tau \theta}{\theta + \gamma}
The relationships between the parameters \mu, \sigma, 
\theta, \eta, and \tau are as follows:
\theta = \beta \sqrt{\omega}
\eta = \beta \sqrt{\omega (\omega - 1)}
\tau = \sqrt{\omega - 1}
\mu = log(\frac{\theta}{\sqrt{\tau^2 + 1}})
\sigma = \sqrt{log(\tau^2 + 1)}
where
\beta = e^\mu, \omega = exp(\sigma^2)
Since quantiles of a distribution are preserved under monotonic transformations, 
the median of X is:
Median(X) = \gamma + \beta
Value
dlnorm3 gives the density, plnorm3 gives the distribution function, 
qlnorm3 gives the quantile function, and rlnorm3 generates random 
deviates. 
Note
The two-parameter lognormal distribution is the distribution of a random variable whose logarithm is normally distributed. The two major characteristics of the two-parameter lognormal distribution are that it is bounded below at 0, and it is skewed to the right. The three-parameter lognormal distribution is a generalization of the two-parameter lognormal distribution in which the distribution is shifted so that the threshold parameter is some arbitrary number, not necessarily 0.
The three-parameter lognormal distribution was introduced by Wicksell (1917) in a study of the distribution of ages at first marriage. Both the two- and three-parameter lognormal distributions have been used in a variety of fields, including economics and business, industry, biology, ecology, atmospheric science, and geology (Crow and Shimizu, 1988). Royston (1992) has discussed the application of the three-parameter lognormal distribution in the field of medicine.
The two-parameter lognormal distribution is often used to characterize chemical concentrations in the environment. Ott (1990) has shown mathematically how a series of successive random dilutions gives rise to a distribution that can be approximated by a two-parameter lognormal distribution.
The three-parameter lognormal distribution starts to resemble a normal 
distribution as the parameter \sigma (the standard deviation of 
log(X-\gamma)) tends to 0.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, 176pp.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, 387pp.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Ott, W.R. (1990). A Physical Explanation of the Lognormality of Pollutant Concentrations. Journal of the Air and Waste Management Association 40, 1378–1383.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 9.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897–912.
Wicksell, S.D. (1917). On Logarithmic Correlation with an Application to the Distribution of Ages at First Marriage. Medd. Lunds. Astr. Obs. 84, 1–21.
See Also
Lognormal, elnorm3,  
Probability Distributions and Random Numbers.
Examples
  # Density of the three-parameter lognormal distribution with 
  # parameters meanlog=1, sdlog=2, and threshold=10, evaluated at 10.5:
  dlnorm3(10.5, 1, 2, 10) 
  #[1] 0.278794
  #----------
  # The cdf of the three-parameter lognormal distribution with 
  # parameters meanlog=2, sdlog=3, and threshold=5, evaluated at 9: 
  plnorm3(9, 2, 3, 5) 
  #[1] 0.4189546
  #----------
  # The median of the three-parameter lognormal distribution with 
  # parameters meanlog=2, sdlog=3, and threshold=20: 
  qlnorm3(0.5, 2, 3, 20) 
  #[1] 27.38906
  #----------
  # Random sample of 3 observations from the three-parameter lognormal 
  # distribution with parameters meanlog=2, sdlog=1, and threshold=-5. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnorm3(3, 2, 1, -5) 
  #[1] 18.6339749 -0.8873173 39.0561521
The Lognormal Distribution (Alternative Parameterization)
Description
Density, distribution function, quantile function, and random generation 
for the lognormal distribution with parameters mean and cv.
Usage
  dlnormAlt(x, mean = exp(1/2), cv = sqrt(exp(1) - 1), log = FALSE)
  plnormAlt(q, mean = exp(1/2), cv = sqrt(exp(1) - 1), 
      lower.tail = TRUE, log.p = FALSE)
  qlnormAlt(p, mean = exp(1/2), cv = sqrt(exp(1) - 1), 
      lower.tail = TRUE, log.p = FALSE)
  rlnormAlt(n, mean = exp(1/2), cv = sqrt(exp(1) - 1))
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean | vector of (positive) means of the distribution of the random variable. | 
| cv | vector of (positive) coefficients of variation of the random variable. | 
| log,log.p | logical; if  | 
| lower.tail | logical; if  | 
Details
Let X be a random variable with a lognormal distribution 
with parameters meanlog=\mu and sdlog=\sigma.  That is, 
\mu and \sigma denote the mean and standard deviation of the random variable 
on the log scale.  The relationship between these parameters and the 
mean (mean=\theta) and coefficient of variation (cv=\tau) 
of the distribution on the original scale is given by:
\mu = log(\frac{\theta}{\sqrt{\tau^2 + 1}}) \;\;\;\; (1)
\sigma = [log(\tau^2 + 1)]^{1/2} \;\;\;\; (2)
\theta = exp[\mu + (\sigma^2/2)] \;\;\;\; (3)
\tau = [exp(\sigma^2) - 1]^{1/2} \;\;\;\; (4)
Thus, the functions dlnormAlt, plnormAlt, qlnormAlt, and 
rlnormAlt call the R functions dlnorm, 
plnorm, qlnorm, and rlnorm, 
respectively using the following values for the meanlog and sdlog 
parameters: 
 
sdlog <- sqrt(log(1 + cv^2)), 
meanlog <- log(mean) - (sdlog^2)/2
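As a quick numerical check of this parameterization, the conversion can be done by hand with base R functions; the result below matches the first example in the Examples section:
  # Convert mean = 10, cv = 1 to meanlog/sdlog and evaluate the density at 5
  mn <- 10
  cv <- 1
  sdlog   <- sqrt(log(1 + cv^2))
  meanlog <- log(mn) - (sdlog^2) / 2
  dlnorm(5, meanlog = meanlog, sdlog = sdlog)
  #[1] 0.08788173    # same as dlnormAlt(5, mean = 10, cv = 1)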
Value
dlnormAlt gives the density, plnormAlt gives the distribution function, 
qlnormAlt gives the quantile function, and rlnormAlt generates random 
deviates. 
Note
The two-parameter lognormal distribution is the distribution of a random variable whose logarithm is normally distributed. The two major characteristics of the lognormal distribution are that it is bounded below at 0, and it is skewed to the right.
Because the empirical distribution of many variables is inherently positive and skewed to the right (e.g., size of organisms, amount of rainfall, size of income, etc.), the lognormal distribution has been widely applied in several fields, including economics, business, industry, biology, ecology, atmospheric science, and geology (Aitchison and Brown, 1957; Crow and Shimizu, 1988).
Gibrat (1930) derived the lognormal distribution from theoretical assumptions, calling it the "law of proportionate effect", although Kapteyn (1903) had earlier described a machine that was its mechanical equivalent. The basic idea is an analogue of the Central Limit Theorem: just as the sum of several independent random variables tends to look like a normal distribution regardless of the underlying distribution(s) of the individual variables, the product of several independent positive random variables tends to look like a lognormal distribution, because the logarithm of the product is the sum of the logarithms.
The lognormal distribution is often used to characterize chemical concentrations in the environment. Ott (1990) has shown mathematically how a series of successive random dilutions gives rise to a distribution that can be approximated by a lognormal distribution.
A lognormal distribution starts to resemble a normal distribution as the 
parameter \sigma (the standard deviation of the log of the distribution) 
tends to 0.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) discourage using the assumption of a lognormal distribution for some types of environmental data and recommend instead assessing whether the data appear to fit a gamma distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Limpert, E., W.A. Stahel, and M. Abbt. (2001). Log-Normal Distributions Across the Sciences: Keys and Clues. BioScience 51, 341–352.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
See Also
Lognormal, elnormAlt,  
Probability Distributions and Random Numbers.
Examples
  # Density of the lognormal distribution with parameters 
  # mean=10 and cv=1, evaluated at 5: 
  dlnormAlt(5, mean = 10, cv = 1) 
  #[1] 0.08788173
  #----------
  # The cdf of the lognormal distribution with parameters mean=2 and cv=3, 
  # evaluated at 4: 
  plnormAlt(4, 2, 3) 
  #[1] 0.8879132
  #----------
  # The median of the lognormal distribution with parameters 
  # mean=10 and cv=1: 
  qlnormAlt(0.5, mean = 10, cv = 1) 
  #[1] 7.071068
  #----------
  # Random sample of 3 observations from a lognormal distribution with 
  # parameters mean=10 and cv=1. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnormAlt(3, mean = 10, cv = 1) 
  #[1] 18.615797  4.341402 31.265293
Mixture of Two Lognormal Distributions
Description
Density, distribution function, quantile function, and random generation 
for a mixture of two lognormal distributions with parameters 
meanlog1, sdlog1, meanlog2, sdlog2, and p.mix.
Usage
  dlnormMix(x, meanlog1 = 0, sdlog1 = 1, meanlog2 = 0, sdlog2 = 1, p.mix = 0.5)
  plnormMix(q, meanlog1 = 0, sdlog1 = 1, meanlog2 = 0, sdlog2 = 1, p.mix = 0.5) 
  qlnormMix(p, meanlog1 = 0, sdlog1 = 1, meanlog2 = 0, sdlog2 = 1, p.mix = 0.5) 
  rlnormMix(n, meanlog1 = 0, sdlog1 = 1, meanlog2 = 0, sdlog2 = 1, p.mix = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| meanlog1 | vector of means of the first lognormal random variable on the log scale.  
The default is  | 
| sdlog1 | vector of standard deviations of the first lognormal random variable on 
the log scale.  The default is  | 
| meanlog2 | vector of means of the second lognormal random variable on the log scale.  
The default is  | 
| sdlog2 | vector of standard deviations of the second lognormal random variable on 
the log scale.  The default is  | 
| p.mix | vector of probabilities between 0 and 1 indicating the mixing proportion.  
For  | 
Details
Let f(x; \mu, \sigma) denote the density of a 
lognormal random variable with parameters 
meanlog=\mu and sdlog=\sigma.  The density, g, of a 
lognormal mixture random variable with parameters meanlog1=\mu_1, 
sdlog1=\sigma_1, meanlog2=\mu_2, 
sdlog2=\sigma_2, and p.mix=p is given by:
g(x; \mu_1, \sigma_1, \mu_2, \sigma_2, p) = 
    (1 - p) f(x; \mu_1, \sigma_1) + p f(x; \mu_2, \sigma_2)
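Equivalently, the mixture density is just a weighted sum of two calls to dlnorm; the following minimal sketch reproduces the first value in the Examples section below:
  # Mixture density at x = 1.5 with meanlog1=0, sdlog1=1, meanlog2=2, sdlog2=3, p.mix=0.5
  x <- 1.5
  p <- 0.5
  (1 - p) * dlnorm(x, meanlog = 0, sdlog = 1) + p * dlnorm(x, meanlog = 2, sdlog = 3)
  #[1] 0.1609746    # same as dlnormMix(1.5, 0, 1, 2, 3, 0.5)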
Value
dlnormMix gives the density, plnormMix gives the distribution function, 
qlnormMix gives the quantile function, and rlnormMix generates random 
deviates. 
Note
A lognormal mixture distribution is often used to model positive-valued data 
that appear to be “contaminated”; that is, most of the values appear to 
come from a single lognormal distribution, but a few “outliers” are 
apparent.  In this case, the value of meanlog2 would be larger than the 
value of meanlog1, and the mixing proportion p.mix would be fairly 
close to 0 (e.g., p.mix=0.1).  The value of the second standard deviation 
(sdlog2) may or may not be the same as the value for the first 
(sdlog1).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135-146.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, pp.53-54, and Chapter 8.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Lognormal, NormalMix, Probability Distributions and Random Numbers.
Examples
  # Density of a lognormal mixture with parameters meanlog1=0, sdlog1=1, 
  # meanlog2=2, sdlog2=3, p.mix=0.5, evaluated at 1.5: 
  dlnormMix(1.5, meanlog1 = 0, sdlog1 = 1, meanlog2 = 2, sdlog2 = 3, p.mix = 0.5) 
  #[1] 0.1609746
  #----------
  # The cdf of a lognormal mixture with parameters meanlog1=0, sdlog1=1, 
  # meanlog2=2, sdlog2=3, p.mix=0.2, evaluated at 4: 
  plnormMix(4, 0, 1, 2, 3, 0.2) 
  #[1] 0.8175281
  #----------
  # The median of a lognormal mixture with parameters meanlog1=0, sdlog1=1, 
  # meanlog2=2, sdlog2=3, p.mix=0.2: 
  qlnormMix(0.5, 0, 1, 2, 3, 0.2) 
  #[1] 1.156891
  #----------
  # Random sample of 3 observations from a lognormal mixture with 
  # parameters meanlog1=0, sdlog1=1, meanlog2=2, sdlog2=3, p.mix=0.2. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnormMix(3, 0, 1, 2, 3, 0.2) 
  #[1] 0.08975283 1.07591103 7.85482514
Mixture of Two Lognormal Distributions (Alternative Parameterization)
Description
Density, distribution function, quantile function, and random generation 
for a mixture of two lognormal distributions with parameters 
mean1, cv1, mean2, cv2, and p.mix.
Usage
  dlnormMixAlt(x, mean1 = exp(1/2), cv1 = sqrt(exp(1) - 1), 
      mean2 = exp(1/2), cv2 = sqrt(exp(1) - 1), p.mix = 0.5)
  plnormMixAlt(q, mean1 = exp(1/2), cv1 = sqrt(exp(1) - 1), 
      mean2 = exp(1/2), cv2 = sqrt(exp(1) - 1), p.mix = 0.5) 
  qlnormMixAlt(p, mean1 = exp(1/2), cv1 = sqrt(exp(1) - 1), 
      mean2 = exp(1/2), cv2 = sqrt(exp(1) - 1), p.mix = 0.5) 
  rlnormMixAlt(n, mean1 = exp(1/2), cv1 = sqrt(exp(1) - 1), 
      mean2 = exp(1/2), cv2 = sqrt(exp(1) - 1), p.mix = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean1 | vector of means of the first lognormal random variable.  The default is  | 
| cv1 | vector of coefficient of variations of the first lognormal random variable.  
The default is  | 
| mean2 | vector of means of the second lognormal random variable.  The default is  | 
| cv2 | vector of coefficient of variations of the second lognormal random variable.  
The default is  | 
| p.mix | vector of probabilities between 0 and 1 indicating the mixing proportion.  
For  | 
Details
Let f(x; \eta, \theta) denote the density of a 
lognormal random variable with parameters 
mean=\eta and cv=\theta.  The density, g, of a 
lognormal mixture random variable with parameters mean1=\eta_1, 
cv1=\theta_1, mean2=\eta_2, 
cv2=\theta_2, and p.mix=p is given by:
g(x; \eta_1, \theta_1, \eta_2, \theta_2, p) = 
    (1 - p) f(x; \eta_1, \theta_1) + p f(x; \eta_2, \theta_2)
The default values for mean1 and cv1 correspond to a 
lognormal distribution with parameters 
meanlog=0 and sdlog=1.  Similarly for the default values 
of mean2 and cv2.
Value
dlnormMixAlt gives the density, plnormMixAlt gives the distribution 
function, qlnormMixAlt gives the quantile function, and 
rlnormMixAlt generates random deviates. 
Note
A lognormal mixture distribution is often used to model positive-valued data 
that appear to be “contaminated”; that is, most of the values appear to 
come from a single lognormal distribution, but a few “outliers” are 
apparent.  In this case, the value of mean2 would be larger than the 
value of mean1, and the mixing proportion p.mix would be fairly 
close to 0 (e.g., p.mix=0.1).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135-146.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, pp.53-54, and Chapter 8.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
LognormalAlt, LognormalMix, Lognormal, NormalMix, Probability Distributions and Random Numbers.
Examples
  # Density of a lognormal mixture with parameters mean1=2, cv1=3, 
  # mean2=4, cv2=5, p.mix=0.5, evaluated at 1.5: 
  dlnormMixAlt(1.5, mean1 = 2, cv1 = 3, mean2 = 4, cv2 = 5, p.mix = 0.5) 
  #[1] 0.1436045
  #----------
  # The cdf of a lognormal mixture with parameters mean1=2, cv1=3, 
  # mean2=4, cv2=5, p.mix=0.5, evaluated at 1.5: 
  plnormMixAlt(1.5, mean1 = 2, cv1 = 3, mean2 = 4, cv2 = 5, p.mix = 0.5) 
  #[1] 0.6778064
  #----------
  # The median of a lognormal mixture with parameters mean1=2, cv1=3, 
  # mean2=4, cv2=5, p.mix=0.5: 
  qlnormMixAlt(0.5, 2, 3, 4, 5, 0.5) 
  #[1] 0.6978355
  #----------
  # Random sample of 3 observations from a lognormal mixture with 
  # parameters mean1=2, cv1=3, mean2=4, cv2=5, p.mix=0.5. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnormMixAlt(3, 2, 3, 4, 5, 0.5) 
  #[1]  0.70672151 14.43226313  0.05521329
The Truncated Lognormal Distribution
Description
Density, distribution function, quantile function, and random generation 
for the truncated lognormal distribution with parameters meanlog, 
sdlog, min, and max.
Usage
  dlnormTrunc(x, meanlog = 0, sdlog = 1, min = 0, max = Inf)
  plnormTrunc(q, meanlog = 0, sdlog = 1, min = 0, max = Inf)
  qlnormTrunc(p, meanlog = 0, sdlog = 1, min = 0, max = Inf)
  rlnormTrunc(n, meanlog = 0, sdlog = 1, min = 0, max = Inf)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| meanlog | vector of means of the distribution of the non-truncated random variable 
on the log scale.  
The default is  | 
| sdlog | vector of (positive) standard deviations of the non-truncated random variable 
on the log scale.  
The default is  | 
| min | vector of minimum values for truncation on the left.  The default value is 
 | 
| max | vector of maximum values for truncation on the right.  The default value is 
 | 
Details
See the help file for the lognormal distribution for information about the density and cdf of a lognormal distribution.
Probability Density and Cumulative Distribution Function 
Let X denote a random variable with density function f(x) and 
cumulative distribution function F(x), and let 
Y denote the truncated version of X where Y is truncated 
below at min=A and above at max=B.  Then the density 
function of Y, denoted g(y), is given by:
g(y) = \frac{f(y)}{F(B) - F(A)}, A \le y \le B
and the cdf of Y, denoted G(y), is given by:
| G(y) = | 0 | for y < A |
| | \frac{F(y) - F(A)}{F(B) - F(A)} | for A \le y \le B |
| | 1 | for y > B |
Quantiles 
The p^{th} quantile y_p of Y is given by:
| y_p = | A | for p = 0 |
| | F^{-1}\{p[F(B) - F(A)] + F(A)\} | for 0 < p < 1 |
| | B | for p = 1 |
Random Numbers 
Random numbers are generated using the inverse transformation method:
y = G^{-1}(u)
where u is a random deviate from a uniform [0, 1] distribution.
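A minimal sketch of this inverse transformation method for the truncated lognormal, using only the stats functions plnorm, qlnorm, and runif (the function name is illustrative, and this is not necessarily how rlnormTrunc is implemented):
  # Generate truncated lognormal deviates by the inverse transformation method
  rlnormTrunc.sketch <- function(n, meanlog = 0, sdlog = 1, min = 0, max = Inf) {
    F.A <- plnorm(min, meanlog, sdlog)
    F.B <- plnorm(max, meanlog, sdlog)
    u <- runif(n)                                  # uniform [0, 1] deviates
    qlnorm(F.A + u * (F.B - F.A), meanlog, sdlog)  # G^{-1}(u)
  }
  set.seed(20)
  rlnormTrunc.sketch(3, meanlog = 1, sdlog = 0.75, min = 0, max = 10)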
Value
dlnormTrunc gives the density, plnormTrunc gives the distribution function, 
qlnormTrunc gives the quantile function, and rlnormTrunc generates random 
deviates. 
Note
A truncated lognormal distribution is sometimes used as an input distribution for probabilistic risk assessment.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, Chapter 2.
See Also
Lognormal, Probability Distributions and Random Numbers.
Examples
  # Density of a truncated lognormal distribution with parameters 
  # meanlog=1, sdlog=0.75, min=0, max=10, evaluated at 2 and 4:
  dlnormTrunc(c(2, 4), 1, 0.75, 0, 10) 
  #[1] 0.2551219 0.1214676
  #----------
  # The cdf of a truncated lognormal distribution with parameters 
  # meanlog=1, sdlog=0.75, min=0, max=10, evaluated at 2 and 4:
  plnormTrunc(c(2, 4), 1, 0.75, 0, 10) 
  #[1] 0.3558867 0.7266934
  #----------
  # The median of a truncated lognormal distribution with parameters 
  # meanlog=1, sdlog=0.75, min=0, max=10:
  qlnormTrunc(.5, 1, 0.75, 0, 10) 
  #[1] 2.614945
  #----------
  # A random sample of 3 observations from a truncated lognormal distribution 
  # with parameters meanlog=1, sdlog=0.75, min=0, max=10. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnormTrunc(3, 1, 0.75, 0, 10) 
  #[1] 5.754805 4.372218 1.706815
The Truncated Lognormal Distribution (Alternative Parameterization)
Description
Density, distribution function, quantile function, and random generation 
for the truncated lognormal distribution with parameters mean, 
cv, min, and max.
Usage
  dlnormTruncAlt(x, mean = exp(1/2), cv = sqrt(exp(1) - 1), min = 0, max = Inf)
  plnormTruncAlt(q, mean = exp(1/2), cv = sqrt(exp(1) - 1), min = 0, max = Inf)
  qlnormTruncAlt(p, mean = exp(1/2), cv = sqrt(exp(1) - 1), min = 0, max = Inf)
  rlnormTruncAlt(n, mean = exp(1/2), cv = sqrt(exp(1) - 1), min = 0, max = Inf)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean | vector of means of the distribution of the non-truncated random variable. 
The default is  | 
| cv | vector of (positive) coefficient of variations of the non-truncated random variable. 
The default is  | 
| min | vector of minimum values for truncation on the left.  The default value is 
 | 
| max | vector of maximum values for truncation on the right.  The default value is 
 | 
Details
See the help file for LognormalAlt for information about the density and cdf of a lognormal distribution with this alternative parameterization.
Let X denote a random variable with density function f(x) and 
cumulative distribution function F(x), and let 
Y denote the truncated version of X where Y is truncated 
below at min=A and above at max=B.  Then the density 
function of Y, denoted g(y), is given by:
g(y) = \frac{f(y)}{F(B) - F(A)}, A \le y \le B
and the cdf of Y, denoted G(y), is given by:
| G(y) = | 0 | for y < A |
| | \frac{F(y) - F(A)}{F(B) - F(A)} | for A \le y \le B |
| | 1 | for y > B |
The p^{th} quantile y_p of Y is given by:
| y_p = | A | for p = 0 |
| | F^{-1}\{p[F(B) - F(A)] + F(A)\} | for 0 < p < 1 |
| | B | for p = 1 |
Random numbers are generated using the inverse transformation method:
y = G^{-1}(u)
where u is a random deviate from a uniform [0, 1] distribution.
Value
dlnormTruncAlt gives the density, plnormTruncAlt gives the distribution function, 
qlnormTruncAlt gives the quantile function, and rlnormTruncAlt generates random 
deviates. 
Note
A truncated lognormal distribution is sometimes used as an input distribution for probabilistic risk assessment.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, Chapter 2.
See Also
LognormalAlt, Probability Distributions and Random Numbers.
Examples
  # Density of a truncated lognormal distribution with parameters 
  # mean=10, cv=1, min=0, max=20, evaluated at 2 and 12:
  dlnormTruncAlt(c(2, 12), 10, 1, 0, 20) 
  #[1] 0.08480874 0.03649884
  #----------
  # The cdf of a truncated lognormal distribution with parameters 
  # mean=10, cv=1, min=0, max=20, evaluated at 2 and 12:
  plnormTruncAlt(c(2, 12), 10, 1, 0, 20) 
  #[1] 0.07230627 0.82467603
  #----------
  # The median of a truncated lognormal distribution with parameters 
  # mean=10, cv=1, min=0, max=20:
  qlnormTruncAlt(.5, 10, 1, 0, 20) 
  #[1] 6.329505
  #----------
  # A random sample of 3 observations from a truncated lognormal distribution 
  # with parameters mean=10, cv=1, min=0, max=20. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rlnormTruncAlt(3, 10, 1, 0, 20) 
  #[1]  6.685391 17.445387 18.543553
Copper and Zinc Concentrations in Shallow Ground Water
Description
Copper and zinc concentrations (mg/L) in shallow ground water from two geological 
zones (Alluvial Fan and Basin-Trough) in the San Joaquin Valley, CA.  There are 68 
samples from the Alluvial Fan zone and 50 from the Basin-Trough zone.  Some 
observations are reported as <DL, where DL denotes a detection limit.  There 
are multiple detection limits for both the copper and zinc data in each of the 
geological zones.
Usage
Millard.Deverel.88.df
Format
A data frame with 118 observations on the following 8 variables.
- Cu.orig
- a character vector of original copper concentrations (mg/L) 
- Cu
- a numeric vector of copper concentrations with nondetects coded to their detection limit 
- Cu.censored
- a logical vector indicating which copper concentrations are censored 
- Zn.orig
- a character vector of original zinc concentrations (mg/L) 
- Zn
- a numeric vector of zinc concentrations with nondetects coded to their detection limit 
- Zn.censored
- a logical vector indicating which zinc concentrations are censored 
- Zone
- a factor indicating the zone (alluvial fan vs. basin trough) 
- Location
- a numeric vector indicating the sampling location 
Source
Millard, S.P., and S.J. Deverel. (1988). Nonparametric Statistical Methods for Comparing Two Sites Based on Data With Multiple Nondetect Limits. Water Resources Research, 24(12), 2087-2098.
References
Deverel, S.J., R.J. Gilliom, R. Fujii, J.A. Izbicki, and J.C. Fields. (1984). Areal Distribution of Selenium and Other Inorganic Constituents in Shallow Ground Water of the San Luis Drain Service Area, San Joaquin, California: A Preliminary Study. U.S. Geological Survey Water Resources Investigative Report 84-4319.
Modified 1,2,3,4-Tetrachlorobenzene Data with Censored Values
Description
Artificial 1,2,3,4-Tetrachlorobenzene (TcCB) concentrations with censored values; 
based on the reference area data stored in EPA.94b.tccb.df.  The data 
frame EPA.94b.tccb.df contains TcCB concentrations (ppb) in soil samples 
at a reference area and a cleanup area.  The data frame 
Modified.TcCB.df contains a modified version of the data from the reference area.  
For this data set, the concentrations of TcCB less than 0.5 ppb have been recoded as 
<0.5.
Usage
Modified.TcCB.df
Format
A data frame with 47 observations on the following 3 variables.
- TcCB.orig
- a character vector of original TcCB concentrations (ppb) 
- TcCB
- a numeric vector with censored observations set to their detection level 
- Censored
- a logical vector indicating which observations are censored 
Source
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, p.595.
References
USEPA. (1994b). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C.
See Also
EPA.94b.tccb.df.
NIOSH Air Lead Levels Data
Description
Air lead levels collected by the National Institute for Occupational Safety and Health (NIOSH) at 15 different areas within the Alma American Labs, Fairplay, CO, for health hazard evaluation (HETA 89-052) on February 23, 1989.
Usage
NIOSH.89.air.lead.vec
Format
A numeric vector with 15 elements containing air lead concentrations (\mu g/m^3).
Source
Krishnamoorthy, K., T. Matthew, and G. Ramachandran. (2006). Generalized P-Values and Confidence Intervals: A Novel Approach for Analyzing Lognormally Distributed Exposure Data. Journal of Occupational and Environmental Hygiene, 3, 642–650.
References
Zou, G.Y., C.Y. Huo, and J. Taleban. (2009). Simple Confidence Intervals for Lognormal Means and their Differences with Environmental Applications. Environmetrics, 20, 172–180.
Mixture of Two Normal Distributions
Description
Density, distribution function, quantile function, and random generation 
for a mixture of two normal distributions with parameters 
mean1, sd1, mean2, sd2, and p.mix.
Usage
  dnormMix(x, mean1 = 0, sd1 = 1, mean2 = 0, sd2 = 1, p.mix = 0.5)
  pnormMix(q, mean1 = 0, sd1 = 1, mean2 = 0, sd2 = 1, p.mix = 0.5)
  qnormMix(p, mean1 = 0, sd1 = 1, mean2 = 0, sd2 = 1, p.mix = 0.5)
  rnormMix(n, mean1 = 0, sd1 = 1, mean2 = 0, sd2 = 1, p.mix = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean1 | vector of means of the first normal random variable.  
The default is  | 
| sd1 | vector of standard deviations of the first normal random variable.  
The default is  | 
| mean2 | vector of means of the second normal random variable.  
The default is  | 
| sd2 | vector of standard deviations of the second normal random variable.  
The default is  | 
| p.mix | vector of probabilities between 0 and 1 indicating the mixing proportion.  
For  | 
Details
Let f(x; \mu, \sigma) denote the density of a 
normal random variable with parameters 
mean=\mu and sd=\sigma.  The density, g, of a 
normal mixture random variable with parameters mean1=\mu_1, 
sd1=\sigma_1, mean2=\mu_2, 
sd2=\sigma_2, and p.mix=p is given by:
g(x; \mu_1, \sigma_1, \mu_2, \sigma_2, p) = 
    (1 - p) f(x; \mu_1, \sigma_1) + p f(x; \mu_2, \sigma_2)
Value
dnormMix gives the density, pnormMix gives the distribution function, 
qnormMix gives the quantile function, and rnormMix generates random 
deviates. 
Note
A normal mixture distribution is sometimes used to model data 
that appear to be “contaminated”; that is, most of the values appear to 
come from a single normal distribution, but a few “outliers” are 
apparent.  In this case, the value of mean2 would be larger than the 
value of mean1, and the mixing proportion p.mix would be fairly 
close to 0 (e.g., p.mix=0.1).  The value of the second standard deviation 
(sd2) may or may not be the same as the value for the first 
(sd1).
Another application of the normal mixture distribution is to bi-modal data; that is, data exhibiting two modes.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, pp.53-54, and Chapter 8.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Normal, LognormalMix, Probability Distributions and Random Numbers.
Examples
  # Density of a normal mixture with parameters mean1=0, sd1=1, 
  #  mean2=4, sd2=2, p.mix=0.5, evaluated at 1.5: 
  dnormMix(1.5, mean2=4, sd2=2) 
  #[1] 0.1104211
  #----------
  # The cdf of a normal mixture with parameters mean1=10, sd1=2, 
  # mean2=20, sd2=2, p.mix=0.1, evaluated at 15: 
  pnormMix(15, 10, 2, 20, 2, 0.1) 
  #[1] 0.8950323
  #----------
  # The median of a normal mixture with parameters mean1=10, sd1=2, 
  # mean2=20, sd2=2, p.mix=0.1: 
  qnormMix(0.5, 10, 2, 20, 2, 0.1) 
  #[1] 10.27942
  #----------
  # Random sample of 3 observations from a normal mixture with 
  # parameters mean1=0, sd1=1, mean2=4, sd2=2, p.mix=0.5. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rnormMix(3, mean2=4, sd2=2)
  #[1] 0.07316778 2.06112801 1.05953620
The Truncated Normal Distribution
Description
Density, distribution function, quantile function, and random generation 
for the truncated normal distribution with parameters mean, 
sd, min, and max.
Usage
  dnormTrunc(x, mean = 0, sd = 1, min = -Inf, max = Inf)
  pnormTrunc(q, mean = 0, sd = 1, min = -Inf, max = Inf)
  qnormTrunc(p, mean = 0, sd = 1, min = -Inf, max = Inf)
  rnormTrunc(n, mean = 0, sd = 1, min = -Inf, max = Inf)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean | vector of means of the distribution of the non-truncated random variable.  
The default is  | 
| sd | vector of (positive) standard deviations of the non-truncated random variable.  
The default is  | 
| min | vector of minimum values for truncation on the left.  The default value is 
 | 
| max | vector of maximum values for truncation on the right.  The default value is 
 | 
Details
See the help file for the normal distribution for information about the density and cdf of a normal distribution.
Probability Density and Cumulative Distribution Function 
Let X denote a random variable with density function f(x) and 
cumulative distribution function F(x), and let 
Y denote the truncated version of X where Y is truncated 
below at min=A and above at max=B.  Then the density 
function of Y, denoted g(y), is given by:
g(y) = \frac{f(y)}{F(B) - F(A)}, A \le y \le B
and the cdf of Y, denoted G(y), is given by:
| G(y) = | 0 | for y < A |
| | \frac{F(y) - F(A)}{F(B) - F(A)} | for A \le y \le B |
| | 1 | for y > B |
Quantiles 
The p^{th} quantile y_p of Y is given by:
| y_p = | A | for p = 0 |
| | F^{-1}\{p[F(B) - F(A)] + F(A)\} | for 0 < p < 1 |
| | B | for p = 1 |
Random Numbers 
Random numbers are generated using the inverse transformation method:
y = G^{-1}(u)
where u is a random deviate from a uniform [0, 1] distribution. 
Mean and Variance 
The expected value of a truncated normal random variable with parameters 
mean=\mu, sd=\sigma, min=A, and 
max=B is given by:
E(Y) = \mu + \sigma^2 \frac{f(A) - f(B)}{F(B) - F(A)}
(Johnson et al., 1994, p.156; Schneider, 1986, p.17).
The variance of this random variable is given by:
\sigma^2 + \sigma^3 \frac{z_A f(A) - z_B f(B)}{F(B) - F(A)} - \sigma^4 [\frac{f(A) - f(B)}{F(B) - F(A)}]^2
where
z_A = \frac{A - \mu}{\sigma}; \, z_B = \frac{B - \mu}{\sigma}
(Johnson et al., 1994, p.158; Schneider, 1986, p.17).
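These moment formulas can be checked against direct numerical integration of the truncated density; a minimal sketch using the parameters from the Examples section below:
  # Closed-form mean and variance of a normal(10, 2) truncated to [8, 13]
  mu <- 10; sigma <- 2; A <- 8; B <- 13
  f <- function(y) dnorm(y, mu, sigma)
  F.A <- pnorm(A, mu, sigma); F.B <- pnorm(B, mu, sigma)
  z.A <- (A - mu) / sigma;    z.B <- (B - mu) / sigma
  EY <- mu + sigma^2 * (f(A) - f(B)) / (F.B - F.A)
  VY <- sigma^2 + sigma^3 * (z.A * f(A) - z.B * f(B)) / (F.B - F.A) -
        sigma^4 * ((f(A) - f(B)) / (F.B - F.A))^2

  # Numerical check by integrating against the truncated density g(y)
  g <- function(y) f(y) / (F.B - F.A)
  EY.num <- integrate(function(y) y * g(y), A, B)$value
  VY.num <- integrate(function(y) (y - EY.num)^2 * g(y), A, B)$value
  c(EY, EY.num)    # the two values should agree
  c(VY, VY.num)    # the two values should agree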
Value
dnormTrunc gives the density, pnormTrunc gives the distribution function, 
qnormTrunc gives the quantile function, and rnormTrunc generates random 
deviates. 
Note
A truncated normal distribution is sometimes used as an input distribution for probabilistic risk assessment.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, Chapter 2.
See Also
Normal, Probability Distributions and Random Numbers.
Examples
  # Density of a truncated normal distribution with parameters 
  # mean=10, sd=2, min=8, max=13, evaluated at 10 and 11.5:
  dnormTrunc(c(10, 11.5), 10, 2, 8, 13) 
  #[1] 0.2575358 0.1943982
  #----------
  # The cdf of a truncated normal distribution with parameters 
  # mean=10, sd=2, min=8, max=13, evaluated at 10 and 11.5:
  pnormTrunc(c(10, 11.5), 10, 2, 8, 13) 
  #[1] 0.4407078 0.7936573
  #----------
  # The median of a truncated normal distribution with parameters 
  # mean=10, sd=2, min=8, max=13:
  qnormTrunc(.5, 10, 2, 8, 13) 
  #[1] 10.23074
  #----------
  # A random sample of 3 observations from a truncated normal distribution 
  # with parameters mean=10, sd=2, min=8, max=13. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rnormTrunc(3, 10, 2, 8, 13) 
  #[1] 11.975223 11.373711  9.361258
Ammonium Concentration in Precipitation Measured at Olympic National Park Hoh Ranger Station
Description
Ammonium (NH_4) concentration (mg/L) in precipitation measured at
Olympic National Park, Hoh Ranger Station (WA14), weekly or every other week
from January 6, 2009 through December 20, 2011.
Usage
Olympic.NH4.df
Format
A data frame with 102 observations on the following 6 variables.
- Date.On
- Start of collection period. Date on which the sample bucket was installed on the collector. 
- Date.Off
- End of collection period. Date on which the sample bucket was removed from the collector. 
- Week
- a numeric vector indicating the cumulative week number starting from January 1, 2009. 
- NH4.Orig.mg.per.L
- a character vector of the original NH_4 concentrations reported either as the observed value or less than some detection limit. For values reported as less than a detection limit, the value reported is the actual limit of detection or, in the case of a diluted sample, the product of the detection limit value and the dilution factor.
- NH4.mg.per.L
- a numeric vector of NH_4 concentrations with non-detects coded to their detection limit.
- Censored
- a logical vector indicating which observations are censored. 
Details
- Station
- Olympic National Park-Hoh Ranger Station (WA14) 
- Location
- Jefferson County, Washington 
- Latitude
- 47.8597 
- Longitude
- -123.9325 
- Elevation
- 182 meters 
- USGS 1:24000 Map Name
- Owl Mountain 
- Operating Agency
- Olympic National Park 
- Sponsoring Agency
- NPS-Air Resources Division 
Source
National Atmospheric Deposition Program, National Trends Network (NADP/NTN). 
https://nadp.slh.wisc.edu/sites/ntn-WA14/
Ozone Concentrations in the Northeast U.S.
Description
Ozone concentrations in 41 U.S. cities based on daily maxima collected between June and August 1974.
Usage
Ozone.NE.df
Format
A data frame with 41 observations on the following 5 variables.
- Median
- median of daily maxima ozone concentration (ppb). 
- Quartile
- Upper quartile (i.e., 75th percentile) of daily maxima ozone concentration (ppb). 
- City
- a factor indicating the city 
- Longitude
- negative longitude of the city 
- Latitude
- latitude of the city 
Source
Cleveland, W.S., Kleiner, B., McRae, J.E., Warner, J.L., and Pasceri, P.E. (1975). The Analysis of Ground-Level Ozone Data from New Jersey, New York, Connecticut, and Massachusetts: Data Quality Assessment and Temporal and Geographical Properties. Bell Laboratories Memorandum.
The original data were collected by the New Jersey Department of Environmental Protection, the New York State Department of Environmental Protection, the Boyce Thompson Institute (Yonkers, for New York data), the Connecticut Department of Environmental Protection, and the Massachusetts Department of Public Health.
Examples
  summary(Ozone.NE.df)
  #     Median          Quartile               City      Longitude     
  # Min.   : 34.00   Min.   : 48.00   Asbury Park: 1   Min.   :-74.71  
  # 1st Qu.: 58.00   1st Qu.: 79.75   Babylon    : 1   1st Qu.:-73.74  
  # Median : 65.00   Median : 90.00   Bayonne    : 1   Median :-73.17  
  # Mean   : 68.15   Mean   : 95.10   Boston     : 1   Mean   :-72.94  
  # 3rd Qu.: 80.00   3rd Qu.:112.25   Bridgeport : 1   3rd Qu.:-72.08  
  # Max.   :100.00   Max.   :145.00   Cambridge  : 1   Max.   :-71.05  
  #                  NA's   :  1.00   (Other)    :35                   
  #    Latitude    
  # Min.   :40.22  
  # 1st Qu.:40.97  
  # Median :41.56  
  # Mean   :41.60  
  # 3rd Qu.:42.25  
  # Max.   :43.32 
The Pareto Distribution
Description
Density, distribution function, quantile function, and random generation 
for the Pareto distribution with parameters location and shape.
Usage
  dpareto(x, location, shape = 1)
  ppareto(q, location, shape = 1)
  qpareto(p, location, shape = 1)
  rpareto(n, location, shape = 1)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| location | vector of (positive) location parameters. | 
| shape | vector of (positive) shape parameters.  The default is  | 
Details
Let X be a Pareto random variable with parameters location=\eta 
and shape=\theta.  The density function of X is given by:
f(x; \eta, \theta) = \frac{\theta \eta^\theta}{x^{\theta + 1}}, \; \eta > 0, \; \theta > 0, \; x \ge \eta
The cumulative distribution function of X is given by:
F(x; \eta, \theta) = 1 - (\frac{\eta}{x})^\theta
and the p'th quantile of X is given by:
x_p = \eta (1 - p)^{-1/\theta}, \; 0 \le p \le 1
The mode, mean, median, variance, and coefficient of variation of X are given by:
Mode(X) = \eta
E(X) = \frac{\theta \eta}{\theta - 1}, \; \theta > 1
Median(X) = x_{0.5} = 2^{1/\theta} \eta
Var(X) = \frac{\theta \eta^2}{(\theta - 1)^2 (\theta - 2)}, \; \theta > 2
CV(X) = [\theta (\theta - 2)]^{-1/2}, \; \theta > 2
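As an informal check (not part of the original help file), the moment formulas above can be compared with simulation output for arbitrary, hypothetical parameter values:
  # Sketch only: compare simulated moments with the closed-form expressions, 
  # using the hypothetical values location (eta) = 2 and shape (theta) = 5.
  set.seed(1)
  x <- rpareto(100000, location = 2, shape = 5)
  mean(x)   # should be close to E(X) = 5*2/(5-1) = 2.5
  var(x)    # should be close to Var(X) = 5*2^2/((5-1)^2*(5-2)) = 0.417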
Value
dpareto gives the density, ppareto gives the distribution function, 
qpareto gives the quantile function, and rpareto generates random 
deviates. 
Note
The Pareto distribution is named after Vilfredo Pareto (1848-1923), a professor 
of economics.  It is derived from Pareto's law, which states that the number of 
persons N having income \ge x is given by:
N = A x^{-\theta}
where \theta denotes Pareto's constant and is the shape parameter for the 
probability distribution.
The Pareto distribution takes values on the positive real line.  All values must be 
larger than the “location” parameter \eta, which is really a threshold 
parameter.  There are three kinds of Pareto distributions.  The one described here 
is the Pareto distribution of the first kind.  Stable Pareto distributions have 
0 < \theta < 2.  Note that the r'th moment only exists if 
r < \theta.
The Pareto distribution is related to the 
exponential distribution and 
logistic distribution as follows.  
Let X denote a Pareto random variable with location=\eta and 
shape=\theta.  Then log(X/\eta) has an exponential distribution 
with parameter rate=\theta, and -log\{ [(X/\eta)^\theta] - 1 \} 
has a logistic distribution with parameters location=0 and 
scale=1.
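The following sketch (illustrative only, with hypothetical parameter values) checks the exponential relationship empirically:
  # Sketch only: empirically check that log(X/eta) is exponential with 
  # rate = theta, using the hypothetical values eta = 3 and theta = 2.
  set.seed(47)
  x <- rpareto(10000, location = 3, shape = 2)
  mean(log(x / 3))                                 # should be close to 1/theta = 0.5
  ks.test(log(x / 3), "pexp", rate = 2)$statistic  # should be small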
The Pareto distribution has a very long right-hand tail. It is often applied in the study of socioeconomic data, including the distribution of income, firm size, population, and stock price fluctuations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
epareto, eqpareto, Exponential, 
Probability Distributions and Random Numbers.
Examples
  # Density of a Pareto distribution with parameters location=1 and shape=1, 
  # evaluated at 2, 3 and 4: 
  dpareto(2:4, 1, 1) 
  #[1] 0.2500000 0.1111111 0.0625000
  #----------
  # The cdf of a Pareto distribution with parameters location=2 and shape=1, 
  # evaluated at 3, 4, and 5: 
  ppareto(3:5, 2, 1) 
  #[1] 0.3333333 0.5000000 0.6000000
  #----------
  # The 25'th percentile of a Pareto distribution with parameters 
  # location=1 and shape=1: 
  qpareto(0.25, 1, 1) 
  #[1] 1.333333
  #----------
  # A random sample of 4 numbers from a Pareto distribution with parameters 
  # location=3 and shape=2. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(10) 
  rpareto(4, 3, 2)
  #[1] 4.274728 3.603148 3.962862 5.415322
Real dataset from ProUCL 5.2.0.
Description
A real data set of size n=55 with 10 nondetects (18.2%). The name of the Excel file that comes with ProUCL 5.2.0 and contains these data is TRS-Real-data-with-NDs.xls.
Usage
    ProUCL.5.2.TRS.df
    data(ProUCL.5.2.TRS.df)
Format
A data frame with 55 observations on the following 3 variables.
- Value
- numeric vector indicating the concentration. 
- Detect
- numeric vector of 0s (nondetects) and 1s (detects) indicating censoring status. 
- Censored
- logical vector indicating censoring status. 
Source
USEPA. (2022a). ProUCL Version 5.2.0 Technical Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. Prepared by: Neptune and Company, Inc., 1435 Garrison Street, Suite 201, Lakewood, CO 80215. p. 143. https://www.epa.gov/land-research/proucl-software.
USEPA. (2022b). ProUCL Version 5.2.0 User Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. Prepared by: Neptune and Company, Inc., 1435 Garrison Street, Suite 201, Lakewood, CO 80215. p. 6-115. https://www.epa.gov/land-research/proucl-software.
ProUCL Critical Values for Anderson-Darling Goodness-of-Fit Test for Gamma Distribution
Description
Critical Values for the Anderson-Darling Goodness-of-Fit Test for a Gamma Distribution, as presented in Tables A-1, A-3, and A-5 on pages 283, 285, and 287, respectively, of USEPA (2015).
Usage
data("ProUCL.Crit.Vals.for.AD.Test.for.Gamma.array")Format
An array of dimensions 32 by 11 by 3, with the first dimension indicating the sample size (between 5 and 1000), the second dimension indicating the value of the maximum likelihood estimate of the shape parameter (between 0.025 and 50), and the third dimension indicating the assumed significance level (0.01, 0.05, and 0.10).
Details
See USEPA (2015, pp.281-282) and the help file for gofTest 
for more information.  The data in this array are used when 
the function gofTest is called with test="proucl.ad.gamma".  
The letter k is used to indicate the value of the estimated shape parameter.
Source
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 283, 285, and 287.
References
USEPA. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
ProUCL Critical Values for Kolmogorov-Smirnov Goodness-of-Fit Test for Gamma Distribution
Description
Critical Values for the Kolmogorov-Smirnov Goodness-of-Fit Test for a Gamma Distribution, as presented in Tables A-2, A-4, and A-6 on pages 284, 286, and 288, respectively, of USEPA (2015).
Usage
data("ProUCL.Crit.Vals.for.KS.Test.for.Gamma.array")Format
An array of dimensions 32 by 11 by 3, with the first dimension indicating the sample size (between 5 and 1000), the second dimension indicating the value of the maximum likelihood estimate of the shape parameter (between 0.025 and 50), and the third dimension indicating the assumed significance level (0.01, 0.05, and 0.10).
Details
See USEPA (2015, pp.281-282) for more information.  The data in this array are used when 
the function gofTest is called with test="proucl.ks.gamma".
Source
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 284, 286, and 288.
References
USEPA. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Carbon Monoxide Emissions from Oil Refinery.
Description
Carbon monoxide (CO) emissions (ppm) from an oil refinery near San Francisco. The refinery submitted 31 daily measurements from its stack for the period April 16, 1993 through May 16, 1993 to the Bay Area Air Quality Management District (BAAQMD). The BAAQMD made nine of its own independent measurements for the period September 11, 1990 through March 30, 1993.
Usage
data(Refinery.CO.df)
Format
A data frame with 40 observations on the following 3 variables.
- CO.ppm
- a numeric vector of CO emissions (ppm) 
- Source
- a factor indicating the source of the measurement (BAAQMD or refinery) 
- Date
- a Date object indicating the date the measurement was taken 
Source
Data and Story Library, http://lib.stat.cmu.edu/DASL/Datafiles/Refinery.html.
References
Zou, G.Y., C.Y. Huo, and J. Taleban. (2009). Simple Confidence Intervals for Lognormal Means and their Differences with Environmental Applications. Environmetrics, 20, 172–180.
Ammonia Nitrogen Concentrations in the Skagit River, Marblemount, Washington
Description
Ammonia nitrogen (NH_3—N) concentration (mg/L) in the Skagit River
measured monthly from January 1978 through December 2010 at the
Marblemount, Washington monitoring station.
Usage
Skagit.NH3_N.df
Format
A data frame with 396 observations on the following 6 variables.
- Date
- Date of collection. 
- NH3_N.Orig.mg.per.L
- a character vector of the ammonia nitrogen concentrations where values for non-detects are preceded with the less-than sign (<). 
- NH3_N.mg.per.L
- a numeric vector of ammonia nitrogen concentrations; non-detects have been coded to their detection limit. 
- DQ1
- factor of data qualifier values: U = the analyte was not detected at or above the reported result; J = the analyte was positively identified, and the associated numerical result is an estimate; UJ = the analyte was not detected at or above the reported estimated result. 
- DQ2
- factor of data qualifier values. An asterisk (*) indicates a possible quality problem for the result. 
- Censored
- a logical vector indicating which observations are censored. 
Details
Station 04A100 - Skagit R @ Marblemount. Located at the bridge on the Cascade River Road where Highway 20 (North Cascades Highway) turns 90 degrees in Marblemount.
Source
Washington State Department of Ecology. 
https://ecology.wa.gov/Research-Data/Monitoring-assessment/River-stream-monitoring/Water-quality-monitoring/Using-river-stream-water-quality-data
Total Phosphorus Data from Chesapeake Bay
Description
Monthly estimated total phosphorus mass (mg) within a water column at two different stations for the 5-year time period October 1984 to September 1989 from a study on phosphorus concentration conducted in the Chesapeake Bay.
Usage
Total.P.df
Format
A data frame with 60 observations on the following 4 variables.
- CB3.1
- a numeric vector of phosphorus concentrations at station CB3.1 
- CB3.3e
- a numeric vector of phosphorus concentrations at station CB3.3e 
- Month
- a factor indicating the month the observation was taken 
- Year
- a numeric vector indicating the year an observation was taken 
Source
Neerchal, N. K., and S. L. Brunenmeister. (1993). Estimation of Trend in Chesapeake Bay Water Quality Data. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 6: Multivariate Environmental Statistics. North-Holland, Amsterdam, Chapter 19, 407-422.
The Triangular Distribution
Description
Density, distribution function, quantile function, and random generation 
for the triangular distribution with parameters min, max, 
and mode.
Usage
  dtri(x, min = 0, max = 1, mode = 1/2)
  ptri(q, min = 0, max = 1, mode = 1/2)
  qtri(p, min = 0, max = 1, mode = 1/2)
  rtri(n, min = 0, max = 1, mode = 1/2)
Arguments
| x | vector of quantiles.  Missing values ( | 
| q | vector of quantiles.  Missing values ( | 
| p | vector of probabilities between 0 and 1.  Missing values ( | 
| n | sample size.  If  | 
| min | vector of minimum values of the distribution of the random variable.  
The default value is  | 
| max | vector of maximum values of the random variable.  
The default value is  | 
| mode | vector of modes of the random variable.  
The default value is  | 
Details
Let X be a triangular random variable with parameters min=a, 
max=b, and mode=c.
Probability Density and Cumulative Distribution Function 
The density function of X is given by:
| f(x; a, b, c) = | \frac{2(x-a)}{(b-a)(c-a)} | for a \le x \le c | 
| \frac{2(b-x)}{(b-a)(b-c)} | for c \le x \le b | |
where a < c < b.
The cumulative distribution function of X is given by:
| F(x; a, b, c) = | \frac{(x-a)^2}{(b-a)(c-a)} | for a \le x \le c | 
| 1 - \frac{(b-x)^2}{(b-a)(b-c)} | for c \le x \le b | |
where a < c < b.
Quantiles 
The p^th quantile of X is given by:
| x_p = | a + \sqrt{(b-a)(c-a)p} | for 0 \le p \le F(c) | 
| b - \sqrt{(b-a)(b-c)(1-p)} | for F(c) \le p \le 1 | |
where 0 \le p \le 1.
Random Numbers 
Random numbers are generated using the inverse transformation method:
x = F^{-1}(u)
where u is a random deviate from a uniform [0, 1] distribution. 
Mean and Variance 
The mean and variance of X are given by:
E(X) = \frac{a + b + c}{3}
Var(X) = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}
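A brief illustrative sketch (hypothetical parameters, not from the help file) that applies the inverse transformation method by hand and checks the mean and variance formulas:
  # Sketch only: generate triangular deviates via the inverse transformation 
  # method and compare sample moments with the formulas above, using the 
  # hypothetical parameters min = 3, max = 20, and mode = 12.
  set.seed(10)
  u <- runif(10000)
  x <- qtri(u, min = 3, max = 20, mode = 12)
  mean(x)   # should be close to (3 + 20 + 12)/3 = 11.67
  var(x)    # should be close to (3^2 + 20^2 + 12^2 - 3*20 - 3*12 - 20*12)/18 = 12.06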
Value
dtri gives the density, ptri gives the distribution function, 
qtri gives the quantile function, and rtri generates random 
deviates. 
Note
The triangular distribution is so named because of the shape of its probability 
density function.  The average of two independent identically distributed 
uniform random variables with parameters min=\alpha and 
max=\beta has a triangular distribution with parameters 
min=\alpha, max=\beta, and 
mode=(\alpha+\beta)/2.
The triangular distribution is sometimes used as an input distribution in probability risk assessment.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Uniform, Probability Distributions and Random Numbers.
Examples
  # Density of a triangular distribution with parameters 
  # min=10, max=15, and mode=12, evaluated at 12, 13 and 14: 
  dtri(12:14, 10, 15, 12) 
  #[1] 0.4000000 0.2666667 0.1333333
  #----------
  # The cdf of a triangular distribution with parameters 
  # min=2, max=7, and mode=5, evaluated at 3, 4, and 5: 
  ptri(3:5, 2, 7, 5) 
  #[1] 0.06666667 0.26666667 0.60000000
  #----------
  # The 25'th percentile of a triangular distribution with parameters 
  # min=1, max=4, and mode=3: 
  qtri(0.25, 1, 4, 3) 
  #[1] 2.224745
  #----------
  # A random sample of 4 numbers from a triangular distribution with 
  # parameters min=3 , max=20, and mode=12. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(10) 
  rtri(4, 3, 20, 12) 
  #[1] 11.811593  9.850955 11.081885 13.539496
The Zero-Modified Lognormal (Delta) Distribution
Description
Density, distribution function, quantile function, and random generation 
for the zero-modified lognormal distribution with parameters meanlog, 
sdlog, and p.zero.
The zero-modified lognormal (delta) distribution is the mixture of a lognormal distribution with a positive probability mass at 0.
Usage
  dzmlnorm(x, meanlog = 0, sdlog = 1, p.zero = 0.5)
  pzmlnorm(q, meanlog = 0, sdlog = 1, p.zero = 0.5)
  qzmlnorm(p, meanlog = 0, sdlog = 1, p.zero = 0.5)
  rzmlnorm(n, meanlog = 0, sdlog = 1, p.zero = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| meanlog | vector of means of the normal (Gaussian) part of the distribution on the 
log scale.  The default is  | 
| sdlog | vector of (positive) standard deviations of the normal (Gaussian) 
part of the distribution on the log scale.  The default is  | 
| p.zero | vector of probabilities between 0 and 1 indicating the probability the random 
variable equals 0.  For  | 
Details
The zero-modified lognormal (delta) distribution is the mixture of a 
lognormal distribution with a positive probability mass at 0.  This distribution 
was introduced without a name by Aitchison (1955), and the name 
\Delta-distribution was coined by Aitchison and Brown (1957, p.95).  
It is a special case of a “zero-modified” distribution 
(see Johnson et al., 1992, p. 312).
Let f(x; \mu, \sigma) denote the density of a 
lognormal random variable X with parameters 
meanlog=\mu and sdlog=\sigma.  The density function of a 
zero-modified lognormal (delta) random variable Y with parameters 
meanlog=\mu, sdlog=\sigma, and p.zero=p, 
denoted h(y; \mu, \sigma, p), is given by:
| h(y; \mu, \sigma, p) = | p | for y = 0 | 
| (1 - p) f(y; \mu, \sigma) | for y > 0 | 
Note that \mu is not the mean of the zero-modified lognormal 
distribution on the log scale; it is the mean of the lognormal part of the 
distribution on the log scale.  Similarly, \sigma is 
not the standard deviation of the zero-modified lognormal distribution
on the log scale; it is the standard deviation of the lognormal part of the 
distribution on the log scale.
Let \gamma and \delta denote the mean and standard deviation of the 
overall zero-modified lognormal distribution on the log scale.  Aitchison (1955) 
shows that:
E[log(Y)] = \gamma = (1 - p) \mu
Var[log(Y)] = \delta^2 = (1 - p) \sigma^2 + p (1-p) \mu^2
Note that when p.zero=p=0, the zero-modified lognormal 
distribution simplifies to the lognormal distribution.
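For illustration (parameter values chosen arbitrarily, not taken from the help file), the mixture form of the density can be verified directly against dlnorm:
  # Sketch only: for y > 0 the density defined above is (1 - p) times the 
  # lognormal density (hypothetical values meanlog = 0, sdlog = 1, p.zero = 0.5).
  y <- c(0.5, 1, 2)
  dzmlnorm(y, meanlog = 0, sdlog = 1, p.zero = 0.5)
  (1 - 0.5) * dlnorm(y, meanlog = 0, sdlog = 1)    # same values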
Value
dzmlnorm gives the density, pzmlnorm gives the distribution function, 
qzmlnorm gives the quantile function, and rzmlnorm generates random 
deviates. 
Note
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with non-detects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), 
zero-modified normal, censored normal, or censored lognormal is the best 
model for the data is to construct both censored and detects-only probability 
plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901-908.
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London. pp.94-99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47-51.
Gibbons, RD., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135-146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707-719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Zero-Modified Lognormal (Alternative Parameterization), 
Lognormal, LognormalAlt, 
Zero-Modified Normal, 
ezmlnorm, Probability Distributions and Random Numbers.
Examples
  # Density of the zero-modified lognormal (delta) distribution with 
  # parameters meanlog=0, sdlog=1, and p.zero=0.5, evaluated at 
  # 0, 0.5, 1, 1.5, and 2:
  dzmlnorm(seq(0, 2, by = 0.5)) 
  #[1] 0.50000000 0.31374804 0.19947114 0.12248683 
  #[5] 0.07843701
  #----------
  # The cdf of the zero-modified lognormal (delta) distribution with 
  # parameters meanlog=1, sdlog=2, and p.zero=0.1, evaluated at 4:
  pzmlnorm(4, 1, 2, .1) 
  #[1] 0.6189203
  #----------
  # The median of the zero-modified lognormal (delta) distribution with 
  # parameters meanlog=2, sdlog=3, and p.zero=0.1:
  qzmlnorm(0.5, 2, 3, 0.1) 
  #[1] 4.859177
  #----------
  # Random sample of 3 observations from the zero-modified lognormal 
  # (delta) distribution with parameters meanlog=1, sdlog=2, and p.zero=0.4. 
  # (Note: The call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rzmlnorm(3, 1, 2, 0.4)
  #[1] 0.000000 0.000000 3.146641
The Zero-Modified Lognormal (Delta) Distribution (Alternative Parameterization)
Description
Density, distribution function, quantile function, and random generation 
for the zero-modified lognormal distribution with parameters mean, 
cv, and p.zero.
The zero-modified lognormal (delta) distribution is the mixture of a lognormal distribution with a positive probability mass at 0.
Usage
  dzmlnormAlt(x, mean = exp(1/2), cv = sqrt(exp(1) - 1), p.zero = 0.5)
  pzmlnormAlt(q, mean = exp(1/2), cv = sqrt(exp(1) - 1), p.zero = 0.5)
  qzmlnormAlt(p, mean = exp(1/2), cv = sqrt(exp(1) - 1), p.zero = 0.5)
  rzmlnormAlt(n, mean = exp(1/2), cv = sqrt(exp(1) - 1), p.zero = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean | vector of means of the lognormal part of the distribution on the original scale.  
The default is  | 
| cv | vector of (positive) coefficients of variation of the lognormal 
part of the distribution.  The default is  | 
| p.zero | vector of probabilities between 0 and 1 indicating the probability the random 
variable equals 0.  For  | 
Details
The zero-modified lognormal (delta) distribution is the mixture of a 
lognormal distribution with a positive probability mass at 0.  This distribution 
was introduced without a name by Aitchison (1955), and the name 
\Delta-distribution was coined by Aitchison and Brown (1957, p.95).  
It is a special case of a “zero-modified” distribution 
(see Johnson et al., 1992, p. 312).
Let f(x; \theta, \tau) denote the density of a 
lognormal random variable X with parameters 
mean=\theta and cv=\tau.  The density function of a 
zero-modified lognormal (delta) random variable Y with parameters 
mean=\theta, cv=\tau, and p.zero=p, 
denoted h(y; \theta, \tau, p), is given by:
| h(y; \theta, \tau, p) = | p | for y = 0 | 
| (1 - p) f(y; \theta, \tau) | for y > 0 | 
Note that \theta is not the mean of the zero-modified lognormal 
distribution; it is the mean of the lognormal part of the distribution.  
Similarly, \tau is not the coefficient of variation of the 
zero-modified lognormal distribution; it is the coefficient of variation of the 
lognormal part of the distribution.
Let \gamma, \delta, and \omega denote the mean, 
standard deviation, and coefficient of variation of the overall zero-modified 
lognormal distribution.  Let \eta denote the standard deviation of the 
lognormal part of the distribution, so that \eta = \theta \tau.  
Aitchison (1955) shows that:
E(Y) = \gamma = (1 - p) \theta
Var(Y) = \delta^2 = (1 - p) \eta^2 + p (1-p) \theta^2
so that
\omega = \sqrt{(\tau^2 + p) / (1 - p)}
Note that when p.zero=p=0, the zero-modified lognormal 
distribution simplifies to the lognormal distribution.
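As an informal check with arbitrary, hypothetical parameter values, the moment relations above can be compared with simulation output:
  # Sketch only: check the mean and coefficient-of-variation relations above 
  # by simulation, using the hypothetical values mean = 10, cv = 2, p.zero = 0.4.
  set.seed(23)
  y <- rzmlnormAlt(100000, mean = 10, cv = 2, p.zero = 0.4)
  mean(y)           # should be close to (1 - 0.4)*10 = 6
  sd(y) / mean(y)   # should be close to sqrt((2^2 + 0.4)/(1 - 0.4)) = 2.71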
Value
dzmlnormAlt gives the density, pzmlnormAlt gives the distribution function, 
qzmlnormAlt gives the quantile function, and rzmlnormAlt generates random 
deviates. 
Note
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with non-detects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), 
zero-modified normal, censored normal, or censored lognormal is the best 
model for the data is to construct both censored and detects-only probability 
plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901-908.
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London. pp.94-99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47-51.
Gibbons, RD., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135-146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707-719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Zero-Modified Lognormal, LognormalAlt, 
ezmlnormAlt, Probability Distributions and Random Numbers.
Examples
  # Density of the zero-modified lognormal (delta) distribution with 
  # parameters mean=10, cv=1, and p.zero=0.5, evaluated at 
  # 9, 10, and 11:
  dzmlnormAlt(9:11, mean = 10, cv = 1, p.zero = 0.5) 
  #[1] 0.02552685 0.02197043 0.01891924
  #----------
  # The cdf of the zero-modified lognormal (delta) distribution with 
  # parameters mean=10, cv=2, and p.zero=0.1, evaluated at 8:
  pzmlnormAlt(8, 10, 2, .1) 
  #[1] 0.709009
  #----------
  # The median of the zero-modified lognormal (delta) distribution with 
  # parameters mean=10, cv=2, and p.zero=0.1:
  qzmlnormAlt(0.5, 10, 2, 0.1) 
  #[1] 3.74576
  #----------
  # Random sample of 3 observations from the zero-modified lognormal 
  # (delta) distribution with parameters mean=10, cv=2, and p.zero=0.4. 
  # (Note: The call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rzmlnormAlt(3, 10, 2, 0.4)
  #[1] 0.000000 0.000000 4.907131
The Zero-Modified Normal Distribution
Description
Density, distribution function, quantile function, and random generation 
for the zero-modified normal distribution with parameters mean, 
sd, and p.zero.
The zero-modified normal distribution is the mixture of a normal distribution with a positive probability mass at 0.
Usage
  dzmnorm(x, mean = 0, sd = 1, p.zero = 0.5)
  pzmnorm(q, mean = 0, sd = 1, p.zero = 0.5)
  qzmnorm(p, mean = 0, sd = 1, p.zero = 0.5)
  rzmnorm(n, mean = 0, sd = 1, p.zero = 0.5)
Arguments
| x | vector of quantiles. | 
| q | vector of quantiles. | 
| p | vector of probabilities between 0 and 1. | 
| n | sample size.  If  | 
| mean | vector of means of the normal (Gaussian) part of the distribution.  
The default is  | 
| sd | vector of (positive) standard deviations of the normal (Gaussian) 
part of the distribution.  The default is  | 
| p.zero | vector of probabilities between 0 and 1 indicating the probability the random 
variable equals 0.  For  | 
Details
The zero-modified normal distribution is the mixture of a normal distribution with a positive probability mass at 0.
Let f(x; \mu, \sigma) denote the density of a 
normal (Gaussian) random variable X with parameters 
mean=\mu and sd=\sigma.  The density function of a 
zero-modified normal random variable Y with parameters mean=\mu, 
sd=\sigma, and p.zero=p, denoted h(y; \mu, \sigma, p), 
is given by:
| h(y; \mu, \sigma, p) = | p | for y = 0 | 
| (1 - p) f(y; \mu, \sigma) | for y \ne 0 | 
Note that \mu is not the mean of the zero-modified normal distribution; 
it is the mean of the normal part of the distribution.  Similarly, \sigma is 
not the standard deviation of the zero-modified normal distribution; it is 
the standard deviation of the normal part of the distribution.
Let \gamma and \delta denote the mean and standard deviation of the 
overall zero-modified normal distribution.  Aitchison (1955) shows that:
E(Y) = \gamma = (1 - p) \mu
Var(Y) = \delta^2 = (1 - p) \sigma^2 + p (1-p) \mu^2
Note that when p.zero=p=0, the zero-modified normal 
distribution simplifies to the normal distribution.
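A short illustrative check of Aitchison's formulas above, using arbitrary, hypothetical parameter values:
  # Sketch only: check the mean and variance formulas by simulation, 
  # using the hypothetical values mean = 3, sd = 1, and p.zero = 0.4.
  set.seed(42)
  y <- rzmnorm(100000, mean = 3, sd = 1, p.zero = 0.4)
  mean(y)   # should be close to (1 - 0.4)*3 = 1.8
  var(y)    # should be close to (1 - 0.4)*1^2 + 0.4*(1 - 0.4)*3^2 = 2.76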
Value
dzmnorm gives the density, pzmnorm gives the distribution function, 
qzmnorm gives the quantile function, and rzmnorm generates random 
deviates. 
Note
The zero-modified normal distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit”. See, for example USEPA (1992c, pp.27-34) and Gibbons et al. (2009, Chapter 12). Note, however, that USEPA (1992c) has been superseded by USEPA (2009) which recommends this strategy only in specific situations (see Chapter 15 of the document). This strategy is strongly discouraged by Helsel (2012, Chapter 1).
In cases where you want to model chemical concentrations for which some observations are reported as “Below Detection Limit” and you want to treat the non-detects as equal to 0, it will usually be more appropriate to model the data with a zero-modified lognormal (delta) distribution since chemical concentrations are bounded below at 0 (e.g., Gilliom and Helsel, 1986; Owen and DeRouen, 1980).
One way to try to assess whether a zero-modified lognormal (delta), 
zero-modified normal, censored normal, or censored lognormal is the best 
model for the data is to construct both censored and detects-only probability 
plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901-908.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135-146.
Gibbons, RD., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707-719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Zero-Modified Lognormal, Normal, 
ezmnorm, Probability Distributions and Random Numbers.
Examples
  # Density of the zero-modified normal distribution with parameters 
  # mean=2, sd=1, and p.zero=0.5, evaluated at 0, 0.5, 1, 1.5, and 2:
  dzmnorm(seq(0, 2, by = 0.5), mean = 2) 
  #[1] 0.5000000 0.0647588 0.1209854 0.1760327 0.1994711
  #----------
  # The cdf of the zero-modified normal distribution with parameters 
  # mean=3, sd=2, and p.zero=0.1, evaluated at 4:
  pzmnorm(4, 3, 2, .1) 
  #[1] 0.7223162
  #----------
  # The median of the zero-modified normal distribution with parameters 
  # mean=3, sd=1, and p.zero=0.1:
  qzmnorm(0.5, 3, 1, 0.1) 
  #[1] 2.86029
  #----------
  # Random sample of 3 observations from the zero-modified normal distribution 
  # with parameters mean=3, sd=1, and p.zero=0.4. 
  # (Note: The call to set.seed simply allows you to reproduce this example.)
  set.seed(20) 
  rzmnorm(3, 3, 1, 0.4) 
  #[1] 0.000000 0.000000 3.073168
Compute Lack-of-Fit and Pure Error Anova Table for a Linear Model
Description
Compute a lack-of-fit and pure error anova table for either a linear model with one predictor variable or else a linear model for which all predictor variables in the model are functions of a single variable (for example, x, x^2, etc.). There must be replicate observations for at least one value of the predictor variable(s).
Usage
  anovaPE(object)
Arguments
| object | an object of  | 
Finally, the total number of observations must be such that the degrees of freedom associated with the residual sums of squares is greater than the number of observations minus the number of unique observations.
Details
Produces an anova table with the sums of squares partitioned by “Lack of Fit” 
and “Pure Error”.  See Draper and Smith (1998, pp.47-53) for details.
This function is called by the function calibrate.
Value
An object of class "anova" inheriting from class 
"data.frame". 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.47-53.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
See Also
Examples
  # The data frame EPA.97.cadmium.111.df contains calibration data for 
  # cadmium at mass 111 (ng/L) that appeared in Gibbons et al. (1997b) 
  # and were provided to them by the U.S. EPA. 
  #
  # First, display a plot of these data along with the fitted calibration line 
  # and 99% non-simultaneous prediction limits.  See  
  # Millard and Neerchal (2001, pp.566-569) for more details on this 
  # example.
  EPA.97.cadmium.111.df
  #   Cadmium Spike
  #1     0.88     0
  #2     1.57     0
  #3     0.70     0
  #...
  #33   99.20   100
  #34   93.71   100
  #35  100.43   100
  Cadmium <- EPA.97.cadmium.111.df$Cadmium 
  Spike <- EPA.97.cadmium.111.df$Spike 
  calibrate.list <- calibrate(Cadmium ~ Spike, 
    data = EPA.97.cadmium.111.df) 
  newdata <- data.frame(Spike = seq(min(Spike), max(Spike), length.out = 100)) 
  pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE) 
  pointwise.list <- pointwise(pred.list, coverage = 0.99, individual = TRUE) 
  plot(Spike, Cadmium, ylim = c(min(pointwise.list$lower), 
    max(pointwise.list$upper)), xlab = "True Concentration (ng/L)", 
    ylab = "Observed Concentration (ng/L)") 
  abline(calibrate.list, lwd = 2) 
  lines(newdata$Spike, pointwise.list$lower, lty = 8, lwd = 2) 
  lines(newdata$Spike, pointwise.list$upper, lty = 8, lwd = 2) 
  title(paste("Calibration Line and 99% Prediction Limits", 
    "for US EPA Cadmium 111 Data", sep="\n"))
  rm(Cadmium, Spike, newdata, calibrate.list, pred.list, 
    pointwise.list)
  #----------
  # Now fit the linear model and produce the anova table to check for 
  # lack of fit.  There is no evidence for lack of fit (p = 0.41).
  fit <- lm(Cadmium ~ Spike, data = EPA.97.cadmium.111.df) 
  anova(fit) 
  #Analysis of Variance Table
  #
  #Response: Cadmium
  #          Df Sum Sq Mean Sq F value    Pr(>F)    
  #Spike      1  43220   43220  9356.9 < 2.2e-16 ***
  #Residuals 33    152       5                      
  #---
  #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  #Analysis of Variance Table
  #
  #Response: Cadmium 
  #
  #Terms added sequentially (first to last) 
  #          Df Sum of Sq  Mean Sq  F Value Pr(F) 
  #    Spike  1  43220.27 43220.27 9356.879     0 
  #Residuals 33    152.43     4.62 
  anovaPE(fit) 
  #                 Df Sum Sq Mean Sq  F value Pr(>F)    
  #Spike             1  43220   43220 9341.559 <2e-16 ***
  #Lack of Fit       3     14       5    0.982 0.4144    
  #Pure Error       30    139       5                    
  #---
  #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
  rm(fit)
Compute Sample Size Necessary to Achieve Specified Power for One-Way Fixed-Effects Analysis of Variance
Description
Compute the sample sizes necessary to achieve a specified power for a one-way fixed-effects analysis of variance test, given the population means, population standard deviation, and significance level.
Usage
  aovN(mu.vec, sigma = 1, alpha = 0.05, power = 0.95, 
    round.up = TRUE, n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| mu.vec | required numeric vector of population means. The length of 
 | 
| sigma | optional numeric scalar specifying the population standard 
deviation ( | 
| alpha | optional numeric scalar between 0 and 1 indicating the Type I 
error level associated with the hypothesis test.  The default 
value is  | 
| power | optional numeric scalar between 0 and 1 indicating the power 
associated with the hypothesis test.  The default value 
is  | 
| round.up | optional logical scalar indicating whether to round the value of the 
computed sample size up to the next whole number.  The default 
value is  | 
| n.max | positive integer greater than 1 indicating the maximum sample size per group.  
The default value is  | 
| tol | optional numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | optional positive integer indicating the maximum number of iterations to use in the 
 | 
Details
The F-statistic to test the equality of k population means
assuming each population has a normal distribution with the same
standard deviation \sigma is presented in most basic 
statistics texts, including Zar (2010, Chapter 10), 
Berthouex and Brown (2002, Chapter 24), and Helsel and Hirsch (1992, pp.164-169).  
The formula for the power of this test is given in Scheffe 
(1959, pp.38-39,62-65).  The power of the one-way fixed-effects ANOVA depends 
on the sample sizes for each of the k groups, the value of the 
population means for each of the k groups, the population 
standard deviation \sigma, and the significance level
\alpha.  See the help file for aovPower.
The function aovN assumes equal sample 
sizes for each of the k groups and uses a search 
algorithm to determine the sample size n required to 
attain a specified power, given the values of the population 
means and the significance level.
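As a quick illustrative consistency check (using hypothetical means and standard deviation), the sample size returned by aovN can be passed back to aovPower:
  # Sketch only: the sample size returned by aovN should achieve at least the 
  # requested power (hypothetical means 10, 12, 15; sigma = 5; power = 0.8).
  n <- aovN(mu.vec = c(10, 12, 15), sigma = 5, power = 0.8)
  aovPower(n.vec = rep(n, 3), mu.vec = c(10, 12, 15), sigma = 5)
  # should be at least 0.8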
Value
numeric scalar indicating the required sample size for each 
group.  (The number of groups is equal to the length of the 
argument mu.vec.)
Note
The normal and lognormal distributions are probably the two most frequently used distributions to model environmental data. Sometimes it is necessary to compare several means to determine whether any are significantly different from each other (e.g., USEPA, 2009, p.6-38). In this case, assuming normally distributed data, you perform a one-way parametric analysis of variance.
In the course of designing a sampling program, an environmental 
scientist may wish to determine the relationship between sample 
size, Type I error level, power, and differences in means if 
one of the objectives of the sampling program is to determine 
whether a particular mean differs from a group of means.  The 
functions aovPower, aovN, and 
plotAovDesign can be used to investigate these 
relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 27, 29, 30.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Scheffe, H. (1959). The Analysis of Variance. John Wiley and Sons, New York, 477pp.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 10.
See Also
aovPower, plotAovDesign, 
Normal, aov.
Examples
  # Look at how the required sample size for a one-way ANOVA 
  # increases with increasing power:
  aovN(mu.vec = c(10, 12, 15), sigma = 5, power = 0.8) 
  #[1] 21 
  aovN(mu.vec = c(10, 12, 15), sigma = 5, power = 0.9) 
  #[1] 27 
  aovN(mu.vec = c(10, 12, 15), sigma = 5, power = 0.95) 
  #[1] 33
  #----------------------------------------------------------------
  # Look at how the required sample size for a one-way ANOVA, 
  # given a fixed power, decreases with increasing variability 
  # in the population means:
  aovN(mu.vec = c(10, 10, 11), sigma=5) 
  #[1] 581 
  aovN(mu.vec = c(10, 10, 15), sigma = 5) 
  #[1] 25 
  aovN(mu.vec = c(10, 13, 15), sigma = 5) 
  #[1] 33 
  aovN(mu.vec = c(10, 15, 20), sigma = 5) 
  #[1] 10
  #----------------------------------------------------------------
  # Look at how the required sample size for a one-way ANOVA, 
  # given a fixed power, decreases with increasing values of 
  # Type I error:
  aovN(mu.vec = c(10, 12, 14), sigma = 5, alpha = 0.001) 
  #[1] 89 
  aovN(mu.vec = c(10, 12, 14), sigma = 5, alpha = 0.01) 
  #[1] 67 
  aovN(mu.vec = c(10, 12, 14), sigma = 5, alpha = 0.05) 
  #[1] 50 
  aovN(mu.vec = c(10, 12, 14), sigma = 5, alpha = 0.1) 
  #[1] 42
Compute the Power of a One-Way Fixed-Effects Analysis of Variance
Description
Compute the power of a one-way fixed-effects analysis of variance, given the sample sizes, population means, population standard deviation, and significance level.
Usage
  aovPower(n.vec, mu.vec = rep(0, length(n.vec)), sigma = 1, alpha = 0.05)
Arguments
| n.vec | numeric vector of sample sizes for each group.  The  | 
| mu.vec | numeric vector of population means.  The length of  | 
| sigma | numeric scalar specifying the population standard deviation ( | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level associated 
with the hypothesis test.  The default value is  | 
Details
Consider k normally distributed populations with common standard deviation 
\sigma. Let \mu_i denote the mean of the i'th group 
(i = 1, 2, \ldots, k), and let 
\underline{x}_i = x_{i1}, x_{i2}, \ldots, x_{in_i} denote a vector of 
n_i observations from the i'th group.  
The statistical method of analysis of variance (ANOVA) tests the null hypothesis:
H_0: \mu_1 = \mu_2 = \cdots = \mu_k \;\;\;\;\;\; (1)
against the alternative hypothesis that at least one of the means is different from the rest by using the F-statistic given by:
F = \frac{[\sum_{i=1}^k n_i(\bar{x}_{i.} - \bar{x}_{..})^2]/(k-1)}{[\sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2]/(N-k)} \;\;\;\;\;\; (2)
where
\bar{x}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} \;\;\;\;\;\; (3)
\bar{x}_{..} = \frac{1}{N} \sum_{i=1}^k n_i\bar{x}_{i.} = \frac{1}{N} \sum_{i=1}^k \sum_{j=1}^{n_i} x_{ij} \;\;\;\;\;\; (4)
N = \sum_{i=1}^k n_i \;\;\;\;\;\; (5)
Under the null hypothesis (1), the F-statistic in (2) follows an 
F-distribution with k-1 and N-k degrees of freedom.  
Analysis of variance rejects the null hypothesis (1) at significance level 
\alpha when
F > F_{k-1, N-k}(1 - \alpha) \;\;\;\;\;\; (6)
where F_{\nu_1, \nu_2}(p) denotes the p'th quantile of the 
F-distribution with \nu_1 and \nu_2 degrees of freedom 
(Zar, 2010, Chapter 10; Berthouex and Brown, 2002, Chapter 24; 
Helsel and Hirsch, 1992, pp. 164–169).
The power of this test, denoted by 1-\beta, where \beta denotes the 
probability of a Type II error, is given by:
1 - \beta = Pr[F_{k-1, N-k, \Delta} > F_{k-1, N-k}(1 - \alpha)] \;\;\;\;\;\; (7)
where
\Delta = \frac{\sum_{i=1}^k n_i(\mu_i - \bar{\mu}_.)^2}{\sigma^2} \;\;\;\;\;\; (8)
\bar{\mu}_. = \frac{1}{k} \sum_{i=1}^k \mu _i \;\;\;\;\;\; (9)
and F_{\nu_1, \nu_2, \Delta} denotes a 
non-central F random variable with \nu_1 and 
\nu_2 degrees of freedom and non-centrality parameter \Delta.  
Equation (7) can be re-written as:
1 - \beta = 1 - H[F_{k-1, N-k}(1 - \alpha), k-1, N-k, \Delta] \;\;\;\;\;\; (10)
where H(x, \nu_1, \nu_2, \Delta) denotes the cumulative distribution function 
of this random variable evaluated at x (Scheffe, 1959, pp.38–39, 62–65).
The power of the one-way fixed-effects ANOVA depends on the 
sample sizes for each of the k groups, the value of the 
population means for each of the k groups, the population 
standard deviation \sigma, and the significance level
\alpha.
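For illustration (hypothetical group sizes, means, and standard deviation), the power in Equation (10) can be computed directly from the non-central F distribution and compared with aovPower:
  # Sketch only: reproduce Equation (10) with pf() and qf() for 3 groups of 
  # size 5, hypothetical means 10, 15, 20, sigma = 5, and alpha = 0.05.
  n <- rep(5, 3); mu <- c(10, 15, 20); sigma <- 5; alpha <- 0.05
  k <- length(n); N <- sum(n)
  Delta <- sum(n * (mu - mean(mu))^2) / sigma^2
  1 - pf(qf(1 - alpha, k - 1, N - k), df1 = k - 1, df2 = N - k, ncp = Delta)
  # should agree with aovPower(n.vec = n, mu.vec = mu, sigma = sigma)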
Value
a numeric scalar indicating the power of the one-way fixed-effects ANOVA for the given sample sizes, population means, population standard deviation, and significance level.
Note
The normal and lognormal distributions are probably the two most frequently used distributions to model environmental data. Sometimes it is necessary to compare several means to determine whether any are significantly different from each other (e.g., USEPA, 2009, p.6-38). In this case, assuming normally distributed data, you perform a one-way parametric analysis of variance.
In the course of designing a sampling program, an environmental 
scientist may wish to determine the relationship between sample 
size, Type I error level, power, and differences in means if 
one of the objectives of the sampling program is to determine 
whether a particular mean differs from a group of means.  The 
functions aovPower, aovN, and 
plotAovDesign can be used to investigate these 
relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 27, 29, 30.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Scheffe, H. (1959). The Analysis of Variance. John Wiley and Sons, New York, 477pp.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 10.
See Also
aovN, plotAovDesign, 
Normal, aov.
Examples
  # Look at how the power of a one-way ANOVA increases 
  # with increasing sample size:
  aovPower(n.vec = rep(5, 3), mu.vec = c(10, 15, 20), sigma = 5) 
  #[1] 0.7015083 
  aovPower(n.vec = rep(10, 3), mu.vec = c(10, 15, 20), sigma = 5) 
  #[1] 0.9732551
  #----------------------------------------------------------------
  # Look at how the power of a one-way ANOVA increases 
  # with increasing variability in the population means:
  aovPower(n.vec = rep(5,3), mu.vec = c(10, 10, 11), sigma=5) 
  #[1] 0.05795739 
  aovPower(n.vec = rep(5, 3), mu.vec = c(10, 10, 15), sigma = 5) 
  #[1] 0.2831863 
  aovPower(n.vec = rep(5, 3), mu.vec = c(10, 13, 15), sigma = 5) 
  #[1] 0.2236093 
  aovPower(n.vec = rep(5, 3), mu.vec = c(10, 15, 20), sigma = 5) 
  #[1] 0.7015083
  #----------------------------------------------------------------
  # Look at how the power of a one-way ANOVA increases 
  # with increasing values of Type I error:
  aovPower(n.vec = rep(10,3), mu.vec = c(10, 12, 14), 
    sigma = 5, alpha = 0.001) 
  #[1] 0.02655785 
  aovPower(n.vec = rep(10,3), mu.vec = c(10, 12, 14), 
    sigma = 5, alpha = 0.01) 
  #[1] 0.1223527 
  aovPower(n.vec = rep(10,3), mu.vec = c(10, 12, 14), 
    sigma = 5, alpha = 0.05) 
  #[1] 0.3085313 
  aovPower(n.vec = rep(10,3), mu.vec = c(10, 12, 14), 
    sigma = 5, alpha = 0.1) 
  #[1] 0.4373292
  #==========
  # The example on pages 5-11 to 5-14 of USEPA (1989b) shows 
  # log-transformed concentrations of lead (mg/L) at two 
  # background wells and four compliance wells, where observations 
  # were taken once per month over four months (the data are 
  # stored in EPA.89b.loglead.df.)  Assume the true mean levels 
  # at each well are 3.9, 3.9, 4.5, 4.5, 4.5, and 5, respectively. 
  # Compute the power of a one-way ANOVA to test for mean 
  # differences between wells.  Use alpha=0.05, and assume the 
  # true standard deviation is equal to the one estimated from 
  # the data in this example.
  # First look at the data 
  names(EPA.89b.loglead.df)
  #[1] "LogLead"   "Month"     "Well"      "Well.type"
  dev.new()
  stripChart(LogLead ~ Well, data = EPA.89b.loglead.df,
    show.ci = FALSE, xlab = "Well Number", 
    ylab="Log [ Lead (ug/L) ]", 
    main="Lead Concentrations at Six Wells")
  # Note: The assumption of a constant variance across 
  # all wells is suspect.
  # Now perform the ANOVA and get the estimated sd 
  aov.list <- aov(LogLead ~ Well, data=EPA.89b.loglead.df) 
  summary(aov.list) 
  #            Df Sum Sq Mean Sq F value  Pr(>F)  
  #Well         5 5.7447 1.14895  3.3469 0.02599 *
  #Residuals   18 6.1791 0.34328                  
  #---
  #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1 
  # Now call the function aovPower 
  aovPower(n.vec = rep(4, 6), 
    mu.vec = c(3.9,3.9,4.5,4.5,4.5,5), sigma=sqrt(0.34)) 
  #[1] 0.5523148
  # Clean up
  rm(aov.list)
Base b Representation of a Number
Description
For any number represented in base 10, compute the representation in any user-specified base.
Usage
  base(n, base = 10, num.digits = max(0, floor(log(n, base))) + 1)
Arguments
| n | a non-negative integer (base 10). | 
| base | a positive integer greater than 1 indicating what base to represent  | 
| num.digits | a positive integer indicating how many digits to use to represent  | 
Details
If b is a positive integer greater than 1, and n is a positive integer, 
then n can be expressed uniquely in the form
n = a_kb^k + a_{k-1}b^{k-1} + \ldots + a_1b + a_0
where k is a non-negative integer, the coefficients a_0, a_1, \ldots, a_k
are non-negative integers less than b, and a_k > 0 
(Rosen, 1988, p.105).  The function base computes the coefficients 
a_0, a_1, \ldots, a_k.
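A small illustrative sketch (hypothetical n and b, not part of the original help file) that computes the coefficients by integer division and re-assembles n from the formula above:
  # Sketch only: obtain the coefficients a_k, ..., a_0 by integer division 
  # and remainders, then reconstruct n (hypothetical n = 7, b = 2).
  n <- 7; b <- 2
  k <- floor(log(n, base = b))
  digits <- (n %/% b^(k:0)) %% b   # most significant digit first
  digits                           # 1 1 1
  sum(digits * b^(k:0))            # recovers 7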
Value
A numeric vector of length num.digits showing the representation of n in base base.
Note
The function base is included in EnvStats because it is called by the 
function oneSamplePermutationTest.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Rosen, K.H. (1988). Discrete Mathematics and Its Applications. Random House, New York, pp.105-107.
See Also
Examples
  # Compute the value of 7 in base 2.
  base(7, 2) 
  #[1] 1 1 1 
  base(7, 2, num.digits=5) 
  #[1] 0 0 1 1 1
Boxcox Power Transformation
Description
boxcox is a generic function used to compute the value(s) of an objective 
for one or more Box-Cox power transformations, or to compute an optimal power 
transformation based on a specified objective.  The function invokes particular 
methods which depend on the class of the first 
argument. 
Currently, there is a default method and a method for objects of class "lm".
Usage
boxcox(x, ...)
## Default S3 method:
boxcox(x, 
    lambda = {if (optimize) c(-2, 2) else seq(-2, 2, by = 0.5)}, 
    optimize = FALSE, objective.name = "PPCC", 
    eps = .Machine$double.eps, include.x = TRUE, ...)
## S3 method for class 'lm'
boxcox(x, 
    lambda = {if (optimize) c(-2, 2) else seq(-2, 2, by = 0.5)}, 
    optimize = FALSE, objective.name = "PPCC", 
    eps = .Machine$double.eps, include.x = TRUE, ...) 
Arguments
| x | an object of class  | 
| lambda | numeric vector of finite values indicating what powers to use for the 
Box-Cox transformation.  When  | 
| optimize | logical scalar indicating whether to simply evaluate the objective function at the 
given values of  | 
| objective.name | character string indicating what objective to use. The possible values are
 | 
| eps | finite, positive numeric scalar.  When the absolute value of  | 
| include.x | logical scalar indicating whether to include the finite, non-missing values of 
the argument  | 
| ... | optional arguments for possible future methods. Currently not used. | 
Details
Two common assumptions for several standard parametric hypothesis tests are:
- The observations all come from a normal distribution. 
- The observations all come from distributions with the same variance. 
For example, the standard one-sample t-test assumes all the observations come from the same normal distribution, and the standard two-sample t-test assumes that all the observations come from a normal distribution with the same variance, although the mean may differ between the two groups.
When the original data do not satisfy the above assumptions, data transformations 
are often used to attempt to satisfy these assumptions.  The rest of this section 
is divided into two parts:  one that discusses Box-Cox transformations in the 
context of the original observations, and one that  discusses Box-Cox 
transformations in the context of linear models.
Box-Cox Transformations Based on the Original Observations 
Box and Cox (1964) presented a formalized method for deciding on a data 
transformation.  Given a random variable X from some distribution with 
only positive values, the Box-Cox family of power transformations is defined as:
Y = \frac{X^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
Y = log(X), \;\; \lambda = 0 \;\;\;\;\;\; (1)
where Y is assumed to come from a normal distribution.  This transformation is 
continuous in \lambda.  Note that this transformation also preserves ordering.  
See the help file for boxcoxTransform for more information on data 
transformations.
Let \underline{x} = x_1, x_2, \ldots, x_n denote a random sample of 
n observations from some distribution and assume that there exists some 
value of \lambda such that the transformed observations
y_i = \frac{x_i^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
y_i = log(x_i), \;\; \lambda = 0 \;\;\;\;\;\; (2)
(i = 1, 2, \ldots, n) form a random sample from a normal distribution.
Box and Cox (1964) proposed choosing the appropriate value of \lambda based on 
maximizing the likelihood function.  Alternatively, an appropriate value of 
\lambda can be chosen based on another objective, such as maximizing the 
probability plot correlation coefficient or the Shapiro-Wilk goodness-of-fit 
statistic.
In the case when optimize=TRUE, the function boxcox calls the 
R function nlminb to minimize the negative value of the 
objective (i.e., maximize the objective) over the range of possible values of 
\lambda specified in the argument lambda.  The starting value for 
the optimization is always \lambda=1 (i.e., no transformation).
The rest of this sub-section explains how the objective is computed for the 
various options for objective.name. 
Objective Based on Probability Plot Correlation Coefficient (objective.name="PPCC") 
When objective.name="PPCC", the objective is computed as the value of the 
normal probability plot correlation coefficient based on the transformed data 
(see the description of the Probability Plot Correlation Coefficient (PPCC) 
goodness-of-fit test in the help file for gofTest).  That is, 
the objective is the correlation coefficient for the normal 
quantile-quantile plot for the transformed data.  
Large values of the PPCC tend to indicate a good fit to a normal distribution.
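For a single candidate value of \lambda, the computation can be sketched in a few lines of base R.  This is a hedged illustration, not the EnvStats implementation; the plotting positions used by gofTest may differ in detail.
  # Hedged sketch: PPCC of the transformed data for one candidate lambda
  ppcc.sketch <- function(x, lambda, eps = .Machine$double.eps) {
    y <- if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
    cor(sort(y), qnorm(ppoints(length(y))))  # correlation in the normal Q-Q plot
  }
  # e.g., sapply(seq(-2, 2, by = 0.5), ppcc.sketch, x = x)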
Objective Based on Shapiro-Wilk Goodness-of-Fit Statistic (objective.name="Shapiro-Wilk") 
When objective.name="Shapiro-Wilk", the objective is computed as the value of 
the Shapiro-Wilk goodness-of-fit statistic based on the transformed data 
(see the description of the Shapiro-Wilk test in the help file for 
gofTest).  Large values of the Shapiro-Wilk statistic tend to 
indicate a good fit to a normal distribution.
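A hedged sketch of this objective, using the W statistic returned by stats::shapiro.test (the EnvStats computation via gofTest may differ in detail), is:
  # Hedged sketch: Shapiro-Wilk W statistic of the transformed data
  sw.sketch <- function(x, lambda, eps = .Machine$double.eps) {
    y <- if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
    unname(shapiro.test(y)$statistic)
  }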
Objective Based on Log-Likelihood Function (objective.name="Log-Likelihood") 
When objective.name="Log-Likelihood", the objective is computed as the value 
of the log-likelihood function.  Assuming the transformed observations in 
Equation (2) above come from a normal distribution with mean \mu and 
standard deviation \sigma, we can use the change of variable formula to 
write the log-likelihood function as:
log[L(\lambda, \mu, \sigma)] = \frac{-n}{2}log(2\pi) - \frac{n}{2}log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 + (\lambda - 1) \sum_{i=1}^n log(x_i) \;\;\;\;\;\; (3)
where y_i is defined in Equation (2) above (Box and Cox, 1964). 
For a fixed value of \lambda, the log-likelihood function 
is maximized by replacing \mu and \sigma with their maximum likelihood 
estimators:
\hat{\mu} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (4)
\hat{\sigma} = [\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2]^{1/2} \;\;\;\;\;\; (5)
Thus, when optimize=TRUE, Equation (3) is maximized by iteratively solving for 
\lambda using the values for \mu and \sigma given in 
Equations (4) and (5).  When optimize=FALSE, the value of the objective is 
computed by using Equation (3), using the values of \lambda specified in the 
argument lambda, and using the values for \mu and \sigma given 
in Equations (4) and (5).
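A hedged sketch of Equation (3) with the plug-in estimators from Equations (4) and (5) (illustrative only; not the EnvStats source) is:
  # Hedged sketch: profile log-likelihood of Equation (3) for one lambda
  bc.loglik.sketch <- function(x, lambda, eps = .Machine$double.eps) {
    n <- length(x)
    y <- if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
    mu.hat    <- mean(y)                     # Equation (4)
    sigma.hat <- sqrt(mean((y - mu.hat)^2))  # Equation (5), divisor n
    -n/2 * log(2 * pi) - n/2 * log(sigma.hat^2) -
      sum((y - mu.hat)^2) / (2 * sigma.hat^2) +
      (lambda - 1) * sum(log(x))
  }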
Box-Cox Transformation for Linear Models 
In the case of a standard linear regression model with n observations and 
p predictors:
Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_p X_{ip} + \epsilon_i, \; i=1,2,\ldots,n \;\;\;\;\;\; (6)
the standard assumptions are:
- The error terms \epsilon_i come from a normal distribution with mean 0. 
- The variance is the same for all of the error terms and does not depend on the predictor variables. 
Assuming Y is a random variable from some distribution that may depend on 
the predictor variables and Y takes on only positive values, the Box-Cox 
family of power transformations is defined as:
Y^* = \frac{Y^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
Y^* = log(Y), \;\; \lambda = 0 \;\;\;\;\;\; (7)
where Y^* becomes the new response variable and the errors are now 
assumed to come from a normal distribution with a mean of 0 and a constant variance.
In this case, the objective is computed as described above, but it is based on the 
residuals from the fitted linear model in which the response variable is now 
Y^* instead of Y.
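The idea can be sketched as follows for the PPCC objective, using base R model utilities; this is a hedged illustration, not the EnvStats implementation.
  # Hedged sketch: PPCC of residuals after transforming the response of an lm fit
  bc.lm.ppcc.sketch <- function(fit, lambda, eps = .Machine$double.eps) {
    y <- model.response(model.frame(fit))
    y.star <- if (abs(lambda) < eps) log(y) else (y^lambda - 1) / lambda
    r <- lm.fit(model.matrix(fit), y.star)$residuals  # refit with transformed response
    cor(sort(r), qnorm(ppoints(length(r))))
  }
  # e.g., bc.lm.ppcc.sketch(lm(ozone ~ temperature, data = Environmental.df), lambda = 0)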
Value
When x is an object of class "lm", boxcox returns 
a list of class "boxcoxLm" containing the results.  See 
the help file for boxcoxLm.object for details.
When x is simply a numeric vector of positive numbers, 
boxcox returns a list of class "boxcox" containing the 
results.  See the help file for boxcox.object for details.
Note
Data transformations are often used to induce normality, homoscedasticity, and/or linearity, common assumptions of parametric statistical tests and estimation procedures. Transformations are not “tricks” used by the data analyst to hide what is going on, but rather useful tools for understanding and dealing with data (Berthouex and Brown, 2002, p.61). Hoaglin (1988) discusses “hidden” transformations that are used everyday, such as the pH scale for measuring acidity. Johnson and Wichern (2007, p.192) note that "Transformations are nothing more than a reexpression of the data in different units."
In the case of a linear model, there are at least two approaches to improving 
a model fit:  transform the Y and/or X variable(s), and/or use 
more predictor variables.  Often in environmental data analysis, we assume the 
observations come from a lognormal distribution and automatically take 
logarithms of the data.  For a simple linear regression 
(i.e., one predictor variable), if regression diagnostic plots indicate that a 
straight line fit is not adequate, but that the variance of the errors 
appears to be fairly constant, you may only need to transform the predictor 
variable X or perhaps use a quadratic or cubic model in X.  
On the other hand, if the diagnostic plots indicate that the constant 
variance and/or normality assumptions are suspect, you probably need to consider 
transforming the response variable Y.  Data transformations for 
linear regression models are discussed in Draper and Smith (1998, Chapter 13) 
and Helsel and Hirsch (1992, pp. 228-229).
One problem with data transformations is that translating results on the 
transformed scale back to the original scale is not always straightforward.  
Estimating quantities such as means, variances, and confidence limits in the 
transformed scale and then transforming them back to the original scale 
usually leads to biased and inconsistent estimates (Gilbert, 1987, p.149; 
van Belle et al., 2004, p.400).  For example, exponentiating the confidence 
limits for a mean based on log-transformed data does not yield a 
confidence interval for the mean on the original scale.  Instead, this yields 
a confidence interval for the median (see the help file for elnormAlt).  
It should be noted, however, that quantiles (percentiles) and rank-based 
procedures are invariant to monotonic transformations 
(Helsel and Hirsch, 1992, p.12).
Finally, there is no guarantee that a Box-Cox transformation based on the 
“optimal” value of \lambda will provide an adequate transformation 
to allow the assumption of approximate normality and constant variance.  Any 
set of transformed data should be inspected relative to the assumptions you 
want to make about it (Johnson and Wichern, 2007, p.194).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and D.R. Cox. (1964). An Analysis of Transformations (with Discussion). Journal of the Royal Statistical Society, Series B 26(2), 211–252.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.47-53.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Hinkley, D.V., and G. Runger. (1984). The Analysis of Transformed Data (with Discussion). Journal of the American Statistical Association 79, 302–320.
Hoaglin, D.C., F.M. Mosteller, and J.W. Tukey, eds. (1983). Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, New York, Chapter 4.
Hoaglin, D.C. (1988). Transformations in Everyday Experience. Chance 1, 40–45.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, p.163.
Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ, pp.192–195.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Stoline, M.R. (1991). An Examination of the Lognormal and Box and Cox Family of Transformations in Fitting Environmental Data. Environmetrics 2(1), 85–106.
van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 13.
See Also
boxcox.object, plot.boxcox, print.boxcox, 
boxcoxLm.object, plot.boxcoxLm, print.boxcoxLm, 
boxcoxTransform, Data Transformations, 
Goodness-of-Fit Tests.
Examples
  # Generate 30 observations from a lognormal distribution with 
  # mean=10 and cv=2.  Look at some values of various objectives 
  # for various transformations.  Note that for both the PPCC and 
  # the Log-Likelihood objective, the optimal value of lambda is 
  # about 0, indicating that a log transformation is appropriate.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rlnormAlt(30, mean = 10, cv = 2) 
  dev.new()
  hist(x, col = "cyan")
  # Using the PPCC objective:
  #--------------------------
  boxcox(x) 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  # lambda      PPCC
  #   -2.0 0.5423739
  #   -1.5 0.6402782
  #   -1.0 0.7818160
  #   -0.5 0.9272219
  #    0.0 0.9921702
  #    0.5 0.9581178
  #    1.0 0.8749611
  #    1.5 0.7827009
  #    2.0 0.7004547
  boxcox(x, optimize = TRUE)
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = 0.04530789
  #
  #Value of Objective:              PPCC = 0.9925919
  # Using the Log-Likelihood objective
  #-----------------------------------
  boxcox(x, objective.name = "Log-Likelihood") 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  Log-Likelihood
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  # lambda Log-Likelihood
  #   -2.0     -154.94255
  #   -1.5     -128.59988
  #   -1.0     -106.23882
  #   -0.5      -90.84800
  #    0.0      -85.10204
  #    0.5      -88.69825
  #    1.0      -99.42630
  #    1.5     -115.23701
  #    2.0     -134.54125
  boxcox(x, objective.name = "Log-Likelihood", optimize = TRUE) 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  Log-Likelihood
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = 0.0405156
  #
  #Value of Objective:              Log-Likelihood = -85.07123
  #----------
  # Plot the results based on the PPCC objective
  #---------------------------------------------
  boxcox.list <- boxcox(x)
  dev.new()
  plot(boxcox.list)
  #Look at QQ-Plots for the candidate values of lambda
  #---------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE) 
  #==========
  # The data frame Environmental.df contains daily measurements of 
  # ozone concentration, wind speed, temperature, and solar radiation
  # in New York City for 153 consecutive days between May 1 and 
  # September 30, 1973.  In this example, we'll plot ozone vs. 
  # temperature and look at the Q-Q plot of the residuals.  Then 
  # we'll look at possible Box-Cox transformations.  The "optimal" one 
  # based on the PPCC looks close to a log-transformation 
  # (i.e., lambda=0).  The power that produces the largest PPCC is 
  # about 0.2, so a cube root (lambda=1/3) transformation might work too.
  head(Environmental.df)
  #           ozone radiation temperature wind
  #05/01/1973    41       190          67  7.4
  #05/02/1973    36       118          72  8.0
  #05/03/1973    12       149          74 12.6
  #05/04/1973    18       313          62 11.5
  #05/05/1973    NA        NA          56 14.3
  #05/06/1973    28        NA          66 14.9
  tail(Environmental.df)
  #           ozone radiation temperature wind
  #09/25/1973    14        20          63 16.6
  #09/26/1973    30       193          70  6.9
  #09/27/1973    NA       145          77 13.2
  #09/28/1973    14       191          75 14.3
  #09/29/1973    18       131          76  8.0
  #09/30/1973    20       223          68 11.5
  # Fit the model with the raw Ozone data
  #--------------------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Environmental.df) 
  # Plot Ozone vs. Temperature, with fitted line 
  #---------------------------------------------
  dev.new()
  with(Environmental.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)",
      ylab = "Ozone (ppb)", main = "Ozone vs. Temperature"))
  abline(ozone.fit) 
  # Look at the Q-Q Plot for the residuals 
  #---------------------------------------
  dev.new()
  qqPlot(ozone.fit$residuals, add.line = TRUE) 
  # Look at Box-Cox transformations of Ozone 
  #-----------------------------------------
  boxcox.list <- boxcox(ozone.fit) 
  boxcox.list 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Linear Model:                    ozone.fit
  #
  #Sample Size:                     116
  #
  # lambda      PPCC
  #   -2.0 0.4286781
  #   -1.5 0.4673544
  #   -1.0 0.5896132
  #   -0.5 0.8301458
  #    0.0 0.9871519
  #    0.5 0.9819825
  #    1.0 0.9408694
  #    1.5 0.8840770
  #    2.0 0.8213675
  # Plot PPCC vs. lambda based on Q-Q plots of residuals 
  #-----------------------------------------------------
  dev.new()
  plot(boxcox.list) 
  # Look at Q-Q plots of residuals for the various transformations 
  #--------------------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE)
  # Compute the "optimal" transformation
  #-------------------------------------
  boxcox(ozone.fit, optimize = TRUE)
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Linear Model:                    ozone.fit
  #
  #Sample Size:                     116
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = 0.2004305
  #
  #Value of Objective:              PPCC = 0.9940222
  #==========
  # Clean up
  #---------
  rm(x, boxcox.list, ozone.fit)
  graphics.off()
S3 Class "boxcox"
Description
Objects of S3 class "boxcox" are returned by the EnvStats 
function boxcox, which computes objective values for 
user-specified powers, or computes the optimal power for the specified 
objective. 
Details
Objects of class "boxcox" are lists that contain 
information about the powers that were used, the objective that was used, 
the values of the objective for the given powers, and whether an 
optimization was specified.
Value
Required Components 
The following components must be included in a legitimate list of 
class "boxcox".
| lambda | Numeric vector containing the powers used in the Box-Cox transformations.  
If the value of the  | 
| objective | Numeric vector containing the value(s) of the objective for the given value(s) 
of  | 
| objective.name | character string indicating the objective that was used. The possible values are
 | 
| optimize | logical scalar indicating whether the objective was simply evaluated at the 
given values of  | 
| optimize.bounds | Numeric vector of length 2 with a names attribute indicating the bounds within 
which the optimization took place.  When  | 
| eps | finite, positive numeric scalar indicating what value of  | 
| sample.size | Numeric scalar indicating the number of finite, non-missing observations. | 
| data.name | The name of the data object used for the Box-Cox computations. | 
| bad.obs | The number of missing ( | 
Optional Component 
The following component may optionally be included in a legitimate 
list of class "boxcox".  It must be included if you want to call the 
function plot.boxcox and specify Q-Q plots or 
Tukey Mean-Difference Q-Q plots.
| data | Numeric vector containing the data actually used for the Box-Cox computations (i.e., the original data without any missing or infinite values). | 
Methods
Generic functions that have methods for objects of class 
"boxcox" include: 
plot, print.
Note
Since objects of class "boxcox" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
boxcox, plot.boxcox, print.boxcox, 
boxcoxLm.object.
Examples
  # Create an object of class "boxcox", then print it out.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rlnormAlt(30, mean = 10, cv = 2) 
  dev.new()
  hist(x, col = "cyan")
  boxcox.list <- boxcox(x)
  data.class(boxcox.list)
  #[1] "boxcox"
  
  names(boxcox.list)
  # [1] "lambda"          "objective"       "objective.name" 
  # [4] "optimize"        "optimize.bounds" "eps"            
  # [7] "data"            "sample.size"     "data.name"      
  #[10] "bad.obs" 
  boxcox.list
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  # lambda      PPCC
  #   -2.0 0.5423739
  #   -1.5 0.6402782
  #   -1.0 0.7818160
  #   -0.5 0.9272219
  #    0.0 0.9921702
  #    0.5 0.9581178
  #    1.0 0.8749611
  #    1.5 0.7827009
  #    2.0 0.7004547
  boxcox(x, optimize = TRUE) 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = 0.04530789
  #
  #Value of Objective:              PPCC = 0.9925919 
  #----------
  # Clean up
  #---------
  rm(x, boxcox.list)
Boxcox Power Transformation for Type I Censored Data
Description
Compute the value(s) of an objective for one or more Box-Cox power transformations, or to compute an optimal power transformation based on a specified objective, based on Type I censored data.
Usage
  boxcoxCensored(x, censored, censoring.side = "left", 
    lambda = {if (optimize) c(-2, 2) else seq(-2, 2, by = 0.5)}, optimize = FALSE, 
    objective.name = "PPCC", eps = .Machine$double.eps, 
    include.x.and.censored = TRUE, prob.method = "michael-schucany", 
    plot.pos.con = 0.375) 
Arguments
| x | a numeric vector of positive numbers.  
Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are 
 | 
| lambda | numeric vector of finite values indicating what powers to use for the 
Box-Cox transformation.  When  | 
| optimize | logical scalar indicating whether to simply evaluate the objective function at the 
given values of  | 
| objective.name | character string indicating what objective to use. The possible values are
 | 
| eps | finite, positive numeric scalar.  When the absolute value of  | 
| include.x.and.censored | logical scalar indicating whether to include the finite, non-missing values of 
the argument  | 
| prob.method | for multiply censored data, 
character string indicating what method to use to compute the plotting positions 
(empirical probabilities) when  
 The default value is  The  This argument is ignored if  | 
| plot.pos.con | for multiply censored data, 
numeric scalar between 0 and 1 containing the value of the plotting position 
constant when  This argument is ignored if  | 
Details
Two common assumptions for several standard parametric hypothesis tests are:
- The observations all come from a normal distribution. 
- The observations all come from distributions with the same variance. 
For example, the standard one-sample t-test assumes all the observations come from the same normal distribution, and the standard two-sample t-test assumes that all the observations come from a normal distribution with the same variance, although the mean may differ between the two groups.
When the original data do not satisfy the above assumptions, data transformations 
are often used to attempt to satisfy these assumptions.  
Box and Cox (1964) presented a formalized method for deciding on a data 
transformation.  Given a random variable X from some distribution with 
only positive values, the Box-Cox family of power transformations is defined as:
Y = \frac{X^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
Y = log(X), \;\; \lambda = 0 \;\;\;\;\;\; (1)
where Y is assumed to come from a normal distribution.  This transformation is 
continuous in \lambda.  Note that this transformation also preserves ordering.  
See the help file for boxcoxTransform for more information on data 
transformations.
Box and Cox (1964) proposed choosing the appropriate value of \lambda based on 
maximizing the likelihood function.  Alternatively, an appropriate value of 
\lambda can be chosen based on another objective, such as maximizing the 
probability plot correlation coefficient or the Shapiro-Wilk goodness-of-fit 
statistic.
Shumway et al. (1989) investigated extending the method of Box and Cox (1964) to the case of Type I censored data, motivated by the desire to produce estimated means and confidence intervals for air monitoring data that included censored values.
In the case when optimize=TRUE, the function boxcoxCensored calls the 
R function nlminb to minimize the negative value of the 
objective (i.e., maximize the objective) over the range of possible values of 
\lambda specified in the argument lambda.  The starting value for 
the optimization is always \lambda=1 (i.e., no transformation).
The next section explains assumptions and notation, and the section after that 
explains how the objective is computed for the various options for 
objective.name. 
Assumptions and Notation 
Let \underline{x} denote a random sample of N observations from 
some continuous distribution.  Assume n (0 < n < N) of these 
observations are known and c (c=N-n) of these observations are 
all censored below (left-censored) or all censored above (right-censored) at 
k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (2)
For the case when k \ge 2, the data are said to be Type I 
multiply censored.  For the case when k = 1, 
set T = T_1.  If the data are left-censored 
and all n known observations are greater 
than or equal to T, or if the data are right-censored and all n 
known observations are less than or equal to T, then the data are 
said to be Type I singly censored (Nelson, 1982, p.7), otherwise 
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring 
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (3)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations, 
where now “observation” means either the actual observation (for uncensored 
observations) or the censoring level (for censored observations).  For 
right-censored data, if a censored observation has the same value as an 
uncensored one, the uncensored observation should be placed first.  
For left-censored data, if a censored observation has the same value as an 
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent 
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the 
“ordered” sample that correspond to uncensored observations, and let 
\Omega_j denote the set of c_j subscripts in the “ordered” 
sample that correspond to the censored observations censored at censoring level 
T_j for j = 1, 2, \ldots, k.
We assume that there exists some value of \lambda such that the transformed 
observations
y_i = \frac{x_i^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
y_i = log(x_i), \;\; \lambda = 0 \;\;\;\;\;\; (4)
(i = 1, 2, \ldots, n) form a random sample of Type I censored data from a 
normal distribution.
Note that for the censored observations, Equation (4) becomes:
y_{(i)} = T_j^* = \frac{T_j^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
y_{(i)} = T_j^* = log(T_j), \;\; \lambda = 0 \;\;\;\;\;\; (5)
where i \in \Omega_j.
Computing the Objective 
Objective Based on Probability Plot Correlation Coefficient (objective.name="PPCC") 
When objective.name="PPCC", the objective is computed as the value of the 
normal probability plot correlation coefficient based on the transformed data 
(see the description of the Probability Plot Correlation Coefficient (PPCC) 
goodness-of-fit test in the help file for gofTestCensored).  That is, 
the objective is the correlation coefficient for the normal 
quantile-quantile plot for the transformed data.  
Large values of the PPCC tend to indicate a good fit to a normal distribution.
Objective Based on Shapiro-Wilk Goodness-of-Fit Statistic (objective.name="Shapiro-Wilk") 
When objective.name="Shapiro-Wilk", the objective is computed as the value of 
the Shapiro-Wilk goodness-of-fit statistic based on the transformed data 
(see the description of the Shapiro-Wilk test in the help file for 
gofTestCensored).  Large values of the Shapiro-Wilk statistic tend to 
indicate a good fit to a normal distribution.
Objective Based on Log-Likelihood Function (objective.name="Log-Likelihood") 
When objective.name="Log-Likelihood", the objective is computed as the value 
of the log-likelihood function.  Assuming the transformed observations in 
Equation (4) above come from a normal distribution with mean \mu and 
standard deviation \sigma, we can use the change of variable formula to 
write the log-likelihood function as follows.
For Type I left censored data, the log-likelihood function is given by:
log[L(\lambda, \mu, \sigma)] = log[{N \choose c_1 c_2 \ldots c_k n}] + \sum_{j=1}^k c_j log[F(T_j^*)] + \sum_{i \in \Omega} log\{f[y_{(i)}]\} + (\lambda - 1) \sum_{i \in \Omega} log[x_{(i)}] \;\;\;\;\;\; (6)
where f and F denote the probability density function (pdf) and 
cumulative distribution function (cdf) of the population. That is,
f(t) = \frac{1}{\sigma} \phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (7)
F(t) = \Phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (8)
where \phi and \Phi denote the pdf and cdf of the standard normal 
distribution, respectively (Shumway et al., 1989).  For left singly 
censored data, Equation (6) simplifies to:
log[L(\lambda, \mu, \sigma)] = log[{N \choose c}] + c log[F(T^*)] + \sum_{i = c+1}^N log\{f[y_{(i)}]\} + (\lambda - 1) \sum_{i = c+1}^N log[x_{(i)}] \;\;\;\;\;\; (9)
Similarly, for Type I right censored data, the log-likelihood function is given by:
log[L(\lambda, \mu, \sigma)] = log[{N \choose c_1 c_2 \ldots c_k n}] + \sum_{j=1}^k c_j log[1 - F(T_j^*)] + \sum_{i \in \Omega} log\{f[y_{(i)}]\} + (\lambda - 1) \sum_{i \in \Omega} log[x_{(i)}] \;\;\;\;\;\; (10)
and for right singly censored data this simplifies to:
log[L(\lambda, \mu, \sigma)] = log[{N \choose c}] + c log[1 - F(T^*)] + \sum_{i = 1}^n log\{f[y_{(i)}]\} + (\lambda - 1) \sum_{i = 1}^n log[x_{(i)}] \;\;\;\;\;\; (11)
For a fixed value of \lambda, the log-likelihood function 
is maximized by replacing \mu and \sigma with their maximum likelihood 
estimators (see the section Maximum Likelihood Estimation in the help file 
for enormCensored).  
Thus, when optimize=TRUE, Equation (6) or (10) is maximized by iteratively 
solving for \lambda using the MLEs for \mu and \sigma.  
When optimize=FALSE, the value of the objective is computed by using 
Equation (6) or (10), using the values of \lambda specified in the 
argument lambda, and using the MLEs of \mu and \sigma.
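For left singly censored data (Equation (9)), the computation can be sketched as follows, with the censoring level and the values of \mu and \sigma supplied as arguments (in practice these would be the censored-data MLEs, as computed by enormCensored).  This is a hedged illustration, not the EnvStats source.
  # Hedged sketch: Equation (9) for left singly censored data at one censoring level
  bc.lcens.loglik.sketch <- function(x, censored, T.cens, lambda, mu, sigma,
                                     eps = .Machine$double.eps) {
    bc <- function(z) if (abs(lambda) < eps) log(z) else (z^lambda - 1) / lambda
    N <- length(x); cc <- sum(censored)
    y      <- bc(x[!censored])   # transformed uncensored observations
    T.star <- bc(T.cens)         # transformed censoring level
    lchoose(N, cc) + cc * pnorm(T.star, mu, sigma, log.p = TRUE) +
      sum(dnorm(y, mu, sigma, log = TRUE)) +
      (lambda - 1) * sum(log(x[!censored]))
  }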
Value
boxcoxCensored returns a list of class "boxcoxCensored" 
containing the results.  
See the help file for boxcoxCensored.object for details.
Note
Data transformations are often used to induce normality, homoscedasticity, and/or linearity, common assumptions of parametric statistical tests and estimation procedures. Transformations are not “tricks” used by the data analyst to hide what is going on, but rather useful tools for understanding and dealing with data (Berthouex and Brown, 2002, p.61). Hoaglin (1988) discusses “hidden” transformations that are used everyday, such as the pH scale for measuring acidity. Johnson and Wichern (2007, p.192) note that "Transformations are nothing more than a reexpression of the data in different units."
Shumway et al. (1989) investigated extending the method of Box and Cox (1964) to the case of Type I censored data, motivated by the desire to produce estimated means and confidence intervals for air monitoring data that included censored values.
Stoline (1991) compared the goodness-of-fit of Box-Cox transformed data (based on 
using the “optimal” power transformation from a finite set of values between 
-1.5 and 1.5) with log-transformed data for 17 groundwater chemistry variables.  
Using the Probability Plot Correlation Coefficient statistic for censored data as a 
measure of goodness-of-fit (see gofTest), Stoline (1991) found that 
only 6 of the variables were adequately modeled by a Box-Cox transformation 
(p >0.10 for these 6 variables).  Of these variables, five were adequately modeled 
by a log transformation.  Ten of the variables were “marginally” fit by an 
optimal Box-Cox transformation, and of these 10 only 6 were marginally fit by a 
log transformation.  Based on these results, Stoline (1991) recommends checking 
the assumption of lognormality before automatically assuming environmental data fit 
a lognormal distribution.
One problem with data transformations is that translating results on the 
transformed scale back to the original scale is not always straightforward.  
Estimating quantities such as means, variances, and confidence limits in the 
transformed scale and then transforming them back to the original scale 
usually leads to biased and inconsistent estimates (Gilbert, 1987, p.149; 
van Belle et al., 2004, p.400).  For example, exponentiating the confidence 
limits for a mean based on log-transformed data does not yield a 
confidence interval for the mean on the original scale.  Instead, this yields 
a confidence interval for the median (see the help file for 
elnormAltCensored).  
It should be noted, however, that quantiles (percentiles) and rank-based 
procedures are invariant to monotonic transformations 
(Helsel and Hirsch, 1992, p.12).
Finally, there is no guarantee that a Box-Cox transformation based on the 
“optimal” value of \lambda will provide an adequate transformation 
to allow the assumption of approximate normality and constant variance.  Any 
set of transformed data should be inspected relative to the assumptions you 
want to make about it (Johnson and Wichern, 2007, p.194).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and D.R. Cox. (1964). An Analysis of Transformations (with Discussion). Journal of the Royal Statistical Society, Series B 26(2), 211–252.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, pp.50–59.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.47-53.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Hinkley, D.V., and G. Runger. (1984). The Analysis of Transformed Data (with Discussion). Journal of the American Statistical Association 79, 302–320.
Hoaglin, D.C., F.M. Mosteller, and J.W. Tukey, eds. (1983). Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, New York, Chapter 4.
Hoaglin, D.C. (1988). Transformations in Everyday Experience. Chance 1, 40–45.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, p.163.
Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ, pp.192–195.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Stoline, M.R. (1991). An Examination of the Lognormal and Box and Cox Family of Transformations in Fitting Environmental Data. Environmetrics 2(1), 85–106.
van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 13.
See Also
boxcoxCensored.object, plot.boxcoxCensored, 
print.boxcoxCensored, 
boxcox, Data Transformations, Goodness-of-Fit Tests.
Examples
  # Generate 15 observations from a lognormal distribution with 
  # mean=10 and cv=2 and censor the observations less than 2.  
  # Then generate 15 more observations from this distribution and 
  # censor the observations less than 4.  
  # Then look at some values of various objectives for various transformations.  
  # Note that for the PPCC objective the optimal value of lambda is about -0.3, 
  # whereas for the Log-Likelihood objective it is about 0.3.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x.1 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.1 <- x.1 < 2
  x.1[censored.1] <- 2
  x.2 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.2 <- x.2 < 4
  x.2[censored.2] <- 4
  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)
  #--------------------------
  # Using the PPCC objective:
  #--------------------------
  boxcoxCensored(x, censored) 
  #Results of Box-Cox Transformation
  #Based on Type I Censored Data
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 4 
  #
  #Sample Size:                     30
  #
  #Percent Censored:                26.7%
  #
  # lambda      PPCC
  #   -2.0 0.8954683
  #   -1.5 0.9338467
  #   -1.0 0.9643680
  #   -0.5 0.9812969
  #    0.0 0.9776834
  #    0.5 0.9471025
  #    1.0 0.8901990
  #    1.5 0.8187488
  #    2.0 0.7480494
  boxcoxCensored(x, censored, optimize = TRUE)
  #Results of Box-Cox Transformation
  #Based on Type I Censored Data
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 4 
  #
  #Sample Size:                     30
  #
  #Percent Censored:                26.7%
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = -0.3194799
  #
  #Value of Objective:              PPCC = 0.9827546
  #-----------------------------------
  # Using the Log-Likelihood objective
  #-----------------------------------
  boxcoxCensored(x, censored, objective.name = "Log-Likelihood") 
  #Results of Box-Cox Transformation
  #Based on Type I Censored Data
  #---------------------------------
  #
  #Objective Name:                  Log-Likelihood
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 4 
  #
  #Sample Size:                     30
  #
  #Percent Censored:                26.7%
  #
  # lambda Log-Likelihood
  #   -2.0      -95.38785
  #   -1.5      -84.76697
  #   -1.0      -75.36204
  #   -0.5      -68.12058
  #    0.0      -63.98902
  #    0.5      -63.56701
  #    1.0      -66.92599
  #    1.5      -73.61638
  #    2.0      -82.87970
  boxcoxCensored(x, censored, objective.name = "Log-Likelihood", 
    optimize = TRUE) 
  #Results of Box-Cox Transformation
  #Based on Type I Censored Data
  #---------------------------------
  #
  #Objective Name:                  Log-Likelihood
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 4 
  #
  #Sample Size:                     30
  #
  #Percent Censored:                26.7%
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = 0.3049744
  #
  #Value of Objective:              Log-Likelihood = -63.2733
  #----------
  # Plot the results based on the PPCC objective
  #---------------------------------------------
  boxcox.list <- boxcoxCensored(x, censored)
  dev.new()
  plot(boxcox.list)
  #Look at QQ-Plots for the candidate values of lambda
  #---------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE) 
  #==========
  # Clean up
  #---------
  rm(x.1, censored.1, x.2, censored.2, x, censored, boxcox.list)
  graphics.off()
S3 Class "boxcoxCensored"
Description
Objects of S3 class "boxcoxCensored" are returned by the EnvStats 
function boxcoxCensored, which computes objective values for 
user-specified powers, or computes the optimal power for the specified 
objective, based on Type I censored data. 
Details
Objects of class "boxcoxCensored" are lists that contain 
information about the powers that were used, the objective that was used, 
the values of the objective for the given powers, and whether an 
optimization was specified.
Value
Required Components 
The following components must be included in a legitimate list of 
class "boxcoxCensored".
| lambda | Numeric vector containing the powers used in the Box-Cox transformations.  
If the value of the  | 
| objective | Numeric vector containing the value(s) of the objective for the given value(s) 
of  | 
| objective.name | Character string indicating the objective that was used. The possible values are
 | 
| optimize | Logical scalar indicating whether the objective was simply evaluated at the 
given values of  | 
| optimize.bounds | Numeric vector of length 2 with a names attribute indicating the bounds within 
which the optimization took place.  When  | 
| eps | Finite, positive numeric scalar indicating what value of  | 
| sample.size | Numeric scalar indicating the number of finite, non-missing observations. | 
| censoring.side | Character string indicating the censoring side.  Possible values are 
 | 
| censoring.levels | Numeric vector containing the censoring levels. | 
| percent.censored | Numeric scalar indicating the percent of observations that are censored. | 
| data.name | The name of the data object used for the Box-Cox computations. | 
| censoring.name | The name of the data object indicating which observations are censored. | 
| bad.obs | The number of missing ( | 
Optional Component 
The following components may optionally be included in a legitimate 
list of class "boxcoxCensored".  They must be included if you want to 
call the function plot.boxcoxCensored and specify Q-Q plots or 
Tukey Mean-Difference Q-Q plots.
| data | Numeric vector containing the data actually used for the Box-Cox computations (i.e., the original data without any missing or infinite values). | 
| censored | Logical vector indicating which of the values in the component  | 
Methods
Generic functions that have methods for objects of class 
"boxcoxCensored" include: 
plot, print.
Note
Since objects of class "boxcoxCensored" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
boxcoxCensored, plot.boxcoxCensored, 
print.boxcoxCensored.
Examples
  # Create an object of class "boxcoxCensored", then print it out.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x.1 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.1 <- x.1 < 2
  x.1[censored.1] <- 2
  x.2 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.2 <- x.2 < 4
  x.2[censored.2] <- 4
  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)
  boxcox.list <- boxcoxCensored(x, censored)
  data.class(boxcox.list)
  #[1] "boxcoxCensored"
  
  names(boxcox.list)
  # [1] "lambda"           "objective"        "objective.name"  
  # [4] "optimize"         "optimize.bounds"  "eps"             
  # [7] "data"             "censored"         "sample.size"     
  #[10] "censoring.side"   "censoring.levels" "percent.censored"
  #[13] "data.name"        "censoring.name"   "bad.obs"
  boxcox.list
  #Results of Box-Cox Transformation
  #Based on Type I Censored Data
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 4 
  #
  #Sample Size:                     30
  #
  #Percent Censored:                26.7%
  #
  # lambda      PPCC
  #   -2.0 0.8954683
  #   -1.5 0.9338467
  #   -1.0 0.9643680
  #   -0.5 0.9812969
  #    0.0 0.9776834
  #    0.5 0.9471025
  #    1.0 0.8901990
  #    1.5 0.8187488
  #    2.0 0.7480494
  boxcox.list2 <- boxcox(x, optimize = TRUE) 
  names(boxcox.list2)
  # [1] "lambda"          "objective"       "objective.name" 
  # [4] "optimize"        "optimize.bounds" "eps"            
  # [7] "data"            "sample.size"     "data.name"      
  #[10] "bad.obs"
  boxcox.list2
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Data:                            x
  #
  #Sample Size:                     30
  #
  #Bounds for Optimization:         lower = -2
  #                                 upper =  2
  #
  #Optimal Value:                   lambda = -0.5826431
  #
  #Value of Objective:              PPCC = 0.9755402
  #==========
  # Clean up
  #---------
  rm(x.1, censored.1, x.2, censored.2, x, censored, boxcox.list, boxcox.list2)
S3 Class "boxcoxLm"
Description
Objects of S3 class "boxcoxLm" are returned by the EnvStats 
function boxcox when the argument x is an object 
of class "lm".  In this case, boxcox computes 
values of an objective function for user-specified powers, or computes the 
optimal power for the specified objective, based on residuals from the linear model. 
Details
Objects of class "boxcoxLm" are lists that contain 
information about the "lm" object that was supplied, 
the powers that were used, the objective that was used, 
the values of the objective for the given powers, and whether an 
optimization was specified.
Value
The following components must be included in a legitimate list of 
class "boxcoxLm".
| lambda | Numeric vector containing the powers used in the Box-Cox transformations.  
If the value of the  | 
| objective | Numeric vector containing the value(s) of the objective for the given value(s) 
of  | 
| objective.name | character string indicating the objective that was used. The possible values are
 | 
| optimize | logical scalar indicating whether the objective was simply evaluated at the 
given values of  | 
| optimize.bounds | Numeric vector of length 2 with a names attribute indicating the bounds within 
which the optimization took place.  When  | 
| eps | finite, positive numeric scalar indicating what value of  | 
| lm.obj | the value of the argument  | 
| sample.size | Numeric scalar indicating the number of finite, non-missing observations. | 
| data.name | The name of the data object used for the Box-Cox computations. | 
Methods
Generic functions that have methods for objects of class 
"boxcoxLm" include: 
plot, print.
Note
Since objects of class "boxcoxLm" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
boxcox, plot.boxcoxLm, print.boxcoxLm, 
boxcox.object.
Examples
  # Create an object of class "boxcoxLm", then print it out.
  # The data frame Environmental.df contains daily measurements of 
  # ozone concentration, wind speed, temperature, and solar radiation
  # in New York City for 153 consecutive days between May 1 and 
  # September 30, 1973.  In this example, we'll plot ozone vs. 
  # temperature and look at the Q-Q plot of the residuals.  Then 
  # we'll look at possible Box-Cox transformations.  The "optimal" one 
  # based on the PPCC looks close to a log-transformation 
  # (i.e., lambda=0).  The power that produces the largest PPCC is 
  # about 0.2, so a cube root (lambda=1/3) transformation might work too.
  # Fit the model with the raw Ozone data
  #--------------------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Environmental.df) 
  # Plot Ozone vs. Temperature, with fitted line 
  #---------------------------------------------
  dev.new()
  with(Environmental.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)",
      ylab = "Ozone (ppb)", main = "Ozone vs. Temperature"))
  abline(ozone.fit) 
  # Look at the Q-Q Plot for the residuals 
  #---------------------------------------
  dev.new()
  qqPlot(ozone.fit$residuals, add.line = TRUE) 
  # Look at Box-Cox transformations of Ozone 
  #-----------------------------------------
  boxcox.list <- boxcox(ozone.fit) 
  boxcox.list 
  #Results of Box-Cox Transformation
  #---------------------------------
  #
  #Objective Name:                  PPCC
  #
  #Linear Model:                    ozone.fit
  #
  #Sample Size:                     116
  #
  # lambda      PPCC
  #   -2.0 0.4286781
  #   -1.5 0.4673544
  #   -1.0 0.5896132
  #   -0.5 0.8301458
  #    0.0 0.9871519
  #    0.5 0.9819825
  #    1.0 0.9408694
  #    1.5 0.8840770
  #    2.0 0.8213675
  #----------
  # Clean up
  #---------
  rm(ozone.fit, boxcox.list)
Apply a Box-Cox Power Transformation to a Set of Data
Description
Apply a Box-Cox power transformation to a set of data to attempt to induce normality and homogeneity of variance.
Usage
  boxcoxTransform(x, lambda, eps = .Machine$double.eps)
Arguments
| x | a numeric vector of positive numbers. | 
| lambda | finite numeric scalar indicating what power to use for the Box-Cox transformation. | 
| eps | finite, positive numeric scalar.  When the absolute value of  | 
Details
Two common assumptions for several standard parametric hypothesis tests are:
- The observations all come from a normal distribution. 
- The observations all come from distributions with the same variance. 
For example, the standard one-sample t-test assumes all the observations come from the same normal distribution, and the standard two-sample t-test assumes that all the observations come from a normal distribution with the same variance, although the mean may differ between the two groups. For standard linear regression models, these assumptions can be stated as: the error terms all come from a normal distribution with mean 0 and a constant variance.
Often, especially with environmental data, the above assumptions do not hold because the original data are skewed and/or they follow a distribution that is not really shaped like a normal distribution. It is sometimes possible, however, to transform the original data so that the transformed observations in fact come from a normal distribution or close to a normal distribution. The transformation may also induce homogeneity of variance and, for the case of a linear regression model, a linear relationship between the response and predictor variable(s).
Sometimes, theoretical considerations indicate an appropriate transformation. For example, count data often follow a Poisson distribution, and it can be shown that taking the square root of observations from a Poisson distribution tends to make these data look more bell-shaped (Johnson et al., 1992, p.163; Johnson and Wichern, 2007, p.192; Zar, 2010, p.291). A common example in the environmental field is that chemical concentration data often appear to come from a lognormal distribution or some other positively-skewed distribution (e.g., gamma). In this case, taking the logarithm of the observations often appears to yield normally distributed data.
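As a quick illustration with simulated data (hypothetical values, not one of the package data sets), the square-root transformation pulls Poisson counts toward a bell shape:
  # Hypothetical illustration: square root of Poisson counts
  set.seed(47)
  counts <- rpois(100, lambda = 4)
  dev.new(); qqPlot(counts, add.line = TRUE)        # right-skewed
  dev.new(); qqPlot(sqrt(counts), add.line = TRUE)  # closer to a straight line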
Ideally, a data transformation is chosen based on knowledge of the process generating the data, as well as graphical tools such as quantile-quantile plots and histograms.
Box and Cox (1964) presented a formalized method for deciding on a data 
transformation.  Given a random variable X from some distribution with 
only positive values, the Box-Cox family of power transformations is defined as:
Y = \frac{X^\lambda - 1}{\lambda}, \;\; \lambda \ne 0
Y = log(X), \;\; \lambda = 0 \;\;\;\;\;\; (1)
where Y is assumed to come from a normal distribution.  This transformation is 
continuous in \lambda.  Note that this transformation also preserves ordering;  
that is, if X_1 < X_2 then Y_1 < Y_2.  
Box and Cox (1964) proposed choosing the appropriate value of \lambda 
based on maximizing a likelihood function.  See the help file for 
boxcox for details.
Note that for non-zero values of \lambda, instead of using the formula of 
Box and Cox in Equation (1), you may simply use the power transformation:
Y = X^\lambda \;\;\;\;\;\; (2)
since these two equations differ only by a scale difference and origin shift, and the essential character of the transformed distribution remains unchanged.
The value \lambda=1 corresponds to no transformation.  Values of 
\lambda less than 1 shrink large values of X, and are therefore 
useful for transforming positively-skewed (right-skewed) data.  Values of 
\lambda larger than 1 inflate large values of X, and are therefore 
useful for transforming negatively-skewed (left-skewed) data 
(Helsel and Hirsch, 1992, pp.13-14; Johnson and Wichern, 2007, p.193).  
Commonly used values of \lambda include 0 (log transformation), 
0.5 (square-root transformation), -1 (reciprocal), and -0.5 (reciprocal root).
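A minimal sketch of the transformation itself is shown below (the function name bc.sketch is illustrative; use boxcoxTransform for actual work):
  # Minimal sketch of the Box-Cox transformation in Equation (1)
  bc.sketch <- function(x, lambda, eps = .Machine$double.eps) {
    if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
  }
  # bc.sketch(x, 0.5) is a shifted and rescaled square-root transformation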
When dealing with several similar data sets, it is often recommended to find a common transformation that works reasonably well for all of them, rather than using slightly different transformations for each data set (Helsel and Hirsch, 1992, p.14; Shumway et al., 1989).
Value
numeric vector of transformed observations.
Note
Data transformations are often used to induce normality, homoscedasticity, and/or linearity, common assumptions of parametric statistical tests and estimation procedures. Transformations are not “tricks” used by the data analyst to hide what is going on, but rather useful tools for understanding and dealing with data (Berthouex and Brown, 2002, p.61). Hoaglin (1988) discusses “hidden” transformations that are used everyday, such as the pH scale for measuring acidity.
In the case of a linear model, there are at least two approaches to improving 
a model fit:  transform the Y and/or X variable(s), and/or use 
more predictor variables.  Often in environmental data analysis, we assume the 
observations come from a lognormal distribution and automatically take 
logarithms of the data.  For a simple linear regression 
(i.e., one predictor variable), if regression diagnostic plots indicate that a 
straight line fit is not adequate, but that the variance of the errors 
appears to be fairly constant, you may only need to transform the predictor 
variable X or perhaps use a quadratic or cubic model in X.  
On the other hand, if the diagnostic plots indicate that the constant 
variance and/or normality assumptions are suspect, you probably need to consider 
transforming the response variable Y.  Data transformations for 
linear regression models are discussed in Draper and Smith (1998, Chapter 13) 
and Helsel and Hirsch (1992, pp. 228-229).
One problem with data transformations is that translating results on the 
transformed scale back to the original scale is not always straightforward.  
Estimating quantities such as means, variances, and confidence limits in the 
transformed scale and then transforming them back to the original scale 
usually leads to biased and inconsistent estimates (Gilbert, 1987, p.149; 
van Belle et al., 2004, p.400).  For example, exponentiating the confidence 
limits for a mean based on log-transformed data does not yield a 
confidence interval for the mean on the original scale.  Instead, this yields 
a confidence interval for the median (see the help file for elnormAlt).  
It should be noted, however, that quantiles (percentiles) and rank-based 
procedures are invariant to monotonic transformations 
(Helsel and Hirsch, 1992, p.12).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and D.R. Cox. (1964). An Analysis of Transformations (with Discussion). Journal of the Royal Statistical Society, Series B 26(2), 211–252.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.47-53.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Hinkley, D.V., and G. Runger. (1984). The Analysis of Transformed Data (with Discussion). Journal of the American Statistical Association 79, 302–320.
Hoaglin, D.C., F.M. Mosteller, and J.W. Tukey, eds. (1983). Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, New York, Chapter 4.
Hoaglin, D.C. (1988). Transformations in Everyday Experience. Chance 1, 40–45.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, p.163.
Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ, pp.192–195.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Stoline, M.R. (1991). An Examination of the Lognormal and Box and Cox Family of Transformations in Fitting Environmental Data. Environmetrics 2(1), 85–106.
van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 13.
See Also
boxcox, Data Transformations, Goodness-of-Fit Tests.
Examples
  # Generate 30 observations from a lognormal distribution with 
  # mean=10 and cv=2, then look at some normal quantile-quantile 
  # plots for various transformations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rlnormAlt(30, mean = 10, cv = 2)
  dev.new() 
  qqPlot(x, add.line = TRUE)
  dev.new()
  qqPlot(boxcoxTransform(x, lambda = 0.5), add.line = TRUE) 
  dev.new()
  qqPlot(boxcoxTransform(x, lambda = 0), add.line = TRUE) 
  
  # Clean up
  #---------
  rm(x)
Fit a Calibration Line or Curve
Description
Fit a calibration line or curve based on linear regression.
Usage
  calibrate(formula, data, test.higher.orders = TRUE, max.order = 4, p.crit = 0.05,
    F.test = "partial", weights, subset, na.action, method = "qr", model = FALSE,
    x = FALSE, y = FALSE, contrasts = NULL, warn = TRUE, ...)
Arguments
| formula | a  | 
| data | an optional data frame, list or environment (or object coercible by  | 
| test.higher.orders | logical scalar indicating whether to start with a model that contains a single predictor
variable and test the fit of higher order polynomials to consider for the calibration
curve ( | 
| max.order | integer indicating the maximum order of the polynomial to consider for the
calibration curve.  The default value is  | 
| p.crit | numeric scalar between 0 and 1 indicating the p-value to use for the stepwise regression
when determining which polynomial model to use.  The default value is  | 
| F.test | character string indicating whether to perform the stepwise regression using the
standard partial F-test ( | 
| weights | optional vector of observation weights; if supplied, the algorithm fits to minimize the sum of the
weights multiplied into the squared residuals.  The length of weights must be the same as
the number of observations.  The weights must be nonnegative and it is strongly recommended
that they be strictly positive, since zero weights are ambiguous, compared to use of the
 | 
| subset | optional expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default. | 
| na.action | optional function which indicates what should happen when the data contain  | 
| method | optional method to be used; for fitting, currently only  | 
| model,x,y,qr | optional logicals. If  | 
| contrasts | an optional list. See the argument  | 
| warn | logical scalar indicating whether to issue a warning ( | 
| ... | additional arguments to be passed to the low level regression fitting functions
(see  | 
Details
A simple and frequently used calibration model is a straight line where the response variable S denotes the signal of the machine and the predictor variable C denotes the true concentration in the physical sample. The error term is assumed to follow a normal distribution with mean 0. Note that the average value of the signal for a blank (C = 0) is the intercept. Other possible calibration models include higher order polynomial models such as a quadratic or cubic model.
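Written out, the straight-line model described above is
S = \beta_0 + \beta_1 C + \epsilon, \;\;\;\;\;\; \epsilon \sim N(0, \sigma^2)
so that the intercept \beta_0 is the expected signal for a blank (C = 0).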
In a typical setup, a small number of samples (e.g., n = 6) with known concentrations are measured and the signal is recorded. A sample with no chemical in it, called a blank, is also measured. (You have to be careful to define exactly what you mean by a “blank.” A blank could mean a container from the lab that has nothing in it but is prepared in a similar fashion to containers with actual samples in them. Or it could mean a field blank: the container was taken out to the field and subjected to the same process that all other containers were subjected to, except a physical sample of soil or water was not placed in the container.) Usually, replicate measures at the same known concentrations are taken. (The term “replicate” must be well defined to distinguish between, for example, the same physical sample measured more than once vs. two different physical samples of the same known concentration.)
The function calibrate initially fits a linear calibration model.  If the argument
max.order is greater than 1, calibrate then performs forward stepwise linear
regression to determine the “best” polynomial model.
In the case where replicates are not available, calibrate uses standard stepwise
ANOVA to compare models (Draper and Smith, 1998, p.335). In this case, if the p-value
for the partial F-test to compare models is greater than or equal to p.crit, then
the model with fewer terms is used as the final model.
In the case where replicates are available, if F.test="lof", then for each model
calibrate computes the p-value of the ANOVA for lack-of-fit vs. pure error
(Draper and Smith, 1998, Chapter 2; see anovaPE).  If the p-value is
greater than or equal to p.crit, then this is the final model; otherwise the next
higher-order term is added to the polynomial and the model is re-fit.  If, during the
stepwise procedure, the degrees of freedom associated with the residual sums of squares
of a model to be tested is less than or equal to the number of observations minus the
number of unique observations, calibrate uses the partial F-test instead of the
lack-of-fit F-test.
The stepwise algorithm terminates when either the p-value is greater than or equal to
p.crit, or the currently selected model in the algorithm is of order
max.order.  The algorithm will terminate earlier than this if the next model to be
fit includes singularities so that not all coefficients can be estimated.
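As a rough illustration of the partial F-test comparison described above (not the full algorithm implemented by calibrate, which also handles the lack-of-fit test and singularity checks), two nested polynomial models can be compared directly with base R lm and anova.  The variable names below are those used in the Examples section:
  # Compare a linear and a quadratic calibration model with a partial F-test
  fit1 <- lm(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
  fit2 <- lm(Cadmium ~ Spike + I(Spike^2), data = EPA.97.cadmium.111.df)
  anova(fit1, fit2)  # if Pr(>F) >= p.crit (default 0.05), keep the simpler model
  rm(fit1, fit2)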
Value
An object of class "calibrate" that inherits from
class "lm" and includes a component called
x that stores the model matrix (the values of the predictor variables for the final
calibration model).
Note
Almost always the process of determining the concentration of a chemical in a soil,
water, or air sample involves using some kind of machine that produces a signal, and
this signal is related to the concentration of the chemical in the physical sample.
The process of relating the machine signal to the concentration of the chemical is
called calibration.  Once calibration has been performed, estimated
concentrations in physical samples with unknown concentrations are computed using
inverse regression (see inversePredictCalibrate).  The uncertainty
in the process used to estimate the concentration may be quantified with decision,
detection, and quantitation limits.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 3 and p.335.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley & Sons, Hoboken. Chapter 6, p. 111.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey. Chapter 3, p. 22.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.562-575.
See Also
calibrate.object, anovaPE,
inversePredictCalibrate,
detectionLimitCalibrate, lm.
Examples
  # The data frame EPA.97.cadmium.111.df contains calibration data for
  # cadmium at mass 111 (ng/L) that appeared in Gibbons et al. (1997b)
  # and were provided to them by the U.S. EPA.
  # Display a plot of these data along with the fitted calibration line
  # and 99% non-simultaneous prediction limits.  See
  # Millard and Neerchal (2001, pp.566-569) for more details on this
  # example.
  Cadmium <- EPA.97.cadmium.111.df$Cadmium
  Spike <- EPA.97.cadmium.111.df$Spike
  calibrate.list <- calibrate(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
  newdata <- data.frame(Spike = seq(min(Spike), max(Spike), len = 100))
  pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE)
  pointwise.list <- pointwise(pred.list, coverage = 0.99, individual = TRUE)
  dev.new()
  plot(Spike, Cadmium, ylim = c(min(pointwise.list$lower),
    max(pointwise.list$upper)), xlab = "True Concentration (ng/L)",
    ylab = "Observed Concentration (ng/L)")
  abline(calibrate.list, lwd = 2)
  lines(newdata$Spike, pointwise.list$lower, lty = 8, lwd = 2)
  lines(newdata$Spike, pointwise.list$upper, lty = 8, lwd = 2)
  title(paste("Calibration Line and 99% Prediction Limits",
    "for US EPA Cadmium 111 Data", sep = "\n"))
  #----------
  # Clean up
  #---------
  rm(Cadmium, Spike, newdata, calibrate.list, pred.list, pointwise.list)
  graphics.off()
S3 Class "calibrate"
Description
Objects of S3 class "calibrate" are returned by the EnvStats 
function calibrate, which fits a calibration line or curve based 
on linear regression.
Details
Objects of class "calibrate" are lists that inherit from 
class "lm" and include a component called 
x that stores the model matrix (the values of the predictor variables 
for the final calibration model).
Value
See the help file for lm.  
Required Components 
Besides the usual components in the list returned by the function lm, 
the following components must be included in a legitimate list of 
class "calibrate".
| x | the model matrix from the linear model fit. | 
Methods
Generic functions that have methods for objects of class 
"calibrate" include: 
NONE AT PRESENT.
Note
Since objects of class "calibrate" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
calibrate, inversePredictCalibrate, 
detectionLimitCalibrate.
Examples
  # Create an object of class "calibrate", then print it out.
  # The data frame EPA.97.cadmium.111.df contains calibration data for 
  # cadmium at mass 111 (ng/L) that appeared in Gibbons et al. (1997b) 
  # and were provided to them by the U.S. EPA. 
  calibrate.list <- calibrate(Cadmium ~ Spike, data = EPA.97.cadmium.111.df) 
 
  names(calibrate.list)
  calibrate.list
  #----------
  # Clean up
  #---------
  rm(calibrate.list)
Plot Two Cumulative Distribution Functions
Description
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.
Usage
  cdfCompare(x, y = NULL, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), plot.pos.con = NULL, 
    distribution = "norm", param.list = NULL, 
    estimate.params = is.null(param.list), est.arg.list = NULL, 
    x.col = "blue", y.or.fitted.col = "black", 
    x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"), 
    x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ..., 
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  Missing ( | 
| y | a numeric vector (not necessarily of the same length as  | 
| discrete | logical scalar indicating whether the assumed parent distribution of  | 
| prob.method | character string indicating what method to use to compute the plotting positions 
(empirical probabilities).  Possible values are 
 | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.  
When  | 
| distribution | when  | 
| param.list | when  | 
| estimate.params | when  | 
| est.arg.list | when  | 
| x.col | a numeric scalar or character string determining the color of the empirical cdf 
(based on  | 
| y.or.fitted.col | a numeric scalar or character string determining the color of the empirical cdf 
(based on  | 
| x.lwd | a numeric scalar determining the width of the empirical cdf (based on  | 
| y.or.fitted.lwd | a numeric scalar determining the width of the empirical cdf (based on  | 
| x.lty | a numeric scalar determining the line type of the empirical cdf 
(based on  | 
| y.or.fitted.lty | a numeric scalar determining the line type of the empirical cdf 
(based on  | 
| digits | when  | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
When both x and y are supplied, the function cdfCompare 
creates the empirical cdf plot of x and y on 
the same plot by calling the function ecdfPlot.
When y is not supplied, the function cdfCompare creates the 
empirical cdf plot of x (by calling ecdfPlot) and the 
theoretical cdf plot (by calling cdfPlot and using the 
argument distribution) on the same plot.
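For a rough idea of what the single-sample comparison produces, the following base R sketch (an approximation; cdfCompare itself uses plotting positions via ecdfPlot and cdfPlot) overlays the empirical cdf of the data with a normal cdf whose parameters are estimated from the data:
  set.seed(23)
  dat <- rnorm(20, mean = 10, sd = 2)
  plot(ecdf(dat), main = "Empirical and Fitted Normal CDF")
  curve(pnorm(x, mean = mean(dat), sd = sd(dat)), add = TRUE, lty = 2, lwd = 2)
  rm(dat)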
Value
When y is supplied, cdfCompare invisibly returns a list with 
components:
| x.ecdf.list | a list with components  | 
| y.ecdf.list | a list with components  | 
When y is not supplied, cdfCompare invisibly returns a list with 
components:
| x.ecdf.list | a list with components  | 
| fitted.cdf.list | a list with components  | 
Note
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the 
y-axis vs. the ecdf on the x-axis and call this a quantile plot.
Empirical cumulative distribution function (ecdf) plots are often plotted with 
theoretical cdf plots (see cdfPlot and cdfCompare) to 
graphically assess whether a sample of observations comes from a particular 
distribution.  The Kolmogorov-Smirnov goodness-of-fit test 
(see gofTest) is the statistical companion of this kind of 
comparison; it is based on the maximum vertical distance between the empirical 
cdf plot and the theoretical cdf plot.  More often, however, 
quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess 
departures from an assumed distribution (see qqPlot).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
See Also
cdfPlot, ecdfPlot, qqPlot, gofTest.
Examples
  # Generate 20 observations from a normal (Gaussian) distribution 
  # with mean=10 and sd=2 and compare the empirical cdf with a 
  # theoretical normal cdf that is based on estimating the parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rnorm(20, mean = 10, sd = 2) 
  dev.new()
  cdfCompare(x)
  #----------
  # Generate 30 observations from an exponential distribution with parameter 
  # rate=0.1 (see the R help file for Exponential) and compare the empirical 
  # cdf with the empirical cdf of the normal observations generated in the 
  # previous example:
  set.seed(432)
  y <- rexp(30, rate = 0.1) 
  dev.new()
  cdfCompare(x, y)
  #==========
  # Generate 20 observations from a Poisson distribution with parameter lambda=10 
  # (see the R help file for Poisson) and compare the empirical cdf with a 
  # theoretical Poisson cdf based on estimating the distribution parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rpois(20, lambda = 10) 
  dev.new()
  cdfCompare(x, dist = "pois")
  #==========
  # Clean up
  #---------
  rm(x, y)
  graphics.off()
Plot Two Cumulative Distribution Functions Based on Censored Data
Description
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.
Usage
  cdfCompareCensored(x, censored, censoring.side = "left",
    y = NULL, y.censored = NULL, y.censoring.side = censoring.side,
    discrete = FALSE, prob.method = "michael-schucany",
    plot.pos.con = NULL, distribution = "norm", param.list = NULL,
    estimate.params = is.null(param.list), est.arg.list = NULL,
    x.col = "blue", y.or.fitted.col = "black", x.lwd = 3 * par("cex"),
    y.or.fitted.lwd = 3 * par("cex"), x.lty = 1, y.or.fitted.lty = 2,
    include.x.cen = FALSE, x.cen.pch = ifelse(censoring.side == "left", 6, 2),
    x.cen.cex = par("cex"), x.cen.col = "red",
    include.y.cen = FALSE, y.cen.pch = ifelse(y.censoring.side == "left", 6, 2),
    y.cen.cex = par("cex"), y.cen.col = "black", digits = .Options$digits, ...,
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
    xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are
 | 
| y | a numeric vector (not necessarily of the same length as  | 
| y.censored | numeric or logical vector indicating which values of  This argument is ignored when  | 
| y.censoring.side | character string indicating on which side the censoring occurs for the values of
 | 
| discrete | logical scalar indicating whether the assumed parent distribution of  | 
| prob.method | character string indicating what method to use to compute the plotting positions (empirical probabilities).
Possible values are
 The  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.
When  | 
| distribution | when  | 
| param.list | when  | 
| estimate.params | when  | 
| est.arg.list | when  | 
| x.col | a numeric scalar or character string determining the color of the empirical cdf
(based on  | 
| y.or.fitted.col | a numeric scalar or character string determining the color of the empirical cdf
(based on  | 
| x.lwd | a numeric scalar determining the width of the empirical cdf (based on  | 
| y.or.fitted.lwd | a numeric scalar determining the width of the empirical cdf (based on  | 
| x.lty | a numeric scalar determining the line type of the empirical cdf
(based on  | 
| y.or.fitted.lty | a numeric scalar determining the line type of the empirical cdf
(based on  | 
| include.x.cen | logical scalar indicating whether to include censored values in  | 
| x.cen.pch | numeric scalar or character string indicating the plotting character to use to plot
censored values in  | 
| x.cen.cex | numeric scalar that determines the size of the plotting character used to plot
censored values in  | 
| x.cen.col | numeric scalar or character string that determines the color of the plotting
character used to plot censored values in  | 
| include.y.cen | logical scalar indicating whether to include censored values in  | 
| y.cen.pch | numeric scalar or character string indicating the plotting character to use to plot
censored values in  | 
| y.cen.cex | numeric scalar that determines the size of the plotting character used to plot
censored values in  | 
| y.cen.col | numeric scalar or character string that determines the color of the plotting
character used to plot censored values in  | 
| digits | when  | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
When both x and y are supplied, the function cdfCompareCensored
creates the empirical cdf plot of x and y on
the same plot by calling the function ecdfPlotCensored.
When y is not supplied, the function cdfCompareCensored creates the
empirical cdf plot of x (by calling ecdfPlotCensored) and the
theoretical cdf plot (by calling cdfPlot and using the
argument distribution) on the same plot.
Value
When y is supplied, cdfCompareCensored invisibly returns a list with
components:
| x.ecdf.list | a list with components  | 
| y.ecdf.list | a list with components  | 
When y is not supplied, cdfCompareCensored invisibly returns a list with
components:
| x.ecdf.list | a list with components  | 
| fitted.cdf.list | a list with components  | 
Note
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the
y-axis vs. the ecdf on the x-axis and call this a quantile plot.
Censored observations complicate the procedures used to graphically explore data.
Techniques from survival analysis and life testing have been developed to generalize
the procedures for constructing plotting positions, empirical cdf plots, and
q-q plots to data sets with censored observations
(see ppointsCensored).
Empirical cumulative distribution function (ecdf) plots are often plotted with
theoretical cdf plots to graphically assess whether a sample of observations
comes from a particular distribution.  More often, however, quantile-quantile
(Q-Q) plots are used instead of ecdf plots to graphically assess departures from
an assumed distribution (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
cdfPlot, ecdfPlotCensored, qqPlotCensored.
Examples
  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
  # censor all observations less than 18, then compare the empirical cdf with a
  # theoretical normal cdf that is based on estimating the parameters.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(333)
  x <- sort(rnorm(20, mean=20, sd=5))
  x
  # [1]  9.743551 12.370197 14.375499 15.628482 15.883507 17.080124
  # [7] 17.197588 18.097714 18.654182 19.585942 20.219308 20.268505
  #[13] 20.552964 21.388695 21.763587 21.823639 23.168039 26.165269
  #[19] 26.843362 29.673405
  censored <- x < 18
  x[censored] <- 18
  sum(censored)
  #[1] 7
  dev.new()
  cdfCompareCensored(x, censored)
  # Clean up
  #---------
  rm(x, censored)
  #==========
  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in
  # EPA.09.Ex.15.1.manganese.df.  Here we will compare the empirical
  # cdf based on Kaplan-Meier plotting positions or Michael-Schucany
  # plotting positions with various assumed distributions
  # (based on estimating the parameters of these distributions):
  # 1) normal distribution
  # 2) lognormal distribution
  # 3) gamma distribution
  # First look at the data:
  #------------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #4       4 Well.1               21.6          21.6    FALSE
  #5       5 Well.1                 <2           2.0     TRUE
  #...
  #21      1 Well.5               17.9          17.9    FALSE
  #22      2 Well.5               22.7          22.7    FALSE
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Assume a normal distribution
  #-----------------------------
  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored))
  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored,
      prob.method = "kaplan-meier"))
  # Assume a lognormal distribution
  #--------------------------------
  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm"))
  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm",
      prob.method = "kaplan-meier"))
  # Assume a gamma distribution
  #----------------------------
  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma"))
  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma",
      prob.method = "kaplan-meier"))
  # Clean up
  #---------
  graphics.off()
  #==========
  # Compare the distributions of copper and zinc between the Alluvial Fan Zone
  # and the Basin-Trough Zone using the data of Millard and Deverel (1988).
  # The data are stored in Millard.Deverel.88.df.
  Millard.Deverel.88.df
  #    Cu.orig Cu Cu.censored Zn.orig  Zn Zn.censored         Zone Location
  #1       < 1  1        TRUE     <10  10        TRUE Alluvial.Fan        1
  #2       < 1  1        TRUE       9   9       FALSE Alluvial.Fan        2
  #3         3  3       FALSE      NA  NA       FALSE Alluvial.Fan        3
  #.
  #.
  #.
  #116       5  5       FALSE      50  50       FALSE Basin.Trough       48
  #117      14 14       FALSE      90  90       FALSE Basin.Trough       49
  #118       4  4       FALSE      20  20       FALSE Basin.Trough       50
  Cu.AF <- with(Millard.Deverel.88.df,
    Cu[Zone == "Alluvial.Fan"])
  Cu.AF.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Alluvial.Fan"])
  Cu.BT <- with(Millard.Deverel.88.df,
    Cu[Zone == "Basin.Trough"])
  Cu.BT.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Basin.Trough"])
  Zn.AF <- with(Millard.Deverel.88.df,
    Zn[Zone == "Alluvial.Fan"])
  Zn.AF.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Alluvial.Fan"])
  Zn.BT <- with(Millard.Deverel.88.df,
    Zn[Zone == "Basin.Trough"])
  Zn.BT.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Basin.Trough"])
  # First compare the copper concentrations
  #----------------------------------------
  dev.new()
  cdfCompareCensored(x = Cu.AF, censored = Cu.AF.cen,
    y = Cu.BT, y.censored = Cu.BT.cen)
  # Now compare the zinc concentrations
  #------------------------------------
  dev.new()
  cdfCompareCensored(x = Zn.AF, censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen)
  # Compare the Zinc concentrations again, but delete
  # the one "outlier".
  #--------------------------------------------------
  summaryStats(Zn.AF)
  #       N    Mean      SD Median Min Max NA's N.Total
  #Zn.AF 67 23.5075 74.4192     10   3 620    1      68
  summaryStats(Zn.BT)
  #       N  Mean      SD Median Min Max
  #Zn.BT 50 21.94 18.7044   18.5   3  90
  which(Zn.AF == 620)
  #[1] 38
  summaryStats(Zn.AF[-38])
  #            N    Mean     SD Median Min Max NA's N.Total
  #Zn.AF[-38] 66 14.4697 8.1604     10   3  50    1      67
  dev.new()
  cdfCompareCensored(x = Zn.AF[-38], censored = Zn.AF.cen[-38],
    y = Zn.BT, y.censored = Zn.BT.cen)
  #----------
  # Clean up
  #---------
  rm(Cu.AF, Cu.AF.cen, Cu.BT, Cu.BT.cen,
     Zn.AF, Zn.AF.cen, Zn.BT, Zn.BT.cen)
  graphics.off()
Plot Cumulative Distribution Function
Description
Produce a cumulative distribution function (cdf) plot for a user-specified distribution.
Usage
  cdfPlot(distribution = "norm", param.list = list(mean = 0, sd = 1), 
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, 0.001), 
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, 0.001), plot.it = TRUE, 
    add = FALSE, n.points = 1000, cdf.col = "black", cdf.lwd = 3 * par("cex"), 
    cdf.lty = 1, curve.fill = FALSE, curve.fill.col = "cyan", 
    digits = .Options$digits, ..., type = ifelse(discrete, "s", "l"), 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)
Arguments
| distribution | a character string denoting the distribution abbreviation.  The default value is 
 | 
| param.list | a list with values for the parameters of the distribution.  The default value is 
 | 
| left.tail.cutoff | a numeric scalar indicating what proportion of the left-tail of the probability 
distribution to omit from the plot.  For densities with a finite support minimum 
(e.g., Lognormal) the default value is  | 
| right.tail.cutoff | a scalar indicating what proportion of the right-tail of the probability 
distribution to omit from the plot.  For densities with a finite support maximum 
(e.g., Binomial) the default value is  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the cumulative distribution function curve 
to the existing plot ( | 
| n.points | a numeric scalar specifying at how many evenly-spaced points the cumulative 
distribution function will be evaluated.  The default value is  | 
| cdf.col | a numeric scalar or character string determining 
the color of the cdf line in the plot.  
The default value is  | 
| cdf.lwd | a numeric scalar determining the width of the cdf 
line in the plot.  
The default value is  | 
| cdf.lty | a numeric scalar determining the line type of 
the cdf line in the plot.  
The default value is  | 
| curve.fill | a logical value indicating whether to fill in 
the area below the cumulative distribution function curve with the color specified by 
 | 
| curve.fill.col | when  | 
| digits | a scalar indicating how many significant digits to print for the distribution 
parameters.  The default value is  | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
The cumulative distribution function (cdf) of a random variable X, 
usually denoted F, is defined as:
F(x) = Pr(X \le x) \;\;\;\;\;\; (1)
That is, F(x) is the probability that X is less than or equal to 
x.  This is the probability that the random variable X takes on a 
value in the interval (-\infty, x] and is simply the (Lebesgue) integral of 
the pdf evaluated between -\infty and x. That is,
F(x) = Pr(X \le x) = \int_{-\infty}^x f(t) dt \;\;\;\;\;\; (2)
where f(t) denotes the probability density function of X 
evaluated at t.  For discrete distributions, Equation (2) translates to 
summing up the probabilities of all values in this interval:
F(x) = Pr(X \le x) = \sum_{t \in (-\infty,x]} f(t) = \sum_{t \in (-\infty,x]} Pr(X = t) \;\;\;\;\;\; (3)
A cumulative distribution function (cdf) plot plots the values of the cdf against quantiles of the specified distribution. Theoretical cdf plots are sometimes plotted along with empirical cdf plots to visually assess whether data have a particular distribution.
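For example, Equations (1) and (3) can be evaluated directly in R for a continuous and a discrete distribution:
  pnorm(1.645)                  # Pr(X <= 1.645) for X ~ N(0,1), about 0.95
  ppois(3, lambda = 2)          # Pr(X <= 3) for X ~ Poisson(2)
  sum(dpois(0:3, lambda = 2))   # same value, summing the pmf as in Equation (3)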
Value
cdfPlot invisibly returns a list giving coordinates of the points 
that have been or would have been plotted:
| Quantiles | The quantiles used for the plot. | 
| Cumulative.Probabilities | The values of the cdf associated with the quantiles. | 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Distribution.df, ecdfPlot, cdfCompare, 
pdfPlot.
Examples
  # Plot the cdf of the standard normal distribution 
  #-------------------------------------------------
  dev.new()
  cdfPlot()
  #==========
  # Plot the cdf of the standard normal distribution
  # and a N(2, 2) distribution on the same plot. 
  #-------------------------------------------------
  dev.new()
  cdfPlot(param.list = list(mean=2, sd=2), main = "") 
  cdfPlot(add = TRUE, cdf.col = "red") 
  legend("topleft", legend = c("N(2,2)", "N(0,1)"), 
    col = c("black", "red"), lwd = 3 * par("cex")) 
  title("CDF Plots for Two Normal Distributions")
 
  #==========
  # Clean up
  #---------
  graphics.off()
Chen's Modified One-Sided t-test for Skewed Distributions
Description
For a skewed distribution, estimate the mean, standard deviation, and skew; test the null hypothesis that the mean is equal to a user-specified value vs. a one-sided alternative; and create a one-sided confidence interval for the mean.
Usage
  chenTTest(x, y = NULL, alternative = "greater", mu = 0, paired = !is.null(y), 
    conf.level = 0.95, ci.method = "z")
Arguments
| x | numeric vector of observations.  Missing ( | 
| y | optional numeric vector of observations that are paired with the observations in 
 | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| mu | numeric scalar indicating the hypothesized value of the mean.  The default value is 
 | 
| paired | character string indicating whether to perform a paired or one-sample t-test.  The 
possible values are  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with the 
confidence interval for the population mean.  The default value is  | 
| ci.method | character string indicating which critical value to use to construct the confidence 
interval for the mean.  The possible values are  | 
Details
One-Sample Case (paired=FALSE) 
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n independent 
and identically distributed (i.i.d.) observations from some distribution with mean 
\mu and standard deviation \sigma.
Background: The Conventional Student's t-Test 
Assume that the n observations come from a normal (Gaussian) distribution, and 
consider the test of the null hypothesis:
H_0: \mu = \mu_0 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \mu > \mu_0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less"):
H_a: \mu < \mu_0 \;\;\;\;\;\; (3)
and the two-sided alternative:
H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)
The test of the null hypothesis (1) versus any of the three alternatives (2)-(4) is usually based on the Student t-statistic:
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \;\;\;\;\;\; (5)
where
\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \;\;\;\;\;\; (6)
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (7)
(see the R help file for t.test).  Under the null hypothesis (1), 
the t-statistic in (5) follows a Student's t-distribution with 
n-1 degrees of freedom (Zar, 2010, p.99; Johnson et al., 1995, pp.362-363).  
The t-statistic is fairly robust to departures from normality in terms of 
maintaining Type I error and power, provided that the sample size is sufficiently 
large.
Chen's Modified t-Test for Skewed Distributions 
In the case when the underlying distribution of the n observations is 
positively skewed and the sample size is small, the sampling distribution of the 
t-statistic under the null hypothesis (1) does not follow a Student's t-distribution, 
but is instead negatively skewed.  For the test against the upper alternative in (2) 
above, this leads to a Type I error smaller than the one assumed and a loss of power 
(Chen, 1995b, p.767).
Similarly, in the case when the underlying distribution of the n observations 
is negatively skewed and the sample size is small, the sampling distribution of the 
t-statistic is positively skewed.  For the test against the lower alternative in (3) 
above, this also leads to a Type I error smaller than the one assumed and a loss of 
power.
In order to overcome these problems, Chen (1995b) proposed the following modified t-statistic that takes into account the skew of the underlying distribution:
t_2 = t + a(1 + 2t^2) + 4a^2(t + 2t^3) \;\;\;\;\;\; (8)
where
a = \frac{\sqrt{\hat{\beta}_1}}{6\sqrt{n}} \;\;\;\;\;\; (9)
\sqrt{\hat{\beta}_1} = \frac{\hat{\mu}_3}{\hat{\sigma}^3} \;\;\;\;\;\; (10)
\hat{\mu}_3 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^n (x_i - \bar{x})^3 \;\;\;\;\;\; (11)
\hat{\sigma}^3 = s^3 = [\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2]^{3/2} \;\;\;\;\;\; (12)
Note that the quantity \sqrt{\hat{\beta}_1} in (9) is an estimate of 
the skew of the underlying distribution and is based on unbiased estimators of 
central moments (see the help file for skewness).
For a positively-skewed distribution, Chen's modified t-test rejects the null hypothesis (1) in favor of the upper one-sided alternative (2) if the t-statistic in (8) is too large. For a negatively-skewed distribution, Chen's modified t-test rejects the null hypothesis (1) in favor of the lower one-sided alternative (3) if the t-statistic in (8) is too small.
Chen's modified t-test is not applicable to testing the two-sided alternative 
(4).  It should also not be used to test the upper one-sided alternative (2) 
based on negatively-skewed data, nor should it be used to test the lower one-sided 
alternative (3) based on positively-skewed data.
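As a minimal sketch of Equations (8) through (12), computed by hand with base R (using the data set from the Examples section below; chenTTest performs these computations, along with the p-value and confidence-interval calculations, for you):
  x <- EPA.02d.Ex.9.mg.per.L.vec
  n <- length(x)
  mu0 <- 30
  t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))          # Equation (5)
  mu3 <- n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3)  # Equation (11)
  skew <- mu3 / sd(x)^3                                  # Equation (10)
  a <- skew / (6 * sqrt(n))                              # Equation (9)
  t2 <- t.stat + a * (1 + 2 * t.stat^2) +
    4 * a^2 * (t.stat + 2 * t.stat^3)                    # Equation (8)
  t2                             # about 1.57, matching the Examples below
  pnorm(t2, lower.tail = FALSE)  # z-based p-value, about 0.058
  rm(x, n, mu0, t.stat, mu3, skew, a, t2)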
Determination of Critical Values and p-Values 
Chen (1995b) performed a simulation study in which the modified t-statistic in (8) 
was compared to a critical value based on the normal distribution (z-value), 
a critical value based on Student's t-distribution (t-value), and the average of the 
critical z-value and t-value.  Based on the simulation study, Chen (1995b) suggests 
using either the z-value or average of the z-value and t-value when n 
(the sample size) is small (e.g., n \le 10) or \alpha (the Type I error) 
is small (e.g. \alpha \le 0.01), and using either the t-value or the average 
of the z-value and t-value when n \ge 20 or \alpha \ge 0.05.
The function chenTTest returns three different p-values:  one based on the 
normal distribution, one based on Student's t-distribution, and one based on the 
average of these two p-values.  This last p-value should roughly correspond to a 
p-value based on the distribution of the average of a normal and Student's t 
random variable.
Computing Confidence Intervals 
The function chenTTest computes a one-sided confidence interval for the true 
mean \mu based on finding all possible values of \mu for which the null 
hypothesis (1) will not be rejected, with the confidence level determined by the 
argument conf.level.  The argument ci.method determines which p-value 
is used in the algorithm to determine the bounds on \mu.  When 
ci.method="z", the p-value is based on the normal distribution, when 
ci.method="t", the p-value is based on Student's t-distribution, and when 
ci.method="Avg. of z and t" the p-value is based on the average of the 
p-values based on the normal and Student's t-distribution.
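A rough sketch of this test-inversion idea for ci.method="z" and alternative="greater": the lower confidence bound is the value of the hypothesized mean at which the z-based p-value equals 1 minus conf.level.  chenTTest computes this for you; the sketch below (the same computation as the sketch above, wrapped in a function of mu0) only illustrates the principle using the data set from the Examples:
  chen.p.z <- function(mu0, x) {
    # z-based p-value of Chen's test of H0: mean = mu0 vs. the upper alternative
    n <- length(x)
    t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
    skew <- (n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3)) / sd(x)^3
    a <- skew / (6 * sqrt(n))
    t2 <- t.stat + a * (1 + 2 * t.stat^2) + 4 * a^2 * (t.stat + 2 * t.stat^3)
    pnorm(t2, lower.tail = FALSE)
  }
  x <- EPA.02d.Ex.9.mg.per.L.vec
  uniroot(function(mu0) chen.p.z(mu0, x) - 0.05,
    interval = range(x))$root    # about 29.8, the LCL reported in the Examples
  rm(chen.p.z, x)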
Paired-Sample Case (paired=TRUE) 
When the argument paired=TRUE, the arguments x and y are assumed 
to have the same length, and the n differences
d_i = x_i - y_i, \;\; i = 1, 2, \ldots, n
are assumed to be i.i.d. observations from some distribution with mean \mu 
and standard deviation \sigma.  Chen's modified t-test can then be applied 
to the differences.
Value
a list of class "htestEnvStats" containing the results of the hypothesis test.  See 
the help file for htestEnvStats.object for details.
Note
The presentation of Chen's (1995b) method in USEPA (2002d) and Singh et al. (2010b, p. 52) is incorrect for two reasons: it is based on an intermediate formula instead of the actual statistic that Chen proposes, and it uses the intermediate formula to compute an upper confidence limit for the mean when the sample data are positively skewed. As explained above, for the case of positively skewed data, Chen's method is appropriate to test the upper one-sided alternative hypothesis that the population mean is greater than some specified value, and a one-sided upper alternative corresponds to creating a one-sided lower confidence limit, not an upper confidence limit (see, for example, Millard and Neerchal, 2001, p. 371).
A frequent question in environmental statistics is “Is the concentration of chemical X greater than Y units?” For example, in groundwater assessment (compliance) monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well may be compared to a groundwater protection standard (GWPS). If the concentration is “above” the GWPS, then the site enters corrective action monitoring. As another example, soil screening at a Superfund site involves comparing the concentration of a chemical in the soil with a pre-determined soil screening level (SSL). If the concentration is “above” the SSL, then further investigation and possible remedial action is required. Determining what it means for the chemical concentration to be “above” a GWPS or an SSL is a policy decision: the average of the distribution of the chemical concentration must be above the GWPS or SSL, or the median must be above the GWPS or SSL, or the 95th percentile must be above the GWPS or SSL, or something else. Often, the first interpretation is used.
The regulatory guidance document Soil Screening Guidance: Technical Background Document (USEPA, 1996c, Part 4) recommends using Chen's t-test as one possible method to compare chemical concentrations in soil samples to a soil screening level (SSL). The document notes that the distribution of chemical concentrations will almost always be positively-skewed, but not necessarily fit a lognormal distribution well (USEPA, 1996c, pp.107, 117-119). It also notes that using a confidence interval based on Land's (1971) method is extremely sensitive to the assumption of a lognormal distribution, while Chen's test is robust with respect to maintaining Type I and Type II errors for a variety of positively-skewed distributions (USEPA, 1996c, pp.99, 117-119, 123-125).
Hypothesis tests you can use to perform tests of location include: Student's t-test, Fisher's randomization test, the Wilcoxon signed rank test, Chen's modified t-test, the sign test, and a test based on a bootstrap confidence interval. For a discussion comparing the performance of these tests, see Millard and Neerchal (2001, pp.408–409).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chen, L. (1995b). Testing the Mean of Skewed Distributions. Journal of the American Statistical Association 90(430), 767–772.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 28, 31.
Land, C.E. (1971). Confidence Intervals for Linear Functions of the Normal Mean and Variance. The Annals of Mathematical Statistics 42(4), 1187–1205.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.402–404.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (1996c). Soil Screening Guidance: Technical Background Document. EPA/540/R-95/128, PB96963502. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., May, 1996.
USEPA. (2002d). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
t.test, skewness, htestEnvStats.object.
Examples
  # The guidance document "Calculating Upper Confidence Limits for 
  # Exposure Point Concentrations at Hazardous Waste Sites" 
  # (USEPA, 2002d, Exhibit 9, p. 16) contains an example of 60 observations 
  # from an exposure unit.  Here we will use Chen's modified t-test to test 
  # the null hypothesis that the average concentration is less than 30 mg/L 
  # versus the alternative that it is greater than 30 mg/L.
  # In EnvStats these data are stored in the vector EPA.02d.Ex.9.mg.per.L.vec.
  sort(EPA.02d.Ex.9.mg.per.L.vec)
  # [1]  16  17  17  17  18  18  20  20  20  21  21  21  21  21  21  22
  #[17]  22  22  23  23  23  23  24  24  24  25  25  25  25  25  25  26
  #[33]  26  26  26  27  27  28  28  28  28  29  29  30  30  31  32  32
  #[49]  32  33  33  35  35  97  98 105 107 111 117 119
  dev.new()
  hist(EPA.02d.Ex.9.mg.per.L.vec, col = "cyan", xlab = "Concentration (mg/L)")
  # The Shapiro-Wilk goodness-of-fit test rejects the null hypothesis of a 
  # normal, lognormal, and gamma distribution:
  gofTest(EPA.02d.Ex.9.mg.per.L.vec)$p.value
  #[1] 2.496781e-12
  gofTest(EPA.02d.Ex.9.mg.per.L.vec, dist = "lnorm")$p.value
  #[1] 3.349035e-09
  gofTest(EPA.02d.Ex.9.mg.per.L.vec, dist = "gamma")$p.value
  #[1] 1.564341e-10
  # Use Chen's modified t-test to test the null hypothesis that
  # the average concentration is less than 30 mg/L versus the 
  # alternative that it is greater than 30 mg/L.
  chenTTest(EPA.02d.Ex.9.mg.per.L.vec, mu = 30)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mean = 30
  #
  #Alternative Hypothesis:          True mean is greater than 30
  #
  #Test Name:                       One-sample t-Test
  #                                 Modified for
  #                                 Positively-Skewed Distributions
  #                                 (Chen, 1995)
  #
  #Estimated Parameter(s):          mean = 34.566667
  #                                 sd   = 27.330598
  #                                 skew =  2.365778
  #
  #Data:                            EPA.02d.Ex.9.mg.per.L.vec
  #
  #Sample Size:                     60
  #
  #Test Statistic:                  t = 1.574075
  #
  #Test Statistic Parameter:        df = 59
  #
  #P-values:                        z               = 0.05773508
  #                                 t               = 0.06040889
  #                                 Avg. of z and t = 0.05907199
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Based on z
  #
  #Confidence Interval Type:        Lower
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 29.82
  #                                 UCL =   Inf
  # The estimated mean, standard deviation, and skew are 35, 27, and 2.4, 
  # respectively.  The p-value is 0.06, and the lower 95% confidence interval 
  # is [29.8, Inf).  Depending on what you use for your Type I error rate, you 
  # may or may not want to reject the null hypothesis.
Half-Width of Confidence Interval for Binomial Proportion or Difference Between Two Proportions
Description
Compute the half-width of a confidence interval for a binomial proportion or the difference between two proportions, given the sample size(s), estimated proportion(s), and confidence level.
Usage
  ciBinomHalfWidth(n.or.n1, p.hat.or.p1.hat = 0.5, 
    n2 = n.or.n1, p2.hat = 0.4, conf.level = 0.95, 
    sample.type = "one.sample", ci.method = "score", 
    correct = TRUE, warn = TRUE)
Arguments
| n.or.n1 | numeric vector of sample sizes.   | 
| p.hat.or.p1.hat | numeric vector of estimated proportions.   | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of  | 
| p2.hat | numeric vector of estimated proportions for group 2. 
This argument is ignored when  | 
| conf.level | numeric vector of numbers between 0 and 1 indicating the confidence level associated with 
the confidence interval(s).  The default value is  | 
| sample.type | character string indicating whether this is a one-sample or two-sample confidence interval.  
When  | 
| ci.method | character string indicating which method to use to construct the confidence interval.  
Possible values are  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| warn | logical scalar indicating whether to issue a warning when  | 
Details
If the arguments n.or.n1, p.hat.or.p1.hat, n2, p2.hat, and 
conf.level are not all the same length, they are replicated to be the same length as 
the length of the longest argument.  
The values of p.hat.or.p1.hat and p2.hat are automatically adjusted 
to the closest legitimate values, given the user-supplied values of n.or.n1 and 
n2.  For example, if n.or.n1=5, legitimate values for 
p.hat.or.p1.hat are 0, 0.2, 0.4, 0.6, 0.8 and 1.  In this case, if the 
user supplies p.hat.or.p1.hat=0.45, then p.hat.or.p1.hat is reset to 
p.hat.or.p1.hat=0.4, and if the user supplies p.hat.or.p1.hat=0.55, 
then p.hat.or.p1.hat is reset to p.hat.or.p1.hat=0.6.  In cases where 
the two closest legitimate values are equidistant from the user-supplied value of 
p.hat.or.p1.hat or p2.hat, the value closest to 0.5 is chosen since 
that will tend to yield the wider confidence interval.
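A minimal sketch of this adjustment with base R (the tie-breaking toward 0.5 described above is handled by ciBinomHalfWidth itself and is not reproduced by a simple round()):
  # With n = 5, the legitimate values of p.hat are multiples of 1/5
  n <- 5
  round(0.45 * n) / n   # 0.4, the value used in place of 0.45
  round(0.55 * n) / n   # 0.6, the value used in place of 0.55
  rm(n)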
One-Sample Case (sample.type="one.sample").  
- ci.method="score"
- The confidence interval for p based on the score method was developed by Wilson (1927) and is discussed by Newcombe (1998a), Agresti and Coull (1998), and Agresti and Caffo (2000). When ci=TRUE and ci.method="score", the function ebinom calls the R function prop.test to compute the confidence interval. This method has been shown to provide the best performance (in terms of actual coverage matching assumed coverage) of all the methods provided here, although unlike the exact method, the actual coverage can fall below the assumed coverage.
- ci.method="exact"
- The confidence interval for p based on the exact (Clopper-Pearson) method is discussed by Newcombe (1998a), Agresti and Coull (1998), and Zar (2010, pp.543-547). This is the method used in the R function binom.test. This method ensures the actual coverage is greater than or equal to the assumed coverage.
- ci.method="Wald"
- The confidence interval for p based on the Wald method (with or without a correction for continuity) is the usual “normal approximation” method and is discussed by Newcombe (1998a), Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This method is never recommended but is included for historical purposes.
- ci.method="adjusted Wald"
- The confidence interval for p based on the adjusted Wald method is discussed by Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This is a simple modification of the Wald method and performs surprisingly well.
Two-Sample Case (sample.type="two.sample").  
- ci.method="score"
- This method is presented in Newcombe (1998b) and is based on the score method developed by Wilson (1927) for the one-sample case. This is the method used by the R function prop.test. In a comparison of 11 methods, Newcombe (1998b) showed this method performs remarkably well.
- ci.method="Wald"
- The confidence interval for the difference between two proportions based on the Wald method (with or without a correction for continuity) is the usual “normal approximation” method and is discussed by Newcombe (1998b), Agresti and Caffo (2000), and Zar (2010, pp.549-552). This method is not recommended but is included for historical purposes. 
- ci.method="adjusted Wald"
- This method is discussed by Agresti and Caffo (2000), and Zar (2010, pp.549-552). This is a simple modification of the Wald method and performs surprisingly well. 
Value
a list with information about the half-widths, sample sizes, and estimated proportions.
One-Sample Case (sample.type="one.sample"). 
  
When sample.type="one.sample", the function ciBinomHalfWidth 
returns a list with these components:
| half.width | the half-width(s) of the confidence interval(s) | 
| n | the sample size(s) associated with the confidence interval(s) | 
| p.hat | the estimated proportion(s) | 
| method | the method used to construct the confidence interval(s) | 
Two-Sample Case (sample.type="two.sample").  
 
When sample.type="two.sample", the function ciBinomHalfWidth 
returns a list with these components:
| half.width | the half-width(s) of the confidence interval(s) | 
| n1 | the sample size(s) for group 1 associated with the confidence interval(s) | 
| p1.hat | the estimated proportion(s) for group 1 | 
| n2 | the sample size(s) for group 2 associated with the confidence interval(s) | 
| p2.hat | the estimated proportion(s) for group 2 | 
| method | the method used to construct the confidence interval(s) | 
Note
The binomial distribution is used to model processes with binary 
(Yes-No, Success-Failure, Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any 
one trial is independent of any other trial, and that the probability of “success”, p, 
is the same on each trial.  A binomial discrete random variable X is the number of 
“successes” in n independent trials.  A special case of the binomial distribution 
occurs when n=1, in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model 
the proportion of times a chemical concentration exceeds a set standard in a given period of time 
(e.g., Gilbert, 1987, p.143), or to compare the proportion of detects in a compliance well vs. a 
background well (e.g., USEPA, 1989b, Chapter 8, p.3-7).  (However, USEPA 2009, p.8-27 
recommends using the Wilcoxon rank sum test (wilcox.test) instead of 
comparing proportions.)
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives of 
the sampling program is to produce confidence intervals.  The functions ciBinomHalfWidth, 
ciBinomN, and plotCiBinomDesign can be used to investigate these 
relationships for the case of binomial proportions.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Newcombe, R.G. (1998b). Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine, 17, 873–890.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
ciBinomN, plotCiBinomDesign, 
ebinom, binom.test, prop.test.
Examples
  # Look at how the half-width of a one-sample confidence interval 
  # decreases with sample size:
  ciBinomHalfWidth(n.or.n1 = c(10, 50, 100, 500))
  #$half.width
  #[1] 0.26340691 0.13355486 0.09616847 0.04365873
  #
  #$n
  #[1]  10  50 100 500
  #
  #$p.hat
  #[1] 0.5 0.5 0.5 0.5
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------------------------------------------------------------
  # Look at how the half-width of a one-sample confidence interval 
  # tends to decrease as the estimated value of p decreases below 
  # 0.5 or increases above 0.5:
  seq(0.2, 0.8, by = 0.1) 
  #[1] 0.2 0.3 0.4 0.5 0.6 0.7 0.8 
  ciBinomHalfWidth(n.or.n1 = 30, p.hat = seq(0.2, 0.8, by = 0.1)) 
  #$half.width
  #[1] 0.1536299 0.1707256 0.1801322 0.1684587 0.1801322 0.1707256 
  #[7] 0.1536299
  #
  #$n
  #[1] 30 30 30 30 30 30 30
  #
  #$p.hat
  #[1] 0.2 0.3 0.4 0.5 0.6 0.7 0.8
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------------------------------------------------------------
  # Look at how the half-width of a one-sample confidence interval 
  # increases with increasing confidence level:
  ciBinomHalfWidth(n.or.n1 = 20, conf.level = c(0.8, 0.9, 0.95, 0.99)) 
  #$half.width
  #[1] 0.1377380 0.1725962 0.2007020 0.2495523
  #
  #$n
  #[1] 20 20 20 20
  #
  #$p.hat
  #[1] 0.5 0.5 0.5 0.5
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------------------------------------------------------------
  # Compare the half-widths for a one-sample 
  # confidence interval based on the different methods:
  ciBinomHalfWidth(n.or.n1 = 30, ci.method = "score")$half.width
  #[1] 0.1684587
  ciBinomHalfWidth(n.or.n1 = 30, ci.method = "exact")$half.width
  #[1] 0.1870297
 
  ciBinomHalfWidth(n.or.n1 = 30, ci.method = "adjusted Wald")$half.width
  #[1] 0.1684587
  ciBinomHalfWidth(n.or.n1 = 30, ci.method = "Wald")$half.width
  #[1] 0.1955861
  #----------------------------------------------------------------
  # Look at how the half-width of a two-sample 
  # confidence interval decreases with increasing 
  # sample sizes:
  ciBinomHalfWidth(n.or.n1 = c(10, 50, 100, 500), sample.type = "two")
  #$half.width
  #[1] 0.53385652 0.21402654 0.14719748 0.06335658
  #
  #$n1
  #[1]  10  50 100 500
  #
  #$p1.hat
  #[1] 0.5 0.5 0.5 0.5
  #
  #$n2
  #[1]  10  50 100 500
  #
  #$p2.hat
  #[1] 0.4 0.4 0.4 0.4
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
Sample Size for Specified Half-Width of Confidence Interval for Binomial Proportion or Difference Between Two Proportions
Description
Compute the sample size necessary to achieve a specified half-width of a confidence interval for a binomial proportion or the difference between two proportions, given the estimated proportion(s) and confidence level.
Usage
  ciBinomN(half.width, p.hat.or.p1.hat = 0.5, p2.hat = 0.4, 
    conf.level = 0.95, sample.type = "one.sample", ratio = 1, 
    ci.method = "score", correct = TRUE, warn = TRUE, 
    n.or.n1.min = 2, n.or.n1.max = 10000, 
    tol.half.width = 5e-04, tol.p.hat = 5e-04, 
    tol = 1e-7, maxiter = 1000)
Arguments
| half.width | numeric vector of (positive) half-widths.  
Missing ( | 
| p.hat.or.p1.hat | numeric vector of estimated proportions.   | 
| p2.hat | numeric vector of estimated proportions for group 2. 
This argument is ignored when  | 
| conf.level | numeric vector of numbers between 0 and 1 indicating the confidence level associated with 
the confidence interval(s).  The default value is  | 
| sample.type | character string indicating whether this is a one-sample or two-sample confidence interval.  | 
| ratio | numeric vector indicating the ratio of sample size in group 2 to 
sample size in group 1 ( | 
| ci.method | character string indicating which method to use to construct the confidence interval. Possible values are: 
 The exact method is only available for the one-sample case, i.e., when  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| warn | logical scalar indicating whether to issue a warning when  | 
| n.or.n1.min | integer indicating the minimum allowed value for  | 
| n.or.n1.max | integer indicating the maximum allowed value for  | 
| tol.half.width | numeric scalar indicating the tolerance to use for the half width for
the search algorithm.  The sample sizes are computed so that the actual 
half width is less than or equal to  | 
| tol.p.hat | numeric scalar indicating the tolerance to use for the estimated 
proportion(s) for the search algorithm.  
For the one-sample case, the sample sizes are computed so that 
the absolute value of the difference between the user supplied 
value of  | 
| tol | positive scalar indicating the tolerance to use for the search algorithm 
(passed to  | 
| maxiter | integer indicating the maximum number of iterations to use for 
the search algorithm (passed to  | 
Details
If the arguments half.width, p.hat.or.p1.hat, p2.hat, 
conf.level and ratio are not all the same length, they are 
replicated to be the same length as the length of the longest argument.  
For the one-sample case, the arguments p.hat.or.p1.hat, tol.p.hat, half.width, and tol.half.width must satisfy:
(p.hat.or.p1.hat + tol.p.hat + half.width + tol.half.width) <= 1, and
(p.hat.or.p1.hat - tol.p.hat - half.width - tol.half.width) >= 0.
For the two-sample case, the arguments p.hat.or.p1.hat, p2.hat, tol.p.hat, half.width, and tol.half.width must satisfy:
((p.hat.or.p1.hat + tol.p.hat) - (p2.hat - tol.p.hat) + half.width + tol.half.width) <= 1, and
((p.hat.or.p1.hat - tol.p.hat) - (p2.hat + tol.p.hat) - half.width - tol.half.width) >= -1.
The function ciBinomN uses the search algorithm in the 
function uniroot to call the function 
ciBinomHalfWidth to find the values of 
n (sample.type="one.sample") or
n_1 and n_2 
(sample.type="two.sample") that satisfy the requirements for the half-width, 
estimated proportions, and confidence level.  See the Details section of the help file for 
ciBinomHalfWidth for more information.  
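As a quick check of the connection between the two functions, the sample size returned by ciBinomN should, when passed back to ciBinomHalfWidth with the same settings, yield a half-width no larger than half.width + tol.half.width. The values shown below are those from the EXAMPLES section:
  # Round-trip check using the default (score) method:
  n <- ciBinomN(half.width = 0.05)$n          # 374
  ciBinomHalfWidth(n.or.n1 = n)$half.width    # 0.05041541, which is <= 0.05 + 5e-04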
Value
a list with information about the sample sizes, estimated proportions, and half-widths.
One-Sample Case (sample.type="one.sample"). 
  
When sample.type="one.sample", the function ciBinomN 
returns a list with these components:
| n | the sample size(s) associated with the confidence interval(s) | 
| p.hat | the estimated proportion(s) | 
| half.width | the half-width(s) of the confidence interval(s) | 
| method | the method used to construct the confidence interval(s) | 
Two-Sample Case (sample.type="two.sample").  
 
When sample.type="two.sample", the function ciBinomN 
returns a list with these components:
| n1 | the sample size(s) for group 1 associated with the confidence interval(s) | 
| n2 | the sample size(s) for group 2 associated with the confidence interval(s) | 
| p1.hat | the estimated proportion(s) for group 1 | 
| p2.hat | the estimated proportion(s) for group 2 | 
| half.width | the half-width(s) of the confidence interval(s) | 
| method | the method used to construct the confidence interval(s) | 
Note
The binomial distribution is used to model processes with binary 
(Yes-No, Success-Failure, Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any 
one trial is independent of any other trial, and that the probability of “success”, p, 
is the same on each trial.  A binomial discrete random variable X is the number of 
“successes” in n independent trials.  A special case of the binomial distribution 
occurs when n=1, in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model 
the proportion of times a chemical concentration exceeds a set standard in a given period of time 
(e.g., Gilbert, 1987, p.143), or to compare the proportion of detects in a compliance well vs. a 
background well (e.g., USEPA, 1989b, Chapter 8, p.3-7).  (However, USEPA 2009, p.8-27 
recommends using the Wilcoxon rank sum test (wilcox.test) instead of 
comparing proportions.)
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives of 
the sampling program is to produce confidence intervals.  The functions ciBinomHalfWidth, 
ciBinomN, and plotCiBinomDesign can be used to investigate these 
relationships for the case of binomial proportions.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Newcombe, R.G. (1998b). Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine, 17, 873–890.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
ciBinomHalfWidth, uniroot, 
plotCiBinomDesign, ebinom, 
binom.test, prop.test.
Examples
  # Look at how the required sample size of a one-sample 
  # confidence interval increases with decreasing 
  # required half-width:
  ciBinomN(half.width = c(0.1, 0.05, 0.03))
  #$n
  #[1]   92  374 1030
  #
  #$p.hat
  #[1] 0.5 0.5 0.5
  #
  #$half.width
  #[1] 0.10010168 0.05041541 0.03047833
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------
  # Note that the required sample size decreases if we are less 
  # stringent about how much the confidence interval width can 
  # deviate from the supplied value of the 'half.width' argument:
  ciBinomN(half.width = c(0.1, 0.05, 0.03), tol.half.width = 0.005)
  #$n
  #[1]  84 314 782
  #
  #$p.hat
  #[1] 0.5 0.5 0.5
  #
  #$half.width
  #[1] 0.10456066 0.05496837 0.03495833
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #--------------------------------------------------------------------
  # Look at how the required sample size for a one-sample 
  # confidence interval tends to decrease as the estimated 
  # value of p decreases below 0.5 or increases above 0.5:
  seq(0.2, 0.8, by = 0.1) 
  #[1] 0.2 0.3 0.4 0.5 0.6 0.7 0.8 
  ciBinomN(half.width = 0.1, p.hat = seq(0.2, 0.8, by = 0.1)) 
  #$n
  #[1]  70  90 100  92 100  90  70
  #
  #$p.hat
  #[1] 0.2 0.3 0.4 0.5 0.6 0.7 0.8
  #
  #$half.width
  #[1] 0.09931015 0.09839843 0.09910818 0.10010168 0.09910818 0.09839843
  #[7] 0.09931015
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------------------------------------------------------------
  # Look at how the required sample size for a one-sample 
  # confidence interval increases with increasing confidence level:
  ciBinomN(half.width = 0.05, conf.level = c(0.8, 0.9, 0.95, 0.99)) 
  #$n
  #[1] 160 264 374 644
  #
  #$p.hat
  #[1] 0.5 0.5 0.5 0.5
  #
  #$half.width
  #[1] 0.05039976 0.05035948 0.05041541 0.05049152
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  #----------------------------------------------------------------
  # Compare required sample size for a one-sample 
  # confidence interval based on the different methods:
  ciBinomN(half.width = 0.05, ci.method = "score")
  #$n
  #[1] 374
  #
  #$p.hat
  #[1] 0.5
  #
  #$half.width
  #[1] 0.05041541
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  ciBinomN(half.width = 0.05, ci.method = "exact")
  #$n
  #[1] 394
  #
  #$p.hat
  #[1] 0.5
  #
  #$half.width
  #[1] 0.05047916
  #
  #$method
  #[1] "Exact"
  ciBinomN(half.width = 0.05, ci.method = "adjusted Wald")
  #$n
  #[1] 374
  #
  #$p.hat
  #[1] 0.5
  #
  #$half.width
  #[1] 0.05041541
  #
  #$method
  #[1] "Adjusted Wald normal approximation"
  ciBinomN(half.width = 0.05, ci.method = "Wald")
  #$n
  #[1] 398
  #
  #$p.hat
  #[1] 0.5
  #
  #$half.width
  #[1] 0.05037834
  #
  #$method
  #[1] "Wald normal approximation, with continuity correction"
  #----------------------------------------------------------------
  ## Not run: 
  # Look at how the required sample size of a two-sample 
  # confidence interval increases with decreasing 
  # required half-width:
  ciBinomN(half.width = c(0.1, 0.05, 0.03), sample.type = "two")  
  #$n1
  #[1]  210  778 2089
  #
  #$n2
  #[1]  210  778 2089
  #
  #$p1.hat
  #[1] 0.5000000 0.5000000 0.4997607
  #
  #$p2.hat
  #[1] 0.4000000 0.3997429 0.4001915
  #
  #$half.width
  #[1] 0.09943716 0.05047044 0.03049753
  #
  #$method
  #[1] "Score normal approximation, with continuity correction"
  
## End(Not run)
Half-Width of Confidence Interval for Normal Distribution Mean or Difference Between Two Means
Description
Compute the half-width of a confidence interval for the mean of a normal distribution or the difference between two means, given the sample size(s), estimated standard deviation, and confidence level.
Usage
  ciNormHalfWidth(n.or.n1, n2 = n.or.n1, 
    sigma.hat = 1, conf.level = 0.95, 
    sample.type = ifelse(missing(n2), "one.sample", "two.sample"))
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of  | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s). | 
| conf.level | numeric vector of numbers between 0 and 1 indicating the confidence level 
associated with the confidence interval(s).  The default value is  | 
| sample.type | character string indicating whether this is a one-sample  | 
Details
If the arguments n.or.n1, n2, sigma.hat, and 
conf.level are not all the same length, they are replicated to be the same length 
as the length of the longest argument. 
One-Sample Case (sample.type="one.sample") 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a normal distribution with mean \mu and standard deviation 
\sigma.  A two-sided (1-\alpha)100\% confidence interval for \mu 
is given by:
[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\;\;\; (1)
where
\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and t(\nu, p) is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom 
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).  Thus, the 
half-width of this confidence interval is given by:
HW = t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}} \;\;\;\;\;\; (4)
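For example, equation (4) can be evaluated directly with qt; the Aldicarb example in the EXAMPLES section below (n = 4, estimated standard deviation of about 4.93) repeats this calculation via ciNormHalfWidth:
  # Direct evaluation of equation (4):
  qt(0.975, df = 4 - 1) * 4.93491 / sqrt(4)
  # about 7.85, the value returned by ciNormHalfWidth(n.or.n1 = 4, sigma.hat = 4.93491)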
Two-Sample Case (sample.type="two.sample") 
Let \underline{x}_1 = x_{11}, x_{12}, \ldots, x_{1n_1} denote a vector of 
n_1 observations from a normal distribution with mean \mu_1 and 
standard deviation \sigma, and let 
\underline{x}_2 = x_{21}, x_{22}, \ldots, x_{2n_2} denote a vector of 
n_2 observations from a normal distribution with mean \mu_2 and 
standard deviation \sigma.  A two-sided (1-\alpha)100\% confidence 
interval for \mu_1 - \mu_2 is given by:
[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \, (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (5)
where
\hat{\mu}_1 = \bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} \;\;\;\;\;\; (6)
\hat{\mu}_2 = \bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i} \;\;\;\;\;\; (7)
\hat{\sigma}^2 = s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \;\;\;\;\;\; (8)
s_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 \;\;\;\;\;\; (9)
s_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2 \;\;\;\;\;\; (10)
(Zar, 2010, p.142; Helsel and Hirsch, 1992, p.135; Berthouex and Brown, 2002, pp.157–158). Thus, the half-width of this confidence interval is given by:
HW = t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\;\;\;\;\; (11)
Note that for the two-sample case, the function ciNormHalfWidth assumes the 
two populations have the same standard deviation.
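Equation (11) can likewise be evaluated directly. For instance, for two groups of 8 observations each and a pooled standard deviation estimate of 25:
  # Direct evaluation of equation (11):
  qt(0.975, df = 8 + 8 - 2) * 25 * sqrt(1/8 + 1/8)
  # about 26.8; ciNormHalfWidth(n.or.n1 = 8, n2 = 8, sigma.hat = 25) should agree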
Value
a numeric vector of half-widths.
Note
The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with confidence intervals.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives 
of the sampling program is to produce confidence intervals.  The functions 
ciNormHalfWidth, ciNormN, and plotCiNormDesign 
can be used to investigate these relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.
See Also
ciNormN, plotCiNormDesign, Normal, 
enorm, t.test, 
Estimating Distribution Parameters. 
Examples
  # Look at how the half-width of a one-sample confidence interval 
  # decreases with increasing sample size:
  seq(5, 30, by = 5) 
  #[1] 5 10 15 20 25 30 
  hw <- ciNormHalfWidth(n.or.n1 = seq(5, 30, by = 5)) 
  round(hw, 2) 
  #[1] 1.24 0.72 0.55 0.47 0.41 0.37
  #----------------------------------------------------------------
  # Look at how the half-width of a one-sample confidence interval 
  # increases with increasing estimated standard deviation:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  hw <- ciNormHalfWidth(n.or.n1 = 20, sigma.hat = seq(0.5, 2, by = 0.5)) 
  round(hw, 2) 
  #[1] 0.23 0.47 0.70 0.94
  #----------------------------------------------------------------
  # Look at how the half-width of a one-sample confidence interval 
  # increases with increasing confidence level:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  hw <- ciNormHalfWidth(n.or.n1 = 20, conf.level = seq(0.5, 0.9, by = 0.1)) 
  round(hw, 2) 
  #[1] 0.15 0.19 0.24 0.30 0.39
  #==========
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # determine how adding another four months of observations to 
  # increase the sample size from 4 to 8 will affect the half-width 
  # of a two-sided 95% confidence interval for the Aldicarb level at 
  # the first compliance well.
  #  
  # Use the estimated standard deviation from the first four months 
  # of data.  (The data are stored in EPA.09.Ex.21.1.aldicarb.df.) 
  # Note that the half-width changes from 34% of the observed mean to 
  # 18% of the observed mean by increasing the sample size from 
  # 4 to 8.
  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #...
  mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    mean(Aldicarb.ppb[Well=="Well.1"]))
  mu.hat 
  #[1] 23.1 
  sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well=="Well.1"]))
  sigma.hat 
  #[1] 4.93491 
  hw.4 <- ciNormHalfWidth(n.or.n1 = 4, sigma.hat = sigma.hat) 
  hw.4 
  #[1] 7.852543 
  hw.8 <- ciNormHalfWidth(n.or.n1 = 8, sigma.hat = sigma.hat) 
  hw.8 
  #[1] 4.125688 
  100 * hw.4/mu.hat 
  #[1] 33.99369 
  100 * hw.8/mu.hat 
  #[1] 17.86012
  #==========
  # Clean up
  #---------
  rm(hw, mu.hat, sigma.hat, hw.4, hw.8)
Sample Size for Specified Half-Width of Confidence Interval for Normal Distribution Mean or Difference Between Two Means
Description
Compute the sample size necessary to achieve a specified half-width of a confidence interval for the mean of a normal distribution or the difference between two means, given the estimated standard deviation and confidence level.
Usage
  ciNormN(half.width, sigma.hat = 1, conf.level = 0.95, 
    sample.type = ifelse(is.null(n2), "one.sample", "two.sample"), 
    n2 = NULL, round.up = TRUE, n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| half.width | numeric vector of (positive) half-widths.  
Missing ( | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s). | 
| conf.level | numeric vector of numbers between 0 and 1 indicating the confidence level 
associated with the confidence interval(s).  The default value is  | 
| sample.type | character string indicating whether this is a one-sample  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample size(s) 
to the next largest integer.  The default value is  | 
| n.max | positive integer greater than 1 specifying the maximum sample size for the single 
group when  | 
| tol | numeric scalar indicating the tolerance to use in the  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
Details
If the arguments half.width, n2, sigma.hat, and 
conf.level are not all the same length, they are replicated to be the same length as 
the length of the longest argument.
The function ciNormN uses the formulas given in the help file for 
ciNormHalfWidth for the half-width of the confidence interval 
to iteratively solve for the sample size.  For the two-sample case, the default 
is to assume equal sample sizes for each group unless the argument n2 
is supplied.
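One simple iterative scheme (a minimal sketch; not necessarily the exact algorithm used by ciNormN) starts from the normal-quantile approximation and repeatedly updates the Student's t quantile:
  # Minimal fixed-point sketch for the one-sample case:
  half.width <- 0.5; sigma.hat <- 1; conf.level <- 0.95
  alpha <- 1 - conf.level
  n <- (qnorm(1 - alpha/2) * sigma.hat / half.width)^2
  for (i in 1:50) n <- (qt(1 - alpha/2, df = n - 1) * sigma.hat / half.width)^2
  n           # about 17.83
  ceiling(n)  # 18, in agreement with ciNormN(half.width = 0.5) in the EXAMPLES below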
Value
When sample.type="one.sample", or sample.type="two.sample" and n2 
is not supplied (so equal sample sizes for each group are assumed), 
the function ciNormN returns a numeric vector of sample sizes.  
When sample.type="two.sample" and n2 is supplied, 
the function ciNormN returns a list with two components called n1 and n2, 
specifying the sample sizes for each group.
Note
The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with confidence intervals.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives 
of the sampling program is to produce confidence intervals.  The functions 
ciNormHalfWidth, ciNormN, and plotCiNormDesign 
can be used to investigate these relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.
See Also
ciNormHalfWidth, plotCiNormDesign, Normal, 
enorm, t.test, 
Estimating Distribution Parameters.
Examples
  # Look at how the required sample size for a one-sample 
  # confidence interval decreases with increasing half-width:
  seq(0.25, 1, by = 0.25) 
  #[1] 0.25 0.50 0.75 1.00 
  ciNormN(half.width = seq(0.25, 1, by = 0.25)) 
  #[1] 64 18 10 7 
  ciNormN(seq(0.25, 1, by=0.25), round = FALSE) 
  #[1] 63.897899 17.832337  9.325967  6.352717
  #----------------------------------------------------------------
  # Look at how the required sample size for a one-sample 
  # confidence interval increases with increasing estimated 
  # standard deviation for a fixed half-width:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  ciNormN(half.width = 0.5, sigma.hat = seq(0.5, 2, by = 0.5)) 
  #[1] 7 18 38 64
  #----------------------------------------------------------------
  # Look at how the required sample size for a one-sample 
  # confidence interval increases with increasing confidence 
  # level for a fixed half-width:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  ciNormN(half.width = 0.25, conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1] 9 13 19 28 46
  #----------------------------------------------------------------
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # determine the required sample size in order to achieve a 
  # half-width that is 10% of the observed mean (based on the first 
  # four months of observations) for the Aldicarb level at the first 
  # compliance well.  Assume a 95% confidence level and use the 
  # estimated standard deviation from the first four months of data. 
  # (The data are stored in EPA.09.Ex.21.1.aldicarb.df.) 
  #
  # The required sample size is 20, so almost two years of data are 
  # required assuming observations are taken once per month.
  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #...
  mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    mean(Aldicarb.ppb[Well=="Well.1"]))
  mu.hat 
  #[1] 23.1 
  sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well=="Well.1"]))
  sigma.hat 
  #[1] 4.93491 
  ciNormN(half.width = 0.1 * mu.hat, sigma.hat = sigma.hat) 
  #[1] 20
  #----------
  # Clean up
  rm(mu.hat, sigma.hat)
Compute Confidence Level Associated with a Nonparametric Confidence Interval for a Quantile
Description
Compute the confidence level associated with a nonparametric confidence interval for a quantile, given the sample size and order statistics associated with the lower and upper bounds.
Usage
  ciNparConfLevel(n, p = 0.5, lcl.rank = ifelse(ci.type == "upper", 0, 1), 
    n.plus.one.minus.ucl.rank = ifelse(ci.type == "lower", 0, 1),  
    ci.type = "two.sided")
Arguments
| n | numeric vector of sample sizes.  
Missing ( | 
| p | numeric vector of probabilities specifying which quantiles to consider for 
the sample size calculation.  All values of  | 
| lcl.rank,n.plus.one.minus.ucl.rank | numeric vectors of non-negative integers indicating the ranks of the 
order statistics that are used for the lower and upper bounds of the 
confidence interval for the specified quantile(s).  When  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
Details
If the arguments n, p, lcl.rank, and 
n.plus.one.minus.ucl.rank are not all the same length, they are 
replicated to be the 
same length as the length of the longest argument.
The help file for eqnpar explains how nonparametric confidence 
intervals for quantiles are constructed and how the confidence level 
associated with the confidence interval is computed based on specified values 
for the sample size and the ranks of the order statistics used for 
the bounds of the confidence interval.  
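For example, for a one-sided lower confidence interval whose lower bound is the r'th order statistic, the achieved confidence level is the probability that at least r of the n observations fall below the p'th quantile, which is a binomial tail probability (see the help file for eqnpar for the exact formulas used):
  # Binomial tail probability for a one-sided lower interval (n = 12, p = 0.95, rank 10):
  n <- 12; p <- 0.95; r <- 10
  1 - pbinom(r - 1, n, p)
  # about 0.98, the value returned by
  # ciNparConfLevel(n = 12, p = 0.95, lcl.rank = 10, ci.type = "lower")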
Value
A numeric vector of confidence levels.
Note
See the help file for eqnpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for eqnpar.
See Also
eqnpar, ciNparN, 
plotCiNparDesign.
Examples
  # Look at how the confidence level of a nonparametric confidence interval 
  # increases with increasing sample size for a fixed quantile:
  seq(5, 25, by = 5) 
  #[1] 5 10 15 20 25 
  round(ciNparConfLevel(n = seq(5, 25, by = 5), p = 0.9), 2) 
  #[1] 0.41 0.65 0.79 0.88 0.93
  #---------
  # Look at how the confidence level of a nonparametric confidence interval 
  # decreases as the quantile moves away from 0.5:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9
  round(ciNparConfLevel(n = 10, p = seq(0.5, 0.9, by = 0.1)), 2) 
  #[1] 1.00 0.99 0.97 0.89 0.65
  #==========
  # Reproduce Example 21-6 on pages 21-21 to 21-22 of USEPA (2009).  
  # Use 12 measurements of nitrate (mg/L) at a well used for drinking water 
  # to determine with 95% confidence whether or not the infant-based, acute 
  # risk standard of 10 mg/L has been violated.  Assume that the risk 
  # standard represents an upper 95'th percentile limit on nitrate 
  # concentrations.  So what we need to do is construct a one-sided 
  # lower nonparametric confidence interval for the 95'th percentile 
  # that has associated confidence level of no more than 95%, and we will 
  # compare the lower confidence limit with the MCL of 10 mg/L.  
  #
  # The data for this example are stored in EPA.09.Ex.21.6.nitrate.df.
  # Look at the data:
  #------------------
  EPA.09.Ex.21.6.nitrate.df
  #   Sampling.Date       Date Nitrate.mg.per.l.orig Nitrate.mg.per.l Censored
  #1      7/28/1999 1999-07-28                  <5.0              5.0     TRUE
  #2       9/3/1999 1999-09-03                  12.3             12.3    FALSE
  #3     11/24/1999 1999-11-24                  <5.0              5.0     TRUE
  #4       5/3/2000 2000-05-03                  <5.0              5.0     TRUE
  #5      7/14/2000 2000-07-14                   8.1              8.1    FALSE
  #6     10/31/2000 2000-10-31                  <5.0              5.0     TRUE
  #7     12/14/2000 2000-12-14                    11             11.0    FALSE
  #8      3/27/2001 2001-03-27                  35.1             35.1    FALSE
  #9      6/13/2001 2001-06-13                  <5.0              5.0     TRUE
  #10     9/16/2001 2001-09-16                  <5.0              5.0     TRUE
  #11    11/26/2001 2001-11-26                   9.3              9.3    FALSE
  #12      3/2/2002 2002-03-02                  10.3             10.3    FALSE
  # Determine what order statistic to use for the lower confidence limit
  # in order to achieve no more than 95% confidence.
  #---------------------------------------------------------------------
  conf.levels <- ciNparConfLevel(n = 12, p = 0.95, lcl.rank = 1:12, 
    ci.type = "lower")
  names(conf.levels) <- 1:12
  round(conf.levels, 2)
  #   1    2    3    4    5    6    7    8    9   10   11   12 
  #1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.88 0.54 
  
  # Using the 11'th largest observation for the lower confidence limit 
  # yields a confidence level of 88%.  Using the 10'th largest 
  # observation yields a confidence level of 98%.  The example in 
  # USEPA (2009) uses the 10'th largest observation.
  #
  # The 10'th largest observation is 11 mg/L which exceeds the 
  # MCL of 10 mg/L, so there is evidence of contamination.
  #--------------------------------------------------------------------
  with(EPA.09.Ex.21.6.nitrate.df, 
    eqnpar(Nitrate.mg.per.l, p = 0.95, ci = TRUE, 
      ci.type = "lower", lcl.rank = 10))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           95'th %ile = 22.56
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Nitrate.mg.per.l
  #
  #Sample Size:                     12
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                98.04317%
  #
  #Confidence Limit Rank(s):        10 
  #
  #Confidence Interval:             LCL =  11
  #                                 UCL = Inf
  #==========
  # Clean up
  #---------
  rm(conf.levels)
Sample Size for Nonparametric Confidence Interval for a Quantile
Description
Compute the sample size necessary to achieve a specified confidence level for a nonparametric confidence interval for a quantile.
Usage
  ciNparN(p = 0.5, lcl.rank = ifelse(ci.type == "upper", 0, 1), 
    n.plus.one.minus.ucl.rank = ifelse(ci.type == "lower", 0, 1), 
    ci.type = "two.sided", conf.level = 0.95)
Arguments
| p | numeric vector of probabilities specifying the quantiles.  
All values of  | 
| lcl.rank,n.plus.one.minus.ucl.rank | numeric vectors of non-negative integers indicating the ranks of the 
order statistics that are used for the lower and upper bounds of the 
confidence interval for the specified quantile(s).  When  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| conf.level | numeric vector of numbers between 0 and 1 indicating the confidence level 
associated with the confidence interval(s).  The default value is 
 | 
Details
If the arguments p, lcl.rank, 
n.plus.one.minus.ucl.rank and conf.level are not all the 
same length, they are replicated to be the 
same length as the length of the longest argument.
The help file for eqnpar explains how nonparametric confidence 
intervals for quantiles are constructed and how the confidence level 
associated with the confidence interval is computed based on specified values 
for the sample size and the ranks of the order statistics used for 
the bounds of the confidence interval. 
The function ciNparN determines the required sample size via 
a nonlinear optimization.
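An illustrative brute-force version of this search, using ciNparConfLevel for the default two-sided interval based on the smallest and largest order statistics, is:
  # Brute-force sketch (ciNparN itself uses a nonlinear optimization):
  p <- 0.9; conf.level <- 0.9
  n <- 2
  while (ciNparConfLevel(n = n, p = p) < conf.level) n <- n + 1
  n   # 22, in agreement with ciNparN(p = 0.9, conf.level = 0.9) in the EXAMPLES below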
Value
numeric vector of sample sizes.
Note
See the help file for eqnpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for eqnpar.
See Also
eqnpar, ciNparConfLevel, 
plotCiNparDesign.
Examples
  # Look at how the required sample size for a confidence interval 
  # increases with increasing confidence level for a fixed quantile:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  ciNparN(p = 0.9, conf.level=seq(0.5, 0.9, by = 0.1)) 
  #[1]  7  9 12 16 22
  #----------
  # Look at how the required sample size for a confidence interval increases 
  # as the quantile moves away from 0.5:
  ciNparN(p = seq(0.5, 0.9, by = 0.1)) 
  #[1]  6  7  9 14 29
Table of Confidence Intervals for Mean or Difference Between Two Means
Description
Create a table of confidence intervals for the mean of a normal distribution or the difference between two means following Bacchetti (2010), by varying the estimated standard deviation and the estimated mean or difference between the two estimated means, given the sample size(s).
Usage
  ciTableMean(n1 = 10, n2 = n1, diff.or.mean = 2:0, SD = 1:3, 
    sample.type = "two.sample", ci.type = "two.sided", conf.level = 0.95, 
    digits = 1)
Arguments
| n1 | positive integer greater than 1 specifying the sample size when  | 
| n2 | positive integer greater than 1 specifying the sample size for group 2 when 
 | 
| diff.or.mean | numeric vector indicating either the assumed difference between the two sample means 
when  | 
| SD | numeric vector of positive values specifying the assumed estimated standard 
deviation.  The default value is  | 
| sample.type | character string specifying whether to create confidence intervals for the difference 
between two means ( | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | positive integer indicating how many decimal places to display in the table.  The 
default value is  | 
Details
Following Bacchetti (2010) (see NOTE below), the function ciTableMean 
allows you to perform sensitivity analyses while planning future studies by 
producing a table of confidence intervals for the mean or the difference 
between two means by varying the estimated standard deviation and the 
estimated mean or difference between the two estimated means, given the 
sample size(s).
One Sample Case (sample.type="one.sample") 
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a normal (Gaussian) distribution with 
parameters mean=\mu and sd=\sigma.
The usual confidence interval for \mu is constructed as follows.  
If ci.type="two-sided", the (1-\alpha)100% confidence interval for 
\mu is given by:
[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\;\;\; (1)
where
\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and t(\nu, p) is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom 
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the (1-\alpha)100% confidence interval for 
\mu is given by:
[\hat{\mu} - t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}, \, \infty] \;\;\;\; (4)
and if ci.type="upper", the confidence interval is given by:
[-\infty, \, \hat{\mu} + t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (5)
For the one-sample case, the argument n1 corresponds to n in 
Equation (1), the argument 
diff.or.mean corresponds to 
\hat{\mu} = \bar{x} in Equation (2), and the argument SD corresponds to 
\hat{\sigma} = s in Equation (3).
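As a quick check of equations (1)-(3), one cell of the one-sample table shown in the EXAMPLES section below (n1 = 15, estimated mean 5, estimated standard deviation 1) can be reproduced directly:
  # Reproduce the "Mean=5", "SD=1" cell of the one-sample example table:
  hw <- qt(0.975, df = 15 - 1) * 1 / sqrt(15)
  round(5 + c(-1, 1) * hw, 1)
  #[1] 4.4 5.6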
Two Sample Case (sample.type="two.sample") 
Let \underline{x}_1 = (x_{11}, x_{21}, \ldots, x_{n_11}) be a vector of 
n_1 observations from a normal (Gaussian) distribution 
with parameters mean=\mu_1 and sd=\sigma, and let 
\underline{x}_2 = (x_{12}, x_{22}, \ldots, x_{n_22}) be a vector of 
n_2 observations from a normal (Gaussian) distribution 
with parameters mean=\mu_2 and sd=\sigma.
The usual confidence interval for the difference between the two population means 
\mu_1 - \mu_2 is constructed as follows.  
If ci.type="two-sided", the (1-\alpha)100% confidence interval for 
\mu_1 - \mu_2 is given by:
[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 -2, 1-\alpha/2) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \; (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 -2, 1-\alpha/2) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (6)
where
\hat{\mu}_1 = \bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{i1} \;\;\;\;\;\; (7)
\hat{\mu}_2 = \bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{i2} \;\;\;\;\;\; (8)
\hat{\sigma}^2 = s_p^2 = \frac{(n_1-1) s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2} \;\;\;\;\;\; (9)
s_1^2 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (x_{i1} - \bar{x}_1)^2 \;\;\;\;\;\; (10)
s_2^2 = \frac{1}{n_2-1} \sum_{i=1}^{n_2} (x_{i2} - \bar{x}_2)^2 \;\;\;\;\;\; (11)
and t(\nu, p) is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom 
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the (1-\alpha)100% confidence interval for 
\mu_1 - \mu_2 is given by:
[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 -2, 1-\alpha) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \; \infty] \;\;\;\;\;\; (12)
and if ci.type="upper", the confidence interval is given by:
[-\infty, \; (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 -2, 1-\alpha) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (13)
For the two-sample case, the arguments n1 and n2 correspond to 
n_1 and n_2 in Equation (6), the argument diff.or.mean corresponds 
to \hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2 in Equations (7) and (8), 
and the argument SD corresponds to \hat{\sigma} = s_p in Equation (9).
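Similarly, equations (6) and (9) reproduce a cell of the default two-sample table in the EXAMPLES section below (n1 = n2 = 10, estimated difference 2, pooled standard deviation 1):
  # Reproduce the "Diff=2", "SD=1" cell of the default two-sample table:
  hw <- qt(0.975, df = 10 + 10 - 2) * 1 * sqrt(1/10 + 1/10)
  round(2 + c(-1, 1) * hw, 1)
  #[1] 1.1 2.9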
Value
a data frame with the rows varying the standard deviation and the columns varying the estimated mean or difference between the means. Elements of the data frame are character strings indicating the confidence intervals.
Note
Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
enorm, t.test, ciTableProp, 
ciNormHalfWidth, ciNormN, 
plotCiNormDesign.
Examples
  # Show how potential confidence intervals for the difference between two means 
  # will look assuming standard deviations of 1, 2, or 3, differences between 
  # the two means of 2, 1, or 0, and a sample size of 10 in each group.
  ciTableMean()
  #          Diff=2      Diff=1      Diff=0
  #SD=1 [ 1.1, 2.9] [ 0.1, 1.9] [-0.9, 0.9]
  #SD=2 [ 0.1, 3.9] [-0.9, 2.9] [-1.9, 1.9]
  #SD=3 [-0.8, 4.8] [-1.8, 3.8] [-2.8, 2.8]
  #==========
  # Show how a potential confidence interval for a mean will look assuming 
  # standard deviations of 1, 2, or 5, a sample mean of 5, 3, or 1, and 
  # a sample size of 15.
  ciTableMean(n1 = 15, diff.or.mean = c(5, 3, 1), SD = c(1, 2, 5), sample.type = "one")
  #          Mean=5      Mean=3      Mean=1
  #SD=1 [ 4.4, 5.6] [ 2.4, 3.6] [ 0.4, 1.6]
  #SD=2 [ 3.9, 6.1] [ 1.9, 4.1] [-0.1, 2.1]
  #SD=5 [ 2.2, 7.8] [ 0.2, 5.8] [-1.8, 3.8]
  #==========
  # The data frame EPA.09.Ex.16.1.sulfate.df contains sulfate concentrations
  # (ppm) at one background and one downgradient well. The estimated
  # mean and standard deviation for the background well are 536 and 27 ppm,
  # respectively, based on a sample size of n = 8 quarterly samples taken over 
  # 2 years.  A two-sided 95% confidence interval for this mean is [514, 559], 
  # which has a  half-width of 23 ppm.
  #
  # The estimated mean and standard deviation for the downgradient well are 
  # 608 and 18 ppm, respectively, based on a sample size of n = 6 quarterly 
  # samples.  A two-sided 95% confidence interval for the difference between 
  # this mean and the background mean is [44, 100] ppm.
  #
  # Suppose we want to design a future sampling program and are interested in 
  # the size of the confidence interval for the difference between the two means.  
  # We will use ciTableMean to generate a table of possible confidence intervals 
  # by varying the assumed standard deviation and assumed differences between 
  # the means.
  # Look at the data
  #-----------------
  EPA.09.Ex.16.1.sulfate.df
  #   Month Year    Well.type Sulfate.ppm
  #1    Jan 1995   Background         560
  #2    Apr 1995   Background         530
  #3    Jul 1995   Background         570
  #4    Oct 1995   Background         490
  #5    Jan 1996   Background         510
  #6    Apr 1996   Background         550
  #7    Jul 1996   Background         550
  #8    Oct 1996   Background         530
  #9    Jan 1995 Downgradient          NA
  #10   Apr 1995 Downgradient          NA
  #11   Jul 1995 Downgradient         600
  #12   Oct 1995 Downgradient         590
  #13   Jan 1996 Downgradient         590
  #14   Apr 1996 Downgradient         630
  #15   Jul 1996 Downgradient         610
  #16   Oct 1996 Downgradient         630
  # Compute the estimated mean and standard deviation for the 
  # background well.
  #-----------------------------------------------------------
  Sulfate.back <- with(EPA.09.Ex.16.1.sulfate.df,
    Sulfate.ppm[Well.type == "Background"])
  enorm(Sulfate.back, ci = TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 536.2500
  #                                 sd   =  26.6927
  #
  #Estimation Method:               mvue
  #
  #Data:                            Sulfate.back
  #
  #Sample Size:                     8
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 513.9343
  #                                 UCL = 558.5657
  # Compute the estimated mean and standard deviation for the 
  # downgradient well.
  #----------------------------------------------------------
  Sulfate.down <- with(EPA.09.Ex.16.1.sulfate.df,
    Sulfate.ppm[Well.type == "Downgradient"])
  enorm(Sulfate.down, ci = TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 608.33333
  #                                 sd   =  18.34848
  #
  #Estimation Method:               mvue
  #
  #Data:                            Sulfate.down
  #
  #Sample Size:                     6
  #
  #Number NA/NaN/Inf's:             2
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 589.0778
  #                                 UCL = 627.5889
  # Compute the estimated difference between the means and the confidence 
  # interval for the difference:
  #----------------------------------------------------------------------
  print(t.test(Sulfate.down, Sulfate.back, var.equal = TRUE))
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 difference in means = 0
  #
  #Alternative Hypothesis:          True difference in means is not equal to 0
  #
  #Test Name:                        Two Sample t-test
  #
  #Estimated Parameter(s):          mean of x = 608.3333
  #                                 mean of y = 536.2500
  #
  #Data:                            Sulfate.down and Sulfate.back
  #
  #Test Statistic:                  t = 5.660985
  #
  #Test Statistic Parameter:        df = 12
  #
  #P-value:                         0.0001054306
  #
  #95% Confidence Interval:         LCL = 44.33974
  #                                 UCL = 99.82693
  # Use ciTableMean to look at how the confidence interval for the difference
  # between the background and downgradient means in a future study using eight
  # quarterly samples at each well varies with the assumed value of the pooled
  # standard deviation and the observed difference between the sample means. 
  #--------------------------------------------------------------------------------
  # Our current estimate of the pooled standard deviation is 24 ppm:
  summary(lm(Sulfate.ppm ~ Well.type, data = EPA.09.Ex.16.1.sulfate.df))$sigma
  #[1] 23.57759
  # We can see that if this is overly optimistic and in our next study the 
  # pooled standard deviation is around 50 ppm, then if the observed difference 
  # between the means is 50 ppm, the confidence interval for the difference 
  # between the two means will include 0, so we may want to 
  # increase our sample size.
  ciTableMean(n1 = 8, n2 = 8, diff = c(100, 50, 0), SD = c(15, 25, 50), digits = 0)
  #        Diff=100    Diff=50     Diff=0
  #SD=15 [ 84, 116] [ 34,  66] [-16,  16]
  #SD=25 [ 73, 127] [ 23,  77] [-27,  27]
  #SD=50 [ 46, 154] [ -4, 104] [-54,  54]
  
  #==========
  # Clean up
  #---------
  rm(Sulfate.back, Sulfate.down)
Table of Confidence Intervals for Proportion or Difference Between Two Proportions
Description
Create a table of confidence intervals for the probability of "success" for a binomial distribution or the difference between two proportions following Bacchetti (2010), by varying the estimated proportion or difference between the two estimated proportions, given the sample size(s).
Usage
  ciTableProp(n1 = 10, p1.hat = c(0.1, 0.2, 0.3), n2 = n1, 
    p2.hat.minus.p1.hat = c(0.2, 0.1, 0), sample.type = "two.sample", 
    ci.type = "two.sided", conf.level = 0.95, digits = 2, ci.method = "score", 
    correct = TRUE, tol = 10^-(digits + 1))
Arguments
| n1 | positive integer greater than 1 specifying the sample size when  | 
| p1.hat | numeric vector of values between 0 and 1 indicating the estimated proportion 
( | 
| n2 | positive integer greater than 1 specifying the sample size for group 2 when 
 | 
| p2.hat.minus.p1.hat | numeric vector indicating the assumed difference between the two sample proportions 
when  | 
| sample.type | character string specifying whether to create confidence intervals for the difference 
between two proportions ( | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | positive integer indicating how many decimal places to display in the table.  The 
default value is  | 
| ci.method | character string indicating the method to use to construct the confidence interval.  
The default value is  | 
| correct | logical scalar indicating whether to use the correction for continuity when  | 
| tol | numeric scalar indicating how close the values of the adjusted elements of  | 
Details
One-Sample Case (sample.type="one.sample") 
For the one-sample case, the function ciTableProp calls the R function 
prop.test when 
ci.method="score", and calls the R function 
binom.test, when ci.method="exact".  To ensure that the 
user-supplied values of p1.hat are valid for the given user-supplied values 
of n1, values for the argument x to the function 
prop.test or binom.test are computed using the formula
x <- unique(round((p1.hat * n1), 0))
and the argument p.hat is then adjusted using the formula
p.hat <- x/n1
Two-Sample Case (sample.type="two.sample") 
For the two-sample case, the function ciTableProp calls the R function 
prop.test.  To ensure that the user-supplied values of p1.hat 
are valid for the given user-supplied values of n1, the values for the 
first component of the argument x to the function 
prop.test are computed using the formula
x1 <- unique(round((p1.hat * n1), 0)) 
and the argument p1.hat is then adjusted using the formula
p1.hat <- x1/n1
Next, the estimated proportions from group 2 (denoted p2.hat.rep in the formulas below) 
are computed by adding together all possible combinations of the elements of p1.hat and 
p2.hat.minus.p1.hat.  These estimated proportions from group 2 are then 
adjusted using the formulas:
x2.rep <- round((p2.hat.rep * n2), 0) 
p2.hat.rep <- x2.rep/n2
If any of these adjusted proportions from group 2 are \le 0 or \ge 1 
the function terminates with a message indicating that impossible 
values have been supplied. 
In cases where the sample sizes are small there may be instances where the 
user-supplied values of p1.hat and/or p2.hat.minus.p1.hat are not 
attainable.  The argument tol is used to determine whether to return 
the table in conventional form or whether it is necessary to modify the table 
to include twice as many columns (see EXAMPLES section below).
Value
a data frame with elements that are character strings indicating the confidence intervals.
When sample.type="two.sample", a data frame with the rows varying 
the estimated proportion for group 1 (i.e., the values of p1.hat) and 
the columns varying the estimated difference between the proportions from 
group 2 and group 1 (i.e., the values of p2.hat.minus.p1.hat).  In cases 
where the sample sizes are small, it may not be possible to obtain certain 
differences for given values of p1.hat, in which case the returned 
data frame contains twice as many columns indicating the actual difference 
in one column and the computed confidence interval next to it (see EXAMPLES 
section below).
When sample.type="one.sample", a 1-row data frame with the columns 
varying the estimated proportion (i.e., the values of p1.hat).
Note
Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.
Also see the references in the help files for prop.test and 
binom.test.
See Also
prop.test, binom.test, ciTableMean, 
ciBinomHalfWidth, ciBinomN, 
plotCiBinomDesign.
Examples
  # Reproduce Table 1 in Bacchetti (2010).  This involves planning a study with 
  # n1 = n2 = 935 subjects per group, where Group 1 is the control group and 
  # Group 2 is the treatment group.  The outcome in the study is proportion of 
  # subjects with serious outcomes or death.  A negative value for the difference 
  # in proportions between groups (Group 2 proportion - Group 1 proportion) 
  # indicates the treatment group has a better outcome.  In this table, the 
  # proportion of subjects in Group 1 with serious outcomes or death is set 
  # to 3%, 6.5%, and 12%, and the difference in proportions between the two 
  # groups is set to -2.8 percentage points, -1.4 percentage points, and 0.
  ciTableProp(n1 = 935, p1.hat = c(0.03, 0.065, 0.12), n2 = 935, 
    p2.hat.minus.p1.hat = c(-0.028, -0.014, 0), digits = 3)
  #                  Diff=-0.028      Diff=-0.014           Diff=0
  #P1.hat=0.030 [-0.040, -0.015] [-0.029,  0.001] [-0.015,  0.015]
  #P1.hat=0.065 [-0.049, -0.007] [-0.036,  0.008] [-0.022,  0.022]
  #P1.hat=0.120 [-0.057,  0.001] [-0.044,  0.016] [-0.029,  0.029]
  #==========
  # Show how the returned data frame has to be modified for cases of small 
  # sample sizes where not all user-supplied differences are possible.
  ciTableProp(n1 = 5, n2 = 5, p1.hat = c(0.3, 0.6, 0.12), p2.hat.minus.p1.hat = c(0.2, 0.1, 0))
  #           Diff            CI Diff            CI Diff            CI
  #P1.hat=0.4  0.2 [-0.61, 1.00]  0.0 [-0.61, 0.61]    0 [-0.61, 0.61]
  #P1.hat=0.6  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.61, 0.61]
  #P1.hat=0.2  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.50, 0.50]
  #==========
  # Suppose we are planning a study to compare the proportion of nondetects at 
  # a background and downgradient well, and we can use ciTableProp to look at how 
  # the confidence interval for the difference between the two proportions using 
  # say 36 quarterly samples at each well varies with the observed estimated 
  # proportions.  Here we'll let the argument "p1.hat" denote the proportion of 
  # nondetects observed at the downgradient well and set this equal to 
  # 20%, 40% and 60%.  The argument "p2.hat.minus.p1.hat" represents the proportion 
  # of nondetects at the background well minus the proportion of nondetects at the 
  # downgradient well.
  ciTableProp(n1 = 36, p1.hat = c(0.2, 0.4, 0.6), n2 = 36, 
    p2.hat.minus.p1.hat = c(0.3, 0.15, 0))
  #                Diff=0.31     Diff=0.14        Diff=0
  #P1.hat=0.19 [ 0.07, 0.54] [-0.09, 0.37] [-0.18, 0.18]
  #P1.hat=0.39 [ 0.06, 0.55] [-0.12, 0.39] [-0.23, 0.23]
  #P1.hat=0.61 [ 0.09, 0.52] [-0.10, 0.38] [-0.23, 0.23]
  # We see that even if the observed difference in the proportion of nondetects 
  # is about 15 percentage points, all of the confidence intervals for the 
  # difference between the proportions of nondetects at the two wells contain 0, 
  # so if a difference of 15 percentage points is important to substantiate, we 
  # may need to increase our sample sizes.
Sample Coefficient of Variation.
Description
Compute the sample coefficient of variation.
Usage
  cv(x, method = "moments", sd.method = "sqrt.unbiased", 
    l.moment.method = "unbiased", plot.pos.cons = c(a = 0.35, b = 0), 
    na.rm = FALSE)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying what method to use to compute the sample coefficient 
of variation.  The possible values are  | 
| sd.method | character string specifying what method to use to compute the sample standard 
deviation when  | 
| l.moment.method | character string specifying what method to use to compute the 
 | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for 
the plotting positions when  | 
| na.rm | logical scalar indicating whether to remove missing values from  | 
Details
Let \underline{x} denote a random sample of n observations from 
some distribution with mean \mu and standard deviation \sigma.
Product Moment Coefficient of Variation (method="moments") 
The coefficient of variation (sometimes denoted CV) of a distribution is 
defined as the ratio of the standard deviation to the mean. That is:
CV = \frac{\sigma}{\mu} \;\;\;\;\;\; (1)
The coefficient of variation measures how spread out the distribution is relative to the size of the mean. It is usually used to characterize positive, right-skewed distributions such as the lognormal distribution.
When sd.method="sqrt.unbiased", the coefficient of variation is estimated 
using the sample mean and the square root of the unbiased estimator of variance:
\widehat{CV} = \frac{s}{\bar{x}} \;\;\;\;\;\; (2)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (3)
s = [\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2]^{1/2} \;\;\;\;\;\; (4)
Note that the estimator of standard deviation in equation (4) is not unbiased.
When sd.method="moments", the coefficient of variation is estimated using 
the sample mean and the square root of the method of moments estimator of variance:
\widehat{CV} = \frac{s_m}{\bar{x}} \;\;\;\;\;\; (5)
where
s_m = [\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2]^{1/2} \;\;\;\;\;\; (6)
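For example, the estimators in equations (2) and (5) can be computed directly from their definitions (a minimal sketch; the data are made up for illustration):
x <- c(2.3, 4.1, 1.8, 3.6, 2.9)
sd(x) / mean(x)                        # equation (2), sd.method="sqrt.unbiased"
sqrt(mean((x - mean(x))^2)) / mean(x)  # equations (5)-(6), sd.method="moments"
These should agree with cv(x) and cv(x, sd.method = "moments"), respectively.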
L-Moment Coefficient of Variation (method="l.moments") 
Hosking (1990) defines an L-moment analog of the 
coefficient of variation (denoted the L-CV) as:
\tau = \frac{l_2}{l_1} \;\;\;\;\;\; (7)
that is, the second L-moment divided by the first L-moment.  
He shows that for a positive-valued random variable, the L-CV lies in the 
interval (0, 1).
When l.moment.method="unbiased", the L-CV is estimated by:
t = \frac{l_2}{l_1} \;\;\;\;\;\; (8)
that is, the unbiased estimator of the second L-moment divided by 
the unbiased estimator of the first L-moment.
When l.moment.method="plotting.position", the L-CV is estimated by:
\tilde{t} = \frac{\tilde{l_2}}{\tilde{l_1}} \;\;\;\;\;\; (9)
that is, the plotting-position estimator of the second L-moment divided by 
the plotting-position estimator of the first L-moment.
See the help file for lMoment for more information on 
estimating L-moments.
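A brief sketch of equation (8), assuming the unbiased sample L-moments are computed with the EnvStats function lMoment:
set.seed(250)
dat <- rlnormAlt(20, mean = 10, cv = 1)
lMoment(dat, r = 2) / lMoment(dat, r = 1)  # should match cv(dat, method = "l.moments")
rm(dat)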
Value
A numeric scalar – the sample coefficient of variation.
Note
Traditionally, the coefficient of variation has been estimated using 
product moment estimators.  Hosking (1990) introduced the idea of 
L-moments and the L-CV.  Vogel and Fennessey (1993) argue that 
L-moment ratios should replace product moment ratios because of their 
superior performance (they are nearly unbiased and better for discriminating 
between distributions).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.
Vogel, R.M., and N.M. Fennessey. (1993).  L Moment Diagrams Should Replace 
Product Moment Diagrams.  Water Resources Research 29(6), 1745–1752.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Summary Statistics, summaryFull, var, 
sd, skewness, kurtosis.
Examples
  # Generate 20 observations from a lognormal distribution with 
  # parameters mean=10 and cv=1, and estimate the coefficient of variation. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean = 10, cv = 1) 
  cv(dat) 
  #[1] 0.5077981
  cv(dat, sd.method = "moments") 
  #[1] 0.4949403
 
  cv(dat, method = "l.moments") 
  #[1] 0.2804148
  #----------
  # Clean up
  rm(dat)
Determine Detection Limit
Description
Determine the detection limit based on using a calibration line (or curve) and inverse regression.
Usage
  detectionLimitCalibrate(object, coverage = 0.99, simultaneous = TRUE)
Arguments
| object | an object of class  | 
| coverage | optional numeric scalar between 0 and 1 indicating the confidence level associated with
the prediction intervals used in determining the detection limit.
The default value is  | 
| simultaneous | optional logical scalar indicating whether to base the prediction intervals on
simultaneous or non-simultaneous prediction limits.  The default value is  | 
Details
The idea of a decision limit and detection limit is directly related to calibration and
can be framed in terms of a hypothesis test, as shown in the table below.
The null hypothesis is that the chemical is not present in the physical sample, i.e.,
H_0: C = 0, where C denotes the concentration.
| Your Decision | H_0 True (C = 0) | H_0 False (C > 0) | 
| Reject H_0 (Declare Chemical Present) | Type I Error (Probability = \alpha) | Correct Decision | 
| Do Not Reject H_0 (Declare Chemical Absent) | Correct Decision | Type II Error (Probability = \beta) | 
Ideally, you would like to minimize both the Type I and Type II error rates.
Just as we use critical values to compare against the test statistic for a hypothesis test,
we need to use a critical signal level S_D called the decision limit to decide
whether the chemical is present or absent.  If the signal is less than or equal to S_D
we will declare the chemical is absent, and if the signal is greater than S_D we will
declare the chemical is present.
First, suppose no chemical is present (i.e., the null hypothesis is true).
If we want to guard against the mistake of declaring that the chemical is present when in fact it is
absent (Type I error), then we should choose S_D so that the probability of this happening is
some small value \alpha.  Thus, the value of S_D depends on what we want to use for
\alpha (the Type I error rate), and the true (but unknown) value of \sigma
(the standard deviation of the errors assuming a constant standard deviation)
(Massart et al., 1988, p. 111).
When the true concentration is 0, the decision limit is the (1-\alpha)100th percentile of the
distribution of the signal S.  Note that the decision limit is on the scale of and in units
of the signal S.
Now suppose that in fact the chemical is present in some concentration C
(i.e., the null hypothesis is false).  If we want to guard against the mistake of
declaring that the chemical is absent when in fact it is present (Type II error),
then we need to determine a minimal concentration C_DL called the detection limit (DL)
that we know will yield a signal less than the decision limit S_D only a small fraction of the
time (\beta).
In practice we do not know the true value of the standard deviation of the errors (\sigma),
so we cannot compute the true decision limit.  Also, we do not know the true values of the
intercept and slope of the calibration line, so we cannot compute the true detection limit.
Instead, we usually set \alpha = \beta and estimate the decision and detection limits
by computing prediction limits for the calibration line and using inverse regression.
The estimated detection limit corresponds to the upper confidence bound on concentration given that the signal is equal to the estimated decision limit. Currie (1997) discusses other ways to define the detection limit, and Glaser et al. (1981) define a quantity called the method detection limit.
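For instance, the sensitivity of the estimated limits to the choice of coverage and to simultaneous vs. non-simultaneous prediction limits can be explored directly (a minimal sketch using the calibration data referenced in the Examples below):
cal.list <- calibrate(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
detectionLimitCalibrate(cal.list, coverage = 0.95, simultaneous = FALSE)
detectionLimitCalibrate(cal.list, coverage = 0.99, simultaneous = TRUE)
rm(cal.list)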
Value
A numeric vector of length 2 indicating the signal detection limit and the concentration
detection limit. This vector has two attributes called coverage
and simultaneous indicating the values of these arguments that were used in the
call to detectionLimitCalibrate.
Note
Perhaps no other topic in environmental statistics has generated as much confusion or controversy as the topic of detection limits. After decades of disparate terminology, ISO and IUPAC provided harmonized guidance on the topic in 1995 (Currie, 1997). Intuitively, the idea of a detection limit is simple to grasp: the detection limit is “the smallest amount or concentration of a particular substance that can be reliably detected in a given type of sample or medium by a specific measurement process” (Currie, 1997, p. 152). Unfortunately, because of the exceedingly complex nature of measuring chemical concentrations, this simple idea is difficult to apply in practice.
Detection and quantification capabilities are fundamental performance characteristics of the Chemical Measurement Process (CMP) (Currie, 1996, 1997). In this help file we discuss some currently accepted definitions of the terms decision, detection, and quantification limits. For more details, the reader should consult the references listed in this help file.
The quantification limit is defined as the concentration C at which the coefficient of variation (also called relative standard deviation or RSD) for the distribution of the signal S is some small value, usually taken to be 10% (Currie, 1968, 1997). In practice the quantification limit is difficult to estimate because we have to estimate both the mean and the standard deviation of the signal S for any particular concentration, and usually the standard deviation varies with concentration. Variations of the quantification limit include the quantitation limit (Keith, 1991, p. 109), minimum level (USEPA, 1993), and alternative minimum level (Gibbons et al., 1997a).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Clark, M.J.R., and P.H. Whitfield. (1994). Conflicting Perspectives About Detection Limits and About the Censoring of Environmental Data. Water Resources Bulletin 30(6), 1063–1079.
Clayton, C.A., J.W. Hines, and P.D. Elkins. (1987). Detection Limits with Specified Assurance Probabilities. Analytical Chemistry 59, 2506–2514.
Code of Federal Regulations. (1996). Definition and Procedure for the Determination of the Method Detection Limit–Revision 1.11. Title 40, Part 136, Appendix B, 7-1-96 Edition, pp.265–267.
Currie, L.A. (1968). Limits for Qualitative Detection and Quantitative Determination: Application to Radiochemistry. Analytical Chemistry 40, 586–593.
Currie, L.A. (1988). Detection in Analytical Chemistry: Importance, Theory, and Practice. American Chemical Society, Washington, D.C.
Currie, L.A. (1995). Nomenclature in Evaluation of Analytical Methods Including Detection and Quantification Capabilities. Pure & Applied Chemistry 67(10), 1699-1723.
Currie, L.A. (1996). Foundations and Future of Detection and Quantification Limits. Proceedings of the Section on Statistics and the Environment, American Statistical Association, Alexandria, VA.
Currie, L.A. (1997). Detection: International Update, and Some Emerging Di-Lemmas Involving Calibration, the Blank, and Multiple Detection Decisions. Chemometrics and Intelligent Laboratory Systems 37, 151-181.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B. (1997). Challenges in Regulatory Environmetrics. Chemometrics and Intelligent Laboratory Systems 37, 43–53.
Gibbons, R.D. (1995). Some Statistical and Conceptual Issues in the Detection of Low-Level Environmental Pollutants (with Discussion). Environmetrics 2, 125-167.
Gibbons, R.D., D.E. Coleman, and R.F. Maddalone. (1997a). An Alternative Minimum Level Definition for Analytical Quantification. Environmental Science & Technology 31(7), 2071–2077. Comments and Discussion in Volume 31(12), 3727–3731, and Volume 32(15), 2346–2353.
Gibbons, R.D., D.E. Coleman, and R.F. Maddalone. (1997b). Response to Comment on “An Alternative Minimum Level Definition for Analytical Quantification”. Environmental Science and Technology 31(12), 3729–3731.
Gibbons, R.D., D.E. Coleman, and R.F. Maddalone. (1998). Response to Comment on “An Alternative Minimum Level Definition for Analytical Quantification”. Environmental Science and Technology 32(15), 2349–2353.
Gibbons, R.D., N.E. Grams, F.H. Jarke, and K.P. Stoub. (1992). Practical Quantitation Limits. Chemometrics Intelligent Laboratory Systems 12, 225–235.
Gibbons, R.D., F.H. Jarke, and K.P. Stoub. (1991). Detection Limits: For Linear Calibration Curves with Increasing Variance and Multiple Future Detection Decisions. In Tatsch, D.E., editor. Waste Testing and Quality Assurance: Volume 3. American Society for Testing and Materials, Philadelphia, PA.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley & Sons, Hoboken. Chapter 6, p. 111.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey. Chapter 3, p. 22.
Glaser, J.A., D.L. Foerst, G.D. McKee, S.A. Quave, and W.L. Budde. (1981). Trace Analyses for Wastewaters. Environmental Science and Technology 15, 1426–1435.
Hubaux, A., and G. Vos. (1970). Decision and Detection Limits for Linear Calibration Curves. Analytical Chemistry 42, 849–855.
Kahn, H.D., C.E. White, K. Stralka, and R. Kuznetsovski. (1997). Alternative Estimates of Detection. Proceedings of the Twentieth Annual EPA Conference on Analysis of Pollutants in the Environment, May 7-8, Norfolk, VA. U.S. Environmental Protection Agency, Washington, D.C.
Kahn, H.D., W.A. Telliard, and C.E. White. (1998). Comment on “An Alternative Minimum Level Definition for Analytical Quantification” (with Response). Environmental Science & Technology 32(5), 2346–2353.
Kaiser, H. (1965). Zum Problem der Nachweisgrenze. Fresenius' Z. Anal. Chem. 209, 1.
Keith, L.H. (1991). Environmental Sampling and Analysis: A Practical Guide. Lewis Publishers, Boca Raton, FL, Chapter 10.
Kimbrough, D.E. (1997). Comment on “An Alternative Minimum Level Definition for Analytical Quantification” (with Response). Environmental Science & Technology 31(12), 3727–3731.
Lambert, D., B. Peterson, and I. Terpenning. (1991). Nondetects, Detection Limits, and the Probability of Detection. Journal of the American Statistical Association 86(414), 266–277.
Massart, D.L., B.G.M. Vandeginste, S.N. Deming, Y. Michotte, and L. Kaufman. (1988). Chemometrics: A Textbook. Elsevier, New York, Chapter 7.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Porter, P.S., R.C. Ward, and H.F. Bell. (1988). The Detection Limit. Environmental Science & Technology 22(8), 856–861.
Rocke, D.M., and S. Lorenzato. (1995). A Two-Component Model for Measurement Error in Analytical Chemistry. Technometrics 37(2), 176–184.
Singh, A. (1993). Multivariate Decision and Detection Limits. Analytica Chimica Acta 277, 205-214.
Spiegelman, C.H. (1997). A Discussion of Issues Raised by Lloyd Currie and a Cross Disciplinary View of Detection Limits and Estimating Parameters That Are Often At or Near Zero. Chemometrics and Intelligent Laboratory Systems 37, 183–188.
USEPA. (1987c). List (Phase 1) of Hazardous Constituents for Ground-Water Monitoring; Final Rule. Federal Register 52(131), 25942–25953 (July 9, 1987).
Zorn, M.E., R.D. Gibbons, and W.C. Sonzogni. (1997). Weighted Least-Squares Approach to Calculating Limits of Detection and Quantification by Modeling Variability as a Function of Concentration. Analytical Chemistry 69, 3069–3075.
See Also
calibrate, inversePredictCalibrate, pointwise.
Examples
  # The data frame EPA.97.cadmium.111.df contains calibration
  # data for cadmium at mass 111 (ng/L) that appeared in
  # Gibbons et al. (1997b) and were provided to them by the U.S. EPA.
  #
  # The Example section in the help file for calibrate shows how to
  # plot these data along with the fitted calibration line and 99%
  # non-simultaneous prediction limits.
  #
  # For the current example, we will compute the decision limit (7.68)
  # and detection limit (12.36 ng/L) based on using alpha = beta = 0.01
  # and a linear calibration line with constant variance. See
  # Millard and Neerchal (2001, pp.566-575) for more details on this
  # example.
  calibrate.list <- calibrate(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
  detectionLimitCalibrate(calibrate.list, simultaneous = FALSE)
  #        Decision Limit (Signal) Detection Limit (Concentration)
  #                       7.677842                       12.364670
  #attr(,"coverage")
  #[1] 0.99
  #attr(,"simultaneous")
  #[1] FALSE
  #----------
  # Clean up
  #---------
  rm(calibrate.list)
Choose Best Fitting Distribution Based on Goodness-of-Fit Tests
Description
Perform a series of goodness-of-fit tests from a (possibly user-specified) set of candidate probability distributions to determine which probability distribution provides the best fit for a data set.
Usage
distChoose(y, ...)
## S3 method for class 'formula'
distChoose(y, data = NULL, subset,
  na.action = na.pass, ...)
## Default S3 method:
distChoose(y, alpha = 0.05, method = "sw",
    choices = c("norm", "gamma", "lnorm"), est.arg.list = NULL,
    warn = TRUE, keep.data = TRUE, data.name = NULL,
    parent.of.data = NULL, subset.expression = NULL, ...)
Arguments
| y | an object containing data for the goodness-of-fit tests.  In the default
method, the argument  | 
| data | specifies an optional data frame, list or environment (or object coercible
by  | 
| subset | specifies an optional vector specifying a subset of observations to be used. | 
| na.action | specifies a function which indicates what should happen when the data contain  | 
| alpha | numeric scalar between 0 and 1 specifying the Type I error associated with each
goodness-of-fit test.  When  | 
| method | character string defining which method to use. Possible values are: 
 See the DETAILS section below. | 
| choices | a character vector denoting the distribution abbreviations of the candidate
distributions.  See the help file for  This argument is ignored when  | 
| est.arg.list | a list containing one or more lists of arguments to be passed to the
function(s) estimating the distribution parameters.  The name(s) of
the components of the list must be equal to or a subset of the values of the
argument  When testing for some form of normality (i.e., Normal, Lognormal, Three-Parameter Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta)), the estimated parameters are provided in the output merely for information, and the choice of the method of estimation has no effect on the goodness-of-fit test statistics or p-values. This argument is ignored when  | 
| warn | logical scalar indicating whether to print a warning message when
observations with  | 
| keep.data | logical scalar indicating whether to return the original data.  The
default value is  | 
| data.name | optional character string indicating the name of the data used for argument  | 
| parent.of.data | character string indicating the source of the data used for the goodness-of-fit test. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the goodness-of-fit test. | 
Details
The function distChoose returns a list with information on the goodness-of-fit
tests for various distributions and which distribution appears to best fit the
data based on the p-values from the goodness-of-fit tests.  This function was written in
order to compare ProUCL's way of choosing the best-fitting distribution (USEPA, 2015) with
other ways of choosing the best-fitting distribution.
Method Based on Shapiro-Wilk, Shapiro-Francia, or Probability Plot Correlation Test 
(method="sw", method="sf", or method="ppcc")
For each value of the argument choices, the function distChoose
runs the goodness-of-fit test using the data in y assuming that particular
distribution.  For example, if 
choices=c("norm", "gamma", "lnorm"),
indicating the Normal, Gamma, and Lognormal distributions, and
method="sw", then the usual Shapiro-Wilk test is performed for the Normal
and Lognormal distributions, and the extension of the Shapiro-Wilk test is performed
for the Gamma distribution (see the section
Testing Goodness-of-Fit for Any Continuous Distribution in the help
file for gofTest for an explanation of the latter).  The distribution associated
with the largest p-value is the chosen distribution.  In the case when all p-values are
less than the value of the argument alpha, the distribution “Nonparametric” is chosen.  
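In other words, for method="sw" the choice reduces to comparing the p-values from three calls to gofTest (a minimal sketch; dat stands for the numeric data vector):
p.norm  <- gofTest(dat, distribution = "norm",  test = "sw")$p.value
p.gamma <- gofTest(dat, distribution = "gamma", test = "sw")$p.value
p.lnorm <- gofTest(dat, distribution = "lnorm", test = "sw")$p.value
# Choose the distribution with the largest p-value, unless all three are
# less than alpha, in which case choose "Nonparametric".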
Method Based on ProUCL Algorithm (method="proucl")
When method="proucl", the function distChoose uses the
algorithm that ProUCL (USEPA, 2015) uses to determine the best fitting
distribution.  The candidate distributions are the
Normal, Gamma, and Lognormal distributions.  The algorithm
used by ProUCL is as follows:
- Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the Normal distribution, i.e., call the function gofTest with distribution = "norm", test = "sw" and with distribution = "norm", test = "lillie". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Normal distribution. Otherwise, proceed to the next step.
- Perform the “ProUCL Anderson-Darling” and “ProUCL Kolmogorov-Smirnov” goodness-of-fit tests for the Gamma distribution, i.e., call the function gofTest with distribution = "gamma", test = "proucl.ad.gamma" and with distribution = "gamma", test = "proucl.ks.gamma". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Gamma distribution. Otherwise, proceed to the next step.
- Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the Lognormal distribution, i.e., call the function gofTest with distribution = "lnorm", test = "sw" and with distribution = "lnorm", test = "lillie". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Lognormal distribution. Otherwise, proceed to the next step.
- If none of the goodness-of-fit tests above yields a p-value greater than or equal to the user-supplied value of alpha, then choose the “Nonparametric” distribution.
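Written out as code, the first step above looks like the following (a sketch only; distChoose(dat, method = "proucl") carries out all four steps, and steps 2 and 3 repeat the same pattern with the Gamma tests "proucl.ad.gamma" and "proucl.ks.gamma" and with the Lognormal tests):
alpha <- 0.05
sw.p  <- gofTest(dat, distribution = "norm", test = "sw")$p.value
lil.p <- gofTest(dat, distribution = "norm", test = "lillie")$p.value
if (max(sw.p, lil.p) >= alpha) decision <- "Normal"  # otherwise move on to the Gamma tests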
Value
a list of class "distChoose" containing the results of the goodness-of-fit tests.
Objects of class "distChoose" have a special printing method.
See the help file for distChoose.object for details.
Note
In practice, almost any goodness-of-fit test will not reject the null hypothesis
if the number of observations is relatively small.  Conversely, almost any goodness-of-fit
test will reject the null hypothesis if the number of observations is very large,
since “real” data are never distributed according to any theoretical distribution
(Conover, 1980, p.367).  For most cases, however, the distribution of “real” data
is close enough to some theoretical distribution that fairly accurate results may be
provided by assuming that particular theoretical distribution.  One way to assess the
goodness of the fit is to use goodness-of-fit tests.  Another way is to look at
quantile-quantile (Q-Q) plots (see qqPlot).
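For example, a Gamma Q-Q plot of data simulated as in the Examples below can be drawn with qqPlot (a minimal sketch, assuming the estimate.params and add.line arguments of qqPlot):
set.seed(47)
dat <- rgamma(20, shape = 2, scale = 3)
qqPlot(dat, distribution = "gamma", estimate.params = TRUE, add.line = TRUE)
rm(dat)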
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefor's Test for Normality. The American Statistician 40, 294-296.
D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of g1.
Biometrika 57, 679-681.
D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.
D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York.
D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality.
Empirical Results for the Distributions of b2 and \sqrt{b1}.
Biometrika 60(3), 613-622.
D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of \sqrt{b1}.
Biometrika 60(1), 169-173.
Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.
Kim, P.J., and R.I. Jennrich. (1973). Tables of the Exact Sampling Distribution of the Two Sample Kolmogorov-Smirnov Criterion. In Harter, H.L., and D.B. Owen, eds. Selected Tables in Mathematical Statistics, Vol. 1. American Mathematical Society, Providence, Rhode Island, pp.79-170.
Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari 4, 83-91.
Marsaglia, G., W.W. Tsang, and J. Wang. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). doi:10.18637/jss.v008.i18.
Moore, D.S. (1986). Tests of Chi-Squared Type. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, pp.63-95.
Pomeranz, J. (1973). Exact Cumulative Distribution of the Kolmogorov-Smirnov Statistic for Small Samples (Algorithm 487). Collected Algorithms from ACM ??, ???-???.
Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.
Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.
Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.
Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvania State University, Department of Statistics.
Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.
Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.
Smirnov, N.V. (1939). Estimate of Deviation Between Empirical Distribution Functions in Two Independent Samples. Bulletin Moscow University 2(2), 3-16.
Smirnov, N.V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 279-281.
Stephens, M.A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society, Series B, 32, 115-122.
Stephens, M.A. (1986a). Tests Based on EDF Statistics. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.
Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.
Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.
Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
gofTest, distChoose.object, print.distChoose.
Examples
  # Generate 20 observations from a gamma distribution with
  # parameters shape = 2 and scale = 3 and:
  #
  # 1) Call distChoose using the Shapiro-Wilk method.
  #
  # 2) Call distChoose using the Shapiro-Wilk method and specify
  #    the bias-corrected method of estimating shape for the Gamma
  #    distribution.
  #
  # 3) Compare the results in 2) above with the results using the
  #    ProUCL method.
  #
  # Notes:  The call to set.seed lets you reproduce this example.
  #
  #         The ProUCL method chooses the Normal distribution, whereas the
  #         Shapiro-Wilk method chooses the Gamma distribution.
  set.seed(47)
  dat <- rgamma(20, shape = 2, scale = 3)
  # 1) Call distChoose using the Shapiro-Wilk method.
  #--------------------------------------------------
  distChoose(dat)
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Wilk
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              W = 0.9097488
  #    P-value:                     0.06303695
  #
  #  Gamma
  #    Test Statistic:              W = 0.9834958
  #    P-value:                     0.970903
  #
  #  Lognormal
  #    Test Statistic:              W = 0.9185006
  #    P-value:                     0.09271768
  #--------------------
  # 2) Call distChoose using the Shapiro-Wilk method and specify
  #    the bias-corrected method of estimating shape for the Gamma
  #    distribution.
  #---------------------------------------------------------------
  distChoose(dat, method = "sw",
    est.arg.list = list(gamma = list(method = "bcmle")))
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Wilk
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          shape = 1.656376
  #                                 scale = 4.676680
  #
  #Estimation Method:               Bias-Corrected MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              W = 0.9097488
  #    P-value:                     0.06303695
  #
  #  Gamma
  #    Test Statistic:              W = 0.9834346
  #    P-value:                     0.9704046
  #
  #  Lognormal
  #    Test Statistic:              W = 0.9185006
  #    P-value:                     0.09271768
  #--------------------
  # 3) Compare the results in 2) above with the results using the
  #    ProUCL method.
  #---------------------------------------------------------------
  distChoose(dat, method = "proucl")
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   ProUCL
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Normal
  #
  #Estimated Parameter(s):          mean = 7.746340
  #                                 sd   = 5.432175
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.9097488
  #      P-value:                   0.06303695
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.1547851
  #      P-value:                   0.238092
  #
  #  Gamma
  #    ProUCL Anderson-Darling Gamma GOF
  #      Test Statistic:            A = 0.1853826
  #      P-value:                   >= 0.10
  #    ProUCL Kolmogorov-Smirnov Gamma GOF
  #      Test Statistic:            D = 0.0988692
  #      P-value:                   >= 0.10
  #
  #  Lognormal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.9185006
  #      P-value:                   0.09271768
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.149317
  #      P-value:                   0.2869177
  #--------------------
  # Clean up
  #---------
  rm(dat)
  #====================================================================
  # Example 10-2 of USEPA (2009, page 10-14) gives an example of
  # using the Shapiro-Wilk test to test the assumption of normality
  # for nickel concentrations (ppb) in groundwater collected over
  # 4 years.  The data for this example are stored in
  # EPA.09.Ex.10.1.nickel.df.
  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0
  # Use distChoose with the probability plot correlation method,
  # and for the lognormal distribution specify the
  # mean and CV parameterization:
  #------------------------------------------------------------
  distChoose(Nickel.ppb ~ 1, data = EPA.09.Ex.10.1.nickel.df,
    choices = c("norm", "gamma", "lnormAlt"), method = "ppcc")
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   PPCC
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Lognormal
  #
  #Estimated Parameter(s):          mean = 213.415628
  #                                 cv   =   2.809377
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              r = 0.8199825
  #    P-value:                     5.753418e-05
  #
  #  Gamma
  #    Test Statistic:              r = 0.9749044
  #    P-value:                     0.317334
  #
  #  Lognormal
  #    Test Statistic:              r = 0.9912528
  #    P-value:                     0.9187852
  #--------------------
  # Repeat the above example using the ProUCL method.
  #--------------------------------------------------
  distChoose(Nickel.ppb ~ 1, data = EPA.09.Ex.10.1.nickel.df,
    method = "proucl")
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   ProUCL
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          shape =   0.5198727
  #                                 scale = 326.0894272
  #
  #Estimation Method:               MLE
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.6788888
  #      P-value:                   2.17927e-05
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.3267052
  #      P-value:                   5.032807e-06
  #
  #  Gamma
  #    ProUCL Anderson-Darling Gamma GOF
  #      Test Statistic:            A = 0.5076725
  #      P-value:                   >= 0.10
  #    ProUCL Kolmogorov-Smirnov Gamma GOF
  #      Test Statistic:            D = 0.1842904
  #      P-value:                   >= 0.10
  #
  #  Lognormal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.978946
  #      P-value:                   0.9197735
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.08405167
  #      P-value:                   0.9699648
  #====================================================================
  ## Not run: 
  # 1) Simulate 1000 trials where for each trial you:
  #    a) Generate 20 observations from a Gamma distribution with
  #       parameters mean = 10 and CV = 1.
  #    b) Use distChoose with the Shapiro-Wilk method.
  #    c) Use distChoose with the ProUCL method.
  #
  #  2) Compare the proportion of times the
  #     Normal vs. Gamma vs. Lognormal vs. Nonparametric distribution
  #     is chosen for b) and c) above.
  #------------------------------------------------------------------
  set.seed(58)
  N <- 1000
  Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
  Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
  for(i in 1:N) {
    dat <- rgammaAlt(20, mean = 10, cv = 1)
    Choose.df[i, "SW"]     <- distChoose(dat, method = "sw")$decision
    Choose.df[i, "ProUCL"] <- distChoose(dat, method = "proucl")$decision
  }
  summaryStats(Choose.df, digits = 0)
  #              ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
  #Normal              443          44    41       4
  #Gamma               546          55   733      73
  #Lognormal             9           1   215      22
  #Nonparametric         2           0    11       1
  #Combined           1000         100  1000     100
  #--------------------
  # Repeat above example for the Lognormal Distribution with mean=10 and CV = 1.
  #-----------------------------------------------------------------------------
  set.seed(297)
  N <- 1000
  Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
  Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
  for(i in 1:N) {
    dat <- rlnormAlt(20, mean = 10, cv = 1)
    Choose.df[i, "SW"]     <- distChoose(dat, method = "sw")$decision
    Choose.df[i, "ProUCL"] <- distChoose(dat, method = "proucl")$decision
  }
  summaryStats(Choose.df, digits = 0)
  #              ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
  #Normal              313          31    15       2
  #Gamma               556          56   254      25
  #Lognormal           121          12   706      71
  #Nonparametric        10           1    25       2
  #Combined           1000         100  1000     100
  #--------------------
  # Clean up
  #---------
  rm(N, Choose.fac, Choose.df, i, dat)
  
## End(Not run)
S3 Class "distChoose"
Description
Objects of S3 class "distChoose" are returned by the EnvStats function 
distChoose.
Details
Objects of S3 class "distChoose" are lists that contain 
information about the candidate distributions, the estimated distribution 
parameters for each candidate distribution, and the test statistics and 
p-values associated with each candidate distribution.
Value
Required Components 
The following components must be included in a legitimate list of 
class "distChoose".
| choices | a character vector containing the full names  
of the candidate distributions. (see  | 
| method | a character string denoting which method was used. | 
| decision | a character vector containing the full name of the chosen distribution. | 
| alpha | a numeric scalar between 0 and 1 specifying the Type I error associated with each goodness-of-fit test. | 
| distribution.parameters | a numeric vector containing the estimated parameters associated with the chosen distribution. | 
| estimation.method | a character string indicating the method 
used to compute the estimated parameters associated with the chosen 
distribution.  The value of this component will depend on the 
available estimation methods (see  | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit tests. | 
| test.results | a list with the same number of components as the number 
of elements in the component  | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit tests. | 
Optional Components 
The following component is included in the result of 
calling distChoose when the argument keep.data=TRUE:
| data | numeric vector containing the data actually used for the goodness-of-fit tests (i.e., the original data without any missing or infinite values). | 
The following component is included in the result of 
calling distChoose when missing (NA), 
undefined (NaN) and/or infinite (Inf, -Inf) 
values are present:
| bad.obs | numeric scalar indicating the number of missing ( | 
Methods
Generic functions that have methods for objects of class 
"distChoose" include: 
print.
Note
Since objects of class "distChoose" are lists, you may extract 
their components with the $ and [[ operators.
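As a brief illustration (assuming, as in the Examples below, that each element of the test.results component is a "gof" object with a p.value component):
set.seed(47)
dat <- rgamma(20, shape = 2, scale = 3)
choose.obj <- distChoose(dat)
choose.obj$decision
sapply(choose.obj$test.results, function(z) z$p.value)
rm(dat, choose.obj)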
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
distChoose, print.distChoose,  
Goodness-of-Fit Tests, 
Distribution.df.
Examples
  # Create an object of class "distChoose", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(47)
  dat <- rgamma(20, shape = 2, scale = 3)
  distChoose.obj <- distChoose(dat) 
  mode(distChoose.obj) 
  #[1] "list" 
  class(distChoose.obj) 
  #[1] "distChoose" 
  names(distChoose.obj) 
  #[1] "choices"                 "method"                 
  #[3] "decision"                "alpha"                  
  #[5] "distribution.parameters" "estimation.method"      
  #[7] "sample.size"             "test.results"           
  #[9] "data"                    "data.name"  
  distChoose.obj 
  
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Wilk
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Results:
  #
  #  Normal                         
  #    Test Statistic:              W = 0.9097488
  #    P-value:                     0.06303695
  #
  #  Gamma                          
  #    Test Statistic:              W = 0.9834958
  #    P-value:                     0.970903
  #
  #  Lognormal                      
  #    Test Statistic:              W = 0.9185006
  #    P-value:                     0.09271768
  #==========
  # Extract the choices
  #--------------------
  distChoose.obj$choices
  #[1] "Normal"    "Gamma"     "Lognormal"
  #==========
  # Clean up
  #---------
  rm(dat, distChoose.obj)
Choose Best Fitting Distribution Based on Goodness-of-Fit Tests for Censored Data
Description
Perform a series of goodness-of-fit tests for censored data from a (possibly user-specified) set of candidate probability distributions to determine which probability distribution provides the best fit for a data set.
Usage
distChooseCensored(x, censored, censoring.side = "left", alpha = 0.05,
    method = "sf", choices = c("norm", "gamma", "lnorm"),
    est.arg.list = NULL, prob.method = "hirsch-stedinger",
    plot.pos.con = 0.375, warn = TRUE, keep.data = TRUE,
    data.name = NULL, censoring.name = NULL)
Arguments
| x | a numeric vector containing data for the goodness-of-fit tests.
Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible
values are  | 
| alpha | numeric scalar between 0 and 1 specifying the Type I error associated with each
goodness-of-fit test.  When  | 
| method | character string defining which method to use. Possible values are: 
 The Shapiro-Wilk method is only available for singly censored data. See the DETAILS section for more information. | 
| choices | a character vector denoting the distribution abbreviations of the candidate
distributions.  See the help file for  This argument is ignored when  | 
| est.arg.list | a list containing one or more lists of arguments to be passed to the
function(s) estimating the distribution parameters.  The name(s) of
the components of the list must be equal to or a subset of the values of the
argument  In the course of testing for some form of normality (i.e., Normal, Lognormal),
the estimated parameters are saved in the  This argument is ignored when  | 
| prob.method | character string indicating what method to use to compute the plotting positions
(empirical probabilities) when  
 The default value is  The  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position
constant to use when  | 
| warn | logical scalar indicating whether to print a warning message when
observations with  | 
| keep.data | logical scalar indicating whether to return the original data.  The
default value is  | 
| data.name | optional character string indicating the name of the data used for argument  | 
| censoring.name | optional character string indicating the name for the data used for argument  | 
Details
The function distChooseCensored returns a list with information on the goodness-of-fit
tests for various distributions and which distribution appears to best fit the
data based on the p-values from the goodness-of-fit tests.  This function was written in
order to compare ProUCL's way of choosing the best-fitting distribution (USEPA, 2015) with
other ways of choosing the best-fitting distribution.
Method Based on Shapiro-Wilk, Shapiro-Francia, or Probability Plot Correlation Test 
(method="sw", method="sf", or method="ppcc")
For each value of the argument choices, the function distChooseCensored
runs the goodness-of-fit test using the data in x assuming that particular
distribution.  For example, if 
choices=c("norm", "gamma", "lnorm"),
indicating the Normal, Gamma, and Lognormal distributions, and
method="sf", then the usual Shapiro-Francia test is performed for the Normal
and Lognormal distributions, and the extension of the Shapiro-Francia test is performed
for the Gamma distribution (see the section
Testing Goodness-of-Fit for Any Continuous Distribution in the help
file for gofTestCensored for an explanation of the latter).  The distribution associated
with the largest p-value is the chosen distribution.  In the case when all p-values are
less than the value of the argument alpha, the distribution “Nonparametric” is chosen.  
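A minimal, self-contained usage sketch (the simulated concentrations and the detection limit of 5 are illustrative only, not from the package):
set.seed(598)
conc <- rlnormAlt(30, mean = 10, cv = 1)
cens <- conc < 5                  # flag values below the detection limit
conc[cens] <- 5                   # report nondetects at the detection limit
distChooseCensored(conc, censored = cens, censoring.side = "left",
  method = "sf", choices = c("norm", "gamma", "lnorm"))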
Method Based on ProUCL Algorithm (method="proucl")
When method="proucl", the function distChooseCensored uses the
algorithm that ProUCL (USEPA, 2015) uses to determine the best fitting
distribution.  The candidate distributions are the
Normal, Gamma, and Lognormal distributions.  The algorithm
used by ProUCL is as follows:
- Remove all censored observations and use only the uncensored observations. 
- Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the Normal distribution, i.e., call the function gofTest with distribution = "norm", test = "sw" and with distribution = "norm", test = "lillie". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Normal distribution. Otherwise, proceed to the next step.
- Perform the “ProUCL Anderson-Darling” and “ProUCL Kolmogorov-Smirnov” goodness-of-fit tests for the Gamma distribution, i.e., call the function gofTest with distribution = "gamma", test = "proucl.ad.gamma" and with distribution = "gamma", test = "proucl.ks.gamma". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Gamma distribution. Otherwise, proceed to the next step.
- Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the Lognormal distribution, i.e., call the function gofTest with distribution = "lnorm", test = "sw" and with distribution = "lnorm", test = "lillie". If either or both of the associated p-values are greater than or equal to the user-supplied value of alpha, then choose the Lognormal distribution. Otherwise, proceed to the next step.
- If none of the goodness-of-fit tests above yields a p-value greater than or equal to the user-supplied value of alpha, then choose the “Nonparametric” distribution.
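Because of Step 1, the censored-data version of the ProUCL rule reduces to the uncensored rule applied to the detects only (a minimal sketch, reusing conc and cens from the sketch above):
detects.only <- conc[!cens]
distChoose(detects.only, method = "proucl")
rm(detects.only)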
Value
a list of class "distChooseCensored" containing the results of the goodness-of-fit tests.
Objects of class "distChooseCensored" have a special printing method.
See the help file for 
distChooseCensored.object for details.
Note
In practice, almost any goodness-of-fit test will not reject the null hypothesis
if the number of observations is relatively small.  Conversely, almost any goodness-of-fit
test will reject the null hypothesis if the number of observations is very large,
since “real” data are never distributed according to any theoretical distribution
(Conover, 1980, p.367).  For most cases, however, the distribution of “real” data
is close enough to some theoretical distribution that fairly accurate results may be
provided by assuming that particular theoretical distribution.  One way to assess the
goodness of the fit is to use goodness-of-fit tests.  Another way is to look at
quantile-quantile (Q-Q) plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefors' Test for Normality. The American Statistician 40, 294-296.
D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of g1.
Biometrika 57, 679-681.
D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.
D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.
D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality.
Empirical Results for the Distributions of b2 and \sqrt{b1}.
Biometrika 60(3), 613-622.
D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of \sqrt{b1}.
Biometrika 60(1), 169-173.
Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.
Kim, P.J., and R.I. Jennrich. (1973). Tables of the Exact Sampling Distribution of the Two Sample Kolmogorov-Smirnov Criterion. In Harter, H.L., and D.B. Owen, eds. Selected Tables in Mathematical Statistics, Vol. 1. American Mathematical Society, Providence, Rhode Island, pp.79-170.
Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari 4, 83-91.
Marsaglia, G., W.W. Tsang, and J. Wang. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). doi:10.18637/jss.v008.i18.
Moore, D.S. (1986). Tests of Chi-Squared Type. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, pp.63-95.
Pomeranz, J. (1973). Exact Cumulative Distribution of the Kolmogorov-Smirnov Statistic for Small Samples (Algorithm 487). Collected Algorithms from ACM ??, ???-???.
Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.
Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.
Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.
Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvania State University, Department of Statistics.
Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.
Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.
Smirnov, N.V. (1939). Estimate of Deviation Between Empirical Distribution Functions in Two Independent Samples. Bulletin Moscow University 2(2), 3-16.
Smirnov, N.V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 279-281.
Stephens, M.A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society, Series B, 32, 115-122.
Stephens, M.A. (1986a). Tests Based on EDF Statistics. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.
Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.
Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.
Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
gofTestCensored, distChooseCensored.object,
print.distChooseCensored, distChoose.
Examples
  # Generate 30 observations from a gamma distribution with
  # parameters mean=10 and cv=1 and then censor observations less than 5.
  # Then:
  #
  # 1) Call distChooseCensored using the Shapiro-Wilk method and specify
  #    choices of the
  #      normal,
  #      gamma (alternative parameterization), and
  #      lognormal (alternative parameterization)
  #    distributions.
  #
  # 2) Compare the results in 1) above with the results using the
  #    ProUCL method.
  #
  # Notes:  The call to set.seed lets you reproduce this example.
  #
  #         The ProUCL method chooses the Normal distribution, whereas the
  #         Shapiro-Wilk method chooses the Gamma distribution.
  set.seed(598)
  dat <- sort(rgammaAlt(30, mean = 10, cv = 1))
  dat
  # [1]  0.5313509  1.4741833  1.9936208  2.7980636  3.4509840
  # [6]  3.7987348  4.5542952  5.5207531  5.5253596  5.7177872
  #[11]  5.7513827  9.1086375  9.8444090 10.6247123 10.9304922
  #[16] 11.7925398 13.3432689 13.9562777 14.6029065 15.0563342
  #[21] 15.8730642 16.0039936 16.6910715 17.0288922 17.8507891
  #[26] 19.1105522 20.2657141 26.3815970 30.2912797 42.8726101
  dat.censored <- dat
  censored <- dat.censored < 5
  dat.censored[censored] <- 5
  # 1) Call distChooseCensored using the Shapiro-Wilk method.
  #----------------------------------------------------------
  distChooseCensored(dat.censored, censored, method = "sw",
    choices = c("norm", "gammaAlt", "lnormAlt"))
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Wilk
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          mean = 12.4911448
  #                                 cv   =  0.7617343
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat.censored
  #
  #Sample Size:                     30
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              censored
  #
  #Censoring Level(s):              5
  #
  #Percent Censored:                23.33333%
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              W = 0.9372741
  #    P-value:                     0.1704876
  #
  #  Gamma
  #    Test Statistic:              W = 0.9613711
  #    P-value:                     0.522329
  #
  #  Lognormal
  #    Test Statistic:              W = 0.9292406
  #    P-value:                     0.114511
  #--------------------
  # 2) Compare the results in 1) above with the results using the
  #    ProUCL method.
  #---------------------------------------------------------------
  distChooseCensored(dat.censored, censored, method = "proucl")
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   ProUCL
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Normal
  #
  #Estimated Parameter(s):          mean = 15.397584
  #                                 sd   =  8.688302
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat.censored
  #
  #Sample Size:                     30
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              censored
  #
  #Censoring Level(s):              5
  #
  #Percent Censored:                23.33333%
  #
  #ProUCL Sample Size:              23
  #
  #Test Results:
  #
  #  Normal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.861652
  #      P-value:                   0.004457924
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.1714435
  #      P-value:                   0.07794315
  #
  #  Gamma
  #    ProUCL Anderson-Darling Gamma GOF
  #      Test Statistic:            A = 0.3805556
  #      P-value:                   >= 0.10
  #    ProUCL Kolmogorov-Smirnov Gamma GOF
  #      Test Statistic:            D = 0.1035271
  #      P-value:                   >= 0.10
  #
  #  Lognormal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.9532604
  #      P-value:                   0.3414187
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.115588
  #      P-value:                   0.5899259
  #--------------------
  # Clean up
  #---------
  rm(dat, censored, dat.censored)
  #====================================================================
  # Check the assumption that the silver data stored in Helsel.Cohn.88.silver.df
  # follows a lognormal distribution.
  # Note that the small p-value and the shape of the Q-Q plot
  # (an inverted S-shape) suggest that the log transformation is not quite strong
  # enough to "bring in" the tails (i.e., the log-transformed silver data has tails
  # that are slightly too long relative to a normal distribution).
  # Helsel and Cohn (1988, p.2002) note that the gross outlier of 560 mg/L tends to
  # make the shape of the data resemble a gamma distribution; however, the decision
  # returned by the distChooseCensored function is neither Gamma nor Lognormal,
  # but Nonparametric.
  # First create a lognormal Q-Q plot
  #----------------------------------
  dev.new()
  with(Helsel.Cohn.88.silver.df,
    qqPlotCensored(Ag, Censored, distribution = "lnorm",
      points.col = "blue", add.line = TRUE))
  #----------
  # Now call the distChooseCensored function using the default settings.
  #---------------------------------------------------------------------
  with(Helsel.Cohn.88.silver.df,
    distChooseCensored(Ag, Censored))
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Francia
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Nonparametric
  #
  #Data:                            Ag
  #
  #Sample Size:                     56
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              Censored
  #
  #Censoring Level(s):               0.1  0.2  0.3  0.5  1.0  2.0  2.5  5.0  6.0 10.0 20.0 25.0
  #
  #Percent Censored:                60.71429%
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              W = 0.3065529
  #    P-value:                     8.346126e-08
  #
  #  Gamma
  #    Test Statistic:              W = 0.6254148
  #    P-value:                     1.884155e-05
  #
  #  Lognormal
  #    Test Statistic:              W = 0.8957198
  #    P-value:                     0.03490314
  #----------
  # Clean up
  #---------
  graphics.off()
  #====================================================================
  # Chapter 15 of USEPA (2009) gives several examples of looking
  # at normal Q-Q plots and estimating the mean and standard deviation
  # for manganese concentrations (ppb) in groundwater at five background
  # wells (USEPA, 2009, p. 15-10).  The Q-Q plot shown in Figure 15-4
  # on page 15-13 clearly indicates that the Lognormal distribution
  # is a good fit for these data.
  # In EnvStats these data are stored in the data frame EPA.09.Ex.15.1.manganese.df.
  # Here we will call the distChooseCensored function to determine
  # whether the data appear to come from a normal, gamma, or lognormal
  # distribution.
  #
  # Note that using the Probability Plot Correlation Coefficient method
  # (equivalent to using the Shapiro-Francia method) yields a decision of
  # Lognormal, but using the ProUCL method yields a decision of Gamma.
  #----------------------------------------------------------------------
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Use distChooseCensored with the probability plot correlation method,
  # and for the gamma and lognormal distribution specify the
  # mean and CV parameterization:
  #------------------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    distChooseCensored(Manganese.ppb, Censored,
      choices = c("norm", "gamma", "lnormAlt"), method = "ppcc"))
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   PPCC
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Lognormal
  #
  #Estimated Parameter(s):          mean = 23.003987
  #                                 cv   =  2.300772
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Sample Size:                     25
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              Censored
  #
  #Censoring Level(s):              2 5
  #
  #Percent Censored:                24%
  #
  #Test Results:
  #
  #  Normal
  #    Test Statistic:              r = 0.9147686
  #    P-value:                     0.004662658
  #
  #  Gamma
  #    Test Statistic:              r = 0.9844875
  #    P-value:                     0.6836625
  #
  #  Lognormal
  #    Test Statistic:              r = 0.9931982
  #    P-value:                     0.9767731
  #--------------------
  # Repeat the above example using the ProUCL method.
  #--------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    distChooseCensored(Manganese.ppb, Censored, method = "proucl"))
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   ProUCL
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          shape =  1.284882
  #                                 scale = 19.813413
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Sample Size:                     25
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              Censored
  #
  #Censoring Level(s):              2 5
  #
  #Percent Censored:                24%
  #
  #ProUCL Sample Size:              19
  #
  #Test Results:
  #
  #  Normal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.7423947
  #      P-value:                   0.0001862975
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.2768771
  #      P-value:                   0.0004771155
  #
  #  Gamma
  #    ProUCL Anderson-Darling Gamma GOF
  #      Test Statistic:            A = 0.6857121
  #      P-value:                   0.05 <= p < 0.10
  #    ProUCL Kolmogorov-Smirnov Gamma GOF
  #      Test Statistic:            D = 0.1830034
  #      P-value:                   >= 0.10
  #
  #  Lognormal
  #    Shapiro-Wilk GOF
  #      Test Statistic:            W = 0.969805
  #      P-value:                   0.7725528
  #    Lilliefors (Kolmogorov-Smirnov) GOF
  #      Test Statistic:            D = 0.138547
  #      P-value:                   0.4385195
  #====================================================================
  ## Not run: 
  # 1) Simulate 1000 trials where for each trial you:
  #    a) Generate 30 observations from a Gamma distribution with
  #       parameters mean = 10 and CV = 1.
  #    b) Censor observations less than 5 (the 39th percentile).
  #    c) Use distChooseCensored with the Shapiro-Francia method.
  #    d) Use distChooseCensored with the ProUCL method.
  #
  #  2) Compare the proportion of times the
  #     Normal vs. Gamma vs. Lognormal vs. Nonparametric distribution
  #     is chosen for c) and d) above.
  #------------------------------------------------------------------
  set.seed(58)
  N <- 1000
  Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
  Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
  for(i in 1:N) {
    dat <- rgammaAlt(30, mean = 10, cv = 1)
    censored <- dat < 5
    dat[censored] <- 5
    Choose.df[i, "SW"]     <- distChooseCensored(dat, censored, method = "sw")$decision
    Choose.df[i, "ProUCL"] <- distChooseCensored(dat, censored, method = "proucl")$decision
  }
  summaryStats(Choose.df, digits = 0)
  #                ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
  #Normal              520          52   398      40
  #Gamma               336          34   375      38
  #Lognormal           105          10   221      22
  #Nonparametric        39           4     6       1
  #Combined           1000         100  1000     100
  #--------------------
  # Repeat above example for the Lognormal Distribution with mean=10 and CV = 1.
  # In this case, 5 is the 34th percentile.
  #-----------------------------------------------------------------------------
  set.seed(297)
  N <- 1000
  Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
  Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
  for(i in 1:N) {
    dat <- rlnormAlt(30, mean = 10, cv = 1)
    censored <- dat < 5
    dat[censored] <- 5
    Choose.df[i, "SW"]     <- distChooseCensored(dat, censored, method = "sf")$decision
    Choose.df[i, "ProUCL"] <- distChooseCensored(dat, censored, method = "proucl")$decision
  }
  summaryStats(Choose.df, digits = 0)
  #              ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
  #Normal              277          28    92       9
  #Gamma               393          39   231      23
  #Lognormal           190          19   624      62
  #Nonparametric       140          14    53       5
  #Combined           1000         100  1000     100
  #--------------------
  # Clean up
  #---------
  rm(N, Choose.fac, Choose.df, i, dat, censored)
  
## End(Not run)
S3 Class "distChooseCensored"
Description
Objects of S3 class "distChooseCensored" are returned by the EnvStats function 
distChooseCensored.
Details
Objects of S3 class "distChooseCensored" are lists that contain 
information about the candidate distributions, the estimated distribution 
parameters for each candidate distribution, and the test statistics and 
p-values associated with each candidate distribution.
Value
Required Components 
The following components must be included in a legitimate list of 
class "distChooseCensored".
| choices | a character vector containing the full names of the candidate distributions (see Distribution.df). | 
| method | a character string denoting which method was used. | 
| decision | a character vector containing the full name of the chosen distribution. | 
| alpha | a numeric scalar between 0 and 1 specifying the Type I error associated with each goodness-of-fit test. | 
| distribution.parameters | a numeric vector containing the estimated parameters associated with the chosen distribution. | 
| estimation.method | a character string indicating the method used to compute the estimated parameters associated with the chosen distribution.  The value of this component will depend on the available estimation methods (see Distribution.df). | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit tests. | 
| censoring.side | character string indicating whether the data are left- or right-censored. | 
| censoring.levels | numeric scalar or vector indicating the censoring level(s). | 
| percent.censored | numeric scalar indicating the percent of non-missing observations that are censored. | 
| test.results | a list with the same number of components as the number of elements in the component choices; each component contains the results of the goodness-of-fit test for the corresponding candidate distribution. | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit tests. | 
| censoring.name | character string indicating the name of the data object used to identify which values are censored. | 
Optional Components 
The following components are included in the result of 
calling distChooseCensored when the argument keep.data=TRUE:
| data | numeric vector containing the data actually used for the goodness-of-fit tests (i.e., the original data without any missing or infinite values). | 
| censored | logical vector containing the censoring status for the data actually used for the goodness-of-fit tests (i.e., the original data without any missing or infinite values). | 
The following component is included in the result of 
calling distChooseCensored when missing (NA), 
undefined (NaN) and/or infinite (Inf, -Inf) 
values are present:
| bad.obs | numeric scalar indicating the number of missing (NA), undefined (NaN) and/or infinite (Inf, -Inf) values that were removed from the data prior to performing the goodness-of-fit tests. | 
Methods
Generic functions that have methods for objects of class 
"distChooseCensored" include: 
print.
Note
Since objects of class "distChooseCensored" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
distChooseCensored, print.distChooseCensored,  
Censored Data, 
Goodness-of-Fit Tests, 
Distribution.df.
Examples
  # Create an object of class "distChooseCensored", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(598)
  dat <- rgammaAlt(30, mean = 10, cv = 1)
  censored <- dat < 5
  dat[censored] <- 5
  distChooseCensored.obj <- distChooseCensored(dat, censored, 
    method = "sw", choices = c("norm", "gammaAlt", "lnormAlt"))
  mode(distChooseCensored.obj) 
  #[1] "list" 
  class(distChooseCensored.obj) 
  #[1] "distChooseCensored" 
  names(distChooseCensored.obj) 
  # [1] "choices"                 "method"                 
  # [3] "decision"                "alpha"                  
  # [5] "distribution.parameters" "estimation.method"      
  # [7] "sample.size"             "censoring.side"         
  # [9] "censoring.levels"        "percent.censored"       
  #[11] "test.results"            "data"                   
  #[13] "censored"                "data.name"              
  #[15] "censoring.name" 
  distChooseCensored.obj 
  
  #Results of Choosing Distribution
  #--------------------------------
  #
  #Candidate Distributions:         Normal
  #                                 Gamma
  #                                 Lognormal
  #
  #Choice Method:                   Shapiro-Wilk
  #
  #Type I Error per Test:           0.05
  #
  #Decision:                        Gamma
  #
  #Estimated Parameter(s):          mean = 12.4911448
  #                                 cv   =  0.7617343
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     30
  #
  #Censoring Side:                  left
  #
  #Censoring Variable:              censored
  #
  #Censoring Level(s):              5 
  #
  #Percent Censored:                23.33333%
  #
  #Test Results:
  #
  #  Normal                         
  #    Test Statistic:              W = 0.9372741
  #    P-value:                     0.1704876
  #
  #  Gamma                          
  #    Test Statistic:              W = 0.9613711
  #    P-value:                     0.522329
  #
  #  Lognormal                      
  #    Test Statistic:              W = 0.9292406
  #    P-value:                     0.114511
  #==========
  # Extract the choices
  #--------------------
  distChooseCensored.obj$choices
  #[1] "Normal"    "Gamma"     "Lognormal"
  #==========
  # Clean up
  #---------
  rm(dat, censored, distChooseCensored.obj)
Estimate Parameters of a Beta Distribution
Description
Estimate the shape parameters of a beta distribution.
Usage
  ebeta(x, method = "mle")
Arguments
| x | numeric vector of observations.  All observations must be greater than 0 and less than 1. | 
| method | character string specifying the method of estimation.  The possible values are "mle" (maximum likelihood; the default), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimator of variance).  See the DETAILS section for more information. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n observations 
from a beta distribution with parameters 
shape1=\nu and shape2=\omega.
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of the shape parameters \nu and 
\omega are the solutions of the simultaneous equations:
\Psi(\hat{\nu}) - \Psi(\hat{\nu} + \hat{\omega}) = (1/n) \sum_{i=1}^{n} log(x_i)
\Psi(\hat{\omega}) - \Psi(\hat{\nu} + \hat{\omega}) = (1/n) \sum_{i=1}^{n} log(1 - x_i)
where \Psi() is the digamma function (Forbes et al., 2011).  
Method of Moments Estimators (method="mme")
The method of moments estimators (mme's) of the shape parameters \nu and 
\omega are given by (Forbes et al., 2011):
\hat{\nu} = \bar{x} \{ [ \bar{x}(1 - \bar{x}) / s_{m}^2] - 1 \}
\hat{\omega} = (1 - \bar{x}) \{ [ \bar{x}(1 - \bar{x}) / s_{m}^2] - 1 \}
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i; \, s_{m}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
Method of Moments Estimators Based on the Unbiased Estimator of Variance (method="mmue") 
These estimators are the same as the method of moments estimators except that 
the method of moments estimator of variance is replaced with the unbiased estimator 
of variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2
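The moment-based estimators above are simple enough to compute by hand and compare with ebeta; the following sketch assumes the returned estimate object stores its estimates in a parameters component:
  set.seed(250)
  x <- rbeta(20, shape1 = 2, shape2 = 4)

  xbar <- mean(x)
  s2.m <- mean((x - xbar)^2)   # method-of-moments variance (denominator n)
  s2.u <- var(x)               # unbiased variance (denominator n - 1)

  k <- xbar * (1 - xbar) / s2.m - 1
  c(shape1 = xbar * k, shape2 = (1 - xbar) * k)   # hand-computed "mme"

  k <- xbar * (1 - xbar) / s2.u - 1
  c(shape1 = xbar * k, shape2 = (1 - xbar) * k)   # hand-computed "mmue"

  ebeta(x, method = "mme")$parameters             # should agree with the "mme" values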
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The beta distribution takes real values between 0 and 1.  Special cases of the 
beta are the Uniform[0,1] when shape1=1 and 
shape2=1, and the arcsin distribution when shape1=0.5 and 
shape2=0.5.  The arcsin distribution appears in the theory of random walks.  
The beta distribution is used in Bayesian analyses as a conjugate to the binomial 
distribution.
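These special cases are easy to verify numerically; the following check uses only base R densities:
  x <- seq(0.01, 0.99, by = 0.01)
  # Beta(1, 1) is the Uniform[0, 1] density
  all.equal(dbeta(x, 1, 1), dunif(x))
  # Beta(0.5, 0.5) is the arcsin density 1 / (pi * sqrt(x * (1 - x)))
  all.equal(dbeta(x, 0.5, 0.5), 1 / (pi * sqrt(x * (1 - x))))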
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Beta.
Examples
  # Generate 20 observations from a beta distribution with parameters 
  # shape1=2 and shape2=4, then estimate the parameters via 
  # maximum likelihood. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rbeta(20, shape1 = 2, shape2 = 4) 
  ebeta(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Beta
  #
  #Estimated Parameter(s):          shape1 =  5.392221
  #                                 shape2 = 11.823233
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #==========
  # Repeat the above, but use the method of moments estimators:
  ebeta(dat, method = "mme")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Beta
  #
  #Estimated Parameter(s):          shape1 =  5.216311
  #                                 shape2 = 11.461341
  #
  #Estimation Method:               mme
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #==========
  # Clean up
  #---------
  rm(dat)
Estimate Parameter of a Binomial Distribution
Description
Estimate p (the probability of “success”) for a binomial distribution, 
and optionally construct a confidence interval for p.
Usage
  ebinom(x, size = NULL, method = "mle/mme/mvue", ci = FALSE, 
  ci.type = "two-sided", ci.method = "score", correct = TRUE, 
  var.denom = "n", conf.level = 0.95, warn = TRUE)
Arguments
| x | numeric or logical vector of observations.  When size is not supplied, x must be a vector of 0s and 1s (or a logical vector of FALSEs and TRUEs), and the number of trials is taken to be the number of observations; when size is supplied, x must be a single non-negative integer giving the observed number of “successes” out of size trials. | 
| size | positive integer indicating the number of trials.  The default value is size=NULL, in which case the number of trials is taken to be the number of observations in x. | 
| method | character string specifying the method of estimation.  The only possible value is "mle/mme/mvue" (maximum likelihood, method of moments, and minimum variance unbiased).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the probability of “success”.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| ci.method | character string indicating which method to use to construct the confidence interval.  Possible values are "score" (the default), "exact", "adjusted Wald", and "Wald". | 
| correct | logical scalar indicating whether to use the continuity correction when ci.method="score" or ci.method="Wald".  The default value is correct=TRUE. | 
| var.denom | character string indicating what value to use in the denominator of the variance estimator when ci.method="Wald".  The default value is var.denom="n". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
| warn | a logical scalar indicating whether to issue a warning in the case when  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
If \underline{x} is a vector of n observations from a binomial distribution with 
parameters size=1 and prob=p, then the sum of all the values in 
\underline{x} is an observation from a binomial distribution with parameters 
size=n and prob=p.
If x is an observation from a binomial distribution with parameters size=n 
and prob=p, the maximum likelihood estimator (mle), method of moments estimator (mme), 
and minimum variance unbiased estimator (mvue) of p is simply x/n.
Confidence Intervals.
- ci.method="score"
- The confidence interval for p based on the score method was developed by Wilson (1927) and is discussed by Newcombe (1998a), Agresti and Coull (1998), and Agresti and Caffo (2000).  When ci=TRUE and ci.method="score", the function ebinom calls the R function prop.test to compute the confidence interval.  This method has been shown to provide the best performance (in terms of actual coverage matching assumed coverage) of all the methods provided here, although unlike the exact method, the actual coverage can fall below the assumed coverage.
- ci.method="exact"
- The confidence interval for p based on the exact (Clopper-Pearson) method is discussed by Newcombe (1998a), Agresti and Coull (1998), and Zar (2010, pp.543-547).  This is the method used in the R function binom.test.  This method ensures the actual coverage is greater than or equal to the assumed coverage.
- ci.method="Wald"
- The confidence interval for p based on the Wald method (with or without a correction for continuity) is the usual “normal approximation” method and is discussed by Newcombe (1998a), Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547).  This method is never recommended but is included for historical purposes.
- ci.method="adjusted Wald"
- The confidence interval for p based on the adjusted Wald method is discussed by Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547).  This is a simple modification of the Wald method and performs surprisingly well.
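The sketch below shows how these ci.method choices map onto base R for a hypothetical 7 successes out of 20 trials; ebinom with ci=TRUE remains the documented interface:
  x <- 7; n <- 20; conf <- 0.95
  # Score interval (what prop.test computes)
  prop.test(x, n, conf.level = conf, correct = TRUE)$conf.int
  # Exact (Clopper-Pearson) interval (what binom.test computes)
  binom.test(x, n, conf.level = conf)$conf.int
  # Wald interval computed directly from the normal approximation
  p.hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)
  p.hat + c(-1, 1) * z * sqrt(p.hat * (1 - p.hat) / n)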
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, 
Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any one trial is independent 
of any other trial, and that the probability of “success”, p, is the same on 
each trial.  A binomial discrete random variable X is the number of “successes” in 
n independent trials.  A special case of the binomial distribution occurs when n=1, 
in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model 
the proportion of times a chemical concentration exceeds a set standard in a given period of 
time (e.g., Gilbert, 1987, p.143).  The binomial distribution is also used to compute an upper 
bound on the overall Type I error rate for deciding whether a facility or location is in 
compliance with some set standard.  Assume the null hypothesis is that the facility is in compliance.  
If a test of hypothesis is conducted periodically over time to test compliance and/or several tests 
are performed during each time period, and the facility or location is always in compliance, and 
each single test has a Type I error rate of \alpha, and the result of each test is 
independent of the result of any other test (usually not a reasonable assumption), then the number 
of times the facility is declared out of compliance when in fact it is in compliance is a 
binomial random variable with probability of “success” p=\alpha being the 
probability of being declared out of compliance (see USEPA, 2009).
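As a small worked illustration of this last point (hypothetical numbers): with k independent tests each run at a Type I error rate of alpha, the number of false declarations of non-compliance at a facility that is truly in compliance is Binomial(k, alpha), so the chance of at least one false declaration grows quickly with k.
  alpha <- 0.01
  k <- 50
  1 - pbinom(0, size = k, prob = alpha)   # P(at least one false declaration), roughly 0.4
  k * alpha                               # expected number of false declarations, 0.5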
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
Binomial, prop.test, binom.test, 
ciBinomHalfWidth, ciBinomN, 
plotCiBinomDesign.
Examples
  # Generate 20 observations from a binomial distribution with 
  # parameters size=1 and prob=0.2, then estimate the 'prob' parameter. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example. Also, the only parameter estimated is 'prob'; 'size' is
  # specified in the call to ebinom.  The parameter 'size' is printed
  #   in order to show all of the parameters associated with the 
  # distribution.)
  set.seed(251) 
  dat <- rbinom(20, size = 1, prob = 0.2) 
  ebinom(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Binomial
  #
  #Estimated Parameter(s):          size = 20.0
  #                                 prob =  0.1
  #
  #Estimation Method:               mle/mme/mvue for 'prob'
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------------------------------------------------------------
  # Generate one observation from a binomial distribution with 
  # parameters size=20 and prob=0.2, then estimate the "prob" 
  # parameter and compute a confidence interval:
  set.seed(763) 
  dat <- rbinom(1, size=20, prob=0.2) 
  ebinom(dat, size = 20, ci = TRUE) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Binomial
  #
  #Estimated Parameter(s):          size = 20.00
  #                                 prob =  0.35
  #
  #Estimation Method:               mle/mme/mvue for 'prob'
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         prob
  #
  #Confidence Interval Method:      Score normal approximation
  #                                 (With continuity correction)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.1630867
  #                                 UCL = 0.5905104
  #----------------------------------------------------------------
  # Using the data from the last example, compare confidence 
  # intervals based on the various methods
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "score", correct = TRUE)$interval$limits
  #      LCL       UCL 
  #0.1630867 0.5905104
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "score", correct = FALSE)$interval$limits
  #      LCL       UCL 
  #0.1811918 0.5671457 
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "exact")$interval$limits
  #      LCL       UCL 
  #0.1539092 0.5921885 
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "adjusted Wald")$interval$limits
  #      LCL       UCL 
  #0.1799264 0.5684112 
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "Wald", correct = TRUE)$interval$limits
  #      LCL       UCL 
  #0.1159627 0.5840373 
  ebinom(dat, size = 20, ci = TRUE, 
    ci.method = "Wald", correct = FALSE)$interval$limits
  #      LCL       UCL 
  #0.1409627 0.5590373 
  #----------------------------------------------------------------
  # Use the cadmium data on page 8-6 of USEPA (1989b) to compute 
  # two-sided 95% confidence intervals for the probability of 
  # detection at background and compliance wells.  The data are 
  # stored in EPA.89b.cadmium.df.
  EPA.89b.cadmium.df
  #   Cadmium.orig Cadmium Censored  Well.type
  #1           0.1   0.100    FALSE Background
  #2          0.12   0.120    FALSE Background
  #3           BDL   0.000     TRUE Background
  #...
  #86          BDL   0.000     TRUE Compliance
  #87          BDL   0.000     TRUE Compliance
  #88          BDL   0.000     TRUE Compliance
  attach(EPA.89b.cadmium.df)
  # Probability of detection at Background well:
  #--------------------------------------------
  ebinom(!Censored[Well.type=="Background"], ci=TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Binomial
  #
  #Estimated Parameter(s):          size = 24.0000000
  #                                 prob =  0.3333333
  #
  #Estimation Method:               mle/mme/mvue for 'prob'
  #
  #Data:                            !Censored[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Confidence Interval for:         prob
  #
  #Confidence Interval Method:      Score normal approximation
  #                                 (With continuity correction)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.1642654
  #                                 UCL = 0.5530745
  # Probability of detection at Compliance well:
  #--------------------------------------------
  ebinom(!Censored[Well.type=="Compliance"], ci=TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Binomial
  #
  #Estimated Parameter(s):          size = 64.000
  #                                 prob =  0.375
  #
  #Estimation Method:               mle/mme/mvue for 'prob'
  #
  #Data:                            !Censored[Well.type == "Compliance"]
  #
  #Sample Size:                     64
  #
  #Confidence Interval for:         prob
  #
  #Confidence Interval Method:      Score normal approximation
  #                                 (With continuity correction)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.2597567
  #                                 UCL = 0.5053034
  #----------------------------------------------------------------
  # Clean up
  rm(dat)
  detach("EPA.89b.cadmium.df")
Empirical Cumulative Distribution Function Plot
Description
Produce an empirical cumulative distribution function plot.
Usage
  ecdfPlot(x, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), 
    plot.pos.con = 0.375, plot.it = TRUE, add = FALSE, ecdf.col = "black", 
    ecdf.lwd = 3 * par("cex"), ecdf.lty = 1, curve.fill = FALSE, 
    curve.fill.col = "cyan", ..., type = ifelse(discrete, "s", "l"), 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| discrete | logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default). | 
| prob.method | character string indicating what method to use to compute the plotting positions (empirical probabilities).  Possible values are "emp.probs" (empirical probabilities) and "plot.pos" (plotting positions).  The default value is "emp.probs" when discrete=TRUE and "plot.pos" when discrete=FALSE. | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.  The default value is plot.pos.con=0.375. | 
| plot.it | logical scalar indicating whether to produce a plot or add to the current plot (see add below).  The default value is plot.it=TRUE. | 
| add | logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or to create a new plot (add=FALSE; the default). | 
| ecdf.col | a numeric scalar or character string determining the color of the empirical cdf line or points.  The default value is ecdf.col="black". | 
| ecdf.lwd | a numeric scalar determining the width of the empirical cdf line.  The default value is ecdf.lwd=3*par("cex"). | 
| ecdf.lty | a numeric scalar determining the line type of the empirical cdf line.  The default value is ecdf.lty=1. | 
| curve.fill | a logical scalar indicating whether to fill in the area below the empirical cdf curve with the color specified by curve.fill.col.  The default value is curve.fill=FALSE. | 
| curve.fill.col | a numeric scalar or character string indicating what color to use to fill in the area below the empirical cdf curve.  The default value is curve.fill.col="cyan". | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see par). | 
Details
The cumulative distribution function (cdf) of a random variable 
X is the function F such that 
F(x) = Pr(X \le x) \;\;\;\;\;\; (1)
for all values of x.  That is, if p = F(x), then p is the 
proportion of the population that is less than or equal to x, and 
x is called the p'th quantile, or the 100p'th 
percentile.  A plot of quantiles 
on the x-axis (i.e., the possible value for the random variable X) vs. 
the fraction of the population less than or equal to that number on the 
y-axis is called the cumulative distribution function plot, and 
the y-axis is usually labeled as the 
“cumulative probability” or “cumulative frequency”.
When we have a sample of data from some population, we usually do not
know what percentiles our observations correspond to because we do not
know the form of the cumulative distribution function F, so we 
have to use the sample data to estimate the cdf F.  An 
empirical cumulative distribution function (ecdf) plot, 
also called a quantile plot, is a plot of the observed 
quantiles (i.e., the ordered observations) on the x-axis vs. 
the estimated cumulative probabilities on the y-axis
(Chambers et al., 1983, pp. 11-19; Cleveland, 1993, pp. 17-20; 
Cleveland, 1994, pp. 136-139; Helsel and Hirsch, 1992, pp. 21-24).
(Note:  Some authors (e.g., Chambers et al., 1983, pp.11-16; Cleveland, 1993, pp.17-20) 
reverse the axes on a quantile plot, i.e., the observed order statistics from the 
random sample are on the y-axis and the estimated cumulative probabilities 
are on the x-axis.)
The empirical cumulative distribution function (ecdf)  
is an estimate of the cdf based on a random sample of n observations 
from the distribution.  Let x_1, x_2, \ldots, x_n denote the n 
observations, and let x_{(1)}, x_{(2)}, \ldots, x_{(n)} denote the ordered 
observations (i.e., the order statistics).  The cdf is usually estimated by either 
the empirical probabilities estimator or the 
plotting-position estimator.  The empirical probabilities estimator 
is given by:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{\#[x_j \le x_{(i)}]}{n} \;\;\;\;\;\; (2)
where \#[x_j \le x_{(i)}] denotes the number of observations less than 
or equal to x_{(i)}.  The plotting-position estimator is given by:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{n - 2a + 1} \;\;\;\;\;\; (3)
where 0 \le a \le 1 (Cleveland, 1993, p. 18; D'Agostino, 1986a, pp. 8,25).
For any value x such that x_{(1)} < x < x_{(n)}, the ecdf is usually defined as either a step function:
\hat{F}(x) = \hat{F}[x_{(i)}], \qquad x_{(i)} \le x < x_{(i+1)} \;\;\;\;\;\; (4)
(e.g., D'Agostino, 1986a), or linear interpolation between order statistics is used:
\hat{F}(x) = (1-r)\hat{F}[x_{(i)}] + r\hat{F}[x_{(i+1)}], \qquad x_{(i)} \le x < x_{(i+1)} \;\;\;\;\;\; (5)
where
r = \frac{x - x_{(i)}}{x_{(i+1)} - x_{(i)}} \;\;\;\;\;\; (6)
(e.g., Chambers et al., 1983).  For the step function version, the ecdf stays flat until it hits a 
value on the x-axis corresponding to one of the order statistics, then it makes a jump.  
For the linear interpolation version, the ecdf plot looks like lines connecting the points.  
By default, the function ecdfPlot uses the step function version when discrete=TRUE, and 
the linear interpolation version when discrete=FALSE.  The user may override these defaults by 
supplying the graphics parameter type (type="s" for a step function, type="l" 
for linear interpolation, type="p" for points only, etc.).
The empirical probabilities estimator is intuitively appealing.  This is the estimator used when 
prob.method="emp.probs".  The disadvantage of this estimator is that it implies the largest 
observed value is the maximum possible value of the distribution (i.e., the 100'th percentile).  This 
may be satisfactory if the underlying distribution is known to be discrete, but it is usually not 
satisfactory if the underlying distribution is known to be continuous.
The plotting-position estimator with various values of a is often used when the goal is 
to produce a probability plot (see qqPlot) rather than an empirical cdf plot.  It is used 
to compute the estimated expected values or medians of the order statistics for a probability plot.  
This is the estimator used when prob.method="plot.pos".  The argument plot.pos.con refers 
to the variable a.  Based on certain principles from statistical theory, certain 
values of the constant a make sense for specific underlying distributions (see 
the help file for qqPlot for more information).
Because x is a random sample, the empirical cdf changes from sample to sample and the variability 
in these estimates can be dramatic for small sample sizes.
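For a small sample, the two estimators in equations (2) and (3) can be computed directly, as in the following sketch (here a plays the role of the plot.pos.con argument, and because the simulated values contain no ties, equation (2) reduces to i/n):
  set.seed(250)
  x <- sort(rnorm(10))
  n <- length(x)

  p.emp <- (1:n) / n                       # equation (2): empirical probabilities
  a <- 0.375
  p.pos <- ((1:n) - a) / (n - 2 * a + 1)   # equation (3): plotting positions

  cbind(x, p.emp, p.pos)
  # ppoints(n, a = 0.375) returns the same values as p.pos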
Value
ecdfPlot invisibly returns a list with the following components:
| Order.Statistics | numeric vector of the ordered observations. | 
| Cumulative.Probabilities | numeric vector of the associated plotting positions. | 
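For example, the returned components can be captured and used to read off approximate quantiles without drawing a plot (this sketch assumes the list is still returned when plot.it=FALSE):
  set.seed(250)
  x <- rnorm(20)
  ecdf.list <- ecdfPlot(x, plot.it = FALSE)
  names(ecdf.list)
  # "Order.Statistics"  "Cumulative.Probabilities"
  # Approximate median obtained by interpolating the plotting positions:
  with(ecdf.list, approx(Cumulative.Probabilities, Order.Statistics, xout = 0.5)$y)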
Note
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the 
y-axis vs. the ecdf on the x-axis and call this a quantile plot.
Empirical cumulative distribution function (ecdf) plots are often plotted with 
theoretical cdf plots (see cdfPlot and cdfCompare) to 
graphically assess whether a sample of observations comes from a particular 
distribution.  The Kolmogorov-Smirnov goodness-of-fit test 
(see gofTest) is the statistical companion of this kind of 
comparison; it is based on the maximum vertical distance between the empirical 
cdf plot and the theoretical cdf plot.  More often, however, 
quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess 
departures from an assumed distribution (see qqPlot).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
See Also
ppoints, cdfPlot, cdfCompare, 
qqPlot, ecdfPlotCensored.
Examples
  # Generate 20 observations from a normal distribution with 
  # mean=0 and sd=1 and create an ecdf plot. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rnorm(20) 
  dev.new()
  ecdfPlot(x)
  #----------
  # Repeat the above example, but fill in the area under the 
  # empirical cdf curve.
  dev.new()
  ecdfPlot(x, curve.fill = TRUE)
  #----------
  # Repeat the above example, but plot only the points.
  dev.new()
  ecdfPlot(x, type = "p")
  #----------
  # Repeat the above example, but force a step function.
  dev.new()
  ecdfPlot(x, type = "s")
  #----------
  # Clean up
  rm(x)
  #-------------------------------------------------------------------------------------
  # The guidance document USEPA (1994b, pp. 6.22--6.25) 
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB) 
  # concentrations (in parts per billion) from soil samples 
  # at a Reference area and a Cleanup area.  These data are stored 
  # in the data frame EPA.94b.tccb.df.  
  #
  # Create an empirical CDF plot for the reference area data.
  
  dev.new()
  with(EPA.94b.tccb.df, 
    ecdfPlot(TcCB[Area == "Reference"], xlab = "TcCB (ppb)"))
  #==========
  # Clean up
  #---------
  graphics.off()
Empirical Cumulative Distribution Function Plot Based on Type I Censored Data
Description
Produce an empirical cumulative distribution function plot for Type I left-censored or right-censored data.
Usage
  ecdfPlotCensored(x, censored, censoring.side = "left", discrete = FALSE,
    prob.method = "michael-schucany", plot.pos.con = 0.375, plot.it = TRUE,
    add = FALSE, ecdf.col = 1, ecdf.lwd = 3 * par("cex"), ecdf.lty = 1,
    include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2),
    cen.cex = par("cex"), cen.col = 4, ...,
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
    xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| censored | numeric or logical vector indicating which values of x are censored.  This must be the same length as x; TRUE (or 1) indicates a censored observation and FALSE (or 0) an uncensored observation. | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are "left" (the default) and "right". | 
| discrete | logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default). | 
| prob.method | character string indicating what method to use to compute the plotting positions (empirical probabilities).  Possible values are "kaplan-meier", "nelson", "michael-schucany" (the default), and "hirsch-stedinger".  See the help file for ppointsCensored for more information. | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.  The default value is plot.pos.con=0.375.  This argument is used only when prob.method="michael-schucany" or prob.method="hirsch-stedinger". | 
| plot.it | logical scalar indicating whether to produce a plot or add to the current plot (see add below).  The default value is plot.it=TRUE. | 
| add | logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or to create a new plot (add=FALSE; the default). | 
| ecdf.col | a numeric scalar or character string determining the color of the empirical cdf line or points.  The default value is ecdf.col=1. | 
| ecdf.lwd | a numeric scalar determining the width of the empirical cdf line.  The default value is ecdf.lwd=3*par("cex"). | 
| ecdf.lty | a numeric scalar determining the line type of the empirical cdf line.  The default value is ecdf.lty=1. | 
| include.cen | logical scalar indicating whether to include censored values in the plot.  The default value is include.cen=FALSE. | 
| cen.pch | numeric scalar or character string indicating the plotting character to use to plot censored values.  The default value is cen.pch=6 (a downward-pointing triangle) when censoring.side="left" and cen.pch=2 (an upward-pointing triangle) when censoring.side="right". | 
| cen.cex | numeric scalar that determines the size of the plotting character used to plot censored values.  The default value is the current value of the cex graphics parameter.  See the entry for cex in the help file for par for more information. | 
| cen.col | numeric scalar or character string that determines the color of the plotting character used to plot censored values.  The default value is cen.col=4. | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see par). | 
Details
The function ecdfPlotCensored does exactly the same thing as
ecdfPlot, except it calls the function ppointsCensored
to compute the plotting positions (estimated cumulative probabilities) for the
uncensored observations.
If plot.it=TRUE, the estimated cumulative probabilities for the uncensored
observations are plotted against the uncensored observations.  By default, the
function ecdfPlotCensored plots a step function when discrete=TRUE,
and plots a straight line between points when discrete=FALSE.  The user may
override these defaults by supplying the graphics parameter
type (type="s" for a step function, type="l" for linear interpolation,
type="p" for points only, etc.).
If include.cen=TRUE, censored observations are included on the plot as points.  The arguments
cen.pch, cen.cex, and cen.col control the appearance of these points.
In cases where x is a random sample, the empirical cdf will change from sample to sample and
the variability in these estimates can be dramatic for small sample sizes.  Caution must be used in
interpreting the empirical cdf when a large percentage of the observations are censored.
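To make the connection to ppointsCensored concrete, the following sketch (not part of the original help file) computes the Michael-Schucany plotting positions for a left-censored sample directly and draws the empirical cdf by hand; ecdfPlotCensored automates exactly these steps. It assumes ppointsCensored returns the Order.Statistics, Cumulative.Probabilities, and Censored components described under Value below.
  # Minimal sketch: hand-rolled version of what ecdfPlotCensored() does
  set.seed(47)
  x <- rnorm(15, mean = 10, sd = 2)
  censored <- x < 8                      # left-censor all values below 8
  x[censored] <- 8
  pp <- ppointsCensored(x, censored, censoring.side = "left",
    prob.method = "michael-schucany", plot.pos.con = 0.375)
  keep <- !pp$Censored                   # plot only the uncensored observations
  plot(pp$Order.Statistics[keep], pp$Cumulative.Probabilities[keep],
    type = "l", xlab = "Order Statistics", ylab = "Cumulative Probability")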
Value
ecdfPlotCensored returns a list with the following components:
| Order.Statistics | numeric vector of the “ordered” observations. | 
| Cumulative.Probabilities | numeric vector of the associated plotting positions. | 
| Censored | logical vector indicating which of the ordered observations are censored. | 
| Censoring.Side | character string indicating whether the data are left- or right-censored.
This is the same value as the argument  | 
| Prob.Method | character string indicating what method was used to compute the plotting positions.
This is the same value as the argument  | 
Optional Component (only present when prob.method="michael-schucany" or 
prob.method="hirsch-stedinger"):
| Plot.Pos.Con | numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument  | 
Note
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data.
Censored observations complicate the procedures used to graphically explore data.  Techniques from
survival analysis and life testing have been developed to generalize the procedures for constructing
plotting positions, empirical cdf plots, and q-q plots to data sets with censored observations
(see ppointsCensored).
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots
(see cdfPlot and cdfCompareCensored) to graphically assess whether a
sample of observations comes from a particular distribution.  More often, however, quantile-quantile
(Q-Q) plots are used instead (see qqPlot and qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
ppoints, ppointsCensored, ecdfPlot,
qqPlot,  qqPlotCensored,  cdfPlot,
cdfCompareCensored.
Examples
  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
  # censor all observations less than 18, then generate an empirical cdf plot
  # for the complete data set and the censored data set.  Note that the empirical
  # cdf plot for the censored data set starts at the first ordered uncensored
  # observation, and that for values of x > 18 the two empirical cdf plots are
  # exactly the same.  This is because there is only one censoring level and
  # no uncensored observations fall below the censored observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(333)
  x <- rnorm(20, mean=20, sd=5)
  censored <- x < 18
  sum(censored)
  #[1] 7
  new.x <- x
  new.x[censored] <- 18
  dev.new()
  ecdfPlot(x, xlim = range(pretty(x)),
    main = "Empirical CDF Plot for\nComplete Data Set")
  dev.new()
  ecdfPlotCensored(new.x, censored, xlim = range(pretty(x)),
    main="Empirical CDF Plot for\nCensored Data Set")
  # Clean up
  #---------
  rm(x, censored, new.x)
  #------------------------------------------------------------------------------------
  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in
  # EPA.09.Ex.15.1.manganese.df.  Here we will create an empirical
  # CDF plot based on the Kaplan-Meier method.
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #4       4 Well.1               21.6          21.6    FALSE
  #5       5 Well.1                 <2           2.0     TRUE
  #...
  #21      1 Well.5               17.9          17.9    FALSE
  #22      2 Well.5               22.7          22.7    FALSE
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    ecdfPlotCensored(Manganese.ppb, Censored,
      prob.method = "kaplan-meier", ecdf.col = "blue",
      main = "Empirical CDF of Manganese Data\nBased on Kaplan-Meier"))
  #==========
  # Clean up
  #---------
  graphics.off()
Estimate Parameters of an Extreme Value (Gumbel) Distribution
Description
Estimate the location and scale parameters of an extreme value distribution, and optionally construct a confidence interval for one of the parameters.
Usage
  eevd(x, method = "mle", pwme.method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), ci = FALSE, 
    ci.parameter = "location", ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are 
 | 
| pwme.method | character string specifying what method to use to compute the 
probability-weighted moments when  | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| ci | logical scalar indicating whether to compute a confidence interval for the 
location or scale parameter.  The default value is  | 
| ci.parameter | character string indicating the parameter for which the confidence interval is 
desired.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the location or scale parameter.  Currently, the only possible value is 
 | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from an extreme value distribution with 
parameters location=\eta and scale=\theta.
Estimation 
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of \eta and \theta are 
the solutions of the simultaneous equations (Forbes et al., 2011):
\hat{\eta}_{mle} = -\hat{\theta}_{mle} \, log[\frac{1}{n} \sum_{i=1}^{n} exp(\frac{-x_i}{\hat{\theta}_{mle}})]
\hat{\theta}_{mle} = \bar{x} - \frac{\sum_{i=1}^{n} x_i exp(\frac{-x_i}{\hat{\theta}_{mle}})}{\sum_{i=1}^{n} exp(\frac{-x_i}{\hat{\theta}_{mle}})}
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
denotes the sample mean.
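As an illustration (not part of the original help file), the following sketch solves these equations numerically: the equation for \hat{\theta}_{mle} involves only \hat{\theta}_{mle}, so it can be handed to uniroot, and the location estimate then follows from the first equation. The shift by min(x) is only for numerical stability and cancels out of the weighted mean.
  # Minimal sketch: numerical solution of the Gumbel mle equations
  set.seed(123)
  x <- revd(50, location = 2, scale = 1)
  scale.eqn <- function(theta, x) {
    w <- exp(-(x - min(x)) / theta)      # shifted to avoid underflow
    mean(x) - sum(x * w) / sum(w) - theta
  }
  theta.mle <- uniroot(scale.eqn, lower = 0.1 * sd(x), upper = 10 * sd(x),
    x = x)$root
  eta.mle <- min(x) - theta.mle * log(mean(exp(-(x - min(x)) / theta.mle)))
  c(location = eta.mle, scale = theta.mle)
  # Should be close to eevd(x, method = "mle")$parameters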
Method of Moments Estimation (method="mme") 
The method of moments estimators (mme's) of \eta and \theta are 
given by (Johnson et al., 1995, p.27):
\hat{\eta}_{mme} = \bar{x} - \epsilon \hat{\theta}_{mme}
\hat{\theta}_{mme} = \frac{\sqrt{6}}{\pi} s_m
where \epsilon denotes Euler's constant and 
s_m denotes the square root of the method of moments estimator of variance:
s_m^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
Method of Moments Estimators Based on the Unbiased Estimator of Variance (method="mmue") 
These estimators are the same as the method of moments estimators except that 
the method of moments estimator of variance is replaced with the unbiased estimator 
of variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2
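For illustration (not part of the original help file), both sets of method of moments estimates can be computed directly from these formulas, using -digamma(1) for Euler's constant:
  # Minimal sketch: "mme" and "mmue" estimates computed by hand
  set.seed(123)
  x <- revd(50, location = 2, scale = 1)
  eps <- -digamma(1)                         # Euler's constant, ~0.5772
  s.m <- sqrt(mean((x - mean(x))^2))         # method of moments sd
  s   <- sd(x)                               # based on the unbiased variance
  theta.mme  <- sqrt(6) * s.m / pi
  theta.mmue <- sqrt(6) * s   / pi
  c(location = mean(x) - eps * theta.mme,  scale = theta.mme)    # method = "mme"
  c(location = mean(x) - eps * theta.mmue, scale = theta.mmue)   # method = "mmue"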
Probability-Weighted Moments Estimation (method="pwme")
Greenwood et al. (1979) show that the relationship between the distribution 
parameters \eta and \theta and the probability-weighted moments 
is given by:
\eta = M(1, 0, 0) - \epsilon \theta
\theta = \frac{M(1, 0, 0) - 2M(1, 0, 1)}{log(2)}
where M(i, j, k) denotes the ijk'th probability-weighted moment and 
\epsilon denotes Euler's constant.  
The probability-weighted moment estimators (pwme's) of \eta and 
\theta are computed by simply replacing the M(i,j,k)'s in the 
above two equations with estimates of the M(i,j,k)'s (and for the 
estimate of \eta, replacing \theta with its estimated value).  
See the help file for pwMoment for more information on how to 
estimate the M(i,j,k)'s.  Also, see Landwehr et al. (1979) for an example 
of this method of estimation using the unbiased (U-statistic type) 
probability-weighted moment estimators.  Hosking et al. (1985) note that this 
method of estimation using the U-statistic type probability-weighted moments 
is equivalent to Downton's (1966) linear estimates with linear coefficients. 
Confidence Intervals 
When ci=TRUE, an approximate (1-\alpha)100\% confidence interval 
for \eta can be constructed assuming the distribution of the estimator of 
\eta is approximately normally distributed.  A two-sided confidence 
interval is constructed as:
[\hat{\eta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}, \, \hat{\eta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}]
where t(\nu, p) is the p'th quantile of 
Student's t-distribution with 
\nu degrees of freedom, and the quantity 
\hat{\sigma}_{\hat{\eta}}
denotes the estimated asymptotic standard deviation of the estimator of \eta.
Similarly, a two-sided confidence interval for \theta is constructed as:
[\hat{\theta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}, \, \hat{\theta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}]
One-sided confidence intervals for \eta and \theta are computed in 
a similar fashion.
Maximum Likelihood (method="mle") 
Downton (1966) shows that the estimated asymptotic variances of the mle's of 
\eta and \theta are given by:
\hat{\sigma}_{\hat{\eta}_{mle}}^2 = \frac{\hat{\theta}_{mle}^2}{n} [1 + \frac{6(1 - \epsilon)^2}{\pi^2}] = \frac{1.10867 \hat{\theta}_{mle}^2}{n}
\hat{\sigma}_{\hat{\theta}_{mle}}^2 = \frac{6}{\pi^2} \frac{\hat{\theta}_{mle}^2}{n} = \frac{0.60793 \hat{\theta}_{mle}^2}{n}
where \epsilon denotes Euler's constant.
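To make this concrete, the following sketch (not part of the original help file) builds the two-sided 90% normal-approximation interval for the location parameter from the asymptotic variance formula above; it assumes eevd returns the estimates in its parameters component, as shown in the Examples below.
  # Minimal sketch: normal-approximation CI for the location parameter
  set.seed(250)
  x <- revd(20, location = 2)
  est <- eevd(x, method = "mle")$parameters
  n <- length(x)
  se.eta <- sqrt(1.10867 * est["scale"]^2 / n)
  alpha <- 0.10
  unname(est["location"] + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * se.eta)
  # Should reproduce the interval from eevd(x, ci = TRUE, conf.level = 0.9)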
Method of Moments (method="mme" or method="mmue") 
Tiago de Oliveira (1963) and Johnson et al. (1995, p.27) show that the 
estimated asymptotic variance of the mme's of \eta and \theta 
are given by:
\hat{\sigma}_{\hat{\eta}_{mme}}^2 = \frac{\hat{\theta}_{mme}^2}{n} [\frac{\pi^2}{6} + \frac{\epsilon^2}{4}(\beta_2 - 1) - \frac{\pi \epsilon \sqrt{\beta_1}}{\sqrt{6}}] = \frac{1.1678 \hat{\theta}_{mme}^2}{n}
\hat{\sigma}_{\hat{\theta}_{mme}}^2 = \frac{\hat{\theta}_{mme}^2}{n} \frac{(\beta_2 - 1)}{4} = \frac{1.1 \hat{\theta}_{mme}^2}{n}
where the quantities
\sqrt{\beta_1}, \; \beta_2
denote the skew and kurtosis of the distribution, and \epsilon 
denotes Euler's constant. 
The estimated asymptotic variances of the mmue's of \eta and \theta 
are the same, except replace the mme of \theta in the above equations with 
the mmue of \theta.
Probability-Weighted Moments (method="pwme") 
As stated above, Hosking et al. (1985) note that this method of estimation using 
the U-statistic type probability-weighted moments is equivalent to 
Downton's (1966) linear estimates with linear coefficients.  Downton (1966) 
provides exact values of the variances of the estimates of location and scale 
parameters for the smallest extreme value distribution.  For the largest extreme 
value distribution, the formula for the estimate of scale is the same, but the 
formula for the estimate of location must be modified.  Thus, Downton's (1966) 
equation (3.4) is modified to:
\hat{\eta}_pwme = \frac{(n-1)log(2) + (n+1)\epsilon}{n(n-1)log(2)} v - \frac{2 \epsilon}{n(n-1)log(2)} w
where \epsilon denotes Euler's constant, and 
v and w are defined in Downton (1966, p.8).  Using 
Downton's (1966) equations (3.9)-(3.12), the exact variance of the pwme of 
\eta can be derived.  Note that when method="pwme" and 
pwme.method="plotting.position", these are only the asymptotically correct 
variances.
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
There are three families of extreme value distributions.  The one 
described here is the Type I, also called the Gumbel extreme value 
distribution or simply Gumbel distribution.  The name 
“extreme value” comes from the fact that this distribution is 
the limiting distribution (as n approaches infinity) of the 
greatest value among n independent random variables each 
having the same continuous distribution.
The Gumbel extreme value distribution is related to the 
exponential distribution as follows. 
Let Y be an exponential random variable 
with parameter rate=\lambda.  Then X = \eta - log(Y) 
has an extreme value distribution with parameters 
location=\eta and scale=1/\lambda.
The distribution described above and assumed by eevd is the 
largest extreme value distribution.  The smallest extreme value 
distribution is the limiting distribution (as n approaches infinity) 
of the smallest value among 
n independent random variables each having the same continuous distribution. 
If X has a largest extreme value distribution with parameters 
location=\eta and scale=\theta, then 
Y = -X has a smallest extreme value distribution with parameters 
location=-\eta and scale=\theta.  The smallest 
extreme value distribution is related to the Weibull distribution 
as follows.  Let Y be a Weibull random variable with 
parameters 
shape=\beta and scale=\alpha.  Then X = log(Y) 
has a smallest extreme value distribution with parameters location=log(\alpha) 
and scale=1/\beta.
The extreme value distribution has been used extensively to model the distribution of streamflow, flooding, rainfall, temperature, wind speed, and other meteorological variables, as well as material strength and life data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Castillo, E. (1988). Extreme Value Theory in Engineering. Academic Press, New York, pp.184–198.
Downton, F. (1966). Linear Estimates of Parameters in the Extreme Value Distribution. Technometrics 8(1), 3–17.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.
Tiago de Oliveira, J. (1963). Decision Results for the Parameters of the Extreme Value (Gumbel) Distribution Based on the Mean and Standard Deviation. Trabajos de Estadistica 14, 61–81.
See Also
Extreme Value Distribution, Euler's Constant.
Examples
  # Generate 20 observations from an extreme value distribution with 
  # parameters location=2 and scale=1, then estimate the parameters 
  # and construct a 90% confidence interval for the location parameter. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- revd(20, location = 2) 
  eevd(dat, ci = TRUE, conf.level = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Extreme Value
  #
  #Estimated Parameter(s):          location = 1.9684093
  #                                 scale    = 0.7481955
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         location
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 1.663809
  #                                 UCL = 2.273009
  #----------
  #Compare the values of the different types of estimators: 
  eevd(dat, method = "mle")$parameters 
  # location     scale 
  #1.9684093 0.7481955
  eevd(dat, method = "mme")$parameters 
  # location     scale 
  #1.9575980 0.8339256 
  eevd(dat, method = "mmue")$parameters 
  # location     scale 
  #1.9450932 0.8555896 
  eevd(dat, method = "pwme")$parameters 
  # location     scale 
  #1.9434922 0.8583633
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Rate Parameter of an Exponential Distribution
Description
Estimate the rate parameter of an exponential distribution, and optionally construct a confidence interval for the rate parameter.
Usage
  eexp(x, method = "mle/mme", ci = FALSE, ci.type = "two-sided", 
    ci.method = "exact", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Currently the only 
possible value is  | 
| ci | logical scalar indicating whether to compute a confidence interval for the 
location or scale parameter.  The default value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the location or scale parameter.  Currently, the only possible value is 
 | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n 
observations from an exponential distribution with 
parameter rate=\lambda.
Estimation 
The maximum likelihood estimator (mle) of \lambda is given by:
\hat{\lambda}_{mle} = \frac{1}{\bar{x}}
where
\bar{x} = \frac{1}{n}\sum^n_{i=1} x_i
(Forbes et al., 2011). That is, the mle is the reciprocal of the sample mean.
Sometimes the exponential distribution is parameterized with a scale parameter instead of a rate parameter. The scale parameter is the reciprocal of the rate parameter, and the sample mean is both the mle and the minimum variance unbiased estimator (mvue) of the scale parameter.
Confidence Interval 
When ci=TRUE, an exact (1-\alpha)100\% confidence interval for 
\lambda can be constructed based on the relationship between the 
exponential distribution, the gamma distribution, and 
the chi-square distribution.  An exponential distribution 
with parameter rate=\lambda is equivalent to a gamma distribution 
with parameters shape=1 and scale=1/\lambda.  The sum of 
n iid gamma random variables with parameters shape=1 and 
scale=1/\lambda is a gamma random variable with parameters 
shape=n and scale=1/\lambda.  Finally, a gamma 
distribution with parameters shape=n and scale=1/\lambda 
is equivalent to 1/(2\lambda) times a chi-square distribution with degrees of freedom 
df=2n. Thus, the quantity 2n\lambda\bar{x} has a chi-square 
distribution with degrees of freedom df=2n.
A two-sided (1-\alpha)100\% confidence interval for \lambda is 
therefore constructed as:
[\frac{\chi^2(2n, \alpha/2)}{2n\bar{x}}, \; \frac{\chi^2(2n, 1 - \alpha/2)}{2n\bar{x}} ]
where \chi^2(\nu,p) is the p'th quantile of a 
chi-square distribution with \nu degrees of freedom.
One-sided confidence intervals are computed in a similar fashion.
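As a check (not part of the original help file), the exact two-sided interval above can be computed by hand:
  # Minimal sketch: exact chi-square based CI for the rate parameter
  set.seed(250)
  x <- rexp(20, rate = 2)
  n <- length(x); alpha <- 0.10
  c(rate = 1 / mean(x),
    LCL = qchisq(alpha/2,     df = 2 * n) / (2 * n * mean(x)),
    UCL = qchisq(1 - alpha/2, df = 2 * n) / (2 * n * mean(x)))
  # Should match the output of eexp(x, ci = TRUE, conf.level = 0.9)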
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The exponential distribution is a special case of the gamma distribution, and takes on positive real values. A major use of the exponential distribution is in life testing where it is used to model the lifetime of a product, part, person, etc.
The exponential distribution is the only continuous distribution with a 
“lack of memory” property.  That is, if the lifetime of a part follows 
the exponential distribution, then the distribution of the time until failure 
is the same as the distribution of the time until failure given that the part 
has survived to time t.
The exponential distribution is related to the double exponential (also called Laplace) distribution, and to the extreme value distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Examples
  # Generate 20 observations from an exponential distribution with parameter 
  # rate=2, then estimate the parameter and construct a 90% confidence interval. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rexp(20, rate = 2) 
  eexp(dat, ci=TRUE, conf = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Exponential
  #
  #Estimated Parameter(s):          rate = 2.260587
  #
  #Estimation Method:               mle/mme
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         rate
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 1.498165
  #                                 UCL = 3.151173
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of Gamma Distribution
Description
Estimate the shape and scale parameters (or the mean and coefficient of variation) of a Gamma distribution.
Usage
  egamma(x, method = "mle", ci = FALSE, 
    ci.type = "two-sided", ci.method = "normal.approx", 
    normal.approx.transform = "kulkarni.powar", conf.level = 0.95)
  egammaAlt(x, method = "mle", ci = FALSE, 
    ci.type = "two-sided", ci.method = "normal.approx", 
    normal.approx.transform = "kulkarni.powar", conf.level = 0.95)
Arguments
| x | numeric vector of non-negative observations. 
Missing ( | 
| method | character string specifying the method of estimation.  The possible values are:  | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean.  
The default value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  
The possible values are 
 | 
| ci.method | character string indicating which method to use to construct the confidence interval.  
Possible values are  | 
| normal.approx.transform | character string indicating which power transformation to use when  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default 
value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = x_1, x_2, \ldots, x_n denote a random sample of 
n observations from a gamma distribution 
with parameters shape=\kappa and scale=\theta.  
The relationship between these parameters and the mean (mean=\mu) 
and coefficient of variation (cv=\tau) of this distribution is given by:
\kappa = \tau^{-2} \;\;\;\;\;\; (1)
\theta = \mu/\kappa \;\;\;\;\;\; (2)
\mu = \kappa \; \theta \;\;\;\;\;\; (3)
\tau = \kappa^{-1/2} \;\;\;\;\;\; (4)
The function egamma returns estimates of the shape and scale parameters.  
The function egammaAlt returns estimates of the mean (\mu) and 
coefficient of variation (cv) based on the estimates of the shape and 
scale parameters.
Estimation 
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of the shape and scale parameters 
\kappa and \theta are solutions of the simultaneous equations:
\frac{1}{n}\sum_{i=1}^n log(x_i) - log(\bar{x}) = \psi(\hat{\kappa}_{mle}) - log(\hat{\kappa}_{mle}) \;\;\;\;\;\; (5)
\hat{\theta}_{mle} = \bar{x} / \hat{\kappa}_{mle} \;\;\;\;\;\; (6)
where \psi denotes the digamma function, 
and \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \;\;\;\;\;\; (7)
(Forbes et al., 2011, chapter 22; Johnson et al., 1994, chapter 17).  
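As an illustration (not part of the original help file), Equation (5) can be solved for the shape parameter with uniroot, after which the scale follows from Equation (6):
  # Minimal sketch: numerical solution of the gamma mle equations
  set.seed(250)
  x <- rgamma(20, shape = 3, scale = 2)
  shape.eqn <- function(kappa, x) {
    (mean(log(x)) - log(mean(x))) - (digamma(kappa) - log(kappa))
  }
  kappa.mle <- uniroot(shape.eqn, lower = 0.01, upper = 100, x = x)$root
  theta.mle <- mean(x) / kappa.mle
  c(shape = kappa.mle, scale = theta.mle)
  # Should agree with egamma(x)$parameters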
Bias-Corrected Maximum Likelihood Estimation (method="bcmle") 
The “bias-corrected” maximum likelihood estimator of 
the shape parameter is based on the suggestion of Anderson and Ray (1975; 
see also Johnson et al., 1994, p.366 and Singh et al., 2010b, p.48), who noted that 
the bias of the maximum likelihood estimator of the shape parameter can be 
considerable when the sample size is small.  This estimator is given by:
\hat{\kappa}_{bcmle} = \frac{n-3}{n}\hat{\kappa}_{mle} + \frac{2}{3n} \;\;\;\;\;\; (8)
When method="bcmle", Equation (6) above is modified so that the estimate of the 
scale parameter is based on the “bias-corrected” maximum likelihood estimator 
of the shape parameter:
\hat{\theta}_{bcmle} = \bar{x} / \hat{\kappa}_{bcmle} \;\;\;\;\;\; (9)
Method of Moments Estimation (method="mme") 
The method of moments estimators (mme's) of the shape and scale parameters 
\kappa and \theta are:
\hat{\kappa}_{mme} = (\bar{x}/s_m)^2 \;\;\;\;\;\; (10)
\hat{\theta}_{mme} = s_m^2 / \bar{x} \;\;\;\;\;\; (11)
where s_m^2 denotes the method of moments estimator of variance:
s_m^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (12)
Method of Moments Estimation Based on the Unbiased Estimator of Variance (method="mmue") 
  
The method of moments estimators based on the unbiased estimator of variance 
are exactly the same as the method of moments estimators, 
except that the method of moments estimator of variance is replaced with the 
unbiased estimator of variance:
\hat{\kappa}_{mmue} = (\bar{x}/s)^2 \;\;\;\;\;\; (13)
\hat{\theta}_{mmue} = s^2 / \bar{x} \;\;\;\;\;\; (14)
where s^2 denotes the unbiased estimator of variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (15)
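For illustration (not part of the original help file), Equations (10)-(15) computed by hand:
  # Minimal sketch: "mme" and "mmue" estimates of shape and scale
  set.seed(250)
  x <- rgamma(20, shape = 3, scale = 2)
  s2.m <- mean((x - mean(x))^2)     # method of moments variance, Equation (12)
  s2   <- var(x)                    # unbiased variance, Equation (15)
  c(shape = mean(x)^2 / s2.m, scale = s2.m / mean(x))   # method = "mme"
  c(shape = mean(x)^2 / s2,   scale = s2   / mean(x))   # method = "mmue"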
Confidence Intervals 
This section discusses how confidence intervals for the mean \mu are computed.
Normal Approximation (ci.method="normal.approx") 
The normal approximation method is based on the method of Kulkarni and Powar (2010), 
who use a power transformation of the original data to approximate a sample 
from a normal distribution, compute the confidence interval for the mean on the 
transformed scale using the usual formula for a confidence interval for the 
mean of a normal distribution, and then transform the limits back to the original 
scale using equations based on the expected value of a gamma random variable 
raised to a power.  
The particular power used for the normal approximation is defined by the argument 
normal.approx.transform.  The value 
normal.approx.transform="cube.root" uses the cube root transformation 
suggested by Wilson and Hilferty (1931), and the value 
"fourth.root" uses the fourth root transformation suggested 
by Hawkins and Wixley (1986).  The default value "kulkarni.powar" 
uses the “Optimum Power Normal Approximation Method” of Kulkarni and Powar 
(2010), who show this method performs the best in terms of maintaining coverage 
and minimizing confidence interval width compared to eight other methods.  
The “optimum” power p is determined by:
p = -0.0705 - 0.178\hat{\kappa} + 0.475 \sqrt{\hat{\kappa}} \;\; if \; \hat{\kappa} \le 1.5
p = 0.246 \;\; if \; \hat{\kappa} > 1.5 \;\;\;\;\;\; (16)
where \hat{\kappa} denotes the estimate of the shape parameter.  
Kulkarni and Powar (2010) 
derived this equation by determining what power transformation yields a skew closest to 0 and 
a kurtosis closest to 3 for a gamma random variable with a given shape parameter.  
Although Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to 
determine the power to use to induce approximate normality, for the functions 
egamma and egammaAlt the power is based on whatever estimate of 
shape is used (e.g., method="mle", method="bcmle", etc.). 
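For illustration (not part of the original help file), the following sketch evaluates Equation (16) using the mle of the shape parameter; for the simulated data used in the Examples below the estimated shape exceeds 1.5, so the power is 0.246.
  # Minimal sketch: the Kulkarni-Powar "optimum" power of Equation (16)
  set.seed(250)
  x <- rgamma(20, shape = 3, scale = 2)
  kappa.hat <- unname(egamma(x)$parameters["shape"])
  p <- if (kappa.hat <= 1.5) {
    -0.0705 - 0.178 * kappa.hat + 0.475 * sqrt(kappa.hat)
  } else 0.246
  p
  # kappa.hat is about 2.2 here, so p = 0.246, matching the
  # "Normal Transform Power" reported by egamma() in the Examples below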
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988) 
introduced an efficient method of computation.  This method is also discussed by  
Stryhn and Christensen (2003) and Royston (2007).  
The idea behind this method is to invert the likelihood-ratio test to obtain a 
confidence interval for the mean \mu while treating the coefficient of 
variation \tau as a nuisance parameter.
The likelihood function is given by:
L(\mu, \tau | \underline{x}) = \prod_{i=1}^n \frac{x_i^{\kappa-1} e^{-x_i/\theta}}{\theta^\kappa \Gamma(\kappa)} \;\;\;\;\;\; (17)
where \kappa, \theta, \mu, and \tau are defined in 
Equations (1)-(4) above, and 
\Gamma(t) denotes the Gamma function evaluated at t. 
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates 
of the mean and coefficient of variation by (\mu^*, \tau^*).  
The likelihood ratio test statistic (G^2) of the hypothesis 
H_0: \mu = \mu_0 (where \mu_0 is a fixed value) equals the 
drop in 2 log(L) between the “full” model and the reduced model with 
\mu fixed at \mu_0, i.e.,
G^2 = 2 \{log[L(\mu^*, \tau^*)] - log[L(\mu_0, \tau_0^*)]\} \;\;\;\;\;\; (18)
where \tau_0^* is the maximum likelihood estimate of \tau for the 
reduced model (i.e., when \mu = \mu_0).  Under the null hypothesis, 
the test statistic G^2 follows a 
chi-squared distribution with 1 degree of freedom.
Alternatively, we may 
express the test statistic in terms of the profile likelihood function L_1 
for the mean \mu, which is obtained from the usual likelihood function by 
maximizing over the parameter \tau, i.e.,
L_1(\mu) = max_{\tau} L(\mu, \tau) \;\;\;\;\;\; (19)
Then we have
G^2 = 2 \{log[L_1(\mu^*)] - log[L_1(\mu_0)]\} \;\;\;\;\;\; (20)
A two-sided (1-\alpha)100\% confidence interval for the mean \mu 
consists of all values of \mu_0 for which the test is not significant at 
level alpha:
\mu_0: G^2 \le \chi^2_{1, {1-\alpha}} \;\;\;\;\;\; (21)
where \chi^2_{\nu, p} denotes the p'th quantile of the 
chi-squared distribution with \nu degrees of freedom.  
One-sided lower and one-sided upper confidence intervals are computed in a similar 
fashion, except that the quantity 1-\alpha in Equation (21) is replaced with 
1-2\alpha.
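The following sketch (not part of the original help file) shows one way to carry out this inversion numerically for a two-sided 95% interval, profiling out the coefficient of variation with optimize and locating the interval endpoints with uniroot; the bracketing intervals are ad hoc choices for this example.
  # Minimal sketch: profile-likelihood CI for the mean of a gamma distribution
  set.seed(250)
  x <- rgamma(20, shape = 3, scale = 2)
  loglik <- function(mu, tau, x) {
    kappa <- tau^(-2); theta <- mu / kappa
    sum(dgamma(x, shape = kappa, scale = theta, log = TRUE))
  }
  prof <- function(mu, x) {                # profile log-likelihood L1(mu)
    optimize(function(tau) loglik(mu, tau, x), interval = c(0.01, 10),
      maximum = TRUE)$objective
  }
  mu.hat <- mean(x)                        # mle of the mean is the sample mean
  cutoff <- prof(mu.hat, x) - qchisq(0.95, df = 1) / 2
  c(LCL = uniroot(function(mu) prof(mu, x) - cutoff,
      lower = 0.5 * mu.hat, upper = mu.hat)$root,
    UCL = uniroot(function(mu) prof(mu, x) - cutoff,
      lower = mu.hat, upper = 2 * mu.hat)$root)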
Chi-Square Approximation (ci.method="chisq.approx") 
This method is based on the relationship between the sample mean of the gamma 
distribution and the chi-squared distribution (Grice and Bain, 1980):
\frac{2n\bar{x}}{\theta} \sim \chi^2_{2n\kappa} \;\;\;\;\;\; (22)
Therefore, an exact one-sided upper (1-\alpha)100\% confidence interval 
for the mean \mu is given by:
[0, \, \frac{2n\bar{x}\kappa}{\chi^2_{2n\kappa, \alpha}}] \;\;\;\;\;\; (23)
an exact one-sided lower (1-\alpha)100\% confidence interval 
is given by:
[\frac{2n\bar{x}\kappa}{\chi^2_{2n\kappa, 1-\alpha}}, \, \infty] \;\;\;\;\;\; (24)
and a two-sided (1-\alpha)100\% confidence interval is given by:
[\frac{2n\bar{x}\kappa}{\chi^2_{2n\kappa, 1-\alpha/2}}, 
    \, \frac{2n\bar{x}\kappa}{\chi^2_{2n\kappa, \alpha/2}}] \;\;\;\;\;\; (25)
Because this method is exact only when the shape parameter \kappa is known, the 
method used here is called the “chi-square approximation” method 
because the estimate of the shape parameter, \hat{\kappa}, is used in place 
of \kappa in Equations (23)-(25) above.  The Chi-Square Approximation method 
is not the method proposed by Grice and Bain (1980) in which the 
confidence interval is adjusted to account for the fact that the shape 
parameter \kappa is estimated (see the explanation of the 
Chi-Square Adjusted method below).  The Chi-Square Approximation method used 
by egamma and egammaAlt is equivalent to the “approximate gamma” 
method of ProUCL (USEPA, 2015, equation (2-34), p.62).
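For illustration (not part of the original help file), the two-sided interval of Equation (25) with the estimated shape plugged in for \kappa:
  # Minimal sketch: "chi-square approximation" CI for the mean
  set.seed(250)
  x <- rgamma(20, shape = 3, scale = 2)
  n <- length(x); alpha <- 0.05
  kappa.hat <- unname(egamma(x)$parameters["shape"])
  c(LCL = 2 * n * mean(x) * kappa.hat / qchisq(1 - alpha/2, df = 2 * n * kappa.hat),
    UCL = 2 * n * mean(x) * kappa.hat / qchisq(alpha/2,     df = 2 * n * kappa.hat))
  # Compare with egamma(x, ci = TRUE, ci.method = "chisq.approx")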
Chi-Square Adjusted (ci.method="chisq.adj") 
This is the same method as the Chi-Square Approximation method discussed above, 
except that the value of \alpha is adjusted to account for the fact that 
the shape parameter \kappa is estimated rather than known.  Grice and Bain (1980) 
performed Monte Carlo simulations to determine how to adjust \alpha and the 
values in their Table 2 are given in the matrix Grice.Bain.80.mat.  
This method requires that the sample size n is at least 5 and the confidence level 
is between 75% and 99.5% (except when n=5, in which case the confidence level 
must be less than 99%).  For values of the sample size n and/or \alpha 
that are not listed in the table, linear interpolation is used (when the sample 
size n is greater than 40, linear interpolation on 1/n is used, as 
recommended by Grice and Bain (1980)).  The Chi-Square Adjusted method used 
by egamma and egammaAlt is equivalent to the “adjusted gamma” 
method of ProUCL (USEPA, 2015, equation (2-35), p.63).
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Warning
When ci=TRUE and ci.method="normal.approx", it is possible for the 
lower confidence limit based on the transformed data to be less than 0.  
In this case, the lower confidence limit on the original scale is set to 0 and a warning is 
issued stating that the normal approximation is not accurate in this case.
Note
The gamma distribution takes values on the positive real line. Special cases of the gamma are the exponential distribution and the chi-square distributions. Applications of the gamma include life testing, statistical ecology, queuing theory, inventory control, and precipitation processes. A gamma distribution starts to resemble a normal distribution as the shape parameter tends to infinity.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribution instead.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Anderson, C.W., and W.D. Ray. (1975). Improved Maximum Likelihood Estimators for the Gamma Distribution. Communications in Statistics, 4, 437–448.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Grice, J.V., and L.J. Bain. (1980). Inferences Concerning the Mean of the Gamma Distribution. Journal of the American Statistical Association, 75, 929-933.
Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.
See Also
GammaDist, estimate.object, eqgamma, 
predIntGamma, tolIntGamma.
Examples
  # Generate 20 observations from a gamma distribution with parameters 
  # shape=3 and scale=2, then estimate the parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rgamma(20, shape = 3, scale = 2) 
  egamma(dat, ci = TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.203862
  #                                 scale = 2.174928
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Optimum Power Normal Approximation
  #                                 of Kulkarni & Powar (2010)
  #                                 using mle of 'shape'
  #
  #Normal Transform Power:          0.246
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 3.361652
  #                                 UCL = 6.746794
  # Clean up
  rm(dat)
  #====================================================================
  # Using the reference area TcCB data in EPA.94b.tccb.df, assume a 
  # gamma distribution, estimate the parameters based on the 
  # bias-corrected mle of shape, and compute a one-sided upper 90% 
  # confidence interval for the mean.
  #----------
  # First test to see whether the data appear to follow a gamma 
  # distribution.
  with(EPA.94b.tccb.df, 
    gofTest(TcCB[Area == "Reference"], dist = "gamma", 
      est.arg.list = list(method = "bcmle"))
  )
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on 
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 4.5695247
  #                                 scale = 0.1309788
  #
  #Estimation Method:               bcmle
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Test Statistic:                  W = 0.9703827
  #
  #Test Statistic Parameter:        n = 47
  #
  #P-value:                         0.2739512
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  #----------
  # Now estimate the parameters and compute the upper confidence limit.
  with(EPA.94b.tccb.df, 
    egamma(TcCB[Area == "Reference"], method = "bcmle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9) 
  )
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 4.5695247
  #                                 scale = 0.1309788
  #
  #Estimation Method:               bcmle
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Optimum Power Normal Approximation
  #                                 of Kulkarni & Powar (2010)
  #                                 using bcmle of 'shape'
  #
  #Normal Transform Power:          0.246
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 0.0000000
  #                                 UCL = 0.6561838
  #------------------------------------------------------------------
  # Repeat the above example but use the alternative parameterization.
  with(EPA.94b.tccb.df, 
    egammaAlt(TcCB[Area == "Reference"], method = "bcmle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9) 
  )
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          mean = 0.5985106
  #                                 cv   = 0.4678046
  #
  #Estimation Method:               bcmle of 'shape'
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Optimum Power Normal Approximation
  #                                 of Kulkarni & Powar (2010)
  #                                 using bcmle of 'shape'
  #
  #Normal Transform Power:          0.246
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 0.0000000
  #                                 UCL = 0.6561838
  #------------------------------------------------------------------
  # Compare the upper confidence limit based on 
  # 1) the default method: 
  #    normal approximation method based on Kulkarni and Powar (2010)
  # 2) Profile Likelihood 
  # 3) Chi-Square Approximation
  # 4) Chi-Square Adjusted
  # Default Method 
  #---------------
  with(EPA.94b.tccb.df, 
    egamma(TcCB[Area == "Reference"], method = "bcmle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9)$interval$limits["UCL"]
  )
  #      UCL 
  #0.6561838
  # Profile Likelihood 
  #-------------------
  with(EPA.94b.tccb.df, 
    egamma(TcCB[Area == "Reference"], method = "mle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9,
      ci.method = "profile.likelihood")$interval$limits["UCL"]
  )
  #      UCL 
  #0.6527009
  # Chi-Square Approximation
  #-------------------------
  with(EPA.94b.tccb.df, 
    egamma(TcCB[Area == "Reference"], method = "mle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9, 
      ci.method = "chisq.approx")$interval$limits["UCL"]
  )
  #      UCL 
  #0.6532188
  # Chi-Square Adjusted
  #--------------------
  with(EPA.94b.tccb.df, 
    egamma(TcCB[Area == "Reference"], method = "mle", ci = TRUE, 
      ci.type = "upper", conf.level = 0.9, 
      ci.method = "chisq.adj")$interval$limits["UCL"]
  )
  #    UCL 
  #0.65467
Estimate Mean and Coefficient of Variation for a Gamma Distribution Based on Type I Censored Data
Description
Estimate the mean and coefficient of variation of a gamma distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  egammaAltCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
    ci.sample.size = sum(!censored))
Arguments
| x | numeric vector of observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| method | character string specifying the method of estimation.  Currently, the only
available method is maximum likelihood ( | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible
values are  | 
| ci | logical scalar indicating whether to compute a confidence interval for the
mean.  The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval
for the mean.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is  | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the
confidence interval for the mean when  | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction
of the confidence interval for the mean when  | 
| ci.sample.size | numeric scalar indicating what sample size to assume to construct the
confidence interval for the mean if  | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let \underline{x} denote a vector of N observations from a
gamma distribution with parameters
shape=\kappa and scale=\theta.
The relationship between these parameters and the mean \mu
and coefficient of variation \tau of this distribution is given by:
\kappa = \tau^{-2} \;\;\;\;\;\; (1)
\theta = \mu/\kappa \;\;\;\;\;\; (2)
\mu = \kappa \; \theta \;\;\;\;\;\; (3)
\tau = \kappa^{-1/2} \;\;\;\;\;\; (4)
Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (5)
For the case when k \ge 2, the data are said to be Type I
multiply censored.  For the case when k=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (6)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
Estimation 
Maximum Likelihood Estimation (method="mle") 
For Type I left censored data, the likelihood function is given by:
L(\mu, \tau | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)
where f and F denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50).  That is,
f(t) =  \frac{t^{\kappa-1} e^{-t/\theta}}{\theta^\kappa \Gamma(\kappa)} \;\;\;\;\;\; (8)
(Johnson et al., 1994, p.343), where \kappa and \theta are defined in
terms of \mu and \tau by Equations (1) and (2) above.
For left singly censored data, Equation (7) simplifies to:
L(\mu, \tau | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^n f[x_{(i)}] \;\;\;\;\;\; (9)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\mu, \tau | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (10)
and for right singly censored data this simplifies to:
L(\mu, \tau | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (11)
The maximum likelihood estimators are computed by minimizing the
negative log-likelihood function.
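As an illustration (not part of the original help file), the following sketch writes down the negative log-likelihood for left singly censored gamma data (Equation (9), dropping the constant combinatorial term), parameterized by the mean and coefficient of variation on the log scale, and minimizes it with optim. It is only a rough stand-in for the fitting done by egammaAltCensored.
  # Minimal sketch: mle's of (mean, cv) for left singly censored gamma data
  set.seed(476)
  x <- rgamma(25, shape = 3, scale = 2)
  T.cens <- 3                              # single left-censoring level
  censored <- x < T.cens
  x[censored] <- T.cens
  negloglik <- function(par, x, censored, T.cens) {
    mu <- exp(par[1]); tau <- exp(par[2])  # log scale keeps mu and tau positive
    kappa <- tau^(-2); theta <- mu / kappa
    -(sum(censored) * pgamma(T.cens, shape = kappa, scale = theta, log.p = TRUE) +
        sum(dgamma(x[!censored], shape = kappa, scale = theta, log = TRUE)))
  }
  fit <- optim(log(c(mean(x[!censored]), 0.5)), negloglik, x = x,
    censored = censored, T.cens = T.cens)
  exp(fit$par)                             # rough estimates of (mean, cv)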
Confidence Intervals 
This section explains how confidence intervals for the mean \mu are
computed.
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation.  This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean \mu while treating the coefficient of variation
\tau as a nuisance parameter.  Equation (7) above
shows the form of the likelihood function L(\mu, \tau | \underline{x}) for
multiply left-censored data, where \mu and \tau are defined by
Equations (3) and (4), and Equation (10) shows the function for
multiply right-censored data.
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates
of the mean and coefficient of variation by (\mu^*, \tau^*).  The likelihood
ratio test statistic (G^2) of the hypothesis H_0: \mu = \mu_0
(where \mu_0 is a fixed value) equals the drop in 2 log(L) between the
“full” model and the reduced model with \mu fixed at \mu_0, i.e.,
G^2 = 2 \{log[L(\mu^*, \tau^*)] - log[L(\mu_0, \tau_0^*)]\} \;\;\;\;\;\; (12)
where \tau_0^* is the maximum likelihood estimate of \tau for the
reduced model (i.e., when \mu = \mu_0).  Under the null hypothesis,
the test statistic G^2 follows a
chi-squared distribution with 1 degree of freedom.
Alternatively, we may
express the test statistic in terms of the profile likelihood function L_1
for the mean \mu, which is obtained from the usual likelihood function by
maximizing over the parameter \tau, i.e.,
L_1(\mu) = max_{\tau} L(\mu, \tau) \;\;\;\;\;\; (13)
Then we have
G^2 = 2 \{log[L_1(\mu^*)] - log[L_1(\mu_0)]\} \;\;\;\;\;\; (14)
A two-sided (1-\alpha)100\% confidence interval for the mean \mu
consists of all values of \mu_0 for which the test is not significant at
level alpha:
\mu_0: G^2 \le \chi^2_{1, {1-\alpha}} \;\;\;\;\;\; (15)
where \chi^2_{\nu, p} denotes the p'th quantile of the
chi-squared distribution with \nu degrees of freedom.
One-sided lower and one-sided upper confidence intervals are computed in a similar
fashion, except that the quantity 1-\alpha in Equation (15) is replaced with
1-2\alpha.
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\mu based on the assumption that the estimator of \mu is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \mu is constructed as:
[\hat{\mu} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}] \;\;\;\; (16)
where \hat{\mu} denotes the estimate of \mu,
\hat{\sigma}_{\hat{\mu}} denotes the estimated asymptotic standard
deviation of the estimator of \mu, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of m and by
default is equal to the number of uncensored observations.
This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
The standard deviation of the mle of \mu is
estimated based on the inverse of the Fisher Information matrix.
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence
interval for the population mean \mu, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement. Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample. Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate \mu based on the bootstrap sample created in Step 1, using the same method that was used to estimate \mu using the original observations in \underline{x}. Because the bootstrap sample usually does not match the original sample, the estimate of \mu based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
- Repeat Steps 1 and 2 B times, where B is some large number. For the function egammaAltCensored, the number of bootstraps B is determined by the argument n.bootstraps (see the section ARGUMENTS above). The default value of n.bootstraps is 1000.
- Use the B estimated values of \mu to compute the empirical cumulative distribution function of this estimator of \mu (see ecdfPlot), and then create a confidence interval for \mu based on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (17)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus
\hat{G}^{-1}(p) denotes the p'th empirical quantile, that is,
the p'th quantile associated with the empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (18)
and a one-sided upper confidence interval is computed as:
[0, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (19)
The function egammaAltCensored calls the R function quantile
to compute the empirical quantiles used in Equations (17)-(19).
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \mu can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \mu may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (20)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (21)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (22)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\mu})] \;\;\;\;\;\; (23)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (24)
where the quantity \hat{\mu}_{(i)} denotes the estimate of \mu using
all the values in \underline{x} except the i'th one, and
\hat{\mu}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\mu}_{(i)} \;\;\;\;\;\; (25)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (26)
and a one-sided upper confidence interval is given by:
[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (27)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (21) and (22).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \mu with
respect to the true value of \mu (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \mu
does not depend on the value of \mu, hence the acceleration constant is not
really necessary.
When ci.method="bootstrap", the function egammaAltCensored computes both
the percentile method and bias-corrected and accelerated method bootstrap confidence
intervals.
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation. Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators for parameters of a normal or lognormal distribution based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation (or coefficient of variation), rather than rely on a single point-estimate of the mean. Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and coefficient of variation of a gamma distribution when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
egammaCensored, GammaDist, egamma,
estimateCensored.object.
Examples
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using
  # manganese concentrations (ppb) in groundwater at five background wells.
  # In EnvStats these data are stored in the data frame
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and coefficient of variation
  # ON THE ORIGINAL SCALE using the MLE and
  # assuming a gamma distribution.
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean and coefficient of variation
  # using the MLE, and compute a confidence interval
  # for the mean using the profile-likelihood method.
  #---------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    egammaAltCensored(Manganese.ppb, Censored, ci = TRUE))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          mean = 19.664797
  #                                 cv   =  1.252936
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 12.25151
  #                                 UCL = 34.35332
  #----------
  # Compare the confidence interval for the mean
  # based on assuming a lognormal distribution versus
  # assuming a gamma distribution.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored,
      ci = TRUE))$interval$limits
  #     LCL      UCL
  #12.37629 69.87694
  with(EPA.09.Ex.15.1.manganese.df,
    egammaAltCensored(Manganese.ppb, Censored,
      ci = TRUE))$interval$limits
  #     LCL      UCL
  #12.25151 34.35332
Estimate Shape and Scale Parameters for a Gamma Distribution Based on Type I Censored Data
Description
Estimate the shape and scale parameters of a gamma distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  egammaCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
    ci.sample.size = sum(!censored))
Arguments
| x | numeric vector of observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| method | character string specifying the method of estimation.  Currently, the only
available method is maximum likelihood ( | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible
values are  | 
| ci | logical scalar indicating whether to compute a confidence interval for the
mean.  The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval
for the mean.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is  | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the
confidence interval for the mean when  | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction
of the confidence interval for the mean when  | 
| ci.sample.size | numeric scalar indicating what sample size to assume to construct the
confidence interval for the mean if  | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let \underline{x} denote a vector of N observations from a
gamma distribution with parameters
shape=\kappa and scale=\theta.
The relationship between these parameters and the mean \mu
and coefficient of variation \tau of this distribution is given by:
\kappa = \tau^{-2} \;\;\;\;\;\; (1)
\theta = \mu/\kappa \;\;\;\;\;\; (2)
\mu = \kappa \; \theta \;\;\;\;\;\; (3)
\tau = \kappa^{-1/2} \;\;\;\;\;\; (4)
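As a quick illustration of Equations (1)-(4), the following sketch converts an assumed (shape, scale) pair to (mean, cv) and back:
  shape <- 2; scale <- 5            # assumed example values
  mu    <- shape * scale            # Equation (3): mean
  tau   <- 1 / sqrt(shape)          # Equation (4): coefficient of variation
  kappa <- 1 / tau^2                # Equation (1): recovers shape = 2
  theta <- mu / kappa               # Equation (2): recovers scale = 5
  c(kappa = kappa, theta = theta)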
Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (5)
For the case when k \ge 2, the data are said to be Type I
multiply censored.  For the case when k=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (6)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
Estimation 
Maximum Likelihood Estimation (method="mle") 
For Type I left censored data, the likelihood function is given by:
L(\kappa, \theta | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)
where f and F denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50).  That is,
f(t) =  \frac{t^{\kappa-1} e^{-t/\theta}}{\theta^\kappa \Gamma(\kappa)} \;\;\;\;\;\; (8)
(Johnson et al., 1994, p.343). For left singly censored data, Equation (7) simplifies to:
L(\kappa, \theta | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^N f[x_{(i)}] \;\;\;\;\;\; (9)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\kappa, \theta | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (10)
and for right singly censored data this simplifies to:
L(\kappa, \theta | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (11)
The maximum likelihood estimators are computed by minimizing the
negative log-likelihood function.
Confidence Intervals 
This section explains how confidence intervals for the mean \mu are
computed.
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation.  This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean \mu while treating the coefficient of variation
\tau as a nuisance parameter.  Equation (7) above
shows the form of the likelihood function L(\mu, \tau | \underline{x}) for
multiply left-censored data and Equation (10) shows the function for
multiply right-censored data, where \mu and \tau are defined by
Equations (3) and (4).
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates
of the mean and coefficient of variation by (\mu^*, \tau^*).  The likelihood
ratio test statistic (G^2) of the hypothesis H_0: \mu = \mu_0
(where \mu_0 is a fixed value) equals the drop in 2 log(L) between the
“full” model and the reduced model with \mu fixed at \mu_0, i.e.,
G^2 = 2 \{log[L(\mu^*, \tau^*)] - log[L(\mu_0, \tau_0^*)]\} \;\;\;\;\;\; (12)
where \tau_0^* is the maximum likelihood estimate of \tau for the
reduced model (i.e., when \mu = \mu_0).  Under the null hypothesis,
the test statistic G^2 follows a
chi-squared distribution with 1 degree of freedom.
Alternatively, we may
express the test statistic in terms of the profile likelihood function L_1
for the mean \mu, which is obtained from the usual likelihood function by
maximizing over the parameter \tau, i.e.,
L_1(\mu) = max_{\tau} L(\mu, \tau) \;\;\;\;\;\; (13)
Then we have
G^2 = 2 \{log[L_1(\mu^*)] - log[L_1(\mu_0)]\} \;\;\;\;\;\; (14)
A two-sided (1-\alpha)100\% confidence interval for the mean \mu
consists of all values of \mu_0 for which the test is not significant at
level \alpha:
\mu_0: G^2 \le \chi^2_{1, 1-\alpha} \;\;\;\;\;\; (15)
where \chi^2_{\nu, p} denotes the p'th quantile of the
chi-squared distribution with \nu degrees of freedom.
One-sided lower and one-sided upper confidence intervals are computed in a similar
fashion, except that the quantity 1-\alpha in Equation (15) is replaced with
1-2\alpha.
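The sketch below illustrates the idea behind Equations (12)-(15) for left-censored gamma data, profiling the log-likelihood over a coarse grid of candidate means. The data vectors, grid, and search interval for the nuisance parameter are arbitrary assumptions for illustration; this is not the algorithm used internally by egammaCensored.
  # Minimal sketch of a profile-likelihood interval for the mean of a gamma
  # distribution fit to left-censored data (censored values stored at their
  # censoring levels).
  x    <- c(5, 12.1, 16.9, 21.6, 2, 7.7, 53.6, 9.5, 45.9)
  cens <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

  # Log-likelihood in the (mean, cv) parameterization (Equations (1)-(4))
  loglik <- function(mu, tau, x, cens) {
    shape <- 1 / tau^2; scale <- mu / shape
    sum(dgamma(x[!cens], shape, scale = scale, log = TRUE)) +
      sum(pgamma(x[cens], shape, scale = scale, log.p = TRUE))
  }
  # Profile log-likelihood L1(mu): maximize over the nuisance parameter tau (Equation (13))
  profile.ll <- function(mu)
    optimize(function(tau) loglik(mu, tau, x, cens),
             interval = c(0.05, 10), maximum = TRUE)$objective

  mu.grid <- seq(5, 60, by = 0.5)
  ll.vals <- sapply(mu.grid, profile.ll)
  # Keep the values of mu0 whose likelihood-ratio statistic does not exceed the
  # chi-squared critical value (Equations (14) and (15))
  keep <- 2 * (max(ll.vals) - ll.vals) <= qchisq(0.95, df = 1)
  range(mu.grid[keep])   # approximate two-sided 95% confidence limits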
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\mu based on the assumption that the estimator of \mu is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \mu is constructed as:
[\hat{\mu} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}] \;\;\;\; (16)
where \hat{\mu} denotes the estimate of \mu,
\hat{\sigma}_{\hat{\mu}} denotes the estimated asymptotic standard
deviation of the estimator of \mu, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of m and by
default is equal to the number of uncensored observations.
This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
The standard deviation of the mle of \mu is
estimated based on the inverse of the Fisher Information matrix.
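A minimal sketch of Equation (16) is given below; the values of mu.hat, se.hat, and m are hypothetical placeholders standing in for the MLE of the mean, its estimated standard deviation from the inverse Fisher information matrix, and the assumed sample size (ci.sample.size).
  mu.hat <- 19.7; se.hat <- 4.5; m <- 19    # hypothetical placeholder values
  alpha  <- 0.05
  t.crit <- qt(1 - alpha/2, df = m - 1)     # pivot.statistic = "t"
  z.crit <- qnorm(1 - alpha/2)              # pivot.statistic = "z"
  c(LCL.t = mu.hat - t.crit * se.hat, UCL.t = mu.hat + t.crit * se.hat)
  c(LCL.z = mu.hat - z.crit * se.hat, UCL.z = mu.hat + z.crit * se.hat)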
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence
interval for the population mean \mu, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement. Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample. Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate \mu based on the bootstrap sample created in Step 1, using the same method that was used to estimate \mu using the original observations in \underline{x}. Because the bootstrap sample usually does not match the original sample, the estimate of \mu based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
- Repeat Steps 1 and 2 B times, where B is some large number. For the function egammaCensored, the number of bootstraps B is determined by the argument n.bootstraps (see the section ARGUMENTS above). The default value of n.bootstraps is 1000.
- Use the B estimated values of \mu to compute the empirical cumulative distribution function of this estimator of \mu (see ecdfPlot), and then create a confidence interval for \mu based on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (17)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus
\hat{G}^{-1}(p) denotes the p'th empirical quantile, that is,
the p'th quantile associated with the empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (18)
and a one-sided upper confidence interval is computed as:
[0, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (19)
The function egammaCensored calls the R function quantile
to compute the empirical quantiles used in Equations (17)-(19).
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \mu can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \mu may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (20)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (21)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (22)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\mu})] \;\;\;\;\;\; (23)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (24)
where the quantity \hat{\mu}_{(i)} denotes the estimate of \mu using
all the values in \underline{x} except the i'th one, and
\hat{\mu}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\mu}_{(i)} \;\;\;\;\;\; (25)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (26)
and a one-sided upper confidence interval is given by:
[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (27)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (21) and (22).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \mu with
respect to the true value of \mu (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \mu
does not depend on the value of \mu, hence the acceleration constant is not
really necessary.
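The sketch below traces the BCa computations in Equations (20)-(25). The bootstrap estimates, jackknife (leave-one-out) estimates, and original estimate are filled with hypothetical placeholder values; in practice they come from applying the bootstrap and jackknife to the censored-data estimator of the mean.
  set.seed(42)
  boot.est <- rnorm(1000, mean = 20, sd = 4)   # hypothetical bootstrap estimates
  jack.est <- rnorm(25, mean = 20, sd = 1)     # hypothetical leave-one-out estimates
  mu.hat   <- 20                               # hypothetical original estimate
  alpha    <- 0.05
  z0    <- qnorm(mean(boot.est <= mu.hat))                        # Equation (23)
  d     <- mean(jack.est) - jack.est
  a.hat <- sum(d^3) / (6 * sum(d^2)^(3/2))                        # Equation (24)
  z.lo  <- qnorm(alpha/2); z.hi <- qnorm(1 - alpha/2)
  alpha1 <- pnorm(z0 + (z0 + z.lo) / (1 - a.hat * (z0 + z.lo)))   # Equation (21)
  alpha2 <- pnorm(z0 + (z0 + z.hi) / (1 - a.hat * (z0 + z.hi)))   # Equation (22)
  quantile(boot.est, probs = c(alpha1, alpha2))                   # Equation (20)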
When ci.method="bootstrap", the function egammaCensored computes both
the percentile method and bias-corrected and accelerated method
bootstrap confidence intervals.
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation. Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators for parameters of a normal or lognormal distribution based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation (or coefficient of variation), rather than rely on a single point-estimate of the mean. Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and coefficient of variation of a gamma distribution when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
egammaAltCensored, GammaDist, egamma,
estimateCensored.object.
Examples
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using
  # manganese concentrations (ppb) in groundwater at five background wells.
  # In EnvStats these data are stored in the data frame
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the shape and scale parameters using
  # the data ON THE ORIGINAL SCALE, using the MLE and
  # assuming a gamma distribution.
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the shape and scale parameters
  # using the MLE, and compute a confidence interval
  # for the mean using the profile-likelihood method.
  #---------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    egammaCensored(Manganese.ppb, Censored, ci = TRUE))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          shape =  0.6370043
  #                                 scale = 30.8707533
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 12.25151
  #                                 UCL = 34.35332
  #----------
  # Compare the confidence interval for the mean
  # based on assuming a lognormal distribution versus
  # assuming a gamma distribution.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored,
      ci = TRUE))$interval$limits
  #     LCL      UCL
  #12.37629 69.87694
  with(EPA.09.Ex.15.1.manganese.df,
    egammaCensored(Manganese.ppb, Censored,
      ci = TRUE))$interval$limits
  #     LCL      UCL
  #12.25151 34.35332
Estimate Probability Parameter of a Geometric Distribution
Description
Estimate the probability parameter of a geometric distribution.
Usage
  egeom(x, method = "mle/mme")
Arguments
| x | vector of non-negative integers indicating the number of trials that took place 
before the first “success” occurred.  (The total number of trials 
that took place is  | 
| method | character string specifying the method of estimation.  Possible values are  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n 
independent observations from a geometric distribution 
with parameter prob=p.
It can be shown (e.g., Forbes et al., 2011) that if X is defined as:
X = \sum^n_{i = 1} x_i
then X is an observation from a 
negative binomial distribution with 
parameters prob=p and size=n.
Estimation 
The maximum likelihood and method of moments estimator (mle/mme) of 
p is given by:
\hat{p}_{mle} = \frac{n}{X + n}
and the minimum variance unbiased estimator (mvue) of p is given by:
\hat{p}_{mvue} = \frac{n - 1}{X + n - 1}
(Forbes et al., 2011).  Note that the mvue of p is not defined for 
n=1.
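A minimal sketch of the two formulas, using the same data as the mvue example in the Examples section below:
  x <- c(0, 1, 2)
  n <- length(x); X <- sum(x)
  p.mle  <- n / (X + n)              # mle/mme: 0.5
  p.mvue <- (n - 1) / (X + n - 1)    # mvue (defined only for n > 1): 0.4
  c(mle = p.mle, mvue = p.mvue)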
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The geometric distribution with parameter 
prob=p is a special case of the 
negative binomial distribution with parameters 
size=1 and prob=p.
The negative binomial distribution has its roots in a gambling game where participants would bet on the number of tosses of a coin necessary to achieve a fixed number of heads. The negative binomial distribution has been applied in a wide variety of fields, including accident statistics, birth-and-death processes, and modeling spatial distributions of biological organisms.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 5.
See Also
Geometric, enbinom, NegBinomial.
Examples
  # Generate an observation from a geometric distribution with parameter 
  # prob=0.2, then estimate the parameter prob. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rgeom(1, prob = 0.2) 
  dat 
  #[1] 4 
  egeom(dat)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Geometric
  #
  #Estimated Parameter(s):          prob = 0.2
  #
  #Estimation Method:               mle/mme
  #
  #Data:                            dat
  #
  #Sample Size:                     1
  #----------
  # Generate 3 observations from a geometric distribution with parameter 
  # prob=0.2, then estimate the parameter prob with the mvue. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(200) 
  dat <- rgeom(3, prob = 0.2) 
  dat 
  #[1] 0 1 2 
  egeom(dat, method = "mvue") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Geometric
  #
  #Estimated Parameter(s):          prob = 0.4
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     3
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of a Generalized Extreme Value Distribution
Description
Estimate the location, scale and shape parameters of a generalized extreme value distribution, and optionally construct a confidence interval for one of the parameters.
Usage
  egevd(x, method = "mle", pwme.method = "unbiased", tsoe.method = "med", 
    plot.pos.cons = c(a = 0.35, b = 0), ci = FALSE, ci.parameter = "location", 
    ci.type = "two-sided", ci.method = "normal.approx", information = "observed", 
    conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are 
 | 
| pwme.method | character string specifying what method to use to compute the 
probability-weighted moments when  | 
| tsoe.method | character string specifying the robust function to apply in the second stage of 
the two-stage order-statistics estimator when  | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| ci | logical scalar indicating whether to compute a confidence interval for the 
location, scale, or shape parameter.  The default value is  | 
| ci.parameter | character string indicating the parameter for which the confidence interval is 
desired.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the location or scale parameter.  Currently, the only possible value is 
 | 
| information | character string indicating which kind of Fisher information to use when 
computing the variance-covariance matrix of the maximum likelihood estimators.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a generalized extreme value distribution with 
parameters location=\eta, scale=\theta, and 
shape=\kappa.
Estimation
Maximum Likelihood Estimation (method="mle") 
 
The log likelihood function is given by:
L(\eta, \theta, \kappa) = -n \, log(\theta) - (1 - \kappa) \sum^n_{i=1} y_i - \sum^n_{i=1} e^{-y_i}
where
y_i = -\frac{1}{\kappa} log[1 - \frac{\kappa(x_i - \eta)}{\theta}]
(see, for example, Jenkinson, 1969; Prescott and Walden, 1980; Prescott and Walden, 
1983; Hosking, 1985; MacLeod, 1989).  The maximum likelihood estimators (MLE's) of 
\eta, \theta, and \kappa are those values that maximize the 
likelihood function, subject to the following constraints:
\theta > 0
\kappa \le 1
x_i < \eta + \frac{\theta}{\kappa} \; if \kappa > 0
x_i > \eta + \frac{\theta}{\kappa} \; if \kappa < 0
Although in theory the value of \kappa may lie anywhere in the interval 
(-\infty, \infty) (see GEVD), the constraint \kappa \le 1 is 
imposed because when \kappa > 1 the likelihood can be made infinite and 
thus the MLE does not exist (Castillo and Hadi, 1994).  Hence, this method of 
estimation is not valid when the true value of \kappa is larger than 1.  
Hosking (1985) and Hosking et al. (1985) note that in practice the value of 
\kappa tends to lie in the interval -1/2 < \kappa < 1/2.
The value of -L is minimized using the R function nlminb.  
Prescott and Walden (1983) give formulas for the gradient and Hessian.  Only 
the gradient is supplied in the call to nlminb.  The values of 
the PWME (see below) are used as the starting values.  If the starting value of 
\kappa is less than 0.001 in absolute value, it is reset to 
sign(k) * 0.001, as suggested by Hosking (1985).
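The sketch below shows one way to carry out this minimization with nlminb, using crude moment-based starting values instead of the PWME and no analytic gradient; it reuses the simulated data from the Examples section below, so the result should be close to the MLE reported there.
  # Minimal sketch: minimize the negative of the log likelihood shown above,
  # subject to theta > 0 and kappa <= 1.
  set.seed(498)
  dat <- rgevd(20, location = 2, scale = 1, shape = 0.2)

  negll <- function(par, x) {
    eta <- par[1]; theta <- par[2]; kappa <- par[3]
    if (abs(kappa) < 1e-8) kappa <- 1e-8           # guard the kappa -> 0 (Gumbel) limit
    z <- 1 - kappa * (x - eta) / theta
    if (theta <= 0 || any(z <= 0)) return(1e10)    # enforce the support constraints
    y <- -log(z) / kappa
    -(-length(x) * log(theta) - (1 - kappa) * sum(y) - sum(exp(-y)))
  }
  start <- c(eta = mean(dat) - 0.45 * sd(dat), theta = 0.8 * sd(dat), kappa = 0.1)
  nlminb(start, negll, x = dat,
         lower = c(-Inf, 1e-8, -Inf), upper = c(Inf, Inf, 1))$par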
Probability-Weighted Moments Estimation (method="pwme")
The idea of probability-weighted moments was introduced by Greenwood et al. (1979).  
Landwehr et al. (1979) derived probability-weighted moment estimators (PWME's) for 
the parameters of the Type I (Gumbel) extreme value distribution.  
Hosking et al. (1985) extended these results to the generalized extreme value 
distribution.  See the abstract for Hosking et al. (1985) 
for details on how these estimators are computed.
Two-Stage Order Statistics Estimation (method="tsoe")
The two-stage order statistics estimator (TSOE) was introduced by 
Castillo and Hadi (1994) as an alternative to the MLE and PWME.  Unlike the 
MLE and PWME, the TSOE of \kappa exists for all combinations of sample 
values and possible values of \kappa.  See the 
abstract for Castillo and Hadi (1994) for details 
on how these estimators are computed.  In the second stage, 
Castillo and Hadi (1994) suggest using either the median or the least median of 
squares as the robust function.  The function egevd allows three options 
for the robust function:  median (tsoe.method="med"; see the R help file for 
median), least median of squares (tsoe.method="lms"; 
see the help file for lmsreg in the package MASS), 
and least trimmed squares (tsoe.method="lts"; see the help file for 
ltsreg in the package MASS).
Confidence Intervals 
When ci=TRUE, an approximate (1-\alpha)100\% confidence interval 
for \eta can be constructed assuming the estimator of 
\eta is approximately normally distributed.  A two-sided confidence 
interval is constructed as:
[\hat{\eta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}, \, \hat{\eta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}]
where t(\nu, p) is the p'th quantile of Student's t-distribution with 
\nu degrees of freedom, and the quantity 
\hat{\sigma}_{\hat{\eta}}
denotes the estimated asymptotic standard deviation of the estimator of \eta.
Similarly, a two-sided confidence interval for \theta is constructed as:
[\hat{\theta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}, \, \hat{\theta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}]
and a two-sided confidence interval for \kappa is constructed as:
[\hat{\kappa} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\kappa}}, \, \hat{\kappa} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\kappa}}]
One-sided confidence intervals for \eta, \theta, and \kappa are 
computed in a similar fashion.
Maximum Likelihood Estimator (method="mle") 
Prescott and Walden (1980) derive the elements of the Fisher information matrix 
(the expected information).  The inverse of this matrix, evaluated at the values 
of the MLE, is the estimated asymptotic variance-covariance matrix of the MLE.  
This method is used to estimate the standard deviations of the estimated 
distribution parameters when information="expected".  The necessary 
regularity conditions hold for \kappa < 1/2.  Thus, this method of 
constructing confidence intervals is not valid when the true value of 
\kappa is greater than or equal to 1/2.
Prescott and Walden (1983) derive expressions for the observed information matrix 
(i.e., the Hessian).  This matrix is used to compute the estimated asymptotic 
variance-covariance matrix of the MLE when information="observed".
In computer simulations, Prescott and Walden (1983) found that the 
variance-covariance matrix based on the observed information gave slightly more 
accurate estimates of the variance of MLE of \kappa compared to the 
estimated variance based on the expected information.
Probability-Weighted Moments Estimator (method="pwme") 
Hosking et al. (1985) show that these estimators are asymptotically multivariate 
normal and derive the asymptotic variance-covariance matrix.  See the 
abstract for Hosking et al. (1985) for details on how 
this matrix is computed.
Two-Stage Order Statistics Estimator (method="tsoe") 
Currently there is no built-in method in EnvStats for computing confidence 
intervals when 
method="tsoe".  Castillo and Hadi (1994) suggest 
using the bootstrap or jackknife method.
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
Two-parameter extreme value distributions (EVD) have been applied extensively since the 1930's to several fields of study, including the distributions of hydrological and meteorological variables, human lifetimes, and strength of materials. The three-parameter generalized extreme value distribution (GEVD) was introduced by Jenkinson (1955) to model annual maximum and minimum values of meteorological events. Since then, it has been used extensively in the hydrological and meteorological fields.
The three families of EVDs are all special kinds of GEVDs.  When the shape 
parameter \kappa=0, the GEVD reduces to the Type I extreme value (Gumbel) 
distribution.  (The function zTestGevdShape allows you to test 
the null hypothesis H_0: \kappa=0.)  When \kappa > 0, the GEVD is 
the same as the Type II extreme value distribution, and when \kappa < 0 
it is the same as the Type III extreme value distribution.
Hosking et al. (1985) compare the asymptotic and small-sample statistical 
properties of the PWME with the MLE and Jenkinson's (1969) method of sextiles.  
Castillo and Hadi (1994) compare the small-sample statistical properties of the 
MLE, PWME, and TSOE.  Hosking and Wallis (1995) compare the small-sample properties 
of unbiased L-moment estimators vs. plotting-position L-moment 
estimators.  (PWMEs can be written as linear combinations of L-moments and 
thus have equivalent statistical properties.)  Hosking and Wallis (1995) conclude 
that unbiased estimators should be used for almost all applications.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Castillo, E., and A. Hadi. (1994). Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution. Environmetrics 5, 417–432.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hosking, J.R.M. (1984). Testing Whether the Shape Parameter is Zero in the Generalized Extreme-Value Distribution. Biometrika 71(2), 367–374.
Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Office, Geneva.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.
Macleod, A.J. (1989). Remark AS R76: A Remark on Algorithm AS 215: Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 38(1), 198–199.
Prescott, P., and A.T. Walden. (1980). Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Biometrika 67(3), 723–724.
Prescott, P., and A.T. Walden. (1983). Maximum Likelihood Estimation of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computing and Simulation 16, 241–250.
See Also
Generalized Extreme Value Distribution, 
zTestGevdShape, Extreme Value Distribution, 
eevd.
Examples
  # Generate 20 observations from a generalized extreme value distribution 
  # with parameters location=2, scale=1, and shape=0.2, then compute the 
  # MLE and construct a 90% confidence interval for the location parameter. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(498) 
  dat <- rgevd(20, location = 2, scale = 1, shape = 0.2) 
  egevd(dat, ci = TRUE, conf.level = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Generalized Extreme Value
  #
  #Estimated Parameter(s):          location = 1.6144631
  #                                 scale    = 0.9867007
  #                                 shape    = 0.2632493
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         location
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution) based on
  #                                 observed information
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 1.225249
  #                                 UCL = 2.003677
  #----------
  # Compare the values of the different types of estimators:
  egevd(dat, method = "mle")$parameters 
  # location     scale     shape 
  #1.6144631 0.9867007 0.2632493 
  egevd(dat, method = "pwme")$parameters
  # location     scale     shape 
  #1.5785779 1.0187880 0.2257948 
  egevd(dat, method = "pwme", pwme.method = "plotting.position")$parameters 
  # location     scale     shape 
  #1.5509183 0.9804992 0.1657040
  egevd(dat, method = "tsoe")$parameters 
  # location     scale     shape 
  #1.5372694 1.0876041 0.2927272 
  egevd(dat, method = "tsoe", tsoe.method = "lms")$parameters 
  #location    scale    shape 
  #1.519469 1.081149 0.284863
  egevd(dat, method = "tsoe", tsoe.method = "lts")$parameters 
  # location     scale     shape 
  #1.4840198 1.0679549 0.2691914 
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameter of a Hypergeometric Distribution
Description
Estimate m, the number of white balls in the urn, or 
m+n, the total number of balls in the urn, for a 
hypergeometric distribution.
Usage
  ehyper(x, m = NULL, total = NULL, k, method = "mle")
Arguments
| x | non-negative integer indicating the number of white balls out of a sample of 
size  | 
| m | non-negative integer indicating the number of white balls in the urn.  
You must supply  | 
| total | positive integer indicating the total number of balls in the urn (i.e., 
 | 
| k | positive integer indicating the number of balls drawn without replacement from the 
urn.  Missing values ( | 
| method | character string specifying the method of estimation.  Possible values are 
 | 
Details
Missing (NA), undefined (NaN), and infinite (Inf, -Inf) 
values are not allowed.
Let x be an observation from a 
hypergeometric distribution with 
parameters m=M, n=N, and k=K.  
In R nomenclature, x represents the number of white balls drawn out of a 
sample of K balls drawn without replacement from an urn containing 
M white balls and N black balls.  The total number of balls in the 
urn is thus M+N.  Denote the total number of balls by T = M+N.
Estimation 
Estimating M, Given T and K are known 
When T and K are known, the maximum likelihood estimator (mle) of 
M is given by (Forbes et al., 2011):
\hat{M}_{mle} = floor[(T + 1) x / K] \;\;\;\; (1)
where floor() represents the floor function.  
That is, floor(y) is the largest integer less than or equal to y. 
If the quantity (T + 1) x / K is itself an integer, then the mle of 
M is also given by (Johnson et al., 1992, p.263):
\hat{M}_{mle} = [(T + 1) x / K] - 1 \;\;\;\; (2)
which is what the function ehyper uses for this case.
The minimum variance unbiased estimator (mvue) of M is given by 
(Forbes et al., 2011):
\hat{M}_{mvue} = (T x / K) \;\;\;\; (3)
Estimating T, given M and K are known 
When M and K are known, the maximum likelihood estimator (mle) of 
T is given by (Forbes et al., 2011):
\hat{T}_{mle} = floor(K M / x) \;\;\;\; (4)
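A minimal sketch of Equations (1), (3), and (4), using the same numbers as the examples below (x = 1 white ball in a sample of K = 5 balls drawn from an urn of T = 40 balls in total, or an urn known to contain M = 10 white balls):
  x <- 1; K <- 5; T.total <- 40; M <- 10
  M.mle  <- floor((T.total + 1) * x / K)   # Equation (1): 8
  M.mvue <- T.total * x / K                # Equation (3): 8
  T.mle  <- floor(K * M / x)               # Equation (4): 50
  c(M.mle = M.mle, M.mvue = M.mvue, T.mle = T.mle)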
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The hypergeometric distribution can be described by 
an urn model with M white balls and N black balls.  If K balls 
are drawn with replacement, then the number of white balls in the sample 
of size K follows a binomial distribution with 
parameters size=K and prob=M/(M+N).  If K balls are 
drawn without replacement, then the number of white balls in the sample of 
size K follows a hypergeometric distribution 
with parameters m=M, n=N, and k=K.
The name “hypergeometric” comes from the fact that the probabilities associated with this distribution can be written as successive terms in the expansion of a function of a Gaussian hypergeometric series.
The hypergeometric distribution is applied in a variety of fields, including quality control and estimation of animal population size. It is also the distribution used to compute probabilities for Fisher's exact test for a 2x2 contingency table.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.
See Also
Hypergeometric, estimate.object.
Examples
  # Generate an observation from a hypergeometric distribution with 
  # parameters m=10, n=30, and k=5, then estimate the parameter m. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter actually estimated is m; once m is estimated, 
  # n is computed by subtracting the estimated value of m (8 in this example) 
  # from the given value of m+n (40 in this example).  The parameters 
  # n and k are shown in the output in order to provide information on 
  # all of the parameters associated with the hypergeometric distribution.
  set.seed(250) 
  dat <- rhyper(nn = 1, m = 10, n = 30, k = 5) 
  dat 
  #[1] 1   
  ehyper(dat, total = 40, k = 5) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m =  8
  #                                 n = 32
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm'
  #
  #Data:                            dat
  #
  #Sample Size:                     1
  #----------
  # Use the same data as in the previous example, but estimate m+n instead. 
  # Note: The only parameter estimated is m+n. Once this is estimated, 
  # n is computed by subtracting the given value of m (10 in this case) 
  # from the estimated value of m+n (50 in this example).
  ehyper(dat, m = 10, k = 5)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m = 10
  #                                 n = 40
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm+n'
  #
  #Data:                            dat
  #
  #Sample Size:                     1
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of a Lognormal Distribution (Log-Scale)
Description
Estimate the mean and standard deviation parameters of the logarithm of a lognormal distribution, and optionally construct a confidence interval for the mean.
Usage
  elnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "exact", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are 
 | 
| ci | logical scalar indicating whether to compute a confidence interval for the 
mean.  The default value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the mean or variance.  The only possible value is  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let X denote a random variable with a 
lognormal distribution with 
parameters meanlog=\mu and sdlog=\sigma.  Then 
Y = log(X) has a normal (Gaussian) distribution with 
parameters mean=\mu and sd=\sigma.  Thus, the function 
elnorm simply calls the function enorm using the 
log-transformed values of x.
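The sketch below makes this relationship explicit for the Reference area TcCB data used in the Examples section: fit a normal distribution to the log-transformed data and compute the exact t-based confidence interval for meanlog. The results should match the elnorm output shown below.
  y <- with(EPA.94b.tccb.df, log(TcCB[Area == "Reference"]))
  n <- length(y)
  meanlog <- mean(y)       # estimate of meanlog
  sdlog   <- sd(y)         # square root of the unbiased estimate of the variance
  meanlog + qt(c(0.025, 0.975), df = n - 1) * sdlog / sqrt(n)   # exact 95% CI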
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The normal and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean or variance. This is done with confidence intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, Chapter 5.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Limpert, E., W.A. Stahel, and M. Abbt. (2001). Log-Normal Distributions Across the Sciences: Keys and Clues. BioScience 51, 341–352.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Lognormal, LognormalAlt, Normal.
Examples
  # Using the Reference area TcCB data in the data frame EPA.94b.tccb.df, 
  # estimate the mean and standard deviation of the log-transformed distribution, 
  # and construct a 95% confidence interval for the mean.
  with(EPA.94b.tccb.df, elnorm(TcCB[Area == "Reference"], ci = TRUE))  
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.6195712
  #                                 sdlog   =  0.4679530
  #
  #Estimation Method:               mvue
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.7569673
  #                                 UCL = -0.4821751
Estimate Parameters of a Three-Parameter Lognormal Distribution (Log-Scale)
Description
Estimate the mean, standard deviation, and threshold parameters for a three-parameter lognormal distribution, and optionally construct a confidence interval for the threshold or the median of the distribution.
Usage
  elnorm3(x, method = "lmle", ci = FALSE, ci.parameter = "threshold", 
    ci.method = "avar", ci.type = "two-sided", conf.level = 0.95, 
    threshold.lb.sd = 100, evNormOrdStats.method = "royston")
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are "lmle" (local maximum likelihood; the default), "mme" (method of moments), "mmue" (method of moments based on the unbiased estimate of variance), "mmme" (modified method of moments), "zero.skew" (zero-skewness), and "royston.skew" (estimators based on Royston's index of skewness).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for either the threshold or the median of the distribution.  The default value is ci=FALSE. | 
| ci.parameter | character string indicating the parameter for which the confidence interval is desired.  The possible values are "threshold" (the default) and "median". | 
| ci.method | character string indicating the method to use to construct the confidence interval.  The possible values are "avar" (normal approximation based on the asymptotic variance; the default), "likelihood.profile", and "skewness".  See the DETAILS section for more information. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
| threshold.lb.sd | a positive numeric scalar specifying the range over which to look for the local maximum likelihood (method="lmle") or zero-skewness (method="zero.skew") estimator of threshold.  The default value is threshold.lb.sd=100. | 
| evNormOrdStats.method | character string indicating which method to use in the call to evNormOrdStats when method="mmme".  The default value is evNormOrdStats.method="royston". | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let X denote a random variable from a 
three-parameter lognormal distribution with 
parameters meanlog=\mu, sdlog=\sigma, and 
threshold=\gamma.  Let \underline{x} denote a vector of 
n observations from this distribution.  Furthermore, let x_{(i)} denote 
the i'th order statistic in the sample, so that x_{(1)} denotes the 
smallest value and x_{(n)} denotes the largest value in \underline{x}.  
Finally, denote the sample mean and variance by:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (1)
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (2)
Note that the sample variance is the unbiased version. Denote the method of moments estimator of variance by:
s^2_m = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (3)
Estimation 
Local Maximum Likelihood Estimation (method="lmle") 
Hill (1963) showed that the likelihood function approaches infinity as \gamma 
approaches x_{(1)}, so that the global maximum likelihood estimators of 
(\mu, \sigma, \gamma) are (-\infty, \infty, x_{(1)}), which are 
inadmissible, since \gamma must be smaller than x_{(1)}.  Cohen (1951) 
suggested using local maximum likelihood estimators (lmle's), derived by equating 
partial derivatives of the log-likelihood function to zero.  These estimators were 
studied by Harter and Moore (1966), Calitz (1973), Cohen and Whitten (1980), and 
Griffiths (1980), and appear to possess most of the desirable properties ordinarily 
associated with maximum likelihood estimators.
Cohen (1951) showed that the lmle of \gamma is given by the solution to the 
following equation:
[\sum_{i=1}^n \frac{1}{w_i}] \, \{\sum_{i=1}^n y_i - \sum_{i=1}^n y_i^2 + \frac{1}{n}[\sum_{i=1}^n y_i]^2 \} - n \sum_{i=1}^n \frac{y_i}{w_i} = 0 \;\;\;\; (4)
where
w_i = x_i - \hat{\gamma} \;\;\;\; (5)
y_i = log(x_i - \hat{\gamma}) = log(w_i) \;\;\;\; (6)
and that the lmle's of \mu and \sigma then follow as:
\hat{\mu} = \frac{1}{n} \sum_{i=1}^n y_i = \bar{y} \;\;\;\; (7)
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 \;\;\;\; (8)
Unfortunately, while equation (4) simplifies the task of computing the lmle's, 
for certain data sets there still may be convergence problems (Calitz, 1973), and 
occasionally multiple roots of equation (4) may exist.  When multiple roots to 
equation (4) exist, Cohen and Whitten (1980) recommend using the one that results 
in closest agreement between the mle of \mu (equation (7)) and the sample 
mean (equation (1)).
On the other hand, Griffiths (1980) showed that for a given value of the threshold 
parameter \gamma, the maximized value of the log-likelihood (the 
“profile likelihood” for \gamma) is given by:
log[L(\gamma)] = \frac{-n}{2} [1 + log(2\pi) + 2\hat{\mu} + log(\hat{\sigma}^2) ] \;\;\;\; (9)
where the estimates of \mu and \sigma are defined in equations (7) 
and (8), so the lmle of \gamma reduces to an iterative search over the values 
of \gamma.  Griffiths (1980) noted that the distribution of the lmle of 
\gamma is far from normal and that log[L(\gamma)] is not quadratic 
near the lmle of \gamma.  He suggested a better parameterization based on
\eta = -log(x_{(1)} - \gamma) \;\;\;\; (10)
Thus, once the lmle of \eta is found using equations (9) and (10), the lmle of 
\gamma is given by:
\hat{\gamma} = x_{(1)} - exp(-\hat{\eta}) \;\;\;\; (11)
When method="lmle", the function elnorm3 uses the function 
nlminb to search for the minimum of -2log[L(\eta)], using the 
modified method of moments estimator (method="mmme"; see below) as the 
starting value for \gamma.  Equation (11) is then used to solve for the 
lmle of \gamma, and equation (4) is used to “fine tune” the estimated 
value of \gamma.  The lmle's of \mu and \sigma are then computed 
using equations (6)-(8).
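As a rough sketch of equations (9)-(11) (not the internal code of elnorm3, which uses nlminb plus the fine-tuning step described above), the profile log-likelihood can be written out and maximized directly.  The object names below (profile.loglik, eta.lmle, gamma.lmle) are illustrative only:
  # Profile log-likelihood of eta = -log(x_(1) - gamma), equations (9)-(11).
  profile.loglik <- function(eta, x) {
    gamma <- min(x) - exp(-eta)            # equation (11), i.e., equation (10) rearranged
    y <- log(x - gamma)                    # equation (6)
    mu.hat <- mean(y)                      # equation (7)
    sigma2.hat <- mean((y - mu.hat)^2)     # equation (8)
    (-length(x) / 2) * (1 + log(2 * pi) + 2 * mu.hat + log(sigma2.hat))   # equation (9)
  }
  # Locate the lmle for simulated data.  Keep the upper end of the search interval
  # moderate, since the likelihood diverges as gamma approaches x_(1) (Hill, 1963).
  set.seed(47)
  x <- rlnorm3(30, meanlog = 1, sdlog = 0.5, threshold = 5)
  eta.lmle <- optimize(profile.loglik, interval = c(-5, 5), x = x, maximum = TRUE)$maximum
  gamma.lmle <- min(x) - exp(-eta.lmle)    # equation (11)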
Method of Moments Estimation (method="mme") 
  
Denote the r'th sample central moment by:
m_r = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^r \;\;\;\; (12)
and note that
s^2_m = m_2 \;\;\;\; (13)
Equating the sample first moment (the sample mean) with its population value (the population mean), and equating the second and third sample central moments with their population values yields (Johnson et al., 1994, p.228):
\bar{x} = \gamma + \beta \sqrt{\omega} \;\;\;\; (14)
m_2 = s^2_m = \beta^2 \omega (\omega - 1) \;\;\;\; (15)
m_3 = \beta^3 \omega^{3/2} (\omega - 1)^2 (\omega + 2) \;\;\;\; (16)
where
\beta = exp(\mu) \;\;\;\; (17)
\omega = exp(\sigma^2) \;\;\;\; (18)
Combining equations (15) and (16) yields:
\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} = (\omega + 2) \sqrt{\omega - 1} \;\;\;\; (19)
The quantity on the left-hand side of equation (19) is the usual estimator of 
skewness.  Solving equation (19) for \omega yields:
\hat{\omega} = (d + h)^{1/3} + (d - h)^{1/3} - 1 \;\;\;\; (20)
where
d = 1 + \frac{b_1}{2} \;\;\;\; (21)
h = \sqrt{d^2 - 1} \;\;\;\; (22)
Using equation (18), the method of moments estimator of \sigma is then 
computed as:
\hat{\sigma}^2 = log(\hat{\omega}) \;\;\;\; (23)
Combining equations (15) and (17), the method of moments estimator of \mu 
is computed as:
\hat{\mu} = \frac{1}{2} log[\frac{s^2_m}{\hat{\omega}(\hat{\omega} - 1)}] \;\;\;\; (24)
Finally, using equations (14), (17), and (18), the method of moments estimator of 
\gamma is computed as:
\hat{\gamma} = \bar{x} - exp(\hat{\mu} + \frac{\hat{\sigma}^2}{2}) \;\;\;\; (25)
There are two major problems with using method of moments estimators for the 
three-parameter lognormal distribution.  First, they are subject to very large 
sampling error due to the use of second and third sample moments 
(Cohen, 1988, p.121; Johnson et al., 1994, p.228).  Second, Heyde (1963) showed 
that the lognormal distribution is not uniquely determined by its moments.
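The closed-form calculation in equations (19)-(25) can be sketched directly in R; the helper name elnorm3.mme.sketch is illustrative only, and in practice elnorm3(x, method="mme") should be used:
  # Sketch of the method-of-moments estimators in equations (19)-(25).
  elnorm3.mme.sketch <- function(x) {
    x.bar <- mean(x)
    m2 <- mean((x - x.bar)^2)                     # equation (12) with r = 2 (= equation (3))
    m3 <- mean((x - x.bar)^3)                     # equation (12) with r = 3
    b1 <- (m3 / m2^(3/2))^2                       # squared sample skewness; see equation (19)
    d <- 1 + b1 / 2                               # equation (21)
    h <- sqrt(d^2 - 1)                            # equation (22)
    omega <- (d + h)^(1/3) + (d - h)^(1/3) - 1    # equation (20); assumes positively skewed data
    sdlog <- sqrt(log(omega))                     # equation (23)
    meanlog <- 0.5 * log(m2 / (omega * (omega - 1)))    # equation (24)
    threshold <- x.bar - exp(meanlog + sdlog^2 / 2)     # equation (25)
    c(meanlog = meanlog, sdlog = sdlog, threshold = threshold)
  }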
Method of Moments Estimators Using an Unbiased Estimate of Variance (method="mmue") 
This method of estimation is exactly the same as the method of moments 
(method="mme"), except that the unbiased estimator of variance (equation (3)) 
is used in place of the method of moments one (equation (4)).  This modification is 
given in Cohen (1988, pp.119-120).
Modified Method of Moments Estimation (method="mmme") 
This method of estimation is described by Cohen (1988, pp.125-132).  It was 
introduced by Cohen and Whitten (1980; their MME-II with r=1) and was further 
investigated by Cohen et al. (1985).  It is motivated by the fact that the first 
order statistic in the sample, x_{(1)}, contains more information about 
the threshold parameter \gamma than any other observation and often more 
information than all of the other observations combined (Cohen, 1988, p.125).
The first two sets of equations are the same as for the method of moments 
estimators based on the unbiased estimate of variance (method="mmue"), i.e., equations (14) and (15) with the 
unbiased estimator of variance (equation (2)) used in place of the method of 
moments one (equation (3)).  The third equation replaces equation (16) 
by equating a function of the first order statistic with its expected value:
log(x_{(1)} - \gamma) = \mu + \sigma E[Z_{(1,n)}] \;\;\;\; (26)
where E[Z_{(i,n)}] denotes the expected value of the i'th order 
statistic in a random sample of n observations from a standard normal 
distribution.  (See the help file for evNormOrdStats for information 
on how E[Z_{(i,n)}] is computed.)  Using equations (17) and (18), 
equation (26) can be rewritten as:
x_{(1)} = \gamma + \beta exp\{\sqrt{log(\omega)} \, E[Z_{(1,n)}] \} \;\;\;\; (27)
Combining equations (14), (15), (17), (18), and (27) yields the following equation 
for the estimate of \omega:
\frac{s^2}{[\bar{x} - x_{(1)}]^2} = \frac{\hat{\omega}(\hat{\omega} - 1)}{[\sqrt{\hat{\omega}} - exp\{\sqrt{log(\hat{\omega})} \, E[Z_{(1,n)}] \} ]^2} \;\;\;\; (28)
After equation (28) is solved for \hat{\omega}, the estimate of \sigma 
is again computed using equation (23), and the estimate of \mu is computed 
using equation (24), where the unbiased estimate of variance is used in place of 
the biased one (just as for method="mmue").
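A rough sketch of the modified method of moments calculation follows; the helper name elnorm3.mmme.sketch and the root-search interval are illustrative only (in practice use elnorm3(x, method="mmme")):
  # Sketch of the modified method-of-moments estimators (equations (26)-(28)).
  # The root-search interval for omega is arbitrary and may need widening.
  elnorm3.mmme.sketch <- function(x) {
    n <- length(x)
    x.bar <- mean(x)
    s2 <- var(x)                                    # unbiased sample variance, equation (2)
    E.z1 <- evNormOrdStats(n)[1]                    # E[Z_(1,n)]; see evNormOrdStats
    lhs <- s2 / (x.bar - min(x))^2                  # left-hand side of equation (28)
    f <- function(omega) {
      omega * (omega - 1) /
        (sqrt(omega) - exp(sqrt(log(omega)) * E.z1))^2 - lhs
    }
    omega <- uniroot(f, lower = 1 + 1e-6, upper = 100)$root
    sdlog <- sqrt(log(omega))                       # equation (23)
    meanlog <- 0.5 * log(s2 / (omega * (omega - 1)))      # equation (24), unbiased variance
    threshold <- x.bar - exp(meanlog + sdlog^2 / 2)       # equation (25)
    c(meanlog = meanlog, sdlog = sdlog, threshold = threshold)
  }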
Zero-Skewness Estimation (method="zero.skew") 
This method of estimation was introduced by Griffiths (1980), and elaborated upon 
by Royston (1992b).  The idea is that if the threshold parameter \gamma were 
known, then the distribution of:
Y = log(X - \gamma) \;\;\;\; (29)
is normal, so the skew of Y is 0.  Thus, the threshold parameter \gamma 
is estimated as that value that forces the sample skew (defined in equation (19)) of 
the observations defined in equation (6) to be 0.  That is, the zero-skewness 
estimator of \gamma is the value that satisfies the following equation:
0 = \frac{\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^3}{[\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2]^{3/2}} \;\;\;\; (30)
where
y_i = log(x_i - \hat{\gamma}) \;\;\;\; (31)
Note that since the denominator in equation (30) is always positive (assuming 
there are at least two unique values in \underline{x}), only the numerator 
needs to be used to determine the value of \hat{\gamma}.
Once the value of \hat{\gamma} has been determined, \mu and \sigma 
are estimated using equations (7) and (8), except the unbiased estimator of variance 
is used in equation (8).
Royston (1992b) developed a modification of the Shapiro-Wilk goodness-of-fit test 
for normality based on transforming the data using equation (6) and the zero-skewness 
estimator of \gamma (see gofTest).
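The zero-skewness estimator can be sketched as a one-dimensional root-finding problem; the helper names and the search interval below are illustrative only (in practice use elnorm3(x, method="zero.skew")):
  # Sketch of the zero-skewness estimator (equations (29)-(31)): find the value
  # of gamma below min(x) at which the skew of log(x - gamma) is zero.
  skew.of.logs <- function(gamma, x) {
    y <- log(x - gamma)
    mean((y - mean(y))^3)      # numerator of equation (30); only its sign matters
  }
  zero.skew.gamma <- function(x) {
    eps <- 1e-8 * diff(range(x))
    # The search interval is arbitrary and may need adjusting for a given data set.
    uniroot(skew.of.logs, lower = min(x) - 100 * sd(x), upper = min(x) - eps, x = x)$root
  }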
Estimators Based on Royston's Index of Skewness (method="royston.skew") 
This method of estimation is discussed by Royston (1992b), and is similar to the 
zero-skewness method discussed above, except a different measure of skewness is used.  
Royston's (1992b) index of skewness is given by:
q = \frac{y_{(n)} - \tilde{y}}{\tilde{y} - y_{(1)}} \;\;\;\; (32)
where y_{(i)} denotes the i'th order statistic of \underline{y}, \underline{y} is 
defined in equation (31) above, and \tilde{y} denotes the median of \underline{y}.  
Royston (1992b) shows that the value of \gamma for which the median of \underline{y} 
lies exactly halfway between y_{(1)} and y_{(n)} (i.e., for which the numerator and 
denominator of q in equation (32) are equal) is given by:
\hat{\gamma} = \frac{x_{(1)} x_{(n)} - \tilde{x}^2}{x_{(1)} + x_{(n)} - 2\tilde{x}} \;\;\;\; (33)
where \tilde{x} denotes the sample median of the original observations \underline{x}.
Again, as for the zero-skewness method, once the value of \hat{\gamma} has 
been determined, \mu and \sigma are estimated using equations (7) and (8), 
except the unbiased estimator of variance is used in equation (8).
Royston (1992b) developed this estimator as a quick way to estimate \gamma.
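A sketch of this quick estimate, with an illustrative helper name (in practice use elnorm3(x, method="royston.skew")):
  # Sketch of Royston's quick threshold estimate in equation (33).
  royston.gamma <- function(x) {
    x1 <- min(x); xn <- max(x); x.med <- median(x)
    (x1 * xn - x.med^2) / (x1 + xn - 2 * x.med)
  }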
Confidence Intervals 
This section explains three different methods for constructing confidence intervals 
for the threshold parameter \gamma, or the median of the three-parameter 
lognormal distribution, which is given by:
Med[X] = \gamma + exp(\mu) = \gamma + \beta \;\;\;\; (34)
Normal Approximation Based on Asymptotic Variances and Covariances (ci.method="avar") 
Formulas for asymptotic variances and covariances for the three-parameter lognormal 
distribution, based on the information matrix, are given in Cohen (1951), Cohen and 
Whitten (1980), Cohen et al., (1985), and Cohen (1988).  The relevant quantities for 
\gamma and the median are:
Var(\hat{\gamma}) = \sigma^2_{\hat{\gamma}} = \frac{\sigma^2}{n} \, (\frac{\beta^2}{\omega}) H \;\;\;\; (35)
Var(\hat{\beta}) = \sigma^2_{\hat{\beta}} = \frac{\sigma^2}{n} \, \beta^2 (1 + H) \;\;\;\; (36)
Cov(\hat{\gamma}, \hat{\beta}) = \sigma_{\hat{\gamma}, \hat{\beta}} = \frac{-\sigma^3}{n} \, (\frac{\beta^2}{\sqrt{\omega}}) H \;\;\;\; (37)
where
H = [\omega (1 + \sigma^2) - 2\sigma^2 - 1]^{-1} \;\;\;\; (38)
A two-sided (1-\alpha)100\% confidence interval for \gamma is computed as:
\hat{\gamma} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \, \hat{\gamma} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} \;\;\;\; (39)
where t_{\nu, p} denotes the p'th quantile of 
Student's t-distribution with \nu degrees of freedom, and the 
quantity \hat{\sigma}_{\hat{\gamma}} is computed using equations (35) and (38) 
and substituting estimated values of \beta, \omega, and \sigma.  
One-sided confidence intervals are computed in a similar manner.
A two-sided (1-\alpha)100\% confidence interval for the median (see equation 
(34) above) is computed as:
\hat{\gamma} + \hat{\beta} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma} + \hat{\beta}}, \, \hat{\gamma} + \hat{\beta} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma} + \hat{\beta}} \;\;\;\; (40)
where
\hat{\sigma}^2_{\hat{\gamma} + \hat{\beta}} = \hat{\sigma}^2_{\hat{\gamma}} + \hat{\sigma}^2_{\hat{\beta}} + 2 \hat{\sigma}_{\hat{\gamma}, \hat{\beta}} \;\;\;\; (41)
is computed using equations (35)-(38) and substituting estimated values of 
\beta, \omega, and \sigma.  One-sided confidence intervals are 
computed in a similar manner.
This method of constructing confidence intervals is analogous to using the Wald test (e.g., Silvey, 1975, pp.115-118) to test hypotheses on the parameters.
Because of the regularity problems associated with the global maximum likelihood estimators, it is questionable whether the asymptotic variances and covariances shown above apply to local maximum likelihood estimators. Simulation studies, however, have shown that these estimates of variance and covariance perform reasonably well (Harter and Moore, 1966; Cohen and Whitten, 1980).
Note that this method of constructing confidence intervals can be used with 
estimators other than the lmle's.  Cohen and Whitten (1980) and Cohen et al. (1985) 
found that the asymptotic variances and covariances are reasonably close to 
corresponding simulated variances and covariances for the modified method of moments 
estimators (method="mmme").
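A sketch of the confidence interval for the threshold based on equations (35), (38), and (39), given estimates of meanlog, sdlog, and threshold; the helper name avar.ci.threshold is illustrative only:
  # Sketch of the normal-approximation confidence interval for the threshold.
  avar.ci.threshold <- function(meanlog, sdlog, n, threshold, conf.level = 0.95) {
    beta <- exp(meanlog)                                         # equation (17)
    omega <- exp(sdlog^2)                                        # equation (18)
    H <- 1 / (omega * (1 + sdlog^2) - 2 * sdlog^2 - 1)           # equation (38)
    var.gamma <- (sdlog^2 / n) * (beta^2 / omega) * H            # equation (35)
    t.crit <- qt(1 - (1 - conf.level) / 2, df = n - 2)
    threshold + c(-1, 1) * t.crit * sqrt(var.gamma)              # equation (39)
  }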
Likelihood Profile (ci.method="likelihood.profile") 
Griffiths (1980) suggested constructing confidence intervals for the threshold 
parameter \gamma based on the profile likelihood function given in equations 
(9) and (10).  Royston (1992b) further elaborated upon this procedure.  A 
two-sided (1-\alpha)100\% confidence interval for \eta is constructed as:
[\eta_{LCL}, \eta_{UCL}] \;\;\;\; (42)
by finding the two values of \eta (one larger than the lmle of \eta and 
one smaller than the lmle of \eta) that satisfy:
log[L(\eta)] = log[L(\hat{\eta}_{lmle})] - \frac{1}{2} \chi^2_{1, 1-\alpha} \;\;\;\; (43)
where \chi^2_{\nu, p} denotes the p'th quantile of the 
chi-square distribution with \nu degrees of freedom.  
Once these values are found, the two-sided confidence interval for \gamma is computed as:
[\gamma_{LCL}, \gamma_{UCL}] \;\;\;\; (44)
where
\gamma_{LCL} = x_{(1)} - exp(-\eta_{LCL}) \;\;\;\; (45)
\gamma_{UCL} = x_{(1)} - exp(-\eta_{UCL}) \;\;\;\; (46)
One-sided intervals are constructed in a similar manner.
This method of constructing confidence intervals is analogous to using the likelihood-ratio test (e.g., Silvey, 1975, pp.108-115) to test hypotheses on the parameters.
To construct a two-sided (1-\alpha)100\% confidence interval for the median 
(see equation (34)), Royston (1992b) suggested the following procedure:
- Construct a confidence interval for \gamma using the likelihood profile procedure.
- Construct a confidence interval for \beta as:
[\beta_{LCL}, \beta_{UCL}] = [exp(\hat{\mu} - t_{n-2, 1-\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}}), \, exp(\hat{\mu} + t_{n-2, 1-\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}})] \;\;\;\; (47)
- Construct the confidence interval for the median as:
[\gamma_{LCL} + \beta_{LCL}, \gamma_{UCL} + \beta_{UCL}] \;\;\;\; (48)
Royston (1992b) actually suggested using the quantile from the standard normal 
distribution instead of Student's t-distribution in step 2 above.  The function 
elnorm3, however, uses the Student's t quantile.
Note that this method of constructing confidence intervals can be used with 
estimators other than the lmle's.
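A rough sketch of the likelihood-profile limits in equations (42)-(46) follows.  The helper names and root-search brackets are illustrative only and may need adjusting for a given data set; in practice use elnorm3(x, ci = TRUE, ci.method = "likelihood.profile"):
  prof.ll <- function(eta, x) {            # profile log-likelihood, equations (9)-(11)
    y <- log(x - (min(x) - exp(-eta)))
    (-length(x) / 2) * (1 + log(2 * pi) + 2 * mean(y) + log(mean((y - mean(y))^2)))
  }
  profile.ci.threshold <- function(x, conf.level = 0.95, width = 5) {
    opt <- optimize(prof.ll, interval = c(-width, width), x = x, maximum = TRUE)
    cutoff <- opt$objective - qchisq(conf.level, df = 1) / 2     # drop in log-likelihood
    f <- function(eta) prof.ll(eta, x) - cutoff
    # Search on each side of the lmle of eta; the upper bracket must stop short of
    # the region near x_(1) where the likelihood diverges (Hill, 1963).
    eta.lo <- uniroot(f, lower = opt$maximum - width, upper = opt$maximum)$root
    eta.hi <- uniroot(f, lower = opt$maximum, upper = opt$maximum + 1)$root
    c(LCL = min(x) - exp(-eta.lo), UCL = min(x) - exp(-eta.hi))  # equations (45)-(46)
  }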
Royston's Confidence Interval Based on Significant Skewness (ci.method="skewness") 
Royston (1992b) suggested constructing confidence intervals for the threshold 
parameter \gamma based on the idea behind the zero-skewness estimator 
(method="zero.skew").  A two-sided (1-\alpha)100\% confidence interval 
for \gamma is constructed by finding the two values of \gamma that yield 
a p-value of \alpha/2 for the test of zero-skewness on the observations 
\underline{y} defined in equation (6) (see gofTest).  One-sided 
confidence intervals are constructed in a similar manner.
To construct (1-\alpha)100\% confidence intervals for the median 
(see equation (34)), the exact same procedure is used as for 
ci.method="likelihood.profile", except that the confidence interval for 
\gamma is based on the zero-skewness method just described instead of the 
likelihood profile method.
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The problem of estimating the parameters of a three-parameter lognormal distribution has been extensively discussed by Aitchison and Brown (1957, Chapter 6), Calitz (1973), Cohen (1951), Cohen (1988), Cohen and Whitten (1980), Cohen et al. (1985), Griffiths (1980), Harter and Moore (1966), Hill (1963), and Royston (1992b). Stedinger (1980) and Hoshi et al. (1984) discuss fitting the three-parameter lognormal distribution to hydrologic data.
The global maximum likelihood estimates are inadmissible.  In the past, several 
researchers have found that the local maximum likelihood estimates (lmle's) 
occasionally fail because of convergence problems, but they were not using the 
likelihood profile and reparameterization of Griffiths (1980).  Cohen (1988) 
recommends the modified methods of moments estimators over lmle's because they are 
easy to compute, they are unbiased with respect to \mu and \sigma^2 (the 
mean and variance on the log-scale), their variances are minimal or near 
minimal, and they do not suffer from regularity problems.
Because the distribution of the lmle of the threshold parameter \gamma is far 
from normal for moderate sample sizes (Griffiths, 1980), it is questionable whether 
confidence intervals for \gamma or the median based on asymptotic variances 
and covariances will perform well.  Cohen and Whitten (1980) and Cohen et al. (1985), 
however, found that the asymptotic variances and covariances are reasonably close to 
corresponding simulated variances and covariances for the modified method of moments 
estimators (method="mmme").  In a simulation study (5000 monte carlo trials), 
Royston (1992b) found that the coverage of confidence intervals for \gamma 
based on the likelihood profile (ci.method="likelihood.profile") was very 
close the nominal level (94.1% for a nominal level of 95%), although not 
symmetric.  Royston (1992b) also found that the coverage of confidence intervals 
for \gamma based on the skewness method (ci.method="skewness") was also 
very close (95.4%) and symmetric.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, Chapter 5.
Calitz, F. (1973). Maximum Likelihood Estimation of the Parameters of the Three-Parameter Lognormal Distribution–a Reconsideration. Australian Journal of Statistics 15(3), 185–190.
Cohen, A.C. (1951). Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood. Journal of the American Statistical Association 46, 206–212.
Cohen, A.C. (1988). Three-Parameter Estimation. In Crow, E.L., and K. Shimizu, eds. Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 4.
Cohen, A.C., and B.J. Whitten. (1980). Estimation in the Three-Parameter Lognormal Distribution. Journal of the American Statistical Association 75, 399–404.
Cohen, A.C., B.J. Whitten, and Y. Ding. (1985). Modified Moment Estimation for the Three-Parameter Lognormal Distribution. Journal of Quality Technology 17, 92–99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 2.
Griffiths, D.A. (1980). Interval Estimation for the Three-Parameter Lognormal Distribution via the Likelihood Function. Applied Statistics 29, 58–68.
Harter, H.L., and A.H. Moore. (1966). Local-Maximum-Likelihood Estimation of the Parameters of Three-Parameter Lognormal Populations from Complete and Censored Samples. Journal of the American Statistical Association 61, 842–851.
Heyde, C.C. (1963). On a Property of the Lognormal Distribution. Journal of the Royal Statistical Society, Series B 25, 392–393.
Hill, B.M. (1963). The Three-Parameter Lognormal Distribution and Bayesian Analysis of a Point-Source Epidemic. Journal of the American Statistical Association 58, 72–84.
Hoshi, K., J.R. Stedinger, and J. Burges. (1984). Estimation of Log-Normal Quantiles: Monte Carlo Results and First-Order Approximations. Journal of Hydrology 71, 1–30.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897–912.
Stedinger, J.R. (1980). Fitting Lognormal Distributions to Hydrologic Data. Water Resources Research 16(3), 481–490.
See Also
Lognormal3, Lognormal, LognormalAlt, Normal.
Examples
  # Generate 20 observations from a 3-parameter lognormal distribution 
  # with parameters meanlog=1.5, sdlog=1, and threshold=10, then use 
  # Cohen and Whitten's (1980) modified moments estimators to estimate 
  # the parameters, and construct a confidence interval for the 
  # threshold based on the estimated asymptotic variance. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnorm3(20, meanlog = 1.5, sdlog = 1, threshold = 10) 
  elnorm3(dat, method = "mmme", ci = TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            3-Parameter Lognormal
  #
  #Estimated Parameter(s):          meanlog   = 1.5206664
  #                                 sdlog     = 0.5330974
  #                                 threshold = 9.6620403
  #
  #Estimation Method:               mmme
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         threshold
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 Based on Asymptotic Variance
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  6.985258
  #                                 UCL = 12.338823
  #----------
  # Repeat the above example using the other methods of estimation 
  # and compare. 
  round(elnorm3(dat, "lmle")$parameters, 1) 
  #meanlog     sdlog threshold 
  #    1.3       0.7      10.5 
 
  round(elnorm3(dat, "mme")$parameters, 1) 
  #meanlog     sdlog threshold 
  #    2.1       0.3       6.0 
 
  round(elnorm3(dat, "mmue")$parameters, 1) 
  #meanlog     sdlog threshold 
  #    2.2       0.3       5.8 
  
  round(elnorm3(dat, "mmme")$parameters, 1) 
  #meanlog     sdlog threshold 
  #    1.5       0.5       9.7 
  
  round(elnorm3(dat, "zero.skew")$parameters, 1) 
  #meanlog     sdlog threshold 
  #    1.3       0.6      10.3 
 
  round(elnorm3(dat, "royston")$parameters, 1)
  #meanlog     sdlog threshold 
  #    1.4       0.6      10.1 
  #----------
  # Compare methods for computing a two-sided 95% confidence interval 
  # for the threshold: 
  # modified method of moments estimator using asymptotic variance, 
  # lmle using asymptotic variance, 
  # lmle using likelihood profile, and 
  # zero-skewness estimator using the skewness method.
  elnorm3(dat, method = "mmme", ci = TRUE, 
    ci.method = "avar")$interval$limits 
  #      LCL       UCL 
  # 6.985258 12.338823 
 
  elnorm3(dat, method = "lmle", ci = TRUE, 
    ci.method = "avar")$interval$limits 
  #       LCL       UCL 
  #  9.017223 11.980107 
 
  elnorm3(dat, method = "lmle", ci = TRUE, 
    ci.method="likelihood.profile")$interval$limits 
  #      LCL       UCL 
  # 3.699989 11.266029 
 
  elnorm3(dat, method = "zero.skew", ci = TRUE, 
    ci.method = "skewness")$interval$limits 
  #      LCL       UCL 
  #-25.18851  11.18652
  #----------
  # Now construct a confidence interval for the median of the distribution 
  # based on using the modified method of moments estimator for threshold 
  # and the asymptotic variances and covariances.  Note that the true median 
  # is given by threshold + exp(meanlog) = 10 + exp(1.5) = 14.48169.
  elnorm3(dat, method = "mmme", ci = TRUE, ci.parameter = "median") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            3-Parameter Lognormal
  #
  #Estimated Parameter(s):          meanlog   = 1.5206664
  #                                 sdlog     = 0.5330974
  #                                 threshold = 9.6620403
  #
  #Estimation Method:               mmme
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 Based on Asymptotic Variance
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 11.20541
  #                                 UCL = 17.26922
  #----------
  # Compare methods for computing a two-sided 95% confidence interval 
  # for the median: 
  # modified method of moments estimator using asymptotic variance, 
  # lmle using asymptotic variance, 
  # lmle using likelihood profile, and 
  # zero-skewness estimator using the skewness method.
  elnorm3(dat, method = "mmme", ci = TRUE, ci.parameter = "median", 
    ci.method = "avar")$interval$limits 
  #     LCL      UCL 
  #11.20541 17.26922 
 
  elnorm3(dat, method = "lmle", ci = TRUE, ci.parameter = "median", 
    ci.method = "avar")$interval$limits 
  #     LCL      UCL 
  #12.28326 15.87233 
  elnorm3(dat, method = "lmle", ci = TRUE, ci.parameter = "median", 
    ci.method = "likelihood.profile")$interval$limits 
  #      LCL       UCL 
  # 6.314583 16.165525 
  elnorm3(dat, method = "zero.skew", ci = TRUE, ci.parameter = "median", 
    ci.method = "skewness")$interval$limits 
  #      LCL       UCL 
  #-22.38322  16.33569
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of a Lognormal Distribution (Original Scale)
Description
Estimate the mean and coefficient of variation of a lognormal distribution, and optionally construct a confidence interval for the mean.
Usage
  elnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "land", conf.level = 0.95, parkin.list = NULL)
Arguments
| x | numeric vector of positive observations. | 
| method | character string specifying the method of estimation.  Possible values are "mvue" (minimum variance unbiased; the default), "qmle" (quasi maximum likelihood), "mle" (maximum likelihood), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimate of variance).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  The possible values are "land" (Land's method; the default), "zou" (Zou et al.'s method), "parkin" (Parkin et al.'s method), "cox" (Cox's method), and "normal.approx" (normal approximation).  See the DETAILS section for more information. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
| parkin.list | a list containing arguments to pass to the function eqnpar when ci.method="parkin".  The default value is parkin.list=NULL. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} be a vector of n observations from a  
lognormal distribution with 
parameters mean=\theta and cv=\tau.  Let \eta denote the 
standard deviation of this distribution, so that \eta = \theta \tau.  Set 
\underline{y} = log(\underline{x}).  Then \underline{y} is a vector of observations 
from a normal distribution with parameters mean=\mu and sd=\sigma.  
See the help file for LognormalAlt for the relationship between 
\theta, \tau, \eta, \mu, and \sigma.
Estimation 
This section explains how each of the estimators of mean=\theta and 
cv=\tau are computed.  The approach is to first compute estimates of 
\theta and \eta^2 (the mean and variance of the lognormal distribution), 
say \hat{\theta} and \hat{\eta}^2, then compute the estimate of the cv 
\tau by \hat{\tau} = \hat{\eta}/\hat{\theta}.
Minimum Variance Unbiased Estimation (method="mvue") 
The minimum variance unbiased estimators (mvue's) of \theta and \eta^2 were derived 
by Finney (1941) and are discussed in Gilbert (1987, pp. 164-167) and Cohn et al. (1989).  These 
estimators are computed as:
\hat{\theta}_{mvue} = e^{\bar{y}} g_{n-1}(\frac{s^2}{2}) \;\;\;\; (1)
\hat{\eta}^2_{mvue} = e^{2 \bar{y}} \{g_{n-1}(2s^2) - g_{n-1}[\frac{(n-2)s^2}{n-1}]\} \;\;\;\; (2)
where
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\; (3)
s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \;\;\;\; (4)
g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (5)
The expected value and variance of the mvue of \theta are 
(Bradu and Mundlak, 1970; Cohn et al., 1989):
E[\hat{\theta}_{mvue}] = \theta \;\;\;\; (6)
Var[\hat{\theta}_{mvue}] = e^{2\mu} \{e^{[(2+n-1)\sigma^2]/n} g_{n-1}(\frac{\sigma^4}{4n}) - e^{\sigma^2} \} \;\;\;\; (7)
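The mvue's in equations (1)-(5) can be sketched using a truncated version of the infinite series in equation (5); the helper names g.fcn and elnormAlt.mvue.sketch are illustrative only, and in practice elnormAlt(x, method="mvue") should be used:
  # Sketch of the mvue's in equations (1)-(5), using a truncated series for g_m(z).
  g.fcn <- function(m, z, n.terms = 60) {
    i <- 0:n.terms
    denom <- cumprod(m + 2 * i)                    # m(m+2)(m+4)...(m+2i)
    sum(m^i * (m + 2 * i) / denom * (m / (m + 1))^i * z^i / factorial(i))   # equation (5)
  }
  elnormAlt.mvue.sketch <- function(x) {
    n <- length(x); y <- log(x)
    y.bar <- mean(y); s2 <- var(y)                 # equations (3) and (4)
    theta.hat <- exp(y.bar) * g.fcn(n - 1, s2 / 2)                          # equation (1)
    eta2.hat <- exp(2 * y.bar) *
      (g.fcn(n - 1, 2 * s2) - g.fcn(n - 1, (n - 2) * s2 / (n - 1)))         # equation (2)
    c(mean = theta.hat, cv = sqrt(eta2.hat) / theta.hat)
  }
Up to the truncation of the series in equation (5), this should agree with the estimates returned by elnormAlt.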
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of \theta and \eta^2 are given by:
\hat{\theta}_{mle} = exp(\bar{y} + \frac{\hat{\sigma}^2_{mle}}{2})  \;\;\;\; (8)
\hat{\eta}^2_{mle} = \hat{\theta}^2_{mle} \hat{\tau}^2_{mle} \;\;\;\; (9)
where
\hat{\tau}^2_{mle} = exp(\hat{\sigma}^2_{mle}) - 1 \;\;\;\; (10)
\hat{\sigma}^2_{mle} = \frac{n-1}{n} s^2 \;\;\;\; (11)
The expected value and variance of the mle of \theta are (after Cohn et al., 1989):
E[\hat{\theta}_{mle}] = \theta exp[\frac{-(n-1)\sigma^2}{2n}] (1 - \frac{\sigma^2}{n})^{-(n-1)/2} \;\;\;\; (12)
Var[\hat{\theta}_{mle}] = exp(2\mu + \frac{\sigma^2}{n}) \{exp(\frac{\sigma^2}{n}) [1 - \frac{2\sigma^2}{n}]^{-(n-1)/2} - [1 - \frac{\sigma^2}{n}]^{-(n-1)} \} \;\;\;\; (13)
As can be seen from equation (12), the expected value of the mle of \theta does not exist 
when \sigma^2 > n.  In general, the p'th moment of the mle of \theta does not 
exist when \sigma^2 > n/p. 
Quasi Maximum Likelihood Estimation (method="qmle") 
 
The quasi maximum likelihood estimators (qmle's; Cohn et al., 1989; Gilbert, 1987, p.167) of 
\theta and \eta^2 are the same as the mle's, except the mle of 
\sigma^2 in equations (8) and (10) is replaced with the more commonly used 
mvue of \sigma^2 shown in equation (4):
\hat{\theta}_{qmle} = exp(\bar{y} + \frac{s^2}{2})  \;\;\;\; (14)
\hat{\eta}^2_{qmle} = \hat{\theta}^2_{qmle} \hat{\tau}^2_{qmle} \;\;\;\; (15)
\hat{\tau}^2_{qmle} = exp(s^2) - 1 \;\;\;\; (16)
The expected value and variance of the qmle of \theta are (Cohn et al., 1989):
E[\hat{\theta}_{qmle}] = \theta exp[\frac{-(n-1)\sigma^2}{2n}] (1 - \frac{\sigma^2}{n-1})^{-(n-1)/2} \;\;\;\; (17)
Var[\hat{\theta}_{qmle}] = exp(2\mu + \frac{\sigma^2}{n}) \{exp(\frac{\sigma^2}{n}) [1 - \frac{2\sigma^2}{n-1}]^{-(n-1)/2} - [1 - \frac{\sigma^2}{n-1}]^{-(n-1)} \} \;\;\;\; (18)
As can be seen from equation (17), the expected value of the qmle of \theta does not exist 
when \sigma^2 > (n - 1).  In general, the p'th moment of the qmle of \theta does 
not exist when \sigma^2 > (n - 1)/p.
Note that Gilbert (1987, p. 167) incorrectly presents equation (12) rather than 
equation (17) as the expected value of the qmle of \theta.  For large values 
of n relative to \sigma^2, however, equations (12) and (17) are 
virtually identical.
Method of Moments Estimation (method="mme") 
The method of moments estimators (mme's) of \theta and \eta^2 are found by equating 
the sample mean and variance with their population values:
\hat{\theta}_{mme} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (19)
\hat{\eta}^2_{mme} = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (20)
Note that the estimator of variance in equation (20) is biased.
The expected value and variance of the mme of \theta are:
E[\hat{\theta}_{mme}] = \theta \;\;\;\; (21)
Var[\hat{\theta}_{mme}] = \frac{\eta^2}{n} = \frac{1}{n} exp(2\mu + \sigma^2) [exp(\sigma^2)-1] \;\;\;\; (22)
Method of Moments Estimation Based on the Unbiased Estimate of Variance (method="mmue") 
These estimators are exactly the same as the method of moments estimators described above, except 
that the usual unbiased estimate of variance is used:
\hat{\theta}_{mmue} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (23)
\hat{\eta}^2_{mmue} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (24)
Since the mmue of \theta is the same as the mme of \theta, its mean and variance are also given by equations (21) and (22).
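The estimators of the mean described above are simple plug-in formulas and can be sketched directly; the data set and object names below are illustrative only:
  # Sketch of the plug-in estimators of the mean in equations (8), (14), (19), and (23).
  x <- with(EPA.94b.tccb.df, TcCB[Area == "Reference"])   # example data from EnvStats
  n <- length(x); y <- log(x)
  s2 <- var(y)                                # equation (4)
  sigma2.mle <- (n - 1) / n * s2              # equation (11)
  theta.mle  <- exp(mean(y) + sigma2.mle / 2) # equation (8)
  theta.qmle <- exp(mean(y) + s2 / 2)         # equation (14)
  theta.mme  <- mean(x)                       # equations (19) and (23)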
Confidence Intervals 
This section explains the different methods for constructing confidence intervals 
for \theta, the mean of the lognormal distribution.
Land's Method (ci.method="land") 
Land (1971, 1975) derived a method for computing one-sided (lower or upper) 
uniformly most accurate unbiased confidence intervals for \theta.  A 
two-sided confidence interval can be constructed by combining an optimal lower 
confidence limit with an optimal upper confidence limit.  This procedure for 
two-sided confidence intervals is only asymptotically optimal, but for most 
purposes should be acceptable (Land, 1975, p.387). 
As shown in equation (3) in the help file for LognormalAlt, the mean 
\theta of a lognormal random variable is related to the mean \mu and 
standard deviation \sigma of the log-transformed random variable by the 
following relationship:
\theta = e^{\beta} \;\;\;\; (25)
where
\beta = \mu + \frac{\sigma^2}{2} \;\;\;\; (26)
Land (1971) developed confidence bounds for the quantity \beta.  The mvue of 
\beta is given by:
\hat{\beta}_{mvue} = \bar{y} + \frac{s^2}{2} \;\;\;\; (27)
Note that \hat{\theta}_{qmle} = exp(\hat{\beta}_{mvue}).  
The (1-\alpha)100\% two-sided confidence interval for \beta is 
given by:
[ \hat{\beta}_{mvue} + s \frac{C_{\alpha/2}}{\sqrt{n-1}}, \;  \hat{\beta}_{mvue} + s \frac{C_{1-\alpha/2}}{\sqrt{n-1}} ] \;\;\;\; (28)
the (1-\alpha)100\% one-sided upper confidence interval for \beta is 
given by:
[ -\infty, \;  \hat{\beta}_{mvue} + s \frac{C_{1-\alpha}}{\sqrt{n-1}} ] \;\;\;\; (29)
and the (1-\alpha)100\% one-sided lower confidence interval for \beta is 
given by:
[ \hat{\beta}_{mvue} + s \frac{C_{\alpha}}{\sqrt{n-1}}, \;  \infty ] \;\;\;\; (30)
where s is the estimate of \sigma (see equation (4) above), and the 
factor C is given in tables in Land (1975).  
Thus, by equations (25)-(30), the two-sided (1-\alpha)100\% confidence 
interval for \theta is given by:
\{\hat{\theta}_{qmle} exp[s \frac{C_{\alpha/2}}{\sqrt{n-1}}], \;  \hat{\theta}_{qmle} exp[s \frac{C_{1-\alpha/2}}{\sqrt{n-1}}] \} \;\;\;\; (31)
the (1-\alpha)100\% one-sided upper confidence interval for \theta is 
given by:
\{ 0, \;  \hat{\theta}_{qmle} exp[s \frac{C_{1-\alpha}}{\sqrt{n-1}}] \} \;\;\;\; (32)
and the (1-\alpha)100\% one-sided lower confidence interval for \theta 
is given by:
\{\hat{\theta}_{qmle} exp[s \frac{C_{\alpha}}{\sqrt{n-1}} ], \;  \infty \} \;\;\;\; (33)
Note that Gilbert (1987, pp. 169-171, 264-265) denotes the quantity C above as 
H and reproduces a subset of Land's (1975) tables.  Some guidance documents 
(e.g., USEPA, 1992d) refer to this quantity as the H-statistic.
Zou et al.'s Method (ci.method="zou") 
Zou et al. (2009) proposed the following approximation for the two-sided 
(1-\alpha)100\% confidence intervals for \theta.  The lower limit LL 
is given by:
LL = \hat{\theta}_{qmle} exp\{ -[\frac{z^2_{1-\alpha/2}s^2}{n} + (\frac{s^2}{2} - \frac{(n-1)s^2}{2\chi^2_{1-\alpha/2, n-1}})^2]^{1/2}\} \;\;\;\; (34)
and the upper limit UL is given by:
UL = \hat{\theta}_{qmle} exp\{ [\frac{z^2_{1-\alpha/2}s^2}{n} + (\frac{(n-1)s^2}{2\chi^2_{\alpha/2, n-1}} - \frac{s^2}{2})^2]^{1/2}\} \;\;\;\; (35)
where z_p denotes the p'th quantile of the standard 
normal distribution, and \chi^2_{p, \nu} denotes the 
p'th quantile of the chi-square distribution with 
\nu degrees of freedom.  The (1-\alpha)100\% one-sided lower confidence 
limit and one-sided upper confidence limit are given by equations (34) and (35), 
respectively, with \alpha/2 replaced by \alpha.
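A sketch of equations (34) and (35), with an illustrative helper name (in practice use elnormAlt(x, ci = TRUE, ci.method = "zou")):
  # Sketch of Zou et al.'s (2009) approximate confidence limits.
  zou.ci <- function(x, conf.level = 0.95) {
    n <- length(x); y <- log(x)
    s2 <- var(y)
    theta.qmle <- exp(mean(y) + s2 / 2)
    z <- qnorm(1 - (1 - conf.level) / 2)
    chi.lo <- qchisq((1 - conf.level) / 2, df = n - 1)
    chi.hi <- qchisq(1 - (1 - conf.level) / 2, df = n - 1)
    LL <- theta.qmle * exp(-sqrt(z^2 * s2 / n + (s2 / 2 - (n - 1) * s2 / (2 * chi.hi))^2))   # equation (34)
    UL <- theta.qmle * exp( sqrt(z^2 * s2 / n + ((n - 1) * s2 / (2 * chi.lo) - s2 / 2)^2))   # equation (35)
    c(LCL = LL, UCL = UL)
  }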
Parkin et al.'s Method (ci.method="parkin") 
This method was developed by Parkin et al. (1990).  It can be shown that the 
mean of a lognormal distribution corresponds to the p'th quantile, where
p = \Phi(\frac{\sigma}{2}) \;\;\;\; (36)
and \Phi denotes the cumulative distribution function of the standard 
normal distribution.  Parkin et al. (1990) suggested 
estimating p by replacing \sigma in equation (36) with the estimate 
s as computed in equation (4).  Once an estimate of p is obtained, a 
nonparametric confidence interval can be constructed for p, assuming p 
is equal to its estimated value (see eqnpar). 
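A sketch of this two-step procedure, assuming eqnpar's default interface (in practice use elnormAlt(x, ci = TRUE, ci.method = "parkin")):
  # Sketch of Parkin et al.'s (1990) approach: estimate which quantile the mean
  # corresponds to (equation (36)), then compute a nonparametric confidence
  # interval for that quantile.
  x <- with(EPA.94b.tccb.df, TcCB[Area == "Reference"])
  p.hat <- pnorm(sd(log(x)) / 2)       # equation (36), with sigma replaced by s
  eqnpar(x, p = p.hat, ci = TRUE)      # nonparametric CI for the p.hat'th quantile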
Cox's Method (ci.method="cox") 
This method was suggested by Professor D.R. Cox and is illustrated in Land (1972).  
El-Shaarawi (1989) adapts this method to the case of censored water quality data.  
Cox's idea is to construct an approximate (1-\alpha)100\% confidence interval 
for the quantity \beta defined in equation (26) above assuming the estimate of 
\beta is approximately normally distributed, and then exponentiate the 
confidence limits.  That is, a two-sided (1-\alpha)100\% confidence interval 
for \theta is constructed as:
[exp(\hat{\beta} - t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\beta}}), \; exp(\hat{\beta} + t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\beta}})] \;\;\;\; (37)
where t_{p, \nu} denotes the p'th quantile of 
Student's t-distribution with \nu degrees of freedom.  
Note that this method, unlike the normal approximation method discussed below, 
guarantees a positive value for the lower confidence limit.  One-sided confidence 
intervals are computed in a similar fashion.
Define an estimator of \beta by:
\hat{\beta} = \hat{\mu} + \frac{\hat{\sigma}^2}{2} \;\;\;\; (38)
Then the variance of this estimator is given by:
Var(\hat{\beta}) = Var(\hat{\mu}) + Cov(\hat{\mu}, \hat{\sigma}^2) + \frac{1}{4}Var(\hat{\sigma}^2) \;\;\;\; (39)
The function elnormAlt follows Land (1972) and uses the minimum variance 
unbiased estimator for \beta shown in equation (27) above, so the variance and 
estimated variance of this estimator are:
Var(\hat{\beta}_{mvue}) = \frac{\sigma^2}{n} + \frac{\sigma^4}{2(n-1)} \;\;\;\; (40)
\hat{\sigma}^2_{\hat{\beta}} = \frac{s^2}{n} + \frac{s^4}{2(n+1)} \;\;\;\; (41)
Note that El-Shaarawi (1989, equation 5) simply replaces the value of s^2 in 
equation (41) with some estimator of \sigma^2 (the mle or mvue of 
\sigma^2), rather than using the mvue of the variance of \beta as shown 
in equation (41).
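A sketch of Cox's method based on equations (27), (37), and (41); the helper name cox.ci is illustrative only (in practice use elnormAlt(x, ci = TRUE, ci.method = "cox")):
  # Sketch of Cox's method for a two-sided confidence interval for the mean.
  cox.ci <- function(x, conf.level = 0.95) {
    n <- length(x); y <- log(x)
    s2 <- var(y)
    beta.hat <- mean(y) + s2 / 2                       # equation (27)
    se.beta <- sqrt(s2 / n + s2^2 / (2 * (n + 1)))     # equation (41)
    t.crit <- qt(1 - (1 - conf.level) / 2, df = n - 1)
    exp(beta.hat + c(-1, 1) * t.crit * se.beta)        # equation (37)
  }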
Normal Approximation (ci.method="normal.approx")
This method constructs approximate (1-\alpha)100\% confidence intervals for 
\theta based on the assumption that the estimator of \theta is 
approximately normally distributed.  That is, a two-sided (1-\alpha)100\% 
confidence interval for \theta is constructed as:
[\hat{\theta} - t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\theta}}, \; \hat{\theta} + t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\theta}}] \;\;\;\; (42)
One-sided confidence intervals are computed in a similar fashion.
When method="mvue" is used to estimate \theta, an unbiased estimate of 
the variance of the estimator of \theta is used in equation (42) 
(Bradu and Mundlak, 1970, equation 4.3; Gilbert, 1987, equation 13.5):
\hat{\sigma^2}_{\hat{\theta}} = e^{2\bar{y}} \{[g_{n-1}(\frac{s^2}{2})]^2 - g_{n-1}[\frac{s^2(n-2)}{n-1}] \} \;\;\;\; (43)
When method="mle" is used to estimate \theta, the estimate of the 
variance of the estimator of \theta is computed by replacing \mu and 
\sigma^2 in equation (13) with their mle's:
\hat{\sigma}^2_{\hat{\theta}} = exp(2\bar{y} + \frac{\hat{\sigma}^2_{mle}}{n}) \{exp(\frac{\hat{\sigma}^2_{mle}}{n}) [1 - \frac{2\hat{\sigma}^2_{mle}}{n}]^{-(n-1)/2} - [1 - \frac{\hat{\sigma}^2_{mle}}{n}]^{-(n-1)} \} \;\;\;\; (44)
When method="qmle" is used to estimate \theta, the estimate of the 
variance of the estimator of \theta is computed by replacing \mu and 
\sigma^2 in equation (18) with their mvue's:
\hat{\sigma}^2_{\hat{\theta}} = exp(2\bar{y} + \frac{s^2}{n}) \{exp(\frac{s^2}{n}) [1 - \frac{2 s^2}{n-1}]^{-(n-1)/2} - [1 - \frac{s^2}{n-1}]^{-(n-1)} \} \;\;\;\; (45)
Note that equation (45) is exactly the same as Gilbert's (1987, p. 167) equation 
13.8a, except that Gilbert (1987) erroneously uses n where he should use 
n-1 instead.  For large values of n relative to s^2, however, 
this makes little difference.
When method="mme", the estimate of the variance of the estimator of 
\theta is computed by replacing \eta^2 in equation (22) with the 
mme of \eta^2 defined in equation (20):
\hat{\sigma}^2_{\hat{\theta}} = \frac{\hat{\eta}_{mme}^2}{n} = \frac{1}{n^2} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (46)
When method="mmue", the estimate of the variance of the estimator of 
\theta is computed by replacing \eta^2 in equation (22) with the 
mmue of \eta^2 defined in equation (24):
\hat{\sigma}^2_{\hat{\theta}} = \frac{\hat{\eta}_{mmue}^2}{n} = \frac{1}{n(n-1)} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (47)
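A sketch of the normal-approximation interval for method="mvue", based on equations (1), (42), and (43); the helper names are illustrative only (in practice use elnormAlt(x, ci = TRUE, ci.method = "normal.approx")):
  # Sketch of the normal-approximation confidence interval for the mean (mvue case).
  g.fcn <- function(m, z, n.terms = 60) {          # equation (5), truncated series
    i <- 0:n.terms
    sum(m^i * (m + 2 * i) / cumprod(m + 2 * i) * (m / (m + 1))^i * z^i / factorial(i))
  }
  normal.approx.ci <- function(x, conf.level = 0.95) {
    n <- length(x); y <- log(x)
    y.bar <- mean(y); s2 <- var(y)
    theta.hat <- exp(y.bar) * g.fcn(n - 1, s2 / 2)                          # equation (1)
    var.theta <- exp(2 * y.bar) *
      (g.fcn(n - 1, s2 / 2)^2 - g.fcn(n - 1, s2 * (n - 2) / (n - 1)))       # equation (43)
    theta.hat + c(-1, 1) * qt(1 - (1 - conf.level) / 2, df = n - 1) * sqrt(var.theta)  # equation (42)
  }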
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The normal and lognormal distributions are probably the two most frequently used distributions for modeling environmental data. In order to make any kind of probability statement about a normally distributed population (of chemical concentrations, for example), you first have to estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimates of the mean or variance. This is done with confidence intervals.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribution instead.
USEPA (1992d) directs persons involved in risk assessment for Superfund sites to 
use Land's (1971, 1975) method (ci.method="land") for computing the upper 
95% confidence interval for the mean, assuming the data follow a lognormal 
distribution (the guidance document cites Gilbert (1987) as a source of descriptions 
and tables for this method).  The last example in the EXAMPLES section below 
reproduces an example from this guidance document.
In the past, some authors suggested using the geometric mean, also called the 
"rating curve" estimator (Cohn et al., 1989), as the estimator of the mean, 
\theta.  This estimator is computed as:
\hat{\theta}_{rc} = e^{\bar{y}} \;\;\;\; (48)
Cohn et al. (1989) cite several authors who have pointed out this estimator is 
biased and is not even a consistent estimator of the mean.  In fact, it is the 
maximum likelihood estimator of the median of the distribution 
(see eqlnorm).
Finney (1941) computed the efficiency of the method of moments estimators of the 
mean (\theta) and variance (\eta^2) of the lognormal distribution 
(equations (19)-(20)) relative to the mvue's (equations (1)-(2)) as a function of 
\sigma^2 (the variance of the log-transformed observations), and found that 
while the mme of \theta is reasonably efficient compared to the mvue of 
\theta, the mme of \eta^2 performs quite poorly relative to the 
mvue of \eta^2.
Cohn et al. (1989) and Parkin et al. (1988) have shown that the qmle and the mle of the mean can be severely biased for typical environmental data, and suggest always using the mvue.
Parkin et al. (1990) studied the performance of various methods for constructing a 
confidence interval for the mean via Monte Carlo simulation.  They compared 
approximate methods to Land's optimal method (ci.method="land").  They used 
four parent lognormal distributions to generate observations; all had mean 10, but 
differed in coefficient of variation: 50, 100, 200, and 500%.  They used 
sample sizes from 6 to 100 in increments of 2.  For each combination of parent 
distribution and sample size, they generated 25,000 Monte Carlo trials.  
Parkin et al. found that for small sample sizes (n < 20), none of the 
approximate methods ("parkin", "cox", "normal.approx") worked 
very well. For n > 20, their method ("parkin") provided reasonably 
accurate coverage.  Cox's method ("cox") worked well for n > 60, and 
performed slightly better than Parkin et al.'s method ("parkin") for highly 
skewed populations.
Zou et al. (2009) used Monte Carlo simulation to compare the performance of their method with the generalized confidence interval (GCI) method of Krishnamoorthy and Mathew (2003) and the modified Cox method of Armstrong (1992) and El-Shaarawi and Lin (2007). Performance was assessed based on 1) the percentage of times the interval contained the parameter value (coverage %), 2) the balance between left and right tail errors, and 3) confidence interval width. All three methods showed acceptable coverage percentages. The modified Cox method showed unbalanced tail errors, and Zou et al.'s method showed consistently narrower average width.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, Chapter 5.
Armstrong, B.G. (1992). Confidence Intervals for Arithmetic Means of Lognormally Distributed Exposures. American Industrial Hygiene Association Journal 53, 481–485.
Bradu, D., and Y. Mundlak. (1970). Estimation in Lognormal Linear Models. Journal of the American Statistical Association 65, 198–211.
Cohn, T.A., L.L. DeLong, E.J. Gilroy, R.M. Hirsch, and D.K. Wells. (1989). Estimating Constituent Loads. Water Resources Research 25(5), 937–942.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 2.
El-Shaarawi, A.H., and J. Lin. (2007). Interval Estimation for Log-Normal Mean with Applications to Water Quality. Environmetrics 18, 1–10.
El-Shaarawi, A.H., and R. Viveros. (1997). Inference About the Mean in Log-Regression with Environmental Applications. Environmetrics 8, 569–582.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Finney, D.J. (1941). On the Distribution of a Variate Whose Logarithm is Normally Distributed. Supplement to the Journal of the Royal Statistical Society 7, 155–161.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Krishnamoorthy, K., and T.P. Mathew. (2003). Inferences on the Means of Lognormal Distributions Using Generalized p-Values and Generalized Confidence Intervals. Journal of Statistical Planning and Inference 115, 103–121.
Land, C.E. (1971). Confidence Intervals for Linear Functions of the Normal Mean and Variance. The Annals of Mathematical Statistics 42(4), 1187–1205.
Land, C.E. (1972). An Evaluation of Approximate Confidence Interval Estimation Methods for Lognormal Means. Technometrics 14(1), 145–158.
Land, C.E. (1973). Standard Confidence Limits for Linear Functions of the Normal Mean and Variance. Journal of the American Statistical Association 68(344), 960–963.
Land, C.E. (1975). Tables of Confidence Limits for Linear Functions of the Normal Mean and Variance, in Selected Tables in Mathematical Statistics, Vol. III. American Mathematical Society, Providence, RI, pp. 385–419.
Likes, J. (1980). Variance of the MVUE for Lognormal Variance. Technometrics 22(2), 253–258.
Limpert, E., W.A. Stahel, and M. Abbt. (2001). Log-Normal Distributions Across the Sciences: Keys and Clues. BioScience 51, 341–352.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Parkin, T.B., J.J. Meisinger, S.T. Chester, J.L. Starr, and J.A. Robinson. (1988). Evaluation of Statistical Estimation Methods for Lognormally Distributed Variables. Journal of the Soil Science Society of America 52, 323–329.
Parkin, T.B., S.T. Chester, and J.A. Robinson. (1990). Calculating Confidence Intervals for the Mean of a Lognormally Distributed Variable. Journal of the Soil Science Society of America 54, 321–326.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (1992d). Supplemental Guidance to RAGS: Calculating the Concentration Term. Publication 9285.7-081, May 1992. Intermittent Bulletin, Volume 1, Number 1. Office of Emergency and Remedial Response, Hazardous Site Evaluation Division, OS-230. Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zou, G.Y., C.Y. Huo, and J. Taleban. (2009). Simple Confidence Intervals for Lognormal Means and their Differences with Environmental Applications. Environmetrics 20, 172–180.
See Also
LognormalAlt, Lognormal, Normal.
Examples
  # Using the Reference area TcCB data in the data frame EPA.94b.tccb.df, 
  # estimate the mean and coefficient of variation, 
  # and construct a 95% confidence interval for the mean.
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], ci = TRUE))  
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          mean = 0.5989072
  #                                 cv   = 0.4899539
  #
  #Estimation Method:               mvue
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Land
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.5243787
  #                                 UCL = 0.7016992
  #----------
  # Compare the different methods of estimating the distribution parameters using the 
  # Reference area TcCB data.
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mvue"))$parameters
  #     mean        cv 
  #0.5989072 0.4899539
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "qmle"))$parameters
  #     mean        cv 
  #0.6004468 0.4947791 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mle"))$parameters
  #     mean        cv 
  #0.5990497 0.4888968 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mme"))$parameters
  #     mean        cv 
  #0.5985106 0.4688423 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mmue"))$parameters
  #     mean        cv 
  #0.5985106 0.4739110
  #----------
  # Compare the different methods of constructing the confidence interval for
  # the mean using the Reference area TcCB data.
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "land"))$interval$limits
  #      LCL       UCL 
  #0.5243787 0.7016992
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "zou"))$interval$limits
  #      LCL       UCL 
  #0.5230444 0.6962071 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "parkin"))$interval$limits
  # LCL  UCL 
  #0.50 0.74 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
     method = "mvue", ci = TRUE, ci.method = "cox"))$interval$limits
  #      LCL       UCL 
  #0.5196213 0.6938444 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
     method = "mvue", ci = TRUE, ci.method = "normal.approx"))$interval$limits
  #      LCL       UCL 
  #0.5130160 0.6847984 
  #----------
  # Reproduce the example in Highlights 7 and 8 of USEPA (1992d).  This example shows 
  # how to compute the upper 95% confidence limit of the mean of a lognormal distribution 
  # and compares it to the result of computing the upper 95% confidence limit assuming a 
  # normal distribution. The data for this example are chromium concentrations (mg/kg) in 
  # soil samples collected randomly over a Superfund site, and are stored in the data frame 
  # EPA.92d.chromium.vec.
  # First look at the data 
  EPA.92d.chromium.vec
  # [1]   10   13   20   36   41   59   67  110  110  136  140  160  200  230 1300
  stripChart(EPA.92d.chromium.vec, ylab = "Chromium (mg/kg)")
  # Note there is one very large "outlier" (1300).  
  # Perform a goodness-of-fit test to determine whether a lognormal distribution 
  # is appropriate:
  gof.list <- gofTest(EPA.92d.chromium.vec, dist = "lnormAlt") 
  gof.list 
  #Results of Goodness-of-Fit Test 
  #------------------------------- 
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal 
  #
  #Estimated Parameter(s):          mean = 159.855185
  #                                 cv   =   1.493994
  #
  #Estimation Method:               mvue
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Sample Size:                     15
  #
  #Test Statistic:                  W = 0.9607179
  #
  #Test Statistic Parameter:        n = 15
  #
  #P-value:                         0.7048747
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution. 
  plot(gof.list, digits = 2)
  # The lognormal distribution seems to provide an adequate fit, although the largest 
  # observation (1300) is somewhat suspect, and given the small sample size there is 
  # not much power to detect any kind of mild deviation from a lognormal distribution.
  
  # Now compute the one-sided 95% upper confidence limit for the mean.  
  # Note that the value of 502 mg/kg shown in Highlight 7 of USEPA (1992d) is a bit 
  # larger than the exact value of 496.6 mg/kg shown below.  
  # This is simply due to rounding error.
  elnormAlt(EPA.92d.chromium.vec, ci = TRUE, ci.type = "upper") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          mean = 159.855185
  #                                 cv   =   1.493994
  #
  #Estimation Method:               mvue
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Sample Size:                     15
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Land
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =   0
  #                                 UCL = 496.6282
  # Now compare this result with the upper 95% confidence limit based on assuming 
  # a normal distribution.  Again note that the value of 325 mg/kg shown in 
  # Highlight 8 is slightly larger than the exact value of 320.3 mg/kg shown below.  
  # This is simply due to rounding error.
  enorm(EPA.92d.chromium.vec, ci = TRUE, ci.type = "upper") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 175.4667
  #                                 sd   = 318.5440
  #
  #Estimation Method:               mvue
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Sample Size:                     15
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 320.3304
  #----------
  # Clean up
  #---------
  rm(gof.list)
Estimate Parameters for a Lognormal Distribution (Original Scale) Based on Type I Censored Data
Description
Estimate the mean and coefficient of variation of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  elnormAltCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z", ...)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed (see the DETAILS section). | 
| censored | numeric or logical vector indicating which values of x are censored.  This must be the same length as x. | 
| method | character string specifying the method of estimation.  For singly censored data, the possible values are "mle" (maximum likelihood; the default), "qmvue", "bcmle", "rROS" (or, equivalently, "impute.w.qq.reg"), "impute.w.qq.reg.w.cen.level", "impute.w.mle", and "half.cen.level".  For multiply censored data, the methods "impute.w.qq.reg.w.cen.level" and "impute.w.mle" are not available.  See the DETAILS section for more information. | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are "left" (the default) and "right". | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean.  The default value is ci=FALSE. | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  The possible values are "profile.likelihood" (the default), "delta", "normal.approx", "cox", and "bootstrap".  Not all confidence interval methods are available for all estimation methods; see the DETAILS section for more information.  This argument is ignored if ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper".  This argument is ignored if ci=FALSE. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95.  This argument is ignored if ci=FALSE. | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap".  The default value is n.bootstraps=1000. | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="delta" or ci.method="normal.approx".  The possible values are "z" (the default) and "t". | 
| ... | additional arguments to pass to other functions. | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let \underline{x} denote a vector of N observations from a
lognormal distribution with parameters
mean=\theta and cv=\tau.  Let \eta denote the
standard deviation of this distribution, so that \eta = \theta \tau.  Set
\underline{y} = log(\underline{x}).  Then \underline{y} is a
vector of observations from a normal distribution with parameters
mean=\mu and sd=\sigma.  See the help file for
LognormalAlt for the relationship between
\theta, \tau, \eta, \mu, and \sigma.
Assume n (0 < n < N) of the N observations are known and
c (c=N-n) of the observations are all censored below (left-censored)
or all censored above (right-censored) at k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
For the case when k \ge 2, the data are said to be Type I
multiply censored.  For the case when k=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
ESTIMATION 
This section explains how each of the estimators of mean=\theta and
cv=\tau are computed.  The approach is to first compute estimates of
\theta and \eta^2 (the mean and variance of the lognormal distribution),
say \hat{\theta} and \hat{\eta}^2, then compute the estimate of the cv
\tau by \hat{\tau} = \hat{\eta}/\hat{\theta}.
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators of \theta, \tau, and \eta are
computed as:
\hat{\theta}_{mle} = exp(\hat{\mu}_{mle} + \frac{\hat{\sigma}^2_{mle}}{2}) \;\;\;\;\;\; (3)
\hat{\tau}_{mle} = [exp(\hat{\sigma}^2_{mle})  - 1]^{1/2} \;\;\;\;\;\; (4)
\hat{\eta}_{mle} = \hat{\theta}_{mle} \; \hat{\tau}_{mle} \;\;\;\;\;\; (5)
where \hat{\mu}_{mle} and \hat{\sigma}_{mle} denote the maximum
likelihood estimators of \mu and \sigma.  See the help file for
enormCensored for information on how \hat{\mu}_{mle} and
\hat{\sigma}_{mle} are computed.
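For illustration, the following minimal sketch applies equations (3)-(5) in R; mu.hat and sigma.hat are hypothetical placeholders for the log-scale maximum likelihood estimates (e.g., taken from a call to enormCensored on the log-transformed data), not objects created by EnvStats:
  # Back-transform log-scale mle's to the original-scale mean, cv, and sd
  mu.hat    <- 2.2                              # placeholder value
  sigma.hat <- 1.4                              # placeholder value
  theta.hat <- exp(mu.hat + sigma.hat^2 / 2)    # Equation (3): mean
  tau.hat   <- sqrt(exp(sigma.hat^2) - 1)       # Equation (4): cv
  eta.hat   <- theta.hat * tau.hat              # Equation (5): sd
  c(mean = theta.hat, cv = tau.hat, sd = eta.hat)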
Quasi Minimum Variance Unbiased Estimation Based on the MLE's (method="qmvue") 
The maximum likelihood estimators of \theta and \eta^2 are biased.
Even for complete (uncensored) samples these estimators are biased
(see equation (12) in the help file for elnormAlt).
The bias tends to 0 as the sample size increases, but it can be considerable for
small sample sizes.
(Cohn et al., 1989, demonstrate the bias for complete data sets.)
For the case of complete samples, the minimum variance unbiased estimators (mvue's)
of \theta and \eta^2 were derived by Finney (1941) and are discussed in
Gilbert (1987, pp.164-167) and Cohn et al. (1989).  These estimators are computed as:
\hat{\theta}_{mvue} = e^{\bar{y}} g_{n-1}(\frac{s^2}{2}) \;\;\;\;\;\; (6)
\hat{\eta}^2_{mvue} = e^{2 \bar{y}} \{g_{n-1}(2s^2) - g_{n-1}[\frac{(n-2)s^2}{n-1}]\} \;\;\;\;\;\; (7)
where
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (8)
s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \;\;\;\;\;\; (9)
g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\;\;\; (10)
(see the help file for elnormAlt).
For Type I censored samples, the quasi minimum variance unbiased estimators
(qmvue's) of \theta and \eta^2 are computed using equations (6) and (7)
and estimating \mu and \sigma with their mle's (see
elnormCensored).
For singly censored data, this is apparently the LM method of Gilliom and Helsel
(1986, p.137) (it is not clear from their description on page 137 whether their
LM method is the straight method="mle" described above or
method="qmvue" described here).  This method was also used by
Newman et al. (1989, p.915, equations 10-11).
For multiply censored data, this is apparently the MM method of Helsel and Cohn
(1988, p.1998).  (It is not clear from their description on page 1998 and the
description in Gilliom and Helsel, 1986, page 137 whether Helsel and Cohn's (1988)
MM method is the straight method="mle" described above or method="qmvue"
described here.)
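As a rough illustration for complete (uncensored) data, the following sketch implements the series in equation (10) with a finite number of terms and then applies equations (6) and (7); the helper g.finney and the vector x are hypothetical placeholders, not part of EnvStats:
  # Truncated version of the series in equation (10)
  g.finney <- function(m, z, n.terms = 50) {
    i <- 0:n.terms
    denom <- cumprod(m + 2 * i)          # m, m(m+2), m(m+2)(m+4), ...
    sum(m^i * (m + 2 * i) / denom * (m / (m + 1))^i * z^i / factorial(i))
  }
  y    <- log(x)                         # x = a complete (uncensored) sample
  n    <- length(y)
  ybar <- mean(y)
  s2   <- var(y)
  theta.mvue <- exp(ybar) * g.finney(n - 1, s2 / 2)                      # Eq (6)
  eta2.mvue  <- exp(2 * ybar) *
    (g.finney(n - 1, 2 * s2) - g.finney(n - 1, (n - 2) * s2 / (n - 1)))  # Eq (7)
  tau.mvue   <- sqrt(eta2.mvue) / theta.mvue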
Bias-Corrected Maximum Likelihood Estimation (method="bcmle") 
This method was derived by El-Shaarawi (1989) and can be applied to complete or
censored data sets.  For complete data, the exact relative bias of the mle of
the mean \theta is given as:
B_{mle} = \frac{E[\hat{\theta}_{mle}]}{\theta} = exp[\frac{-(n-1)\sigma^2}{2n}] (1 - \frac{\sigma^2}{n})^{-(n-1)/2} \;\;\;\;\;\; (11)
(see equation (12) in the help file for elnormAlt).
For the case of complete or censored data, El-Shaarawi (1989) proposed the following “bias-corrected” maximum likelihood estimator:
\hat{\theta}_{bcmle} = \frac{\hat{\theta}_{mle}}{\hat{B}_{mle}} \;\;\;\;\;\; (12)
where
\hat{B}_{mle} = exp[\frac{1}{2}(\hat{V}_{11} + 2\hat{\sigma}_{mle} \hat{V}_{12} + \hat{\sigma}^2_{mle} \hat{V}_{22})] \;\;\;\;\;\; (13)
and V denotes the asymptotic variance-covariance matrix of the mle's of \mu
and \sigma, which is based on the observed information matrix, formulas for
which are given in Cohen (1991).  El-Shaarawi (1989) does not propose a
bias-corrected estimator of the variance \eta^2, so the mle of \eta
is computed when method="bcmle".
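As a simple illustration of equation (11) for complete data only (the censored-data correction in equation (13) requires the observed information matrix), the sketch below plugs a hypothetical estimate sigma2.hat of the log-scale variance into the exact relative bias and divides the mle by it, in the spirit of equation (12); theta.mle, sigma2.hat, and n are placeholders:
  # Exact relative bias of the mle of the mean for complete data (Eq (11));
  # requires sigma2 < n for the second factor to be defined.
  rel.bias <- function(n, sigma2) {
    exp(-(n - 1) * sigma2 / (2 * n)) * (1 - sigma2 / n)^(-(n - 1) / 2)
  }
  theta.bc <- theta.mle / rel.bias(n, sigma2.hat)   # cf. Equation (12)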
Robust Regression on Order Statistics (method="rROS") or 
Imputation Using Quantile-Quantile Regression (method ="impute.w.qq.reg") 
This is the robust Regression on Order Statistics (rROS) method discussed in USEPA (2009)
and Helsel (2012).  This method involves using quantile-quantile regression on the
log-transformed observations to fit a regression line (and thus initially estimate the mean
\mu and standard deviation \sigma in log-space), imputing the
log-transformed values of the c censored observations by predicting them
from the regression equation, transforming the log-scale imputed values back to
the original scale, and then computing the method of moments estimates of the
mean and standard deviation based on the observed and imputed values.
The steps are:
- Estimate \mu and \sigma by computing the least-squares estimates in the following model:
  y_{(i)} = \mu + \sigma \Phi^{-1}(p_i) + \epsilon_i, \; i \in \Omega \;\;\;\;\;\; (14)
  where p_i denotes the plotting position associated with the i'th largest value, a is a constant such that 0 \le a \le 1 (the default value is 0.375), \Phi denotes the cumulative distribution function (cdf) of the standard normal distribution, and \Omega denotes the set of n subscripts associated with the uncensored observations in the ordered sample.  The plotting positions are computed by calling the function ppointsCensored.
- Compute the log-scale imputed values as:
  \hat{y}_{(i)} = \hat{\mu}_{qqreg} + \hat{\sigma}_{qqreg} \Phi^{-1}(p_i), \; i \not \in \Omega \;\;\;\;\;\; (15)
- Retransform the log-scale imputed values:
  \hat{x}_{(i)} = exp[\hat{y}_{(i)}], \; i \not \in \Omega \;\;\;\;\;\; (16)
- Compute the usual method of moments estimates of the mean and variance:
  \hat{\theta} = \frac{1}{N} [\sum_{i \not \in \Omega} \hat{x}_{(i)} + \sum_{i \in \Omega} x_{(i)}] \;\;\;\;\;\; (17)
  \hat{\eta}^2 = \frac{1}{N-1} [\sum_{i \not \in \Omega} (\hat{x}_{(i)} - \hat{\theta})^2 + \sum_{i \in \Omega} (x_{(i)} - \hat{\theta})^2] \;\;\;\;\;\; (18)
  Note that the estimate of variance is actually the usual unbiased one (not the method of moments one) in the case of complete data.
For singly censored data, this method is discussed by Hashimoto and Trussell (1983), Gilliom and Helsel (1986), and El-Shaarawi (1989), and is referred to as the LR (Log-Regression) or Log-Probability Method.
For multiply censored data, this is the MR method of Helsel and Cohn (1988, p.1998).
They used it with the probability method of Hirsch and Stedinger (1987) and
Weibull plotting positions (i.e., prob.method="hirsch-stedinger" and
plot.pos.con=0).
The argument plot.pos.con (see the entry for ... in the ARGUMENTS
section above) determines the value of the plotting positions computed in
equations (14) and (15) when prob.method equals "hirsch-stedinger" or
"michael-schucany".  The default value is plot.pos.con=0.375.
See the help file for ppointsCensored for more information.
The arguments lb.impute and ub.impute (see the entry for ... in
the ARGUMENTS section above) determine the lower and upper bounds for the
imputed values.  Imputed values smaller than lb.impute are set to this
value.  Imputed values larger than ub.impute are set to this value.
The default values are lb.impute=0 and ub.impute=Inf.
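The following simplified sketch walks through equations (14)-(18) for left singly censored data.  For brevity it uses simple Blom-type plotting positions on the full ordered sample instead of the Hirsch-Stedinger positions that elnormAltCensored uses by default, so its results will differ somewhat from method="rROS"; x and cens are hypothetical placeholders (observations with censored values set to the censoring level, and a logical censoring indicator):
  ord      <- order(x, !cens)            # for ties, censored values first (left censoring)
  cens.ord <- cens[ord]
  N        <- length(x)
  dat      <- data.frame(y = log(x[ord]),
                         z = qnorm(((1:N) - 0.375) / (N + 0.25)))  # plotting positions
  fit   <- lm(y ~ z, data = dat, subset = !cens.ord)   # Equation (14)
  y.imp <- predict(fit, newdata = dat[cens.ord, ])     # Equation (15)
  x.all <- c(exp(y.imp), exp(dat$y[!cens.ord]))        # Equation (16)
  theta.hat <- mean(x.all)                             # Equation (17)
  eta.hat   <- sd(x.all)                               # Equation (18)
  tau.hat   <- eta.hat / theta.hat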
Imputation Using Quantile-Quantile Regression Including the Censoring Level
(method ="impute.w.qq.reg.w.cen.level") 
This method is only available for singly censored data.  It was
proposed by El-Shaarawi (1989), who denotes it as the Modified LR Method.
It is exactly the same method as imputation
using quantile-quantile regression (method="impute.w.qq.reg"), except that
the quantile-quantile regression includes the censoring level.  For left singly
censored data, the modification involves adding the point
[\Phi^{-1}(p_c), T] to the plot before fitting the least-squares line.
For right singly censored data, the point  [\Phi^{-1}(p_{n+1}), T]
is added to the plot before fitting the least-squares line.
Imputation Using Maximum Likelihood (method ="impute.w.mle") 
This method is only available for singly censored data.
This is exactly the same method as robust Regression on Order Statistics (i.e.,
the same as using method="rROS" or 
method="impute.w.qq.reg"),
except that the maximum likelihood method (method="mle") is used to compute
the initial estimates of the mean and standard deviation.
In the context of lognormal data, this method is discussed
by El-Shaarawi (1989), who denotes it as the Modified Maximum Likelihood Method.
Setting Censored Observations to Half the Censoring Level (method="half.cen.level") 
This method is applicable only to left-censored data that are bounded below by 0.
This method involves simply replacing all the censored observations with half their
detection limit, and then computing the usual moment estimators of the mean and
variance.  That is, all censored observations are imputed to be half the detection
limit, and then Equations (17) and (18) are used to estimate the mean and variance.
This method is included only to allow comparison of this method to other methods.
Setting left-censored observations to half the censoring level is not
recommended.  In particular, El-Shaarawi and Esterby (1992) show that these
estimators are biased and inconsistent (i.e., the bias remains even as the sample
size increases).
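For completeness, the substitution itself is trivial to code; in the sketch below x and cens are hypothetical placeholders (observations with nondetects set to their detection limit, and a logical censoring indicator), and again this approach is shown only for comparison:
  x.sub     <- ifelse(cens, x / 2, x)   # replace nondetects with half the detection limit
  theta.hat <- mean(x.sub)              # Equation (17)
  eta.hat   <- sd(x.sub)                # Equation (18)
  tau.hat   <- eta.hat / theta.hat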
 
CONFIDENCE INTERVALS 
This section explains how confidence intervals for the mean \theta are
computed.
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation.  This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean \theta while treating the coefficient of
variation \tau as a nuisance parameter.
For Type I left censored data, the likelihood function is given by:
L(\theta, \tau | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (19)
where f and F denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population. That is,
f(t) = \phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (20)
F(t) = \Phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (21)
where
\mu = log(\frac{\theta}{\sqrt{\tau^2 + 1}}) \;\;\;\;\;\; (22)
\sigma = [log(\tau^2 + 1)]^{1/2} \;\;\;\;\;\; (23)
and \phi and \Phi denote the pdf and cdf of the standard normal
distribution, respectively (Cohen, 1963; 1991, pp.6, 50).  For left singly
censored data, equation (19) simplifies to:
L(\mu, \sigma | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^N f[x_{(i)}] \;\;\;\;\;\; (24)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\mu, \sigma | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (25)
and for right singly censored data this simplifies to:
L(\mu, \sigma | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (26)
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates
of the mean and coefficient of variation by (\theta^*, \tau^*).
The likelihood ratio test statistic (G^2) of the hypothesis
H_0: \theta = \theta_0 (where \theta_0 is a fixed value) equals the
drop in 2 log(L) between the “full” model and the reduced model with
\theta fixed at \theta_0, i.e.,
G^2 = 2 \{log[L(\theta^*, \tau^*)] - log[L(\theta_0, \tau_0^*)]\} \;\;\;\;\;\; (27)
where \tau_0^* is the maximum likelihood estimate of \tau for the
reduced model (i.e., when \theta = \theta_0).  Under the null hypothesis,
the test statistic G^2 follows a
chi-squared distribution with 1 degree of freedom.
Alternatively, we may
express the test statistic in terms of the profile likelihood function L_1
for the mean \theta, which is obtained from the usual likelihood function by
maximizing over the parameter \tau, i.e.,
L_1(\theta) = max_{\tau} L(\theta, \tau) \;\;\;\;\;\; (28)
Then we have
G^2 = 2 \{log[L_1(\theta^*)] - log[L_1(\theta_0)]\} \;\;\;\;\;\; (29)
A two-sided (1-\alpha)100\% confidence interval for the mean \theta
consists of all values of \theta_0 for which the test is not significant at
level \alpha:
\theta_0: G^2 \le \chi^2_{1, {1-\alpha}} \;\;\;\;\;\; (30)
where \chi^2_{\nu, p} denotes the p'th quantile of the
chi-squared distribution with \nu degrees of freedom.
One-sided lower and one-sided upper confidence intervals are computed in a similar
fashion, except that the quantity 1-\alpha in Equation (30) is replaced with
1-2\alpha.
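The sketch below illustrates equations (19)-(30) for left singly censored lognormal data using base R only.  The helper names loglik.cens and profile.loglik, the search intervals, and the objects x and cens (observations with censored values set to the censoring level, and a logical censoring indicator) are all assumptions of this sketch, not EnvStats code, and the resulting interval will generally be close to, but need not exactly match, the one returned by elnormAltCensored with ci.method="profile.likelihood":
  # Censored lognormal log-likelihood in terms of (theta, tau); the
  # multinomial coefficient in equation (24) is omitted because it does
  # not depend on the parameters.
  loglik.cens <- function(theta, tau, x, cens) {
    mu    <- log(theta / sqrt(tau^2 + 1))               # Equation (22)
    sigma <- sqrt(log(tau^2 + 1))                       # Equation (23)
    sum(plnorm(x[cens], mu, sigma, log.p = TRUE)) +     # censored contributions
      sum(dlnorm(x[!cens], mu, sigma, log = TRUE))      # uncensored contributions
  }
  # Profile log-likelihood for theta (Equation (28)); the search interval
  # for the nuisance parameter tau is an assumption and may need widening.
  profile.loglik <- function(theta, x, cens) {
    optimize(function(tau) loglik.cens(theta, tau, x, cens),
             interval = c(0.01, 20), maximum = TRUE)$objective
  }
  opt        <- optimize(function(theta) profile.loglik(theta, x, cens),
                         interval = c(min(x) / 10, 10 * max(x)), maximum = TRUE)
  theta.star <- opt$maximum
  ll.max     <- opt$objective
  # Invert the likelihood ratio test (Equations (29)-(30)); the uniroot
  # bounds are assumptions and may need widening for a given data set.
  g2  <- function(theta0) 2 * (ll.max - profile.loglik(theta0, x, cens)) -
           qchisq(0.95, 1)
  lcl <- uniroot(g2, lower = min(x) / 100, upper = theta.star)$root
  ucl <- uniroot(g2, lower = theta.star, upper = 100 * max(x))$root
  c(LCL = lcl, UCL = ucl)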
Direct Normal Approximations (ci.method="delta" or ci.method="normal.approx") 
An approximate (1-\alpha)100\% confidence interval for \theta can be
constructed assuming the distribution of the estimator of \theta is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \theta is constructed as:
[\hat{\theta} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\theta}}, \; \hat{\theta} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\theta}}] \;\;\;\;\;\; (31)
where \hat{\theta} denotes the estimate of \theta,
\hat{\sigma}_{\hat{\theta}} denotes the estimated asymptotic standard
deviation of the estimator of \theta, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of m (see
the entry for ... in the ARGUMENTS section above).
When method equals "mle", "qmvue", or "bcmle"
and the data are singly censored, the default value is the
expected number of uncensored observations, otherwise it is n,
the observed number of uncensored observations.  This is simply an ad-hoc
method of constructing confidence intervals and is not based on any
published theoretical results.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
Direct Normal Approximation Based on the Delta Method (ci.method="delta") 
This method is usually applied with the maximum likelihood estimators
(method="mle").  It should also work approximately for the quasi minimum
variance unbiased estimators (method="qmvue") and the bias-corrected maximum
likelihood estimators (method="bcmle").
When method="mle", the variance of the mle of \theta can be estimated
based on the variance-covariance matrix of the mle's of \mu and \sigma
(denoted V), and the delta method:
\hat{\sigma}^2_{\hat{\theta}} = (\frac{\partial \theta}{\partial \underline{\lambda}})^{'}_{\hat{\underline{\lambda}}} \hat{V} (\frac{\partial \theta}{\partial \underline{\lambda}})_{\hat{\underline{\lambda}}} \;\;\;\;\;\; (32)
where
\underline{\lambda}' = (\mu, \sigma) \;\;\;\;\;\; (33)
\frac{\partial \theta}{\partial \mu} = exp(\mu + \frac{\sigma^2}{2}) \;\;\;\;\;\; (34)
\frac{\partial \theta}{\partial \sigma} = \sigma exp(\mu + \frac{\sigma^2}{2}) \;\;\;\;\;\; (35)
(Shumway et al., 1989).  The variance-covariance matrix V of the mle's of
\mu and \sigma is estimated based on the inverse of the observed Fisher
Information matrix, formulas for which are given in Cohen (1991).
Direct Normal Approximation Based on the Moment Estimators (ci.method="normal.approx") 
This method is valid only for the moment estimators based on imputed values 
(i.e., method="impute.w.qq.reg" or method="half.cen.level").  For
these cases, the standard deviation of the estimated mean is assumed to be
approximated by
\hat{\sigma}_{\hat{\theta}} = \frac{\hat{\eta}}{\sqrt{m}} \;\;\;\;\;\; (36)
where, as already noted, m denotes the assumed sample size.
This is simply an ad-hoc method of constructing confidence intervals and is not
based on any published theoretical results.
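In code, this amounts to the following short sketch of equations (31) and (36); theta.hat, eta.hat, and m are hypothetical placeholders for the estimated mean, the estimated standard deviation, and the assumed sample size:
  # Two-sided 95% interval shown as an example
  se.theta <- eta.hat / sqrt(m)                                         # Equation (36)
  ci <- theta.hat + c(-1, 1) * qt(1 - 0.05 / 2, df = m - 1) * se.theta  # Equation (31)
  names(ci) <- c("LCL", "UCL")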
Cox's Method (ci.method="cox") 
This method may be applied with the maximum likelihood estimators
(method="mle"), the quasi minimum variance unbiased estimators
(method="qmvue"), and the bias-corrected maximum likelihood estimators
(method="bcmle").
This method was proposed by El-Shaarawi (1989) and is an extension of the
method derived by Cox and presented in Land (1972) for the case of
complete data (see the explanation of ci.method="cox" in the help file
for elnormAlt).  The idea is to construct an approximate
(1-\alpha)100\% confidence interval for the quantity
\beta = \mu + \frac{\sigma^2}{2} \;\;\;\;\;\; (37)
assuming the estimate of \beta
\hat{\beta} = \hat{\mu} + \frac{\hat{\sigma}^2}{2} \;\;\;\;\;\; (38)
is approximately normally distributed, and then exponentiate the confidence limits.
That is, a two-sided (1-\alpha)100\% confidence interval for \theta
is constructed as:
[ exp(\hat{\beta} - h), \; exp(\hat{\beta} + h) ]\;\;\;\;\;\; (39)
where
h = t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\beta}} \;\;\;\;\;\; (40)
and \hat{\sigma}_{\hat{\beta}} denotes the estimated asymptotic standard
deviation of the estimator of \beta, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.
El-Shaarawi (1989) shows that the standard deviation of the mle of \beta can
be estimated by:
\hat{\sigma}_{\hat{\beta}} = \sqrt{ \hat{V}_{11} + 2 \hat{\sigma} \hat{V}_{12} + \hat{\sigma}^2 \hat{V}_{22} } \;\;\;\;\;\; (41)
where V denotes the variance-covariance matrix of the mle's of \mu and
\sigma and is estimated based on the inverse of the Fisher Information matrix.
One-sided confidence intervals are computed in a similar fashion.
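A minimal sketch of equations (37)-(41), assuming hypothetical objects mu.hat and sigma.hat (the log-scale mle's), V (their estimated 2 x 2 variance-covariance matrix, e.g., from the inverse of the observed Fisher information matrix), and m (the assumed sample size):
  # Two-sided 95% interval shown as an example
  beta.hat <- mu.hat + sigma.hat^2 / 2                    # Equations (37)-(38)
  se.beta  <- sqrt(V[1, 1] + 2 * sigma.hat * V[1, 2] +
                   sigma.hat^2 * V[2, 2])                 # Equation (41)
  h  <- qt(1 - 0.05 / 2, df = m - 1) * se.beta            # Equation (40)
  ci <- exp(beta.hat + c(-1, 1) * h)                      # Equation (39)
  names(ci) <- c("LCL", "UCL")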
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence interval
for the population mean \theta, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement.  Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample.  Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate \theta based on the bootstrap sample created in Step 1, using the same method that was used to estimate \theta using the original observations in \underline{x}.  Because the bootstrap sample usually does not match the original sample, the estimate of \theta based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
- Repeat Steps 1 and 2 B times, where B is some large number.  The number of bootstraps B is determined by the argument n.bootstraps (see the section ARGUMENTS above).  The default value of n.bootstraps is 1000.
- Use the B estimated values of \theta to compute the empirical cumulative distribution function of this estimator of \theta (see ecdfPlot), and then create a confidence interval for \theta based on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (42)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus
\hat{G}^{-1}(p) denotes the p'th empirical quantile, that is,
the p'th quantile associated with the empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (43)
and a one-sided upper confidence interval is computed as:
[0, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (44)
The function elnormAltCensored calls the R function quantile
to compute the empirical quantiles used in Equations (42)-(44).
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \theta can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \theta may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (45)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (46)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (47)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\theta})] \;\;\;\;\;\; (48)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (49)
where the quantity \hat{\theta}_{(i)} denotes the estimate of \theta using
all the values in \underline{x} except the i'th one, and
\hat{\theta}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\theta}_{(i)} \;\;\;\;\;\; (50)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (51)
and a one-sided upper confidence interval is given by:
[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (52)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (46) and (47).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \theta with
respect to the true value of \theta (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \theta
does not depend on the value of \theta, hence the acceleration constant is not
really necessary.
When ci.method="bootstrap", the function elnormAltCensored computes both
the percentile method and bias-corrected and accelerated method bootstrap confidence
intervals.
This method of constructing confidence intervals for censored data was studied by Shumway et al. (1989).
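The percentile interval in equation (42) is straightforward to reproduce by hand; the sketch below resamples (value, censoring status) pairs and re-estimates the mean with the rROS method.  The objects x and cens are hypothetical placeholders, and in practice one should guard against degenerate resamples (e.g., all observations censored):
  set.seed(47)
  B <- 1000
  boot.means <- numeric(B)
  for (b in 1:B) {
    idx <- sample(length(x), replace = TRUE)        # Step 1: resample with replacement
    boot.means[b] <- elnormAltCensored(x[idx], cens[idx],
      method = "rROS")$parameters["mean"]           # Step 2: re-estimate the mean
  }
  quantile(boot.means, probs = c(0.025, 0.975))     # Equation (42): percentile interval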
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation, rather than rely on a single point-estimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on type I left-censored data, standard errors of the estimates are not available, hence it is not possible to construct confidence intervals (El-Shaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation on the original scale, not the log-scale, when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and Life-Testing Models. Marcel Dekker, New York, 496pp.
Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217–237.
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
El-Shaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4) 685–690.
El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
El-Shaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835–844.
El-Shaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339–347.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201–1206.
Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912–919.
Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 5-9, 1983.
Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766–1774.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp.41–63.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905–916.
Pettitt, A. N. (1983). Re-Weighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253–260.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448–451.
Schmee, J., D.Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2) 119–128.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961–962.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
LognormalAlt, elnormAlt,
elnormCensored, enormCensored,
estimateCensored.object.
Examples
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using
  # manganese concentrations (ppb) in groundwater at five background wells.
  # In EnvStats these data are stored in the data frame
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and coefficient of variation
  # ON THE ORIGINAL SCALE using the MLE, QMVUE,
  # and robust ROS (imputation with Q-Q regression).
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean and coefficient of variation
  # using the MLE:
  #---------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          mean = 23.003987
  #                                 cv   =  2.300772
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  # Now compare the MLE with the QMVUE and the
  # estimator based on robust ROS
  #-------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored))$parameters
  #     mean        cv
  #23.003987  2.300772
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored,
    method = "qmvue"))$parameters
  #     mean        cv
  #21.566945  1.841366
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored,
    method = "rROS"))$parameters
  #     mean        cv
  #19.886180  1.298868
  #----------
  # The method used to estimate quantiles for a Q-Q plot is
  # determined by the argument prob.method.  For the function
  # elnormAltCensored, for any estimation method that involves
  # Q-Q regression, the default value of prob.method is
  # "hirsch-stedinger" and the default value for the
  # plotting position constant is plot.pos.con=0.375.
  # Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger
  # probability method but set the plotting position constant to 0.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored,
    method = "rROS", plot.pos.con = 0))$parameters
  #     mean        cv
  #19.827673  1.304725
  #----------
  # Using the same data as above, compute a confidence interval
  # for the mean using the profile-likelihood method.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormAltCensored(Manganese.ppb, Censored, ci = TRUE))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          mean = 23.003987
  #                                 cv   =  2.300772
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 12.37629
  #                                 UCL = 69.87694
Estimate Parameters for a Lognormal Distribution (Log-Scale) Based on Type I Censored Data
Description
Estimate the mean and standard deviation parameters of the logarithm of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  elnormCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
    nmc = 1000, seed = NULL, ...)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed (see the DETAILS section). | 
| censored | numeric or logical vector indicating which values of x are censored.  This must be the same length as x. | 
| method | character string specifying the method of estimation.  Because elnormCensored simply calls enormCensored with the log-transformed observations, the possible values are the same as for enormCensored; they include "mle" (maximum likelihood; the default), "ROS" (regression on order statistics), and "rROS" (robust regression on order statistics).  See the help file for enormCensored and the DETAILS section for more information. | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are "left" (the default) and "right". | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean (i.e., meanlog).  The default value is ci=FALSE. | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  The possible values include "profile.likelihood" (the default), "normal.approx", and "bootstrap"; see the help file for enormCensored for the full list.  This argument is ignored if ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper".  This argument is ignored if ci=FALSE. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95.  This argument is ignored if ci=FALSE. | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap".  The default value is n.bootstraps=1000. | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="normal.approx".  The possible values are "z" (the default) and "t". | 
| nmc | numeric scalar indicating the number of Monte Carlo simulations to run when the confidence interval is constructed by Monte Carlo simulation.  The default value is nmc=1000. | 
| seed | integer supplied to the function set.seed.  The default value is seed=NULL. | 
| ... | additional arguments to pass to other functions. | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let X denote a random variable with a
lognormal distribution with
parameters meanlog=\mu and sdlog=\sigma.  Then
Y = log(X) has a normal (Gaussian) distribution with
parameters mean=\mu and sd=\sigma.  Thus, the function
elnormCensored simply calls the function enormCensored using the
log-transformed values of x.
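For example, using the manganese data from the EXAMPLES section below, the two calls in the following sketch should return the same numerical estimates (named meanlog/sdlog by elnormCensored and mean/sd by enormCensored):
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored))$parameters
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored))$parameters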
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation, rather than rely on a single point-estimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on type I left-censored data, standard errors of the estimates are not available, hence it is not possible to construct confidence intervals (El-Shaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Schmee et al. (1985) studied Type II censoring for a normal distribution and
noted that the bias and variances of the maximum likelihood estimators are of the
order 1/N, and that the bias is negligible for N=100 and as much as
90% censoring.  (If the proportion of censored observations is less than 90%,
the bias becomes negligible for smaller sample sizes.)  For small samples with
moderate to high censoring, however, the bias of the mle's causes confidence
intervals based on them using a normal approximation (e.g., method="mle"
and ci.method="normal.approx") to be too short.  Schmee et al. (1985)
provide tables for exact confidence intervals for sample sizes up to N=100
that were created based on Monte Carlo simulation.  Schmee et al. (1985) state
that these tables should work well for Type I censored data as well.
Shumway et al. (1989) evaluated the coverage of 90% confidence intervals for the mean based on using a Box-Cox transformation to induce normality, computing the mle's based on the normal distribution, then computing the mean in the original scale. They considered three methods of constructing confidence intervals: the delta method, the bootstrap, and the bias-corrected bootstrap. Shumway et al. (1989) used three parent distributions in their study: Normal(3,1), the square of this distribution, and the exponentiation of this distribution (i.e., a lognormal distribution). Based on sample sizes of 10 and 50 with a censoring level at the 10'th or 20'th percentile, Shumway et al. (1989) found that the delta method performed quite well and was superior to the bootstrap method.
Millard et al. (2014; in preparation) show that the coverage of the profile likelihood method is excellent.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and Life-Testing Models. Marcel Dekker, New York, 496pp.
Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217–237.
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
El-Shaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4) 685–690.
El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
El-Shaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835–844.
El-Shaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339–347.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201–1206.
Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912–919.
Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 5-9, 1983.
Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766–1774.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp.41–63.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905–916.
Pettitt, A. N. (1983). Re-Weighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253–260.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448–451.
Schmee, J., D.Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2) 119–128.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961–962.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
enormCensored, Lognormal, elnorm,
estimateCensored.object.
Examples
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using
  # manganese concentrations (ppb) in groundwater at five background wells.
  # In EnvStats these data are stored in the data frame
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and standard deviation using the MLE,
  # Q-Q regression (also called parametric regression on order statistics
  # or ROS; e.g., USEPA, 2009 and Helsel, 2012), and imputation with Q-Q
  # regression (also called robust ROS or rROS).
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean and standard deviation on the log-scale
  # using the MLE:
  #---------------------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  # Now compare the MLE with the estimators based on
  # Q-Q regression (ROS) and imputation with Q-Q regression (rROS)
  #---------------------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored))$parameters
  # meanlog    sdlog
  #2.215905 1.356291
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored,
    method = "ROS"))$parameters
  # meanlog    sdlog
  #2.293742 1.283635
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored,
    method = "rROS"))$parameters
  # meanlog    sdlog
  #2.298656 1.238104
  #----------
  # The method used to estimate quantiles for a Q-Q plot is
  # determined by the argument prob.method.  For the functions
  # enormCensored and elnormCensored, for any estimation
  # method that involves Q-Q regression, the default value of
  # prob.method is "hirsch-stedinger" and the default value for the
  # plotting position constant is plot.pos.con=0.375.
  # Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger
  # probability method but set the plotting position constant to 0.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored,
    method = "rROS", plot.pos.con = 0))$parameters
  # meanlog    sdlog
  #2.277175 1.261431
  #----------
  # Using the same data as above, compute a confidence interval
  # for the mean on the log-scale using the profile-likelihood
  # method.
  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored, ci = TRUE))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         meanlog
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 1.595062
  #                                 UCL = 2.771197
Estimate Parameters of a Logistic Distribution
Description
Estimate the location and scale parameters of a logistic distribution, and optionally construct a confidence interval for the location parameter.
Usage
  elogis(x, method = "mle", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are "mle" (maximum likelihood; the default), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimator of variance).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the location or scale parameter.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| ci.method | character string indicating what method to use to construct the confidence interval for the location or scale parameter.  Currently, the only possible value is "normal.approx" (the default).  See the DETAILS section for more information. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a logistic distribution with 
parameters location=\eta and scale=\theta.
Estimation 
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of \eta and \theta are 
the solutions of the simultaneous equations (Forbes et al., 2011):
\sum_{i=1}^{n} \frac{1}{1 + e^{z_i}} = \frac{n}{2} \;\;\;\; (1)
\sum_{i=1}^{n} z_i \, \left[\frac{1 - e^{-z_i}}{1 + e^{-z_i}}\right] = n \;\;\;\; (2)
where
z_i = \frac{x_i - \hat{\eta}_{mle}}{\hat{\theta}_{mle}} \;\;\;\; (3)
Method of Moments Estimation (method="mme") 
The method of moments estimators (mme's) of \eta and \theta are 
given by:
\hat{\eta}_{mme} = \bar{x} \;\;\;\; (4)
\hat{\theta}_{mme} = \frac{\sqrt{3}}{\pi} s_m \;\;\;\; (5)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (6)
s_m^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (7)
that is, s_m denotes the square root of the method of moments estimator 
of variance.
Method of Moments Estimators Based on the Unbiased Estimator of Variance (method="mmue") 
These estimators are exactly the same as the method of moments estimators given in 
equations (4-7) above, except that the method of moments estimator of variance in 
equation (7) is replaced with the unbiased estimator of variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (8)
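As an illustration (not part of the original help file), the method of moments estimates in Equations (4), (5), and (8) can be computed directly and compared with the corresponding calls to elogis; the simulated vector x below is used only for this purpose.
  set.seed(47)
  x <- rlogis(30, location = 5, scale = 2)
  eta.mme    <- mean(x)                          # Equation (4)
  s.m        <- sqrt(mean((x - mean(x))^2))      # square root of Equation (7)
  theta.mme  <- sqrt(3) * s.m / pi               # Equation (5)
  theta.mmue <- sqrt(3) * sd(x) / pi             # Equation (5) with s from Equation (8)
  c(eta.mme, theta.mme, theta.mmue)
  # These values should agree (up to rounding) with
  #   elogis(x, method = "mme")$parameters
  #   elogis(x, method = "mmue")$parameters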
Confidence Intervals 
When ci=TRUE, an approximate (1-\alpha)100% confidence interval 
for \eta can be constructed assuming the distribution of the estimator of 
\eta is approximately normally distributed.  A two-sided confidence 
interval is constructed as:
[\hat{\eta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}, \, \hat{\eta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}]
where t(\nu, p) is the p'th quantile of 
Student's t-distribution with 
\nu degrees of freedom, and the quantity 
\hat{\sigma}_{\hat{\eta}} = \frac{\pi \hat{\theta}}{\sqrt{3n}} \;\;\;\; (9)
denotes the estimated asymptotic standard deviation of the estimator of \eta.
One-sided confidence intervals for \eta and \theta are computed in 
a similar fashion.
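The following sketch (not part of the original help file) computes the normal-approximation interval for the location parameter directly from Equation (9), using a two-sided interval based on Student's t-distribution with n-1 degrees of freedom as in the printed output of elogis.
  set.seed(47)
  x <- rlogis(20)
  est <- elogis(x)$parameters
  n <- length(x)
  se.eta <- pi * est["scale"] / sqrt(3 * n)      # Equation (9)
  alpha <- 0.10
  est["location"] + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * se.eta
  # Compare with the interval printed by elogis(x, ci = TRUE, conf.level = 0.9)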
Value
a list of class "estimate" containing the estimated parameters and other information. 
 
See estimate.object for details.
Note
The logistic distribution is defined on the real line and is unimodal and symmetric about its location parameter (the mean). It has longer tails than a normal (Gaussian) distribution. It is used to model growth curves and bioassay data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Examples
  # Generate 20 observations from a logistic distribution with 
  # parameters location=0 and scale=1, then estimate the parameters 
  # and construct a 90% confidence interval for the location parameter. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlogis(20) 
  elogis(dat, ci = TRUE, conf.level = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Logistic
  #
  #Estimated Parameter(s):          location = -0.2181845
  #                                 scale    =  0.8152793
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         location
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = -0.7899382
  #                                 UCL =  0.3535693
  
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Probability Parameter of a Negative Binomial Distribution
Description
Estimate the probability parameter of a negative binomial distribution.
Usage
  enbinom(x, size, method = "mle/mme")
Arguments
| x | vector of non-negative integers indicating the number of trials that took place 
before  | 
| size | vector of positive integers indicating the number of “successes” that 
must be observed before the trials are stopped.  Missing ( | 
| method | character string specifying the method of estimation.  Possible values are "mle/mme" (maximum likelihood and method of moments; the default) and "mvue" (minimum variance unbiased).  See the DETAILS section for more information. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n 
independent observations from negative binomial distributions 
with parameters prob=p and size=\underline{k}, where 
\underline{k} = c(k_1, k_2, \ldots, k_n) is a vector of n 
(possibly different) values.
It can be shown (e.g., Forbes et al., 2011) that if X is defined as:
X = \sum^n_{i = 1} x_i
then X is an observation from a 
negative binomial distribution with 
parameters prob=p and size=K, where
K = \sum^n_{i = 1} k_i
Estimation 
The maximum likelihood and method of moments estimator (mle/mme) of 
p is given by:
\hat{p}_{mle} = \frac{K}{X + K}
and the minimum variance unbiased estimator (mvue) of p is given by:
\hat{p}_{mvue} = \frac{K - 1}{X + K - 1}
(Forbes et al., 2011).  Note that the mvue of p is not defined for 
K=1.
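As a quick check (not part of the original help file), both estimators can be computed directly.  The data below are the same hypothetical values used in the second example in the EXAMPLES section, and the last line reproduces the mvue shown there.
  x    <- c(5, 19, 12)       # observed numbers of "failures"
  size <- 2:4                # corresponding numbers of required "successes"
  X <- sum(x)
  K <- sum(size)
  K / (X + K)                # mle/mme of prob:  9/45 = 0.2
  (K - 1) / (X + K - 1)      # mvue of prob:     8/44 = 0.1818182 (requires K > 1)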
Value
a list of class "estimate" containing the estimated parameters and other information. 
 
See estimate.object for details.
Note
The negative binomial distribution has its roots in a gambling game where participants would bet on the number of tosses of a coin necessary to achieve a fixed number of heads. The negative binomial distribution has been applied in a wide variety of fields, including accident statistics, birth-and-death processes, and modeling spatial distributions of biological organisms.
The geometric distribution with parameter prob=p 
is a special case of the negative binomial distribution with parameters 
size=1 and prob=p.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 5.
See Also
NegBinomial, egeom, Geometric.
Examples
  # Generate an observation from a negative binomial distribution with 
  # parameters size=2 and prob=0.2, then estimate the parameter prob. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter that is estimated is prob; the parameter 
  # size is supplied in the call to enbinom.  The parameter size is printed in 
  # order to show all of the parameters associated with the distribution.
  set.seed(250) 
  dat <- rnbinom(1, size = 2, prob = 0.2) 
  dat
  #[1] 5
  enbinom(dat, size = 2)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Negative Binomial
  #
  #Estimated Parameter(s):          size = 2.0000000
  #                                 prob = 0.2857143
  #
  #Estimation Method:               mle/mme for 'prob'
  #
  #Data:                            dat, 2
  #
  #Sample Size:                     1
  #----------
  # Generate 3 observations from negative binomial distributions with 
  # parameters size=c(2,3,4) and prob=0.2, then estimate the parameter 
  # prob using the mvue. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  size.vec <- 2:4 
  set.seed(250) 
  dat <- rnbinom(3, size = size.vec, prob = 0.2) 
  dat 
  #[1]  5 19 12 
  enbinom(dat, size = size.vec, method = "mvue") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Negative Binomial
  #
  #Estimated Parameter(s):          size = 9.0000000
  #                                 prob = 0.1818182
  #
  #Estimation Method:               mvue for 'prob'
  #
  #Data:                            dat, size.vec
  #
  #Sample Size:                     3
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of a Normal (Gaussian) Distribution
Description
Estimate the mean and standard deviation parameters of a normal (Gaussian) distribution, and optionally construct a confidence interval for the mean or the variance.
Usage
  enorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "exact", conf.level = 0.95, ci.param = "mean")
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are "mvue" (minimum variance unbiased; the default) and "mle/mme" (maximum likelihood/method of moments).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean or variance.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean or variance.  The only possible value is "exact" (the default).  See the DETAILS section for more information. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
| ci.param | character string indicating which parameter to create a confidence interval for.  The possible values are "mean" (the default) and "variance". | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a normal (Gaussian) distribution with 
parameters mean=\mu and sd=\sigma.
Estimation 
Minimum Variance Unbiased Estimation (method="mvue") 
The minimum variance unbiased estimators (mvue's) of the mean and variance are:
\hat{\mu}_{mvue} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (1)
\hat{\sigma}^2_{mvue} = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (2)
(Johnson et al., 1994; Forbes et al., 2011).  Note that when method="mvue", 
the estimated standard deviation is the square root of the mvue of the variance, 
but is not itself an mvue.
Maximum Likelihood/Method of Moments Estimation (method="mle/mme") 
The maximum likelihood estimator (mle) and method of moments estimator (mme) of the 
mean are both the same as the mvue of the mean given in equation (1) above.  The 
mle and mme of the variance is given by:
\hat{\sigma}^2_{mle} = s^2_m = \frac{n-1}{n} s^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (3)
When method="mle/mme", the estimated standard deviation is the square root of 
the mle of the variance, and is itself an mle.
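To make the distinction concrete, here is a minimal sketch (not part of the original help file) computing both sets of estimates directly from a simulated sample; the results can be compared with enorm(x) and enorm(x, method = "mle/mme").
  set.seed(250)
  x <- rnorm(20, mean = 3, sd = 2)
  n <- length(x)
  mean(x)                       # Equation (1): estimate of the mean (same for both methods)
  sd(x)                         # square root of Equation (2): "mvue" estimate of sd
  sqrt((n - 1) / n) * sd(x)     # square root of Equation (3): "mle/mme" estimate of sd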
Confidence Intervals 
Confidence Interval for the Mean (ci.param="mean") 
When ci=TRUE and ci.param = "mean", the usual confidence interval 
for \mu is constructed as follows.  If ci.type="two-sided", 
the (1-\alpha)100% confidence interval for \mu is given by:
[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (4)
where t(\nu, p) is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom 
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the (1-\alpha)100% confidence interval for 
\mu is given by:
[\hat{\mu} - t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}, \, \infty] \;\;\;\; (5)
and if ci.type="upper", the confidence interval is given by:
[-\infty, \, \hat{\mu} + t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (6)
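A minimal sketch (not part of the original help file) of Equation (4), computing the two-sided t-based limits directly; the result can be compared with the interval printed by enorm(x, ci = TRUE).
  set.seed(250)
  x <- rnorm(20, mean = 3, sd = 2)
  n <- length(x)
  alpha <- 0.05
  mean(x) + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)   # Equation (4)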
Confidence Interval for the Variance (ci.param="variance") 
When ci=TRUE and ci.param = "variance", the usual confidence interval 
for \sigma^2 is constructed as follows.  A two-sided 
(1-\alpha)100% confidence interval for \sigma^2 is given by:
[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} ] \;\;\;\; (7)
Similarly, a one-sided upper (1-\alpha)100% confidence interval for the 
population variance is given by:
[ 0, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha}} ] \;\;\;\; (8)
and a one-sided lower (1-\alpha)100% confidence interval for the population 
variance is given by:
[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha}}, \, \infty ] \;\;\;\; (9)
(van Belle et al., 2004; Zar, 2010).
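Similarly, a minimal sketch (not part of the original help file) of Equations (7) and (8), computing the chi-square based limits for the variance directly; compare with enorm(x, ci = TRUE, ci.param = "variance").
  set.seed(250)
  x <- rnorm(20, mean = 3, sd = 2)
  n <- length(x); alpha <- 0.05; s2 <- var(x)
  c((n - 1) * s2 / qchisq(1 - alpha/2, df = n - 1),
    (n - 1) * s2 / qchisq(alpha/2, df = n - 1))       # Equation (7): two-sided limits
  c(0, (n - 1) * s2 / qchisq(alpha, df = n - 1))      # Equation (8): one-sided upper limit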
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
Note
The normal and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean or variance. This is done with confidence intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences, Second Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Examples
  # Generate 20 observations from a N(3, 2) distribution, then estimate 
  # the parameters and create a 95% confidence interval for the mean. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 3, sd = 2) 
  enorm(dat, ci = TRUE) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 2.861160
  #                                 sd   = 1.180226
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 2.308798
  #                                 UCL = 3.413523
  #----------
  # Using the same data, construct a one-sided upper 95% confidence interval
  # for the variance.
  enorm(dat, ci = TRUE, ci.type = "upper", ci.param = "variance")$interval
  #Confidence Interval for:         variance
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.000000
  #                                 UCL = 2.615963
  #----------
  # Clean up
  #---------
  rm(dat)
  #----------
  # Using the Reference area TcCB data in the data frame EPA.94b.tccb.df, 
  # estimate the mean and standard deviation of the log-transformed data, 
  # and construct a 95% confidence interval for the mean.
  with(EPA.94b.tccb.df, enorm(log(TcCB[Area == "Reference"]), ci = TRUE))  
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = -0.6195712
  #                                 sd   =  0.4679530
  #
  #Estimation Method:               mvue
  #
  #Data:                            log(TcCB[Area == "Reference"])
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.7569673
  #                                 UCL = -0.4821751
Estimate Parameters for a Normal Distribution Based on Type I Censored Data
Description
Estimate the mean and standard deviation of a normal (Gaussian) distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  enormCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
    nmc = 1000, seed = NULL, ...)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| censored | numeric or logical vector indicating which values of x are censored.  It must be the same length as x.  TRUE (or 1) values correspond to censored observations, and FALSE (or 0) values correspond to uncensored observations. | 
| method | character string specifying the method of estimation.  For singly censored data, the possible values are "mle" (maximum likelihood; the default), "bcmle" (bias-corrected maximum likelihood), "ROS" or "qq.reg" (quantile-quantile regression), "qq.reg.w.cen.level" (quantile-quantile regression including the censoring level), "rROS" or "impute.w.qq.reg" (imputation using quantile-quantile regression), "impute.w.qq.reg.w.cen.level" (imputation using quantile-quantile regression including the censoring level), "impute.w.mle" (imputation using maximum likelihood), "iterative.impute.w.qq.reg" (iterative imputation using quantile-quantile regression), "m.est" (robust M-estimation), and "half.cen.level" (setting censored observations to half the censoring level).  For multiply censored data, the possible values are "mle" (the default), "ROS" or "qq.reg", "rROS" or "impute.w.qq.reg", and "half.cen.level".  See the DETAILS section for more information. | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are "left" (the default) and "right". | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean or variance.  The default value is ci=FALSE. | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  The possible values are "profile.likelihood" (profile likelihood; the default), "normal.approx" (normal approximation), "normal.approx.w.cov" (normal approximation taking into account the covariance between the estimated mean and standard deviation; available only for singly censored data when method="mle" or method="bcmle"), "gpq" (generalized pivotal quantity), and "bootstrap" (based on bootstrapping).  See the DETAILS section for more information.  This argument is ignored if ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95. | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap".  The default value is n.bootstraps=1000. | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="normal.approx" or ci.method="normal.approx.w.cov".  The possible values are "z" (the default) and "t". | 
| nmc | numeric scalar indicating the number of Monte Carlo simulations to run when ci.method="gpq".  The default value is nmc=1000. | 
| seed | integer supplied to the function set.seed.  The default value is seed=NULL. | 
| ... | additional arguments to pass to other functions (e.g., prob.method, plot.pos.con, lb.impute, ub.impute, ci.sample.size, tol, convergence, t.df), as described in the DETAILS section. | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let \underline{x} denote a vector of N observations from a
normal distribution with mean \mu and
standard deviation \sigma.  Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
For the case when k \ge 2, the data are said to be Type I
multiply censored.  For the case when k=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
ESTIMATION 
Estimation Methods for Multiply and Singly Censored Data 
The following methods are available for multiply and singly censored data.
Maximum Likelihood Estimation (method="mle") 
For Type I left censored data, the likelihood function is given by:
L(\mu, \sigma | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (3)
where f and F denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population. That is,
f(t) = \phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (4)
F(t) = \Phi(\frac{t-\mu}{\sigma}) \;\;\;\;\;\; (5)
where \phi and \Phi denote the pdf and cdf of the standard normal
distribution, respectively (Cohen, 1963; 1991, pp.6, 50).  For left singly
censored data, Equation (3) simplifies to:
L(\mu, \sigma | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^n f[x_{(i)}] \;\;\;\;\;\; (6)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\mu, \sigma | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)
and for right singly censored data this simplifies to:
L(\mu, \sigma | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (8)
The maximum likelihood estimators are computed by maximizing the likelihood
function.  For right-censored data, Cohen (1963; 1991, pp.50-51) shows that
taking partial derivatives of the log-likelihood function with respect to
\mu and \sigma and setting these to 0 produces the following two
simultaneous equations:
\bar{x} - \mu = -\sigma \sum_{j=1}^k (\frac{c_j}{n}) Q_j \;\;\;\;\;\; (9)
s^2 + (\bar{x} - \mu)^2 = \sigma^2[1 - \sum_{j=1}^k \zeta_j (\frac{c_j}{n}) Q_j] \;\;\;\;\;\; (10)
where
\bar{x} = \frac{1}{n} \sum_{i \in \Omega} x_{(i)} \;\;\;\;\;\; (11)
s^2 = \frac{1}{n} \sum_{i \in \Omega} (x_{(i)} - \bar{x})^2 \;\;\;\;\;\; (12)
Q_j = Q(\zeta_j) \;\;\;\;\;\; (13)
\zeta_j = \frac{T_j - \mu}{\sigma} \;\;\;\;\;\; (14)
Q(t) = \frac{\phi(t)}{1 - \Phi(t)} \;\;\;\;\;\; (15)
Note that the quantity defined in Equation (11) is simply the mean of the uncensored
observations, the quantity defined in Equation (12) is simply the method of
moments estimator of variance based on the uncensored observations, and the
function Q() defined in Equation (15) is the hazard function for the
standard normal distribution.
For left-censored data, Equations (9) and (10) stay the same, except \zeta is
replaced with -\zeta.
The function enormCensored computes the maximum likelihood estimators by
solving Equations (9) and (10) and uses the quantile-quantile regression
estimators (see below) as initial values.
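The following sketch (not part of the original help file) maximizes the Type I left-censored log-likelihood of Equation (3) numerically for a small hypothetical multiply censored sample (the multinomial coefficient does not involve the parameters and is dropped); enormCensored solves Equations (9) and (10) instead, so small numerical differences from enormCensored(x, cens, method = "mle") are possible.
  x    <- c(2, 5, 5, 6.3, 12.1, 16.9, 21.6, 34.5)   # censored values stored at their censoring levels
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  negll <- function(theta) {
    mu <- theta[1]; sigma <- theta[2]
    if (sigma <= 0) return(Inf)
    -sum(pnorm(x[cens],  mean = mu, sd = sigma, log.p = TRUE)) -   # censored contributions F(T_j)
     sum(dnorm(x[!cens], mean = mu, sd = sigma, log = TRUE))       # uncensored contributions f(x_(i))
  }
  fit <- optim(c(mean(x), sd(x)), negll)
  fit$par   # numerical maximum likelihood estimates of mu and sigma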
Regression on Order Statistics (method="ROS") or 
Quantile-Quantile Regression (method="qq.reg") 
This method is sometimes called the probability plot method
(Nelson, 1982, Chapter 3; Gilbert, 1987, pp.134-136;
Helsel and Hirsch, 1992, p. 361), and more recently also called
parametric regression on order statistics or ROS
(USEPA, 2009; Helsel, 2012).  In the case of no censoring, it is well known
(e.g., Nelson, 1982, p.113; Cleveland, 1993, p.31) that for the standard
normal (Gaussian) quantile-quantile plot (i.e., the plot of the sorted observations
(empirical quantiles) versus standard normal quantiles; see qqPlot),
the intercept and slope of the fitted least-squares line estimate the mean and
standard deviation, respectively.  Specifically, the estimates of \mu and
\sigma are found by computing the least-squares estimates in the following
model:
x_{(i)} = \mu + \sigma \Phi^{-1}(p_i) + \epsilon_i, \; i = 1, 2, \ldots, N \;\;\;\;\;\; (16)
where
p_i = \frac{i-a}{N - 2a + 1} \;\;\;\;\;\; (17)
denotes the plotting position associated with the i'th largest value,
a is a constant such that 0 \le a \le 1 (the plotting position constant),
and \Phi denotes the cumulative distribution function (cdf) of the standard
normal distribution.  The default value of a is 0.375 (see below).
This method can be adapted to the case of left (right) singly censored data
as follows.  Plot the n uncensored observations against the n
largest (smallest) normal quantiles, where the normal quantiles are computed
based on a sample size of N, fit the least-squares line to this plot, and
estimate the mean and standard deviation from the intercept and slope, respectively.
That is, use Equations (16) and (17), but for right singly censored data use
i = 1, 2, \ldots, n, and for left singly censored data use
i = (c+1), (c+2), \ldots,  N.
The argument plot.pos.con (see the entry for ... in the ARGUMENTS
section above) determines the value of the plotting positions computed in
Equation (17).  The default value is plot.pos.con=0.375.
See the help file for qqPlot for more information.
This method is discussed by Haas and Scheff (1990). In the context of lognormal data, Travis and Land (1990) suggest exponentiating the predicted 50'th percentile from this fit to estimate the geometric mean (i.e., the median of the lognormal distribution).
This method is easily extended to multiply censored data. Equation (16) becomes
x_{(i)} = \mu + \sigma \Phi^{-1}(p_i) + \epsilon_i, \; i \in \Omega \;\;\;\;\;\; (18)
where \Omega denotes the set of n subscripts associated with the
uncensored observations in the ordered sample.  The plotting positions are
computed by calling the EnvStats function ppointsCensored.
The argument prob.method determines the method of computing the plotting
positions (default is prob.method="hirsch-stedinger"), and the argument
plot.pos.con determines the plotting position constant (default is
plot.pos.con=0.375).  (See the entry for ... in the ARGUMENTS section above.)
Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger probability
method but set the plotting position constant to 0.
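As a concrete illustration (not part of the original help file), the sketch below carries out the quantile-quantile regression of Equation (16) for a small hypothetical left singly censored sample, using the simple plotting positions of Equation (17) with a = 0.375; because enormCensored uses the Hirsch-Stedinger plotting positions by default, enormCensored(x, cens, method = "ROS") will give somewhat different values.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  N <- length(x); n.cen <- sum(cens); a <- 0.375
  p <- ((1:N) - a) / (N - 2 * a + 1)                        # Equation (17)
  fit <- lm(sort(x[!cens]) ~ qnorm(p[(n.cen + 1):N]))       # Equation (16), uncensored observations only
  coef(fit)   # intercept estimates the mean, slope estimates the standard deviation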
Robust Regression on Order Statistics (method="rROS") or 
Imputation Using Quantile-Quantile Regression (method="impute.w.qq.reg") 
This is the robust Regression on Order Statistics (rROS) method discussed in
USEPA (2009) and Helsel (2012).  It involves using the quantile-quantile regression
method (method="qq.reg" or method="ROS") to fit a regression line
(and thus initially estimate the mean and standard deviation), and then imputing the
values of the censored observations by predicting them from the regression equation.
The final estimates of the mean and standard deviation are then computed using
the usual formulas (see enorm) based on the observed and imputed
values.
The imputed values are computed as:
\hat{x}_{(i)} = \hat{\mu}_{qqreg} + \hat{\sigma}_{qqreg} \Phi^{-1}(p_i), \; i \not \in \Omega \;\;\;\;\;\; (19)
See the help file for ppointsCensored for information on how the
plotting positions for the censored observations are computed.
The argument prob.method determines the method of computing the plotting
positions (default is prob.method="hirsch-stedinger"), and the argument
plot.pos.con determines the plotting position constant (default is
plot.pos.con=0.375).  (See the entry for ... in the ARGUMENTS section above.)
Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger probability
method but set the plotting position constant to 0.
The arguments lb.impute and ub.impute determine the lower and upper
bounds for the imputed values.  Imputed values smaller than lb.impute are
set to this value.  Imputed values larger than ub.impute are set to this
value.  The default values are lb.impute=-Inf and ub.impute=Inf.
See the entry for ... in the ARGUMENTS section above.
For singly censored data, this is the NR method of Gilliom and Helsel (1986, p. 137). In the context of lognormal data, this method is discussed by Hashimoto and Trussell (1983), Gilliom and Helsel (1986), and El-Shaarawi (1989), and is referred to as the LR or Log-Probability Method.
For multiply censored data, this method was developed in the context of
lognormal data by Helsel and Cohn (1988) using the formulas for plotting
positions given in Hirsch and Stedinger (1987) and Weibull plotting positions
(i.e., prob.method="hirsch-stedinger" and plot.pos.con=0).
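Continuing the previous sketch (again not part of the original help file), the imputation step of Equation (19) can be written out directly for the same hypothetical left singly censored sample; enormCensored(x, cens, method = "rROS") uses the Hirsch-Stedinger plotting positions by default, so its results will differ somewhat.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  N <- length(x); n.cen <- sum(cens); a <- 0.375
  p <- ((1:N) - a) / (N - 2 * a + 1)
  fit <- lm(sort(x[!cens]) ~ qnorm(p[(n.cen + 1):N]))          # initial qq regression fit
  imputed <- coef(fit)[1] + coef(fit)[2] * qnorm(p[1:n.cen])   # Equation (19)
  x.full  <- c(imputed, sort(x[!cens]))
  c(mean = mean(x.full), sd = sd(x.full))                      # final "rROS"-style estimates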
Setting Censored Observations to Half the Censoring Level (method="half.cen.level") 
This method is applicable only to left censored data that is bounded below by 0.
This method involves simply replacing all the censored observations with half their
detection limit, and then computing the mean and standard deviation with the usual
formulas (see enorm).
This method is included only to allow comparison of this method to other methods. Setting left-censored observations to half the censoring level is not recommended.
For singly censored data, this method is discussed by Gleit (1985), Haas and Scheff (1990), and El-Shaarawi and Esterby (1992). El-Shaarawi and Esterby (1992) show that these estimators are biased and inconsistent (i.e., the bias remains even as the sample size increases).
For multiply censored data, this method was studied by Helsel and Cohn
(1988).
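For completeness, a one-line sketch (not part of the original help file) of the substitution estimator, shown only because the method is included in enormCensored for comparison purposes:
  x     <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens  <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  x.sub <- ifelse(cens, x / 2, x)       # replace censored values with half the censoring level
  c(mean = mean(x.sub), sd = sd(x.sub))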
Estimation Methods for Singly Censored Data 
The following methods are available only for singly censored data.
Bias-Corrected Maximum Likelihood Estimation (method="bcmle") 
The maximum likelihood estimates of \mu and \sigma are biased.
The bias tends to 0 as the sample size increases, but it can be considerable
for small sample sizes, especially in the case of a large percentage of
censored observations (Saw, 1961b).  Schmee et al. (1985) note that bias and
variances of the mle's are of the order 1/N (see for example,
Bain and Engelhardt, 1991), and that for 90% censoring the bias is negligible
if N is at least 100.  (For less intense censoring, even fewer
observations are needed.)
The exact bias of each estimator is extremely difficult to compute.
Saw (1961b), however, derived the first-order term (i.e., the term of order
1/N) in the bias of the mle's of \mu and \sigma and
proposed bias-corrected mle's.  His bias-corrected estimators were derived for
the case of Type II singly censored data.  Schneider (1986, p.110) and
Haas and Scheff (1990), however, state that this bias correction should
reduce the bias of the estimators in the case of Type I censoring as well.
Based on the tables of bias-correction terms given in Saw (1961b), Schneider (1986, pp.107-110) performed a least-squares fit to produce the following computational formulas for right-censored data:
B_{\mu} = -exp[2.692 - 5.493 \frac{n}{N+1}] \;\;\;\;\;\; (20)
B_{\sigma} = -[0.312 + 0.859 \frac{n}{N+1}]^{-2} \;\;\;\;\;\; (21)
\hat{\mu}_{bcmle} = \hat{\mu}_{mle} - \frac{\hat{\sigma}_{mle}}{N+1} B_{\mu} \;\;\;\;\;\; (22)
\hat{\sigma}_{bcmle} = \hat{\sigma}_{mle} - \frac{\hat{\sigma}_{mle}}{N+1} B_{\sigma} \;\;\;\;\;\; (23)
For left-censored data, Equation (22) becomes:
\hat{\mu}_{bcmle} = \hat{\mu}_{mle} + \frac{\hat{\sigma}_{mle}}{N+1} B_{\mu} \;\;\;\;\;\; (22)
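A minimal numerical sketch (not part of the original help file) of the correction formulas in Equations (20)-(23); the values of N, n, and the mle's below are hypothetical.
  N <- 25; n <- 20                         # total and uncensored sample sizes (hypothetical)
  mu.mle <- 2.2; sigma.mle <- 1.4          # hypothetical maximum likelihood estimates
  B.mu    <- -exp(2.692 - 5.493 * n / (N + 1))       # Equation (20)
  B.sigma <- -(0.312 + 0.859 * n / (N + 1))^(-2)     # Equation (21)
  mu.mle    - sigma.mle / (N + 1) * B.mu             # Equation (22): right-censored case
  sigma.mle - sigma.mle / (N + 1) * B.sigma          # Equation (23)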
Quantile-Quantile Regression Including the Censoring Level (method="qq.reg.w.cen.level") 
This is a modification of the quantile-quantile regression method and was proposed
by El-Shaarawi (1989) in the context of lognormal data.  El-Shaarawi's idea is to
include the censoring level and an associated plotting position, along with the
uncensored observations and their associated plotting positions, in order to include
information about the value of the censoring level T.
For left singly censored data, the modification involves adding the point
[\Phi^{-1}(p_c), T] to the plot before fitting the least-squares line.
For right singly censored data, the point  [\Phi^{-1}(p_{n+1}), T]
is added to the plot before fitting the least-squares line.
El-Shaarawi (1989) also proposed replacing the estimated normal quantiles with the
exact expected values of normal order statistics, and using the values in their
variance-covariance matrix to perform a weighted least-squares regression.
These last two modifications are not incorporated here.
Imputation Using Quantile-Quantile Regression Including the Censoring Level 
(method ="impute.w.qq.reg.w.cen.level") 
This is exactly the same method as imputation using quantile-quantile regression 
(method="impute.w.qq.reg"), except that the quantile-quantile regression
including the censoring level method (method="qq.reg.w.cen.level") is used
to fit the regression line.  In the context of lognormal data, this method is
discussed by El-Shaarawi (1989), which he denotes as the Modified LR Method.
Imputation Using Maximum Likelihood (method ="impute.w.mle") 
This is exactly the same method as imputation with quantile-quantile regression 
(method="impute.w.qq.reg"), except that the maximum likelihood method
(method="mle") is used to compute the initial estimates of the mean and
standard deviation.  In the context of lognormal data, this method is discussed
by El-Shaarawi (1989), which he denotes as the Modified Maximum Likelihood Method.
Iterative Imputation Using Quantile-Quantile Regression (method="iterative.impute.w.qq.reg") 
This method is similar to the imputation with quantile-quantile regression method 
(method="impute.w.qq.reg"), but iterates until the estimates of the mean
and standard deviation converge.  The algorithm is:
- Compute the initial estimates of \mu and \sigma using the "impute.w.qq.reg" method.  (Actually, any suitable estimates will do.)
- Using the current values of \mu and \sigma and Equation (19), compute new imputed values of the censored observations.
- Use the new imputed values along with the uncensored observations to compute new estimates of \mu and \sigma based on the usual formulas (see enorm).
- Repeat Steps 2 and 3 until the estimates converge (the convergence criterion is determined by the arguments tol and convergence; see the entry for ... in the ARGUMENTS section above).
This method is discussed by Gleit (1985), which he denotes as
“Fill-In with Expected Values”.
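A minimal sketch (not part of the original help file) of this iteration for a hypothetical left singly censored sample, again using the simple plotting positions of Equation (17); the convergence handling in enormCensored (arguments tol and convergence) is more elaborate.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  N <- length(x); n.cen <- sum(cens); a <- 0.375
  p <- ((1:N) - a) / (N - 2 * a + 1)
  mu <- mean(x); sigma <- sd(x)                      # Step 1: initial estimates
  for (iter in 1:100) {
    imputed <- mu + sigma * qnorm(p[1:n.cen])        # Step 2: Equation (19)
    x.full  <- c(imputed, sort(x[!cens]))
    mu.new  <- mean(x.full); sigma.new <- sd(x.full) # Step 3: usual formulas
    if (max(abs(c(mu.new - mu, sigma.new - sigma))) < 1e-6) break   # Step 4
    mu <- mu.new; sigma <- sigma.new
  }
  c(mean = mu.new, sd = sigma.new)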
M-Estimators (method="m.est") 
This method was contributed by Leo R. Korn (Korn and Tyler, 2001).
This method finds location and scale estimates that are consistent at the
normal model and robust to deviations from the normal model, including both
outliers on the right and outliers on the left above and below the limit of
detection.  The estimates are found by solving the simultaneous equations:
\sum_{i=1}^c h_{\nu} (\frac{T-\mu}{\sigma}) + \sum_{i=c+1}^N \psi_{\nu} (\frac{x_i - \mu}{\sigma}) = 0 \;\;\;\;\;\; (23)
\sum_{i=1}^c \lambda_{\nu} (\frac{T-\mu}{\sigma}) + \sum_{i=c+1}^N \chi_{\nu} (\frac{x_i - \mu}{\sigma}) = 0 \;\;\;\;\;\; (24)
where
H_{\nu}(r) = -log[F_{\nu}(r)] \;\;\;\;\;\; (25)
h_{\nu}(r) = \frac{d}{dr} H_{\nu}(r) = H'_{\nu}(r) \;\;\;\;\;\; (26)
\rho_{\nu}(r) = -log[f_{\nu}(r)] \;\;\;\;\;\; (27)
\psi_{\nu}(r) = \frac{d}{dr} \rho_{\nu}(r) = \rho'_{\nu}(r) \;\;\;\;\;\; (28)
\lambda_{\nu}(r) = r h_{\nu}(r) \;\;\;\;\;\; (29)
\chi_{\nu}(r) = r \psi_{\nu}(r) - 1 \;\;\;\;\;\; (30)
and f_{\nu} and F_{\nu} denote the probability density function
(pdf) and cumulative distribution function (cdf) of
Student's t-distribution with \nu degrees of freedom.
This results in an M-estimating equation based on the t-density function (Korn and Tyler., 2001). Since the t-density has heavier tails than the normal density, this M-estimator will tend to down-weight values that are far away from the center of the data. When censoring is present, neither the location nor the scale estimates are consistent at the normal model. A computational correction is performed that converts the above M-estimator to another M-estimator that is consistent at the normal model, even under censoring.
The degrees of freedom parameter \nu is set by the argument t.df and
may be viewed as a tuning parameter that will determine the robustness and
efficiency properties.  When t.df is large, the estimator is similar to
the usual mle and the output will then be very close to that when method="mle".
As t.df decreases, the efficiency will decline and the outlier rejection
property will increase in strength.  Choosing t.df=3 (the default) provides
a good combination of efficiency and robustness.  A reasonable strategy is to
transform the data so that they are approximately symmetric (often the log
transformation for environmental data is appropriate) and then apply the
M-estimator using t.df=3.
 
CONFIDENCE INTERVALS 
This section explains how confidence intervals for the mean \mu are
computed.
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation.  This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean \mu while treating the standard deviation
\sigma as a nuisance parameter.  Equation (3) above
shows the form of the likelihood function L(\mu, \sigma | \underline{x}) for
multiply left-censored data, and Equation (7) shows the function for
multiply right-censored data.
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates
of the mean and standard deviation by (\mu^*, \sigma^*).  The likelihood
ratio test statistic (G^2) of the hypothesis H_0: \mu = \mu_0
(where \mu_0 is a fixed value) equals the drop in 2 log(L) between the
“full” model and the reduced model with \mu fixed at \mu_0, i.e.,
G^2 = 2 \{log[L(\mu^*, \sigma^*)] - log[L(\mu_0, \sigma_0^*)]\} \;\;\;\;\;\; (30)
where \sigma_0^* is the maximum likelihood estimate of \sigma for the
reduced model (i.e., when \mu = \mu_0).  Under the null hypothesis,
the test statistic G^2 follows a
chi-squared distribution with 1 degree of freedom.
Alternatively, we may
express the test statistic in terms of the profile likelihood function L_1
for the mean \mu, which is obtained from the usual likelihood function by
maximizing over the parameter \sigma, i.e.,
L_1(\mu) = max_{\sigma} L(\mu, \sigma) \;\;\;\;\;\; (31)
Then we have
G^2 = 2 \{log[L_1(\mu^*)] - log[L_1(\mu_0)]\} \;\;\;\;\;\; (32)
A two-sided (1-\alpha)100\% confidence interval for the mean \mu
consists of all values of \mu_0 for which the test is not significant at
level \alpha:
\mu_0: G^2 \le \chi^2_{1, {1-\alpha}} \;\;\;\;\;\; (33)
where \chi^2_{\nu, p} denotes the p'th quantile of the
chi-squared distribution with \nu degrees of freedom.
One-sided lower and one-sided upper confidence intervals are computed in a similar
fashion, except that the quantity 1-\alpha in Equation (33) is replaced with
1-2\alpha.
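The sketch below (not part of the original help file) spells out Equations (31)-(33) for a hypothetical left singly censored normal sample, profiling out \sigma with optimize and locating the interval endpoints with uniroot; enormCensored(x, cens, ci = TRUE, ci.method = "profile.likelihood") performs the equivalent computation internally.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  loglik <- function(mu, sigma)                      # censored normal log-likelihood
    sum(pnorm(x[cens], mu, sigma, log.p = TRUE)) +
    sum(dnorm(x[!cens], mu, sigma, log = TRUE))
  prof <- function(mu)                               # Equation (31): profile out sigma
    optimize(function(s) loglik(mu, s), c(1e-4, 100), maximum = TRUE)$objective
  fit <- optim(c(mean(x), sd(x)),
    function(th) if (th[2] <= 0) Inf else -loglik(th[1], th[2]))
  mu.star <- fit$par[1]; ll.star <- -fit$value
  G2 <- function(mu) 2 * (ll.star - prof(mu))        # Equation (32)
  crit <- qchisq(0.95, df = 1)                       # Equation (33)
  lcl <- uniroot(function(mu) G2(mu) - crit, c(mu.star - 50, mu.star))$root
  ucl <- uniroot(function(mu) G2(mu) - crit, c(mu.star, mu.star + 50))$root
  c(LCL = lcl, UCL = ucl)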
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\mu based on the assumption that the estimator of \mu is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \mu is constructed as:
[\hat{\mu} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}] \;\;\;\; (34)
where \hat{\mu} denotes the estimate of \mu,
\hat{\sigma}_{\hat{\mu}} denotes the estimated asymptotic standard
deviation of the estimator of \mu, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of m (see
the entry for ... in the ARGUMENTS section above).
When method equals
"mle" or "bcmle", the default value is the expected number of
uncensored observations, otherwise it is the observed number of
uncensored observations.  This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
Approximate Confidence Interval Based on Maximum Likelihood Estimators 
When method="mle", the standard deviation of the mle of \mu is
estimated based on the inverse of the Fisher Information matrix.  The estimated
variance-covariance matrix for the estimates of \mu and \sigma
are based on the observed information matrix, formulas for which are given in
Cohen (1991).
Approximate Confidence Interval Based on Bias-Corrected Maximum Likelihood Estimators 
When method="bcmle" (available only for singly censored data),
the same procedures are used to construct the
confidence interval as for method="mle".  The true variance of the
bias-corrected mle of \mu is necessarily larger than the variance of the
mle of \mu (although the differences in the variances goes to 0 as the
sample size gets large).  Hence this method of constructing a confidence interval
leads to intervals that are too short for small sample sizes, but these intervals
should be better centered about the true value of \mu.
Approximate Confidence Interval Based on Other Estimators 
When method is some value other than "mle", the standard deviation of the
estimated mean is approximated by
\hat{\sigma}_{\hat{\mu}} = \frac{\hat{\sigma}}{\sqrt{m}} \;\;\;\;\;\; (35)
where, as already noted, m denotes the assumed sample size.  This is simply
an ad-hoc method of constructing confidence intervals and is not based on any
published theoretical results.
Normal Approximation Using Covariance (ci.method="normal.approx.w.cov")
This method is only available for singly censored data and only applicable when
method="mle" or method="bcmle".  It was proposed by Schneider
(1986, pp. 191-193) for the case of Type II censoring, but is applicable to any
situation where the estimated mean and standard deviation are consistent estimators
and are correlated.  In particular, the mle's of \mu and \sigma are
correlated under Type I censoring as well.
Schneider's idea is to determine two positive quantities z_1, z_2 such that
Pr(\hat{\mu} + z_1\hat{\sigma} < \mu) = \frac{\alpha}{2} \;\;\;\;\;\; (36)
Pr(\hat{\mu} - z_2\hat{\sigma} > \mu) = \frac{\alpha}{2} \;\;\;\;\;\; (37)
so that
[\hat{\mu} - z_2\hat{\sigma}, \; \hat{\mu} + z_1\hat{\sigma}] \;\;\;\;\;\; (38)
is a (1-\alpha)100\% confidence interval for \mu.
For cases where the estimators of \mu and \sigma are independent
(e.g., complete samples), it is well known that setting
z_1 = z_2 = \frac{t_{1-\alpha/2, N}}{\sqrt{N}} \;\;\;\;\;\; (39)
yields an exact confidence interval and setting
z_1 = z_2 = \frac{z_{1-\alpha/2}}{\sqrt{N}} \;\;\;\;\;\; (40)
where z_p denotes the p'th quantile of the standard normal distribution
yields an approximate confidence interval that is asymptotically correct.
For the general case, Schneider (1986) considers the random variable
W(z) = \hat{\mu} + z\hat{\sigma} \;\;\;\;\;\; (41)
and provides formulas for z_1 and z_2.
Note that the resulting confidence interval for the mean is not symmetric about
the estimated mean.  Also note that the quantity m is a random variable for
Type I censoring, while Schneider (1986) assumed it to be fixed since he derived
the result for Type II censoring (in which case m=n).
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence interval
for the population mean \mu, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement.  Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample.  Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate \mu based on the bootstrap sample created in Step 1, using the same method that was used to estimate \mu using the original observations in \underline{x}.  Because the bootstrap sample usually does not match the original sample, the estimate of \mu based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
- Repeat Steps 1 and 2 B times, where B is some large number.  For the function enormCensored, the number of bootstraps B is determined by the argument n.bootstraps (see the section ARGUMENTS above).  The default value of n.bootstraps is 1000.
- Use the B estimated values of \mu to compute the empirical cumulative distribution function of this estimator of \mu (see ecdfPlot), and then create a confidence interval for \mu based on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (42)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus
\hat{G}^{-1}(p) denotes the p'th empirical quantile, that is,
the p'th quantile associated with the empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (43)
and a one-sided upper confidence interval is computed as:
[-\infty, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (44)
The function enormCensored calls the R function quantile
to compute the empirical quantiles used in Equations (42)-(44).
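A minimal sketch (not part of the original help file) of Steps 1-4 and the percentile interval of Equation (42), using a small hypothetical censored sample and a deliberately small number of bootstraps; the tryCatch fallback to the ordinary mean for degenerate resamples (e.g., resamples with no censored values) is a simplification for illustration only.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  B <- 200                                           # small for illustration; the default is 1000
  boot.means <- numeric(B)
  set.seed(47)
  for (b in 1:B) {
    i <- sample(length(x), replace = TRUE)           # Step 1: resample with replacement
    boot.means[b] <- tryCatch(                       # Step 2: re-estimate the mean
      enormCensored(x[i], cens[i])$parameters["mean"],
      error = function(e) mean(x[i]))
  }
  quantile(boot.means, c(0.025, 0.975))              # Equation (42): percentile interval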
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \mu can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \mu may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (45)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (46)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (47)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\mu})] \;\;\;\;\;\; (48)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (49)
where the quantity \hat{\mu}_{(i)} denotes the estimate of \mu using
all the values in \underline{x} except the i'th one, and
\hat{\mu}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\mu}_{(i)} \;\;\;\;\;\; (50)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (51)
and a one-sided upper confidence interval is given by:
[-\infty, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (52)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (46) and (47).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \mu with
respect to the true value of \mu (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \mu
does not depend on the value of \mu, hence the acceleration constant is not
really necessary.
When ci.method="bootstrap", the function enormCensored computes both
the percentile method and bias-corrected and accelerated method
bootstrap confidence intervals.
This method of constructing confidence intervals for censored data was studied by
Shumway et al. (1989).
Generalized Pivotal Quantity (ci.method="gpq") 
This method was introduced by Schmee et al. (1985) and is discussed by
Krishnamoorthy and Mathew (2009).  The idea is essentially to use a parametric
bootstrap to estimate the correct pivotal quantities z_1 and z_2 in
Equation (38) above.  For singly censored data, these quantities are computed as
follows:
- Generate a random sample of N observations from a standard normal (i.e., N(0,1)) distribution and let z_{(1)}, z_{(2)}, \ldots, z_{(N)} denote the ordered (sorted) observations.
- Set the smallest c observations to be censored.
- Compute the estimates of \mu and \sigma using the method specified by the method argument, and denote these estimates as \hat{\mu}^*, \; \hat{\sigma}^*.
- Compute the t-like pivotal quantity \hat{t} = \hat{\mu}^*/\hat{\sigma}^*.
- Repeat Steps 1-4 nmc times to produce an empirical distribution of the t-like pivotal quantity.
The function enormCensored calls the function
gpqCiNormSinglyCensored to generate the distribution of
pivotal quantities in the case of singly censored data.
A two-sided (1-\alpha)100\% confidence interval for \mu is then
computed as:
[\hat{\mu} - \hat{t}_{1-(\alpha/2)} \hat{\sigma}, \; \hat{\mu} - \hat{t}_{\alpha/2} \hat{\sigma}] \;\;\;\;\;\; (49)
where \hat{t}_p denotes the p'th empirical quantile of the
nmc generated \hat{t} values.
Schmee et al. (1985) derived this method in the context of Type II singly censored data (for which these limits are exact within Monte Carlo error), but state that according to Regal (1982) this method produces confidence intervals that are close approximations to the correct limits for Type I censored data.
For multiply censored data, this method has been extended as follows.  The
algorithm stays the same, except that Step 2 becomes:
2. Set the i'th ordered generated observation to be censored or not censored
according to whether the i'th observed observation in the original data
is censored or not censored.
The function enormCensored calls the function
gpqCiNormMultiplyCensored to generate the distribution of
pivotal quantities in the case of multiply censored data.
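A minimal sketch (not part of the original help file) of Steps 1-5 for left singly censored data; here the smallest c generated values are simply flagged as censored at their own values, which may differ in detail from how gpqCiNormSinglyCensored handles the censoring, and nmc is kept small for illustration.
  x    <- c(2, 2, 2, 6.3, 12.1, 16.9, 21.6, 34.5)   # first three values censored at 2
  cens <- c(TRUE, TRUE, TRUE, rep(FALSE, 5))
  N <- length(x); n.cen <- sum(cens); nmc <- 200
  t.hat <- numeric(nmc)
  set.seed(47)
  for (j in 1:nmc) {
    z <- sort(rnorm(N))                                          # Step 1
    z.cens <- c(rep(TRUE, n.cen), rep(FALSE, N - n.cen))         # Step 2
    est <- enormCensored(z, z.cens, method = "mle")$parameters   # Step 3
    t.hat[j] <- est["mean"] / est["sd"]                          # Step 4
  }
  est <- enormCensored(x, cens, method = "mle")$parameters
  q <- unname(quantile(t.hat, c(0.975, 0.025)))
  est["mean"] - q * est["sd"]     # lower and upper two-sided 95% limits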
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation, rather than rely on a single point-estimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on type I left-censored data, standard errors of the estimates are not available, hence it is not possible to construct confidence intervals (El-Shaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Schmee et al. (1985) studied Type II censoring for a normal distribution and
noted that the bias and variances of the maximum likelihood estimators are of the
order 1/N, and that the bias is negligible for N=100 and as much as
90% censoring.  (If the proportion of censored observations is less than 90%,
the bias becomes negligible for smaller sample sizes.)  For small samples with
moderate to high censoring, however, the bias of the mle's causes confidence
intervals based on them using a normal approximation (e.g., method="mle"
and ci.method="normal.approx") to be too short.  Schmee et al. (1985)
provide tables for exact confidence intervals for sample sizes up to N=100
that were created based on Monte Carlo simulation.  Schmee et al. (1985) state
that these tables should work well for Type I censored data as well.
Shumway et al. (1989) evaluated the coverage of 90% confidence intervals for the mean based on using a Box-Cox transformation to induce normality, computing the mle's based on the normal distribution, then computing the mean in the original scale. They considered three methods of constructing confidence intervals: the delta method, the bootstrap, and the bias-corrected bootstrap. Shumway et al. (1989) used three parent distributions in their study: Normal(3,1), the square of this distribution, and the exponentiation of this distribution (i.e., a lognormal distribution). Based on sample sizes of 10 and 50 with a censoring level at the 10'th or 20'th percentile, Shumway et al. (1989) found that the delta method performed quite well and was superior to the bootstrap method.
Millard et al. (2015; in preparation) show that the coverage of the profile likelihood method is excellent.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and Life-Testing Models. Marcel Dekker, New York, 496pp.
Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217–237.
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
El-Shaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4), 685–690.
El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
El-Shaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835–844.
El-Shaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339–347.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201–1206.
Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912–919.
Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 5-9, 1983.
Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766–1774.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp.41–63.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905–916.
Pettitt, A. N. (1983). Re-Weighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253–260.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448–451.
Schmee, J., D. Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2), 119–128.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961–962.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
Normal, enorm, estimateCensored.object.
Examples
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using
  # manganese concentrations (ppb) in groundwater at five background wells.
  # In EnvStats these data are stored in the data frame
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and standard deviation using the MLE,
  # Q-Q regression (also called parametric regression on order statistics
  # or ROS; e.g., USEPA, 2009 and Helsel, 2012), and imputation with Q-Q
  # regression (also called robust regression on order statistics or rROS).
  # We will log-transform the original observations and then call
  # enormCensored.  Alternatively, we could have more simply called
  # elnormCensored.
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean and standard deviation on the log-scale
  # using the MLE:
  #---------------------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              0.6931472 1.6094379
  #
  #Estimated Parameter(s):          mean = 2.215905
  #                                 sd   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            log(Manganese.ppb)
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  # Now compare the MLE with the estimators based on
  # Q-Q regression and imputation with Q-Q regression
  #--------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored))$parameters
  #    mean       sd
  #2.215905 1.356291
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored,
    method = "ROS"))$parameters
  #    mean       sd
  #2.293742 1.283635
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored,
    method = "rROS"))$parameters
  #    mean       sd
  #2.298656 1.238104
  #----------
  # The method used to estimate quantiles for a Q-Q plot is
  # determined by the argument prob.method.  For the functions
  # enormCensored and elnormCensored, for any estimation
  # method that involves Q-Q regression, the default value of
  # prob.method is "hirsch-stedinger" and the default value for the
  # plotting position constant is plot.pos.con=0.375.
  # Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger
  # probability method but set the plotting position constant to 0.
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored,
    method = "rROS", plot.pos.con = 0))$parameters
  #    mean       sd
  #2.277175 1.261431
  #----------
  # Using the same data as above, compute a confidence interval
  # for the mean on the log-scale using the profile-likelihood
  # method.
  with(EPA.09.Ex.15.1.manganese.df,
    enormCensored(log(Manganese.ppb), Censored, ci = TRUE))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              0.6931472 1.6094379
  #
  #Estimated Parameter(s):          mean = 2.215905
  #                                 sd   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            log(Manganese.ppb)
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 1.595062
  #                                 UCL = 2.771197
Estimate Mean, Standard Deviation, and Standard Error Nonparametrically
Description
Estimate the mean, standard deviation, and standard error of the mean nonparametrically given a sample of data, and optionally construct a confidence interval for the mean.
Usage
  enpar(x, ci = FALSE, ci.method = "bootstrap", ci.type = "two-sided", 
      conf.level = 0.95, pivot.statistic = "z", n.bootstraps = 1000, seed = NULL)
Arguments
| x | numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. |
| ci | logical scalar indicating whether to compute a confidence interval for the mean. The default value is ci=FALSE. |
| ci.method | character string indicating what method to use to construct the confidence interval for the mean. The possible values are "bootstrap" (the default) and "normal.approx". |
| ci.type | character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". |
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. |
| pivot.statistic | character string indicating which statistic to use for the confidence interval for the mean when ci.method="normal.approx". The possible values are "z" (the default) and "t". |
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean. This argument is ignored when ci=FALSE or ci.method="normal.approx". The default value is n.bootstraps=1000. |
| seed | integer supplied to the function set.seed to allow you to reproduce the bootstrap confidence interval. The default value is seed=NULL. |
Details
Let \underline{x} = (x_1, x_2, \ldots, x_N) denote a vector of N
observations from some distribution with mean \mu and standard 
deviation \sigma.
Estimation 
Unbiased and consistent estimators of the mean and variance are:
\hat{\mu} = \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i \;\;\;\; (1)
\hat{\sigma}^2 = s^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2 \;\;\;\; (2)
A consistent (but not unbiased) estimate of the standard deviation is given by the square root of the estimated variance above:
\hat{\sigma} = s \;\;\;\; (3)
It can be shown that the variance of the sample mean is given by:
\sigma^2_{\hat{\mu}} = \sigma^2_{\bar{x}} = \frac{\sigma^2}{N} \;\;\;\; (4)
so the standard deviation of the sample mean (usually called the standard error) can be estimated by:
\hat{\sigma}_{\hat{\mu}} = \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{N}} \;\;\;\; (5)
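For example, these three estimates can be computed directly (a minimal sketch, not the enpar internals; the function name is hypothetical):
  npar.estimates <- function(x) {
    n       <- length(x)
    mu.hat  <- mean(x)               # Equation (1)
    sd.hat  <- sd(x)                 # Equations (2)-(3)
    se.mean <- sd.hat / sqrt(n)      # Equation (5)
    c(mean = mu.hat, sd = sd.hat, se.mean = se.mean)
  }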
Confidence Intervals 
This section explains how confidence intervals for the mean \mu are
computed.
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\mu based on the assumption that the estimator of \mu, i.e., the 
sample mean, is approximately normally distributed.  That is, a two-sided 
(1-\alpha)100\% confidence interval for \mu is constructed as:
[\hat{\mu} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\mu}}] \;\;\;\; (6)
where \hat{\mu} denotes the estimate of \mu,
\hat{\sigma}_{\hat{\mu}} denotes the estimated asymptotic standard
deviation of the estimator of \mu, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
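A minimal sketch of Equation (6), taking the assumed sample size m to be the number of observations (the function name and the pivot argument are illustrative and are not the enpar interface):
  npar.ci.normal.approx <- function(x, conf.level = 0.95, pivot = c("t", "z")) {
    pivot   <- match.arg(pivot)
    m       <- length(x)
    se.mean <- sd(x) / sqrt(m)
    q <- if (pivot == "t") qt(1 - (1 - conf.level)/2, df = m - 1) else
      qnorm(1 - (1 - conf.level)/2)
    mean(x) + c(LCL = -1, UCL = 1) * q * se.mean    # Equation (6)
  }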
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence interval
for the population mean \mu, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size - Nfrom the observations in- \underline{x}, where sampling is done with replacement. Note that because sampling is done with replacement, the same element of- \underline{x}can appear more than once in the bootstrap sample. Thus, the bootstrap sample will usually not look exactly like the original sample.
- Estimate - \mubased on the bootstrap sample created in Step 1, using the same method that was used to estimate- \muusing the original observations in- \underline{x}. Because the bootstrap sample usually does not match the original sample, the estimate of- \mubased on the bootstrap sample will usually differ from the original estimate based on- \underline{x}. For the bootstrap-t method (see below), this step also involves estimating the standard error of the estimate of the mean and computing the statistic- T = (\hat{\mu}_B - \hat{\mu}) / \hat{\sigma}_{\hat{\mu}_B}where- \hat{\mu}denotes the estimate of the mean based on the original sample, and- \hat{\mu}_Band- \hat{\sigma}_{\hat{\mu}_B}denote the estimate of the mean and estimate of the standard error of the estimate of the mean based on the bootstrap sample.
- Repeat Steps 1 and 2 - Btimes, where- Bis some large number. For the function- enpar, the number of bootstraps- Bis determined by the argument- n.bootstraps(see the section ARGUMENTS above). The default value of- n.bootstrapsis- 1000.
- Use the - Bestimated values of- \muto compute the empirical cumulative distribution function of the estimator of- \muor to compute the empirical cumulative distribution function of the statistic- T(see- ecdfPlot), and then create a confidence interval for- \mubased on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (7)
where \hat{G}(t) denotes the empirical cdf of \hat{\mu}_B evaluated at t
and thus \hat{G}^{-1}(p) denotes the p'th empirical quantile of the
distribution of \hat{\mu}_B, that is, the p'th quantile associated with the
empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (8)
and a one-sided upper confidence interval is computed as:
[-\infty, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (9)
The function enpar calls the R function quantile
to compute the empirical quantiles used in Equations (7)-(9).
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \mu can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184–188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \mu may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (10)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (11)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (12)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\mu})] \;\;\;\;\;\; (13)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (14)
where the quantity \hat{\mu}_{(i)} denotes the estimate of \mu using
all the values in \underline{x} except the i'th one, and
\hat{\mu}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\mu}_{(i)} \;\;\;\;\;\; (15)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (16)
and a one-sided upper confidence interval is given by:
[-\infty, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (17)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (11) and (12).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \mu with
respect to the true value of \mu (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \mu
does not depend on the value of \mu, hence the acceleration constant is not
really necessary.
For the bootstrap-t method, the two-sided confidence interval (Efron and Tibshirani, 1993, p.160) is computed as:
[\hat{\mu} - t_{1-\alpha/2}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} - t_{\alpha/2}\hat{\sigma}_{\hat{\mu}}] \;\;\;\;\;\; (18)
where \hat{\mu} and \hat{\sigma}_{\hat{\mu}} denote the estimate of the mean
and standard error of the estimate of the mean based on the original sample, and
t_p denotes the p'th empirical quantile of the bootstrap distribution of
the statistic T.  Similarly, a one-sided lower confidence interval is computed as:
[\hat{\mu} - t_{1-\alpha}\hat{\sigma}_{\hat{\mu}}, \; \infty] \;\;\;\;\;\; (19)
and a one-sided upper confidence interval is computed as:
[-\infty, \; \hat{\mu} - t_{\alpha}\hat{\sigma}_{\hat{\mu}}] \;\;\;\;\;\; (20)
When ci.method="bootstrap", the function enpar computes
the percentile method, bias-corrected and accelerated method, and bootstrap-t
bootstrap confidence intervals.  The percentile method is transformation respecting,
but not second-order accurate.  The bootstrap-t method is second-order accurate, but not
transformation respecting.  The bias-corrected and accelerated method is both
transformation respecting and second-order accurate (Efron and Tibshirani, 1993, p.188).
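The following is a minimal sketch of the percentile interval (Equation (7)) and the bootstrap-t interval (Equation (18)) for the sample mean.  It is illustrative only (the BCa interval is omitted) and is not the enpar implementation; the function name is hypothetical.
  boot.ci.mean <- function(x, conf.level = 0.95, B = 1000) {
    n      <- length(x)
    mu.hat <- mean(x)
    se.hat <- sd(x) / sqrt(n)
    boot <- replicate(B, {
      xb <- sample(x, size = n, replace = TRUE)          # Step 1: bootstrap sample
      c(mu = mean(xb),                                   # Step 2: bootstrap estimate and
        T  = (mean(xb) - mu.hat) / (sd(xb) / sqrt(n)))   #   t-like statistic
    })
    alpha  <- 1 - conf.level
    pct    <- quantile(boot["mu", ], probs = c(alpha/2, 1 - alpha/2))  # Equation (7)
    t.q    <- quantile(boot["T", ],  probs = c(alpha/2, 1 - alpha/2))
    boot.t <- unname(mu.hat - rev(t.q) * se.hat)                       # Equation (18)
    list(percentile = pct, bootstrap.t = boot.t)
  }
  # Illustrative usage with the TCE data shown in the EXAMPLES section below:
  # with(ACE.13.TCE.df, boot.ci.mean(TCE.mg.per.L[Period == "Before"]))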
Value
a list of class "estimate" containing the estimated parameters
and other information.  See estimate.object for details.
Note
The function enpar is related to the companion function 
enparCensored for censored data.  To estimate the median and 
compute a confidence interval, use eqnpar.
Calling enpar with ci.method="normal.approx" and pivot.statistic="t" 
produces the same result as calling enorm with ci.param="mean".
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
See Also
enparCensored, eqnpar, enorm, 
mean, sd, estimate.object.
Examples
  # The data frame ACE.13.TCE.df contains observations on
  # Trichloroethylene (TCE) concentrations (mg/L) at
  # 10 groundwater monitoring wells before and after remediation.
  #
  # Compute the mean concentration for each period along with
  # a 95% bootstrap BCa confidence interval for the mean.
  #
  # NOTE: Use of the argument "seed" is necessary to reproduce this example.
  #
  # Before remediation: 21.6 [14.2, 30.1]
  # After remediation:   3.6 [ 1.6,  5.7]
  with(ACE.13.TCE.df,
    enpar(TCE.mg.per.L[Period=="Before"], ci = TRUE, seed = 476))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Parameter(s):          mean    = 21.62400
  #                                 sd      = 13.51134
  #                                 se.mean =  4.27266
  #
  #Estimation Method:               Sample Mean
  #
  #Data:                            TCE.mg.per.L[Period == "Before"]
  #
  #Sample Size:                     10
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Bootstrap
  #
  #Number of Bootstraps:            1000
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             Pct.LCL = 13.95560
  #                                 Pct.UCL = 29.79510
  #                                 BCa.LCL = 14.16080
  #                                 BCa.UCL = 30.06848
  #                                 t.LCL   = 12.41945
  #                                 t.UCL   = 32.47306
  
  #----------
  with(ACE.13.TCE.df, 
    enpar(TCE.mg.per.L[Period=="After"], ci = TRUE, seed = 543))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Parameter(s):          mean    = 3.632900
  #                                 sd      = 3.554419
  #                                 se.mean = 1.124006
  #
  #Estimation Method:               Sample Mean
  #
  #Data:                            TCE.mg.per.L[Period == "After"]
  #
  #Sample Size:                     10
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Bootstrap
  #
  #Number of Bootstraps:            1000
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             Pct.LCL = 1.833843
  #                                 Pct.UCL = 5.830230
  #                                 BCa.LCL = 1.631655
  #                                 BCa.UCL = 5.677514
  #                                 t.LCL   = 1.683791
  #                                 t.UCL   = 8.101829
Estimate Mean, Standard Deviation, and Standard Error Nonparametrically Based on Censored Data
Description
Estimate the mean, standard deviation, and standard error of the mean nonparametrically given a sample of data from a positive-valued distribution that has been subjected to left- or right-censoring, and optionally construct a confidence interval for the mean.
Usage
  enparCensored(x, censored, censoring.side = "left", correct.se = TRUE, 
    restricted = FALSE, left.censored.min = "Censoring Level", 
    right.censored.max = "Censoring Level", ci = FALSE, 
    ci.method = "normal.approx", ci.type = "two-sided", conf.level = 0.95, 
    pivot.statistic = "t", ci.sample.size = "Total", n.bootstraps = 1000, 
    seed = NULL, warn = FALSE)
Arguments
| x | numeric vector of positive-valued observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. |
| censored | numeric or logical vector indicating which values of x are censored. This must be the same length as x. |
| censoring.side | character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right". |
| correct.se | logical scalar indicating whether to multiply the estimated standard error by a factor to correct for bias. The default value is correct.se=TRUE. |
| restricted | logical scalar indicating whether to compute the restricted mean in the case when the smallest censored value is less than or equal to the smallest uncensored value (left-censored data) or the largest censored value is greater than or equal to the largest uncensored value (right-censored data). The default value is restricted=FALSE. |
| left.censored.min | Only relevant for the case when censoring.side="left", restricted=TRUE, and the smallest censored value is less than or equal to the smallest uncensored value. In this case the smallest censored observation(s) are treated as observed and set to this value. The possible values are "Censoring Level" (the default) or a positive numeric scalar less than the smallest censoring level. |
| right.censored.max | Only relevant for the case when censoring.side="right", restricted=TRUE, and the largest censored value is greater than or equal to the largest uncensored value. In this case the largest censored observation(s) are treated as observed and set to this value. The possible values are "Censoring Level" (the default) or a numeric scalar greater than the largest censoring level. |
| ci | logical scalar indicating whether to compute a confidence interval for the mean or variance. The default value is ci=FALSE. |
| ci.method | character string indicating what method to use to construct the confidence interval for the mean. The possible values are "normal.approx" (the default) and "bootstrap". |
| ci.type | character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". |
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. |
| pivot.statistic | character string indicating which statistic to use for the confidence interval for the mean when ci.method="normal.approx". The possible values are "t" (the default) and "z". |
| ci.sample.size | character string indicating what sample size to assume when computing the confidence interval for the mean when ci.method="normal.approx". The possible values are "Total" (the default; the total number of observations) and "Uncensored" (the number of uncensored observations). |
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap". The default value is n.bootstraps=1000. |
| seed | integer supplied to the function set.seed to allow you to reproduce the bootstrap confidence interval. The default value is seed=NULL. |
| warn | logical scalar indicating whether to issue a notification in the case when a restricted mean will be estimated, but setting the smallest censored value(s) to an uncensored value (left-censored data) or setting the largest censored value(s) to an uncensored value (right-censored data) results in no censored values in the data. In this case, the function enparCensored calls enpar to estimate the mean, and a warning is issued if warn=TRUE. The default value is warn=FALSE. |
Details
Let \underline{x} = (x_1, x_2, \ldots, x_N) denote a vector of N
observations from some positive-valued distribution with mean
\mu and standard deviation \sigma.
Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
k censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
Let y_1, y_2, \ldots, y_n denote the n ordered uncensored
observations, and let r_1, r_2, \ldots, r_n denote the order of these 
uncensored observations within the context of all the observations (censored and 
uncensored).  For example, if the left-censored data are {<10, 14, 14, <15, 20}, then 
y_1 = 14, y_2 = 14, y_3 = 20, and r_1 = 2, r_2 = 3, r_3 = 5.  
Let y_1', y_2', \ldots, y_p' denote the p ordered distinct 
uncensored observations, let m_j denote the number of detects at 
y_j' (j = 1, 2, \ldots, p), and let r_j' denote the number of 
x_i \le y_j', i.e., the number of observations (censored and uncensored) 
less than or equal to y_j' (j = 1, 2, \ldots, p).  For example, 
if the left-censored data are {<10, 14, 14, <15, 20}, then 
y_1' = 14, y_2' = 20, m_1 = 2, m_2 = 1, and r_1' = 3, r_2' = 5.
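As a quick illustration of this notation, the following snippet reproduces these quantities for the left-censored example {<10, 14, 14, <15, 20}, with censored values represented by their censoring levels:
  x        <- c(10, 14, 14, 15, 20)
  censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
  y        <- sort(x[!censored])                        # 14 14 20
  r        <- which(!censored[order(x)])                # 2 3 5  (ranks of the uncensored values)
  y.prime  <- unique(y)                                 # 14 20  (distinct uncensored values)
  m        <- as.vector(table(y))                       # 2 1    (detects at each y')
  r.prime  <- sapply(y.prime, function(v) sum(x <= v))  # 3 5    (observations <= each y')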
Estimation 
This section explains how the mean \mu, standard deviation \sigma, 
and standard error of the mean \hat{\sigma}_{\hat{\mu}} are estimated, as well as 
the restricted mean.
Estimating the Mean 
It can be shown that the mean of a positive-valued distribution is equal to the
area under the survival curve (Klein and Moeschberger, 2003, p.33):
\mu = \int_0^\infty [1 - F(t)] dt = \int_0^\infty S(t) dt \;\;\;\;\;\; (2)
where F(t) denotes the cumulative distribution function evaluated at t
and S(t) = 1 - F(t) denotes the survival function evaluated at t.
When the Kaplan-Meier estimator is used to construct the survival function,
you can use the area under this curve to estimate the mean of the distribution,
and the estimator can be as efficient or more efficient than
parametric estimators of the mean (Meier, 2004; Helsel, 2012; Lee and Wang, 2003).
Let \hat{F}(t) denote the Kaplan-Meier estimator of the empirical
cumulative distribution function (ecdf) evaluated at t, and let
\hat{S}(t) = 1 - \hat{F}(t) denote the estimated survival function evaluated
at t.  (See the help files for ecdfPlotCensored and
qqPlotCensored for an explanation of how the Kaplan-Meier
estimator of the ecdf is computed.)
The formula for the estimated mean is given by (Lee and Wang, 2003, p. 74):
\hat{\mu} = \sum_{i=1}^{n} \hat{S}(y_{i-1}) (y_{i} - y_{i-1}) \;\;\;\;\;\; (3)
where y_{0} = 0 and \hat{S}(y_{0}) = 1 by definition.  It can be
shown that this formula is equivalent to:
\hat{\mu} = \sum_{i=1}^n y_{i} [\hat{F}(y_{i}) - \hat{F}(y_{i-1})] \;\;\;\;\;\; (4)
where \hat{F}(y_{0}) = \hat{F}(0) = 0 by definition, and this is equivalent to:
\hat{\mu} = \sum_{i=1}^p y_i' [\hat{F}(y_i') - \hat{F}(y_{i-1}')] \;\;\;\;\;\; (5)
(USEPA, 2009, pp. 15–7 to 15–12; Beal, 2010; USEPA, 2022, pp. 128–129).
Estimating the Standard Deviation 
The formula for the estimated standard deviation is:
\hat{\sigma} = \{\sum_{i=1}^n (y_{i} - \hat{\mu})^2 [\hat{F}(y_{i}) - \hat{F}(y_{i-1})]\}^{1/2} \;\;\;\;\; (6)
which is equivalent to:
\hat{\sigma} = \{\sum_{i=1}^p (y_i' - \hat{\mu})^2 [\hat{F}(y_i') - \hat{F}(y_{i-1}')]\}^{1/2} \;\;\;\;\; (7)
(USEPA, 2009, p. 15-10; Beal, 2010).
Estimating the Standard Error of the Mean 
For left-censored data, the formula for the estimated standard error of the
mean is:
\hat{\sigma}_{\hat{\mu}} = [\sum_{j=1}^{p-1} A_j^2 \frac{m_{j+1}}{r_{j+1}'(r_{j+1}' - m_{j+1})}]^{1/2} \;\;\;\;\;\; (8)
where
A_j = \sum_{i=1}^{j} (y_{i+1}' - y_i') \hat{F}(y_i') \;\;\;\;\;\; (9)
(Beal, 2010; USEPA, 2022, pp. 128–129).
For right-censored data, the formula for the estimated standard error of the mean is:
\hat{\sigma}_{\hat{\mu}} = [\sum_{r=1}^{n-1} \frac{A_r^2}{(N-r)(N-r+1)}]^{1/2} \;\;\;\;\;\; (10)
where
A_r = \sum_{i=r}^{n-1} (y_{i+1} - y_{i}) \hat{S}(y_{i}) \;\;\;\;\;\; (11)
(Lee and Wang, 2003, p. 74).
Kaplan and Meier suggest using a bias correction of
n/(n-1) for the estimated variance of the mean (Lee and Wang, 2003, p.75):
\hat{\sigma}_{\hat{\mu}, BC} = \sqrt{\frac{n}{n-1}} \;\; \hat{\sigma}_{\hat{\mu}} \;\;\;\;\;\; (12)
When correct.se=TRUE (the default), Equation (12) is used.  Beal (2010), 
ProUCL 5.2.0 (USEPA, 2022), and the kmms function in the STAND package 
(Frome and Frome, 2015) all compute the bias-corrected estimate of the standard 
error of the mean as well.
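The following sketch strings together Equations (5), (7), (8), (9), and (12) for left-censored data.  It assumes the p ordered distinct uncensored values (y.prime), the Kaplan-Meier estimate of the cdf at those values (F.hat), the number of detects at each value (m), and the counts r.prime defined above have already been computed (the Kaplan-Meier cdf estimates can be obtained, e.g., via ppointsCensored with prob.method="kaplan-meier").  The function km.left.summary is hypothetical and is not the enparCensored implementation.
  km.left.summary <- function(y.prime, F.hat, m, r.prime) {
    p     <- length(y.prime)
    dF    <- diff(c(0, F.hat))                       # F(y'_j) - F(y'_{j-1})
    mu    <- sum(y.prime * dF)                       # Equation (5)
    sigma <- sqrt(sum((y.prime - mu)^2 * dF))        # Equation (7)
    A     <- cumsum(diff(y.prime) * F.hat[-p])       # Equation (9): A_1, ..., A_{p-1}
    se    <- sqrt(sum(A^2 * m[-1] /
      (r.prime[-1] * (r.prime[-1] - m[-1]))))        # Equation (8)
    n     <- sum(m)                                  # number of uncensored observations
    se.bc <- sqrt(n / (n - 1)) * se                  # Equation (12) (correct.se=TRUE)
    c(mean = mu, sd = sigma, se.mean = se, se.mean.corrected = se.bc)
  }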
Estimating the Restricted Mean 
If the smallest value for left-censored data is censored and less than or equal to 
the smallest uncensored value, then the estimated mean will be biased high, and
if the largest value for right-censored data is censored and greater than or equal to
the largest uncensored value, then the estimated mean will be biased low.  One solution 
to this problem is to instead estimate what is called the restricted mean 
(Miller, 1981; Lee and Wang, 2003, p. 74; Meier, 2004; Barker, 2009).
To compute the restricted mean for left-censored data (restricted=TRUE), the smallest 
censored observation(s) are treated as observed and set to the smallest censoring level 
(left.censored.min="Censoring Level") or to some other value that is less than the 
smallest censoring level and greater than 0, and then the formulas above are applied.  
To compute the restricted mean for right-censored data, the largest censored 
observation(s) are treated as observed and set to the censoring level 
(right.censored.max="Censoring Level") or to some value greater than the largest 
censoring level, and then the formulas above are applied.
ProUCL 5.2.0 (USEPA, 2022, pp. 128–129) and Beal (2010) do not compute the restricted 
mean in cases where it could be applied, whereas USEPA (2009, pp. 15–7 to 15–12) and 
the kmms function in Version 2.0 of the R package STAND 
(Frome and Frome, 2015) do compute the restricted mean and set the smallest 
censored observation(s) equal to the censoring level (i.e., what 
enparCensored does when restricted=TRUE and 
left.censored.min="Censoring Level"). 
To be consistent with ProUCL 5.2.0, by default the function enparCensored 
does not compute the restricted mean (i.e., restricted=FALSE).  Note that there  
are instances where ProUCL 5.2.0 does not agree with enparCensored, and it is 
not clear why. It should 
be noted that when the restricted mean is computed, the number of uncensored 
observations increases because the smallest (left-censored) or largest 
(right-censored) censored observation(s) is/are set to a specified value and 
treated as uncensored.  The kmms function in Version 2.0 of the 
STAND package (Frome and Frome, 2015) is inconsistent in how it treats the 
number of uncensored observations when computing estimates associated with the 
restricted mean.  Although kmms sets the smallest censored observations to the 
observed censoring level and treats them as not censored, when it computes 
the bias correction factor for the standard error of the mean, it assumes those 
observations are still censored (see the EXAMPLES section below).
In the unusual case when a restricted mean will be estimated and setting the 
smallest censored value(s) to an uncensored value (left-censored data), or 
setting the largest censored value(s) to an uncensored value 
(right-censored data), results in no censored values in the data, the Kaplan-Meier 
estimate of the mean reduces to the sample mean, so the function 
enpar is called and, if warn=TRUE, a warning is returned.
Confidence Intervals 
This section explains how confidence intervals for the mean \mu are
computed.
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\mu based on the assumption that the estimator of \mu is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \mu is constructed as:
[\hat{\mu} - t_{1-\alpha/2, v-1}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} + t_{1-\alpha/2, v-1}\hat{\sigma}_{\hat{\mu}}] \;\;\;\; (13)
where \hat{\mu} denotes the estimate of \mu,
\hat{\sigma}_{\hat{\mu}} denotes the estimated asymptotic standard
deviation of the estimator of \mu, v denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of v.  
The possible values are the total number of observations, N 
(ci.sample.size="Total"), or the number of uncensored observations, 
n (ci.sample.size="Uncensored").  To be consistent with ProUCL 5.2.0, 
in enparCensored the default value is the total number of observations.  
The kmms function in the STAND package, on the other hand, 
uses the number of uncensored observations.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence interval
for the population mean \mu, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size - Nfrom the observations in- \underline{x}, where sampling is done with replacement. Note that because sampling is done with replacement, the same element of- \underline{x}can appear more than once in the bootstrap sample. Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate - \mubased on the bootstrap sample created in Step 1, using the same method that was used to estimate- \muusing the original observations in- \underline{x}. Because the bootstrap sample usually does not match the original sample, the estimate of- \mubased on the bootstrap sample will usually differ from the original estimate based on- \underline{x}. For the bootstrap-t method (see below), this step also involves estimating the standard error of the estimate of the mean and computing the statistic- T = (\hat{\mu}_B - \hat{\mu}) / \hat{\sigma}_{\hat{\mu}_B}where- \hat{\mu}denotes the estimate of the mean based on the original sample, and- \hat{\mu}_Band- \hat{\sigma}_{\hat{\mu}_B}denote the estimate of the mean and estimate of the standard error of the estimate of the mean based on the bootstrap sample.
- Repeat Steps 1 and 2 - Btimes, where- Bis some large number. For the function- enparCensored, the number of bootstraps- Bis determined by the argument- n.bootstraps(see the section ARGUMENTS above). The default value of- n.bootstrapsis- 1000.
- Use the - Bestimated values of- \muto compute the empirical cumulative distribution function of the estimator of- \muor to compute the empirical cumulative distribution function of the statistic- T(see- ecdfPlot), and then create a confidence interval for- \mubased on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (14)
where \hat{G}(t) denotes the empirical cdf of \hat{\mu}_B evaluated at t
and thus \hat{G}^{-1}(p) denotes the p'th empirical quantile of the
distribution of \hat{\mu}_B, that is, the p'th quantile associated with the
empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (15)
and a one-sided upper confidence interval is computed as:
[-\infty, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (16)
The function enparCensored calls the R function quantile
to compute the empirical quantiles used in Equations (14)-(16).
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \mu can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani
(1993, pp.184–188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \mu may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (17)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (18)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (19)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\mu})] \;\;\;\;\;\; (20)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\mu}_{(\cdot)} - \hat{\mu}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (21)
where the quantity \hat{\mu}_{(i)} denotes the estimate of \mu using
all the values in \underline{x} except the i'th one, and
\hat{\mu}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\mu}_{(i)} \;\;\;\;\;\; (22)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (23)
and a one-sided upper confidence interval is given by:
[-\infty, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (24)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (18) and (19).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \mu with
respect to the true value of \mu (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \mu
does not depend on the value of \mu, hence the acceleration constant is not
really necessary.
For the bootstrap-t method, the two-sided confidence interval (Efron and Tibshirani, 1993, p.160) is computed as:
[\hat{\mu} - t_{1-\alpha/2}\hat{\sigma}_{\hat{\mu}}, \; \hat{\mu} - t_{\alpha/2}\hat{\sigma}_{\hat{\mu}}] \;\;\;\;\;\; (25)
where \hat{\mu} and \hat{\sigma}_{\hat{\mu}} denote the estimate of the mean
and standard error of the estimate of the mean based on the original sample, and
t_p denotes the p'th empirical quantile of the bootstrap distribution of
the statistic T.  Similarly, a one-sided lower confidence interval is computed as:
[\hat{\mu} - t_{1-\alpha}\hat{\sigma}_{\hat{\mu}}, \; \infty] \;\;\;\;\;\; (26)
and a one-sided upper confidence interval is computed as:
[-\infty, \; \hat{\mu} - t_{\alpha}\hat{\sigma}_{\hat{\mu}}] \;\;\;\;\;\; (27)
When ci.method="bootstrap", the function enparCensored computes
the percentile method, bias-corrected and accelerated method, and bootstrap-t
bootstrap confidence intervals.  The percentile method is transformation respecting,
but not second-order accurate.  The bootstrap-t method is second-order accurate, but not
transformation respecting.  The bias-corrected and accelerated method is both
transformation respecting and second-order accurate (Efron and Tibshirani, 1993, p.188).
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation, rather than rely on a single point-estimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on type I left-censored data, standard errors of the estimates are not available, hence it is not possible to construct confidence intervals (El-Shaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barker, C. (2009). The Mean, Median, and Confidence Intervals of the Kaplan-Meier Survival Estimate – Computations and Applications. The American Statistician 63(1), 78–80.
Beal, D. (2010). A Macro for Calculating Summary Statistics on Left Censored Environmental Data Using the Kaplan-Meier Method. Paper SDA-09, presented at Southeast SAS Users Group 2010, September 26-28, Savannah, GA. https://analytics.ncsu.edu/sesug/2010/SDA09.Beal.pdf.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
Frome E.L., and D.P. Frome (2015). STAND: Statistical Analysis of Non-Detects. R package version 2.0, https://CRAN.R-project.org/package=STAND.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Irwin, J.O. (1949). The Standard Error of an Estimate of Expectation of Life, with Special Reference to Expectation of Tumourless Life in Experiments with Mice. Journal of Hygiene 47, 188–189.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Klein, J.P., and M.L. Moeschberger. (2003). Survival Analysis: Techniques for Censored and Truncated Data, Second Edition. Springer, New York, 537pp.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Meier, P., T. Karrison, R. Chappell, and H. Xie. (2004). The Price of Kaplan-Meier. Journal of the American Statistical Association 99(467), 890–896.
Miller, R.G. (1981). Survival Analysis. John Wiley and Sons, New York.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2022). ProUCL Version 5.2.0 Technical Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. Prepared by: Neptune and Company, Inc., 1435 Garrison Street, Suite 201, Lakewood, CO 80215. pp. 128–129, 143. https://www.epa.gov/land-research/proucl-software.
See Also
ppointsCensored, ecdfPlotCensored, 
qqPlotCensored, estimateCensored.object, 
enpar.
Examples
  # Using the lead concentration data from soil samples shown in 
  # Beal (2010), compute the Kaplan-Meier estimators of the mean, 
  # standard deviation, and standard error of the mean, as well as 
  # a 95% upper confidence limit for the mean.  Compare these 
  # results to those given in Beal (2010), and also to the results 
  # produced by ProUCL 5.2.0.
  # First look at the data:
  #-----------------------
  head(Beal.2010.Pb.df)
  #  Pb.char  Pb Censored
  #1      <1 1.0     TRUE
  #2      <1 1.0     TRUE
  #3       2 2.0    FALSE
  #4     2.5 2.5    FALSE
  #5     2.8 2.8    FALSE
  #6      <3 3.0     TRUE
  tail(Beal.2010.Pb.df)
  #   Pb.char   Pb Censored
  #24     <10   10     TRUE
  #25      10   10    FALSE
  #26      15   15    FALSE
  #27      49   49    FALSE
  #28     200  200    FALSE
  #29    9060 9060    FALSE
  # enparCensored Results:
  #-----------------------
  Beal.unrestricted <- with(Beal.2010.Pb.df, 
    enparCensored(x = Pb, censored = Censored, ci = TRUE, 
      ci.type = "upper"))
  Beal.unrestricted
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):               1  3  4  6  9 10 
  #
  #Estimated Parameter(s):          mean    =  325.3396
  #                                 sd      = 1651.0950
  #                                 se.mean =  315.0023
  #
  #Estimation Method:               Kaplan-Meier
  #                                 (Bias-corrected se.mean)
  #
  #Data:                            Pb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     29
  #
  #Percent Censored:                34.48276%
  #
  #Confidence Interval for:         mean
  #
  #Assumed Sample Size:             29
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =   0.0000
  #                                 UCL = 861.1996
  c(Beal.unrestricted$parameters, Beal.unrestricted$interval$limits)
  #     mean        sd   se.mean       LCL       UCL 
  # 325.3396 1651.0950  315.0023    0.0000  861.1996
  # Beal (2010) published results:
  #-------------------------------
  #   Mean   Std. Dev.  SE of Mean
  # 325.34     1651.09      315.00
  # ProUCL 5.2.0 results:
  #----------------------
  #   Mean   Std. Dev.  SE of Mean  95% UCL
  # 325.2      1651         315       861.1
  #----------
  # Now compute the restricted mean and associated quantities, 
  # and compare these results with those produced by the 
  # kmms() function in the STAND package.
  #----------------------------------------------------------- 
  Beal.restricted <- with(Beal.2010.Pb.df, 
    enparCensored(x = Pb, censored = Censored, restricted = TRUE, 
      ci = TRUE, ci.type = "upper"))
  Beal.restricted 
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):               1  3  4  6  9 10 
  #
  #Estimated Parameter(s):          mean    =  325.2011
  #                                 sd      = 1651.1221
  #                                 se.mean =  314.1774
  #
  #Estimation Method:               Kaplan-Meier (Restricted Mean)
  #                                 Smallest censored value(s)
  #                                   set to Censoring Level
  #                                 (Bias-corrected se.mean)
  #
  #Data:                            Pb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     29
  #
  #Percent Censored:                34.48276%
  #
  #Confidence Interval for:         mean
  #
  #Assumed Sample Size:             29
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =   0.000
  #                                 UCL = 859.658
  c(Beal.restricted$parameters, Beal.restricted$interval$limits)
  #     mean        sd   se.mean       LCL       UCL 
  # 325.2011 1651.1221  314.1774    0.0000  859.6580
  # kmms() results:
  #----------------
  #  KM.mean    KM.LCL    KM.UCL     KM.se     gamma 
  # 325.2011 -221.0419  871.4440  315.0075    0.9500 
  
  # NOTE: as pointed out above, the kmms() function treats the 
  #       smallest censored observations (<1 and <1) as NOT 
  #       censored when computing the mean and uncorrected 
  #       standard error of the mean, but assumes these 
  #       observations ARE censored when computing the corrected 
  #       standard error of the mean.
  #--------------------------------------------------------------
  Beal.restricted$parameters["se.mean"] * sqrt((20/21)) * sqrt((19/18))
  #  se.mean 
  # 315.0075
  #==========
  # Repeat the above example, estimating the unrestricted mean and 
  # computing an upper confidence limit based on the bootstrap 
  # instead of on the normal approximation with a t pivot statistic.
  # Compare results to those from ProUCL 5.2.0.
  # Note:  Setting the seed argument lets you reproduce this example.
  #------------------------------------------------------------------
  Beal.unrestricted.boot <- with(Beal.2010.Pb.df, 
    enparCensored(x = Pb, censored = Censored, ci = TRUE, 
      ci.type = "upper", ci.method = "bootstrap", seed = 923))
  Beal.unrestricted.boot
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):               1  3  4  6  9 10 
  #
  #Estimated Parameter(s):          mean    =  325.3396
  #                                 sd      = 1651.0950
  #                                 se.mean =  315.0023
  #
  #Estimation Method:               Kaplan-Meier
  #                                 (Bias-corrected se.mean)
  #
  #Data:                            Pb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     29
  #
  #Percent Censored:                34.48276%
  #
  #Confidence Interval for:         mean
  #
  #Assumed Sample Size:             29
  #
  #Confidence Interval Method:      Bootstrap
  #
  #Number of Bootstraps:            1000
  #
  #Number of Bootstrap Samples
  #With No Censored Values:         0
  #
  #Number of Times Bootstrap
  #Repeated Because Too Few
  #Uncensored Observations:         0
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             Pct.LCL =     0.0000
  #                                 Pct.UCL =   948.7342
  #                                 BCa.LCL =     0.0000
  #                                 BCa.UCL =   942.6596
  #                                 t.LCL   =     0.0000
  #                                 t.UCL   = 62121.8909
  c(Beal.unrestricted.boot$interval$limits)
  #   Pct.LCL    Pct.UCL    BCa.LCL    BCa.UCL      t.LCL      t.UCL 
  #    0.0000   948.7342     0.0000   942.6596     0.0000 62121.8909
  # ProUCL 5.2.0 results:
  #----------------------
  #   Pct.LCL    Pct.UCL    BCa.LCL    BCa.UCL      t.LCL      t.UCL 
  #    0.0000   944.3        0.0000   947.8        0.0000 62169
  #==========
  # Clean up
  #---------
  rm(Beal.unrestricted, Beal.restricted, Beal.unrestricted.boot)
Estimate Parameters of a Pareto Distribution
Description
Estimate the location and shape parameters of a Pareto distribution.
Usage
  epareto(x, method = "mle", plot.pos.con = 0.375)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Possible values are 
 | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the values of the empirical cdf.  The default value is 
 | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a Pareto distribution with 
parameters location=\eta and shape=\theta.
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of \eta and \theta are 
given by (Evans et al., 1993, p.122; Johnson et al., 1994, p.581):
\hat{\eta}_{mle} = x_{(1)} \;\;\;\; (1)
\hat{\theta}_{mle} = n [\sum_{i=1}^n log(\frac{x_i}{\hat{\eta}_{mle}}) ]^{-1} \;\;\;\; (2)
where x_{(1)} denotes the first order statistic (i.e., the minimum value).
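For illustration, equations (1) and (2) can be computed directly; this is only a sketch, not the internal code of epareto (compare with the output of epareto in the EXAMPLES section below):
  set.seed(250) 
  x <- rpareto(30, location = 1, shape = 1) 
  c(location = min(x),                              # equation (1)
    shape = length(x) / sum(log(x / min(x))))       # equation (2)
  # Compare with:  epareto(x)$parameters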
Least-Squares Estimation (method="lse") 
The least-squares estimators (lse's) of \eta and \theta are derived as 
follows.  Let X denote a Pareto random variable with parameters 
location=\eta and shape=\theta.  It can be shown that
log[1 - F(x)] = \theta log(\eta) - \theta log(x) \;\;\;\; (3)
where F denotes the cumulative distribution function of X.  Set
y_i = log[1 - \hat{F}(x_i)] \;\;\;\; (4)
z_i = log(x_i) \;\;\;\; (5)
where \hat{F}(x) denotes the empirical cumulative distribution function 
evaluated at x.  The least-squares estimates of \eta and \theta 
are obtained by solving the regression equation
y_i = \beta_{0} + \beta_{1} z_i \;\;\;\; (6)
and setting
\hat{\theta}_{lse} = -\hat{\beta}_{1} \;\;\;\; (7)
\hat{\eta}_{lse} = exp(\frac{\hat{\beta}_0}{\hat{\theta}_{lse}}) \;\;\;\; (8)
(Johnson et al., 1994, p.580).
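As a minimal sketch of equations (4)-(8), the least-squares estimates can be reproduced with lm, assuming the empirical cdf is computed from plotting positions of the form (i - a)/(n + 1 - 2a) with a = plot.pos.con (this plotting-position formula is an assumption of the sketch, not a statement about the internal code of epareto):
  set.seed(250) 
  x <- sort(rpareto(30, location = 1, shape = 1))
  n <- length(x)
  a <- 0.375                                        # default plot.pos.con
  F.hat <- (seq_len(n) - a) / (n + 1 - 2 * a)       # assumed plotting-position ecdf
  fit <- lm(log(1 - F.hat) ~ log(x))                # equation (6)
  theta.lse <- -coef(fit)[[2]]                      # equation (7)
  eta.lse <- exp(coef(fit)[[1]] / theta.lse)        # equation (8)
  c(location = eta.lse, shape = theta.lse)
  # Compare with:  epareto(x, method = "lse")$parameters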
Value
a list of class "estimate" containing the estimated parameters and other information. 
See estimate.object for details.
Note
The Pareto distribution is named after Vilfredo Pareto (1848-1923), a professor 
of economics.  It is derived from Pareto's law, which states that the number of 
persons N having income \ge x is given by:
N = A x^{-\theta}
where \theta denotes Pareto's constant and is the shape parameter for the 
probability distribution.
The Pareto distribution takes values on the positive real line.  All values must be 
larger than the “location” parameter \eta, which is really a threshold 
parameter.  There are three kinds of Pareto distributions.  The one described here 
is the Pareto distribution of the first kind.  Stable Pareto distributions have 
0 < \theta < 2.  Note that the r'th moment only exists if 
r < \theta.
The Pareto distribution is related to the 
exponential distribution and 
logistic distribution as follows.  
Let X denote a Pareto random variable with location=\eta and 
shape=\theta.  Then log(X/\eta) has an exponential distribution 
with parameter rate=\theta, and -log\{ [(X/\eta)^\theta] - 1 \} 
has a logistic distribution with parameters location=0 and 
scale=1.
The Pareto distribution has a very long right-hand tail. It is often applied in the study of socioeconomic data, including the distribution of income, firm size, population, and stock price fluctuations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Examples
  # Generate 30 observations from a Pareto distribution with parameters 
  # location=1 and shape=1 then estimate the parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rpareto(30, location = 1, shape = 1) 
  epareto(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Pareto
  #
  #Estimated Parameter(s):          location = 1.009046
  #                                 shape    = 1.079850
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     30
  #----------
  # Compare the results of using the least-squares estimators:
  epareto(dat, method="lse")$parameters 
  #location    shape 
  #1.085924 1.144180
  #----------
  # Clean up
  #---------
  rm(dat)
Plot Empirical Probability Density Function
Description
Produces an empirical probability density function plot.
Usage
  epdfPlot(x, discrete = FALSE, density.arg.list = NULL, plot.it = TRUE, 
    add = FALSE, epdf.col = "black", epdf.lwd = 3 * par("cex"), epdf.lty = 1, 
    curve.fill = FALSE, curve.fill.col = "cyan", ..., 
    type = ifelse(discrete, "h", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  Missing ( | 
| discrete | logical scalar indicating whether the assumed parent distribution of  | 
| density.arg.list | list with arguments to the  | 
| plot.it | logical scalar indicating whether to produce a plot or add to the current plot (see  | 
| add | logical scalar indicating whether to add the empirical pdf to the current plot 
( | 
| epdf.col | a numeric scalar or character string determining the color of the empirical pdf 
line or points.  The default value is  | 
| epdf.lwd | a numeric scalar determining the width of the empirical pdf line.  
The default value is  | 
| epdf.lty | a numeric scalar determining the line type of the empirical pdf line.  
The default value is  | 
| curve.fill | a logical scalar indicating whether to fill in the area below the empirical pdf 
curve with the 
color specified by  | 
| curve.fill.col | a numeric scalar or character string indicating what color to use to fill in the 
area below the empirical pdf curve.  The default value is 
 | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
When a distribution is discrete and can only take on a finite number of values, 
the empirical pdf plot is the same as the standard relative frequency histogram; 
that is, each bar of the histogram represents the proportion of the sample 
equal to that particular number (or category).  When a distribution is continuous, 
the function epdfPlot calls the R function density to 
compute the estimated probability density at a number of evenly spaced points 
between the minimum and maximum values.
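For continuous data this is roughly equivalent to plotting the output of density evaluated between the sample minimum and maximum; the following is only a sketch of that description, not the exact internal call made by epdfPlot:
  set.seed(47)
  x <- rnorm(100)
  d <- density(x, from = min(x), to = max(x))
  plot(d$x, d$y, type = "l", xlab = "x", ylab = "Estimated pdf")
  # Compare with:  epdfPlot(x)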
Value
epdfPlot invisibly returns a list with the following components:
| x | numeric vector of ordered quantiles. | 
| f.x | numeric vector of the associated estimated values of the pdf. | 
Note
An empirical probability density function (epdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms and boxplots to assess the characteristics of a set of data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA.
See the REFERENCES section in the help file for density.
See Also
Empirical, pdfPlot, ecdfPlot, 
cdfPlot, cdfCompare, qqPlot.
Examples
  # Using Reference Area TcCB data in EPA.94b.tccb.df, 
  # create a histogram of the log-transformed observations, 
  # then superimpose the empirical pdf plot.
  dev.new()
  log.TcCB <- with(EPA.94b.tccb.df, log(TcCB[Area == "Reference"]))
  hist(log.TcCB, freq = FALSE, xlim = c(-2, 1),
    col = "cyan", xlab = "log [ TcCB (ppb) ]",
    ylab = "Relative Frequency", 
    main = "Reference Area TcCB with Empirical PDF")
  epdfPlot(log.TcCB, add = TRUE)
  #==========
  # Generate 20 observations from a Poisson distribution with 
  # parameter lambda = 10, and plot the empirical PDF.
  set.seed(875)
  x <- rpois(20, lambda = 10)
  dev.new()
  epdfPlot(x, discrete = TRUE)
  #==========
  # Clean up
  #---------
  rm(log.TcCB, x)
  graphics.off()
Estimate Parameter of a Poisson Distribution
Description
Estimate the mean of a Poisson distribution, and optionally construct a confidence interval for the mean.
Usage
  epois(x, method = "mle/mme/mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "exact", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Currently the only possible 
value is  | 
| ci | logical scalar indicating whether to compute a confidence interval for the 
location or scale parameter.  The default value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the location or scale parameter.  Possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a Poisson distribution with 
parameter lambda=\lambda.  It can be shown (e.g., Forbes et al., 2009) 
that if y is defined as:
y = \sum_{i=1}^n x_i \;\;\;\; (1)
then y is an observation from a Poisson distribution with parameter 
lambda=n \lambda.
Estimation 
The maximum likelihood, method of moments, and minimum variance unbiased estimator 
(mle/mme/mvue) of \lambda is given by:
\hat{\lambda} = \bar{x} \;\;\;\; (2)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i = \frac{y}{n} \;\;\;\; (3)
Confidence Intervals 
There are three possible ways to construct a confidence interval for 
\lambda:  based on the exact distribution of the estimator of 
\lambda (ci.method="exact"), based on an approximation of 
Pearson and Hartley (ci.method="pearson.hartley.approx"), or based on the 
normal approximation 
(ci.method="normal.approx").
Exact Confidence Interval (ci.method="exact") 
If ci.type="two-sided", an exact (1-\alpha)100\% confidence interval 
for \lambda can be constructed as [LCL, UCL], where the confidence 
limits are computed such that:
Pr[Y \ge y | \lambda = LCL] = \frac{\alpha}{2} \;\;\;\; (4)
Pr[Y \le y | \lambda = UCL] = \frac{\alpha}{2} \;\;\;\; (5)
where y is defined in equation (1) and Y denotes a Poisson random 
variable with parameter lambda=n \lambda.
If ci.type="lower", \alpha/2 is replaced with \alpha in 
equation (4) and UCL is set to \infty.
If ci.type="upper", \alpha/2 is replaced with \alpha in 
equation (5) and LCL is set to 0.
Note that an exact upper confidence bound can be computed even when all 
observations are 0.
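As a minimal sketch, equations (4) and (5) can be inverted numerically with uniroot, using the same simulated data as in the EXAMPLES section below; this is only an illustration, not the internal algorithm of epois:
  set.seed(250)
  x <- rpois(20, lambda = 2)
  y <- sum(x); n <- length(x); alpha <- 0.10
  LCL <- uniroot(function(lam) 1 - ppois(y - 1, n * lam) - alpha/2,   # equation (4)
    interval = c(1e-8, 10 * max(x)))$root
  UCL <- uniroot(function(lam) ppois(y, n * lam) - alpha/2,           # equation (5)
    interval = c(1e-8, 10 * max(x)))$root
  c(LCL = LCL, UCL = UCL)
  # Compare with:  epois(x, ci = TRUE, conf.level = 0.9)$interval$limits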
Pearson-Hartley Approximation (ci.method="pearson.hartley.approx") 
For a two-sided (1-\alpha)100\% confidence interval for \lambda, the 
Pearson and Hartley approximation (Zar, 2010, p.587; Pearson and Hartley, 1970, p.81) 
is given by:
[\frac{\chi^2_{2n\bar{x}, \alpha/2}}{2n}, \frac{\chi^2_{2n\bar{x} + 2, 1 - \alpha/2}}{2n}] \;\;\;\; (6)
where \chi^2_{\nu, p} denotes the p'th quantile of the 
chi-square distribution with \nu degrees of freedom.
One-sided confidence intervals are computed in a similar fashion.
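A minimal sketch of equation (6) for a two-sided interval, using the same simulated data as in the sketch above:
  set.seed(250)
  x <- rpois(20, lambda = 2)
  n <- length(x); alpha <- 0.10
  c(LCL = qchisq(alpha/2, 2 * n * mean(x)) / (2 * n),            # equation (6)
    UCL = qchisq(1 - alpha/2, 2 * n * mean(x) + 2) / (2 * n))
  # Compare with:
  # epois(x, ci = TRUE, ci.method = "pearson.hartley.approx",
  #   conf.level = 0.9)$interval$limits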
Normal Approximation (ci.method="normal.approx")
An approximate (1-\alpha)100\% confidence interval for \lambda can be 
constructed assuming the distribution of the estimator of \lambda is 
approximately normally distributed.  A two-sided confidence interval is constructed 
as:
[\hat{\lambda} - z_{1-\alpha/2} \hat{\sigma}_{\hat{\lambda}}, \hat{\lambda} + z_{1-\alpha/2} \hat{\sigma}_{\hat{\lambda}}] \;\;\;\; (7)
where z_p is the p'th quantile of the standard normal distribution, and 
the quantity
\hat{\sigma}_{\hat{\lambda}} = \sqrt{\hat{\lambda} / n} \;\;\;\; (8)
denotes the estimated asymptotic standard deviation of the estimator of 
\lambda.
One-sided confidence intervals are constructed in a similar manner.
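A minimal sketch of equations (7) and (8) for a two-sided interval, again using the same simulated data as in the sketches above:
  set.seed(250)
  x <- rpois(20, lambda = 2)
  lambda.hat <- mean(x); n <- length(x); alpha <- 0.10
  se <- sqrt(lambda.hat / n)                                     # equation (8)
  c(LCL = lambda.hat - qnorm(1 - alpha/2) * se,                  # equation (7)
    UCL = lambda.hat + qnorm(1 - alpha/2) * se)
  # Compare with:
  # epois(x, ci = TRUE, ci.method = "normal.approx",
  #   conf.level = 0.9)$interval$limits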
Value
a list of class "estimate" containing the estimated parameters and other information. 
 
See estimate.object for details.
Note
The Poisson distribution is named after Poisson, who 
derived this distribution as the limiting distribution of the 
binomial distribution with parameters size=N 
and prob=p, where N tends to infinity, p tends to 0, and 
Np stays constant.
In this context, the Poisson distribution was used by Bortkiewicz (1898) to model 
the number of deaths (per annum) from kicks by horses in the Prussian Army Corps.  In 
this case, p, the probability of death from this cause, was small, but the 
number of soldiers exposed to this risk, N, was large.
The Poisson distribution has been applied in a variety of fields, including quality control (modeling number of defects produced in a process), ecology (number of organisms per unit area), and queueing theory. Gibbons (1987b) used the Poisson distribution to model the number of detected compounds per scan of the 32 volatile organic priority pollutants (VOC), and also to model the distribution of chemical concentration (in ppb).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572-580.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.
Pearson, E.S., and H.O. Hartley, eds. (1970). Biometrika Tables for Statisticians, Volume 1. Cambridge University Press, New York, p.81.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, pp. 585–586.
See Also
Examples
  # Generate 20 observations from a Poisson distribution with parameter 
  # lambda=2, then estimate the parameter and construct a 90% confidence 
  # interval. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rpois(20, lambda = 2) 
  epois(dat, ci = TRUE, conf.level = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         lambda
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 1.336558
  #                                 UCL = 2.377037
  #----------
  # Compare the different ways of constructing confidence intervals for 
  # lambda using the same data as in the previous example:
  epois(dat, ci = TRUE, ci.method = "pearson", 
    conf.level = 0.9)$interval$limits 
  #     LCL      UCL 
  #1.336558 2.377037
  epois(dat, ci = TRUE, ci.method = "normal.approx",  
    conf.level = 0.9)$interval$limits 
  #     LCL      UCL 
  #1.306544 2.293456 
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Mean of a Poisson Distribution Based on Type I Censored Data
Description
Estimate the mean of a Poisson distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
  epoisCensored(x, censored, method = "mle", censoring.side = "left",
    ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
    conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
    ci.sample.size = sum(!censored))
Arguments
| x | numeric vector of observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| method | character string specifying the method of estimation.  The possible values are:
 | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible
values are  | 
| ci | logical scalar indicating whether to compute a confidence interval for the
mean or variance.  The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval
for the mean.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is  | 
| n.bootstraps | numeric scalar indicating how many bootstraps to use to construct the
confidence interval for the mean when  | 
| pivot.statistic | character string indicating which pivot statistic to use in the construction
of the confidence interval for the mean when  | 
| ci.sample.size | numeric scalar indicating what sample size to assume to construct the
confidence interval for the mean if  | 
Details
If x or censored contain any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Let \underline{x} denote a vector of N observations from a
Poisson distribution with mean
lambda=\lambda.
Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
For the case when k \ge 2, the data are said to be Type I
multiply censored.  For the case when k=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
Estimation 
Maximum Likelihood Estimation (method="mle") 
For Type I left censored data, the likelihood function is given by:
L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (3)
where f and F denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50).  That is,
f(t) =  \frac{e^{-\lambda} \lambda^t}{t!}, \; t = 0, 1, 2, \ldots \;\;\;\;\;\; (4)
F(t) = \sum_{i=0}^t f(i) = \sum_{i=0}^t \frac{e^{-\lambda} \lambda^i}{i!} \;\;\;\;\;\; (5)
(Johnson et al., 1992, p.151). For left singly censored data, equation (3) simplifies to:
L(\lambda | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^N f[x_{(i)}] \;\;\;\;\;\; (6)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)
and for right singly censored data this simplifies to:
L(\lambda | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (8)
The maximum likelihood estimators are computed by maximizing the likelihood function.
For right-censored data, taking the derivative of the log-likelihood function
with respect to \lambda and setting this to 0 produces the following equation:
\bar{x} = \lambda \{1 - \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j)}{1-F(T_j)}] \} \;\;\;\;\;\; (9)
where
\bar{x} = \frac{1}{n} \sum_{i \in \Omega} x_i \;\;\;\;\;\; (10)
Note that the quantity defined in equation (10) is simply the mean of the uncensored observations.
For left-censored data, taking the derivative of the log-likelihood function with
respect to \lambda and setting this to 0 produces the following equation:
\bar{x} = \lambda \{1 + \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j - 1)}{F(T_j - 1)}] \} \;\;\;\;\;\; (11)
The function epoisCensored computes the maximum likelihood estimator
of \lambda by solving Equation (9) (right-censored data) or
Equation (11) (left-censored data); it uses the sample mean of
the uncensored observations as the initial value.
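As a minimal illustration (not the internal algorithm of epoisCensored), the left-censored estimating equation (11) can be solved directly with uniroot, assuming censored is a logical vector and censored elements of x are set to their censoring levels:
  pois.mle.lcens <- function(x, censored) {
    x.unc <- x[!censored]                     # uncensored observations
    T.cen <- x[censored]                      # censoring levels of the censored observations
    n <- length(x.unc)
    f <- function(lambda) {
      adj <- sum(dpois(T.cen - 1, lambda) / ppois(T.cen - 1, lambda)) / n
      mean(x.unc) - lambda * (1 + adj)        # equation (11), written as f(lambda) = 0
    }
    uniroot(f, lower = .Machine$double.eps, upper = max(x) + 1)$root
  }
  # Compare with the estimate returned by epoisCensored(dat, censored)
  # in the EXAMPLES section below:  pois.mle.lcens(dat, censored)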
Setting Censored Observations to Half the Censoring Level (method="half.cen.level") 
This method is applicable only to left censored data.
This method involves simply replacing all the censored observations with half their
detection limit, and then computing the mean and standard deviation with the usual
formulas (see epois).
This method is included only to allow comparison of this method to other methods.
Setting left-censored observations to half the censoring level is not
recommended.
Confidence Intervals 
This section explains how confidence intervals for the mean \lambda are
computed.
Likelihood Profile (ci.method="profile.likelihood") 
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation.  This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean \lambda.  Equation (3) above
shows the form of the likelihood function L(\lambda | \underline{x}) for
multiply left-censored data, and Equation (7) shows the function for
multiply right-censored data.
Following Stryhn and Christensen (2003), denote the maximum likelihood estimate
of the mean by \lambda^*.  The likelihood
ratio test statistic (G^2) of the hypothesis H_0: \lambda = \lambda_0
(where \lambda_0 is a fixed value) equals the drop in 2 log(L) between the
“full” model and the reduced model with \lambda fixed at \lambda_0, i.e.,
G^2 = 2 \{log[L(\lambda^*)] - log[L(\lambda_0)]\} \;\;\;\;\;\; (11)
Under the null hypothesis, the test statistic G^2 follows a
chi-squared distribution with 1 degree of freedom.
A two-sided (1-\alpha)100\% confidence interval for the mean \lambda
consists of all values of \lambda_0 for which the test is not significant at
level \alpha:
\{\lambda_0: G^2 \le \chi^2_{1, 1-\alpha}\} \;\;\;\;\;\; (12)
where \chi^2_{\nu, p} denotes the p'th quantile of the
chi-squared distribution with \nu degrees of freedom.
One-sided lower and one-sided upper confidence intervals are computed in a similar
fashion, except that the quantity 1-\alpha in Equation (12) is replaced with
1-2\alpha.
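A rough sketch of this construction for left-censored data is given below; the log-likelihood treats a censored value as lying strictly below its censoring level (consistent with equation (11)), omits the constant multinomial coefficient in equation (3), and is only an illustration, not the internal algorithm of epoisCensored:
  loglik <- function(lambda, x, censored) {
    sum(dpois(x[!censored], lambda, log = TRUE)) +
      sum(ppois(x[censored] - 1, lambda, log.p = TRUE))
  }
  profile.ci <- function(x, censored, conf.level = 0.95) {
    lambda.mle <- optimize(loglik, c(1e-8, max(x) + 1), x = x,
      censored = censored, maximum = TRUE)$maximum
    g2 <- function(lambda) {
      2 * (loglik(lambda.mle, x, censored) - loglik(lambda, x, censored)) -
        qchisq(conf.level, df = 1)            # equation (12), written as g2(lambda) = 0
    }
    c(LCL = uniroot(g2, c(1e-8, lambda.mle))$root,
      UCL = uniroot(g2, c(lambda.mle, 10 * max(x)))$root)
  }
  # Compare with the interval returned by epoisCensored(dat, censored, ci = TRUE)
  # in the EXAMPLES section below:  profile.ci(dat, censored)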
Normal Approximation (ci.method="normal.approx") 
This method constructs approximate (1-\alpha)100\% confidence intervals for
\lambda based on the assumption that the estimator of \lambda is
approximately normally distributed.  That is, a two-sided (1-\alpha)100\%
confidence interval for \lambda is constructed as:
[\hat{\lambda} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}, \; \hat{\lambda} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}] \;\;\;\; (13)
where \hat{\lambda} denotes the estimate of \lambda,
\hat{\sigma}_{\hat{\lambda}} denotes the estimated asymptotic standard
deviation of the estimator of \lambda, m denotes the assumed sample
size for the confidence interval, and t_{p,\nu} denotes the p'th
quantile of Student's t-distribution with \nu
degrees of freedom.  One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size determines the value of m and by
default is equal to the number of uncensored observations.
This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's t-distribution.
When \lambda is estimated with the maximum likelihood estimator
(method="mle"), the variance of \hat{\lambda} is
estimated based on the inverse of the Fisher Information matrix.  When
\lambda is estimated using the half-censoring-level method
(method="half.cen.level"), the variance of \hat{\lambda} is
estimated as:
\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{m} \;\;\;\;\;\; (14)
where m denotes the assumed sample size (see above).
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap") 
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1-\alpha)100\% confidence interval
for the population mean \lambda, the bootstrap can be broken down into the
following steps:
- Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement.  Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample.  Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
- Estimate \lambda based on the bootstrap sample created in Step 1, using the same method that was used to estimate \lambda using the original observations in \underline{x}.  Because the bootstrap sample usually does not match the original sample, the estimate of \lambda based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
- Repeat Steps 1 and 2 B times, where B is some large number.  For the function epoisCensored, the number of bootstraps B is determined by the argument n.bootstraps (see the section ARGUMENTS above).  The default value of n.bootstraps is 1000.
- Use the B estimated values of \lambda to compute the empirical cumulative distribution function of this estimator of \lambda (see ecdfPlot), and then create a confidence interval for \lambda based on this estimated cdf.
The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1-\frac{\alpha}{2})] \;\;\;\;\;\; (15)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus
\hat{G}^{-1}(p) denotes the p'th empirical quantile, that is,
the p'th quantile associated with the empirical cdf.  Similarly, a one-sided lower
confidence interval is computed as:
[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (16)
and a one-sided upper confidence interval is computed as:
[0, \; \hat{G}^{-1}(1-\alpha)] \;\;\;\;\;\; (17)
The function epoisCensored calls the R function quantile
to compute the empirical quantiles used in Equations (15)-(17).
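A minimal sketch of the percentile bound in equation (17), assuming dat and censored are defined as in the EXAMPLES section below; the resampling loop and the fallback for bootstrap samples containing no censored values are simplifications, not the internal implementation:
  set.seed(456)
  boot.lambda <- replicate(1000, {
    i <- sample(length(dat), replace = TRUE)
    if (!any(censored[i])) mean(dat[i])       # fallback: no censored values drawn
    else epoisCensored(dat[i], censored[i])$parameters["lambda"]
  })
  quantile(boot.lambda, 0.95)                 # one-sided upper bound, equation (17)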
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of \lambda can be
off by k/\sqrt{N}, where k is some constant.  Efron and Tibshirani 
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of \lambda may be off by k/N instead of
k/\sqrt{N}.  The two-sided bias-corrected and accelerated confidence interval is
computed as:
[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (18)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (19)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (20)
\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\lambda})] \;\;\;\;\;\; (21)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (22)
where the quantity \hat{\lambda}_{(i)} denotes the estimate of \lambda using
all the values in \underline{x} except the i'th one, and
\hat{\lambda}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\lambda}_{(i)} \;\;\;\;\;\; (23)
A one-sided lower confidence interval is given by:
[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (24)
and a one-sided upper confidence interval is given by:
[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (25)
where \alpha_1 and \alpha_2 are computed as for a two-sided confidence
interval, except \alpha/2 is replaced with \alpha in Equations (19) and (20).
The constant \hat{z}_0 incorporates the bias correction, and the constant
\hat{a} is the acceleration constant.  The term “acceleration” refers
to the rate of change of the standard error of the estimate of \lambda with
respect to the true value of \lambda (Efron and Tibshirani, 1993, p.186).  For a
normal (Gaussian) distribution, the standard error of the estimate of \lambda
does not depend on the value of \lambda, hence the acceleration constant is not
really necessary.
When ci.method="bootstrap", the function epoisCensored computes both
the percentile method and bias-corrected and accelerated method bootstrap confidence
intervals.
Value
a list of class "estimateCensored" containing the estimated parameters
and other information.  See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation. Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators for parameters of a normal or lognormal distribution based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation (or coefficient of variation), rather than rely on a single point-estimate of the mean. Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and coefficient of variation of a Poisson distribution when data are subjected to single or multiple censoring.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, Chapter 4.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
Poisson, epois, estimateCensored.object.
Examples
  # Generate 20 observations from a Poisson distribution with
  # parameter lambda=10, and censor the values less than 10.
  # Then generate 20 more observations from the same distribution
  # and censor the values less than 20.  Then estimate the mean
  # using the maximum likelihood method.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(300)
  dat.1 <- rpois(20, lambda=10)
  censored.1 <- dat.1 < 10
  dat.1[censored.1] <- 10
  dat.2 <- rpois(20, lambda=10)
  censored.2 <- dat.2 < 20
  dat.2[censored.2] <- 20
  dat <- c(dat.1, dat.2)
  censored <- c(censored.1, censored.2)
  epoisCensored(dat, censored, ci = TRUE)
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              10 20
  #
  #Estimated Parameter(s):          lambda = 11.05402
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     40
  #
  #Percent Censored:                65%
  #
  #Confidence Interval for:         lambda
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  9.842894
  #                                 UCL = 12.846484
  #----------
  # Clean up
  #---------
  rm(dat.1, censored.1, dat.2, censored.2, dat, censored)
Estimate Quantiles of a Beta Distribution
Description
Estimate quantiles of a beta distribution.
Usage
  eqbeta(x, p = 0.5, method = "mle", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a beta distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the shape and scale 
parameters of the distribution.  The possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqbeta returns estimated quantiles as well as 
estimates of the shape1 and shape2 parameters.  
Quantiles are estimated by 1) estimating the shape1 and shape2 parameters by 
calling ebeta, and then 2) calling the function 
qbeta and using the estimated values for 
shape1 and shape2.
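A minimal sketch of this two-step computation (compare with the output of eqbeta in the EXAMPLES section below):
  set.seed(250)
  x <- rbeta(20, shape1 = 2, shape2 = 4)
  est <- ebeta(x)
  qbeta(0.9, shape1 = est$parameters["shape1"],
    shape2 = est$parameters["shape2"])
  # Compare with:  eqbeta(x, p = 0.9)$quantiles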
Value
If x is a numeric vector, eqbeta returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqbeta 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The beta distribution takes real values between 0 and 1.  Special cases of the 
beta are the Uniform[0,1] when shape1=1 and 
shape2=1, and the arcsin distribution when shape1=0.5 and 
shape2=0.5.  The arcsin distribution appears in the theory of random walks.  
The beta distribution is used in Bayesian analyses as a conjugate to the binomial 
distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Examples
  # Generate 20 observations from a beta distribution with parameters 
  # shape1=2 and shape2=4, then estimate the parameters via 
  # maximum likelihood and estimate the 90'th percentile.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rbeta(20, shape1 = 2, shape2 = 4) 
  eqbeta(dat, p = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Beta
  #
  #Estimated Parameter(s):          shape1 =  5.392221
  #                                 shape2 = 11.823233
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 0.4592796
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of a Binomial Distribution
Description
Estimate quantiles of a binomial distribution.
Usage
 eqbinom(x, size = NULL, p = 0.5, method = "mle/mme/mvue", digits = 0)
Arguments
| x | numeric or logical vector of observations, or an object resulting from a call to an 
estimating function that assumes a binomial distribution 
(e.g.,  | 
| size | positive integer indicating the number of trials;  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimation.  The only possible value is 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqbinom returns estimated quantiles as well as 
estimates of the prob parameter.  
Quantiles are estimated by 1) estimating the prob parameter by 
calling ebinom, and then 2) calling the function 
qbinom and using the estimated value for 
prob.
Value
If x is a numeric vector, eqbinom returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqbinom 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, 
Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any one trial is independent 
of any other trial, and that the probability of “success”, p, is the same on 
each trial.  A binomial discrete random variable X is the number of “successes” in 
n independent trials.  A special case of the binomial distribution occurs when n=1, 
in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model 
the proportion of times a chemical concentration exceeds a set standard in a given period of 
time (e.g., Gilbert, 1987, p.143).  The binomial distribution is also used to compute an upper 
bound on the overall Type I error rate for deciding whether a facility or location is in 
compliance with some set standard.  Assume the null hypothesis is that the facility is in compliance.  
If a test of hypothesis is conducted periodically over time to test compliance and/or several tests 
are performed during each time period, and the facility or location is always in compliance, and 
each single test has a Type I error rate of \alpha, and the result of each test is 
independent of the result of any other test (usually not a reasonable assumption), then the number 
of times the facility is declared out of compliance when in fact it is in compliance is a 
binomial random variable with probability of “success” p=\alpha being the 
probability of being declared out of compliance (see USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
ebinom, Binomial, 
estimate.object.
Examples
  # Generate 20 observations from a binomial distribution with 
  # parameters size=1 and prob=0.2, then estimate the 'prob' 
  # parameter and the 90'th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(251) 
  dat <- rbinom(20, size = 1, prob = 0.2) 
  eqbinom(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Binomial
  #
  #Estimated Parameter(s):          size = 20.0
  #                                 prob =  0.1
  #
  #Estimation Method:               mle/mme/mvue for 'prob'
  #
  #Estimated Quantile(s):           90'th %ile = 4
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle/mme/mvue for 'prob' Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #
  #
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of an Extreme Value (Gumbel) Distribution
Description
Estimate quantiles of an extreme value distribution.
Usage
  eqevd(x, p = 0.5, method = "mle", pwme.method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes an extreme value distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the location and scale 
parameters.  Possible values are 
 | 
| pwme.method | character string specifying what method to use to compute the 
probability-weighted moments when  | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqevd returns estimated quantiles as well as 
estimates of the location and scale parameters.  
Quantiles are estimated by 1) estimating the location and scale parameters by 
calling eevd, and then 2) calling the function 
qevd and using the estimated values for 
location and scale.
Value
If x is a numeric vector, eqevd returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqevd 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
There are three families of extreme value distributions.  The one 
described here is the Type I, also called the Gumbel extreme value 
distribution or simply Gumbel distribution.  The name 
“extreme value” comes from the fact that this distribution is 
the limiting distribution (as n approaches infinity) of the 
greatest value among n independent random variables each 
having the same continuous distribution.
The Gumbel extreme value distribution is related to the 
exponential distribution as follows. 
Let Y be an exponential random variable 
with parameter rate=\lambda.  Then X = \eta - log(Y) 
has an extreme value distribution with parameters 
location=\eta and scale=1/\lambda.
The distribution described above and assumed by eevd is the 
largest extreme value distribution.  The smallest extreme value 
distribution is the limiting distribution (as n approaches infinity) 
of the smallest value among 
n independent random variables each having the same continuous distribution. 
If X has a largest extreme value distribution with parameters 
location=\eta and scale=\theta, then 
Y = -X has a smallest extreme value distribution with parameters 
location=-\eta and scale=\theta.  The smallest 
extreme value distribution is related to the Weibull distribution 
as follows.  Let Y be a Weibull random variable with 
parameters 
shape=\beta and scale=\alpha.  Then X = log(Y) 
has a smallest extreme value distribution with parameters location=log(\alpha) 
and scale=1/\beta.
The extreme value distribution has been used extensively to model the distribution of streamflow, flooding, rainfall, temperature, wind speed, and other meteorological variables, as well as material strength and life data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Castillo, E. (1988). Extreme Value Theory in Engineering. Academic Press, New York, pp.184–198.
Downton, F. (1966). Linear Estimates of Parameters in the Extreme Value Distribution. Technometrics 8(1), 3–17.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.
Tiago de Oliveira, J. (1963). Decision Results for the Parameters of the Extreme Value (Gumbel) Distribution Based on the Mean and Standard Deviation. Trabajos de Estadistica 14, 61–81.
See Also
eevd, Extreme Value Distribution, 
estimate.object.
Examples
  # Generate 20 observations from an extreme value distribution with 
  # parameters location=2 and scale=1, then estimate the parameters 
  # and estimate the 90'th percentile.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- revd(20, location = 2) 
  eqevd(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Extreme Value
  #
  #Estimated Parameter(s):          location = 1.9684093
  #                                 scale    = 0.7481955
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 3.652124
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of an Exponential Distribution
Description
Estimate quantiles of an exponential distribution.
Usage
  eqexp(x, p = 0.5, method = "mle/mme", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes an exponential distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the rate parameter.  
Currently the only possible value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqexp returns estimated quantiles as well as 
the estimate of the rate parameter.  
Quantiles are estimated by 1) estimating the rate parameter by 
calling eexp, and then 2) calling the function 
qexp and using the estimated value for 
rate.
Value
If x is a numeric vector, eqexp returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqexp 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The exponential distribution is a special case of the gamma distribution, and takes on positive real values. A major use of the exponential distribution is in life testing where it is used to model the lifetime of a product, part, person, etc.
The exponential distribution is the only continuous distribution with a 
“lack of memory” property.  That is, if the lifetime of a part follows 
the exponential distribution, then the distribution of the time until failure 
is the same as the distribution of the time until failure given that the part 
has survived to time t.
The exponential distribution is related to the double exponential (also called Laplace) distribution, and to the extreme value distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
eexp, Exponential, 
estimate.object.
Examples
  # Generate 20 observations from an exponential distribution with parameter 
  # rate=2, then estimate the parameter and estimate the 90th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rexp(20, rate = 2) 
  eqexp(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Exponential
  #
  #Estimated Parameter(s):          rate = 2.260587
  #
  #Estimation Method:               mle/mme
  #
  #Estimated Quantile(s):           90'th %ile = 1.018578
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle/mme Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Gamma Distribution
Description
Estimate quantiles of a gamma distribution, and optionally construct a confidence interval for a quantile.
Usage
  eqgamma(x, p = 0.5, method = "mle", ci = FALSE, 
    ci.type = "two-sided", conf.level = 0.95, 
    normal.approx.transform = "kulkarni.powar", digits = 0)
  eqgammaAlt(x, p = 0.5, method = "mle", ci = FALSE, 
    ci.type = "two-sided", conf.level = 0.95, 
    normal.approx.transform = "kulkarni.powar", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a gamma distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the shape and scale 
parameters of the distribution.  The possible values are 
 | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.  
The default value is  | 
| ci.type | character string indicating what kind of confidence interval for the quantile to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| normal.approx.transform | character string indicating which power transformation to use.  
Possible values are  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqgamma returns estimated quantiles as well as 
estimates of the shape and scale parameters.  
The function eqgammaAlt returns estimated quantiles as well as 
estimates of the mean and coefficient of variation.
Quantiles are estimated by 1) estimating the shape and scale parameters by 
calling egamma, and then 2) calling the function 
qgamma and using the estimated values for shape 
and scale.
The confidence interval for a quantile is computed by:
- using a power transformation on the original data to induce approximate normality, 
- using eqnorm to compute the confidence interval, and then
- back-transforming the interval to create a confidence interval on the original scale. 
This is similar to what is done to create tolerance intervals for a gamma distribution 
(Krishnamoorthy et al., 2008), and there is a one-to-one relationship between confidence 
intervals for a quantile and tolerance intervals (see the DETAILS section of the 
help file for eqnorm).  The value normal.approx.transform="cube.root" 
uses the cube root transformation suggested by Wilson and Hilferty (1931) and used by 
Krishnamoorthy et al. (2008) and Singh et al. (2010b), and the value 
normal.approx.transform="fourth.root" uses the fourth root transformation suggested 
by Hawkins and Wixley (1986) and used by Singh et al. (2010b).  
The default value normal.approx.transform="kulkarni.powar" 
uses the “Optimum Power Normal Approximation Method” of Kulkarni and Powar (2010).  
The “optimum” power r is determined by:
| r = -0.0705 - 0.178 \, shape + 0.475 \, \sqrt{shape} | if shape \le 1.5 | 
| r = 0.246 | if shape > 1.5 | 
where shape denotes the estimate of the shape parameter.  Although 
Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to 
determine the power r, for the functions eqgamma and 
eqgammaAlt the power r is based on whatever estimate of 
shape is used 
(e.g., method="mle", method="bcmle", etc.). 
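The following sketch outlines this transformation-based approach (a simplified illustration only, not the exact internal code of eqgamma; the parameters and interval components below are assumed from the structure described in estimate.object):
  set.seed(250)
  dat <- rgamma(20, shape = 3, scale = 2)
  # Estimate the shape parameter and compute the Kulkarni-Powar power r
  shape.hat <- egamma(dat)$parameters["shape"]
  r <- if (shape.hat <= 1.5) {
    -0.0705 - 0.178 * shape.hat + 0.475 * sqrt(shape.hat)
  } else {
    0.246
  }
  # Power-transform the data, get an upper confidence limit for the
  # 90th percentile on the transformed scale, then back-transform
  y <- dat^r
  ucl.y <- eqnorm(y, p = 0.9, ci = TRUE, ci.type = "upper")$interval$limits["UCL"]
  ucl.y^(1/r)   # approximate upper confidence limit on the original scale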
Value
If x is a numeric vector, eqgamma and eqgammaAlt return a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqgamma and 
eqgammaAlt return a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.  In addition, if ci=TRUE, 
the returned list contains a component called interval containing the 
confidence interval information.  If x already has a component called 
interval, this component is replaced with the confidence interval information.
Note
The gamma distribution takes values on the positive real line. Special cases of the gamma are the exponential distribution and the chi-square distributions. Applications of the gamma include life testing, statistical ecology, queuing theory, inventory control, and precipitation processes. A gamma distribution starts to resemble a normal distribution as the shape parameter a tends to infinity.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribution instead.
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Krishnamoorthy K., T. Mathew, and S. Mukherjee. (2008). Normal-Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability. Technometrics, 50(1), 69–78.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
egamma, GammaDist, 
estimate.object, eqnorm, tolIntGamma.
Examples
  # Generate 20 observations from a gamma distribution with parameters 
  # shape=3 and scale=2, then estimate the 90th percentile and create 
  # a one-sided upper 95% confidence interval for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rgamma(20, shape = 3, scale = 2) 
  eqgamma(dat, p = 0.9, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.203862
  #                                 scale = 2.174928
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 9.113446
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 13.79733
  #----------
  # Compare these results with the true 90'th percentile:
  qgamma(p = 0.9, shape = 3, scale = 2)
  #[1] 10.64464
  #----------
  # Using the same data as in the previous example, use egammaAlt
  # to estimate the mean and cv based on the bias-corrected 
  # estimate of shape, and use the cube-root transformation to 
  # normality.
  eqgammaAlt(dat, p = 0.9, method = "bcmle", ci = TRUE, 
    ci.type = "upper", normal.approx.transform = "cube.root")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          mean = 4.7932408
  #                                 cv   = 0.7242165
  #
  #Estimation Method:               bcmle of 'shape'
  #
  #Estimated Quantile(s):           90'th %ile = 9.428
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 bcmle of 'shape'
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Wilson & Hilferty (1931) cube-root
  #                                 transformation to Normality
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 12.89643
  #----------
  # Clean up
  rm(dat)
  
  #--------------------------------------------------------------------
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 
  # 95% confidence  using chrysene data and assuming a lognormal 
  # distribution.  Here we will use the same chrysene data but assume a 
  # gamma distribution.
  # A beta-content upper tolerance limit with 95% coverage and 
  # 95% confidence is equivalent to the 95% upper confidence limit for 
  # the 95th percentile.
  attach(EPA.09.Ex.17.3.chrysene.df)
  Chrysene <- Chrysene.ppb[Well.type == "Background"]
  eqgamma(Chrysene, p = 0.95, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.806929
  #                                 scale = 5.286026
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           95'th %ile = 31.74348
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            Chrysene
  #
  #Sample Size:                     8
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 69.32425
  #----------
  # Clean up
  rm(Chrysene)
  detach("EPA.09.Ex.17.3.chrysene.df")
Estimate Quantiles of a Geometric Distribution
Description
Estimate quantiles of a geometric distribution.
Usage
  eqgeom(x, p = 0.5, method = "mle/mme", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a geometric distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the probability parameter.  
Possible values are  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqgeom returns estimated quantiles as well as 
the estimate of the rate parameter.  
Quantiles are estimated by 1) estimating the probability parameter by 
calling egeom, and then 2) calling the function 
qgeom and using the estimated value for 
the probability parameter.
Value
If x is a numeric vector, eqgeom returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqgeom 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The geometric distribution with parameter 
prob=p is a special case of the 
negative binomial distribution with parameters 
size=1 and prob=p.
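This relationship is easy to verify with the base R distribution functions (a minimal sketch):
  x <- 0:5
  all.equal(dgeom(x, prob = 0.2), dnbinom(x, size = 1, prob = 0.2))      # TRUE
  all.equal(qgeom(0.9, prob = 0.2), qnbinom(0.9, size = 1, prob = 0.2))  # TRUE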
The negative binomial distribution has its roots in a gambling game where participants would bet on the number of tosses of a coin necessary to achieve a fixed number of heads. The negative binomial distribution has been applied in a wide variety of fields, including accident statistics, birth-and-death processes, and modeling spatial distributions of biological organisms.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 5.
See Also
egeom, Geometric, enbinom, 
NegBinomial, estimate.object.
Examples
  # Generate an observation from a geometric distribution with parameter 
  # prob=0.2, then estimate the parameter prob and the 90'th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rgeom(1, prob = 0.2) 
  dat 
  #[1] 4 
  eqgeom(dat, p = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Geometric
  #
  #Estimated Parameter(s):          prob = 0.2
  #
  #Estimation Method:               mle/mme
  #
  #Estimated Quantile(s):           90'th %ile = 10
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle/mme Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     1
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Generalized Extreme Value Distribution
Description
Estimate quantiles of a generalized extreme value distribution.
Usage
  eqgevd(x, p = 0.5, method = "mle", pwme.method = "unbiased", 
    tsoe.method = "med", plot.pos.cons = c(a = 0.35, b = 0), digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a generalized extreme value distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the location, scale, and 
threshold parameters.  Possible values are 
 | 
| pwme.method | character string specifying what method to use to compute the 
probability-weighted moments when  | 
| tsoe.method | character string specifying the robust function to apply in the second stage of 
the two-stage order-statistics estimator when  | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqgevd returns estimated quantiles as well as 
estimates of the location, scale, and shape parameters.  
Quantiles are estimated by 1) estimating the location, scale, and shape 
parameters by calling egevd, and then 2) calling the function 
qgevd and using the estimated values for 
location, scale, and shape.
Value
If x is a numeric vector, eqgevd returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqgevd 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
Two-parameter extreme value distributions (EVD) have been applied extensively since the 1930s to several fields of study, including the distributions of hydrological and meteorological variables, human lifetimes, and strength of materials. The three-parameter generalized extreme value distribution (GEVD) was introduced by Jenkinson (1955) to model annual maximum and minimum values of meteorological events. Since then, it has been used extensively in the hydrological and meteorological fields.
The three families of EVDs are all special kinds of GEVDs.  When the shape 
parameter \kappa=0, the GEVD reduces to the Type I extreme value (Gumbel) 
distribution.  (The function zTestGevdShape allows you to test 
the null hypothesis H_0: \kappa=0.)  When \kappa > 0, the GEVD is 
the same as the Type II extreme value distribution, and when \kappa < 0 
it is the same as the Type III extreme value distribution.
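As a quick numerical check of the \kappa=0 special case (a sketch assuming the EnvStats density functions dgevd and devd; the parameter values are arbitrary):
  x <- seq(0, 6, by = 0.5)
  # With shape (kappa) = 0, the GEVD density matches the Type I (Gumbel) density
  all.equal(dgevd(x, location = 2, scale = 1, shape = 0),
            devd(x, location = 2, scale = 1))   # TRUE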
Hosking et al. (1985) compare the asymptotic and small-sample statistical 
properties of the PWME with the MLE and Jenkinson's (1969) method of sextiles.  
Castillo and Hadi (1994) compare the small-sample statistical properties of the 
MLE, PWME, and TSOE.  Hosking and Wallis (1995) compare the small-sample properties 
of unbiased L-moment estimators vs. plotting-position L-moment 
estimators.  (PWMEs can be written as linear combinations of L-moments and 
thus have equivalent statistical properties.)  Hosking and Wallis (1995) conclude 
that unbiased estimators should be used for almost all applications.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Castillo, E., and A. Hadi. (1994). Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution. Environmetrics 5, 417–432.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hosking, J.R.M. (1984). Testing Whether the Shape Parameter is Zero in the Generalized Extreme-Value Distribution. Biometrika 71(2), 367–374.
Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Office, Geneva.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.
Macleod, A.J. (1989). Remark AS R76: A Remark on Algorithm AS 215: Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 38(1), 198–199.
Prescott, P., and A.T. Walden. (1980). Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Biometrika 67(3), 723–724.
Prescott, P., and A.T. Walden. (1983). Maximum Likelihood Estimation of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computing and Simulation 16, 241–250.
See Also
egevd, Generalized Extreme Value Distribution, 
Extreme Value Distribution, eevd, 
estimate.object.
Examples
  # Generate 20 observations from a generalized extreme value distribution 
  # with parameters location=2, scale=1, and shape=0.2, then compute the 
  # MLEs of location, scale, and shape, and estimate the 90th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(498) 
  dat <- rgevd(20, location = 2, scale = 1, shape = 0.2) 
  eqgevd(dat, p = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Generalized Extreme Value
  #
  #Estimated Parameter(s):          location = 1.6144631
  #                                 scale    = 0.9867007
  #                                 shape    = 0.2632493
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 3.289912
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Hypergeometric Distribution
Description
Estimate quantiles of a hypergeometric distribution.
Usage
  eqhyper(x, m = NULL, total = NULL, k = NULL, p = 0.5, method = "mle", digits = 0)
Arguments
| x | non-negative integer indicating the number of white balls out of a sample of 
size  | 
| m | non-negative integer indicating the number of white balls in the urn.  
You must supply  | 
| total | positive integer indicating the total number of balls in the urn (i.e., 
 | 
| k | positive integer indicating the number of balls drawn without replacement from the 
urn.  Missing values ( | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the parameters of the 
hypergeometric distribution.  Possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqhyper returns estimated quantiles as well as 
estimates of the hypergeometric distribution parameters.  
Quantiles are estimated by 1) estimating the distribution parameters by 
calling ehyper, and then 2) calling the function 
qhyper and using the estimated values for 
the distribution parameters.
Value
If x is a numeric vector, eqhyper returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqhyper 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The hypergeometric distribution can be described by 
an urn model with M white balls and N black balls.  If K balls 
are drawn with replacement, then the number of white balls in the sample 
of size K follows a binomial distribution with 
parameters size=K and prob=M/(M+N).  If K balls are 
drawn without replacement, then the number of white balls in the sample of 
size K follows a hypergeometric distribution 
with parameters m=M, n=N, and k=K.
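A small sketch contrasting the two sampling schemes with the base R distribution functions (here M=10 white balls, N=30 black balls, and K=5 draws):
  M <- 10; N <- 30; K <- 5
  x <- 0:5
  # Sampling WITH replacement: the number of white balls is binomial
  dbinom(x, size = K, prob = M / (M + N))
  # Sampling WITHOUT replacement: the number of white balls is hypergeometric
  dhyper(x, m = M, n = N, k = K)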
The name “hypergeometric” comes from the fact that the probabilities associated with this distribution can be written as successive terms in the expansion of a function of a Gaussian hypergeometric series.
The hypergeometric distribution is applied in a variety of fields, including quality control and estimation of animal population size. It is also the distribution used to compute probabilities for Fisher's exact test for a 2x2 contingency table.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.
See Also
ehyper, Hypergeometric, estimate.object.
Examples
  # Generate an observation from a hypergeometric distribution with 
  # parameters m=10, n=30, and k=5, then estimate the parameter m, and
  # the 80'th percentile. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter actually estimated is m; once m is estimated, 
  # n is computed by subtracting the estimated value of m (8 in this example) 
  # from the given value of m+n (40 in this example).  The parameters 
  # n and k are shown in the output in order to provide information on 
  # all of the parameters associated with the hypergeometric distribution.
  set.seed(250) 
  dat <- rhyper(nn = 1, m = 10, n = 30, k = 5) 
  dat 
  #[1] 1   
  eqhyper(dat, total = 40, k = 5, p = 0.8) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m =  8
  #                                 n = 32
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm'
  #
  #Estimated Quantile(s):           80'th %ile = 2
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle for 'm' Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     1
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Lognormal Distribution
Description
Estimate quantiles of a lognormal distribution, and optionally construct a confidence interval for a quantile.
Usage
  eqlnorm(x, p = 0.5, method = "qmle", ci = FALSE, 
    ci.method = "exact", ci.type = "two-sided", conf.level = 0.95, 
    digits = 0)
Arguments
| x | a numeric vector of positive observations, or an object resulting from a call to an 
estimating function that assumes a lognormal distribution 
(i.e.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string indicating what method to use to estimate the quantile(s).  
The possible values are  | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.  
The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the quantile.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval for the quantile to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Quantiles and their associated confidence intervals are constructed by calling 
the function eqnorm using the log-transformed data and then 
exponentiating the quantiles and confidence limits.
In the special case when p=0.5 and method="mvue", the estimated 
median is computed using the method given in Gilbert (1987, p.172) and 
Bradu and Mundlak (1970).
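A minimal sketch of this log-scale construction (an outline only; the quantiles and interval components below are assumed from the structure described in estimate.object, and the results should agree with eqlnorm when the same quantile estimation method is used on the log scale):
  set.seed(47)
  dat <- rlnorm(20, meanlog = 3, sdlog = 0.5)
  # Construct the quantile and upper confidence limit on the log scale ...
  est.log <- eqnorm(log(dat), p = 0.9, ci = TRUE, ci.type = "upper")
  # ... then exponentiate to return to the original scale
  exp(est.log$quantiles)
  exp(est.log$interval$limits["UCL"])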
Value
If x is a numeric vector, eqlnorm returns a list of class 
"estimate" containing the estimated quantile(s) and other information.  
See estimate.object for details.
If x is the result of calling an estimation function, eqlnorm 
returns a list whose class is the same as x.  The list contains the same 
components as x, as well as components called quantiles and 
quantile.method.  In addition, if ci=TRUE, the returned list 
contains a component called interval containing the confidence interval 
information. If x already has a component called interval, this 
component is replaced with the confidence interval information.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Bradu, D., and Y. Mundlak. (1970). Estimation in Lognormal Linear Models. Journal of the American Statistical Association 65, 198-211.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Johnson, N.L., and B.L. Welch. (1940). Applications of the Non-Central t-Distribution. Biometrika 31, 362-389.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Stedinger, J. (1983). Confidence Intervals for Design Events. Journal of Hydraulic Engineering 109(1), 13-27.
Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou. (1993). Frequency Analysis of Extreme Events. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 18, pp.29-30.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
eqnorm, Lognormal, elnorm, 
estimate.object.
Examples
  # Generate 20 observations from a lognormal distribution with 
  # parameters meanlog=3 and sdlog=0.5, then estimate the 90th 
  # percentile and create a one-sided upper 95% confidence interval 
  # for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(47) 
  dat <- rlnorm(20, meanlog = 3, sdlog = 0.5) 
  eqlnorm(dat, p = 0.9, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.9482139
  #                                 sdlog   = 0.4553215
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           90'th %ile = 34.18312
  #
  #Quantile Estimation Method:      qmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 45.84008
  #----------
  # Compare these results with the true 90'th percentile:
  qlnorm(p = 0.9, meanlog = 3, sdlog = 0.5)
  #[1] 38.1214
  #----------
  # Clean up
  rm(dat)
  
  #--------------------------------------------------------------------
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence using chrysene data and assuming a lognormal 
  # distribution.
  # A beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence is equivalent to the 95% upper confidence limit for the 
  # 95th percentile.
  attach(EPA.09.Ex.17.3.chrysene.df)
  Chrysene <- Chrysene.ppb[Well.type == "Background"]
  eqlnorm(Chrysene, p = 0.95, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5085773
  #                                 sdlog   = 0.6279479
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           95'th %ile = 34.51727
  #
  #Quantile Estimation Method:      qmle
  #
  #Data:                            Chrysene
  #
  #Sample Size:                     8
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.0000
  #                                 UCL = 90.9247
  #----------
  # Clean up
  rm(Chrysene)
  detach("EPA.09.Ex.17.3.chrysene.df")
Estimate Quantiles of a Three-Parameter Lognormal Distribution
Description
Estimate quantiles of a three-parameter lognormal distribution.
Usage
  eqlnorm3(x, p = 0.5, method = "lmle", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a three-parameter lognormal distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the distribution parameters.  
Possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Quantiles are estimated by 1) estimating the distribution parameters by 
calling elnorm3, and then 2) calling the function 
qlnorm3 and using the estimated distribution  
parameters.
Value
If x is a numeric vector, eqlnorm3 returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqlnorm3 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The problem of estimating the parameters of a three-parameter lognormal distribution has been extensively discussed by Aitchison and Brown (1957, Chapter 6), Calitz (1973), Cohen (1951), Cohen (1988), Cohen and Whitten (1980), Cohen et al. (1985), Griffiths (1980), Harter and Moore (1966), Hill (1963), and Royston (1992b). Stedinger (1980) and Hoshi et al. (1984) discuss fitting the three-parameter lognormal distribution to hydrologic data.
The global maximum likelihood estimates are inadmissible.  In the past, several 
researchers have found that the local maximum likelihood estimates (lmle's) 
occasionally fail because of convergence problems, but they were not using the 
likelihood profile and reparameterization of Griffiths (1980).  Cohen (1988) 
recommends the modified methods of moments estimators over lmle's because they are 
easy to compute, they are unbiased with respect to \mu and \sigma^2 (the 
mean and variance on the log-scale), their variances are minimal or near 
minimal, and they do not suffer from regularity problems.
Because the distribution of the lmle of the threshold parameter \gamma is far 
from normal for moderate sample sizes (Griffiths, 1980), it is questionable whether 
confidence intervals for \gamma or the median based on asymptotic variances 
and covariances will perform well.  Cohen and Whitten (1980) and Cohen et al. (1985), 
however, found that the asymptotic variances and covariances are reasonably close to 
corresponding simulated variances and covariances for the modified method of moments 
estimators (method="mmme").  In a simulation study (5000 Monte Carlo trials), 
Royston (1992b) found that the coverage of confidence intervals for \gamma 
based on the likelihood profile (ci.method="likelihood.profile") was very 
close to the nominal level (94.1% for a nominal level of 95%), although not 
symmetric.  Royston (1992b) also found that the coverage of confidence intervals 
for \gamma based on the skewness method (ci.method="skewness") was also 
very close (95.4%) and symmetric.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, Chapter 5.
Calitz, F. (1973). Maximum Likelihood Estimation of the Parameters of the Three-Parameter Lognormal Distribution–a Reconsideration. Australian Journal of Statistics 15(3), 185–190.
Cohen, A.C. (1951). Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood. Journal of the American Statistical Association 46, 206–212.
Cohen, A.C. (1988). Three-Parameter Estimation. In Crow, E.L., and K. Shimizu, eds. Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 4.
Cohen, A.C., and B.J. Whitten. (1980). Estimation in the Three-Parameter Lognormal Distribution. Journal of the American Statistical Association 75, 399–404.
Cohen, A.C., B.J. Whitten, and Y. Ding. (1985). Modified Moment Estimation for the Three-Parameter Lognormal Distribution. Journal of Quality Technology 17, 92–99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 2.
Griffiths, D.A. (1980). Interval Estimation for the Three-Parameter Lognormal Distribution via the Likelihood Function. Applied Statistics 29, 58–68.
Harter, H.L., and A.H. Moore. (1966). Local-Maximum-Likelihood Estimation of the Parameters of Three-Parameter Lognormal Populations from Complete and Censored Samples. Journal of the American Statistical Association 61, 842–851.
Heyde, C.C. (1963). On a Property of the Lognormal Distribution. Journal of the Royal Statistical Society, Series B 25, 392–393.
Hill, B.M. (1963). The Three-Parameter Lognormal Distribution and Bayesian Analysis of a Point-Source Epidemic. Journal of the American Statistical Association 58, 72–84.
Hoshi, K., J.R. Stedinger, and J. Burges. (1984). Estimation of Log-Normal Quantiles: Monte Carlo Results and First-Order Approximations. Journal of Hydrology 71, 1–30.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897–912.
Stedinger, J.R. (1980). Fitting Lognormal Distributions to Hydrologic Data. Water Resources Research 16(3), 481–490.
See Also
elnorm3, Lognormal3, Lognormal, 
LognormalAlt, Normal.
Examples
  # Generate 20 observations from a 3-parameter lognormal distribution 
  # with parameters meanlog=1.5, sdlog=1, and threshold=10, then use 
  # Cohen and Whitten's (1980) modified moments estimators to estimate 
  # the parameters, and estimate the 90th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnorm3(20, meanlog = 1.5, sdlog = 1, threshold = 10) 
  eqlnorm3(dat, method = "mmme", p = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            3-Parameter Lognormal
  #
  #Estimated Parameter(s):          meanlog   = 1.5206664
  #                                 sdlog     = 0.5330974
  #                                 threshold = 9.6620403
  #
  #Estimation Method:               mmme
  #
  #Estimated Quantile(s):           90'th %ile = 18.72194
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mmme Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Lognormal Distribution Based on Type I Censored Data
Description
Estimate quantiles of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for a quantile.
Usage
  eqlnormCensored(x, censored, censoring.side = "left", p = 0.5, method = "mle", 
    ci = FALSE, ci.method = "exact.for.complete", ci.type = "two-sided", 
    conf.level = 0.95, digits = 0, nmc = 1000, seed = NULL)
Arguments
| x | a numeric vector of positive observations.  
Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible 
values are  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the mean and standard deviation on the log-scale. For singly censored data, the possible values are:  For multiply censored data, the possible values are:  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.  
The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the quantile.  The possible values are:  See the DETAILS section for more 
information.  This argument is ignored if  | 
| ci.type | character string indicating what kind of confidence interval for the quantile to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
| nmc | numeric scalar indicating the number of Monte Carlo simulations to run when 
 | 
| seed | integer supplied to the function  | 
Details
Quantiles and their associated confidence intervals are constructed by calling 
the function 
eqnormCensored using the log-transformed data and then 
exponentiating the quantiles and confidence limits.
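A sketch of the same idea applied to censored data (an outline only, using the manganese data shown in the Examples below; the quantiles and interval components are assumed from the structure described in estimateCensored.object):
  log.est <- with(EPA.09.Ex.15.1.manganese.df,
    eqnormCensored(log(Manganese.ppb), Censored,
      p = 0.9, ci = TRUE, ci.type = "upper"))
  # Exponentiate the log-scale quantile and upper confidence limit
  exp(log.est$quantiles)
  exp(log.est$interval$limits["UCL"])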
Value
eqlnormCensored returns a list of class "estimateCensored" 
containing the estimated quantile(s) and other information.  
See estimateCensored.object for details.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for a percentile, rather than rely on a single point-estimate of percentile. Confidence intervals for percentiles of a normal distribution depend on the properties of the estimators for both the mean and standard deviation.
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring (see, for example, Singh et al., 2006). Studies to evaluate the performance of a confidence interval for a percentile include: Caudill et al. (2007), Hewett and Ganser (2007), Kroll and Stedinger (1996), and Serasinghe (2010).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Caudill, S.P., L.-Y. Wong, W.E. Turner, R. Lee, A. Henderson, D. G. Patterson Jr. (2007). Percentile Estimation Using Variable Censored Data. Chemosphere 68, 169–180.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Hewett, P., and G.H. Ganser. (2007). A Comparison of Several Methods for Analyzing Censored Data. Annals of Occupational Hygiene 51(7), 611–632.
Johnson, N.L., and B.L. Welch. (1940). Applications of the Non-Central t-Distribution. Biometrika 31, 362-389.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kroll, C.N., and J.R. Stedinger. (1996). Estimation of Moments and Quantiles Using Censored Data. Water Resources Research 32(4), 1005–1012.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Serasinghe, S.K. (2010). A Simulation Comparison of Parametric and Nonparametric Estimators of Quantiles from Right Censored Data. A Report submitted in partial fulfillment of the requirements for the degree Master of Science, Department of Statistics, College of Arts and Sciences, Kansas State University, Manhattan, Kansas.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stedinger, J. (1983). Confidence Intervals for Design Events. Journal of Hydraulic Engineering 109(1), 13-27.
Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou. (1993). Frequency Analysis of Extreme Events. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 18, pp.29-30.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
eqnormCensored, enormCensored, 
tolIntNormCensored, 
elnormCensored, Lognormal, 
estimateCensored.object.
Examples
  # Generate 15 observations from a lognormal distribution with 
  # parameters meanlog=3 and sdlog=0.5, and censor observations less than 10. 
  # Then generate 15 more observations from this distribution and  censor 
  # observations less than 9.
  # Then estimate the 90th percentile and create a one-sided upper 95% 
  # confidence interval for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(47) 
  x.1 <- rlnorm(15, meanlog = 3, sdlog = 0.5) 
  sort(x.1)
  # [1]  8.051717  9.651611 11.671282 12.271247 12.664108 17.446124
  # [7] 17.707301 20.238069 20.487219 21.025510 21.208197 22.036554
  #[13] 25.710773 28.661973 54.453557
  censored.1 <- x.1 < 10
  x.1[censored.1] <- 10
  x.2 <- rlnorm(15, meanlog = 3, sdlog = 0.5) 
  sort(x.2)
  # [1]  6.289074  7.511164  8.988267  9.179006 12.869408 14.130081
  # [7] 16.941937 17.060513 19.287572 19.682126 20.363893 22.750203
  #[13] 24.744306 28.089325 37.792873
  censored.2 <- x.2 < 9
  x.2[censored.2] <- 9
  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)
  eqlnormCensored(x, censored, p = 0.9, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):               9 10 
  #
  #Estimated Parameter(s):          meanlog = 2.8099300
  #                                 sdlog   = 0.5137151
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 32.08159
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #  
  #Sample Size:                     30
  #
  #Percent Censored:                16.66667%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             30
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 41.38716
  #----------
  # Compare these results with the true 90'th percentile:
  qlnorm(p = 0.9, meanlog = 3, sdlog = 0.5)
  #[1] 38.1214
  #----------
  # Clean up
  rm(x.1, censored.1, x.2, censored.2, x, censored)
  
  #--------------------------------------------------------------------
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using 
  # manganese concentrations (ppb) in groundwater at five background wells. 
  # In EnvStats these data are stored in the data frame 
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and standard deviation using the MLE, 
  # and then construct an upper 95% confidence limit for the 90th percentile. 
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df, 
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)  
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean, standard deviation, and 90th percentile 
  # on the log-scale using the MLE, and construct an upper 95% 
  # confidence limit for the 90th percentile:
  #---------------------------------------------------------------
  with(EPA.09.Ex.15.1.manganese.df,
    eqlnormCensored(Manganese.ppb, Censored, 
      p = 0.9, ci = TRUE, ci.type = "upper"))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 52.14674
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             25
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =   0.0000
  #                                 UCL = 110.9305
Estimate Quantiles of a Logistic Distribution
Description
Estimate quantiles of a logistic distribution.
Usage
  eqlogis(x, p = 0.5, method = "mle", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a logistic distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method to use to estimate the distribution parameters.  
Possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqlogis returns estimated quantiles as well as 
estimates of the location and scale parameters.  
Quantiles are estimated by 1) estimating the location and scale parameters by 
calling elogis, and then 2) calling the function 
qlogis and using the estimated values for 
location and scale.
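For example, the following minimal sketch (assuming the EnvStats package is loaded, and using the same simulated data as the example below) reproduces the two-step computation by hand; the object names dat and est are used only for illustration:
  set.seed(250)
  dat <- rlogis(20)
  est <- elogis(dat)                        # Step 1: estimate location and scale
  qlogis(0.9, location = est$parameters["location"],
    scale = est$parameters["scale"])        # Step 2: plug the estimates into qlogis
  eqlogis(dat, p = 0.9)$quantiles           # same value (1.573167)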
Value
If x is a numeric vector, eqlogis returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqlogis 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The logistic distribution is defined on the real line and is unimodal and symmetric about its location parameter (the mean). It has longer tails than a normal (Gaussian) distribution. It is used to model growth curves and bioassay data.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
elogis, Logistic, estimate.object.
Examples
  # Generate 20 observations from a logistic distribution with 
  # parameters location=0 and scale=1, then estimate the parameters 
  # and estimate the 90th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlogis(20) 
  eqlogis(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Logistic
  #
  #Estimated Parameter(s):          location = -0.2181845
  #                                 scale    =  0.8152793
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 1.573167
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of a Negative Binomial Distribution
Description
Estimate quantiles of a negative binomial distribution.
Usage
  eqnbinom(x, size = NULL, p = 0.5, method = "mle/mme", digits = 0)
Arguments
| x | vector of non-negative integers indicating the number of trials that took place 
before  | 
| size | vector of positive integers indicating the number of “successes” that 
must be observed before the trials are stopped.  Missing ( | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the probability parameter.  
Possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqnbinom returns estimated quantiles as well as 
estimates of the prob parameter.  
Quantiles are estimated by 1) estimating the prob parameter by 
calling enbinom, and then 2) calling the function 
qnbinom and using the estimated value for 
prob.
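As a minimal sketch (assuming EnvStats is loaded, and mirroring the single-observation example below with size=2), the two-step computation can be reproduced by hand:
  dat <- 5                                   # the observation used in the example below
  est <- enbinom(dat, size = 2)              # Step 1: estimate prob
  qnbinom(0.9, size = 2, prob = est$parameters["prob"])   # Step 2: plug into qnbinom
  eqnbinom(dat, size = 2, p = 0.9)$quantiles # same value (11)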
Value
If x is a numeric vector, eqnbinom returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqnbinom 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The negative binomial distribution has its roots in a gambling game where participants would bet on the number of tosses of a coin necessary to achieve a fixed number of heads. The negative binomial distribution has been applied in a wide variety of fields, including accident statistics, birth-and-death processes, and modeling spatial distributions of biological organisms.
The geometric distribution with parameter prob=p 
is a special case of the negative binomial distribution with parameters 
size=1 and prob=p.
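This relationship is easy to verify numerically in base R; for example:
  k <- 0:5
  all.equal(dgeom(k, prob = 0.3), dnbinom(k, size = 1, prob = 0.3))
  #[1] TRUE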
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 5.
See Also
enbinom, NegBinomial, egeom, 
Geometric, estimate.object.
Examples
  # Generate an observation from a negative binomial distribution with 
  # parameters size=2 and prob=0.2, then estimate the parameter prob 
  # and the 90th percentile. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter that is estimated is prob; the parameter 
  # size is supplied in the call to eqnbinom.  The parameter size is printed in 
  # order to show all of the parameters associated with the distribution.
  set.seed(250) 
  dat <- rnbinom(1, size = 2, prob = 0.2) 
  dat
  #[1] 5
  eqnbinom(dat, size = 2, p = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Negative Binomial
  #
  #Estimated Parameter(s):          size = 2.0000000
  #                                 prob = 0.2857143
  #
  #Estimation Method:               mle/mme for 'prob'
  #
  #Estimated Quantile(s):           90'th %ile = 11
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle/mme for 'prob' Estimators
  #
  #Data:                            dat, 2
  #
  #Sample Size:                     1
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of a Normal Distribution
Description
Estimate quantiles of a normal distribution, and optionally construct a confidence interval for a quantile.
Usage
  eqnorm(x, p = 0.5, method = "qmle", ci = FALSE, 
    ci.method = "exact", ci.type = "two-sided", conf.level = 0.95, 
    digits = 0, warn = TRUE)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a normal (Gaussian) distribution 
(i.e.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string indicating what method to use to estimate the quantile(s).  
Currently the only possible value is  | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.  
The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the quantile.  The possible values are  | 
| ci.type | character string indicating what kind of confidence interval for the quantile to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
| warn | logical scalar indicating whether to warn in the case when  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Quantiles are estimated by 1) estimating the mean and standard deviation parameters by 
calling enorm with method="mvue", and then 
2) calling the function qnorm and using the estimated values 
for mean and standard deviation.  This estimator of the p'th quantile is 
sometimes called the quasi-maximum likelihood estimator (qmle; Cohn et al., 1989) 
because if the maximum likelihood estimator of standard deviation were used 
in place of the minimum variance unbiased one, then this estimator of the quantile 
would be the mle of the p'th quantile.
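A minimal sketch of the qmle computation (assuming EnvStats is loaded, and using the same simulated data as the first example below):
  set.seed(47)
  dat <- rnorm(20, mean = 10, sd = 2)
  qnorm(0.9, mean = mean(dat), sd = sd(dat))   # mvue estimates plugged into qnorm
  eqnorm(dat, p = 0.9)$quantiles               # same value (12.12693)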
When ci=TRUE and ci.method="exact", the confidence interval for a 
quantile is computed by using the relationship between a confidence interval for 
a quantile and a tolerance interval.  Specifically, it can be shown 
(e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the 
p'th quantile with confidence level 100(1-\alpha)\% is equivalent to 
an upper \beta-content tolerance interval with coverage 100p\% and 
confidence level 100(1-\alpha)\%.  Also, a lower confidence interval for 
the p'th quantile with confidence level 100(1-\alpha)\% is equivalent 
to a lower \beta-content tolerance interval with coverage 100(1-p)\% and 
confidence level 100(1-\alpha)\%.  See the help file for tolIntNorm 
for information on tolerance intervals for a normal distribution.
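The following sketch illustrates this equivalence (assuming EnvStats is loaded): the upper 95% confidence limit for the 95th percentile matches the upper tolerance limit with 95% coverage and 95% confidence returned by tolIntNorm.
  set.seed(47)
  dat <- rnorm(20, mean = 10, sd = 2)
  eqnorm(dat, p = 0.95, ci = TRUE, ci.type = "upper")$interval$limits
  tolIntNorm(dat, coverage = 0.95, cov.type = "content",
    ti.type = "upper", conf.level = 0.95)$interval$limits
  # The two upper limits above should agree.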
When ci=TRUE and ci.method="normal.approx", the confidence interval for a 
quantile is computed by assuming the estimated quantile has an approximately normal 
distribution and using the asymptotic variance to construct the confidence interval 
(see Stedinger, 1983; Stedinger et al., 1993).
Value
If x is a numeric vector, eqnorm returns a list of class 
"estimate" containing the estimated quantile(s) and other information.  
See estimate.object for details.
If x is the result of calling an estimation function, eqnorm 
returns a list whose class is the same as x.  The list contains the same 
components as x, as well as components called quantiles and 
quantile.method.  In addition, if ci=TRUE, the returned list 
contains a component called interval containing the confidence interval 
information. If x already has a component called interval, this 
component is replaced with the confidence interval information.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Johnson, N.L., and B.L. Welch. (1940). Applications of the Non-Central t-Distribution. Biometrika 31, 362-389.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Stedinger, J. (1983). Confidence Intervals for Design Events. Journal of Hydraulic Engineering 109(1), 13-27.
Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou. (1993). Frequency Analysis of Extreme Events. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 18, pp.29-30.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
enorm, tolIntNorm, Normal,  
estimate.object.
Examples
  # Generate 20 observations from a normal distribution with 
  # parameters mean=10 and sd=2, then estimate the 90th 
  # percentile and create a one-sided upper 95% confidence interval 
  # for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(47) 
  dat <- rnorm(20, mean = 10, sd = 2) 
  eqnorm(dat, p = 0.9, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 9.792856
  #                                 sd   = 1.821286
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           90'th %ile = 12.12693
  #
  #Quantile Estimation Method:      qmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 13.30064
  #----------
  # Compare these results with the true 90'th percentile:
  qnorm(p = 0.9, mean = 10, sd = 2)
  #[1] 12.56310
  #----------
  # Clean up
  rm(dat)
  #==========
  # Example 21-4 of USEPA (2009, p. 21-13) shows how to construct a 
  # 99% lower confidence limit for the 95th percentile using aldicarb 
  # data and assuming a normal distribution.  The data for this 
  # example are stored in EPA.09.Ex.21.1.aldicarb.df.
  # The facility permit has established an ACL of 30 ppb that should not 
  # be exceeded more than 5% of the time.  Thus, if the lower confidence limit 
  # for the 95th percentile is greater than 30 ppb, the well is deemed to be 
  # out of compliance.
  # Look at the data
  #-----------------
  head(EPA.09.Ex.21.1.aldicarb.df)
  #  Month   Well Aldicarb.ppb
  #1     1 Well.1         19.9
  #2     2 Well.1         29.6
  #3     3 Well.1         18.7
  #4     4 Well.1         24.2
  #5     1 Well.2         23.7
  #6     2 Well.2         21.9
  longToWide(EPA.09.Ex.21.1.aldicarb.df, 
    "Aldicarb.ppb", "Month", "Well", paste.row.name = TRUE)
  #        Well.1 Well.2 Well.3
  #Month.1   19.9   23.7    5.6
  #Month.2   29.6   21.9    3.3
  #Month.3   18.7   26.9    2.3
  #Month.4   24.2   26.1    6.9
  # Estimate the 95th percentile and compute the lower 
  # 99% confidence limit for Well 1.
  #---------------------------------------------------
  with(EPA.09.Ex.21.1.aldicarb.df, 
    eqnorm(Aldicarb.ppb[Well == "Well.1"], p = 0.95, ci = TRUE, 
      ci.type = "lower", conf.level = 0.99))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 23.10000
  #                                 sd   =  4.93491
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           95'th %ile = 31.2172
  #
  #Quantile Estimation Method:      qmle
  #
  #Data:                            Aldicarb.ppb[Well == "Well.1"]
  #
  #Sample Size:                     4
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                99%
  #
  #Confidence Interval:             LCL = 25.2855
  #                                 UCL =     Inf
 
  # Now compute the 99% lower confidence limit for each of the three 
  # wells all at once.
  #------------------------------------------------------------------
  LCLs <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sapply(split(Aldicarb.ppb, Well), 
      function(x) eqnorm(x, p = 0.95, method = "qmle", ci = TRUE, 
      ci.type = "lower", conf.level = 0.99)$interval$limits["LCL"]))
  round(LCLs, 2)
  #Well.1.LCL Well.2.LCL Well.3.LCL 
  #     25.29      25.66       5.46 
  LCLs > 30
  #Well.1.LCL Well.2.LCL Well.3.LCL 
  #     FALSE      FALSE      FALSE
  # Clean up
  #---------
  rm(LCLs)
  
  #==========
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence using chrysene data and assuming a lognormal 
  # distribution.
  # A beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence is equivalent to the 95% upper confidence limit for the 
  # 95th percentile.
  # Here we will construct a 95% upper confidence limit for the 95th 
  # percentile based on the log-transformed data, then exponentiate the 
  # result to get the confidence limit on the original scale.  Note that 
  # it is easier to just use the function eqlnorm with the original data 
  # to achieve the same result.
  attach(EPA.09.Ex.17.3.chrysene.df)
  log.Chrysene <- log(Chrysene.ppb[Well.type == "Background"])
  eqnorm(log.Chrysene, p = 0.95, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 2.5085773
  #                                 sd   = 0.6279479
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           95'th %ile = 3.54146
  #
  #Quantile Estimation Method:      qmle
  #
  #Data:                            log.Chrysene
  #
  #Sample Size:                     8
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 4.510032
  exp(4.510032)
  #[1] 90.92473
  #----------
  # Clean up
  rm(log.Chrysene)
  detach("EPA.09.Ex.17.3.chrysene.df")
Estimate Quantiles of a Normal Distribution Based on Type I Censored Data
Description
Estimate quantiles of a normal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for a quantile.
Usage
  eqnormCensored(x, censored, censoring.side = "left", p = 0.5, method = "mle", 
    ci = FALSE, ci.method = "exact.for.complete", ci.type = "two-sided", 
    conf.level = 0.95, digits = 0, nmc = 1000, seed = NULL)
Arguments
| x | a numeric vector of observations.  
Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible 
values are  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the mean and standard deviation. For singly censored data, the possible values are:  For multiply censored data, the possible values are:  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.  
The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence interval 
for the quantile.  The possible values are:  See the DETAILS section for more information.  This argument is ignored if  | 
| ci.type | character string indicating what kind of confidence interval for the quantile to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  
The default value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
| nmc | numeric scalar indicating the number of Monte Carlo simulations to run when 
 | 
| seed | integer supplied to the function  | 
Details
Estimating Quantiles 
Quantiles are estimated by:
- estimating the mean and standard deviation parameters by calling - enormCensored, and then
- calling the function - qnormand using the estimated values for the mean and standard deviation.
The estimated quantile thus depends on the method of estimating the mean and 
standard deviation.
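A minimal sketch of this two-step computation for censored data (assuming EnvStats is loaded; the censoring level of 8 is chosen only for illustration):
  set.seed(47)
  x <- rnorm(20, mean = 10, sd = 2)
  censored <- x < 8
  x[censored] <- 8                                # left-censor at 8
  est <- enormCensored(x, censored)               # MLE of mean and sd (the default method)
  qnorm(0.9, mean = est$parameters["mean"], sd = est$parameters["sd"])
  eqnormCensored(x, censored, p = 0.9)$quantiles  # same value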
Confidence Intervals for Quantiles 
Exact Method When Data are Complete (ci.method="exact.for.complete") 
When ci.method="exact.for.complete", the function eqnormCensored 
calls the function eqnorm, supplying it with the estimated mean 
and standard deviation, and setting the argument 
ci.method="exact".  Thus, this 
is the exact method for computing a confidence interval for a quantile had the data 
been complete.  Because the data have been subjected to Type I censoring, this method 
of constructing a confidence interval for the quantile is an approximation.
Normal Approximation (ci.method="normal.approx") 
When ci.method="normal.approx", the function eqnormCensored 
calls the function eqnorm, supplying it with the estimated mean 
and standard deviation, and setting the argument 
 
ci.method="normal.approx".  
Thus, this is the normal approximation method for computing a confidence interval 
for a quantile had the data been complete.  Because the data have been subjected 
to Type I censoring, this method of constructing a confidence interval for the 
quantile is an approximation both because of the normal approximation and because 
the estimates of the mean and standard deviation are based on censored, instead of 
complete, data.
Generalized Pivotal Quantity (ci.method="gpq") 
When ci.method="gpq", the function eqnormCensored uses the 
relationship between confidence intervals for quantiles and tolerance intervals 
and calls the function tolIntNormCensored with the argument 
ti.method="gpq" to construct the confidence interval.  
Specifically, it can be shown 
(e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the 
p'th quantile with confidence level 100(1-\alpha)\% is equivalent to 
an upper \beta-content tolerance interval with coverage 100p\% and 
confidence level 100(1-\alpha)\%.  Also, a lower confidence interval for 
the p'th quantile with confidence level 100(1-\alpha)\% is equivalent 
to a lower \beta-content tolerance interval with coverage 100(1-p)\% and 
confidence level 100(1-\alpha)\%.
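For example, a GPQ-based one-sided upper confidence limit can be requested as in the following sketch (assuming EnvStats is loaded; the data setup repeats the sketch above, and nmc and seed control the Monte Carlo simulation):
  set.seed(47)
  x <- rnorm(20, mean = 10, sd = 2)
  censored <- x < 8
  x[censored] <- 8
  eqnormCensored(x, censored, p = 0.9, ci = TRUE, ci.type = "upper",
    ci.method = "gpq", nmc = 1000, seed = 47)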
Value
eqnormCensored returns a list of class "estimateCensored" 
containing the estimated quantile(s) and other information.  
See estimateCensored.object for details.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for a percentile than to rely on a single point estimate of the percentile. Confidence intervals for percentiles of a normal distribution depend on the properties of the estimators for both the mean and standard deviation.
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring (see, for example, Singh et al., 2006). Studies to evaluate the performance of a confidence interval for a percentile include: Caudill et al. (2007), Hewett and Ganner (2007), Kroll and Stedinger (1996), and Serasinghe (2010).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Caudill, S.P., L.-Y. Wong, W.E. Turner, R. Lee, A. Henderson, D. G. Patterson Jr. (2007). Percentile Estimation Using Variable Censored Data. Chemosphere 68, 169–180.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Hewett, P., and G.H. Ganser. (2007). A Comparison of Several Methods for Analyzing Censored Data. Annals of Occupational Hygiene 51(7), 611–632.
Johnson, N.L., and B.L. Welch. (1940). Applications of the Non-Central t-Distribution. Biometrika 31, 362-389.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kroll, C.N., and J.R. Stedinger. (1996). Estimation of Moments and Quantiles Using Censored Data. Water Resources Research 32(4), 1005–1012.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Serasinghe, S.K. (2010). A Simulation Comparison of Parametric and Nonparametric Estimators of Quantiles from Right Censored Data. A Report submitted in partial fulfillment of the requirements for the degree Master of Science, Department of Statistics, College of Arts and Sciences, Kansas State University, Manhattan, Kansas.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stedinger, J. (1983). Confidence Intervals for Design Events. Journal of Hydraulic Engineering 109(1), 13-27.
Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou. (1993). Frequency Analysis of Extreme Events. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 18, pp.29-30.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
enormCensored, tolIntNormCensored, Normal,  
estimateCensored.object.
Examples
  # Generate 15 observations from a normal distribution with 
  # parameters mean=10 and sd=2, and censor observations less than 8. 
  # Then generate 15 more observations from this distribution and  censor 
  # observations less than 7.
  # Then estimate the 90th percentile and create a one-sided upper 95% 
  # confidence interval for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(47) 
  x.1 <- rnorm(15, mean = 10, sd = 2) 
  sort(x.1)
  # [1]  6.343542  7.068499  7.828525  8.029036  8.155088  9.436470
  # [7]  9.495908 10.030262 10.079205 10.182946 10.217551 10.370811
  #[13] 10.987640 11.422285 13.989393
  censored.1 <- x.1 < 8
  x.1[censored.1] <- 8
  x.2 <- rnorm(15, mean = 10, sd = 2) 
  sort(x.2)
  # [1]  5.355255  6.065562  6.783680  6.867676  8.219412  8.593224
  # [7]  9.319168  9.347066  9.837844  9.918844 10.055054 10.498296
  #[13] 10.834382 11.341558 12.528482
  censored.2 <- x.2 < 7
  x.2[censored.2] <- 7
  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)
  eqnormCensored(x, censored, p = 0.9, ci = TRUE, ci.type = "upper")
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              7 8 
  #
  #Estimated Parameter(s):          mean = 9.390624
  #                                 sd   = 1.827156
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 11.73222
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #  
  #Sample Size:                     30
  #
  #Percent Censored:                16.66667%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             30
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 12.63808
  #----------
  # Compare these results with the true 90'th percentile:
  qnorm(p = 0.9, mean = 10, sd = 2)
  #[1] 12.56310
  #----------
  # Clean up
  rm(x.1, censored.1, x.2, censored.2, x, censored)
  
  #==========
  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using 
  # manganese concentrations (ppb) in groundwater at five background wells. 
  # In EnvStats these data are stored in the data frame 
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will estimate the mean and standard deviation using the MLE, 
  # and then construct an upper 95% confidence limit for the 90th percentile. 
  # We will log-transform the original observations and then call 
  # eqnormCensored.  Alternatively, we could have more simply called 
  # eqlnormCensored.
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df, 
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)  
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now estimate the mean, standard deviation, and 90th percentile 
  # on the log-scale using the MLE, and construct an upper 95% 
  # confidence limit for the 90th percentile on the log-scale:
  #---------------------------------------------------------------
  est.list <- with(EPA.09.Ex.15.1.manganese.df,
    eqnormCensored(log(Manganese.ppb), Censored, 
      p = 0.9, ci = TRUE, ci.type = "upper"))
  est.list
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              0.6931472 1.6094379 
  #
  #Estimated Parameter(s):          mean = 2.215905
  #                                 sd   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 3.954062
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            log(Manganese.ppb)
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             25
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 4.708904
  # To estimate the 90th percentile on the original scale, 
  # we need to exponentiate the results
  #-------------------------------------------------------
  exp(est.list$quantiles)
  #90'th %ile 
  #  52.14674 
  exp(est.list$interval$limits)
  #     LCL      UCL 
  #  0.0000 110.9305
  #----------
  # Clean up
  #---------
  rm(est.list)
Estimate Quantiles of a Distribution Nonparametrically
Description
Estimate quantiles of a distribution, and optionally create confidence intervals for them, without making any assumptions about the form of the distribution.
Usage
  eqnpar(x, p = 0.5, type = 7, ci = FALSE, lcl.rank = NULL, ucl.rank = NULL,
    lb = -Inf, ub = Inf, ci.type = "two-sided",
    ci.method = "interpolate", digits = getOption("digits"),
    approx.conf.level = 0.95, min.coverage = TRUE, tol = 0)
Arguments
| x | a numeric vector of observations.  Missing ( | 
| p | numeric vector of probabilities for which quantiles will be estimated.
All values of  | 
| type | an integer between 1 and 9 indicating which algorithm to use to estimate the
quantile.  The default value is  | 
| ci | logical scalar indicating whether to compute a confidence interval for the quantile.
The default value is  | 
| lcl.rank,ucl.rank | positive integers indicating the ranks of the order statistics that are used
for the lower and upper bounds of the confidence interval for the specified
quantile.  Both arguments must be integers between 1 and the number of non-missing
values in  | 
| lb,ub | scalars indicating lower and upper bounds on the distribution.  By default,  | 
| ci.type | character string indicating what kind of confidence interval to compute.
The possible values are  | 
| ci.method | character string indicating the method to use to construct the confidence interval.
The possible values are  | 
| digits | an integer indicating the number of decimal places to round to when printing out
the value of  | 
| approx.conf.level | a scalar between 0 and 1 indicating the desired confidence level of the confidence
interval.  The default value is  | 
| min.coverage | for the case when  | 
| tol | for the case when  | 
Details
If x contains any missing (NA), undefined (NaN) or
infinite (Inf, -Inf) values, they will be removed prior to
performing the estimation.
Estimation 
The function eqnpar calls the R function quantile
to estimate quantiles.
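For example (assuming EnvStats is loaded, and using the same simulated data as the examples below), the point estimate is identical to the one returned by quantile with the default type=7:
  set.seed(250)
  dat <- rcauchy(20, location = 0, scale = 1)
  quantile(dat, probs = 0.75, type = 7)
  eqnpar(dat, p = 0.75)$quantiles    # same value (1.524903)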
Confidence Intervals 
Let x_1, x_2, \ldots, x_n denote a sample of n independent and
identically distributed random variables from some arbitrary distribution.
Furthermore, let x_{(i)} denote the i'th order statistic for these
n random variables. That is,
x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \;\;\;\;\;\; (1)
Finally, let x_p denote the p'th quantile of the distribution, that is:
Pr(X < x_p) \le p \;\;\;\;\;\; (2)
Pr(X \le x_p) \ge p \;\;\;\;\;\; (3)
It can be shown (e.g., Conover, 1980, pp. 114-116) that for the i'th order
statistic:
Pr[x_p < x_{(i)}] = F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (4)
for a continuous distribution and
Pr[x_p < x_{(i)}] \le F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (5)
Pr[x_p \le x_{(i)}] \ge F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (6)
for a discrete distribution,
where F_{B(n,p)}[y] denotes the cumulative distribution function of a
binomial random variable with parameters size=n
and prob=p evaluated at y.  These facts are used to construct
confidence intervals for quantiles (see below).
Two-Sided Confidence Interval (ci.type="two-sided") 
A two-sided nonparametric confidence interval for the p'th quantile is
constructed as:
[x_{(r)}, x_{(s)}] \;\;\;\;\;\; (7)
where
1 \le r \le (n-1) \;\;\;\;\;\; (8)
2 \le s \le n \;\;\;\;\;\; (9)
r < s \;\;\;\;\;\; (10)
Note that the argument lcl.rank corresponds to r, and the argument
ucl.rank corresponds to s.
This confidence interval has an associated confidence level that is at least as large as:
F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (11)
for a discrete distribution and exactly equal to this value for a continuous distribution. This is because by Equations (4)-(6) above:
Pr[x_{(r)} \le x_p \le x_{(s)}]
= Pr[x_p \le x_{(s)}] - Pr[x_p < x_{(r)}]
\ge F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (12)
with equality if the distribution is continuous.
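In base R, this confidence level is simply a difference of binomial distribution function values.  For example, with n=20, p=0.75, and the ranks r=12 and s=19 used in the ci.method="exact" example in the EXAMPLES section below:
  n <- 20; p <- 0.75; r <- 12; s <- 19
  pbinom(s - 1, size = n, prob = p) - pbinom(r - 1, size = n, prob = p)
  #[1] 0.9347622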
Exact Method (ci.method="exact") 
When lcl.rank (r) and ucl.rank (s) are not supplied by
the user, and ci.method="exact", r and s are initially chosen such that
r is the smallest integer satisfying equation (13) below, and s is
the largest integer satisfying equation (14) below:
F_{B(n,p)}[r-1] \ge \frac{\alpha}{2} \;\;\;\;\;\; (13)
F_{B(n,p)}[s-1] \le 1-\frac{\alpha}{2} \;\;\;\;\;\; (14)
where \alpha = 1 - approx.conf.level.
The values of r and s are then each varied by \pm 2 (with the
restrictions r \ge 1, s \le n, and r < s), and confidence levels
computed for each of these combinations.  If min.coverage=TRUE, the
combination of r and s is selected that provides the closest coverage
to approx.conf.level, with coverage greater than or equal to
approx.conf.level.  If min.coverage=FALSE, the
combination of r and s is selected that provides the closest coverage
to approx.conf.level, with coverage less than or equal to
approx.conf.level + tol.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx") 
Here the term “Approximate” simply refers to the method of initially
choosing the ranks for the lower and upper bounds.  As for ci.method="exact",
the confidence level associated with the confidence interval is exact if the
underlying distribution is continuous.
When lcl.rank (r) and ucl.rank (s) are not supplied by
the user and ci.method="normal.approx", r and s are initially chosen
such that:
r = np - h \;\;\;\;\;\; (15)
s = np + h \;\;\;\;\;\; (16)
where
h = t_{n-1, 1-\alpha/2} \sqrt{np(1-p)} \;\;\;\;\;\; (17)
and t_{\nu,q} denotes the q'th quantile of
Student's t-distribution with \nu degrees of freedom,
and \alpha = 1 - approx.conf.level (Conover, 1980, p. 112).
With the restrictions that r \ge 1 and s \le n,
r is rounded down to the nearest integer r = r^* = floor(r) and
s is rounded up to the nearest integer s = s^* = ceiling(s).
Again, with the restriction that s \le n,
if the confidence level using s = s^* + 1 and r = r^* is less than or equal
to approx.conf.level, then s is set to s = s^* + 1.
Once this has been checked, with the restriction that r \ge 1,
if the confidence level using the current value of s
and r = r^* - 1 is less than or equal to approx.conf.level, then
r is set to r = r^* - 1.
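As a base-R sketch of the initial rank calculation in Equations (15)-(17) (the values n=20, p=0.75, and a 90% confidence level are chosen only for illustration):
  n <- 20; p <- 0.75; alpha <- 0.10
  h <- qt(1 - alpha/2, df = n - 1) * sqrt(n * p * (1 - p))
  c(r = floor(n * p - h), s = ceiling(n * p + h))   # initial ranks before adjustment
  # r  s 
  #11 19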
Interpolate Method (ci.method="interpolate") 
Let \gamma denote the desired confidence level associated with the
confidence interval for the p'th quantile.
Based on the work of Hettmansperger and Sheather (1986), Nyblom (1992) showed that
if [-\infty, x_{(w+1)}] is a one-sided upper confidence interval for the
p'th quantile with associated confidence level \gamma_{w+1}, and
\gamma_{w+1} \ge \gamma \ge \gamma_{w}, then the one-sided upper
confidence interval
[-\infty, (1-\lambda) x_{(w)} + \lambda x_{(w+1)}] \;\;\;\;\;\; (18)
where
\lambda = \lambda(\beta, p, w, n)
= [ 1 + \frac{w(1-p)(\pi_{w+1}-\beta)}{(n-w)p(\beta-\pi_w)} ] ^ {-1} \;\;\;\;\;\; (19)
\pi_w = F_{B(n,p)}[w-1] \;\;\;\;\;\; (20)
\beta = \gamma \;\;\;\;\;\; (21)
has associated confidence level approximately equal to \gamma for a wide
range of distributions.
Thus, to construct an approximate two-sided confidence interval for the
p'th quantile with confidence level \gamma, if
[x_{(r)}, x_{(s)}] has confidence level \ge \gamma and
[x_{(r+1)}, x_{(s-1)}] has confidence level \le \gamma, then
the lower bound of the two-sided confidence interval is computed as:
(1-\lambda) x_{(r)} + \lambda x_{(r+1)} \;\;\;\;\;\; (21)
where
\beta = \alpha/2; \;\;\;\; \alpha = 1 - \gamma \;\;\;\;\;\; (22)
and the upper bound of the two-sided confidence interval is computed as:
(1-\lambda) x_{(s-1)} + \lambda x_{(s)} \;\;\;\;\;\; (23)
where
\beta = 1 - \alpha/2 \;\;\;\;\;\; (24)
The values of r and s in Equations (21) and (23) are computed by
using ci.method="exact" with the argument min.coverage=TRUE.
One-Sided Lower Confidence Interval (ci.type="lower") 
A one-sided lower nonparametric confidence interval for the p'th quantile is
constructed as:
[x_{(r)}, ub] \;\;\;\;\;\; (25)
where ub denotes the value of the ub argument (the user-supplied
upper bound).
Exact Method (ci.method="exact") 
When lcl.rank (r) is not supplied by the user, and
ci.method="exact", r is initially chosen such that it is the
smallest integer satisfying the following equation:
F_{B(n,p)}[r-1] \ge \alpha \;\;\;\;\;\; (26)
where \alpha = 1 - approx.conf.level.
The value of r is varied by \pm 2 (with the
restrictions r \ge 1 and r \le n), and confidence levels
computed for each of these combinations.  If min.coverage=TRUE, the
value of r is selected that provides the closest coverage
to approx.conf.level, with coverage greater than or equal to
approx.conf.level.  If min.coverage=FALSE, the
value of r  is selected that provides the closest coverage
to approx.conf.level, with coverage less than or equal to
approx.conf.level + tol.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx") 
When lcl.rank (r) is not supplied by the user and
ci.method="normal.approx", r is initially chosen such that
r = np - t_{n-1, 1-\alpha} \sqrt{np(1-p)} \;\;\;\;\;\; (27)
With the restriction that r \ge 1 and r \le n,
if p is less than 0.5 then r is rounded up to the nearest integer,
otherwise it is rounded down to the nearest integer.  Denote this value by
r^*.  With the restriction that r \ge 1, if the confidence level using
r^* - 1 is less than or equal to approx.conf.level, then
r is set to r = r^* - 1.
Interpolate Method (ci.method="interpolate") 
Let \gamma denote the desired confidence level associated with the
confidence interval for the p'th quantile.
To construct an approximate one-sided lower confidence interval for the
p'th quantile with confidence level \gamma, if
[x_{(r)}, ub] has confidence level \ge \gamma and
[x_{(r+1)}, ub] has confidence level \le \gamma, then
the lower bound of the confidence interval is computed as:
(1-\lambda) x_{(r)} + \lambda x_{(r+1)} \;\;\;\;\;\; (28)
where
\beta = \alpha; \;\;\;\; \alpha = 1 - \gamma \;\;\;\;\;\; (29)
The value of r in Equation (28) is computed by
using ci.method="exact" with the arguments
ci.type="lower" and min.coverage=TRUE.
One-Sided Upper Confidence Interval (ci.type="upper") 
A one-sided upper nonparametric confidence interval for the p'th quantile is
constructed as:
[lb, x_{(s)}] \;\;\;\;\;\; (30)
where lb denotes the value of the lb argument (the user-supplied
lower bound).
Exact Method (ci.method="exact") 
When ucl.rank (s) is not supplied by the user, and
ci.method="exact", s is initially chosen such that it is the
largest integer satisfying the following equation:
F_{B(n,p)}[s-1] \le 1-\alpha \;\;\;\;\;\; (31)
where \alpha = 1 - approx.conf.level.
The value of s is varied by \pm 2 (with the
restrictions s \ge 1 and s \le n), and confidence levels
computed for each of these combinations.  If min.coverage=TRUE, the
value of s is selected that provides the closest coverage
to approx.conf.level, with coverage greater than or equal to
approx.conf.level.  If min.coverage=FALSE, the
value of s  is selected that provides the closest coverage
to approx.conf.level, with coverage less than or equal to
approx.conf.level + tol.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx") 
When ucl.rank (s) is not supplied by the user and
ci.method="normal.approx", s is initially chosen such that
s = np + t_{n-1, 1-\alpha} \sqrt{np(1-p)} \;\;\;\;\;\; (31)
With the restriction that s \ge 1 and s \le n,
if p is greater than 0.5 then s is rounded down to the nearest integer,
otherwise it is rounded up to the nearest integer.  Denote this value by
s^*.  With the restriction that s \le n, if the confidence level using
s^* + 1 is less than or equal to approx.conf.level, then
s is set to s = s^* + 1.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Interpolate Method (ci.method="interpolate") 
Let \gamma denote the desired confidence level associated with the
confidence interval for the p'th quantile.
To construct an approximate one-sided upper confidence interval for the
p'th quantile with confidence level \gamma, if
[lb, x_{(s)}] has confidence level \ge \gamma and
[lb, x_{(s-1)}] has confidence level \le \gamma, then
the upper bound of the confidence interval is computed as:
(1-\lambda) x_{(s-1)} + \lambda x_{(s)} \;\;\;\;\;\; (32)
where
\beta = \gamma \;\;\;\;\;\; (33)
The value of s in Equation (32) is computed by
using ci.method="exact" with the arguments
ci.type = "upper", and min.coverage=TRUE.
Note on Value of Confidence Level 
Because of the discrete nature of order statistics, when ci.method="exact" or 
ci.method="normal.approx", the value of the confidence level returned by
eqnpar will usually differ from the desired confidence level indicated by
the value of the argument approx.conf.level.  When 
ci.method="interpolate",
eqnpar returns for the confidence level the value of the argument
approx.conf.level.  Nyblom (1992) and Hettmansperger and Sheather (1986) have
shown that the Interpolate method produces confidence intervals with confidence levels
quite close to the assumed confidence level for a wide range of distributions.
Value
a list of class "estimate" containing the estimated quantile(s) and other
information.  See 
estimate.object for details.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
It can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the
p'th quantile with confidence level 100(1-\alpha)\% is equivalent to an
upper \beta-content tolerance interval with coverage 100p\% and confidence level
100(1-\alpha)\%.  Also, a lower confidence interval for the p'th quantile with
confidence level 100(1-\alpha)\% is equivalent to a lower \beta-content tolerance
interval with coverage 100(1-p)\% and confidence level 100(1-\alpha)\%.  See the
help file for tolIntNpar for more information on nonparametric tolerance
intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
The author is grateful to Michael Höhle,
Department of Mathematics, Stockholm University
(http://www2.math.su.se/~hoehle) for making him aware of the work of Nyblom (1992),
and for suggesting improvements to the algorithm that was used in EnvStats
Version 2.1.1 to construct a confidence interval when ci.method="exact".
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Hettmansperger, T.P., and S.J. Sheather. (1986). Confidence Intervals Based on Interpolated Order Statistics. Statistics & Probability Letters 4, 75-79.
Nyblom, J. (1992). Note on Interpolated Order Statistics. Statistics & Probability Letters, 14, 129–131.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
quantile, tolIntNpar,
Estimating Distribution Quantiles,
Tolerance Intervals, estimate.object.
Examples
  # The data frame ACE.13.TCE.df contains observations on
  # Trichloroethylene (TCE) concentrations (mg/L) at
  # 10 groundwater monitoring wells before and after remediation.
  #
  # Compute the median concentration for each period along with
  # a 95% confidence interval for the median.
  #
  # Before remediation:  20.3 [8.8, 35.9]
  # After  remediation:   2.5 [0.8,  5.9]
  with(ACE.13.TCE.df,
    eqnpar(TCE.mg.per.L[Period=="Before"], ci = TRUE))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #Assumed Distribution:            None
  #Estimated Quantile(s):           Median = 20.3
  #Quantile Estimation Method:      Nonparametric
  #Data:                            TCE.mg.per.L[Period == "Before"]
  #Sample Size:                     10
  #Confidence Interval for:         50'th %ile
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #Confidence Interval Type:        two-sided
  #Confidence Level:                95%
  #Confidence Limit Rank(s):        2 9
  #                                 3 8
  #Confidence Interval:             LCL =  8.804775
  #                                 UCL = 35.874775
  #----------
  with(ACE.13.TCE.df, eqnpar(TCE.mg.per.L[Period=="After"], ci = TRUE))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #Assumed Distribution:            None
  #Estimated Quantile(s):           Median = 2.48
  #Quantile Estimation Method:      Nonparametric
  #Data:                            TCE.mg.per.L[Period == "After"]
  #Sample Size:                     10
  #Confidence Interval for:         50'th %ile
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #Confidence Interval Type:        two-sided
  #Confidence Level:                95%
  #Confidence Limit Rank(s):        2 9
  #                                 3 8
  #Confidence Interval:             LCL = 0.7810901
  #                                 UCL = 5.8763063
  #==========
  # Generate 20 observations from a cauchy distribution with parameters
  # location=0, scale=1.  The true 75th percentile of this distribution is 1.
  # Use eqnpar to estimate the 75th percentile and construct a 90% confidence interval.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250)
  dat <- rcauchy(20, location = 0, scale = 1)
  #-------------------------------------------------------
  # First, use the default method, ci.method="interpolate"
  #-------------------------------------------------------
  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Limit Rank(s):        12 19
  #                                 13 18
  #
  #Confidence Interval:             LCL = 0.8191423
  #                                 UCL = 2.1215570
  #----------
  #-------------------------------------------------------------
  # Now use ci.method="exact".
  # Note that the returned confidence level is greater than 90%.
  #-------------------------------------------------------------
  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9,
    ci.method = "exact")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                93.47622%
  #
  #Confidence Limit Rank(s):        12 19
  #
  #Confidence Interval:             LCL = 0.7494692
  #                                 UCL = 2.2156601
  #----------
  #----------------------------------------------------------
  # Now use ci.method="exact" with min.coverage=FALSE.
  # Note that the returned confidence level is less than 90%.
  #----------------------------------------------------------
  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9,
    ci.method = "exact", min.coverage = FALSE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                89.50169%
  #
  #Confidence Limit Rank(s):        13 20
  #
  #Confidence Interval:             LCL = 1.018038
  #                                 UCL = 5.002399
  #----------
  #-----------------------------------------------------------
  # Now supply our own bounds for the confidence interval.
  # The first example above based on the Interpolate method
  # used lcl.rank=12, ucl.rank=19 and lcl.rank=13, ucl.rank=18
  # and interpolated between these two confidence intervals.
  # Here we will specify lcl.rank=13 and ucl.rank=18.  The
  # resulting confidence level is 81%.
  #-----------------------------------------------------------
  eqnpar(dat, p = 0.75, ci = TRUE, lcl.rank = 13, ucl.rank = 18)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                80.69277%
  #
  #Confidence Limit Rank(s):        13 18
  #
  #Confidence Interval:             LCL = 1.018038
  #                                 UCL = 2.071172
  #----------
  # Clean up
  rm(dat)
  #==========
  # Modify Example 17-4 on page 17-21 of USEPA (2009).  This example uses
  # copper concentrations (ppb) from 3 background wells to set an upper
  # limit for 2 compliance wells.  Here we will attempt to compute an upper
  # 95% confidence interval for the 95'th percentile of the distribution of
  # copper concentrations in the background wells.
  #
  # The data are stored in EPA.92c.copper2.df.
  #
  # Note that even though these data are Type I left singly censored,
  # it is still possible to compute an estimate of the 95'th percentile.
  EPA.92c.copper2.df
  #   Copper.orig Copper Censored Month Well  Well.type
  #1           <5    5.0     TRUE     1    1 Background
  #2           <5    5.0     TRUE     2    1 Background
  #3          7.5    7.5    FALSE     3    1 Background
  #...
  #9          9.2    9.2    FALSE     1    2 Background
  #10          <5    5.0     TRUE     2    2 Background
  #11          <5    5.0     TRUE     3    2 Background
  #...
  #17          <5    5.0     TRUE     1    3 Background
  #18         5.4    5.4    FALSE     2    3 Background
  #19         6.7    6.7    FALSE     3    3 Background
  #...
  #29         6.2    6.2    FALSE     5    4 Compliance
  #30          <5    5.0     TRUE     6    4 Compliance
  #31         7.8    7.8    FALSE     7    4 Compliance
  #...
  #38          <5    5.0     TRUE     6    5 Compliance
  #39         5.6    5.6    FALSE     7    5 Compliance
  #40          <5    5.0     TRUE     8    5 Compliance
  # Because of the small sample size of n=24 observations, it is not possible
  # to create a nonparametric confidence interval for the 95th percentile
  # that has an associated confidence level of 95%.  If we tried to do this,
  # we would get an error message:
  # with(EPA.92c.copper2.df,
  #    eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE, lb = 0,
  #      ci.type = "upper", approx.conf.level = 0.95))
  #
  #Error in ci.qnpar.interpolate(x = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,  :
  #  Minimum coverage of 0.95 is not possible with the given sample size.
  # So instead, we will use ci.method="exact" with min.coverage=FALSE
  # to construct the confidence interval.  Note that the associated
  # confidence level is only 71%.
  with(EPA.92c.copper2.df,
     eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE,
        ci.method = "exact", min.coverage = FALSE,
        ci.type = "upper", lb = 0,
        approx.conf.level = 0.95))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           95'th %ile = 7.925
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                70.8011%
  #
  #Confidence Limit Rank(s):        NA 24
  #
  #Confidence Interval:             LCL = 0.0
  #                                 UCL = 9.2
  #----------
  # For the above example, the true confidence level is 71% instead of 95%.
  # This is a function of the small sample size.  In fact, as Example 17-4 on
  # pages 17-21 of USEPA (2009) shows, the largest quantile for which you can
  # construct a nonparametric confidence interval that will have associated
  # confidence level of 95% is the 88'th percentile:
  with(EPA.92c.copper2.df,
    eqnpar(Copper[Well.type=="Background"], p = 0.88, ci = TRUE,
      ci.type = "upper", lb = 0, ucl.rank = 24,
      approx.conf.level = 0.95))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           88'th %ile = 6.892
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Confidence Interval for:         88'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95.3486%
  #
  #Confidence Limit Rank(s):        NA 24
  #
  #Confidence Interval:             LCL = 0.0
  #                                 UCL = 9.2
  #==========
  # Reproduce Example 21-6 on pages 21-21 to 21-22 of USEPA (2009).
  # Use 12 measurements of nitrate (mg/L) at a well used for drinking water
  # to determine with 95% confidence whether or not the infant-based, acute
  # risk standard of 10 mg/L has been violated.  Assume that the risk
  # standard represents an upper 95'th percentile limit on nitrate
  # concentrations.  So what we need to do is construct a one-sided
  # lower nonparametric confidence interval for the 95'th percentile
  # that has an associated confidence level of at least 95%, and we will
  # compare the lower confidence limit with the MCL of 10 mg/L.
  #
  # The data for this example are stored in EPA.09.Ex.21.6.nitrate.df.
  # Look at the data:
  #------------------
  EPA.09.Ex.21.6.nitrate.df
  #   Sampling.Date       Date Nitrate.mg.per.l.orig Nitrate.mg.per.l Censored
  #1      7/28/1999 1999-07-28                  <5.0              5.0     TRUE
  #2       9/3/1999 1999-09-03                  12.3             12.3    FALSE
  #3     11/24/1999 1999-11-24                  <5.0              5.0     TRUE
  #4       5/3/2000 2000-05-03                  <5.0              5.0     TRUE
  #5      7/14/2000 2000-07-14                   8.1              8.1    FALSE
  #6     10/31/2000 2000-10-31                  <5.0              5.0     TRUE
  #7     12/14/2000 2000-12-14                    11             11.0    FALSE
  #8      3/27/2001 2001-03-27                  35.1             35.1    FALSE
  #9      6/13/2001 2001-06-13                  <5.0              5.0     TRUE
  #10     9/16/2001 2001-09-16                  <5.0              5.0     TRUE
  #11    11/26/2001 2001-11-26                   9.3              9.3    FALSE
  #12      3/2/2002 2002-03-02                  10.3             10.3    FALSE
  # Determine what order statistic to use for the lower confidence limit
  # in order to achieve at least 95% confidence.
  #---------------------------------------------------------------------
  conf.levels <- ciNparConfLevel(n = 12, p = 0.95, lcl.rank = 1:12,
    ci.type = "lower")
  names(conf.levels) <- 1:12
  round(conf.levels, 2)
  #   1    2    3    4    5    6    7    8    9   10   11   12
  #1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.88 0.54
  # Using the 11'th smallest observation (rank 11) for the lower confidence
  # limit yields a confidence level of 88%.  Using the 10'th smallest
  # observation (rank 10) yields a confidence level of 98%.  The example in
  # USEPA (2009) uses the 10'th smallest observation.
  #
  # The 10'th smallest observation is 11 mg/L, which exceeds the
  # MCL of 10 mg/L, so there is evidence of contamination.
  #--------------------------------------------------------------------
  with(EPA.09.Ex.21.6.nitrate.df,
    eqnpar(Nitrate.mg.per.l, p = 0.95, ci = TRUE,
      ci.type = "lower", lcl.rank = 10))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           95'th %ile = 22.56
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Nitrate.mg.per.l
  #
  #Sample Size:                     12
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                98.04317%
  #
  #Confidence Limit Rank(s):        10 NA
  #
  #Confidence Interval:             LCL =  11
  #                                 UCL = Inf
  #==========
  # Clean up
  #---------
  rm(conf.levels)
Estimate Quantiles of a Pareto Distribution
Description
Estimate quantiles of a Pareto distribution.
Usage
  eqpareto(x, p = 0.5, method = "mle", plot.pos.con = 0.375, digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a Pareto distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the distribution parameters.  
Possible values are 
 | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the values of the empirical cdf.  The default value is 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqpareto returns estimated quantiles as well as 
estimates of the location and shape parameters.  
Quantiles are estimated by 1) estimating the location and shape parameters by 
calling epareto, and then 2) calling the function 
qpareto and using the estimated values for 
location and shape.
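The same two-step calculation can be written out directly.  The following is a minimal sketch (it assumes the EnvStats package is loaded; the object names fit and params are illustrative only):
  # Step 1: estimate the parameters with epareto; step 2: plug the estimates
  # into qpareto.  (Object names 'fit' and 'params' are arbitrary.)
  set.seed(250)
  dat <- rpareto(30, location = 1, shape = 1)
  fit <- epareto(dat, method = "mle")
  params <- fit$parameters
  qpareto(0.9, location = params["location"], shape = params["shape"])
  # This should agree with the quantile reported by eqpareto(dat, p = 0.9).
  rm(dat, fit, params)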
Value
If x is a numeric vector, eqpareto returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqpareto 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The Pareto distribution is named after Vilfredo Pareto (1848-1923), a professor 
of economics.  It is derived from Pareto's law, which states that the number of 
persons N having income \ge x is given by:
N = A x^{-\theta}
where \theta denotes Pareto's constant and is the shape parameter for the 
probability distribution.
The Pareto distribution takes values on the positive real line.  All values must be 
larger than the “location” parameter \eta, which is really a threshold 
parameter.  There are three kinds of Pareto distributions.  The one described here 
is the Pareto distribution of the first kind.  Stable Pareto distributions have 
0 < \theta < 2.  Note that the r'th moment only exists if 
r < \theta.
The Pareto distribution is related to the 
exponential distribution and 
logistic distribution as follows.  
Let X denote a Pareto random variable with location=\eta and 
shape=\theta.  Then log(X/\eta) has an exponential distribution 
with parameter rate=\theta, and -log\{ [(X/\eta)^\theta] - 1 \} 
has a logistic distribution with parameters location=0 and 
scale=1.
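The first of these relationships is easy to check by simulation.  The sketch below assumes the EnvStats package is loaded for rpareto; the sample size and parameter values are arbitrary:
  # If X ~ Pareto(location = eta, shape = theta), then log(X/eta) should be
  # exponentially distributed with rate = theta.
  set.seed(123)
  eta <- 2; theta <- 3
  x <- rpareto(1000, location = eta, shape = theta)
  ks.test(log(x / eta), "pexp", rate = theta)
  # A large p-value is consistent with the stated relationship.
  rm(eta, theta, x)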
The Pareto distribution has a very long right-hand tail. It is often applied in the study of socioeconomic data, including the distribution of income, firm size, population, and stock price fluctuations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
epareto, Pareto, estimate.object.
Examples
  # Generate 30 observations from a Pareto distribution with 
  # parameters location=1 and shape=1 then estimate the parameters 
  # and the 90'th percentile.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rpareto(30, location = 1, shape = 1) 
  eqpareto(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Pareto
  #
  #Estimated Parameter(s):          location = 1.009046
  #                                 shape    = 1.079850
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 8.510708
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     30
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Poisson Distribution
Description
Estimate quantiles of a Poisson distribution, and optionally construct a confidence interval for a quantile.
Usage
  eqpois(x, p = 0.5, method = "mle/mme/mvue", ci = FALSE, ci.method = "exact",
    ci.type = "two-sided", conf.level = 0.95, digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an
estimating function that assumes a Poisson distribution
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.
All values of  | 
| method | character string specifying the method to use to estimate the mean.  Currently the
only possible value is  | 
| ci | logical scalar indicating whether to compute a confidence interval for the
specified quantile.  The default value is  | 
| ci.method | character string indicating what method to use to construct the confidence
interval for the quantile.  The only possible value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The
possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out
the value of  | 
Details
The function eqpois returns estimated quantiles as well as
the estimate of the mean parameter.
Estimation 
Let X denote a Poisson random variable with parameter
lambda=\lambda.  Let x_{p|\lambda} denote the p'th
quantile of the distribution.  That is,
Pr(X < x_{p|\lambda}) \le p \le Pr(X \le x_{p|\lambda}) \;\;\;\; (1)
Note that due to the discrete nature of the Poisson distribution, there will be
several values of p associated with one value of X.  For example,
for \lambda=2, the value 1 is the p'th quantile for any value of
p between 0.14 and 0.406.
Let \underline{x} denote a vector of n observations from a Poisson
distribution with parameter lambda=\lambda.  The p'th quantile
is estimated as the p'th quantile from a Poisson distribution assuming the
true value of \lambda is equal to the estimated value of \lambda.
That is:
\hat{x}_{p|\lambda} = x_{p|\lambda=\hat{\lambda}} \;\;\;\; (2)
where
\hat{\lambda} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (3)
Because the estimator in equation (3) is the maximum likelihood estimator of
\lambda (see the help file for epois), the estimated
quantile is the maximum likelihood estimator.
Quantiles are estimated by 1) estimating the mean parameter by
calling epois, and then 2) calling the function
qpois and using the estimated value for
the mean parameter.
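In other words, the estimated quantile can be reproduced with two lines of base R.  A minimal sketch (the simulated data below match the first example in the Examples section):
  # The p'th quantile estimate is the Poisson quantile function evaluated at
  # the sample mean (equations (2) and (3) above).
  set.seed(250)
  dat <- rpois(20, lambda = 2)
  qpois(0.9, lambda = mean(dat))
  # This should equal the estimated quantile returned by eqpois(dat, p = 0.9).
  rm(dat)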
Confidence Intervals 
It can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence
interval for the p'th quantile with confidence level 100(1-\alpha)\%
is equivalent to an upper \beta-content tolerance interval with coverage
100p\% and confidence level 100(1-\alpha)\%.  Also, a lower
confidence interval for the p'th quantile with confidence level
100(1-\alpha)\% is equivalent to a lower \beta-content tolerance
interval with coverage 100(1-p)\% and confidence level 100(1-\alpha)\%.
Thus, based on the theory of tolerance intervals for a Poisson distribution
(see tolIntPois), if ci.type="upper", a one-sided upper
100(1-\alpha)\% confidence interval for the p'th quantile is constructed
as:
[0, x_{p|\lambda=UCL}] \;\;\;\; (4)
where UCL denotes the upper 100(1-\alpha)\% confidence limit for
\lambda (see the help file for epois for information on how
UCL is computed).
Similarly, if ci.type="lower", a one-sided lower 100(1-\alpha)\%
confidence interval for the p'th quantile is constructed as:
[x_{p|\lambda=LCL}, \infty] \;\;\;\; (5)
where LCL denotes the lower 100(1-\alpha)\% confidence limit for
\lambda (see the help file for epois for information on how
LCL is computed).
Finally, if ci.type="two-sided", a two-sided 100(1-\alpha)\%
confidence interval for the p'th quantile is constructed as:
[x_{p|\lambda=LCL}, x_{p|\lambda=UCL}] \;\;\;\; (6)
where LCL and UCL denote the two-sided lower and upper
100(1-\alpha)\% confidence limits for \lambda (see the help file for
epois for information on how LCL and UCL are computed).
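As an illustration of equations (4)-(6), the sketch below plugs a standard exact chi-square based confidence interval for \lambda into the Poisson quantile function.  Whether this is the same interval computed by epois is not stated here, so treat the \lambda limits as an assumption made for the sketch only:
  # Exact (chi-square based) two-sided confidence limits for lambda, then the
  # corresponding limits for the 90'th percentile via equation (6).
  set.seed(250)
  dat <- rpois(20, lambda = 2)
  n <- length(dat); total <- sum(dat)
  alpha <- 0.05
  lambda.lcl <- if (total == 0) 0 else qchisq(alpha/2, df = 2*total) / (2*n)
  lambda.ucl <- qchisq(1 - alpha/2, df = 2*(total + 1)) / (2*n)
  c(LCL = qpois(0.9, lambda.lcl), UCL = qpois(0.9, lambda.ucl))
  # Compare with the interval returned by eqpois(dat, p = 0.9, ci = TRUE).
  rm(dat, n, total, alpha, lambda.lcl, lambda.ucl)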
Value
If x is a numeric vector, eqpois returns a
list of class "estimate" containing the estimated quantile(s) and other
information. See estimate.object for details.
If x is the result of calling an estimation function, eqpois
returns a list whose class is the same as x.  The list
contains the same components as x, as well as components called
quantiles and quantile.method.
Note
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) state:
The U.S. EPA has specifications for air quality monitoring that are, in effect, percentile limitations. ... The U.S. EPA has provided guidance for setting aquatic standards on toxic chemicals that require estimating 99th percentiles and using this statistic to make important decisions about monitoring and compliance. They have also used the 99th percentile to establish maximum daily limits for industrial effluents (e.g., pulp and paper).
Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
The Poisson distribution is named after Poisson, who
derived this distribution as the limiting distribution of the
binomial distribution with parameters size=N
and prob=p, where N tends to infinity, p tends to 0, and
Np stays constant.
In this context, the Poisson distribution was used by Bortkiewicz (1898) to model
the number of deaths (per annum) from kicks by horses in Prussian Army Corps.  In
this case, p, the probability of death from this cause, was small, but the
number of soldiers exposed to this risk, N, was large.
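This limiting behavior can be checked numerically with base R (the values of lambda and N below are arbitrary):
  # Binomial(N, lambda/N) probabilities approach Poisson(lambda) probabilities
  # as N grows with N*p held fixed at lambda.
  lambda <- 2; k <- 0:6
  round(rbind(binom.N.50   = dbinom(k, size = 50,   prob = lambda/50),
              binom.N.5000 = dbinom(k, size = 5000, prob = lambda/5000),
              poisson      = dpois(k, lambda)), 4)
  rm(lambda, k)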
The Poisson distribution has been applied in a variety of fields, including quality control (modeling number of defects produced in a process), ecology (number of organisms per unit area), and queueing theory. Gibbons (1987b) used the Poisson distribution to model the number of detected compounds per scan of the 32 volatile organic priority pollutants (VOC), and also to model the distribution of chemical concentration (in ppb).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Berthouex, P.M., and I. Hau. (1991). Difficulties Related to Using Extreme Percentiles for Water Quality Regulations. Research Journal of the Water Pollution Control Federation 63(6), 873–879.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 3.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572-580.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.
Pearson, E.S., and H.O. Hartley, eds. (1970). Biometrika Tables for Statisticians, Volume 1. Cambridge University Press, New York, p.81.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
epois, Poisson, estimate.object.
Examples
  # Generate 20 observations from a Poisson distribution with parameter
  # lambda=2.  The true 90'th percentile of this distribution is 4 (actually,
  # 4 is the p'th quantile for any value of p between 0.86 and 0.947).
  # Here we will use eqpois to estimate the 90'th percentile and construct a
  # two-sided 95% confidence interval for this percentile.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250)
  dat <- rpois(20, lambda = 2)
  eqpois(dat, p = 0.9, ci = TRUE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Estimated Quantile(s):           90'th %ile = 4
  #
  #Quantile Estimation Method:      mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 3
  #                                 UCL = 5
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Uniform Distribution
Description
Estimate quantiles of a uniform distribution.
Usage
  equnif(x, p = 0.5, method = "mle", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a uniform distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the distribution parameters.  
The possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function equnif returns estimated quantiles as well as 
estimates of the location and scale parameters.  
Quantiles are estimated by 1) estimating the location and scale parameters by 
calling eunif, and then 2) calling the function 
qunif and using the estimated values for 
location and scale.
Value
If x is a numeric vector, equnif returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, equnif 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The uniform distribution (also called the rectangular 
distribution) with parameters min and max takes on values on the 
real line between min and max with equal probability.  It has been 
used to represent the distribution of round-off errors in tabulated values.  Another 
important application is the probability integral transform: the cumulative 
distribution function (cdf) of any continuous random variable, evaluated at that 
random variable, follows a uniform distribution with parameters min=0 and max=1.
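A small base-R sketch of the last point (the distribution, sample size, and seed are arbitrary):
  # Applying a continuous random variable's own cdf to that variable yields a
  # Uniform(0, 1) random variable (the probability integral transform).
  set.seed(47)
  z <- rnorm(1000, mean = 10, sd = 3)
  u <- pnorm(z, mean = 10, sd = 3)
  ks.test(u, "punif", min = 0, max = 1)
  # A large p-value is consistent with u being Uniform(0, 1).
  rm(z, u)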
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
eunif, Uniform, estimate.object.
Examples
  # Generate 20 observations from a uniform distribution with parameters 
  # min=-2 and max=3, then estimate the parameters via maximum likelihood
  # and estimate the 90th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- runif(20, min = -2, max = 3) 
  equnif(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Uniform
  #
  #Estimated Parameter(s):          min = -1.574529
  #                                 max =  2.837006
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 2.395852
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of a Weibull Distribution
Description
Estimate quantiles of a Weibull distribution.
Usage
  eqweibull(x, p = 0.5, method = "mle", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a Weibull distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the distribution parameters.  
Possible values are 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqweibull returns estimated quantiles as well as 
estimates of the shape and scale parameters.  
Quantiles are estimated by 1) estimating the shape and scale parameters by 
calling eweibull, and then 2) calling the function 
qweibull and using the estimated values for 
shape and scale.
Value
If x is a numeric vector, eqweibull returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqweibull 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The Weibull distribution is named after the Swedish physicist Waloddi Weibull, who used this distribution to model breaking strengths of materials. The Weibull distribution has been extensively applied in the fields of reliability and quality control.
The exponential distribution is a special case of the 
Weibull distribution: a Weibull random variable with parameters shape=1 
and scale=\beta is equivalent to an exponential random variable with 
parameter rate=1/\beta.
The Weibull distribution is related to the 
Type I extreme value (Gumbel) distribution as follows: 
if X is a random variable from a Weibull distribution with parameters 
shape=\alpha and scale=\beta, then 
Y = -log(X) \;\;\;\; (10)
is a random variable from an extreme value distribution with parameters 
location=-log(\beta) and scale=1/\alpha.
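Both relationships can be verified numerically with base R (the parameter values below are arbitrary, and the Gumbel cdf is written out by hand rather than taken from any package):
  # 1) Weibull(shape = 1, scale = beta) is exponential with rate = 1/beta.
  beta <- 3; q <- c(0.5, 1, 2, 5)
  all.equal(pweibull(q, shape = 1, scale = beta), pexp(q, rate = 1/beta))
  # 2) If X ~ Weibull(shape = alpha, scale = beta), then -log(X) has a
  #    Gumbel (extreme value) distribution with location = -log(beta) and
  #    scale = 1/alpha.
  set.seed(31)
  alpha <- 2
  x <- rweibull(1000, shape = alpha, scale = beta)
  pgumbel <- function(y, loc, scale) exp(-exp(-(y - loc)/scale))
  ks.test(-log(x), pgumbel, loc = -log(beta), scale = 1/alpha)
  rm(beta, q, alpha, x, pgumbel)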
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
eweibull, Weibull, Exponential, 
EVD, estimate.object.
Examples
  # Generate 20 observations from a Weibull distribution with parameters 
  # shape=2 and scale=3, then estimate the parameters via maximum likelihood,
  # and estimate the 90'th percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rweibull(20, shape = 2, scale = 3) 
  eqweibull(dat, p = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Weibull
  #
  #Estimated Parameter(s):          shape = 2.673098
  #                                 scale = 3.047762
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 4.163755
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Quantiles of a Zero-Modified Lognormal (Delta) Distribution
Description
Estimate quantiles of a zero-modified lognormal distribution or a zero-modified lognormal distribution (alternative parameterization).
Usage
  eqzmlnorm(x, p = 0.5, method = "mvue", digits = 0)
  eqzmlnormAlt(x, p = 0.5, method = "mvue", digits = 0)  
Arguments
| x | a numeric vector of positive observations, or an object resulting from a call to an estimating function that assumes a zero-modified lognormal distribution. | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimation.  The only possible value is 
 | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The functions eqzmlnorm and eqzmlnormAlt return estimated quantiles 
as well as estimates of the distribution parameters.  
Quantiles are estimated by:
- estimating the distribution parameters by calling ezmlnorm or ezmlnormAlt, and then
- calling the function qzmlnorm or qzmlnormAlt and using the estimated distribution parameters (see the sketch below).
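A minimal sketch of these two steps, using the alternative parameterization (it assumes the EnvStats package is loaded; the object names fit and params are illustrative only):
  # Step 1: estimate the parameters with ezmlnormAlt; step 2: plug them into
  # qzmlnormAlt.  (Object names 'fit' and 'params' are arbitrary.)
  set.seed(250)
  dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5)
  fit <- ezmlnormAlt(dat, method = "mvue")
  params <- fit$parameters
  qzmlnormAlt(p = c(0.8, 0.9), mean = params["mean"], cv = params["cv"],
    p.zero = params["p.zero"])
  # Should agree with eqzmlnormAlt(dat, p = c(0.8, 0.9)) shown in the
  # Examples section.
  rm(dat, fit, params)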
Value
If x is a numeric vector, eqzmlnorm and eqzmlnormAlt return a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqzmlnorm and 
eqzmlnormAlt return a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with non-detects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), 
zero-modified normal, censored normal, or censored lognormal is the best 
model for the data is to construct both censored and detects-only probability 
plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London. pp.94-99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47–51.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
ezmlnorm, Zero-Modified Lognormal, 
ezmlnormAlt, 
Zero-Modified Lognormal (Alternative Parameterization), 
Zero-Modified Normal, Lognormal.
Examples
  # Generate 100 observations from a zero-modified lognormal (delta) 
  # distribution with mean=2, cv=1, and p.zero=0.5, then estimate the 
  # parameters and also the 80'th and 90'th percentiles.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5) 
  eqzmlnormAlt(dat, p = c(0.8, 0.9)) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Lognormal (Delta)
  #
  #Estimated Parameter(s):          mean         = 1.9604561
  #                                 cv           = 0.9169411
  #                                 p.zero       = 0.4500000
  #                                 mean.zmlnorm = 1.0782508
  #                                 cv.zmlnorm   = 1.5307175
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           80'th %ile = 1.897451
  #                                 90'th %ile = 2.937976
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mvue Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #----------
  # Compare the estimated quantiles with the true quantiles
  qzmlnormAlt(mean = 2, cv = 1, p.zero = 0.5, p = c(0.8, 0.9))
  #[1] 1.746299 2.849858
  #----------
  # Clean up
  rm(dat)
Estimate Quantiles of a Zero-Modified Normal Distribution
Description
Estimate quantiles of a zero-modified normal distribution.
Usage
  eqzmnorm(x, p = 0.5, method = "mvue", digits = 0)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a zero-modified normal distribution 
(e.g.,  | 
| p | numeric vector of probabilities for which quantiles will be estimated.  
All values of  | 
| method | character string specifying the method of estimating the distribution parameters.  
Currently, the only possible 
value is  | 
| digits | an integer indicating the number of decimal places to round to when printing out 
the value of  | 
Details
The function eqzmnorm returns estimated quantiles as well as 
estimates of the distribution parameters.  
Quantiles are estimated by 1) estimating the distribution parameters by 
calling ezmnorm, and then 2) calling the function 
qzmnorm and using the estimated values for 
the distribution parameters.
Value
If x is a numeric vector, eqzmnorm returns a 
list of class "estimate" containing the estimated quantile(s) and other 
information. See estimate.object for details.
If x is the result of calling an estimation function, eqzmnorm 
returns a list whose class is the same as x.  The list 
contains the same components as x, as well as components called 
quantiles and quantile.method.
Note
The zero-modified normal distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit”. See, for example USEPA (1992c, pp.27-34). In most cases, however, the zero-modified lognormal (delta) distribution will be more appropriate, since chemical concentrations are bounded below at 0 (e.g., Gilliom and Helsel, 1986; Owen and DeRouen, 1980).
Once you estimate the parameters of the zero-modified normal distribution, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with a confidence interval.
One way to try to assess whether a 
zero-modified lognormal (delta), 
zero-modified normal, censored normal, or 
censored lognormal is the best model for the data is to construct both 
censored and detects-only probability plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
See Also
ZeroModifiedNormal, Normal, 
ezmlnorm, ZeroModifiedLognormal, estimate.object.
Examples
  # Generate 100 observations from a zero-modified normal distribution 
  # with mean=4, sd=2, and p.zero=0.5, then estimate the parameters and 
  # the 80th and 90th percentiles.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rzmnorm(100, mean = 4, sd = 2, p.zero = 0.5) 
  eqzmnorm(dat, p = c(0.8, 0.9)) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Normal
  #
  #Estimated Parameter(s):          mean        = 4.037732
  #                                 sd          = 1.917004
  #                                 p.zero      = 0.450000
  #                                 mean.zmnorm = 2.220753
  #                                 sd.zmnorm   = 2.465829
  #
  #Estimation Method:               mvue
  #
  #Estimated Quantile(s):           80'th %ile = 4.706298
  #                                 90'th %ile = 5.779250
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mvue Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #----------
  # Compare the estimated quantiles with the true quantiles
  qzmnorm(mean = 4, sd = 2, p.zero = 0.5, p = c(0.8, 0.9))
  #[1] 4.506694 5.683242
  #----------
  # Clean up
  rm(dat)
Plot Pointwise Error Bars
Description
Plot pointwise error bars given their upper and lower limits.
The errorBar function is a modified version of the S function 
error.bar.  The EnvStats function errorBar includes the 
additional arguments draw.lower, draw.upper, gap.size, 
bar.ends.size, and col to determine whether both the lower and 
upper error bars are drawn and to control the size of the gaps, the size of the bar 
ends, and the color of the bars.
Usage
  errorBar(x, y = NULL, lower, upper, incr = TRUE, draw.lower = TRUE, 
    draw.upper = TRUE, bar.ends = TRUE, gap = TRUE, add = FALSE, 
    horizontal = FALSE, gap.size = 0.75, bar.ends.size = 1, col = 1, 
    ..., xlab = deparse(substitute(x)), xlim, ylim)
Arguments
| x,y | coordinates of points.  The coordinates can be given by two vector arguments or by a 
single vector  When both  When both  If a single numeric vector is given, then  Missing values ( | 
| lower | pointwise lower limits of the error bars.  This may be a single number or a vector 
the same length as  | 
| upper | pointwise upper limits of the error bars.  This may be a single number or a vector the 
same length as  | 
| incr | logical scalar indicating whether the values in  | 
| draw.lower | logical scalar indicating whether to draw the lower error bar.  
The default is  | 
| draw.upper | logical scalar indicating whether to draw the upper error bar.  
The default is  | 
| bar.ends | logical scalar indicating whether flat bars should be drawn at the endpoints.  The 
default is  | 
| gap | logical scalar indicating whether gaps should be left around the points to emphasize 
their locations.  The default is  | 
| add | logical scalar indicating whether error bars should be added to the current plot.  
If  | 
| horizontal | logical scalar indicating whether the error bars should be oriented horizontally 
( | 
| gap.size | numeric scalar controlling the width of the gap. | 
| bar.ends.size | numeric scalar controlling the length of the bar ends. | 
| col | numeric or character vector indicating the color(s) of the bars. | 
| xlab,xlim,ylim,... | additional graphical parameters (see  | 
Details
errorBar creates a plot of y versus x with pointwise error bars.  
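A minimal self-contained sketch (it assumes the EnvStats package is loaded; the simulated data and labels are arbitrary):
  # Plot four group means with +/- one standard error bars.
  set.seed(42)
  grp <- rep(1:4, each = 10)
  vals <- rnorm(40, mean = grp)
  means <- as.vector(tapply(vals, grp, mean))
  ses <- as.vector(tapply(vals, grp, function(z) sd(z)/sqrt(length(z))))
  errorBar(x = 1:4, y = means, lower = ses, upper = ses, incr = TRUE,
    xlab = "Group", ylab = "Mean")
  rm(grp, vals, means, ses)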
Value
errorBar invisibly returns a list with the following components:
| group.centers | numeric vector of values on the group axis (the  | 
| group.stats | a matrix with the number of rows equal to the number of groups and three columns indicating the group location parameter (Center), the lower limit for the error bar (Lower), and the upper limit for the error bar (Upper). | 
Author(s)
Authors of S (for code for error.bar in S).
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cleveland, W.S. (1994). The Elements of Graphing Data. Hobart Press, Summit, New Jersey.
See Also
plot, segments, pointwise, 
stripChart.
Examples
  # The guidance document USEPA (1994b, pp. 6.22--6.25) 
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB) 
  # concentrations (in parts per billion) from soil samples 
  # at a Reference area and a Cleanup area.  These data are stored 
  # in the data frame EPA.94b.tccb.df.  
  #
  # Using the log-transformed data, create 
  #
  # 1. A dynamite plot (bar plot showing mean plus 1 SE)
  #
  # 2. A confidence interval plot.
  TcCB.mat <- summaryStats(TcCB ~ Area, data = EPA.94b.tccb.df, 
      se = TRUE, ci = TRUE)
  Means <- TcCB.mat[, "Mean"]
  SEs   <- TcCB.mat[, "SE"]
  LCLs  <- TcCB.mat[, "95%.LCL"]
  UCLs  <- TcCB.mat[, "95%.UCL"]
  # Dynamite Plot
  #--------------
  dev.new()
  group.centers <- barplot(Means, col = c("red", "blue"), 
    ylim = range(0, Means, Means + SEs), ylab = "TcCB (ppb)", 
    main = "Dynamite Plot for TcCB Data")
  errorBar(x = as.vector(group.centers), y = Means, 
    lower = SEs, draw.lower = FALSE, gap = FALSE, 
    col = c("red", "blue"), add = TRUE)
  # Confidence Interval Plot
  #-------------------------
  xlim <- par("usr")[1:2]
  dev.new()
  errorBar(x = as.vector(group.centers), y = Means, 
    lower = LCLs, upper = UCLs, incr = FALSE, gap = FALSE, 
    col = c("red", "blue"), xlim = xlim, xaxt = "n", 
    xlab = "", ylab = "TcCB (ppb)", 
    main = "Confidence Interval Plot for TcCB Data")
  axis(1, at = group.centers, labels = dimnames(TcCB.mat)[[1]])
  # Clean up
  #---------
  rm(TcCB.mat, Means, SEs, LCLs, UCLs, group.centers, xlim)
  graphics.off()
S3 Class "estimate"
Description
Objects of S3 class "estimate" are returned by any of the 
EnvStats functions that estimate the parameters or quantiles of a 
probability distribution and optionally construct confidence, 
prediction, or tolerance intervals based on a sample of data 
assumed to come from that distribution.
Details
Objects of S3 class "estimate" are lists that contain 
information about the estimated distribution parameters, 
quantiles, and intervals.  The names of the EnvStats 
functions that produce objects of class "estimate" 
have the following forms:
| Form of Function Name | Result | 
| eabb | Parameter Estimation | 
| eqabb | Quantile Estimation | 
| predIntAbb | Prediction Interval | 
| tolIntAbb | Tolerance Interval | 
where abb denotes the abbreviation of the name of a 
probability distribution (see the help file for 
Distribution.df for a list of available probability 
distributions and their abbreviations), and Abb denotes the 
same thing as abb except the first letter of the abbreviation 
for the probability distribution is capitalized.  
See the help files Estimating Distribution Parameters and Estimating Distribution Quantiles for lists of functions that estimate distribution parameters and quantiles. See the help files Prediction Intervals and Tolerance Intervals for lists of functions that create prediction and tolerance intervals.
For example:
- The function enorm returns an object of class "estimate" (a list) with information about the estimated mean and standard deviation of the assumed normal (Gaussian) distribution, as well as an optional confidence interval for the mean.
- The function eqnorm returns a list of class "estimate" with information about the estimated mean and standard deviation of the assumed normal distribution, the estimated user-specified quantile(s), and an optional confidence interval for a single quantile.
- The function predIntNorm returns a list of class "estimate" with information about the estimated mean and standard deviation of the assumed normal distribution, along with a prediction interval for a user-specified number of future observations (or means, medians, or sums).
- The function tolIntNorm returns a list of class "estimate" with information about the estimated mean and standard deviation of the assumed normal distribution, along with a tolerance interval (a quick check of this convention follows below).
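A sketch of that check (it assumes the EnvStats package is loaded; default arguments are used throughout):
  # Each of these estimation functions returns an object of class "estimate".
  set.seed(250)
  dat <- rnorm(20, mean = 3, sd = 2)
  sapply(list(enorm(dat), eqnorm(dat, p = 0.9), predIntNorm(dat),
    tolIntNorm(dat)), class)
  # Each element of the result should be "estimate".
  rm(dat)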
Value
Required Components 
The following components must be included in a legitimate list of 
class "estimate".
| distribution | character string indicating the name of the 
assumed distribution (this equals  | 
| sample.size | numeric scalar indicating the sample size used to estimate the parameters or quantiles. | 
| data.name | character string indicating the name of the data object used to compute the estimated parameters or quantiles. | 
| bad.obs | numeric scalar indicating the number of missing ( | 
Optional Components 
The following components may optionally be included in a legitimate 
list of class "estimate".
| parameters | (parametric estimation only) a numeric vector with a names attribute containing the names and values of the estimated distribution parameters. | 
| n.param.est | (parametric estimation only) a scalar indicating the number of distribution parameters estimated. | 
| method | (parametric estimation only) a character string indicating the method used to compute the estimated parameters. | 
| quantiles | a numeric vector of estimated quantiles. | 
| quantile.method | a character string indicating the method of quantile estimation. | 
| interval | a list of class  | 
All lists of class "intervalEstimate" contain the following 
component:
| name | a character string indicating the kind of interval.  
Possible values are:  | 
The number and names of the other components in a list of class 
"intervalEstimate" depends on the kind of interval it is.  
These components may include:
| parameter | a character string indicating the parameter for 
which the interval is constructed (e.g.,  | 
| limits | a numeric vector containing the lower and upper bounds of the interval. | 
| type | the type of interval (i.e.,  | 
| method | the method used to construct the interval 
(e.g.,  | 
| conf.level | the confidence level associated with the interval. | 
| sample.size | the sample size associated with the interval. | 
| dof | (parametric intervals only) the degrees of freedom associated with the interval. | 
| limit.ranks | (nonparametric intervals only) the rank(s) of the order statistic(s) used to construct the interval. | 
| m | (prediction intervals only) the total number of future 
observations ( | 
| k | (prediction intervals only) the minimum number of future 
observations  | 
| n.mean | (prediction intervals only) the sample size associated with the future averages that should be contained in the interval. | 
| n.median | (prediction intervals only) the sample size associated with the future medians that should be contained in the interval. | 
| n.sum | (Poisson prediction intervals only) the sample size associated with the future sums that should be contained in the interval. | 
| rule | (simultaneous prediction intervals only) the rule used to construct the simultaneous prediction interval. | 
| delta.over.sigma | (simultaneous prediction intervals only) numeric 
scalar indicating the ratio  | 
Methods
Generic functions that have methods for objects of class 
"estimate" include: 
print.
Note
Since objects of class "estimate" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
Estimating Distribution Parameters, Estimating Distribution Quantiles, 
Distribution.df, Prediction Intervals, 
Tolerance Intervals, estimateCensored.object.
Examples
  # Create an object of class "estimate", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 3, sd = 2) 
  estimate.obj <- enorm(dat, ci = TRUE) 
  mode(estimate.obj) 
  #[1] "list" 
  class(estimate.obj) 
  #[1] "estimate" 
  names(estimate.obj) 
  #[1] "distribution" "sample.size"  "parameters" 
  #[4] "n.param.est"  "method"       "data.name" 
  #[7] "bad.obs"      "interval" 
  names(estimate.obj$interval) 
  #[1] "name"        "parameter"   "limits" 
  #[4] "type"        "method"      "conf.level" 
  #[7] "sample.size" "dof" 
  estimate.obj 
  
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 2.861160
  #                                 sd   = 1.180226
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 2.308798
  #                                 UCL = 3.413523
  #----------
  # Extract the confidence limits for the mean
  estimate.obj$interval$limits
  #     LCL      UCL 
  #2.308798 3.413523 
  #----------
  # Clean up
  rm(dat, estimate.obj)
S3 Class "estimateCensored"
Description
Objects of S3 class "estimateCensored" are returned by any of the 
EnvStats functions that estimate the parameters or quantiles of a 
probability distribution and optionally construct confidence, 
prediction, or tolerance intervals based on a sample of censored 
data assumed to come from that distribution.
Details
Objects of S3 class "estimateCensored" are lists that contain 
information about the estimated distribution parameters, 
quantiles, and (if present) intervals, as well as the censoring side, 
censoring levels and percentage of censored observations.  
The names of the EnvStats 
functions that produce objects of class "estimateCensored" 
have the following forms:
| Form of Function Name | Result | 
| eabbCensored | Parameter Estimation | 
| eqabbCensored | Quantile Estimation | 
| predIntAbbCensored | Prediction Interval | 
| tolIntAbbCensored | Tolerance Interval | 
where abb denotes the abbreviation of the name of a 
probability distribution (see the help file for 
Distribution.df for a list of available probability 
distributions and their abbreviations), and Abb denotes the 
same thing as abb except the first letter of the abbreviation 
for the probability distribution is capitalized.  
See the sections Estimating Distribution Parameters, Estimating Distribution Quantiles, and Prediction and Tolerance Intervals in the help file EnvStats Functions for Censored Data for a list of functions that estimate distribution parameters, estimate distribution quantiles, create prediction intervals, or create tolerance intervals using censored data.
For example:
- The function enormCensored returns an object of class "estimateCensored" (a list) with information about the estimated mean and standard deviation of the assumed normal (Gaussian) distribution, information about the amount and side of censoring, and also an optional confidence interval for the mean.
- The function eqnormCensored returns a list of class "estimateCensored" with information about the estimated mean and standard deviation of the assumed normal distribution, information about the amount and side of censoring, the estimated user-specified quantile(s), and an optional confidence interval for a single quantile.
- The function tolIntNormCensored returns a list of class "estimateCensored" with information about the estimated mean and standard deviation of the assumed normal distribution, information about the amount and side of censoring, and the computed tolerance interval.
Value
Required Components 
The following components must be included in a legitimate list of 
class "estimateCensored".
| distribution | character string indicating the name of the 
assumed distribution (this equals  | 
| sample.size | numeric scalar indicating the sample size used to estimate the parameters or quantiles. | 
| censoring.side | character string indicating whether the data are left- or right-censored. | 
| censoring.levels | numeric scalar or vector indicating the censoring level(s). | 
| percent.censored | numeric scalar indicating the percent of non-missing observations that are censored. | 
| data.name | character string indicating the name of the data object used to compute the estimated parameters or quantiles. | 
| censoring.name | character string indicating the name of the data object used to identify which values are censored. | 
| bad.obs | numeric scalar indicating the number of missing ( | 
Optional Components 
The following components may optionally be included in a legitimate 
list of class "estimateCensored".
| parameters | (parametric estimation only) a numeric vector with a names attribute containing the names and values of the estimated distribution parameters. | 
| n.param.est | (parametric estimation only) a scalar indicating the number of distribution parameters estimated. | 
| method | (parametric estimation only) a character string indicating the method used to compute the estimated parameters. | 
| quantiles | a numeric vector of estimated quantiles. | 
| quantile.method | a character string indicating the method of quantile estimation. | 
| interval | a list of class  | 
All lists of class "intervalEstimateCensored" contain the following 
component:
| name | a character string indicating the kind of interval.  
Possible values are:  | 
The number and names of the other components in a list of class 
"intervalEstimate" depends on the kind of interval it is.  
These components may include:
| parameter | a character string indicating the parameter for which the interval is constructed (e.g., the mean or a specified quantile). | 
| limits | a numeric vector containing the lower and upper bounds of the interval. | 
| type | the type of interval (i.e., "two-sided", "lower", or "upper"). | 
| method | the method used to construct the interval (e.g., "Profile Likelihood"). | 
| conf.level | the confidence level associated with the interval. | 
| sample.size | the sample size associated with the interval. | 
| dof | (parametric intervals only) the degrees of freedom associated with the interval. | 
| limit.ranks | (nonparametric intervals only) the rank(s) of the order statistic(s) used to construct the interval. | 
| m | (prediction intervals only) the total number of future observations. | 
| k | (prediction intervals only) the minimum number of the m future observations that should be contained in the interval. | 
| n.mean | (prediction intervals only) the sample size associated with the future averages that should be contained in the interval. | 
| n.median | (prediction intervals only) the sample size associated with the future medians that should be contained in the interval. | 
| n.sum | (Poisson prediction intervals only) the sample size associated with the future sums that should be contained in the interval. | 
| rule | (simultaneous prediction intervals only) the rule used to construct the simultaneous prediction interval. | 
| delta.over.sigma | (simultaneous prediction intervals only) numeric scalar indicating the assumed ratio \Delta/\sigma of the difference in population means to the population standard deviation. | 
Methods
Generic functions that have methods for objects of class 
"estimateCensored" include: 
print.
Note
Since objects of class "estimateCensored" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
EnvStats Functions for Censored Data, 
Distribution.df, estimate.object.
Examples
  # Create an object of class "estimateCensored", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 100, sd = 20)
  censored <- dat < 90
  dat[censored] <- 90
  estimateCensored.obj <- enormCensored(dat, censored, ci = TRUE) 
  mode(estimateCensored.obj) 
  #[1] "list" 
  class(estimateCensored.obj) 
  #[1] "estimateCensored" 
  names(estimateCensored.obj) 
  # [1] "distribution"     "sample.size"      "censoring.side"   "censoring.levels"
  # [5] "percent.censored" "parameters"       "n.param.est"      "method"          
  # [9] "data.name"        "censoring.name"   "bad.obs"          "interval"        
  #[13] "var.cov.params" 
  names(estimateCensored.obj$interval) 
  #[1] "name"       "parameter"  "limits"     "type"       "method"     "conf.level"
 
  estimateCensored.obj 
  
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              90 
  #
  #Estimated Parameter(s):          mean = 96.52796
  #                                 sd   = 14.62275
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     20
  #
  #Percent Censored:                25%
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  88.82415
  #                                 UCL = 103.27604
  #----------
  # Extract the confidence limits for the mean
  estimateCensored.obj$interval$limits
  #      LCL       UCL 
  # 88.82415 103.27604 
  #----------
  # Clean up
  rm(dat, censored, estimateCensored.obj)
Estimate Parameters of a Uniform Distribution
Description
Estimate the minimum and maximum parameters of a uniform distribution.
Usage
  eunif(x, method = "mle")
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| method | character string specifying the method of estimation.  The possible values are "mle" (maximum likelihood; the default), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimator of variance).  See the DETAILS section for more information. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a uniform distribution with 
parameters min=a and max=b.  Also, let x_{(i)} 
denote the i'th order statistic.
Estimation 
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of a and b are given by
(Johnson et al, 1995, p.286):
\hat{a}_{mle} = x_{(1)} \;\;\;\; (1)
\hat{b}_{mle} = x_{(n)} \;\;\;\; (2)
Method of Moments Estimation (method="mme") 
The method of moments estimators (mme's) of a and b are given by
(Forbes et al., 2011):
\hat{a}_{mme} = \bar{x} - \sqrt{3} s_m \;\;\;\; (3)
\hat{b}_{mme} = \bar{x} + \sqrt{3} s_m \;\;\;\; (4)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (5)
s^2_m = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (6)
Method of Moments Estimation Based on the Unbiased Estimator of Variance (method="mmue") 
The method of moments estimators based on the unbiased estimator of variance are 
exactly the same as the method of moments estimators given in equations (3-6) above, 
except that the method of moments estimator of variance in equation (6) is replaced 
with the unbiased estimator of variance:
\hat{a}_{mmue} = \bar{x} - \sqrt{3} s \;\;\;\; (7)
\hat{b}_{mmue} = \bar{x} + \sqrt{3} s \;\;\;\; (8)
where
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (9)
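The following minimal sketch (not the EnvStats implementation) computes the three sets of estimators directly from equations (1)-(9); it should agree with the output of eunif shown in the Examples section below.
  # Minimal sketch of Equations (1)-(9) for the uniform distribution.
  unif.ests <- function(x, method = c("mle", "mme", "mmue")) {
    method <- match.arg(method)
    xbar <- mean(x)
    switch(method,
      mle  = c(min = min(x), max = max(x)),                  # Equations (1)-(2)
      mme  = {
        s.m <- sqrt(mean((x - xbar)^2))                      # Equation (6)
        c(min = xbar - sqrt(3) * s.m, max = xbar + sqrt(3) * s.m)   # (3)-(4)
      },
      mmue = {
        s <- sd(x)                                           # Equation (9)
        c(min = xbar - sqrt(3) * s, max = xbar + sqrt(3) * s)       # (7)-(8)
      })
  }
  # Example: unif.ests(dat, "mmue") should match eunif(dat, method = "mmue")$parameters.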
Value
a list of class "estimate" containing the estimated parameters and other 
information.  See 
estimate.object for details.
Note
The uniform distribution (also called the rectangular 
distribution) with parameters min and max takes on values on the 
real line between min and max with equal probability.  It has been 
used to represent the distribution of round-off errors in tabulated values.  Another 
important application is that the distribution of the cumulative distribution 
function (cdf) of any kind of continuous random variable follows a uniform 
distribution with parameters min=0 and max=1.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Uniform, estimate.object.
Examples
  # Generate 20 observations from a uniform distribution with parameters 
  # min=-2 and max=3, then estimate the parameters via maximum likelihood. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- runif(20, min = -2, max = 3) 
  eunif(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Uniform
  #
  #Estimated Parameter(s):          min = -1.574529
  #                                 max =  2.837006
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Compare the three methods of estimation:
  eunif(dat, method = "mle")$parameters 
  #      min       max 
  #-1.574529  2.837006 
 
 
  eunif(dat, method = "mme")$parameters 
  #      min       max 
  #-1.988462  2.650737 
 
 
  eunif(dat, method = "mmue")$parameters 
  #      min       max 
  #-2.048721  2.710996 
  #----------
  # Clean up
  #---------
  rm(dat)
Expected Value of Order Statistics for Random Sample from Standard Normal Distribution
Description
Compute the expected values of order statistics for a random sample from a standard normal distribution.
Usage
  evNormOrdStats(n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, seed = 47, approximate = NULL)
  evNormOrdStatsScalar(r = 1, n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, conf.level = 0.95, seed = 47, approximate = NULL) 
Arguments
| n | positive integer indicating the sample size. | 
| r | positive integer between 1 and n indicating the order statistic for which to compute the expected value. | 
| method | character string indicating what method to use. The possible values are: 
 See the DETAILS section below. | 
| lower | numeric scalar  | 
| inc | numeric scalar between  | 
| warn | logical scalar indicating whether to issue a warning when 
 | 
| alpha | numeric scalar between 0 and 0.5 that determines the constant used when  | 
| nmc | integer  | 
| conf.level | numeric scalar between 0 and 1 denoting the confidence level of 
the confidence interval for the expected value of the normal 
order statistic when  | 
| seed | integer between  | 
| approximate | logical scalar included for backwards compatibility with versions of 
EnvStats prior to version 2.3.0.  
When  | 
Details
Let \underline{z} = z_1, z_2, \ldots, z_n denote a vector of n 
observations from a normal distribution with parameters 
mean=0 and sd=1.  That is, \underline{z} denotes a vector of 
n observations from a standard normal distribution.  Let 
z_{(r)} denote the r'th order statistic of \underline{z}, 
for r = 1, 2, \ldots, n.  The probability density function of 
z_{(r)} is given by:
f_{r,n}(t) = \frac{n!}{(r-1)!(n-r)!} [\Phi(t)]^{r-1} [1 - \Phi(t)]^{n-r} \phi(t) \;\;\;\;\;\; (1)
where \Phi and \phi denote the cumulative distribution function and 
probability density function of the standard normal distribution, respectively 
(Johnson et al., 1994, p.93). Thus, the expected value of z_{(r)} is given by:
E(r, n) = E[z_{(r)}] = \int_{-\infty}^{\infty} t f_{r,n}(t) dt \;\;\;\;\;\; (2)
It can be shown that if n is odd, then
E[(n+1)/2, n] = 0 \;\;\;\;\;\; (3)
Also, for all values of n,
E(r, n) = -E(n-r+1, n) \;\;\;\;\;\; (4)
The function evNormOrdStatsScalar computes the value of E(r,n) for 
user-specified values of r and n.
The function evNormOrdStats computes the values of E(r,n) for all 
values of r (i.e., for r = 1, 2, \ldots, n) 
for a user-specified value of n.
Exact Method Based on Royston's Approximation to the Integral (method="royston") 
When method="royston", the integral in Equation (2) above is approximated by 
computing the value of the integrand between the values of lower and 
-lower using increments of inc, then summing these values and 
multiplying by inc.  In particular, the integrand is restructured as:
t \; f_{r,n}(t) = t \; exp\{log(n!) - log[(r-1)!] - log[(n-r)!] +  (r-1)log[\Phi(t)] + (n-r)log[1 - \Phi(t)] + log[\phi(t)]\} \;\;\; (5)
By default, as per Royston (1982), the integrand is evaluated between -9 and 9 in 
increments of 0.025.  The approximation is computed this way for values of 
r between 1 and [n/2], where [x] denotes the floor of x.  
If r > [n/2], then the approximation is computed for E(n-r+1, n) and 
Equation (4) is used. 
Note that Equation (1) in Royston (1982) differs from Equations (1) and (2) above 
because Royston's paper is based on the r^{th} largest value, 
not the r^{th} order statistic. 
Royston (1982) states that this algorithm “is accurate to at least seven decimal 
places on a 36-bit machine,” that it has been validated up to a sample size 
of n=2000, and that the accuracy for n > 2000 may be improved by 
reducing the value of the argument inc.  Note that making 
inc smaller will increase the computation time.  
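A minimal sketch of this numerical integration (not Royston's Algorithm AS 177 itself, written here directly from Equation (5) using lgamma for the log-factorials and applied for any r) is:
  # Minimal sketch: approximate E(r, n) by summing Equation (5) over a grid.
  ev.approx <- function(r, n, lower = -9, inc = 0.025) {
    t <- seq(lower, -lower, by = inc)
    log.f <- lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1) +
      (r - 1) * pnorm(t, log.p = TRUE) +
      (n - r) * pnorm(t, lower.tail = FALSE, log.p = TRUE) +
      dnorm(t, log = TRUE)
    sum(t * exp(log.f)) * inc
  }
  ev.approx(r = 1, n = 10)
  # Approximately -1.54, close to evNormOrdStatsScalar(r = 1, n = 10).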
Approximation Based on Blom's Method (method="blom") 
When method="blom", the following approximation to E(r,n), 
proposed by Blom (1958, pp. 68-75), is used:
E(r, n) \approx \Phi^{-1}(\frac{r - \alpha}{n - 2\alpha + 1}) \;\;\;\;\;\; (6)
By default, \alpha = 3/8 = 0.375.  This approximation is quite accurate.  
For example, for n \ge 2, the approximation is accurate to the first decimal place, 
and for n \ge 9 it is accurate to the second decimal place.
Harter (1961) discusses appropriate values of \alpha for various sample sizes 
n and values of r. 
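For example, a minimal sketch of Blom's approximation with the default \alpha = 3/8 is simply:
  # Minimal sketch of Blom's approximation (Equation (6)), assuming alpha = 3/8.
  blom <- function(r, n, alpha = 3/8) qnorm((r - alpha) / (n - 2 * alpha + 1))
  blom(r = 1, n = 10)
  #[1] -1.546635   # agrees with evNormOrdStatsScalar(r = 1, n = 10, method = "blom")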
Approximation Based on Monte Carlo Simulation (method="mc") 
When method="mc", Monte Carlo simulation is used to estmate the expected value 
of the r^{th} order statistic.  That is, N = nmc trials are run in which, 
for each trial, a random sample of n standard normal observations is 
generated and the r^{th} order statistic is computed.  Then, the average value 
of this order statistic over all N trials is computed, along with a  
confidence interval for the expected value, assuming an approximately 
normal distribution for the mean of the order statistic (the confidence interval 
is computed by supplying the simulated values of the r^{th} order statistic 
to the function enorm).
NOTE: This method has not been optimized for large sample sizes n 
(i.e., large values of the argument n) and/or a large number of 
Monte Carlo trials N (i.e., large values of the argument nmc) and 
may take a long time to execute in these cases.
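A minimal sketch of this Monte Carlo approach (omitting the confidence interval computation) is:
  # Minimal sketch: Monte Carlo estimate of E(r, n); confidence interval omitted.
  set.seed(47)
  r <- 1; n <- 10; nmc <- 2000
  sims <- replicate(nmc, sort(rnorm(n))[r])
  mean(sims)   # approximates E(1, 10), roughly -1.54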
Value
For evNormOrdStats: a numeric vector of length n containing the 
expected values of all the order statistics for a random sample of n 
standard normal deviates.
For evNormOrdStatsScalar: a numeric scalar containing the expected value 
of the r'th order statistic from a random sample of n standard 
normal deviates.  When method="mc", the returned object also has a 
cont.int attribute that contains the 95
and a nmc attribute indicating the number of Monte Carlo trials run.  
Note
The expected values of normal order statistics are used to construct normal 
quantile-quantile (Q-Q) plots (see qqPlot) and to compute 
goodness-of-fit statistics (see gofTest).  Usually, however, 
approximations are used instead of exact values.  The functions 
evNormOrdStats and 
evNormOrdStatsScalar have been included mainly 
because evNormOrdStatsScalar is called by elnorm3 and 
predIntNparSimultaneousTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Harter, H. L. (1961). Expected Values of Normal Order Statistics. Biometrika 48, 151–165.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, pp. 93–99.
Royston, J.P. (1982). Algorithm AS 177. Expected Normal Order Statistics (Exact and Approximate). Applied Statistics 31, 161–165.
See Also
Normal, ppoints, elnorm3, 
predIntNparSimultaneousTestPower, gofTest, 
qqPlot.
Examples
  # Compute the expected value of the minimum for a random sample of size 10 
  # from a standard normal distribution:
  # Based on method="royston"
  #--------------------------
  evNormOrdStatsScalar(r = 1, n = 10) 
  #[1] -1.538753
  # Based on method="blom"
  #-----------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "blom") 
  #[1] -1.546635
  # Based on method="mc" with 10,000 Monte Carlo trials
  #----------------------------------------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "mc", nmc = 10000) 
  #[1] -1.544318
  #attr(,"confint")
  #   95%LCL    95%UCL 
  #-1.555838 -1.532797 
  #attr(,"nmc")
  #[1] 10000
  #====================
  # Compute the expected values of all of the order statistics 
  # for a random sample of size 10 from a standard normal distribution
  # based on Royston's (1982) method:
  #--------------------------------------------------------------------
  evNormOrdStats(10) 
  #[1] -1.5387527 -1.0013570 -0.6560591 -0.3757647 -0.1226678
  #[6]  0.1226678  0.3757647  0.6560591  1.0013570  1.5387527
  # Compare the above with Blom (1958) scores:
  #-------------------------------------------
  evNormOrdStats(10, method = "blom") 
  #[1] -1.5466353 -1.0004905 -0.6554235 -0.3754618 -0.1225808
  #[6]  0.1225808  0.3754618  0.6554235  1.0004905  1.5466353
Estimate Parameters of a Weibull Distribution
Description
Estimate the shape and scale parameters of a Weibull distribution.
Usage
  eweibull(x, method = "mle")
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| method | character string specifying the method of estimation.  Possible values are "mle" (maximum likelihood; the default), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimator of variance).  See the DETAILS section for more information. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a Weibull distribution with 
parameters shape=\alpha and scale=\beta.
Estimation 
Maximum Likelihood Estimation (method="mle") 
The maximum likelihood estimators (mle's) of \alpha and \beta are 
the solutions of the simultaneous equations (Forbes et al., 2011):
\hat{\alpha}_{mle} = \frac{n}{\{(1/\hat{\beta}_{mle})^{\hat{\alpha}_{mle}} \sum_{i=1}^n [x_i^{\hat{\alpha}_{mle}} log(x_i)]\} - \sum_{i=1}^n log(x_i) }  \;\;\;\; (1)
\hat{\beta}_{mle} = [\frac{1}{n} \sum_{i=1}^n x_i^{\hat{\alpha}_{mle}}]^{1/\hat{\alpha}_{mle}} \;\;\;\; (2)
Method of Moments Estimation (method="mme") 
The method of moments estimator (mme) of \alpha is computed by solving the 
equation:
\frac{s}{\bar{x}} = \{\frac{\Gamma[(\hat{\alpha}_{mme} + 2)/\hat{\alpha}_{mme}]}{\{\Gamma[(\hat{\alpha}_{mme} + 1)/\hat{\alpha}_{mme}] \}^2} - 1 \}^{1/2} \;\;\;\; (3)
and the method of moments estimator (mme) of \beta is then computed as:
\hat{\beta}_{mme} = \frac{\bar{x}}{\Gamma[(\hat{\alpha}_{mme} + 1)/\hat{\alpha}_{mme}]} \;\;\;\; (4)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (5)
s^2_m = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (6)
and \Gamma() denotes the gamma function.
Method of Moments Estimation Based on the Unbiased Estimator of Variance (method="mmue") 
The method of moments estimators based on the unbiased estimator of variance are 
exactly the same as the method of moments estimators given in equations (3-6) above, 
except that the method of moments estimator of variance in equation (6) is replaced 
with the unbiased estimator of variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (7)
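A minimal sketch (not the EnvStats implementation) of the method of moments estimators in equations (3)-(6), using uniroot to solve equation (3) for the shape parameter over an assumed search interval, is:
  # Minimal sketch of the Weibull method of moments estimators, Equations (3)-(6).
  weibull.mme <- function(x) {
    xbar <- mean(x)
    s.m  <- sqrt(mean((x - xbar)^2))             # Equation (6), mme of the sd
    f <- function(shape) {                       # Equation (3), as a root-finding problem
      sqrt(gamma((shape + 2) / shape) / gamma((shape + 1) / shape)^2 - 1) - s.m / xbar
    }
    shape <- uniroot(f, interval = c(0.1, 100))$root   # assumed search interval
    scale <- xbar / gamma((shape + 1) / shape)   # Equation (4)
    c(shape = shape, scale = scale)
  }
  # Example: weibull.mme(dat) should be close to eweibull(dat, method = "mme")$parameters.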
Value
a list of class "estimate" containing the estimated parameters and other 
information.  See 
estimate.object for details.
Note
The Weibull distribution is named after the Swedish physicist Waloddi Weibull, who used this distribution to model breaking strengths of materials. The Weibull distribution has been extensively applied in the fields of reliability and quality control.
The exponential distribution is a special case of the 
Weibull distribution: a Weibull random variable with parameters shape=1 
and scale=\beta is equivalent to an exponential random variable with 
parameter rate=1/\beta.
The Weibull distribution is related to the 
Type I extreme value (Gumbel) distribution as follows: 
if X is a random variable from a Weibull distribution with parameters 
shape=\alpha and scale=\beta, then 
Y = -log(X) \;\;\;\; (10)
is a random variable from an extreme value distribution with parameters 
location=-log(\beta) and scale=1/\alpha.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
See Also
Weibull, Exponential, EVD, 
estimate.object.
Examples
  # Generate 20 observations from a Weibull distribution with parameters 
  # shape=2 and scale=3, then estimate the parameters via maximum likelihood. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rweibull(20, shape = 2, scale = 3) 
  eweibull(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Weibull
  #
  #Estimated Parameter(s):          shape = 2.673098
  #                                 scale = 3.047762
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Use the same data as in previous example, and compute the method of 
  # moments estimators based on the unbiased estimator of variance:
  eweibull(dat, method = "mmue") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Weibull
  #
  #Estimated Parameter(s):          shape = 2.528377
  #                                 scale = 3.052507
  #
  #Estimation Method:               mmue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #----------
  # Clean up
  #---------
  rm(dat)
Estimate Parameters of a Zero-Modified Lognormal (Delta) Distribution
Description
Estimate the parameters of a zero-modified lognormal distribution or a zero-modified lognormal distribution (alternative parameterization), and optionally construct a confidence interval for the mean.
Usage
  ezmlnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)
  ezmlnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| method | character string specifying the method of estimation.  The only possible value is "mvue" (minimum variance unbiased estimation).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper".  This argument is ignored if ci=FALSE. | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  The only possible value is "normal.approx".  See the DETAILS section for more information. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95.  This argument is ignored if ci=FALSE. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a 
zero-modified lognormal distribution with 
parameters meanlog=\mu, sdlog=\sigma, and 
p.zero=p.  Alternatively, let 
\underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a 
zero-modified lognormal distribution 
(alternative parameterization) with parameters mean=\theta, 
cv=\tau, and p.zero=p.
Let r denote the number of observations in \underline{x} that are equal 
to 0, and order the observations so that x_1, x_2, \ldots, x_r denote 
the r zero observations and x_{r+1}, x_{r+2}, \ldots, x_n denote 
the n-r non-zero observations.
Note that \theta is not the mean of the zero-modified lognormal 
distribution; it is the mean of the lognormal part of the distribution.  Similarly, 
\tau is not the coefficient of variation of the zero-modified 
lognormal distribution; it is the coefficient of variation of the lognormal 
part of the distribution.
Let \gamma, \delta, and \phi denote the mean, standard deviation, 
and coefficient of variation of the overall zero-modified lognormal (delta) 
distribution.  Let \eta denote the standard deviation of the lognormal 
part of the distribution, so that \eta = \theta \tau.  Aitchison (1955) 
shows that:
\gamma = (1 - p) \theta  \;\;\;\; (1)
\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)
so that
\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1-p}} \;\;\;\; (3)
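These relationships are easy to check by simulation; the following minimal sketch uses the rzmlnormAlt random number generator (also used in the Examples section below) with mean = 2, cv = 1, and p.zero = 0.5:
  # Minimal sketch checking Equations (1) and (3) by simulation.
  set.seed(47)
  x <- rzmlnormAlt(1e5, mean = 2, cv = 1, p.zero = 0.5)
  mean(x)           # close to (1 - 0.5) * 2 = 1            (Equation (1))
  sd(x) / mean(x)   # close to sqrt(1 + 0.5)/sqrt(0.5) = sqrt(3) ~ 1.73  (Equation (3))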
Estimation 
Minimum Variance Unbiased Estimation (method="mvue") 
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of 
\gamma and \delta are:
| \hat{\gamma}_{mvue} = | (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) | if r < n - 1, | 
| x_n / n | if r = n - 1, | |
| 0 | if r = n \;\;\;\; (4) | |
| \hat{\delta}^2_{mvue} = | (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}] \}  | if r < n - 1, | 
| x_n^2 / n | if r = n - 1, | |
| 0 | if r = n \;\;\;\; (5) | 
where
y_i = log(x_i), \;\; i = r+1, r+2, \ldots, n \;\;\;\; (6)
\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)
s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)
g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)
Note that when r=n-1 or r=n, the estimator of \gamma is simply the 
sample mean for all observations (including zero values), and the estimator for 
\delta^2 is simply the sample variance for all observations.
The expected value and asymptotic variance of the mvue of \gamma are 
(Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980):
E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)
AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} exp(2\mu + \sigma^2) (1-p) (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)
Confidence Intervals 
Based on Normal Approximation (ci.method="normal.approx") 
An approximate (1-\alpha)100\% confidence interval for \gamma is 
constructed based on the assumption that the estimator of \gamma is 
approximately normally distributed.  Thus, an approximate two-sided 
(1-\alpha)100\% confidence interval for \gamma is constructed as:
[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)
where t_{\nu, p} is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom, and 
the quantity \hat{\sigma}_{\hat{\gamma}} is the estimated standard deviation 
of the mvue of \gamma, and is computed by replacing the values of 
\mu, \sigma, and p in equation (11) above with their estimated 
values and taking the square root.
Note that there must be at least 3 non-missing observations (n \ge 3) and 
at least one observation must be non-zero (r \le n-1) in order to construct 
a confidence interval.
One-sided confidence intervals are computed in a similar fashion.
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
For the function ezmlnorm, the component called parameters is a 
numeric vector with the following estimated parameters: 
| Parameter Name | Explanation | 
| meanlog | mean of the log of the lognormal part of the distribution. | 
| sdlog | standard deviation of the log of the lognormal part of the distribution. | 
| p.zero | probability that an observation will be 0. | 
| mean.zmlnorm | mean of the overall zero-modified lognormal (delta) distribution. | 
| sd.zmlnorm | standard deviation of the overall zero-modified lognormal (delta) distribution. | 
For the function ezmlnormAlt, the component called parameters is a 
numeric vector with the following estimated parameters: 
| Parameter Name | Explanation | 
| mean | mean of the lognormal part of the distribution. | 
| cv | coefficient of variation of the lognormal part of the distribution. | 
| p.zero | probability that an observation will be 0. | 
| mean.zmlnorm | mean of the overall zero-modified lognormal (delta) distribution. | 
| cv.zmlnorm | coefficient of variation of the overall zero-modified lognormal (delta) distribution. | 
Note
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with non-detects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), 
zero-modified normal, censored normal, or censored lognormal is the best 
model for the data is to construct both censored and detects-only probability 
plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London. pp.94-99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47–51.
Gibbons, RD., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Zero-Modified Lognormal, Zero-Modified Normal, Lognormal.
Examples
  # Generate 100 observations from a zero-modified lognormal (delta) 
  # distribution with mean=2, cv=1, and p.zero=0.5, then estimate the 
  # parameters. According to equations (1) and (3) above, the overall mean 
  # is mean.zmlnorm=1 and the overall cv is cv.zmlnorm=sqrt(3). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5) 
  ezmlnormAlt(dat, ci = TRUE) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Lognormal (Delta)
  #
  #Estimated Parameter(s):          mean         = 1.9604561
  #                                 cv           = 0.9169411
  #                                 p.zero       = 0.4500000
  #                                 mean.zmlnorm = 1.0782508
  #                                 cv.zmlnorm   = 1.5307175
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #
  #Confidence Interval for:         mean.zmlnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.748134
  #                                 UCL = 1.408368
  #----------
  # Clean up
  rm(dat)
Estimate Parameters of a Zero-Modified Normal Distribution
Description
Estimate the mean and standard deviation of a zero-modified normal distribution, and optionally construct a confidence interval for the mean.
Usage
  ezmnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)
Arguments
| x | numeric vector of observations. | 
| method | character string specifying the method of estimation.  Currently, the only possible value is "mvue" (minimum variance unbiased estimation).  See the DETAILS section for more information. | 
| ci | logical scalar indicating whether to compute a confidence interval for the mean.  The default value is ci=FALSE. | 
| ci.type | character string indicating what kind of confidence interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper".  This argument is ignored if ci=FALSE. | 
| ci.method | character string indicating what method to use to construct the confidence interval for the mean.  Currently the only possible value is "normal.approx".  See the DETAILS section for more information. | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval.  The default value is conf.level=0.95.  This argument is ignored if ci=FALSE. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of 
n observations from a 
zero-modified normal distribution with 
parameters mean=\mu, sd=\sigma, and p.zero=p.  
Let r denote the number of observations in \underline{x} that are equal 
to 0, and order the observations so that x_1, x_2, \ldots, x_r denote 
the r zero observations, and x_{r+1}, x_{r+2}, \ldots, x_n denote the 
n-r non-zero observations.
Note that \mu is not the mean of the zero-modified normal distribution; 
it is the mean of the normal part of the distribution.  Similarly, \sigma is 
not the standard deviation of the zero-modified normal distribution; it is 
the standard deviation of the normal part of the distribution.
Let \gamma and \delta denote the mean and standard deviation of the 
overall zero-modified normal distribution.  Aitchison (1955) shows that:
\gamma = (1 - p) \mu  \;\;\;\; (1)
\delta^2 = (1 - p) \sigma^2 + p (1 - p) \mu^2 \;\;\;\; (2)
Estimation 
Minimum Variance Unbiased Estimation (method="mvue") 
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of 
\gamma and \delta are:
\hat{\gamma}_{mvue} = \bar{x} \;\;\;\; (3)
| \hat{\delta}^2_{mvue} = | \frac{n-r-1}{n-1} (s^*)^2 + \frac{r}{n} (\frac{n-r}{n-1}) (\bar{x}^*)^2 | if r < n - 1, | 
| x_n^2 / n | if r = n - 1, | |
| 0 | if r = n \;\;\;\; (4) | 
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (5)
\bar{x}^* = \frac{1}{n-r} \sum_{i=r+1}^n x_i \;\;\;\; (6)
(s^*)^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (x_i - \bar{x}^*)^2 \;\;\;\; (7)
Note that the quantity in equation (5) is the sample mean of all observations 
(including 0 values), the quantity in equation (6) is the sample mean of all non-zero 
observations, and the quantity in equation (7) is the sample variance of all 
non-zero observations.  Also note that for r=n-1 or r=n, the estimator 
of \delta^2 is the sample variance for all observations (including 0 values).
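A minimal sketch (not the EnvStats implementation) of these estimators for the usual case r < n - 1 is shown below; applied to the simulated data in the Examples section, it should reproduce the values of mean.zmnorm and sd.zmnorm reported by ezmnorm.
  # Minimal sketch of Equations (3)-(7), assuming r < n - 1.
  zmnorm.mvue <- function(x) {
    n <- length(x); r <- sum(x == 0)
    x.pos     <- x[x != 0]
    xbar      <- mean(x)                       # Equation (5)
    xbar.star <- mean(x.pos)                   # Equation (6)
    s2.star   <- var(x.pos)                    # Equation (7)
    gamma.hat  <- xbar                         # Equation (3)
    delta2.hat <- (n - r - 1) / (n - 1) * s2.star +
      (r / n) * ((n - r) / (n - 1)) * xbar.star^2   # Equation (4), r < n - 1
    c(mean.zmnorm = gamma.hat, sd.zmnorm = sqrt(delta2.hat))
  }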
Confidence Intervals 
Based on Normal Approximation (ci.method="normal.approx") 
An approximate (1-\alpha)100\% confidence interval for \gamma is 
constructed based on the assumption that the estimator of \gamma is 
approximately normally distributed.  Aitchison (1955) shows that 
Var(\hat{\gamma}_{mvue}) = Var(\bar{x}) = \frac{\delta^2}{n} \;\;\;\; (8)
Thus, an approximate two-sided (1-\alpha)100\% confidence interval for 
\gamma is constructed as:
[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}} ] \;\;\;\; (9)
where t_{\nu, p} is the p'th quantile of 
Student's t-distribution with \nu degrees of freedom.
One-sided confidence intervals are computed in a similar fashion.
Value
a list of class "estimate" containing the estimated parameters and other information.  
See 
estimate.object for details.
The component called parameters is a numeric vector with the following 
estimated parameters: 
| Parameter Name | Explanation | 
| mean | mean of the normal (Gaussian) part of the distribution. | 
| sd | standard deviation of the normal (Gaussian) part of the distribution. | 
| p.zero | probability that an observation will be 0. | 
| mean.zmnorm | mean of the overall zero-modified normal distribution. | 
| sd.zmnorm | standard deviation of the overall zero-modified normal distribution. | 
Note
The zero-modified normal distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit”. See, for example USEPA (1992c, pp.27-34). In most cases, however, the zero-modified lognormal (delta) distribution will be more appropriate, since chemical concentrations are bounded below at 0 (e.g., Gilliom and Helsel, 1986; Owen and DeRouen, 1980).
Once you estimate the parameters of the zero-modified normal distribution, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with a confidence interval.
One way to try to assess whether a 
zero-modified lognormal (delta), 
zero-modified normal, censored normal, or 
censored lognormal is the best model for the data is to construct both 
censored and detects-only probability plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
See Also
ZeroModifiedNormal, Normal, 
ezmlnorm, ZeroModifiedLognormal, estimate.object.
Examples
  # Generate 100 observations from a zero-modified normal distribution 
  # with mean=4, sd=2, and p.zero=0.5, then estimate the parameters.  
  # According to equations (1) and (2) above, the overall mean is 
  # mean.zmnorm=2 and the overall standard deviation is sd.zmnorm=sqrt(6).  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rzmnorm(100, mean = 4, sd = 2, p.zero = 0.5) 
  ezmnorm(dat, ci = TRUE) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Normal
  #
  #Estimated Parameter(s):          mean        = 4.037732
  #                                 sd          = 1.917004
  #                                 p.zero      = 0.450000
  #                                 mean.zmnorm = 2.220753
  #                                 sd.zmnorm   = 2.465829
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #
  #Confidence Interval for:         mean.zmnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 1.731417
  #                                 UCL = 2.710088
  #----------
  # Following Example 9 on page 34 of USEPA (1992c), compute an 
  # estimate of the mean of the zinc data, assuming a 
  # zero-modified normal distribution. The data are stored in 
  # EPA.92c.zinc.df.
  head(EPA.92c.zinc.df) 
  #  Zinc.orig  Zinc Censored Sample Well
  #1        <7  7.00     TRUE      1    1
  #2     11.41 11.41    FALSE      2    1
  #3        <7  7.00     TRUE      3    1
  #4        <7  7.00     TRUE      4    1
  #5        <7  7.00     TRUE      5    1
  #6     10.00 10.00    FALSE      6    1
  New.Zinc <- EPA.92c.zinc.df$Zinc 
  New.Zinc[EPA.92c.zinc.df$Censored] <- 0 
  ezmnorm(New.Zinc, ci = TRUE) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Normal
  #
  #Estimated Parameter(s):          mean        = 11.891000
  #                                 sd          =  1.594523
  #                                 p.zero      =  0.500000
  #                                 mean.zmnorm =  5.945500
  #                                 sd.zmnorm   =  6.123235
  #
  #Estimation Method:               mvue
  #
  #Data:                            New.Zinc
  #
  #Sample Size:                     40
  #
  #Confidence Interval for:         mean.zmnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 3.985545
  #                                 UCL = 7.905455
  #----------
  # Clean up
  rm(dat, New.Zinc)
Geometric Mean
Description
Compute the sample geometric mean.
Usage
  geoMean(x, na.rm = FALSE)
Arguments
| x | numeric vector of observations. | 
| na.rm | logical scalar indicating whether to remove missing values from x prior to computing the geometric mean.  The default value is na.rm=FALSE. | 
Details
If x contains any non-positive values (values less than or equal to 0), 
geoMean returns NA and issues a warning.
Let \underline{x} denote a vector of n observations from some 
distribution.  The sample geometric mean is a measure of central tendency.  
It is defined as:
\bar{x}_G = \sqrt[n]{x_1 x_2 \ldots x_n} = [\prod_{i=1}^n x_i]^{1/n} \;\;\;\;\;\; (1)
that is, it is the n'th root of the product of all n observations.
An equivalent way to define the geometric mean is by:
\bar{x}_G = exp[\frac{1}{n} \sum_{i=1}^n log(x_i)] = e^{\bar{y}} \;\;\;\;\;\; (2)
where
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i  \;\;\;\;\;\; (3)
y_i = log(x_i), \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (4)
That is, the sample geometric mean is the antilog of the sample mean of the log-transformed observations.
The geometric mean is only defined for positive observations. It can be shown that the geometric mean is less than or equal to the sample arithmetic mean with equality only when all of the observations are the same value.
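In R, equation (2) amounts to a one-line computation; a minimal sketch equivalent to geoMean (without the handling of missing or non-positive values) is:
  # Minimal sketch of Equation (2); no checks for NA or non-positive values.
  geo.mean <- function(x) exp(mean(log(x)))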
Value
A numeric scalar – the sample geometric mean.
Note
The geometric mean is sometimes used to average ratios and percent changes 
(Zar, 2010).  For the lognormal distribution, the geometric mean is the 
maximum likelihood estimator of the median of the distribution, 
although it is sometimes used incorrectly to estimate the mean of the 
distribution (see the NOTE section in the help file for elnormAlt).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
geoSD, summaryFull, Summary Statistics, 
mean, median.
Examples
  # Generate 20 observations from a lognormal distribution with parameters 
  # mean=10 and cv=2, and compute the mean, median, and geometric mean. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean = 10, cv = 2) 
  mean(dat) 
  #[1] 5.339273
  median(dat) 
  #[1] 3.692091
 
  geoMean(dat) 
  #[1] 4.095127
 
  #----------
  # Clean up
  rm(dat)
Geometric Standard Deviation.
Description
Compute the sample geometric standard deviation.
Usage
  geoSD(x, na.rm = FALSE, sqrt.unbiased = TRUE)
Arguments
| x | numeric vector of observations. | 
| na.rm | logical scalar indicating whether to remove missing values from x prior to computing the geometric standard deviation.  The default value is na.rm=FALSE. | 
| sqrt.unbiased | logical scalar specifying what method to use to compute the sample standard deviation of the log-transformed observations.  If sqrt.unbiased=TRUE (the default), the square root of the unbiased estimator of variance is used; otherwise, the square root of the method of moments estimator of variance is used.  See the DETAILS section for more information. | 
Details
If x contains any non-positive values (values less than or equal to 0), 
geoSD returns NA and issues a warning.
Let \underline{x} denote a vector of n observations from some 
distribution.  The sample geometric standard deviation is a measure of variability.  
It is defined as:
s_G = exp(s_y) \;\;\;\;\;\; (1)
where
s_y = [\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2]^{1/2} \;\;\;\;\;\; (2)
y_i = log(x_i), \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (3)
That is, the sample geometric standard deviation is the antilog of the sample standard deviation of the log-transformed observations.
The sample standard deviation of the log-transformed observations shown in Equation (2) is the square root of the unbiased estimator of variance. (Note that this estimator of standard deviation is not an unbiased estimator.) Sometimes, the square root of the method of moments estimator of variance is used instead:
s_y = [\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2]^{1/2} \;\;\;\;\;\; (4)
This is the estimator used in Equation (1) when sqrt.unbiased=FALSE.
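A minimal sketch equivalent to geoSD (without the handling of missing or non-positive values) is:
  # Minimal sketch of Equations (1)-(4); no checks for NA or non-positive values.
  geo.sd <- function(x, sqrt.unbiased = TRUE) {
    y <- log(x)
    n <- length(y)
    s.y <- if (sqrt.unbiased) sd(y) else sqrt((n - 1) / n) * sd(y)   # Equation (2) or (4)
    exp(s.y)                                                         # Equation (1)
  }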
Value
A numeric scalar – the sample geometric standard deviation.
Note
The geometric standard deviation is only defined for positive observations. It is usually computed only for observations that are assumed to have come from a lognormal distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Leidel, N.A., K.A. Busch, and J.R. Lynch. (1977). Occupational Exposure Sampling Strategy Manual. U.S. Department of Health, Education, and Welfare, Public Health Service, Center for Disease Control, National Institute for Occupational Safety and Health, Cincinnati, Ohio 45226, January, 1977, pp.102–103.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
geoMean, Lognormal, elnorm, 
summaryFull, Summary Statistics.
Examples
  # Generate 2000 observations from a lognormal distribution with parameters 
  # mean=10 and cv=1, which implies the standard deviation (on the original 
  # scale) is 10.  Compute the mean, geometric mean, standard deviation, 
  # and geometric standard deviation. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(2000, mean = 10, cv = 1) 
  mean(dat) 
  #[1] 10.23417
 
  geoMean(dat) 
  #[1] 7.160154
 
  sd(dat) 
  #[1] 9.786493
 
  geoSD(dat) 
  #[1] 2.334358
  #----------
  # Clean up
  rm(dat)
1-D Scatter Plots with Confidence Intervals Using ggplot2
Description
geom_stripchart is an adaptation of the EnvStats function
stripChart and is used to create a strip plot
using functions from the package ggplot2.
geom_stripchart produces one-dimensional scatter plots (also called dot plots),
along with text indicating sample size and estimates of location (mean or median) and scale
(standard deviation or interquartile range), as well as confidence intervals
for the population location parameter, and results of a hypothesis
test comparing group locations.
Usage
  geom_stripchart(..., seed = 47, paired = FALSE, paired.lines = paired,
    group = NULL, x.nudge = if (paired && paired.lines) c(-0.3, 0.3) else 0.3,
    text.box = FALSE, location = "mean", ci = "normal", digits = 1,
    digit.type = "round", nsmall = ifelse(digit.type == "round", digits, 0),
    jitter.params = list(), point.params = list(), line.params = list(),
    location.params = list(), errorbar.params = list(),
    n.text = TRUE, n.text.box = text.box, n.text.params = list(),
    location.scale.text = TRUE, location.scale.text.box = text.box,
    location.scale.text.params = list(), test.text = FALSE,
    test.text.box = text.box,
    test = ifelse(location == "mean", "parametric", "nonparametric"),
    test.text.params = list())
Arguments
| ... | Arguments that can be passed on  | 
| seed | For the case of non-paired data, the argument  | 
| paired | For the case of two groups, a logical scalar indicating whether the data
should be considered to be paired.  The default value is  NOTE:  if the argument  | 
| paired.lines | For the case when there are two groups and the observations are paired
(i.e.,  | 
| group | For the case when there are two groups and the observations are paired
(i.e.,  | 
| x.nudge | A numeric scalar indicating the amount to move the estimates of location and
confidence interval lines on the  | 
| text.box | A logical scalar indicating whether to surround text indicating sample size,
location/scale estimates, and test results with text boxes (i.e.,
whether to use  | 
| location | A character string indicating whether to display the mean for each group  | 
| ci | For the case when  NOTE:  For the case when  | 
| digits | Integer indicating the number of digits to use for displaying text indicating the
location and scale estimates and, for the case of one or two groups,
the number of digits to use for displaying text indicating the confidence interval
associated with the test of hypothesis.  When  For location/scale estimates, you can override the value of this argument by
including a component named  | 
| digit.type | Character string indicating whether the  For location/scale estimates, you can override the value of this argument by
including a component named  | 
| nsmall | Integer passed to the function  | 
| jitter.params | A list containing arguments to the function  This argument is ignored when there are two groups and both  | 
| point.params | For the case when there are two groups and both  | 
| line.params | For the case when there are two groups and both  | 
| location.params | A list containing arguments to the function  | 
| errorbar.params | A list containing arguments to the function  | 
| n.text | A logical scalar indicating whether to display the sample size for each group.
The default is  | 
| n.text.box | A logical scalar indicating whether to surround the text indicating the sample size for
each group with a text box (i.e., whether to use  | 
| n.text.params | A list containing arguments to the function  | 
| location.scale.text | A logical scalar indicating whether to display text indicating the location and scale
for each group.  The default is  | 
| location.scale.text.box | A logical scalar indicating whether to surround the text indicating the
location and scale for each group with a text box (i.e., whether to use
 | 
| location.scale.text.params | A list containing arguments to the function
 | 
| test.text | A logical scalar indicating whether to display the results of the hypthesis test
comparing groups.  The default is  | 
| test.text.box | A logical scalar indicating whether to surround the text indicating the
results of the hypothesis test comparing groups with a text box
(i.e., whether to use  | 
| test | A character string indicating whether to use a standard parametric test
( | 
| test.text.params | A list containing arguments to the function  | 
Details
See the vignette Extending ggplot2 at https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html and Chapter 12 of Wickham (2016) for information on how to create a new geom.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R!). Second Edition. Springer.
See Also
stat_n_text,          stat_mean_sd_text,
stat_median_iqr_text, stat_test_text,
geom_jitter, geom_point,
geom_line,   stat_summary,
geom_text,   geom_label.
Examples
  # First, load and attach the ggplot2 package.
  #--------------------------------------------
  library(ggplot2)
  #==========
  #---------------------
  # 3 Independent Groups
  #---------------------
  # Example 1:
  # Using the built-in data frame mtcars,
  # create a stripchart of miles per gallon vs. number of cylinders
  # using different colors for each level of the number of cylinders.
  #------------------------------------------------------------------
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl)))
  p + geom_stripchart() +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
## Not run: 
  # Example 2:
  # Repeat Example 1, but include the results of the
  # standard parametric analysis of variance.
  #-------------------------------------------------
  dev.new()
  p + geom_stripchart(test.text = TRUE) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 3:
  # Using Example 2, show explicitly the layering
  # process that geom_stripchart is using.
  #
  # This plot should look identical to the previous one.
  #-----------------------------------------------------
  set.seed(47)
  dev.new()
  p + theme(legend.position = "none") +
    geom_jitter(pch = 1, width = 0.15, height = 0) +
    stat_summary(fun = "mean", geom = "point",
      size = 2, position = position_nudge(x = 0.3)) +
    stat_summary(fun.data = "mean_cl_normal", geom = "errorbar",
      size = 0.75, width = 0.075, position = position_nudge(x = 0.3)) +
    stat_n_text() +
    stat_mean_sd_text() +
    stat_test_text() +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 4:
  # Repeat Example 2, but put all text in a text box.
  #--------------------------------------------------
  dev.new()
  p + geom_stripchart(text.box = TRUE, test.text = TRUE) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 5:
  # Repeat Example 2, but put just the test results
  # in a text box.
  #------------------------------------------------
  dev.new()
  p + geom_stripchart(test.text = TRUE, test.text.box = TRUE) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 6:
  # Repeat Example 2, but:
  # 1) plot the median and IQR instead of the mean and the 95% confidence interval,
  # 2) show text for the median and IQR, and
  # 3) use the nonparametric test to compare groups.
  #
  # Note that following what the ggplot2 stat_summary function
  # does when you specify a "confidence interval" for the
  # median (i.e., when you call stat_summary with the arguments
  # geom="errorbar" and fun.data="median_hilow"), the displayed
  # error bars show intervals based on estimated quantiles.
  # By default, stat_summary with the arguments
  # geom="errorbar" and fun.data="median_hilow" displays
  # error bars using the 2.5'th and 97.5'th percentiles.
  # The function geom_stripchart, however, by default
  # displays error bars using the 25'th and 75'th percentiles
  # (see the explanation for the argument ci above).
  #------------------------------------------------------------
  dev.new()
  p + geom_stripchart(location = "median", test.text = TRUE) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
  #========================================
  #---------------------
  # 2 Independent Groups
  #---------------------
  # Example 7:
  # Repeat Example 2, but use only the groups with
  # 4 and 8 cylinders.
  #-----------------------------------------------
  dev.new()
  p <- ggplot(subset(mtcars, cyl %in% c(4, 8)),
    aes(x = factor(cyl), y = mpg, color = cyl))
  p + geom_stripchart(test.text = TRUE) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 8:
  # Repeat Example 7, but
  # 1) facet by transmission type
  # 2) make the text smaller
  # 3) put the text for the test results in a text box
  #    and make them blue.
  dev.new()
  p + geom_stripchart(test.text = TRUE, test.text.box = TRUE,
        n.text.params = list(size = 3),
        location.scale.text.params = list(size = 3),
        test.text.params = list(size = 3, color = "blue")) +
    facet_wrap(~ am, labeller = label_both) +
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
  #========================================
  #---------------------
  # 2 Independent Groups
  #---------------------
  # Example 9:
  # The guidance document USEPA (1994b, pp. 6.22--6.25)
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB)
  # concentrations (in parts per billion) from soil samples
  # at a Reference area and a Cleanup area.  These data are stored
  # in the data frame EPA.94b.tccb.df.
  #
  # First create one-dimensional scatterplots to compare the
  # TcCB concentrations between the areas and use a nonparametric
  # test to test for a difference between areas.
  dev.new()
  p <- ggplot(EPA.94b.tccb.df, aes(x = Area, y = TcCB, color = Area))
  p + geom_stripchart(location = "median", test.text = TRUE) +
    labs(y = "TcCB (ppb)")
  #==========
  # Example 10:
  # Now log-transform the TcCB data and use a parametric test
  # to compare the areas.
  dev.new()
  p <- ggplot(EPA.94b.tccb.df, aes(x = Area, y = log10(TcCB), color = Area))
  p + geom_stripchart(test.text = TRUE) +
    labs(y = "log10[ TcCB (ppb) ]")
  #==========
  # Example 11:
  # Repeat Example 10, but allow the variances to differ
  # between Areas.
  #-----------------------------------------------------
  dev.new()
  p + geom_stripchart(test.text = TRUE,
        test.text.params = list(test.arg.list = list(var.equal=FALSE))) +
    labs(y = "log10[ TcCB (ppb) ]")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
  #========================================
  #--------------------
  # Paired Observations
  #--------------------
  # Example 12:
  # The data frame ACE.13.TCE.df contains paired observations of
  # trichloroethylene (TCE; mg/L) at 10 groundwater monitoring wells
  # before and after remediation.
  #
  # Create one-dimensional scatterplots to compare TCE concentrations
  # before and after remediation and use a paired t-test to
  # test for a difference between periods.
  ACE.13.TCE.df
  #   TCE.mg.per.L Well Period
  #1        20.900    1 Before
  #2         9.170    2 Before
  #3         5.960    3 Before
  #...      ......   .. ......
  #18        0.520    8  After
  #19        3.060    9  After
  #20        1.900   10  After
  dev.new()
  p <- ggplot(ACE.13.TCE.df, aes(x = Period, y = TCE.mg.per.L, color = Period))
  p + geom_stripchart(paired = TRUE, group = "Well", test.text = TRUE) +
    labs(y = "TCE (mg/L)")
  #==========
  # Example 13:
  # Repeat Example 12, but use a one-sided alternative since
  # remediation should decrease TCE concentration.
  #---------------------------------------------------------
  dev.new()
  p + geom_stripchart(paired = TRUE, group = "Well", test.text = TRUE,
    test.text.params = list(test.arg.list = list(alternative="less"))) +
    labs(y = "TCE (mg/L)")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
  #========================================
  #----------------------------------------
  # Paired Observations, Nonparametric Test
  #----------------------------------------
  # Example 14:
  # The data frame Helsel.Hirsch.02.Mayfly.df contains paired counts
  # of mayfly nymphs above and below industrial outfalls in 12 streams.
  #
  # Create one-dimensional scatterplots to compare the
  # counts between locations and use a nonparametric test
  # to compare counts above and below the outfalls.
  Helsel.Hirsch.02.Mayfly.df
  #   Mayfly.Count Stream Location
  #1            12      1    Above
  #2            15      2    Above
  #3            11      3    Above
  #...         ...     ..    .....
  #22           60     10    Below
  #23           53     11    Below
  #24          124     12    Below
  dev.new()
  p <- ggplot(Helsel.Hirsch.02.Mayfly.df,
    aes(x = Location, y = Mayfly.Count, color = Location))
  p + geom_stripchart(location = "median", paired = TRUE,
      group = "Stream", test.text = TRUE) +
    labs(y = "Number of Mayfly Nymphs")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
  
## End(Not run)
S3 Class "gof"
Description
Objects of S3 class "gof" are returned by the EnvStats function 
gofTest when just the x argument is supplied.
Details
Objects of S3 class "gof" are lists that contain 
information about the assumed distribution, the estimated or 
user-supplied distribution parameters, and the test statistic 
and p-value.
Value
Required Components 
The following components must be included in a legitimate list of 
class "gof".
| distribution | a character string indicating the name of the 
assumed distribution (see  | 
| dist.abb | a character string containing the abbreviated name 
of the distribution (see  | 
| distribution.parameters | a numeric vector with a names attribute containing the names and values of the estimated or user-supplied distribution parameters associated with the assumed distribution. | 
| n.param.est | a scalar indicating the number of distribution 
parameters estimated prior to performing the goodness-of-fit 
test. The value of this component will be  | 
| estimation.method | a character string indicating the method 
used to compute the estimated parameters.  The value of this 
component will depend on the available estimation methods 
(see  | 
| statistic | a numeric scalar with a names attribute containing the name and value of the goodness-of-fit statistic. | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit test. | 
| parameters | numeric vector with a names attribute containing 
the name(s) and value(s) of the parameter(s) associated with the 
test statistic given in the  | 
| z.value | (except when  | 
| p.value | numeric scalar containing the p-value associated with the goodness-of-fit statistic. | 
| alternative | character string indicating the alternative hypothesis. | 
| method | character string indicating the name of the 
goodness-of-fit test (e.g.,  | 
| data | numeric vector containing the data actually used for the goodness-of-fit test (i.e., the original data without any missing or infinite values). | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit test. | 
| bad.obs | numeric scalar indicating the number of missing ( | 
NOTE: when the function gofTest is called with 
both arguments x and y and test="ks", it 
returns an object of class "gofTwoSample".  
No specific parametric distribution is assumed, so the value of the component 
distribution is "Equal" and the following components 
are omitted: dist.abb, distribution.parameters, 
n.param.est, estimation.method, and z.value. 
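For illustration, here is a minimal sketch (not part of the original help file; the data are arbitrary) of this two-sample case:
  set.seed(47) 
  x <- rnorm(20)            # hypothetical first sample 
  y <- rnorm(20, mean = 1)  # hypothetical second sample 
  two.sample.obj <- gofTest(y = y, x = x, test = "ks") 
  class(two.sample.obj) 
  #[1] "gofTwoSample"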
Optional Components 
The following components are included in the result of 
calling gofTest with the argument 
test="chisq" and may be used by the function 
plot.gof:
| cut.points | numeric vector containing the cutpoints used to define the cells. | 
| counts | numeric vector containing the observed number of counts for each cell. | 
| expected | numeric vector containing the expected number of counts for each cell. | 
| X2.components | numeric vector containing the contribution of each cell to the chi-square statistic. | 
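As a minimal sketch (assumed, not from the original help file), these extra components can be inspected by calling gofTest with test="chisq":
  set.seed(23) 
  dat <- rnorm(50, mean = 3, sd = 2)         # hypothetical data 
  chisq.obj <- gofTest(dat, test = "chisq") 
  # Check which of the optional components listed above are present 
  c("cut.points", "counts", "expected", "X2.components") %in% names(chisq.obj)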
Methods
Generic functions that have methods for objects of class 
"gof" include: 
print, plot.
Note
Since objects of class "gof" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
gofTest, print.gof, plot.gof, 
Goodness-of-Fit Tests, 
Distribution.df, gofCensored.object.
Examples
  # Create an object of class "gof", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 3, sd = 2) 
  gof.obj <- gofTest(dat) 
  mode(gof.obj) 
  #[1] "list" 
  class(gof.obj) 
  #[1] "gof" 
  names(gof.obj) 
  # [1] "distribution"            "dist.abb"               
  # [3] "distribution.parameters" "n.param.est"            
  # [5] "estimation.method"       "statistic"              
  # [7] "sample.size"             "parameters"             
  # [9] "z.value"                 "p.value"                
  #[11] "alternative"             "method"                 
  #[13] "data"                    "data.name"              
  #[15] "bad.obs" 
  gof.obj 
  
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 2.861160
  #                                 sd   = 1.180226
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.9640724
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.6279872
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  #==========
  # Extract the p-value
  #--------------------
  gof.obj$p.value
  #[1] 0.6279872
  #==========
  # Plot the results of the test
  #-----------------------------
  dev.new()
  plot(gof.obj)
  #==========
  # Clean up
  #---------
  rm(dat, gof.obj)
  graphics.off()
S3 Class "gofCensored"
Description
Objects of S3 class "gofCensored" are returned by the EnvStats function 
gofTestCensored.
Details
Objects of S3 class "gofCensored" are lists that contain 
information about the assumed distribution, the amount of censoring, 
the estimated or user-supplied distribution parameters, and the test 
statistic and p-value.
Value
Required Components 
The following components must be included in a legitimate list of 
class "gofCensored".
| distribution | a character string indicating the name of the 
assumed distribution (see  | 
| dist.abb | a character string containing the abbreviated name 
of the distribution (see  | 
| distribution.parameters | a numeric vector with a names attribute containing the names and values of the estimated or user-supplied distribution parameters associated with the assumed distribution. | 
| n.param.est | a scalar indicating the number of distribution 
parameters estimated prior to performing the goodness-of-fit 
test. The value of this component will be  | 
| estimation.method | a character string indicating the method 
used to compute the estimated parameters.  The value of this 
component will depend on the available estimation methods 
(see  | 
| statistic | a numeric scalar with a names attribute containing the name and value of the goodness-of-fit statistic. | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit test. | 
| censoring.side | character string indicating whether the data are left- or right-censored. | 
| censoring.levels | numeric scalar or vector indicating the censoring level(s). | 
| percent.censored | numeric scalar indicating the percent of non-missing observations that are censored. | 
| parameters | numeric vector with a names attribute containing 
the name(s) and value(s) of the parameter(s) associated with the 
test statistic given in the  | 
| z.value | (except when  | 
| p.value | numeric scalar containing the p-value associated with the goodness-of-fit statistic. | 
| alternative | character string indicating the alternative hypothesis. | 
| method | character string indicating the name of the 
goodness-of-fit test (e.g.,  | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit test. | 
| censored | logical vector indicating which observations are censored. | 
| censoring.name | character string indicating the name of the object used to indicate the censoring. | 
Optional Components 
The following components are included when the argument keep.data is 
set to TRUE in the call to the function producing the 
object of class "gofCensored".
| data | numeric vector containing the data actually used for the goodness-of-fit test (i.e., the original data without any missing or infinite values). | 
| censored | logical vector indicating the censoring status of the data actually used for the goodness-of-fit test. | 
The following component is included when the data object 
contains missing (NA), undefined (NaN) and/or infinite 
(Inf, -Inf) values.
| bad.obs | numeric scalar indicating the number of missing ( | 
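A minimal sketch (assumed, not from the original help file): calling gofTestCensored with keep.data=FALSE should omit the data and censored components described above.
  censored.obj <- with(EPA.09.Ex.15.1.manganese.df, 
    gofTestCensored(Manganese.ppb, Censored, test = "sf", keep.data = FALSE)) 
  # The "data" component is dropped when keep.data = FALSE 
  "data" %in% names(censored.obj)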
Methods
Generic functions that have methods for objects of class 
"gofCensored" include: 
print, plot.
Note
Since objects of class "gofCensored" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
gofTestCensored, print.gofCensored, 
plot.gofCensored, 
Censored Data, 
Goodness-of-Fit Tests, 
Distribution.df, gof.object.
Examples
  # Create an object of class "gofCensored", then print it out. 
  #------------------------------------------------------------
  gofCensored.obj <- with(EPA.09.Ex.15.1.manganese.df,
    gofTestCensored(Manganese.ppb, Censored, test = "sf"))
  mode(gofCensored.obj) 
  #[1] "list" 
  class(gofCensored.obj) 
  #[1] "gofCensored" 
  names(gofCensored.obj) 
  # [1] "distribution"            "dist.abb"               
  # [3] "distribution.parameters" "n.param.est"            
  # [5] "estimation.method"       "statistic"              
  # [7] "sample.size"             "censoring.side"         
  # [9] "censoring.levels"        "percent.censored"       
  #[11] "parameters"              "z.value"                
  #[13] "p.value"                 "alternative"            
  #[15] "method"                  "data"                   
  #[17] "data.name"               "censored"               
  #[19] "censoring.name"          "bad.obs" 
  gofCensored.obj 
  
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Francia GOF
  #                                 (Multiply Censored Data)
  #
  #Hypothesized Distribution:       Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          mean = 15.23508
  #                                 sd   = 30.62812
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Test Statistic:                  W = 0.8368016
  #
  #Test Statistic Parameters:       N     = 25.00
  #                                 DELTA =  0.24
  #
  #P-value:                         0.004662658
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  #==========
  # Extract the p-value
  #--------------------
  gofCensored.obj$p.value
  #[1] 0.004662658
  #==========
  # Plot the results of the test
  #-----------------------------
  dev.new()
  plot(gofCensored.obj)
  #==========
  # Clean up
  #---------
  rm(gofCensored.obj)
  graphics.off()
S3 Class "gofGroup"
Description
Objects of S3 class "gofGroup" are returned by the EnvStats function 
gofGroupTest.
Details
Objects of S3 class "gofGroup" are lists that contain 
information about the assumed distribution, the estimated or 
user-supplied distribution parameters, and the test statistic 
and p-value.
Value
Required Components 
The following components must be included in a legitimate list of 
class "gofGroup".
| distribution | a character string indicating the name of the 
assumed distribution (see  | 
| dist.abb | a character string containing the abbreviated name 
of the distribution (see  | 
| statistic | a numeric scalar with a names attribute containing the name and value of the goodness-of-fit statistic. | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit test. | 
| parameters | numeric vector with a names attribute containing 
the name(s) and value(s) of the parameter(s) associated with the 
test statistic given in the  | 
| p.value | numeric scalar containing the p-value associated with the goodness-of-fit statistic. | 
| alternative | character string indicating the alternative hypothesis. | 
| method | character string indicating the name of the 
goodness-of-fit test (e.g.,  | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit test. | 
| grouping.variable | character string indicating the name of the variable defining the groups. | 
| bad.obs | numeric vector indicating the number of missing ( | 
| n.groups | numeric scalar containing the number of groups. | 
| group.names | character vector containing the levels of the grouping variable, i.e., the names of each of the groups. | 
| group.scores | numeric vector containing the individual statistics for each group. | 
Optional Component 
The following component is included when gofGroupTest is 
called with a formula for the first argument and a data argument.
| parent.of.data | character string indicating the name of the object supplied 
in the  | 
Methods
Generic functions that have methods for objects of class 
"gofGroup" include: 
print, plot.
Note
Since objects of class "gofGroup" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
gofGroupTest, print.gofGroup, plot.gofGroup, 
Goodness-of-Fit Tests, 
Distribution.df.
Examples
  # Create an object of class "gofGroup", then print it out. 
  # Example 10-4 of USEPA (2009, page 10-20) gives an example of 
  # simultaneously testing the assumption of normality for nickel 
  # concentrations (ppb) in groundwater collected at 4 monitoring 
  # wells over 5 months.  The data for this example are stored in 
  # EPA.09.Ex.10.1.nickel.df.
  gofGroup.obj <- gofGroupTest(Nickel.ppb ~ Well, 
    data = EPA.09.Ex.10.1.nickel.df)
  mode(gofGroup.obj) 
  #[1] "list" 
  class(gofGroup.obj) 
  #[1] "gofGroup" 
  names(gofGroup.obj) 
  # [1] "distribution"      "dist.abb"          "statistic"        
  # [4] "sample.size"       "parameters"        "p.value"          
  # [7] "alternative"       "method"            "data.name"        
  #[10] "grouping.variable" "parent.of.data"    "bad.obs"          
  #[13] "n.groups"          "group.names"       "group.scores"
  gofGroup.obj 
  #Results of Group Goodness-of-Fit Test
  #-------------------------------------
  #
  #Test Method:                     Wilk-Shapiro GOF (Normal Scores)
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Nickel.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Number of Groups:                4
  #
  #Sample Sizes:                    Well.1 = 5
  #                                 Well.2 = 5
  #                                 Well.3 = 5
  #                                 Well.4 = 5
  #
  #Test Statistic:                  z (G) = -3.658696
  #
  #P-values for
  #Individual Tests:                Well.1 = 0.03510747
  #                                 Well.2 = 0.02385344
  #                                 Well.3 = 0.01120775
  #                                 Well.4 = 0.10681461
  #
  #P-value for
  #Group Test:                      0.0001267509
  #
  #Alternative Hypothesis:          At least one group
  #                                 does not come from a
  #                                 Normal Distribution.  
  #==========
  # Extract the p-values
  #---------------------
  gofGroup.obj$p.value
  #      Well.1       Well.2       Well.3       Well.4        z (G) 
  #0.0351074733 0.0238534406 0.0112077511 0.1068146088 0.0001267509 
  #==========
  # Plot the results of the test
  #-----------------------------
  dev.new()
  plot(gofGroup.obj)
  #==========
  # Clean up
  #---------
  rm(gofGroup.obj)
  graphics.off()
Goodness-of-Fit Test for a Specified Probability Distribution for Groups
Description
Perform a goodness-of-fit test to determine whether data in a set of groups appear to all come from the same probability distribution (with possibly different parameters for each group).
Usage
gofGroupTest(object, ...)
## S3 method for class 'formula'
gofGroupTest(object, data = NULL, subset, 
  na.action = na.pass, ...)
## Default S3 method:
gofGroupTest(object, group, test = "sw", 
  distribution = "norm", est.arg.list = NULL, n.classes = NULL, 
  cut.points = NULL, param.list = NULL, 
  estimate.params = ifelse(is.null(param.list), TRUE, FALSE), 
  n.param.est = NULL, correct = NULL, digits = .Options$digits, 
  exact = NULL, ws.method = "normal scores", 
  data.name = NULL, group.name = NULL, parent.of.data = NULL, 
  subset.expression = NULL, ...)
## S3 method for class 'data.frame'
gofGroupTest(object, ...)
## S3 method for class 'matrix'
gofGroupTest(object, ...)
## S3 method for class 'list'
gofGroupTest(object, ...)
Arguments
| object | an object containing data for 2 or more groups to be compared to the 
hypothesized distribution specified by  | 
| data | when  | 
| subset | when  | 
| na.action | when  | 
| group | when  | 
| test | character string defining which goodness-of-fit test to perform on each group.  
Possible values are:  
 | 
| distribution | a character string denoting the distribution abbreviation.  See the help file for 
 When  When  When  When  When  | 
| est.arg.list | a list of arguments to be passed to the function estimating the distribution parameters 
for each group of observations.  
For example, if  When  When  When  When  | 
| n.classes | for the case when  | 
| cut.points | for the case when  | 
| param.list | for the case when  | 
| estimate.params | for the case when  | 
| n.param.est | for the case when  | 
| correct | for the case when  | 
| digits | a scalar indicating how many significant digits to print out for the parameters 
associated with the hypothesized distribution.  The default value is  | 
| exact | for the case when  | 
| ws.method | character string indicating which method to use when performing the 
Wilk-Shapiro test for a Uniform [0,1] distribution 
on the p-values from the goodness-of-fit tests on each group.  Possible values 
are  NOTE: In the case where you are testing whether each group comes from a 
Uniform [0,1] distribution (i.e., when you set 
 | 
| data.name | character string indicating the name of the data used for the goodness-of-fit tests.  
The default value is  | 
| group.name | character string indicating the name of the data used to create the groups.
The default value is  | 
| parent.of.data | character string indicating the source of the data used for the goodness-of-fit tests. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the goodness-of-fit test. | 
Details
The function gofGroupTest performs a goodness-of-fit test for each group of 
data by calling the function gofTest.  Using the p-values from these 
goodness-of-fit tests, it then calls the function gofTest with the 
argument test="ws" to test whether the p-values appear to come from a 
Uniform [0,1] distribution.
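The following is a minimal sketch (for illustration only, not from the original help file) of this two-step idea using the nickel data from the examples below; gofGroupTest performs these steps internally:
  # Step 1: Shapiro-Wilk p-value for each well 
  p.vals <- with(EPA.09.Ex.10.1.nickel.df, 
    sapply(split(Nickel.ppb, Well), function(z) gofTest(z)$p.value)) 
  # Step 2: test whether the p-values look Uniform [0,1] 
  gofTest(p.vals, test = "ws")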
Value
a list of class "gofGroup" containing the results of the group goodness-of-fit test.  
Objects of class "gofGroup" have special printing and plotting methods.  
See the help file for gofGroup.object for details.
Note
The Wilk-Shapiro (1968) tests for a Uniform [0, 1] distribution were introduced in the context 
of testing whether several independent samples all come from normal distributions, with 
possibly different means and variances.  The function gofGroupTest extends 
this idea to allow you to test whether several independent samples come from the same 
distribution (e.g., gamma, extreme value, etc.), with possibly different parameters.
Examples of simultaneously assessing whether several groups come from the same distribution are given in USEPA (2009) and Gibbons et al. (2009).
In practice, almost any goodness-of-fit test will not reject the null hypothesis 
if the number of observations is relatively small.  Conversely, almost any goodness-of-fit 
test will reject the null hypothesis if the number of observations is very large, 
since “real” data are never distributed according to any theoretical distribution 
(Conover, 1980, p.367).  For most cases, however, the distribution of “real” data 
is close enough to some theoretical distribution that fairly accurate results may be 
provided by assuming that particular theoretical distribution.  One way to assess the 
goodness of the fit is to use goodness-of-fit tests.  Another way is to look at 
quantile-quantile (Q-Q) plots (see qqPlot).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-17.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.
See Also
gofTest, gofGroup.object, print.gofGroup, 
plot.gofGroup, qqPlot.
Examples
  # Example 10-4 of USEPA (2009, page 10-20) gives an example of 
  # simultaneously testing the assumption of normality for nickel 
  # concentrations (ppb) in groundwater collected at 4 monitoring 
  # wells over 5 months.  The data for this example are stored in 
  # EPA.09.Ex.10.1.nickel.df.
  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0
  # Test for a normal distribution at each well:
  #--------------------------------------------
  gofGroup.list <- gofGroupTest(Nickel.ppb ~ Well, 
    data = EPA.09.Ex.10.1.nickel.df)
  gofGroup.list
  #Results of Group Goodness-of-Fit Test
  #-------------------------------------
  #
  #Test Method:                     Wilk-Shapiro GOF (Normal Scores)
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Nickel.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Number of Groups:                4
  #
  #Sample Sizes:                    Well.1 = 5
  #                                 Well.2 = 5
  #                                 Well.3 = 5
  #                                 Well.4 = 5
  #
  #Test Statistic:                  z (G) = -3.658696
  #
  #P-values for
  #Individual Tests:                Well.1 = 0.03510747
  #                                 Well.2 = 0.02385344
  #                                 Well.3 = 0.01120775
  #                                 Well.4 = 0.10681461
  #
  #P-value for
  #Group Test:                      0.0001267509
  #
  #Alternative Hypothesis:          At least one group
  #                                 does not come from a
  #                                 Normal Distribution.
  dev.new()
  plot(gofGroup.list)
  #----------
  # Test for a lognormal distribution at each well:
  #-----------------------------------------------
  gofGroupTest(Nickel.ppb ~ Well, data = EPA.09.Ex.10.1.nickel.df, 
    dist = "lnorm")
  #Results of Group Goodness-of-Fit Test
  #-------------------------------------
  #
  #Test Method:                     Wilk-Shapiro GOF (Normal Scores)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Data:                            Nickel.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Number of Groups:                4
  #
  #Sample Sizes:                    Well.1 = 5
  #                                 Well.2 = 5
  #                                 Well.3 = 5
  #                                 Well.4 = 5
  #
  #Test Statistic:                  z (G) = 0.2401720
  #
  #P-values for
  #Individual Tests:                Well.1 = 0.6898164
  #                                 Well.2 = 0.6700394
  #                                 Well.3 = 0.3208299
  #                                 Well.4 = 0.5041375
  #
  #P-value for
  #Group Test:                      0.5949015
  #
  #Alternative Hypothesis:          At least one group
  #                                 does not come from a
  #                                 Lognormal Distribution.
  #----------
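  # Test for a normal distribution at each well using the Shapiro-Francia
  # test instead of the default Shapiro-Wilk test.
  # (Added sketch, not in the original examples; the test argument and its
  # default value "sw" are described in the Arguments section above.)
  #------------------------------------------------------------------------
  gofGroupTest(Nickel.ppb ~ Well, data = EPA.09.Ex.10.1.nickel.df, 
    test = "sf")
  #----------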
  # Clean up
  rm(gofGroup.list)
  graphics.off()
S3 Class "gofOutlier"
Description
Objects of S3 class "gofOutlier" are returned by the EnvStats function 
rosnerTest.
Details
Objects of S3 class "gofOutlier" are lists that contain 
information about the assumed distribution, the test statistics, 
the Type I error level, and the number of outliers detected.
Value
Required Components 
The following components must be included in a legitimate list of 
class "gofOutlier".
| distribution | a character string indicating the name of the 
assumed distribution (see  | 
| statistic | a numeric vector with a names attribute containing the names and values of the outlier test statistic for each outlier tested. | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the outlier test. | 
| parameters | numeric vector with a names attribute containing 
the name(s) and value(s) of the parameter(s) associated with the 
test statistic given in the  | 
| alpha | numeric scalar indicating the Type I error level. | 
| crit.value | numeric vector containing the critical values associated with the test for each outlier. | 
| alternative | character string indicating the alternative hypothesis. | 
| method | character string indicating the name of the outlier test. | 
| data | numeric vector containing the data actually used for the outlier test (i.e., the original data without any missing or infinite values). | 
| data.name | character string indicating the name of the data object used for the goodness-of-fit test. | 
| all.stats | data frame containing all of the results of the test. | 
Optional Components 
The following component is included when the data object 
contains missing (NA), undefined (NaN) and/or infinite 
(Inf, -Inf) values.
| bad.obs | numeric scalar indicating the number of missing ( | 
Methods
Generic functions that have methods for objects of class 
"gofOutlier" include: 
print.
Note
Since objects of class "gofOutlier" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
rosnerTest, print.gofOutlier,  
Goodness-of-Fit Tests.
Examples
  # Create an object of class "gofOutlier", then print it out. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1)) 
  gofOutlier.obj <- rosnerTest(dat, k = 4) 
  mode(gofOutlier.obj) 
  #[1] "list" 
  class(gofOutlier.obj) 
  #[1] "gofOutlier" 
  names(gofOutlier.obj) 
  # [1] "distribution" "statistic"    "sample.size"  "parameters"  
  # [5] "alpha"        "crit.value"   "n.outliers"   "alternative" 
  # [9] "method"       "data"         "data.name"    "bad.obs"     
  #[13] "all.stats"
  gofOutlier.obj 
  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            dat
  #
  #Sample Size:                     33
  #
  #Test Statistics:                 R.1 = 2.848514
  #                                 R.2 = 3.086875
  #                                 R.3 = 3.033044
  #                                 R.4 = 2.380235
  #
  #Test Statistic Parameter:        k = 4
  #
  #Alternative Hypothesis:          Up to 4 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     3
  #
  #  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
  #2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
  #3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
  #4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE
  #==========
  # Extract the data frame with all the test results
  #-------------------------------------------------
  gofOutlier.obj$all.stats
  #  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
  #2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
  #3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
  #4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE
  #==========
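  # Extract the number of outliers detected
  # (added sketch; "n.outliers" is listed in names(gofOutlier.obj) above)
  #-----------------------------------------------------------------------
  gofOutlier.obj$n.outliers
  #[1] 3
  #==========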
  # Clean up
  #---------
  rm(dat, gofOutlier.obj)
Goodness-of-Fit Test
Description
Perform a goodness-of-fit test to determine whether a data set appears to come from a specified probability distribution or if two data sets appear to come from the same distribution.
Usage
gofTest(y, ...)
## S3 method for class 'formula'
gofTest(y, data = NULL, subset,
  na.action = na.pass, ...)
## Default S3 method:
gofTest(y, x = NULL,
  test = ifelse(is.null(x), "sw", "ks"),
  distribution = "norm", est.arg.list = NULL,
  alternative = "two.sided", n.classes = NULL,
  cut.points = NULL, param.list = NULL,
  estimate.params = ifelse(is.null(param.list), TRUE, FALSE),
  n.param.est = NULL, correct = NULL, digits = .Options$digits,
  exact = NULL, ws.method = "normal scores", warn = TRUE, keep.data = TRUE,
  data.name = NULL, data.name.x = NULL, parent.of.data = NULL,
  subset.expression = NULL, ...)
Arguments
| y | an object containing data for the goodness-of-fit test.  In the default
method, the argument  | 
| data | specifies an optional data frame, list or environment (or object coercible
by  | 
| subset | specifies an optional vector specifying a subset of observations to be used. | 
| na.action | specifies a function which indicates what should happen when the data contain  | 
| x | numeric vector of values for the first sample in the case of a two-sample
Kolmogorov-Smirnov goodness-of-fit test ( | 
| test | character string defining which goodness-of-fit test to perform. Possible values are: 
 When the argument  | 
| distribution | a character string denoting the distribution abbreviation.  See the help file for
 When  When  When  When  When  When  | 
| est.arg.list | a list of arguments to be passed to the function estimating the distribution parameters.
For example, if  When  When  When  When  | 
| alternative | for the case when  | 
| n.classes | for the case when  | 
| cut.points | for the case when  | 
| param.list | for the case when  | 
| estimate.params | for the case when  | 
| n.param.est | for the case when  | 
| correct | for the case when  | 
| digits | for the case when  | 
| exact | for the case when  | 
| ws.method | for the case when  | 
| warn | logical scalar indicating whether to print a warning message when
observations with  | 
| keep.data | logical scalar indicating whether to return the data used for the goodness-of-fit test.
The default value is  | 
| data.name | character string indicating the name of the data used for argument  | 
| data.name.x | character string indicating the name of the data used for argument  | 
| parent.of.data | character string indicating the source of the data used for the goodness-of-fit test. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the goodness-of-fit test. | 
Details
-  Shapiro-Wilk Goodness-of-Fit Test ( test="sw").The Shapiro-Wilk goodness-of-fit test (Shapiro and Wilk, 1965; Royston, 1992a) is one of the most commonly used goodness-of-fit tests for normality. You can use it to test the following hypothesized distributions: Normal, Lognormal, Three-Parameter Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). In addition, you can also use it to test the null hypothesis of any continuous distribution that is available (see the help file for Distribution.df, and see explanation below).
 Shapiro-Wilk W-Statistic and P-Value for Testing Normality 
 LetXdenote a random variable with cumulative distribution function (cdf)F. Suppose we want to test the null hypothesis thatFis the cdf of a normal (Gaussian) distribution with some arbitrary mean\muand standard deviation\sigmaagainst the alternative hypothesis thatFis the cdf of some other distribution. The table below shows the random variable for whichFis the assumed cdf, given the value of the argumentdistribution.Value of Random Variable for distributionDistribution Name which Fis the cdf"norm"Normal X"lnorm"Lognormal (Log-space) log(X)"lnormAlt"Lognormal (Untransformed) log(X)"lnorm3"Three-Parameter Lognormal log(X-\gamma)"zmnorm"Zero-Modified Normal X | X > 0"zmlnorm"Zero-Modified Lognormal (Log-space) log(X) | X > 0"zmlnormAlt"Zero-Modified Lognormal (Untransformed) log(X) | X > 0Note that for the three-parameter lognormal distribution, the symbol \gammadenotes the threshold parameter.Let \underline{x} = (x_1, x_2, \ldots, x_n)denote the vector ofnordered observations assumed to come from a normal distribution.
 The Shapiro-Wilk W-Statistic 
 Shapiro and Wilk (1965) introduced the following statistic to test the null hypothesis thatFis the cdf of a normal distribution:W = \frac{(\sum_{i=1}^n a_i x_i)^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (1)where the quantity a_iis thei'th element of the vector\underline{a}defined by:\underline{a} = \frac{\underline{m}^T V^{-1}}{[\underline{m}^T V^{-1} V^{-1} \underline{m}]^{1/2}} \;\;\;\;\;\; (2)where Tdenotes the transpose operator, and\underline{m}is the vector of expected values andVis the variance-covariance matrix of the order statistics of a random sample of sizenfrom a standard normal distribution. That is, the values of\underline{a}are the expected values of the standard normal order statistics weighted by their variance-covariance matrix, and normalized so that\underline{a}^T \underline{a} = 1 \;\;\;\;\;\; (3)It can be shown that the coefficients \underline{a}are antisymmetric, that is,a_i = -a_{n-i+1} \;\;\;\;\;\; (4)and for odd n,a_{(n+1)/2} = 0 \;\;\;\;\;\; (5)Now because \bar{a} = \frac{1}{n} \sum_{i=1}^n a_i = 0 \;\;\;\;\;\ (6)and \sum_{i=1}^n (a_i - \bar{a})^2 = \sum_{i=1}^n a_i^2 = \underline{a}^T \underline{a} = 1 \;\;\;\;\;\; (7)the W-statistic in Equation (1) is the same as the square of the sample product-moment correlation between the vectors\underline{a}and\underline{x}:W = r(\underline{a}, \underline{x})^2 \;\;\;\;\;\; (8)where r(\underline{x}, \underline{y}) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{[\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2]^{1/2}} \;\;\;\;\;\;\; (9)(see the R help file for cor).The Shapiro-Wilk W-statistic is also simply the ratio of two estimators of variance, and can be rewritten asW = \frac{\hat{\sigma}_{BLUE}^2}{\hat{\sigma}_{MVUE}^2} \;\;\;\;\;\; (10)where the numerator is the square of the best linear unbiased estimate (BLUE) of the standard deviation, and the denominator is the minimum variance unbiased estimator (MVUE) of the variance: \hat{\sigma}_{BLUE} = \frac{\sum_{i=1}^n a_i x_i}{\sqrt{n-1}} \;\;\;\;\;\; (11)\hat{\sigma}_{MVUE}^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1} \;\;\;\;\;\; (12)Small values of Windicate the null hypothesis is probably not true. Shapiro and Wilk (1965) computed the values of the coefficients\underline{a}and the percentage points forW(based on smoothing the empirical null distribution ofW) for sample sizes up to 50. Computation of theW-statistic for larger sample sizes can be cumbersome, since computation of the coefficients\underline{a}requires storage of at leastn + [n(n+1)/2]reals followed byn \times nmatrix inversion (Royston, 1992a).
 The Shapiro-Francia W'-Statistic 
 Shapiro and Francia (1972) introduced a modification of theW-test that depends only on the expected values of the order statistics (\underline{m}) and not on the variance-covariance matrix (V):W' = \frac{(\sum_{i=1}^n b_i x_i)^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (13)where the quantity b_iis thei'th element of the vector\underline{b}defined by:\underline{b} = \frac{\underline{m}}{[\underline{m}^T \underline{m}]^{1/2}} \;\;\;\;\;\; (14)Several authors, including Ryan and Joiner (1973), Filliben (1975), and Weisberg and Bingham (1975), note that the W'-statistic is intuitively appealing because it is the squared Pearson correlation coefficient associated with a normal probability plot. That is, it is the squared correlation between the ordered sample values\underline{x}and the expected normal order statistics\underline{m}:W' = r(\underline{b}, \underline{x})^2 = r(\underline{m}, \underline{x})^2 \;\;\;\;\;\; (15)Shapiro and Francia (1972) present a table of empirical percentage points for W'based on a Monte Carlo simulation. It can be shown that the asymptotic null distributions ofWandW'are identical, but convergence is very slow (Verrill and Johnson, 1988).
 The Weisberg-Bingham Approximation to the W'-Statistic 
 Weisberg and Bingham (1975) introduced an approximation of the Shapiro-FranciaW'-statistic that is easier to compute. They suggested using Blom scores (Blom, 1958, pp.68–75) to approximate the element of\underline{m}:\tilde{W}' = \frac{(\sum_{i=1}^n c_i x_i)^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (16)where the quantity c_iis thei'th element of the vector\underline{c}defined by:\underline{c} = \frac{\underline{\tilde{m}}}{[\underline{\tilde{m}}^T \underline{\tilde{m}}]^{1/2}} \;\;\;\;\;\; (17)and \tilde{m}_i = \Phi^{-1}[\frac{i - (3/8)}{n + (1/4)}] \;\;\;\;\;\; (18)and \Phidenotes the standard normal cdf. That is, the values of the elements of\underline{m}in Equation (14) are replaced with their estimates based on the usual plotting positions for a normal distribution.
 Royston's Approximation to the Shapiro-Wilk W-Test 
 Royston (1992a) presents an approximation for the coefficients\underline{a}necessary to compute the Shapiro-WilkW-statistic, and also a transformation of theW-statistic that has approximately a standard normal distribution under the null hypothesis.Noting that, up to a constant, the components of \underline{b}in Equation (14) and\underline{c}in Equation (17) differ from those of\underline{a}in Equation (2) mainly in the first and last two components, Royston (1992a) used the approximation\underline{c}as the basis for approximating\underline{a}using polynomial (quintic) regression analysis. For4 \le n \le 1000, the approximation gave the following equations for the last two (and hence first two) components of\underline{a}:\tilde{a}_n = c_n + 0.221157 y - 0.147981 y^2 - 2.071190 y^3 + 4.434685 y^4 - 2.706056 y^5 \;\;\;\;\;\; (19)\tilde{a}_{n-1} = c_{n-1} + 0.042981 y - 0.293762 y^2 - 1.752461 y^3 + 5.682633 y^4 - 3.582633 y^5 \;\;\;\;\;\; (20)where y = \sqrt{n} \;\;\;\;\;\; (21)The other components are computed as: \tilde{a}_i = \frac{\tilde{m}_i}{\sqrt{\eta}} \;\;\;\;\;\; (22)for i = 2, \ldots , n-1ifn \le 5, ori = 3, \ldots, n-2ifn > 5, where\eta = \frac{\underline{\tilde{m}}^T \underline{\tilde{m}} - 2 \tilde{m}_n^2}{1 - 2 \tilde{a}_n^2} \;\;\;\;\;\; (23)if n \le 5, and\eta = \frac{\underline{\tilde{m}}^T \underline{\tilde{m}} - 2 \tilde{m}_n^2 - 2 \tilde{m}_{n-1}^2}{1 - 2 \tilde{a}_n^2 - 2 \tilde{a}_{n-1}^2} \;\;\;\;\;\; (24)if n > 5.Royston (1992a) found his approximation to \underline{a}to be accurate to at least\pm 1in the third decimal place over all values ofiand selected values ofn, and also found that critical percentage points ofWbased on his approximation agreed closely with the exact critical percentage points calculated by Verrill and Johnson (1988).
 Transformation of the Null Distribution of W to Normality 
 In order to compute a p-value associated with a particular value ofW, Royston (1992a) approximated the distribution of(1-W)by a three-parameter lognormal distribution for4 \le n \le 11, and the upper half of the distribution of(1-W)by a two-parameter lognormal distribution for12 \le n \le 2000. Settingz = \frac{w - \mu}{\sigma} \;\;\;\;\;\; (25)the p-value associated with Wis given by:p = 1 - \Phi(z) \;\;\;\;\;\; (26)For 4 \le n \le 11, the quantities necessary to computezare given by:w = -log[\gamma - log(1 - W)] \;\;\;\;\;\; (27)\gamma = -2.273 + 0.459 n \;\;\;\;\;\; (28)\mu = 0.5440 - 0.39978 n + 0.025054 n^2 - 0.000671 n^3 \;\;\;\;\;\; (29)\sigma = exp(1.3822 - 0.77857 n + 0.062767 n^2 - 0.0020322 n^3) \;\;\;\;\;\; (30)For 12 \le n \le 2000, the quantities necessary to computezare given by:w = log(1 - W) \;\;\;\;\;\; (31)\gamma = log(n) \;\;\;\;\;\; (32)\mu = -1.5861 - 0.31082 y - 0.083751 y^2 + 0.00038915 y^3 \;\;\;\;\;\; (33)\sigma = exp(-0.4803 - 0.082676 y + 0.0030302 y^2) \;\;\;\;\;\; (34)For the last approximation when 12 \le n \le 2000, Royston (1992a) claims this approximation is actually valid for sample sizes up ton = 5000.
 Modification for the Three-Parameter Lognormal Distribution 
 Whendistribution="lnorm3", the functiongofTestassumes the vector\underline{x}is a random sample from a three-parameter lognormal distribution. It estimates the threshold parameter via the zero-skewness method (seeelnorm3), and then performs the Shapiro-Wilk goodness-of-fit test for normality onlog(x-\hat{\gamma})where\hat{\gamma}is the estimated threshold parmater. Because the threshold parameter has to be estimated, however, the p-value associated with the computed z-statistic will tend to be conservative (larger than it should be under the null hypothesis). Royston (1992b) proposed the following transformation of the z-statistic:z' = \frac{z - \mu_z}{\sigma_z} \;\;\;\;\;\; (35)where for 5 \le n \le 11,\mu_z = -3.8267 + 2.8242 u - 0.63673 u^2 - 0.020815 v \;\;\;\;\;\; (36)\sigma_z = -4.9914 + 8.6724 u - 4.27905 u^2 + 0.70350 u^3 - 0.013431 v \;\;\;\;\;\; (37)and for 12 \le n \le 2000,\mu_z = -3.7796 + 2.4038 u - 0.6675 u^2 - 0.082863 u^3 - 0.0037935 u^4 - 0.027027 v - 0.0019887 vu \;\;\;\;\;\; (38)\sigma_z = 2.1924 - 1.0957 u + 0.33737 u^2 - 0.043201 u^3 + 0.0019974 u^4 - 0.0053312 vu \;\;\;\;\;\; (39)where u = log(n) \;\;\;\;\;\; (40)v = u (\hat{\sigma} - \hat{\sigma}^2) \;\;\;\;\;\; (41)\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \;\;\;\;\;\; (42)y_i = log(x_i - \hat{\gamma}) \;\;\;\;\;\; (43)and \gammadenotes the threshold parameter. The p-value associated with this test is then given by:p = 1 - \Phi(z') \;\;\;\;\;\; (44)Testing Goodness-of-Fit for Any Continuous Distribution 
 The functiongofTestextends the Shapiro-Wilk test to test for goodness-of-fit for any continuous distribution by using the idea of Chen and Balakrishnan (1995), who proposed a general purpose approximate goodness-of-fit test based on the Cramer-von Mises or Anderson-Darling goodness-of-fit tests for normality. The functiongofTestmodifies the approach of Chen and Balakrishnan (1995) by using the same first 2 steps, and then applying the Shapiro-Wilk test:- Let - \underline{x} = x_1, x_2, \ldots, x_ndenote the vector of- nordered observations. Compute cumulative probabilities for each- x_ibased on the cumulative distribution function for the hypothesized distribution. That is, compute- p_i = F(x_i, \hat{\theta})where- F(x, \theta)denotes the hypothesized cumulative distribution function with parameter(s)- \theta, and- \hat{\theta}denotes the estimated parameter(s).
- Compute standard normal deviates based on the computed cumulative probabilities: 
 - y_i = \Phi^{-1}(p_i)
- Perform the Shapiro-Wilk goodness-of-fit test on the - y_i's.
 
-  Shapiro-Francia Goodness-of-Fit Test ( test="sf").The Shapiro-Francia goodness-of-fit test (Shapiro and Francia, 1972; Weisberg and Bingham, 1975; Royston, 1992c) is also one of the most commonly used goodness-of-fit tests for normality. You can use it to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). In addition, you can also use it to test the null hypothesis of any continuous distribution that is available (see the help file for Distribution.df). See the section Testing Goodness-of-Fit for Any Continuous Distribution above for an explanation of how this is done.
 Royston's Transformation of the Shapiro-Francia W'-Statistic to Normality 
 Equation (13) above gives the formula for the Shapiro-Francia W'-statistic, and Equation (16) above gives the formula for Weisberg-Bingham approximation to the W'-statistic (denoted\tilde{W}'). Royston (1992c) presents an algorithm to transform the\tilde{W}'-statistic so that its null distribution is approximately a standard normal. For5 \le n \le 5000, Royston (1992c) approximates the distribution of(1-\tilde{W}')by a lognormal distribution. Settingz = \frac{w-\mu}{\sigma} \;\;\;\;\;\; (45)the p-value associated with \tilde{W}'is given by:p = 1 - \Phi(z) \;\;\;\;\;\; (46)The quantities necessary to compute zare given by:w = log(1 - \tilde{W}') \;\;\;\;\;\; (47)\nu = log(n) \;\;\;\;\;\; (48)u = log(\nu) - \nu \;\;\;\;\;\; (49)\mu = -1.2725 + 1.0521 u \;\;\;\;\;\; (50)v = log(\nu) + \frac{2}{\nu} \;\;\;\;\;\; (51)\sigma = 1.0308 - 0.26758 v \;\;\;\;\;\; (52)Testing Goodness-of-Fit for Any Continuous Distribution 
 The functiongofTestextends the Shapiro-Francia test to test for goodness-of-fit for any continuous distribution by using the idea of Chen and Balakrishnan (1995), who proposed a general purpose approximate goodness-of-fit test based on the Cramer-von Mises or Anderson-Darling goodness-of-fit tests for normality. The functiongofTestmodifies the approach of Chen and Balakrishnan (1995) by using the same first 2 steps, and then applying the Shapiro-Francia test:- Let - \underline{x} = x_1, x_2, \ldots, x_ndenote the vector of- nordered observations. Compute cumulative probabilities for each- x_ibased on the cumulative distribution function for the hypothesized distribution. That is, compute- p_i = F(x_i, \hat{\theta})where- F(x, \theta)denotes the hypothesized cumulative distribution function with parameter(s)- \theta, and- \hat{\theta}denotes the estimated parameter(s).
- Compute standard normal deviates based on the computed cumulative probabilities: 
 - y_i = \Phi^{-1}(p_i)
- Perform the Shapiro-Francia goodness-of-fit test on the - y_i's.
 
-  Probability Plot Correlation Coefficient (PPCC) Goodness-of-Fit Test (test="ppcc"). The PPCC goodness-of-fit test (Filliben, 1975; Looney and Gulledge, 1985) can be used to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). In addition, you can also use it to test the null hypothesis of any continuous distribution that is available (see the help file for Distribution.df). The function gofTest computes the PPCC test statistic using Blom plotting positions.
 Filliben (1975) proposed using the correlation coefficient r from a normal probability plot to perform a goodness-of-fit test for normality, and he provided a table of critical values for r under the null hypothesis for sample sizes between 3 and 100. Vogel (1986) provided an additional table for sample sizes between 100 and 10,000.
 Looney and Gulledge (1985) investigated the characteristics of Filliben's probability plot correlation coefficient (PPCC) test using the plotting position formulas given in Filliben (1975), as well as three other plotting position formulas: Hazen plotting positions, Weibull plotting positions, and Blom plotting positions (see the help file for qqPlot for an explanation of these plotting positions). They concluded that the PPCC test based on Blom plotting positions performs slightly better than tests based on other plotting positions, and they provide a table of empirical percentage points for the distribution of r based on Blom plotting positions.
 The function gofTest computes the PPCC test statistic r using Blom plotting positions. It can be shown that the square of this statistic is equivalent to the Weisberg-Bingham Approximation to the Shapiro-Francia W'-Test (Weisberg and Bingham, 1975; Royston, 1993). Thus the PPCC goodness-of-fit test is equivalent to the Shapiro-Francia goodness-of-fit test.
 
-  Anderson-Darling Goodness-of-Fit Test (test="ad"). The Anderson-Darling goodness-of-fit test (Stephens, 1986a; Thode, 2002) can be used to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). When test="ad", the function gofTest calls the function ad.test in the package nortest. Documentation from that package is as follows:
 The Anderson-Darling test is an EDF omnibus test for the composite hypothesis of normality. The test statistic is:
 A = -n - \frac{1}{n} \sum_{i=1}^n [2i - 1][ln(p_{(i)}) + ln(1 - p_{(n-i+1)})]
 where p_{(i)} = \Phi([x_{(i)} - \bar{x}]/s). Here, \Phi is the cumulative distribution function of the standard normal distribution, and \bar{x} and s are mean and standard deviation of the data values. The p-value is computed from the modified statistic Z = A (1.0 + 0.75/n + 2.25/n^2) according to Table 4.9 in Stephens (1986a).
 
-  Cramer-von Mises Goodness-of-Fit Test (test="cvm"). The Cramer-von Mises goodness-of-fit test (Stephens, 1986a; Thode, 2002) can be used to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). When test="cvm", the function gofTest calls the function cvm.test in the package nortest. Documentation from that package is as follows:
 The Cramer-von Mises test is an EDF omnibus test for the composite hypothesis of normality. The test statistic is:
 W = \frac{1}{12n} + \sum_{i=1}^n \left(p_{(i)} - \frac{2i-1}{2n}\right)^2
 where p_{(i)} = \Phi([x_{(i)} - \bar{x}]/s). Here, \Phi is the cumulative distribution function of the standard normal distribution, and \bar{x} and s are mean and standard deviation of the data values. The p-value is computed from the modified statistic Z = W (1.0 + 0.75/n) according to Table 4.9 in Stephens (1986a).
 
-  Lilliefors Goodness-of-Fit Test (test="lillie"). The Lilliefors goodness-of-fit test (Stephens, 1974; Dallal and Wilkinson, 1986; Thode, 2002) can be used to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). When test="lillie", the function gofTest calls the function lillie.test in the package nortest. Documentation from that package is as follows:
 The Lilliefors (Kolmogorov-Smirnov) test is an EDF omnibus test for the composite hypothesis of normality. The test statistic is the maximal absolute difference between the empirical and hypothetical cumulative distribution functions. It may be computed as D = max\{D^+, D^-\} with
 D^+ = \max_{i = 1, \ldots, n} \{i/n - p_{(i)}\}, \;\; D^- = \max_{i = 1, \ldots, n} \{p_{(i)} - (i-1)/n\}
 where p_{(i)} = \Phi([x_{(i)} - \bar{x}]/s). Here, \Phi is the cumulative distribution function of the standard normal distribution, and \bar{x} and s are mean and standard deviation of the data values. The p-value is computed from the Dallal-Wilkinson (1986) formula, which is claimed to be reliable only when the p-value is smaller than 0.1. If the Dallal-Wilkinson p-value turns out to be greater than 0.1, then the p-value is computed from the distribution of the modified statistic Z = D (\sqrt{n} - 0.01 + 0.85/\sqrt{n}); see Stephens (1974), the actual p-value formula being obtained by a simulation and approximation process.
 
-  Zero-Skew Goodness-of-Fit Test (test="skew"). The zero-skew goodness-of-fit test (D'Agostino, 1970) can be used to test the following hypothesized distributions: Normal, Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta). When test="skew", the function gofTest tests the null hypothesis that the skew of the distribution is 0:
 H_0: \sqrt{\beta}_1 = 0 \;\;\;\;\;\; (53)
 where
 \sqrt{\beta}_1 = \frac{\mu_3}{\mu_2^{3/2}} \;\;\;\;\;\; (54)
 and the quantity \mu_r denotes the r'th moment about the mean (also called the r'th central moment). The quantity \sqrt{\beta_1} is called the coefficient of skewness, and is estimated by:
 \sqrt{b}_1 = \frac{m_3}{m_2^{3/2}} \;\;\;\;\;\; (55)
 where
 m_r = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^r \;\;\;\;\;\; (56)
 denotes the r'th sample central moment.
 The possible alternative hypotheses are:
 H_a: \sqrt{\beta}_1 \ne 0 \;\;\;\;\;\; (57)
 H_a: \sqrt{\beta}_1 < 0 \;\;\;\;\;\; (58)
 H_a: \sqrt{\beta}_1 > 0 \;\;\;\;\;\; (59)
 which correspond to alternative="two.sided", alternative="less", and alternative="greater", respectively.
 To test the null hypothesis of zero skew, D'Agostino (1970) derived an approximation to the distribution of \sqrt{b_1} under the null hypothesis of zero skew, assuming the observations comprise a random sample from a normal (Gaussian) distribution. Based on D'Agostino's approximation, the statistic Z shown below is assumed to follow a standard normal distribution and is used to compute the p-value associated with the test of H_0:
 Z = \delta \;\; log\{ \frac{Y}{\alpha} + [(\frac{Y}{\alpha})^2 + 1]^{1/2} \} \;\;\;\;\;\; (60)
 where
 Y = \sqrt{b_1} [\frac{(n+1)(n+3)}{6(n-2)}]^{1/2} \;\;\;\;\;\; (61)
 \beta_2 = \frac{3(n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)} \;\;\;\;\;\; (62)
 W^2 = -1 + \sqrt{2\beta_2 - 2} \;\;\;\;\;\; (63)
 \delta = 1 / \sqrt{log(W)} \;\;\;\;\;\; (64)
 \alpha = [2 / (W^2 - 1)]^{1/2} \;\;\;\;\;\; (65)
 When the sample size n is at least 150, a simpler approximation may be used in which Y in Equation (61) is assumed to follow a standard normal distribution and is used to compute the p-value associated with the hypothesis test.
 
-  Kolmogorov-Smirnov Goodness-of-Fit Test (test="ks"). When test="ks", the function gofTest calls the R function ks.test to compute the test statistic and p-value. Note that for the one-sample case the distribution parameters should be pre-specified and not estimated from the data; if the parameters are estimated from the data, a warning is issued that the test is very conservative in this case (the Type I error is smaller than assumed and the Type II error is high).
 
-  ProUCL Kolmogorov-Smirnov Goodness-of-Fit Test for Gamma (test="proucl.ks.gamma"). When test="proucl.ks.gamma", the function gofTest calls the R function ks.test to compute the Kolmogorov-Smirnov test statistic based on the maximum likelihood estimates of the shape and scale parameters (see egamma). The p-value is computed based on the simulated critical values given in ProUCL.Crit.Vals.for.KS.Test.for.Gamma.array (USEPA, 2015). The sample size must be between 5 and 1000, and the value of the maximum likelihood estimate of the shape parameter must be between 0.025 and 50. The critical value for the test statistic is computed using the simulated critical values and linear interpolation.
 
-  ProUCL Anderson-Darling Goodness-of-Fit Test for Gamma (test="proucl.ad.gamma"). When test="proucl.ad.gamma", the function gofTest computes the Anderson-Darling test statistic (Stephens, 1986a, p.101) based on the maximum likelihood estimates of the shape and scale parameters (see egamma). The p-value is computed based on the simulated critical values given in ProUCL.Crit.Vals.for.AD.Test.for.Gamma.array (USEPA, 2015). The sample size must be between 5 and 1000, and the value of the maximum likelihood estimate of the shape parameter must be between 0.025 and 50. The critical value for the test statistic is computed using the simulated critical values and linear interpolation (a generic sketch of this interpolation follows this list of tests).
 
-  Chi-Squared Goodness-of-Fit Test (test="chisq"). The method used by gofTest is a modification of what is used for chisq.test. If the hypothesized distribution function is completely specified, the degrees of freedom are m-1, where m denotes the number of classes. If any parameters are estimated, the degrees of freedom depend on the method of estimation. The function gofTest follows the convention of computing degrees of freedom as m-1-k, where k is the number of parameters estimated. It can be shown that if the parameters are estimated by maximum likelihood, the degrees of freedom are bounded between m-1 and m-1-k. Therefore, especially when the sample size is small, it is important to compare the test statistic to the chi-squared distribution with both m-1 and m-1-k degrees of freedom (a short sketch of this comparison follows this list of tests). See Kendall and Stuart (1991, Chapter 30) for a more complete discussion. The distribution theory of chi-square statistics is a large-sample theory: the expected cell counts are assumed to be at least moderately large. As a rule of thumb, each should be at least 5. Although authors have found this rule to be conservative (especially when the class probabilities are not too different from each other), the user should regard p-values with caution when expected cell counts are small.
 
-  Wilk-Shapiro Goodness-of-Fit Test for Uniform [0, 1] Distribution (test="ws"). Wilk and Shapiro (1968) suggested this test in the context of jointly testing several independent samples for normality simultaneously. If p_1, p_2, \ldots, p_n denote the p-values associated with the test for normality of n independent samples, then under the null hypothesis that all n samples come from a normal distribution, the p-values are a random sample of n observations from a Uniform [0, 1] distribution, that is, a Uniform distribution with minimum 0 and maximum 1. Wilk and Shapiro (1968) suggested two different methods for testing whether the p-values come from a Uniform [0, 1] distribution:
-  Test Based on Normal Scores. Under the null hypothesis, the normal scores \Phi^{-1}(p_1), \Phi^{-1}(p_2), \ldots, \Phi^{-1}(p_n) are a random sample of n observations from a standard normal distribution. Wilk and Shapiro (1968) denote the i'th normal score by G_i = \Phi^{-1}(p_i) \;\;\;\;\;\; (66) and note that under the null hypothesis, the quantity G defined as G = \frac{1}{\sqrt{n}} \sum_{i=1}^n G_i \;\;\;\;\;\; (67) has a standard normal distribution. Wilk and Shapiro (1968) were interested in the alternative hypothesis that some of the n independent samples did not come from a normal distribution and hence would be associated with smaller p-values than expected under the null hypothesis, which translates to the alternative that the cdf for the distribution of the p-values is greater than the cdf of a Uniform [0, 1] distribution (alternative="greater"). In terms of the test statistic G, this alternative hypothesis would tend to make G smaller than expected, so the p-value is given by p = \Phi(G). For the one-sided lower alternative that the cdf for the distribution of p-values is less than the cdf for a Uniform [0, 1] distribution, the p-value is given by p = 1 - \Phi(G) \;\;\;\;\;\; (68).
-  Test Based on Chi-Square Scores. Under the null hypothesis, the chi-square scores -2 \, log(p_1), -2 \, log(p_2), \ldots, -2 \, log(p_n) are a random sample of n observations from a chi-square distribution with 2 degrees of freedom (Fisher, 1950). Wilk and Shapiro (1968) denote the i'th chi-square score by C_i = -2 \, log(p_i) \;\;\;\;\;\; (69) and note that under the null hypothesis, the quantity C defined as C = \sum_{i=1}^n C_i \;\;\;\;\;\; (70) has a chi-square distribution with 2n degrees of freedom. Wilk and Shapiro (1968) were interested in the alternative hypothesis that some of the n independent samples did not come from a normal distribution and hence would be associated with smaller p-values than expected under the null hypothesis, which translates to the alternative that the cdf for the distribution of the p-values is greater than the cdf of a Uniform [0, 1] distribution (alternative="greater"). In terms of the test statistic C, this alternative hypothesis would tend to make C larger than expected, so the p-value is given by p = 1 - F_{2n}(C) \;\;\;\;\;\; (71), where F_{2n} denotes the cumulative distribution function of the chi-square distribution with 2n degrees of freedom. For the one-sided lower alternative that the cdf for the distribution of p-values is less than the cdf for a Uniform [0, 1] distribution, the p-value is given by p = F_{2n}(C) \;\;\;\;\;\; (72). (A numerical sketch of both Wilk-Shapiro statistics follows this list of tests.)
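The zero-skew calculation in Equations (55)-(65) can be sketched directly in R. The sketch below uses a hypothetical sample x and is meant only to illustrate the formulas, not to reproduce the exact internals of gofTest.
  # Minimal sketch of Equations (55)-(65), using a hypothetical sample x
  set.seed(12)
  x <- rnorm(25)
  n <- length(x)
  m2 <- mean((x - mean(x))^2)                              # Equation (56), r = 2
  m3 <- mean((x - mean(x))^3)                              # Equation (56), r = 3
  sqrt.b1 <- m3 / m2^(3/2)                                 # Equation (55)
  Y <- sqrt.b1 * sqrt((n + 1) * (n + 3) / (6 * (n - 2)))   # Equation (61)
  beta2 <- 3 * (n^2 + 27*n - 70) * (n + 1) * (n + 3) /
    ((n - 2) * (n + 5) * (n + 7) * (n + 9))                # Equation (62)
  W2 <- -1 + sqrt(2 * beta2 - 2)                           # Equation (63)
  delta <- 1 / sqrt(log(sqrt(W2)))                         # Equation (64)
  alpha <- sqrt(2 / (W2 - 1))                              # Equation (65)
  Z <- delta * log(Y / alpha + sqrt((Y / alpha)^2 + 1))    # Equation (60)
  2 * pnorm(-abs(Z))                                       # two-sided p-value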
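The linear interpolation of simulated critical values used by the ProUCL gamma tests can be illustrated generically. The tabulated sample sizes and critical values below are made up for illustration and are not the values stored in ProUCL.Crit.Vals.for.KS.Test.for.Gamma.array or ProUCL.Crit.Vals.for.AD.Test.for.Gamma.array.
  # Purely illustrative table of simulated 5% critical values by sample size
  n.tab    <- c(10, 20, 50)
  crit.tab <- c(0.76, 0.75, 0.74)
  approx(x = n.tab, y = crit.tab, xout = 23)$y   # interpolated critical value for n = 23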
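For the chi-squared test, the recommendation above to compare the statistic against both bounding degrees of freedom can be sketched as follows; the statistic, number of classes, and number of estimated parameters are purely illustrative.
  # Illustrative values only: chi-square statistic, number of classes (m),
  # and number of estimated parameters (k)
  chi.sq <- 4.8
  m <- 6
  k <- 2
  1 - pchisq(chi.sq, df = m - 1)       # p-value using the upper bound on the degrees of freedom
  1 - pchisq(chi.sq, df = m - 1 - k)   # p-value using the lower bound on the degrees of freedom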
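The two Wilk-Shapiro statistics in Equations (66)-(72) are straightforward to compute. The sketch below uses a hypothetical vector of p-values and is not the implementation used by gofTest or gofGroupTest.
  # Hypothetical p-values from n independent tests of normality
  p.vals <- c(0.02, 0.15, 0.48, 0.61, 0.83)
  n <- length(p.vals)
  # Test based on normal scores: Equations (66)-(68)
  G <- sum(qnorm(p.vals)) / sqrt(n)
  pnorm(G)                        # p-value for alternative = "greater"
  1 - pnorm(G)                    # p-value for alternative = "less"
  # Test based on chi-square scores: Equations (69)-(72)
  C <- sum(-2 * log(p.vals))
  1 - pchisq(C, df = 2 * n)       # p-value for alternative = "greater"
  pchisq(C, df = 2 * n)           # p-value for alternative = "less"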
 
Value
a list of class "gof" containing the results of the goodness-of-fit test, unless
the two-sample 
Kolmogorov-Smirnov test is used, in which case the value is a list of
class "gofTwoSample".  Objects of class "gof" and "gofTwoSample"
have special printing and plotting methods.  See the help files for gof.object
and gofTwoSample.object for details.
Note
The Shapiro-Wilk test (Shapiro and Wilk, 1965) and the Shapiro-Francia test (Shapiro and Francia, 1972) are probably the two most commonly used hypothesis tests to test departures from normality. The Shapiro-Wilk test is most powerful at detecting short-tailed (platykurtic) and skewed distributions, and least powerful against symmetric, moderately long-tailed (leptokurtic) distributions. Conversely, the Shapiro-Francia test is more powerful against symmetric long-tailed distributions and less powerful against short-tailed distributions (Royston, 1992b; 1993). In general, the Shapiro-Wilk and Shapiro-Francia tests outperform the Anderson-Darling test, which in turn outperforms the Cramer-von Mises test, which in turn outperforms the Lilliefors test (Stephens, 1986a; Razali and Wah, 2011; Romao et al., 2010).
The zero-skew goodness-of-fit test for normality is one of several tests that have
been proposed to test the assumption of a normal distribution (D'Agostino, 1986b).
This test has been included mainly because it is called by elnorm3.
Usually, the Shapiro-Wilk or Shapiro-Francia test is preferred to this test, unless
the direction of the alternative to normality (e.g., positive skew) is known
(D'Agostino, 1986b, pp. 405–406).
Kolmogorov (1933) introduced a goodness-of-fit test to test the hypothesis that a
random sample of n observations x comes from a specific hypothesized distribution
with cumulative distribution function H.  This test is now usually called the
one-sample Kolmogorov-Smirnov goodness-of-fit test.  Smirnov (1939) introduced a
goodness-of-fit test to test the hypothesis that a random sample of n
observations x comes from the same distribution as a random sample of
m observations y.  This test is now usually called the two-sample
Kolmogorov-Smirnov goodness-of-fit test.  Both tests are based on the maximum
vertical distance between two cumulative distribution functions.  For the one-sample problem
with a small sample size, the Kolmogorov-Smirnov test may be preferred over the chi-squared
goodness-of-fit test since the KS-test is exact, while the chi-squared test is based on
an asymptotic approximation.
The chi-squared test, introduced by Pearson in 1900, is the oldest and best known goodness-of-fit test. The idea is to reduce the goodness-of-fit problem to a multinomial setting by comparing the observed cell counts with their expected values under the null hypothesis. Grouping the data sacrifices information, especially if the hypothesized distribution is continuous. On the other hand, chi-squared tests can be applied to any type of variable: continuous, discrete, or a combination of these.
The Wilk-Shapiro (1968) tests for a Uniform [0, 1] distribution were introduced in the context
of testing whether several independent samples all come from normal distributions, with
possibly different means and variances.  The function gofGroupTest extends
this idea to allow you to test whether several independent samples come from the same
distribution (e.g., gamma, extreme value, etc.), with possibly different parameters.
In practice, almost any goodness-of-fit test will not reject the null hypothesis
if the number of observations is relatively small.  Conversely, almost any goodness-of-fit
test will reject the null hypothesis if the number of observations is very large,
since “real” data are never distributed according to any theoretical distribution
(Conover, 1980, p.367).  For most cases, however, the distribution of “real” data
is close enough to some theoretical distribution that fairly accurate results may be
provided by assuming that particular theoretical distribution.  One way to assess the
goodness of the fit is to use goodness-of-fit tests.  Another way is to look at
quantile-quantile (Q-Q) plots (see qqPlot).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
Juergen Gross and Uwe Ligges for the Anderson-Darling, Cramer-von Mises, and Lilliefors tests called from the package nortest.
References
Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefor's Test for Normality. The American Statistician 40, 294-296.
D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of g1.
Biometrika 57, 679-681.
D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.
D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York.
D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality.
Empirical Results for the Distributions of b2 and \sqrt{b1}.
Biometrika 60(3), 613-622.
D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of \sqrt{b1}.
Biometrika 60(1), 169-173.
Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.
Kim, P.J., and R.I. Jennrich. (1973). Tables of the Exact Sampling Distribution of the Two Sample Kolmogorov-Smirnov Criterion. In Harter, H.L., and D.B. Owen, eds. Selected Tables in Mathematical Statistics, Vol. 1. American Mathematical Society, Providence, Rhode Island, pp.79-170.
Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell' Istituto Italiano degle Attuari 4, 83-91.
Marsaglia, G., W.W. Tsang, and J. Wang. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). doi:10.18637/jss.v008.i18.
Moore, D.S. (1986). Tests of Chi-Squared Type. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, pp.63-95.
Pomeranz, J. (1973). Exact Cumulative Distribution of the Kolmogorov-Smirnov Statistic for Small Samples (Algorithm 487). Collected Algorithms from ACM ??, ???-???.
Razali, N.M., and Y.B. Wah. (2011). Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling Tests. Journal of Statistical Modeling and Analytics 2(1), 21–33.
Romao, X., Delgado, R., and A. Costa. (2010). An Empirical Power Comparison of Univariate Goodness-of-Fit Tests for Normality. Journal of Statistical Computation and Simulation 80(5), 545–591.
Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.
Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.
Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.
Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvania State University, Department of Statistics.
Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.
Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.
Smirnov, N.V. (1939). Estimate of Deviation Between Empirical Distribution Functions in Two Independent Samples. Bulletin Moscow University 2(2), 3-16.
Smirnov, N.V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 279-281.
Stephens, M.A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society, Series B, 32, 115-122.
Stephens, M.A. (1974). EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association 69, 730-737.
Stephens, M.A. (1986a). Tests Based on EDF Statistics. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.
Thode Jr., H.C. (2002). Testing for Normality. Marcel Dekker, New York.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.
Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.
Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.
Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
rosnerTest, gof.object, print.gof,
plot.gof,
shapiro.test, ks.test, chisq.test,
Normal, Lognormal, Lognormal3,
Zero-Modified Normal, Zero-Modified Lognormal (Delta),
enorm, elnorm, elnormAlt,
elnorm3, ezmnorm, ezmlnorm,
ezmlnormAlt, qqPlot.
Examples
  # Generate 20 observations from a gamma distribution with
  # parameters shape = 2 and scale = 3 then run various
  # goodness-of-fit tests.
  # (Note:  the call to set.seed lets you reproduce this example.)
  set.seed(47)
  dat <- rgamma(20, shape = 2, scale = 3)
  # Shapiro-Wilk generalized goodness-of-fit test
  #----------------------------------------------
  gof.list <- gofTest(dat, distribution = "gamma")
  gof.list
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.9834958
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.970903
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  dev.new()
  plot(gof.list)
  #----------
  # Redo the example above, but use the bias-corrected mle
  gofTest(dat, distribution = "gamma",
    est.arg.list = list(method = "bcmle"))
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.656376
  #                                 scale = 4.676680
  #
  #Estimation Method:               bcmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.9834346
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9704046
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  #----------
  # Kolmogorov-Smirnov goodness-of-fit test (pre-specified parameters)
  #------------------------------------------------------------------
  gofTest(dat, test = "ks", distribution = "gamma",
    param.list = list(shape = 2, scale = 3))
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Kolmogorov-Smirnov GOF
  #
  #Hypothesized Distribution:       Gamma(shape = 2, scale = 3)
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  ks = 0.2313878
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.2005083
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma(shape = 2, scale = 3)
  #                                 Distribution.
  #----------
  # ProUCL Version of Kolmogorov-Smirnov goodness-of-fit test
  # for a Gamma Distribution (estimated parameters)
  #---------------------------------------------------------
  gofTest(dat, test = "proucl.ks.gamma", distribution = "gamma")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     ProUCL Kolmogorov-Smirnov Gamma GOF
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  D = 0.0988692
  #
  #Test Statistic Parameter:        n = 20
  #
  #Critical Values:                 D.0.01 = 0.228
  #                                 D.0.05 = 0.196
  #                                 D.0.10 = 0.180
  #
  #P-value:                         >= 0.10
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  #----------
  # Chi-squared goodness-of-fit test (estimated parameters)
  #--------------------------------------------------------
  gofTest(dat, test = "chisq", distribution = "gamma", n.classes = 4)
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Chi-square GOF
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  Chi-square = 1.2
  #
  #Test Statistic Parameter:        df = 1
  #
  #P-value:                         0.2733217
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  #----------
  # Clean up
  rm(dat, gof.list)
  graphics.off()
  #--------------------------------------------------------------------
  # Example 10-2 of USEPA (2009, page 10-14) gives an example of
  # using the Shapiro-Wilk test to test the assumption of normality
  # for nickel concentrations (ppb) in groundwater collected over
  # 4 years.  The data for this example are stored in
  # EPA.09.Ex.10.1.nickel.df.
  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0
  # Test for a normal distribution:
  #--------------------------------
  gof.list <- gofTest(Nickel.ppb ~ 1, data = EPA.09.Ex.10.1.nickel.df)
  gof.list
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 169.5250
  #                                 sd   = 259.7175
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.6788888
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         2.17927e-05
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  dev.new()
  plot(gof.list)
  #----------
  # Test for a lognormal distribution:
  #-----------------------------------
  gofTest(Nickel.ppb ~ 1, data = EPA.09.Ex.10.1.nickel.df,
    dist = "lnorm")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          meanlog = 3.918529
  #                                 sdlog   = 1.801404
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.978946
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9197735
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  #----------
  # Test for a lognormal distribution, but use the
  # Mean and CV parameterization:
  #-----------------------------------------------
  gofTest(Nickel.ppb ~ 1, data = EPA.09.Ex.10.1.nickel.df,
    dist = "lnormAlt")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          mean = 213.415628
  #                                 cv   =   2.809377
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.978946
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9197735
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  #----------
  # Clean up
  rm(gof.list)
  graphics.off()
  #---------------------------------------------------------------------------
  # Generate 20 observations from a normal distribution with mean=3 and sd=2, and
  # generate 10 observations from a normal distribution with mean=1 and sd=2, then
  # test whether these sets of observations come from the same distribution.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(300)
  dat1 <- rnorm(20, mean = 3, sd = 2)
  dat2 <- rnorm(10, mean = 1, sd = 2)
  gofTest(x = dat1, y = dat2, test = "ks")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     2-Sample K-S GOF
  #
  #Hypothesized Distribution:       Equal
  #
  #Data:                            x = dat1
  #                                 y = dat2
  #
  #Sample Sizes:                    n.x = 20
  #                                 n.y = 10
  #
  #Test Statistic:                  ks = 0.7
  #
  #Test Statistic Parameters:       n = 20
  #                                 m = 10
  #
  #P-value:                         0.001669561
  #
  #Alternative Hypothesis:          The cdf of 'dat1' does not equal
  #                                 the cdf of 'dat2'.
  #----------
  # Clean up
  rm(dat1, dat2)
Goodness-of-Fit Test for Normal or Lognormal Distribution Based on Censored Data
Description
Perform a goodness-of-fit test to determine whether a data set appears to come from a normal distribution, lognormal distribution, or lognormal distribution (alternative parameterization) based on a sample of data that has been subjected to Type I or Type II censoring.
Usage
  gofTestCensored(x, censored, censoring.side = "left", test = "sf", 
    distribution = "norm", est.arg.list = NULL, 
    prob.method = "hirsch-stedinger", plot.pos.con = 0.375, 
    keep.data = TRUE, data.name = NULL, censoring.name = NULL)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. |
| censored | numeric or logical vector indicating which values of x are censored.  This must be the same length as x. |
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are "left" (the default) and "right". |
| test | character string defining which goodness-of-fit test to perform.  Possible values are "sw" (Shapiro-Wilk), "sf" (Shapiro-Francia; the default), and "ppcc" (Probability Plot Correlation Coefficient).  The Shapiro-Wilk test is only available for singly censored data.  See the DETAILS section for more information. |
| distribution | a character string denoting the abbreviation of the assumed distribution.  Only continuous distributions are allowed.  The results for the goodness-of-fit test are identical for distribution="lnorm" and distribution="lnormAlt", since both denote a lognormal distribution. |
| est.arg.list | a list of arguments to be passed to the function estimating the distribution parameters.  The default value is est.arg.list=NULL, so that all default values for the estimating function are used. |
| prob.method | character string indicating what method to use to compute the plotting positions (empirical probabilities) when test="sf" or test="ppcc".  The default value is prob.method="hirsch-stedinger".  See the help file for ppointsCensored for more information. |
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant to use when test="sf" or test="ppcc".  The default value is plot.pos.con=0.375.  See the help file for ppointsCensored for more information. |
| keep.data | logical scalar indicating whether to return the original data.  The default value is keep.data=TRUE. |
| data.name | optional character string indicating the name for the data used for argument x. |
| censoring.name | optional character string indicating the name for the data used for argument censored. |
Details
Let \underline{x} = c(x_1, x_2, \ldots, x_N) denote a vector of N 
observations from some distribution with cdf F.  
Suppose we want to test the null hypothesis that 
F is the cdf of a normal (Gaussian) distribution with 
some arbitrary mean \mu and standard deviation \sigma against the 
alternative hypothesis that F is the cdf of some other distribution.  The 
table below shows the random variable for which F is the assumed cdf, given 
the value of the argument distribution.
| Value of distribution | Distribution Name | Random Variable for which F is the cdf |
| "norm" | Normal | X |
| "lnorm" | Lognormal (Log-space) | log(X) |
| "lnormAlt" | Lognormal (Untransformed) | log(X) |
Assume n (0 < n < N) of these observations are known and c 
(c=N-n) of these observations are all censored below (left-censored) or 
all censored above (right-censored) at k fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
For the case when k \ge 2, the data are said to be Type I 
multiply censored.  For the case when k=1, 
set T = T_1.  If the data are left-censored 
and all n known observations are greater 
than or equal to T, or if the data are right-censored and all n 
known observations are less than or equal to T, then the data are 
said to be Type I singly censored (Nelson, 1982, p.7), otherwise 
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring 
level T_j for j = 1, 2, \ldots, k, so that
\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations, 
where now “observation” means either the actual observation (for uncensored 
observations) or the censoring level (for censored observations).  For 
right-censored data, if a censored observation has the same value as an 
uncensored one, the uncensored observation should be placed first.  
For left-censored data, if a censored observation has the same value as an 
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent 
the i'th “largest” observation from the (unknown) complete sample.
Note that for singly left-censored data:
x_{(1)} = x_{(2)} = \cdots = x_{(c)} = T \;\;\;\;\;\; (3)
and for singly right-censored data:
x_{(n+1)} = x_{(n+2)} = \cdots = x_{(N)} = T \;\;\;\;\;\; (4)
Finally, let \Omega (omega) denote the set of n subscripts in the 
“ordered” sample that correspond to uncensored observations. 
Shapiro-Wilk Goodness-of-Fit Test for Singly Censored Data (test="sw") 
Equation (8) in the help file for gofTest shows that for the case of 
complete ordered data \underline{x}, the Shapiro-Wilk 
W-statistic is the same as 
the square of the sample product-moment correlation between the vectors 
\underline{a} and \underline{x}:
W = r(\underline{a}, \underline{x})^2 \;\;\;\;\;\; (5)
where
r(\underline{x}, \underline{y}) = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{[\sum_{i=1}^N (x_i - \bar{x})^2 \sum_{i=1}^N (y_i - \bar{y})^2]^{1/2}} \;\;\;\;\;\;\; (6)
and \underline{a} is defined by:
\underline{a} = \frac{\underline{m}^T V^{-1}}{[\underline{m}^T V^{-1} V^{-1} \underline{m}]^{1/2}} \;\;\;\;\;\; (7)
where ^T denotes the transpose operator, and \underline{m} is the vector 
of expected values and V is the variance-covariance matrix of the order 
statistics of a random sample of size N from a standard normal distribution.  
That is, the values of \underline{a} are the expected values of the standard 
normal order statistics weighted by their variance-covariance matrix, and 
normalized so that 
\underline{a}^T \underline{a} = 1 \;\;\;\;\;\; (8)
Computing Shapiro-Wilk W-Statistic for Singly Censored Data 
For the case of singly censored data, following Smith and Bain (1976) and 
Verrill and Johnson (1988), Royston (1993) generalizes the Shapiro-Wilk 
W-statistic to:  
W = r(\underline{a}_{\Delta}, \underline{x}_{\Delta})^2 \;\;\;\;\;\; (9)
where for left singly-censored data:
\underline{a}_{\Delta} = (a_{c+1}, a_{c+2}, \ldots, a_{N}) \;\;\;\;\;\; (10)
\underline{x}_{\Delta} = (x_{(c+1)}, x_{(c+2)}, \ldots, x_{(N)}) \;\;\;\;\;\; (11)
and for right singly-censored data:
\underline{a}_{\Delta} = (a_1, a_2, \ldots, a_n) \;\;\;\;\;\; (12)
\underline{x}_{\Delta} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)}) \;\;\;\;\;\; (13)
Just like the function gofTest, 
when test="sw", the function gofTestCensored uses Royston's (1992a) 
approximation for the coefficients \underline{a} (see the help file for 
gofTest). 
Computing P-Values for the Shapiro-Wilk Test 
Verrill and Johnson (1988) show that the asymptotic distribution of the statistic 
in Equation (9) above is normal, but the rate of convergence is 
“surprisingly slow” even for complete samples.  They provide a table of 
empirical percentiles of the distribution for the W-statistic shown in 
Equation (9) above for several sample sizes and percentages of censoring.
Based on the tables given in Verrill and Johnson (1988), Royston (1993) approximated 
the 90'th, 95'th, and 99'th percentiles of the distribution of the z-statistic 
computed from the W-statistic.  (The distribution of this z-statistic is 
assumed to be normal, but not necessarily a standard normal.)  Denote these 
percentiles by Z_{0.90}, Z_{0.95}, and Z_{0.99}.  The true mean and 
standard deviation of the z-statistic are estimated by the intercept and slope, 
respectively, from the linear regression of Z_{\alpha} on 
\Phi^{-1}(\alpha) for \alpha = 0.9, 0.95, and 0.99, where \Phi 
denotes the cumulative distribution function of the standard normal distribution.  
The p-value associated with this test is then computed as:
p = 1 - \Phi(\frac{z - \mu_z}{\sigma_z}) \;\;\;\;\;\; (14)
Note: Verrill and Johnson (1988) produced their tables based on Type II censoring.  
Royston's (1993) approximation to the p-value of these tests, however, should be 
fairly accurate for Type I censored data as well.
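The p-value computation just described amounts to a simple linear regression. The sketch below uses made-up percentiles Z_{0.90}, Z_{0.95}, Z_{0.99} and a made-up z-statistic purely to illustrate Equation (14); it does not use the actual tabled values.
  # Illustrative percentiles of the z-statistic (not the tabled values)
  alpha   <- c(0.90, 0.95, 0.99)
  z.alpha <- c(1.21, 1.58, 2.31)
  fit <- lm(z.alpha ~ qnorm(alpha))
  mu.z    <- coef(fit)[1]            # estimated mean of the z-statistic
  sigma.z <- coef(fit)[2]            # estimated standard deviation of the z-statistic
  z <- 1.8                           # illustrative z-statistic computed from the W-statistic
  1 - pnorm((z - mu.z) / sigma.z)    # Equation (14)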
Testing Goodness-of-Fit for Any Continuous Distribution 
The function gofTestCensored extends the Shapiro-Wilk test that 
accounts for censoring to test for goodness-of-fit for any continuous 
distribution by using the idea of Chen and Balakrishnan (1995), 
who proposed a general purpose approximate goodness-of-fit test based on 
the Cramer-von Mises or Anderson-Darling goodness-of-fit tests for normality.  
The function gofTestCensored modifies the approach of 
Chen and Balakrishnan (1995) by using the same first 2 steps, and then 
applying the Shapiro-Wilk test that accounts for censoring:
- Let \underline{x} = x_1, x_2, \ldots, x_n denote the vector of n ordered observations, ignoring censoring status. Compute cumulative probabilities for each x_i based on the cumulative distribution function for the hypothesized distribution. That is, compute p_i = F(x_i, \hat{\theta}), where F(x, \theta) denotes the hypothesized cumulative distribution function with parameter(s) \theta, and \hat{\theta} denotes the estimated parameter(s) using an estimation method that accounts for censoring (e.g., assuming a Gamma distribution with alternative parameterization, call the function egammaAltCensored).
- Compute standard normal deviates based on the computed cumulative probabilities: y_i = \Phi^{-1}(p_i)
- Perform the Shapiro-Wilk goodness-of-fit test (that accounts for censoring) on the y_i's.
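A minimal sketch of these three steps is shown below, assuming a hypothetical left-censored sample x with censoring indicator cens; in practice a single call to gofTestCensored with distribution="gammaAlt" carries out all three steps internally.
  # Hypothetical left-censored data (censoring level = 2)
  x    <- c(2, 2, 3.1, 4.7, 5.2, 6.8, 7.4, 9.9, 12.3, 15.1)
  cens <- c(TRUE, TRUE, rep(FALSE, 8))
  # Steps 1 and 2 "by hand": estimate the gamma parameters with a
  # censored-data estimator, compute cumulative probabilities, and
  # transform them to standard normal deviates
  est <- egammaAltCensored(x, cens)$parameters
  y   <- qnorm(pgammaAlt(x, mean = est["mean"], cv = est["cv"]))
  # Step 3 applies the censored Shapiro-Wilk test to the y's.
  # gofTestCensored performs all three steps in one call:
  gofTestCensored(x, cens, test = "sw", distribution = "gammaAlt")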
Shapiro-Francia Goodness-of-Fit Test (test="sf") 
Equation (15) in the help file for gofTest shows that for the complete 
ordered data \underline{x}, the Shapiro-Francia W'-statistic is the 
same as the squared Pearson correlation coefficient associated with a normal 
probability plot.  
Computing Shapiro-Francia W'-Statistic for Censored Data 
For the case of singly censored data, following Smith and Bain (1976) and 
Verrill and Johnson (1988), Royston (1993) extends the computation of the 
Weisberg-Bingham Approximation to the W'-statistic to the case of singly 
censored data:
\tilde{W}' = r(\underline{c}_{\Delta}, \underline{x}_{\Delta})^2 \;\;\;\;\;\; (14)
where for left singly-censored data:
\underline{c}_{\Delta} = (c_{c+1}, c_{c+2}, \ldots, c_{N}) \;\;\;\;\;\; (15)
\underline{x}_{\Delta} = (x_{(c+1)}, x_{(c+2)}, \ldots, x_{(N)}) \;\;\;\;\;\; (16)
and for right singly-censored data:
\underline{c}_{\Delta} = (c_1, c_2, \ldots, c_n) \;\;\;\;\;\; (17)
\underline{x}_{\Delta} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)}) \;\;\;\;\;\; (18)
and \underline{c} is defined as:
\underline{c} = \frac{\underline{\tilde{m}}}{[\underline{\tilde{m}}' \underline{\tilde{m}}]^{1/2}} \;\;\;\;\;\; (19)
where
\tilde{m}_i = \Phi^{-1}(\frac{i - (3/8)}{n + (1/4)}) \;\;\;\;\;\;  (20)
and \Phi denotes the standard normal cdf.  Note:  Do not confuse the elements 
of the vector \underline{c} with the scalar c which denotes the number 
of censored observations.  We use \underline{c} here to be consistent with the 
notation in the help file for gofTest.
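Equations (14)-(20) can be sketched for left singly-censored data as follows. The data and censoring level are hypothetical, and the Blom plotting positions of Equation (20) are used directly rather than Royston's approximation to the coefficients.
  # Hypothetical left singly-censored sample (censoring level = 1, c = 3)
  x <- c(1, 1, 1, 2.3, 3.1, 4.4, 5.9, 7.2, 8.8, 10.5)
  n.cens <- 3
  N <- length(x)
  m.tilde <- qnorm(((1:N) - 3/8) / (N + 1/4))    # Equation (20)
  c.vec   <- m.tilde / sqrt(sum(m.tilde^2))      # Equation (19)
  keep <- (n.cens + 1):N                         # drop censored positions, Equations (15)-(16)
  cor(c.vec[keep], sort(x)[keep])^2              # Equation (14)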
Just like the function gofTest, 
when test="sf", the function gofTestCensored uses Royston's (1992a) 
approximation for the coefficients \underline{c} (see the help file for 
gofTest). 
In general, the Shapiro-Francia test statistic can be extended to multiply 
censored data using Equation (14) with \underline{c}_{\Delta} defined as 
the ordered values of c_i associated with uncensored observations, and
\underline{x}_{\Delta} defined as the ordered values of x_i 
associated with uncensored observations:  
\underline{c}_{\Delta} = \cup_{i \in \Omega} \;\; c_{(i)} \;\;\;\;\;\; (21)
\underline{x}_{\Delta} = \cup_{i \in \Omega} \;\; x_{(i)} \;\;\;\;\;\; (22)
and where the plotting positions in Equation (20) are replaced with any of the 
plotting positions available in ppointsCensored 
(see the description for the argument prob.method).
 
Computing P-Values for the Shapiro-Francia Test 
Verrill and Johnson (1988) show that the asymptotic distribution of the statistic 
in Equation (14) above is normal, but the rate of convergence is 
“surprisingly slow” even for complete samples.  They provide a table of 
empirical percentiles of the distribution for the \tilde{W}'-statistic shown 
in Equation (14) above for several sample sizes and percentages of censoring.
As for the Shapiro-Wilk test, based on the tables given in Verrill and Johnson (1988), 
Royston (1993) approximated the 90'th, 95'th, and 99'th percentiles of the 
distribution of the z-statistic computed from the \tilde{W}'-statistic.  
(The distribution of this z-statistic is 
assumed to be normal, but not necessarily a standard normal.)  Denote these 
percentiles by Z_{0.90}, Z_{0.95}, and Z_{0.99}.  The true mean and 
standard deviation of the z-statistic are estimated by the intercept and slope, 
respectively, from the linear regression of Z_{\alpha} on 
\Phi^{-1}(\alpha) for \alpha = 0.9, 0.95, and 0.99, where \Phi 
denotes the cumulative distribution function of the standard normal distribution.  
The p-value associated with this test is then computed as:
p = 1 - \Phi(\frac{z - \mu_z}{\sigma_z}) \;\;\;\;\;\; (23)
Note: Verrill and Johnson (1988) produced their tables based on Type II censoring.  
Royston's (1993) approximation to the p-value of these tests, however, should be 
fairly accurate for Type I censored data as well, although this is an area that 
requires further investigation.
Testing Goodness-of-Fit for Any Continuous Distribution 
The function gofTestCensored extends the Shapiro-Francia test that 
accounts for censoring to test for goodness-of-fit for any continuous 
distribution by using the idea of Chen and Balakrishnan (1995), 
who proposed a general purpose approximate goodness-of-fit test based on 
the Cramer-von Mises or Anderson-Darling goodness-of-fit tests for normality.  
The function gofTestCensored modifies the approach of 
Chen and Balakrishnan (1995) by using the same first 2 steps, and then 
applying the Shapiro-Francia test that accounts for censoring:
- Let \underline{x} = x_1, x_2, \ldots, x_n denote the vector of n ordered observations, ignoring censoring status. Compute cumulative probabilities for each x_i based on the cumulative distribution function for the hypothesized distribution. That is, compute p_i = F(x_i, \hat{\theta}), where F(x, \theta) denotes the hypothesized cumulative distribution function with parameter(s) \theta, and \hat{\theta} denotes the estimated parameter(s) using an estimation method that accounts for censoring (e.g., assuming a Gamma distribution with alternative parameterization, call the function egammaAltCensored).
- Compute standard normal deviates based on the computed cumulative probabilities: y_i = \Phi^{-1}(p_i)
- Perform the Shapiro-Francia goodness-of-fit test (that accounts for censoring) on the y_i's.
Probability Plot Correlation Coefficient (PPCC) Goodness-of-Fit Test (test="ppcc") 
The function gofTestCensored computes the PPCC test statistic using Blom 
plotting positions.  It can be shown that the square of this statistic is equivalent 
to the Weisberg-Bingham Approximation to the Shapiro-Francia W'-test 
(Weisberg and Bingham, 1975; Royston, 1993).  Thus the PPCC goodness-of-fit test 
is equivalent to the Shapiro-Francia goodness-of-fit test. 
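For complete (uncensored) data and a normality hypothesis, the equivalence of the PPCC statistic and the Weisberg-Bingham approximation to W' can be sketched as follows; the sample is hypothetical and the sketch is not the implementation used by gofTestCensored.
  # PPCC with Blom plotting positions for a hypothetical complete sample
  set.seed(23)
  x <- sort(rnorm(20))
  n <- length(x)
  blom <- ((1:n) - 3/8) / (n + 1/4)
  r.ppcc <- cor(x, qnorm(blom))    # PPCC test statistic
  r.ppcc^2                         # Weisberg-Bingham approximation to W'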
Value
a list of class "gofCensored" containing the results of the goodness-of-fit 
test.  See the help files for gofCensored.object for details.
Note
The Shapiro-Wilk test (Shapiro and Wilk, 1965) and the Shapiro-Francia test (Shapiro and Francia, 1972) are probably the two most commonly used hypothesis tests to test departures from normality. The Shapiro-Wilk test is most powerful at detecting short-tailed (platykurtic) and skewed distributions, and least powerful against symmetric, moderately long-tailed (leptokurtic) distributions. Conversely, the Shapiro-Francia test is more powerful against symmetric long-tailed distributions and less powerful against short-tailed distributions (Royston, 1992b; 1993).
In practice, almost any goodness-of-fit test will not reject the null hypothesis 
if the number of observations is relatively small.  Conversely, almost any goodness-of-fit 
test will reject the null hypothesis if the number of observations is very large, 
since “real” data are never distributed according to any theoretical distribution 
(Conover, 1980, p.367).  For most cases, however, the distribution of “real” data 
is close enough to some theoretical distribution that fairly accurate results may be 
provided by assuming that particular theoretical distribution.  One way to assess the 
goodness of the fit is to use goodness-of-fit tests.  Another way is to look at 
quantile-quantile (Q-Q) plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefor's Test for Normality. The American Statistician 40, 294-296.
D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of g1. 
Biometrika 57, 679-681.
D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.
D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York.
D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality. 
Empirical Results for the Distributions of b2 and \sqrt{b1}. 
Biometrika 60(3), 613-622.
D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of \sqrt{b1}. 
Biometrika 60(1), 169-173.
Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.
Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.
Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.
Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.
Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvania State University, Department of Statistics.
Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.
Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.
Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.
Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.
Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.
See Also
gofTest, gofCensored.object, 
print.gofCensored, plot.gofCensored, 
shapiro.test, Normal, Lognormal, 
enormCensored, elnormCensored, 
elnormAltCensored, qqPlotCensored.
Examples
  # Generate 30 observations from a gamma distribution with 
  # parameters mean=10 and cv=1 and then censor observations less than 5.
  # Then test the hypothesis that these data came from a gamma 
  # distribution using the Shapiro-Wilk test.
  #
  # The p-value for the complete data is p = 0.86, while  
  # the p-value for the censored data is p = 0.52.
  # (Note:  the call to set.seed lets you reproduce this example.)
  set.seed(598)
  dat <- sort(rgammaAlt(30, mean = 10, cv = 1))
  dat
  # [1]  0.5313509  1.4741833  1.9936208  2.7980636  3.4509840
  # [6]  3.7987348  4.5542952  5.5207531  5.5253596  5.7177872
  #[11]  5.7513827  9.1086375  9.8444090 10.6247123 10.9304922
  #[16] 11.7925398 13.3432689 13.9562777 14.6029065 15.0563342
  #[21] 15.8730642 16.0039936 16.6910715 17.0288922 17.8507891
  #[26] 19.1105522 20.2657141 26.3815970 30.2912797 42.8726101
  dat.censored <- dat
  censored <- dat.censored < 5
  dat.censored[censored] <- 5
  # Results for complete data:
  #---------------------------
  gofTest(dat, test = "sw", dist = "gammaAlt")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on 
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          mean = 12.4248552
  #                                 cv   =  0.7901752
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     30
  #
  #Test Statistic:                  W = 0.981471
  #
  #Test Statistic Parameter:        n = 30
  #
  #P-value:                         0.8631802
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  # Results for censored data:
  #---------------------------
  gof.list <- gofTestCensored(dat.censored, censored, test = "sw", 
    distribution = "gammaAlt")
  gof.list  
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #                                 (Singly Censored Data)
  #                                 Based on Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              5 
  #
  #Estimated Parameter(s):          mean = 12.4911448
  #                                 cv   =  0.7617343
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat.censored
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     30
  #
  #Percent Censored:                23.3%
  #
  #Test Statistic:                  W = 0.9613711
  #
  #Test Statistic Parameters:       N     = 30.0000000
  #                                 DELTA =  0.2333333
  #
  #P-value:                         0.522329
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  # Plot the results for the censored data
  #---------------------------------------
  dev.new()
  plot(gof.list)
  #==========
  # Continue the above example, but now test the hypothesis that 
  # these data came from a lognormal distribution 
  # (alternative parameterization) using the Shapiro-Wilk test.
  #
  # The p-value for the complete data is p = 0.056, while  
  # the p-value for the censored data is p = 0.11.
  # Results for complete data:
  #---------------------------
  gofTest(dat, test = "sw", dist = "lnormAlt")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          mean = 13.757239
  #                                 cv   =  1.148872
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     30
  #
  #Test Statistic:                  W = 0.9322226
  #
  #Test Statistic Parameter:        n = 30
  #
  #P-value:                         0.05626823
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  # Results for censored data:
  #---------------------------
  gof.list <- gofTestCensored(dat.censored, censored, test = "sw", 
    distribution = "lnormAlt")
  gof.list  
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #                                 (Singly Censored Data)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              5 
  #
  #Estimated Parameter(s):          mean = 13.0382221
  #                                 cv   =  0.9129512
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat.censored
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     30
  #
  #Percent Censored:                23.3%
  #
  #Test Statistic:                  W = 0.9292406
  #
  #Test Statistic Parameters:       N     = 30.0000000
  #                                 DELTA =  0.2333333
  #
  #P-value:                         0.114511
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  # Plot the results for the censored data
  #---------------------------------------
  dev.new()
  plot(gof.list)
  #----------
  # Redo the above example, but specify the quasi-minimum variance 
  # unbiased estimator of the mean.  Note that the method of 
  # estimating the parameters has no effect on the goodness-of-fit 
  # test (see the DETAILS section above).
  gofTestCensored(dat.censored, censored, test = "sw", 
    distribution = "lnormAlt", est.arg.list = list(method = "qmvue"))
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #                                 (Singly Censored Data)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              5 
  #
  #Estimated Parameter(s):          mean = 12.8722749
  #                                 cv   =  0.8712549
  #
  #Estimation Method:               Quasi-MVUE
  #
  #Data:                            dat.censored
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     30
  #
  #Percent Censored:                23.3%
  #
  #Test Statistic:                  W = 0.9292406
  #
  #Test Statistic Parameters:       N     = 30.0000000
  #                                 DELTA =  0.2333333
  #
  #P-value:                         0.114511
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  #----------
  # Clean up
  rm(dat, dat.censored, censored, gof.list)
  graphics.off()
  #==========
  # Check the assumption that the silver data stored in Helsel.Cohn.88.silver.df 
  # follows a lognormal distribution and plot the goodness-of-fit test results.  
  # Note that the small p-value and the shape of the Q-Q plot 
  # (an inverted S-shape) suggests that the log transformation is not quite strong 
  # enough to "bring in" the tails (i.e., the log-transformed silver data has tails 
  # that are slightly too long relative to a normal distribution).  
  # Helsel and Cohn (1988, p.2002) note that the gross outlier of 560 mg/L tends to 
  # make the shape of the data resemble a gamma distribution.
  dum.list <- with(Helsel.Cohn.88.silver.df, 
    gofTestCensored(Ag, Censored, test = "sf", dist = "lnorm"))
  dum.list
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Francia GOF
  #                                 (Multiply Censored Data)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):               0.1  0.2  0.3  0.5  1.0  2.0  2.5  5.0  
  #                                  6.0 10.0 20.0 25.0 
  #
  #Estimated Parameter(s):          meanlog = -1.040572
  #                                 sdlog   =  2.354847
  #
  #Estimation Method:               MLE
  #
  #Data:                            Ag
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     56
  #
  #Percent Censored:                60.7%
  #
  #Test Statistic:                  W = 0.8957198
  #
  #Test Statistic Parameters:       N     = 56.0000000
  #                                 DELTA =  0.6071429
  #
  #P-value:                         0.03490314
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  dev.new()
  plot(dum.list)
  #----------
  
  # Clean up
  #---------
  rm(dum.list)
  graphics.off()
  #==========
  # Chapter 15 of USEPA (2009) gives several examples of looking 
  # at normal Q-Q plots and estimating the mean and standard deviation 
  # for manganese concentrations (ppb) in groundwater at five background wells. 
  # In EnvStats these data are stored in the data frame 
  # EPA.09.Ex.15.1.manganese.df.
  # Here we will test whether the data appear to come from a normal 
  # distribution, then we will test to see whether they appear to come 
  # from a lognormal distribution.
  #--------------------------------------------------------------------
  # First look at the data:
  #-----------------------
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  longToWide(EPA.09.Ex.15.1.manganese.df, 
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)  
  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2
  # Now test whether the data appear to come from 
  # a normal distribution.  Note that these data 
  # are multiply censored, so we'll use the 
  # Shapiro-Francia test.  
  #----------------------------------------------
  gof.normal <- with(EPA.09.Ex.15.1.manganese.df,
    gofTestCensored(Manganese.ppb, Censored, test = "sf"))
  gof.normal
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Francia GOF
  #                                 (Multiply Censored Data)
  #
  #Hypothesized Distribution:       Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          mean = 15.23508
  #                                 sd   = 30.62812
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Test Statistic:                  W = 0.8368016
  #
  #Test Statistic Parameters:       N     = 25.00
  #                                 DELTA =  0.24
  #
  #P-value:                         0.004662658
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  # Plot the results:
  #------------------
  dev.new()
  plot(gof.normal)
  #----------
  # Now test to see whether the data appear to come from 
  # a lognormal distribution.
  #-----------------------------------------------------
  gof.lognormal <- with(EPA.09.Ex.15.1.manganese.df,
    gofTestCensored(Manganese.ppb, Censored, test = "sf", 
    distribution = "lnorm"))
  gof.lognormal
  #Results of Goodness-of-Fit Test
  #Based on Type I Censored Data
  #-------------------------------
  #
  #Test Method:                     Shapiro-Francia GOF
  #                                 (Multiply Censored Data)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Test Statistic:                  W = 0.9864426
  #
  #Test Statistic Parameters:       N     = 25.00
  #                                 DELTA =  0.24
  #
  #P-value:                         0.9767731
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  # Plot the results:
  #------------------
  dev.new()
  plot(gof.lognormal)
  #----------
  # Clean up
  #---------
  rm(gof.normal, gof.lognormal)
  graphics.off()
S3 Class "gofTwoSample"
Description
Objects of S3 class "gofTwoSample" are returned by the EnvStats function 
gofTest when both the x and y arguments are supplied.
Details
Objects of S3 class "gofTwoSample" are lists that contain 
information about the assumed distribution, the estimated or 
user-supplied distribution parameters, and the test statistic 
and p-value.
Value
Required Components 
The following components must be included in a legitimate list of 
class "gofTwoSample".
| distribution | a character string with the value "Equal". | 
| statistic | a numeric scalar with a names attribute containing the name and value of the goodness-of-fit statistic. | 
| sample.size | a numeric scalar containing the number of non-missing observations in the sample used for the goodness-of-fit test. | 
| parameters | numeric vector with a names attribute containing the name(s) and value(s) of the parameter(s) associated with the test statistic given in the statistic component. | 
| p.value | numeric scalar containing the p-value associated with the goodness-of-fit statistic. | 
| alternative | character string indicating the alternative hypothesis. | 
| method | character string indicating the name of the goodness-of-fit test. | 
| data | a list of length 2 containing the numeric vectors actually used for the goodness-of-fit test (i.e., the original data but with any missing or infinite values removed). | 
| data.name | a character vector of length 2 indicating the name of the data 
object used for the goodness-of-fit test. | 
Optional Component 
The following component is included when the arguments x and/or y
contain missing (NA), undefined (NaN) and/or infinite 
(Inf, -Inf) values.
| bad.obs | numeric vector of length 2 indicating the number of missing (NA), undefined (NaN) and/or infinite (Inf, -Inf) values removed from each data object prior to performing the goodness-of-fit test. | 
Methods
Generic functions that have methods for objects of class 
"gofTwoSample" include: 
print, plot.
Note
Since objects of class "gofTwoSample" are lists, you may extract 
their components with the $ and [[ operators.
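For example, a small illustration (using made-up data; any two numeric vectors supplied as the x and y arguments to gofTest will do):
  # Hypothetical illustration: extract individual components from a 
  # "gofTwoSample" object returned by gofTest().
  gof2 <- gofTest(x = rnorm(20), y = rnorm(15), test = "ks")
  gof2$p.value         # extract the p-value with $
  gof2[["statistic"]]  # extract the test statistic with [[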
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
print.gofTwoSample, plot.gofTwoSample, 
Goodness-of-Fit Tests.
Examples
  # Create an object of class "gofTwoSample", then print it out. 
  # Generate 20 observations from a normal distribution with mean=3 and sd=2, 
  # generate 10 observations from a normal distribution with mean=1 and sd=2, and then 
  # test whether these two sets of observations come from the same distribution.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(300) 
  dat1 <- rnorm(20, mean = 3, sd = 2) 
  dat2 <- rnorm(10, mean = 1, sd = 2) 
  gofTest(x = dat1, y = dat2, test = "ks")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     2-Sample K-S GOF
  #
  #Hypothesized Distribution:       Equal
  #
  #Data:                            x = dat1
  #                                 y = dat2
  #
  #Sample Sizes:                    n.x = 20
  #                                 n.y = 10
  #
  #Test Statistic:                  ks = 0.7
  #
  #Test Statistic Parameters:       n = 20
  #                                 m = 10
  #
  #P-value:                         0.001669561
  #
  #Alternative Hypothesis:          The cdf of 'dat1' does not equal
  #                                 the cdf of 'dat2'.
  #----------
  # Clean up
  rm(dat1, dat2)
Generalized Pivotal Quantity for Confidence Interval for the Mean of a Normal Distribution Based on Censored Data
Description
Generate a generalized pivotal quantity (GPQ) for a confidence interval for the mean of a Normal distribution based on singly or multiply censored data.
Usage
  gpqCiNormSinglyCensored(n, n.cen, probs, nmc, method = "mle", 
    censoring.side = "left", seed = NULL, names = TRUE)
  gpqCiNormMultiplyCensored(n, cen.index, probs, nmc, method = "mle", 
    censoring.side = "left", seed = NULL, names = TRUE)
Arguments
| n | positive integer indicating the sample size. | 
| n.cen | for the case of singly censored data, a positive integer indicating the number of 
censored observations.  The value of  | 
| cen.index | for the case of multiply censored data, a sorted vector of unique integers 
indicating the indices of the censored observations when the observations are 
“ordered”.  The length of  | 
| probs | numeric vector of values between 0 and 1 indicating the confidence level(s) associated with the GPQ(s). | 
| nmc | positive integer indicating the number of Monte Carlo trials to run in order to generate the GPQ(s). | 
| method | character string indicating the method to use for parameter estimation.   | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible 
values are "left" (the default) and "right". | 
| seed | positive integer to pass to the function set.seed.  The default value is seed=NULL. | 
| names | a logical scalar passed to  | 
Details
The functions gpqCiNormSinglyCensored and gpqCiNormMultiplyCensored 
are called by 
enormCensored when ci.method="gpq".  They are 
used to construct generalized pivotal quantities to create confidence intervals 
for the mean \mu of an assumed normal distribution.
This idea was introduced by Schmee et al. (1985) in the context of Type II singly 
censored data.  The function 
gpqCiNormSinglyCensored generates GPQs using a modification of 
Algorithm 12.1 of Krishnamoorthy and Mathew (2009, p. 329).  Algorithm 12.1 is 
used to generate GPQs for a tolerance interval.  The modified algorithm for 
generating GPQs for confidence intervals for the mean \mu is as follows:
- Generate a random sample of - nobservations from a standard normal (i.e., N(0,1)) distribution and let- z_{(1)}, z_{(2)}, \ldots, z_{(n)}denote the ordered (sorted) observations.
- Set the smallest - n.cenobservations as censored.
- Compute the estimates of - \muand- \sigmaby calling- enormCensoredusing the method specified by the- methodargument, and denote these estimates as- \hat{\mu}^*, \; \hat{\sigma}^*.
- Compute the t-like pivotal quantity - \hat{t} = \hat{\mu}^*/\hat{\sigma}^*.
- Repeat steps 1-4 - nmctimes to produce an empirical distribution of the t-like pivotal quantity.
A two-sided (1-\alpha)100\% confidence interval for \mu is then 
computed as:
[\hat{\mu} - \hat{t}_{1-(\alpha/2)} \hat{\sigma}, \; \hat{\mu} - \hat{t}_{\alpha/2} \hat{\sigma}]
where \hat{t}_p denotes the p'th empirical quantile of the 
nmc generated \hat{t} values.
Schmee et al. (1985) derived this method in the context of Type II singly censored data (for which these limits are exact within Monte Carlo error), but state that according to Regal (1982) this method produces confidence intervals that are close approximations to the correct limits for Type I censored data.
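To make the algorithm above concrete, here is a minimal sketch of the singly censored case.  This is our own illustration, not the package's internal code: it assumes left-censoring, and the way the censoring level is handled (censored values set to the smallest uncensored order statistic) is an assumption on our part; gpqCiNormSinglyCensored is the real implementation.
  # Illustrative sketch of the GPQ algorithm for a confidence interval for the mean
  # (left-censored case; assumptions noted in the text; nmc kept small for speed).
  gpq.ci.sketch <- function(n, n.cen, probs, nmc, method = "mle") {
    t.hat <- numeric(nmc)
    for (b in 1:nmc) {
      z <- sort(rnorm(n))                                     # Step 1
      censored <- c(rep(TRUE, n.cen), rep(FALSE, n - n.cen))  # Step 2
      z[censored] <- z[n.cen + 1]     # assumed handling of the censoring level
      est <- enormCensored(z, censored, method = method)$parameters  # Step 3
      t.hat[b] <- est["mean"] / est["sd"]                     # Step 4
    }
    quantile(t.hat, probs = probs)    # Step 5: empirical quantiles of the GPQs
  }
  # e.g., gpq.ci.sketch(n = 10, n.cen = 6, probs = c(0.025, 0.975), nmc = 100)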
The function 
gpqCiNormMultiplyCensored is an extension of this idea to multiply censored 
data.  The algorithm is the same as for singly censored data, except 
Step 2 changes to: 
2. Set the observations whose (ordered) positions are given by the argument cen.index 
as censored.
The functions gpqCiNormSinglyCensored and gpqCiNormMultiplyCensored are 
computationally intensive and are provided to allow you to create your own 
tables.
Value
a numeric vector containing the GPQ(s).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Schmee, J., D. Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2), 119–128.
See Also
enormCensored, estimateCensored.object.
Examples
  # Reproduce the entries for n=10 observations with n.cen=6 in Table 4 
  # of Schmee et al. (1985, p.122).
  #
  # Notes: 
  # 1. This table applies to right-censored data, and the 
  #    quantity "r" in this table refers to the number of 
  #    uncensored observations.
  #
  # 2. Passing a value for the argument "seed" simply allows 
  #    you to reproduce this example.  
  # NOTE:  Here to save computing time for the sake of example, we will specify 
  #        just 100 Monte Carlos, whereas Krishnamoorthy and Mathew (2009) 
  #        suggest *10,000* Monte Carlos.
  # Here are the values given in Schmee et al. (1985):
  Schmee.values <- c(-3.59, -2.60, -1.73, -0.24, 0.43, 0.58, 0.73)
  probs <- c(0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975)
  names(Schmee.values) <- paste(probs * 100, "%", sep = "")
  Schmee.values
  # 2.5%    5%   10%   50%   90%   95% 97.5% 
  #-3.59 -2.60 -1.73 -0.24  0.43  0.58  0.73
  gpqs <- gpqCiNormSinglyCensored(n = 10, n.cen = 6, probs = probs, 
    nmc = 100, censoring.side = "right", seed = 529)
  round(gpqs, 2)
  # 2.5%    5%   10%   50%   90%   95% 97.5% 
  #-2.46 -2.03 -1.38 -0.14  0.54  0.65  0.84 
  # This is what you get if you specify nmc = 1000 with the 
  # same value for seed:
  #-----------------------------------------------
  # 2.5%    5%   10%   50%   90%   95% 97.5% 
  #-3.50 -2.49 -1.67 -0.25  0.41  0.57  0.71
  # Clean up
  #---------
  rm(Schmee.values, probs, gpqs)
  #==========
  # Example of using gpqCiNormMultiplyCensored
  #-------------------------------------------
  # Consider the following set of multiply left-censored data:
  dat <- 12:16
  censored <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
  # Since the data are "ordered" we can identify the indices of the 
  # censored observations in the ordered data as follows:
  cen.index <- (1:length(dat))[censored]
  cen.index
  #[1] 1 3
  # Now we can generate a GPQ using gpqCiNormMultiplyCensored.
  # Here we'll generate the GPQs needed to create a 
  # 95% confidence interval for left-censored data.
  # NOTE:  Here to save computing time for the sake of example, we will specify 
  #        just 100 Monte Carlos, whereas Krishnamoorthy and Mathew (2009) 
  #        suggest *10,000* Monte Carlos.
  gpqCiNormMultiplyCensored(n = 5, cen.index = cen.index,  
    probs = c(0.025, 0.975), nmc = 100, seed = 237)
  #     2.5%     97.5% 
  #-1.315592  1.848513 
  #----------
  # Clean up
  #---------
  rm(dat, censored, cen.index)
Generalized Pivotal Quantity for Tolerance Interval for a Normal Distribution Based on Censored Data
Description
Generate a generalized pivotal quantity (GPQ) for a tolerance interval for a Normal distribution based on singly or multiply censored data.
Usage
  gpqTolIntNormSinglyCensored(n, n.cen, p, probs, nmc, method = "mle", 
    censoring.side = "left", seed = NULL, names = TRUE)
  gpqTolIntNormMultiplyCensored(n, cen.index, p, probs, nmc, method = "mle", 
    censoring.side = "left", seed = NULL, names = TRUE)
Arguments
| n | positive integer indicating the sample size. | 
| n.cen | for the case of singly censored data, a positive integer indicating the number of 
censored observations.  The value of  | 
| cen.index | for the case of multiply censored data, a sorted vector of unique integers indicating the 
indices of the censored observations when the observations are “ordered”.  
The length of  | 
| p | numeric scalar strictly greater than 0 and strictly less than 1 indicating the quantile for which to generate the GPQ(s) (i.e., the coverage associated with a one-sided tolerance interval). | 
| probs | numeric vector of values between 0 and 1 indicating the confidence level(s) associated with the GPQ(s). | 
| nmc | positive integer indicating the number of Monte Carlo trials to run in order to generate the GPQ(s). | 
| method | character string indicating the method to use for parameter estimation.   | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are 
"left" (the default) and "right". | 
| seed | positive integer to pass to the function set.seed.  The default value is seed=NULL. | 
| names | a logical scalar passed to  | 
Details
The function gpqTolIntNormSinglyCensored generates GPQs as described in Algorithm 12.1 
of Krishnamoorthy and Mathew (2009, p. 329).  The function 
gpqTolIntNormMultiplyCensored is an extension of this idea to multiply censored data.  
These functions are called by 
tolIntNormCensored when ti.method="gpq", 
and also by eqnormCensored when ci=TRUE and ci.method="gpq".  See 
the help files for these functions for an explanation of GPQs.
Note that technically these are only GPQs if the data are Type II censored. However, Krishnamoorthy and Mathew (2009, p. 328) state that in the case of Type I censored data these quantities should approximate the true GPQs and the results appear to be satisfactory, even for small sample sizes.
The functions gpqTolIntNormSinglyCensored and gpqTolIntNormMultiplyCensored are 
computationally intensive and are provided to allow you to create your own tables.
Value
a numeric vector containing the GPQ(s).
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
See Also
tolIntNormCensored, eqnormCensored,  
enormCensored, estimateCensored.object.
Examples
  # Reproduce the entries for n=10 observations with n.cen=1 in Table 12.2 
  # of Krishnamoorthy and Mathew (2009, p.331).
  #
  # (Note: passing a value for the argument "seed" simply allows you to 
  # reproduce this example.)  
  #
  # NOTE:  Here to save computing time for the sake of example, we will specify 
  #        just 100 Monte Carlos, whereas Krishnamoorthy and Mathew (2009) 
  #        suggest *10,000* Monte Carlos.
  gpqTolIntNormSinglyCensored(n = 10, n.cen = 1, p = 0.05, probs = 0.05, 
    nmc = 100, seed = 529)
  #       5% 
  #-3.483403
  gpqTolIntNormSinglyCensored(n = 10, n.cen = 1, p = 0.1, probs = 0.05, 
    nmc = 100, seed = 497)
  #      5% 
  #-2.66705
  gpqTolIntNormSinglyCensored(n = 10, n.cen = 1, p = 0.9, probs = 0.95, 
    nmc = 100, seed = 623)
  #     95% 
  #2.478654
  gpqTolIntNormSinglyCensored(n = 10, n.cen = 1, p = 0.95, probs = 0.95, 
    nmc = 100, seed = 623)
  #     95% 
  #3.108452
  #==========
  # Example of using gpqTolIntNormMultiplyCensored
  #-----------------------------------------------
  # Consider the following set of multiply left-censored data:
  dat <- 12:16
  censored <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
  # Since the data are "ordered" we can identify the indices of the 
  # censored observations in the ordered data as follows:
  cen.index <- (1:length(dat))[censored]
  cen.index
  #[1] 1 3
  # Now we can generate a GPQ using gpqTolIntNormMultiplyCensored.
  # Here we'll generate a GPQ corresponding to an upper tolerance 
  # interval with coverage 90% with 95% confidence for 
  # left-censored data.
  # NOTE:  Here to save computing time for the sake of example, we will specify 
  #        just 100 Monte Carlos, whereas Krishnamoorthy and Mathew (2009) 
  #        suggest *10,000* Monte Carlos.
  gpqTolIntNormMultiplyCensored(n = 5, cen.index = cen.index, p = 0.9, 
    probs = 0.95, nmc = 100, seed = 237)
  #     95% 
  #3.952052
  #==========
  # Clean up
  #---------
  rm(dat, censored, cen.index)
S3 Classes "htest" and "htestEnvStats"
Description
These classes of objects are returned by functions that perform hypothesis tests 
(e.g., the R function t.test returns an object of class "htest", 
while the EnvStats functions quantileTest and 
kendallSeasonalTrendTest return objects of class "htestEnvStats").  
Objects of class "htest" and "htestEnvStats" are lists that contain 
information about the null and alternative hypotheses, the estimated distribution 
parameters, the test statistic, the p-value, and (optionally) confidence intervals 
for distribution parameters.
Details
Objects of S3 class "htestEnvStats" returned by EnvStats functions 
that perform hypothesis tests include: 
 
chenTTest, kendallTrendTest, kendallSeasonalTrendTest, quantileTest, 
serialCorrelationTest, signTest, twoSampleLinearRankTest, varTest, 
varGroupTest, and zTestGevdShape.
These functions are listed in the help file Hypothesis Tests, 
along with other functions that perform hypothesis tests but return objects of a different class
(e.g., the function oneSamplePermutationTest returns an object of class 
"permutationTest").  
Note that functions that perform goodness-of-fit tests 
return objects of class "gof" or 
"gofTwoSample".
Objects of class "htestEnvStats" generated by EnvStats functions may 
contain additional components called 
estimation.method (method used to estimate the population parameter(s)), 
sample.size, and 
bad.obs (number of missing (NA), undefined (NaN), or infinite 
(Inf, -Inf) values removed prior to performing the hypothesis test), 
and interval (a list with information about a confidence, prediction, or 
tolerance interval). 
Value
Required Components 
The following components must be included in a legitimate list of 
class "htest" or "htestEnvStats".
| null.value | numeric vector containing the value(s) of the population parameter(s) specified by 
the null hypothesis.  This vector has a names attribute describing its element(s). | 
| alternative | character string indicating the alternative hypothesis (the value of the input 
argument alternative). | 
| method | character string giving the name of the test used. | 
| estimate | numeric vector containing the value(s) of the estimated population parameter(s) 
involved in the null hypothesis.  This vector has a names attribute describing its element(s). | 
| data.name | character string containing the actual name(s) of the input data. | 
| statistic | numeric scalar containing the value of the test statistic, with a 
names attribute indicating the name of the statistic. | 
| parameters | numeric vector containing the parameter(s) associated with the null distribution of 
the test statistic.  This vector has a names attribute describing its element(s). | 
| p.value | numeric scalar containing the p-value for the test under the null hypothesis. | 
Optional Components 
The following component may optionally be included in an object 
of class "htest" generated by R functions that test hypotheses:
| conf.int | numeric vector of length 2 containing lower and upper confidence limits for the 
estimated population parameter.  This vector has an attribute called 
conf.level specifying the confidence level associated with the confidence interval. | 
The following components may be included in objects of class "htestEnvStats" 
generated by EnvStats functions:
| sample.size | numeric scalar containing the number of non-missing observations in the sample used for the hypothesis test. | 
| estimation.method | character string containing the method used to compute the estimated distribution 
parameter(s).  The value of this component will depend on the available estimation 
methods (see  | 
| bad.obs | the number of missing (NA), undefined (NaN), or infinite (Inf, -Inf) values removed prior to performing the hypothesis test. | 
| interval | a list containing information about a confidence, prediction, or tolerance interval. | 
Methods
Generic functions that have methods for objects of class 
"htest" and "htestEnvStats" include: 
print.
Note
Since objects of class "htest" and "htestEnvStats" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
print.htest, print.htestEnvStats, Hypothesis Tests.
Examples
  # Create an object of class "htestEnvStats", then print it out. 
  #--------------------------------------------------------------
  htestEnvStats.obj <- chenTTest(EPA.02d.Ex.9.mg.per.L.vec, mu = 30)
  mode(htestEnvStats.obj) 
  #[1] "list" 
  class(htestEnvStats.obj) 
  #[1] "htestEnvStats" 
  names(htestEnvStats.obj) 
  # [1] "statistic"   "parameters"  "p.value"     "estimate"   
  # [5] "null.value"  "alternative" "method"      "sample.size"
  # [9] "data.name"   "bad.obs"     "interval" 
 
  htestEnvStats.obj 
  
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mean = 30
  #
  #Alternative Hypothesis:          True mean is greater than 30
  #
  #Test Name:                       One-sample t-Test
  #                                 Modified for
  #                                 Positively-Skewed Distributions
  #                                 (Chen, 1995)
  #
  #Estimated Parameter(s):          mean = 34.566667
  #                                 sd   = 27.330598
  #                                 skew =  2.365778
  #
  #Data:                            EPA.02d.Ex.9.mg.per.L.vec
  #
  #Sample Size:                     60
  #
  #Test Statistic:                  t = 1.574075
  #
  #Test Statistic Parameter:        df = 59
  #
  #P-values:                        z               = 0.05773508
  #                                 t               = 0.06040889
  #                                 Avg. of z and t = 0.05907199
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Based on z
  #
  #Confidence Interval Type:        Lower
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 29.82
  #                                 UCL =   Inf
  #==========
  # Extract the test statistic
  #---------------------------
  htestEnvStats.obj$statistic
  #       t 
  #1.574075
  #============================================================================
  # Now create an object of class "htest" and note the difference in how it is 
  # printed out depending on whether or not you explicitly use the print() command.
  #--------------------------------------------------------------------------------
  htest.obj <- t.test(EPA.02d.Ex.9.mg.per.L.vec, mu = 30, alternative = "greater")
  class(htest.obj) 
  #[1] "htest" 
  names(htest.obj) 
  # [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
  # [6] "null.value"  "stderr"      "alternative" "method"      "data.name"
  htest.obj
  #        One Sample t-test
  #
  #data:  EPA.02d.Ex.9.mg.per.L.vec
  #t = 1.2943, df = 59, p-value = 0.1003
  #alternative hypothesis: true mean is greater than 30
  #95 percent confidence interval:
  # 28.67044      Inf
  #sample estimates:
  #mean of x 
  # 34.56667
  print(htest.obj)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mean = 30
  #
  #Alternative Hypothesis:          True mean is greater than 30
  #
  #Test Name:                       One Sample t-test
  #
  #Estimated Parameter(s):          mean of x = 34.56667
  #
  #Data:                            EPA.02d.Ex.9.mg.per.L.vec
  #
  #Test Statistic:                  t = 1.294273
  #
  #Test Statistic Parameter:        df = 59
  #
  #P-value:                         0.1003072
  #
  #95% Confidence Interval:         LCL = 28.67044
  #                                 UCL =      Inf
  #==========
  # Clean up
  #---------
  rm(htestEnvStats.obj, htest.obj)
S3 Class "htestCensored"
Description
This class of objects is returned by EnvStats functions that perform 
hypothesis tests based on censored data.  
Objects of class "htestCensored" are lists that contain information about 
the null and alternative hypotheses, the censoring side, the censoring levels, 
the percentage of observations that are censored, 
the estimated distribution parameters (if applicable), the test statistic, 
the p-value, and (optionally, if applicable) 
confidence intervals for distribution parameters.
Details
Objects of S3 class "htestCensored" are returned by 
the functions listed in the section Hypothesis Tests  
in the help file 
EnvStats Functions for Censored Data.  
Currently, the only function listed is 
twoSampleLinearRankTestCensored. 
Value
Required Components 
The following components must be included in a legitimate list of 
class "htestCensored".
| statistic | numeric scalar containing the value of the test statistic, with a 
names attribute indicating the name of the statistic. | 
| parameters | numeric vector containing the parameter(s) associated with the null distribution of 
the test statistic.  This vector has a names attribute describing its element(s). | 
| p.value | numeric scalar containing the p-value for the test under the null hypothesis. | 
| null.value | numeric vector containing the value(s) of the population parameter(s) specified by 
the null hypothesis.  This vector has a names attribute describing its element(s). | 
| alternative | character string indicating the alternative hypothesis (the value of the input 
argument alternative). | 
| method | character string giving the name of the test used. | 
| sample.size | numeric scalar containing the number of non-missing observations in the sample used for the hypothesis test. | 
| data.name | character string containing the actual name(s) of the input data. | 
| bad.obs | the number of missing (NA), undefined (NaN), or infinite (Inf, -Inf) values removed prior to performing the hypothesis test. | 
| censoring.side | character string indicating whether the data are left- or right-censored. | 
| censoring.name | character string indicating the name of the data object used to identify which values are censored. | 
| censoring.levels | numeric scalar or vector indicating the censoring level(s). | 
| percent.censored | numeric scalar indicating the percent of non-missing observations that are censored. | 
Optional Components 
The following components may optionally be included in an object 
of class "htestCensored":
| estimate | numeric vector containing the value(s) of the estimated population parameter(s) 
involved in the null hypothesis.  This vector has a names attribute describing its element(s). | 
| estimation.method | character string containing the method used to compute the estimated distribution 
parameter(s).  The value of this component will depend on the available estimation 
methods (see  | 
| interval | a list containing information about a confidence, prediction, or tolerance interval. | 
Methods
Generic functions that have methods for objects of class 
"htestCensored" include: 
print.
Note
Since objects of class "htestCensored" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
print.htestCensored, Censored Data.
Examples
  # Create an object of class "htestCensored", then print it out. 
  #--------------------------------------------------------------
  htestCensored.obj <-   with(EPA.09.Ex.16.5.PCE.df, 
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"], 
      x.censored = Censored[Well.type == "Compliance"], 
      y = PCE.ppb[Well.type == "Background"], 
      y.censored = Censored[Well.type == "Background"], 
      test = "tarone-ware", alternative = "greater"))
  mode(htestCensored.obj) 
  #[1] "list" 
  class(htestCensored.obj) 
  #[1] "htest" 
  names(htestCensored.obj) 
  # [1] "statistic"         "parameters"        "p.value"          
  # [4] "estimate"          "null.value"        "alternative"      
  # [7] "method"            "estimation.method" "sample.size"      
  #[10] "data.name"         "bad.obs"           "censoring.side"   
  #[13] "censoring.name"    "censoring.levels"  "percent.censored" 
 
  htestCensored.obj 
  
  #Results of Hypothesis Test
  #Based on Censored Data
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) > Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Tarone-Ware Test
  #                                 with Hypergeometric Variance
  #
  #Censoring Side:                  left
  #
  #Data:                            x = PCE.ppb[Well.type == "Compliance"]
  #                                 y = PCE.ppb[Well.type == "Background"]
  #
  #Censoring Variable:              x = Censored[Well.type == "Compliance"]
  #                                 y = Censored[Well.type == "Background"]
  #
  #Sample Sizes:                    nx = 8
  #                                 ny = 6
  #
  #Percent Censored:                x = 12.5%
  #                                 y = 50.0%
  #
  #Test Statistics:                 nu     =  8.458912
  #                                 var.nu = 20.912407
  #                                 z      =  1.849748
  #
  #P-value:                         0.03217495
  #==========
  # Extract the test statistics
  #----------------------------
  htestCensored.obj$statistic
  #       nu    var.nu         z 
  # 8.458912 20.912407  1.849748
  #==========
  # Clean up
  #---------
  rm(htestCensored.obj)
Predict Concentration Using Calibration
Description
Predict concentration using a calibration line (or curve) and inverse regression.
Usage
  inversePredictCalibrate(object, obs.y = NULL, 
    n.points = ifelse(is.null(obs.y), 100, length(obs.y)), 
    intervals = FALSE, coverage = 0.99, simultaneous = FALSE, 
    individual = FALSE, trace = FALSE)
Arguments
| object | an object of class "calibrate" that is the result of calling the function calibrate. | 
| obs.y | optional numeric vector of observed values for the machine signal.  
The default value is obs.y=NULL. | 
| n.points | optional integer indicating the number of points at which to predict concentrations 
(i.e., perform inverse regression). The default value is n.points=ifelse(is.null(obs.y), 100, length(obs.y)). | 
| intervals | optional logical scalar indicating whether to compute confidence intervals for 
the predicted concentrations. The default value is intervals=FALSE. | 
| coverage | optional numeric scalar between 0 and 1 indicating the confidence level associated with 
the confidence intervals for the predicted concentrations. 
The default value is coverage=0.99. | 
| simultaneous | optional logical scalar indicating whether to base the confidence intervals 
for the predicted values on simultaneous or non-simultaneous prediction limits. 
The default value is simultaneous=FALSE. | 
| individual | optional logical scalar indicating whether to base the confidence intervals for the predicted values 
on prediction limits for the mean (individual=FALSE) or prediction limits for individual observations 
(individual=TRUE).  The default value is individual=FALSE. | 
| trace | optional logical scalar indicating whether to print out (trace) the progress of 
the inverse prediction for each of the specified values of obs.y.  The default value is trace=FALSE. | 
Details
A simple and frequently used calibration model is a straight line where the 
response variable S denotes the signal of the machine and the 
predictor variable C denotes the true concentration in the physical 
sample.  The error term is assumed to follow a normal distribution with 
mean 0.  Note that the average value of the signal for a blank (C = 0) 
is the intercept.  Other possible calibration models include higher order 
polynomial models such as a quadratic or cubic model.
In a typical setup, a small number of samples (e.g., n = 6) with known 
concentrations are measured and the signal is recorded.  A sample with no 
chemical in it, called a blank, is also measured.  (You have to be careful 
to define exactly what you mean by a “blank.”  A blank could mean 
a container from the lab that has nothing in it but is prepared in a similar 
fashion to containers with actual samples in them.  Or it could mean a 
field blank: the container was taken out to the field and subjected to the 
same process that all other containers were subjected to, except a physical 
sample of soil or water was not placed in the container.)  Usually, 
replicate measures at the same known concentrations are taken.  
(The term “replicate” must be well defined to distinguish between for 
example the same physical samples that are measured more than once vs. two 
different physical samples of the same known concentration.)
The function calibrate initially fits a linear calibration 
line or curve.  Once the calibration line is fit, samples with unknown 
concentrations are measured and their signals are recorded.  In order to 
produce estimated concentrations, you have to use inverse regression to 
map the signals to the estimated concentrations.  We can quantify the 
uncertainty in the estimated concentration by combining inverse regression 
with prediction limits for the signal S.
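As a simple illustration of the idea (made-up data; the functions calibrate and inversePredictCalibrate handle the full machinery, including the prediction limits), a fitted straight-line calibration can be inverted directly:
  # Hypothetical straight-line calibration and naive inverse regression.
  conc   <- c(0, 0, 5, 5, 10, 10, 20, 20)                  # known concentrations
  signal <- c(0.2, 0.4, 5.3, 4.8, 10.4, 9.7, 20.3, 19.6)   # observed signals
  fit <- lm(signal ~ conc)
  b <- coef(fit)
  obs.y <- 12                              # new observed signal
  unname((obs.y - b[1]) / b[2])            # point estimate of the concentration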
Value
A numeric matrix containing the results of the inverse calibration. 
The first two columns are labeled obs.y and pred.x containing 
the values of the argument obs.y and the predicted values of x 
(the concentration), respectively. If intervals=TRUE, then the matrix also 
contains the columns lpl.x and upl.x corresponding to the lower and 
upper prediction limits for x.  Also, if intervals=TRUE, then the 
matrix has the attributes coverage (the value of the argument coverage) 
and simultaneous (the value of the argument simultaneous).
Note
Almost always the process of determining the concentration of a chemical in 
a soil, water, or air sample involves using some kind of machine that 
produces a signal, and this signal is related to the concentration of the 
chemical in the physical sample. The process of relating the machine signal 
to the concentration of the chemical is called calibration 
(see calibrate). Once calibration has been performed, 
estimated concentrations in physical samples with unknown concentrations 
are computed using inverse regression.  The uncertainty in the process used 
to estimate the concentration may be quantified with decision, detection, 
and quantitation limits.
In practice, only the point estimate of concentration is reported (along 
with a possible qualifier), without confidence bounds for the true 
concentration C. This is most unfortunate because it gives the 
impression that there is no error associated with the reported concentration. 
Indeed, both the International Organization for Standardization (ISO) and 
the International Union of Pure and Applied Chemistry (IUPAC) recommend 
always reporting both the estimated concentration and the uncertainty 
associated with this estimate (Currie, 1997).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Currie, L.A. (1997). Detection: International Update, and Some Emerging Di-Lemmas Involving Calibration, the Blank, and Multiple Detection Decisions. Chemometrics and Intelligent Laboratory Systems 37, 151–181.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 3 and p.335.
Hubaux, A., and G. Vos. (1970). Decision and Detection Limits for Linear Calibration Curves. Analytical Chemistry 42, 849–855.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.562–575.
See Also
pointwise, calibrate, detectionLimitCalibrate, lm
Examples
  # The data frame EPA.97.cadmium.111.df contains calibration data 
  # for cadmium at mass 111 (ng/L) that appeared in 
  # Gibbons et al. (1997b) and were provided to them by the U.S. EPA.  
  # Here we 
  # 1. Display a plot of these data along with the fitted calibration 
  #    line and 99% non-simultaneous prediction limits. 
  # 2. Then based on an observed signal of 60 from a sample with 
  #    unknown concentration, we use the calibration line to estimate 
  #    the true concentration and use the prediction limits to compute 
  #    confidence bounds for the true concentration. 
  # An observed signal of 60 results in an estimated value of cadmium 
  # of 59.97 ng/L and a confidence interval of [53.83, 66.15]. 
  # See Millard and Neerchal (2001, pp.566-569) for more details on 
  # this example.
  Cadmium <- EPA.97.cadmium.111.df$Cadmium 
  Spike <- EPA.97.cadmium.111.df$Spike 
  calibrate.list <- calibrate(Cadmium ~ Spike, 
    data = EPA.97.cadmium.111.df) 
  newdata <- data.frame(Spike = seq(min(Spike), max(Spike), 
    length.out = 100))
  pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE) 
  pointwise.list <- pointwise(pred.list, coverage = 0.99, 
    individual = TRUE)
  plot(Spike, Cadmium, ylim = c(min(pointwise.list$lower), 
    max(pointwise.list$upper)), xlab = "True Concentration (ng/L)", 
    ylab = "Observed Concentration (ng/L)") 
  abline(calibrate.list, lwd=2) 
  lines(newdata$Spike, pointwise.list$lower, lty=8, lwd=2) 
  lines(newdata$Spike, pointwise.list$upper, lty=8, lwd=2) 
  title(paste("Calibration Line and 99% Prediction Limits", 
    "for US EPA Cadmium 111 Data", sep = "\n")) 
 
  # Now estimate the true concentration based on 
  # an observed signal of 60 ng/L. 
  inversePredictCalibrate(calibrate.list, obs.y = 60, 
    intervals = TRUE, coverage = 0.99, individual = TRUE) 
  #     obs.y   pred.x   lpl.x    upl.x 
  #[1,]    60 59.97301 53.8301 66.15422 
  #attr(, "coverage"): 
  #[1] 0.99 
  #attr(, "simultaneous"): 
  #[1] FALSE
  rm(Cadmium, Spike, calibrate.list, newdata, pred.list, pointwise.list) 
Interquartile Range
Description
Compute the interquartile range for a set of data.
Usage
  iqr(x, na.rm = FALSE)
Arguments
| x | numeric vector of observations. | 
| na.rm | logical scalar indicating whether to remove missing values from x prior to computing the interquartile range.  The default value is na.rm=FALSE. | 
Details
Let \underline{x} denote a random sample of n observations from 
some distribution associated with a random variable X.  The sample 
interquartile range is defined as:
IQR = \hat{X}_{0.75} - \hat{X}_{0.25} \;\;\;\;\;\; (1)
where X_p denotes the p'th quantile of the distribution and 
\hat{X}_p denotes the estimate of this quantile (i.e., the sample 
p'th quantile).
See the R help file for quantile for information on how sample 
quantiles are computed.
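For example, Equation (1) can be written directly in terms of the R function quantile (the default quantile type is used here; the type used internally by iqr may differ):
  # Sample interquartile range via Equation (1), using quantile()'s default type.
  x <- c(2.1, 3.5, 4.7, 5.0, 6.2, 8.9)
  unname(diff(quantile(x, probs = c(0.25, 0.75))))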
Value
A numeric scalar – the interquartile range.
Note
The interquartile range is a robust estimate of the spread of the 
distribution.  It is the distance between the two ends of a boxplot 
(see the R help file for boxplot).  For a normal distribution 
with standard deviation \sigma it can be shown that:
IQR = 1.34898 \sigma \;\;\;\;\;\; (2)
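The constant in Equation (2) is simply the width of the middle 50% of a standard normal distribution:
  qnorm(0.75) - qnorm(0.25)
  #[1] 1.34898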
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Hirsch, R.M., D.R. Helsel, T.A. Cohn, and E.J. Gilroy. (1993). Statistical Analysis of Hydrologic Data. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 17, pp.5–7.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Summary Statistics, summaryFull, 
var, sd.
Examples
  # Generate 20 observations from a normal distribution with parameters 
  # mean=10 and sd=2, and compute the standard deviation and 
  # interquartile range. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rnorm(20, mean=10, sd=2) 
  sd(dat) 
  #[1] 1.180226
 
  iqr(dat) 
  #[1] 1.489932
 
  #----------
  # Repeat the last example, but add a couple of large "outliers" to the 
  # data.  Note that the estimated standard deviation is greatly affected 
  # by the outliers, while the interquartile range is not.
  summaryStats(dat, quartiles = TRUE) 
  #     N   Mean     SD Median    Min     Max 1st Qu. 3rd Qu.
  #dat 20 9.8612 1.1802 9.6978 7.6042 11.8756  9.1618 10.6517
 
  new.dat <- c(dat, 20, 50) 
  sd(dat) 
  #[1] 1.180226
 
  sd(new.dat) 
  #[1] 8.79796
 
  iqr(dat) 
  #[1] 1.489932
 
  iqr(new.dat) 
  #[1] 1.851472
  #----------
  # Clean up
  rm(dat, new.dat)
Nonparametric Test for Monotonic Trend Within Each Season Based on Kendall's Tau Statistic
Description
Perform a nonparametric test for a monotonic trend within each season based on Kendall's tau statistic, and optionally compute a confidence interval for the slope across all seasons.
Usage
kendallSeasonalTrendTest(y, ...)
## S3 method for class 'formula'
kendallSeasonalTrendTest(y, data = NULL, subset,
  na.action = na.pass, ...)
## Default S3 method:
kendallSeasonalTrendTest(y, season, year,
  alternative = "two.sided", correct = TRUE, ci.slope = TRUE, conf.level = 0.95,
  independent.obs = TRUE, data.name = NULL, season.name = NULL, year.name = NULL,
  parent.of.data = NULL, subset.expression = NULL, ...)
## S3 method for class 'data.frame'
kendallSeasonalTrendTest(y, ...)
## S3 method for class 'matrix'
kendallSeasonalTrendTest(y, ...)
Arguments
| y | an object containing data for the trend test.  In the default method,
the argument  | 
| data | specifies an optional data frame, list or environment (or object coercible by
 | 
| subset | specifies an optional vector specifying a subset of observations to be used. | 
| na.action | specifies a function which indicates what should happen when the data contain NA values.  The default is na.pass. | 
| season | numeric or character vector or a factor indicating the seasons in which the observations in
 | 
| year | numeric vector indicating the years in which the observations in  | 
| alternative | character string indicating the kind of alternative hypothesis.  The
possible values are "two.sided" (the default), "less", and "greater". | 
| correct | logical scalar indicating whether to use the correction for continuity in
computing the z-statistic based on the test statistic S'.  The default value is correct=TRUE. | 
| ci.slope | logical scalar indicating whether to compute a confidence interval for the
slope.  The default value is ci.slope=TRUE. | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated
with the confidence interval for the slope.  The default value is
conf.level=0.95. | 
| independent.obs | logical scalar indicating whether to assume the observations in y are serially independent.  The default value is independent.obs=TRUE. | 
| data.name | character string indicating the name of the data used for the trend test.
The default value is  | 
| season.name | character string indicating the name of the data used for the season.
The default value is  | 
| year.name | character string indicating the name of the data used for the year.
The default value is  | 
| parent.of.data | character string indicating the source of the data used for the trend test. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the test for trend. | 
Details
Hirsch et al. (1982) introduced a modification of Kendall's test for trend
(see kendallTrendTest) that allows for seasonality in observations collected over time.
They call this test the seasonal Kendall test.  Their test is appropriate for testing for
trend in each season when the trend is always in the same direction across all seasons.
van Belle and Hughes (1984) introduced a heterogeneity test for trend which is appropriate for testing
for trend in any direction in any season.  Hirsch and Slack (1984) proposed an extension to the seasonal
Kendall test that allows for serial dependence in the observations.  The function
kendallSeasonalTrendTest includes all of these tests, as well as an extension of the
van Belle-Hughes heterogeneity test to the case of serial dependence.
Testing for Trend Assuming Serial Independence 
The Model 
Assume observations are taken over two or more years, and assume a single year
can be divided into two or more seasons.  Let p denote the number of seasons.
Let X and Y denote two continuous random variables with some joint
(bivariate) distribution (which may differ from season to season).  Let N_j
denote the number of bivariate observations taken in the j'th season (over two
or more years) (j = 1, 2, \ldots, p), so that
(X_{1j}, Y_{1j}), (X_{2j}, Y_{2j}), \ldots, (X_{N_jj}, Y_{N_jj})
denote the N_j bivariate observations from this distribution for season
j, assume these bivariate observations are mutually independent, and let
\tau_j = \{ 2 Pr[(X_{2j} - X_{1j})(Y_{2j} - Y_{1j}) > 0] \} - 1 \;\;\;\;\;\; (1)
denote the value of Kendall's tau for that season (see kendallTrendTest).
Also, assume all of the Y observations are independent.
The function kendallSeasonalTrendTest assumes that the X values always
denote the year in which the observation was taken.  Note that within any season,
the X values need not be unique.  That is, there may be more than one
observation within the same year within the same season.  In this case, the
argument y must be a numeric vector, and you must supply the additional
arguments season and year.
If there is only one observation per season per year (missing values allowed), it is
usually easiest to supply the argument y as an n \times p matrix or
data frame, where n denotes the number of years. In this case
N_1 = N_2 = \cdots = N_p = n \;\;\;\;\;\; (2)
and
X_{ij} = i \;\;\;\;\;\; (3)
for i = 1, 2, \ldots, n and j = 1, 2, \ldots, p, so if Y denotes
the n \times p matrix of observed Y's and X denotes the
n \times p matrix of the X's, then
\underline{Y} = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1p} \\ Y_{21} & Y_{22} & \cdots & Y_{2p} \\ \vdots & \vdots &  & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{np} \end{bmatrix} \;\;\;\;\;\; (4)
\underline{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 2 & 2 & \cdots & 2 \\ \vdots & \vdots &  & \vdots \\ n & n & \cdots & n \end{bmatrix} \;\;\;\;\;\; (5)
The null hypothesis is that within each season the X and Y random
variables are independent; that is, within each season there is no trend in the
Y observations over time.  This null hypothesis can be expressed as:
H_0: \tau_1 = \tau_2 = \cdots = \tau_p = 0 \;\;\;\;\;\; (6)
The Seasonal Kendall Test for Trend 
Hirsch et al.'s (1982) extension of Kendall's tau statistic to test the null
hypothesis (6) is based on simply summing together the Kendall S-statistics
for each season and computing the following statistic:
z = \frac{S'}{\sqrt{Var(S')}} \;\;\;\;\;\; (7)
or, using the correction for continuity,
z = \frac{S' - sign(S')}{\sqrt{Var(S')}} \;\;\;\;\;\; (8)
where
S' = \sum_{j=1}^p S_j \;\;\;\;\;\; (9)
S_j = \sum_{i=1}^{N_j-1} \sum_{k=i+1}^{N_j} sign[(X_{kj} - X_{ij})(Y_{kj} - Y_{ij})] \;\;\;\;\;\; (10)
and sign() denotes the sign function:
sign(x) = \begin{cases} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{cases} \;\;\;\;\;\; (11)
Note that the quantity in Equation (10) is simply the Kendall S-statistic for
season j (j = 1, 2, \ldots, p) (see Equation (3) in the help file for
kendallTrendTest).
For each season, if the predictor variables (the X's) are strictly increasing
(e.g., Equation (3) above), then Equation (10) simplifies to
S_j = \sum_{i=1}^{N_j-1} \sum_{k=i+1}^{N_j} sign[(Y_{kj} - Y_{ij})] \;\;\;\;\;\; (12)
Under the null hypothesis (6), the quantity z defined in Equation (7) or (8)
is approximately distributed as a standard normal random variable.
Note that there may be missing values in the observations, so let n_j
denote the number of (X,Y) pairs without missing values for season j.
The statistic S' in Equation (9) has mean and variance given by:
E(S') = \sum_{j = 1}^p E(S_j) \;\;\;\;\;\; (13)
Var(S') = \sum_{j = 1}^p Var(S_j) + \sum_{g=1}^{p-1} \sum_{h=g+1}^{p} 2 Cov(S_g, S_h) \;\;\;\;\;\; (14)
Since all the observations are assumed to be mutually independent,
\sigma_{gh} = Cov(S_g, S_h) = 0, \;\; g \ne h, \;\; g,h = 1, 2, \ldots, p \;\;\;\;\;\; (15)
Furthermore, under the null hypothesis (6),
E(S_j) = 0, \;\; j = 1, 2, \ldots, p \;\;\;\;\;\; (16)
and, in the case of no tied observations,
Var(S_j) = \frac{n_j(n_j-1)(2n_j+5)}{18} \;\;\;\;\;\; (17)
for j=1,2, \ldots, p (see equation (7) in the help file for
kendallTrendTest).
In the case of tied observations,
Var(S_j) = \frac{n_j(n_j-1)(2n_j+5)}{18} - \frac{\sum_{i=1}^{g} t_i(t_i-1)(2t_i+5)}{18} - \frac{\sum_{k=1}^{h} u_k(u_k-1)(2u_k+5)}{18} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)(t_i-2)][\sum_{k=1}^{h} u_k(u_k-1)(u_k-2)]}{9n_j(n_j-1)(n_j-2)} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)][\sum_{k=1}^{h} u_k(u_k-1)]}{2n_j(n_j-1)} \;\;\;\;\;\; (18)
where g is the number of tied groups in the X observations for
season j, t_i is the size of the i'th tied group in the X
observations for season j, h is the number of tied groups in the Y
observations for season j, and u_k is the size of the k'th tied
group in the Y observations for season j
(see Equation (9) in the help file for kendallTrendTest).
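To make these computations concrete, here is a minimal sketch (our own illustration, not the package's code) for the simplest case: one observation per year per season (an n x p matrix y), no ties, and no missing values, using Equations (8), (9), (12), and (17):
  # Seasonal Kendall statistic S', z (with continuity correction), and 
  # two-sided p-value for an n x p matrix of observations (rows = years).
  seasonal.kendall.z <- function(y) {
    S.j <- apply(y, 2, function(v) {
      n <- length(v)
      sum(sapply(1:(n - 1), function(i) sum(sign(v[(i + 1):n] - v[i]))))  # Eq. (12)
    })
    var.S.j <- apply(y, 2, function(v) {
      n <- length(v)
      n * (n - 1) * (2 * n + 5) / 18                                      # Eq. (17)
    })
    S.prime <- sum(S.j)                                                   # Eq. (9)
    z <- (S.prime - sign(S.prime)) / sqrt(sum(var.S.j))                   # Eq. (8)
    c(S = S.prime, z = z, p.two.sided = 2 * pnorm(-abs(z)))
  }
  # e.g., seasonal.kendall.z(matrix(rnorm(48), nrow = 12, ncol = 4))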
Estimating \tau, Slope, and Intercept 
The function kendallSeasonalTrendTest returns estimated values of
Kendall's \tau, the slope, and the intercept for each season, as well as a
single estimate for each of these three quantities combined over all seasons.
The overall estimate of \tau is the weighted average of the p seasonal
\tau's:
\hat{\tau} = \frac{\sum_{j=1}^p n_j \hat{\tau}_j}{\sum_{j=1}^p n_j} \;\;\;\;\;\; (19)
where
\hat{\tau}_j = \frac{2S_j}{n_j(n_j-1)} \;\;\;\;\;\; (20)
(see Equation (2) in the help file for kendallTrendTest).
We can compute the estimated slope for season j as:
\hat{\beta}_{1_j} = Median(\frac{Y_{kj} - Y_{ij}}{X_{kj} - X_{ij}}); \;\; i < k; \;\; X_{kj} \ne X_{ij} \;\;\;\;\;\; (21)
for j = 1, 2, \ldots, p.  The overall estimate of slope, however, is
not the median of these p estimates of slope; instead,
following Hirsch et al. (1982, p.117), the overall estimate of slope is the median
of all two-point slopes computed within each season:
\hat{\beta}_1 = Median(\frac{Y_{kj} - Y_{ij}}{X_{kj} - X_{ij}}); \;\; i < k; \;\; X_{kj} \ne X_{ij}; \;\; j = 1, 2, \ldots, p \;\;\;\;\;\; (22)
(see Equation (15) in the help file for kendallTrendTest).
The overall estimate of intercept is the median of the p seasonal estimates of
intercept:
\hat{\beta}_0 = Median(\hat{\beta}_{0_1}, \hat{\beta}_{0_2}, \ldots, \hat{\beta}_{0_p}) \;\;\;\;\;\; (23)
where
\hat{\beta}_{0_j} = Y_{0.5_j} - \hat{\beta}_{1_j} X_{0.5_j} \;\;\;\;\;\; (24)
and X_{0.5_j} and Y_{0.5_j} denote the sample medians of the X's
and Y's, respectively, for season j
(see Equation (16) in the help file for kendallTrendTest).
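A compact sketch of Equations (21)-(24) for the same simple setup (an n x p matrix y with X_{ij} = i and no missing values) might look like the following; kendallSeasonalTrendTest handles the general case:
  # Seasonal and overall Theil-type slope and intercept estimates.
  seasonal.slopes <- function(y) {
    n <- nrow(y)
    two.point <- function(v) {
      s <- outer(v, v, "-") / outer(1:n, 1:n, "-")   # (v_k - v_i)/(k - i)
      s[lower.tri(s)]                                # keep pairs with i < k
    }
    slopes <- apply(y, 2, two.point)                 # two-point slopes by season
    beta1.j <- apply(slopes, 2, median)              # Eq. (21)
    beta1 <- median(slopes)                          # Eq. (22)
    beta0.j <- apply(y, 2, median) - beta1.j * median(1:n)  # Eq. (24)
    beta0 <- median(beta0.j)                         # Eq. (23)
    list(slope.by.season = beta1.j, slope = beta1,
         intercept.by.season = beta0.j, intercept = beta0)
  }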
Confidence Interval for the Slope 
Gilbert (1987, p.227-228) extends his method of computing a confidence interval for
the slope to the case of seasonal observations.  Let N' denote the number of
defined two-point estimated slopes that are used in Equation (22) above and let
\hat{\beta}_{1_{(1)}}, \hat{\beta}_{1_{(2)}}, \ldots, \hat{\beta}_{1_{(N')}}
denote the N' ordered slopes.  For Gilbert's (1987) method, a
100(1-\alpha)\% two-sided confidence interval for the true over-all
slope across all seasons is given by:
[\hat{\beta}_{1_{(M1)}}, \hat{\beta}_{1_{(M2+1)}}] \;\;\;\;\;\; (25)
where
M1 = \frac{N' - C_{\alpha}}{2} \;\;\;\;\;\; (26)
M2 = \frac{N' + C_{\alpha}}{2} \;\;\;\;\;\; (27)
C_\alpha = z_{1 - \alpha/2} \sqrt{Var(S')} \;\;\;\;\;\; (28)
Var(S') is defined in Equation (14), and
z_p denotes the p'th quantile of the standard normal distribution.
One-sided confidence intervals may be computed in a similar fashion.
Usually the quantities M1 and M2 will not be integers.
Gilbert (1987, p.219) suggests interpolating between adjacent values in this case,
which is what the function kendallSeasonalTrendTest does.
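The sketch below illustrates Equations (25)-(28), given the vector of two-point slopes and a value for Var(S').  The interpolation shown (linear between adjacent ordered slopes) is one reasonable choice and may differ in detail from what kendallSeasonalTrendTest does:
  # Two-sided confidence limits for the overall slope (Equations (25)-(28)).
  slope.ci <- function(two.point.slopes, var.S.prime, conf.level = 0.95) {
    s <- sort(two.point.slopes[is.finite(two.point.slopes)])
    N <- length(s)
    C <- qnorm(1 - (1 - conf.level) / 2) * sqrt(var.S.prime)  # Eq. (28)
    M1 <- (N - C) / 2                                         # Eq. (26)
    M2 <- (N + C) / 2                                         # Eq. (27)
    interp <- function(m) approx(x = 1:N, y = s, xout = m, rule = 2)$y
    c(LCL = interp(M1), UCL = interp(M2 + 1))                 # Eq. (25)
  }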
The Van Belle-Hughes Heterogeneity Test for Trend 
The seasonal Kendall test described above is appropriate for testing the null
hypothesis (6) against the alternative hypothesis of a trend in at least one season.
All of the trends in each season should be in the same direction.
The seasonal Kendall test is not appropriate for testing for trend when there are
trends in a positive direction in one or more seasons and also negative trends in
one or more seasons.  For example, for the following set of observations, the
seasonal Kendall statistic S' is 0 with an associated two-sided p-value of 1,
even though there is clearly a positive trend in season 1 and a negative trend in
season 2.
| Year | Season 1 | Season 2 | 
| 1 | 5 | 8 | 
| 2 | 6 | 7 | 
| 3 | 7 | 6 | 
| 4 | 8 | 5 | 
Van Belle and Hughes (1984) suggest using the following statistic to test for heterogeneity in trend prior to applying the seasonal Kendall test:
\chi_{het}^2 = \sum_{j=1}^p Z_j^2 - p \bar{Z}^2 \;\;\;\;\;\; (29)
where
Z_j = \frac{S_j}{\sqrt{Var(S_j)}} \;\;\;\;\;\; (30)
\bar{Z} = \frac{1}{p} \sum_{j=1}^p Z_j \;\;\;\;\;\; (31)
Under the null hypothesis (6), the statistic defined in Equation (29) is
approximately distributed as a chi-square random variable with
p-1 degrees of freedom.  Note that the continuity correction is not used to
compute the Z_j's defined in Equation (30) since using it results in an
unacceptably conservative test (van Belle and Hughes, 1984, p.132).  Van Belle and
Hughes (1984) actually call the statistic in (29) a homogeneous chi-square statistic.
Here it is called a heterogeneous chi-square statistic after the alternative
hypothesis it is meant to test.
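As a concrete illustration, the statistic in Equations (29)-(31) can be computed from the per-season S statistics and their variances (the seasonal.S and var.seasonal.S components returned by kendallSeasonalTrendTest; see the Value section below). This is a minimal sketch, not the package code, and the function name is hypothetical.
  # Minimal sketch of Equations (29)-(31); not the EnvStats implementation.
  vanBelleHughesHet <- function(seasonal.S, var.seasonal.S) {
    Z <- seasonal.S / sqrt(var.seasonal.S)     # Equation (30), no continuity correction
    p <- length(Z)
    chi2 <- sum(Z^2) - p * mean(Z)^2           # Equation (29)
    c(chi.square = chi2, df = p - 1, p.value = 1 - pchisq(chi2, df = p - 1))
  }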
Van Belle and Hughes (1984) imply that the heterogeneity statistic defined in Equation (29) may be used to test the null hypothesis:
H_0: \tau_1 = \tau_2 = \cdots = \tau_p = \tau \;\;\;\;\;\; (32)
where \tau is some arbitrary number between -1 and 1.  For this case, however,
the distribution of the test statistic in Equation (29) is unknown since it depends
on the unknown value of \tau (Equations (16)-(18) above assume
\tau = 0 and are not correct if \tau \ne 0). The heterogeneity
chi-square statistic of Equation (29) may be assumed to be approximately
distributed as chi-square with p-1 degrees of freedom under the null
hypothesis (32), but further study is needed to determine how well this
approximation works.
Testing for Trend Assuming Serial Dependence 
The Model 
Assume the same model as for the case of serial independence, except now the
observed Y's are not assumed to be independent of one another, but are
allowed to be serially correlated over time.  Furthermore, assume one observation
per season per year (Equations (2)-(5) above).
The Seasonal Kendall Test for Trend Modified for Serial Dependence 
Hirsch and Slack (1984) introduced a modification of the seasonal Kendall test that
is robust against serial dependence (in terms of Type I error) except when the
observations have a very strong long-term persistence (very large autocorrelation) or
when the sample sizes are small (e.g., 5 years of monthly data).  This modification
is based on a multivariate test introduced by Dietz and Killeen (1981).
In the case of serial dependence, Equation (15) is no longer true, so an estimate of
the correct value of \sigma_{gh} must be used to compute Var(S') in
Equation (14).  Let R denote the n \times p matrix of ranks for the
Y observations (Equation (4) above), where the Y's are ranked within
season:
\underline{R} = \left[ \begin{array}{cccc} R_{11} & R_{12} & \cdots & R_{1p} \\ R_{21} & R_{22} & \cdots & R_{2p} \\ \vdots & \vdots & & \vdots \\ R_{n1} & R_{n2} & \cdots & R_{np} \end{array} \right] \;\;\;\;\;\; (33)
where
R_{ij} = \frac{1}{2} [n_j + 1 + \sum_{k=1}^{n_j} sign(Y_{ij} - Y_{kj})] \;\;\;\;\;\; (34)
the sign function is defined in Equation (11) above, and as before n_j denotes
the number of (X,Y) pairs without missing values for season j.  Note that
by this definition, missing values are assigned the mid-rank of the non-missing
values.
Hirsch and Slack (1984) suggest using the following formula, given by Dietz and Killeen (1981), in the case where there are no missing values:
\hat{\sigma}_{gh} = \frac{1}{3} [K_{gh} + 4 \sum_{i=1}^n R_{ig}R_{ih} - n(n+1)^2] \;\;\;\;\;\; (35)
where
K_{gh} = \sum_{i=1}^{n-1} \sum_{j=i+1}^n sign[(Y_{jg} - Y_{ig})(Y_{jh} - Y_{ih})] \;\;\;\;\;\; (36)
Note that the quantity defined in Equation (36) is Kendall's tau for season g
vs. season h.
For the case of missing values, Hirsch and Slack (1984) derive the following modification of Equation (35):
\hat{\sigma}_{gh} = \frac{1}{3} [K_{gh} + 4 \sum_{i=1}^n R_{ig}R_{ih} - n(n_g + 1)(n_h + 1)] \;\;\;\;\;\; (37)
Technically, the estimates in Equations (35) and (37) are not correct estimators of
covariance, and Equations (17) and (18) are not correct estimators of variance,
because the model Dietz and Killeen (1981) use assumes that observations within the
rows of Y (Equation (4) above) may be correlated, but observations between
rows are independent.  Serial dependence induces correlation between all of the
Y's.  In most cases, however, the serial dependence shows an exponential decay
in correlation across time and so these estimates work fairly well (see more
discussion in the BACKGROUND section below).
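For the no-missing-value case, a minimal sketch of Equations (34)-(36) is shown below, assuming Y is the n (years) by p (seasons) matrix of observations; the names are hypothetical and the code is illustrative rather than the package implementation.
  # Minimal sketch of Equations (34)-(36) for complete data; not the EnvStats code.
  sigma.gh.hat <- function(Y, g, h) {
    n <- nrow(Y)
    R <- apply(Y, 2, rank)     # within-season ranks; with ties, rank() gives mid-ranks
    K.gh <- 0                  # Kendall's statistic for season g vs. season h (Equation (36))
    for (i in 1:(n - 1)) for (j in (i + 1):n) {
      K.gh <- K.gh + sign((Y[j, g] - Y[i, g]) * (Y[j, h] - Y[i, h]))
    }
    (K.gh + 4 * sum(R[, g] * R[, h]) - n * (n + 1)^2) / 3            # Equation (35)
  }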
Estimates and Confidence Intervals 
The seasonal and over-all estimates of \tau, slope, and intercept are computed
using the same methods as in the case of serial independence.  Also, the method for
computing the confidence interval for the slope is the same as in the case of serial
independence.  Note that the serial dependence is accounted for in the term
Var(S') in Equation (28).
The Van Belle-Hughes Heterogeneity Test for Trend Modified for Serial Dependence 
Like its counterpart in the case of serial independence, the seasonal Kendall test
modified for serial dependence described above is appropriate for testing the null
hypothesis (6) against the alternative hypothesis of a trend in at least one season.
All of the trends in each season should be in the same direction.
The modified seasonal Kendall test is not appropriate for testing for trend when there are trends in a positive direction in one or more seasons and also negative trends in one or more seasons. This section describes a modification of the van Belle-Hughes heterogeneity test for trend in the presence of serial dependence.
Let \underline{S} denote the p \times 1 vector of Kendall S-statistics for
each season:
\underline{S} = \left[ \begin{array}{c} S_1 \\ S_2 \\ \vdots \\ S_p \end{array} \right] \;\;\;\;\;\; (38)
The distribution of \underline{S} is approximately multivariate normal with
E(\underline{S}) = \underline{\mu} = \left[ \begin{array}{c} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{array} \right] \;\;\;\;\;\; (39)
Var(\underline{S}) = \Sigma = \left[ \begin{array}{cccc} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{array} \right] \;\;\;\;\;\; (40)
where
\mu_j = \frac{n_j(n_j - 1)}{2} \tau_j, \;\; j = 1, 2, \ldots, p \;\;\;\;\;\; (41)
Define the p \times p matrix \underline{m} as
\underline{m} = \left[ \begin{array}{cccc} \frac{2}{n_1(n_1 - 1)} & 0 & \cdots & 0 \\ 0 & \frac{2}{n_2(n_2 - 1)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{2}{n_p(n_p - 1)} \end{array} \right] \;\;\;\;\;\; (42)
Then the vector of the seasonal estimates of \tau can be written as:
\underline{\hat{\tau}} = \left[ \begin{array}{c} \hat{\tau}_1 \\ \hat{\tau}_2 \\ \vdots \\ \hat{\tau}_p \end{array} \right] = \left[ \begin{array}{c} 2S_1/[n_1(n_1-1)] \\ 2S_2/[n_2(n_2-1)] \\ \vdots \\ 2S_p/[n_p(n_p-1)] \end{array} \right] = \underline{m} \; \underline{S} \;\;\;\;\;\; (43)
so the distribution of the vector in Equation (43) is approximately multivariate normal with
E(\underline{\hat{\tau}}) = E(\underline{m} \; \underline{S}) = \underline{m} \; \underline{\mu} = \underline{\tau} = \left[ \begin{array}{c} \tau_1 \\ \tau_2 \\ \vdots \\ \tau_p \end{array} \right] \;\;\;\;\;\; (44)
Var(\underline{\hat{\tau}}) = Var(\underline{m} \; \underline{S}) = \underline{m} \Sigma \underline{m}^T = \Sigma_{\hat{\tau}} \;\;\;\;\;\; (45)
where ^T denotes the transpose operator.
Let \underline{C} denote the (p-1) \times p contrast matrix
\underline{C} = [\; \underline{1} \;\; | \; -I_{p-1} \;] \;\;\;\;\;\; (46)
where \underline{1} denotes a (p-1) \times 1 column vector of 1's and I_{p-1} denotes the
(p-1) \times (p-1) identity matrix.  That is,
\underline{C} = \left[ \begin{array}{ccccc} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{array} \right]
Then the null hypothesis (32) is equivalent to the null hypothesis:
H_0: \underline{C} \underline{\tau} = 0 \;\;\;\;\;\; (47)
Based on theory for samples from a multivariate normal distribution (Johnson and Wichern, 2007), under the null hypothesis (47) the quantity
\chi_{het}^2 = (\underline{C} \; \underline{\hat{\tau}})^T (\underline{C} \hat{\Sigma}_{\hat{\tau}} \underline{C}^T)^{-1} (\underline{C} \; \underline{\hat{\tau}}) \;\;\;\;\;\; (48)
has approximately a chi-square distribution with p-1 degrees of freedom for
“large” values of seasonal sample sizes, where
\hat{\Sigma}_{\hat{\tau}} = \underline{m} \hat{\Sigma} \underline{m}^T \;\;\;\;\;\; (49)
The estimate of \Sigma in Equation (49) can be computed using the same formulas
that are used for the modified seasonal Kendall test (i.e., Equation (35) or (37)
for the off-diagonal elements and Equation (17) or (18) for the diagonal elements).
As previously noted, the formulas for the variances are actually only valid if
\tau = 0 and there is no correlation between the rows of Y.  The same is
true of the formulas for the covariances.  More work is needed to determine the
goodness of the chi-square approximation for the test statistic in (48).  The
pseudo-heterogeneity test statistic of Equation (48), however, should provide some
guidance as to whether the null hypothesis (32) (or equivalently (47)) appears to be
true.
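The following minimal sketch strings Equations (42)-(49) together, assuming S is the p-vector of seasonal Kendall S statistics, Sigma.hat its estimated p by p covariance matrix (Equation (35) or (37) for the off-diagonal elements, Equation (17) or (18) for the diagonal elements), and n.vec the vector of seasonal sample sizes; all names are hypothetical and the code is not the package implementation.
  # Minimal sketch of Equations (42)-(49); not the EnvStats implementation.
  modifiedHetTest <- function(S, Sigma.hat, n.vec) {
    p <- length(S)
    m <- diag(2 / (n.vec * (n.vec - 1)))             # Equation (42)
    tau.hat <- as.vector(m %*% S)                    # Equation (43)
    Sigma.tau <- m %*% Sigma.hat %*% t(m)            # Equation (49)
    C <- cbind(1, -diag(p - 1))                      # contrast matrix, Equation (46)
    d <- C %*% tau.hat
    chi2 <- drop(t(d) %*% solve(C %*% Sigma.tau %*% t(C)) %*% d)   # Equation (48)
    c(chi.square = chi2, df = p - 1, p.value = 1 - pchisq(chi2, df = p - 1))
  }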
Value
A list of class "htestEnvStats" containing the results of the hypothesis
test.  See the help file for htestEnvStats.object for details.
In addition, the following components are part of the list returned by 
kendallSeasonalTrendTest:
| seasonal.S | numeric vector. The value of the Kendall S-statistic for each season. | 
| var.seasonal.S | numeric vector.  The variance of the Kendall S-statistic for each season.
This component only appears when  | 
| var.cov.seasonal.S | numeric matrix.  The estimated variance-covariance matrix of the Kendall
S-statistics for each season. This component only appears when  | 
| seasonal.estimates | numeric matrix. The estimated Kendall's tau, slope, and intercept for each season. | 
Note
Kendall's test for independence or trend is a nonparametric test.  No assumptions are made about the
distribution of the X and Y variables.  Hirsch et al. (1982) introduced the seasonal
Kendall test to test for trend within each season.  They note that Kendall's test for trend is easy to
compute, even in the presence of missing values, and can also be used with censored values.
van Belle and Hughes (1984) note that the seasonal Kendall test introduced by Hirsch et al. (1982) is similar to a multivariate extension of the sign test proposed by Jonckheere (1954). Jonckheere's test statistic is based on the unweighted sum of the seasonal tau statistics, while Hirsch et al.'s test is based on the weighted sum (weighted by number of observations within a season) of the seasonal tau statistics.
van Belle and Hughes (1984) also note that Kendall's test for trend is slightly less powerful than the test based on Spearman's rho, but it converges to normality faster. Also, Bradley (1968, p.288) shows that for the case of a linear model with normal (Gaussian) errors, the asymptotic relative efficiency of Kendall's test for trend versus the parametric test for a zero slope is 0.98.
Based on the work of Dietz and Killeen (1981), Hirsch and Slack (1984) describe a modified version of the
seasonal Kendall test that allows for serial dependence in the observations.  They performed a Monte Carlo
study to determine the empirical significance level and power of this modified test vs. the test that
assumes independent observations and found a trade-off between power and the correct significance level.
For p = 12 seasons, they found the modified test gave correct significance levels for n \geq 10
as long as the lag-one autocorrelation was 0.6 or less, while the original test that assumes independent
observations yielded highly inflated significance levels.  On the other hand, if in fact the observations
are serially independent, the original test is more powerful than the modified test.
Hirsch and Slack (1984) also looked at the performance of the test for trend introduced by
Dietz and Killeen (1981), which is a weighted sum of squares of the seasonal Kendall S-statistics,
where the matrix of weights is the inverse of the covariance matrix.  The Dietz-Killeen test statistic,
unlike the one proposed by Hirsch and Slack (1984), tests for trend in either direction in any season,
and is asymptotically distributed as a chi-square random variable with p (number of seasons)
degrees of freedom.  Hirsch and Slack (1984), however, found that the test based on this statistic is
quite conservative (i.e., the significance level is much smaller than the assumed significance level)
and has poor power even for moderate sample sizes.  The chi-square approximation becomes reasonably
close only when n > 40 if p = 12, n > 30 if p = 4, and n > 20 if
p = 2.
Lettenmaier (1988) notes the poor power of the test proposed by Dietz and Killeen (1981) and states the poor power apparently results from an upward bias in the estimated variance of the statistic, which can be traced to the inversion of the estimated covariance matrix. He suggests an alternative test statistic (to test trend in either direction in any season) that is the sum of the squares of the scaled seasonal Kendall S-statistics (scaled by their standard deviations). Note that this test statistic ignores information about the covariance between the seasonal Kendall S-statistics, although its distribution depends on these covariances. In the case of no serial dependence, Lettenmaier's test statistic is exactly the same as the Dietz-Killeen test statistic. In the case of serial dependence, Lettenmaier (1988) notes his test statistic is a quadratic form of a multivariate normal random variable and therefore all the moments of this random variable are easily computed. Lettenmaier (1988) approximates the distribution of his test statistic as a scaled non-central chi-square distribution (with fractional degrees of freedom). Based on extensive Monte Carlo studies, Lettenmaier (1988) shows that for the case when the trend is the same in all seasons, the seasonal Kendall's test of Hirsch and Slack (1984) is superior to his test and far superior to the Dietz-Killeen test. The power of Lettenmaier's test approached that of the seasonal Kendall test for large trend magnitudes.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bradley, J.V. (1968). Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs, NJ.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, pp.256-272.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 16.
Helsel, D.R. and R.M. Hirsch. (1988). Discussion of Applicability of the t-test for Detecting Trends in Water Quality Variables. Water Resources Bulletin 24(1), 201-204.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, NY.
Helsel, D.R., and R. M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. Available on-line at https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf.
Hirsch, R.M., J.R. Slack, and R.A. Smith. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.
Hirsch, R.M. and J.R. Slack. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.
Hirsch, R.M., R.B. Alexander, and R.A. Smith. (1991). Selection of Methods for the Detection and Estimation of Trends in Water Quality. Water Resources Research 27(5), 803-813.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ.
Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika 30, 81-93.
Kendall, M.G. (1975). Rank Correlation Methods. Charles Griffin, London.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Sen, P.K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association 63, 1379-1389.
Theil, H. (1950). A Rank-Invariant Method of Linear and Polynomial Regression Analysis, I-III. Proc. Kon. Ned. Akad. v. Wetensch. A. 53, 386-392, 521-525, 1397-1412.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., and J.P. Hughes. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.
See Also
kendallTrendTest, htestEnvStats.object, cor.test.
Examples
  # Reproduce Example 14-10 on page 14-38 of USEPA (2009).  This example
  # tests for trend in analyte concentrations (ppm) collected monthly
  # between 1983 and 1985.
  head(EPA.09.Ex.14.8.df)
  #     Month Year Unadj.Conc Adj.Conc
  #1  January 1983       1.99     2.11
  #2 February 1983       2.10     2.14
  #3    March 1983       2.12     2.10
  #4    April 1983       2.12     2.13
  #5      May 1983       2.11     2.12
  #6     June 1983       2.15     2.12
  tail(EPA.09.Ex.14.8.df)
  #       Month Year Unadj.Conc Adj.Conc
  #31      July 1985       2.31     2.23
  #32    August 1985       2.32     2.24
  #33 September 1985       2.28     2.23
  #34   October 1985       2.22     2.24
  #35  November 1985       2.19     2.25
  #36  December 1985       2.22     2.23
  # Plot the data
  #--------------
  Unadj.Conc <- EPA.09.Ex.14.8.df$Unadj.Conc
  Adj.Conc   <- EPA.09.Ex.14.8.df$Adj.Conc
  Month      <- EPA.09.Ex.14.8.df$Month
  Year       <- EPA.09.Ex.14.8.df$Year
  Time       <- paste(substring(Month, 1, 3), Year - 1900, sep = "-")
  n          <- length(Unadj.Conc)
  Three.Yr.Mean <- mean(Unadj.Conc)
  dev.new()
  par(mar = c(7, 4, 3, 1) + 0.1, cex.lab = 1.25)
  plot(1:n, Unadj.Conc, type = "n", xaxt = "n",
	xlab = "Time (Month)",
	ylab = "ANALYTE CONCENTRATION (mg/L)",
	main = "Figure 14-15. Seasonal Time Series Over a Three Year Period",
	cex.main = 1.1)
  axis(1, at = 1:n, labels = rep("", n))
  at <- rep(c(1, 5, 9), 3) + rep(c(0, 12, 24), each = 3)
  axis(1, at = at, labels = Time[at])
  points(1:n, Unadj.Conc, pch = 0, type = "o", lwd = 2)
  points(1:n, Adj.Conc, pch = 3, type = "o", col = 8, lwd = 2)
  abline(h = Three.Yr.Mean, lwd = 2)
  legend("topleft", c("Unadjusted", "Adjusted", "3-Year Mean"), bty = "n",
    pch = c(0, 3, -1), lty = c(1, 1, 1), lwd = 2, col = c(1, 8, 1),
    inset = c(0.05, 0.01))
  # Perform the seasonal Kendall trend test
  #----------------------------------------
  kendallSeasonalTrendTest(Unadj.Conc ~ Month + Year,
    data = EPA.09.Ex.14.8.df)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 All 12 values of tau = 0
  #
  #Alternative Hypothesis:          The seasonal taus are not all equal
  #                                 (Chi-Square Heterogeneity Test)
  #                                 At least one seasonal tau != 0
  #                                 and all non-zero tau's have the
  #                                 same sign (z Trend Test)
  #
  #Test Name:                       Seasonal Kendall Test for Trend
  #                                 (with continuity correction)
  #
  #Estimated Parameter(s):          tau       =    0.9722222
  #                                 slope     =    0.0600000
  #                                 intercept = -131.7350000
  #
  #Estimation Method:               tau:        Weighted Average of
  #                                             Seasonal Estimates
  #                                 slope:      Hirsch et al.'s
  #                                             Modification of
  #                                             Thiel/Sen Estimator
  #                                 intercept:  Median of
  #                                             Seasonal Estimates
  #
  #Data:                            y      = Unadj.Conc
  #                                 season = Month
  #                                 year   = Year
  #
  #Data Source:                     EPA.09.Ex.14.8.df
  #
  #Sample Sizes:                    January   =  3
  #                                 February  =  3
  #                                 March     =  3
  #                                 April     =  3
  #                                 May       =  3
  #                                 June      =  3
  #                                 July      =  3
  #                                 August    =  3
  #                                 September =  3
  #                                 October   =  3
  #                                 November  =  3
  #                                 December  =  3
  #                                 Total     = 36
  #
  #Test Statistics:                 Chi-Square (Het) = 0.1071882
  #                                 z (Trend)        = 5.1849514
  #
  #Test Statistic Parameter:        df = 11
  #
  #P-values:                        Chi-Square (Het) = 1.000000e+00
  #                                 z (Trend)        = 2.160712e-07
  #
  #Confidence Interval for:         slope
  #
  #Confidence Interval Method:      Gilbert's Modification of
  #                                 Theil/Sen Method
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.05786914
  #                                 UCL = 0.07213086
  #==========
  # Clean up
  #---------
  rm(Unadj.Conc, Adj.Conc, Month, Year, Time, n, Three.Yr.Mean, at)
  graphics.off()
Kendall's Nonparametric Test for Monotonic Trend
Description
Perform a nonparametric test for a monotonic trend based on Kendall's tau statistic, and optionally compute a confidence interval for the slope.
Usage
kendallTrendTest(y, ...)
## S3 method for class 'formula'
kendallTrendTest(y, data = NULL, subset,
  na.action = na.pass, ...)
## Default S3 method:
kendallTrendTest(y, x = seq(along = y),
  alternative = "two.sided", correct = TRUE, ci.slope = TRUE,
  conf.level = 0.95, warn = TRUE, data.name = NULL, data.name.x = NULL,
  parent.of.data = NULL, subset.expression = NULL, ...)
Arguments
| y | an object containing data for the trend test.  In the default method,
the argument  | 
| data | specifies an optional data frame, list or environment (or object coercible by
 | 
| subset | specifies an optional vector specifying a subset of observations to be used. | 
| na.action | specifies a function which indicates what should happen when the data contain  | 
| x | numeric vector of "predictor" values.  The length of  | 
| alternative | character string indicating the kind of alternative hypothesis.  The
possible values are  | 
| correct | logical scalar indicating whether to use the correction for continuity in
computing the  | 
| ci.slope | logical scalar indicating whether to compute a confidence interval for the
slope.  The default value is  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated
with the confidence interval for the slope.  The default value is
 | 
| warn | logical scalar indicating whether to print a warning message when
 | 
| data.name | character string indicating the name of the data used for the trend test.
The default value is  | 
| data.name.x | character string indicating the name of the data used for the predictor variable x.
If  | 
| parent.of.data | character string indicating the source of the data used for the trend test. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the test for trend. | 
Details
kendallTrendTest performs Kendall's nonparametric test for a monotonic trend,
which is a special case of the test for independence based on Kendall's tau statistic
(see cor.test).  The slope is estimated using the method of Theil (1950) and
Sen (1968).  When ci.slope=TRUE, the confidence interval for the slope is
computed using Gilbert's (1987) Modification of the Theil/Sen Method.
Kendall's test for a monotonic trend is a special case of the test for independence
based on Kendall's tau statistic.  The first section below explains the general case
of testing for independence.  The second section explains the special case of
testing for monotonic trend.  The last section explains how a simple linear
regression model is a special case of a monotonic trend and how the slope may be
estimated.
The General Case of Testing for Independence 
Definition of Kendall's Tau 
Let X and Y denote two continuous random variables with some joint
(bivariate) distribution.  Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)
denote a set of n bivariate observations from this distribution, and assume
these bivariate observations are mutually independent.  Kendall (1938, 1975) proposed
a test for the hypothesis that the X and Y random variables are
independent based on estimating the following quantity:
\tau = \{ 2 Pr[(X_2 - X_1)(Y_2 - Y_1) > 0] \} - 1 \;\;\;\;\;\; (1)
The quantity in Equation (1) is called Kendall's tau, although this term is more
often applied to the estimate of \tau (see Equation (2) below).
If X and Y are independent, then \tau=0.  Furthermore, for most
distributions of interest, if \tau=0 then the random variables X and
Y are independent.  (It can be shown that there exist some distributions for
which \tau=0 and the random variables X and Y are not independent;
see Hollander and Wolfe (1999, p.364)).
Note that Kendall's tau is similar to a correlation coefficient in that
-1 \le \tau \le 1.  If X and Y always vary in the same direction,
that is if X_1 < X_2 always implies Y_1 < Y_2, then \tau = 1.
If X and Y always vary in the opposite direction, that is if
X_1 < X_2 always implies Y_1 > Y_2, then \tau = -1.  If
\tau > 0, this indicates X and Y are positively associated.
If \tau < 0, this indicates X and Y are negatively associated.
Estimating Kendall's Tau 
The quantity in Equation (1) can be estimated by:
\hat{\tau} = \frac{2S}{n(n-1)} \;\;\;\;\;\; (2)
where
S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sign[(X_j - X_i)(Y_j - Y_i)] \;\;\;\;\;\; (3)
and sign() denotes the sign function:
sign(x) = \left\{ \begin{array}{rl} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{array} \right. \;\;\;\;\;\; (4)
(Hollander and Wolfe, 1999, Chapter 8; Conover, 1980, pp.256–260; Gilbert, 1987, Chapter 16; Helsel and Hirsch, 1992, pp.212–216; Gibbons et al., 2009, Chapter 11). The quantity defined in Equation (2) is called Kendall's rank correlation coefficient or more often Kendall's tau.
Note that the quantity S defined in Equation (3) is equal to the number of
concordant pairs minus the number of discordant pairs.  Hollander and Wolfe
(1999, p.364) use the notation K instead of S, and Conover (1980, p.257)
uses the notation T.
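A minimal sketch of Equations (2) and (3) is shown below, using brute-force pairwise differences on hypothetical data and comparing with cor.test; when there are no ties the two estimates of tau agree.
  # Minimal sketch of Equations (2)-(3) on hypothetical data.
  set.seed(47)
  x <- 1:15
  y <- 0.2 * x + rnorm(15)
  dx <- outer(x, x, "-")
  dy <- outer(y, y, "-")
  keep <- lower.tri(dx)                         # all pairs with i < j
  S <- sum(sign(dx[keep] * dy[keep]))           # Equation (3)
  n <- length(x)
  2 * S / (n * (n - 1))                         # Equation (2), Kendall's tau
  cor.test(x, y, method = "kendall")$estimate   # agrees when there are no ties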
Testing the Null Hypothesis of Independence 
The null hypothesis H_0: \tau = 0, can be tested using the statistic S
defined in Equation (3) above.  Tables of the distribution of S for small
samples are given in Hollander and Wolfe (1999), Conover (1980, pp.458–459),
Gilbert (1987, p.272), Helsel and Hirsch (1992, p.469), and Gibbons et al. (2009, p.210).
The function kendallTrendTest uses the large sample approximation to the
distribution of S under the null hypothesis, which is given by:
z = \frac{S - E(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (5)
where
E(S) = 0 \;\;\;\;\;\; (6)
Var(S) = \frac{n(n-1)(2n+5)}{18} \;\;\;\;\;\; (7)
Under the null hypothesis, the quantity z defined in Equation (5) is
approximately distributed as a standard normal random variable.
Both Kendall (1975) and Mann (1945) show that the normal approximation is excellent
even for samples as small as n=10, provided that the following continuity
correction is used:
z = \frac{S - sign(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (8)
The function kendallTrendTest performs the usual one-sample z-test using
the statistic computed in Equation (8) or Equation (5).  The argument
correct determines which equation is used to compute the z-statistic.
By default, correct=TRUE so Equation (8) is used.
In the case of tied observations in either the observed X's and/or observed
Y's, the formula for the variance of S given in Equation (7) must be
modified as follows:
Var(S) = \frac{n(n-1)(2n+5)}{18} - \frac{\sum_{i=1}^{g} t_i(t_i-1)(2t_i+5)}{18} - \frac{\sum_{j=1}^{h} u_j(u_j-1)(2u_j+5)}{18} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)(t_i-2)][\sum_{j=1}^{h} u_j(u_j-1)(u_j-2)]}{9n(n-1)(n-2)} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)][\sum_{j=1}^{h} u_j(u_j-1)]}{2n(n-1)} \;\;\;\;\;\; (9)
where g is the number of tied groups in the X observations,
t_i is the size of the i'th tied group in the X observations,
h is the number of tied groups in the Y observations, and
u_j is the size of the j'th tied group in the Y observations.
In the case of no ties in either the X or Y observations, Equation (9)
reduces to Equation (7).
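A minimal sketch of the tie-corrected variance in Equation (9) and the z-statistic of Equations (5) and (8) follows; x and y are hypothetical data vectors, and the function names are not part of the package.
  # Minimal sketch of Equations (5), (8), and (9); not the EnvStats implementation.
  kendall.var.S <- function(x, y) {
    n <- length(x)
    t.i <- as.numeric(table(x))   # tied group sizes in x (groups of size 1 contribute 0)
    u.j <- as.numeric(table(y))   # tied group sizes in y
    n * (n - 1) * (2 * n + 5) / 18 -
      sum(t.i * (t.i - 1) * (2 * t.i + 5)) / 18 -
      sum(u.j * (u.j - 1) * (2 * u.j + 5)) / 18 +
      sum(t.i * (t.i - 1) * (t.i - 2)) * sum(u.j * (u.j - 1) * (u.j - 2)) /
        (9 * n * (n - 1) * (n - 2)) +
      sum(t.i * (t.i - 1)) * sum(u.j * (u.j - 1)) / (2 * n * (n - 1))
  }
  kendall.z <- function(S, var.S, correct = TRUE) {
    if (correct) (S - sign(S)) / sqrt(var.S) else S / sqrt(var.S)   # Equations (8) and (5)
  }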
The Special Case of Testing for Monotonic Trend 
Often in environmental sampling, observations are taken periodically over time
(Hirsch et al., 1982; van Belle and Hughes, 1984; Hirsch and Slack, 1984).  In
this case, the random variables Y_1, Y_2, \ldots, Y_n can be thought of as
representing the observations, and the variables X_1, X_2, \ldots, X_n
are no longer random but represent the time at which the i'th observation
was taken.  If the observations are equally spaced over time, then it is useful to
make the simplification X_i = i for i = 1, 2, \ldots, n.  This is in
fact the default value of the argument x for the function
kendallTrendTest.
In the case where the X's represent time and are all distinct, the test for
independence between X and Y is equivalent to testing for a monotonic
trend in Y, and the test statistic S simplifies to:
S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sign(Y_j - Y_i) \;\;\;\;\;\; (10)
Also, the formula for the variance of S in the presence of ties (under the
null hypothesis H_0: \tau = 0) simplifies to:
Var(S) = \frac{n(n-1)(2n+5)}{18} - \frac{\sum_{j=1}^{h} u_j(u_j-1)(2u_j+5)}{18} \;\;\;\;\;\; (11)
A form of the test statistic S in Equation (10) was introduced by Mann (1945).
The Special Case of a Simple Linear Model:  Estimating the Slope 
Consider the simple linear regression model
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \;\;\;\;\;\; (12)
where \beta_0 denotes the intercept, \beta_1 denotes the slope,
i = 1, 2, \ldots, n, and the \epsilon's are assumed to be
independent and identically distributed random variables from the same distribution.
This is a special case of dependence between the X's and the Y's, and
the null hypothesis of a zero slope can be tested using Kendall's test statistic
S (Equation (3) or (10) above) and the associated z-statistic
(Equation (5) or (8) above) (Hollander and Wolfe, 1999, pp.415–420).
Theil (1950) proposed the following nonparametric estimator of the slope:
\hat{\beta}_1 = Median(\frac{Y_j - Y_i}{X_j - X_i}); \;\; i < j \;\;\;\;\;\; (13)
Note that the computation of the estimated slope involves computing
N = {n \choose 2} = \frac{n(n-1)}{2} \;\;\;\;\;\; (14)
“two-point” estimated slopes (assuming no tied X values), and taking
the median of these N values.
Sen (1968) generalized this estimator to the case where there are possibly tied
observations in the X's.  In this case, Sen simply ignores the two-point
estimated slopes where the X's are tied and computes the median based on the
remaining N' two-point estimated slopes.  That is, Sen's estimator is given by:
\hat{\beta}_1 = Median(\frac{Y_j - Y_i}{X_j - X_i}); \;\; i < j, X_i \ne X_j  \;\;\;\;\;\; (15)
(Hollander and Wolfe, 1999, pp.421–422).
Conover (1980, p. 267) suggests the following estimator for the intercept:
\hat{\beta}_0 = Y_{0.5} - \hat{\beta}_1 X_{0.5} \;\;\;\;\;\; (16)
where X_{0.5} and Y_{0.5} denote the sample medians of the X's
and Y's, respectively.  With these estimators of slope and intercept, the
estimated regression line passes through the point (X_{0.5}, Y_{0.5}).
NOTE: The function kendallTrendTest always returns estimates of
slope and intercept assuming a linear model (Equation (12)), while the p-value
is based on Kendall's tau, which is testing for the broader alternative of any
kind of dependence between the X's and Y's.
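A minimal sketch of Equations (13)-(16) is given below (brute force, ignoring efficiency); x and y are hypothetical numeric vectors of equal length and the function name is not part of the package.
  # Minimal sketch of Equations (13)-(16); not the EnvStats implementation.
  theil.sen <- function(x, y) {
    dx <- outer(x, x, "-")
    dy <- outer(y, y, "-")
    keep <- lower.tri(dx) & dx != 0     # pairs with i < j and untied X values (Equation (15))
    b1 <- median(dy[keep] / dx[keep])   # Theil/Sen slope
    b0 <- median(y) - b1 * median(x)    # Conover's intercept (Equation (16))
    c(intercept = b0, slope = b1)
  }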
Confidence Interval for the Slope 
Theil (1950) and Sen (1968) proposed methods to compute a confidence interval for
the true slope, assuming the linear model of Equation (12) (see
Hollander and Wolfe, 1999, pp.421-422).  Gilbert (1987, p.218) illustrates a
simpler method than the one given by Sen (1968) that is based on a normal
approximation.  Gilbert's (1987) method is an extension of the one given in
Hollander and Wolfe (1999, p.424) that allows for ties and/or multiple
observations per time period.  This method is valid for a sample size as small as
n=10 unless there are several tied observations.
Let N' denote the number of defined two-point estimated slopes that are used
in Equation (15) above (if there are no tied X values then N' = N), and
let \hat{\beta}_{1_{(1)}}, \hat{\beta}_{1_{(2)}}, \ldots, \hat{\beta}_{1_{(N')}}
denote the N' ordered slopes.  For Gilbert's (1987) method, a
100(1-\alpha)\% two-sided confidence interval for the true slope is given by:
[\hat{\beta}_{1_{(M1)}}, \hat{\beta}_{1_{(M2+1)}}] \;\;\;\;\;\; (17)
where
M1 = \frac{N' - C_{\alpha}}{2} \;\;\;\;\;\; (18)
M2 = \frac{N' + C_{\alpha}}{2} \;\;\;\;\;\; (19)
C_\alpha = z_{1 - \alpha/2} \sqrt{Var(S)} \;\;\;\;\;\; (20)
Var(S) is defined in Equations (7), (9), or (11), and
z_p denotes the p'th quantile of the standard normal distribution.
One-sided confidence intervals may be computed in a similar fashion.
Usually the quantities M1 and M2 will not be integers.
Gilbert (1987, p.219) suggests interpolating between adjacent values in this case,
which is what the function kendallTrendTest does.
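Using the slopes and var.S components documented in the Value section below, Equations (17)-(20) can be reproduced by hand. The interval component used for comparison in the last line is an assumption about how the returned object stores the confidence limits, and small differences from the hand computation can arise from the interpolation step.
  # Sketch of Equations (17)-(20) using the data set from the Examples section below.
  res <- kendallTrendTest(Sulfate.ppm ~ Sampling.Date, data = EPA.09.Ex.17.6.sulfate.df)
  ord <- sort(res$slopes[is.finite(res$slopes)])   # drop undefined slopes from tied X's, if any
  N.prime <- length(ord)
  C.alpha <- qnorm(0.975) * sqrt(res$var.S)        # Equation (20), 95% two-sided
  M1 <- (N.prime - C.alpha) / 2                    # Equation (18)
  M2 <- (N.prime + C.alpha) / 2                    # Equation (19)
  approx(seq_len(N.prime), ord, xout = c(M1, M2 + 1), rule = 2)$y
  res$interval$limits                              # stored limits, for comparison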
Value
A list of class "htestEnvStats" containing the results of the hypothesis
test.  See the help file for htestEnvStats.object for details.
In addition, the following components are part of the list returned by
kendallTrendTest:
| S | The value of the Kendall S-statistic. | 
| var.S | The variance of the Kendall S-statistic. | 
| slopes | A numeric vector of all possible two-point slope estimates.
This component is used by the function  | 
Note
Kendall's test for independence or trend is a nonparametric test.  No
assumptions are made about the distribution of the X and Y
variables.  Hirsch et al. (1982) introduced the "seasonal Kendall test" to
test for trend within each season.  They note that Kendall's test for trend
is easy to compute, even in the presence of missing values, and can also be
used with censored values.
van Belle and Hughes (1984) note that Kendall's test for trend is slightly less powerful than the test based on Spearman's rho, but it converges to normality faster. Also, Bradley (1968, p.288) shows that for the case of a linear model with normal (Gaussian) errors, the asymptotic relative efficiency of Kendall's test for trend versus the parametric test for a zero slope is 0.98.
The results of the function kendallTrendTest are similar to the
results of the built-in R function cor.test with the
argument method="kendall" except that cor.test
1) computes exact p-values when the number of pairs is less than 50 and
there are no ties, and 2) does not return a confidence interval for
the slope.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bradley, J.V. (1968). Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs, NJ.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, pp.256-272.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 16.
Helsel, D.R. and R.M. Hirsch. (1988). Discussion of Applicability of the t-test for Detecting Trends in Water Quality Variables. Water Resources Bulletin 24(1), 201-204.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, NY.
Helsel, D.R., and R. M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. Available on-line at https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf.
Hirsch, R.M., J.R. Slack, and R.A. Smith. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.
Hirsch, R.M. and J.R. Slack. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.
Hirsch, R.M., R.B. Alexander, and R.A. Smith. (1991). Selection of Methods for the Detection and Estimation of Trends in Water Quality. Water Resources Research 27(5), 803-813.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika 30, 81-93.
Kendall, M.G. (1975). Rank Correlation Methods. Charles Griffin, London.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Sen, P.K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association 63, 1379-1389.
Theil, H. (1950). A Rank-Invariant Method of Linear and Polynomial Regression Analysis, I-III. Proc. Kon. Ned. Akad. v. Wetensch. A. 53, 386-392, 521-525, 1397-1412.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., and J.P. Hughes. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.
See Also
cor.test, kendallSeasonalTrendTest, htestEnvStats.object.
Examples
  # Reproduce Example 17-6 on page 17-33 of USEPA (2009).  This example
  # tests for trend in sulfate concentrations (ppm) collected at various
  # months between 1989 and 1996.
  head(EPA.09.Ex.17.6.sulfate.df)
  #  Sample.No Year Month Sampling.Date       Date Sulfate.ppm
  #1         1   89     6          89.6 1989-06-01         480
  #2         2   89     8          89.8 1989-08-01         450
  #3         3   90     1          90.1 1990-01-01         490
  #4         4   90     3          90.3 1990-03-01         520
  #5         5   90     6          90.6 1990-06-01         485
  #6         6   90     8          90.8 1990-08-01         510
  # Plot the data
  #--------------
  dev.new()
  with(EPA.09.Ex.17.6.sulfate.df,
    plot(Sampling.Date, Sulfate.ppm, pch = 15, ylim = c(400, 900),
    xlab = "Sampling Date", ylab = "Sulfate Conc (ppm)",
    main = "Figure 17-6. Time Series Plot of \nSulfate Concentrations (ppm)")
  )
  Sulfate.fit <- lm(Sulfate.ppm ~ Sampling.Date,
    data = EPA.09.Ex.17.6.sulfate.df)
  abline(Sulfate.fit, lty = 2)
  # Perform the Kendall test for trend
  #-----------------------------------
  kendallTrendTest(Sulfate.ppm ~ Sampling.Date,
    data = EPA.09.Ex.17.6.sulfate.df)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 tau = 0
  #
  #Alternative Hypothesis:          True tau is not equal to 0
  #
  #Test Name:                       Kendall's Test for Trend
  #                                 (with continuity correction)
  #
  #Estimated Parameter(s):          tau       =     0.7667984
  #                                 slope     =    26.6666667
  #                                 intercept = -1909.3333333
  #
  #Estimation Method:               slope:      Theil/Sen Estimator
  #                                 intercept:  Conover's Estimator
  #
  #Data:                            y = Sulfate.ppm
  #                                 x = Sampling.Date
  #
  #Data Source:                     EPA.09.Ex.17.6.sulfate.df
  #
  #Sample Size:                     23
  #
  #Test Statistic:                  z = 5.107322
  #
  #P-value:                         3.267574e-07
  #
  #Confidence Interval for:         slope
  #
  #Confidence Interval Method:      Gilbert's Modification
  #                                 of Theil/Sen Method
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 20.00000
  #                                 UCL = 35.71182
  # Clean up
  #---------
  rm(Sulfate.fit)
  graphics.off()
Coefficient of (Excess) Kurtosis
Description
Compute the sample coefficient of kurtosis or excess kurtosis.
Usage
  kurtosis(x, na.rm = FALSE, method = "fisher", l.moment.method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), excess = TRUE)
Arguments
| x | numeric vector of observations. | 
| na.rm | logical scalar indicating whether to remove missing values from  | 
| method | character string specifying what method to use to compute the sample coefficient 
of kurtosis.  The possible values are 
 | 
| l.moment.method | character string specifying what method to use to compute the 
 | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for 
the plotting positions when  | 
| excess | logical scalar indicating whether to compute the kurtosis ( | 
Details
Let \underline{x} denote a random sample of n observations from 
some distribution with mean \mu and standard deviation \sigma.
Product Moment Coefficient of Kurtosis 
(method="moment" or method="fisher") 
The coefficient of kurtosis of a distribution is the fourth 
standardized moment about the mean:
\eta_4 = \beta_2 = \frac{\mu_4}{\sigma^4} \;\;\;\;\;\; (1)
where
\eta_r = E[(\frac{X-\mu}{\sigma})^r] = \frac{1}{\sigma^r} E[(X-\mu)^r] = \frac{\mu_r}{\sigma^r} \;\;\;\;\;\; (2)
and
\mu_r = E[(X-\mu)^r] \;\;\;\;\;\; (3)
denotes the r'th moment about the mean (central moment).
The coefficient of excess kurtosis is defined as:
\beta_2 - 3 \;\;\;\;\;\; (4)
For a normal distribution, the coefficient of kurtosis is 3 and the coefficient of excess kurtosis is 0. Distributions with kurtosis less than 3 (excess kurtosis less than 0) are called platykurtic: they have shorter tails than a normal distribution. Distributions with kurtosis greater than 3 (excess kurtosis greater than 0) are called leptokurtic: they have heavier tails than a normal distribution.
When method="moment", the coefficient of kurtosis is estimated using the 
method of moments estimator for the fourth central moment and the method of 
moments estimator for the variance:
\hat{\eta}_4 = \frac{\hat{\mu}_4}{\hat{\sigma}^4_m} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^4}{[\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2]^2} \;\;\;\;\; (5)
where
\hat{\sigma}^2_m = s^2_m = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (6)
This form of estimation should be used when resampling (bootstrap or jackknife).
When method="fisher", the coefficient of kurtosis is estimated using the 
unbiased estimator for the fourth central moment (Serfling, 1980, p.73) and the 
unbiased estimator for the variance.
\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (7)
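As a check on the formulas above, the method-of-moments estimator in Equation (5) can be computed directly; x is a hypothetical sample, and the last line should agree with the hand computation.
  # Minimal sketch of Equations (4)-(6) on hypothetical data.
  set.seed(23)
  x <- rnorm(30)
  m2 <- mean((x - mean(x))^2)   # method-of-moments variance (Equation (6))
  m4 <- mean((x - mean(x))^4)   # method-of-moments fourth central moment
  m4 / m2^2                     # coefficient of kurtosis (Equation (5))
  m4 / m2^2 - 3                 # coefficient of excess kurtosis (Equation (4))
  kurtosis(x, method = "moment", excess = FALSE)   # should agree with m4 / m2^2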
L-Moment Coefficient of Kurtosis (method="l.moments") 
Hosking (1990) defines the L-moment analog of the coefficient of kurtosis as:
\tau_4 = \frac{\lambda_4}{\lambda_2} \;\;\;\;\;\; (8)
that is, the fourth L-moment divided by the second L-moment.  He shows 
that this quantity lies in the interval (-1, 1).
When l.moment.method="unbiased", the L-kurtosis is estimated by:
t_4 = \frac{l_4}{l_2} \;\;\;\;\;\; (9)
that is, the unbiased estimator of the fourth L-moment divided by the 
unbiased estimator of the second L-moment.
When l.moment.method="plotting.position", the L-kurtosis is estimated by:
\tilde{\tau}_4 = \frac{\tilde{\lambda}_4}{\tilde{\lambda}_2} \;\;\;\;\;\; (10)
that is, the plotting-position estimator of the fourth L-moment divided by the 
plotting-position estimator of the second L-moment.
See the help file for lMoment for more information on 
estimating L-moments.
Value
A numeric scalar – the sample coefficient of kurtosis or excess kurtosis.
Note
Traditionally, the coefficient of kurtosis has been estimated using product 
moment estimators.  Sometimes an estimate of kurtosis is used in a 
goodness-of-fit test for normality (D'Agostino and Stephens, 1986).  
Hosking (1990) introduced the idea of L-moments and L-kurtosis.  
Vogel and Fennessey (1993) argue that L-moment ratios should replace 
product moment ratios because of their superior performance (they are nearly 
unbiased and better for discriminating between distributions).  
They compare product moment diagrams with L-moment diagrams.
Hosking and Wallis (1995) recommend using unbiased estimators of L-moments 
(vs. plotting-position estimators) for almost all applications.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.
Vogel, R.M., and N.M. Fennessey. (1993).  L Moment Diagrams Should Replace 
Product Moment Diagrams.  Water Resources Research 29(6), 1745–1752.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
var, sd, cv, 
skewness, summaryFull, 
Summary Statistics.
Examples
  # Generate 20 observations from a lognormal distribution with parameters 
  # mean=10 and cv=1, and estimate the coefficient of kurtosis and 
  # coefficient of excess kurtosis. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean = 10, cv = 1) 
  # Compute standard kurtosis first 
  #--------------------------------
  kurtosis(dat, excess = FALSE) 
  #[1] 2.964612
  
  kurtosis(dat, method = "moment", excess = FALSE) 
  #[1] 2.687146
  
  kurtosis(dat, method = "l.moment", excess = FALSE) 
  #[1] 0.1444779
  # Now compute excess kurtosis 
  #----------------------------
  kurtosis(dat) 
  #[1] -0.0353876
 
  kurtosis(dat, method = "moment") 
  #[1] -0.3128536
  
  kurtosis(dat, method = "l.moment") 
  #[1] -2.855522
  #----------
  # Clean up
  rm(dat)
Estimate L-Moments
Description
Estimate the r'th L-moment from a random sample.
Usage
  lMoment(x, r = 1, method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), na.rm = FALSE)
Arguments
| x | numeric vector of observations. | 
| r | positive integer specifying the order of the moment. | 
| method | character string specifying what method to use to compute the 
 | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| na.rm | logical scalar indicating whether to remove missing values from  | 
Details
Definitions: L-Moments and L-Moment Ratios 
The definition of an L-moment given by Hosking (1990) is as follows.  
Let X denote a random variable with cdf F, and let x(p) 
denote the p'th quantile of the distribution.  Furthermore, let
x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}
denote the order statistics of a random sample of size n drawn from the 
distribution of X.  Then the r'th L-moment is given by:
\lambda_r = \frac{1}{r} \sum^{r-1}_{k=0} (-1)^k {r-1 \choose k} E[X_{r-k:r}]
for r = 1, 2, \ldots.
Hosking (1990) shows that the above equation can be rewritten as:
\lambda_r = \int^1_0 x(u) P^*_{r-1}(u) du
where
P^*_r(u) = \sum^r_{k=0} p^*_{r,k} u^k
p^*_{r,k} = (-1)^{r-k} {r \choose k} {r+k \choose k} = \frac{(-1)^{r-k} (r+k)!}{(k!)^2 (r-k)!}
The first four L-moments are given by: 
\lambda_1 = E[X]
\lambda_2 = \frac{1}{2} E[X_{2:2} - X_{1:2}]
\lambda_3 = \frac{1}{3} E[X_{3:3} - 2X_{2:3} + X_{1:3}]
\lambda_4 = \frac{1}{4} E[X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}]
Thus, the first L-moment is a measure of location, and the second 
L-moment is a measure of scale. 
Hosking (1990) defines the L-moment ratios of X to be:
\tau_r = \frac{\lambda_r}{\lambda_2}
for r = 2, 3, \ldots.  He shows that for a non-degenerate random variable 
with a finite mean, these quantities lie in the interval (-1, 1).  
The quantity
\tau_3 = \frac{\lambda_3}{\lambda_2}
is the L-moment analog of the coefficient of skewness, and the quantity
\tau_4 = \frac{\lambda_4}{\lambda_2}
is the L-moment analog of the coefficient of kurtosis.  Hosking (1990) also 
defines an L-moment analog of the coefficient of variation (denoted the 
L-CV) as:
\lambda = \frac{\lambda_2}{\lambda_1}
He shows that for a positive-valued random variable, the L-CV lies 
in the interval (0, 1).
Relationship Between L-Moments and Probability-Weighted Moments 
Hosking (1990) and Hosking and Wallis (1995) show that L-moments can be 
written as linear combinations of probability-weighted moments:
\lambda_r = (-1)^{r-1} \sum^{r-1}_{k=0} p^*_{r-1,k} \alpha_k = \sum^{r-1}_{j=0} p^*_{r-1,j} \beta_j
where
\alpha_k = M(1, 0, k) = \frac{1}{k+1} E[X_{1:k+1}]
\beta_j = M(1, j, 0) = \frac{1}{j+1} E[X_{j+1:j+1}]
See the help file for pwMoment for more information on 
probability-weighted moments.
Estimating L-Moments 
The two commonly used methods for estimating L-moments are the 
“unbiased” method based on U-statistics (Hoeffding, 1948; 
Lehmann, 1975, pp. 362-371), and the “plotting-position” method.  
Hosking and Wallis (1995) recommend using the unbiased method for almost all 
applications.
Unbiased Estimators (method="unbiased") 
Using the relationship between L-moments and probability-weighted moments 
explained above, the unbiased estimator of the r'th L-moment is based on 
unbiased estimators of probability-weighted moments and is given by:
l_r = (-1)^{r-1} \sum^{r-1}_{k=0} p^*_{r-1,k} a_k = \sum^{r-1}_{j=0} p^*_{r-1,j} b_j
where
a_k = \frac{1}{n} \sum^{n-k}_{i=1} x_{i:n} \frac{{n-i \choose k}}{{n-1 \choose k}}
b_j = \frac{1}{n} \sum^{n}_{i=j+1} x_{i:n} \frac{{i-1 \choose j}}{{n-1 \choose j}}
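A minimal sketch of the unbiased estimators above follows: the probability-weighted moments b_j are computed first and then combined with the p* coefficients; the function name is hypothetical and the code is illustrative, not the package implementation.
  # Minimal sketch of the unbiased L-moment estimators l_1, ..., l_r.max.
  lMomentsUnbiased <- function(x, r.max = 4) {
    x <- sort(x)
    n <- length(x)
    # b_j = (1/n) sum_i x_(i) * choose(i-1, j) / choose(n-1, j); terms with i <= j vanish
    b <- sapply(0:(r.max - 1), function(j)
      mean(x * choose(seq_len(n) - 1, j) / choose(n - 1, j)))
    sapply(1:r.max, function(r) {
      k <- 0:(r - 1)
      p.star <- (-1)^(r - 1 - k) * choose(r - 1, k) * choose(r - 1 + k, k)
      sum(p.star * b[k + 1])    # l_r as a linear combination of the b_j
    })
  }
  # e.g., lMomentsUnbiased(dat)[2] should match lMoment(dat, 2) for the data used in
  # the Examples section below.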
Plotting-Position Estimators (method="plotting.position") 
Using the relationship between L-moments and probability-weighted moments 
explained above, the plotting-position estimator of the r'th L-moment 
is based on the plotting-position estimators of probability-weighted moments and 
is given by:
\tilde{\lambda}_r = (-1)^{r-1} \sum^{r-1}_{k=0} p^*_{r-1,k} \tilde{\alpha}_k = \sum^{r-1}_{j=0} p^*_{r-1,j} \tilde{\beta}_j
where
\tilde{\alpha}_k = \frac{1}{n} \sum^n_{i=1} (1 - p_{i:n})^k x_{i:n}
\tilde{\beta}_j =  \frac{1}{n} \sum^{n}_{i=1} p^j_{i:n} x_{i:n}
and
p_{i:n} = \hat{F}(x_{i:n})
denotes the plotting position of the i'th order statistic in the random 
sample of size n, that is, a distribution-free estimate of the cdf of 
X evaluated at the i'th order statistic.  Typically, plotting 
positions have the form:
p_{i:n} = \frac{i-a}{n+b}
where b > -a > -1.  For this form of plotting position, the 
plotting-position estimators are asymptotically equivalent to their 
unbiased estimator counterparts. 
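For comparison, a minimal sketch of the plotting-position probability-weighted moments above (with the default constants a = 0.35 and b = 0) is shown below; the function name is hypothetical.
  # Minimal sketch of the plotting-position estimator of beta_j.
  pwMomentPlotPos <- function(x, j, a = 0.35, b = 0) {
    x <- sort(x)
    n <- length(x)
    p <- (seq_len(n) - a) / (n + b)   # plotting positions p_{i:n}
    mean(p^j * x)                     # estimate of beta_j
  }
  # Combining these with the p* coefficients, exactly as in the unbiased sketch above,
  # gives the plotting-position L-moment estimates.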
Estimating L-Moment Ratios 
L-moment ratios are estimated by simply replacing the population 
L-moments with the estimated L-moments.  The estimated ratios 
based on the unbiased estimators are given by:
t_r = \frac{l_r}{l_2}
and the estimated ratios based on the plotting-position estimators are given by:
\tilde{\tau}_r = \frac{\tilde{\lambda}_r}{\tilde{\lambda}_2}
In particular, the L-moment skew is estimated by:
t_3 = \frac{l_3}{l_2}
or
\tilde{\tau}_3 = \frac{\tilde{\lambda}_3}{\tilde{\lambda}_2}
and the L-moment kurtosis is estimated by:
t_4 = \frac{l_4}{l_2}
or
\tilde{\tau}_4 = \frac{\tilde{\lambda}_4}{\tilde{\lambda}_2}
Similarly, the L-moment coefficient of variation can be estimated using 
the unbiased L-moment estimators:
l = \frac{l_2}{l_1}
or using the plotting-position L-moment estimators:
\tilde{\lambda} = \frac{\tilde{\lambda}_2}{\tilde{\lambda}_1}
Value
A numeric scalar – the value of the r'th L-moment as defined by Hosking (1990).
Note
Hosking (1990) introduced the idea of L-moments, which are expectations 
of certain linear combinations of order statistics, as the basis of a general 
theory of describing theoretical probability distributions, computing summary 
statistics from observed data, estimating distribution parameters and quantiles, 
and performing hypothesis tests.  The theory of L-moments parallels the 
theory of conventional moments.  L-moments have several advantages over 
conventional moments, including:
-  L-moments can characterize a wider range of distributions because they always exist as long as the distribution has a finite mean.
-  L-moments are estimated by linear combinations of order statistics, so estimators based on L-moments are more robust to the presence of outliers than estimators based on conventional moments.
-  Based on the author's and others' experience, L-moment estimators are less biased and approximate their asymptotic distribution more closely in finite samples than estimators based on conventional moments.
-  L-moment estimators are sometimes more efficient (smaller RMSE) than even maximum likelihood estimators for small samples.
Hosking (1990) presents a table with formulas for the L-moments of common 
probability distributions.  Articles that illustrate the use of L-moments 
include Fill and Stedinger (1995), Hosking and Wallis (1995), and 
Vogel and Fennessey (1993).
Hosking (1990) and Hosking and Wallis (1995) show the relationship between 
probability-weighted moments and L-moments.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Fill, H.D., and J.R. Stedinger. (1995).  L Moment and Probability Plot 
Correlation Coefficient Goodness-of-Fit Tests for the Gumbel Distribution and 
Impact of Autocorrelation.  Water Resources Research 31(1), 225–229.
Hosking, J.R.M. (1990). L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society, Series B 52(1), 105–124.
Hosking, J.R.M., and J.R. Wallis (1995).  A Comparison of Unbiased and 
Plotting-Position Estimators of L Moments.  Water Resources Research 
31(8), 2019–2025.
Vogel, R.M., and N.M. Fennessey. (1993).  L Moment Diagrams Should 
Replace Product Moment Diagrams.  Water Resources Research 29(6), 
1745–1752.
See Also
cv, skewness, kurtosis, 
pwMoment.
Examples
  # Generate 20 observations from a generalized extreme value distribution 
  # with parameters location=10, scale=2, and shape=.25, then compute the 
  # first four L-moments. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rgevd(20, location = 10, scale = 2, shape = 0.25) 
  lMoment(dat) 
  #[1] 10.59556 
  lMoment(dat, 2) 
  #[1] 1.0014 
  lMoment(dat, 3) 
  #[1] 0.1681165
 
  lMoment(dat, 4) 
  #[1] 0.08732692
  #----------
  # Now compute some L-moments based on the plotting-position estimators:
  lMoment(dat, method = "plotting.position") 
  #[1] 10.59556
  lMoment(dat, 2, method = "plotting.position") 
  #[1] 1.110264 
  lMoment(dat, 3, method="plotting.position", plot.pos.cons = c(.325,1)) 
  #[1] -0.4430792
 
  #----------
  # Clean up
  #---------
  rm(dat)
Sample Size for a t-Test for Linear Trend
Description
Compute the sample size necessary to achieve a specified power for a t-test for linear trend, given the scaled slope and significance level.
Usage
  linearTrendTestN(slope.over.sigma, alpha = 0.05, power = 0.95, 
    alternative = "two.sided", approx = FALSE, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| slope.over.sigma | numeric vector specifying the ratio of the true slope to the standard deviation of 
the error terms ( | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power 
associated with the hypothesis test.  The default value is  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed 
sample size(s) to the next integer.  The default value is 
 | 
| n.max | positive integer greater than 2 indicating the maximum sample size.  
The default value is  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
If the arguments slope.over.sigma, alpha, and power are not 
all the same length, they are replicated to be the same length as the length of 
the longest argument.
Formulas for the power of the t-test of linear trend for specified values of 
the sample size, scaled slope, and Type I error level are given in 
the help file for linearTrendTestPower.  The function 
linearTrendTestN uses the uniroot search algorithm to 
determine the required sample size(s) for specified values of the power, 
scaled slope, and Type I error level. 
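The sketch below (not the package implementation) illustrates the same idea with a simple grid search over the sample size instead of uniroot; it assumes EnvStats is loaded so that linearTrendTestPower is available.
  # Sketch: smallest n whose exact power meets the target.
  find.n <- function(slope.over.sigma, power = 0.95, alpha = 0.05, n.max = 5000) {
    for (n in 3:n.max) {
      if (linearTrendTestPower(n = n, slope.over.sigma = slope.over.sigma, 
          alpha = alpha) >= power) return(n)
    }
    NA
  }
  find.n(slope.over.sigma = 0.1, power = 0.9)
  # should match linearTrendTestN(slope.over.sigma = 0.1, power = 0.9), 
  # which the Examples below show to be 25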
Value
a numeric vector of sample sizes.
Note
See the help file for linearTrendTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for linearTrendTestPower.
See Also
linearTrendTestPower, linearTrendTestScaledMds, 
plotLinearTrendTestDesign, lm, 
summary.lm, kendallTrendTest, 
Power and Sample Size, Normal, t.test.
Examples
  # Look at how the required sample size for the t-test for zero slope 
  # increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  linearTrendTestN(slope.over.sigma = 0.1, power = seq(0.5, 0.9, by = 0.1)) 
  #[1] 18 19 21 22 25
  #----------
  # Repeat the last example, but compute the sample size based on the approximate 
  # power instead of the exact:
  linearTrendTestN(slope.over.sigma = 0.1, power = seq(0.5, 0.9, by = 0.1), 
    approx = TRUE) 
  #[1] 18 19 21 22 25
  #==========
  # Look at how the required sample size for the t-test for zero slope decreases 
  # with increasing scaled slope:
  seq(0.05, 0.2, by = 0.05) 
  #[1] 0.05 0.10 0.15 0.20 
  linearTrendTestN(slope.over.sigma = seq(0.05, 0.2, by = 0.05)) 
  #[1] 41 26 20 17
  #==========
  # Look at how the required sample size for the t-test for zero slope decreases 
  # with increasing values of Type I error:
  linearTrendTestN(slope.over.sigma = 0.1, alpha = c(0.001, 0.01, 0.05, 0.1)) 
  #[1] 33 29 26 25
Power of a t-Test for Linear Trend
Description
Compute the power of a parametric test for linear trend, given the sample size or predictor variable values, scaled slope, and significance level.
Usage
  linearTrendTestPower(n, x = lapply(n, seq), slope.over.sigma = 0, alpha = 0.05, 
    alternative = "two.sided", approx = FALSE)
Arguments
| n | numeric vector of sample sizes.  All values of  | 
| x | numeric vector of predictor variable values, or a list in which each component is 
a numeric vector of predictor variable values.  Usually, the predictor variable is 
time (e.g., days, months, quarters, etc.).  The default value is 
 | 
| slope.over.sigma | numeric vector specifying the ratio of the true slope to the standard deviation of 
the error terms ( | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
Details
If the argument x is a vector, it is converted into a list with one 
component.  If the arguments n, x, slope.over.sigma, and 
alpha are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
Basic Model 
Consider the simple linear regression model
Y = \beta_0 + \beta_1 X + \epsilon \;\;\;\;\;\; (1)
where X denotes the predictor variable (observed without error), 
\beta_0 denotes the intercept, \beta_1 denotes the slope, and the 
error term \epsilon is assumed to be a random variable from a normal 
distribution with mean 0 and standard deviation \sigma.  Let
(\underline{x}, \underline{y}) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \;\;\;\;\;\; (2)
denote n independent observed (X,Y) pairs from the model (1).
Often in environmental data analysis, we are interested in determining whether there 
is a trend in some indicator variable over time.  In this case, the predictor 
variable X is time (e.g., day, month, quarter, year, etc.), and the n 
values of the response variable Y represent measurements taken over time.  
The slope then represents the change in the average of the response variable per 
one unit of time.
When the argument x is a numeric vector, it represents the 
n values of the predictor variable.  When the argument x is a 
list, each component of x is a numeric vector that represents a set of values 
of the predictor variable (and the number of elements may vary by component).  
By default, the argument x is a list for which the i'th component is simply 
the integers from 1 to the value of the i'th element of the argument n, 
representing, for example, Day 1, Day 2, ..., Day n[i].
In the discussion that follows, be sure not to confuse the intercept and slope 
coefficients \beta_0 and \beta_1 with the Type II error of the 
hypothesis test, which is denoted by \beta.
Estimation of Coefficients and Confidence Interval for Slope 
The standard least-squares estimators of the slope and intercept are given by:
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \;\;\;\;\;\; (3)
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \;\;\;\;\;\; (4)
where
S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \;\;\;\;\;\; (5)
S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (6)
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (7)
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (8)
(Draper and Smith, 1998, p.25; Zar, 2010, p.332-334; Berthouex and Brown, 2002, p.297; Helsel and Hirsch, 1992, p.226). The estimator of slope in Equation (3) has a normal distribution with mean equal to the true slope, and variance given by:
Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma^2}{S_{xx}} \;\;\;\;\;\; (9)
(Draper and Smith, 1998, p.35; Zar, 2010, p.341; Berthouex and Brown, 2002, p.299; 
Helsel and Hirsch, 1992, p.227). Thus, a (1-\alpha)100\% two-sided confidence 
interval for the slope is given by:
[ \hat{\beta}_1 - t_{n-2}(1-\alpha/2) \hat{\sigma}_{\hat{\beta}_1}, \;\; \hat{\beta}_1 + t_{n-2}(1-\alpha/2) \hat{\sigma}_{\hat{\beta}_1} ] \;\;\;\;\;\; (10)
where
\hat{\sigma}_{\hat{\beta}_1} = \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \;\;\;\;\;\; (11)
\hat{\sigma}^2 = s^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \;\;\;\;\;\; (12)
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \;\;\;\;\;\; (13)
and t_{\nu}(p) denotes the p'th quantile of 
Student's t-distribution with \nu degrees of freedom 
(Draper and Smith, 1998, p.36; Zar, 2010, p.343; Berthouex and Brown, 2002, p.300; 
Helsel and Hirsch, 1992, p.240).
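As a quick numerical check of Equations (3), (10), (11), and (12) (a sketch with simulated data, not part of the package), the quantities above can be compared with what lm, summary.lm, and confint report:
  set.seed(1)
  x <- 1:20
  y <- 2 + 0.5 * x + rnorm(20, sd = 2)
  fit <- lm(y ~ x)
  Sxx   <- sum((x - mean(x))^2)
  b1    <- sum((x - mean(x)) * (y - mean(y))) / Sxx          # Equation (3)
  s     <- sqrt(sum(residuals(fit)^2) / (length(x) - 2))     # Equation (12)
  se.b1 <- s / sqrt(Sxx)                                     # Equation (11)
  ci    <- b1 + c(-1, 1) * qt(0.975, length(x) - 2) * se.b1  # Equation (10)
  rbind(by.hand = c(b1, se.b1, ci), 
        from.lm = c(coef(fit)[2], coef(summary(fit))[2, 2], confint(fit)[2, ]))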
Testing for a Non-Zero Slope 
Consider the null hypothesis of a zero slope coefficient:
H_0: \beta_1 = 0 \;\;\;\;\;\; (14)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \beta_1 > 0 \;\;\;\;\;\; (15)
the lower one-sided alternative (alternative="less")
H_a: \beta_1 < 0 \;\;\;\;\;\; (16)
and the two-sided alternative (alternative="two.sided")
H_a: \beta_1 \ne 0 \;\;\;\;\;\; (17)
The test of the null hypothesis (14) versus any of the three alternatives (15)-(17) is based on the Student t-statistic:
t = \frac{\hat{\beta}_1}{\hat{\sigma}_{\hat{\beta}_1}} =  \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \;\;\;\;\;\; (18)
Under the null hypothesis (14), the t-statistic in (18) follows a 
Student's t-distribution with n-2 degrees of freedom 
(Draper and Smith, 1998, p.36; Zar, 2010, p.341; 
Helsel and Hirsch, 1992, pp.238-239).
The formula for the power of the test of a zero slope depends on which alternative 
is being tested.  
The two subsections below describe exact and approximate formulas for the power of 
the test.  Note that none of the equations for the power of the t-test 
requires knowledge of the values \beta_1 or \sigma 
(the population standard deviation of the error terms), only the ratio 
\beta_1/\sigma.  The argument slope.over.sigma is this ratio, and it is 
referred to as the “scaled slope”.
Exact Power Calculations (approx=FALSE) 
This subsection describes the exact formulas for the power of the t-test for a 
zero slope.
Upper one-sided alternative (alternative="greater") 
The standard Student's t-test rejects the null hypothesis (14) in favor of the 
upper alternative hypothesis (15) at level-\alpha if
t \ge t_{\nu}(1 - \alpha) \;\;\;\;\;\; (19)
where
\nu = n - 2 \;\;\;\;\;\; (20)
and, as noted previously, t_{\nu}(p) denotes the p'th quantile of 
Student's t-distribution with \nu degrees of freedom.    
The power of this test, denoted by 1-\beta, where \beta denotes the 
probability of a Type II error, is given by:
1 - \beta = Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha)] = 1 - G[t_{\nu}(1 - \alpha), \nu, \Delta] \;\;\;\;\;\; (21)
where
\Delta = \sqrt{S_{xx}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (22)
and t_{\nu, \Delta} denotes a 
non-central Student's t-random variable with 
\nu degrees of freedom and non-centrality parameter \Delta, and 
G(x, \nu, \Delta) denotes the cumulative distribution function of this 
random variable evaluated at x (Johnson et al., 1995, pp.508-510).  
Note that when the predictor variable X represents equally-spaced measures 
of time (e.g., days, months, quarters, etc.) and 
x_i = i, \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (23)
then the non-centrality parameter in Equation (22) becomes:
\Delta = \sqrt{\frac{(n-1)n(n+1)}{12}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (24)
Lower one-sided alternative (alternative="less") 
The standard Student's t-test rejects the null hypothesis (14) in favor of the 
lower alternative hypothesis (16) at level-\alpha if
t \le t_{\nu}(\alpha) \;\;\;\;\;\; (25)
and the power of this test is given by:
1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha)] = G[t_{\nu}(\alpha), \nu, \Delta] \;\;\;\;\;\; (26)
Two-sided alternative (alternative="two.sided") 
The standard Student's t-test rejects the null hypothesis (14) in favor of the 
two-sided alternative hypothesis (17) at level-\alpha if
|t| \ge t_{\nu}(1 - \alpha/2) \;\;\;\;\;\; (27)
and the power of this test is given by:
1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha/2)] + Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha/2)]
= G[t_{\nu}(\alpha/2), \nu, \Delta] + 1 - G[t_{\nu}(1 - \alpha/2), \nu, \Delta] \;\;\;\;\;\; (28)
The power of the t-test given in Equation (28) can also be expressed in terms of the 
cumulative distribution function of the non-central F-distribution 
as follows. Let F_{\nu_1, \nu_2, \Delta} denote a 
non-central F random variable with \nu_1 and 
\nu_2 degrees of freedom and non-centrality parameter \Delta, and let 
H(x, \nu_1, \nu_2, \Delta) denote the cumulative distribution function of this 
random variable evaluated at x. Also, let F_{\nu_1, \nu_2}(p) denote 
the p'th quantile of the central F-distribution with \nu_1 and 
\nu_2 degrees of freedom.  It can be shown that
(t_{\nu, \Delta})^2 \cong F_{1, \nu, \Delta^2} \;\;\;\;\;\; (29)
where \cong denotes “equal in distribution”.  Thus, it follows that
[t_{\nu}(1 - \alpha/2)]^2 = F_{1, \nu}(1 - \alpha) \;\;\;\;\;\; (30)
so the formula for the power of the t-test given in Equation (28) can also be written as:
1 - \beta = Pr\{(t_{\nu, \Delta})^2  \ge [t_{\nu}(1 - \alpha/2)]^2\}
= Pr[F_{1, \nu, \Delta^2} \ge F_{1, \nu}(1 - \alpha)] = 1 - H[F_{1, \nu}(1-\alpha), 1, \nu, \Delta^2] \;\;\;\;\;\; (31)
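For the common case of equally spaced sampling times (Equation (24)), the exact two-sided power in Equation (28) can be evaluated directly with the non-central t cumulative distribution function in R; the sketch below should reproduce the values returned by linearTrendTestPower with approx=FALSE.
  # Sketch: exact two-sided power (Equation (28)) for x = 1, 2, ..., n.
  exact.power <- function(n, slope.over.sigma, alpha = 0.05) {
    nu    <- n - 2
    Delta <- sqrt((n - 1) * n * (n + 1) / 12) * slope.over.sigma  # Equation (24)
    pt(qt(alpha/2, nu), nu, ncp = Delta) + 
      1 - pt(qt(1 - alpha/2, nu), nu, ncp = Delta)
  }
  round(exact.power(n = 20, slope.over.sigma = 0.1), 2)
  # compare with the value 0.68 shown for n = 20 in the Examples below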
Approximate Power Calculations (approx=TRUE) 
Zar (2010, pp.115–118) presents an approximation to the power for the t-test 
given in Equations (21), (26), and (28) above.  His approximation to the power 
can be derived by using the approximation
\sqrt{S_{xx}} \frac{\beta_1}{s} \approx \sqrt{S_{xx}} \frac{\beta_1}{\sigma} = \Delta \;\;\;\;\;\; (32)
where \approx denotes “approximately equal to”.  Zar's approximation 
can be summarized in terms of the cumulative distribution function of the 
non-central t-distribution as follows:
G(x, \nu, \Delta) \approx G(x - \Delta, \nu, 0) = G(x - \Delta, \nu) \;\;\;\;\;\; (33)
where G(x, \nu) denotes the cumulative distribution function of the 
central Student's t-distribution with \nu degrees of freedom evaluated at 
x.
The following three subsections explicitly derive the approximation to the power of 
the t-test for each of the three alternative hypotheses.
Upper one-sided alternative (alternative="greater") 
The power for the upper one-sided alternative (15) given in Equation (21) can be 
approximated as:
1 - \beta = Pr[t \ge t_{\nu}(1 - \alpha)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1 - \alpha) - \sqrt{S_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \ge t_{\nu}(1 - \alpha) - \Delta]
= 1 - Pr[t_{\nu} \le t_{\nu}(1 - \alpha) - \Delta]
 = 1 - G[t_{\nu}(1-\alpha) - \Delta, \nu] \;\;\;\;\;\; (34)
where t_{\nu} denotes a central Student's t-random variable with \nu 
degrees of freedom.
Lower one-sided alternative (alternative="less") 
The power for the lower one-sided alternative (16) given in Equation (26) can be 
approximated as:
1 - \beta = Pr[t \le t_{\nu}(\alpha)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha) - \sqrt{S_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha) - \Delta]
 = G[t_{\nu}(\alpha) - \Delta, \nu] \;\;\;\;\;\; (35)
Two-sided alternative (alternative="two.sided") 
The power for the two-sided alternative (17) given in Equation (28) can be 
approximated as:
1 - \beta = Pr[t \le t_{\nu}(\alpha/2)] + Pr[t \ge t_{\nu}(1 - \alpha/2)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha/2) - \sqrt{S_{xx}}\frac{\beta_1}{s}] + Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1 - \alpha/2) - \sqrt{S_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha/2) - \Delta] + Pr[t_{\nu} \ge t_{\nu}(1 - \alpha/2) - \Delta]
= G[t_{\nu}(\alpha/2) - \Delta, \nu] + 1 - G[t_{\nu}(1-\alpha/2) - \Delta, \nu] \;\;\;\;\;\; (36)
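The approximation in Equation (36) involves only the central t-distribution; the sketch below evaluates it for equally spaced sampling times and should be close to the value produced with approx=TRUE.
  # Sketch: Zar's approximate two-sided power (Equation (36)) for x = 1, 2, ..., n.
  approx.power <- function(n, slope.over.sigma, alpha = 0.05) {
    nu    <- n - 2
    Delta <- sqrt((n - 1) * n * (n + 1) / 12) * slope.over.sigma
    pt(qt(alpha/2, nu) - Delta, nu) + 1 - pt(qt(1 - alpha/2, nu) - Delta, nu)
  }
  round(approx.power(n = 20, slope.over.sigma = 0.1), 2)
  # compare with the value 0.68 shown for approx = TRUE in the Examples below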
Value
a numeric vector of powers.
Note
Often in environmental data analysis, we are interested in determining whether 
there is a trend in some indicator variable over time.  In this case, the predictor 
variable X is time (e.g., day, month, quarter, year, etc.), and the n 
values of the response variable represent measurements taken over time.  The slope 
then represents the change in the average of the response variable per one unit of 
time.
You can use the parametric model (1) to model your data, then use the R function 
lm to fit the regression coefficients and the summary.lm 
function to perform a test for the significance of the slope coefficient.  The 
function linearTrendTestPower computes the power of this t-test, given a 
fixed value of the sample size, scaled slope, and significance level.
You can also use Kendall's nonparametric test for trend 
if you don't want to assume the error terms are normally distributed.  When the 
errors are truly normally distributed, the asymptotic relative efficiency of 
Kendall's test for trend versus the parametric t-test for a zero slope is 0.98, 
and Kendall's test can be more powerful than the parametric t-test when the errors 
are not normally distributed.  Thus the function linearTrendTestPower can 
also be used to estimate the power of Kendall's test for trend.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled slope if one of the objectives of the sampling program is to determine 
whether a trend is occurring.  The functions linearTrendTestPower, 
linearTrendTestN, linearTrendTestScaledMds, and 
plotLinearTrendTestDesign can be used to investigate these 
relationships.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 1.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 9.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 28, 31
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
linearTrendTestN, linearTrendTestScaledMds, 
plotLinearTrendTestDesign, lm, 
summary.lm, kendallTrendTest, 
Power and Sample Size, Normal, t.test.
Examples
  # Look at how the power of the t-test for zero slope increases with increasing 
  # sample size:
  seq(5, 30, by = 5) 
  #[1] 5 10 15 20 25 30 
  power <- linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1) 
  round(power, 2) 
  #[1] 0.06 0.13 0.34 0.68 0.93 1.00
  #----------
  # Repeat the last example, but compute the approximate power instead of the 
  # exact:
  power <- linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1, 
    approx = TRUE) 
  round(power, 2) 
  #[1] 0.05 0.11 0.32 0.68 0.93 0.99
  #----------
  # Look at how the power of the t-test for zero slope increases with increasing 
  # scaled slope:
  seq(0.05, 0.2, by = 0.05) 
  #[1] 0.05 0.10 0.15 0.20 
  power <- linearTrendTestPower(15, slope.over.sigma = seq(0.05, 0.2, by = 0.05)) 
  round(power, 2) 
  #[1] 0.12 0.34 0.64 0.87
  #----------
  # Look at how the power of the t-test for zero slope increases with increasing 
  # values of Type I error:
  power <- linearTrendTestPower(20, slope.over.sigma = 0.1, 
    alpha = c(0.001, 0.01, 0.05, 0.1)) 
  round(power, 2) 
  #[1] 0.14 0.41 0.68 0.80
  #----------
  # Show that for a simple regression model, you get a greater power of detecting 
  # a non-zero slope if you take all the observations at two endpoints, rather than 
  # spreading the observations evenly between two endpoints. 
  # (Note: This design usually cannot work with environmental monitoring data taken 
  # over time since usually observations taken close together in time are not 
  # independent.)
  linearTrendTestPower(x = 1:10, slope.over.sigma = 0.1) 
  #[1] 0.1265976
  linearTrendTestPower(x = c(rep(1, 5), rep(10, 5)), slope.over.sigma = 0.1) 
  #[1] 0.2413823
  #==========
  # Clean up
  #---------
  rm(power)
Scaled Minimal Detectable Slope for a t-Test for Linear Trend
Description
Compute the scaled minimal detectable slope associated with a t-test for linear trend, given the sample size or predictor variable values, power, and significance level.
Usage
  linearTrendTestScaledMds(n, x = lapply(n, seq), alpha = 0.05, power = 0.95, 
    alternative = "two.sided", two.sided.direction = "greater", approx = FALSE, 
    tol = 1e-07, maxiter = 1000)
Arguments
| n | numeric vector of sample sizes.  All values of  | 
| x | numeric vector of predictor variable values, or a list in which each component is 
a numeric vector of predictor variable values.  Usually, the predictor variable is 
time (e.g., days, months, quarters, etc.).  The default value is 
 | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power 
associated with the hypothesis test.  The default value is  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| two.sided.direction | character string indicating the direction (positive or negative) for the 
scaled minimal detectable slope when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
If the argument x is a vector, it is converted into a list with one 
component.  If the arguments n, x, alpha, and 
power are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
Formulas for the power of the t-test of linear trend for specified values of 
the sample size, scaled slope, and Type I error level are given in 
the help file for linearTrendTestPower.  The function 
linearTrendTestScaledMds uses the uniroot search algorithm to 
determine the minimal detectable scaled slope for specified values of the power, 
sample size, and Type I error level. 
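The sketch below illustrates the same root-finding idea directly with uniroot and linearTrendTestPower; the search interval is an arbitrary choice made for this illustration.
  # Sketch: scaled slope at which the exact power reaches 0.95 when n = 10.
  f <- function(scaled.slope) {
    linearTrendTestPower(n = 10, slope.over.sigma = scaled.slope) - 0.95
  }
  round(uniroot(f, interval = c(1e-6, 10))$root, 2)
  # compare with linearTrendTestScaledMds(n = 10, power = 0.95)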
Value
numeric vector of computed scaled minimal detectable slopes.  When 
alternative="less", or alternative="two.sided" and 
two.sided.direction="less", the computed slopes are negative.  Otherwise, 
the slopes are positive.
Note
See the help file for linearTrendTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for linearTrendTestPower.
See Also
linearTrendTestPower, linearTrendTestN, 
plotLinearTrendTestDesign, lm, 
summary.lm, kendallTrendTest, 
Power and Sample Size, Normal, t.test.
Examples
  # Look at how the scaled minimal detectable slope for the t-test for linear 
  # trend increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  scaled.mds <- linearTrendTestScaledMds(n = 10, power = seq(0.5, 0.9, by = 0.1)) 
  round(scaled.mds, 2) 
  #[1] 0.25 0.28 0.31 0.35 0.41
  #----------
  # Repeat the last example, but compute the scaled minimal detectable slopes 
  # based on the approximate power instead of the exact:
  scaled.mds <- linearTrendTestScaledMds(n = 10, power = seq(0.5, 0.9, by = 0.1), 
    approx = TRUE) 
  round(scaled.mds, 2) 
  #[1] 0.25 0.28 0.31 0.35 0.41
  #==========
  # Look at how the scaled minimal detectable slope for the t-test for linear trend 
  # decreases with increasing sample size:
  seq(10, 50, by = 10) 
  #[1] 10 20 30 40 50 
  scaled.mds <- linearTrendTestScaledMds(seq(10, 50, by = 10), alternative = "greater") 
  round(scaled.mds, 2) 
  #[1] 0.40 0.13 0.07 0.05 0.03
  #==========
  # Look at how the scaled minimal detectable slope for the t-test for linear trend 
  # decreases with increasing values of Type I error:
  scaled.mds <- linearTrendTestScaledMds(10, alpha = c(0.001, 0.01, 0.05, 0.1), 
    alternative="greater") 
  round(scaled.mds, 2) 
  #[1] 0.76 0.53 0.40 0.34
  #----------
  # Repeat the last example, but compute the scaled minimal detectable slopes 
  # based on the approximate power instead of the exact:
 
  scaled.mds <- linearTrendTestScaledMds(10, alpha = c(0.001, 0.01, 0.05, 0.1), 
    alternative="greater", approx = TRUE) 
  round(scaled.mds, 2) 
  #[1] 0.70 0.52 0.41 0.36
  #==========
  # Clean up
  #---------
  rm(scaled.mds)
Convert a Long Format Data Set into a Wide Format
Description
Given a data frame or matrix in long format, convert it to wide format based on 
the levels of two variables in the data frame.  This is a simplified version of 
the R function reshape with the argument direction="wide". 
Usage
  longToWide(x, data.var, row.var, col.var, 
    row.labels = levels(factor(x[, row.var])), 
    col.labels = levels(factor(x[, col.var])), 
    paste.row.name = FALSE, paste.col.name = FALSE, sep = ".", 
    check.names = FALSE, ...)
Arguments
| x | data frame or matrix to convert to wide format. Must have at least 3 columns corresponding to the data variable, row variable, and column variable, respectively. | 
| data.var | character string or numeric scalar indicating column variable name in  | 
| row.var | character string or numeric scalar indicating column variable name in  | 
| col.var | character string or numeric scalar indicating column variable name in  | 
| row.labels | optional character vector indicating labels to use for rows.  The default value is the levels 
of the variable indicated by  | 
| col.labels | optional character vector indicating labels to use for columns.  The default value is the levels 
of the variable indicated by  | 
| paste.row.name | logical scalar indicating whether to paste the name of the variable used to define the row names 
(i.e., the value of  | 
| paste.col.name | logical scalar indicating whether to paste the name of the variable used to define the column names 
(i.e., the value of  | 
| sep | character string separator used when  | 
| check.names | argument to  | 
| ... | other arguments to  | 
Details
The combination of values in x[, row.var] and x[, col.var] must yield 
n unique values, where n is the number of rows in x.
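Because longToWide is a simplified version of reshape with direction="wide", roughly the same result can be obtained with base R (a sketch; column names and ordering will differ from those produced by longToWide):
  # Sketch: the nickel data from the Examples below, reshaped with base R.
  reshape(EPA.09.Ex.10.1.nickel.df, direction = "wide", 
    idvar = "Month", timevar = "Well", v.names = "Nickel.ppb")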
Value
longToWide returns a matrix when x is a matrix and a data frame when x 
is a data frame.  The number of rows is equal to the number of 
unique values in x[, row.var] and the number of columns is equal to the number of 
unique values in x[, col.var].
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com), based on a template from Phil Dixon.
See Also
Examples
  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0
  longToWide(EPA.09.Ex.10.1.nickel.df, 
    "Nickel.ppb", "Month", "Well", paste.row.name = TRUE)
  #         Well.1 Well.2 Well.3 Well.4
  #Month.1    58.8   19.0   39.0    3.1
  #Month.3     1.0   81.5  151.0  942.0
  #Month.6   262.0  331.0   27.0   85.6
  #Month.8    56.0   14.0   21.4   10.0
  #Month.10    8.7   64.4  578.0  637.0
Show the EnvStats NEWS File
Description
Show the NEWS file of the EnvStats package.
Usage
  newsEnvStats()
Details
The function newsEnvStats displays the contents of the EnvStats NEWS file in a 
separate text window.  You can also access the NEWS file with the command 
news(package="EnvStats"), which returns the contents of the file to the R command 
window.
Value
None.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
news.
Fisher's One-Sample Randomization (Permutation) Test for Location
Description
Perform Fisher's one-sample randomization (permutation) test for location.
Usage
  oneSamplePermutationTest(x, alternative = "two.sided", mu = 0, exact = FALSE, 
    n.permutations = 5000, seed = NULL, ...)
Arguments
| x | numeric vector of observations.  
Missing ( | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| mu | numeric scalar indicating the hypothesized value of the mean.  
The default value is  | 
| exact | logical scalar indicating whether to perform the exact permutation test 
(i.e., enumerate all possible permutations) or simply sample from the permutation 
distribution.  The default value is  | 
| n.permutations | integer indicating how many times to sample from the permutation distribution when 
 | 
| seed | positive integer to pass to the R function  | 
| ... | arguments that can be supplied to the  | 
Details
Randomization Tests 
In 1935, R.A. Fisher introduced the idea of a randomization test 
(Manly, 2007, p. 107; Efron and Tibshirani, 1993, Chapter 15), which is based on 
trying to answer the question:  “Did the observed pattern happen by chance, 
or does the pattern indicate the null hypothesis is not true?”  A randomization 
test works by simply enumerating all of the possible outcomes under the null 
hypothesis, then seeing where the observed outcome fits in.  A randomization test 
is also called a permutation test, because it involves permuting the 
observations during the enumeration procedure (Manly, 2007, p. 3).
In the past, randomization tests have not been used as extensively as they are now 
because of the “large” computing resources needed to enumerate all of the 
possible outcomes, especially for large sample sizes.  The advent of more powerful 
personal computers and software has allowed randomization tests to become much 
easier to perform.  Depending on the sample size, however, it may still be too 
time consuming to enumerate all possible outcomes.  In this case, the randomization 
test can still be performed by sampling from the randomization distribution, and 
comparing the observed outcome to this sampled permutation distribution.
Fisher's One-Sample Randomization Test for Location 
Let \underline{x} = x_1, x_2, \ldots, x_n be a vector of n independent 
and identically distributed (i.i.d.) observations from some symmetric distribution 
with mean \mu.  Consider the test of the null hypothesis that the mean 
\mu is equal to some specified value \mu_0:
H_0: \mu = \mu_0 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: \mu > \mu_0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less")
H_a: \mu < \mu_0 \;\;\;\;\;\; (3)
and the two-sided alternative
H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)
To perform the test of the null hypothesis (1) versus any of the three alternatives (2)-(4), Fisher proposed using the test statistic
T = \sum_{i=1}^n y_i \;\;\;\;\;\; (5)
where
y_i = x_i - \mu_0 \;\;\;\;\;\; (6)
(Manly, 2007, p. 112).  The test assumes all of the observations come from the 
same distribution that is symmetric about the true population mean 
(hence the mean is the same as the median for this distribution).  
Under the null hypothesis, the y_i's are equally likely to be positive or 
negative.  Therefore, the permutation distribution of the test statistic T 
consists of enumerating all possible ways of permuting the signs of the 
y_i's and computing the resulting sums.  For n observations, there are 
2^n possible permutations of the signs, because each observation can either 
be positive or negative.
For a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed 
as the proportion of sums in the permutation distribution that are greater than or 
equal to the observed sum T.  For a one-sided lower alternative hypothesis 
(Equation (3)), the p-value is computed as the proportion of sums in the permutation 
distribution that are less than or equal to the observed sum T.  For a 
two-sided alternative hypothesis (Equation (4)), the p-value is computed by using 
the permutation distribution of the absolute value of T (i.e., |T|) 
and computing the proportion of values in this permutation distribution that are 
greater than or equal to the observed value of |T|.
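The enumeration described above is easy to carry out for a small sample; the sketch below (toy data, not part of the package) computes the exact upper one-sided p-value by listing all 2^n sign assignments.
  # Sketch: exact one-sided upper p-value for a toy sample of size 5.
  x     <- c(3.1, 4.8, 6.2, 7.0, 5.5)
  mu0   <- 4
  y     <- x - mu0
  T.obs <- sum(y)
  signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(y))))  # 2^5 = 32 rows
  perm.sums <- as.vector(signs %*% abs(y))
  mean(perm.sums >= T.obs)
  # 3 of the 32 permuted sums are >= T.obs, so the p-value is 3/32 = 0.09375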
Confidence Intervals Based on Permutation Tests 
Based on the relationship between hypothesis tests and confidence intervals, it is 
possible to construct a two-sided or one-sided (1-\alpha)100\% confidence 
interval for the mean \mu based on the one-sample permutation test by finding 
the values of \mu_0 that correspond to obtaining a p-value of \alpha  
(Manly, 2007, pp. 18–20, 113).  A confidence interval based on the bootstrap, 
however, will yield a similar type of confidence interval 
(Efron and Tibshirani, 1993, p. 214); see the help file for 
boot in the R package boot.
Value
A list of class "permutationTest" containing the results of the hypothesis 
test.  See the help file for permutationTest.object for details.
Note
A frequent question in environmental statistics is “Is the concentration of chemical X greater than Y units?”. For example, in groundwater assessment (compliance) monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well must be compared to a groundwater protection standard (GWPS). If the concentration is “above” the GWPS, then the site enters corrective action monitoring. As another example, soil screening at a Superfund site involves comparing the concentration of a chemical in the soil with a pre-determined soil screening level (SSL). If the concentration is “above” the SSL, then further investigation and possible remedial action is required. Determining what it means for the chemical concentration to be “above” a GWPS or an SSL is a policy decision: the average of the distribution of the chemical concentration must be above the GWPS or SSL, or the median must be above the GWPS or SSL, or the 95'th percentile must be above the GWPS or SSL, or something else. Often, the first interpretation is used.
Hypothesis tests you can use to perform tests of location include: Student's t-test, Fisher's randomization test, the Wilcoxon signed rank test, Chen's modified t-test, the sign test, and a test based on a bootstrap confidence interval. For a discussion comparing the performance of these tests, see Millard and Neerchal (2001, pp.408-409).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, pp.224–227.
Manly, B.F.J. (2007). Randomization, Bootstrap and Monte Carlo Methods in Biology. Third Edition. Chapman & Hall, New York, pp.112-113.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.404–406.
See Also
permutationTest.object, Hypothesis Tests, 
boot.
Examples
  # Generate 10 observations from a logistic distribution with parameters 
  # location=7 and scale=2, and test the null hypothesis that the true mean 
  # is equal to 5 against the alternative that the true mean is greater than 5. 
  # Use the exact permutation distribution. 
  # (Note: the call to set.seed() allows you to reproduce this example).
  set.seed(23) 
  dat <- rlogis(10, location = 7, scale = 2) 
  test.list <- oneSamplePermutationTest(dat, mu = 5, 
    alternative = "greater", exact = TRUE) 
  # Print the results of the test 
  #------------------------------
  test.list 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 5
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 5
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Exact)
  #
  #Estimated Parameter(s):          Mean = 9.977294
  #
  #Data:                            dat
  #
  #Sample Size:                     10
  #
  #Test Statistic:                  Sum(x - 5) = 49.77294
  #
  #P-value:                         0.001953125
  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(test.list)
  #==========
  # The guidance document "Supplemental Guidance to RAGS: Calculating the 
  # Concentration Term" (USEPA, 1992d) contains an example of 15 observations 
  # of chromium concentrations (mg/kg) which are assumed to come from a 
  # lognormal distribution.  These data are stored in the vector 
  # EPA.92d.chromium.vec.  Here, we will use the permutation test to test 
  # the null hypothesis that the mean (median) of the log-transformed chromium 
  # concentrations is less than or equal to log(100 mg/kg) vs. the alternative 
  # that it is greater than log(100 mg/kg).  Note that we *cannot* use the 
  # permutation test to test a hypothesis about the mean on the original scale 
  # because the data are not assumed to be symmetric about some mean, they are 
  # assumed to come from a lognormal distribution.
  #
  # We will sample from the permutation distribution.
  # (Note: setting the argument seed=542 allows you to reproduce this example).
  test.list <- oneSamplePermutationTest(log(EPA.92d.chromium.vec), 
    mu = log(100), alternative = "greater", seed = 542) 
  test.list
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 4.60517
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 4.60517
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Based on Sampling
  #                                 Permutation Distribution
  #                                 5000 Times)
  #
  #Estimated Parameter(s):          Mean = 4.378636
  #
  #Data:                            log(EPA.92d.chromium.vec)
  #
  #Sample Size:                     15
  #
  #Test Statistic:                  Sum(x - 4.60517) = -3.398017
  #
  #P-value:                         0.7598
  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(test.list)
  #----------
  # Clean up
  #---------
  rm(test.list)
  graphics.off()
Plot Probability Density Function
Description
Produce a probability density function (pdf) plot for a user-specified distribution.
Usage
  pdfPlot(distribution = "norm", param.list = list(mean = 0, sd = 1), 
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, 0.001), 
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, 0.001), 
    plot.it = TRUE, add = FALSE, n.points = 1000, pdf.col = "black", 
    pdf.lwd = 3 * par("cex"), pdf.lty = 1, curve.fill = !add, 
    curve.fill.col = "cyan", x.ticks.at.all.x.max = 15, 
    hist.col = ifelse(add, "black", "cyan"), density = 5, 
    digits = .Options$digits, ..., type = "l", main = NULL, xlab = NULL, 
    ylab = NULL, xlim = NULL, ylim = NULL)
Arguments
| distribution | a character string denoting the distribution abbreviation.  The default value is 
 | 
| param.list | a list with values for the parameters of the distribution.  The default value is 
 | 
| left.tail.cutoff | a numeric scalar indicating what proportion of the left-tail of the probability 
distribution to omit from the plot.  For densities with a finite support minimum 
(e.g., Lognormal) the default value is  | 
| right.tail.cutoff | a scalar indicating what proportion of the right-tail of the probability 
distribution to omit from the plot.  For densities with a finite support maximum 
(e.g., Binomial) the default value is  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the probability density curve to the 
existing plot ( | 
| n.points | a numeric scalar specifying at how many evenly-spaced points the probability 
density function will be evaluated.  The default value is  | 
| pdf.col | for continuous distributions, a numeric scalar or character string determining 
the color of the pdf line in the plot.  
The default value is  | 
| pdf.lwd | for continuous distributions, a numeric scalar determining the width of the pdf 
line in the plot.  
The default value is  | 
| pdf.lty | for continuous distributions, a numeric scalar determining the line type of 
the pdf line in the plot.  
The default value is  | 
| curve.fill | for continuous distributions, a logical value indicating whether to fill in 
the area below the probability density curve with the color specified by 
 | 
| curve.fill.col | for continuous distributions, when  | 
| x.ticks.at.all.x.max | a numeric scalar indicating the maximum number of ticks marks on the  | 
| hist.col | for discrete distributions, a numeric scalar or character string indicating 
what color to use to fill in the histogram if  | 
| density | for discrete distributions, a scalar indicating the density of line shading for 
the histogram when  | 
| digits | a scalar indicating how many significant digits to print for the distribution 
parameters.  The default value is  | 
| type,main,xlab,ylab,xlim,ylim,... | additional graphical parameters.  See  | 
Details
The probability density function (pdf) of a random variable X, 
usually denoted f, is defined as:
f(x) = \frac{dF(x)}{dx} \;\;\;\;\;\; (1)
where F is the cumulative distribution function (cdf) of X.  
That is, f(x) is the derivative of the cdf 
F with respect to x (where this derivative exists).
For discrete distributions, the probability density function is simply:
f(x) = Pr(X = x) \;\;\;\;\;\; (2)
In this case, f is sometimes called the probability function or 
probability mass function.
The probability that the random variable X takes on a value in the interval 
[a, b] is simply the (Lebesgue) integral of the pdf evaluated between 
a and b. That is,
Pr(a \le X \le b) = \int_a^b f(x) dx \;\;\;\;\;\; (3)
For discrete distributions, Equation (3) translates to summing up the probabilities of all values in this interval:
Pr(a \le X \le b) = \sum_{x \in [a,b]} f(x) = \sum_{x \in [a,b]} Pr(X = x) \;\;\;\;\;\; (4)
A probability density function (pdf) plot plots the values of the pdf against quantiles of the specified distribution. Theoretical pdf plots are sometimes plotted along with empirical pdf plots (density plots), histograms or bar graphs to visually assess whether data have a particular distribution.
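As a quick numerical illustration of Equation (3), using the standard normal distribution:
  # Sketch: the integral of the pdf between a and b equals the difference 
  # in cdf values.
  a <- -1; b <- 2
  integrate(dnorm, lower = a, upper = b)$value  # approximately 0.8186
  pnorm(b) - pnorm(a)                           # approximately 0.8186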
Value
pdfPlot invisibly returns a list giving coordinates of the points 
that have been or would have been plotted:
| Quantiles | The quantiles used for the plot. | 
| Probability.Densities | The values of the pdf associated with the quantiles. | 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.
See Also
Distribution.df, epdfPlot, cdfPlot.
Examples
  # Plot the pdf of the standard normal distribution 
  #-------------------------------------------------
  dev.new()
  pdfPlot()
  #==========
  # Plot the pdf of the standard normal distribution
  # and a N(2, 2) distribution on the same plot. 
  #-------------------------------------------------
  dev.new()
  pdfPlot(param.list = list(mean=2, sd=2), 
    curve.fill = FALSE, ylim = c(0, dnorm(0)), main = "") 
  pdfPlot(add = TRUE, pdf.col = "red") 
  legend("topright", legend = c("N(2,2)", "N(0,1)"), 
    col = c("black", "red"), lwd = 3 * par("cex")) 
  title("PDF Plots for Two Normal Distributions")
 
  #==========
  # Clean up
  #---------
  graphics.off()
S3 Class "permutationTest"
Description
This class of objects is returned by functions that perform permutation tests.  
Objects of class "permutationTest" are lists that contain information about 
the null and alternative hypotheses, the estimated distribution parameters, the 
test statistic and the p-value.  They also contain the permutation distribution 
of the statistic (or a sample of the permutation distribution).
Details
Objects of S3 class "permutationTest" are returned by any of the 
EnvStats functions that perform permutation tests.  Currently, these are:  
oneSamplePermutationTest,  
twoSamplePermutationTestLocation, and 
twoSamplePermutationTestProportion. 
Value
A legitimate list of class "permutationTest" includes the components 
listed in the help file for htest.object.  In addition, the following 
components must be included in a legitimate list of class "permutationTest":
Required Components 
The following components must be included in a legitimate list of 
class "permutationTest".
| stat.dist | numeric vector containing values of the statistic for the permutation distribution.  
When  | 
| exact | logical scalar indicating whether the exact permutation distribution was used for 
the test ( | 
Optional Components 
The following components may optionally be included in an object 
of class "permutationTest":
| seed | integer or vector of integers indicating the seed that was used for sampling the 
permutation distribution.  This component is present only if  | 
| prob.stat.dist | numeric vector containing the probabilities associated with each element of 
the component  | 
Methods
Generic functions that have methods for objects of class 
"permutationTest" include: 
print, plot.
Note
Since objects of class "permutationTest" are lists, you may extract 
their components with the $ and [[ operators.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
print.permutationTest, plot.permutationTest, 
oneSamplePermutationTest,  
twoSamplePermutationTestLocation,  
twoSamplePermutationTestProportion, Hypothesis Tests.
Examples
  # Create an object of class "permutationTest", then print it and plot it. 
  #------------------------------------------------------------------------
  set.seed(23) 
  dat <- rlogis(10, location = 7, scale = 2) 
  permutationTest.obj <- oneSamplePermutationTest(dat, mu = 5, 
    alternative = "greater", exact = TRUE) 
  mode(permutationTest.obj) 
  #[1] "list" 
  class(permutationTest.obj) 
  #[1] "permutationTest" 
  names(permutationTest.obj) 
  # [1] "statistic"         "parameters"        "p.value"          
  # [4] "estimate"          "null.value"        "alternative"      
  # [7] "method"            "estimation.method" "sample.size"      
  #[10] "data.name"         "bad.obs"           "stat.dist"        
  #[13] "exact" 
  #==========
  # Print the results of the test 
  #------------------------------
  permutationTest.obj 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 5
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 5
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Exact)
  #
  #Estimated Parameter(s):          Mean = 9.977294
  #
  #Data:                            dat
  #
  #Sample Size:                     10
  #
  #Test Statistic:                  Sum(x - 5) = 49.77294
  #
  #P-value:                         0.001953125
  #==========
  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(permutationTest.obj)
  #==========
  # Extract the test statistic
  #---------------------------
  permutationTest.obj$statistic
  #Sum(x - 5) 
  #  49.77294
  #==========
  # Clean up
  #---------
  rm(permutationTest.obj)
  graphics.off()
Plot Results of Box-Cox Transformations
Description
Plot the results of calling the function boxcox, which returns an 
object of class "boxcox".  Three different kinds of plots are available.
The function plot.boxcox is automatically called by plot 
when given an object of class "boxcox".  The names of other functions 
associated with Box-Cox transformations are listed under Data Transformations.
Usage
## S3 method for class 'boxcox'
plot(x, plot.type = "Objective vs. lambda", same.window = TRUE, 
    ask = same.window & plot.type != "Objective vs. lambda", 
    plot.pos.con = 0.375, estimate.params = FALSE, 
    equal.axes = qq.line.type == "0-1" || estimate.params, add.line = TRUE, 
    qq.line.type = "least squares", duplicate.points.method = "standard", 
    points.col = 1, line.col = 1, line.lwd = par("cex"), line.lty = 1, 
    digits = .Options$digits, cex.main = 1.4 * par("cex"), cex.sub = par("cex"), 
    main = NULL, sub = NULL, xlab = NULL, ylab = NULL, xlim = NULL, 
    ylim = NULL, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| points.col | numeric scalar determining the color of the points in the plot.  The default 
value is  | 
The following arguments can be supplied when plot.type="Q-Q Plots", 
plot.type="Tukey M-D Q-Q Plots", or plot.type="All" 
(supplied to qqPlot): 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the Q-Q plots and/or Tukey Mean-Difference Q-Q plots.  
The default value is  | 
| estimate.params | logical scalar indicating whether to compute quantiles based on estimating the 
distribution parameters ( | 
| equal.axes | logical scalar indicating whether to use the same range on the  | 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
| qq.line.type | character string determining what kind of line to add to the plot when  | 
| duplicate.points.method | a character string denoting how to plot points with duplicate  | 
| line.col | numeric scalar determining the color of the line in the plot.  The default value 
is  | 
| line.lwd | numeric scalar determining the width of the line in the plot.  The default value 
is  | 
| line.lty | numeric scalar determining the line type (style) of the line in the plot.  
The default value is  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters and the value of the objective in the sub-title.  The default 
value is the current setting of  | 
Graphics parameters:
| cex.main,cex.sub,main,sub,xlab,ylab,xlim,ylim,... | graphics parameters; see  | 
Details
The function plot.boxcox is a method for the generic function 
plot for the class "boxcox" (see boxcox.object).  
It can be invoked by calling plot and giving it an object of 
class "boxcox" as the first argument, or by calling plot.boxcox 
directly, regardless of the class of the object given as the first argument 
to plot.boxcox.
Plots associated with Box-Cox transformations are produced on the current graphics device. These can be one or all of the following:
- Objective vs. \lambda.
- Observed Quantiles vs. Normal Quantiles (Q-Q Plot) for the transformed observations for each of the values of \lambda.
- Tukey Mean-Difference Q-Q Plots for the transformed observations for each of the values of \lambda.
See the help files for boxcox and qqPlot for more 
information.
Value
plot.boxcox invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
qqPlot, boxcox, boxcox.object, 
print.boxcox, Data Transformations, plot.
Examples
  # Generate 30 observations from a lognormal distribution with 
  # mean=10 and cv=2, call the function boxcox, and then plot 
  # the results.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x <- rlnormAlt(30, mean = 10, cv = 2) 
  # Plot the results based on the PPCC objective
  #---------------------------------------------
  boxcox.list <- boxcox(x)
  dev.new()
  plot(boxcox.list)
  # Look at Q-Q Plots for the candidate values of lambda
  #-----------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE) 
  # Look at Tukey Mean-Difference Q-Q Plots 
  # for the candidate values of lambda
  #----------------------------------------
  plot(boxcox.list, plot.type = "Tukey M-D Q-Q Plots", same.window = FALSE) 
  #==========
  # Clean up
  #---------
  rm(x, boxcox.list)
  graphics.off()
Plot Results of Box-Cox Transformations Based on Type I Censored Data
Description
Plot the results of calling the function boxcoxCensored, 
which returns an object of class 
"boxcoxCensored".  Three different kinds of plots are available.
The function plot.boxcoxCensored is automatically called by plot 
when given an object of class "boxcoxCensored".
Usage
## S3 method for class 'boxcoxCensored'
plot(x, plot.type = "Objective vs. lambda", same.window = TRUE, 
    ask = same.window & plot.type != "Objective vs. lambda", 
    prob.method = "michael-schucany", plot.pos.con = 0.375, 
    estimate.params = FALSE, equal.axes = qq.line.type == "0-1" || estimate.params, 
    add.line = TRUE, qq.line.type = "least squares", 
    duplicate.points.method = "standard", points.col = 1, line.col = 1, 
    line.lwd = par("cex"), line.lty = 1, digits = .Options$digits, 
    cex.main = 1.4 * par("cex"), cex.sub = par("cex"), 
    main = NULL, sub = NULL, xlab = NULL, ylab = NULL, xlim = NULL, 
    ylim = NULL, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| points.col | numeric scalar determining the color of the points in the plot.  The default 
value is  | 
The following arguments can be supplied when plot.type="Q-Q Plots", 
plot.type="Tukey M-D Q-Q Plots", or plot.type="All" 
(supplied to qqPlot): 
| prob.method | character string indicating what method to use to compute the plotting positions 
for Q-Q plots or Tukey Mean-Difference Q-Q plots.  
Possible values are 
 The  This argument is ignored if  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the Q-Q plots and/or Tukey Mean-Difference Q-Q plots.  
The default value is  | 
| estimate.params | logical scalar indicating whether to compute quantiles based on estimating the 
distribution parameters ( | 
| equal.axes | logical scalar indicating whether to use the same range on the  | 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
| qq.line.type | character string determining what kind of line to add to the plot when  | 
| duplicate.points.method | a character string denoting how to plot points with duplicate  | 
| line.col | numeric scalar determining the color of the line in the plot.  The default value 
is  | 
| line.lwd | numeric scalar determining the width of the line in the plot.  The default value 
is  | 
| line.lty | numeric scalar determining the line type (style) of the line in the plot.  
The default value is  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters and the value of the objective in the sub-title.  The default 
value is the current setting of  | 
Graphics parameters:
| cex.main,cex.sub,main,sub,xlab,ylab,xlim,ylim,... | graphics parameters; see  | 
Details
The function plot.boxcoxCensored is a method for the generic function 
plot for the class 
"boxcoxCensored" (see boxcoxCensored.object).  
It can be invoked by calling plot and giving it an object of 
class "boxcoxCensored" as the first argument, or by calling 
plot.boxcoxCensored directly, regardless of the class of the object given 
as the first argument to plot.boxcoxCensored.
Plots associated with Box-Cox transformations are produced on the current graphics device. These can be one or all of the following:
- Objective vs. \lambda.
- Observed Quantiles vs. Normal Quantiles (Q-Q Plot) for the transformed observations for each of the values of \lambda.
- Tukey Mean-Difference Q-Q Plots for the transformed observations for each of the values of \lambda.
See the help files for boxcoxCensored and qqPlotCensored 
for more information.
Value
plot.boxcoxCensored invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
qqPlotCensored, boxcoxCensored, 
boxcoxCensored.object, print.boxcoxCensored, 
Data Transformations, plot.
Examples
  # Generate 15 observations from a lognormal distribution with 
  # mean=10 and cv=2 and censor the observations less than 2.  
  # Then generate 15 more observations from this distribution and 
  # censor the observations less than 4.  
  # Then call the function boxcoxCensored, and then plot the results.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  x.1 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.1 <- x.1 < 2
  x.1[censored.1] <- 2
  x.2 <- rlnormAlt(15, mean = 10, cv = 2) 
  censored.2 <- x.2 < 4
  x.2[censored.2] <- 4
  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)
  # Plot the results based on the PPCC objective
  #---------------------------------------------
  boxcox.list <- boxcoxCensored(x, censored)
  dev.new()
  plot(boxcox.list)
  # Look at Q-Q Plots for the candidate values of lambda
  #-----------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE) 
  # Look at Tukey Mean-Difference Q-Q Plots 
  # for the candidate values of lambda
  #----------------------------------------
  plot(boxcox.list, plot.type = "Tukey M-D Q-Q Plots", same.window = FALSE) 
  #==========
  # Clean up
  #---------
  rm(x.1, censored.1, x.2, censored.2, x, censored, boxcox.list)
  graphics.off()
Plot Results of Box-Cox Transformations for a Linear Model
Description
Plot the results of calling the function boxcox when the argument 
x supplied to boxcox is an object of class "lm".
Three different kinds of plots are available.
The function plot.boxcoxLm is automatically called by plot 
when given an object of class 
"boxcoxLm".  The names of other functions 
associated with Box-Cox transformations are listed under Data Transformations.
Usage
## S3 method for class 'boxcoxLm'
plot(x, plot.type = "Objective vs. lambda", same.window = TRUE, 
    ask = same.window & plot.type != "Objective vs. lambda", 
    plot.pos.con = 0.375, estimate.params = FALSE, 
    equal.axes = qq.line.type == "0-1" || estimate.params, add.line = TRUE, 
    qq.line.type = "least squares", duplicate.points.method = "standard", 
    points.col = 1, line.col = 1, line.lwd = par("cex"), line.lty = 1, 
    digits = .Options$digits, cex.main = 1.4 * par("cex"), cex.sub = par("cex"), 
    main = NULL, sub = NULL, xlab = NULL, ylab = NULL, xlim = NULL, 
    ylim = NULL, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| points.col | numeric scalar determining the color of the points in the plot.  The default 
value is  | 
The following arguments can be supplied when plot.type="Q-Q Plots", 
plot.type="Tukey M-D Q-Q Plots", or plot.type="All" 
(supplied to qqPlot): 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the Q-Q plots and/or Tukey Mean-Difference Q-Q plots.  
The default value is  | 
| estimate.params | logical scalar indicating whether to compute quantiles based on estimating the 
distribution parameters ( | 
| equal.axes | logical scalar indicating whether to use the same range on the  | 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
| qq.line.type | character string determining what kind of line to add to the plot when  | 
| duplicate.points.method | a character string denoting how to plot points with duplicate  | 
| line.col | numeric scalar determining the color of the line in the plot.  The default value 
is  | 
| line.lwd | numeric scalar determining the width of the line in the plot.  The default value 
is  | 
| line.lty | numeric scalar determining the line type (style) of the line in the plot.  
The default value is  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters and the value of the objective in the sub-title.  The default 
value is the current setting of  | 
Graphics parameters:
| cex.main,cex.sub,main,sub,xlab,ylab,xlim,ylim,... | graphics parameters; see  | 
Details
The function plot.boxcoxLm is a method for the generic function 
plot for the class "boxcoxLm" (see boxcoxLm.object).  
It can be invoked by calling plot and giving it an object of 
class "boxcoxLm" as the first argument, or by calling plot.boxcoxLm 
directly, regardless of the class of the object given as the first argument 
to plot.boxcoxLm.
Plots associated with Box-Cox transformations are produced on the current graphics device. These can be one or all of the following:
- Objective vs. \lambda.
- Observed Quantiles vs. Normal Quantiles (Q-Q Plot) for the residuals of the linear model based on transformed values of the response variable for each of the values of \lambda.
- Tukey Mean-Difference Q-Q Plots for the residuals of the linear model based on transformed values of the response variable for each of the values of \lambda.
See the help files for boxcox and qqPlot for more 
information.
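As a brief, hedged sketch (assuming the boxcox.list object created in the Examples section below, and assuming the returned "boxcoxLm" object stores the candidate powers and objective values in components named lambda and objective; see boxcoxLm.object for the actual component names):
  # Identify the candidate power that maximizes the objective.
  # (The lambda and objective component names are assumptions.)
  best.lambda <- boxcox.list$lambda[which.max(boxcox.list$objective)]
  best.lambda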
Value
plot.boxcoxLm invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
qqPlot, boxcox, boxcoxLm.object, 
print.boxcoxLm, Data Transformations, plot.
Examples
  # Create an object of class "boxcoxLm", then plot the results.
  # The data frame Environmental.df contains daily measurements of 
  # ozone concentration, wind speed, temperature, and solar radiation
  # in New York City for 153 consecutive days between May 1 and 
  # September 30, 1973.  In this example, we'll model ozone as a 
  # function of temperature.
  # Fit the model with the raw Ozone data
  #--------------------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Environmental.df) 
  boxcox.list <- boxcox(ozone.fit)
  # Plot PPCC vs. lambda based on Q-Q plots of residuals 
  #-----------------------------------------------------
  dev.new()
  plot(boxcox.list) 
  # Look at Q-Q plots of residuals for the various transformations 
  #--------------------------------------------------------------
  plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE)
  # Look at Tukey Mean-Difference Q-Q plots of residuals 
  # for the various transformations 
  #-----------------------------------------------------
  plot(boxcox.list, plot.type = "Tukey M-D Q-Q Plots", same.window = FALSE)
  #==========
  # Clean up
  #---------
  rm(ozone.fit, boxcox.list)
  graphics.off()
Plot Results of Goodness-of-Fit Test
Description
Plot the results of calling the function gofTest, which returns an 
object of class "gof" when testing the goodness-of-fit of a set of data 
to a distribution (i.e., when supplied with the y argument but not 
the x argument).  Five different kinds of plots are available.
The function plot.gof is automatically called by plot 
when given an object of class "gof".  The names of other functions 
associated with goodness-of-fit tests are listed under Goodness-of-Fit Tests.
Usage
## S3 method for class 'gof'
plot(x, plot.type = "Summary", 
  captions = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL, Results = NULL), 
  x.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
  y.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
  same.window = FALSE, ask = same.window & plot.type == "All", hist.col = "cyan", 
  fitted.pdf.col = "black", fitted.pdf.lwd = 3 * par("cex"), fitted.pdf.lty = 1, 
  plot.pos.con = switch(dist.abb, norm = , lnorm = , lnormAlt = , lnorm3 = 0.375, 
    evd = 0.44, 0.4), ecdf.col = "cyan", fitted.cdf.col = "black", 
  ecdf.lwd = 3 * par("cex"), fitted.cdf.lwd = 3 * par("cex"), ecdf.lty = 1, 
  fitted.cdf.lty = 2, add.line = TRUE, 
  digits = ifelse(plot.type == "Summary", 2, .Options$digits), 
  test.result.font = 1, 
  test.result.cex = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  test.result.mar = c(0, 0, 3, 0) + 0.1, 
  cex.main = ifelse(plot.type == "Summary", 1.2, 1.5) * par("cex"), 
  cex.axis = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  cex.lab = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, 
  add.om.title = TRUE, 
  oma = if (plot.type == "Summary" & add.om.title) c(0, 0, 2.5, 0) else c(0, 0, 0, 0), 
  om.title = NULL, om.font = 2, om.cex.main = 1.75 * par("cex"), om.line = 0.5, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| captions | a list with 1 to 5 components with the names  | 
| x.labels | a list of 1 to 4 components with the names  | 
| y.labels | a list of 1 to 4 components with the names  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters.  If  | 
Arguments associated with plot.type="PDFs: Observed and Fitted": 
| hist.col | a character string or numeric scalar determining the color of the histogram 
used to display the distribution of the observed values.  The default value is 
 | 
| fitted.pdf.col | a character string or numeric scalar determining the color of the fitted PDF 
(which is displayed as a line for continuous distributions and a histogram for 
discrete distributions).  The default value is  | 
| fitted.pdf.lwd | numeric scalar determining the width of the line used to display the fitted PDF.  
The default value is  | 
| fitted.pdf.lty | numeric scalar determining the line type used to display the fitted PDF.  
The default value is  | 
Arguments associated with plot.type="CDFs: Observed and Fitted": 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the observed (empirical) CDF.  The default value of 
 NOTE:  This argument is also used to determine the value of the 
plotting position constant for the Q-Q plot ( | 
| ecdf.col | a character string or numeric scalar determining the color of the line  
used to display the empirical CDF.  The default value is 
 | 
| fitted.cdf.col | a character string or numeric scalar determining the color of the line used 
to display the fitted CDF.  The default value is  | 
| ecdf.lwd | numeric scalar determining the width of the line used to display the empirical CDF.  
The default value is  | 
| fitted.cdf.lwd | numeric scalar determining the width of the line used to display the fitted CDF.  
The default value is  | 
| ecdf.lty | numeric scalar determining the line type used to display the empirical CDF.  
The default value is  | 
| fitted.cdf.lty | numeric scalar determining the line type used to display the fitted CDF.  
The default value is  | 
Arguments associated with plot.type="Q-Q Plot" or 
plot.type="Tukey M-D Q-Q Plot": 
As explained above, plot.pos.con is used for these plot types.  Also: 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
Arguments associated with plot.type="Test Results" 
 
| test.result.font | numeric scalar indicating which font to use to print out the test results.  
The default value is  | 
| test.result.cex | numeric scalar indicating the value of  | 
| test.result.mar | numeric vector indicating the value of  | 
Arguments associated with plot.type="Summary" 
| add.om.title | logical scalar indicating whether to add a title in the outer margin when  | 
| om.title | character string containing the outer margin title.  The default value is  | 
| om.font | numeric scalar indicating the font to use for the outer margin.  The default 
value is  | 
| om.cex.main | numeric scalar indicating the value of  | 
| om.line | numeric scalar indicating the line to place the outer margin title on.  The 
default value is  | 
Graphics parameters: 
| cex.main,cex.axis,cex.lab,main,xlab,ylab,xlim,ylim,oma,... | additional graphics parameters.  See the help file for  | 
Details
The function plot.gof is a method for the generic function 
plot for objects that inherit from class "gof" 
(see gof.object).  
It can be invoked by calling plot and giving it an object of 
class "gof" as the first argument, or by calling plot.gof 
directly, regardless of the class of the object given as the first argument 
to plot.gof.
Plots associated with the goodness-of-fit test are produced on the current graphics device. These can be one or all of the following:
- Observed distribution overlaid with fitted distribution (plot.type="PDFs: Observed and Fitted"). See the help files for hist and pdfPlot.
- Observed empirical distribution overlaid with fitted cumulative distribution (plot.type="CDFs: Observed and Fitted"). See the help file for cdfCompare.
- Observed quantiles vs. fitted quantiles (Q-Q Plot) (plot.type="Q-Q Plot"). See the help file for qqPlot.
- Tukey mean-difference Q-Q plot (plot.type="Tukey M-D Q-Q Plot"). See the help file for qqPlot.
- Results of the goodness-of-fit test (plot.type="Test Results"). See the help file for print.gof.
See the help file for gofTest for more information.
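As a minimal, hedged sketch (assuming gofTest accepts the hypothesized distribution via its distribution argument; see the gofTest help file), a lognormal fit can be displayed with only the CDF comparison:
  # Hypothetical example: test a lognormal fit, then show only the CDF plot.
  set.seed(47)
  dat <- rlnorm(30, meanlog = 0, sdlog = 1)
  gof.lnorm <- gofTest(dat, distribution = "lnorm")
  dev.new()
  plot(gof.lnorm, plot.type = "CDFs: Observed and Fitted")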
Value
plot.gof invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
gofTest, gof.object, print.gof, 
Goodness-of-Fit Tests, plot.
Examples
  # Create an object of class "gof" then plot the results.
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 3, sd = 2) 
  gof.obj <- gofTest(dat) 
  # Summary plot (the default)
  #---------------------------
  dev.new()
  plot(gof.obj)
  # Make your own titles for the summary plot
  #------------------------------------------
  dev.new()
  plot(gof.obj, captions = list(PDFs = "Compare PDFs", 
    CDFs = "Compare CDFs", QQ = "Q-Q Plot", Results = "Results"),
    om.title = "Summary")
  # Just the Q-Q Plot
  #------------------
  dev.new()
  plot(gof.obj, plot.type="Q-Q")
  # Make your own title for the Q-Q Plot
  #-------------------------------------
  dev.new()
  plot(gof.obj, plot.type="Q-Q", main = "Q-Q Plot")
  #==========
  # Clean up
  #---------
  rm(dat, gof.obj)
  graphics.off()
Plot Results of Goodness-of-Fit Test Based on Censored Data
Description
Plot the results of calling the function gofTestCensored, which returns 
an object of class 
"gofCensored" when testing the goodness-of-fit of a set of 
data to a distribution.  Five different kinds of plots are available.
The function plot.gofCensored is automatically called by plot 
when given an object of class "gofCensored".
Usage
## S3 method for class 'gofCensored'
plot(x, plot.type = "Summary", 
  captions = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL, Results = NULL), 
  x.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
  y.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
  same.window = FALSE, ask = same.window & plot.type == "All", hist.col = "cyan", 
  fitted.pdf.col = "black", fitted.pdf.lwd = 3 * par("cex"), fitted.pdf.lty = 1, 
  prob.method = "michael-schucany", plot.pos.con = 0.375, ecdf.col = "cyan", 
  fitted.cdf.col = "black", ecdf.lwd = 3 * par("cex"), 
  fitted.cdf.lwd = 3 * par("cex"), ecdf.lty = 1, fitted.cdf.lty = 2, add.line = TRUE, 
  digits = ifelse(plot.type == "Summary", 2, .Options$digits), test.result.font = 1, 
  test.result.cex = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  test.result.mar = c(0, 0, 3, 0) + 0.1, 
  cex.main = ifelse(plot.type == "Summary", 1.2, 1.5) * par("cex"), 
  cex.axis = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  cex.lab = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
  main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, add.om.title = TRUE, 
  oma = if (plot.type == "Summary" & add.om.title) c(0, 0, 4, 0) else c(0, 0, 0, 0), 
  om.title = NULL, om.font = 2, om.cex.main = 1.5 * par("cex"), om.line = 0, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| captions | a list with 1 to 5 components with the names  | 
| x.labels | a list of 1 to 4 components with the names  | 
| y.labels | a list of 1 to 4 components with the names  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters.  If  | 
Arguments associated with plot.type="PDFs: Observed and Fitted": 
| hist.col | a character string or numeric scalar determining the color of the histogram 
used to display the distribution of the observed values.  The default value is 
 | 
| fitted.pdf.col | a character string or numeric scalar determining the color of the fitted PDF 
(which is displayed as a line for continuous distributions and a histogram for 
discrete distributions).  The default value is  | 
| fitted.pdf.lwd | numeric scalar determining the width of the line used to display the fitted PDF.  
The default value is  | 
| fitted.pdf.lty | numeric scalar determining the line type used to display the fitted PDF.  
The default value is  | 
Arguments associated with plot.type="CDFs: Observed and Fitted": 
| prob.method | character string indicating what method to use to compute the plotting positions 
(empirical probabilities).  Possible values are:  The default value is  The  NOTE:  This argument is also used to determine the plotting position method 
for the Q-Q plot ( | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the observed (empirical) CDF.  The default value is 
 This argument is used only if  NOTE:  This argument is also used to determine the value of the 
plotting position constant for the Q-Q plot ( | 
| ecdf.col | a character string or numeric scalar determining the color of the line  
used to display the empirical CDF.  The default value is 
 | 
| fitted.cdf.col | a character string or numeric scalar determining the color of the line used 
to display the fitted CDF.  The default value is  | 
| ecdf.lwd | numeric scalar determining the width of the line used to display the empirical CDF.  
The default value is  | 
| fitted.cdf.lwd | numeric scalar determining the width of the line used to display the fitted CDF.  
The default value is  | 
| ecdf.lty | numeric scalar determining the line type used to display the empirical CDF.  
The default value is  | 
| fitted.cdf.lty | numeric scalar determining the line type used to display the fitted CDF.  
The default value is  | 
Arguments associated with plot.type="Q-Q Plot" or plot.type="Tukey M-D Q-Q Plot": 
As explained above, prob.method and plot.pos.con are used for these plot 
types.  Also: 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
Arguments associated with plot.type="Test Results" 
 
| test.result.font | numeric scalar indicating which font to use to print out the test results.  
The default value is  | 
| test.result.cex | numeric scalar indicating the value of  | 
| test.result.mar | numeric vector indicating the value of  | 
Arguments associated with plot.type="Summary" 
| add.om.title | logical scalar indicating whether to add a title in the outer margin when  | 
| om.title | character string containing the outer margin title.  The default value is  | 
| om.font | numeric scalar indicating the font to use for the outer margin.  The default 
value is  | 
| om.cex.main | numeric scalar indicating the value of  | 
| om.line | numeric scalar indicating the line to place the outer margin title on.  The 
default value is  | 
Graphics parameters: 
| cex.main,cex.axis,cex.lab,main,xlab,ylab,xlim,ylim,oma,... | additional graphics parameters.  See the help file for  | 
Details
The function plot.gofCensored is a method for the generic function 
plot for objects that inherit from the class 
"gofCensored" (see gofCensored.object).  
It can be invoked by calling plot and giving it an object of 
class "gofCensored" as the first argument, or by calling 
plot.gofCensored directly, regardless of the class of the object given 
as the first argument to plot.gofCensored.
Plots associated with the goodness-of-fit test are produced on the current graphics device. These can be one or all of the following:
- Observed distribution overlaid with fitted distribution (plot.type="PDFs: Observed and Fitted"). See the help files for hist and pdfPlot. Note: This kind of plot is only available for singly-censored data.
- Observed empirical distribution overlaid with fitted cumulative distribution (plot.type="CDFs: Observed and Fitted"). See the help file for cdfCompareCensored.
- Observed quantiles vs. fitted quantiles (Q-Q Plot) (plot.type="Q-Q Plot"). See the help file for qqPlotCensored.
- Tukey mean-difference Q-Q plot (plot.type="Tukey M-D Q-Q Plot"). See the help file for qqPlotCensored.
- Results of the goodness-of-fit test (plot.type="Test Results"). See the help file for print.gofCensored.
See the help file for gofTestCensored for more information.
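As a brief, hedged sketch (assuming the gofCensored.obj created in the Examples section below, and assuming "kaplan-meier" is among the supported prob.method values):
  # Show only the CDF comparison, using Kaplan-Meier plotting positions
  # instead of the default "michael-schucany" method (assumed to be supported).
  plot(gofCensored.obj, plot.type = "CDFs: Observed and Fitted", 
    prob.method = "kaplan-meier")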
Value
plot.gofCensored invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
gofTestCensored, gofCensored.object, 
print.gofCensored, Censored Data, plot.
Examples
  # Create an object of class "gofCensored", then plot the results. 
  #----------------------------------------------------------------
  gofCensored.obj <- with(EPA.09.Ex.15.1.manganese.df,
    gofTestCensored(Manganese.ppb, Censored, test = "sf"))
  mode(gofCensored.obj) 
  #[1] "list" 
  class(gofCensored.obj) 
  #[1] "gofCensored" 
  # Summary plot (the default)
  #---------------------------
  dev.new()
  plot(gofCensored.obj)
  # Make your own titles for the summary plot
  #------------------------------------------
  dev.new()
  plot(gofCensored.obj, captions = list(CDFs = "Compare CDFs", 
    QQ = "Q-Q Plot", Results = "Results"), om.title = "Summary")
  # Just the Q-Q Plot
  #------------------
  dev.new()
  plot(gofCensored.obj, plot.type="Q-Q")
  # Make your own title for the Q-Q Plot
  #-------------------------------------
  dev.new()
  plot(gofCensored.obj, plot.type="Q-Q", main = "Q-Q Plot")
  #==========
  # Clean up
  #---------
  rm(gofCensored.obj)
  graphics.off()
Plot Results of Group Goodness-of-Fit Test
Description
Plot the results of calling the function gofGroupTest, 
which returns an object of class "gofGroup" when performing a 
goodness-of-fit test to determine whether data in a set of 
groups appear to all come from the same probability distribution 
(with possibly different parameters for each group).  
Five different kinds of plots are available.
The function plot.gofGroup is automatically called by plot 
when given an object of class 
"gofGroup".  The names of other functions 
associated with goodness-of-fit tests are listed under Goodness-of-Fit Tests.
Usage
## S3 method for class 'gofGroup'
plot(x, plot.type = "Summary", 
    captions = list(QQ = NULL, MDQQ = NULL, ScoresQQ = NULL, ScoresMDQQ = NULL, 
      Results = NULL), 
    x.labels = list(QQ = NULL, MDQQ = NULL, ScoresQQ = NULL, ScoresMDQQ = NULL), 
    y.labels = list(QQ = NULL, MDQQ = NULL, ScoresQQ = NULL, ScoresMDQQ = NULL), 
    same.window = FALSE, ask = same.window & plot.type == "All", add.line = TRUE, 
    digits = ifelse(plot.type == "Summary", 2, .Options$digits), test.result.font = 1, 
    test.result.cex = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    test.result.mar = c(0, 0, 3, 0) + 0.1, individual.p.values = FALSE, 
    cex.main = ifelse(plot.type == "Summary", 1.2, 1.5) * par("cex"), 
    cex.axis = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    cex.lab = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, add.om.title = TRUE, 
    oma = if (plot.type == "Summary" & add.om.title) c(0, 0, 5, 0) else c(0, 0, 0, 0), 
    om.title = NULL, om.font = 2, om.cex.main = 1.5 * par("cex"), om.line = 1, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| captions | a list with 1 to 5 components with the names  | 
| x.labels | a list of 1 to 4 components with the names  | 
| y.labels | a list of 1 to 4 components with the names  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
Arguments associated with plot.type="Test Results" 
| digits | scalar indicating how many significant digits to print for the test results 
when  | 
| individual.p.values | logical scalar indicating whether to display the p-values associated with 
each individual group.  The default value is  | 
| test.result.font | numeric scalar indicating which font to use to print out the test results.  
The default value is  | 
| test.result.cex | numeric scalar indicating the value of  | 
| test.result.mar | numeric vector indicating the value of  | 
Arguments associated with plot.type="Summary" 
| add.om.title | logical scalar indicating whether to add a title in the outer margin when  | 
| om.title | character string containing the outer margin title.  The default value is  | 
| om.font | numeric scalar indicating the font to use for the outer margin.  The default 
value is  | 
| om.cex.main | numeric scalar indicating the value of  | 
| om.line | numeric scalar indicating the line to place the outer margin title on.  The 
default value is  | 
Graphics parameters: 
| cex.main,cex.axis,cex.lab,main,xlab,ylab,xlim,ylim,oma,... | additional graphics parameters.  See the help file for  | 
Details
The function plot.gofGroup is a method for the generic function 
plot for the class "gofGroup" (see 
gofGroup.object).  
It can be invoked by calling plot and giving it an object of 
class "gofGroup" as the first argument, or by calling 
plot.gofGroup directly, regardless of the class of the object given 
as the first argument to plot.gofGroup.
Plots associated with the goodness-of-fit test are produced on the current graphics device. These can be one or all of the following:
- plot.type="Q-Q Plot": Q-Q Plot of observed p-values vs. quantiles from a Uniform [0,1] distribution. See the help file for qqPlot.
- plot.type="Tukey M-D Q-Q Plot": Tukey mean-difference Q-Q plot for observed p-values and quantiles from a Uniform [0,1] distribution. See the help file for qqPlot.
- plot.type="Scores Q-Q Plot": Q-Q Plot of Normal scores vs. quantiles from a Normal(0,1) distribution, or Q-Q Plot of Chisquare scores vs. quantiles from a Chisquare distribution with 2 degrees of freedom. See the help file for qqPlot.
- plot.type="Scores Tukey M-D Q-Q Plot": Tukey mean-difference Q-Q plot based on Normal scores or Chisquare scores. See the help file for qqPlot.
- Results of the goodness-of-fit test (plot.type="Test Results"). See the help file for print.gofGroup.
See the help file for gofGroupTest for more information.
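As a brief sketch (assuming the gofGroup.obj created in the Examples section below):
  # Show the Q-Q plot of the group-wise scores, then the test results
  # with the p-value for each individual group displayed.
  plot(gofGroup.obj, plot.type = "Scores Q-Q Plot")
  plot(gofGroup.obj, plot.type = "Test Results", individual.p.values = TRUE)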
Value
plot.gofGroup invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
gofGroupTest, gofGroup.object, 
print.gofGroup, 
Goodness-of-Fit Tests, plot.
Examples
  # Create an object of class "gofGroup" then plot it.
  # Example 10-4 of USEPA (2009, page 10-20) gives an example of 
  # simultaneously testing the assumption of normality for nickel 
  # concentrations (ppb) in groundwater collected at 4 monitoring 
  # wells over 5 months.  The data for this example are stored in 
  # EPA.09.Ex.10.1.nickel.df.
  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #...
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0
  # Test for a normal distribution at each well:
  #--------------------------------------------
  gofGroup.obj <- gofGroupTest(Nickel.ppb ~ Well, 
    data = EPA.09.Ex.10.1.nickel.df)
  dev.new()
  plot(gofGroup.obj)
  # Make your own titles for the summary plot
  #------------------------------------------
  dev.new()
  plot(gofGroup.obj, captions = list(QQ = "Q-Q Plot", 
    ScoresQQ = "Scores Q-Q Plot", Results = "Results"),
    om.title = "Summary Plot")
  # Just the Q-Q Plot
  #------------------
  dev.new()
  plot(gofGroup.obj, plot.type="Q-Q")
  # Make your own title for the Q-Q Plot
  #-------------------------------------
  dev.new()
  plot(gofGroup.obj, plot.type="Q-Q", main = "Q-Q Plot")
  #==========
  # Clean up
  #---------
  rm(gofGroup.obj)
  graphics.off()
Plot Results of Goodness-of-Fit Test to Compare Two Samples
Description
Plot the results of calling the function gofTest to compare 
two samples.  gofTest returns an object of class "gofTwoSample" 
when supplied with both the arguments y and x.  
plot.gofTwoSample provides five different kinds of plots.
The function plot.gofTwoSample is automatically called by plot 
when given an object of class "gofTwoSample".  The names of other functions 
associated with goodness-of-fit tests are listed under Goodness-of-Fit Tests.
Usage
## S3 method for class 'gofTwoSample'
plot(x, plot.type = "Summary", 
    captions = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL, Results = NULL), 
    x.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
    y.labels = list(PDFs = NULL, CDFs = NULL, QQ = NULL, MDQQ = NULL), 
    same.window = FALSE, ask = same.window & plot.type == "All", x.points.col = "blue", 
    y.points.col = "black", points.pch = 1, jitter.points = TRUE, discrete = FALSE, 
    plot.pos.con = 0.375, x.ecdf.col = "blue", y.ecdf.col = "black", 
    x.ecdf.lwd = 3 * par("cex"), y.ecdf.lwd = 3 * par("cex"), x.ecdf.lty = 1, 
    y.ecdf.lty = 4, add.line = TRUE, 
    digits = ifelse(plot.type == "Summary", 2, .Options$digits), test.result.font = 1, 
    test.result.cex = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    test.result.mar = c(0, 0, 3, 0) + 0.1, 
    cex.main = ifelse(plot.type == "Summary", 1.2, 1.5) * par("cex"), 
    cex.axis = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    cex.lab = ifelse(plot.type == "Summary", 0.9, 1) * par("cex"), 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, 
    add.om.title = TRUE, 
    oma = if (plot.type == "Summary" & add.om.title) c(0, 0, 4, 0) else c(0, 0, 0, 0), 
    om.title = NULL, om.font = 2, om.cex.main = 1.5 * par("cex"), om.line = 0, ...)
Arguments
| x | an object of class  | 
| plot.type | character string indicating what kind of plot to create.  Only one particular 
plot type will be created, unless  | 
| captions | a list with 1 to 5 components with the names  | 
| x.labels | a list of 1 to 4 components with the names  | 
| y.labels | a list of 1 to 4 components with the names  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
Arguments associated with plot.type="PDFs: Observed": 
| x.points.col | a character string or numeric scalar determining the color of the plotting symbol  
used to display the distribution of the observed  | 
| y.points.col | a character string or numeric scalar determining the color of the plotting symbol  
used to display the distribution of the observed  | 
| points.pch | a character string or numeric scalar determining the plotting symbol  
used to display the distribution of the observed  | 
| jitter.points | logical scalar indicating whether to jitter the points in the strip chart.  
The default value is  | 
Arguments associated with plot.type="CDFs: Observed": 
| discrete | logical scalar indicating whether the two distributions are considered to be 
discrete ( | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position 
constant used to construct the observed (empirical) CDFs.  The default value  
is  NOTE:  This argument is also used to determine the value of the 
plotting position constant for the Q-Q plot ( | 
| x.ecdf.col | a character string or numeric scalar determining the color of the line  
used to display the empirical CDF for the  | 
| y.ecdf.col | a character string or numeric scalar determining the color of the line  
used to display the empirical CDF for the  | 
| x.ecdf.lwd | numeric scalar determining the width of the line used to display the empirical CDF 
for the  | 
| y.ecdf.lwd | numeric scalar determining the width of the line used to display the empirical CDF 
for the  | 
| x.ecdf.lty | numeric scalar determining the line type used to display the empirical CDF for the 
 | 
| y.ecdf.lty | numeric scalar determining the line type used to display the empirical CDF for the 
 | 
Arguments associated with plot.type="Q-Q Plot" or 
plot.type="Tukey M-D Q-Q Plot": 
As explained above, plot.pos.con is used for these plot types.  Also: 
| add.line | logical scalar indicating whether to add a line to the plot.  If  | 
Arguments associated with plot.type="Test Results" 
 
| digits | scalar indicating how many significant digits to print for the test results 
when  | 
| test.result.font | numeric scalar indicating which font to use to print out the test results.  
The default value is  | 
| test.result.cex | numeric scalar indicating the value of  | 
| test.result.mar | numeric vector indicating the value of  | 
Arguments associated with plot.type="Summary" 
| add.om.title | logical scalar indicating whether to add a title in the outer margin when  | 
| om.title | character string containing the outer margin title.  The default value is  | 
| om.font | numeric scalar indicating the font to use for the outer margin.  The default 
value is  | 
| om.cex.main | numeric scalar indicating the value of  | 
| om.line | numeric scalar indicating the line to place the outer margin title on.  The 
default value is  | 
Graphics parameters: 
| cex.main,cex.axis,cex.lab,main,xlab,ylab,xlim,ylim,oma,... | additional graphics parameters.  See the help file for  | 
Details
The function plot.gofTwoSample is a method for the generic function 
plot for the class 
"gofTwoSample" (see gofTwoSample.object).  
It can be invoked by calling plot and giving it an object of 
class "gofTwoSample" as the first argument, or by calling 
plot.gofTwoSample directly, regardless of the class of the object given 
as the first argument to plot.gofTwoSample.
Plots associated with the goodness-of-fit test are produced on the current graphics device. These can be one or all of the following:
- Observed distributions (plot.type="PDFs: Observed").
- Observed CDFs (plot.type="CDFs: Observed"). See the help file for cdfCompare.
- Q-Q Plot (plot.type="Q-Q Plot"). See the help file for qqPlot.
- Tukey mean-difference Q-Q plot (plot.type="Tukey M-D Q-Q Plot"). See the help file for qqPlot.
- Results of the goodness-of-fit test (plot.type="Test Results"). See the help file for print.gofTwoSample.
See the help file for gofTest for more information.
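As a brief sketch (assuming the gof.obj created in the Examples section below):
  # Compare only the two empirical CDFs, drawing the CDF of the x sample in red.
  plot(gof.obj, plot.type = "CDFs: Observed", x.ecdf.col = "red")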
Value
plot.gofTwoSample invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
gofTest, gofTwoSample.object, 
print.gofTwoSample, 
Goodness-of-Fit Tests, plot.
Examples
  # Create an object of class "gofTwoSample" then plot the results.
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(300) 
  dat1 <- rnorm(20, mean = 3, sd = 2) 
  dat2 <- rnorm(10, mean = 1, sd = 2) 
  gof.obj <- gofTest(x = dat1, y = dat2) 
  # Summary plot (the default)
  #---------------------------
  dev.new()
  plot(gof.obj)
  # Make your own titles for the summary plot
  #------------------------------------------
  dev.new()
  plot(gof.obj, captions = list(PDFs = "Compare PDFs", 
    CDFs = "Compare CDFs", QQ = "Q-Q Plot", Results = "Results"),
    om.title = "Summary Plot")
  # Just the Q-Q Plot
  #------------------
  dev.new()
  plot(gof.obj, plot.type="Q-Q")
  # Make your own title for the Q-Q Plot
  #-------------------------------------
  dev.new()
  plot(gof.obj, plot.type="Q-Q", main = "Q-Q Plot")
  #==========
  # Clean up
  #---------
  rm(dat1, dat2, gof.obj)
  graphics.off()
Plot Results of Permutation Test
Description
Plot the results of calling functions that return an object of class 
"permutationTest".  Currently, the EnvStats functions that perform 
permutation tests and produce objects of class 
"permutationTest" are:  oneSamplePermutationTest,  
twoSamplePermutationTestLocation, and 
twoSamplePermutationTestProportion. 
The function plot.permutationTest is automatically called by 
plot when given an object of class "permutationTest".  
Usage
## S3 method for class 'permutationTest'
plot(x, hist.col = "cyan", stat.col = "black", 
  stat.lwd = 3 * par("cex"), stat.lty = 1, cex.main = par("cex"), 
  digits = .Options$digits, main = NULL, xlab = NULL, ylab = NULL, 
  xlim = NULL, ylim = NULL, ...) 
Arguments
| x | an object of class  | 
| hist.col | a character string or numeric scalar determining the color of the histogram 
used to display the permutation distribution.  The default 
value is  | 
| stat.col | a character string or numeric scalar determining the color of the line indicating 
the value of the observed test statistic.  The default value is 
 | 
| stat.lwd | numeric scalar determining the width of the line indicating the value of the 
observed test statistic.  The default value is  | 
| stat.lty | numeric scalar determining the line type used to display the value of the 
observed test statistic.  The default value is  | 
| digits | scalar indicating how many significant digits to print for the distribution 
parameters.  The default value is  | 
| cex.main,main,xlab,ylab,xlim,ylim,... | graphics parameters.  See the help file for  | 
Details
Produces a plot displaying the permutation distribution (exact=TRUE) or a 
sample of the permutation distribution (exact=FALSE), and a line indicating 
the observed value of the test statistic.  The title in the plot includes 
information on the data used, null hypothesis, and p-value.
The function plot.permutationTest is a method for the generic function 
plot for the class "permutationTest" 
(see permutationTest.object).  It can be invoked by calling 
plot and giving it an object of 
class "permutationTest" as the first argument, or by calling 
plot.permutationTest directly, regardless of the class of the object given 
as the first argument to plot.permutationTest.
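As a brief, hedged sketch of the exact=FALSE case (a sampled, rather than exhaustive, permutation distribution), assuming oneSamplePermutationTest accepts exact = FALSE as described above:
  # Hypothetical example: plot a sampled (approximate) permutation distribution.
  set.seed(47)
  dat <- rlogis(20, location = 7, scale = 2)
  ptest.approx <- oneSamplePermutationTest(dat, mu = 5, 
    alternative = "greater", exact = FALSE)
  dev.new()
  plot(ptest.approx, hist.col = "lightblue")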
Value
plot.permutationTest invisibly returns the first argument, x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
permutationTest.object, print.permutationTest, 
oneSamplePermutationTest,  
twoSamplePermutationTestLocation,  
twoSamplePermutationTestProportion,
Hypothesis Tests, plot.
Examples
  # Create an object of class "permutationTest", then print it and plot it.
  # (Note:  the call to set.seed() allows you to reproduce this example.) 
  #------------------------------------------------------------------------
  set.seed(23) 
  dat <- rlogis(10, location = 7, scale = 2) 
  permutationTest.obj <- oneSamplePermutationTest(dat, mu = 5, 
    alternative = "greater", exact = TRUE) 
  mode(permutationTest.obj) 
  #[1] "list" 
  class(permutationTest.obj) 
  #[1] "permutationTest" 
  names(permutationTest.obj) 
  # [1] "statistic"         "parameters"        "p.value"          
  # [4] "estimate"          "null.value"        "alternative"      
  # [7] "method"            "estimation.method" "sample.size"      
  #[10] "data.name"         "bad.obs"           "stat.dist"        
  #[13] "exact" 
  #==========
  # Print the results of the test 
  #------------------------------
  permutationTest.obj 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 5
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 5
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Exact)
  #
  #Estimated Parameter(s):          Mean = 9.977294
  #
  #Data:                            dat
  #
  #Sample Size:                     10
  #
  #Test Statistic:                  Sum(x - 5) = 49.77294
  #
  #P-value:                         0.001953125
  #==========
  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(permutationTest.obj)
  #==========
  # Extract the test statistic
  #---------------------------
  permutationTest.obj$statistic
  #Sum(x - 5) 
  #  49.77294
  #==========
  # Clean up
  #---------
  rm(permutationTest.obj)
  graphics.off()
Create Plots for a Sampling Design Based on a One-Way Fixed-Effects Analysis of Variance
Description
Create plots involving sample size, power, scaled difference, and significance level for a one-way fixed-effects analysis of variance.
Usage
  plotAovDesign(x.var = "n", y.var = "power", range.x.var = NULL, 
    n.vec = c(25, 25), mu.vec = c(0, 1), sigma = 1, alpha = 0.05, power = 0.95, 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, 
    add = FALSE, n.points = 50, plot.col = 1, plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, main = NULL, xlab = NULL, ylab = NULL, 
    type = "l", ...)
Arguments
| x.var | character string indicating what variable to use for the x-axis.  Possible values are 
 | 
| y.var | character string indicating what variable to use for the y-axis.  Possible values are 
 | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  
The default value depends on the value of  | 
| n.vec | numeric vector indicating the sample size for each group.  The default value is 
 | 
| mu.vec | numeric vector indicating the population mean for each group.  The default value is 
 | 
| sigma | numeric scalar indicating the population standard deviation for all groups.  The default
value is  | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level associated with the 
hypothesis test.  The default value is  | 
| power | numeric scalar between 0 and 1 indicating the power associated with the hypothesis 
test.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample 
size(s) to the next largest integer.  The default value is FALSE.  This argument 
is ignored unless  | 
| n.max | for the case when  | 
| tol | for the case when  | 
| maxiter | for the case when  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot 
( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  There are 
 | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for aovPower and aovN 
for information on how to compute the power and sample size for a 
one-way fixed-effects analysis of variance.
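As a brief, hedged sketch of those companion computations (assuming aovN and aovPower accept the group means, common standard deviation, significance level, and power via arguments named as below; see their help files):
  # Hypothetical example: required sample size per group, then the power
  # achieved with 10 observations per group, for three group means.
  aovN(mu.vec = c(0, 0.5, 1), sigma = 1, alpha = 0.05, power = 0.95)
  aovPower(n.vec = c(10, 10, 10), mu.vec = c(0, 0.5, 1), sigma = 1, alpha = 0.05)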
Value
plotAovDesign invisibly returns a list with components:
| x.var | x-coordinates of the points that have been or would have been plotted | 
| y.var | y-coordinates of the points that have been or would have been plotted | 
Note
The normal and lognormal distributions are probably the two most frequently used distributions for modeling environmental data. Sometimes it is necessary to compare several means to determine whether any are significantly different from each other (e.g., USEPA, 2009, p.6-38). In this case, assuming normally distributed data, you perform a one-way parametric analysis of variance.
In the course of designing a sampling program, an environmental 
scientist may wish to determine the relationship between sample 
size, Type I error level, power, and differences in means if 
one of the objectives of the sampling program is to determine 
whether a particular mean differs from a group of means.  The 
functions aovPower, aovN, and 
plotAovDesign can be used to investigate these 
relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapter 17.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 27, 29, 30.
Scheffe, H. (1959). The Analysis of Variance. John Wiley and Sons, New York, 477pp.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 10.
See Also
aovPower, aovN, par.
Examples
  # Look at the relationship between power and sample size 
  # for a one-way ANOVA, assuming k=2 groups, group means of 
  # 0 and 1, a population standard deviation of 1, and a 
  # 5% significance level:
  dev.new()
  plotAovDesign()
  #--------------------------------------------------------------------
  # Plot power vs. sample size for various levels of significance:
  dev.new()
  plotAovDesign(mu.vec = c(0, 0.5, 1), ylim=c(0, 1), main="") 
  plotAovDesign(mu.vec = c(0, 0.5, 1), alpha=0.1, add=TRUE, plot.col=2) 
  plotAovDesign(mu.vec = c(0, 0.5, 1), alpha=0.2, add=TRUE, plot.col=3) 
  legend(35, 0.6, c("20%", "10%", "   5%"), lty=1, lwd = 3, col=3:1, 
    bty = "n") 
  mtext("Power vs. Sample Size for One-Way ANOVA", line = 3, cex = 1.25)
  mtext(expression(paste("with ", mu, "=(0, 0.5, 1), ", sigma, 
    "=1, and Various Significance Levels", sep="")), 
    line = 1.5, cex = 1.25)
  #--------------------------------------------------------------------
  # The example on pages 5-11 to 5-14 of USEPA (1989b) shows 
  # log-transformed concentrations of lead (mg/L) at two 
  # background wells and four compliance wells, where 
  # observations were taken once per month over four months 
  # (the data are stored in EPA.89b.loglead.df).  
  # Assume the true mean levels at each well are 
  # 3.9, 3.9, 4.5, 4.5, 4.5, and 5, respectively.  Plot the 
  # power vs. sample size of a one-way ANOVA to test for mean 
  # differences between wells.  Use alpha=0.05, and assume the 
  # true standard deviation is equal to the one estimated 
  # from the data in this example.
  names(EPA.89b.loglead.df) 
  #[1] "LogLead"   "Month"     "Well"      "Well.type"
  # Perform the ANOVA and get the estimated sd 
  aov.list <- aov(LogLead ~ Well, data=EPA.89b.loglead.df) 
  summary(aov.list) 
  #            Df Sum Sq Mean Sq F value  Pr(>F)  
  #Well         5 5.7447 1.14895  3.3469 0.02599 *
  #Residuals   18 6.1791 0.34328                  
  #---
  #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  # Now create the plot 
  dev.new()
  plotAovDesign(range.x.var = c(2, 20), 
    mu.vec = c(3.9,3.9,4.5,4.5,4.5,5), 
    sigma=sqrt(0.34), 
    ylim = c(0, 1), digits=2)
  # Clean up
  #---------
  rm(aov.list)
  graphics.off()
Plots for Sampling Design Based on Confidence Interval for Binomial Proportion or Difference Between Two Proportions
Description
Create plots for a sampling design based on a confidence interval for a binomial proportion or the difference between two proportions.
Usage
  plotCiBinomDesign(x.var = "n", y.var = "half.width", 
    range.x.var = NULL, n.or.n1 = 25, p.hat.or.p1.hat = 0.5, 
    n2 = n.or.n1, p2.hat = 0.4, ratio = 1, half.width = 0.05, 
    conf.level = 0.95, sample.type = "one.sample", ci.method = "score", 
    correct = TRUE, warn = TRUE, n.or.n1.min = 2, 
    n.or.n1.max = 10000, tol.half.width = 0.005, tol.p.hat = 0.005, 
    maxiter = 10000, plot.it = TRUE, add = FALSE, n.points = 100, 
    plot.col = 1, plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, 
    main = NULL, xlab = NULL, ylab = NULL, type = "l", ...)
Arguments
| x.var | character string indicating what variable to use for the x-axis.  Possible values are 
 | 
| y.var | character string indicating what variable to use for the y-axis.  Possible values are 
 | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  
The default value depends on the value of  | 
| n.or.n1 | numeric scalar indicating the sample size.  The default value is  | 
| p.hat.or.p1.hat | numeric scalar indicating an estimated proportion.   | 
| n2 | numeric scalar indicating the sample size for group 2.  The default value is the value of  | 
| p2.hat | numeric scalar indicating the estimated proportion for group 2. 
Missing ( | 
| ratio | numeric vector indicating the ratio of sample size in group 2 to sample size in group 1 ( | 
| half.width | positive numeric scalar indicating the half-width of the confidence interval.  
The default value is  | 
| conf.level | a numeric scalar between 0 and 1 indicating the confidence level associated with the confidence intervals.  
The default value is  | 
| sample.type | character string indicating whether this is a one-sample or two-sample confidence interval.  
When  | 
| ci.method | character string indicating which method to use to construct the confidence interval.  
Possible values are  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| warn | logical scalar indicating whether to issue a warning when  | 
| n.or.n1.min | for the case when  | 
| n.or.n1.max | for the case when  | 
| tol.half.width | for the case when  | 
| tol.p.hat | for the case when  | 
| maxiter | for the case when  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see description of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot 
( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for ciBinomHalfWidth and ciBinomN 
for information on how to compute a one-sample confidence interval for 
a single binomial proportion or a two-sample confidence interval for the difference between 
two proportions, how the half-width is computed when other quantities are fixed, and how 
the sample size is computed when other quantities are fixed.
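As a brief, hedged sketch of those companion computations (assuming ciBinomHalfWidth and ciBinomN accept arguments named as in the usage above; see their help files):
  # Hypothetical example: half-width achieved with n = 25 and an estimated
  # proportion of 0.5, then the sample size needed for a half-width of 0.05.
  ciBinomHalfWidth(n.or.n1 = 25, p.hat.or.p1.hat = 0.5, conf.level = 0.95)
  ciBinomN(half.width = 0.05, p.hat.or.p1.hat = 0.5, conf.level = 0.95)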
Value
plotCiBinomDesign invisibly returns a list with components:
| x.var | x-coordinates of the points that have been or would have been plotted | 
| y.var | y-coordinates of the points that have been or would have been plotted | 
Note
The binomial distribution is used to model processes with binary 
(Yes-No, Success-Failure, Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any 
one trial is independent of any other trial, and that the probability of “success”, p, 
is the same on each trial.  A binomial discrete random variable X is the number of 
“successes” in n independent trials.  A special case of the binomial distribution 
occurs when n=1, in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model 
the proportion of times a chemical concentration exceeds a set standard in a given period of time 
(e.g., Gilbert, 1987, p.143), or to compare the proportion of detects in a compliance well vs. a 
background well (e.g., USEPA, 1989b, Chapter 8, p.3-7).  (However, USEPA 2009, p.8-27 
recommends using the Wilcoxon rank sum test (wilcox.test) instead of 
comparing proportions.)
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives of 
the sampling program is to produce confidence intervals.  The functions ciBinomHalfWidth, 
ciBinomN, and plotCiBinomDesign can be used to investigate these 
relationships for the case of binomial proportions.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Newcombe, R.G. (1998b). Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine, 17, 873–890.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
ciBinomHalfWidth, ciBinomN,  
ebinom, binom.test, prop.test,
par.
Examples
  # Look at the relationship between half-width and sample size 
  # for a one-sample confidence interval for a binomial proportion, 
  # assuming an estimated proportion of 0.5 and a confidence level of 
  # 95%.  The jigsaw appearance of the plot is the result of using the 
  # score method:
  dev.new()
  plotCiBinomDesign()
  #----------
  # Redo the example above, but use the traditional (and inaccurate)
  # Wald method.
  dev.new()
  plotCiBinomDesign(ci.method = "Wald")
  #--------------------------------------------------------------------
  # Plot sample size vs. the estimated proportion for various half-widths, 
  # using a 95% confidence level and the adjusted Wald method:
  # NOTE:  This example takes several seconds to run so it has been 
  #        commented out.  Simply remove the pound signs (#) from in front 
  #        of the R commands to run it.
  #dev.new()
  #plotCiBinomDesign(x.var = "p.hat", y.var = "n", 
  #    half.width = 0.04, ylim = c(0, 600), main = "",
  #    xlab = expression(hat(p))) 
  #
  #plotCiBinomDesign(x.var = "p.hat", y.var = "n", 
  #    half.width = 0.05, add = TRUE, plot.col = 2) 
  #
  #plotCiBinomDesign(x.var = "p.hat", y.var = "n", 
  #    half.width = 0.06, add = TRUE, plot.col = 3) 
  #
  #legend(0.5, 150, paste("Half-Width =", c(0.04, 0.05, 0.06)), 
  #    lty = rep(1, 3), lwd = rep(2, 3), col=1:3, bty = "n") 
  #
  #mtext(expression(paste("Sample Size vs. ", hat(p), 
  #  " for Confidence Interval for p")), line = 2.5, cex = 1.25)
  #mtext("with Confidence=95%  and Various Values of Half-Width", 
  #  line = 1.5, cex = 1.25)
  #mtext(paste("CI Method = Score Normal Approximation", 
  #  "with Continuity Correction"), line = 0.5)
  #--------------------------------------------------------------------
  # Modifying the example on pages 8-5 to 8-7 of USEPA (1989b), 
  # look at the relationship between half-width and sample size 
  # for a 95% confidence interval for the difference between the 
  # proportion of detects at the background and compliance wells. 
  # Use the estimated proportion of detects from the original data. 
  # (The data are stored in EPA.89b.cadmium.df.)  
  # Assume equal sample sizes at each well.
  EPA.89b.cadmium.df
  #   Cadmium.orig Cadmium Censored  Well.type
  #1           0.1   0.100    FALSE Background
  #2          0.12   0.120    FALSE Background
  #3           BDL   0.000     TRUE Background
  # ..........................................
  #86          BDL   0.000     TRUE Compliance
  #87          BDL   0.000     TRUE Compliance
  #88          BDL   0.000     TRUE Compliance
  p.hat.back <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Background"]))
  p.hat.back 
  #[1] 0.3333333 
  p.hat.comp <- with(EPA.89b.cadmium.df,  
    mean(!Censored[Well.type=="Compliance"]))
  p.hat.comp 
  #[1] 0.375 
  dev.new()
  plotCiBinomDesign(p.hat.or.p1.hat = p.hat.back, 
      p2.hat = p.hat.comp, digits=3) 
  #==========
  # Clean up
  #---------
  rm(p.hat.back, p.hat.comp)
  graphics.off()
Plots for Sampling Design Based on Confidence Interval for Mean of a Normal Distribution or Difference Between Two Means
Description
Create plots involving sample size, half-width, estimated standard deviation, and confidence level for a confidence interval for the mean of a normal distribution or the difference between two means.
Usage
  plotCiNormDesign(x.var = "n", y.var = "half.width", 
    range.x.var = NULL, n.or.n1 = 25, n2 = n.or.n1, 
    half.width = sigma.hat/2, sigma.hat = 1, conf.level = 0.95, 
    sample.type = ifelse(missing(n2), "one.sample", "two.sample"), 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000,
    plot.it = TRUE, add = FALSE, n.points = 100,
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, 
    main = NULL, xlab = NULL, ylab = NULL, type = "l", ...)
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  
The default value depends on the value of  | 
| n.or.n1 | numeric scalar indicating the sample size.  The default value is  | 
| n2 | numeric scalar indicating the sample size for group 2.  
The default value is the value of  | 
| half.width | positive numeric scalar indicating the half-width of the confidence interval.  
The default value is  | 
| sigma.hat | positive numeric scalar specifying the estimated standard deviation.  
The default value is  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the confidence interval.  
The default value is  | 
| sample.type | character string indicating whether this is a one-sample or two-sample confidence interval.  | 
| round.up | logical scalar indicating whether to round up the computed sample sizes to the next largest integer.  
The default value is  | 
| n.max | for the case when  | 
| tol | for the case when  | 
| maxiter | for the case when  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for ciNormHalfWidth and ciNormN 
for information on how to compute a one-sample confidence interval for the mean of 
a normal distribution or a two-sample confidence interval for the difference between 
two means, how the half-width is computed when other quantities are fixed, and how the 
sample size is computed when other quantities are fixed.
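For illustration, here is a minimal sketch of the kind of computation that underlies each plotted point. It assumes that ciNormHalfWidth and ciNormN accept the n.or.n1, half.width, sigma.hat, and conf.level arguments mirrored in the Usage section above; see those help files for the authoritative argument lists.
  # Sketch with assumed argument names.
  # Half-width of a one-sample 95% confidence interval for the mean,
  # based on n = 25 observations and an estimated standard deviation of 1:
  ciNormHalfWidth(n.or.n1 = 25, sigma.hat = 1, conf.level = 0.95)
  # Sample size required to achieve a half-width of 0.5 under the
  # same assumptions:
  ciNormN(half.width = 0.5, sigma.hat = 1, conf.level = 0.95)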
Value
plotCiNormDesign invisibly returns a list with components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with confidence intervals.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, confidence level, and half-width if one of the objectives 
of the sampling program is to produce confidence intervals.  The functions 
ciNormHalfWidth, ciNormN, and plotCiNormDesign 
can be used to investigate these relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.
See Also
ciNormHalfWidth, ciNormN, Normal, 
enorm, t.test, 
Estimating Distribution Parameters.
Examples
  # Look at the relationship between half-width and sample size 
  # for a one-sample confidence interval for the mean, assuming 
  # an estimated standard deviation of 1 and a confidence level of 95%.
  dev.new()
  plotCiNormDesign()
  #--------------------------------------------------------------------
  # Plot sample size vs. the estimated standard deviation for 
  # various levels of confidence, using a half-width of 0.5.
  dev.new()
  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", main = "") 
  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", conf.level = 0.9, 
    add = TRUE, plot.col = 2) 
  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", conf.level = 0.8, 
    add = TRUE, plot.col = 3) 
  legend(0.25, 60, c("95%", "90%", "80%"), lty = 1, lwd = 3, col = 1:3) 
  mtext("Sample Size vs. Estimated SD for Confidence Interval for Mean",
    font = 2, cex = 1.25, line = 2.75)
  mtext("with Half-Width=0.5 and Various Confidence Levels", font = 2, 
    cex = 1.25, line = 1.25)
  #--------------------------------------------------------------------
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # look at the relationship between half-width and sample size for a 
  # 95% confidence interval for the mean level of Aldicarb at the 
  # first compliance well.  Use the estimated standard deviation from 
  # the first four months of data. 
  # (The data are stored in EPA.09.Ex.21.1.aldicarb.df.)
  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #...
  mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    mean(Aldicarb.ppb[Well=="Well.1"]))
  mu.hat 
  #[1] 23.1 
  sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well=="Well.1"]))
  sigma.hat 
  #[1] 4.93491 
  dev.new()
  plotCiNormDesign(sigma.hat = sigma.hat, digits = 2, 
    range.x.var = c(2, 25))
  #==========
  # Clean up
  #---------
  rm(mu.hat, sigma.hat)
  graphics.off()
Plots for Sampling Design Based on Nonparametric Confidence Interval for a Quantile
Description
Create plots involving sample size, quantile, and confidence level for a nonparametric confidence interval for a quantile.
Usage
  plotCiNparDesign(x.var = "n", y.var = "conf.level", range.x.var = NULL, 
    n = 25, p = 0.5, conf.level = 0.95, ci.type = "two.sided", 
    lcl.rank = ifelse(ci.type == "upper", 0, 1), 
    n.plus.one.minus.ucl.rank = ifelse(ci.type == "lower", 0, 1), 
    plot.it = TRUE, add = FALSE, n.points = 100, plot.col = "black", 
    plot.lwd = 3 * par("cex"), plot.lty = 1, digits = .Options$digits, 
    cex.main = par("cex"), ..., main = NULL, xlab = NULL, ylab = NULL, 
    type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n | numeric scalar indicating the sample size.  The default value is 
 | 
| p | numeric scalar specifying the quantile.  The value of this argument must be 
between 0 and 1.  The default value is  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the confidence interval.  
The default value is  | 
| ci.type | character string indicating what kind of confidence interval to compute.  The 
possible values are  | 
| lcl.rank,n.plus.one.minus.ucl.rank | numeric vectors of non-negative integers indicating the ranks of the 
order statistics that are used for the lower and upper bounds of the 
confidence interval for the specified quantile(s).  When  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the 
existing plot (see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for eqnpar, ciNparConfLevel, 
and ciNparN for information on how to compute a 
nonparametric confidence interval for a quantile, how the confidence level 
is computed when other quantities are fixed, and how the sample size is 
computed when other quantities are fixed.
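For illustration, a single point on such a plot corresponds to a computation along these lines. This is a sketch only, assuming ciNparConfLevel and ciNparN accept the n, p, ci.type, and conf.level arguments mirrored in the Usage section above.
  # Sketch with assumed argument names.
  # Confidence level of a two-sided nonparametric CI for the median
  # (p = 0.5) based on the smallest and largest of n = 25 observations:
  ciNparConfLevel(n = 25, p = 0.5, ci.type = "two.sided")
  # Smallest sample size giving at least 95% confidence for the same CI:
  ciNparN(p = 0.5, ci.type = "two.sided", conf.level = 0.95)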
Value
plotCiNparDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that 
have been or would have been plotted.
Note
See the help file for eqnpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for eqnpar.
See Also
eqnpar, ciNparConfLevel, 
ciNparN.
Examples
  # Look at the relationship between confidence level and sample size for 
  # a two-sided nonparametric confidence interval for the 90'th percentile.
  dev.new()
  plotCiNparDesign(p = 0.9)
  #----------
  # Plot sample size vs. quantile for various levels of confidence:
  dev.new()
  plotCiNparDesign(x.var = "p", y.var = "n", range.x.var = c(0.8, 0.95), 
    ylim = c(0, 60), main = "") 
  plotCiNparDesign(x.var = "p", y.var = "n", conf.level = 0.9, add = TRUE, 
    plot.col = 2, plot.lty = 2) 
  plotCiNparDesign(x.var = "p", y.var = "n", conf.level = 0.8, add = TRUE, 
    plot.col = 3, plot.lty = 3) 
  legend("topleft", c("95%", "90%", "80%"), lty = 1:3, col = 1:3, 
    lwd = 3 * par('cex'), bty = 'n') 
  title(main = paste("Sample Size vs. Quantile for ", 
    "Nonparametric CI for \nQuantile, with ", 
    "Various Confidence Levels", sep=""))
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a t-Test for Linear Trend
Description
Create plots involving sample size, power, scaled difference, and significance level for a t-test for linear trend.
Usage
  plotLinearTrendTestDesign(x.var = "n", y.var = "power", 
    range.x.var = NULL, n = 12, 
    slope.over.sigma = switch(alternative, greater = 0.1, less = -0.1, 
      two.sided = ifelse(two.sided.direction == "greater", 0.1, -0.1)), 
    alpha = 0.05, power = 0.95, alternative = "two.sided", 
    two.sided.direction = "greater", approx = FALSE, round.up = FALSE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, add = FALSE, 
    n.points = ifelse(x.var == "n", diff(range.x.var) + 1, 50), 
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, ..., main = NULL, xlab = NULL, ylab = NULL, 
    type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n | numeric scalar indicating the sample size.  The default value is 
 | 
| slope.over.sigma | numeric scalar specifying the ratio of the true slope ( | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level associated 
with the hypothesis test.  The default value is  | 
| power | numeric scalar between 0 and 1 indicating the power associated with the 
hypothesis test.  The default value is  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| two.sided.direction | character string indicating the direction (positive or negative) for the 
scaled minimal detectable slope when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed 
sample size(s) to the next largest integer.  The default value is 
 | 
| n.max | for the case when  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
| plot.it | a logical scalar indicating whether to create a new plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for linearTrendTestPower, 
linearTrendTestN, and linearTrendTestScaledMds for 
information on how to compute the power, sample size, or scaled minimal detectable 
slope for a t-test for linear trend.
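For illustration, each point on a power or sample-size curve corresponds to a call along these lines. This is a sketch with assumed argument names mirroring the Usage section above; see the help files cited above for the authoritative usage.
  # Sketch with assumed argument names.
  # Power of the t-test for linear trend with n = 12 equally spaced
  # observations, a scaled slope of 0.1, and a 5% significance level:
  linearTrendTestPower(n = 12, slope.over.sigma = 0.1, alpha = 0.05)
  # Sample size required to achieve 95% power for the same scaled slope:
  linearTrendTestN(slope.over.sigma = 0.1, alpha = 0.05, power = 0.95)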
Value
plotLinearTrendTestDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that have 
been or would have been plotted.
Note
See the help file for linearTrendTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for linearTrendTestPower.
See Also
linearTrendTestPower, linearTrendTestN, 
linearTrendTestScaledMds.
Examples
  # Look at the relationship between power and sample size for the t-test for 
  # linear trend, assuming a scaled slope of 0.1 and a 5% significance level:
  dev.new()
  plotLinearTrendTestDesign()
  #==========
  # Plot sample size vs. the scaled minimal detectable slope for various 
  # levels of power, using a 5% significance level:
  dev.new()
  plotLinearTrendTestDesign(x.var = "slope.over.sigma", y.var = "n", 
    ylim = c(0, 30), main = "") 
  plotLinearTrendTestDesign(x.var = "slope.over.sigma", y.var = "n", 
    power = 0.9, add = TRUE, plot.col = "red") 
  plotLinearTrendTestDesign(x.var = "slope.over.sigma", y.var = "n", 
    power = 0.8, add = TRUE, plot.col = "blue") 
  legend("topright", c("95%", "90%", "80%"), lty = 1, bty = "n", 
    lwd = 3 * par("cex"), col = c("black", "red", "blue")) 
  title(main = paste("Sample Size vs. Scaled Slope for t-Test for Linear Trend", 
    "with Alpha=0.05 and Various Powers", sep="\n"))
  #==========
  # Clean up
  #---------
  graphics.off()
Power Curves for Sampling Design for Test Based on Simultaneous Prediction Interval for Lognormal Distribution
Description
Plot power vs. \theta_1/\theta_2 (ratio of means) for a 
sampling design for a test based on a simultaneous prediction interval for a 
lognormal distribution.
Usage
  plotPredIntLnormAltSimultaneousTestPowerCurve(n = 8, df = n - 1, n.geomean = 1, 
    k = 1, m = 2, r = 1, rule = "k.of.m", cv = 1, range.ratio.of.means = c(1, 5), 
    pi.type = "upper", conf.level = 0.95, r.shifted = r, 
    K.tol = .Machine$double.eps^(1/2), integrate.args.list = NULL, plot.it = TRUE, 
    add = FALSE, n.points = 20, plot.col = "black", plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, cex.main = par("cex"), ..., 
    main = NULL, xlab = NULL, ylab = NULL, type = "l")
Arguments
| n | positive integer greater than 2 indicating the sample size upon which 
the prediction interval is based.  The default value is  | 
| df | positive integer indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.geomean | positive integer specifying the sample size associated with the future geometric 
mean(s).  The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or 
averages) on one future sampling “occasion”.  
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.  
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| cv | positive value specifying the coefficient of variation for 
both the population that was sampled to construct the prediction interval and 
the population that will be sampled to produce the future observations.  The 
default value is  | 
| range.ratio.of.means | numeric vector of length 2 indicating the range of the x-variable to use for the 
plot.  The default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| r.shifted | positive integer between  | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to 
compute  | 
| integrate.args.list | a list of arguments to supply to the  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntLnormAltSimultaneousTestPower for 
information on how to compute the power of a hypothesis test for the difference 
between two means of lognormal distributions based on a simultaneous prediction 
interval for a lognormal distribution.
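For illustration, each point on a power curve produced by this function corresponds to a call along these lines. This is a sketch only; the ratio.of.means argument name is assumed from the range.ratio.of.means argument above, and the remaining arguments mirror the Usage section.
  # Sketch with assumed argument names.
  # Power of the 1-of-3 resampling rule to detect a tripling of the mean
  # (ratio of means = 3), based on n = 8 background samples, cv = 1, and
  # an upper 95% simultaneous prediction limit:
  predIntLnormAltSimultaneousTestPower(n = 8, k = 1, m = 3, r = 1,
    rule = "k.of.m", ratio.of.means = 3, cv = 1, pi.type = "upper",
    conf.level = 0.95)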
Value
plotPredIntLnormAltSimultaneousTestPowerCurve invisibly returns a list with 
components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntLnormAltSimultaneousTestPower and 
plotPredIntLnormAltSimultaneousTestPowerCurve can be 
used to investigate these relationships for the case of lognormally-distributed 
observations. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNormSimultaneous.
See Also
predIntLnormAltSimultaneousTestPower, 
predIntLnormAltSimultaneous, predIntLnormAlt,  
predIntLnormAltTestPower, Prediction Intervals, 
LognormalAlt.
Examples
  # USEPA (2009) contains an example on page 19-23 that involves monitoring 
  # nw=100 compliance wells at a large facility with minimal natural spatial 
  # variation every 6 months for nc=20 separate chemicals.  
  # There are n=25 background measurements for each chemical to use to create
  # simultaneous prediction intervals.  We would like to determine which kind of
  # resampling plan based on normal distribution simultaneous prediction intervals to
  # use (1-of-m, 1-of-m based on means, or Modified California) in order to have
  # adequate power of detecting an increase in chemical concentration at any of the
  # 100 wells while at the same time maintaining a site-wide false positive rate
  # (SWFPR) of 10% per year over all 4,000 comparisons 
  # (100 wells x 20 chemicals x semi-annual sampling).
  # The function predIntNormSimultaneousTestPower includes the argument "r" 
  # that is the number of future sampling occasions (r=2 in this case because 
  # we are performing semi-annual sampling), so to compute the individual test 
  # Type I error level alpha.test (and thus the individual test confidence level), 
  # we only need to worry about the number of wells (100) and the number of 
  # constituents (20): alpha.test = 1-(1-alpha)^(1/(nw x nc)).  The individual 
  # confidence level is simply 1-alpha.test.  Plugging in 0.1 for alpha, 
  # 100 for nw, and 20 for nc yields an individual test confidence level of 
  # 1-alpha.test = 0.9999473.
  nc <- 20
  nw <- 100
  conf.level <- (1 - 0.1)^(1 / (nc * nw))
  conf.level
  #[1] 0.9999473
  # The help file for predIntNormSimultaneousTestPower shows how to 
  # create the results below for various sampling plans:
  #         Rule k m N.Mean    K Power Total.Samples
  #1      k.of.m 1 2      1 3.16  0.39             2
  #2      k.of.m 1 3      1 2.33  0.65             3
  #3      k.of.m 1 4      1 1.83  0.81             4
  #4 Modified.CA 1 4      1 2.57  0.71             4
  #5      k.of.m 1 1      2 3.62  0.41             2
  #6      k.of.m 1 2      2 2.33  0.85             4
  #7      k.of.m 1 1      3 2.99  0.71             3
  # The above table shows the K-multipliers for each prediction interval, along with
  # the power of detecting a change in concentration of three standard deviations at
  # any of the 100 wells during the course of a year, for each of the sampling
  # strategies considered.  The last three rows of the table correspond to sampling
  # strategies that involve using the mean of two or three observations.
  # Here we will create a variation of this example based on 
  # using a lognormal distribution and plotting power versus ratio of the 
  # means assuming cv=1.
  # Here is the power curve for the 1-of-4 sampling strategy:
  dev.new()
  plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 4, r = 2, 
    rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper",  
    conf.level = conf.level, ylim = c(0, 1), main = "")
  title(main = paste("Power Curves for 1-of-4 Sampling Strategy Based on 25 Background", 
    "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  mtext("Assuming Lognormal Data with CV=1", line = 0)
  #----------
  # Here are the power curves for the first four sampling strategies.
  # Because this takes several seconds to run, here we have commented out 
  # the R commands.  To run this example, just remove the pound signs (#) 
  # from in front of the R commands.
  #dev.new()
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 4, r = 2, 
  #  rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper",  
  #  conf.level = conf.level, ylim = c(0, 1), main = "")
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 3, r = 2, 
  #  rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper",  
  #  conf.level = conf.level, add = TRUE, plot.col = "red", plot.lty = 2)
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 2, r = 2, 
  #  rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper", 
  #  conf.level = conf.level, add = TRUE, plot.col = "blue", plot.lty = 3)
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, r = 2, rule="Modified.CA", 
  #  range.ratio.of.means = c(1, 10), pi.type = "upper", conf.level = conf.level, 
  #  add = TRUE, plot.col = "green3", plot.lty = 4)
  #legend("topleft", c("1-of-4", "Modified CA", "1-of-3", "1-of-2"), 
  #  col = c("black", "green3", "red", "blue"), lty = c(1, 4, 2, 3),
  #  lwd = 3 * par("cex"), bty = "n") 
  #title(main = paste("Power Curves for 4 Sampling Strategies Based on 25 Background", 
  #  "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  #mtext("Assuming Lognormal Data with CV=1", line = 0)
  #----------
  # Here are the power curves for the last 3 sampling strategies:
  # Because this takes several seconds to run, here we have commented out 
  # the R commands.  To run this example, just remove the pound signs (#) 
  # from in front of the R commands.
  #dev.new()
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 2, n.geomean = 2, 
  #  r = 2, rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper", 
  #  conf.level = conf.level, ylim = c(0, 1), main = "")
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 1, n.geomean = 2, 
  #  r = 2, rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper", 
  #  conf.level = conf.level, add = TRUE, plot.col = "red", plot.lty = 2)
  #plotPredIntLnormAltSimultaneousTestPowerCurve(n = 25, k = 1, m = 1, n.geomean = 3, 
  #  r = 2, rule="k.of.m", range.ratio.of.means = c(1, 10), pi.type = "upper", 
  #  conf.level = conf.level, add = TRUE, plot.col = "blue", plot.lty = 3)
  #legend("topleft", c("1-of-2, Order 2", "1-of-1, Order 3", "1-of-1, Order 2"), 
  #  col = c("black", "blue", "red"), lty = c(1, 3, 2), lwd = 3 * par("cex"), 
  #  bty="n")
  #title(main = paste("Power Curves for 3 Sampling Strategies Based on 25 Background", 
  #  "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  #mtext("Assuming Lognormal Data with CV=1", line = 0)
  #==========
  # Clean up
  #---------
  rm(nc, nw, conf.level)
  graphics.off()
Power Curves for Sampling Design for Test Based on Prediction Interval for Lognormal Distribution
Description
Plot power vs. \theta_1/\theta_2 (ratio of means) for a 
sampling design for a test based on a prediction interval for a lognormal distribution.
Usage
  plotPredIntLnormAltTestPowerCurve(n = 8, df = n - 1, n.geomean = 1, k = 1, 
    cv = 1, range.ratio.of.means = c(1, 5), pi.type = "upper", conf.level = 0.95, 
    plot.it = TRUE, add = FALSE, n.points = 20, plot.col = "black", 
    plot.lwd = 3 * par("cex"), plot.lty = 1, digits = .Options$digits, ..., 
    main = NULL, xlab = NULL, ylab = NULL, type = "l")
Arguments
| n | positive integer greater than 2 indicating the sample size upon which 
the prediction interval is based.  The default value is  | 
| df | positive integer indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.geomean | positive integer specifying the sample size associated with the future geometric 
mean(s).  The default value is  | 
| k | positive integer specifying the number of future observations that the 
prediction interval should contain with confidence level  | 
| cv | positive value specifying the coefficient of variation for both the population 
that was sampled to construct the prediction interval and the population that 
will be sampled to produce the future observations.  The default value is 
 | 
| range.ratio.of.means | numeric vector of length 2 indicating the range of the x-variable to use for the 
plot.  The default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntLnormAltTestPower for information on how to 
compute the power of a hypothesis test for the ratio of two means of lognormal 
distributions based on a prediction interval for a lognormal distribution.
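For illustration, a single point on the power curve corresponds to a call along these lines. This is a sketch only; the ratio.of.means argument name is assumed from the range.ratio.of.means argument above.
  # Sketch with assumed argument names.
  # Power to detect a doubling of the mean (ratio of means = 2) with an
  # upper 95% prediction limit for k = 1 future observation, based on
  # n = 8 background samples and a lognormal distribution with cv = 1:
  predIntLnormAltTestPower(n = 8, k = 1, ratio.of.means = 2, cv = 1,
    pi.type = "upper", conf.level = 0.95)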
Value
plotPredIntLnormAltTestPowerCurve invisibly returns a list with components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for predIntNormTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for predIntNormTestPower and 
tTestLnormAltPower.
See Also
predIntLnormAltTestPower, 
predIntLnormAlt, 
predIntNorm, 
predIntNormK, 
plotPredIntNormTestPowerCurve, 
predIntLnormAltSimultaneous, 
predIntLnormAltSimultaneousTestPower, 
Prediction Intervals, LognormalAlt.
Examples
  # Plot power vs. ratio of means for k=1 future observation for 
  # various sample sizes using a 5% significance level and assuming cv=1.
  dev.new()
  plotPredIntLnormAltTestPowerCurve(n = 8, k = 1, 
    range.ratio.of.means=c(1, 10), ylim = c(0, 1), main = "") 
  plotPredIntLnormAltTestPowerCurve(n = 16, k = 1, 
    range.ratio.of.means = c(1, 10), add = TRUE, plot.col = "red") 
  plotPredIntLnormAltTestPowerCurve(n = 32, k = 1, 
    range.ratio.of.means=c(1, 10), add = TRUE, plot.col = "blue") 
  legend("topleft", c("n=32", "n=16", "n=8"), lty = 1, lwd = 3 * par("cex"), 
    col = c("blue", "red", "black"), bty = "n")
  title(main = paste("Power vs. Ratio of Means for Upper Prediction Interval", 
    "with k=1, Confidence=95%, and Various Sample Sizes", sep="\n"))
  mtext("Assuming a Lognormal Distribution with CV = 1", line = 0)
  #==========
  ## Not run: 
  # Pages 6-16 to 6-17 of USEPA (2009) present EPA Reference Power Curves (ERPC)
  # for groundwater monitoring:
  #
  # "Since effect sizes discussed in the next section often cannot or have not been 
  # quantified, the Unified Guidance recommends using the ERPC as a suitable basis 
  # of comparison for proposed testing procedures.  Each reference power curve 
  # corresponds to one of three typical yearly statistical evaluation schedules - 
  # quarterly, semi-annual, or annual - and represents the cumulative power 
  # achievable during a single year at one well-constituent pair by a 99% upper 
  # (normal) prediction limit based on n = 10 background measurements and one new 
  # measurement from the compliance well.
  #
  # Here we will create a variation of Figure 6-3 on page 6-17 based on 
  # using a lognormal distribution and plotting power versus ratio of the 
  # means assuming cv=1.
  dev.new()
  plotPredIntLnormAltTestPowerCurve(n = 10, k = 1, cv = 1, conf.level = 0.99, 
    range.ratio.of.means = c(1, 10), ylim = c(0, 1), main="")
  plotPredIntLnormAltTestPowerCurve(n = 10, k = 2, cv = 1, conf.level = 0.99, 
    range.ratio.of.means = c(1, 10), add = TRUE, plot.col = "red", plot.lty = 2)
  plotPredIntLnormAltTestPowerCurve(n = 10, k = 4, cv = 1, conf.level = 0.99, 
    range.ratio.of.means = c(1, 10), add = TRUE, plot.col = "blue", plot.lty = 3)
  legend("topleft", c("Quarterly", "Semi-Annual", "Annual"), lty = 3:1, 
    lwd = 3 * par("cex"), col = c("blue", "red", "black"), bty = "n") 
  title(main = paste("Power vs. Ratio of Means for Upper Prediction Interval with",
    "n=10, Confidence=99%, and Various Sampling Frequencies", sep="\n"))
  mtext("Assuming a Lognormal Distribution with CV = 1", line = 0)
  
## End(Not run)
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a Prediction Interval for the Next k Observations from a Normal Distribution
Description
Create plots involving sample size, number of future observations, half-width, 
estimated standard deviation, and confidence level for a prediction interval for 
the next k observations from a normal distribution.
Usage
  plotPredIntNormDesign(x.var = "n", y.var = "half.width", range.x.var = NULL, 
    n = 25, k = 1, n.mean = 1, half.width = 4 * sigma.hat, sigma.hat = 1, 
    method = "Bonferroni", conf.level = 0.95, round.up = FALSE, n.max = 5000, 
    tol = 1e-07, maxiter = 1000, plot.it = TRUE, add = FALSE, n.points = 100, 
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  
The default value depends on the value of  | 
| n | positive integer greater than 1 indicating the sample size upon 
which the prediction interval is based.  The default value is  | 
| k | positive integer specifying the number of future observations 
or averages the prediction interval should contain with confidence level 
 | 
| n.mean | positive integer specifying the sample size associated with the  | 
| half.width | positive scalar indicating the half-width of the prediction interval.  
The default value is  | 
| sigma.hat | numeric scalar specifying the value of the estimated standard deviation.  
The default value is  | 
| method | character string specifying the method to use if the number of future observations 
( | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| round.up | for the case when  | 
| n.max | for the case when  | 
| tol | numeric scalar indicating the tolerance to use in the  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for predIntNorm, predIntNormK, 
predIntNormHalfWidth, and predIntNormN for 
information on how to compute a prediction interval for the next k 
observations or averages from a normal distribution, how the half-width is 
computed when other quantities are fixed, and how the 
sample size is computed when other quantities are fixed.
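For illustration, here is a minimal sketch of the computations that underlie each plotted point, with assumed argument names mirroring the Usage section above; see the help files cited above for the authoritative usage.
  # Sketch with assumed argument names.
  # Half-width of a 95% prediction interval for k = 1 future observation,
  # based on n = 25 observations and an estimated standard deviation of 1:
  predIntNormHalfWidth(n = 25, k = 1, sigma.hat = 1, conf.level = 0.95)
  # Sample size required to achieve a half-width of 3 under the
  # same assumptions:
  predIntNormN(half.width = 3, k = 1, sigma.hat = 1, conf.level = 0.95)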
Value
plotPredIntNormDesign invisibly returns a list with components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for predIntNorm.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, confidence level, and half-width 
if one of the objectives of the sampling program is to produce prediction intervals.  
The functions predIntNormHalfWidth, predIntNormN, and 
plotPredIntNormDesign can be used to investigate these relationships for the 
case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNorm.
See Also
predIntNorm, predIntNormK, 
predIntNormHalfWidth, predIntNormN, 
Normal.
Examples
  # Look at the relationship between half-width and sample size for a 
  # prediction interval for k=1 future observation, assuming an estimated 
  # standard deviation of 1 and a confidence level of 95%:
  dev.new()
  plotPredIntNormDesign()
  #==========
  # Plot sample size vs. the estimated standard deviation for various levels 
  # of confidence, using a half-width of 4:
  dev.new()
  plotPredIntNormDesign(x.var = "sigma.hat", y.var = "n", range.x.var = c(1, 2), 
    ylim = c(0, 90), main = "") 
  plotPredIntNormDesign(x.var = "sigma.hat", y.var = "n", range.x.var = c(1, 2), 
    conf.level = 0.9, add = TRUE, plot.col = "red") 
  plotPredIntNormDesign(x.var = "sigma.hat", y.var = "n", range.x.var = c(1, 2), 
    conf.level = 0.8, add = TRUE, plot.col = "blue") 
  legend("topleft", c("95%", "90%", "80%"), lty = 1, lwd = 3 * par("cex"), 
    col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Sample Size vs. Sigma Hat for Prediction Interval for", 
    "k=1 Future Obs, Half-Width=4, and Various Confidence Levels", 
    sep = "\n"))
  #==========
  # The data frame EPA.92c.arsenic3.df contains arsenic concentrations (ppb) 
  # collected quarterly for 3 years at a background well and quarterly for 
  # 2 years at a compliance well.  Using the data from the background well, 
  # plot the relationship between half-width and sample size for a two-sided 
  # 90% prediction interval for k=4 future observations.
  EPA.92c.arsenic3.df
  #   Arsenic Year  Well.type
  #1     12.6    1 Background
  #2     30.8    1 Background
  #3     52.0    1 Background
  #...
  #18     3.8    5 Compliance
  #19     2.6    5 Compliance
  #20    51.9    5 Compliance
  mu.hat <- with(EPA.92c.arsenic3.df, 
    mean(Arsenic[Well.type=="Background"])) 
  mu.hat 
  #[1] 27.51667 
  sigma.hat <- with(EPA.92c.arsenic3.df, 
    sd(Arsenic[Well.type=="Background"]))
  sigma.hat 
  #[1] 17.10119 
  dev.new()
  plotPredIntNormDesign(x.var = "n", y.var = "half.width", range.x.var = c(4, 50), 
    k = 4, sigma.hat = sigma.hat, conf.level = 0.9) 
  #==========
  # Clean up
  #---------
  rm(mu.hat, sigma.hat)
  graphics.off()
Power Curves for Sampling Design for Test Based on Simultaneous Prediction Interval for Normal Distribution
Description
Plot power vs. \Delta/\sigma (scaled minimal detectable difference) for a 
sampling design for a test based on a simultaneous prediction interval for a 
normal distribution.
Usage
  plotPredIntNormSimultaneousTestPowerCurve(n = 8, df = n - 1, n.mean = 1, 
    k = 1, m = 2, r = 1, rule = "k.of.m", range.delta.over.sigma = c(0, 5), 
    pi.type = "upper", conf.level = 0.95, r.shifted = r, 
    K.tol = .Machine$double.eps^(1/2), integrate.args.list = NULL, 
    plot.it = TRUE, add = FALSE, n.points = 20, plot.col = "black", 
    plot.lwd = 3 * par("cex"), plot.lty = 1, digits = .Options$digits, 
    cex.main = par("cex"), ..., main = NULL, xlab = NULL, ylab = NULL, type = "l")
Arguments
| n | positive integer greater than 2 indicating the sample size upon which 
the prediction interval is based.  The default value is  | 
| df | positive integer indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.mean | positive integer specifying the sample size associated with the future average(s).  
The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or 
averages) on one future sampling “occasion”.  
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.  
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| range.delta.over.sigma | numeric vector of length 2 indicating the range of the x-variable to use for the 
plot.  The default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| r.shifted | positive integer between  | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to 
compute  | 
| integrate.args.list | a list of arguments to supply to the  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntNormSimultaneousTestPower for 
information on how to compute the power of a hypothesis test for the difference 
between two means of normal distributions based on a simultaneous prediction 
interval for a normal distribution.
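For illustration, each point on a power curve produced by this function corresponds to a call along these lines. This is a sketch only; the delta.over.sigma argument name is assumed from the range.delta.over.sigma argument above, and the remaining arguments mirror the Usage section.
  # Sketch with assumed argument names.
  # Power of the 1-of-3 resampling rule to detect an increase of
  # 3 standard deviations, based on n = 8 background samples and an
  # upper 95% simultaneous prediction limit:
  predIntNormSimultaneousTestPower(n = 8, k = 1, m = 3, r = 1,
    rule = "k.of.m", delta.over.sigma = 3, pi.type = "upper",
    conf.level = 0.95)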
Value
plotPredIntNormSimultaneousTestPowerCurve invisibly returns a list with 
components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNormSimultaneousTestPower and 
plotPredIntNormSimultaneousTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNormSimultaneous.
See Also
predIntNormSimultaneous, predIntNormSimultaneousK, 
predIntNormSimultaneousTestPower, 
predIntNorm, predIntNormK, 
predIntNormTestPower, Prediction Intervals, 
Normal.
Examples
  # USEPA (2009) contains an example on page 19-23 that involves monitoring 
  # nw=100 compliance wells at a large facility with minimal natural spatial 
  # variation every 6 months for nc=20 separate chemicals.  
  # There are n=25 background measurements for each chemical to use to create
  # simultaneous prediction intervals.  We would like to determine which kind of
  # resampling plan based on normal distribution simultaneous prediction intervals to
  # use (1-of-m, 1-of-m based on means, or Modified California) in order to have
  # adequate power of detecting an increase in chemical concentration at any of the
  # 100 wells while at the same time maintaining a site-wide false positive rate
  # (SWFPR) of 10% per year over all 4,000 comparisons 
  # (100 wells x 20 chemicals x semi-annual sampling).
  # The function predIntNormSimultaneousTestPower includes the argument "r" 
  # that is the number of future sampling occasions (r=2 in this case because 
  # we are performing semi-annual sampling), so to compute the individual test 
  # Type I error level alpha.test (and thus the individual test confidence level), 
  # we only need to worry about the number of wells (100) and the number of 
  # constituents (20): alpha.test = 1-(1-alpha)^(1/(nw x nc)).  The individual 
  # confidence level is simply 1-alpha.test.  Plugging in 0.1 for alpha, 
  # 100 for nw, and 20 for nc yields an individual test confidence level of 
  # 1-alpha.test = 0.9999473.
  nc <- 20
  nw <- 100
  conf.level <- (1 - 0.1)^(1 / (nc * nw))
  conf.level
  #[1] 0.9999473
  # The help file for predIntNormSimultaneousTestPower shows how to 
  # create the results below for various sampling plans:
  #         Rule k m N.Mean    K Power Total.Samples
  #1      k.of.m 1 2      1 3.16  0.39             2
  #2      k.of.m 1 3      1 2.33  0.65             3
  #3      k.of.m 1 4      1 1.83  0.81             4
  #4 Modified.CA 1 4      1 2.57  0.71             4
  #5      k.of.m 1 1      2 3.62  0.41             2
  #6      k.of.m 1 2      2 2.33  0.85             4
  #7      k.of.m 1 1      3 2.99  0.71             3
  # The above table shows the K-multipliers for each prediction interval, along with
  # the power of detecting a change in concentration of three standard deviations at
  # any of the 100 wells during the course of a year, for each of the sampling
  # strategies considered.  The last three rows of the table correspond to sampling
  # strategies that involve using the mean of two or three observations.
  # Here is the power curve for the 1-of-4 sampling strategy:
  dev.new()
  plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 4, r = 2, 
    rule="k.of.m", pi.type = "upper",  conf.level = conf.level,  
    xlab = "SD Units Above Background", main = "")
  title(main = paste(
    "Power Curve for 1-of-4 Sampling Strategy Based on 25 Background", 
    "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  #----------
  # Here are the power curves for the first four sampling strategies.  
  # Because this takes several seconds to run, here we have commented out 
  # the R commands.  To run this example, just remove the pound signs (#) 
  # from in front of the R commands.
  #dev.new()
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 4, r = 2, 
  #  rule="k.of.m", pi.type = "upper",  conf.level = conf.level,  
  #  xlab = "SD Units Above Background", main = "")
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 3, r = 2, 
  #  rule="k.of.m", pi.type = "upper",  conf.level = conf.level, add = TRUE, 
  #  plot.col = "red", plot.lty = 2)
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 2, r = 2, 
  #  rule="k.of.m", pi.type = "upper",  conf.level = conf.level, add = TRUE, 
  #  plot.col = "blue", plot.lty = 3)
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, r = 2, rule="Modified.CA", 
  #  pi.type = "upper", conf.level = conf.level, add = TRUE, plot.col = "green3",
  #  plot.lty = 4)
  #legend(0, 1, c("1-of-4", "Modified CA", "1-of-3", "1-of-2"), 
  #  col = c("black", "green3", "red", "blue"), lty = c(1, 4, 2, 3),
  #  lwd = 3 * par("cex"), bty = "n") 
  #title(main = paste("Power Curves for 4 Sampling Strategies Based on 25 Background", 
  #  "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  #----------
  # Here are the power curves for the last 3 sampling strategies.  
  # Because this takes several seconds to run, here we have commented out 
  # the R commands.  To run this example, just remove the pound signs (#) 
  # from in front of the R commands.
  #dev.new()
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 2, n.mean = 2, 
  #  r = 2, rule="k.of.m", pi.type = "upper", conf.level = conf.level, 
  #  xlab = "SD Units Above Background", main = "")
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 1, n.mean = 2, 
  #  r = 2, rule="k.of.m", pi.type = "upper", conf.level = conf.level, add = TRUE,
  #  plot.col = "red", plot.lty = 2)
  #plotPredIntNormSimultaneousTestPowerCurve(n = 25, k = 1, m = 1, n.mean = 3, 
  #  r = 2, rule="k.of.m", pi.type = "upper", conf.level = conf.level, add = TRUE,
  #  plot.col = "blue", plot.lty = 3)
  #legend(0, 1, c("1-of-2, Order 2", "1-of-1, Order 3", "1-of-1, Order 2"), 
  #  col = c("black", "blue", "red"), lty = c(1, 3, 2), lwd = 3 * par("cex"), 
  #  bty="n")
  #title(main = paste("Power Curves for 3 Sampling Strategies Based on 25 Background", 
  #  "Samples, SWFPR=10%, and 2 Future Sampling Periods", sep = "\n"))
  #==========
  # Clean up
  #---------
  rm(nc, nw, conf.level)
  graphics.off()
Power Curves for Sampling Design for Test Based on Prediction Interval for Normal Distribution
Description
Plot power vs. \Delta/\sigma (scaled minimal detectable difference) for a 
sampling design for a test based on a prediction interval for a normal distribution.
Usage
  plotPredIntNormTestPowerCurve(n = 8, df = n - 1, n.mean = 1, k = 1, 
    range.delta.over.sigma = c(0, 5), pi.type = "upper", conf.level = 0.95, 
    plot.it = TRUE, add = FALSE, n.points = 20, plot.col = "black", 
    plot.lwd = 3 * par("cex"), plot.lty = 1, digits = .Options$digits, ..., 
    main = NULL, xlab = NULL, ylab = NULL, type = "l")
Arguments
| n | positive integer greater than 2 indicating the sample size upon which 
the prediction interval is based.  The default value is  | 
| df | positive integer indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.mean | positive integer specifying the sample size associated with the future average(s).  
The default value is  | 
| k | positive integer specifying the number of future observations that the 
prediction interval should contain with confidence level  | 
| range.delta.over.sigma | numeric vector of length 2 indicating the range of the x-variable to use for the 
plot.  The default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntNormTestPower for information on how to 
compute the power of a hypothesis test for the difference between two means of 
normal distributions based on a prediction interval for a normal distribution.
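For illustration, a single point on the power curve corresponds to a call along these lines. This is a sketch only; the delta.over.sigma argument name is assumed from the range.delta.over.sigma argument above.
  # Sketch with assumed argument names.
  # Power to detect an increase of 2 standard deviations with an upper
  # 95% prediction limit for k = 1 future observation, based on n = 8
  # background observations:
  predIntNormTestPower(n = 8, k = 1, delta.over.sigma = 2,
    pi.type = "upper", conf.level = 0.95)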
Value
plotPredIntNormTestPowerCurve invisibly returns a list with components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help files for predIntNorm and 
predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNormTestPower and plotPredIntNormTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations.  In the case of a simple shift between the two means, the test based 
on a prediction interval is not as powerful as the two-sample t-test.  However, the 
test based on a prediction interval is more efficient at detecting a shift in the 
tail.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for predIntNorm and 
predIntNormSimultaneous.
See Also
predIntNorm, predIntNormK, 
predIntNormTestPower, predIntNormSimultaneous, 
predIntNormSimultaneousK, 
predIntNormSimultaneousTestPower, Prediction Intervals, 
Normal.
Examples
  # Pages 6-16 to 6-17 of USEPA (2009) present EPA Reference Power Curves (ERPC)
  # for groundwater monitoring:
  #
  # "Since effect sizes discussed in the next section often cannot or have not been 
  # quantified, the Unified Guidance recommends using the ERPC as a suitable basis 
  # of comparison for proposed testing procedures.  Each reference power curve 
  # corresponds to one of three typical yearly statistical evaluation schedules - 
  # quarterly, semi-annual, or annual - and represents the cumulative power 
  # achievable during a single year at one well-constituent pair by a 99% upper 
  # (normal) prediction limit based on n = 10 background measurements and one new 
  # measurement from the compliance well.
  #
  # Here we will reproduce Figure 6-3 on page 6-17.
  dev.new()
  plotPredIntNormTestPowerCurve(n = 10, k = 1, conf.level = 0.99, 
    ylim = c(0, 1), main="")
  plotPredIntNormTestPowerCurve(n = 10, k = 2, conf.level = 0.99, 
    add = TRUE, plot.col = "red", plot.lty = 2)
  plotPredIntNormTestPowerCurve(n = 10, k = 4, conf.level = 0.99, 
    add = TRUE, plot.col = "blue", plot.lty = 3)
  legend("topleft", c("Quarterly", "Semi-Annual", "Annual"), lty = 3:1, 
    lwd = 3 * par("cex"), col = c("blue", "red", "black"), bty = "n") 
  title(main = paste("Power vs. Delta/Sigma for Upper Prediction Interval with",
    "n=10, Confidence=99%, and Various Sampling Frequencies", sep="\n"))
  #==========
## Not run: 
  # Plot power vs. scaled minimal detectable difference for various sample sizes 
  # using a 5% significance level:
  dev.new()
  plotPredIntNormTestPowerCurve(n = 8, k = 1, ylim = c(0, 1), main="") 
  plotPredIntNormTestPowerCurve(n = 16, k = 1, add = TRUE, plot.col = "red") 
  plotPredIntNormTestPowerCurve(n = 32, k = 1, add = TRUE, plot.col = "blue") 
  legend("bottomright", c("n=32", "n=16", "n=8"), lty = 1, lwd = 3 * par("cex"), 
    col = c("blue", "red", "black"), bty = "n") 
  title(main = paste("Power vs. Delta/Sigma for Upper Prediction Interval with",
    "k=1, Confidence=95%, and Various Sample Sizes", sep="\n"))
  #==========
  # Clean up
  #---------
  graphics.off()
## End(Not run)
Plots for a Sampling Design Based on a Nonparametric Prediction Interval
Description
Create plots involving sample size (n), number of future observations 
(m), minimum number of future observations the interval should contain 
(k), and confidence level (1-\alpha) for a nonparametric prediction 
interval.
Usage
  plotPredIntNparDesign(x.var = "n", y.var = "conf.level", range.x.var = NULL, 
    n = max(25, lpl.rank + n.plus.one.minus.upl.rank + 1), 
    k = 1, m = ifelse(x.var == "k", ceiling(max.x), 1), conf.level = 0.95, 
    pi.type = "two.sided", lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), n.max = 5000, 
    maxiter = 1000, plot.it = TRUE, add = FALSE, n.points = 100, 
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l") 
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n | numeric scalar indicating the sample size.  The default value is  | 
| k | positive integer specifying the minimum number of future observations out of  | 
| m | positive integer specifying the number of future observations.  The default value is 
 | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level 
associated with the prediction interval.  The default value is 
 | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| lpl.rank | non-negative integer indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  If  | 
| n.plus.one.minus.upl.rank | non-negative integer related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of 
 | 
| n.max | for the case when  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
| plot.it | a logical scalar indicating whether to create a plot or add to the 
existing plot (see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntNpar, predIntNparConfLevel, 
and predIntNparN for information on how to compute a 
nonparametric prediction interval, how the confidence level 
is computed when other quantities are fixed, and how the sample size is 
computed when other quantities are fixed.
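For orientation, here is a brief sketch (not part of the original help file) of the fixed-quantity calculations that underlie the plot, assuming predIntNparConfLevel and predIntNparN share the n, k, m, pi.type, and conf.level arguments of plotPredIntNparDesign:
  # Confidence level achieved by a two-sided nonparametric PI for the next
  # m = 1 observation based on n = 24 background observations (assumed syntax).
  predIntNparConfLevel(n = 24, k = 1, m = 1, pi.type = "two.sided")
  # Smallest sample size giving at least 95% confidence (assumed syntax).
  predIntNparN(k = 1, m = 1, pi.type = "two.sided", conf.level = 0.95)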
Value
plotPredIntNparDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that 
have been or would have been plotted.
Note
See the help file for predIntNpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNpar.
See Also
predIntNpar, predIntNparConfLevel, 
predIntNparN.
Examples
  # Look at the relationship between confidence level and sample size for a 
  # two-sided nonparametric prediction interval for the next m=1 future observation.
  dev.new()
  plotPredIntNparDesign()
  #==========
  # Plot confidence level vs. sample size for various values of number of 
  # future observations (m):
  dev.new()
  plotPredIntNparDesign(k = 1, m = 1, ylim = c(0, 1), main = "") 
  plotPredIntNparDesign(k = 2, m = 2, add = TRUE, plot.col = "red") 
  plotPredIntNparDesign(k = 3, m = 3, add = TRUE, plot.col = "blue") 
  legend("bottomright", c("m=1", "m=2", "m=3"), lty = 1, lwd = 3 * par("cex"), 
    col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Confidence Level vs. Sample Size for Nonparametric PI", 
    "with Various Values of m", sep="\n"))
  #==========
  # Example 18-3 of USEPA (2009, p.18-19) shows how to construct 
  # a one-sided upper nonparametric prediction interval for the next 
  # 4 future observations of trichloroethylene (TCE) at a downgradient well.  
  # The data for this example are stored in EPA.09.Ex.18.3.TCE.df.  
  # There are 6 monthly observations of TCE (ppb) at 3 background wells, 
  # and 4 monthly observations of TCE at a compliance well.
  #
  # Modify this example by creating a plot to look at confidence level versus 
  # sample size (i.e., number of observations at the background wells) for 
  # predicting the next m = 4 future observations when constructing a one-sided 
  # upper prediction interval based on the maximum value.
  dev.new()
  plotPredIntNparDesign(k = 4, m = 4, pi.type = "upper")
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a Simultaneous Nonparametric Prediction Interval
Description
Create plots involving sample size (n), number of future observations 
(m), minimum number of future observations the interval should contain 
(k), number of future sampling occasions (r), and confidence level 
(1-\alpha) for a simultaneous nonparametric prediction interval.
Usage
  plotPredIntNparSimultaneousDesign(x.var = "n", y.var = "conf.level", 
    range.x.var = NULL, n = max(25, lpl.rank + n.plus.one.minus.upl.rank + 1), 
    n.median = 1, k = 1, m = ifelse(x.var == "k", ceiling(max.x), 1), r = 2, 
    rule = "k.of.m", conf.level = 0.95, pi.type = "upper", 
    lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), n.max = 5000, 
    maxiter = 1000, integrate.args.list = NULL, plot.it = TRUE, add = FALSE, 
    n.points = 100, plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n | numeric scalar indicating the sample size.  The default value is  | 
| n.median | positive odd integer specifying the sample size associated with the future medians.  
The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.  
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level 
associated with the prediction interval.  The default value is 
 | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| lpl.rank | non-negative integer indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  If  | 
| n.plus.one.minus.upl.rank | non-negative integer related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of 
 | 
| n.max | numeric scalar indicating the maximum sample size to consider when  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
| integrate.args.list | list of arguments to supply to the  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the 
existing plot (see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntNparSimultaneous, 
predIntNparSimultaneousConfLevel, and 
predIntNparSimultaneousN for information on how to compute a 
simultaneous nonparametric prediction interval, how the confidence level 
is computed when other quantities are fixed, and how the sample size is 
computed when other quantities are fixed.
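As a brief sketch (not part of the original help file), the confidence level for one fixed design can be computed directly, assuming predIntNparSimultaneousConfLevel shares the n, k, m, r, rule, and pi.type arguments of plotPredIntNparSimultaneousDesign:
  # Confidence level of a one-sided upper simultaneous nonparametric PI using
  # the 1-of-3 rule, n = 20 background observations, and r = 10 future
  # sampling occasions (assumed syntax).
  predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 3, r = 10, 
    rule = "k.of.m", pi.type = "upper")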
Value
plotPredIntNparSimultaneousDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that 
have been or would have been plotted.
Note
See the help file for predIntNparSimultaneous.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNparSimultaneous.
See Also
predIntNparSimultaneous, 
predIntNparSimultaneousConfLevel, 
predIntNparSimultaneousN, 
predIntNparSimultaneousTestPower, 
predIntNpar, tolIntNpar.
Examples
  # For the 1-of-3 rule with r=20 future sampling occasions, look at the 
  # relationship between confidence level and sample size for a one-sided 
  # upper simultaneous nonparametric prediction interval.
  dev.new()
  plotPredIntNparSimultaneousDesign(k = 1, m = 3, r = 20, range.x.var = c(2, 20))
  #==========
  # Plot confidence level vs. sample size for various values of number of 
  # future sampling occasions (r):
  dev.new()
  plotPredIntNparSimultaneousDesign(m = 3, r = 10, rule = "CA", 
    ylim = c(0, 1), main = "") 
  plotPredIntNparSimultaneousDesign(m = 3, r = 20, rule = "CA", add = TRUE, 
    plot.col = "red") 
  plotPredIntNparSimultaneousDesign(m = 3, r = 30, rule = "CA", add = TRUE, 
    plot.col = "blue") 
  legend("bottomright", c("r=10", "r=20", "r=30"), lty = 1, lwd = 3 * par("cex"), 
    col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Confidence Level vs. Sample Size for Simultaneous", 
    "Nonparametric PI with Various Values of r", sep="\n"))
  #==========
  # Modifying Example 19-5 of USEPA (2009, p. 19-33), plot confidence level 
  # versus sample size (number of background observations required) for 
  # a 1-of-3 plan assuming r = 10 compliance wells (future sampling occasions).
  
  dev.new()
  plotPredIntNparSimultaneousDesign(k = 1, m = 3, r = 10, rule = "k.of.m")
  #==========
  # Clean up
  #---------
  graphics.off()
Power Curves for Sampling Design for Test Based on Nonparametric Simultaneous Prediction Interval
Description
Plot power vs. \Delta/\sigma (scaled minimal detectable difference) for a 
sampling design for a test based on a nonparametric simultaneous prediction 
interval.  The power is based on assuming the true distribution of the 
observations is normal.
Usage
  plotPredIntNparSimultaneousTestPowerCurve(n = 8, n.median = 1, k = 1, m = 2, 
    r = 1, rule = "k.of.m", lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), pi.type = "upper", 
    r.shifted = r, integrate.args.list = NULL, method = "approx", NMC = 100, 
    range.delta.over.sigma = c(0, 5), plot.it = TRUE, add = FALSE, n.points = 20, 
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l") 
Arguments
| n | positive integer specifying the sample size. | 
| n.median | positive odd integer specifying the sample size associated with the 
future medians.  The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | positive integer specifying the number of future sampling 
“occasions”.  The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| lpl.rank | non-negative integer indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  When  | 
| n.plus.one.minus.upl.rank | non-negative integer related to the rank of the order statistic to use for 
the upper 
bound of the prediction interval.  A value of  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| r.shifted | integer between  | 
| integrate.args.list | list of arguments to supply to the  | 
| method | character string indicating what method to use to compute the power.  The possible 
values are  | 
| NMC | positive integer indicating the number of Monte Carlo trials to run when  | 
| range.delta.over.sigma | numeric vector of length 2 indicating the range of the x-variable to use for the 
plot.  The default value is  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for predIntNparSimultaneousTestPower for 
information on how to compute the power of a hypothesis test for the difference 
between two means of normal distributions based on a nonparametric simultaneous 
prediction interval.
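As a brief sketch (not part of the original help file), a single point on the power curve can be computed directly, assuming predIntNparSimultaneousTestPower shares the arguments of the plotting function and accepts a scalar delta.over.sigma:
  # Approximate power at delta/sigma = 2 for the 1-of-4 rule with the 3rd
  # largest background value as the upper limit (assumed syntax).
  predIntNparSimultaneousTestPower(n = 20, k = 1, m = 4, r = 10, 
    rule = "k.of.m", n.plus.one.minus.upl.rank = 3, pi.type = "upper", 
    r.shifted = 1, delta.over.sigma = 2, method = "approx")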
Value
plotPredIntNparSimultaneousTestPowerCurve invisibly returns a list with 
components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for predIntNparSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNparSimultaneousTestPower and 
plotPredIntNparSimultaneousTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNparSimultaneous.
Gansecki, M. (2009). Using the Optimal Rank Values Calculator. US Environmental Protection Agency, Region 8, March 10, 2009.
See Also
predIntNparSimultaneousTestPower, 
predIntNparSimultaneous, 
predIntNparSimultaneousN, 
predIntNparSimultaneousConfLevel, 
plotPredIntNparSimultaneousDesign, 
predIntNpar, tolIntNpar.
Examples
  # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper 
  # simultaneous prediction limits for various rules based on trace mercury data (ppb) 
  # collected in the past year from a site with four background wells and 10 compliance 
  # wells (data for two of the compliance wells  are shown in the guidance document).  
  # The facility must monitor the 10 compliance wells for five constituents 
  # (including mercury) annually.
  # We will pool data from 4 background wells that were sampled on 
  # a number of different occasions, giving us a sample size of 
  # n = 20 to use to construct the prediction limit.
  # There are 10 compliance wells and we will monitor 5 different 
  # constituents at each well annually.  For this example, USEPA (2009) 
  # recommends setting r to the product of the number of compliance wells and 
  # the number of evaluations per year (i.e., r = 10 * 1 = 10).  
 
  # Here we will reproduce Figure 19-2 on page 19-35.  This figure plots the 
  # power of the nonparametric simultaneous prediction interval for 6 different 
  # plans:
  #          Rule Median.n k m Order.Statistic Achieved.alpha BG.Limit
  #1)      k.of.m        1 1 3             Max         0.0055     0.28
  #2)      k.of.m        1 1 4             Max         0.0009     0.28
  #3) Modified.CA        1 1 4             Max         0.0140     0.28
  #4)      k.of.m        3 1 2             Max         0.0060     0.28
  #5)      k.of.m        1 1 4             2nd         0.0046     0.25
  #6)      k.of.m        1 1 4             3rd         0.0135     0.24
  # Here is the power curve for the 1-of-4 sampling strategy.
  dev.new()
  plotPredIntNparSimultaneousTestPowerCurve(n = 20, k = 1, m = 4, r = 10, 
    rule = "k.of.m", n.plus.one.minus.upl.rank = 3, pi.type = "upper", 
    r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), main = "")
  title(main = paste(
    "Power Curve for Nonparametric 1-of-4 Sampling Strategy Based on",
    "25 Background Samples, SWFPR=10%, and 2 Future Sampling Periods", 
    sep = "\n"), cex.main = 1.1)
  #----------
  # Here are the power curves for all 6 sampling strategies.  
  # Because these take several seconds to create, here we have commented out 
  # the R commands.  To run this example, just remove the pound signs (#) from 
  # in front of the R commands.
  #dev.new()
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, k = 1, m = 4, r = 10, 
  #  rule = "k.of.m", n.plus.one.minus.upl.rank = 3, pi.type = "upper", 
  #  r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), main = "")
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, n.median = 3, k = 1, m = 2, 
  #  r = 10, rule = "k.of.m", n.plus.one.minus.upl.rank = 1, pi.type = "upper", 
  #  r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), 
  #  add = TRUE, plot.col = 2, plot.lty = 2)
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, r = 10, rule = "Modified.CA", 
  #  n.plus.one.minus.upl.rank = 1, pi.type = "upper", r.shifted = 1, 
  #  method = "approx", range.delta.over.sigma = c(0, 5), add = TRUE, 
  #  plot.col = 3, plot.lty = 3)
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, k = 1, m = 4, r = 10, 
  #  rule = "k.of.m", n.plus.one.minus.upl.rank = 2, pi.type = "upper", 
  #  r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), 
  #  add = TRUE, plot.col = 4, plot.lty = 4)
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, k = 1, m = 3, r = 10, 
  #  rule = "k.of.m", n.plus.one.minus.upl.rank = 1, pi.type = "upper", 
  #  r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), 
  #  add = TRUE, plot.col = 5, plot.lty = 5)
  #plotPredIntNparSimultaneousTestPowerCurve(n = 20, k = 1, m = 4, r = 10, 
  #  rule = "k.of.m", n.plus.one.minus.upl.rank = 1, pi.type = "upper", 
  #  r.shifted = 1, method = "approx", range.delta.over.sigma = c(0, 5), 
  #  add = TRUE, plot.col = 6, plot.lty = 6)
  #legend("topleft", legend = c("1-of-4, 3rd", "1-of-2, Max, Median", "Mod CA", 
  #  "1-of-4, 2nd", "1-of-3, Max", "1-of-4, Max"), lwd = 3 * par("cex"), 
  #  col = 1:6, lty = 1:6, bty = "n")
  #title(main = "Figure 19-2. Comparison of Full Power Curves")
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for Sampling Design Based on One- or Two-Sample Proportion Test
Description
Create plots involving sample size, power, difference, and significance level for a one- or two-sample proportion test.
Usage
  plotPropTestDesign(x.var = "n", y.var = "power", 
    range.x.var = NULL, n.or.n1 = 25, n2 = n.or.n1, ratio = 1, 
     p.or.p1 = switch(alternative, greater = 0.6, less = 0.4, 
       two.sided = ifelse(two.sided.direction == "greater", 0.6, 0.4)), 
    p0.or.p2 = 0.5, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!missing(n2) || !missing(ratio), "two.sample", "one.sample"), 
    alternative = "two.sided", two.sided.direction = "greater", 
    approx = TRUE, correct = sample.type == "two.sample", round.up = FALSE, 
    warn = TRUE, n.min = 2, n.max = 10000, tol.alpha = 0.1 * alpha, 
    tol = 1e-07, maxiter = 1000, plot.it = TRUE, add = FALSE, n.points = 50, 
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n.or.n1 | numeric scalar indicating the sample size.  The default value is 
 | 
| n2 | numeric scalar indicating the sample size for group 2.  The default value 
is the value of  | 
| ratio | numeric vector indicating the ratio of sample size in group 2 to sample 
size in group 1  | 
| p.or.p1 | numeric vector of proportions.  When  | 
| p0.or.p2 | numeric vector of proportions.  When  | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level associated 
with the hypothesis test.  The default value is  | 
| power | numeric scalar between 0 and 1 indicating the power associated with the 
hypothesis test.  The default value is  | 
| sample.type | character string indicating whether the design is based on a one-sample or 
two-sample proportion test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible 
values are  | 
| two.sided.direction | character string indicating the direction (positive or negative) for the minimal 
detectable difference when  | 
| approx | logical scalar indicating whether to compute the power, sample size, or minimal 
detectable difference based on the normal approximation to the binomial distribution.  
The default value is  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample 
size(s) to the next largest integer.  The default value is  | 
| warn | logical scalar indicating whether to issue a warning.  The default value is  | 
| n.min | integer relevant to the case when  | 
| n.max | integer relevant to the case when  | 
| tol.alpha | numeric vector relevant to the case when  | 
| tol | numeric scalar relevant to the case when  | 
| maxiter | integer relevant to the case when  | 
| plot.it | a logical scalar indicating whether to create a new plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for propTestPower, propTestN, and 
propTestMdd for information on how to compute the power, sample size, 
or minimal detectable difference for a one- or two-sample proportion test.
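As a brief sketch (not part of the original help file), the power and sample-size calculations that drive the plot can be run directly, assuming propTestPower and propTestN use the p.or.p1, p0.or.p2, alpha, power, sample.type, and alternative arguments shown in the Usage section above:
  # Power of a one-sided one-sample test of p0 = 0.1 vs. p = 0.3 with n = 40
  # (assumed syntax).
  propTestPower(n.or.n1 = 40, p.or.p1 = 0.3, p0.or.p2 = 0.1, alpha = 0.05, 
    sample.type = "one.sample", alternative = "greater")
  # Sample size required for 90% power in the same setting (assumed syntax).
  propTestN(p.or.p1 = 0.3, p0.or.p2 = 0.1, alpha = 0.05, power = 0.9, 
    sample.type = "one.sample", alternative = "greater")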
Value
plotPropTestDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that have 
been or would have been plotted.
Note
See the help files for propTestPower, propTestN, and 
propTestMdd.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for propTestPower, propTestN, and 
propTestMdd.
See Also
propTestPower, propTestN, 
propTestMdd, Binomial, 
binom.test, prop.test.
Examples
  # Look at the relationship between power and sample size for a 
  # one-sample proportion test, assuming the true proportion is 0.6, the 
  # hypothesized proportion is 0.5, and a 5% significance level.  
  # Compute the power based on the normal approximation to the binomial 
  # distribution.
  dev.new()
  plotPropTestDesign()
  #----------
  # For a two-sample proportion test, plot sample size vs. the minimal detectable 
  # difference for various levels of power, using a 5% significance level and a 
  # two-sided alternative:
  dev.new()
  plotPropTestDesign(x.var = "delta", y.var = "n", sample.type = "two", 
    ylim = c(0, 2800), main="") 
  plotPropTestDesign(x.var = "delta", y.var = "n", sample.type = "two", 
    power = 0.9, add = TRUE, plot.col = "red") 
  plotPropTestDesign(x.var = "delta", y.var = "n", sample.type = "two", 
    power = 0.8, add = TRUE, plot.col = "blue") 
  legend("topright", c("95%", "90%", "80%"), lty = 1, 
    lwd = 3 * par("cex"), col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Sample Size vs. Minimal Detectable Difference for Two-Sample", 
    "Proportion Test with p2=0.5, Alpha=0.05 and Various Powers", sep = "\n"))
  #==========
  # Example 22-3 on page 22-20 of USEPA (2009) involves determining whether more than 
  # 10% of chlorine gas containers are stored at pressures above a compliance limit.  
  # We want to test the one-sided null hypothesis that 10% or fewer of the containers 
  # are stored at pressures greater than the compliance limit versus the alternative 
  # that more than 10% are stored at pressures greater than the compliance limit.  
  # We want to have at least 90% power of detecting a true proportion of 30% or 
  # greater, using a 5% Type I error level.
  # Here we will modify this example and create a plot of power versus 
  # sample size for various assumed minimal detectable differences, 
  # using a 5% Type I error level.
  dev.new()
  plotPropTestDesign(x.var = "n", y.var = "power", 
    sample.type = "one", alternative = "greater", 
    p0.or.p2 = 0.1, p.or.p1 = 0.25, 
    range.x.var = c(20, 50), ylim = c(0.6, 1), main = "")
  plotPropTestDesign(x.var = "n", y.var = "power", 
    sample.type = "one", alternative = "greater", 
    p0.or.p2 = 0.1, p.or.p1 = 0.3, 
    range.x.var = c(20, 50), add = TRUE, plot.col = "red") 
  plotPropTestDesign(x.var = "n", y.var = "power", 
    sample.type = "one", alternative = "greater", 
    p0.or.p2 = 0.1, p.or.p1 = 0.35, 
    range.x.var = c(20, 50), add = TRUE, plot.col = "blue") 
  legend("bottomright", c("p=0.35", "p=0.3", "p=0.25"), lty = 1, 
    lwd = 3 * par("cex"), col = c("blue", "red", "black"), bty = "n") 
  title(main = paste("Power vs. Sample Size for One-Sided One-Sample Proportion",
    "Test with p0=0.1, Alpha=0.05 and Various Detectable Differences", 
    sep = "\n"))
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a One- or Two-Sample t-Test
Description
Create plots involving sample size, power, scaled difference, and significance level for a one- or two-sample t-test.
Usage
  plotTTestDesign(x.var = "n", y.var = "power", range.x.var = NULL, 
    n.or.n1 = 25, n2 = n.or.n1, 
    delta.over.sigma = switch(alternative, greater = 0.5, less = -0.5, 
      two.sided = ifelse(two.sided.direction == "greater", 0.5, -0.5)), 
    alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!missing(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", two.sided.direction = "greater", approx = FALSE, 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, 
    add = FALSE, n.points = 50, plot.col = "black", plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, ..., main = NULL, xlab = NULL, 
    ylab = NULL, type = "l") 
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  | 
| n.or.n1 | numeric scalar indicating the sample size.  The default value is 
 | 
| n2 | numeric scalar indicating the sample size for group 2.  The default value 
is the value of  | 
| delta.over.sigma | numeric scalar specifying the ratio of the true difference ( | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level associated 
with the hypothesis test.  The default value is  | 
| power | numeric scalar between 0 and 1 indicating the power associated with the 
hypothesis test.  The default value is  | 
| sample.type | character string indicating whether the design is based on a one-sample or 
two-sample t-test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible 
values are  | 
| two.sided.direction | character string indicating the direction (positive or negative) for the scaled 
minimal detectable difference when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation 
to the non-central t-distribution. The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample 
size(s) to the next largest integer.  The default value is  | 
| n.max | for the case when  | 
| tol | numeric scalar relevant to the case when  | 
| maxiter | numeric scalar relevant to the case when  | 
| plot.it | a logical scalar indicating whether to create a new plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for tTestPower, tTestN, and 
tTestScaledMdd for information on how to compute the power, 
sample size, or scaled minimal detectable difference for a one- or two-sample 
t-test.
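As a brief sketch (not part of the original help file), the underlying calculations can be run directly, assuming tTestPower and tTestN use the delta.over.sigma, alpha, power, sample.type, and alternative arguments shown in the Usage section above:
  # Power of a one-sided one-sample t-test with n = 8 and delta/sigma = 1
  # (assumed syntax).
  tTestPower(n.or.n1 = 8, delta.over.sigma = 1, alpha = 0.01, 
    sample.type = "one.sample", alternative = "greater")
  # Per-group sample size for 95% power in a two-sided two-sample t-test
  # with delta/sigma = 0.5 (assumed syntax).
  tTestN(delta.over.sigma = 0.5, alpha = 0.05, power = 0.95, 
    sample.type = "two.sample")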
Value
plotTTestDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that have 
been or would have been plotted.
Note
See the help files for tTestPower, tTestN, and 
tTestScaledMdd.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for tTestPower, tTestN, and 
tTestScaledMdd.
See Also
tTestPower, tTestN, 
tTestScaledMdd, t.test.
Examples
  # Look at the relationship between power and sample size for a two-sample t-test, 
  # assuming a scaled difference of 0.5 and a 5% significance level:
  dev.new()
  plotTTestDesign(sample.type = "two")
  #----------
  # For a two-sample t-test, plot sample size vs. the scaled minimal detectable 
  # difference for various levels of power, using a 5% significance level:
  dev.new()
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "n", sample.type = "two", 
    ylim = c(0, 110), main="") 
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "n", sample.type = "two", 
    power = 0.9, add = TRUE, plot.col = "red") 
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "n", sample.type = "two", 
    power = 0.8, add = TRUE, plot.col = "blue") 
  legend("topright", c("95%", "90%", "80%"), lty = 1, 
    lwd = 3 * par("cex"), col = c("black", "red", "blue"), bty = "n")
  title(main = paste("Sample Size vs. Scaled Difference for", 
    "Two-Sample t-Test, with Alpha=0.05 and Various Powers", 
    sep="\n"))
  #==========
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), look at 
  # power versus scaled minimal detectable difference for various sample 
  # sizes in the context of the problem of using a one-sample t-test to 
  # compare the mean for the well with the MCL of 7 ppb.  Use alpha = 0.01, 
  # assume an upper one-sided alternative (i.e., compliance well mean larger 
  # than 7 ppb).
  dev.new()
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "power", 
    range.x.var = c(0.5, 2), n.or.n1 = 8, alpha = 0.01, 
    alternative = "greater", ylim = c(0, 1), main = "")
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "power", 
    range.x.var = c(0.5, 2), n.or.n1 = 6, alpha = 0.01, 
    alternative = "greater", add = TRUE, plot.col = "red")
  plotTTestDesign(x.var = "delta.over.sigma", y.var = "power", 
    range.x.var = c(0.5, 2), n.or.n1 = 4, alpha = 0.01, 
    alternative = "greater", add = TRUE, plot.col = "blue")
  legend("topleft", paste("N =", c(8, 6, 4)), lty = 1, lwd = 3 * par("cex"), 
    col = c("black", "red", "blue"), bty = "n")
  title(main = paste("Power vs. Scaled Difference for One-Sample t-Test", 
    "with Alpha=0.01 and Various Sample Sizes", sep="\n"))
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a One- or Two-Sample t-Test, Assuming Lognormal Data
Description
Create plots involving sample size, power, ratio of means, coefficient of variation, and significance level for a one- or two-sample t-test, assuming lognormal data.
Usage
  plotTTestLnormAltDesign(x.var = "n", y.var = "power", range.x.var = NULL, 
    n.or.n1 = 25, n2 = n.or.n1, 
    ratio.of.means = switch(alternative, greater = 2, less = 0.5, 
      two.sided = ifelse(two.sided.direction == "greater", 2, 0.5)), 
    cv = 1, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!missing(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", two.sided.direction = "greater", approx = FALSE, 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, 
    add = FALSE, n.points = 50, plot.col = "black", plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, cex.main = par("cex"), ..., 
    main = NULL, xlab = NULL, ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  
Possible values are  | 
| y.var | character string indicating what variable to use for the y-axis.  
Possible values are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use 
for the plot.  The default value depends on the value of  
 | 
| n.or.n1 | numeric scalar indicating the sample size.  The default value is 
 | 
| n2 | numeric scalar indicating the sample size for group 2.  The default value 
is the value of  | 
| ratio.of.means | numeric scalar specifying the ratio of the first mean to the second mean.  When 
 When  | 
| cv | numeric scalar: a positive value specifying the coefficient of 
variation.  When  | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric scalar between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| two.sided.direction | character string indicating the direction (greater than 1 or less than 1) for the 
detectable ratio of means when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed 
sample size(s) to the next largest integer.  The default value is 
 | 
| n.max | for the case when  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
| plot.it | a logical scalar indicating whether to create a new plot or add to the existing plot 
(see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for tTestLnormAltPower, 
tTestLnormAltN, and tTestLnormAltRatioOfMeans for 
information on how to compute the power, sample size, or ratio of means for a 
one- or two-sample t-test assuming lognormal data.
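As a brief sketch (not part of the original help file), the underlying calculations can be run directly, assuming tTestLnormAltPower and tTestLnormAltN use the ratio.of.means, cv, alpha, power, sample.type, and alternative arguments shown in the Usage section above:
  # Power of a one-sided one-sample test with n = 6, a true ratio of means
  # of 4, and cv = 2 (assumed syntax).
  tTestLnormAltPower(n.or.n1 = 6, ratio.of.means = 4, cv = 2, alpha = 0.2, 
    sample.type = "one.sample", alternative = "greater")
  # Per-group sample size for 95% power to detect a ratio of means of 2
  # with cv = 1 in a two-sample test (assumed syntax).
  tTestLnormAltN(ratio.of.means = 2, cv = 1, alpha = 0.05, power = 0.95, 
    sample.type = "two.sample")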
Value
plotTTestLnormAltDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that have 
been or would have been plotted.
Note
See the help files for tTestLnormAltPower, 
tTestLnormAltN, and tTestLnormAltRatioOfMeans.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for tTestLnormAltPower, 
tTestLnormAltN, and tTestLnormAltRatioOfMeans.
See Also
tTestLnormAltPower, tTestLnormAltN, 
tTestLnormAltRatioOfMeans, t.test.
Examples
  # Look at the relationship between power and sample size for a two-sample t-test, 
  # assuming lognormal data, a ratio of means of 2, a coefficient of variation 
  # of 1, and a 5% significance level:
  dev.new()
  plotTTestLnormAltDesign(sample.type = "two")
  #----------
  # For a two-sample t-test based on lognormal data, plot sample size vs. the 
  # minimal detectable ratio for various levels of power, assuming a coefficient 
  # of variation of 1 and using a 5% significance level:
  dev.new()
  plotTTestLnormAltDesign(x.var = "ratio.of.means", y.var = "n", 
    range.x.var = c(1.5, 2), sample.type = "two", ylim = c(20, 120), main="") 
  plotTTestLnormAltDesign(x.var = "ratio.of.means", y.var = "n", 
    range.x.var = c(1.5, 2), sample.type="two", power = 0.9, 
    add = TRUE, plot.col = "red") 
  plotTTestLnormAltDesign(x.var = "ratio.of.means", y.var = "n", 
    range.x.var = c(1.5, 2), sample.type="two", power = 0.8, 
    add = TRUE, plot.col = "blue") 
  legend("topright", c("95%", "90%", "80%"), lty=1, lwd = 3*par("cex"), 
    col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Sample Size vs. Ratio of Lognormal Means for", 
    "Two-Sample t-Test, with CV=1, Alpha=0.05 and Various Powers", 
    sep="\n")) 
  #==========
  # The guidance document Soil Screening Guidance: Technical Background Document 
  # (USEPA, 1996c, Part 4) discusses sampling design and sample size calculations 
  # for studies to determine whether the soil at a potentially contaminated site 
  # needs to be investigated for possible remedial action. Let 'theta' denote the 
  # average concentration of the chemical of concern.  The guidance document 
  # establishes the following goals for the decision rule (USEPA, 1996c, p.87):
  #
  #     Pr[Decide Don't Investigate | theta > 2 * SSL] = 0.05
  #
  #     Pr[Decide to Investigate | theta <= (SSL/2)] = 0.2
  #
  # where SSL denotes the pre-established soil screening level.
  #
  # These goals translate into a Type I error of 0.2 for the null hypothesis
  #
  #     H0: [theta / (SSL/2)] <= 1
  #
  # and a power of 95% for the specific alternative hypothesis
  #
  #     Ha: [theta / (SSL/2)] = 4
  #
  # Assuming a lognormal distribution, a coefficient of variation of 2, and the above 
  # values for Type I error and power, create a performance goal diagram 
  # (USEPA, 1996c, p.89) showing the power of a one-sample test versus the minimal 
  # detectable ratio of theta/(SSL/2) when the sample size is 6 and the exact power 
  # calculations are used.
  dev.new()
  plotTTestLnormAltDesign(x.var = "ratio.of.means", y.var = "power", 
    range.x.var = c(1, 5), n.or.n1 = 6, cv = 2, alpha = 0.2, 
    alternative = "greater", approx = FALSE, ylim = c(0.2, 1), 
    xlab = "theta / (SSL/2)") 
  #==========
  # Clean up
  #---------
  graphics.off()
Plots for a Sampling Design Based on a Tolerance Interval for a Normal Distribution
Description
Create plots involving sample size, half-width, estimated standard deviation, coverage, and confidence level for a tolerance interval for a normal distribution.
Usage
  plotTolIntNormDesign(x.var = "n", y.var = "half.width", range.x.var = NULL, 
    n = 25, half.width = ifelse(x.var == "sigma.hat", 3 * max.x, 3 * sigma.hat), 
    sigma.hat = 1, coverage = 0.95, conf.level = 0.95, cov.type = "content", 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, 
    add = FALSE, n.points = 100, plot.col = 1, plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, ..., main = NULL, xlab = NULL, 
    ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  Possible values 
are  | 
| y.var | character string indicating what variable to use for the y-axis.  Possible values 
are  | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  
The default value depends on the value of  | 
| n | positive integer greater than 1 indicating the sample size upon 
which the tolerance interval is based.  The default value is  | 
| half.width | positive scalar indicating the half-width of the tolerance interval.  
The default value depends on the value of  | 
| sigma.hat | numeric scalar specifying the value of the estimated standard deviation.  
The default value is  | 
| coverage | numeric scalar between 0 and 1 indicating the desired coverage of the 
tolerance interval.  The default value is  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level of the 
tolerance interval.  The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  The 
possible values are  | 
| round.up | for the case when  | 
| n.max | for the case when  | 
| tol | for the case when  | 
| maxiter | for the case when  | 
| plot.it | a logical scalar indicating whether to create a plot or add to the existing plot 
(see explanation of the argument  | 
| add | a logical scalar indicating whether to add the design plot to the existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted line or points.  The default value 
is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help files for tolIntNorm, tolIntNormK, 
tolIntNormHalfWidth, and tolIntNormN for information 
on how to compute a tolerance interval for a normal distribution, how the 
half-width is computed when other quantities are fixed, and how the sample size 
is computed when other quantities are fixed.
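As a brief sketch (not part of the original help file), the half-width and sample-size calculations can be run directly, assuming tolIntNormHalfWidth and tolIntNormN use the n, half.width, sigma.hat, coverage, and conf.level arguments shown in the Usage section above:
  # Half-width of a 95% beta-content / 95% confidence tolerance interval
  # based on n = 25 observations and sigma.hat = 1 (assumed syntax).
  tolIntNormHalfWidth(n = 25, sigma.hat = 1, coverage = 0.95, conf.level = 0.95)
  # Sample size needed to achieve a half-width of 3 (assumed syntax).
  tolIntNormN(half.width = 3, sigma.hat = 1, coverage = 0.95, conf.level = 0.95)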
Value
plotTolIntNormDesign invisibly returns a list with components:
| x.var | x-coordinates of points that have been or would have been plotted. | 
| y.var | y-coordinates of points that have been or would have been plotted. | 
Note
See the help file for tolIntNorm.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, confidence level, and half-width 
if one of the objectives of the sampling program is to produce tolerance intervals.  
The functions tolIntNormHalfWidth, tolIntNormN, and 
plotTolIntNormDesign can be used to investigate these relationships for the 
case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNorm.
See Also
tolIntNorm, tolIntNormK, 
tolIntNormN, tolIntNormHalfWidth, 
Normal.
Examples
  # Look at the relationship between half-width and sample size for a 
  # 95% beta-content tolerance interval, assuming an estimated standard 
  # deviation of 1 and a confidence level of 95%:
  dev.new()
  plotTolIntNormDesign()
  #==========
  # Plot half-width vs. coverage for various levels of confidence:
  dev.new()
  plotTolIntNormDesign(x.var = "coverage", y.var = "half.width", 
    ylim = c(0, 3.5), main="") 
  plotTolIntNormDesign(x.var = "coverage", y.var = "half.width", 
    conf.level = 0.9, add = TRUE, plot.col = "red") 
  plotTolIntNormDesign(x.var = "coverage", y.var = "half.width", 
    conf.level = 0.8, add = TRUE, plot.col = "blue") 
  legend("topleft", c("95%", "90%", "80%"), lty = 1, lwd = 3 * par("cex"), 
    col = c("black", "red", "blue"), bty = "n")
  title(main = paste("Half-Width vs. Coverage for Tolerance Interval", 
    "with Sigma Hat=1 and Various Confidence Levels", sep = "\n"))
  #==========
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal distribution.  
  # The data for this example are stored in EPA.09.Ex.17.3.chrysene.df, 
  # which contains chrysene concentration data (ppb) found in water 
  # samples obtained from two background wells (Wells 1 and 2) and 
  # three compliance wells (Wells 3, 4, and 5).  The tolerance limit 
  # is based on the data from the background wells.
  # Here we will first take the log of the data and then estimate the 
  # standard deviation based on the two background wells.  We will use this 
  # estimate of standard deviation to plot the half-widths of 
  # future tolerance intervals on the log-scale for various sample sizes.
  head(EPA.09.Ex.17.3.chrysene.df)
  #  Month   Well  Well.type Chrysene.ppb
  #1     1 Well.1 Background         19.7
  #2     2 Well.1 Background         39.2
  #3     3 Well.1 Background          7.8
  #4     4 Well.1 Background         12.8
  #5     1 Well.2 Background         10.2
  #6     2 Well.2 Background          7.2
  longToWide(EPA.09.Ex.17.3.chrysene.df, "Chrysene.ppb", "Month", "Well")
  #  Well.1 Well.2 Well.3 Well.4 Well.5
  #1   19.7   10.2   68.0   26.8   47.0
  #2   39.2    7.2   48.9   17.7   30.5
  #3    7.8   16.1   30.1   31.9   15.0
  #4   12.8    5.7   38.1   22.2   23.4
  summary.stats <- summaryStats(log(Chrysene.ppb) ~ Well.type, 
    data = EPA.09.Ex.17.3.chrysene.df)
  summary.stats
  #            N   Mean     SD Median    Min    Max
  #Background  8 2.5086 0.6279 2.4359 1.7405 3.6687
  #Compliance 12 3.4173 0.4361 3.4111 2.7081 4.2195
  sigma.hat <- summary.stats["Background", "SD"]
  sigma.hat
  #[1] 0.6279
  dev.new()
  plotTolIntNormDesign(x.var = "n", y.var = "half.width", 
    range.x.var = c(5, 40), sigma.hat = sigma.hat, cex.main = 1)
  #==========
  # Clean up
  #---------
  rm(summary.stats, sigma.hat)
  graphics.off()
Plots for a Sampling Design Based on a Nonparametric Tolerance Interval
Description
Create plots involving sample size (n), coverage (\beta), and confidence level 
(1-\alpha) for a nonparametric tolerance interval.
Usage
  plotTolIntNparDesign(x.var = "n", y.var = "conf.level", range.x.var = NULL, n = 25, 
    coverage = 0.95, conf.level = 0.95, ti.type = "two.sided", cov.type = "content", 
    ltl.rank = ifelse(ti.type == "upper", 0, 1), 
    n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1), plot.it = TRUE, 
    add = FALSE, n.points = 100, plot.col = "black", plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, cex.main = par("cex"), ..., main = NULL, 
    xlab = NULL, ylab = NULL, type = "l")
Arguments
| x.var | character string indicating what variable to use for the x-axis.  Possible values are 
 | 
| y.var | character string indicating what variable to use for the y-axis.  Possible values are 
 | 
| range.x.var | numeric vector of length 2 indicating the range of the x-variable to use for the plot.  The 
default value depends on the value of  | 
| n | numeric scalar indicating the sample size.  The default value is  | 
| coverage | numeric scalar between 0 and 1 specifying the coverage of the tolerance interval.  The default 
value is  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance interval.  
The default value is  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ltl.rank | vector of positive integers indicating the rank of the order statistic to use for the lower bound 
of the tolerance interval.  If  | 
| n.plus.one.minus.utl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the tolerance interval.  A value of 
 | 
| plot.it | a logical scalar indicating whether to create a plot or add to the 
existing plot (see  | 
| add | a logical scalar indicating whether to add the design plot to the 
existing plot ( | 
| n.points | a numeric scalar specifying how many (x,y) pairs to use to produce the plot.  
There are  | 
| plot.col | a numeric scalar or character string determining the color of the plotted 
line or points.  The default value is  | 
| plot.lwd | a numeric scalar determining the width of the plotted line.  The default value is 
 | 
| plot.lty | a numeric scalar determining the line type of the plotted line.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print out on the plot.  The default 
value is the current setting of  | 
| cex.main,main,xlab,ylab,type,... | additional graphical parameters (see  | 
Details
See the help file for tolIntNpar, tolIntNparConfLevel, 
tolIntNparCoverage, and tolIntNparN for information on how 
to compute a nonparametric tolerance interval, how the confidence level 
is computed when other quantities are fixed, how the coverage is computed when other 
quantities are fixed, and how the sample size is computed when other quantities are fixed.
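As a brief sketch (not part of the original help file), the fixed-quantity calculations can be run directly, assuming tolIntNparConfLevel, tolIntNparCoverage, and tolIntNparN use the n, coverage, conf.level, and ti.type arguments shown in the Usage section above (the call to tolIntNparCoverage mirrors the one in the Examples section below):
  # Confidence level of a two-sided nonparametric TI with 95% coverage
  # based on n = 25 observations (assumed syntax).
  tolIntNparConfLevel(n = 25, coverage = 0.95, ti.type = "two.sided")
  # Coverage of a one-sided upper TI based on the maximum of n = 24 values.
  tolIntNparCoverage(n = 24, conf.level = 0.95, ti.type = "upper")
  # Sample size needed for a two-sided 95% coverage / 95% confidence TI
  # (assumed syntax).
  tolIntNparN(coverage = 0.95, conf.level = 0.95, ti.type = "two.sided")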
Value
plotTolIntNparDesign invisibly returns a list with components 
x.var and y.var, giving coordinates of the points that 
have been or would have been plotted.
Note
See the help file for tolIntNpar.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, coverage, and confidence level if one of the objectives of 
the sampling program is to produce tolerance intervals.  The functions 
tolIntNparN, tolIntNparCoverage, tolIntNparConfLevel, and 
plotTolIntNparDesign can be used to investigate these relationships for 
constructing nonparametric tolerance intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNpar.
See Also
tolIntNpar, tolIntNparConfLevel, tolIntNparCoverage, 
tolIntNparN.
Examples
  # Look at the relationship between confidence level and sample size for a two-sided 
  # nonparametric tolerance interval.
  dev.new()
  plotTolIntNparDesign()
  #==========
  # Plot confidence level vs. sample size for various values of coverage:
  dev.new()
  plotTolIntNparDesign(coverage = 0.7, ylim = c(0,1), main = "") 
  plotTolIntNparDesign(coverage = 0.8, add = TRUE, plot.col = "red") 
  plotTolIntNparDesign(coverage = 0.9, add = TRUE, plot.col = "blue") 
  legend("bottomright", c("coverage = 70%", "coverage = 80%", "coverage = 90%"), lty=1, 
    lwd = 3 * par("cex"), col = c("black", "red", "blue"), bty = "n") 
  title(main = paste("Confidence Level vs. Sample Size for Nonparametric TI", 
    "with Various Levels of Coverage", sep = "\n"))
  #==========
  # Example 17-4 on page 17-21 of USEPA (2009) uses copper concentrations (ppb) from 3 
  # background wells to set an upper limit for 2 compliance wells.  There are 8 observations 
  # per well, and the maximum value from the 3 wells is set to the 95% confidence upper 
  # tolerance limit, and we need to determine the coverage of this tolerance interval.  
  tolIntNparCoverage(n = 24, conf.level = 0.95, ti.type = "upper")
  #[1] 0.8826538
  # Here we will modify the example and look at confidence level versus coverage for 
  # a set sample size of n = 24.
  dev.new()
  plotTolIntNparDesign(x.var = "coverage", y.var = "conf.level", n = 24, ti.type = "upper")
  #==========
  # Clean up
  #---------
  graphics.off()
Pointwise Confidence Limits for Predictions
Description
Computes pointwise confidence limits for predictions computed by the function 
predict.
Usage
  pointwise(results.predict, coverage = 0.99, 
    simultaneous = FALSE, individual = FALSE)
Arguments
| results.predict | output from a call to  | 
| coverage | optional numeric scalar between 0 and 1 indicating the confidence level associated with the confidence limits. 
The default value is  | 
| simultaneous | optional logical scalar indicating whether to base the confidence limits for the 
predicted values on simultaneous or non-simultaneous prediction limits. 
The default value is  | 
| individual | optional logical scalar indicating whether to base the confidence intervals for the 
predicted values on prediction limits for the mean ( | 
Details
This function computes pointwise confidence limits for predictions computed by the 
function 
predict. The limits are computed at those points specified by the argument 
newdata of predict.
The predict function is a generic function with methods for several 
different classes.  The function pointwise was part of the S language.  
The modifications to pointwise in the package EnvStats involve confidence 
limits for predictions for a linear model (i.e., an object of class "lm").
Confidence Limits for a Predicted Mean Value (individual=FALSE).  
Consider a standard linear model with p predictor variables.  
Often, one of the major goals of regression analysis is to predict a future 
value of the response variable given known values of the predictor variables. 
The equations for the predicted mean value of the response given  
fixed values of the predictor variables as well as the equation for a 
two-sided (1-\alpha)100% confidence interval for the mean value of the 
response can be found in Draper and Smith (1998, p.80) and 
Millard and Neerchal (2001, p.547).
Technically, this formula is a confidence interval for the mean of 
the response for one set of fixed values of the predictor variables and 
corresponds to the case when simultaneous=FALSE. To create simultaneous 
confidence intervals over the range of the predictor variables, 
the critical t-value in the equation has to be replaced with a critical 
F-value and the modified formula is given in Draper and Smith (1998, p. 83), 
Miller (1981a, p. 111), and Millard and Neerchal (2001, p. 547).  
This formula is used in the case when simultaneous=TRUE.
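As a rough illustration of the difference between the two cases, the following sketch (not EnvStats code; the built-in airquality data set and all object names are used purely for illustration) computes non-simultaneous limits for the predicted mean with a critical t-value and simultaneous (Scheffe-type) limits with a critical value based on the F distribution:
  # Minimal sketch: manual 95% confidence limits for the predicted mean from an 
  # 'lm' fit (illustrative only; not the EnvStats implementation).
  fit <- lm(Ozone ~ Temp, data = airquality)
  pred <- predict(fit, newdata = data.frame(Temp = c(70, 90)), se.fit = TRUE)
  p <- length(coef(fit))                       # number of estimated coefficients
  # Non-simultaneous limits: critical value from the t distribution
  t.crit <- qt(0.975, df = df.residual(fit))
  cbind(fit = pred$fit, lower = pred$fit - t.crit * pred$se.fit, 
    upper = pred$fit + t.crit * pred$se.fit)
  # Simultaneous (Scheffe-type) limits: critical value based on the F distribution
  F.crit <- sqrt(p * qf(0.95, df1 = p, df2 = df.residual(fit)))
  cbind(fit = pred$fit, lower = pred$fit - F.crit * pred$se.fit, 
    upper = pred$fit + F.crit * pred$se.fit)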
Confidence Limits for a Predicted Individual Value (individual=TRUE).  
In the above section we discussed how to create a confidence interval for 
the mean of the response given fixed values for the predictor variables.  
If instead we want to create a prediction interval for a single 
future observation of the response variable, the formula is given in 
Miller (1981a, p. 115) and Millard and Neerchal (2001, p. 551).
Technically, this formula is a prediction interval for a single future 
observation for one set of fixed values of the predictor variables and 
corresponds to the case when simultaneous=FALSE.  Miller (1981a, p. 115) 
gives a formula for simultaneous prediction intervals for k future 
observations. If we are interested in creating an interval that will 
encompass all possible future observations over the range of the 
predictor variables with some specified probability, however, we need to 
create simultaneous tolerance intervals.  A formula for such an interval 
was developed by Lieberman and Miller (1963) and is given in 
Miller (1981a, p. 124).  This formula is used in the case when 
simultaneous=TRUE.
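To make the distinction concrete, the following sketch (again not EnvStats code; the airquality data set and object names are illustrative only) computes non-simultaneous prediction limits for a single future observation by adding the residual variance to the variance of the fitted mean:
  # Minimal sketch: manual 95% prediction limits for a single future observation 
  # (the non-simultaneous, individual case); illustrative only.
  fit <- lm(Ozone ~ Temp, data = airquality)
  pred <- predict(fit, newdata = data.frame(Temp = c(70, 90)), se.fit = TRUE)
  t.crit <- qt(0.975, df = df.residual(fit))
  se.pred <- sqrt(pred$se.fit^2 + pred$residual.scale^2)
  cbind(fit = pred$fit, lower = pred$fit - t.crit * se.pred, 
    upper = pred$fit + t.crit * se.pred)
  # These limits should agree with 
  # predict(fit, newdata = data.frame(Temp = c(70, 90)), 
  #   interval = "prediction", level = 0.95)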
Value
a list with the following components:
| upper | upper limits of pointwise confidence intervals. | 
| fit | surface values.  This is the same as the component fit of the argument results.predict. | 
| lower | lower limits of pointwise confidence intervals. | 
Note
The function pointwise is called by the functions 
detectionLimitCalibrate and 
inversePredictCalibrate, which are used in calibration. 
Almost always the process of determining the concentration of a chemical in 
a soil, water, or air sample involves using some kind of machine that 
produces a signal, and this signal is related to the concentration of the 
chemical in the physical sample. The process of relating the machine signal 
to the concentration of the chemical is called calibration 
(see calibrate). Once calibration has been performed, 
estimated concentrations in physical samples with unknown concentrations 
are computed using inverse regression.  The uncertainty in the process used 
to estimate the concentration may be quantified with decision, detection, 
and quantitation limits.
In practice, only the point estimate of concentration is reported (along 
with a possible qualifier), without confidence bounds for the true 
concentration C. This is most unfortunate because it gives the 
impression that there is no error associated with the reported concentration. 
Indeed, both the International Organization for Standardization (ISO) and 
the International Union of Pure and Applied Chemistry (IUPAC) recommend 
always reporting both the estimated concentration and the uncertainty 
associated with this estimate (Currie, 1997).
Author(s)
Authors of S (for code for pointwise in S).
Steven P. Millard (for modifications to allow the arguments simultaneous and 
individual; EnvStats@ProbStatInfo.com)
References
Chambers, J.M., and Hastie, T.J., eds. (1992). Statistical Models in S. Chapman and Hall/CRC, Boca Raton, FL.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.546-553.
Miller, R.G. (1981a). Simultaneous Statistical Inference. Springer-Verlag, New York, pp.111, 124.
See Also
predict, predict.lm, 
lm, calibrate, 
inversePredictCalibrate, detectionLimitCalibrate.
Examples
  # Using the data in the built-in data frame Air.df, 
  # fit the cube root of ozone as a function of temperature. 
  # Then compute predicted values for ozone at 70 and 90 
  # degrees F, and compute 95% confidence intervals for the 
  # mean value of ozone at these temperatures.
  # First create the lm object 
  #---------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Air.df) 
  # Now get predicted values and CIs at 70 and 90 degrees 
  #------------------------------------------------------
  predict.list <- predict(ozone.fit, 
    newdata = data.frame(temperature = c(70, 90)), se.fit = TRUE) 
  pointwise(predict.list, coverage = 0.95) 
  # $upper
  #        1        2 
  # 2.839145 4.278533 
  # $fit
  #        1        2 
  # 2.697810 4.101808 
  # $lower
  #        1        2 
  # 2.556475 3.925082 
  #--------------------------------------------------------------------
  # Continuing with the above example, create a scatterplot of ozone 
  # vs. temperature, and add the fitted line along with simultaneous 
  # 95% confidence bands.
  x <- Air.df$temperature 
  y <- Air.df$ozone 
  dev.new()
  plot(x, y, xlab="Temperature (degrees F)",  
    ylab = expression(sqrt("Ozone (ppb)", 3))) 
  abline(ozone.fit, lwd = 2) 
  new.x <- seq(min(x), max(x), length=100) 
  predict.ozone <- predict(ozone.fit, 
    newdata = data.frame(temperature = new.x), se.fit = TRUE) 
  ci.ozone <- pointwise(predict.ozone, coverage=0.95, 
    simultaneous=TRUE) 
  lines(new.x, ci.ozone$lower, lty=2, lwd = 2, col = 2) 
  lines(new.x, ci.ozone$upper, lty=2, lwd = 2, col = 2) 
  title(main=paste("Cube Root Ozone vs. Temperature with Fitted Line", 
    "and Simultaneous 95% Confidence Bands", 
    sep="\n")) 
  #--------------------------------------------------------------------
  # Redo the last example by creating non-simultaneous 
  # confidence bounds and prediction bounds as well.
  dev.new()
  plot(x, y, xlab = "Temperature (degrees F)", 
    ylab = expression(sqrt("Ozone (ppb)", 3))) 
  abline(ozone.fit, lwd = 2) 
  new.x <- seq(min(x), max(x), length=100) 
  predict.ozone <- predict(ozone.fit, 
    newdata = data.frame(temperature = new.x), se.fit = TRUE) 
  ci.ozone <- pointwise(predict.ozone, coverage=0.95) 
  lines(new.x, ci.ozone$lower, lty=2, col = 2, lwd = 2) 
  lines(new.x, ci.ozone$upper, lty=2, col = 2, lwd = 2) 
  pi.ozone <- pointwise(predict.ozone, coverage = 0.95, 
    individual = TRUE)
  lines(new.x, pi.ozone$lower, lty=4, col = 4, lwd = 2) 
  lines(new.x, pi.ozone$upper, lty=4, col = 4, lwd = 2) 
  title(main=paste("Cube Root Ozone vs. Temperature with Fitted Line", 
    "and 95% Confidence and Prediction Bands", 
    sep="\n")) 
  #--------------------------------------------------------------------
  # Clean up
  rm(predict.list, ozone.fit, x, y, new.x, predict.ozone, ci.ozone, 
    pi.ozone)
Plotting Positions for Type I Censored Data
Description
Returns a list of “ordered” observations and associated plotting positions based on Type I left-censored or right-censored data. These plotting positions may be used to construct empirical cumulative distribution plots or quantile-quantile plots, or to estimate distribution parameters.
Usage
  ppointsCensored(x, censored, censoring.side = "left",
    prob.method = "michael-schucany", plot.pos.con = 0.375)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and 
infinite (Inf, -Inf) values are allowed but will be removed. | 
| censored | numeric or logical vector indicating which values of x are censored.  
This must be the same length as x. | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are
"left" (the default) and "right". | 
| prob.method | character string indicating what method to use to compute the plotting positions
(empirical probabilities).  Possible values are "kaplan-meier", "modified kaplan-meier", "nelson", 
"michael-schucany", and "hirsch-stedinger".  The default value is prob.method="michael-schucany".  
The "nelson" method is only available for censoring.side="right", and the 
"modified kaplan-meier" method is only available for censoring.side="left".  
See the DETAILS section for more explanation. | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is plot.pos.con=0.375.  This argument is used only when 
prob.method="michael-schucany" or prob.method="hirsch-stedinger". | 
Details
Methods for computing plotting positions for complete data sets
(no censored observations) are discussed in D'Agostino, R.B. (1986a) and
Cleveland (1993).  For data sets with censored observations, these methods
must be modified.  The function ppointsCensored allows you to compute
plotting positions based on any of the following methods:
- Product-limit method of Kaplan and Meier (1958) (prob.method="kaplan-meier").
- Hazard plotting method of Nelson (1972) (prob.method="nelson").
- Generalization of the product-limit method due to Michael and Schucany (1986) 
  (prob.method="michael-schucany") (the default).
- Generalization of the product-limit method due to Hirsch and Stedinger (1987) 
  (prob.method="hirsch-stedinger").
Let \underline{x} denote a random sample of N observations from
some distribution.  Assume n (0 < n < N) of these
observations are known and c (c=N-n) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
K fixed censoring levels
T_1, T_2, \ldots, T_K; \; K \ge 1 \;\;\;\;\;\; (1)
For the case when K \ge 2, the data are said to be Type I
multiply censored.  For the case when K=1,
set T = T_1.  If the data are left-censored
and all n known observations are greater
than or equal to T, or if the data are right-censored and all n
known observations are less than or equal to T, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring
level T_j for j = 1, 2, \ldots, K, so that
\sum_{i=1}^K c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)} denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations).  For
right-censored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For left-censored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent
the i'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
Product-Limit Method of Kaplan and Meier (prob.method="kaplan-meier") 
For complete data sets (no censored observations), the
empirical probabilities estimator of the cumulative distribution
function evaluated at the i'th ordered observation is given by
(D'Agostino, 1986a, p.8):
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{\#[x_j \le x_{(i)}]}{n} \;\;\;\;\;\; (3)
where \#[x_j \le x_{(i)}] denotes the number of observations less than
or equal to x_{(i)} (see the help file for ecdfPlot).
Kaplan and Meier (1958) extended this method of computing the empirical cdf to
the case of right-censored data.
Right-Censored Data (censoring.side="right")
Let S(t) denote the survival function evaluated at t, that is:
S(t) = 1 - F(t) = Pr(X > t) \;\;\;\;\;\; (4)
Kaplan and Meier (1958) show that a nonparametric estimate of the survival
function at the i'th ordered observation that is not censored
(i.e., i \in \Omega), is given by:
\hat{S}[x_{(i)}] = \widehat{Pr}[X > x_{(i)}] 
                 = \widehat{Pr}[X > x_{(1)}] \; \widehat{Pr}[X > x_{(2)} | X > x_{(1)}] \; \cdots \; \widehat{Pr}[X > x_{(i)} | X > x_{(i-1)}] 
                 = \prod_{j \in \Omega, j \le i} \frac{n_j - d_j}{n_j}, \;\; i \in \Omega \;\;\;\;\;\; (5)
where n_j is the number of observations (uncensored or censored) with values
greater than or equal to x_{(j)}, and d_j denotes the number of
uncensored observations exactly equal to x_{(j)} (if there are no tied
uncensored observations then d_j will equal 1 for all values of j).
(See also Lee and Wang, 2003, pp. 64–69; Michael and Schucany, 1986).  By convention,
the estimate of the survival function at a censored observation is set equal to
the estimated value of the survival function at the largest uncensored observation
less than or equal to that censoring level.  If there are no uncensored observations
less than or equal to a particular censoring level, the estimate of the survival
function is set to 1 for that censoring level.
Thus the Kaplan-Meier plotting position at the i'th ordered observation
that is not censored (i.e., i \in \Omega), is given by:
\hat{p}_i = \hat{F}[x_{(i)}] = 1 - \prod_{j \in \Omega, j \le i} \frac{n_j - d_j}{n_j} \;\;\;\;\;\; (6)
The plotting position for a censored observation is set equal to the plotting position associated with the largest uncensored observation less than or equal to that censoring level. If there are no uncensored observations less than or equal to a particular censoring level, the plotting position is set to 0 for that censoring level.
As an example, consider the following right-censored data set:
3, \ge4, \ge4, 5, 5, 6
The table below shows how the plotting positions are computed.
| i | x_{(i)} | n_i | d_i | \frac{n_i-d_i}{n_i} | Plotting Position | 
| 1 | 3 | 6 | 1 | 5/6 | 1 - (5/6) = 0.167 | 
| 2 | \ge4 |   |   |     |                   | 
| 3 | \ge4 |   |   |     |                   | 
| 4 | 5 | 3 | 2 | 1/3 | 1 - (5/6)(1/3) = 0.722 | 
| 5 | 5 |   |   |     | 0.722 | 
| 6 | 6 | 1 | 1 | 0/1 | 1 - (5/6)(1/3)(0/1) = 1 | 
Note that for complete data sets, Equation (6) reduces to Equation (3).
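The following sketch (the call mirrors the Usage section above) should reproduce the plotting positions in this table:
  # Sketch: Kaplan-Meier plotting positions for the right-censored example above
  x <- c(3, 4, 4, 5, 5, 6)
  censored <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
  ppointsCensored(x, censored, censoring.side = "right", 
    prob.method = "kaplan-meier")$Cumulative.Probabilities
  # Per the table, and the convention for censored values described above, the 
  # values should be approximately 0.167, 0.167, 0.167, 0.722, 0.722, and 1.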
Left-Censored Data (censoring.side="left") 
Gillespie et al. (2010) give formulas for the Kaplan-Meier estimator for the case of
left-censoring (censoring.side="left").  In this case, the plotting position
for the i'th ordered observation, assuming it is not censored, is computed as:
\hat{p}_i = \hat{F}[x_{(i)}] = \prod_{j \in \Omega, j > i} \frac{n_j - d_j}{n_j} \;\;\;\;\;\; (7)
where n_j is the number of observations (uncensored or censored) with values
less than or equal to x_{(j)}, and d_j denotes the number of
uncensored observations exactly equal to x_{(j)} (if there are no tied
uncensored observations then d_j will equal 1 for all values of j).
The plotting position is equal to 1 for the largest uncensored order statistic.
As an example, consider the following left-censored data set:
3, <4, <4, 5, 5, 6
The table below shows how the plotting positions are computed.
| i | x_{(i)} | n_i | d_i | \frac{n_i-d_i}{n_i} | Plotting Position | 
| 1 | 3 | 1 | 1 | 0/1 | 1(5/6)(3/5) = 0.5 | 
| 2 | <4 |   |   |     |                   | 
| 3 | <4 |   |   |     |                   | 
| 4 | 5 | 5 | 2 | 3/5 | 0.833 | 
| 5 | 5 |   |   |     | 1(5/6) = 0.833 | 
| 6 | 6 | 6 | 1 | 5/6 | 1 | 
Note that for complete data sets, Equation (7) reduces to Equation (3).
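Similarly, the following sketch (again mirroring the Usage section) should reproduce the plotting positions for the uncensored values in this table:
  # Sketch: Kaplan-Meier plotting positions for the left-censored example above
  x <- c(3, 4, 4, 5, 5, 6)
  censored <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
  ppointsCensored(x, censored, censoring.side = "left", 
    prob.method = "kaplan-meier")$Cumulative.Probabilities
  # Per the table, the plotting positions for the uncensored values 3, 5, 5, 
  # and 6 should be approximately 0.5, 0.833, 0.833, and 1.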
Modified Kaplan-Meier Method (prob.method="modified kaplan-meier") 
(Left-Censored Data Only.)  For left-censored data, the modified Kaplan-Meier
method is the same as the Kaplan-Meier method, except that for the largest
uncensored order statistic, the plotting position is not set to 1 but rather is
set equal to the Blom plotting position:  (N - 0.375)/(N + 0.25).  This method
is useful, for example, when creating Quantile-Quantile plots.
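For example, with N = 20 observations the largest uncensored order statistic would receive the plotting position shown in this small sketch:
  # Blom plotting position assigned to the largest uncensored order statistic 
  # under the modified Kaplan-Meier method when N = 20
  (20 - 0.375) / (20 + 0.25)
  #[1] 0.9691358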
Hazard Plotting Method of Nelson (prob.method="nelson") 
(Right-Censored Data Only.)  For right-censored data, Equation (5) can be
re-written as:
\hat{S}[x_{(i)}] = \prod_{j \in \Omega, j \le i} \frac{N-j}{N-j+1}, \;\; i \in \Omega \;\;\;\;\;\; (8)
Nelson (1972) proposed the following formula for plotting positions for the uncensored observations in the context of estimating the hazard function (see Michael and Schucany, 1986, p.469):
\hat{p}_i = \hat{F}[x_{(i)}] = 1 - \prod_{j \in \Omega, j \le i} exp(\frac{-1}{N-j+1}) \;\;\;\;\;\; (9)
See Lee and Wang (2003) for more information about the hazard function.
As for the Kaplan and Meier (1958) method, the plotting position for a censored
observation is set equal to the plotting position associated with the largest
uncensored observation less than or equal to that censoring level.  If there are
no uncensored observations less than or equal to a particular censoring level,
the plotting position is set to 0 for that censoring level.
Generalization of Product-Limit Method, Michael and Schucany 
(prob.method="michael-schucany") 
For complete data sets, the disadvantage of using Equation (3) above to define
plotting positions is that it implies the largest observed value is the maximum
possible value of the distribution (the 100'th percentile).  This may be
satisfactory if the underlying distribution is known to be discrete, but it is
usually not satisfactory if the underlying distribution is known to be continuous.
A more frequently used formula for plotting positions for complete data sets is given by:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{N - 2a + 1} \;\;\;\;\;\; (10)
where 0 \le a \le 1 (Cleveland, 1993, p. 18; D'Agostino, 1986a, pp. 8,25).
The value of a is usually chosen so that the plotting positions are
approximately unbiased (i.e., approximate the mean of their distribution) or else
approximate the median value of their distribution (see the help file for
ecdfPlot).  Michael and Schucany (1986) extended this method for
both left- and right-censored data sets.
Right-Censored Data (censoring.side="right")
For right-censored data sets, the plotting positions for the uncensored
observations are computed as:
\hat{p}_i = 1 - \frac{N-a+1}{N-2a+1} \prod_{j \in \Omega, j \le i} \frac{N-j-a+1}{N-j-a+2} \;\; i \in \Omega \;\;\;\;\;\; (11)
Note that the plotting positions proposed by Herd (1960) and Johnson (1964) are a
special case of Equation (11) with a=0.  Equation (11) reduces to Equation (10)
in the case of complete data sets.  Note that unlike the Kaplan-Meier method,
plotting positions associated with tied uncensored observations are not the same
(just as in the case for complete data using Equation (10)).
As for the Kaplan and Meier (1958) method, for right-censored data the plotting
position for a censored observation is set equal to the plotting position associated
with the largest uncensored observation less than or equal to that censoring level.
If there are no uncensored observations less than or equal to a particular censoring
level, the plotting position is set to 0 for that censoring level.
Left-Censored Data (censoring.side="left") 
For left-censored data sets the plotting positions are computed as:
\hat{p}_i = \frac{N-a+1}{N-2a+1} \prod_{j \in \Omega, j \ge i} \frac{j-a}{j-a+1} \;\; i \in \Omega \;\;\;\;\;\; (12)
Equation (12) reduces to Equation (10) in the case of complete data sets. Note that unlike the Kaplan-Meier method, plotting positions associated with tied uncensored observations are not the same (just as in the case for complete data using Equation (10)).
For left-censored data, the plotting position for a censored observation is set
equal to the plotting position associated with the smallest uncensored observation
greater than or equal to that censoring level.  If there are no uncensored
observations greater than or equal to a particular censoring level, the plotting
position is set to 1 for that censoring level.
Generalization of Product-Limit Method, Hirsch and Stedinger 
(prob.method="hirsch-stedinger") 
Hirsch and Stedinger (1987) use a slightly different approach than Kaplan and Meier
(1958) and Michael and Schucany (1986) to derive a nonparametric estimate of the
survival function (probability of exceedance) in the context of left-censored data.
First they estimate the value of the survival function at each of the censoring
levels.  The value of the survival function for an uncensored observation between
two adjacent censoring levels is then computed by linear interpolation (in the form
of a plotting position).  See also Helsel and Cohn (1988).
The discussion below presents an extension of the method of Hirsch and Stedinger (1987) to the case of right-censored data, and then presents the original derivation due to Hirsch and Stedinger (1987) for left-censored data.
Right-Censored Data (censoring.side="right")
For right-censored data, the survival function is estimated as follows.
For the j'th censoring level (j = 0, 1, \ldots, K), write the
value of the survival function as:
S(T_j) = Pr[X > T_j] 
       = Pr[X > T_{j+1}] + Pr[T_j < X \le T_{j+1}] 
       = S(T_{j+1}) + Pr[T_j < X \le T_{j+1} | X > T_j] Pr[X > T_j] 
       = S(T_{j+1}) + Pr[T_j < X \le T_{j+1} | X > T_j] S(T_j) \;\;\;\;\;\; (13)
where
T_0 = -\infty, \;\;\;\;\;\; (14)
T_{K+1} = \infty \;\;\;\;\;\; (15)
Now set
A_j = # uncensored observations in (T_j, T_{j+1}] \;\;\;\;\;\; (16)
B_j = # observations in (T_{j+1}, \infty) \;\;\;\;\;\; (17)
for j = 0, 1, \ldots, K.  Then the method of moments estimator of the
conditional probability in Equation (13) is given by:
\widehat{Pr}[T_j < X \le T_{j+1} | X > T_j] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (18)
Hence, by equations (13) and (18) we have
\hat{S}(T_j) = \hat{S}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) \hat{S}(T_{j}) \;\;\;\;\;\; (19)
which can be rewritten as:
\hat{S}(T_{j+1}) = \hat{S}(T_j) [1 - (\frac{A_j}{A_j + B_j})] \;\;\;\;\;\; (20)
Equation (20) can be solved iteratively for j = 1, 2, \ldots, K.  Note that
\hat{S}(T_0) = \hat{S}(-\infty) = S(-\infty) = 1 \;\;\;\;\;\; (21)
\hat{S}(T_{K+1}) = \hat{S}(\infty) = S(\infty) = 0 \;\;\;\;\;\; (22)
Once the values of the survival function at the censoring levels are computed, the
plotting positions for the A_j uncensored observations in the interval
(T_j, T_{j+1}] (j = 0, 1, \ldots, K) are computed as
\hat{p}_i = [1 - \hat{S}(T_j)] + [\hat{S}(T_j) - \hat{S}(T_{j+1})] \frac{r-a}{A_j - 2a + 1} \;\;\;\;\;\; (23)
where a denotes the plotting position constant, 0 \le a \le 1, and
r denotes the rank of the i'th observation among the A_j
uncensored observations in the interval (T_j, T_{j+1}].
(Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j
(j = 1, 2, \ldots, K), the plotting positions are computed as:
\hat{p}_i = 1 - [\hat{S}(T_j) \frac{r-a}{c_j - 2a + 1}]  \;\;\;\;\;\; (24)
where r denotes the rank of the i'th observation among the c_j
observations censored at censoring level T_j.  Note that all the
observations censored at the same censoring level are given distinct ranks,
even though there is no way to distinguish between them.
Left-Censored Data (censoring.side="left") 
For left-censored data, Hirsch and Stedinger (1987) modify the definition of the
survival function as follows:
S^*(t) = Pr[X \ge t] \;\;\;\;\;\; (25)
For continuous distributions, the functions in Equations (4) and (25) are identical.
Hirsch and Stedinger (1987) show that for the j'th censoring level
(j = 0, 1, \ldots, K), the value of the survival function can be written as:
S^*(T_j) = Pr[X \ge T_j] 
         = Pr[X \ge T_{j+1}] + Pr[T_j \le X < T_{j+1}] 
         = S^*(T_{j+1}) + Pr[T_j \le X < T_{j+1} | X < T_{j+1}] Pr[X < T_{j+1}] 
         = S^*(T_{j+1}) + Pr[T_j \le X < T_{j+1} | X < T_{j+1}] [1 - S^*(T_{j+1})] \;\;\;\;\;\; (26)
where T_0 and T_{K+1} are defined in Equations (14) and (15).
Now set
A_j = # uncensored observations in [T_j, T_{j+1}) \;\;\;\;\;\; (27)
B_j = # observations in (-\infty, T_j) \;\;\;\;\;\; (28)
for j = 0, 1, \ldots, K.  Then the method of moments estimator of the
conditional probability in Equation (26) is given by:
\widehat{Pr}[T_j \le X < T_{j+1} | X < T_{j+1}] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (29)
Hence, by Equations (26) and (29) we have
\widehat{S^*}(T_j) = \widehat{S^*}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) [1 - \widehat{S^*}(T_{j+1})] \;\;\;\;\;\; (30)
which can be solved iteratively for j = 1, 2, \ldots, K.  Note that
\widehat{S^*}(T_{K+1}) = \widehat{S^*}(\infty) = S^*(\infty) = 0 \;\;\;\;\;\; (31)
\widehat{S^*}(T_0) = \widehat{S^*}(-\infty) = S^*(-\infty) = 1 \;\;\;\;\;\; (32)
Once the values of the survival function at the censoring levels are computed, the
plotting positions for the A_j uncensored observations in the interval
[T_j, T_{j+1}) (j = 0, 1, \ldots, K) are computed as
\hat{p}_i = [1 - \widehat{S^*}(T_j)] + [\widehat{S^*}(T_j) - \widehat{S^*}(T_{j+1})] \frac{r-a}{A_j - 2a + 1} \;\;\;\;\;\; (33)
where a denotes the plotting position constant, 0 \le a \le 0.5, and
r denotes the rank of the i'th observation among the A_j
uncensored observations in the interval [T_j, T_{j+1}).
(Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j
(j = 1, 2, \ldots, K), the plotting positions are computed as:
\hat{p}_i = [1 - \widehat{S^*}(T_j)] \frac{r-a}{c_j - 2a + 1}  \;\;\;\;\;\; (34)
where r denotes the rank of the i'th observation among the c_j
observations censored at censoring level T_j.  Note that all the
observations censored at the same censoring level are given distinct ranks,
even though there is no way to distinguish between them.
Value
ppointsCensored returns a list with the following components:
| Order.Statistics | numeric vector of the “ordered” observations. | 
| Cumulative.Probabilities | numeric vector of the associated plotting positions. | 
| Censored | logical vector indicating which of the ordered observations are censored. | 
| Censoring.Side | character string indicating whether the data are left- or right-censored.
This is the same value as the argument censoring.side. | 
| Prob.Method | character string indicating what method was used to compute the plotting positions.
This is the same value as the argument prob.method. | 
Optional Component (only present when prob.method="michael-schucany" or 
prob.method="hirsch-stedinger"):
| Plot.Pos.Con | numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument plot.pos.con. | 
Note
For censored data sets, plotting positions may be used to construct empirical cumulative distribution
plots (see ecdfPlotCensored), construct quantile-quantile plots
(see qqPlotCensored), or to estimate distribution parameters
(see FcnsByCatCensoredData).
The function survfit in the R package
survival computes the survival function for right-censored, left-censored, or
interval-censored data.  Calling survfit with
type="kaplan-meier" will produce similar results to calling
ppointsCensored with prob.method="kaplan-meier".  Also, calling
survfit with type="fh2" will produce similar results
to calling ppointsCensored with prob.method="nelson".
Helsel and Cohn (1988, p.2001) found very little effect of changing the value of the plotting position constant when using the method of Hirsch and Stedinger (1987) to compute plotting positions for multiply left-censored data. In general, there will be very little difference between plotting positions computed by the different methods except in the case of very small samples and a large amount of censoring.
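As a simple way to see how the methods compare, the following sketch (the data values are made up purely for illustration) computes the plotting positions from three of the methods side by side for a small singly left-censored sample:
  # Sketch: compare plotting positions from three methods on a small 
  # left-censored sample (values chosen arbitrarily for illustration).
  x    <- c(1, 1, 2, 4, 6, 9)
  cens <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
  sapply(c("kaplan-meier", "michael-schucany", "hirsch-stedinger"), 
    function(pm) round(ppointsCensored(x, cens, 
      prob.method = pm)$Cumulative.Probabilities, 3))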
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley and Sons, New York.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
ppoints, ecdfPlot, qqPlot,
ecdfPlotCensored, qqPlotCensored,
survfit.
Examples
  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
  # censor all observations less than 18, then compute plotting positions for
  # this data set.  Compare the plotting positions to the plotting positions
  # for the uncensored data set.  Note that the plotting positions for the
  # censored data set start at the first ordered uncensored observation and
  # that for values of x > 18 the plotting positions for the two data sets are
  # exactly the same. This is because there is only one censoring level and
  # no uncensored observations fall below the censored observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(333)
  x <- rnorm(20, mean=20, sd=5)
  censored <- x < 18
  censored
  # [1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
  #[13] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
  sum(censored)
  #[1] 7
  new.x <- x
  new.x[censored] <- 18
  round(sort(new.x),1)
  # [1] 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.1 18.7 19.6 20.2 20.3 20.6 21.4
  #[15] 21.8 21.8 23.2 26.2 26.8 29.7
  p.list <- ppointsCensored(new.x, censored)
  p.list
  #$Order.Statistics
  # [1] 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.09771
  # [9] 18.65418 19.58594 20.21931 20.26851 20.55296 21.38869 21.76359 21.82364
  #[17] 23.16804 26.16527 26.84336 29.67340
  #
  #$Cumulative.Probabilities
  # [1] 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432
  # [8] 0.3765432 0.4259259 0.4753086 0.5246914 0.5740741 0.6234568 0.6728395
  #[15] 0.7222222 0.7716049 0.8209877 0.8703704 0.9197531 0.9691358
  #
  #$Censored
  # [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
  #[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
  #
  #$Censoring.Side
  #[1] "left"
  #
  #$Prob.Method
  #[1] "michael-schucany"
  #
  #$Plot.Pos.Con
  #[1] 0.375
  #----------
  # Round off plotting positions to two decimal places
  # and compare to plotting positions that ignore censoring
  #--------------------------------------------------------
  round(p.list$Cum, 2)
  # [1] 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.43 0.48 0.52 0.57 0.62 0.67
  #[15] 0.72 0.77 0.82 0.87 0.92 0.97
  round(ppoints(x, a=0.375), 2)
  # [1] 0.03 0.08 0.13 0.18 0.23 0.28 0.33 0.38 0.43 0.48 0.52 0.57 0.62 0.67
  #[15] 0.72 0.77 0.82 0.87 0.92 0.97
  #----------
  # Clean up
  #---------
  rm(x, censored, new.x, p.list)
  #----------------------------------------------------------------------------
  # Reproduce the example in Appendix B of Helsel and Cohn (1988).  The data
  # are stored in Helsel.Cohn.88.app.b.df.  This data frame contains 18
  # observations, of which 9 are censored below one of 2 distinct censoring
  # levels.
  Helsel.Cohn.88.app.b.df
  #   Conc.orig Conc Censored
  #1         <1    1     TRUE
  #2         <1    1     TRUE
  #...
  #17        33   33    FALSE
  #18        50   50    FALSE
  p.list <- with(Helsel.Cohn.88.app.b.df,
    ppointsCensored(Conc, Censored, prob.method="hirsch-stedinger", plot.pos.con=0))
  lapply(p.list[1:2], round, 3)
  #$Order.Statistics
  # [1]  1  1  1  1  1  1  3  7  9 10 10 10 12 15 20 27 33 50
  #
  #$Cumulative.Probabilities
  # [1] 0.063 0.127 0.190 0.254 0.317 0.381 0.500 0.556 0.611 0.167 0.333 0.500
  #[13] 0.714 0.762 0.810 0.857 0.905 0.952
  # Clean up
  #---------
  rm(p.list)
  #----------------------------------------------------------------------------
  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in
  # EPA.09.Ex.15.1.manganese.df.
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #4       4 Well.1               21.6          21.6    FALSE
  #5       5 Well.1                 <2           2.0     TRUE
  #...
  #21      1 Well.5               17.9          17.9    FALSE
  #22      2 Well.5               22.7          22.7    FALSE
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  p.list.EPA <- with(EPA.09.Ex.15.1.manganese.df,
    ppointsCensored(Manganese.ppb, Censored,
      prob.method = "kaplan-meier"))
  data.frame(Mn = p.list.EPA$Order.Statistics, Censored = p.list.EPA$Censored,
    CDF = p.list.EPA$Cumulative.Probabilities)
  #      Mn Censored  CDF
  #1    2.0     TRUE 0.21
  #2    2.0     TRUE 0.21
  #3    2.0     TRUE 0.21
  #4    3.3    FALSE 0.28
  #5    5.0     TRUE 0.28
  #6    5.0     TRUE 0.28
  #7    5.0     TRUE 0.28
  #8    5.3    FALSE 0.32
  #9    6.3    FALSE 0.36
  #10   7.7    FALSE 0.40
  #11   8.4    FALSE 0.44
  #12   9.5    FALSE 0.48
  #13  10.0    FALSE 0.52
  #14  11.9    FALSE 0.56
  #15  12.1    FALSE 0.60
  #16  12.6    FALSE 0.64
  #17  16.9    FALSE 0.68
  #18  17.9    FALSE 0.72
  #19  21.6    FALSE 0.76
  #20  22.7    FALSE 0.80
  #21  34.5    FALSE 0.84
  #22  45.9    FALSE 0.88
  #23  53.6    FALSE 0.92
  #24  77.2    FALSE 0.96
  #25 106.3    FALSE 1.00
  #----------
  # Clean up
  #---------
  rm(p.list.EPA)
Prediction Interval for Gamma Distribution
Description
Construct a prediction interval for the next k observations or 
next set of k transformed means for a gamma distribution.
Usage
  predIntGamma(x, n.transmean = 1, k = 1, method = "Bonferroni", 
    pi.type = "two-sided", conf.level = 0.95, est.method = "mle", 
    normal.approx.transform = "kulkarni.powar")
  predIntGammaAlt(x, n.transmean = 1, k = 1, method = "Bonferroni", 
    pi.type = "two-sided", conf.level = 0.95, est.method = "mle",
    normal.approx.transform = "kulkarni.powar")
Arguments
| x | numeric vector of non-negative observations.  Missing (NA), undefined (NaN), and 
infinite (Inf, -Inf) values are allowed but will be removed. | 
| n.transmean | positive integer specifying the sample size associated with the future transformed 
means (see the DETAILS section for an explanation of what the transformation is).  
The default value is n.transmean=1 (i.e., individual observations). | 
| k | positive integer specifying the number of future observations or means 
the prediction interval should contain with confidence level conf.level.  
The default value is k=1. | 
| method | character string specifying the method to use if the number of future observations 
or averages (k) is greater than 1.  The possible values are method="Bonferroni" 
(approximate method based on the Bonferroni inequality; the default) and 
method="exact" (exact method).  This argument is ignored if k=1. | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are pi.type="two-sided" (the default), pi.type="lower", and 
pi.type="upper". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the prediction 
interval.  The default value is conf.level=0.95. | 
| est.method | character string specifying the method of estimation for the shape and scale 
distribution parameters.  The possible values are 
"mle" (maximum likelihood; the default), "bcmle" (bias-corrected mle), "mme" (method of moments), 
and "mmue" (method of moments based on the unbiased estimator of variance).  
See the help file for egamma for more information. | 
| normal.approx.transform | character string indicating which power transformation to use.  
Possible values are "kulkarni.powar" (the default), "cube.root", and "fourth.root".  
See the DETAILS section for more information. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
The function predIntGamma returns a prediction interval as well as 
estimates of the shape and scale parameters.  
The function predIntGammaAlt returns a prediction interval as well as 
estimates of the mean and coefficient of variation.
Following Krishnamoorthy et al. (2008), the prediction interval is computed by:
- using a power transformation on the original data to induce approximate normality, 
- calling predIntNorm with the transformed data to compute the prediction interval, and then
- back-transforming the interval to create a prediction interval on the original scale. 
The argument normal.approx.transform determines which transformation is used. 
The value normal.approx.transform="cube.root" uses 
the cube root transformation suggested by Wilson and Hilferty (1931) and used by 
Krishnamoorthy et al. (2008) and Singh et al. (2010b), and the value 
normal.approx.transform="fourth.root" uses the fourth root transformation suggested 
by Hawkins and Wixley (1986) and used by Singh et al. (2010b).  
The default value normal.approx.transform="kulkarni.powar" 
uses the "Optimum Power Normal Approximation Method" of Kulkarni and Powar (2010).  
The "optimum" power p is determined by:
| p = -0.0705 - 0.178 \, shape + 0.475 \, \sqrt{shape} | if shape \le 1.5 | 
| p = 0.246 | if shape > 1.5 | 
where shape denotes the estimate of the shape parameter.  Although 
Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to 
determine the power p, for the functions 
predIntGamma and predIntGammaAlt the power p is based on whatever estimate of 
shape is used (e.g., est.method="mle", est.method="bcmle", etc.).
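As a quick illustration of this rule, the sketch below defines a small helper (kp.power is an illustrative name, not an EnvStats function) that evaluates the power for a given estimate of the shape parameter:
  # Sketch: evaluate the Kulkarni and Powar (2010) "optimum" power quoted above.
  kp.power <- function(shape) {
    if (shape <= 1.5) -0.0705 - 0.178 * shape + 0.475 * sqrt(shape) else 0.246
  }
  kp.power(1)         # 0.2265
  kp.power(2.203862)  # 0.246 (the mle of shape in the first example below)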
When the argument n.transmean is larger than 1 (i.e., you are 
constructing a prediction interval for future means, not just single 
observations), in order to properly compare a future mean with the 
prediction limits, you must follow these steps:
- Take the observations that will be used to compute the mean and transform them by raising them to the power given by the value in the component interval$normal.transform.power (see the section VALUE below).
- Compute the mean of the transformed observations. 
- Take the mean computed in step 2 above and raise it to the inverse of the power originally used to transform the observations. 
Value
A list of class "estimate" containing the estimated parameters, 
the prediction interval, and other information.  See estimate.object 
for details.  
In addition to the usual components contained in an object of class 
"estimate", the returned value also includes two additional 
components within the "interval" component:
| n.transmean | the value of  | 
| normal.transform.power | the value of the power used to transform the original data to approximate normality. | 
Warning
It is possible for the lower prediction limit based on the transformed data to be less than 0. In this case, the lower prediction limit on the original scale is set to 0 and a warning is issued stating that the normal approximation is not accurate in this case.
Note
The gamma distribution takes values on the positive real line. Special cases of the gamma are the exponential distribution and the chi-square distributions. Applications of the gamma include life testing, statistical ecology, queuing theory, inventory control, and precipitation processes. A gamma distribution starts to resemble a normal distribution as the shape parameter a tends to infinity.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribution instead.
Prediction intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973), and are often discussed in the context of linear regression (Draper and Smith, 1998; Zar, 2010). Prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities. References that discuss prediction intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Evans, M., N. Hastings, and B. Peacock. (1993). Statistical Distributions. Second Edition. John Wiley and Sons, New York, Chapter 18.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Krishnamoorthy K., T. Mathew, and S. Mukherjee. (2008). Normal-Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability. Technometrics, 50(1), 69–78.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, New Jersey.
See Also
GammaDist, GammaAlt, estimate.object, 
egamma, predIntNorm, 
tolIntGamma.
Examples
  # Generate 20 observations from a gamma distribution with parameters 
  # shape=3 and scale=2, then create a prediction interval for the 
  # next observation. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rgamma(20, shape = 3, scale = 2) 
  predIntGamma(dat)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.203862
  #                                 scale = 2.174928
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Normal Transform Power:          0.246
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL =  0.5371569
  #                                 UPL = 15.5313783
  #--------------------------------------------------------------------
  # Using the same data as in the previous example, create an upper 
  # one-sided prediction interval for the next three averages of 
  # order 2 (i.e., each mean is based on 2 future observations), and 
  # use the bias-corrected estimate of shape.
  pred.list <- predIntGamma(dat, n.transmean = 2, k = 3, 
    pi.type = "upper", est.method = "bcmle")
  pred.list
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 1.906616
  #                                 scale = 2.514005
  #
  #Estimation Method:               bcmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Bonferroni using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on bcmle of 'shape'
  #
  #Normal Transform Power:          0.246
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Number of Future
  #Transformed Means:               3
  #
  #Sample Size for
  #Transformed Means:               2
  #
  #Prediction Interval:             LPL =  0.00000
  #                                 UPL = 12.17404
  #--------------------------------------------------------------------
  # Continuing with the above example, assume the distribution shifts 
  # in the future to a gamma distribution with shape = 5 and scale = 2.
  # Create 6 future observations from this distribution, and create 3 
  # means by pairing the observations sequentially.  Note we must first 
  # transform these observations using the power 0.246, then compute the 
  # means based on the transformed data, and then transform the means 
  # back to the original scale and compare them to the upper prediction 
  # limit of 12.17
  set.seed(427)
  new.dat <- rgamma(6, shape = 5, scale = 2)
  p <- pred.list$interval$normal.transform.power
  p
  #[1] 0.246
 
  new.dat.trans <- new.dat^p
  means.trans <- c(mean(new.dat.trans[1:2]), mean(new.dat.trans[3:4]), 
    mean(new.dat.trans[5:6]))
  means <- means.trans^(1/p)
  means
  #[1] 11.74214 17.05299 11.65272
  any(means > pred.list$interval$limits["UPL"])
  #[1] TRUE
  #----------
  # Clean up
  rm(dat, pred.list, new.dat, p, new.dat.trans, means.trans, means)
  #--------------------------------------------------------------------
  # Reproduce part of the example on page 73 of 
  # Krishnamoorthy et al. (2008), which uses alkalinity concentrations 
  # reported in Gibbons (1994) and Gibbons et al. (2009) to construct a 
  # one-sided upper 90% prediction limit.
  predIntGamma(Gibbons.et.al.09.Alkilinity.vec, pi.type = "upper", 
    conf.level = 0.9, normal.approx.transform = "cube.root")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 9.375013
  #                                 scale = 6.202461
  #
  #Estimation Method:               mle
  #
  #Data:                            Gibbons.et.al.09.Alkilinity.vec
  #
  #Sample Size:                     27
  #
  #Prediction Interval Method:      exact using
  #                                 Wilson & Hilferty (1931) cube-root
  #                                 transformation to Normality
  #
  #Normal Transform Power:          0.3333333
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                90%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL =  0.0000
  #                                 UPL = 85.3495
Simultaneous Prediction Interval for a Gamma Distribution
Description
Estimate the shape and scale parameters for a
gamma distribution,
or estimate the mean and coefficient of variation for a
gamma distribution (alternative parameterization),
and construct a simultaneous prediction interval for the next r sampling
occasions, based on one of three possible rules: k-of-m, California, or
Modified California.
Usage
  predIntGammaSimultaneous(x, n.transmean = 1, k = 1, m = 2, r = 1,
    rule = "k.of.m", delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = 1e-07, est.method = "mle", normal.approx.transform = "kulkarni.powar")
  predIntGammaAltSimultaneous(x, n.transmean = 1, k = 1, m = 2, r = 1,
    rule = "k.of.m", delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = 1e-07, est.method = "mle", normal.approx.transform = "kulkarni.powar")
Arguments
| x | numeric vector of non-negative observations.  Missing (NA), undefined (NaN), and 
infinite (Inf, -Inf) values are allowed but will be removed. | 
| n.transmean | positive integer specifying the sample size associated with future transformed
means (see the DETAILS section for an explanation of what the transformation is).
The default value is n.transmean=1 (i.e., individual observations). | 
| k | for the k-of-m rule (rule="k.of.m"), a positive integer specifying the minimum number of 
observations (or transformed means) out of m observations (or transformed means), 
all obtained on one future sampling “occasion”, that the prediction interval should contain.  
The default value is k=1.  This argument is ignored when rule is not equal to "k.of.m". | 
| m | positive integer specifying the maximum number of future observations (or
transformed means) on one future sampling “occasion”.
The default value is m=2, except when rule="Modified.CA", in which case this argument 
is ignored and m is automatically set equal to 4. | 
| r | positive integer specifying the number of future sampling “occasions”.
The default value is r=1. | 
| rule | character string specifying which rule to use.  The possible values are
"k.of.m" (k-of-m rule; the default), "CA" (California rule), and 
"Modified.CA" (modified California rule). | 
| delta.over.sigma | numeric scalar indicating the ratio \Delta/\sigma, where \Delta denotes the 
difference between the mean of the population that was sampled to construct the prediction interval 
and the mean of the population that will be sampled to produce the future observations, and 
\sigma denotes the standard deviation of both populations (on the scale of the transformed 
observations).  The default value is delta.over.sigma=0. | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are pi.type="upper" (the default) and pi.type="lower". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is conf.level=0.95. | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to
compute K.  The default value is K.tol=1e-7. | 
| est.method | character string specifying the method of estimation for the shape and scale
distribution parameters.  The possible values are 
"mle" (maximum likelihood; the default), "bcmle" (bias-corrected mle), "mme" (method of moments), 
and "mmue" (method of moments based on the unbiased estimator of variance).  
See the help file for egamma for more information. | 
| normal.approx.transform | character string indicating which power transformation to use.
Possible values are "kulkarni.powar" (the default), "cube.root", and "fourth.root".  
See the DETAILS section for more information. | 
Details
The function predIntGammaSimultaneous returns a simultaneous prediction
interval as well as estimates of the shape and scale parameters.
The function predIntGammaAltSimultaneous returns a simultaneous prediction
interval as well as estimates of the mean and coefficient of variation.
Following Krishnamoorthy et al. (2008), the simultaneous prediction interval is computed by:
- using a power transformation on the original data to induce approximate normality, 
- calling predIntNormSimultaneous with the transformed data to compute the simultaneous prediction interval, and then
- back-transforming the interval to create a simultaneous prediction interval on the original scale. 
The argument normal.approx.transform determines which transformation is used.
The value normal.approx.transform="cube.root" uses
the cube root transformation suggested by Wilson and Hilferty (1931) and used by
Krishnamoorthy et al. (2008) and Singh et al. (2010b), and the value
normal.approx.transform="fourth.root" uses the fourth root transformation
suggested by Hawkins and Wixley (1986) and used by Singh et al. (2010b).
The default value normal.approx.transform="kulkarni.powar"
uses the "Optimum Power Normal Approximation Method" of Kulkarni and Powar (2010).
The "optimum" power p is determined by:
| p = -0.0705 - 0.178 \, shape + 0.475 \, \sqrt{shape} | if shape \le 1.5 | 
| p = 0.246 | if shape > 1.5 | 
where shape denotes the estimate of the shape parameter.  Although
Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to
determine the power p, for the functions 
predIntGammaSimultaneous and predIntGammaAltSimultaneous the power
p is based on whatever estimate of shape is used
(e.g., est.method="mle", est.method="bcmle", etc.).
When the argument n.transmean is larger than 1 (i.e., you are
constructing a prediction interval for future means, not just single
observations), in order to properly compare a future mean with the
prediction limits, you must follow these steps (illustrated in the sketch after this list):
- Take the observations that will be used to compute the mean and transform them by raising them to the power given by the value in the component interval$normal.transform.power (see the section VALUE below).
- Compute the mean of the transformed observations. 
- Take the mean computed in step 2 above and raise it to the inverse of the power originally used to transform the observations. 
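The sketch below (the component names follow the VALUE section and the examples in the help file for predIntGamma; the simulated data are purely illustrative) walks through these three steps for three future means of size 2 under the 1-of-3 rule:
  # Sketch: compare future transformed means of size 2 to an upper simultaneous 
  # prediction limit on the original scale (illustrative data and object names).
  set.seed(479)
  dat <- rgamma(20, shape = 3, scale = 2)
  pred.list <- predIntGammaSimultaneous(dat, n.transmean = 2, k = 1, m = 3, 
    r = 1, rule = "k.of.m", pi.type = "upper")
  p <- pred.list$interval$normal.transform.power
  # Six future observations paired into three means of size 2
  future <- rgamma(6, shape = 3, scale = 2)
  trans.means <- colMeans(matrix(future^p, nrow = 2))   # Steps 1 and 2
  back.means  <- trans.means^(1/p)                      # Step 3
  back.means <= pred.list$interval$limits["UPL"]
  # Under the 1-of-3 rule, this sampling occasion passes if at least one of 
  # these comparisons is TRUE.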
Value
A list of class "estimate" containing the estimated parameters,
the simultaneous prediction interval, and other information.
See estimate.object for details.
In addition to the usual components contained in an object of class
"estimate", the returned value also includes two additional
components within the "interval" component:
| n.transmean | the value of  | 
| normal.transform.power | the value of the power used to transform the original data to approximate normality. | 
Warning
It is possible for the lower prediction limit based on the transformed data to be less than 0. In this case, the lower prediction limit on the original scale is set to 0 and a warning is issued stating that the normal approximation is not accurate in this case.
Note
The Gamma Distribution 
The gamma distribution takes values on the positive real line.
Special cases of the gamma are the exponential distribution and
the chi-square distributions.  Applications of the gamma include
life testing, statistical ecology, queuing theory, inventory control, and precipitation
processes. A gamma distribution starts to resemble a normal distribution as the
shape parameter a tends to infinity.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly
recommend against using a lognormal model for environmental data and recommend trying a
gamma distribution instead.
Motivation 
Prediction and tolerance intervals have long been applied to quality control and
life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973).  In the context of
environmental statistics, prediction intervals are useful for analyzing data from
groundwater detection monitoring programs at hazardous and solid waste facilities.
One of the main statistical problems that plague groundwater monitoring programs at
hazardous and solid waste facilities is the requirement of testing several wells and
several constituents at each well on each sampling occasion.  This is an obvious
multiple comparisons problem, and the naive approach of using a standard t-test at
a conventional \alpha-level (e.g., 0.05 or 0.01) for each test leads to a
very high probability of at least one significant result on each sampling occasion,
when in fact no contamination has occurred.  This problem was pointed out years ago
by Millard (1987) and others.
Davis and McNichols (1987) proposed simultaneous prediction intervals as a way of
controlling the facility-wide false positive rate (FWFPR) while maintaining adequate
power to detect contamination in the groundwater.  Because of the ubiquitous presence
of spatial variability, it is usually best to use simultaneous prediction intervals
at each well (Davis, 1998a).  That is, prediction intervals are constructed based on
background (pre-landfill) data from each well, and future observations at a
well are compared to the prediction interval for that particular well.  In each of these cases,
the individual \alpha-level at each well is equal to the FWFPR divided by the
product of the number of wells and constituents.
Often, observations at downgradient wells are not available prior to the construction and operation of the landfill. In this case, upgradient well data can be combined to create a background prediction interval, and observations at each downgradient well can be compared to this prediction interval. If spatial variability is present and a major source of variation, however, this method is not really valid (Davis, 1994; Davis, 1998a).
Chapter 19 of USEPA (2009) contains an extensive discussion of using the
1-of-m rule and the Modified California rule.
Chapters 1 and 3 of Gibbons et al. (2009) discuss simultaneous prediction intervals
for the normal and lognormal distributions, respectively.
The k-of-m Rule 
For the k-of-m rule, Davis and McNichols (1987) give tables with
“optimal” choices of k (in terms of best power for a given overall
confidence level) for selected values of m, r, and n.  They found
that the optimal ratios of k to m (i.e., k/m) are generally small,
in the range of 15-50%.
The California Rule 
The California rule was mandated in that state for groundwater monitoring at waste
disposal facilities when resampling verification is part of the statistical program
(Barclay's Code of California Regulations, 1991).  The California code mandates a
“California” rule with m \ge 3.  The motivation for this rule may have
been a desire to have a majority of the observations in bounds (Davis, 1998a).  For
example, for a k-of-m rule with k=1 and m=3, a monitoring
location will pass if the first observation is out of bounds, the second resample
is out of bounds, but the last resample is in bounds, so that 2 out of 3 observations
are out of bounds.  For the California rule with m=3, either the first
observation must be in bounds, or the next 2 observations must be in bounds in order
for the monitoring location to pass.
Davis (1998a) states that if the FWFPR is kept constant, then the California rule
offers little increased power compared to the k-of-m rule, and can
actually decrease the power of detecting contamination.
The Modified California Rule 
The Modified California Rule was proposed as a compromise between a 1-of-m
rule and the California rule.  For a given FWFPR, the Modified California rule
achieves better power than the California rule, and still requires at least as many
observations in bounds as out of bounds, unlike a 1-of-m rule.
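To make the three rules concrete, here is a minimal sketch (not part of the package) of the pass/fail logic at a single monitoring location, given an upper prediction limit UPL and the sequential resamples obs; all object names are hypothetical.
  passes.k.of.m <- function(obs, UPL, k) {
    # pass if at least k of the m resamples are in bounds
    sum(obs <= UPL) >= k
  }
  passes.CA <- function(obs, UPL) {
    # pass if the first observation is in bounds, or else all remaining
    # resamples are in bounds
    obs[1] <= UPL || all(obs[-1] <= UPL)
  }
  passes.modified.CA <- function(obs, UPL) {
    # pass if the first observation is in bounds, or else at least 2 of the
    # next 3 resamples are in bounds
    obs[1] <= UPL || sum(obs[2:4] <= UPL) >= 2
  }
  obs <- c(18, 12, 14, 16)                  # hypothetical measurements
  passes.k.of.m(obs[1:3], UPL = 15, k = 1)  # TRUE: at least 1 of 3 in bounds
  passes.CA(obs[1:3], UPL = 15)             # TRUE: both resamples in bounds
  passes.modified.CA(obs, UPL = 15)         # TRUE: 2 of the next 3 in bounds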
Different Notations Between Different References 
For the k-of-m rule described in this help file, both
Davis and McNichols (1987) and USEPA (2009, Chapter 19) use the variable
p instead of k to represent the minimum number
of future observations the interval should contain on each of the r sampling
occasions.
Gibbons et al. (2009, Chapter 1) presents extensive lists of the value of
K for both k-of-m rules and California rules.  Gibbons et al.'s
notation reverses the meaning of k and r compared to the notation used
in this help file.  That is, in Gibbons et al.'s notation, k represents the
number of future sampling occasions or monitoring wells, and r represents the
minimum number of observations the interval should contain on each sampling occasion.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barclay's California Code of Regulations. (1991). Title 22, Section 66264.97 [concerning hazardous waste facilities] and Title 23, Section 2550.7(e)(8) [concerning solid waste facilities]. Barclay's Law Publishers, San Francisco, CA.
Davis, C.B. (1998a). Ground-Water Statistics & Regulations: Principles, Progress and Problems. Second Edition. Environmetrics & Statistics Limited, Henderson, NV.
Davis, C.B. (1998b). Personal Communication, September 3, 1998.
Davis, C.B., and R.J. McNichols. (1987).  One-sided Intervals for at Least p
of m Observations from a Lognormal Population on Each of r Future Occasions.
Technometrics 29, 359–370.
Evans, M., N. Hastings, and B. Peacock. (1993). Statistical Distributions. Second Edition. John Wiley and Sons, New York, Chapter 18.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Fertig, K.W., and N.R. Mann. (1977).  One-Sided Prediction Intervals for at Least
p Out of m Future Observations From a Lognormal Population.
Technometrics 19, 167–177.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Lognormal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Lognormal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Hall, I.J., and R.R. Prairie. (1973).  One-Sided Prediction Intervals to Contain at
Least m Out of k Future Observations.
Technometrics 15, 897–914.
Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Krishnamoorthy K., T. Mathew, and S. Mukherjee. (2008). Normal-Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability. Technometrics, 50(1), 69–78.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.
Millard, S.P. (1987). Environmental Monitoring, Statistics, and the Law: Room for Improvement (with Comment). The American Statistician 41(4), 249–259.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
GammaDist, GammaAlt,
predIntNorm, predIntNormSimultaneous, 
predIntNormSimultaneousTestPower, tolIntGamma,
egamma, egammaAlt, estimate.object.
Examples
  # Generate 8 observations from a gamma distribution with parameters
  # mean=10 and cv=1, then use predIntGammaAltSimultaneous to estimate the
  # mean and coefficient of variation of the true distribution and construct an
  # upper 95% prediction interval to contain at least 1 out of the next
  # 3 observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(479)
  dat <- rgammaAlt(8, mean = 10, cv = 1)
  predIntGammaAltSimultaneous(dat, k = 1, m = 3)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          mean = 13.875825
  #                                 cv   =  1.049504
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on MLE of 'shape'
  #
  #Normal Transform Power:          0.2204908
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         1
  #
  #Total Number of
  #Future Observations:             3
  #
  #Prediction Interval:             LPL =  0.00000
  #                                 UPL = 15.87101
  #----------
  # Compare the 95% 1-of-3 upper prediction limit to the California and
  # Modified California upper prediction limits.  Note that the upper
  # prediction limit for the Modified California rule is between the limit
  # for the 1-of-3 rule and the limit for the California rule.
  predIntGammaAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  #     UPL
  #15.87101
  predIntGammaAltSimultaneous(dat, m = 3, rule = "CA")$interval$limits["UPL"]
  #     UPL
  #34.11499
  predIntGammaAltSimultaneous(dat, rule = "Modified.CA")$interval$limits["UPL"]
  #     UPL
  #22.58809
  #----------
  # Show how the upper 95% simultaneous prediction limit increases
  # as the number of future sampling occasions r increases.
  # Here, we'll use the 1-of-3 rule.
  predIntGammaAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  #     UPL
  #15.87101
  predIntGammaAltSimultaneous(dat, k = 1, m = 3, r = 10)$interval$limits["UPL"]
  #     UPL
  #37.86825
  #----------
  # Compare the upper simultaneous prediction limit for the 1-of-3 rule
  # based on individual observations versus based on transformed means of
  # order 4.
  predIntGammaAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  #     UPL
  #15.87101
  predIntGammaAltSimultaneous(dat, n.transmean = 4, k = 1,
    m = 3)$interval$limits["UPL"]
  #     UPL
  #14.76528
  #==========
  # Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an
  # upper simultaneous prediction limit for the 1-of-3 rule for
  # r = 2 future sampling occasions.  The data for this example are
  # stored in EPA.09.Ex.19.1.sulfate.df.
  # We will pool data from 4 background wells that were sampled on
  # a number of different occasions, giving us a sample size of
  # n = 25 to use to construct the prediction limit.
  # There are 50 compliance wells and we will monitor 10 different
  # constituents at each well at each of the r=2 future sampling
  # occasions.  To determine the confidence level we require for
  # the simultaneous prediction interval, USEPA (2009) recommends
  # setting the individual Type I Error level at each well to
  # 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # which translates to setting the confidence limit to
  # (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # where SWFPR = site-wide false positive rate.  For this example, we
  # will set SWFPR = 0.1.  Thus, the confidence level is given by:
  nc <- 10
  nw <- 50
  SWFPR <- 0.1
  conf.level <- (1 - SWFPR)^(1 / (nc * nw))
  conf.level
  #[1] 0.9997893
  #----------
  # Look at the data:
  names(EPA.09.Ex.19.1.sulfate.df)
  #[1] "Well"                 "Month"                "Day"
  #[4] "Year"                 "Date"                 "Sulfate.mg.per.l"
  #[7] "log.Sulfate.mg.per.l"
  EPA.09.Ex.19.1.sulfate.df[,
    c("Well", "Date", "Sulfate.mg.per.l", "log.Sulfate.mg.per.l")]
  #    Well       Date Sulfate.mg.per.l log.Sulfate.mg.per.l
  #1  GW-01 1999-07-08             63.0             4.143135
  #2  GW-01 1999-09-12             51.0             3.931826
  #3  GW-01 1999-10-16             60.0             4.094345
  #4  GW-01 1999-11-02             86.0             4.454347
  #5  GW-04 1999-07-09            104.0             4.644391
  #6  GW-04 1999-09-14            102.0             4.624973
  #7  GW-04 1999-10-12             84.0             4.430817
  #8  GW-04 1999-11-15             72.0             4.276666
  #9  GW-08 1997-10-12             31.0             3.433987
  #10 GW-08 1997-11-16             84.0             4.430817
  #11 GW-08 1998-01-28             65.0             4.174387
  #12 GW-08 1999-04-20             41.0             3.713572
  #13 GW-08 2002-06-04             51.8             3.947390
  #14 GW-08 2002-09-16             57.5             4.051785
  #15 GW-08 2002-12-02             66.8             4.201703
  #16 GW-08 2003-03-24             87.1             4.467057
  #17 GW-09 1997-10-16             59.0             4.077537
  #18 GW-09 1998-01-28             85.0             4.442651
  #19 GW-09 1998-04-12             75.0             4.317488
  #20 GW-09 1998-07-12             99.0             4.595120
  #21 GW-09 2000-01-30             75.8             4.328098
  #22 GW-09 2000-04-24             82.5             4.412798
  #23 GW-09 2000-10-24             85.5             4.448516
  #24 GW-09 2002-12-01            188.0             5.236442
  #25 GW-09 2003-03-24            150.0             5.010635
  # The EPA guidance document constructs the upper simultaneous
  # prediction limit for the 1-of-3 plan assuming a lognormal
  # distribution for the sulfate data.  Here we will compare
  # the value of the limit based on assuming a lognormal distribution
  # versus assuming a gamma distribution.
  Sulfate <- EPA.09.Ex.19.1.sulfate.df$Sulfate.mg.per.l
  pred.int.list.lnorm <-
    predIntLnormSimultaneous(x = Sulfate, k = 1, m = 3, r = 2,
      rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
  pred.int.list.gamma <-
    predIntGammaSimultaneous(x = Sulfate, k = 1, m = 3, r = 2,
      rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
  pred.int.list.lnorm$interval$limits["UPL"]
  #     UPL
  #159.5497
  pred.int.list.gamma$interval$limits["UPL"]
  #     UPL
  #153.3232
  #==========
  # Cleanup
  #--------
  rm(dat, nc, nw, SWFPR, conf.level, Sulfate, pred.int.list.lnorm,
    pred.int.list.gamma)
Prediction Interval for a Lognormal Distribution
Description
Estimate the mean and standard deviation on the log-scale for a
lognormal distribution, or estimate the mean
and coefficient of variation for a
lognormal distribution (alternative parameterization),
and construct a prediction interval for the next k observations or
next set of k geometric means.
Usage
  predIntLnorm(x, n.geomean = 1, k = 1, method = "Bonferroni",
    pi.type = "two-sided", conf.level = 0.95)
  predIntLnormAlt(x, n.geomean = 1, k = 1, method = "Bonferroni",
    pi.type = "two-sided", conf.level = 0.95, est.arg.list = NULL)
Arguments
| x | For predIntLnorm, a numeric vector of positive observations, or an object resulting from a call to an estimating function that assumes a lognormal distribution (e.g., elnorm). For predIntLnormAlt, a numeric vector of positive observations. If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| n.geomean | positive integer specifying the sample size associated with the  | 
| k | positive integer specifying the number of future observations or geometric means the
prediction interval should contain with confidence level  | 
| method | character string specifying the method to use if the number of future observations
( | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is  | 
| est.arg.list | for  | 
Details
The function predIntLnorm returns a prediction interval as well as
estimates of the meanlog and sdlog parameters.
The function predIntLnormAlt returns a prediction interval as well as
estimates of the mean and coefficient of variation.
A prediction interval for a lognormal distribution is constructed by taking the
natural logarithm of the observations and constructing a prediction interval
based on the normal (Gaussian) distribution by calling predIntNorm.
These prediction limits are then exponentiated to produce a prediction interval on
the original scale of the data.
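The following sketch (hypothetical data) illustrates this construction: exponentiating the normal-theory prediction limits computed on the log scale should reproduce the limits returned by predIntLnorm.
  set.seed(23)
  x <- rlnorm(15, meanlog = 1, sdlog = 0.5)
  # prediction limits computed on the log scale, then exponentiated
  exp(predIntNorm(log(x), k = 1, conf.level = 0.95)$interval$limits)
  # limits computed directly on the original scale
  predIntLnorm(x, k = 1, conf.level = 0.95)$interval$limits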
Value
If x is a numeric vector, a list of class
"estimate" containing the estimated parameters, the prediction interval,
and other information.  See the help file for 
estimate.object for details.
If x is the result of calling an estimation function,
predIntLnorm returns a list whose class is the same as x.
The list contains the same components as x, as well as a component called
interval containing the prediction interval information.
If x already has a component called interval, this component is
replaced with the prediction interval information.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Dunnett, C.W. (1955). A Multiple Comparisons Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association 50, 1096-1121.
Dunnett, C.W. (1964). New Tables for Multiple Comparisons with a Control. Biometrics 20, 482-491.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Lognormal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Lognormal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York.
Helsel, D.R., and R.M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. (available on-line at: https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf).
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Miller, R.G. (1981a). Simultaneous Statistical Inference. McGraw-Hill, New York.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
elnorm, elnormAlt,
predIntNorm, predIntNormK,
predIntLnormSimultaneous, 
predIntLnormAltSimultaneous,
tolIntLnorm,  tolIntLnormAlt,
Lognormal, estimate.object.
Examples
  # Generate 20 observations from a lognormal distribution with parameters
  # meanlog=0 and sdlog=1.  The exact two-sided 90% prediction interval for
  # k=1 future observation is given by: [exp(-1.645), exp(1.645)] = [0.1930, 5.181].
  # Use predIntLnorm to estimate the distribution parameters, and construct a
  # two-sided 90% prediction interval.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(47)
  dat <- rlnorm(20, meanlog = 0, sdlog = 1)
  predIntLnorm(dat, conf = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0.1795898
  #                                 UPL = 4.5264399
  #----------
  # Repeat the above example, but do it in two steps.
  # First create a list called est.list containing information about the
  # estimated parameters, then create the prediction interval.
  est.list <- elnorm(dat)
  est.list
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  predIntLnorm(est.list, conf = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0.1795898
  #                                 UPL = 4.5264399
  #----------
  # Using the same data from the first example, create a one-sided
  # upper 99% prediction limit for the next 3 geometric means of order 2
  # (i.e., each of the 3 future geometric means is based on a sample size
  # of 2 future observations).
  predIntLnorm(dat, n.geomean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Bonferroni
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99%
  #
  #Number of Future
  #Geometric Means:                 3
  #
  #Sample Size for
  #Geometric Means:                 2
  #
  #Prediction Interval:             LPL = 0.000000
  #                                 UPL = 7.047571
  #----------
  # Compare the result above that is based on the Bonferroni method
  # with the exact method
  predIntLnorm(dat, n.geomean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper", method = "exact")$interval$limits["UPL"]
  #    UPL
  #7.00316
  #----------
  # Clean up
  rm(dat, est.list)
  #--------------------------------------------------------------------
  # Example 18-2 of USEPA (2009, p.18-15) shows how to construct a 99%
  # upper prediction interval for the log-scale mean of 4 future observations
  # (future mean of order 4) assuming a lognormal distribution based on
  # chrysene concentrations (ppb) in groundwater at 2 background wells.
  # Data were collected once per month over 4 months at the 2 background
  # wells, and also at a compliance well.
  # The question to be answered is whether there is evidence of
  # contamination at the compliance well.
  # Here we will follow the example, but look at the geometric mean
  # instead of the log-scale mean.
  #----------
  # The data for this example are stored in EPA.09.Ex.18.2.chrysene.df.
  EPA.09.Ex.18.2.chrysene.df
  #   Month   Well  Well.type Chrysene.ppb
  #1      1 Well.1 Background          6.9
  #2      2 Well.1 Background         27.3
  #3      3 Well.1 Background         10.8
  #4      4 Well.1 Background          8.9
  #5      1 Well.2 Background         15.1
  #6      2 Well.2 Background          7.2
  #7      3 Well.2 Background         48.4
  #8      4 Well.2 Background          7.8
  #9      1 Well.3 Compliance         68.0
  #10     2 Well.3 Compliance         48.9
  #11     3 Well.3 Compliance         30.1
  #12     4 Well.3 Compliance         38.1
  Chrysene.bkgd <- with(EPA.09.Ex.18.2.chrysene.df,
    Chrysene.ppb[Well.type == "Background"])
  Chrysene.cmpl <- with(EPA.09.Ex.18.2.chrysene.df,
    Chrysene.ppb[Well.type == "Compliance"])
  #----------
  # A Shapiro-Wilk goodness-of-fit test for normality indicates
  # we should reject the assumption of normality and assume a
  # lognormal distribution for the background well data:
  gofTest(Chrysene.bkgd)
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 16.55000
  #                                 sd   = 14.54441
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Test Statistic:                  W = 0.7289006
  #
  #Test Statistic Parameter:        n = 8
  #
  #P-value:                         0.004759859
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  gofTest(Chrysene.bkgd, dist = "lnorm")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5533006
  #                                 sdlog   = 0.7060038
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Test Statistic:                  W = 0.8546352
  #
  #Test Statistic Parameter:        n = 8
  #
  #P-value:                         0.1061057
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.
  #----------
  # Here is the one-sided 99% upper prediction limit for
  # a geometric mean based on 4 future observations:
  predIntLnorm(Chrysene.bkgd, n.geomean = 4, k = 1,
    conf.level = 0.99, pi.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5533006
  #                                 sdlog   = 0.7060038
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99%
  #
  #Number of Future
  #Geometric Means:                 1
  #
  #Sample Size for
  #Geometric Means:                 4
  #
  #Prediction Interval:             LPL =  0.00000
  #                                 UPL = 46.96613
  UPL <- predIntLnorm(Chrysene.bkgd, n.geomean = 4, k = 1,
    conf.level = 0.99, pi.type = "upper")$interval$limits["UPL"]
  UPL
  #     UPL
  #46.96613
  # Is there evidence of contamination at the compliance well?
  geoMean(Chrysene.cmpl)
  #[1] 44.19034
  # Since the geometric mean at the compliance well is less than
  # the upper prediction limit, there is no evidence of contamination.
  #----------
  # Cleanup
  #--------
  rm(Chrysene.bkgd, Chrysene.cmpl, UPL)
Probability That at Least One Set of Future Observations Violates the Given Rule Based on a Simultaneous Prediction Interval for a Lognormal Distribution
Description
Compute the probability that at least one set of future observations violates the 
given rule based on a simultaneous prediction interval for the next r future 
sampling occasions for a lognormal distribution.  The 
three possible rules are: k-of-m, California, or Modified California.
Usage
  predIntLnormAltSimultaneousTestPower(n, df = n - 1, n.geomean = 1, k = 1, 
    m = 2, r = 1, rule = "k.of.m", ratio.of.means = 1, cv = 1, pi.type = "upper", 
    conf.level = 0.95, r.shifted = r, K.tol = .Machine$double.eps^0.5, 
    integrate.args.list = NULL)
Arguments
| n | vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | vector of positive integers indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.geomean | positive integer specifying the sample size associated with the future geometric 
means.  
The default value is  | 
| k | for the  | 
| m | vector of positive integers specifying the maximum number of future observations (or 
averages) on one future sampling “occasion”.  
The default value is  | 
| r | vector of positive integers specifying the number of future sampling “occasions”.  
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| ratio.of.means | numeric vector specifying the ratio of the mean of the population that will be 
sampled to produce the future observations vs. the mean of the population that 
was sampled to construct the prediction interval.  See the DETAILS section below 
for more information.  The default value is  | 
| cv | numeric vector of positive values specifying the coefficient of variation for 
both the population that was sampled to construct the prediction interval and 
the population that will be sampled to produce the future observations.  The 
default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | vector of values between 0 and 1 indicating the confidence level of the prediction interval.  
The default value is  | 
| r.shifted | vector of positive integers specifying the number of future sampling occasions for 
which the mean is shifted.  All values must be integers 
between  | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to 
compute  | 
| integrate.args.list | a list of arguments to supply to the  | 
Details
What is a Simultaneous Prediction Interval? 
A prediction interval for some population is an interval on the real line constructed 
so that it will contain k future observations from that population 
with some specified probability (1-\alpha)100\%, where 
0 < \alpha < 1 and k is some pre-specified positive integer.  
The quantity (1-\alpha)100\% is called  
the confidence coefficient or confidence level associated with the prediction 
interval.  The function predIntNorm computes a standard prediction 
interval based on a sample from a normal distribution.
The function predIntLnormAltSimultaneous computes a simultaneous 
prediction interval (assuming lognormal observations) that will contain a 
certain number of future observations 
with probability (1-\alpha)100\% for each of r future sampling 
“occasions”, where r is some pre-specified positive integer.  
The quantity r may refer to r distinct future sampling occasions in 
time, or it may for example refer to sampling at r distinct locations on 
one future sampling occasion, 
assuming that the population standard deviation is the same at all of the r 
distinct locations.
The function predIntLnormAltSimultaneous computes a simultaneous 
prediction interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions. If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: when k=m and r=1, the results of predIntNormSimultaneous are equivalent to the results of predIntNorm.
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.
Computing Power 
The function predIntNormSimultaneousTestPower computes the 
probability that at least one set of future observations or averages will 
violate the given rule based on a simultaneous prediction interval for the 
next r future sampling occasions for a normal distribution, 
based on the assumption of normally distributed observations, 
where the population mean for the future observations is allowed to differ from 
the population mean for the observations used to construct the prediction interval.
The function predIntLnormAltSimultaneousTestPower assumes all observations are 
from a lognormal distribution.  The observations used to 
construct the prediction interval are assumed to come from a lognormal distribution 
with mean \theta_2 and coefficient of variation \tau.  The future 
observations are assumed to come from a lognormal distribution with mean 
\theta_1 and coefficient of variation \tau; that is, the means are 
allowed to differ between the two populations, but not the coefficient of variation.
The function predIntLnormAltSimultaneousTestPower calls the function 
predIntNormSimultaneousTestPower, with the argument 
delta.over.sigma given by:
\frac{\delta}{\sigma} = \frac{log(R)}{\sqrt{log(\tau^2 + 1)}} \;\;\;\;\;\; (1)
where R is given by:
R = \frac{\theta_1}{\theta_2} \;\;\;\;\;\; (2)
and corresponds to the argument ratio.of.means for the function 
predIntLnormAltSimultaneousTestPower, and \tau corresponds to the 
argument cv.
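For example, under the hypothetical values ratio.of.means = 4 and cv = 1, the implied value of delta.over.sigma is:
  ratio.of.means <- 4
  cv <- 1
  log(ratio.of.means) / sqrt(log(cv^2 + 1))
  #[1] 1.665109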
Value
vector of values between 0 and 1 equal to the probability that the rule will be violated.
Note
See the help files for predIntLnormAltSimultaneous and 
predIntNormSimultaneousTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntLnormAltSimultaneous.
See Also
predIntLnormAltSimultaneous, 
plotPredIntLnormAltSimultaneousTestPowerCurve, 
predIntNormSimultaneous, 
plotPredIntNormSimultaneousTestPowerCurve, 
Prediction Intervals, LognormalAlt.
Examples
  # For the k-of-m rule with n=4, k=1, m=3, and r=1, show how the power increases 
  # as ratio.of.means increases.  Assume a 95% upper prediction interval.
  predIntLnormAltSimultaneousTestPower(n = 4, m = 3, ratio.of.means = 1:3) 
  #[1] 0.0500000 0.2356914 0.4236723
  #----------
  # Look at how the power increases with sample size for an upper one-sided 
  # prediction interval using the k-of-m rule with k=1, m=3, r=20, 
  # ratio.of.means=4, and a confidence level of 95%.
  predIntLnormAltSimultaneousTestPower(n = c(4, 8), m = 3, r = 20, ratio.of.means = 4) 
  #[1] 0.4915743 0.8218175
  #----------
  # Compare the power for the 1-of-3 rule with the power for the California and 
  # Modified California rules, based on a 95% upper prediction interval and 
  # ratio.of.means=4.  Assume a sample size of n=8.  Note that in this case the 
  # power for the Modified California rule is greater than the power for the 
  # 1-of-3 rule and California rule.
  predIntLnormAltSimultaneousTestPower(n = 8, k = 1, m = 3, ratio.of.means = 4) 
  #[1] 0.6594845 
  predIntLnormAltSimultaneousTestPower(n = 8, m = 3, rule = "CA", ratio.of.means = 4) 
  #[1] 0.5864311 
  predIntLnormAltSimultaneousTestPower(n = 8, rule = "Modified.CA", ratio.of.means = 4) 
  #[1] 0.691135
  #----------
  # Show how the power for an upper 95% simultaneous prediction limit increases 
  # as the number of future sampling occasions r increases.  Here, we'll use the 
  # 1-of-3 rule with n=8 and ratio.of.means=4.
  predIntLnormAltSimultaneousTestPower(n = 8, k = 1, m = 3, r = c(1, 2, 5, 10), 
    ratio.of.means = 4) 
  #[1] 0.6594845 0.7529576 0.8180814 0.8302302
Probability That at Least One Future Observation Falls Outside a Prediction Interval for a Lognormal Distribution
Description
Compute the probability that at least one out of k future observations 
(or geometric means) falls outside a prediction interval for k future 
observations (or geometric means) for a lognormal distribution.
Usage
  predIntLnormAltTestPower(n, df = n - 1, n.geomean = 1, k = 1, 
    ratio.of.means = 1, cv = 1, pi.type = "upper", conf.level = 0.95)
Arguments
| n | vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | vector of positive integers indicating the degrees of freedom associated with 
the sample size.  The default value is  | 
| n.geomean | positive integer specifying the sample size associated with the future 
geometric means.  The default value is  | 
| k | vector of positive integers specifying the number of future observations that the 
prediction interval should contain with confidence level  | 
| ratio.of.means | numeric vector specifying the ratio of the mean of the population that will be 
sampled to produce the future observations vs. the mean of the population that 
was sampled to construct the prediction interval.  See the DETAILS section below 
for more information.  The default value is  | 
| cv | numeric vector of positive values specifying the coefficient of variation for 
both the population that was sampled to construct the prediction interval and 
the population that will be sampled to produce the future observations.  The 
default value is  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
Details
A prediction interval for some population is an interval on the real line 
constructed so that it will contain k future observations or averages 
from that population with some specified probability (1-\alpha)100\%, 
where 0 < \alpha < 1 and k is some pre-specified positive integer.  
The quantity (1-\alpha)100\% is called the confidence coefficient or 
confidence level associated with the prediction interval.  The function 
predIntNorm computes a standard prediction interval based on a 
sample from a normal distribution.  
The function predIntNormTestPower computes the probability that at 
least one out of k future observations or averages will not be contained in 
a prediction interval based on the assumption of normally distributed observations, 
where the population mean for the future observations is allowed to differ from 
the population mean for the observations used to construct the prediction interval.
The function predIntLnormAltTestPower assumes all observations are 
from a lognormal distribution.  The observations used to 
construct the prediction interval are assumed to come from a lognormal distribution 
with mean \theta_2 and coefficient of variation \tau.  The future 
observations are assumed to come from a lognormal distribution with mean 
\theta_1 and coefficient of variation \tau; that is, the means are 
allowed to differ between the two populations, but not the coefficient of variation.
The function predIntLnormAltTestPower calls the function 
predIntNormTestPower, with the argument delta.over.sigma 
given by:
\frac{\delta}{\sigma} = \frac{log(R)}{\sqrt{log(\tau^2 + 1)}} \;\;\;\;\;\; (1)
where R is given by:
R = \frac{\theta_1}{\theta_2} \;\;\;\;\;\; (2)
and corresponds to the argument ratio.of.means for the function 
predIntLnormAltTestPower, and \tau corresponds to the argument 
cv.
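Given this relationship, a quick sketch (hypothetical values, assuming predIntNormTestPower accepts the sample size n and the delta.over.sigma argument as described above) that should show the two power functions agreeing is:
  # Hypothetical check: the lognormal-scale power with ratio.of.means = R and
  # cv = tau should equal the normal-scale power with the corresponding
  # delta.over.sigma (see equations (1) and (2) above).
  R <- 4
  tau <- 1
  predIntLnormAltTestPower(n = 8, ratio.of.means = R, cv = tau)
  predIntNormTestPower(n = 8,
    delta.over.sigma = log(R) / sqrt(log(tau^2 + 1)))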
Value
vector of numbers between 0 and 1 equal to the probability that at least one of 
k future observations or geometric means will fall outside the prediction 
interval.
Note
See the help files for predIntNormTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for predIntNormTestPower and 
tTestLnormAltPower.
See Also
plotPredIntLnormAltTestPowerCurve, 
predIntLnormAlt, 
predIntNorm, predIntNormK, 
plotPredIntNormTestPowerCurve, 
predIntLnormAltSimultaneous, 
predIntLnormAltSimultaneousTestPower, Prediction Intervals, 
LognormalAlt.
Examples
  # Show how the power increases as ratio.of.means increases.  Assume a 
  # 95% upper prediction interval.
  predIntLnormAltTestPower(n = 4, ratio.of.means = 1:3) 
  #[1] 0.0500000 0.1459516 0.2367793
  #----------
  # Look at how the power increases with sample size for an upper one-sided 
  # prediction interval with k=3, ratio.of.means=4, and a confidence level of 95%.
  predIntLnormAltTestPower(n = c(4, 8), k = 3, ratio.of.means = 4) 
  #[1] 0.2860952 0.4533567
  #----------
  # Show how the power for an upper 95% prediction limit increases as the 
  # number of future observations k increases.  Here, we'll use n=20 and 
  # ratio.of.means=2.
  predIntLnormAltTestPower(n = 20, k = 1:3, ratio.of.means = 2) 
  #[1] 0.1945886 0.2189538 0.2321562
Simultaneous Prediction Interval for a Lognormal Distribution
Description
Estimate the mean and standard deviation on the log-scale for a
lognormal distribution, or estimate the mean
and coefficient of variation for a
lognormal distribution (alternative parameterization),
and construct a simultaneous prediction
interval for the next r sampling occasions, based on one of three possible
rules: k-of-m, California, or Modified California.
Usage
  predIntLnormSimultaneous(x, n.geomean = 1, k = 1, m = 2, r = 1, rule = "k.of.m",
    delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = .Machine$double.eps^0.5)
  predIntLnormAltSimultaneous(x, n.geomean = 1, k = 1, m = 2, r = 1, rule = "k.of.m",
    delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = .Machine$double.eps^0.5, est.arg.list = NULL)
Arguments
| x | For predIntLnormSimultaneous, a numeric vector of positive observations, or an object resulting from a call to an estimating function that assumes a lognormal distribution (e.g., elnorm). For predIntLnormAltSimultaneous, a numeric vector of positive observations. If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| n.geomean | positive integer specifying the sample size associated with future
geometric means.  The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or
geometric means) on one future sampling “occasion”.
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are
 | 
| delta.over.sigma | numeric scalar indicating the ratio  | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is  | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to
compute  | 
| est.arg.list | a list containing arguments to pass to the function
 | 
Details
The function predIntLnormSimultaneous returns a simultaneous prediction
interval as well as estimates of the meanlog and sdlog parameters.
The function predIntLnormAltSimultaneous returns a prediction interval as
well as estimates of the mean and coefficient of variation.
A simultaneous prediction interval for a lognormal distribution is constructed by
taking the natural logarithm of the observations and constructing a prediction
interval based on the normal (Gaussian) distribution by calling
predIntNormSimultaneous.
These prediction limits are then exponentiated to produce a prediction interval on
the original scale of the data.
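As a sketch of this construction (hypothetical data), exponentiating the log-scale simultaneous prediction limit should reproduce the limit returned by predIntLnormSimultaneous:
  set.seed(47)
  x <- rlnorm(10, meanlog = 2, sdlog = 0.7)
  # upper simultaneous limit computed on the log scale, then exponentiated
  exp(predIntNormSimultaneous(log(x), k = 1, m = 3, r = 2, rule = "k.of.m",
    pi.type = "upper")$interval$limits["UPL"])
  # upper simultaneous limit computed directly on the original scale
  predIntLnormSimultaneous(x, k = 1, m = 3, r = 2, rule = "k.of.m",
    pi.type = "upper")$interval$limits["UPL"]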
Value
If x is a numeric vector, predIntLnormSimultaneous returns a list of
class "estimate" containing the estimated parameters, the prediction interval,
and other information.  See the help file for 
estimate.object for details.
If x is the result of calling an estimation function,
predIntLnormSimultaneous returns a list whose class is the same as x.
The list contains the same components as x, as well as a component called
interval containing the prediction interval information.
If x already has a component called interval, this component is
replaced with the prediction interval information.
Note
Motivation 
Prediction and tolerance intervals have long been applied to quality control and
life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973).  In the context of
environmental statistics, prediction intervals are useful for analyzing data from
groundwater detection monitoring programs at hazardous and solid waste facilities.
One of the main statistical problems that plague groundwater monitoring programs at
hazardous and solid waste facilities is the requirement of testing several wells and
several constituents at each well on each sampling occasion.  This is an obvious
multiple comparisons problem, and the naive approach of using a standard t-test at
a conventional \alpha-level (e.g., 0.05 or 0.01) for each test leads to a
very high probability of at least one significant result on each sampling occasion,
when in fact no contamination has occurred.  This problem was pointed out years ago
by Millard (1987) and others.
Davis and McNichols (1987) proposed simultaneous prediction intervals as a way of
controlling the facility-wide false positive rate (FWFPR) while maintaining adequate
power to detect contamination in the groundwater.  Because of the ubiquitous presence
of spatial variability, it is usually best to use simultaneous prediction intervals
at each well (Davis, 1998a).  That is, by constructing prediction intervals based on
background (pre-landfill) data on each well, and comparing future observations at a
well to the prediction interval for that particular well.  In each of these cases,
the individual \alpha-level at each well is equal to the FWFPR divided by the
product of the number of wells and constituents.
Often, observations at downgradient wells are not available prior to the construction and operation of the landfill. In this case, upgradient well data can be combined to create a background prediction interval, and observations at each downgradient well can be compared to this prediction interval. If spatial variability is present and a major source of variation, however, this method is not really valid (Davis, 1994; Davis, 1998a).
Chapter 19 of USEPA (2009) contains an extensive discussion of using the
1-of-m rule and the Modified California rule.
Chapters 1 and 3 of Gibbons et al. (2009) discuss simultaneous prediction intervals
for the normal and lognormal distributions, respectively.
The k-of-m Rule 
For the k-of-m rule, Davis and McNichols (1987) give tables with
“optimal” choices of k (in terms of best power for a given overall
confidence level) for selected values of m, r, and n.  They found
that the optimal ratios of k to m (i.e., k/m) are generally small,
in the range of 15-50%.
The California Rule 
The California rule was mandated in that state for groundwater monitoring at waste
disposal facilities when resampling verification is part of the statistical program
(Barclay's Code of California Regulations, 1991).  The California code mandates a
“California” rule with m \ge 3.  The motivation for this rule may have
been a desire to have a majority of the observations in bounds (Davis, 1998a).  For
example, for a k-of-m rule with k=1 and m=3, a monitoring
location will pass if the first observation is out of bounds, the second resample
is out of bounds, but the last resample is in bounds, so that 2 out of 3 observations
are out of bounds.  For the California rule with m=3, either the first
observation must be in bounds, or the next 2 observations must be in bounds in order
for the monitoring location to pass.
Davis (1998a) states that if the FWFPR is kept constant, then the California rule
offers little increased power compared to the k-of-m rule, and can
actually decrease the power of detecting contamination.
The Modified California Rule 
The Modified California Rule was proposed as a compromise between a 1-of-m
rule and the California rule.  For a given FWFPR, the Modified California rule
achieves better power than the California rule, and still requires at least as many
observations in bounds as out of bounds, unlike a 1-of-m rule.
Different Notations Between Different References 
For the k-of-m rule described in this help file, both
Davis and McNichols (1987) and USEPA (2009, Chapter 19) use the variable
p instead of k to represent the minimum number
of future observations the interval should contain on each of the r sampling
occasions.
Gibbons et al. (2009, Chapter 1) presents extensive lists of the value of
K for both k-of-m rules and California rules.  Gibbons et al.'s
notation reverses the meaning of k and r compared to the notation used
in this help file.  That is, in Gibbons et al.'s notation, k represents the
number of future sampling occasions or monitoring wells, and r represents the
minimum number of observations the interval should contain on each sampling occasion.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barclay's California Code of Regulations. (1991). Title 22, Section 66264.97 [concerning hazardous waste facilities] and Title 23, Section 2550.7(e)(8) [concerning solid waste facilities]. Barclay's Law Publishers, San Francisco, CA.
Davis, C.B. (1998a). Ground-Water Statistics & Regulations: Principles, Progress and Problems. Second Edition. Environmetrics & Statistics Limited, Henderson, NV.
Davis, C.B. (1998b). Personal Communication, September 3, 1998.
Davis, C.B., and R.J. McNichols. (1987).  One-sided Intervals for at Least p
of m Observations from a Lognormal Population on Each of r Future Occasions.
Technometrics 29, 359–370.
Fertig, K.W., and N.R. Mann. (1977).  One-Sided Prediction Intervals for at Least
p Out of m Future Observations From a Lognormal Population.
Technometrics 19, 167–177.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Lognormal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Lognormal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Hall, I.J., and R.R. Prairie. (1973).  One-Sided Prediction Intervals to Contain at
Least m Out of k Future Observations.
Technometrics 15, 897–914.
Millard, S.P. (1987). Environmental Monitoring, Statistics, and the Law: Room for Improvement (with Comment). The American Statistician 41(4), 249–259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntLnormAltSimultaneousTestPower,
predIntNorm, predIntNormSimultaneous, 
predIntNormSimultaneousTestPower,
tolIntLnorm, Lognormal, LognormalAlt, 
estimate.object, elnorm, elnormAlt.
Examples
  # Generate 8 observations from a lognormal distribution with parameters
  # mean=10 and cv=1, then use predIntLnormAltSimultaneous to estimate the
  # mean and coefficient of variation of the true distribution and construct an
  # upper 95% prediction interval to contain at least 1 out of the next
  # 3 observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(479)
  dat <- rlnormAlt(8, mean = 10, cv = 1)
  predIntLnormAltSimultaneous(dat, k = 1, m = 3)
  # Compare the 95% 1-of-3 upper prediction limit to the California and
  # Modified California upper prediction limits.  Note that the upper
  # prediction limit for the Modified California rule is between the limit
  # for the 1-of-3 rule and the limit for the California rule.
  predIntLnormAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  predIntLnormAltSimultaneous(dat, m = 3, rule = "CA")$interval$limits["UPL"]
  predIntLnormAltSimultaneous(dat, rule = "Modified.CA")$interval$limits["UPL"]
  # Show how the upper 95% simultaneous prediction limit increases
  # as the number of future sampling occasions r increases.
  # Here, we'll use the 1-of-3 rule.
  predIntLnormAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  predIntLnormAltSimultaneous(dat, k = 1, m = 3, r = 10)$interval$limits["UPL"]
  # Compare the upper simultaneous prediction limit for the 1-of-3 rule
  # based on individual observations versus based on geometric means of
  # order 4.
  predIntLnormAltSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  predIntLnormAltSimultaneous(dat, n.geomean = 4, k = 1,
    m = 3)$interval$limits["UPL"]
  #==========
  # Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an
  # upper simultaneous prediction limit for the 1-of-3 rule for
  # r = 2 future sampling occasions.  The data for this example are
  # stored in EPA.09.Ex.19.1.sulfate.df.
  # We will pool data from 4 background wells that were sampled on
  # a number of different occasions, giving us a sample size of
  # n = 25 to use to construct the prediction limit.
  # There are 50 compliance wells and we will monitor 10 different
  # constituents at each well at each of the r=2 future sampling
  # occasions.  To determine the confidence level we require for
  # the simultaneous prediction interval, USEPA (2009) recommends
  # setting the individual Type I Error level at each well to
  # 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # which translates to setting the confidence limit to
  # (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # where SWFPR = site-wide false positive rate.  For this example, we
  # will set SWFPR = 0.1.  Thus, the confidence level is given by:
  nc <- 10
  nw <- 50
  SWFPR <- 0.1
  conf.level <- (1 - SWFPR)^(1 / (nc * nw))
  conf.level
  #----------
  # Look at the data:
  names(EPA.09.Ex.19.1.sulfate.df)
  EPA.09.Ex.19.1.sulfate.df[,
    c("Well", "Date", "Sulfate.mg.per.l", "log.Sulfate.mg.per.l")]
  # Construct the upper simultaneous prediction limit for the
  # 1-of-3 plan assuming a lognormal distribution for the
  # sulfate data
  Sulfate <- EPA.09.Ex.19.1.sulfate.df$Sulfate.mg.per.l
  predIntLnormSimultaneous(x = Sulfate, k = 1, m = 3, r = 2,
    rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
  #==========
  # NOTE: Two-sided simultaneous prediction intervals computed using
  # Versions 2.4.0 - 2.8.1 of EnvStats are *NOT* valid.
  ## Not run: 
  predIntLnormSimultaneous(x = Sulfate, k = 1, m = 3, r = 2,
    rule = "k.of.m", pi.type = "two-sided", conf.level = conf.level)
  
## End(Not run)
Prediction Interval for a Normal Distribution
Description
Estimate the mean and standard deviation of a
normal distribution, and
construct a prediction interval for the next k observations or
next set of k means.
Usage
  predIntNorm(x, n.mean = 1, k = 1, method = "Bonferroni",
    pi.type = "two-sided", conf.level = 0.95)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an estimating
function that assumes a normal (Gaussian) distribution (e.g.,  | 
| n.mean | positive integer specifying the sample size associated with the  | 
| k | positive integer specifying the number of future observations or averages the
prediction interval should contain with confidence level  | 
| method | character string specifying the method to use if the number of future observations
( | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is  | 
Details
What is a Prediction Interval? 
A prediction interval for some population is an interval on the real line constructed
so that it will contain k future observations or averages from that population
with some specified probability (1-\alpha)100\%, where
0 < \alpha < 1 and k is some pre-specified positive integer.
The quantity (1-\alpha)100\% is called
the confidence coefficient or confidence level associated with the prediction
interval.
The Form of a Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n
observations from a normal distribution with parameters
mean=\mu and sd=\sigma.  Also, let m denote the
sample size associated with the k future averages (i.e., n.mean=m).
When m=1, each average is really just a single observation, so in the rest of
this help file the term “averages” will replace the phrase
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\% prediction
interval is:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the
confidence level, the number of future averages k, and the
sample size associated with the future averages, m.  Do not confuse the
constant K (uppercase K) with the number of future averages k
(lowercase k).  The symbol K is used here to be consistent with the
notation used for tolerance intervals (see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
but K differs for one-sided versus two-sided prediction intervals.
The derivation of the constant K is explained in the help file for
predIntNormK.
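As a quick numerical check of Equation (1), here is a minimal sketch (assuming the EnvStats package is attached) that rebuilds the two-sided limits by hand from predIntNormK and compares them to the output of predIntNorm:
  # Illustrative sketch: recompute the two-sided 95% prediction interval
  # limits from Equation (1), with K supplied by predIntNormK().
  set.seed(23)
  x <- rnorm(15, mean = 10, sd = 2)
  K <- predIntNormK(n = length(x), k = 1, pi.type = "two-sided",
    conf.level = 0.95)
  c(LPL = mean(x) - K * sd(x), UPL = mean(x) + K * sd(x))
  predIntNorm(x)$interval$limits  # should agree with the line above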
A Prediction Interval is a Random Interval 
A prediction interval is a random interval; that is, the lower and/or
upper bounds are random variables computed based on sample statistics in the
baseline sample.  Prior to taking one specific baseline sample, the probability
that the prediction interval will contain the next k averages is
(1-\alpha)100\%.  Once a specific baseline sample is taken and the
prediction interval based on that sample is computed, the probability that that
prediction interval will contain the next k averages is not necessarily
(1-\alpha)100\%, but it should be close.
If an experiment is repeated N times, and for each experiment:
- A sample is taken and a (1-\alpha)100\% prediction interval for k=1 future observation is computed, and
- One future observation is generated and compared to the prediction interval,
then the number of prediction intervals that actually contain the future observation
generated in step 2 above is a binomial random variable
with parameters size=N and prob=(1-\alpha)100\%.
If, on the other hand, only one baseline sample is taken and only one prediction
interval for k=1 future observation is computed, then the number of
future observations out of a total of N future observations that will be
contained in that one prediction interval is a binomial random variable with
parameters size=N and prob=(1-\alpha^*)100\%, where
\alpha^* depends on the true population parameters and the computed
bounds of the prediction interval.
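This repeated-experiment interpretation can be illustrated with a small simulation; the following is a sketch only (assuming EnvStats is attached), and the coverage proportion should come out close to the nominal 95%:
  # Illustrative simulation: in each of N experiments, draw a baseline sample,
  # compute a 95% prediction interval for k = 1 future observation, and check
  # whether one new observation falls inside it.
  set.seed(101)
  N <- 1000
  covered <- logical(N)
  for (i in 1:N) {
    base <- rnorm(20, mean = 10, sd = 2)
    lims <- predIntNorm(base)$interval$limits
    y <- rnorm(1, mean = 10, sd = 2)
    covered[i] <- (y >= lims["LPL"]) && (y <= lims["UPL"])
  }
  mean(covered)  # should be near 0.95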
Value
If x is a numeric vector, predIntNorm returns a list of class
"estimate" containing the estimated parameters, the prediction interval,
and other information.  See the help file for 
estimate.object for details.
If x is the result of calling an estimation function,
predIntNorm returns a list whose class is the same as x.
The list contains the same components as x, as well as a component called
interval containing the prediction interval information.
If x already has a component called interval, this component is
replaced with the prediction interval information.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Dunnett, C.W. (1955). A Multiple Comparisons Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association 50, 1096-1121.
Dunnett, C.W. (1964). New Tables for Multiple Comparisons with a Control. Biometrics 20, 482-491.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York.
Helsel, D.R., and R.M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. (available on-line at: https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf).
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Miller, R.G. (1981a). Simultaneous Statistical Inference. McGraw-Hill, New York.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntNormK, predIntNormSimultaneous,
predIntLnorm, tolIntNorm,
Normal, 
estimate.object, enorm,
eqnorm.
Examples
  # Generate 20 observations from a normal distribution with parameters
  # mean=10 and sd=2, then create a two-sided 95% prediction interval for
  # the next observation.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(47)
  dat <- rnorm(20, mean = 10, sd = 2)
  predIntNorm(dat)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 9.792856
  #                                 sd   = 1.821286
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL =  5.886723
  #                                 UPL = 13.698988
  #----------
  # Using the same data from the last example, create a one-sided
  # upper 99% prediction limit for the next 3 averages of order 2
  # (i.e., each of the 3 future averages is based on a sample size
  # of 2 future observations).
  predIntNorm(dat, n.mean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 9.792856
  #                                 sd   = 1.821286
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Bonferroni
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99%
  #
  #Number of Future Averages:       3
  #
  #Sample Size for Averages:        2
  #
  #Prediction Interval:             LPL =     -Inf
  #                                 UPL = 13.90537
  #----------
  # Compare the result above that is based on the Bonferroni method
  # with the exact method
  predIntNorm(dat, n.mean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper", method = "exact")$interval$limits["UPL"]
  #     UPL
  #13.89272
  #----------
  # Clean up
  rm(dat)
  #--------------------------------------------------------------------
  # Example 18-1 of USEPA (2009, p.18-9) shows how to construct a 95%
  # prediction interval for 4 future observations assuming a
  # normal distribution based on arsenic concentrations (ppb) in
  # groundwater at a solid waste landfill.  There were 4 years of
  # quarterly monitoring, and years 1-3 are considered background.
  # The question to be answered is whether there is evidence of
  # contamination in year 4.
  # The data for this example is stored in EPA.09.Ex.18.1.arsenic.df.
  EPA.09.Ex.18.1.arsenic.df
  #   Year Sampling.Period Arsenic.ppb
  #1     1      Background        12.6
  #2     1      Background        30.8
  #3     1      Background        52.0
  #4     1      Background        28.1
  #5     2      Background        33.3
  #6     2      Background        44.0
  #7     2      Background         3.0
  #8     2      Background        12.8
  #9     3      Background        58.1
  #10    3      Background        12.6
  #11    3      Background        17.6
  #12    3      Background        25.3
  #13    4      Compliance        48.0
  #14    4      Compliance        30.3
  #15    4      Compliance        42.5
  #16    4      Compliance        15.0
  As.bkgd <- with(EPA.09.Ex.18.1.arsenic.df,
    Arsenic.ppb[Sampling.Period == "Background"])
  As.cmpl <- with(EPA.09.Ex.18.1.arsenic.df,
    Arsenic.ppb[Sampling.Period == "Compliance"])
  # A Shapiro-Wilk goodness-of-fit test for normality indicates
  # there is no evidence to reject the assumption of normality
  # for the background data:
  gofTest(As.bkgd)
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 27.51667
  #                                 sd   = 17.10119
  #
  #Estimation Method:               mvue
  #
  #Data:                            As.bkgd
  #
  #Sample Size:                     12
  #
  #Test Statistic:                  W = 0.94695
  #
  #Test Statistic Parameter:        n = 12
  #
  #P-value:                         0.5929102
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.
  # Here is the one-sided 95% upper prediction limit:
  UPL <- predIntNorm(As.bkgd, k = 4,
    pi.type = "upper")$interval$limits["UPL"]
  UPL
  #     UPL
  #73.67237
  # Are any of the compliance observations above the prediction limit?
  any(As.cmpl > UPL)
  #[1] FALSE
  #==========
  # Cleanup
  #--------
  rm(As.bkgd, As.cmpl, UPL)
Half-Width of a Prediction Interval for the next k Observations from a Normal Distribution
Description
Compute the half-width of a prediction interval for the next k observations 
from a normal distribution.
Usage
  predIntNormHalfWidth(n, df = n - 1, n.mean = 1, k = 1, sigma.hat = 1, 
    method = "Bonferroni", conf.level = 0.95)
Arguments
| n | numeric vector of positive integers greater than 1 indicating the sample size upon 
which the prediction interval is based.
Missing ( | 
| df | numeric vector of positive integers indicating the degrees of freedom associated 
with the prediction interval.  The default is  | 
| n.mean | numeric vector of positive integers specifying the sample size associated with 
the  | 
| k | numeric vector of positive integers specifying the number of future observations 
or averages the prediction interval should contain with confidence level 
 | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s).  
The default value is  | 
| method | character string specifying the method to use if the number of future observations 
( | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
Details
If the arguments n, k, n.mean, sigma.hat, and 
conf.level are not all the same length, they are replicated to be the 
same length as the length of the longest argument.
The help files for predIntNorm and predIntNormK 
give formulas for a two-sided prediction interval based on the sample size, the 
observed sample mean and sample standard deviation, and specified confidence level.  
Specifically, the two-sided prediction interval is given by:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, the number of future averages k, and the 
sample size associated with the future averages, m (see the help file for 
predIntNormK).  Thus, the half-width of the prediction interval is 
given by:
HW = Ks \;\;\;\;\;\; (4)
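Equation (4) can be verified directly; the following minimal sketch (assuming EnvStats is attached) shows that predIntNormHalfWidth returns K times the estimated standard deviation:
  # Illustrative sketch: half-width = K * sigma.hat, with K from predIntNormK().
  sigma.hat <- 2
  K <- predIntNormK(n = 10, k = 1, conf.level = 0.95)
  K * sigma.hat
  predIntNormHalfWidth(n = 10, k = 1, sigma.hat = sigma.hat)  # same value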
Value
numeric vector of half-widths.
Note
See the help file for predIntNorm.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNorm.
See Also
predIntNorm, predIntNormK, 
predIntNormN, plotPredIntNormDesign.
Examples
  # Look at how the half-width of a prediction interval increases with 
  # increasing number of future observations:
  1:5 
  #[1] 1 2 3 4 5 
  hw <- predIntNormHalfWidth(n = 10, k = 1:5)
  round(hw, 2) 
  #[1] 2.37 2.82 3.08 3.26 3.41
  #----------
  # Look at how the half-width of a prediction interval decreases with 
  # increasing sample size:
  2:5 
  #[1] 2 3 4 5 
  hw <- predIntNormHalfWidth(n = 2:5)
  round(hw, 2) 
  #[1] 15.56  4.97  3.56  3.04
  #----------
  # Look at how the half-width of a prediction interval increases with 
  # increasing estimated standard deviation for a fixed sample size:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  hw <- predIntNormHalfWidth(n = 10, sigma.hat = seq(0.5, 2, by = 0.5))
  round(hw, 2) 
  #[1] 1.19 2.37 3.56 4.75
  #----------
  # Look at how the half-width of a prediction interval increases with 
  # increasing confidence level for a fixed sample size:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  hw <- predIntNormHalfWidth(n = 5, conf = seq(0.5, 0.9, by = 0.1))
  round(hw, 2) 
  #[1] 0.81 1.03 1.30 1.68 2.34
  #==========
  # The data frame EPA.92c.arsenic3.df contains arsenic concentrations (ppb) 
  # collected quarterly for 3 years at a background well and quarterly for 
  # 2 years at a compliance well.  Using the data from the background well, compute 
  # the half-width associated with sample sizes of 12 (3 years of quarterly data), 
  # 16 (4 years of quarterly data), and 20 (5 years of quarterly data) for a 
  # two-sided 90% prediction interval for k=4 future observations.
  EPA.92c.arsenic3.df
  #   Arsenic Year  Well.type
  #1     12.6    1 Background
  #2     30.8    1 Background
  #3     52.0    1 Background
  #...
  #18     3.8    5 Compliance
  #19     2.6    5 Compliance
  #20    51.9    5 Compliance
  mu.hat <- with(EPA.92c.arsenic3.df, 
    mean(Arsenic[Well.type=="Background"])) 
  mu.hat 
  #[1] 27.51667 
  sigma.hat <- with(EPA.92c.arsenic3.df, 
    sd(Arsenic[Well.type=="Background"]))
  sigma.hat 
  #[1] 17.10119 
  hw <- predIntNormHalfWidth(n = c(12, 16, 20), k = 4, sigma.hat = sigma.hat, 
    conf.level = 0.9)
  round(hw, 2) 
  #[1] 46.16 43.89 42.64
  #==========
  # Clean up
  #---------
  rm(hw, mu.hat, sigma.hat)
Compute the Value of K for a Prediction Interval for a Normal Distribution
Description
Compute the value of K (the multiplier of estimated standard deviation) used
to construct a prediction interval for the next k observations or next set of
k means based on data from a normal distribution.
The function
predIntNormK is called by predIntNorm.
Usage
  predIntNormK(n, df = n - 1, n.mean = 1, k = 1,
    method = "Bonferroni", pi.type = "two-sided",
    conf.level = 0.95)
Arguments
| n | a positive integer greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | the degrees of freedom associated with the prediction interval.  The default is
 | 
| n.mean | positive integer specifying the sample size associated with the  | 
| k | positive integer specifying the number of future observations or averages the
prediction interval should contain with confidence level  | 
| method | character string specifying the method to use if the number of future observations
( | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is  | 
Details
A prediction interval for some population is an interval on the real line constructed
so that it will contain k future observations or averages from that population
with some specified probability (1-\alpha)100\%, where
0 < \alpha < 1 and k is some pre-specified positive integer.
The quantity (1-\alpha)100\% is called
the confidence coefficient or confidence level associated with the prediction
interval.
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n
observations from a normal distribution with parameters
mean=\mu and sd=\sigma.  Also, let m denote the
sample size associated with the k future averages (i.e., n.mean=m).
When m=1, each average is really just a single observation, so in the rest of
this help file the term “averages” will replace the phrase
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\% prediction
interval is:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the
confidence level, the number of future averages k, and the
sample size associated with the future averages, m.  Do not confuse the
constant K (uppercase K) with the number of future averages k
(lowercase k).  The symbol K is used here to be consistent with the
notation used for tolerance intervals (see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
but K differs for one-sided versus two-sided prediction intervals.
The derivation of the constant K is explained below.  The function
predIntNormK computes the value of K and is called by
predIntNorm.
The Derivation of K for One Future Observation or Average (k = 1) 
Let X denote a random variable from a normal distribution
with parameters mean=\mu and sd=\sigma, and let
x_p denote the p'th quantile of X.
A true two-sided (1-\alpha)100\% prediction interval for the next
k=1 observation of X is given by:
[x_{\alpha/2}, x_{1-\alpha/2}] = [\mu - z_{1-\alpha/2}\sigma,  \mu + z_{1-\alpha/2}\sigma] \;\;\;\;\;\; (6)
where z_p denotes the p'th quantile of a standard normal distribution.
More generally, a true two-sided (1-\alpha)100\% prediction interval for the
next k=1 average based on a sample of size m is given by:
[\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt{m}},  \mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt{m}}] \;\;\;\;\;\; (7)
Because the values of \mu and \sigma are unknown, they must be
estimated, and a prediction interval then constructed based on the estimated
values of \mu and \sigma.
For a two-sided prediction interval (pi.type="two-sided"),
the constant K for a (1-\alpha)100\% prediction interval for the next
k=1 average based on a sample size of m is computed as:
K = t_{n-1, 1-\alpha/2} \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (8)
where t_{\nu, p} denotes the p'th quantile of the
Student's t-distribution with \nu
degrees of freedom.  For a one-sided prediction interval
(pi.type="lower" or pi.type="upper"), the constant K is
computed as:
K = t_{n-1, 1-\alpha} \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (9)
The formulas for these prediction intervals are derived as follows.  Let
\bar{y} denote the future average based on m observations.  Then
the quantity \bar{y} - \bar{x} has a normal distribution with expectation
and variance given by:
E(\bar{y} - \bar{x}) = 0 \;\;\;\;\;\; (10)
Var(\bar{y} - \bar{x}) = Var(\bar{y}) + Var(\bar{x}) = \frac{\sigma^2}{m} + \frac{\sigma^2}{n} = \sigma^2(\frac{1}{m} + \frac{1}{n}) \;\;\;\;\;\; (11)
so the quantity
t = \frac{\bar{y} - \bar{x}}{s\sqrt{\frac{1}{m} + \frac{1}{n}}} \;\;\;\;\;\; (12)
has a Student's t-distribution with n-1 degrees of freedom.
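Equation (8) can be checked directly against predIntNormK with a minimal sketch (assuming EnvStats is attached):
  # Illustrative sketch: for k = 1 future average of size m, the two-sided
  # multiplier is t(n-1, 1 - alpha/2) * sqrt(1/m + 1/n), per Equation (8).
  n <- 20; m <- 2; conf.level <- 0.95
  qt(1 - (1 - conf.level)/2, df = n - 1) * sqrt(1/m + 1/n)
  predIntNormK(n = n, n.mean = m, k = 1, conf.level = conf.level)  # should match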
The Derivation of K for More than One Future Observation or Average (k > 1) 
When k > 1, the function predIntNormK allows for two ways to compute
K:  an exact method due to Dunnett (1955) (method="exact"), and
an approximate (conservative) method based on the Bonferroni inequality
(method="Bonferroni"; see Miller, 1981a, pp.8, 67-70;
Gibbons et al., 2009, p.4).  Each of these methods is explained below.
Exact Method Due to Dunnett (1955) (method="exact") 
Dunnett (1955) derived the value of K in the context of the multiple
comparisons problem of comparing several treatment means to one control mean.
The value of K is computed as:
K = c \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (13)
where c is a constant that depends on the sample size n, the number of
future observations (averages) k, the sample size associated with the
k future averages m, and the confidence level (1-\alpha)100\%.
When pi.type="lower" or pi.type="upper", the value of c is the
number that satisfies the following equation (Gupta and Sobel, 1957; Hahn, 1970a):
1 - \alpha = \int_{0}^{\infty} F_1(cs, k, \rho) h(s\sqrt{n-1}, n-1) \sqrt{n-1} ds \;\;\;\;\;\; (14)
where
F_1(x, k, \rho) = \int_{-\infty}^{\infty} [\Phi(\frac{x + \rho^{1/2}y}{\sqrt{1 - \rho}})]^k \phi(y) dy \;\;\;\;\;\; (15)
\rho = 1 / (\frac{n}{m} + 1) \;\;\;\;\;\; (16)
h(x, \nu) = \frac{x^{\nu-1}e^{-x^2/2}}{2^{(\nu/2) - 1} \Gamma(\frac{\nu}{2})} \;\;\;\;\;\; (17)
and \Phi() and \phi() denote the cumulative distribution function and
probability density function, respectively, of the standard normal distribution.
Note that the function h(x, \nu) is the probability density function of a
chi random variable with \nu degrees of freedom.
When pi.type="two-sided", the value of c is the number that satisfies
the following equation:
1 - \alpha = \int_{0}^{\infty} F_2(cs, k, \rho) h(s\sqrt{n-1}, n-1) \sqrt{n-1} ds \;\;\;\;\;\; (18)
where
F_2(x, k, \rho) = \int_{-\infty}^{\infty} [\Phi(\frac{x + \rho^{1/2}y}{\sqrt{1 - \rho}}) - \Phi(\frac{-x + \rho^{1/2}y}{\sqrt{1 - \rho}})]^k \phi(y) dy \;\;\;\;\;\; (19)
Approximate Method Based on the Bonferroni Inequality (method="Bonferroni") 
As shown above, when k=1, the value of K is given by Equation (8) or
Equation (9) for two-sided or one-sided prediction intervals, respectively.  When
k > 1, a conservative way to construct a (1-\alpha^*)100\% prediction
interval for the next k observations or averages is to use a Bonferroni
correction (Miller, 1981a, p.8) and set \alpha = \alpha^*/k in Equation (8)
or (9) (Chew, 1968).  This value of K will be conservative in that the computed
prediction intervals will be wider than the exact prediction intervals.
Hahn (1969, 1970a) compared the exact values of K with those based on the
Bonferroni inequality for the case of m=1 and found the approximation to be
quite satisfactory except when n is small, k is large, and \alpha
is large.  For example, Gibbons (1987a) notes that for a 99% prediction interval
(i.e., \alpha = 0.01) for the next k observations, if n > 4,
the bias of K is never greater than 1% no matter what the value of k.
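A minimal sketch (assuming EnvStats is attached) of the Bonferroni substitution for a one-sided upper limit:
  # Illustrative sketch: the Bonferroni value of K for k future observations
  # is obtained by substituting alpha/k for alpha in Equation (9).
  n <- 20; m <- 1; k <- 3; alpha <- 0.05
  qt(1 - alpha/k, df = n - 1) * sqrt(1/m + 1/n)
  predIntNormK(n = n, k = k, pi.type = "upper",
    conf.level = 1 - alpha)  # method = "Bonferroni" is the default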
Value
A numeric scalar equal to K, the multiplier of estimated standard
deviation that is used to construct the prediction interval.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Dunnett, C.W. (1955). A Multiple Comparisons Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association 50, 1096-1121.
Dunnett, C.W. (1964). New Tables for Multiple Comparisons with a Control. Biometrics 20, 482-491.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York.
Helsel, D.R., and R.M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. (available on-line at: https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf).
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Miller, R.G. (1981a). Simultaneous Statistical Inference. McGraw-Hill, New York.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntNorm, predIntNormSimultaneous,
predIntLnorm, tolIntNorm,
Normal, estimate.object, enorm, eqnorm.
Examples
  # Compute the value of K for a two-sided 95% prediction interval
  # for the next observation given a sample size of n=20.
  predIntNormK(n = 20)
  #[1] 2.144711
  #--------------------------------------------------------------------
  # Compute the value of K for a one-sided upper 99% prediction limit
  # for the next 3 averages of order 2 (i.e., each of the 3 future
  # averages is based on a sample size of 2 future observations) given a
  # sample size of n=20.
  predIntNormK(n = 20, n.mean = 2, k = 3, pi.type = "upper",
    conf.level = 0.99)
  #[1] 2.258026
  #----------
  # Compare the result above that is based on the Bonferroni method
  # with the exact method.
  predIntNormK(n = 20, n.mean = 2, k = 3, method = "exact",
    pi.type = "upper", conf.level = 0.99)
  #[1] 2.251084
  #--------------------------------------------------------------------
  # Example 18-1 of USEPA (2009, p.18-9) shows how to construct a 95%
  # prediction interval for 4 future observations assuming a
  # normal distribution based on arsenic concentrations (ppb) in
  # groundwater at a solid waste landfill.  There were 4 years of
  # quarterly monitoring, and years 1-3 are considered background,
  # so the sample size for the prediction limit is n = 12,
  # and the number of future samples is k = 4.
  predIntNormK(n = 12, k = 4, pi.type = "upper")
  #[1] 2.698976
Sample Size for a Specified Half-Width of a Prediction Interval for the next k Observations from a Normal Distribution
Description
Compute the sample size necessary to achieve a specified half-width of a 
prediction interval for the next k observations from a normal distribution.
Usage
  predIntNormN(half.width, n.mean = 1, k = 1, sigma.hat = 1, 
    method = "Bonferroni", conf.level = 0.95, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| half.width | numeric vector of (positive) half-widths.  
Missing ( | 
| n.mean | numeric vector of positive integers specifying the sample size associated with 
the  | 
| k | numeric vector of positive integers specifying the number of future observations 
or averages the prediction interval should contain with confidence level 
 | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s).  
The default value is  | 
| method | character string specifying the method to use if the number of future observations 
( | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample 
size(s) to the next smallest integer.  The default value is  | 
| n.max | positive integer greater than 1 indicating the maximum possible sample size.  The 
default value is  | 
| tol | numeric scalar indicating the tolerance to use in the  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
Details
If the arguments half.width, k, n.mean, sigma.hat, and 
conf.level are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
The help files for predIntNorm and predIntNormK 
give formulas for a two-sided prediction interval based on the sample size, the 
observed sample mean and sample standard deviation, and specified confidence level.  
Specifically, the two-sided prediction interval is given by:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, the number of future averages k, and the 
sample size associated with the future averages, m (see the help file for 
predIntNormK).  Thus, the half-width of the prediction interval is 
given by:
HW = Ks \;\;\;\;\;\; (4)
The function predIntNormN uses the uniroot search algorithm to 
determine the sample size for specified values of the half-width, number of 
observations used to create a single future average, number of future observations or 
averages, the sample standard deviation, and the confidence level.  Note that 
unlike a confidence interval, the half-width of a prediction interval does not 
approach 0 as the sample size increases.
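This behavior can be seen numerically with a minimal sketch (assuming EnvStats is attached):
  # Illustrative sketch: as n grows, the half-width levels off near
  # z(1 - alpha/2) * sigma.hat / sqrt(n.mean) rather than shrinking to 0.
  predIntNormHalfWidth(n = c(10, 50, 500, 5000), k = 1, sigma.hat = 1)
  qnorm(0.975)  # approximate limiting value for k = 1, n.mean = 1, sigma.hat = 1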
Value
numeric vector of sample sizes.
Note
See the help file for predIntNorm.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNorm.
See Also
predIntNorm, predIntNormK, 
predIntNormHalfWidth, plotPredIntNormDesign.
Examples
  # Look at how the required sample size for a prediction interval increases 
  # with increasing number of future observations:
  1:5 
  #[1] 1 2 3 4 5 
  predIntNormN(half.width = 3, k = 1:5) 
  #[1]  6  9 11 14 18
  #----------
  # Look at how the required sample size for a prediction interval decreases 
  # with increasing half-width:
  2:5 
  #[1] 2 3 4 5 
  predIntNormN(half.width = 2:5) 
  #[1] 86  6  4  3
  predIntNormN(2:5, round = FALSE) 
  #[1] 85.567387  5.122911  3.542393  2.987861
  #----------
  # Look at how the required sample size for a prediction interval increases 
  # with increasing estimated standard deviation for a fixed half-width:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  predIntNormN(half.width = 4, sigma.hat = seq(0.5, 2, by = 0.5)) 
  #[1]  3  4  7 86
  #----------
  # Look at how the required sample size for a prediction interval increases 
  # with increasing confidence level for a fixed half-width:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  predIntNormN(half.width = 2, conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1] 2 2 3 4 9
  #==========
  # The data frame EPA.92c.arsenic3.df contains arsenic concentrations (ppb) 
  # collected quarterly for 3 years at a background well and quarterly for 
  # 2 years at a compliance well.  Using the data from the background well, 
  # compute the required sample size in order to achieve a half-width of 
  # 2.25, 2.5, or 3 times the estimated standard deviation for a two-sided 
  # 90% prediction interval for k=4 future observations.
  #
  # For a half-width of 2.25 standard deviations, the required sample size is 526, 
  # or about 131 years of quarterly observations!  For a half-width of 2.5 
  # standard deviations, the required sample size is 20, or about 5 years of 
  # quarterly observations.  For a half-width of 3 standard deviations, the required 
  # sample size is 9, or about 2 years of quarterly observations.
  EPA.92c.arsenic3.df
  #   Arsenic Year  Well.type
  #1     12.6    1 Background
  #2     30.8    1 Background
  #3     52.0    1 Background
  #...
  #18     3.8    5 Compliance
  #19     2.6    5 Compliance
  #20    51.9    5 Compliance
  mu.hat <- with(EPA.92c.arsenic3.df, 
    mean(Arsenic[Well.type=="Background"])) 
  mu.hat 
  #[1] 27.51667 
  sigma.hat <- with(EPA.92c.arsenic3.df, 
    sd(Arsenic[Well.type=="Background"]))
  sigma.hat 
  #[1] 17.10119 
  predIntNormN(half.width=c(2.25, 2.5, 3) * sigma.hat, k = 4, 
    sigma.hat = sigma.hat, conf.level = 0.9) 
  #[1] 526  20   9 
  #==========
  # Clean up
  #---------
  rm(mu.hat, sigma.hat)
Simultaneous Prediction Interval for a Normal Distribution
Description
Estimate the mean and standard deviation of a
normal distribution, and construct a simultaneous prediction
interval for the next r sampling “occasions”, based on one of three
possible rules: k-of-m, California, or Modified California.
Usage
  predIntNormSimultaneous(x, n.mean = 1, k = 1, m = 2, r = 1, rule = "k.of.m",
    delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = .Machine$double.eps^0.5)
Arguments
| x | a numeric vector of observations, or an object resulting from a call to an estimating
function that assumes a normal (Gaussian) distribution (e.g.,  | 
| n.mean | positive integer specifying the sample size associated with the future averages.
The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or
averages) on one future sampling “occasion”.
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are
 | 
| delta.over.sigma | numeric scalar indicating the ratio  | 
| pi.type | character string indicating what kind of prediction interval to compute.
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is  | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to
compute  | 
Details
What is a Simultaneous Prediction Interval? 
A prediction interval for some population is an interval on the real line constructed
so that it will contain k future observations from that population
with some specified probability (1-\alpha)100\%, where
0 < \alpha < 1 and k is some pre-specified positive integer.
The quantity (1-\alpha)100\% is called
the confidence coefficient or confidence level associated with the prediction
interval.  The function predIntNorm computes a standard prediction
interval based on a sample from a normal distribution.
The function predIntNormSimultaneous computes a simultaneous prediction
interval that will contain a certain number of future observations with probability
(1-\alpha)100\% for each of r future sampling “occasions”,
where r is some pre-specified positive integer.  The quantity r may
refer to r distinct future sampling occasions in time, or it may for example
refer to sampling at r distinct locations on one future sampling occasion,
assuming that the population standard deviation is the same at all of the r
distinct locations.
The function predIntNormSimultaneous computes a simultaneous prediction
interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions. If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: When k=m and r=1, the results of predIntNormSimultaneous are equivalent to the results of predIntNorm (a numerical check of this equivalence appears in the sketch below).
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.
Simultaneous prediction intervals can be extended to using averages (means) in place
of single observations (USEPA, 2009, Chapter 19).  That is, you can create a
simultaneous prediction interval
that will contain a specified number of averages (based on which rule you choose) on
each of r future sampling occasions, where each average is based on
w individual observations.  For the function predIntNormSimultaneous,
the argument n.mean corresponds to w.
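The note in the k-of-m bullet above (that the case k=m, r=1 reduces to an ordinary prediction interval) can be checked with a minimal sketch (assuming EnvStats is attached):
  # Illustrative sketch: with k = m and r = 1 the k-of-m rule requires all m
  # future observations to fall in the interval, which is what predIntNorm()
  # computes for k = m future observations (using the exact method).
  set.seed(5)
  z <- rnorm(12, mean = 10, sd = 2)
  predIntNormSimultaneous(z, k = 3, m = 3, r = 1,
    pi.type = "upper")$interval$limits["UPL"]
  predIntNorm(z, k = 3, method = "exact",
    pi.type = "upper")$interval$limits["UPL"]  # should agree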
The Form of a Prediction Interval for 1 Future Observation 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n
observations from a normal distribution with parameters
mean=\mu and sd=\sigma.  Also, let w denote the
sample size associated with the future averages (i.e., n.mean=w).
When w=1, each average is really just a single observation, so in the rest of
this help file the term “averages” will replace the phrase
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\%
prediction interval is:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the
confidence level, the number of future sampling occasions r, and the
sample size associated with the future averages, w.  Do not confuse the
constant K (uppercase K) with the number of future averages k
(lowercase k) in the k-of-m rule.  The symbol K is used here
to be consistent with the notation used for tolerance intervals
(see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
The derivation of the constant K is explained in the help file for
predIntNormK.
The Form of a Simultaneous Prediction Interval 
For simultaneous prediction intervals, only lower 
(pi.type="lower") and upper (pi.type="upper") prediction
intervals are available.  Two-sided simultaneous prediction intervals were 
available in Versions 2.4.0 - 2.8.1 of EnvStats, but these prediction 
intervals were based on an incorrect algorithm for K.
Equations (4) and (5) above hold for simultaneous prediction intervals, but the 
derivation of the constant K is more difficult, and is explained in the help file for
predIntNormSimultaneousK.
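For a one-sided upper interval, Equation (5) still applies with K taken from predIntNormSimultaneousK; here is a minimal sketch (assuming EnvStats is attached and that predIntNormSimultaneousK accepts the same design arguments as predIntNormSimultaneous):
  # Illustrative sketch: rebuild the upper simultaneous limit from Equation (5)
  # using K from predIntNormSimultaneousK().
  set.seed(7)
  w <- rnorm(10, mean = 10, sd = 2)
  K <- predIntNormSimultaneousK(n = length(w), k = 1, m = 3, r = 2,
    rule = "k.of.m", pi.type = "upper", conf.level = 0.95)
  mean(w) + K * sd(w)
  predIntNormSimultaneous(w, k = 1, m = 3, r = 2, rule = "k.of.m",
    pi.type = "upper")$interval$limits["UPL"]  # should match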
Prediction Intervals are Random Intervals 
A prediction interval is a random interval; that is, the lower and/or
upper bounds are random variables computed based on sample statistics in the
baseline sample.  Prior to taking one specific baseline sample, the probability
that the prediction interval will perform according to the rule chosen is
(1-\alpha)100\%.  Once a specific baseline sample is taken and the prediction
interval based on that sample is computed, the probability that that prediction
interval will perform according to the rule chosen is not necessarily
(1-\alpha)100\%, but it should be close.  See the help file for
predIntNorm for more information.
Value
If x is a numeric vector, predIntNormSimultaneous returns a list of
class "estimate" containing the estimated parameters, the prediction interval,
and other information.  See the help file for 
estimate.object for details.
If x is the result of calling an estimation function,
predIntNormSimultaneous returns a list whose class is the same as x.
The list contains the same components as x, as well as a component called
interval containing the prediction interval information.
If x already has a component called interval, this component is
replaced with the prediction interval information.
Note
Motivation 
Prediction and tolerance intervals have long been applied to quality control and
life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973).  In the context of
environmental statistics, prediction intervals are useful for analyzing data from
groundwater detection monitoring programs at hazardous and solid waste facilities.
One of the main statistical problems that plague groundwater monitoring programs at
hazardous and solid waste facilities is the requirement of testing several wells and
several constituents at each well on each sampling occasion.  This is an obvious
multiple comparisons problem, and the naive approach of using a standard t-test at
a conventional \alpha-level (e.g., 0.05 or 0.01) for each test leads to a
very high probability of at least one significant result on each sampling occasion,
when in fact no contamination has occurred.  This problem was pointed out years ago
by Millard (1987) and others.
Davis and McNichols (1987) proposed simultaneous prediction intervals as a way of
controlling the facility-wide false positive rate (FWFPR) while maintaining adequate
power to detect contamination in the groundwater.  Because of the ubiquitous presence
of spatial variability, it is usually best to use simultaneous prediction intervals
at each well (Davis, 1998a).  That is, prediction intervals are constructed from
background (pre-landfill) data at each well, and future observations at a well are
compared to the prediction interval for that particular well.  In each of these cases,
the individual \alpha-level at each well is equal to the FWFPR divided by the
product of the number of wells and constituents.
Often, observations at downgradient wells are not available prior to the construction and operation of the landfill. In this case, upgradient well data can be combined to create a background prediction interval, and observations at each downgradient well can be compared to this prediction interval. If spatial variability is present and a major source of variation, however, this method is not really valid (Davis, 1994; Davis, 1998a).
Chapter 19 of USEPA (2009) contains an extensive discussion of using the
1-of-m rule and the Modified California rule.
Chapters 1 and 3 of Gibbons et al. (2009) discuss simultaneous prediction intervals
for the normal and lognormal distributions, respectively.
The k-of-m Rule 
For the k-of-m rule, Davis and McNichols (1987) give tables with
“optimal” choices of k (in terms of best power for a given overall
confidence level) for selected values of m, r, and n.  They found
that the optimal ratios of k to m (i.e., k/m) are generally small,
in the range of 15-50%.
The California Rule 
The California rule was mandated in that state for groundwater monitoring at waste
disposal facilities when resampling verification is part of the statistical program
(Barclay's Code of California Regulations, 1991).  The California code mandates a
“California” rule with m \ge 3.  The motivation for this rule may have
been a desire to have a majority of the observations in bounds (Davis, 1998a).  For
example, for a k-of-m rule with k=1 and m=3, a monitoring
location will pass if the first observation is out of bounds, the second resample
is out of bounds, but the last resample is in bounds, so that 2 out of 3 observations
are out of bounds.  For the California rule with m=3, either the first
observation must be in bounds, or the next 2 observations must be in bounds in order
for the monitoring location to pass.
Davis (1998a) states that if the FWFPR is kept constant, then the California rule
offers little increased power compared to the k-of-m rule, and can
actually decrease the power of detecting contamination.
The Modified California Rule 
The Modified California Rule was proposed as a compromise between a 1-of-m
rule and the California rule.  For a given FWFPR, the Modified California rule
achieves better power than the California rule, and still requires at least as many
observations in bounds as out of bounds, unlike a 1-of-m rule.
Different Notations Between Different References 
For the k-of-m rule described in this help file, both
Davis and McNichols (1987) and USEPA (2009, Chapter 19) use the variable
p instead of k to represent the minimum number
of future observations the interval should contain on each of the r sampling
occasions.
Gibbons et al. (2009, Chapter 1) presents extensive lists of the value of
K for both k-of-m rules and California rules.  Gibbons et al.'s
notation reverses the meaning of k and r compared to the notation used
in this help file.  That is, in Gibbons et al.'s notation, k represents the
number of future sampling occasions or monitoring wells, and r represents the
minimum number of observations the interval should contain on each sampling occasion.
USEPA (2009, Chapter 19) uses p in place of k.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barclay's California Code of Regulations. (1991). Title 22, Section 66264.97 [concerning hazardous waste facilities] and Title 23, Section 2550.7(e)(8) [concerning solid waste facilities]. Barclay's Law Publishers, San Francisco, CA.
Davis, C.B. (1998a). Ground-Water Statistics & Regulations: Principles, Progress and Problems. Second Edition. Environmetrics & Statistics Limited, Henderson, NV.
Davis, C.B. (1998b). Personal Communication, September 3, 1998.
Davis, C.B., and R.J. McNichols. (1987).  One-sided Intervals for at Least p
of m Observations from a Normal Population on Each of r Future Occasions.
Technometrics 29, 359–370.
Fertig, K.W., and N.R. Mann. (1977).  One-Sided Prediction Intervals for at Least
p Out of m Future Observations From a Normal Population.
Technometrics 19, 167–177.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Hall, I.J., and R.R. Prairie. (1973).  One-Sided Prediction Intervals to Contain at
Least m Out of k Future Observations.
Technometrics 15, 897–914.
Millard, S.P. (1987). Environmental Monitoring, Statistics, and the Law: Room for Improvement (with Comment). The American Statistician 41(4), 249–259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntNormSimultaneousK,
predIntNormSimultaneousTestPower,
predIntNorm, 
predIntLnormSimultaneous, tolIntNorm,
Normal, estimate.object, enorm
Examples
  # Generate 8 observations from a normal distribution with parameters
  # mean=10 and sd=2, then use predIntNormSimultaneous to estimate the
  # mean and standard deviation of the true distribution and construct an
  # upper 95% prediction interval to contain at least 1 out of the next
  # 3 observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(479)
  dat <- rnorm(8, mean = 10, sd = 2)
  predIntNormSimultaneous(dat, k = 1, m = 3)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 10.269773
  #                                 sd   =  2.210246
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         1
  #
  #Total Number of
  #Future Observations:             3
  #
  #Prediction Interval:             LPL =    -Inf
  #                                 UPL = 11.4021
  #----------
  # Repeat the above example, but do it in two steps.  First create a list called
  # est.list containing information about the estimated parameters, then create the
  # prediction interval.
  est.list <- enorm(dat)
  est.list
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 10.269773
  #                                 sd   =  2.210246
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     8
  predIntNormSimultaneous(est.list, k = 1, m = 3)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 10.269773
  #                                 sd   =  2.210246
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         1
  #
  #Total Number of
  #Future Observations:             3
  #
  #Prediction Interval:             LPL =    -Inf
  #                                 UPL = 11.4021 
  #----------
  # Compare the 95% 1-of-3 upper prediction interval to the California and
  # Modified California prediction intervals.  Note that the upper prediction
  # bound for the Modified California rule is between the bound for the
  # 1-of-3 rule bound and the bound for the California rule.
  predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"] 
  #    UPL 
  #11.4021 
 
  predIntNormSimultaneous(dat, m = 3, rule = "CA")$interval$limits["UPL"]  
  #     UPL 
  #13.03717 
  predIntNormSimultaneous(dat, rule = "Modified.CA")$interval$limits["UPL"]  
  #     UPL 
  #12.12201
  #----------
  # Show how the upper bound on an upper 95% simultaneous prediction limit increases
  # as the number of future sampling occasions r increases.  Here, we'll use the
  # 1-of-3 rule.
  predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  #    UPL 
  #11.4021
  predIntNormSimultaneous(dat, k = 1, m = 3, r = 10)$interval$limits["UPL"]
  #     UPL 
  #13.28234
  #----------
  # Compare the upper simultaneous prediction limit for the 1-of-3 rule
  # based on individual observations versus based on means of order 4.
  predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
  #    UPL 
  #11.4021
  predIntNormSimultaneous(dat, n.mean = 4, k = 1, 
    m = 3)$interval$limits["UPL"]
  #     UPL 
  #11.26157
  #==========
  # Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an
  # upper simultaneous prediction limit for the 1-of-3 rule for
  # r = 2 future sampling occasions.  The data for this example are
  # stored in EPA.09.Ex.19.1.sulfate.df.
  # We will pool data from 4 background wells that were sampled on
  # a number of different occasions, giving us a sample size of
  # n = 25 to use to construct the prediction limit.
  # There are 50 compliance wells and we will monitor 10 different
  # constituents at each well at each of the r=2 future sampling
  # occasions.  To determine the confidence level we require for
  # the simultaneous prediction interval, USEPA (2009) recommends
  # setting the individual Type I Error level at each well to
  # 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # which translates to setting the confidence limit to
  # (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # where SWFPR = site-wide false positive rate.  For this example, we
  # will set SWFPR = 0.1.  Thus, the confidence level is given by:
  nc <- 10
  nw <- 50
  SWFPR <- 0.1
  conf.level <- (1 - SWFPR)^(1 / (nc * nw))
  conf.level
  #[1] 0.9997893
  #----------
  # Look at the data:
  names(EPA.09.Ex.19.1.sulfate.df)
  #[1] "Well"                 "Month"                "Day"                 
  #[4] "Year"                 "Date"                 "Sulfate.mg.per.l"    
  #[7] "log.Sulfate.mg.per.l"
  EPA.09.Ex.19.1.sulfate.df[, 
    c("Well", "Date", "Sulfate.mg.per.l", "log.Sulfate.mg.per.l")]
  #    Well       Date Sulfate.mg.per.l log.Sulfate.mg.per.l
  #1  GW-01 1999-07-08             63.0             4.143135
  #2  GW-01 1999-09-12             51.0             3.931826
  #3  GW-01 1999-10-16             60.0             4.094345
  #4  GW-01 1999-11-02             86.0             4.454347
  #5  GW-04 1999-07-09            104.0             4.644391
  #6  GW-04 1999-09-14            102.0             4.624973
  #7  GW-04 1999-10-12             84.0             4.430817
  #8  GW-04 1999-11-15             72.0             4.276666
  #9  GW-08 1997-10-12             31.0             3.433987
  #10 GW-08 1997-11-16             84.0             4.430817
  #11 GW-08 1998-01-28             65.0             4.174387
  #12 GW-08 1999-04-20             41.0             3.713572
  #13 GW-08 2002-06-04             51.8             3.947390
  #14 GW-08 2002-09-16             57.5             4.051785
  #15 GW-08 2002-12-02             66.8             4.201703
  #16 GW-08 2003-03-24             87.1             4.467057
  #17 GW-09 1997-10-16             59.0             4.077537
  #18 GW-09 1998-01-28             85.0             4.442651
  #19 GW-09 1998-04-12             75.0             4.317488
  #20 GW-09 1998-07-12             99.0             4.595120
  #21 GW-09 2000-01-30             75.8             4.328098
  #22 GW-09 2000-04-24             82.5             4.412798
  #23 GW-09 2000-10-24             85.5             4.448516
  #24 GW-09 2002-12-01            188.0             5.236442
  #25 GW-09 2003-03-24            150.0             5.010635
  # Construct the upper simultaneous prediction limit for the 
  # 1-of-3 plan based on the log-transformed sulfate data
  log.Sulfate <- EPA.09.Ex.19.1.sulfate.df$log.Sulfate.mg.per.l
  pred.int.list.log <- 
    predIntNormSimultaneous(x = log.Sulfate, k = 1, m = 3, r = 2, 
      rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
  pred.int.list.log
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 4.3156194
  #                                 sd   = 0.3756697
  #
  #Estimation Method:               mvue
  #
  #Data:                            log.Sulfate
  #
  #Sample Size:                     25
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99.97893%
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain
  #(per Sampling Occasion):         1
  #
  #Total Number of
  #Future Observations
  #(per Sampling Occasion):         3
  #
  #Number of Future
  #Sampling Occasions:              2
  #
  #Prediction Interval:             LPL =     -Inf
  #                                 UPL = 5.072355
  # Now exponentiate the prediction interval to get the limit on 
  # the original scale
  exp(pred.int.list.log$interval$limits["UPL"])
  #     UPL 
  #159.5497 
  #==========
  ## Not run: 
  # Try to compute a two-sided simultaneous prediction interval:
  predIntNormSimultaneous(x = log.Sulfate, k = 1, m = 3, r = 2,
      rule = "k.of.m", pi.type = "two-sided", conf.level = conf.level)
  #Error in predIntNormSimultaneous(x = log.Sulfate, k = 1, m = 3, r = 2,  : 
  #  Two-sided simultaneous prediction intervals are not currently available.
  # NOTE: Two-sided simultaneous prediction intervals computed using
  # Versions 2.4.0 - 2.8.1 of EnvStats are *NOT* valid.
  
## End(Not run)
  #==========
  # Cleanup
  #--------
  rm(dat, est.list, nc, nw, SWFPR, conf.level, log.Sulfate, 
    pred.int.list.log)
Compute the Value of K for a Simultaneous Prediction Interval for a Normal Distribution
Description
Compute the value of K (the multiplier of estimated standard deviation) used
to construct a simultaneous prediction interval based on data from a
normal distribution.
The function 
predIntNormSimultaneousK is called by predIntNormSimultaneous.
Usage
  predIntNormSimultaneousK(n, df = n - 1, n.mean = 1, k = 1, m = 2, r = 1,
    rule = "k.of.m", delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
    K.tol = .Machine$double.eps^0.5, integrate.args.list = NULL)
Arguments
| n | a positive integer greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | the degrees of freedom associated with the prediction interval.  The default is df=n-1. | 
| n.mean | positive integer specifying the sample size associated with the future averages.  The default value is n.mean=1 (i.e., individual observations). | 
| k | for the k-of-m rule (rule="k.of.m"), a positive integer specifying the minimum number of observations (or averages) out of m observations (or averages) that the prediction interval should contain on each of the r future sampling occasions.  The default value is k=1. | 
| m | positive integer specifying the maximum number of future observations (or averages) on one future sampling “occasion”.  The default value is m=2. | 
| r | positive integer specifying the number of future sampling “occasions”.  The default value is r=1. | 
| rule | character string specifying which rule to use.  The possible values are "k.of.m" (k-of-m rule; the default), "CA" (California rule), and "Modified.CA" (modified California rule). | 
| delta.over.sigma | numeric scalar indicating the ratio \Delta/\sigma, where \Delta denotes the difference between the means of the two populations and \sigma denotes their common standard deviation (see Details).  The default value is delta.over.sigma=0. | 
| pi.type | character string indicating what kind of prediction interval to compute.  The possible values are "upper" (the default) and "lower". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.  The default value is conf.level=0.95. | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to compute K.  The default value is K.tol=.Machine$double.eps^0.5. | 
| integrate.args.list | a list of arguments to supply to the integrate function.  The default value is integrate.args.list=NULL. | 
Details
What is a Simultaneous Prediction Interval? 
A prediction interval for some population is an interval on the real line constructed
so that it will contain k future observations from that population
with some specified probability (1-\alpha)100\%, where
0 < \alpha < 1 and k is some pre-specified positive integer.
The quantity (1-\alpha)100\% is called
the confidence coefficient or confidence level associated with the prediction
interval.  The function predIntNorm computes a standard prediction
interval based on a sample from a normal distribution.
The function predIntNormSimultaneous computes a simultaneous prediction
interval that will contain a certain number of future observations with probability
(1-\alpha)100\% for each of r future sampling “occasions”,
where r is some pre-specified positive integer.  The quantity r may
refer to r distinct future sampling occasions in time, or it may for example
refer to sampling at r distinct locations on one future sampling occasion,
assuming that the population standard deviation is the same at all of the r
distinct locations.
The function predIntNormSimultaneous computes a simultaneous prediction
interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions.  If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop.  Note: When k=m and r=1, the results of predIntNormSimultaneous are equivalent to the results of predIntNorm.
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, up to 3 more observations must be taken.
Simultaneous prediction intervals can be extended to using averages (means) in place
of single observations (USEPA, 2009, Chapter 19).  That is, you can create a
simultaneous prediction interval
that will contain a specified number of averages (based on which rule you choose) on
each of r future sampling occasions, where each average is based on
w individual observations.  For the functions
predIntNormSimultaneous and predIntNormSimultaneousK,
the argument n.mean corresponds to w.
The Form of a Prediction Interval for 1 Future Observation 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n
observations from a normal distribution with parameters
mean=\mu and sd=\sigma.  Also, let w denote the
sample size associated with the future averages (i.e., n.mean=w).
When w=1, each average is really just a single observation, so in the rest of
this help file the term “averages” will sometimes replace the phrase
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\%
simultaneous prediction interval is:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the
confidence level, the number of future sampling occasions r, and the
sample size associated with the future averages, w.  Do not confuse the
constant K (uppercase K) with the number of future averages k
(lowercase k) in the k-of-m rule.  The symbol K is used here
to be consistent with the notation used for tolerance intervals
(see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
The derivation of the constant K for 1 future observation is 
explained in the help file for predIntNormK.
The Form of a Simultaneous Prediction Interval 
For simultaneous prediction intervals, only lower 
(pi.type="lower") and upper (pi.type="upper") prediction
intervals are available.  Two-sided simultaneous prediction intervals were 
available in Versions 2.4.0 - 2.8.1 of EnvStats, but these prediction 
intervals were based on an incorrect algorithm for K.
Equations (4) and (5) above hold for simultaneous prediction intervals, but the 
derivation of the constant K is more difficult, and is explained below.
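For example, the 95% upper simultaneous prediction limit computed in the Examples for predIntNormSimultaneous (1-of-3 rule, n = 8, estimated mean 10.269773 and standard deviation 2.210246) can be reproduced directly from Equation (5):
  K <- predIntNormSimultaneousK(n = 8, k = 1, m = 3)
  K
  #[1] 0.5123091
  10.269773 + K * 2.210246
  # approximately 11.4021, the UPL reported by predIntNormSimultaneous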
The Derivation of K for Future Observations 
First we will show the derivation based on future observations (i.e.,
w=1, n.mean=1), and then extend the formulas to future averages.
The Derivation of K for the k-of-m Rule (rule="k.of.m") 
For the k-of-m rule (rule="k.of.m") with w=1
(i.e., n.mean=1), at least k of the next m future
observations will fall in the prediction interval
with probability (1-\alpha)100\% on each of the r future sampling
occasions.  If observations are being taken sequentially, for a particular
sampling occasion, up to m observations may be taken, but once k
of the observations fall within the prediction interval, sampling can stop.
Note: When k=m and r=1, this kind of simultaneous prediction
interval becomes the same as a standard prediction interval for the next
k observations (see predIntNorm).
For the case when r=1 future sampling occasion, both Hall and Prairie (1973)
and Fertig and Mann (1977) discuss the derivation of K.  Davis and McNichols
(1987) extend the derivation to the case where r is a positive integer.  They
show that for a one-sided upper prediction interval (pi.type="upper"), the
probability p that at least k of the next m future observations
will be contained in the interval given in Equation (5) above, for each of r
future sampling occasions, is given by:
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r[I(v; k, m+1-k)]^{r-1} [\frac{v^{k-1}(1-v)^{m-k}}{B(k, m+1-k)}] dv \;\;\;\;\;\; (6)
where T(x; \nu, \delta) denotes the cdf of the
non-central Student's t-distribution with parameters
df=\nu and ncp=\delta evaluated at x;
\Phi(x) denotes the cdf of the standard normal distribution
evaluated at x; I(x; \nu, \omega) denotes the cdf of the
beta distribution with parameters shape1=\nu and
shape2=\omega; and B(\nu, \omega) denotes the value of the
beta function with parameters a=\nu and
b=\omega.
The quantity \Delta (upper case delta) denotes the difference between the
mean of the population that was sampled to construct the prediction interval, and
the mean of the population that will be sampled to produce the future observations.
The quantity \sigma (sigma) denotes the population standard deviation of both
of these populations.  Usually you assume \Delta=0 unless you are interested
in computing the power of the rule to detect a change in means between the
populations (see predIntNormSimultaneousTestPower).
For given values of the confidence level (p), sample size (n),
minimum number of future observations to be contained in the interval per
sampling occasion (k), number of future observations per sampling occasion
(m), and number of future sampling occasions (r), Equation (6) can
be solved for K.  The function predIntNormSimultaneousK uses the
R function nlminb to solve Equation (6) for K.
When pi.type="lower", the same value of K is used as when
pi.type="upper", but Equation (4) is used to construct the prediction
interval.
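As a numerical check, Equation (6) can be evaluated directly with the integrate function (a minimal sketch for illustration only; predIntNormSimultaneousK itself solves Equation (6) for K using nlminb).  Plugging in the value of K returned for n = 8, k = 1, m = 3, and r = 1 should recover the confidence level:
  n <- 8; k <- 1; m <- 3; r <- 1
  K <- predIntNormSimultaneousK(n = n, k = k, m = m, r = r, conf.level = 0.95)
  integrand <- function(v) {
    ncp <- sqrt(n) * qnorm(v)                 # Delta/sigma = 0
    pt(sqrt(n) * K, df = n - 1, ncp = ncp) *
      r * pbeta(v, k, m + 1 - k)^(r - 1) *    # [I(v; k, m+1-k)]^(r-1)
      dbeta(v, k, m + 1 - k)                  # v^(k-1) (1-v)^(m-k) / B(k, m+1-k)
  }
  integrate(integrand, lower = 0, upper = 1)$value
  # approximately 0.95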
The Derivation of K for the California Rule (rule="CA") 
For the California rule (rule="CA"), with probability (1-\alpha)100\%,
for each of the r future sampling occasions, either the first observation will
fall in the prediction interval, or else all of the next m-1 observations will
fall in the prediction interval.  That is, if the first observation falls in the
prediction interval then sampling can stop.  Otherwise, m-1 more observations
must be taken.
The formula for K is the same as for the k-of-m rule, except that
Equation (6) becomes the following (Davis, 1998b):
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r\{v[1+v^{m-2}(1-v)]\}^{r-1} [1+v^{m-2}(m-1-mv)] dv \;\;\;\;\;\; (7)
The Derivation of K for the Modified California Rule (rule="Modified.CA") 
For the Modified California rule (rule="Modified.CA"), with probability
(1-\alpha)100\%, for each of the r future sampling occasions, either the
first observation will fall in the prediction interval, or else at least 2 out of
the next 3 observations will fall in the prediction interval.  That is, if the first
observation falls in the prediction interval then sampling can stop.  Otherwise, up
to 3 more observations must be taken.
The formula for K is the same as for the k-of-m rule, except that
Equation (6) becomes the following (Davis, 1998b):
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r\{v[1+v(3-v[5-2v])]\}^{r-1} \{1+v[6-v(15-8v)]\} dv \;\;\;\;\;\; (8)
The Derivation of K for Future Means 
For each of the above rules, if we are interested in using averages instead of
single observations, with w \ge 1 (i.e., n.mean\ge 1), the first
term in the integral in Equations (6)-(8) that involves the cdf of the
non-central Student's t-distribution becomes:
T(\sqrt{n}K; n-1, \frac{\sqrt{n}}{\sqrt{w}}[\Phi^{-1}(v) + \frac{\sqrt{w}\Delta}{\sigma}]) \;\;\;\;\;\; (9)
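For example, the multiplier for the 1-of-3 rule based on means of order 4 is obtained by setting n.mean=4:
  predIntNormSimultaneousK(n = 8, k = 1, m = 3)              # individual observations
  predIntNormSimultaneousK(n = 8, k = 1, m = 3, n.mean = 4)  # means of order w = 4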
Value
A numeric scalar equal to K, the multiplier of estimated standard
deviation that is used to construct the simultaneous prediction interval.
Note
Motivation 
Prediction and tolerance intervals have long been applied to quality control and
life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973).  In the context of
environmental statistics, prediction intervals are useful for analyzing data from
groundwater detection monitoring programs at hazardous and solid waste facilities.
One of the main statistical problems that plague groundwater monitoring programs at
hazardous and solid waste facilities is the requirement of testing several wells and
several constituents at each well on each sampling occasion.  This is an obvious
multiple comparisons problem, and the naive approach of using a standard t-test at
a conventional \alpha-level (e.g., 0.05 or 0.01) for each test leads to a
very high probability of at least one significant result on each sampling occasion,
when in fact no contamination has occurred.  This problem was pointed out years ago
by Millard (1987) and others.
Davis and McNichols (1987) proposed simultaneous prediction intervals as a way of
controlling the facility-wide false positive rate (FWFPR) while maintaining adequate
power to detect contamination in the groundwater.  Because of the ubiquitous presence
of spatial variability, it is usually best to use simultaneous prediction intervals
at each well (Davis, 1998a).  That is, prediction intervals are constructed from
background (pre-landfill) data at each well, and future observations at a well are
compared to the prediction interval for that particular well.  In each of these cases,
the individual \alpha-level at each well is equal to the FWFPR divided by the
product of the number of wells and constituents.
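For instance, with 50 wells, 10 constituents per well, and a facility-wide false positive rate of 10%, this works out to:
  FWFPR <- 0.1
  n.wells <- 50
  n.constituents <- 10
  FWFPR / (n.wells * n.constituents)   # individual alpha-level for each well/constituent test
  #[1] 2e-04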
Often, observations at downgradient wells are not available prior to the construction and operation of the landfill. In this case, upgradient well data can be combined to create a background prediction interval, and observations at each downgradient well can be compared to this prediction interval. If spatial variability is present and a major source of variation, however, this method is not really valid (Davis, 1994; Davis, 1998a).
Chapter 19 of USEPA (2009) contains an extensive discussion of using the
1-of-m rule and the Modified California rule.
Chapters 1 and 3 of Gibbons et al. (2009) discuss simultaneous prediction intervals
for the normal and lognormal distributions, respectively.
The k-of-m Rule 
For the k-of-m rule, Davis and McNichols (1987) give tables with
“optimal” choices of k (in terms of best power for a given overall
confidence level) for selected values of m, r, and n.  They found
that the optimal ratios of k to m (i.e., k/m) are generally small,
in the range of 15-50%.
The California Rule 
The California rule was mandated in that state for groundwater monitoring at waste
disposal facilities when resampling verification is part of the statistical program
(Barclay's Code of California Regulations, 1991).  The California code mandates a
“California” rule with m \ge 3.  The motivation for this rule may have
been a desire to have a majority of the observations in bounds (Davis, 1998a).  For
example, for a k-of-m rule with k=1 and m=3, a monitoring
location will pass if the first observation is out of bounds, the second resample
is out of bounds, but the last resample is in bounds, so that 2 out of 3 observations
are out of bounds.  For the California rule with m=3, either the first
observation must be in bounds, or the next 2 observations must be in bounds in order
for the monitoring location to pass.
Davis (1998a) states that if the FWFPR is kept constant, then the California rule
offers little increased power compared to the k-of-m rule, and can
actually decrease the power of detecting contamination.
The Modified California Rule 
The Modified California Rule was proposed as a compromise between a 1-of-m
rule and the California rule.  For a given FWFPR, the Modified California rule
achieves better power than the California rule, and still requires at least as many
observations in bounds as out of bounds, unlike a 1-of-m rule.
Different Notations Between Different References 
For the k-of-m rule described in this help file, both
Davis and McNichols (1987) and USEPA (2009, Chapter 19) use the variable p instead of k to represent the minimum number
of future observations the interval should contain on each of the r sampling
occasions.
Gibbons et al. (2009, Chapter 1) presents extensive lists of the value of
K for both k-of-m rules and California rules.  Gibbons et al.'s
notation reverses the meaning of k and r compared to the notation used
in this help file.  That is, in Gibbons et al.'s notation, k represents the
number of future sampling occasions or monitoring wells, and r represents the
minimum number of observations the interval should contain on each sampling occasion.
USEPA (2009, Chapter 19) uses p in place of k.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barclay's California Code of Regulations. (1991). Title 22, Section 66264.97 [concerning hazardous waste facilities] and Title 23, Section 2550.7(e)(8) [concerning solid waste facilities]. Barclay's Law Publishers, San Francisco, CA.
Davis, C.B. (1998a). Ground-Water Statistics & Regulations: Principles, Progress and Problems. Second Edition. Environmetrics & Statistics Limited, Henderson, NV.
Davis, C.B. (1998b). Personal Communication, September 3, 1998.
Davis, C.B., and R.J. McNichols. (1987).  One-sided Intervals for at Least p
of m Observations from a Normal Population on Each of r Future Occasions.
Technometrics 29, 359–370.
Fertig, K.W., and N.R. Mann. (1977).  One-Sided Prediction Intervals for at Least
p Out of m Future Observations From a Normal Population.
Technometrics 19, 167–177.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Hall, I.J., and R.R. Prairie. (1973).  One-Sided Prediction Intervals to Contain at
Least m Out of k Future Observations.
Technometrics 15, 897–914.
Millard, S.P. (1987). Environmental Monitoring, Statistics, and the Law: Room for Improvement (with Comment). The American Statistician 41(4), 249–259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntNormSimultaneous,
predIntNormSimultaneousTestPower,
predIntNorm, predIntNormK,
predIntLnormSimultaneous, tolIntNorm,
Normal, estimate.object, enorm
Examples
  # Compute the value of K for an upper 95% simultaneous prediction 
  # interval to contain at least 1 out of the next 3 observations 
  # given a background sample size of n=8.
  predIntNormSimultaneousK(n = 8, k = 1, m = 3)
  #[1] 0.5123091
  #----------
  # Compare the value of K for a 95% 1-of-3 upper prediction interval to 
  # the value for the California and Modified California rules.  
  # Note that the value of K for the Modified California rule is between 
  # the value of K for the 1-of-3 rule and the California rule. 
  predIntNormSimultaneousK(n = 8, k = 1, m = 3) 
  #[1] 0.5123091 
 
  predIntNormSimultaneousK(n = 8, m = 3, rule = "CA")
  #[1] 1.252077
  predIntNormSimultaneousK(n = 8, rule = "Modified.CA")
  #[1] 0.8380233
  #----------
  # Show how the value of K for an upper 95% simultaneous prediction 
  # limit increases as the number of future sampling occasions r increases.  
  # Here, we'll use the 1-of-3 rule.
  predIntNormSimultaneousK(n = 8, k = 1, m = 3)
  #[1] 0.5123091 
  predIntNormSimultaneousK(n = 8, k = 1, m = 3, r = 10)
  #[1] 1.363002
  #==========
  # Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an 
  # upper simultaneous prediction limit for the 1-of-3 rule for 
  # r = 2 future sampling occasions.  The data for this example are 
  # stored in EPA.09.Ex.19.1.sulfate.df.
  # We will pool data from 4 background wells that were sampled on 
  # a number of different occasions, giving us a sample size of 
  # n = 25 to use to construct the prediction limit.
  # There are 50 compliance wells and we will monitor 10 different 
  # constituents at each well at each of the r=2 future sampling 
  # occasions.  To determine the confidence level we require for 
  # the simultaneous prediction interval, USEPA (2009) recommends 
  # setting the individual Type I Error level at each well to 
 
  # 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  
  # which translates to setting the confidence limit to 
  # (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
  # where SWFPR = site-wide false positive rate.  For this example, we 
  # will set SWFPR = 0.1.  Thus, the confidence level is given by:
  nc <- 10
  nw <- 50
  SWFPR <- 0.1
  conf.level <- (1 - SWFPR)^(1 / (nc * nw))
  conf.level
  #[1] 0.9997893
  #----------
  # Compute the value of K for the upper simultaneous prediction 
  # limit for the 1-of-3 plan.
  predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2, 
    rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
  #[1] 2.014365
 #==========
  ## Not run: 
  # Try to compute K for a two-sided simultaneous prediction interval:
  predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2,
   rule = "k.of.m", pi.type = "two-sided", conf.level = conf.level)
  #Error in predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2, rule = "k.of.m",  : 
  #  Two-sided simultaneous prediction intervals are not currently available.
  # NOTE: Two-sided simultaneous prediction intervals computed using
  # Versions 2.4.0 - 2.8.1 of EnvStats are *NOT* valid.
  
## End(Not run)
  #==========
  # Cleanup
  #--------
  rm(nc, nw, SWFPR, conf.level)
Probability That at Least One Set of Future Observations Violates the Given Rule Based on a Simultaneous Prediction Interval for a Normal Distribution
Description
Compute the probability that at least one set of future observations violates the 
given rule based on a simultaneous prediction interval for the next r future 
sampling occasions for a normal distribution.  The three possible rules are: 
k-of-m, California, or Modified California.
Usage
  predIntNormSimultaneousTestPower(n, df = n - 1, n.mean = 1, k = 1, m = 2, r = 1, 
    rule = "k.of.m", delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95, 
    r.shifted = r, K.tol = .Machine$double.eps^0.5, integrate.args.list = NULL)
Arguments
| n | vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | vector of positive integers indicating the degrees of freedom associated with the sample size.  The default value is df=n-1. | 
| n.mean | positive integer specifying the sample size associated with the future averages.  The default value is n.mean=1 (i.e., individual observations). | 
| k | for the k-of-m rule (rule="k.of.m"), a vector of positive integers specifying the minimum number of observations (or averages) out of m observations (or averages) that the prediction interval should contain on each of the r future sampling occasions.  The default value is k=1. | 
| m | vector of positive integers specifying the maximum number of future observations (or averages) on one future sampling “occasion”.  The default value is m=2. | 
| r | vector of positive integers specifying the number of future sampling “occasions”.  The default value is r=1. | 
| rule | character string specifying which rule to use.  The possible values are "k.of.m" (k-of-m rule; the default), "CA" (California rule), and "Modified.CA" (modified California rule). | 
| delta.over.sigma | numeric vector indicating the ratio \Delta/\sigma, where \Delta denotes the difference between the means of the two populations and \sigma denotes their common standard deviation (see Details).  The default value is delta.over.sigma=0. | 
| pi.type | character string indicating what kind of prediction interval to compute.  The possible values are "upper" (the default) and "lower". | 
| conf.level | vector of values between 0 and 1 indicating the confidence level of the prediction interval.  The default value is conf.level=0.95. | 
| r.shifted | vector of positive integers specifying the number of future sampling occasions for which the scaled mean is shifted by \Delta/\sigma.  The default value is r.shifted=r. | 
| K.tol | numeric scalar indicating the tolerance to use in the nonlinear search algorithm to compute K.  The default value is K.tol=.Machine$double.eps^0.5. | 
| integrate.args.list | a list of arguments to supply to the integrate function.  The default value is integrate.args.list=NULL. | 
Details
What is a Simultaneous Prediction Interval? 
A prediction interval for some population is an interval on the real line constructed 
so that it will contain k future observations from that population 
with some specified probability (1-\alpha)100\%, where 
0 < \alpha < 1 and k is some pre-specified positive integer.  
The quantity (1-\alpha)100\% is called  
the confidence coefficient or confidence level associated with the prediction 
interval.  The function predIntNorm computes a standard prediction 
interval based on a sample from a normal distribution.
The function predIntNormSimultaneous computes a simultaneous prediction 
interval that will contain a certain number of future observations with probability 
(1-\alpha)100\% for each of r future sampling “occasions”, 
where r is some pre-specified positive integer.  The quantity r may 
refer to r distinct future sampling occasions in time, or it may for example 
refer to sampling at r distinct locations on one future sampling occasion, 
assuming that the population standard deviation is the same at all of the r 
distinct locations.
The function predIntNormSimultaneous computes a simultaneous prediction 
interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions.  If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop.  Note: When k=m and r=1, the results of predIntNormSimultaneous are equivalent to the results of predIntNorm.
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, up to 3 more observations must be taken.
Simultaneous prediction intervals can be extended to using averages (means) in place 
of single observations (USEPA, 2009, Chapter 19).  That is, you can create a 
simultaneous prediction interval 
that will contain a specified number of averages (based on which rule you choose) on 
each of r future sampling occasions, where each average is based on 
w individual observations.  For the function predIntNormSimultaneous, 
the argument n.mean corresponds to w.
The Form of a Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a normal distribution with parameters 
mean=\mu and sd=\sigma.  Also, let w denote the 
sample size associated with the future averages (i.e., n.mean=w).  
When w=1, each average is really just a single observation, so in the rest of 
this help file the term “averages” will replace the phrase 
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\% 
prediction interval is: 
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, the number of future sampling occasions r, and the 
sample size associated with the future averages, w.  Do not confuse the 
constant K (uppercase K) with the number of future averages k 
(lowercase k) in the k-of-m rule.  The symbol K is used here 
to be consistent with the notation used for tolerance intervals 
(see tolIntNorm).  
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
Note:  For simultaneous prediction intervals, only lower 
(pi.type="lower") and upper 
(pi.type="upper") prediction 
intervals are available.
The derivation of the constant K is explained in the help file for 
predIntNormSimultaneousK.
Computing Power 
The "power" of the prediction interval is defined as the probability that 
at least one set of future observations violates the given rule based on a 
simultaneous prediction interval for the next r future sampling occasions, 
where the population mean for the future observations is allowed to differ from 
the population mean for the observations used to construct the prediction interval.  
The quantity \Delta (upper case delta) denotes the difference between the 
mean of the population that was sampled to construct the prediction interval, and 
the mean of the population that will be sampled to produce the future observations.  
The quantity \sigma (sigma) denotes the population standard deviation of both 
of these populations.  The argument delta.over.sigma corresponds to the 
quantity \Delta/\sigma.
Power Based on the k-of-m Rule (rule="k.of.m") 
For the k-of-m rule (rule="k.of.m") with w=1 
(i.e., n.mean=1), at least k of the next m future 
observations will fall in the prediction interval 
with probability (1-\alpha)100\% on each of the r future sampling 
occasions.  If observations are being taken sequentially, for a particular 
sampling occasion, up to m observations may be taken, but once k 
of the observations fall within the prediction interval, sampling can stop.  
Note: When k=m and r=1, this kind of simultaneous prediction 
interval becomes the same as a standard prediction interval for the next 
k observations (see predIntNorm).
Davis and McNichols (1987) show that for a one-sided upper prediction interval 
(pi.type="upper"), the probability p that at least k of the 
next m future observations will be contained in the interval given in 
Equation (5) above, for each of r 
future sampling occasions, is given by:
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r[I(v; k, m+1-k)]^{r-1} [\frac{v^{k-1}(1-v)^{m-k}}{B(k, m+1-k)}] dv \;\;\;\;\;\; (6)
where T(x; \nu, \delta) denotes the cdf of the 
non-central Student's t-distribution with parameters 
df=\nu and ncp=\delta evaluated at x; 
\Phi(x) denotes the cdf of the standard normal distribution 
evaluated at x; I(x; \nu, \omega) denotes the cdf of the 
beta distribution with parameters shape1=\nu and 
shape2=\omega; and B(\nu, \omega) denotes the value of the 
beta function with parameters a=\nu and 
b=\omega.
The quantity \Delta (upper case delta) denotes the difference between the 
mean of the population that was sampled to construct the prediction interval, and 
the mean of the population that will be sampled to produce the future observations.  
The quantity \sigma (sigma) denotes the population standard deviation of both 
of these populations.  Usually you assume \Delta=0 unless you are interested 
in computing the power of the rule to detect a change in means between the 
populations, as we are here.
If we are interested in using averages instead of single observations, with 
w \ge 1 (i.e., n.mean\ge 1), the first 
term in the integral in Equation (6) that involves the cdf of the 
non-central Student's t-distribution becomes:
T(\sqrt{n}K; n-1, \frac{\sqrt{n}}{\sqrt{w}}[\Phi^{-1}(v) + \frac{\sqrt{w}\Delta}{\sigma}]) \;\;\;\;\;\; (7)
For a given confidence level (1-\alpha)100\%, the power of the rule to detect 
a change in means is simply given by:
Power = 1 - p \;\;\;\;\;\; (8)
where p is defined in Equation (6) above using the value of K that 
corresponds to \Delta/\sigma = 0.  Thus, when the argument 
delta.over.sigma=0, the value of p is 1-\alpha and the power is 
simply \alpha 100\%.  As delta.over.sigma increases above 0, the power 
increases.
When pi.type="lower", the same value of K is used as when 
pi.type="upper", but Equation (4) is used to construct the prediction 
interval.  Thus, the power increases as delta.over.sigma decreases below 0.
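For illustration, the power for the 1-of-3 rule at delta.over.sigma=2 can be checked by evaluating Equation (6) with the shifted noncentrality parameter and applying Equation (8) (a minimal numerical sketch, assuming the defaults pi.type="upper" and conf.level=0.95; it is not the algorithm used internally):
  n <- 8; k <- 1; m <- 3; r <- 1; dos <- 2
  K <- predIntNormSimultaneousK(n = n, k = k, m = m, r = r, conf.level = 0.95)
  integrand <- function(v) {
    ncp <- sqrt(n) * (qnorm(v) + dos)         # shift by Delta/sigma = 2
    pt(sqrt(n) * K, df = n - 1, ncp = ncp) *
      r * pbeta(v, k, m + 1 - k)^(r - 1) *
      dbeta(v, k, m + 1 - k)
  }
  p <- integrate(integrand, lower = 0, upper = 1)$value
  1 - p
  # approximately 0.788, which should agree with
  # predIntNormSimultaneousTestPower(n = 8, k = 1, m = 3, delta.over.sigma = 2)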
Power Based on the California Rule (rule="CA") 
  
For the California rule (rule="CA"), with probability (1-\alpha)100\%, 
for each of the r future sampling occasions, either the first observation will 
fall in the prediction interval, or else all of the next m-1 observations will 
fall in the prediction interval.  That is, if the first observation falls in the 
prediction interval then sampling can stop.  Otherwise, m-1 more observations 
must be taken.
The derivation of the power is the same as for the k-of-m rule, except 
that Equation (6) becomes the following (Davis, 1998b):
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r\{v[1+v^{m-2}(1-v)]\}^{r-1} [1+v^{m-2}(m-1-mv)] dv \;\;\;\;\;\; (9)
Power Based on the Modified California Rule (rule="Modified.CA") 
 
For the Modified California rule (rule="Modified.CA"), with probability 
(1-\alpha)100\%, for each of the r future sampling occasions, either the 
first observation will fall in the prediction interval, or else at least 2 out of 
the next 3 observations will fall in the prediction interval.  That is, if the first 
observation falls in the prediction interval then sampling can stop.  Otherwise, up 
to 3 more observations must be taken.
The derivation of the power is the same as for the k-of-m rule, except 
that Equation (6) becomes the following (Davis, 1998b):
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) r\{v[1+v(3-v[5-2v])]\}^{r-1} \{1+v[6-v(15-8v)]\} dv \;\;\;\;\;\; (10)
Value
vector of values between 0 and 1 equal to the probability that the rule will be violated.
Note
See the help file for predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNormSimultaneousTestPower and 
plotPredIntNormSimultaneousTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNormSimultaneous.
See Also
predIntNormSimultaneous, predIntNormSimultaneousK, 
plotPredIntNormSimultaneousTestPowerCurve, 
predIntNorm, predIntNormK, 
predIntNormTestPower, Prediction Intervals, 
Normal.
Examples
  # For the k-of-m rule with n=4, k=1, m=3, and r=1, show how the power increases 
  # as delta.over.sigma increases. Assume a 95% upper prediction interval.
  predIntNormSimultaneousTestPower(n = 4, m = 3, delta.over.sigma = 0:2) 
  #[1] 0.0500000 0.2954156 0.7008558
  #----------
  # Look at how the power increases with sample size for an upper one-sided 
  # prediction interval using the k-of-m rule with k=1, m=3, r=20, 
  # delta.over.sigma=2, and a confidence level of 95%.
  predIntNormSimultaneousTestPower(n = c(4, 8), m = 3, r = 20, delta.over.sigma = 2) 
  #[1] 0.6075972 0.9240924
  #----------
  # Compare the power for the 1-of-3 rule with the power for the California and 
  # Modified California rules, based on a 95% upper prediction interval and 
  # delta.over.sigma=2.  Assume a sample size of n=8.  Note that in this case the 
  # power for the Modified California rule is greater than the power for the 
  # 1-of-3 rule and California rule.
  predIntNormSimultaneousTestPower(n = 8, k = 1, m = 3, delta.over.sigma = 2) 
  #[1] 0.788171 
  predIntNormSimultaneousTestPower(n = 8, m = 3, rule = "CA", delta.over.sigma = 2) 
  #[1] 0.7160434 
  predIntNormSimultaneousTestPower(n = 8, rule = "Modified.CA", delta.over.sigma = 2) 
  #[1] 0.8143687
  #----------
  # Show how the power for an upper 95% simultaneous prediction limit increases 
  # as the number of future sampling occasions r increases.  Here, we'll use the 
  # 1-of-3 rule with n=8 and delta.over.sigma=1.
  predIntNormSimultaneousTestPower(n = 8, k = 1, m = 3, r=c(1, 2, 5, 10), 
    delta.over.sigma = 1) 
  #[1] 0.3492512 0.4032111 0.4503603 0.4633773
  #==========
  # USEPA (2009) contains an example on page 19-23 that involves monitoring 
  # nw=100 compliance wells at a large facility with minimal natural spatial 
  # variation every 6 months for nc=20 separate chemicals.  
  # There are n=25 background measurements for each chemical to use to create
  # simultaneous prediction intervals.  We would like to determine which kind of
  # resampling plan based on normal distribution simultaneous prediction intervals to
  # use (1-of-m, 1-of-m based on means, or Modified California) in order to have
  # adequate power of detecting an increase in chemical concentration at any of the
  # 100 wells while at the same time maintaining a site-wide false positive rate
  # (SWFPR) of 10% per year over all 4,000 comparisons 
  # (100 wells x 20 chemicals x semi-annual sampling).
  # The function predIntNormSimultaneousTestPower includes the argument "r" 
  # that is the number of future sampling occasions (r=2 in this case because 
  # we are performing semi-annual sampling), so to compute the individual test 
  # Type I error level alpha.test (and thus the individual test confidence level), 
  # we only need to worry about the number of wells (100) and the number of 
  # constituents (20): alpha.test = 1-(1-alpha)^(1/(nw x nc)).  The individual 
  # confidence level is simply 1-alpha.test.  Plugging in 0.1 for alpha, 
  # 100 for nw, and 20 for nc yields an individual test confidence level of 
  # 1-alpha.test = 0.9999473.
  nc <- 20
  nw <- 100
  conf.level <- (1 - 0.1)^(1 / (nc * nw))
  conf.level
  #[1] 0.9999473
  # Now we can compute the power of any particular sampling strategy using
  # predIntNormSimultaneousTestPower.  For example, here is the power of 
  # detecting an increase of three standard deviations in concentration using 
  # the prediction interval based on the "1-of-2" resampling rule:
  predIntNormSimultaneousTestPower(n = 25, k = 1, m = 2, r = 2, rule = "k.of.m", 
    delta.over.sigma = 3,  pi.type = "upper", conf.level = conf.level)
  #[1] 0.3900202
  # The following commands will reproduce the table shown in Step 2 on page
  # 19-23 of USEPA (2009).  Because these commands can take more than a few 
  # seconds to execute, we have commented them out here.  To run this example, 
  # just remove the pound signs (#) that are in front of R commands.
  #rule.vec <- c(rep("k.of.m", 3), "Modified.CA",  rep("k.of.m", 3))
  #m.vec <- c(2, 3, 4, 4, 1, 2, 1)
  #n.mean.vec <- c(rep(1, 4), 2, 2, 3)
  #n.scenarios <- length(rule.vec)
  #K.vec <- numeric(n.scenarios)
  #Power.vec <- numeric(n.scenarios)
  #K.vec <- predIntNormSimultaneousK(n = 25, k = 1, m = m.vec,  n.mean = n.mean.vec, 
  #  r = 2, rule = rule.vec,  pi.type = "upper", conf.level = conf.level)
  #Power.vec <- predIntNormSimultaneousTestPower(n = 25, k = 1, m = m.vec, 
  #  n.mean = n.mean.vec, r = 2, rule = rule.vec,  delta.over.sigma = 3, 
  #  pi.type = "upper",  conf.level = conf.level)
  #Power.df <- data.frame(Rule = rule.vec, k = rep(1, n.scenarios),  m = m.vec, 
  #  N.Mean = n.mean.vec, K = round(K.vec, 2),  Power = round(Power.vec, 2),
  #  Total.Samples = m.vec * n.mean.vec)
  #Power.df
  #         Rule k m N.Mean    K Power Total.Samples
  #1      k.of.m 1 2      1 3.16  0.39             2
  #2      k.of.m 1 3      1 2.33  0.65             3
  #3      k.of.m 1 4      1 1.83  0.81             4
  #4 Modified.CA 1 4      1 2.57  0.71             4
  #5      k.of.m 1 1      2 3.62  0.41             2
  #6      k.of.m 1 2      2 2.33  0.85             4
  #7      k.of.m 1 1      3 2.99  0.71             3
  # The above table shows the K-multipliers for each prediction interval, along with
  # the power of detecting a change in concentration of three standard deviations at
  # any of the 100 wells during the course of a year, for each of the sampling
  # strategies considered.  The last three rows of the table correspond to sampling
  # strategies that involve using the mean of two or three observations.
  #==========
  # Clean up
  #---------
  rm(nc, nw, conf.level, rule.vec, m.vec, n.mean.vec, n.scenarios, K.vec, 
    Power.vec, Power.df)
Probability That at Least One Future Observation Falls Outside a Prediction Interval for a Normal Distribution
Description
Compute the probability that at least one out of k future observations 
(or means) falls outside a prediction interval for k future observations 
(or means) for a normal distribution.
Usage
  predIntNormTestPower(n, df = n - 1, n.mean = 1, k = 1, delta.over.sigma = 0, 
    pi.type = "upper", conf.level = 0.95)
Arguments
| n | vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based. | 
| df | vector of positive integers indicating the degrees of freedom associated with the sample size.  The default value is df=n-1. | 
| n.mean | positive integer specifying the sample size associated with the future averages.  The default value is n.mean=1 (i.e., individual observations). | 
| k | vector of positive integers specifying the number of future observations that the prediction interval should contain with confidence level conf.level.  The default value is k=1. | 
| delta.over.sigma | vector of numbers indicating the ratio \Delta/\sigma, where \Delta denotes the difference between the means of the two populations and \sigma denotes their common standard deviation (see Details).  The default value is delta.over.sigma=0. | 
| pi.type | character string indicating what kind of prediction interval to compute.  The possible values are "upper" (the default) and "lower". | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the prediction interval.  The default value is conf.level=0.95. | 
Details
What is a Prediction Interval? 
A prediction interval for some population is an interval on the real line 
constructed so that it will contain k future observations or averages 
from that population with some specified probability (1-\alpha)100\%, 
where 0 < \alpha < 1 and k is some pre-specified positive integer.  
The quantity (1-\alpha)100\% is called the confidence coefficient or 
confidence level associated with the prediction interval.  The function 
predIntNorm computes a standard prediction interval based on a 
sample from a normal distribution.  The function predIntNormTestPower 
computes the probability that at least one out of k future observations or 
averages will not be contained in the prediction interval, 
where the population mean for the future observations is allowed to differ from 
the population mean for the observations used to construct the prediction interval.
The Form of a Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a normal distribution with parameters 
mean=\mu and sd=\sigma.  Also, let m denote the 
sample size associated with the k future averages (i.e., n.mean=m).  
When m=1, each average is really just a single observation, so in the rest of 
this help file the term “averages” will replace the phrase 
“observations or averages”.
For a normal distribution, the form of a two-sided (1-\alpha)100\% prediction 
interval is: 
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, the number of future averages k, and the 
sample size associated with the future averages, m.  Do not confuse the 
constant K (uppercase K) with the number of future averages k 
(lowercase k).  The symbol K is used here to be consistent with the 
notation used for tolerance intervals (see tolIntNorm).  
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
but K differs for one-sided versus two-sided prediction intervals.  
The derivation of the constant K is explained in the help file for 
predIntNormK.
Computing Power 
The "power" of the prediction interval is defined as the probability that at 
least one out of the k future observations or averages 
will not be contained in the prediction interval, where the population mean 
for the future observations is allowed to differ from the population mean for the 
observations used to construct the prediction interval.  The probability p 
that all k future observations will be contained in a one-sided upper 
prediction interval (pi.type="upper") is given in Equation (6) of the help 
file for 
predIntNormSimultaneousK, where k=m and r=1:
p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) [\frac{v^{k-1}}{B(k, 1)}] dv \;\;\;\;\;\; (6)
where T(x; \nu, \delta) denotes the cdf of the 
non-central Student's t-distribution with parameters 
df=\nu and ncp=\delta evaluated at x; 
\Phi(x) denotes the cdf of the standard normal distribution 
evaluated at x; and B(\nu, \omega) denotes the value of the 
beta function with parameters a=\nu and 
b=\omega.
The quantity \Delta (upper case delta) denotes the difference between the 
mean of the population that was sampled to construct the prediction interval, and 
the mean of the population that will be sampled to produce the future observations.  
The quantity \sigma (sigma) denotes the population standard deviation of both 
of these populations.  Usually you assume \Delta=0 unless you are interested 
in computing the power of the rule to detect a change in means between the 
populations, as we are here.
If we are interested in using averages instead of single observations, with 
w \ge 1 (i.e., n.mean\ge 1), the first 
term in the integral in Equation (6) that involves the cdf of the 
non-central Student's t-distribution becomes:
T(\sqrt{n}K; n-1, \frac{\sqrt{n}}{\sqrt{w}}[\Phi^{-1}(v) + \frac{\sqrt{w}\Delta}{\sigma}]) \;\;\;\;\;\; (7)
For a given confidence level (1-\alpha)100\%, the power of the rule to detect 
a change in means is simply given by:
Power = 1 - p \;\;\;\;\;\; (8)
where p is defined in Equation (6) above using the value of K that 
corresponds to \Delta/\sigma = 0.  Thus, when the argument 
delta.over.sigma=0, the value of p is 1-\alpha and the power is 
simply \alpha 100\%.  As delta.over.sigma increases above 0, the power 
increases.
When pi.type="lower", the same value of K is used as when 
pi.type="upper", but Equation (4) is used to construct the prediction 
interval.  Thus, the power increases as delta.over.sigma decreases below 0.
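As a quick consistency check (a sketch, assuming the defaults pi.type="upper" and conf.level=0.95), setting k=m and r=1 in predIntNormSimultaneousTestPower should reproduce the result of predIntNormTestPower:
  predIntNormTestPower(n = 8, k = 3, delta.over.sigma = 2)
  #[1] 0.5752113
  predIntNormSimultaneousTestPower(n = 8, k = 3, m = 3, r = 1, delta.over.sigma = 2)
  # should return (approximately) the same value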
Value
vector of values between 0 and 1 equal to the probability that at least one of 
k future observations or averages will fall outside the prediction interval.
Note
See the help files for predIntNorm and 
predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNormTestPower and plotPredIntNormTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations.  In the case of a simple shift between the two means, the test based 
on a prediction interval is not as powerful as the two-sample t-test.  However, the 
test based on a prediction interval is more efficient at detecting a shift in the 
tail.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help files for predIntNorm and 
predIntNormSimultaneous.
See Also
predIntNorm, predIntNormK, 
plotPredIntNormTestPowerCurve, predIntNormSimultaneous, 
predIntNormSimultaneousK, 
predIntNormSimultaneousTestPower, Prediction Intervals, 
Normal.
Examples
  # Show how the power increases as delta.over.sigma increases.  
  # Assume a 95% upper prediction interval.
  predIntNormTestPower(n = 4, delta.over.sigma = 0:2) 
  #[1] 0.0500000 0.1743014 0.3990892
  #----------
  # Look at how the power increases with sample size for a one-sided upper 
  # prediction interval with k=3, delta.over.sigma=2, and a confidence level 
  # of 95%.
  predIntNormTestPower(n = c(4, 8), k = 3, delta.over.sigma = 2) 
  #[1] 0.3578250 0.5752113
  #----------
  # Show how the power for an upper 95% prediction limit increases as the 
  # number of future observations k increases.  Here, we'll use n=20 and 
  # delta.over.sigma=1.
  predIntNormTestPower(n = 20, k = 1:3, delta.over.sigma = 1) 
  #[1] 0.2408527 0.2751074 0.2936486
Nonparametric Prediction Interval for a Continuous Distribution
Description
Construct a nonparametric prediction interval to contain at least k out of the 
next m future observations with probability (1-\alpha)100\% for a 
continuous distribution.
Usage
  predIntNpar(x, k = m, m = 1,  lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    lb = -Inf, ub = Inf, pi.type = "two-sided")
Arguments
| x | a numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| k | positive integer specifying the minimum number of future observations out of m that should be contained in the prediction interval.  The default value is k=m. | 
| m | positive integer specifying the number of future observations.  The default value is m=1. | 
| lpl.rank | positive integer indicating the rank of the order statistic to use for the lower bound of the prediction interval.  If pi.type="two-sided" or pi.type="lower", the default value is lpl.rank=1 (i.e., use the minimum of x as the lower bound).  If pi.type="upper", this argument is set equal to 0. | 
| n.plus.one.minus.upl.rank | positive integer related to the rank of the order statistic to use for the upper bound of the prediction interval.  A value of n.plus.one.minus.upl.rank=1 (the default when pi.type="two-sided" or pi.type="upper") means use the maximum of x as the upper bound, a value of 2 means use the second-largest value, and so on.  If pi.type="lower", this argument is set equal to 0. | 
| lb,ub | scalars indicating lower and upper bounds on the distribution.  By default, lb=-Inf and ub=Inf. | 
| pi.type | character string indicating what kind of prediction interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
Details
What is a Nonparametric Prediction Interval? 
A nonparametric prediction interval for some population is an interval on the 
real line constructed so that it will contain at least k of m future 
observations from that population with some specified probability 
(1-\alpha)100\%, where 0 < \alpha < 1 and k and m are 
pre-specified positive integers with k \le m.  
The quantity (1-\alpha)100\% is called  
the confidence coefficient or confidence level associated with the prediction 
interval.
The Form of a Nonparametric Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
independent observations from some continuous distribution, and let 
x_{(i)} denote the i'th order statistic in \underline{x}.  
A two-sided nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and 
u < v.  That is, u denotes the rank of the lower prediction limit, and 
v denotes the rank of the upper prediction limit.  To make it easier to write 
some equations later on, we can also write the prediction interval (1) in a slightly 
different way as:
[x_{(u)}, x_{(n + 1 - w)}] \;\;\;\;\;\; (2)
where
w = n + 1 - v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n-1, and 
u < n+1-w.  In terms of the arguments to the function predIntNpar, 
the argument lpl.rank corresponds to u, and the argument 
n.plus.one.minus.upl.rank corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then Equation (2) above can also represent a one-sided lower or a one-sided upper prediction interval.  That is, a one-sided lower nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(n + 1)}] =  [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a one-sided upper nonparametric prediction interval is constructed as:
[x_{(0)}, x_{(n + 1 - w)}]  = [lb, x_{(n + 1 - w)}] \;\;\;\;\;\; (7)
Usually, lb = -\infty or lb = 0 and ub = \infty.
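As a quick sketch (assuming, as described in estimate.object, that the returned interval stores its limits in the interval$limits component), the default two-sided limits are simply the sample minimum and maximum:
  # Sketch: the default two-sided nonparametric prediction limits are the
  # order statistics x_(1) and x_(n), i.e., the sample minimum and maximum.
  x <- rnorm(20)
  predIntNpar(x)$interval$limits
  range(x)
  rm(x)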
Constructing Nonparametric Prediction Intervals for Future Observations 
Danziger and Davis (1964) show that the probability that at least k out of 
the next m observations will fall in the interval defined in Equation (2) 
is given by:
(1 - \alpha) = [\sum_{i=k}^m {{m-i+u+w-1} \choose {m-i}} {{i+n-u-w} \choose i}] / {{n+m} \choose m} \;\;\;\;\;\; (8)
(Note that computing a nonparametric prediction interval for the case 
k = m = 1 is equivalent to computing a nonparametric \beta-expectation 
tolerance interval with coverage (1-\alpha)100\%; see tolIntNpar).
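A minimal sketch (not part of the package) that evaluates Equation (8) directly for the default ranks u = w = 1 and checks it against the confidence level reported in the Examples below for n = 20 and k = m = 1:
  # Direct evaluation of Equation (8) with u = w = 1 (minimum and maximum as limits)
  conf.eq8 <- function(n, k, m, u = 1, w = 1) {
    i <- k:m
    sum(choose(m - i + u + w - 1, m - i) * choose(i + n - u - w, i)) / choose(n + m, m)
  }
  conf.eq8(n = 20, k = 1, m = 1)
  #[1] 0.9047619
  # i.e., 19/21, the 90.47619% confidence level shown in the first example below
  rm(conf.eq8)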
The Special Case of Using the Minimum and the Maximum 
Setting u = w = 1 implies using the smallest and largest observed values as 
the prediction limits.  In this case, it can be shown that the probability that at 
least k out of the next m observations will fall in the interval
[x_{(1)}, x_{(n)}] \;\;\;\;\;\; (9)
is given by:
(1 - \alpha) = [\sum_{i=k}^m (m-i+1){{n+i-2} \choose i}] / {{n+m} \choose m} \;\;\;\;\;\; (10)
Setting k=m in Equation (10), the probability that all of the next m 
observations will fall in the interval defined in Equation (9) is given by:
(1 - \alpha) = \frac{n(n-1)}{(n+m)(n+m-1)} \;\;\;\;\;\; (11)
For one-sided prediction limits, the probability that all m future 
observations will fall below x_{(n)} (upper prediction limit; 
pi.type="upper") and the probability that all m future observations 
will fall above x_{(1)} (lower prediction limit; pi.type="lower") are 
both given by:
(1 - \alpha) = \frac{n}{n+m} \;\;\;\;\;\; (12)
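As a quick numerical check of Equations (11) and (12), using designs that appear in the Examples below:
  # Equation (11):  n = 20, m = 5, two-sided interval based on the minimum and maximum
  20 * 19 / (25 * 24)
  #[1] 0.6333333
  # matching the 63.33333% confidence level shown in the Examples below
  # Equation (12):  n = 18, m = 4, one-sided upper limit based on the maximum
  18 / (18 + 4)
  #[1] 0.8181818
  # matching the 81.81818% confidence level in Example 18-3 below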
Constructing Nonparametric Prediction Intervals for Future Medians 
To construct a nonparametric prediction interval for a future median based on 
s future observations, where s is odd, note that this is equivalent to 
constructing a nonparametric prediction interval that must hold 
at least k = (s+1)/2 of the next m = s future observations.
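For example, a future median of s = 3 observations corresponds to requiring at least k = 2 of the next m = 3 observations to fall within the interval; this is the setting used in Example 18-4 in the Examples below (a sketch using the related function predIntNparConfLevel):
  # Median of order s = 3  <=>  at least (s+1)/2 = 2 of the next 3 observations
  s <- 3
  predIntNparConfLevel(n = 24, k = (s + 1)/2, m = s, pi.type = "upper")
  #[1] 0.991453
  # matching the confidence level shown for Example 18-4 below
  rm(s)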
Value
a list of class "estimate" containing the prediction interval and other 
information.  See the help file for estimate.object for details.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B., and R.J. McNichols. (1987). One-sided Intervals for at Least p of m Observations from a Normal Population on Each of r Future Occasions. Technometrics 29, 359–370.
Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.
Davis, C.B., and R.J. McNichols. (1994b). Ground Water Monitoring Statistics Update: Part II: Nonparametric Prediction Limits. Ground Water Monitoring and Remediation 14(4), 159–175.
Davis, C.B., and R.J. McNichols. (1999). Simultaneous Nonparametric Prediction Limits (with Discussion). Technometrics 41(2), 89–112.
Gibbons, R.D. (1987a). Statistical Prediction Intervals for the Evaluation of Ground-Water Quality. Ground Water 25, 455–465.
Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.
Gibbons, R.D., and J. Baker. (1991). The Properties of Various Statistical Prediction Intervals for Ground-Water Detection Monitoring. Journal of Environmental Science and Health A26(4), 535–553.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392pp.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178–188.
Hall, I.J., R.R. Prairie, and C.K. Motlagh. (1975). Non-Parametric Prediction Intervals. Journal of Quality Technology 7(3), 109–114.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
estimate.object, predIntNparN, 
predIntNparConfLevel, plotPredIntNparDesign.
Examples
  # Generate 20 observations from a lognormal mixture distribution with 
  # parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1.  Use 
  # predIntNpar to construct a two-sided prediction interval using the 
  # minimum and maximum observed values.  Note that the associated confidence 
  # level is 90%.  A larger sample size is required to obtain a larger 
  # confidence level (see the help file for predIntNparN). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormMixAlt(n = 20, mean1 = 1, cv1 = 0.5, 
    mean2 = 5, cv2 = 1, p.mix = 0.1) 
  predIntNpar(dat) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                90.47619%
  #
  #Prediction Limit Rank(s):        1 20 
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0.3647875
  #                                 UPL = 1.8173115
  #----------
  # Repeat the above example, but specify m=5 future observations should be 
  # contained in the prediction interval.  Note that the confidence level is 
  # now only 63%.
  predIntNpar(dat, m = 5) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                63.33333%
  #
  #Prediction Limit Rank(s):        1 20 
  #
  #Number of Future Observations:   5
  #
  #Prediction Interval:             LPL = 0.3647875
  #                                 UPL = 1.8173115
  #----------
  # Repeat the above example, but specify that a minimum of k=3 observations 
  # out of a total of m=5 future observations should be contained in the 
  # prediction interval.  Note that the confidence level is now 98%.
  predIntNpar(dat, k = 3, m = 5) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                98.37945%
  #
  #Prediction Limit Rank(s):        1 20 
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         3
  #
  #Total Number of
  #Future Observations:             5
  #
  #Prediction Interval:             LPL = 0.3647875
  #                                 UPL = 1.8173115
  #==========
  # Example 18-3 of USEPA (2009, p.18-19) shows how to construct 
  # a one-sided upper nonparametric prediction interval for the next 
  # 4 future observations of trichloroethylene (TCE) at a downgradient well.  
  # The data for this example are stored in EPA.09.Ex.18.3.TCE.df.  
  # There are 6 monthly observations of TCE (ppb) at 3 background wells, 
  # and 4 monthly observations of TCE at a compliance well.
  # Look at the data
  #-----------------
  EPA.09.Ex.18.3.TCE.df
  #   Month Well  Well.type TCE.ppb.orig TCE.ppb Censored
  #1      1 BW-1 Background           <5     5.0     TRUE
  #2      2 BW-1 Background           <5     5.0     TRUE
  #3      3 BW-1 Background            8     8.0    FALSE
  #...
  #22     4 CW-4 Compliance           <5     5.0     TRUE
  #23     5 CW-4 Compliance            8     8.0    FALSE
  #24     6 CW-4 Compliance           14    14.0    FALSE
  longToWide(EPA.09.Ex.18.3.TCE.df, "TCE.ppb.orig", "Month", "Well", 
    paste.row.name = TRUE)
  #        BW-1 BW-2 BW-3 CW-4
  #Month.1   <5    7   <5     
  #Month.2   <5  6.5   <5     
  #Month.3    8   <5 10.5  7.5
  #Month.4   <5    6   <5   <5
  #Month.5    9   12   <5    8
  #Month.6   10   <5    9   14
  # Construct the prediction limit based on the background well data 
  # using the maximum value as the upper prediction limit.  
  # Note that since all censored observations are censored at one 
  # censoring level and the censoring level is less than all of the 
  # uncensored observations, we can just supply the censoring level 
  # to predIntNpar.
  #-----------------------------------------------------------------
  with(EPA.09.Ex.18.3.TCE.df, 
    predIntNpar(TCE.ppb[Well.type == "Background"], 
      m = 4, pi.type = "upper", lb = 0))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            TCE.ppb[Well.type == "Background"]
  #
  #Sample Size:                     18
  #
  #Prediction Interval Method:      Exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                81.81818%
  #
  #Prediction Limit Rank(s):        18 
  #
  #Number of Future Observations:   4
  #
  #Prediction Interval:             LPL =  0
  #                                 UPL = 12
  # Since the value of 14 ppb for Month 6 at the compliance well exceeds 
  # the upper prediction limit of 12, we might conclude that there is 
  # statistically significant evidence of an increase over background 
  # at CW-4.  However, the confidence level associated with this 
  # prediction limit is about 82%, which implies a Type I error level of 
  # 18%.  This means there is nearly a one in five chance of a false positive. 
  # Only additional background data and/or use of a retesting strategy 
  # (see predIntNparSimultaneous) would lower the false positive rate.
  #==========
  # Example 18-4 of USEPA (2009, p.18-19) shows how to construct 
  # a one-sided upper nonparametric prediction interval for the next 
  # median of order 3 of xylene at a downgradient well.  
  # The data for this example are stored in EPA.09.Ex.18.4.xylene.df.  
  # There are 8 monthly observations of xylene (ppb) at 3 background wells, 
  # and 3 monthly observations of xylene at a compliance well.
  # Look at the data
  #-----------------
  EPA.09.Ex.18.4.xylene.df
  #   Month   Well  Well.type Xylene.ppb.orig Xylene.ppb Censored
  #1      1 Well.1 Background              <5        5.0     TRUE
  #2      2 Well.1 Background              <5        5.0     TRUE
  #3      3 Well.1 Background             7.5        7.5    FALSE
  #...
  #30     6 Well.4 Compliance              <5        5.0     TRUE
  #31     7 Well.4 Compliance             7.8        7.8    FALSE
  #32     8 Well.4 Compliance            10.4       10.4    FALSE
  longToWide(EPA.09.Ex.18.4.xylene.df, "Xylene.ppb.orig", "Month", "Well", 
    paste.row.name = TRUE)
  #        Well.1 Well.2 Well.3 Well.4
  #Month.1     <5    9.2     <5       
  #Month.2     <5     <5    5.4       
  #Month.3    7.5     <5    6.7       
  #Month.4     <5    6.1     <5       
  #Month.5     <5      8     <5       
  #Month.6     <5    5.9     <5     <5
  #Month.7    6.4     <5     <5    7.8
  #Month.8      6     <5     <5   10.4
  # Construct the prediction limit based on the background well data 
  # using the maximum value as the upper prediction limit. 
  # Note that since all censored observations are censored at one 
  # censoring level and the censoring level is less than all of the 
  # uncensored observations, we can just supply the censoring level 
  # to predIntNpar.
  #
  # Computing a prediction interval for a median of order 3 (i.e., 
  # a median based on 3 observations) is equivalent to 
  # constructing a nonparametric prediction interval that must hold 
  # at least 2 of the next 3 future observations.
  #-----------------------------------------------------------------
  with(EPA.09.Ex.18.4.xylene.df, 
    predIntNpar(Xylene.ppb[Well.type == "Background"], 
      k = 2, m = 3, pi.type = "upper", lb = 0))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            Xylene.ppb[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Prediction Interval Method:      Exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99.1453%
  #
  #Prediction Limit Rank(s):        24 
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         2
  #
  #Total Number of
  #Future Observations:             3
  #
  #Prediction Interval:             LPL = 0.0
  #                                 UPL = 9.2
  # The Month 8 observation at the Compliance well is 10.4 ppb of Xylene, 
  # which is greater than the upper prediction limit of 9.2 ppb, so
  # conclude there is evidence of contamination at the 
  # 100% - 99% = 1% Type I Error Level
  #==========
  # Cleanup
  #--------
  rm(dat)
Confidence Level for Nonparametric Prediction Interval for Continuous Distribution
Description
Compute the confidence level associated with a nonparametric prediction interval 
that should contain at least k out of the next m future observations 
for a continuous distribution.
Usage
  predIntNparConfLevel(n, k = m, m = 1, lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    pi.type = "two.sided")
Arguments
| n | vector of positive integers specifying the sample sizes.  
Missing ( | 
| k | vector of positive integers specifying the minimum number of future 
observations out of  | 
| m | vector of positive integers specifying the number of future observations.  
The default value is  | 
| lpl.rank | vector of positive integers indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  If  | 
| n.plus.one.minus.upl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of 
 | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
Details
If the arguments n, k, m, lpl.rank, and 
n.plus.one.minus.upl.rank are not all the same length, they are replicated 
to be the same length as the length of the longest argument.
The help file for predIntNpar explains how nonparametric prediction  
intervals are constructed and how the confidence level 
associated with the prediction interval is computed based on specified values 
for the sample size and the ranks of the order statistics used for 
the bounds of the prediction interval. 
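For example (a sketch; the design values are those of Example 18-3 in the Examples below), the default upper interval based on the sample maximum has confidence level n/(n+m), per Equation (12) in the help file for predIntNpar:
  predIntNparConfLevel(n = 18, m = 4, pi.type = "upper")
  #[1] 0.8181818
  18 / (18 + 4)
  #[1] 0.8181818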
Value
vector of values between 0 and 1 indicating the confidence level associated with the specified nonparametric prediction interval.
Note
See the help file for predIntNpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNpar.
See Also
predIntNpar, predIntNparN, 
plotPredIntNparDesign.
Examples
  # Look at how the confidence level of a nonparametric prediction interval 
  # increases with increasing sample size:
  seq(5, 25, by = 5) 
  #[1] 5 10 15 20 25 
  round(predIntNparConfLevel(n = seq(5, 25, by = 5)), 2) 
  #[1] 0.67 0.82 0.87 0.90 0.92
  #---------
  # Look at how the confidence level of a nonparametric prediction interval 
  # decreases as the number of future observations increases:
  round(predIntNparConfLevel(n = 10, m = 1:5), 2) 
  #[1] 0.82 0.68 0.58 0.49 0.43
  #----------
  # Look at how the confidence level of a nonparametric prediction interval 
  # decreases as the minimum number of observations that must be contained within 
  # the interval (k) increases:
  round(predIntNparConfLevel(n = 10, k = 1:5, m = 5), 2) 
  #[1] 1.00 0.98 0.92 0.76 0.43
  #----------
  # Look at how the confidence level of a nonparametric prediction interval 
  # decreases with the rank of the lower prediction limit:
  round(predIntNparConfLevel(n = 10, lpl.rank = 1:5), 2) 
  #[1] 0.82 0.73 0.64 0.55 0.45
  #==========
  # Example 18-3 of USEPA (2009, p.18-19) shows how to construct 
  # a one-sided upper nonparametric prediction interval for the next 
  # 4 future observations of trichloroethylene (TCE) at a downgradient well.  
  # The data for this example are stored in EPA.09.Ex.18.3.TCE.df.  
  # There are 6 monthly observations of TCE (ppb) at 3 background wells, 
  # and 4 monthly observations of TCE at a compliance well.
  # Look at the data
  #-----------------
  EPA.09.Ex.18.3.TCE.df
  #   Month Well  Well.type TCE.ppb.orig TCE.ppb Censored
  #1      1 BW-1 Background           <5     5.0     TRUE
  #2      2 BW-1 Background           <5     5.0     TRUE
  #3      3 BW-1 Background            8     8.0    FALSE
  #...
  #22     4 CW-4 Compliance           <5     5.0     TRUE
  #23     5 CW-4 Compliance            8     8.0    FALSE
  #24     6 CW-4 Compliance           14    14.0    FALSE
  longToWide(EPA.09.Ex.18.3.TCE.df, "TCE.ppb.orig", "Month", "Well", 
    paste.row.name = TRUE)
  #        BW-1 BW-2 BW-3 CW-4
  #Month.1   <5    7   <5     
  #Month.2   <5  6.5   <5     
  #Month.3    8   <5 10.5  7.5
  #Month.4   <5    6   <5   <5
  #Month.5    9   12   <5    8
  #Month.6   10   <5    9   14
  # If we construct the prediction limit based on the background well
  # data using the maximum value as the upper prediction limit, 
  # the associated confidence level is only 82%.  
  #-----------------------------------------------------------------
  predIntNparConfLevel(n = 18, m = 4, pi.type = "upper")
  #[1] 0.8181818
  # We would have to collect an additional 18 observations to achieve a 
  # confidence level of at least 90%:
  predIntNparN(m = 4, pi.type = "upper", conf.level = 0.9)
  #[1] 36
  predIntNparConfLevel(n = 36, m = 4, pi.type = "upper")
  #[1] 0.9
Sample Size for a Nonparametric Prediction Interval for a Continuous Distribution
Description
Compute the sample size necessary for a nonparametric prediction interval to 
contain at least k out of the next m future observations with 
probability (1-\alpha)100\% for a continuous distribution.
Usage
  predIntNparN(k = m, m = 1, lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    pi.type = "two.sided", conf.level = 0.95, n.max = 5000, maxiter = 1000)
Arguments
| k | vector of positive integers specifying the minimum number of future 
observations out of  | 
| m | vector of positive integers specifying the number of future observations.  
The default value is  | 
| lpl.rank | vector of positive integers indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  If  | 
| n.plus.one.minus.upl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of 
 | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level 
associated with the prediction interval.  The default value is 
 | 
| n.max | positive integer greater than 1 indicating the maximum possible sample size.  The 
default value is  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
Details
If the arguments k, m, lpl.rank, and 
n.plus.one.minus.upl.rank are not all the same length, they are replicated 
to be the same length as the length of the longest argument.
The function predIntNparN initially computes the required sample size n 
by solving Equation (11) or (12) in the help file for predIntNpar for 
n, depending on the value of the argument pi.type.  If k < m, 
lpl.rank > 1 (two-sided and lower prediction intervals only), or 
n.plus.one.minus.upl.rank > 1 (two-sided and upper prediction intervals only), 
then this initial value of n is used as the upper bound in a binary search 
based on Equation (8) in the help file for predIntNpar and is 
implemented via the R function uniroot with the tolerance argument 
(tol) set to 1.
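For the special case k = m with a one-sided limit based on the sample minimum or maximum, Equation (12) can be inverted in closed form, which provides a quick sanity check (a sketch; the design values match Example 18-3 in the Examples below):
  # Inverting Equation (12):  n / (n + m) >= conf.level
  # implies  n >= m * conf.level / (1 - conf.level)
  m <- 4; conf.level <- 0.9
  m * conf.level / (1 - conf.level)
  #[1] 36
  predIntNparN(m = m, pi.type = "upper", conf.level = conf.level)
  #[1] 36
  rm(m, conf.level)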
Value
vector of positive integers indicating the required sample size(s) for the specified nonparametric prediction interval(s).
Note
See the help file for predIntNpar.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNpar.
See Also
predIntNpar, predIntNparConfLevel, 
plotPredIntNparDesign.
Examples
  # Look at how the required sample size for a nonparametric prediction interval 
  # increases with increasing confidence level:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  predIntNparN(conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1] 3 4 6 9 19
  #----------
  # Look at how the required sample size for a nonparametric prediction interval 
  # increases with the number of future observations (m):
  1:5
  #[1] 1 2 3 4 5
  predIntNparN(m = 1:5) 
  #[1] 39 78 116 155 193
  #----------
  # Look at how the required sample size for a nonparametric prediction interval 
  # increases as the minimum number of observations that must be contained within 
  # the interval (k) increases:
  predIntNparN(k = 1:5, m = 5) 
  #[1] 4 7 13 30 193
  #----------
  # Look at how the required sample size for a nonparametric prediction interval 
  # increases with the rank of the lower prediction limit:
  predIntNparN(lpl.rank = 1:5) 
  #[1]  39  59  79 100 119
  #==========
  # Example 18-3 of USEPA (2009, p.18-19) shows how to construct 
  # a one-sided upper nonparametric prediction interval for the next 
  # 4 future observations of trichloroethylene (TCE) at a downgradient well.  
  # The data for this example are stored in EPA.09.Ex.18.3.TCE.df.  
  # There are 6 monthly observations of TCE (ppb) at 3 background wells, 
  # and 4 monthly observations of TCE at a compliance well.
  # Look at the data
  #-----------------
  EPA.09.Ex.18.3.TCE.df
  #   Month Well  Well.type TCE.ppb.orig TCE.ppb Censored
  #1      1 BW-1 Background           <5     5.0     TRUE
  #2      2 BW-1 Background           <5     5.0     TRUE
  #3      3 BW-1 Background            8     8.0    FALSE
  #...
  #22     4 CW-4 Compliance           <5     5.0     TRUE
  #23     5 CW-4 Compliance            8     8.0    FALSE
  #24     6 CW-4 Compliance           14    14.0    FALSE
  longToWide(EPA.09.Ex.18.3.TCE.df, "TCE.ppb.orig", "Month", "Well", 
    paste.row.name = TRUE)
  #        BW-1 BW-2 BW-3 CW-4
  #Month.1   <5    7   <5     
  #Month.2   <5  6.5   <5     
  #Month.3    8   <5 10.5  7.5
  #Month.4   <5    6   <5   <5
  #Month.5    9   12   <5    8
  #Month.6   10   <5    9   14
  # If we construct the prediction limit based on the background well
  # data using the maximum value as the upper prediction limit, 
  # the associated confidence level is only 82%.  
  #-----------------------------------------------------------------
  predIntNparConfLevel(n = 18, m = 4, pi.type = "upper")
  #[1] 0.8181818
  # We would have to collect an additional 18 observations to achieve a 
  # confidence level of at least 90%:
  predIntNparN(m = 4, pi.type = "upper", conf.level = 0.9)
  #[1] 36
  predIntNparConfLevel(n = 36, m = 4, pi.type = "upper")
  #[1] 0.9
Nonparametric Simultaneous Prediction Interval for a Continuous Distribution
Description
Construct a nonparametric simultaneous prediction interval for the next 
r sampling “occasions” based on one of three 
possible rules: k-of-m, California, or Modified California.  The simultaneous 
prediction interval assumes the observations come from a continuous distribution.
Usage
  predIntNparSimultaneous(x, n.median = 1, k = 1, m = 2, r = 1, rule = "k.of.m", 
    lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    lb = -Inf, ub = Inf, pi.type = "upper", integrate.args.list = NULL)
Arguments
| x | a numeric vector of observations.  Missing ( | 
| n.median | positive odd integer specifying the sample size associated with the future medians.  
The default value is  | 
| k | for the  | 
| m | positive integer specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | positive integer specifying the number of future sampling “occasions”.  
The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| lpl.rank | positive integer indicating the rank of the order statistic to use for the lower 
bound of the prediction interval.  When  | 
| n.plus.one.minus.upl.rank | positive integer related to the rank of the order statistic to use for the upper 
bound of the prediction interval.  A value of  | 
| lb,ub | scalars indicating lower and upper bounds on the distribution.  By default, 
 | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| integrate.args.list | a list of arguments to supply to the  | 
Details
What is a Nonparametric Simultaneous Prediction Interval? 
A nonparametric prediction interval for some population is an interval on the real line 
constructed so that it will contain at least k of m future observations from 
that population with some specified probability (1-\alpha)100\%, where 
0 < \alpha < 1 and k and m are some pre-specified positive integers 
and k \le m.  The quantity (1-\alpha)100\% is called  
the confidence coefficient or confidence level associated with the prediction 
interval.  The function predIntNpar computes a standard 
nonparametric prediction interval.
The function predIntNparSimultaneous computes a nonparametric simultaneous 
prediction interval that will contain a certain number of future observations 
with probability (1-\alpha)100\% for each of r future sampling 
“occasions”, 
where r is some pre-specified positive integer.  The quantity r may 
refer to r distinct future sampling occasions in time, or it may for example 
refer to sampling at r distinct locations on one future sampling occasion, 
assuming that the distribution is the same at all of the r 
distinct locations.
The function predIntNparSimultaneous computes a nonparametric simultaneous  
prediction interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions.  If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop.  Note: For this rule, when r=1, the results of predIntNparSimultaneous are equivalent to the results of predIntNpar.
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval.  That is, if the first observation falls in the prediction interval then sampling can stop.  Otherwise, up to 3 more observations must be taken.  (See the comparison sketched just below this list.)
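A minimal sketch comparing the confidence levels achieved by the three rules for a common design (the values n = 20 and r = 10 are arbitrary illustration choices; output is not shown here):
  # k-of-m (1-of-3) rule, California rule, and Modified California rule
  predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 3, r = 10)
  predIntNparSimultaneousConfLevel(n = 20, m = 3, r = 10, rule = "CA")
  predIntNparSimultaneousConfLevel(n = 20, r = 10, rule = "Modified.CA")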
Nonparametric simultaneous prediction intervals can be extended to using medians 
in place of single observations (USEPA, 2009, Chapter 19).  That is, you can 
create a nonparametric simultaneous prediction interval that will contain a 
specified number of medians (based on which rule you choose) on each of r 
future sampling occasions, where each median is based on b individual 
observations.  For the function predIntNparSimultaneous, the argument 
n.median corresponds to b.
The Form of a Nonparametric Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
independent observations from some continuous distribution, and let 
x_{(i)} denote the i'th order statistic in \underline{x}.  
A two-sided nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and 
u < v.  That is, u denotes the rank of the lower prediction limit, and 
v denotes the rank of the upper prediction limit.  To make it easier to write 
some equations later on, we can also write the prediction interval (1) in a slightly 
different way as:
[x_{(u)}, x_{(n + 1 - w)}] \;\;\;\;\;\; (2)
where
w = n + 1 - v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n-1, and 
u < n+1-w.  In terms of the arguments to the function predIntNparSimultaneous, 
the argument lpl.rank corresponds to u, and the argument 
n.plus.one.minus.upl.rank corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then Equation (2) above can also represent a one-sided lower or a one-sided upper prediction interval.  That is, a one-sided lower nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(n + 1)}] =  [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a one-sided upper nonparametric prediction interval is constructed as:
[x_{(0)}, x_{(n + 1 - w)}]  = [lb, x_{(n + 1 - w)}] \;\;\;\;\;\; (7)
Usually, lb = -\infty or lb = 0 and ub = \infty.
Note:  For nonparametric simultaneous prediction intervals, only lower 
(pi.type="lower") and upper (pi.type="upper") prediction 
intervals are available.
  
Constructing Nonparametric Simultaneous Prediction Intervals for Future Observations 
First we will show how to construct a nonparametric simultaneous prediction interval based on 
future observations (i.e., b=1, n.median=1), and then extend the formulas to 
future medians.
  
Simultaneous Prediction Intervals for the k-of-m Rule (rule="k.of.m") 
For the k-of-m rule (rule="k.of.m") with b=1 
(i.e., n.median=1), at least k of the next m future 
observations will fall in the prediction interval 
with probability (1-\alpha)100\% on each of the r future sampling 
occasions.  If observations are being taken sequentially, for a particular 
sampling occasion, up to m observations may be taken, but once k 
of the observations fall within the prediction interval, sampling can stop.  
Note: When r=1, this kind of simultaneous prediction 
interval becomes the same as a standard nonparametric prediction interval 
(see predIntNpar).
Chou and Owen (1986) developed the theory for nonparametric simultaneous prediction limits 
for various rules, including the 1-of-m rule.  Their theory, however, does not cover 
the California or Modified California rules, and uses an r-fold summation involving a 
minimum of 2^r terms.  Davis and McNichols (1994b; 1999) extended the results of 
Chou and Owen (1986) to include the California and Modified California rule, and developed 
algorithms that involve summing far fewer terms.
Davis and McNichols (1999) give formulas for the probabilities associated with the 
one-sided upper simultaneous prediction interval shown in Equation (7).  For the 
k-of-m rule, the probability that at least k of the next m 
future observations will be contained in the interval given in Equation (7) for each 
of r future sampling occasions is given by:
1 - \alpha = E\{[\sum_{i=0}^{m-k} {{k-1+i} \choose {k-1}} Y^k (1-Y)^i]^r\}
           = \int_0^1 [\sum_{i=0}^{m-k} {{k-1+i} \choose {k-1}} y^k (1-y)^i]^r f(y) dy \;\;\;\;\;\; (8)
where Y denotes a random variable with a beta distribution 
with parameters v and n+1-v, and f() denotes the pdf of this 
distribution.  Note that v denotes the rank of the order statistic used as the 
upper prediction limit (i.e., n.plus.one.minus.upl.rank=n+1-v), and 
that v is usually equal to n.
Also note that the summation term in Equation (8) corresponds to the cumulative 
distribution function of a Negative Binomial distribution 
with parameters size=k and prob=y evaluated at 
q=m-k.
When pi.type="lower", Y denotes a random variable with a 
beta distribution with parameters n+1-u and 
u.  Note that u denotes the rank of the order statistic used as the 
lower prediction limit (i.e., lpl.rank=u), and 
that u is usually equal to 1.
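A minimal sketch (not part of the package) that evaluates Equation (8) by numerical integration for an upper limit equal to the sample maximum (v = n), using the Negative Binomial CDF identity noted above:
  conf.k.of.m <- function(n, k, m, r, v = n) {
    # integrand: [Negative Binomial CDF at m-k with size k and prob y]^r times
    # the density of Y ~ Beta(v, n+1-v)
    f <- function(y) pnbinom(m - k, size = k, prob = y)^r * dbeta(y, v, n + 1 - v)
    integrate(f, 0, 1)$value
  }
  conf.k.of.m(n = 8, k = 1, m = 3, r = 4)
  # should agree with the 97.7599% confidence level reported for the
  # corresponding 1-of-3 rule, r = 4 example (n = 8) in the Examples below
  rm(conf.k.of.m)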
Simultaneous Prediction Intervals for the California Rule (rule="CA") 
For the California rule (rule="CA"), with probability (1-\alpha)100\%, 
for each of the r future sampling occasions, either the first observation will 
fall in the prediction interval, or else all of the next m-1 observations will 
fall in the prediction interval.  That is, if the first observation falls in the 
prediction interval then sampling can stop.  Otherwise, m-1 more observations 
must be taken.
In this case, the probability is given by:
1 - \alpha = E[\sum_{i=0}^r {r \choose i} Y^{r - i + (m-1)i} (1-Y)^i]
           = \int_0^1 [\sum_{i=0}^r {r \choose i} y^{r - i + (m-1)i} (1-y)^i] f(y) dy \;\;\;\;\;\; (9)
Simultaneous Prediction Intervals for the Modified California Rule (rule="Modified.CA") 
For the Modified California rule (rule="Modified.CA"), with probability 
(1-\alpha)100\%, for each of the r future sampling occasions, either the 
first observation will fall in the prediction interval, or else at least 2 out of 
the next 3 observations will fall in the prediction interval.  That is, if the first 
observation falls in the prediction interval then sampling can stop.  Otherwise, up 
to 3 more observations must be taken.
In this case, the probability is given by:
1 - \alpha = E[Y^r (1 + Q + Q^2 - 2Q^3)^r]
           = \int_0^1 [y^r (1 + q + q^2 - 2q^3)^r] f(y) dy \;\;\;\;\;\; (10)
where Q = 1 - Y and q = 1 - y.
Davis and McNichols (1999) provide algorithms for computing the probabilities based on expanding 
polynomials and the formula for the expected value of a beta random variable.  In the discussion 
section of Davis and McNichols (1999), however, Vangel points out that numerical integration is 
adequate, and this is how these probabilities are computed in the function 
predIntNparSimultaneous.
Constructing Nonparametric Simultaneous Prediction Intervals for Future Medians 
USEPA (2009, Chapter 19; Cameron, 2011) extends nonparametric simultaneous 
prediction intervals to testing future medians for the case of the 1-of-1 and 
1-of-2 plans for medians of order 3.  In general, each of the rules 
(k-of-m, California, and Modified California) can be easily 
extended to the case of using medians as long as the medians are based on an 
odd (as opposed to even) sample size.  
For each of the above rules, if we are interested in using medians instead of 
single observations (i.e., b \ge 1; n.median \ge 1), and we 
force b to be odd, then a median will be less than a prediction limit 
once (b+1)/2 observations are less than the prediction limit.  Thus, 
Equations (8) - (10) are modified by replacing y with the term:
\sum_{i=0}^{b - b'} {{b' - 1 + i} \choose {b' - 1}} y^{b'} (1 - y)^i \;\;\;\;\;\; (11)
where
b' = \frac{b+1}{2} \;\;\;\;\;\; (12)
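A sketch extending the numerical integration above to future medians: replace y in Equation (8) with the term in Equation (11), which is the Negative Binomial CDF with size b' = (b+1)/2 evaluated at b - b' (the design values below are those of the 1-of-2 plan for medians of order 3 in Example 19-5 of the Examples section):
  conf.k.of.m.median <- function(n, k, m, r, b, v = n) {
    b.prime <- (b + 1) / 2
    f <- function(y) {
      p <- pnbinom(b - b.prime, size = b.prime, prob = y)   # Equation (11)
      pnbinom(m - k, size = k, prob = p)^r * dbeta(y, v, n + 1 - v)
    }
    integrate(f, 0, 1)$value
  }
  conf.k.of.m.median(n = 20, k = 1, m = 2, r = 10, b = 3)
  # should agree with the 99.40354% confidence level reported for this plan
  # in Example 19-5 below
  rm(conf.k.of.m.median)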
Value
a list of class "estimate" containing the simultaneous prediction interval 
and other information.  See the help file for estimate.object for 
details.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cameron, Kirk. (2011). Personal communication, February 16, 2011. MacStat Consulting, Ltd., Colorado Springs, Colorado.
Chew, V. (1968). Simultaneous Prediction Intervals. Technometrics 10(2), 323–331.
Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B., and R.J. McNichols. (1987). One-sided Intervals for at Least p of m Observations from a Normal Population on Each of r Future Occasions. Technometrics 29, 359–370.
Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.
Davis, C.B., and R.J. McNichols. (1994b). Ground Water Monitoring Statistics Update: Part II: Nonparametric Prediction Limits. Ground Water Monitoring and Remediation 14(4), 159–175.
Davis, C.B., and R.J. McNichols. (1999). Simultaneous Nonparametric Prediction Limits (with Discussion). Technometrics 41(2), 89–112.
Gibbons, R.D. (1987a). Statistical Prediction Intervals for the Evaluation of Ground-Water Quality. Ground Water 25, 455–465.
Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.
Gibbons, R.D., and J. Baker. (1991). The Properties of Various Statistical Prediction Intervals for Ground-Water Detection Monitoring. Journal of Environmental Science and Health A26(4), 535–553.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392pp.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178–188.
Hall, I.J., R.R. Prairie, and C.K. Motlagh. (1975). Non-Parametric Prediction Intervals. Journal of Quality Technology 7(3), 109–114.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
predIntNparSimultaneousConfLevel, 
predIntNparSimultaneousN, 
plotPredIntNparSimultaneousDesign,
predIntNparSimultaneousTestPower, 
predIntNpar, tolIntNpar, 
estimate.object.
Examples
  # Generate 20 observations from a lognormal mixture distribution with 
  # parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1.  Use 
  # predIntNparSimultaneous to construct an upper one-sided prediction interval 
  # using the maximum observed value using the 1-of-3 rule.  
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormMixAlt(n = 20, mean1 = 1, cv1 = 0.5, 
    mean2 = 5, cv2 = 1, p.mix = 0.1) 
  predIntNparSimultaneous(dat, k = 1, m = 3, lb = 0) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact 
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99.94353%
  #
  #Prediction Limit Rank(s):        20 
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain:         1
  #
  #Total Number of
  #Future Observations:             3
  #
  #Prediction Interval:             LPL = 0.000000
  #                                 UPL = 1.817311
  #----------
  # Compare the confidence levels for the 1-of-3 rule, California Rule, and 
  # Modified California Rule.
  predIntNparSimultaneous(dat, k = 1, m = 3, lb = 0)$interval$conf.level
  #[1] 0.9994353
  predIntNparSimultaneous(dat, m = 3, rule = "CA", lb = 0)$interval$conf.level
  #[1] 0.9919066
  predIntNparSimultaneous(dat, rule = "Modified.CA", lb = 0)$interval$conf.level
  #[1] 0.9984943
  #=========
  # Repeat the above example, but create the baseline data using just 
  # n=8 observations and set r to 4 future sampling occasions
  set.seed(598) 
  dat <- rlnormMixAlt(n = 8, mean1 = 1, cv1 = 0.5, 
    mean2 = 5, cv2 = 1, p.mix = 0.1) 
  predIntNparSimultaneous(dat, k = 1, m = 3, r = 4, lb = 0) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact 
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                97.7599%
  #
  #Prediction Limit Rank(s):        8 
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain
  #(per Sampling Occasion):         1
  #
  #Total Number of
  #Future Observations
  #(per Sampling Occasion):         3
  #
  #Number of Future
  #Sampling Occasions:              4
  #
  #Prediction Interval:             LPL = 0.000000
  #                                 UPL = 5.683453
  #----------
  # Compare the confidence levels for the 1-of-3 rule, California Rule, and 
  # Modified California Rule.
  predIntNparSimultaneous(dat, k = 1, m = 3, r = 4, lb = 0)$interval$conf.level
  #[1] 0.977599
  predIntNparSimultaneous(dat, m = 3, r = 4, rule = "CA", lb = 0)$interval$conf.level
  #[1] 0.8737798
  predIntNparSimultaneous(dat, r = 4, rule = "Modified.CA", lb = 0)$interval$conf.level
  #[1] 0.9510178
  #==========
  # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper 
  # simultaneous prediction limits for various rules based on trace mercury data (ppb) 
  # collected in the past year from a site with four background wells and 10 compliance 
  # wells (data for two of the compliance wells  are shown in the guidance document).  
  # The facility must monitor the 10 compliance wells for five constituents 
  # (including mercury) annually.
  
  # Here we will compute the confidence level associated with two different sampling plans: 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the background maximum and 
  # 2) the 1-of-4 plan on individual observations using the 3rd highest background value.
  # The data for this example are stored in EPA.09.Ex.19.5.mercury.df.
  # We will pool data from 4 background wells that were sampled on 
  # a number of different occasions, giving us a sample size of 
  # n = 20 to use to construct the prediction limit.
  # There are 10 compliance wells and we will monitor 5 different 
  # constituents at each well annually.  For this example, USEPA (2009) 
  # recommends setting r to the product of the number of compliance wells and 
  # the number of evaluations per year.  
  # To determine the minimum confidence level we require for 
  # the simultaneous prediction interval, USEPA (2009) recommends 
  # setting the maximum allowed individual Type I Error level per constituent to:
 
  # 1 - (1 - SWFPR)^(1 / Number of Constituents)
  
  # which translates to setting the confidence limit to 
  # (1 - SWFPR)^(1 / Number of Constituents)
  # where SWFPR = site-wide false positive rate.  For this example, we 
  # will set SWFPR = 0.1.  Thus, the required individual Type I Error level 
  # and confidence level per constituent are given as follows:
  # n  = 20 based on 4 Background Wells
  # nw = 10 Compliance Wells
  # nc =  5 Constituents
  # ne =  1 Evaluation per year
  n  <- 20
  nw <- 10
  nc <-  5
  ne <-  1
  # Set number of future sampling occasions r to 
  # Number Compliance Wells x Number Evaluations per Year
  r  <-  nw * ne
  conf.level <- (1 - 0.1)^(1 / nc)
  conf.level
  #[1] 0.9791484
  alpha <- 1 - conf.level
  alpha
  #[1] 0.02085164
  #----------
  # Look at the data:
  head(EPA.09.Ex.19.5.mercury.df)
  #  Event Well  Well.type Mercury.ppb.orig Mercury.ppb Censored
  #1     1 BG-1 Background             0.21        0.21    FALSE
  #2     2 BG-1 Background              <.2        0.20     TRUE
  #3     3 BG-1 Background              <.2        0.20     TRUE
  #4     4 BG-1 Background              <.2        0.20     TRUE
  #5     5 BG-1 Background              <.2        0.20     TRUE
  #6     6 BG-1 Background                           NA    FALSE
  longToWide(EPA.09.Ex.19.5.mercury.df, "Mercury.ppb.orig", 
    "Event", "Well", paste.row.name = TRUE)
  #        BG-1 BG-2 BG-3 BG-4 CW-1 CW-2
  #Event.1 0.21  <.2  <.2  <.2 0.22 0.36
  #Event.2  <.2  <.2 0.23 0.25  0.2 0.41
  #Event.3  <.2  <.2  <.2 0.28  <.2 0.28
  #Event.4  <.2 0.21 0.23  <.2 0.25 0.45
  #Event.5  <.2  <.2 0.24  <.2 0.24 0.43
  #Event.6                      <.2 0.54
  # Construct the upper simultaneous prediction limit using the 1-of-2  
  # retesting plan for a median of order 3 based on the background maximum 
  Hg.Back <- with(EPA.09.Ex.19.5.mercury.df, 
    Mercury.ppb[Well.type == "Background"])
  pred.int.1.of.2.med.3 <- predIntNparSimultaneous(Hg.Back, n.median = 3, 
    k = 1, m = 2, r = r, lb = 0)
  pred.int.1.of.2.med.3
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            Hg.Back
  #
  #Sample Size:                     20
  #
  #Number NA/NaN/Inf's:             4
  #
  #Prediction Interval Method:      exact 
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99.40354%
  #
  #Prediction Limit Rank(s):        20 
  #
  #Minimum Number of
  #Future Medians
  #Interval Should Contain
  #(per Sampling Occasion):         1
  #
  #Total Number of
  #Future Medians
  #(per Sampling Occasion):         2
  #
  #Number of Future
  #Sampling Occasions:              10
  #
  #Sample Size for Medians:         3
  #
  #Prediction Interval:             LPL = 0.00
  #                                 UPL = 0.28
  # Note that the achieved confidence level of 99.4% is greater than the 
  # required confidence level of 97.9%.
  # Now determine whether either compliance well indicates evidence of 
  # Mercury contamination.
  # Compliance Well 1
  #------------------
  Hg.CW.1 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-1"])
  Hg.CW.1
  #[1] "0.22" "0.2"  "<.2"  "0.25" "0.24" "<.2"
  # The median of the first 3 observations is 0.2, which is less than 
  # the UPL of 0.28, so there is no evidence of contamination.
  # Compliance Well 2
  #------------------
  Hg.CW.2 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-2"])
  Hg.CW.2
  #[1] "0.36" "0.41" "0.28" "0.45" "0.43" "0.54"
  # The median of the first 3 observations is 0.36, so 3 more observations have to 
  # be looked at.  The median of the second 3 observations is 0.45, which is 
  # larger than the UPL of 0.28, so there is evidence of contamination.  
  #----------
  # Now create the upper simultaneous prediction limit using the 1-of-4 plan 
  # on individual observations using the 3rd highest background value.
  pred.int.1.of.4.3rd <- predIntNparSimultaneous(Hg.Back, k = 1, m = 4, 
    r = r, lb = 0, n.plus.one.minus.upl.rank = 3)
  pred.int.1.of.4.3rd  
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            Hg.Back
  #
  #Sample Size:                     20
  #
  #Number NA/NaN/Inf's:             4
  #
  #Prediction Interval Method:      exact 
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                98.64909%
  #
  #Prediction Limit Rank(s):        18 
  #
  #Minimum Number of
  #Future Observations
  #Interval Should Contain
  #(per Sampling Occasion):         1
  #
  #Total Number of
  #Future Observations
  #(per Sampling Occasion):         4
  #
  #Number of Future
  #Sampling Occasions:              10
  #
  #Prediction Interval:             LPL = 0.00
  #                                 UPL = 0.24
  # Note that the achieved confidence level of 98.6% is greater than the 
  # required confidence level of 97.9%.
  # Now determine whether either compliance well indicates evidence of 
  # Mercury contamination.
  # Compliance Well 1
  #------------------
  Hg.CW.1 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-1"])
  Hg.CW.1
  #[1] "0.22" "0.2"  "<.2"  "0.25" "0.24" "<.2"
  # The first observation (0.22) is less than the UPL of 0.24, so there 
  # is no evidence of contamination.
  # Compliance Well 2
  #------------------
  Hg.CW.2 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-2"])
  Hg.CW.2
  #[1] "0.36" "0.41" "0.28" "0.45" "0.43" "0.54"
  # All of the first 4 observations are greater than the UPL of 0.24, so there 
  # is evidence of contamination.  
   #==========
  # Cleanup
  #--------
  rm(dat, n, nw, nc, ne, r, conf.level, alpha, Hg.Back, pred.int.1.of.2.med.3, 
    pred.int.1.of.4.3rd, Hg.CW.1, Hg.CW.2)
Confidence Level of Simultaneous Nonparametric Prediction Interval for Continuous Distribution
Description
Compute the confidence level associated with a nonparametric simultaneous prediction interval based on one of three possible rules: k-of-m, California, or Modified California. Observations are assumed to come from a continuous distribution.
Usage
  predIntNparSimultaneousConfLevel(n, n.median = 1, k = 1, m = 2, r = 1, 
    rule = "k.of.m", lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    pi.type = "upper", integrate.args.list = NULL)
Arguments
| n | vector of positive integers specifying the sample sizes.  
Missing ( | 
| n.median | vector of positive odd integers specifying the sample size associated with the 
future medians.  The default value is  | 
| k | for the  | 
| m | vector of positive integers specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | vector of positive integers specifying the number of future sampling 
“occasions”.  The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| lpl.rank | vector of positive integers indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  When  | 
| n.plus.one.minus.upl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper 
bound of the prediction interval.  A value of  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| integrate.args.list | list of arguments to supply to the  | 
Details
If the arguments n, k, m, r, lpl.rank, and 
n.plus.one.minus.upl.rank are not all the same length, they are replicated 
to be the same length as the length of the longest argument.
The function predIntNparSimultaneousConfLevel computes the confidence level 
based on Equation (8), (9), or (10) in the help file for 
predIntNparSimultaneous, depending on the value of the argument 
rule.
Note that when rule="k.of.m" and r=1, this is equivalent to a 
standard nonparametric prediction interval and you can use the function 
predIntNparConfLevel instead.
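For example (a sketch with arbitrary design values; output not shown):
  # With rule="k.of.m" and r = 1, the simultaneous and standard nonparametric
  # confidence levels should coincide:
  predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 3, r = 1)
  predIntNparConfLevel(n = 20, k = 1, m = 3, pi.type = "upper")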
Value
vector of values between 0 and 1 indicating the confidence level associated with the specified simultaneous nonparametric prediction interval.
Note
See the help file for predIntNparSimultaneous.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNparSimultaneous.
See Also
predIntNparSimultaneous, 
predIntNparSimultaneousN, 
plotPredIntNparSimultaneousDesign, 
predIntNparSimultaneousTestPower, 
predIntNpar, tolIntNpar.
Examples
  # For the 1-of-3 rule with r=20 future sampling occasions, look at how the 
  # confidence level of a simultaneous nonparametric prediction interval 
  # increases with increasing sample size:
  seq(5, 25, by = 5) 
  #[1] 5 10 15 20 25 
  conf <- predIntNparSimultaneousConfLevel(n = seq(5, 25, by = 5), 
    k = 1, m = 3, r = 20)
  round(conf, 2) 
  #[1] 0.82 0.95 0.98 0.99 0.99
  #----------
  # For the 1-of-m rule with r=20 future sampling occasions, look at how the 
  # confidence level of a simultaneous nonparametric prediction interval 
  # increases as the number of future observations increases:
  1:5
  #[1] 1 2 3 4 5
  conf <- predIntNparSimultaneousConfLevel(n = 10, k = 1, m = 1:5, r = 20)
  round(conf, 2) 
  #[1] 0.33 0.81 0.95 0.98 0.99
  #----------
  # For the 1-of-3 rule, look at how the confidence level of a simultaneous 
  # nonparametric prediction interval decreases with number of future sampling 
  # occasions (r):
  seq(5, 20, by = 5)
  #[1]  5 10 15 20
  conf <- predIntNparSimultaneousConfLevel(n = 10, k = 1, m = 3, 
    r = seq(5, 20, by = 5))
  round(conf, 2) 
  #[1] 0.98 0.97 0.96 0.95
  #----------
  # For the 1-of-3 rule with r=20 future sampling occasions, look at how the 
  # confidence level of a simultaneous nonparametric prediction interval 
  # decreases as the rank of the upper prediction limit decreases:
  conf <- predIntNparSimultaneousConfLevel(n = 10, k = 1, m = 3, r = 20, 
    n.plus.one.minus.upl.rank = 1:5)
  round(conf, 2) 
  #[1] 0.95 0.82 0.63 0.43 0.25
  #----------
  # Clean up
  #---------
  rm(conf)
  #==========
  # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper 
  # simultaneous prediction limits for various rules based on trace mercury data (ppb) 
  # collected in the past year from a site with four background wells and 10 compliance 
  # wells (data for two of the compliance wells  are shown in the guidance document).  
  # The facility must monitor the 10 compliance wells for five constituents 
  # (including mercury) annually.
  
  # Here we will compute the confidence level associated with two different sampling plans: 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the background maximum and 
  # 2) the 1-of-4 plan on individual observations using the 3rd highest background value.
  # The data for this example are stored in EPA.09.Ex.19.5.mercury.df.
  # We will pool data from 4 background wells that were sampled on 
  # a number of different occasions, giving us a sample size of 
  # n = 20 to use to construct the prediction limit.
  # There are 10 compliance wells and we will monitor 5 different 
  # constituents at each well annually.  For this example, USEPA (2009) 
  # recommends setting r to the product of the number of compliance wells and 
  # the number of evaluations per year.  
  # To determine the minimum confidence level we require for 
  # the simultaneous prediction interval, USEPA (2009) recommends 
  # setting the maximum allowed individual Type I Error level per constituent to:
 
  # 1 - (1 - SWFPR)^(1 / Number of Constituents)
  
  # which translates to setting the confidence limit to 
  # (1 - SWFPR)^(1 / Number of Constituents)
  # where SWFPR = site-wide false positive rate.  For this example, we 
  # will set SWFPR = 0.1.  Thus, the required individual Type I Error level 
  # and confidence level per constituent are given as follows:
  # n  = 20 based on 4 Background Wells
  # nw = 10 Compliance Wells
  # nc =  5 Constituents
  # ne =  1 Evaluation per year
  n  <- 20
  nw <- 10
  nc <-  5
  ne <-  1
  # Set number of future sampling occasions r to 
  # Number Compliance Wells x Number Evaluations per Year
  r  <-  nw * ne
  conf.level <- (1 - 0.1)^(1 / nc)
  conf.level
  #[1] 0.9791484
  # So the required confidence level is 0.98, or 98%.
  # Now determine the confidence level associated with each plan.
  # Note that both plans achieve the required confidence level.
 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the 
  #    background maximum
  predIntNparSimultaneousConfLevel(n = 20, n.median = 3, k = 1, m = 2, r = r)
  #[1] 0.9940354
  # 2) the 1-of-4 plan on individual observations using the 3rd highest 
  #    background value.
  predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 4, r = r, 
    n.plus.one.minus.upl.rank = 3)
  #[1] 0.9864909
 
  #==========
  # Cleanup
  #--------
  rm(n, nw, nc, ne, r, conf.level) 
Sample Size for Simultaneous Nonparametric Prediction Interval for Continuous Distribution
Description
Compute the sample size necessary for a nonparametric simultaneous prediction interval to achieve a specified confidence level based on one of three possible rules: k-of-m, California, or Modified California. Observations are assumed to come from a continuous distribution.
Usage
  predIntNparSimultaneousN(n.median = 1, k = 1, m = 2, r = 1, rule = "k.of.m", 
    lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), pi.type = "upper", 
    conf.level = 0.95, n.max = 5000, integrate.args.list = NULL, maxiter = 1000)
Arguments
| n.median | vector of positive odd integers specifying the sample size associated with the 
future medians.  The default value is  | 
| k | for the  | 
| m | vector of positive integers specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | vector of positive integers specifying the number of future sampling 
“occasions”.  The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| lpl.rank | vector of positive integers indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  When  | 
| n.plus.one.minus.upl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level 
associated with the prediction interval.  The default value is  | 
| n.max | numeric scalar indicating the maximum sample size to consider.  This argument 
is used in the search algorithm to determine the required sample size.  The 
default value is  | 
| integrate.args.list | list of arguments to supply to the  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
Details
If the arguments k, m, r, lpl.rank, and 
n.plus.one.minus.upl.rank are not all the same length, they are replicated 
to be the same length as the length of the longest argument.
The function predIntNparSimultaneousN computes the required sample size 
n by solving Equation (8), (9), or (10) in the help file for 
predIntNparSimultaneous for n, depending on the value of the 
argument rule.
Note that when rule="k.of.m" and r=1, this is equivalent to a 
standard nonparametric prediction interval and you can use the function 
predIntNparN instead.
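The search can be sketched directly in terms of predIntNparSimultaneousConfLevel (an illustrative brute-force version only; predIntNparSimultaneousN itself handles this more efficiently and accepts vectorized arguments):
  # Smallest n whose achieved confidence level meets conf.level:
  find.n <- function(conf.level, ..., n.max = 5000) {
    for (n in 1:n.max) {
      if (predIntNparSimultaneousConfLevel(n = n, ...) >= conf.level) return(n)
    }
    NA
  }
  # Should agree with predIntNparSimultaneousN(k = 1, m = 3, r = 20):
  find.n(conf.level = 0.95, k = 1, m = 3, r = 20)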
Value
vector of positive integers indicating the required sample size(s) for the specified nonparametric simultaneous prediction interval(s).
Note
See the help file for predIntNparSimultaneous.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNparSimultaneous.
See Also
predIntNparSimultaneous, 
predIntNparSimultaneousConfLevel, 
plotPredIntNparSimultaneousDesign, 
predIntNparSimultaneousTestPower, 
predIntNpar, tolIntNpar.
Examples
  # For the 1-of-2 rule, look at how the required sample size for a one-sided 
  # upper simultaneous nonparametric prediction interval for r=20 future 
  # sampling occasions increases with increasing confidence level:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  predIntNparSimultaneousN(r = 20, conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1]  4  5  7 10 17
  #----------
  # For the 1-of-m rule, look at how the required sample size for a one-sided 
  # upper simultaneous nonparametric prediction interval decreases with increasing 
  # number of future observations (m), given r=20 future sampling occasions:
  predIntNparSimultaneousN(k = 1, m = 1:5, r = 20) 
  #[1] 380  26  11   7   5
  #----------
  # For the 1-of-3 rule, look at how the required sample size for a one-sided 
  # upper simultaneous nonparametric prediction interval increases with number 
  # of future sampling occasions (r):
  predIntNparSimultaneousN(k = 1, m = 3, r = c(5, 10, 15, 20)) 
  #[1]  7  8 10 11
  #----------
  # For the 1-of-3 rule, look at how the required sample size for a one-sided 
  # upper simultaneous nonparametric prediction interval increases as the rank 
  # of the upper prediction limit decreases, given r=20 future sampling occasions:
  predIntNparSimultaneousN(k = 1, m = 3, r = 20, n.plus.one.minus.upl.rank = 1:5) 
  #[1] 11 19 26 34 41
  #----------
  # Compare the required sample size for r=20 future sampling occasions based 
  # on the 1-of-3 rule, the CA rule with m=3, and the Modified CA rule.
  predIntNparSimultaneousN(k = 1, m = 3, r = 20, rule = "k.of.m") 
  #[1] 11 
  predIntNparSimultaneousN(m = 3, r = 20, rule = "CA") 
  #[1] 36 
  predIntNparSimultaneousN(r = 20, rule = "Modified.CA") 
  #[1] 15
  #==========
  # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper 
  # simultaneous prediction limits for various rules based on trace mercury data (ppb) 
  # collected in the past year from a site with four background wells and 10 compliance 
  # wells (data for two of the compliance wells  are shown in the guidance document).  
  # The facility must monitor the 10 compliance wells for five constituents 
  # (including mercury) annually.
  
  # Here we will modify the example to compute the required number of background 
  # observations for two different sampling plans: 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the background maximum and 
  # 2) the 1-of-4 plan on individual observations using the 3rd highest background value.
  # The data for this example are stored in EPA.09.Ex.19.5.mercury.df.
  # There are 10 compliance wells and we will monitor 5 different 
  # constituents at each well annually.  For this example, USEPA (2009) 
  # recommends setting r to the product of the number of compliance wells and 
  # the number of evaluations per year.  
  # To determine the minimum confidence level we require for 
  # the simultaneous prediction interval, USEPA (2009) recommends 
  # setting the maximum allowed individual Type I Error level per constituent to:
 
  # 1 - (1 - SWFPR)^(1 / Number of Constituents)
  
  # which translates to setting the confidence limit to 
  # (1 - SWFPR)^(1 / Number of Constituents)
  # where SWFPR = site-wide false positive rate.  For this example, we 
  # will set SWFPR = 0.1.  Thus, the required individual Type I Error level 
  # and confidence level per constituent are given as follows:
  # nw = 10 Compliance Wells
  # nc =  5 Constituents
  # ne =  1 Evaluation per year
  nw <- 10
  nc <-  5
  ne <-  1
  # Set number of future sampling occasions r to 
  # Number Compliance Wells x Number Evaluations per Year
  r  <-  nw * ne
  conf.level <- (1 - 0.1)^(1 / nc)
  conf.level
  #[1] 0.9791484
  # So the required confidence level is 0.98, or 98%.
  # Now determine the required number of background observations for  each plan.
 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the 
  #    background maximum
  predIntNparSimultaneousN(n.median = 3, k = 1, m = 2, r = r, 
    conf.level = conf.level)
  #[1] 14
  # 2) the 1-of-4 plan on individual observations using the 3rd highest 
  #    background value.
  predIntNparSimultaneousN(k = 1, m = 4, r = r, 
    n.plus.one.minus.upl.rank = 3, conf.level = conf.level)
  #[1] 18
  #==========
  # Cleanup
  #--------
  rm(nw, nc, ne, r, conf.level) 
Probability That at Least One Set of Future Observations Violates the Given Rule Based on a Nonparametric Simultaneous Prediction Interval
Description
Compute the probability that at least one set of future observations violates the 
given rule based on a nonparametric simultaneous prediction interval for the next 
r future sampling occasions.  The three possible rules are: 
k-of-m, California, or Modified California.  The probability is based 
on assuming the true distribution of the observations is normal.
Usage
  predIntNparSimultaneousTestPower(n, n.median = 1, k = 1, m = 2, r = 1, 
    rule = "k.of.m", lpl.rank = ifelse(pi.type == "upper", 0, 1), 
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), 
    delta.over.sigma = 0, pi.type = "upper", r.shifted = r,  
    method = "approx", NMC = 100, ci = FALSE, ci.conf.level = 0.95, 
    integrate.args.list = NULL, evNormOrdStats.method = "royston") 
Arguments
| n | vector of positive integers specifying the sample sizes.  
Missing ( | 
| n.median | vector of positive odd integers specifying the sample size associated with the 
future medians.  The default value is  | 
| k | for the  | 
| m | vector of positive integers specifying the maximum number of future observations (or 
medians) on one future sampling “occasion”.  
The default value is  | 
| r | vector of positive integers specifying the number of future sampling 
“occasions”.  The default value is  | 
| rule | character string specifying which rule to use.  The possible values are 
 | 
| lpl.rank | vector of non-negative integers indicating the rank of the order statistic to use for 
the lower bound of the prediction interval.  When  | 
| n.plus.one.minus.upl.rank | vector of non-negative integers related to the rank of the order statistic to use for 
the upper bound of the prediction interval.  A value of  | 
| delta.over.sigma | numeric vector indicating the ratio  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| r.shifted | vector of positive integers specifying the number of future sampling occasions for 
which the scaled mean is shifted by  | 
| method | character string indicating what method to use to compute the power.  The possible 
values are  | 
| NMC | positive integer indicating the number of Monte Carlo trials to run when  | 
| ci | logical scalar indicating whether to compute a confidence interval for the power 
when  | 
| ci.conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with the 
confidence interval for the power.  The argument is ignored if  | 
| integrate.args.list | list of arguments to supply to the  | 
| evNormOrdStats.method | character string indicating which method to use in the call to 
 | 
Details
What is a Nonparametric Simultaneous Prediction Interval? 
A nonparametric prediction interval for some population is an interval on the real line 
constructed so that it will contain at least k of m future observations from 
that population with some specified probability (1-\alpha)100\%, where 
0 < \alpha < 1 and k and m are some pre-specified positive integers 
and k \le m.  The quantity (1-\alpha)100\% is called  
the confidence coefficient or confidence level associated with the prediction 
interval.  The function predIntNpar computes a standard 
nonparametric prediction interval.
The function predIntNparSimultaneous computes a nonparametric simultaneous 
prediction interval that will contain a certain number of future observations 
with probability (1-\alpha)100\% for each of r future sampling 
“occasions”, 
where r is some pre-specified positive integer.  The quantity r may 
refer to r distinct future sampling occasions in time, or it may for example 
refer to sampling at r distinct locations on one future sampling occasion, 
assuming that the population standard deviation is the same at all of the r 
distinct locations.
The function predIntNparSimultaneous computes a nonparametric simultaneous  
prediction interval based on one of three possible rules:
- For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-\alpha)100\% on each of the r future sampling occasions. If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: For this rule, when r=1, the results of predIntNparSimultaneous are equivalent to the results of predIntNpar.
- For the California rule (rule="CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.
- For the Modified California rule (rule="Modified.CA"), with probability (1-\alpha)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.
Nonparametric simultaneous prediction intervals can be extended to using medians 
in place of single observations (USEPA, 2009, Chapter 19).  That is, you can 
create a nonparametric simultaneous prediction interval that will contain a 
specified number of medians (based on which rule you choose) on each of r 
future sampling occasions, where each median is based on b individual 
observations.  For the function predIntNparSimultaneous, the argument 
n.median corresponds to b.
The Form of a Nonparametric Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
independent observations from some continuous distribution, and let 
x_{(i)} denote the i'th order statistic in \underline{x}.  
A two-sided nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and 
u < v.  That is, u denotes the rank of the lower prediction limit, and 
v denotes the rank of the upper prediction limit.  To make it easier to write 
some equations later on, we can also write the prediction interval (1) in a slightly 
different way as:
[x_{(u)}, x_{(n + 1 - w)}] \;\;\;\;\;\; (2)
where
w = n + 1 - v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n-1, and 
u < n+1-w.  In terms of the arguments to the function predIntNparSimultaneous, 
the argument lpl.rank corresponds to u, and the argument 
n.plus.one.minus.upl.rank corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then Equation (2) above can also represent a one-sided lower or one-sided upper prediction interval as well. That is, a one-sided lower nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(n + 1)}] =  [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a one-sided upper nonparametric prediction interval is constructed as:
[x_{(0)}, x_{(n + 1 - w)}]  = [lb, x_{(n + 1 - w)}] \;\;\;\;\;\; (7)
Usually, lb = -\infty or lb = 0 and ub = \infty.
Note:  For nonparametric simultaneous prediction intervals, only lower 
(pi.type="lower") and upper (pi.type="upper") prediction 
intervals are available.
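The correspondence between these arguments and the order statistics can be illustrated with a small simulated sample (illustrative only):
  # n = 10 background observations; with the default n.plus.one.minus.upl.rank = 1 
  # (w = 1), the upper prediction limit is the largest order statistic x_(n + 1 - w):
  x <- sort(rnorm(10))
  n <- length(x)
  w <- 1
  upl <- x[n + 1 - w]   # here simply max(x)
  upl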
  
Computing Power 
The "power" of the prediction interval is defined as the probability that 
at least one set of future observations violates the given rule based on a 
simultaneous prediction interval for the next r future sampling occasions, 
where the population for the future observations is allowed to differ from 
the population for the observations used to construct the prediction interval.
For the function predIntNparSimultaneousTestPower, power is computed assuming 
both the background and future observations come from normal distributions with 
the same standard deviation, but the means of the distributions are allowed to differ.
The quantity \Delta (upper case delta) denotes the difference between the 
mean of the population that was sampled to construct the prediction interval, and 
the mean of the population that will be sampled to produce the future observations.  
The quantity \sigma (sigma) denotes the population standard deviation of both 
of these populations.  The argument delta.over.sigma corresponds to the 
quantity \Delta/\sigma.
Approximate Power (method="approx") 
Based on Gansecki (2009), the power of a nonparametric simultaneous prediction 
interval when the underlying observations come from a normal distribution 
can be approximated by the power of a normal simultaneous prediction 
interval (see predIntNormSimultaneousTestPower) where the multiplier 
K is replaced with the expected value of the normal order statistic that 
corresponds to the rank of the order statistic used for the upper or lower bound 
of the prediction interval.  Gansecki (2009) uses the approximation:
K = \Phi^{-1}(\frac{i - 0.5}{n}) \;\;\;\;\;\; (8)
where \Phi denotes the cumulative distribution function of the standard 
normal distribution and i denotes the rank of the order statistic used 
as the prediction limit.  By default, the value of the argument 
evNormOrdStats.method="royston", so the function 
predIntNparSimultaneousTestPower uses the exact value of the 
expected value of the normal order statistic in the call to 
evNormOrdStatsScalar.  You can change the 
method of computing the expected value of the normal order statistic by 
changing the value of the argument evNormOrdStats.method.
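For example, for an upper limit at the maximum of n = 20 background observations (rank i = 20), Gansecki's approximation can be compared with the exact expected normal order statistic (a minimal sketch, assuming the arguments of evNormOrdStatsScalar are named r and n):
  n <- 20
  i <- 20                               # rank of the order statistic used as the limit
  qnorm((i - 0.5) / n)                  # Gansecki (2009) approximation, Equation (8)
  evNormOrdStatsScalar(r = i, n = n)    # exact expected value (method = "royston")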
Power Based on Monte Carlo Simulation (method="simulate") 
When method="simulate", the power of the nonparametric simultaneous 
prediction interval is estimated based on a Monte Carlo simulation.  The argument 
NMC determines the number of Monte Carlo trials.  If ci=TRUE, a 
confidence interval for the power is created based on the NMC Monte Carlo 
estimates of power.
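The simulation amounts to repeatedly generating a background sample and shifted future observations and recording how often the rule is violated. A minimal hand-rolled sketch for the 1-of-3 rule with a single shifted occasion and an upper limit at the background maximum (illustrative only; the function itself handles medians, multiple occasions, and the other rules):
  set.seed(47)
  NMC <- 1000; n <- 20; m <- 3; delta.over.sigma <- 2
  viol <- replicate(NMC, {
    upl <- max(rnorm(n))                         # background maximum (sigma = 1)
    future <- rnorm(m, mean = delta.over.sigma)  # shifted future observations
    all(future > upl)                            # 1-of-m rule violated: none fall below upl
  })
  mean(viol)                                     # Monte Carlo estimate of the power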
Value
vector of values between 0 and 1 equal to the probability that the rule will be violated.
Note
See the help file for predIntNparSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether two distributions differ from each other.  The functions 
predIntNparSimultaneousTestPower and 
plotPredIntNparSimultaneousTestPowerCurve can be 
used to investigate these relationships for the case of normally-distributed 
observations. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for predIntNparSimultaneous.
Gansecki, M. (2009). Using the Optimal Rank Values Calculator. US Environmental Protection Agency, Region 8, March 10, 2009.
See Also
plotPredIntNparSimultaneousTestPowerCurve, 
predIntNparSimultaneous, 
predIntNparSimultaneousN, 
predIntNparSimultaneousConfLevel, 
plotPredIntNparSimultaneousDesign, 
predIntNpar, tolIntNpar.
Examples
  # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper 
  # simultaneous prediction limits for various rules based on trace mercury data (ppb) 
  # collected in the past year from a site with four background wells and 10 compliance 
  # wells (data for two of the compliance wells  are shown in the guidance document).  
  # The facility must monitor the 10 compliance wells for five constituents 
  # (including mercury) annually.
  
  # Here we will compute the confidence levels and powers associated with  
  # two different sampling plans: 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the 
  #    background maximum and 
  # 2) the 1-of-4 plan on individual observations using the 3rd highest 
  #    background value.
  # Power will be computed assuming a normal distribution and setting 
  # delta.over.sigma equal to 2, 3, and 4.
  # The data for this example are stored in EPA.09.Ex.19.5.mercury.df.
  # We will pool data from 4 background wells that were sampled on 
  # a number of different occasions, giving us a sample size of 
  # n = 20 to use to construct the prediction limit.
  # There are 10 compliance wells and we will monitor 5 different 
  # constituents at each well annually.  For this example, USEPA (2009) 
  # recommends setting r to the product of the number of compliance wells and 
  # the number of evaluations per year.  
  # To determine the minimum confidence level we require for 
  # the simultaneous prediction interval, USEPA (2009) recommends 
  # setting the maximum allowed individual Type I Error level per constituent to:
 
  # 1 - (1 - SWFPR)^(1 / Number of Constituents)
  
  # which translates to setting the confidence limit to 
  # (1 - SWFPR)^(1 / Number of Constituents)
  # where SWFPR = site-wide false positive rate.  For this example, we 
  # will set SWFPR = 0.1.  Thus, the required individual Type I Error level 
  # and confidence level per constituent are given as follows:
  # n  = 20 based on 4 Background Wells
  # nw = 10 Compliance Wells
  # nc =  5 Constituents
  # ne =  1 Evaluation per year
  n  <- 20
  nw <- 10
  nc <-  5
  ne <-  1
  # Set number of future sampling occasions r to 
  # Number Compliance Wells x Number Evaluations per Year
  r  <-  nw * ne
  conf.level <- (1 - 0.1)^(1 / nc)
  conf.level
  #[1] 0.9791484
  # So the required confidence level is 0.98, or 98%.
  # Now determine the confidence level associated with each plan.
  # Note that both plans achieve the required confidence level.
 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the 
  #    background maximum
  predIntNparSimultaneousConfLevel(n = 20, n.median = 3, k = 1, m = 2, r = r)
  #[1] 0.9940354
  # 2) the 1-of-4 plan based on individual observations using the 3rd highest 
  #    background value.
  predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 4, r = r, 
    n.plus.one.minus.upl.rank = 3)
  #[1] 0.9864909
  #------------------------------------------------------------------------------
  # Compute approximate power of each plan to detect contamination at just 1 well 
  # assuming true underlying distribution of Hg is Normal at all wells and 
  # using delta.over.sigma equal to 2, 3, and 4.
  #------------------------------------------------------------------------------
  # Compute approximate power for 
  # 1) the 1-of-2 retesting plan for a median of order 3 using the 
  #    background maximum
  predIntNparSimultaneousTestPower(n = 20, n.median = 3, k = 1, m = 2, r = r, 
    delta.over.sigma = 2:4, r.shifted = 1)
  #[1] 0.3953712 0.9129671 0.9983054
  # Compute approximate power for
  # 2) the 1-of-4 plan based on individual observations using the 3rd highest 
  #    background value.
  predIntNparSimultaneousTestPower(n = 20, k = 1, m = 4, r = r, 
    n.plus.one.minus.upl.rank = 3, delta.over.sigma = 2:4, r.shifted = 1)
  #[1] 0.4367972 0.8694664 0.9888779
  #----------
  ## Not run: 
  # Compare estimated power using approximation method with estimated power
  # using Monte Carlo simulation for the 1-of-4 plan based on individual 
  # observations using the 3rd highest background value.
  predIntNparSimultaneousTestPower(n = 20, k = 1, m = 4, r = r, 
    n.plus.one.minus.upl.rank = 3, delta.over.sigma = 2:4, r.shifted = 1, 
    method = "simulate", ci = TRUE, NMC = 1000)
  #[1] 0.437 0.863 0.989
  #attr(,"conf.int")
  #         [,1]      [,2]      [,3]
  #LCL 0.4111999 0.8451148 0.9835747
  #UCL 0.4628001 0.8808852 0.9944253
  
## End(Not run)
 
  #==========
  # Cleanup
  #--------
  rm(n, nw, nc, ne, r, conf.level) 
Prediction Interval for a Poisson Distribution
Description
Estimate the mean of a Poisson distribution, and 
construct a prediction interval for the next k observations or 
next set of k sums.
Usage
  predIntPois(x, k = 1, n.sum = 1, method = "conditional", 
    pi.type = "two-sided", conf.level = 0.95, round.limits = TRUE)
Arguments
| x | numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a Poisson distribution 
(i.e.,  | 
| k | positive integer specifying the number of future observations or sums the 
prediction interval should contain with confidence level  | 
| n.sum | positive integer specifying the sample size associated with the  | 
| method | character string specifying the method to use.  The possible values are:  See the DETAILS section for more information on these methods.  The  | 
| pi.type | character string indicating what kind of prediction interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level of the prediction interval.  
The default value is  | 
| round.limits | logical scalar indicating whether to round the computed prediction limits to the 
nearest integer.  The default value is  | 
Details
A prediction interval for some population is an interval on the real line constructed so 
that it will contain k future observations or averages from that population with 
some specified probability (1-\alpha)100\%, where 0 < \alpha < 1 and k 
is some pre-specified positive integer.  The quantity (1-\alpha)100\% is called 
the confidence coefficient or confidence level associated with the prediction interval.
In the case of a Poisson distribution, we have modified the 
usual meaning of a prediction interval and instead construct an interval that will 
contain k future observations or k future sums with a certain 
confidence level.
A prediction interval is a random interval; that is, the lower and/or 
upper bounds are random variables computed based on sample statistics in the 
baseline sample.  Prior to taking one specific baseline sample, the probability 
that the prediction interval will contain the next k averages is 
(1-\alpha)100\%.  Once a specific baseline sample is taken and the 
prediction interval based on that sample is computed, the probability that that 
prediction interval will contain the next k averages is not necessarily 
(1-\alpha)100\%, but it should be close.
If an experiment is repeated N times, and for each experiment:
- A sample is taken and a (1-\alpha)100\% prediction interval for k=1 future observation is computed, and
- One future observation is generated and compared to the prediction interval, 
then the number of prediction intervals that actually contain the future observation 
generated in step 2 above is a binomial random variable with parameters 
size=N and prob=(1-\alpha) (see Binomial).
If, on the other hand, only one baseline sample is taken and only one prediction 
interval for k=1 future observation is computed, then the number of 
future observations out of a total of N future observations that will be 
contained in that one prediction interval is a binomial random variable with 
parameters size=N and prob=(1-\alpha^*), where 
\alpha^* depends on the true population parameters and the computed 
bounds of the prediction interval.
Because of the discrete nature of the Poisson distribution, 
even if the true mean of the distribution \lambda were known exactly, the 
actual confidence level associated with a prediction limit will usually not be exactly equal to 
(1-\alpha)100\%.  For example, for the Poisson distribution with parameter 
lambda=2, the interval [0, 4] contains 94.7% of this distribution and 
the interval [0,5] contains 98.3% of this distribution.  Thus, no interval can 
contain exactly 95% of this distribution, so it is impossible to construct an 
exact 95% prediction interval for the next k=1 observation for a 
Poisson distribution with parameter lambda=2.
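The coverage figures quoted above are easily verified:
  ppois(4, lambda = 2)   # ~0.947, so [0, 4] covers about 94.7%
  ppois(5, lambda = 2)   # ~0.983, so [0, 5] covers about 98.3%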
The Form of a Poisson Prediction Interval 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a Poisson distribution with parameter 
lambda=\lambda.  Also, let X denote the sum of these 
n random variables, i.e.,
X = \sum_{i=1}^n x_i \;\;\;\;\;\; (1)
Finally, let m denote the sample size associated with the k future 
sums (i.e., n.sum=m).  When m=1, each sum is really just a 
single observation, so in the rest of this help file the term “sums” 
replaces the phrase “observations or sums”.
Let \underline{y} = y_1, y_2, \ldots, y_m denote a vector of m 
future observations from a Poisson distribution with parameter 
lambda=\lambda^{*}, and set Y equal to the sum of these 
m random variables, i.e.,
Y = \sum_{i=1}^m y_i \;\;\;\;\;\; (2)
Then Y has a Poisson distribution with parameter 
lambda=m\lambda^{*} (Johnson et al., 1992, p.160).  We are interested 
in constructing a prediction limit for the next value of Y, or else the next 
k sums of m Poisson random variables, based on the observed value of 
X and assuming \lambda^{*} = \lambda.
For a Poisson distribution, the form of a two-sided prediction interval is:
[m\bar{x} - K, m\bar{x} + K] = [cX - K, cX + K] \;\;\;\;\;\; (3)
where
\bar{x} = \frac{X}{n} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (4)
c = \frac{m}{n} \;\;\;\;\;\; (5)
and K is a constant that depends on the sample size n, the 
confidence level (1-\alpha)100\%, the number of future sums k, 
and the sample size associated with the future sums m. Do not confuse 
the constant K (uppercase K) with the number of future sums 
k (lowercase k).  The symbol K is used here to be consistent 
with the notation used for prediction intervals for the normal distribution 
(see predIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[m\bar{x} - K, \infty] = [cX - K, \infty] \;\;\;\;\;\; (6)
and the form of a one-sided upper prediction interval is:
[0, m\bar{x} + K] = [0, cX + K] \;\;\;\;\;\; (7)
The derivation of the constant K is explained below.
Conditional Distribution (method="conditional") 
Nelson (1970) derives a prediction interval for the case k=1 based on the 
conditional distribution of Y given X+Y.  He notes that the conditional 
distribution of Y given the quantity X+Y=w is 
binomial with parameters 
size=w and prob=[m\lambda^{*} / (m\lambda^{*} + n\lambda)] 
(Johnson et al., 1992, p.161).  When k=1, the prediction limits are computed 
as those most extreme values of Y that still yield a non-significant test of 
the hypothesis H_0: \lambda^{*} = \lambda, which for the conditional 
distribution of Y is equivalent to the hypothesis 
H_0: prob=[m /(m + n)].
Using the relationship between the binomial and 
F-distribution (see the explanation of exact confidence 
intervals in the help file for ebinom), Nelson (1982, p. 203) states 
that exact two-sided (1-\alpha)100\% prediction limits [LPL, UPL] are the 
closest integer solutions to the following equations:
\frac{m}{LPL + 1} = \frac{n}{X} F(2 LPL + 2, 2X, 1 - \alpha/2) \;\;\;\;\;\; (8)
\frac{UPL}{n} = \frac{X+1}{n} F(2X + 2, 2 UPL, 1 - \alpha/2) \;\;\;\;\;\; (9)
where F(\nu_1, \nu_2, p) denotes the p'th quantile of the 
F-distribution with \nu_1 and \nu_2 degrees of 
freedom.
If pi.type="lower", \alpha/2 is replaced with \alpha in 
Equation (8) above for LPL, and UPL is set to \infty.
If pi.type="upper", \alpha/2 is replaced with \alpha in 
Equation (9) above for UPL, and LPL is set to 0.
NOTE: This method is not extended to the case k > 1.
Conditional Distribution Approximation Based on Normal Distribution 
(method="conditional.approx.normal") 
Cox and Hinkley (1974, p.245) derive an approximate prediction interval for the case 
k=1.  Like Nelson (1970), they note that the conditional distribution of 
Y given the quantity X+Y=w is  binomial with 
parameters size=w and 
prob=[m\lambda^{*} / (m\lambda^{*} + n\lambda)], and that the 
hypothesis H_0: \lambda^{*} = \lambda is equivalent to the hypothesis 
H_0: prob=[m /(m + n)].
Cox and Hinkley (1974, p.245) suggest using the normal approximation to the 
binomial distribution (in this case, without the continuity correction; 
see Zar, 2010, pp.534-536 for information on the continuity correction associated 
with the normal approximation to the binomial distribution).  Under the null 
hypothesis H_0: \lambda^{*} = \lambda, the quantity
z = [Y - \frac{c(X+Y)}{1+c}] / \{ [\frac{c(X+Y)}{(1+c)^2}]^{1/2} \} \;\;\;\;\;\; (10)
is approximately distributed as a standard normal random variable.
The Case When k = 1 
When k = 1 and pi.type="two-sided", the prediction limits are computed 
by solving the equation
z^2 \le z_{1 - \alpha/2}^2 \;\;\;\;\;\; (11)
where z_p denotes the p'th quantile of the standard normal distribution.  
In this case, Gibbons (1987b) notes that the quantity K in Equation (3) above 
is given by:
K = \frac{t^2c}{2} + tc[X (1 + \frac{1}{c}) + \frac{t^2}{4}]^{1/2} \;\;\;\;\;\; (12)
where t = z_{1-\alpha/2}.
When pi.type="lower" or pi.type="upper", K is computed exactly 
as above, except t is set to t = z_{1-\alpha}.
The Case When k > 1 
When k > 1, Gibbons (1987b) suggests using the Bonferroni inequality.  
That is, the value of K is computed exactly as for the case k=1 
described above, except that the Bonferroni value of t is used in place of the 
usual value of t:
When pi.type="two-sided", t = z_{1 - (\alpha/k)/2}.
When pi.type="lower" or pi.type="upper", t = z_{1 - \alpha/k}.
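As a worked illustration of Equation (12) for a one-sided upper limit with k=1 and n.sum=1 (the values of n and X below are hypothetical):
  n <- 20; m <- 1
  X <- 36                              # sum of the n observations (sample mean 1.8)
  cc <- m / n                          # the constant c in Equation (5)
  t <- qnorm(0.95)                     # one-sided, so t = z_{1 - alpha}
  K <- t^2 * cc / 2 + t * cc * sqrt(X * (1 + 1/cc) + t^2 / 4)
  cc * X + K                           # upper prediction limit before rounding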
Conditional Distribution Approximation Based on Student's t-Distribution 
(method="conditional.approx.t") 
When method="conditional.approx.t", the exact same procedure is used as when 
method="conditional.approx.normal", except that the quantity in 
Equation (10) is assumed to follow a Student's t-distribution with n-1 
degrees of freedom.  Thus, all occurrences of z_p are replaced with 
t_{n-1, p}, where t_{\nu, p} denotes the p'th quantile of 
Student's t-distribution with \nu degrees of freedom.
Normal Approximation (method="normal.approx") 
The normal approximation for Poisson prediction limits was given by 
Nelson (1970; 1982, p.203) and is based on the fact that the mean and variance of a 
Poisson distribution are the same (Johnson et al, 1992, p.157), and for 
“large” values of n and m, both X and Y are 
approximately normally distributed.
The Case When k = 1 
The quantity Y - cX is approximately normally distributed with expectation and 
variance given by:
E(Y - cX) = E(Y) - cE(X) = m\lambda - cn\lambda = 0 \;\;\;\;\;\; (13)
Var(Y - cX) = Var(Y) + c^2 Var(X) = m\lambda + c^2 n\lambda = m\lambda (1 + \frac{m}{n}) \;\;\;\;\;\; (14)
so the quantity
z = \frac{Y-cX}{\sqrt{m\hat{\lambda}(1 + \frac{m}{n})}} = \frac{Y-cX}{\sqrt{m\bar{x}(1 + \frac{m}{n})}} \;\;\;\;\;\; (15)
is approximately distributed as a standard normal random variable.  The function 
predIntPois, however, assumes this quantity is distributed as approximately 
a Student's t-distribution with n-1 degrees of freedom.
Thus, following the idea of prediction intervals for a normal distribution 
(see predIntNorm), when pi.type="two-sided", the constant 
K for a (1-\alpha)100\% prediction interval for the next k=1 sum 
of m observations is computed as:
K = t_{n-1, 1-\alpha/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (16)
where t_{\nu, p} denotes the p'th quantile of a 
Student's t-distribution with \nu degrees of freedom. 
Similarly, when pi.type="lower" or pi.type="upper", the constant 
K is computed as:
K = t_{n-1, 1-\alpha} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (17)
The Case When k > 1 
When k > 1, the value of K is computed exactly as for the case 
k = 1 described above, except that the Bonferroni value of t is used 
in place of the usual value of t:
When pi.type="two-sided", 
K = t_{n-1, 1-(\alpha/k)/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (18)
When pi.type="lower" or pi.type="upper", 
K = t_{n-1, 1-(\alpha/k)} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (19)
Hahn and Nelson (1973, p.182) discuss another method of computing K when 
k > 1, but this method is not implemented here.
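As a worked illustration of Equation (17) for a one-sided upper limit with k=1 and m=1 (hypothetical n and sample mean):
  n <- 20; m <- 1
  xbar <- 1.8                          # sample mean of the n observations
  K <- qt(0.95, df = n - 1) * sqrt(m * xbar * (1 + m/n))
  m * xbar + K                         # upper prediction limit before rounding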
Value
If x is a numeric vector, predIntPois returns a list of class 
"estimate" containing the estimated parameter, the prediction interval, 
and other information.  See the help file for 
estimate.object for details.
If x is the result of calling an estimation function, 
predIntPois returns a list whose class is the same as x.  
The list contains the same components as x, as well as a component called 
interval containing the prediction interval information.  
If x already has a component called interval, this component is 
replaced with the prediction interval information.
Note
Prediction and tolerance intervals have long been applied to quality control and life testing problems. Nelson (1970) notes that his development of confidence and prediction limits for the Poisson distribution is based on well-known results dating back to the 1950's. Hahn and Nelson (1973) review prediction intervals for several distributions, including Poisson prediction intervals. The monograph by Hahn and Meeker (1991) includes a discussion of Poisson prediction intervals.
Gibbons (1987b) uses the Poisson distribution to model the number of detected 
compounds per scan of the 32 volatile organic priority pollutants (VOC), and also 
to model the distribution of chemical concentration (in ppb), and presents formulas 
for prediction and tolerance intervals.  The formulas for prediction intervals are 
based on Cox and Hinkley (1974, p.245).  Gibbons (1987b) only deals with 
the case where n.sum=1.
Gibbons et al. (2009, pp. 72–76) discuss methods for Poisson prediction limits.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Cox, D.R., and D.V. Hinkley. (1974). Theoretical Statistics. Chapman and Hall, New York, pp.242–245.
Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572–580.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken, pp. 72–76.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178–188.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Miller, R.G. (1981a). Simultaneous Statistical Inference. McGraw-Hill, New York, pp.8, 76–81.
Nelson, W.R. (1970). Confidence Intervals for the Ratio of Two Poisson Means and Poisson Predictor Intervals. IEEE Transactions on Reliability R-19, 42–49.
Nelson, W.R. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, pp.200–204.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, pp. 585–586.
See Also
Poisson, epois,  
estimate.object, Prediction Intervals, 
tolIntPois, Estimating Distribution Parameters.
Examples
  # Generate 20 observations from a Poisson distribution with parameter 
  # lambda=2.  The interval [0, 4] contains 94.7% of this distribution and 
  # the interval [0,5] contains 98.3% of this distribution.  Thus, because 
  # of the discrete nature of the Poisson distribution, no interval contains 
  # exactly 95% of this distribution.  Use predIntPois to estimate the mean 
  # parameter of the true distribution, and construct a one-sided upper 
  # 95% prediction interval for the next single observation from this distribution. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rpois(20, lambda = 2) 
  predIntPois(dat, pi.type = "upper") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      conditional
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0
  #                                 UPL = 5
  #----------
  # Compare results above with the other approximation methods:
  predIntPois(dat, method = "conditional.approx.normal", 
    pi.type = "upper")$interval$limits
  #LPL UPL 
  #  0   4  
 
  predIntPois(dat, method = "conditional.approx.t", 
    pi.type = "upper")$interval$limits 
  #LPL UPL 
  #  0   4 
  predIntPois(dat, method = "normal.approx", 
    pi.type = "upper")$interval$limits 
  #LPL UPL 
  #  0   4 
  #Warning message:
  #In predIntPois(dat, method = "normal.approx", pi.type = "upper") :
  #  Estimated value of 'lambda' and/or number of future observations 
  #  is/are probably too small for the normal approximation to work well.
  #==========
  # Using the same data as in the previous example, compute a one-sided 
  # upper 95% prediction limit for k=10 future observations.  
  # Using conditional approximation method based on the normal distribution.
  predIntPois(dat, k = 10, method = "conditional.approx.normal", 
    pi.type = "upper") 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      conditional.approx.normal
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   10
  #
  #Prediction Interval:             LPL = 0
  #                                 UPL = 6
  
  # Using method based on approximating conditional distribution with 
  # Student's t-distribution
  predIntPois(dat, k = 10, method = "conditional.approx.t", 
    pi.type = "upper")$interval$limits
  #LPL UPL 
  #  0   6 
  #==========
  # Repeat the above example, but set k=5 and n.sum=3.  Thus, we want a 
  # 95% upper prediction limit for the next 5 sets of sums of 3 observations.
  predIntPois(dat, k = 5, n.sum = 3, method = "conditional.approx.t", 
    pi.type = "upper")$interval$limits
  #LPL UPL 
  #  0  12
  #==========
  # Reproduce Example 3.6 in Gibbons et al. (2009, p. 75)
  # A 32-constituent VOC scan was performed for n=16 upgradient 
  # samples and there were 5 detections out of these 16.  We 
  # want to construct a one-sided upper 95% prediction limit 
  # for 20 monitoring wells (so k=20 future observations) based 
  # on these data.
  # First we need to create a data set that will yield a mean 
  # of 5/16 based on a sample size of 16.  Any number of data 
  # sets will do.  Here are two possible ones:
  dat <- c(rep(1, 5), rep(0, 11))
  dat <- c(2, rep(1, 3), rep(0, 12))
  # Now call predIntPois.  Don't round the limits so we can 
  # compare to the example in Gibbons et al. (2009).
  predIntPois(dat, k = 20, method = "conditional.approx.t", 
    pi.type = "upper", round.limits = FALSE)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 0.3125
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     16
  #
  #Prediction Interval Method:      conditional.approx.t
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   20
  #
  #Prediction Interval:             LPL = 0.000000
  #                                 UPL = 2.573258
  #==========
  # Cleanup
  #--------
  rm(dat)
Model Predictions
Description
The EnvStats function predict is a generic function for 
predictions from the results of various model fitting functions.  
The function invokes particular methods which 
depend on the class of the first argument.  
The EnvStats function predict.default simply calls the R generic 
function predict.
The EnvStats functions predict and predict.default have been 
created in order to comply with CRAN policies, because EnvStats contains a 
modified version of the R function predict.lm.  
Usage
predict(object, ...)
## Default S3 method:
predict(object, ...)
Arguments
| object | a model object for which prediction is desired. | 
| ... | Further arguments passed to or from other methods.  See the R help file for 
 | 
Details
See the R help file for predict. 
Value
See the R help file for predict.  
Author(s)
R Development Core Team for code for R version of predict.
Steven P. Millard (for EnvStats version of predict.default; EnvStats@ProbStatInfo.com)
References
Chambers, J.M., and Hastie, T.J., eds. (1992). Statistical Models in S. Chapman and Hall/CRC, Boca Raton, FL.
See Also
R help file for predict, 
predict.lm.
Examples
  # Using the data from the built-in data frame Air.df, 
  # fit the cube-root of ozone as a function of temperature, 
  # then compute predicted values for ozone at 70 and 90 degrees F,
  # along with the standard errors of these predicted values.
  # First look at the data
  #-----------------------
  with(Air.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)", 
      ylab = "Cube-Root Ozone (ppb)"))
  # Now create the lm object 
  #-------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Air.df) 
  # Now get predicted values and CIs at 70 and 90 degrees.
  # Note the presence of the last component called n.coefs.
  #--------------------------------------------------------
  predict.list <- predict(ozone.fit, 
    newdata = data.frame(temperature = c(70, 90)), se.fit = TRUE) 
  predict.list
  #$fit
  #       1        2 
  #2.697810 4.101808 
  #
  #$se.fit
  #         1          2 
  #0.07134554 0.08921071 
  #
  #$df
  #[1] 114
  #
  #$residual.scale
  #[1] 0.5903046
  #
  #$n.coefs
  #[1] 2
 
  #----------
  #Continuing with the above example, create a scatterplot of 
  # cube-root ozone vs. temperature, and add the fitted line 
  # along with simultaneous 95% confidence bands.
  with(Air.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)", 
      ylab = "Cube-Root Ozone (ppb)"))
  abline(ozone.fit, lwd = 3, col = "blue")
  new.temp <- with(Air.df, 
    seq(min(temperature), max(temperature), length = 100))
  predict.list <- predict(ozone.fit, 
    newdata = data.frame(temperature = new.temp), 
    se.fit = TRUE)
  ci.ozone <- pointwise(predict.list, coverage = 0.95, 
    simultaneous = TRUE)
  lines(new.temp, ci.ozone$lower, lty = 2, lwd = 3, col = "magenta") 
  lines(new.temp, ci.ozone$upper, lty = 2, lwd = 3, col = "magenta") 
  title(main=paste("Scatterplot of Cube-Root Ozone vs. Temperature", 
    "with Fitted Line and Simultaneous 95% Confidence Bands", 
    sep="\n"))
  #----------
  # Clean up
  #---------
  rm(ozone.fit, predict.list, new.temp, ci.ozone)
  graphics.off()
Predict Method for Linear Model Fits
Description
The function predict.lm in EnvStats is a modified version 
of the built-in R function predict.lm.  
The only modification is that for the EnvStats function predict.lm, 
if se.fit=TRUE, the list returned includes a component called 
n.coefs.  The component n.coefs is used by the function 
pointwise to create simultaneous confidence or prediction limits.
Usage
## S3 method for class 'lm'
predict(object, ...)
Arguments
| object | Object of class inheriting from  | 
| ... | Further arguments passed to the R function  | 
Details
See the R help file for predict.lm.
The function predict.lm in EnvStats is a modified version 
of the built-in R function predict.lm.  
The only modification is that for the EnvStats function predict.lm, 
if se.fit=TRUE, the list returned includes a component called 
n.coefs.  The component n.coefs is used by the function 
pointwise to create simultaneous confidence or prediction limits.
Value
See the R help file for predict.lm.  
The only modification is that for the EnvStats function predict.lm, 
if se.fit=TRUE, the list returned includes a component called 
n.coefs, i.e., the function returns a list with the following components:
| fit | vector or matrix as above | 
| se.fit | standard error of predicted means | 
| residual.scale | residual standard deviations | 
| df | degrees of freedom for residual | 
| n.coefs | numeric scalar denoting the number of predictor variables used in the model | 
Author(s)
R Development Core Team (for code for R version of predict.lm).
Steven P. Millard (for modification to add component n.coefs; EnvStats@ProbStatInfo.com)
References
Chambers, J.M., and Hastie, T.J., eds. (1992). Statistical Models in S. Chapman and Hall/CRC, Boca Raton, FL.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.546-553.
Miller, R.G. (1981a). Simultaneous Statistical Inference. Springer-Verlag, New York, pp.111, 124.
See Also
Help file for R function predict, 
Help file for R function predict.lm, 
lm, calibrate, inversePredictCalibrate, 
detectionLimitCalibrate.
Examples
  # Using the data from the built-in data frame Air.df, 
  # fit the cube-root of ozone as a function of temperature, 
  # then compute predicted values for ozone at 70 and 90 degrees F,
  # along with the standard errors of these predicted values.
  # First look at the data
  #-----------------------
  with(Air.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)", 
      ylab = "Cube-Root Ozone (ppb)"))
  # Now create the lm object 
  #-------------------------
  ozone.fit <- lm(ozone ~ temperature, data = Air.df) 
  # Now get predicted values and CIs at 70 and 90 degrees.
  # Note the presence of the last component called n.coefs.
  #--------------------------------------------------------
  predict.list <- predict(ozone.fit, 
    newdata = data.frame(temperature = c(70, 90)), se.fit = TRUE) 
  predict.list
  #$fit
  #       1        2 
  #2.697810 4.101808 
  #
  #$se.fit
  #         1          2 
  #0.07134554 0.08921071 
  #
  #$df
  #[1] 114
  #
  #$residual.scale
  #[1] 0.5903046
  #
  #$n.coefs
  #[1] 2
 
  #----------
  #Continuing with the above example, create a scatterplot of 
  # cube-root ozone vs. temperature, and add the fitted line 
  # along with simultaneous 95% confidence bands.
  with(Air.df, 
    plot(temperature, ozone, xlab = "Temperature (degrees F)", 
      ylab = "Cube-Root Ozone (ppb)"))
  abline(ozone.fit, lwd = 3, col = "blue")
  new.temp <- with(Air.df, 
    seq(min(temperature), max(temperature), length = 100))
  predict.list <- predict(ozone.fit, 
    newdata = data.frame(temperature = new.temp), 
    se.fit = TRUE)
  ci.ozone <- pointwise(predict.list, coverage = 0.95, 
    simultaneous = TRUE)
  lines(new.temp, ci.ozone$lower, lty = 2, lwd = 3, col = "magenta") 
  lines(new.temp, ci.ozone$upper, lty = 2, lwd = 3, col = "magenta") 
  title(main=paste("Scatterplot of Cube-Root Ozone vs. Temperature", 
    "with Fitted Line and Simultaneous 95% Confidence Bands", 
    sep="\n"))
  #----------
  # Clean up
  #---------
  rm(ozone.fit, predict.list, new.temp, ci.ozone)
  graphics.off()
Print Values
Description
The EnvStats function print is a generic function for 
printing its argument and returning it invisibly (via invisible(x)). 
The function invokes particular methods which 
depend on the class of the first argument.  
When given an argument of class "htest", the EnvStats function print.default 
calls the EnvStats function print.htest, otherwise it calls the R  
function print.default.
The EnvStats functions print and print.default have been 
created in order to comply with CRAN policies, because EnvStats contains a 
modified version of the R function print.htest.  
Usage
print(x, ...)
## Default S3 method:
print(x, ...)
Arguments
| x | an object to be printed and used to select a printing method. | 
| ... | further arguments passed to or from other methods.  See the help files for 
the R functions  | 
Details
The EnvStats function print is a generic function for 
printing its argument and returning it invisibly (via invisible(x)). 
The function invokes particular methods which 
depend on the class of the first argument.  
When given an argument of class "htest", the EnvStats function print.default 
calls the EnvStats function print.htest, otherwise it calls the R  
function print.default.
The EnvStats functions print and print.default have been 
created in order to comply with CRAN policies, because EnvStats contains a 
modified version of the R function print.htest.  When EnvStats 
is loaded, objects of class "htest" will be printed using the 
EnvStats version of print.htest.
See the help files for the R functions print and 
print.default. 
Value
See the help files for the R functions print and 
print.default.  
Author(s)
R Development Core Team for code for R versions of print and print.default.
Steven P. Millard (for EnvStats version of print.default; EnvStats@ProbStatInfo.com)
References
Chambers, J.M., and Hastie, T.J., eds. (1992). Statistical Models in S. Chapman and Hall/CRC, Boca Raton, FL.
See Also
R help file for print,
R help file for print.default, 
print.htest.
Print Output of Objective for Box-Cox Power Transformations
Description
Formats and prints the results of calling the function boxcox.  
This method is automatically called by print when given an 
object of class "boxcox".  The names of other functions involved in 
Box-Cox transformations are listed under Data Transformations.
Usage
## S3 method for class 'boxcox'
print(x, ...)
Arguments
| x | an object of class "boxcox". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "boxcox" method for the generic function print.  
Prints the objective name, the name of the data object used, the sample size, 
the values of the powers, and the values of the objective.  In the case of 
optimization, also prints the range of powers over which the optimization 
took place.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
boxcox, boxcox.object, plot.boxcox, 
Data Transformations, print.
Print Output of Objective for Box-Cox Power Transformations Based on Type I Censored Data
Description
Formats and prints the results of calling the function boxcoxCensored.  
This method is automatically called by print when given an 
object of class "boxcoxCensored".
Usage
## S3 method for class 'boxcoxCensored'
print(x, ...)
Arguments
| x | an object of class "boxcoxCensored". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "boxcoxCensored" method for the generic function 
print.  
Prints the objective name, the name of the data object used, the sample size, 
the percentage of censored observations, the values of the powers, and the 
values of the objective.  In the case of 
optimization, also prints the range of powers over which the optimization 
took place.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
boxcoxCensored, boxcoxCensored.object, 
plot.boxcoxCensored, 
Data Transformations, print.
Print Output of Objective for Box-Cox Power Transformations for an "lm" Object
Description
Formats and prints the results of calling the function boxcox 
when the argument x supplied to boxcox is an object of 
class "lm".  This method is automatically called by print 
when given an object of class "boxcoxLm".  The names of other functions 
involved in Box-Cox transformations are listed under Data Transformations.
Usage
## S3 method for class 'boxcoxLm'
print(x, ...)
Arguments
| x | an object of class "boxcoxLm". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "boxcoxLm" method for the generic function print.  
Prints the objective name,  the details of the "lm" object used, 
the sample size, 
the values of the powers, and the values of the objective.  In the case of 
optimization, also prints the range of powers over which the optimization 
took place.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
boxcox, boxcoxLm.object, plot.boxcoxLm, 
Data Transformations, print.
Print Output of Goodness-of-Fit Tests
Description
Formats and prints the results of calling the function distChoose, which 
uses a series of goodness-of-fit tests to choose among candidate distributions. 
This method is automatically called by print when given an 
object of class "distChoose".
Usage
## S3 method for class 'distChoose'
print(x, ...)
Arguments
| x | an object of class "distChoose". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "distChoose" method for the generic function print.  
Prints the candidate distributions, method used to choose among the candidate distributions, 
chosen distribution, Type I error associated with each goodness-of-fit test, 
estimated population parameter(s) associated with the chosen distribution, 
estimation method, goodness-of-fit test results for each candidate distribution, 
and the data name.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
distChoose, distChoose.object, 
Goodness-of-Fit Tests, print.
Print Output of Goodness-of-Fit Tests
Description
Formats and prints the results of calling the function distChooseCensored, which 
uses a series of goodness-of-fit tests to choose among candidate distributions based on  
censored data. 
This method is automatically called by print when given an 
object of class "distChooseCensored".
Usage
## S3 method for class 'distChooseCensored'
print(x, show.cen.levels = TRUE, 
  pct.censored.digits = .Options$digits, ...)
Arguments
| x | an object of class "distChooseCensored". | 
| show.cen.levels | logical scalar indicating whether to print the censoring levels.  The default value is show.cen.levels=TRUE. | 
| pct.censored.digits | numeric scalar indicating the number of significant digits to print for the percent of censored observations. | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "distChooseCensored" method for the generic function 
print.  
Prints the candidate distributions, 
method used to choose among the candidate distributions, 
chosen distribution, Type I error associated with each goodness-of-fit test, 
estimated population parameter(s) associated with the chosen distribution, 
estimation method, goodness-of-fit test results for each candidate distribution, 
and the data name and censoring variable.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
distChooseCensored, distChooseCensored.object, 
Censored Data, print.
Print Objects of Class "estimate"
Description
Formats and prints the results of EnvStats functions that estimate 
the parameters or quantiles of a probability distribution and optionally 
construct confidence, prediction, or tolerance intervals based on a sample 
of data assumed to come from that distribution.  
This method is automatically called by print when given an 
object of class "estimate".
See the help files Estimating Distribution Parameters and Estimating Distribution Quantiles for lists of functions that estimate distribution parameters and quantiles. See the help files Prediction Intervals and Tolerance Intervals for lists of functions that create prediction and tolerance intervals.
Usage
## S3 method for class 'estimate'
print(x, conf.cov.sig.digits = .Options$digits, 
  limits.sig.digits = .Options$digits, ...) 
Arguments
| x | an object of class "estimate". | 
| conf.cov.sig.digits | numeric scalar indicating the number of significant digits to print for the confidence level or coverage of a confidence, prediction, or tolerance interval. | 
| limits.sig.digits | numeric scalar indicating the number of significant digits to print for the upper and lower limits of a confidence, prediction, or tolerance interval. | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "estimate" method for the generic function 
print.  
Prints estimated parameters and, if present in the object, information regarding 
confidence, prediction, or tolerance intervals.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
estimate.object, 
Estimating Distribution Parameters, 
Estimating Distribution Quantiles, 
Prediction Intervals, Tolerance Intervals, 
print.
Print Objects of Class "estimateCensored"
Description
Formats and prints the results of EnvStats functions that estimate 
the parameters or quantiles of a probability distribution and optionally 
construct confidence, prediction, or tolerance intervals based on a sample 
of Type I censored data assumed to come from that distribution.  
This method is automatically called by print when given an 
object of class "estimateCensored".
See the subsections Estimating Distribution Parameters and Estimating Distribution Quantiles in the help file Censored Data for lists of functions that estimate distribution parameters and quantiles based on Type I censored data.
See the subsection Prediction and Tolerance Intervals in the help file Censored Data for lists of functions that create prediction and tolerance intervals.
Usage
## S3 method for class 'estimateCensored'
print(x, show.cen.levels = TRUE, 
  pct.censored.digits = .Options$digits, 
  conf.cov.sig.digits = .Options$digits, limits.sig.digits = .Options$digits, 
  ...) 
Arguments
| x | an object of class "estimateCensored". | 
| show.cen.levels | logical scalar indicating whether to print the censoring levels.  The default value is show.cen.levels=TRUE. | 
| pct.censored.digits | numeric scalar indicating the number of significant digits to print for the percent of censored observations. | 
| conf.cov.sig.digits | numeric scalar indicating the number of significant digits to print for the confidence level or coverage of a confidence, prediction, or tolerance interval. | 
| limits.sig.digits | numeric scalar indicating the number of significant digits to print for the upper and lower limits of a confidence, prediction, or tolerance interval. | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "estimateCensored" method for the generic function 
print.  
Prints estimated parameters and, if present in the object, information regarding 
confidence, prediction, or tolerance intervals.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
estimateCensored.object, 
Censored Data, print.
Print Output of Goodness-of-Fit Tests
Description
Formats and prints the results of performing a goodness-of-fit test.  This method is 
automatically called by print when given an object of class "gof".  
The names of the functions that perform goodness-of-fit tests and that produce objects of class 
"gof" are listed under Goodness-of-Fit Tests.
Usage
## S3 method for class 'gof'
print(x, ...)
Arguments
| x | an object of class "gof". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "gof" method for the generic function print.  
Prints name of the test, hypothesized distribution, estimated population parameter(s), 
estimation method, data name, sample size, value of the test statistic, parameters 
associated with the null distribution of the test statistic, p-value associated with the 
test statistic, and the alternative hypothesis.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Goodness-of-Fit Tests, gof.object, 
print.
Print Output of Goodness-of-Fit Tests Based on Censored Data
Description
Formats and prints the results of performing a goodness-of-fit test.  This method is 
automatically called by print when given an object of class 
"gofCensored".  Currently, the only function that produces an object of 
this class is gofTestCensored.
Usage
## S3 method for class 'gofCensored'
print(x, show.cen.levels = TRUE, 
  pct.censored.digits = .Options$digits, ...)
Arguments
| x | an object of class "gofCensored". | 
| show.cen.levels | logical scalar indicating whether to print the censoring levels.  The default value is show.cen.levels=TRUE. | 
| pct.censored.digits | numeric scalar indicating the number of significant digits to print for the percent of censored observations. | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "gofCensored" method for the generic function 
print.  
Prints name of the test, hypothesized distribution, estimated population parameter(s), 
estimation method, data name, sample size, censoring information, value of the test 
statistic, parameters associated with the null distribution of the test statistic, 
p-value associated with the test statistic, and the alternative hypothesis.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Censored Data, gofCensored.object, 
print.
Print Output of Group Goodness-of-Fit Tests
Description
Formats and prints the results of performing a group goodness-of-fit test.  
This method is automatically called by print when given an 
object of class "gofGroup".  Currently, 
the only EnvStats function that performs a group goodness-of-fit test 
that produces an object of class "gofGroup" 
is gofGroupTest.
Usage
## S3 method for class 'gofGroup'
print(x, ...)
Arguments
| x | an object of class "gofGroup". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "gofGroup" method for the generic function 
print.  
See the help file for gofGroup.object for information 
on the information contained in this kind of object.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Goodness-of-Fit Tests, 
gofGroup.object, 
print.
Print Output of Goodness-of-Fit Outlier Tests
Description
Formats and prints the results of performing a goodness-of-fit test for outliers.  
This method is automatically called by print when given an object of 
class "gofOutlier".  The names of the functions that perform goodness-of-fit tests 
for outliers and that produce objects of class "gofOutlier" are listed under 
Tests for Outliers.
Usage
## S3 method for class 'gofOutlier'
print(x, ...)
Arguments
| x | an object of class "gofOutlier". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "gofOutlier" method for the generic function print.  
Prints name of the test, hypothesized distribution, data name, sample size, 
value of the test statistic, parameters associated with the null distribution of the 
test statistic, Type I error, critical values, and the alternative hypothesis.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Tests for Outliers, gofOutlier.object, 
print.
Print Output of Two-Sample Goodness-of-Fit Tests
Description
Formats and prints the results of performing a two-sample goodness-of-fit test.  
This method is automatically called by print when given an 
object of class "gofTwoSample".  Currently, 
the only EnvStats function that performs a two-sample goodness-of-fit test 
that produces an object of class "gofTwoSample" 
is gofTest.
Usage
## S3 method for class 'gofTwoSample'
print(x, ...)
Arguments
| x | an object of class "gofTwoSample". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "gofTwoSample" method for the generic function 
print.  
See the help file for 
gofTwoSample.object for information on the 
information contained in this kind of object.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Goodness-of-Fit Tests, gofTwoSample.object, 
print.
Print Output of Hypothesis Tests
Description
Print objects of class "htest" or "htestEnvStats", respectively, 
by simple print methods.
The functions print.htest and print.htestEnvStats are identical, and are a 
modification of the R function print.htest. 
The function print.htestEnvStats formats and prints the results of performing 
a hypothesis test that was performed using one of the functions listed in the help file for 
htestEnvStats.object.  This method is automatically called by the 
EnvStats generic function print when
given an object of class "htestEnvStats".
The function print.htest formats and prints the results of performing 
a hypothesis test that was performed using a function that returns an object of 
class "htest".  Because of the design of R, the EnvStats method 
print.htest is NOT automatically called when the user types a command at 
the command prompt that returns an object of class "htest".  
Instead, the EnvStats print function must be explicitly called to invoke 
print.htest.  See the EXAMPLES section below.
Usage
  ## S3 method for class 'htest'
print(x, ...)
  ## S3 method for class 'htestEnvStats'
print(x, ...)
Arguments
| x | an object of class "htest" or "htestEnvStats". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
The function print.htest is the "htest" method for the EnvStats generic 
function print, and the function print.htestEnvStats is the 
"htestEnvStats" method for the EnvStats generic function print.
These functions print null and alternative hypotheses, name of the test, estimated population 
parameter(s) involved in the null hypothesis, estimation method (if present), 
data name, sample size (if present), number of missing observations removed 
prior to performing the test (if present), value of the test statistic, 
parameters associated with the null distribution of the test statistic, 
p-value associated with the test statistic, and confidence interval for the 
population parameter (if present).
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Hypothesis Tests, htest.object, htestEnvStats.object,
print.
Examples
  # Create an object of class "htestEnvStats", then print it out. 
  #--------------------------------------------------------------
  htestEnvStats.obj <- chenTTest(EPA.02d.Ex.9.mg.per.L.vec, mu = 30)
  mode(htestEnvStats.obj) 
  #[1] "list" 
  class(htestEnvStats.obj) 
  #[1] "htestEnvStats" 
  names(htestEnvStats.obj) 
  # [1] "statistic"   "parameters"  "p.value"     "estimate"   
  # [5] "null.value"  "alternative" "method"      "sample.size"
  # [9] "data.name"   "bad.obs"     "interval" 
 
  htestEnvStats.obj 
  
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mean = 30
  #
  #Alternative Hypothesis:          True mean is greater than 30
  #
  #Test Name:                       One-sample t-Test
  #                                 Modified for
  #                                 Positively-Skewed Distributions
  #                                 (Chen, 1995)
  #
  #Estimated Parameter(s):          mean = 34.566667
  #                                 sd   = 27.330598
  #                                 skew =  2.365778
  #
  #Data:                            EPA.02d.Ex.9.mg.per.L.vec
  #
  #Sample Size:                     60
  #
  #Test Statistic:                  t = 1.574075
  #
  #Test Statistic Parameter:        df = 59
  #
  #P-values:                        z               = 0.05773508
  #                                 t               = 0.06040889
  #                                 Avg. of z and t = 0.05907199
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Based on z
  #
  #Confidence Interval Type:        Lower
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 29.82
  #                                 UCL =   Inf
  #============================================================================
  # Now create an object of class "htest" and note the difference in how it is 
  # printed out depending on whether or not you explicitly use the print() command.
  #--------------------------------------------------------------------------------
  htest.obj <- t.test(EPA.02d.Ex.9.mg.per.L.vec, mu = 30, alternative = "greater")
  class(htest.obj) 
  #[1] "htest" 
  names(htest.obj) 
  # [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
  # [6] "null.value"  "stderr"      "alternative" "method"      "data.name"
  htest.obj
  #        One Sample t-test
  #
  #data:  EPA.02d.Ex.9.mg.per.L.vec
  #t = 1.2943, df = 59, p-value = 0.1003
  #alternative hypothesis: true mean is greater than 30
  #95 percent confidence interval:
  # 28.67044      Inf
  #sample estimates:
  #mean of x 
  # 34.56667
  print(htest.obj)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mean = 30
  #
  #Alternative Hypothesis:          True mean is greater than 30
  #
  #Test Name:                       One Sample t-test
  #
  #Estimated Parameter(s):          mean of x = 34.56667
  #
  #Data:                            EPA.02d.Ex.9.mg.per.L.vec
  #
  #Test Statistic:                  t = 1.294273
  #
  #Test Statistic Parameter:        df = 59
  #
  #P-value:                         0.1003072
  #
  #95% Confidence Interval:         LCL = 28.67044
  #                                 UCL =      Inf
  #==========
  # Clean up
  #---------
  rm(htestEnvStats.obj, htest.obj)
Print Output of Hypothesis Tests Based on Censored Data
Description
Formats and prints the results of performing a hypothesis test based on censored data.  
This method is automatically called by print when given an 
object of class "htestCensored".  The names of the EnvStats functions 
that perform hypothesis tests based on censored data and that produce objects of 
class "htestCensored" are listed in the section Hypothesis Tests  
in the help file 
EnvStats Functions for Censored Data.  
Currently, the only function listed is 
twoSampleLinearRankTestCensored.
Usage
## S3 method for class 'htestCensored'
print(x, show.cen.levels = TRUE, ...)
Arguments
| x | an object of class "htestCensored". | 
| show.cen.levels | logical scalar indicating whether to print the censoring levels.  The default value is show.cen.levels=TRUE. | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "htestCensored" method for the generic function 
print.  
Prints null and alternative hypotheses, name of the test, censoring side, 
estimated population parameter(s) involved in the null hypothesis, 
estimation method (if present), 
data name, censoring variable, sample size (if present), 
percent of observations that are censored, 
number of missing observations removed prior to performing the test (if present), 
value of the test statistic, 
parameters associated with the null distribution of the test statistic, 
p-value associated with the test statistic, and confidence interval for the 
population parameter (if present).
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
Censored Data, htestCensored.object, 
print.
Print Output of Permutation Tests
Description
Formats and prints the results of performing a permutation test.  This method is 
automatically called by print when given an object of class 
"permutationTest".  Currently, the EnvStats functions that perform 
permutation tests and produce objects of class "permutationTest" are:
oneSamplePermutationTest,  
twoSamplePermutationTestLocation, and 
twoSamplePermutationTestProportion. 
Usage
  ## S3 method for class 'permutationTest'
print(x, ...)
Arguments
| x | an object of class "permutationTest". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "permutationTest" method for the generic function 
print.  Prints null and alternative hypotheses, 
name of the test, estimated population 
parameter(s) involved in the null hypothesis, estimation method (if present), 
data name, sample size (if present), number of missing observations removed 
prior to performing the test (if present), value of the test statistic, 
parameters associated with the null distribution of the test statistic, 
and p-value associated with the test statistic.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
permutationTest.object, oneSamplePermutationTest,  
twoSamplePermutationTestLocation,  
twoSamplePermutationTestProportion, Hypothesis Tests, 
print.
Print Summary Statistics
Description
Formats and prints the results of calling summaryStats or 
summaryFull.  This method is automatically called by 
print when given an object of class "summaryStats".
Usage
## S3 method for class 'summaryStats'
print(x, ...)
Arguments
| x | an object of class "summaryStats". | 
| ... | arguments that can be supplied to the print.default function. | 
Details
This is the "summaryStats" method for the generic function 
print.  Prints summary statistics.
Value
Invisibly returns the input x.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
See Also
summaryStats, summaryFull,   
summaryStats.object, print.
Minimal Detectable Difference Associated with a One- or Two-Sample Proportion Test
Description
Compute the minimal detectable difference associated with a one- or two-sample proportion test, given the sample size, power, and significance level.
Usage
  propTestMdd(n.or.n1, n2 = n.or.n1, p0.or.p2 = 0.5, alpha = 0.05, power = 0.95, 
    sample.type = "one.sample", alternative = "two.sided", 
    two.sided.direction = "greater", approx = TRUE, 
    correct = sample.type == "two.sample", warn = TRUE, 
    return.exact.list = TRUE, tol = 1e-07, maxiter = 1000)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is  | 
| p0.or.p2 | numeric vector of proportions.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power associated with 
the hypothesis test. The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  
The possible values are  | 
| two.sided.direction | character string indicating the direction (positive or negative) for the 
minimal detectable difference when  | 
| approx | logical scalar indicating whether to compute the power based on the normal 
approximation to the binomial distribution.  The default value is  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| warn | logical scalar indicating whether to issue a warning.  The default value is  | 
| return.exact.list | logical scalar relevant to the case when  | 
| tol | numeric scalar passed to the  | 
| maxiter | integer passed to the  | 
Details
If the arguments n.or.n1, n2, p0.or.p2, alpha, and 
power are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
One-Sample Case (sample.type="one.sample") 
The help file for propTestPower gives references that explain 
how the power of the one-sample proportion test is computed based on the values of 
p_0 (the hypothesized value for p, the probability of “success”), 
p (the true value of p), the sample size n, and the Type 
I error level \alpha.  The function propTestMdd computes the value 
of the minimal detectable difference p - p_0 for specified values of 
sample size, power, and Type I error level by calling the uniroot 
function to perform a search.
Two-Sample Case (sample.type="two.sample") 
The help file for propTestPower gives references that explain 
how the power of the two-sample proportion test is computed based on the values of 
p_1 (the value of the probability of “success” for group 1), 
p_2 (the value of the probability of “success” for group 2), 
the sample sizes for groups 1 and 2 (n_1 and n_2), and the Type 
I error level \alpha.  The function propTestMdd computes the value 
of the minimal detectable difference p_1 - p_2 for specified values of 
sample size, power, and Type I error level by calling the uniroot 
function to perform a search.
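To make this search concrete, here is a minimal sketch (not from the package source; the bracketing interval and the use of warn=FALSE are illustrative choices) that recovers the one-sample minimal detectable difference by applying uniroot directly to propTestPower:
  # Minimal sketch: find the value of p - p0 at which the approximate power 
  # equals 0.9 for n = 50.
  f <- function(delta) {
    propTestPower(n.or.n1 = 50, p.or.p1 = 0.5 + delta, p0.or.p2 = 0.5, 
      warn = FALSE) - 0.9
  }
  uniroot(f, interval = c(0.001, 0.45), tol = 1e-7)$root
  # Should essentially match propTestMdd(n.or.n1 = 50, power = 0.9), 
  # which rounds to 0.22 (see the Examples below).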
Value
Approximate Test (approx=TRUE).
numeric vector of minimal detectable differences.
Exact Test (approx=FALSE).
If return.exact.list=FALSE, propTestMdd returns a numeric vector of 
minimal detectable differences.
If return.exact.list=TRUE, propTestMdd returns a list with the 
following components:
| delta | numeric vector of minimal detectable differences. | 
| power | numeric vector of powers. | 
| alpha | numeric vector containing the true significance levels. 
Because of the discrete nature of the binomial distribution, the true significance 
levels usually do not equal the significance level supplied by the user in the 
argument  | 
| q.critical.lower | numeric vector of lower critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is less than or equal to these values, 
the null hypothesis is rejected. (Not present if  | 
| q.critical.upper | numeric vector of upper critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is greater than these values, 
the null hypothesis is rejected. (Not present if  | 
Note
See the help file for propTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for propTestPower.
See Also
propTestPower, propTestN, 
plotPropTestDesign, prop.test, binom.test.
Examples
  # Look at how the minimal detectable difference of the one-sample 
  # proportion test increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  mdd <- propTestMdd(n.or.n1 = 50, power = seq(0.5, 0.9, by=0.1)) 
  round(mdd, 2) 
  #[1] 0.14 0.16 0.17 0.19 0.22
  #----------
  # Repeat the last example, but compute the minimal detectable difference 
  # based on the exact test instead of the approximation.  Note that with a 
  # sample size of 50, the largest significance level less than or equal to 
  # 0.05 for the two-sided alternative is 0.03.
  mdd.list <- propTestMdd(n.or.n1 = 50, power = seq(0.5, 0.9, by = 0.1), 
    approx = FALSE) 
  lapply(mdd.list, round, 2) 
  #$delta
  #[1] 0.15 0.17 0.18 0.20 0.23
  #
  #$power
  #[1] 0.5 0.6 0.7 0.8 0.9
  #
  #$alpha
  #[1] 0.03 0.03 0.03 0.03 0.03
  #
  #$q.critical.lower
  #[1] 17 17 17 17 17
  #
  #$q.critical.upper
  #[1] 32 32 32 32 32
  #==========
  # Look at how the minimal detectable difference for the two-sample 
  # proportion test decreases with increasing sample sizes.  Note that for 
  # the specified significance level, power, and true proportion in group 2, 
  # no minimal detectable difference is attainable for a sample size of 10 in 
  # each group.
  seq(10, 50, by=10) 
  #[1] 10 20 30 40 50 
  propTestMdd(n.or.n1 = seq(10, 50, by = 10), p0.or.p2 = 0.5, 
    sample.type = "two", alternative="greater") 
  #[1]        NA 0.4726348 0.4023564 0.3557916 0.3221412
  #Warning messages:
  #1: In propTestMdd(n.or.n1 = seq(10, 50, by = 10), p0.or.p2 = 0.5, 
  #     sample.type = "two",  :
  #  Elements with a missing value (NA) indicate no attainable minimal detectable 
  #    difference for the given values of 'n1', 'n2', 'p2', 'alpha', and 'power'
  #2: In propTestMdd(n.or.n1 = seq(10, 50, by = 10), p0.or.p2 = 0.5, 
  #      sample.type = "two",  :
  #  The sample sizes 'n1' and 'n2' are too small, relative to the computed value 
  #    of 'p1' and the given value of 'p2', for the normal approximation to work 
  #    well for the following element indices:
  #         2 3 
  #----------
  # Look at how the minimal detectable difference for the two-sample proportion 
  # test decreases with increasing values of Type I error:
  mdd <- propTestMdd(n.or.n1 = 100, n2 = 120, p0.or.p2 = 0.4, sample.type = "two", 
     alpha = c(0.01, 0.05, 0.1, 0.2)) 
  round(mdd, 2) 
  #[1] 0.29 0.25 0.23 0.20
  #----------
  # Clean up
  #---------
  rm(mdd, mdd.list) 
  #==========
  # Modifying the example on pages 8-5 to 8-7 of USEPA (1989b), determine the 
  # minimal detectable difference to detect a difference in the proportion of 
  # detects of cadmium between the background and compliance wells.  Set the 
  # compliance well to "group 1" and the background well to "group 2".  Assume 
  # the true probability of a "detect" at the background well is 1/3, use a 
  # 5% significance level, use 80%, 90%, and 95% power, use the given sample 
  # sizes of 64 observations at the compliance well and 24 observations at the 
  # background well, and use the upper one-sided alternative (probability of a 
  # "detect" at the compliance well is greater than the probability of a "detect" 
  # at the background well). 
  # (The data are stored in EPA.89b.cadmium.df.)  
  #
  # Note that the minimal detectable difference increases from 0.32 to 0.37 to 0.40 as 
  # the required power increases from 80% to 90% to 95%.  Thus, in order to detect a 
  # difference in probability of detection between the compliance and background 
  # wells, the probability of detection at the compliance well must be 0.65, 0.70, 
  # or 0.74 (depending on the required power).
  EPA.89b.cadmium.df
  #   Cadmium.orig Cadmium Censored  Well.type
  #1           0.1   0.100    FALSE Background
  #2          0.12   0.120    FALSE Background
  #3           BDL   0.000     TRUE Background
  # ..........................................
  #86          BDL   0.000     TRUE Compliance
  #87          BDL   0.000     TRUE Compliance
  #88          BDL   0.000     TRUE Compliance
  p.hat.back <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Background"])) 
  p.hat.back 
  #[1] 0.3333333 
  p.hat.comp <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Compliance"])) 
  p.hat.comp 
  #[1] 0.375 
  n.back <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Background"))
  n.back 
  #[1] 24 
  n.comp <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Compliance"))
  n.comp 
  #[1] 64 
  mdd <- propTestMdd(n.or.n1 = n.comp, n2 = n.back, 
    p0.or.p2 = p.hat.back, power = c(.80, .90, .95), 
    sample.type = "two", alternative = "greater") 
  round(mdd, 2) 
  #[1] 0.32 0.37 0.40 
  round(mdd + p.hat.back, 2) 
  #[1] 0.65 0.70 0.73
  #----------
  # Clean up
  #---------
  rm(p.hat.back, p.hat.comp, n.back, n.comp, mdd)
Compute Sample Size Necessary to Achieve a Specified Power for a One- or Two-Sample Proportion Test
Description
Compute the sample size necessary to achieve a specified power for a one- or two-sample proportion test, given the true proportion(s) and significance level.
Usage
  propTestN(p.or.p1, p0.or.p2, alpha = 0.05, power = 0.95, 
    sample.type = "one.sample", alternative = "two.sided", 
    ratio = 1, approx = TRUE, 
    correct = sample.type == "two.sample", 
    round.up = TRUE, warn = TRUE, return.exact.list = TRUE, 
    n.min = 2, n.max = 10000, tol.alpha = 0.1 * alpha, 
    tol = 1e-7, maxiter = 1000)
Arguments
| p.or.p1 | numeric vector of proportions.  When  | 
| p0.or.p2 | numeric vector of proportions.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power associated with 
the hypothesis test. The default value is  | 
| sample.type | character string indicating whether to compute sample size based on a one-sample or 
two-sample hypothesis test.   | 
| alternative | character string indicating the kind of alternative hypothesis.  
The possible values are  | 
| ratio | numeric vector indicating the ratio of sample size in group 2 to sample size 
in group 1 ( | 
| approx | logical scalar indicating whether to compute the sample size based on the normal 
approximation to the binomial distribution.  The default value is  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample size(s) 
to the next smallest integer.  The default value is  | 
| warn | logical scalar indicating whether to issue a warning.  The default value is  | 
| return.exact.list | logical scalar relevant to the case when  | 
| n.min | integer relevant to the case when  | 
| n.max | integer relevant to the case when  | 
| tol.alpha | numeric vector relevant to the case when  | 
| tol | numeric scalar relevant to the case when  | 
| maxiter | integer relevant to the case when  | 
Details
If the arguments p.or.p1, p0.or.p2, alpha, power, ratio, 
and tol.alpha are not all the same length, they are replicated to be the same length 
as the length of the longest argument.
The computed sample size is based on the difference p.or.p1 - p0.or.p2.
One-Sample Case (sample.type="one.sample").  
- approx=TRUE.
- When sample.type="one.sample" and approx=TRUE, sample size is computed based on the test that uses the normal approximation to the binomial distribution; see the help file for prop.test. The formula for this test and the associated power is presented in standard statistics texts, including Zar (2010, pp. 534-537, 539-541). These equations can be inverted to solve for the sample size, given a specified power, significance level, hypothesized proportion, and true proportion.
- approx=FALSE.
- When sample.type="one.sample" and approx=FALSE, sample size is computed based on the exact binomial test; see the help file for binom.test. The formula for this test and its associated power is presented in standard statistics texts, including Zar (2010, pp. 532-534, 539) and Millard and Neerchal (2001, pp. 385-386, 504-506). The formula for the power involves five quantities: the hypothesized proportion (p_0), the true proportion (p), the significance level (alpha), the power, and the sample size (n). In this case the function propTestN uses a search algorithm to determine the required sample size to attain a specified power, given the values of the hypothesized and true proportions and the significance level.
Two-Sample Case (sample.type="two.sample"). 
When sample.type="two.sample", sample size is computed based on the test that uses the 
normal approximation to the binomial distribution; 
see the help file for prop.test.  
The formula for this test and its associated power is presented in standard statistics texts, 
including Zar (2010, pp. 549-550, 552-553) and 
Millard and Neerchal (2001, pp. 443-445, 508-510).  
These equations can be inverted to solve for the sample size, given a specified power, 
significance level, true proportions, and ratio of sample size in group 2 to sample size in 
group 1.
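As a rough check on this inversion, the following sketch (not from the package source) computes the approximate one-sample sample size directly, assuming the standard textbook formula without a continuity correction (the default for the one-sample case); it reproduces the value 62 that appears in the Examples below for p = 0.7, p0 = 0.5, alpha = 0.05 (two-sided), and power = 0.9.
  # Hedged sketch of the usual normal-approximation formula (cf. Zar, 2010):
  #   n = { [z_(1-alpha/2) sqrt(p0 (1-p0)) + z_(power) sqrt(p (1-p))] / (p - p0) }^2
  p <- 0.7; p0 <- 0.5; alpha <- 0.05; power <- 0.9
  num <- qnorm(1 - alpha/2) * sqrt(p0 * (1 - p0)) + 
    qnorm(power) * sqrt(p * (1 - p))
  ceiling((num / (p - p0))^2)
  #[1] 62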
Value
Approximate Test (approx=TRUE).
When sample.type="one.sample", or sample.type="two.sample" 
and ratio=1 (i.e., equal sample sizes for each group), propTestN 
returns a numeric vector of sample sizes.  When 
sample.type="two.sample" and at least one element of ratio is 
greater than 1, propTestN returns a list with two components called 
n1 and n2, specifying the sample sizes for each group.
Exact Test (approx=FALSE).
If return.exact.list=FALSE, propTestN returns a numeric vector of sample sizes.
If return.exact.list=TRUE, propTestN returns a list with the following components:
| n | numeric vector of sample sizes. | 
| power | numeric vector of powers. | 
| alpha | numeric vector containing the true significance levels. 
Because of the discrete nature of the binomial distribution, the true significance 
levels usually do not equal the significance level supplied by the user in the 
argument  | 
| q.critical.lower | numeric vector of lower critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is less than or equal to these values, 
the null hypothesis is rejected. (Not present if  | 
| q.critical.upper | numeric vector of upper critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is greater than these values, 
the null hypothesis is rejected. (Not present if  | 
Note
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, 
Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any one trial is independent 
of any other trial, and that the probability of “success”, p, is the same on each trial.  
A binomial discrete random variable X is the number of "successes" in n independent 
trials.  A special case of the binomial distribution occurs when n=1, in which case X 
is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model the proportion of times a chemical concentration exceeds a set standard in a given period of time (e.g., Gilbert, 1987, p.143), or to compare the proportion of detects in a compliance well vs. a background well (e.g., USEPA, 1989b, Chapter 8, p.3-7).
In the course of designing a sampling program, an environmental scientist may wish to determine the 
relationship between sample size, power, significance level, and the difference between the 
hypothesized and true proportions if one of the objectives of the sampling program is to 
determine whether a proportion differs from a specified level or two proportions differ from each other.  
The functions propTestPower, propTestN, propTestMdd, and 
plotPropTestDesign can be used to investigate these relationships for the case of 
binomial proportions.
Studying the two-sample proportion test, Haseman (1978) found that the formulas used to estimate the power that do not incorporate the continuity correction tend to underestimate the power. Casagrande, Pike, and Smith (1978) found that the formulas that do incorporate the continuity correction provide an excellent approximation.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapter 15.
Casagrande, J.T., M.C. Pike, and P.G. Smith. (1978). An Improved Approximation Formula for Calculating Sample Sizes for Comparing Two Binomial Distributions. Biometrics 34, 483-486.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Haseman, J.K. (1978). Exact Sample Sizes for Use with the Fisher-Irwin Test for 2x2 Tables. Biometrics 34, 106-109.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-Plus. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
propTestPower, propTestMdd, plotPropTestDesign, 
prop.test, binom.test.
Examples
  # Look at how the required sample size of the one-sample 
  # proportion test with a two-sided alternative and Type I error
  # set to 5% increases with increasing power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  propTestN(p.or.p1 = 0.7, p0.or.p2 = 0.5, 
    power = seq(0.5, 0.9, by = 0.1)) 
  #[1] 25 31 38 47 62
  #----------
  # Repeat the last example, but compute the sample size based on 
  # the exact test instead of the approximation.  Note that because
  # we require the actual Type I error (alpha) to be within 
  # 10% of the supplied value of alpha (which is 0.05 by default),
  # due to the discrete nature of the exact binomial test 
  # we end up with more power than we specified.
  n.list <- propTestN(p.or.p1 = 0.7,  p0.or.p2 = 0.5, 
    power = seq(0.5, 0.9, by = 0.1), approx = FALSE) 
  lapply(n.list, round, 3) 
  #$n
  #[1] 37 37 44 51 65
  #
  #$power
  #[1] 0.698 0.698 0.778 0.836 0.910
  #
  #$alpha
  #[1] 0.047 0.047 0.049 0.049 0.046
  #
  #$q.critical.lower
  #[1] 12 12 15 18 24
  #
  #$q.critical.upper
  #[1] 24 24 28 32 40
  #----------
  # Using the example above, see how the sample size changes 
  # if we allow the Type I error to deviate by more than 10 percent 
  # of the value of alpha (i.e., by more than 0.005).  
  n.list <- propTestN(p.or.p1 = 0.7,  p0.or.p2 = 0.5, 
    power = seq(0.5, 0.9, by = 0.1), approx = FALSE, tol.alpha = 0.01) 
  lapply(n.list, round, 3)
  #$n
  #[1] 25 35 42 49 65
  #
  #$power
  #[1] 0.512 0.652 0.743 0.810 0.910
  #
  #$alpha
  #[1] 0.043 0.041 0.044 0.044 0.046
  #
  #$q.critical.lower
  #[1]  7 11 14 17 24
  #
  #$q.critical.upper
  #[1] 17 23 27 31 40
  #----------
  
  # Clean up
  #---------
  rm(n.list)
  #==========
  # Look at how the required sample size for the two-sample 
  # proportion test decreases with increasing difference between 
  # the two population proportions:
  seq(0.4, 0.1, by = -0.1) 
  #[1] 0.4 0.3 0.2 0.1 
  propTestN(p.or.p1 = seq(0.4, 0.1, by = -0.1), 
    p0.or.p2 = 0.5, sample.type = "two") 
  #[1] 661 163 70 36 
  #Warning message:
  #In propTestN(p.or.p1 = seq(0.4, 0.1, by = -0.1), p0.or.p2 = 0.5,  :
  #  The computed sample sizes 'n1' and 'n2' are too small, 
  #  relative to the given values of 'p1' and 'p2', for the normal 
  #  approximation to work well for the following element indices:
  #         4 
   
  #----------
  # Look at how the required sample size for the two-sample 
  # proportion test decreases with increasing values of Type I error:
  propTestN(p.or.p1 = 0.7, p0.or.p2 = 0.5, 
    sample.type = "two", 
    alpha = c(0.001, 0.01, 0.05, 0.1)) 
  #[1] 299 221 163 137
  #==========
  # Modifying the example on pages 8-5 to 8-7 of USEPA (1989b), 
  # determine the required sample size to detect a difference in the 
  # proportion of detects of cadmium between the background and 
  # compliance wells. Set the compliance well to "group 1" and 
  # the background well to "group 2".  Assume the true probability 
  # of a "detect" at the background well is 1/3, set the probability 
  # of a "detect" at the compliance well to 0.4 and 0.5, use a 5% 
  # significance level and 95% power, and use the upper 
  # one-sided alternative (probability of a "detect" at the compliance 
  # well is greater than the probability of a "detect" at the background 
  # well).  (The original data are stored in EPA.89b.cadmium.df.) 
  #
  # Note that the required sample size decreases from about 
  # 1160 at each well to about 200 at each well as the difference in 
  # proportions changes from (0.4 - 1/3) to (0.5 - 1/3), but both of 
  # these sample sizes are enormous compared to the number of samples 
  # usually collected in the field.
  EPA.89b.cadmium.df
  #   Cadmium.orig Cadmium Censored  Well.type
  #1           0.1   0.100    FALSE Background
  #2          0.12   0.120    FALSE Background
  #3           BDL   0.000     TRUE Background
  # ..........................................
  #86          BDL   0.000     TRUE Compliance
  #87          BDL   0.000     TRUE Compliance
  #88          BDL   0.000     TRUE Compliance
  p.hat.back <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Background"])) 
  p.hat.back 
  #[1] 0.3333333 
  p.hat.comp <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Compliance"])) 
  p.hat.comp 
  #[1] 0.375 
  n.back <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Background"))
  n.back 
  #[1] 24 
  n.comp <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Compliance"))
  n.comp 
  #[1] 64 
  propTestN(p.or.p1 = c(0.4, 0.50), p0.or.p2 = p.hat.back, 
    alt="greater", sample.type="two") 
  #[1] 1159 199
  #----------
  # Clean up
  #---------
  rm(p.hat.back, p.hat.comp, n.back, n.comp) 
Compute the Power of a One- or Two-Sample Proportion Test
Description
Compute the power of a one- or two-sample proportion test, given the sample size(s), true proportion(s), and significance level.
Usage
  propTestPower(n.or.n1, p.or.p1 = 0.5, n2 = n.or.n1, 
    p0.or.p2 = 0.5, alpha = 0.05, sample.type = "one.sample", 
    alternative = "two.sided", approx = TRUE, 
    correct = sample.type == "two.sample", warn = TRUE, 
    return.exact.list = TRUE)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| p.or.p1 | numeric vector of proportions.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is  | 
| p0.or.p2 | numeric vector of proportions.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  
The possible values are  | 
| approx | logical scalar indicating whether to compute the power based on the normal 
approximation to the binomial distribution.  The default value is  | 
| correct | logical scalar indicating whether to use the continuity correction when  | 
| warn | logical scalar indicating whether to issue a warning.  The default value is  | 
| return.exact.list | logical scalar relevant to the case when  | 
Details
If the arguments n.or.n1, p.or.p1, n2, p0.or.p2, and 
alpha are not all the same length, they are replicated to be the same length 
as the length of the longest argument.
The power is based on the difference p.or.p1 - p0.or.p2.
One-Sample Case (sample.type="one.sample").  
- approx=TRUE
- When sample.type="one.sample" and approx=TRUE, power is computed based on the test that uses the normal approximation to the binomial distribution; see the help file for prop.test. The formula for this test and its associated power is presented in most standard statistics texts, including Zar (2010, pp. 534-537, 539-541).
- approx=FALSE
- When sample.type="one.sample" and approx=FALSE, power is computed based on the exact binomial test; see the help file for binom.test. The formula for this test and its associated power is presented in most standard statistics texts, including Zar (2010, pp. 532-534, 539) and Millard and Neerchal (2001, pp. 385-386, 504-506).
Two-Sample Case (sample.type="two.sample"). 
When sample.type="two.sample", power is computed based on the test that uses the 
normal approximation to the binomial distribution; 
see the help file for prop.test.  
The formula for this test and its associated power is presented in standard statistics texts, 
including Zar (2010, pp. 549-550, 552-553) and 
Millard and Neerchal (2001, pp. 443-445, 508-510).
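For intuition, here is a small sketch (not from the package source; approx.power is a hypothetical helper) of the one-sample two-sided power under the normal approximation without a continuity correction. Assuming the usual textbook form of the calculation, it reproduces the values 0.43, 0.60, 0.73, and 0.83 shown in the Examples below for n = 20, 30, 40, 50 with p = 0.7 and p0 = 0.5.
  # Hedged sketch of the approximate two-sided power (no continuity correction).
  approx.power <- function(n, p, p0, alpha = 0.05) {
    z   <- qnorm(1 - alpha/2)
    se0 <- sqrt(p0 * (1 - p0) / n)   # standard error under the null
    se1 <- sqrt(p  * (1 - p)  / n)   # standard error under the alternative
    pnorm(((p - p0) - z * se0) / se1) + pnorm((-(p - p0) - z * se0) / se1)
  }
  round(approx.power(n = c(20, 30, 40, 50), p = 0.7, p0 = 0.5), 2)
  #[1] 0.43 0.60 0.73 0.83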
Value
By default, propTestPower returns a numeric vector of powers.  
For the one-sample proportion test (sample.type="one.sample"), 
when approx=FALSE and 
return.exact.list=TRUE, propTestPower 
returns a list with the following components:
| power | numeric vector of powers. | 
| alpha | numeric vector containing the true significance levels. 
Because of the discrete nature of the binomial distribution, the true significance 
levels usually do not equal the significance level supplied by the user in the 
argument  | 
| q.critical.lower | numeric vector of lower critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is less than or equal to these values, 
the null hypothesis is rejected. (Not present if  | 
| q.critical.upper | numeric vector of upper critical values for rejecting the null 
hypothesis.  If the observed number of "successes" is greater than these values, 
the null hypothesis is rejected. (Not present if  | 
Note
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, 
Heads-Tails, etc.) outcomes.  It is assumed that the outcome of any one trial is independent 
of any other trial, and that the probability of “success”, p, is the same on each trial.  
A binomial discrete random variable X is the number of "successes" in n independent 
trials.  A special case of the binomial distribution occurs when n=1, in which case X 
is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model the proportion of times a chemical concentration exceeds a set standard in a given period of time (e.g., Gilbert, 1987, p.143), or to compare the proportion of detects in a compliance well vs. a background well (e.g., USEPA, 1989b, Chapter 8, p.3-7).
In the course of designing a sampling program, an environmental scientist may wish to determine the 
relationship between sample size, power, significance level, and the difference between the 
hypothesized and true proportions if one of the objectives of the sampling program is to 
determine whether a proportion differs from a specified level or two proportions differ from each other.  
The functions propTestPower, propTestN, propTestMdd, and 
plotPropTestDesign can be used to investigate these relationships for the case of 
binomial proportions.
Studying the two-sample proportion test, Haseman (1978) found that the formulas used to estimate the power that do not incorporate the continuity correction tend to underestimate the power. Casagrande, Pike, and Smith (1978) found that the formulas that do incorporate the continuity correction provide an excellent approximation.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapter 15.
Casagrande, J.T., M.C. Pike, and P.G. Smith. (1978). An Improved Approximation Formula for Calculating Sample Sizes for Comparing Two Binomial Distributions. Biometrics 34, 483-486.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Haseman, J.K. (1978). Exact Sample Sizes for Use with the Fisher-Irwin Test for 2x2 Tables. Biometrics 34, 106-109.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-Plus. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
propTestN, propTestMdd, plotPropTestDesign, 
prop.test, binom.test.
Examples
  # Look at how the power of the one-sample proportion test 
  # increases with increasing sample size:
  seq(20, 50, by=10) 
  #[1] 20 30 40 50 
  power <- propTestPower(n.or.n1 = seq(20, 50, by=10), 
    p.or.p1 = 0.7, p0 = 0.5) 
  round(power, 2) 
  #[1] 0.43 0.60 0.73 0.83
  #----------
  # Repeat the last example, but compute the power based on 
  # the exact test instead of the approximation. 
  # Note that the significance level varies with sample size and 
  # never attains the requested level of 0.05.
  prop.test.list <- propTestPower(n.or.n1 = seq(20, 50, by=10), 
    p.or.p1 = 0.7, p0 = 0.5, approx=FALSE) 
  lapply(prop.test.list, round, 2) 
  #$power: 
  #[1] 0.42 0.59 0.70 0.78 
  #
  #$alpha: 
  #[1] 0.04 0.04 0.04 0.03 
  #
  #$q.critical.lower: 
  #[1] 5 9 13 17 
  #
  #$q.critical.upper: 
  #[1] 14 20 26 32
  #==========
  # Look at how the power of the two-sample proportion test 
  # increases with increasing difference between the two 
  # population proportions:
  seq(0.5, 0.1, by=-0.1) 
  #[1] 0.5 0.4 0.3 0.2 0.1 
  power <- propTestPower(30, sample.type = "two", 
    p.or.p1 = seq(0.5, 0.1, by=-0.1)) 
  #Warning message:
  #In propTestPower(30, sample.type = "two", p.or.p1 = seq(0.5, 0.1,  :
  #The sample sizes 'n1' and 'n2' are too small, relative to the given 
  # values of 'p1' and 'p2', for the normal approximation to work well 
  # for the following element indices:
  #         5 
  round(power, 2) 
  #[1] 0.05 0.08 0.26 0.59 0.90
  #----------
  # Look at how the power of the two-sample proportion test 
  # increases with increasing values of Type I error:
  power <- propTestPower(30, sample.type = "two", 
    p.or.p1 = 0.7, 
    alpha = c(0.001, 0.01, 0.05, 0.1)) 
  round(power, 2) 
  #[1] 0.02 0.10 0.26 0.37
  #==========
  # Clean up
  #---------
  rm(power, prop.test.list)
  #==========
  # Modifying the example on pages 8-5 to 8-7 of USEPA (1989b), 
  # determine how adding another 20 observations to the background 
  # well to increase the sample size from 24 to 44 will affect the 
  # power of detecting a difference in the proportion of detects of 
  # cadmium between the background and compliance wells.  Set the 
  # compliance well to "group 1" and set the background well to 
  # "group 2".  Assume the true probability of a "detect" at the 
  # background well is 1/3, set the probability of a "detect" at the 
  # compliance well to 0.4, use a 5% significance level, and use the 
  # upper one-sided alternative (probability of a "detect" at the 
  # compliance well is greater than the probability of a "detect" at 
  # the background well). 
  # (The original data are stored in EPA.89b.cadmium.df.) 
  #
  # Note that the power does increase (from 9% to 12%), but is relatively 
  # very small.
  EPA.89b.cadmium.df
  #   Cadmium.orig Cadmium Censored  Well.type
  #1           0.1   0.100    FALSE Background
  #2          0.12   0.120    FALSE Background
  #3           BDL   0.000     TRUE Background
  # ..........................................
  #86          BDL   0.000     TRUE Compliance
  #87          BDL   0.000     TRUE Compliance
  #88          BDL   0.000     TRUE Compliance
  p.hat.back <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Background"])) 
  p.hat.back 
  #[1] 0.3333333 
  p.hat.comp <- with(EPA.89b.cadmium.df, 
    mean(!Censored[Well.type=="Compliance"])) 
  p.hat.comp 
  #[1] 0.375 
  n.back <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Background")) 
  n.back 
  #[1] 24 
  n.comp <- with(EPA.89b.cadmium.df, 
    sum(Well.type == "Compliance")) 
  n.comp 
  #[1] 64 
  propTestPower(n.or.n1 = n.comp, 
    p.or.p1 = 0.4, 
    n2 = c(n.back, 44), p0.or.p2 = p.hat.back, 
    alt="greater", sample.type="two") 
  #[1] 0.08953013 0.12421135
  #----------
  # Clean up
  #---------
  rm(p.hat.back, p.hat.comp, n.back, n.comp)
Estimate Probability-Weighted Moments
Description
Estimate the 1jk'th probability-weighted moment from a random sample, 
where either j = 0 or k = 0 (or both).
Usage
  pwMoment(x, j = 0, k = 0, method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), na.rm = FALSE)
Arguments
| x | numeric vector of observations. | 
| j,k | non-negative integers specifying the order of the moment. | 
| method | character string specifying what method to use to compute the 
probability-weighted moment.  The possible values are  | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the 
plotting positions when  | 
| na.rm | logical scalar indicating whether to remove missing values from  | 
Details
The definition of a probability-weighted moment, introduced by 
Greenwood et al. (1979), is as follows.  Let X denote a random variable 
with cdf F, and let x(p) denote the p'th quantile of the 
distribution.  Then the ijk'th probability-weighted moment is given by:
M(i, j, k) = E[X^i F^j (1 - F)^k] = \int^1_0 [x(F)]^i F^j (1 - F)^k \, dF
where i, j, and k are real numbers.  Note that if i is a 
nonnegative integer, then M(i, 0, 0) is the conventional i'th moment 
about the origin.
Greenwood et al. (1979) state that in the special case where i, j, and 
k are nonnegative integers:
M(i, j, k) = B(j + 1, k + 1) E[X^i_{j+1, j+k+1}]
where B(a, b) denotes the beta function evaluated at 
a and b, and 
E[X^i_{j+1, j+k+1}]
denotes the i'th moment about the origin of the (j + 1)'th order 
statistic for a sample of size (j + k + 1). In particular, 
M(1, 0, k) = \frac{1}{k+1} E[X_{1, k+1}]
M(1, j, 0) = \frac{1}{j+1} E[X_{j+1, j+1}]
where
E[X_{1, k+1}]
denotes the expected value of the first order statistic (i.e., the minimum) in a 
sample of size (k + 1), and 
E[X_{j+1, j+1}]
denotes the expected value of the (j+1)'th order statistic (i.e., the maximum) 
in a sample of size (j+1).
Unbiased Estimators (method="unbiased") 
Landwehr et al. (1979) show that, given a random sample of n values from 
some arbitrary distribution, an unbiased, distribution-free, and parameter-free 
estimator of M(1, 0, k) is given by:
\hat{M}(1, 0, k) = \frac{1}{n} \sum^{n-k}_{i=1} x_{i,n} \frac{{n-i \choose k}}{{n-1 \choose k}}
where the quantity x_{i,n} denotes the i'th order statistic in the 
random sample of size n.  Hosking et al. (1985) note that this estimator is 
closely related to U-statistics (Hoeffding, 1948; Lehmann, 1975, pp. 362-371).  
Hosking et al. (1985) note that an unbiased, distribution-free, and parameter-free 
estimator of M(1, j, 0) is given by:
\hat{M}(1, j, 0) = \frac{1}{n} \sum^n_{i=j+1} x_{i,n} \frac{{i-1 \choose j}}{{n-1 \choose j}}
Plotting-Position Estimators (method="plotting.position") 
Hosking et al. (1985) propose alternative estimators of M(1, 0, k) and 
M(1, j, 0) based on plotting positions:
\hat{M}(1, 0, k) = \frac{1}{n} \sum^n_{i=1} (1 - p_{i,n})^k x_{i,n}
\hat{M}(1, j, 0) = \frac{1}{n} \sum^n_{i=1} p_{i,n}^j x_{i,n}
where
p_{i,n} = \hat{F}(x_{i,n})
denotes the plotting position of the i'th order statistic in the random 
sample of size n, that is, a distribution-free estimate of the cdf of 
X evaluated at the i'th order statistic.  Typically, plotting 
positions have the form:
p_{i,n} = \frac{i-a}{n+b}
where b > -a > -1.  For this form of plotting position, the 
plotting-position estimators are asymptotically equivalent to the U-statistic 
estimators.
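To make the estimators above concrete, the following sketch (an illustration only, not the EnvStats source code) computes \hat{M}(1, 0, k) by hand with both the unbiased formula and the plotting-position formula, assuming the default plotting-position constants a = 0.35 and b = 0, and compares the results with pwMoment:
  # Minimal sketch of the two estimators of M(1, 0, k) defined above
  # (uses the default plotting-position constants a = 0.35, b = 0)
  pwm.hand <- function(x, k = 0, method = "unbiased", a = 0.35, b = 0) {
    x <- sort(x)                        # order statistics x_(1), ..., x_(n)
    n <- length(x)
    i <- seq_len(n)
    if (method == "unbiased") {
      # (1/n) * sum_{i=1}^{n-k} x_(i) * choose(n-i, k) / choose(n-1, k)
      w <- choose(n - i, k) / choose(n - 1, k)
      sum(w[1:(n - k)] * x[1:(n - k)]) / n
    } else {
      # (1/n) * sum_{i=1}^{n} (1 - p_i)^k * x_(i),  p_i = (i - a)/(n + b)
      p <- (i - a) / (n + b)
      sum((1 - p)^k * x) / n
    }
  }
  set.seed(250) 
  dat <- rgevd(20, location = 10, scale = 2, shape = 0.25) 
  pwm.hand(dat, k = 1)                                # should agree with
  pwMoment(dat, k = 1)                                # the unbiased estimate
  pwm.hand(dat, k = 1, method = "plotting.position")  # should agree with
  pwMoment(dat, k = 1, method = "plotting.position")  # the plotting-position estimate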
Value
A numeric scalar: the value of the 1jk'th probability-weighted moment 
as defined by Greenwood et al. (1979).
Note
Greenwood et al. (1979) introduced the concept of probability-weighted moments 
as a tool to derive estimates of distribution parameters for distributions that 
can be expressed (perhaps only) in inverse form.  The term “inverse form” 
simply means that instead of characterizing the distribution by the formula for 
its cumulative distribution function (cdf), the distribution is characterized by 
the formula for the p'th quantile (0 \le p \le 1).
For distributions that can only be expressed in inverse form, moment estimates of their parameters are not available, and maximum likelihood estimates are not easy to compute. Greenwood et al. (1979) show that in these cases, it is often possible to derive expressions for the distribution parameters in terms of probability-weighted moments. Thus, for these cases the distribution parameters can be estimated based on the sample probability-weighted moments, which are fairly easy to compute. Furthermore, for distributions whose parameters can be expressed as functions of conventional moments, the method of probability-weighted moments provides an alternative to method of moments and maximum likelihood estimators.
Landwehr et al. (1979) use the method of probability-weighted moments to estimate the parameters of the Type I Extreme Value (Gumbel) distribution.
Hosking et al. (1985) use the method of probability-weighted moments to estimate the parameters of the generalized extreme value distribution.
Hosking (1990) and Hosking and Wallis (1995) show the relationship between probability-weighted moments and L-moments.
Hosking and Wallis (1995) recommend using the unbiased estimators of probability-weighted moments for almost all applications.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.
Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal Distribution. Annals of Mathematical Statistics 19, 293–325.
Hosking, J.R.M. (1990). L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society, Series B 52(1), 105–124.
Hosking, J.R.M., and J.R. Wallis (1995). A Comparison of Unbiased and Plotting-Position Estimators of L Moments. Water Resources Research 31(8), 2019–2025.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, pp.362-371.
See Also
Examples
  # Generate 20 observations from a generalized extreme value distribution 
  # with parameters location=10, scale=2, and shape=.25, then compute the 
  # 0'th, 1'st and 2'nd probability-weighted moments. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rgevd(20, location = 10, scale = 2, shape = 0.25) 
  pwMoment(dat) 
  #[1] 10.59556
 
  pwMoment(dat, 1) 
  #[1] 5.798481
  
  pwMoment(dat, 2) 
  #[1] 4.060574
  
  pwMoment(dat, k = 1) 
  #[1] 4.797081
 
  pwMoment(dat, k = 2) 
  #[1] 3.059173
 
  pwMoment(dat, 1, method = "plotting.position") 
  # [1] 5.852913
 
  pwMoment(dat, 1, method = "plotting.position", 
    plot.pos = c(.325, 1)) 
  #[1] 5.586817 
  #----------
  # Clean Up
  #---------
  rm(dat)
Quantile-Quantile (Q-Q) Plot
Description
Produces a quantile-quantile (Q-Q) plot, also called a probability plot.  
The qqPlot function is a modified version of the R functions 
qqnorm and qqplot.  
The EnvStats function qqPlot allows the user to specify a number of 
different distributions in addition to the normal distribution, and to optionally 
estimate the distribution parameters of the fitted distribution.
Usage
  qqPlot(x, y = NULL, distribution = "norm", param.list = list(mean = 0, sd = 1), 
    estimate.params = plot.type == "Tukey Mean-Difference Q-Q", 
    est.arg.list = NULL, plot.type = "Q-Q", plot.pos.con = NULL, plot.it = TRUE, 
    equal.axes = qq.line.type == "0-1" || estimate.params, add.line = FALSE, 
    qq.line.type = "least squares", duplicate.points.method = "standard", 
    points.col = 1, line.col = 1, line.lwd = par("cex"), line.lty = 1, 
    digits = .Options$digits, ..., main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations.  When  | 
| y | optional numeric vector of observations (not necessarily the same length as  | 
| distribution | when  | 
| param.list | when  | 
| estimate.params | when  | 
| est.arg.list | when  | 
| plot.type | a character string denoting the kind of plot.  Possible values are  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.  
The default value of  | 
| plot.it | a logical scalar indicating whether to create a plot on the current graphics device.   
The default value is  | 
| equal.axes | a logical scalar indicating whether to use the same range on the  | 
| add.line | a logical scalar indicating whether to add a line to the plot.  If  | 
| qq.line.type | character string determining what kind of line to add to the Q-Q plot.  Possible values are 
 | 
| duplicate.points.method | a character string denoting how to plot points with duplicate  | 
| points.col | a numeric scalar or character string determining the color of the points in the plot.  
The default value is  | 
| line.col | a numeric scalar or character string determining the color of the line in the plot.  
The default value is  | 
| line.lwd | a numeric scalar determining the width of the line in the plot.  The default value is 
 | 
| line.lty | a numeric scalar determining the line type of the line in the plot.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print for the distribution parameters.  
The default value is  | 
| main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
If y is not supplied, the vector x is assumed to be a sample from the probability 
distribution specified by the argument distribution (and param.list if 
estimate.params=FALSE).  When plot.type="Q-Q", the quantiles of x are 
plotted on the y-axis against the quantiles of the assumed distribution on the x-axis.  
If y is supplied and plot.type="Q-Q", the empirical quantiles of y are 
plotted against the empirical quantiles of x.  
When plot.type="Tukey Mean-Difference Q-Q", the difference of the quantiles is plotted on 
the y-axis against the mean of the quantiles on the x-axis. 
Special Distributions 
When y is not supplied and the argument distribution specifies one of the 
following distributions, the function qqPlot behaves in the manner described below.
- "lnorm"
- Lognormal Distribution. The log-transformed quantiles are plotted against quantiles from a Normal (Gaussian) distribution. 
- "lnormAlt"
- Lognormal Distribution (alternative parameterization). The untransformed quantiles are plotted against quantiles from a Lognormal distribution. 
- "lnorm3"
- Three-Parameter Lognormal Distribution. The quantiles of log(x - threshold) are plotted against quantiles from a Normal (Gaussian) distribution. The value of threshold is either specified in the argument param.list, or, if estimate.params=TRUE, it is estimated.
- "zmnorm"
- Zero-Modified Normal Distribution. The quantiles of the non-zero values (i.e., x[x!=0]) are plotted against quantiles from a Normal (Gaussian) distribution.
- "zmlnorm"
- Zero-Modified Lognormal Distribution. The quantiles of the log-transformed positive values (i.e., log(x[x>0])) are plotted against quantiles from a Normal (Gaussian) distribution.
- "zmlnormAlt"
- Zero-Modified Lognormal Distribution (alternative parameterization). The quantiles of the untransformed positive values (i.e., x[x>0]) are plotted against quantiles from a Lognormal distribution.
Explanation of Q-Q Plots 
A probability plot or quantile-quantile (Q-Q) plot 
is a graphical display invented by Wilk and Gnanadesikan (1968) to compare a 
data set to a particular probability distribution or to compare it to another 
data set. The idea is that if two population distributions are exactly the same, 
then they have the same quantiles (percentiles), so a plot of the quantiles for 
the first distribution vs. the quantiles for the second distribution will fall 
on the 0-1 line (i.e., the straight line y = x with intercept 0 and slope 1).  
If the two distributions have the same shape and spread but different locations, 
then the plot of the quantiles will fall on the line y = x + b 
(parallel to the 0-1 line) where b denotes the difference in locations.  
If the distributions have different locations and differ by a multiplicative 
constant m, then the plot of the quantiles will fall on the line 
y = mx + b (D'Agostino, 1986a, p. 25; Helsel and Hirsch, 1986, p. 42).  
Various kinds of differences between distributions will yield various kinds of 
deviations from a straight line.
Comparing Observations to a Hypothesized Distribution 
Let \underline{x} = x_1, x_2, \ldots, x_n denote the observations 
in a random sample of size n from some unknown distribution with 
cumulative distribution function F(), and let 
x_{(1)}, x_{(2)}, \ldots, x_{(n)} denote the ordered observations.   
Depending on the particular formula used for the empirical cdf 
(see ecdfPlot), the i'th order statistic is an 
estimate of the i/(n+1)'th, (i-0.5)/n'th, etc., quantile.  
For the moment, assume the i'th order statistic is an estimate of the 
i/(n+1)'th quantile, that is:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i}{n+1} \;\;\;\;\;\; (1)
so
x_{(i)} \approx F^{-1}(\hat{p}_i) \;\;\;\;\;\; (2)
If we knew the form of the true cdf F, then the plot of 
x_{(i)} vs. F^{-1}(\hat{p}_i) would form approximately 
a straight line based on Equation (2) above.  A probability plot is a plot of
x_{(i)} vs. F_0^{-1}(\hat{p}_i), where F_0 denotes the 
cdf associated with the hypothesized distribution.  The probability plot 
should fall roughly on the line y=x if F=F_0.  If F and F_0 
merely differ by a shift in location and scale, that is, if 
F[(x - \mu) / \sigma] = F_0(x), then the plot should fall roughly on the 
line y = \sigma x + \mu.
The quantity \hat{p}_i = i/(n+1) in Equation (1) above is called the 
plotting position for the probability plot.  This particular 
formula for the plotting position is appealing because it can be shown that 
for any continuous distribution
E\{F[x_{(i)}]\} = \frac{i}{n+1} \;\;\;\;\;\; (3)
(Nelson, 1982, pp. 299-300; Stedinger et al., 1993).  That is, the i'th 
plotting position defined as in Equation (1) is the expected value of the true 
cdf evaluated at the i'th order statistic.  Many authors and practitioners, 
however, prefer to use a plotting position that satisfies:
F^{-1}(\hat{p}_i) = E[x_{(i)}] \;\;\;\;\;\; (4)
or one that satisfies
F^{-1}(\hat{p}_i) = M[x_{(i)}] = F^{-1}\{M[u_{(i)}]\} \;\;\;\;\;\; (5)
where M[x_{(i)}] denotes the median of the distribution of the i'th 
order statistic, and u_{(i)} denotes the i'th order statistic in a 
random sample of n uniform (0,1) random variates.
The plotting positions in Equation (4) are often approximated since the expected 
value of the i'th order statistic is often difficult and time-consuming 
to compute.  Note that these plotting positions will differ for different 
distributions.
The plotting positions in Equation (5) were recommended by Filliben (1975) because 
they require computing or approximating only the medians of 
uniform (0,1) order statistics, no matter what the form 
of the assumed cdf F_0.  Also, the median may be preferred as a measure of 
central tendency because the distributions of most order statistics are skewed.
Most plotting positions can be written as:
\hat{p}_i = \frac{i - a}{n - 2a + 1} \;\;\;\;\;\; (6)
where 0 \le a \le 1 (D'Agostino, 1986a, p.25; Stedinger et al., 1993).  
The quantity a is sometimes called the “plotting position constant”, and 
is determined by the argument plot.pos.con in the function qqPlot.  
The table below, adapted from Stedinger et al. (1993), displays commonly used 
plotting positions based on equation (6) for several distributions.
| Name | a | Distribution Often Used With | References | 
| Weibull | 0 | Weibull, Uniform | Weibull (1939), Stedinger et al. (1993) | 
| Median | 0.3175 | Several | Filliben (1975), Vogel (1986) | 
| Blom | 0.375 | Normal and Others | Blom (1958), Looney and Gulledge (1985) | 
| Cunnane | 0.4 | Several | Cunnane (1978), Chowdhury et al. (1991) | 
| Gringorten | 0.44 | Gumbel | Gringorten (1963), Vogel (1986) | 
| Hazen | 0.5 | Several | Hazen (1914), Chambers et al. (1983), Cleveland (1993) | 
For moderate and large sample sizes, there is very little difference in 
visual appearance of the Q-Q plot for different choices of plotting positions.
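To make Equations (1), (2), and (6) concrete, the following sketch (an illustration only, not the code used inside qqPlot) builds a normal Q-Q plot by hand using the Blom constant a = 0.375; the sample mean and standard deviation used here are hypothetical:
  # Hand-rolled normal Q-Q plot using plotting positions (i - a)/(n - 2a + 1)
  set.seed(23)
  x <- rnorm(30, mean = 10, sd = 2)
  n <- length(x)
  a <- 0.375                           # Blom plotting position constant
  p <- ((1:n) - a) / (n - 2 * a + 1)   # plotting positions, Equation (6)
  plot(qnorm(p), sort(x), 
    xlab = "Quantiles of N(0,1)", ylab = "Ordered Observations")
  abline(a = 10, b = 2)                # the line y = sigma * x + mu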
Comparing Two Data Sets 
Let \underline{x} = x_1, x_2, \ldots, x_n denote the observations 
in a random sample of size n from some unknown distribution with 
cumulative distribution function F(), and let 
x_{(1)}, x_{(2)}, \ldots, x_{(n)} denote the ordered observations.  Similarly, 
let \underline{y} = y_1, y_2, \ldots, y_m denote the observations 
in a random sample of size m from some unknown distribution with 
cumulative distribution function G(), and let 
y_{(1)}, y_{(2)}, \ldots, y_{(m)} denote the ordered observations.
Suppose we are interested in investigating whether the shape of the distribution 
with cdf F is the same as the shape of the distribution with cdf G 
(e.g., F and G may both be normal distributions but differ in mean 
and standard deviation).
When n = m, we can visually explore this question by plotting 
y_{(i)} vs. x_{(i)}, for i = 1, 2, \ldots, n.  
The values in \underline{y} are spread out in a certain way depending 
on the true distribution:  they may be more or less symmetric about some value 
(the population mean or median) or they may be skewed to the right or left; 
they may be concentrated close to the mean or median (platykurtic) or there may 
be several observations “far away” from the mean or median on either side 
(leptokurtic).  Similarly, the values in \underline{x} are spread out in a 
certain way.  If the values in \underline{x} and \underline{y} are 
spread out in the same way, then the plot of y_{(i)} vs. x_{(i)} 
will be approximately a straight line.  If the cdf F is exactly the same 
as the cdf G, then the plot of y_{(i)} vs. x_{(i)} will fall 
roughly on the straight line y = x.  If F and G differ by a 
shift in location and scale, that is, if F[(x-\mu)/\sigma] = G(x), then 
the plot will fall roughly on the line y = \sigma x + \mu.
When n > m, a slight adjustment has to be made to produce the plot. Let 
\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_m denote the plotting positions 
corresponding to the m empirical quantiles for the y's and let
\hat{p}^*_1, \hat{p}^*_2, \ldots, \hat{p}^*_n denote the plotting positions 
corresponding the n empirical quantiles for the x's. Then we plot
y_{(j)} vs. x^*_{(j)} for j = 1, 2, \ldots, m where
x^*_{(j)} = (1 - r) x_{(i)} + r x_{(i+1)} \;\;\;\;\;\; (7)
r = \frac{\hat{p}_j - \hat{p}^*_i}{\hat{p}^*_{i+1} - \hat{p}^*_i} \;\;\;\;\;\; (8)
\hat{p}^*_i \le \hat{p}_j \le \hat{p}^*_{i+1} \;\;\;\;\;\; (9)
That is, the values for the x^*_{(j)}'s are determined by linear interpolation 
based on the values of the plotting positions for \underline{x} and 
\underline{y}.  
A similar adjustment is made when n < m.
Note that the R function qqplot uses a different method than 
the one in Equation (7) above; it uses linear interpolation based on 
1:n and m by calling the approx function.
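The interpolation in Equations (7)-(9) amounts to linear interpolation of the x order statistics at the plotting positions of the y's. The sketch below is an illustration only (not the code inside qqPlot) and assumes plotting positions of the form (i - a)/(n - 2a + 1) with a = 0.375:
  # Two-sample Q-Q plot when n > m, following Equations (7)-(9)
  set.seed(47)
  x <- rnorm(30)                  # n = 30 observations
  y <- rnorm(12, mean = 1)        # m = 12 observations
  pp <- function(k, a = 0.375) ((1:k) - a) / (k - 2 * a + 1)
  p.x <- pp(length(x))            # plotting positions for the x's
  p.y <- pp(length(y))            # plotting positions for the y's
  # Linearly interpolate the x order statistics at the y plotting positions
  x.star <- approx(x = p.x, y = sort(x), xout = p.y)$y
  plot(x.star, sort(y), xlab = "Interpolated Quantiles of x", 
    ylab = "Quantiles of y")
  abline(0, 1)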
Value
qqPlot returns a list with components x and y, giving the (x,y) 
coordinates of the points that have been or would have been plotted.  There are four cases to 
consider:
1. The argument y is not supplied and plot.type="Q-Q".
| x | the quantiles from the theoretical distribution. | 
| y | the observed quantiles (order statistics) based on the data in the argument  | 
2. The argument y is not supplied and plot.type="Tukey Mean-Difference Q-Q".
| x | the averages of the observed and theoretical quantiles. | 
| y | the differences between the observed quantiles (order statistics) and the theoretical quantiles. | 
3. The argument y is supplied and plot.type="Q-Q".
| x | the observed quantiles based on the data in the argument  | 
| y | the observed quantiles based on the data in the argument  | 
4. The argument y is supplied and plot.type="Tukey Mean-Difference Q-Q".
| x | the averages of the quantiles based on the argument  | 
| y | the differences between the quantiles based on the argument  | 
Note
A quantile-quantile (Q-Q) plot, also called a probability plot, is a plot of the observed 
order statistics from a random sample (the empirical quantiles) against their (estimated) 
mean or median values based on an assumed distribution, or against the empirical quantiles 
of another set of data (Wilk and Gnanadesikan, 1968).  Q-Q plots are used to assess whether 
data come from a particular distribution, or whether two datasets have the same parent 
distribution.  If the distributions have the same shape (but not necessarily the same 
location or scale parameters), then the plot will fall roughly on a straight line.  If the 
distributions are exactly the same, then the plot will fall roughly on the straight line y=x.
A Tukey mean-difference Q-Q plot, also called an m-d plot, is a modification of a 
Q-Q plot. Rather than plotting observed quantiles vs. theoretical quantiles or observed 
y-quantiles vs. observed x-quantiles, a Tukey mean-difference Q-Q plot plots 
the difference between the quantiles on the y-axis vs. the average of the quantiles on 
the x-axis (Cleveland, 1993, pp.22-23).  If the two sets of quantiles come from the same 
parent distribution, then the points in this plot should fall roughly along the horizontal line 
y=0.  If one set of quantiles comes from the same distribution as the other but shifted in median, then 
the points in this plot should fall along a horizontal line above or below the line y=0.  
A Tukey mean-difference Q-Q plot enhances our perception of how the points in the Q-Q plot deviate 
from a straight line, because it is easier to judge deviations from a horizontal line than from a 
line with a non-zero slope.
In a Q-Q plot, the extreme points have more variability than points toward the center.  A U-shaped 
Q-Q plot indicates that the underlying distribution for the observations on the y-axis is 
skewed to the right relative to the underlying distribution for the observations on the x-axis.  
An upside-down-U-shaped Q-Q plot indicates the y-axis distribution is skewed left relative to 
the x-axis distribution.  An S-shaped Q-Q plot indicates the y-axis distribution has 
shorter tails than the x-axis distribution.  Conversely, a plot that is bent down on the 
left and bent up on the right indicates that the y-axis distribution has longer tails than 
the x-axis distribution.
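For example, the U-shape described above is easy to produce with right-skewed data (a minimal illustration, not part of the original examples; the sample size is arbitrary):
  # Right-skewed (lognormal) data plotted against normal quantiles:
  # the Q-Q plot bows upward (U-shaped), indicating right skew.
  set.seed(123)
  z <- rlnorm(100)
  qqPlot(z, add.line = TRUE)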
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
See Also
ppoints, ecdfPlot, Distribution.df, 
qqPlotGestalt, qqPlotCensored, qqnorm.
Examples
  # The guidance document USEPA (1994b, pp. 6.22--6.25) 
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB) 
  # concentrations (in parts per billion) from soil samples 
  # at a Reference area and a Cleanup area.  These data are stored 
  # in the data frame EPA.94b.tccb.df.  
  #
  # Create a Q-Q plot for the reference area data first assuming a 
  # normal distribution, then a lognormal distribution, then a 
  # gamma distribution.
  
  # Assume a normal distribution
  #-----------------------------
  dev.new()
  with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"]))
  dev.new()
  with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"], add.line = TRUE))
  dev.new()
  with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"], 
    plot.type = "Tukey", add.line = TRUE))
  # The Q-Q plot based on assuming a normal distribution shows a U-shape,
  # indicating the Reference area TcCB data are skewed to the right
  # compared to a normal distribution.
  # Assume a lognormal distribution
  #--------------------------------
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "lnorm", 
      digits = 2, points.col = "blue", add.line = TRUE))
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "lnorm", 
      digits = 2, plot.type = "Tukey", points.col = "blue", 
      add.line = TRUE))
  # Alternative parameterization
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "lnormAlt", 
      estimate.params = TRUE, digits = 2, points.col = "blue", 
      add.line = TRUE))
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "lnormAlt", 
      digits = 2, plot.type = "Tukey", points.col = "blue", 
      add.line = TRUE))
  # The lognormal distribution appears to be an adequate fit.
  # Now look at a Q-Q plot assuming a gamma distribution.
  #----------------------------------------------------------
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "gamma", 
      estimate.params = TRUE, digits = 2, points.col = "blue", 
      add.line = TRUE))
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "gamma", 
      digits = 2, plot.type = "Tukey", points.col = "blue", 
      add.line = TRUE))
  # Alternative Parameterization
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "gammaAlt", 
      estimate.params = TRUE, digits = 2, points.col = "blue", 
      add.line = TRUE))
  dev.new()
  with(EPA.94b.tccb.df, 
    qqPlot(TcCB[Area == "Reference"], dist = "gammaAlt", 
      digits = 2, plot.type = "Tukey", points.col = "blue", 
      add.line = TRUE))
  #-------------------------------------------------------------------------------------
  # Generate 20 observations from a gamma distribution with parameters 
  # shape=2 and scale=2, then create a normal (Gaussian) Q-Q plot for these data. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(357) 
  dat <- rgamma(20, shape=2, scale=2) 
  dev.new()
  qqPlot(dat, add.line = TRUE)
  # Now assume a gamma distribution and estimate the parameters
  #------------------------------------------------------------
  dev.new()
  qqPlot(dat, dist = "gamma", estimate.params = TRUE, add.line = TRUE)
  # Clean up
  #---------
  rm(dat)
  graphics.off()
Quantile-Quantile (Q-Q) Plot for Type I Censored Data
Description
Produces a quantile-quantile (Q-Q) plot, also called a probability plot, for Type I censored data.
Usage
  qqPlotCensored(x, censored, censoring.side = "left",
    prob.method = "michael-schucany", plot.pos.con = NULL,
    distribution = "norm", param.list = list(mean = 0, sd = 1),
    estimate.params = plot.type == "Tukey Mean-Difference Q-Q",
    est.arg.list = NULL, plot.type = "Q-Q", plot.it = TRUE,
    equal.axes = qq.line.type == "0-1" || estimate.params,
    add.line = FALSE, qq.line.type = "least squares",
    duplicate.points.method = "standard", points.col = 1, line.col = 1,
    line.lwd = par("cex"), line.lty = 1, digits = .Options$digits,
    include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2),
    cen.cex = par("cex"), cen.col = 4, ..., main = NULL, xlab = NULL,
    ylab = NULL, xlim = NULL, ylim = NULL)
Arguments
| x | numeric vector of observations that is assumed to represent a sample from the hypothesized
distribution specified by  | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are
 | 
| prob.method | character string indicating what method to use to compute the plotting positions
(empirical probabilities).  Possible values are:  The default value is  The  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is  | 
| distribution | a character string denoting the distribution abbreviation.  The default value is
 | 
| param.list | a list with values for the parameters of the distribution.  The default value is
 | 
| estimate.params | a logical scalar indicating whether to compute quantiles based on estimating the distribution
parameters ( You can set  | 
| est.arg.list | a list whose components are optional arguments associated with the function used to estimate
the parameters of the assumed distribution (see the section Estimating Distribution Parameters
in the help file EnvStats Functions for Censored Data).
For example, the function  | 
| plot.type | a character string denoting the kind of plot.  Possible values are  | 
| plot.it | a logical scalar indicating whether to create a plot on the current graphics device.
The default value is  | 
| equal.axes | a logical scalar indicating whether to use the same range on the  | 
| add.line | a logical scalar indicating whether to add a line to the plot.  If  | 
| qq.line.type | character string determining what kind of line to add to the Q-Q plot.  Possible values are
 | 
| duplicate.points.method | a character string denoting how to plot points with duplicate  | 
| points.col | a numeric scalar or character string determining the color of the points in the plot.
The default value is  | 
| line.col | a numeric scalar or character string determining the color of the line in the plot.
The default value is  | 
| line.lwd | a numeric scalar determining the width of the line in the plot.  The default value is
 | 
| line.lty | a numeric scalar determining the line type of the line in the plot.  The default value is
 | 
| digits | a scalar indicating how many significant digits to print for the distribution parameters.
The default value is  | 
| include.cen | logical scalar indicating whether to include censored values in the plot.  The default value is
 | 
| cen.pch | numeric scalar or character string indicating the plotting character to use to plot censored values.
The default value is  | 
| cen.cex | numeric scalar that determines the size of the plotting character used to plot censored values.
The default value is the current value of the cex graphics parameter.  See the entry for  | 
| cen.col | numeric scalar or character string that determines the color of the plotting character used to
plot censored values.  The default value is  | 
| main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
The function qqPlotCensored does exactly the same thing as qqPlot
(when the argument y is not supplied to qqPlot), except
qqPlotCensored calls the function ppointsCensored to compute the
plotting positions (estimated cumulative probabilities).
The vector x is assumed to be a sample from the probability distribution specified
by the argument distribution (and param.list if estimate.params=FALSE).
When plot.type="Q-Q", the quantiles of x are plotted on the y-axis against
the quantiles of the assumed distribution on the x-axis.
When plot.type="Tukey Mean-Difference Q-Q", the difference of the quantiles is plotted on
the y-axis against the mean of the quantiles on the x-axis.
When prob.method="kaplan-meier" and censoring.side="left" and the assumed
distribution has a maximum support of infinity (Inf; e.g., the normal or lognormal
distribution), the point involving the largest
value of x is not plotted because it corresponds to an estimated cumulative probability
of 1, which corresponds to an infinite plotting position.
When prob.method="modified kaplan-meier" and censoring.side="left", the
estimated cumulative probability associated with the maximum value is modified from 1
to be (N - .375)/(N + .25) where N denotes the sample size (i.e., the Blom
plotting position) so that the point associated with the maximum value can be displayed.
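The effect of the modification can be seen directly from the plotting positions. The sketch below is an illustration only; it assumes ppointsCensored returns a component named Cumulative.Probabilities (as in the Value section below) and uses the manganese data from the Examples section:
  # Compare Kaplan-Meier and modified Kaplan-Meier plotting positions for
  # the manganese data.  With "kaplan-meier" the largest observation has an
  # estimated cumulative probability of 1; "modified kaplan-meier" replaces
  # it with (N - 0.375)/(N + 0.25).
  with(EPA.09.Ex.15.1.manganese.df, {
    km  <- ppointsCensored(Manganese.ppb, Censored, 
      prob.method = "kaplan-meier")$Cumulative.Probabilities
    mkm <- ppointsCensored(Manganese.ppb, Censored, 
      prob.method = "modified kaplan-meier")$Cumulative.Probabilities
    N <- length(Manganese.ppb)
    c(KM.max = max(km), Modified.KM.max = max(mkm), 
      Blom = (N - 0.375) / (N + 0.25))
  })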
Value
qqPlotCensored returns a list with the following components:
| x | numeric vector of  | 
| y | numeric vector of  | 
| Order.Statistics | numeric vector of the “ordered” observations.
When  | 
| Cumulative.Probabilities | numeric vector of the plotting positions associated with the order statistics. | 
| Censored | logical vector indicating which of the ordered observations are censored. | 
| Censoring.Side | character string indicating whether the data are left- or right-censored.
This is same value as the argument  | 
| Prob.Method | character string indicating what method was used to compute the plotting positions.
This is the same value as the argument  | 
Optional Component (only present when prob.method="michael-schucany" or 
prob.method="hirsch-stedinger"):
| Plot.Pos.Con | numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument  | 
Note
A quantile-quantile (Q-Q) plot, also called a probability plot, is a plot of the observed
order statistics from a random sample (the empirical quantiles) against their (estimated)
mean or median values based on an assumed distribution, or against the empirical quantiles
of another set of data (Wilk and Gnanadesikan, 1968).  Q-Q plots are used to assess whether
data come from a particular distribution, or whether two datasets have the same parent
distribution.  If the distributions have the same shape (but not necessarily the same
location or scale parameters), then the plot will fall roughly on a straight line.  If the
distributions are exactly the same, then the plot will fall roughly on the straight line y=x.
A Tukey mean-difference Q-Q plot, also called an m-d plot, is a modification of a
Q-Q plot. Rather than plotting observed quantiles vs. theoretical quantiles or observed
y-quantiles vs. observed x-quantiles, a Tukey mean-difference Q-Q plot plots
the difference between the quantiles on the y-axis vs. the average of the quantiles on
the x-axis (Cleveland, 1993, pp.22-23).  If the two sets of quantiles come from the same
parent distribution, then the points in this plot should fall roughly along the horizontal line
y=0.  If one set of quantiles comes from the same distribution as the other but shifted in median, then
the points in this plot should fall along a horizontal line above or below the line y=0.
A Tukey mean-difference Q-Q plot enhances our perception of how the points in the Q-Q plot deviate
from a straight line, because it is easier to judge deviations from a horizontal line than from a
line with a non-zero slope.
In a Q-Q plot, the extreme points have more variability than points toward the center.  A U-shaped
Q-Q plot indicates that the underlying distribution for the observations on the y-axis is
skewed to the right relative to the underlying distribution for the observations on the x-axis.
An upside-down-U-shaped Q-Q plot indicates the y-axis distribution is skewed left relative to
the x-axis distribution.  An S-shaped Q-Q plot indicates the y-axis distribution has
shorter tails than the x-axis distribution.  Conversely, a plot that is bent down on the
left and bent up on the right indicates that the y-axis distribution has longer tails than
the x-axis distribution.
Censored observations complicate the procedures used to graphically explore data.  Techniques from
survival analysis and life testing have been developed to generalize the procedures for
constructing plotting positions, empirical cdf plots, and Q-Q plots to data sets with censored
observations (see ppointsCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley and Sons, New York.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
ppointsCensored, EnvStats Functions for Censored Data,
qqPlot, ecdfPlotCensored, 
qqPlotGestalt.
Examples
  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
  # censor all observations less than 18, then generate a Q-Q plot assuming
  # a normal distribution for the complete data set and the censored data set.
  # Note that the Q-Q plot for the censored data set starts at the first ordered
  # uncensored observation, and that for values of x > 18 the two Q-Q plots are
  # exactly the same.  This is because there is only one censoring level and
  # no uncensored observations fall below the censored observations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(333)
  x <- rnorm(20, mean=20, sd=5)
  censored <- x < 18
  sum(censored)
  #[1] 7
  new.x <- x
  new.x[censored] <- 18
  dev.new()
  qqPlot(x, ylim = range(pretty(x)),
    main = "Q-Q Plot for\nComplete Data Set")
  dev.new()
  qqPlotCensored(new.x, censored, ylim = range(pretty(x)),
    main="Q-Q Plot for\nCensored Data Set")
  # Clean up
  #---------
  rm(x, censored, new.x)
  #------------------------------------------------------------------------------------
  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in
  # EPA.09.Ex.15.1.manganese.df.  Here we will create a Q-Q
  # plot based on the Kaplan-Meier method.  First we'll assume
  # a normal distribution, then a lognormal distribution, then a
  # gamma distribution.
  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #4       4 Well.1               21.6          21.6    FALSE
  #5       5 Well.1                 <2           2.0     TRUE
  #...
  #21      1 Well.5               17.9          17.9    FALSE
  #22      2 Well.5               22.7          22.7    FALSE
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE
  # Assume normal distribution
  #---------------------------
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored,
      prob.method = "kaplan-meier", points.col = "blue", add.line = TRUE,
      main = paste("Normal Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
  # Include max value in the plot
  #------------------------------
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored,
      prob.method = "modified kaplan-meier", points.col = "blue",
      add.line = TRUE,
      main = paste("Normal Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions",
          "(Max Included)", sep = "\n")))
  # Assume lognormal distribution
  #------------------------------
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored, dist = "lnorm",
      prob.method = "kaplan-meier", points.col = "blue", add.line = TRUE,
      main = paste("Lognormal Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
  # Include max value in the plot
  #------------------------------
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored, dist = "lnorm",
      prob.method = "modified kaplan-meier", points.col = "blue",
      add.line = TRUE,
      main = paste("Lognormal Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions",
          "(Max Included)", sep = "\n")))
  # The lognormal distribution appears to be a better fit.
  # Now create a Q-Q plot assuming a gamma distribution.  Here we'll
  # need to set estimate.params=TRUE.
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored, dist = "gamma",
      estimate.params = TRUE, prob.method = "kaplan-meier",
      points.col = "blue", add.line = TRUE,
      main = paste("Gamma Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
  # Include max value in the plot
  #------------------------------
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    qqPlotCensored(Manganese.ppb, Censored, dist = "gamma",
      estimate.params = TRUE, prob.method = "modified kaplan-meier",
      points.col = "blue", add.line = TRUE,
      main = paste("Gamma Q-Q Plot of Manganese Data",
          "Based on Kaplan-Meier Plotting Positions",
          "(Max Included)", sep = "\n")))
  #==========
  # Clean up
  #---------
  graphics.off()
Develop Gestalt of Q-Q Plots for Specific Distributions
Description
Produce a series of quantile-quantile (Q-Q) plots (also called probability plots) or Tukey mean-difference Q-Q plots for a user-specified distribution.
Usage
  qqPlotGestalt(distribution = "norm", param.list = list(mean = 0, sd = 1), 
    estimate.params = FALSE, est.arg.list = NULL, sample.size = 10, num.pages = 2, 
    num.plots.per.page = 4, nrow = ceiling(num.plots.per.page/2), plot.type = "Q-Q", 
    plot.pos.con = switch(dist.abb, norm = , lnorm = , lnormAlt = , lnorm3 = 0.375, 
      evd = 0.44, 0.4), equal.axes = (qq.line.type == "0-1" || estimate.params), 
    margin.title = NULL, add.line = FALSE, qq.line.type = "least squares", 
    duplicate.points.method = "standard", points.col = 1, line.col = 1, 
    line.lwd = par("cex"), line.lty = 1, digits = .Options$digits, 
    same.window = TRUE, ask = same.window & num.pages > 1, 
    mfrow = c(nrow, num.plots.per.page/nrow), 
    mar = c(4, 4, 1, 1) + 0.1, oma = c(0, 0, 7, 0), mgp = c(2, 0.5, 0), ..., 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)
Arguments
| distribution | a character string denoting the distribution abbreviation.  The default value is 
 | 
| param.list | a list with values for the parameters of the distribution.  The default value is 
 | 
| estimate.params | a logical scalar indicating whether to compute quantiles based on estimating the 
distribution parameters ( | 
| est.arg.list | a list whose components are optional arguments associated with the function used 
to estimate the parameters of the assumed distribution (see the help file 
Estimating Distribution Parameters).  
For example, all functions used to estimate distribution parameters have an optional argument 
called  | 
| sample.size | numeric scalar indicating the number of observations to generate for each Q-Q plot.  
The default value is  | 
| num.pages | numeric scalar indicating the number of pages of plots to generate.  
The default value is  | 
| num.plots.per.page | numeric scalar indicating the number of plots per page.  
The default value is  | 
| nrow | numeric scalar indicating the number of rows of plots on each page.  
The default value is the smallest integer greater than or equal to 
 | 
| plot.type | a character string denoting the kind of plot.  Possible values are  | 
| plot.pos.con | numeric scalar between 0 and 1 containing the value of the plotting position constant.  
The default value of  | 
| equal.axes | logical scalar indicating whether to use the same range on the  | 
| margin.title | character string indicating the title printed in the top margin on each page of plots. The default value indicates the kind of Q-Q plot, the probability distribution, the sample size, and the estimation method used (if any). | 
| add.line | logical scalar indicating whether to add a line to the plot.  
If  | 
| qq.line.type | character string determining what kind of line to add to the Q-Q plot.  
Possible values are  | 
| duplicate.points.method | character string denoting how to plot points with duplicate  | 
| points.col | numeric scalar or character string determining the color of the points in the plot.  
The default value is  | 
| line.col | numeric scalar or character string determining the color of the line in the plot.  
The default value is  | 
| line.lwd | numeric scalar determining the width of the line in the plot.  The default value is 
 | 
| line.lty | a numeric scalar determining the line type of the line in the plot.  The default value is 
 | 
| digits | a scalar indicating how many significant digits to print for the distribution 
parameters.  The default value is  | 
| same.window | logical scalar indicating whether to produce all plots in the same graphics 
window ( | 
| ask | logical scalar supplied to the function  | 
| mfrow,mar,oma,mgp,main,xlab,ylab,xlim,ylim,... | additional graphical parameters (see  | 
Details
The function qqPlotGestalt allows the user to display several Q-Q plots or 
Tukey mean-difference Q-Q plots for a specified probability distribution.  
The distribution is specified with the arguments distribution and 
param.list.  By default, normal (Gaussian) 
Q-Q plots are produced.
If estimate.params=FALSE (the default), the theoretical quantiles on the 
x-axis are computed using the known distribution parameters specified in 
param.list.  If estimate.params=TRUE, the distribution parameters 
are estimated based on the sample, and these estimated parameters are then used 
to compute the theoretical quantiles.  For distributions that can be specified 
by a location and scale parameter (e.g., Normal, Logistic, extreme value, etc.), 
the value of estimate.params will not affect the general shape of the 
plot, only the values recorded on the x-axis.  For distributions that cannot 
be specified by a location and scale parameter (e.g., exponential, gamma, etc.), it 
is recommended that estimate.params be set to TRUE since in practice 
the values of the distribution parameters are not known but must be estimated from 
the sample.
The purpose of qqPlotGestalt is to allow the user to build up a visual 
memory of “typical” Q-Q plots.  A Q-Q plot is a graphical tool that allows 
you to assess how well a particular set of observations fit a particular 
probability distribution.  The value of this tool depends on the user having an 
internal reference set of Q-Q plots with which to compare the current Q-Q plot.
See the help file for qqPlot for more information.
Value
The NULL value is returned.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the REFERENCES section for qqPlot.
See Also
Examples
  # Look at eight typical normal (Gaussian) Q-Q plots for random samples 
  # of size 10 from a N(0,1) distribution 
  # Are you surprised by the variability in the plots?
  #
  # (Note:  you must use set.seed if you want to reproduce the exact 
  #         same plots more than once.)
  set.seed(298)
  qqPlotGestalt(same.window = FALSE)
  # Add lines to these same Q-Q plots
  #----------------------------------
  set.seed(298)
  qqPlotGestalt(same.window = FALSE, add.line = TRUE)
  # Add lines to different Q-Q plots
  #---------------------------------
  qqPlotGestalt(same.window = FALSE, add.line = TRUE)
  ## Not run: 
  # Look at 4 sets of plots all in the same graphics window
  #--------------------------------------------------------
  qqPlotGestalt(add.line = TRUE, num.pages = 4)
  
## End(Not run)
  #==========
  # Look at Q-Q plots for a gamma distribution
  #-------------------------------------------
  qqPlotGestalt(dist = "gammaAlt", 
    param.list = list(mean = 10, cv = 1), 
    estimate.params = TRUE, num.pages = 3, 
    same.window = FALSE, add.line = TRUE)
  # Look at Tukey Mean Difference Q-Q plots 
  # for a gamma distribution
  #----------------------------------------
  qqPlotGestalt(dist = "gammaAlt", 
    param.list = list(mean = 10, cv = 1), 
    estimate.params = TRUE, num.pages = 3, 
    plot.type = "Tukey", same.window = FALSE, add.line = TRUE)
  #==========
  # Clean up
  #---------
  graphics.off()
Two-Sample Rank Test to Detect a Shift in a Proportion of the "Treated" Population
Description
Two-sample rank test to detect a positive shift in a proportion of one population (here called the “treated” population) compared to another (here called the “reference” population). This test is usually called the quantile test (Johnson et al., 1987).
Usage
  quantileTest(x, y, alternative = "greater", target.quantile = 0.5, 
    target.r = NULL, exact.p = TRUE)
Arguments
| x | numeric vector of observations from the “treatment” group.  
Missing ( | 
| y | numeric vector of observations from the “reference” group.  
Missing ( | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| target.quantile | numeric scalar between 0 and 1 indicating the desired quantile to use as the 
lower cut off point for the test.  Because of the discrete nature of empirical 
quantiles, the upper bound for the possible empirical quantiles will often differ 
from the value of  | 
| target.r | integer indicating the rank of the observation to use as the lower cut off point 
for the test.  The value of  | 
| exact.p | logical scalar indicating whether to compute the p-value based on the exact 
distribution of the test statistic ( | 
Details
Let X denote a random variable representing measurements from a 
“treatment” group with cumulative distribution function (cdf)
F_X(t) = Pr(X \le t) \;\;\;\;\;\; (1)
and let x_1, x_2, \ldots, x_m denote m observations from this 
treatment group.  Let Y denote a random variable from a “reference” 
group with cdf
F_Y(t) = Pr(Y \le t) \;\;\;\;\;\; (2)
and let y_1, y_2, \ldots, y_n denote n observations from this 
reference group.  Consider the null hypothesis:
H_0: F_X(t) = F_Y(t), \;\; -\infty < t < \infty \;\;\;\;\;\; (3)
versus the alternative hypothesis
H_a: F_X(t) = (1 - \epsilon) F_Y(t) + \epsilon F_Z(t) \;\;\;\;\;\; (4)
where Z denotes some random variable with cdf
F_Z(t) = Pr(Z \le t) \;\;\;\;\;\; (5)
and 0 < \epsilon \le 1, F_Z(t) \le F_Y(t) for all values of t, 
and F_Z(t) \ne F_Y(t) for at least one value of t.
In English, the alternative hypothesis (4) says that a portion \epsilon of the 
distribution for the treatment group (the distribution of X) is shifted to the 
right of the distribution for the reference group (the distribution of Y).  
The alternative hypothesis (4) with \epsilon = 1 is the alternative hypothesis 
associated with testing a location shift, for which the 
Wilcoxon rank sum test can be used. 
Johnson et al. (1987) investigated locally most powerful rank tests for the test of 
the null hypothesis (3) against the alternative hypothesis (4).  They considered the 
case when Y and Z were normal random variables and the case when the 
densities of Y and Z assumed only two positive values.  For the latter 
case, the locally most powerful rank test reduces to the following procedure, which 
Johnson et al. (1987) call the quantile test.
- Combine the n observations from the reference group and the m observations from the treatment group and rank them from smallest to largest. Tied observations receive the average rank of all observations tied at that value.
- Choose a quantile q and determine the smallest rank r such that \frac{r}{m+n+1} > q \;\;\;\;\;\; (6) Note that because of the discrete nature of ranks, any quantile q' such that \frac{r}{m+n+1} > q' \ge \frac{r-1}{m+n+1} \;\;\;\;\;\; (7) will yield the same value for r as the quantile q does. Alternatively, choose a value of r. The bounds on an associated quantile are then given in Equation (7). Note: the component called parameters in the list returned by quantileTest contains an element named quantile.ub. The value of this element is the left-hand side of Equation (7).
- Set k equal to the number of observations from the treatment group (the number of X observations) with ranks bigger than or equal to r.
- Under the null hypothesis (3), the probability that at least k out of the r largest observations come from the treatment group is given by: p = \sum_{i=k}^r \frac{{m+n-r \choose m-i} {r \choose i}}{{m+n \choose n}} \;\;\;\;\;\; (8) This probability may be approximated by: p = 1 - \Phi(\frac{k - \mu_k - 1/2}{\sigma_k}) \;\;\;\;\;\; (9) where \mu_k = \frac{mr}{m+n} \;\;\;\;\;\; (10) \sigma_k^2 = \frac{mnr(m+n-r)}{(m+n)^2 (m+n-1)} \;\;\;\;\;\; (11) and \Phi denotes the cumulative distribution function of the standard normal distribution (USEPA, 1994, pp. 7.16-7.17). (See quantileTestPValue.)
- Reject the null hypothesis (3) in favor of the alternative hypothesis (4) at significance level \alpha if p \le \alpha.
Johnson et al. (1987) note that their quantile test is asymptotically equivalent 
to one proposed by Carrano and Moore (1982) in the context of a two-sided test.  
Also, when q=0.5, the quantile test reduces to Mood's median test for two 
groups (see Zar, 2010, p.172; Conover, 1980, pp.171-178).
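For a concrete check of Equations (8)-(11), the sketch below (an illustration only, not the EnvStats implementation; the values of k, r, m, and n are hypothetical) computes the exact p-value and its normal approximation. The exact sum is simply an upper hypergeometric tail:
  # Exact p-value from Equation (8) and normal approximation from (9)-(11)
  quantile.test.p <- function(k, r, m, n) {
    i <- k:r
    exact <- sum(choose(m + n - r, m - i) * choose(r, i)) / choose(m + n, n)
    # Equivalently: phyper(k - 1, m, n, r, lower.tail = FALSE)
    mu    <- m * r / (m + n)
    sigma <- sqrt(m * n * r * (m + n - r) / ((m + n)^2 * (m + n - 1)))
    norm.approx <- 1 - pnorm((k - mu - 1/2) / sigma)
    c(exact = exact, normal.approx = norm.approx)
  }
  quantile.test.p(k = 8, r = 10, m = 20, n = 20)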
The optimal choice of q or r in Step 2 above (i.e., the choice that 
yields the largest power) depends on the true underlying distributions of 
Y and Z and the mixing proportion \epsilon.  
Johnson et al. (1987) performed a simulation study and showed that the quantile 
test performs better than the Wilcoxon rank sum test and the normal scores test 
under the alternative of a mixed normal distribution with a shift of at least 
2 standard deviations in the Z distribution.  USEPA (1994, pp.7.17-7.21) 
shows that when the mixing proportion \epsilon is small and the shift is 
large, the quantile test is more powerful than the Wilcoxon rank sum test, and 
when \epsilon is large and the shift is small, the Wilcoxon rank sum test 
is more powerful than the quantile test.
Value
A list of class "htestEnvStats" containing the results of the hypothesis test.  
See the help file for 
htestEnvStats.object for details.
Note
The EPA guidance document Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media (USEPA, 1994, pp.4.7-4.9) recommends three different statistical tests for determining whether a remediated Superfund site has attained compliance: the Wilcoxon rank sum test, the quantile test, and the “hot measurement” comparison test. The Wilcoxon rank sum test and quantile test are nonparametric tests that compare chemical concentrations in the cleanup area with those in the reference area. The hot-measurement comparison test compares concentrations in the cleanup area with a pre-specified upper limit value Hm (the value of Hm must be negotiated between the EPA and the Superfund-site owner or operator). The Wilcoxon rank sum test is appropriate for detecting uniform failure of remedial action throughout the cleanup area. The quantile test is appropriate for detecting failure in only a few areas within the cleanup area. The hot-measurement comparison test is appropriate for detecting hot spots that need to be remediated regardless of the outcomes of the other two tests.
USEPA (1994, pp.4.7-4.9) recommends applying all three tests to all cleanup units within a cleanup area. This leads to the usual multiple comparisons problem: the probability of at least one of the tests indicating non-compliance, when in fact the cleanup area is in compliance, is greater than the pre-set Type I error level for any of the individual tests. USEPA (1994, p.3.3) recommends against using multiple comparison procedures to control the overall Type I error and suggests instead a re-sampling scheme where additional samples are taken in cases where non-compliance is indicated.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Carrano, A., and D. Moore. (1982). The Rationale and Methodology for Quantifying Sister Chromatid Exchange in Humans. In Heddle, J.A., ed., Mutagenicity: New Horizons in Genetic Toxicology. Academic Press, New York, pp.268-304.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 4.
Johnson, R.A., S. Verrill, and D.H. Moore. (1987). Two-Sample Rank Tests for Detecting Changes That Occur in a Small Proportion of the Treated Population. Biometrics 43, 641-655.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.435-439.
USEPA. (1994). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
quantileTestPValue, wilcox.test, 
htestEnvStats.object, Hypothesis Tests.
Examples
  # Following Example 7.5 on pages 7.23-7.24 of USEPA (1994b), perform the 
  # quantile test for the TcCB data (the data are stored in EPA.94b.tccb.df).  
  # There are n=47 observations from the reference area and m=77 observations 
  # from the cleanup unit.  The target rank is set to 9, resulting in a value 
  # of quantile.ub=0.928.  Note that the p-value is 0.0114, not 0.0117.
  EPA.94b.tccb.df
  #    TcCB.orig   TcCB Censored      Area
  #1        0.22   0.22    FALSE Reference
  #2        0.23   0.23    FALSE Reference
  #...
  #46       1.20   1.20    FALSE Reference
  #47       1.33   1.33    FALSE Reference
  #48      <0.09   0.09     TRUE   Cleanup
  #49       0.09   0.09    FALSE   Cleanup
  #...
  #123     51.97  51.97    FALSE   Cleanup
  #124    168.64 168.64    FALSE   Cleanup
  # Determine the values to use for r and k for 
  # a desired significance level of 0.01 
  #--------------------------------------------
  p.vals <- quantileTestPValue(m = 77, n = 47, 
    r = c(rep(8, 3), rep(9, 3), rep(10, 3)), 
    k = c(6, 7, 8, 7, 8, 9, 8, 9, 10)) 
  round(p.vals, 3) 
  #[1] 0.355 0.122 0.019 0.264 0.081 0.011 0.193 0.053 0.007 
  # Choose r=9, k=9 to get a significance level of 0.011
  #-----------------------------------------------------
  with(EPA.94b.tccb.df, 
    quantileTest(TcCB[Area=="Cleanup"], TcCB[Area=="Reference"], 
    target.r = 9)) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 e = 0
  #
  #Alternative Hypothesis:          Tail of Fx Shifted to Right of
  #                                 Tail of Fy.
  #                                 0 < e <= 1, where
  #                                 Fx(t) = (1-e)*Fy(t) + e*Fz(t),
  #                                 Fz(t) <= Fy(t) for all t,
  #                                 and Fy != Fz
  #
  #Test Name:                       Quantile Test
  #
  #Data:                            x = TcCB[Area == "Cleanup"]  
  #                                 y = TcCB[Area == "Reference"]
  #
  #Sample Sizes:                    nx = 77
  #                                 ny = 47
  #
  #Test Statistics:                 k (# x obs of r largest) = 9
  #                                 r                        = 9
  #
  #Test Statistic Parameters:       m           = 77.000
  #                                 n           = 47.000
  #                                 quantile.ub =  0.928
  #
  #P-value:                         0.01136926
  #==========
  # Clean up
  #---------
  rm(p.vals)
Compute p-Value for the Quantile Test
Description
Compute the p-value associated with a specified combination of 
m, n, r, and k for the 
quantile test (useful for determining r and 
k for a given significance level \alpha).
Usage
  quantileTestPValue(m, n, r, k, exact.p = TRUE)
Arguments
| m | numeric vector of integers indicating the number of observations from the “treatment” group.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. | 
| n | numeric vector of integers indicating the number of observations from the “reference” group.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. | 
| r | numeric vector of integers indicating the ranks of the observations to use as the lower cut off for the quantile test.  Each value of r must lie between 1 and the corresponding value of m+n.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. | 
| k | numeric vector of integers indicating the number of observations from the “treatment” group contained in the r largest observations.  Each value of k must lie between 0 and the corresponding value of r.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. | 
| exact.p | logical scalar indicating whether to compute the p-value based on the exact distribution of the test statistic (exact.p=TRUE; the default) or based on the normal approximation (exact.p=FALSE).  See the help file for quantileTest for details. | 
Details
If the arguments m, n, r, and k are not all the same 
length, they are replicated to be the same length as the length of the longest 
argument.
For details on how the p-value is computed, see the help file for 
quantileTest.
The function quantileTestPValue is useful for determining what values to 
use for r and k, given the values of m, n, and a 
specified significance level \alpha.  The function 
quantileTestPValue can be used to reproduce Tables A.6-A.9 in 
USEPA (1994, pp.A.22-A.25).
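For example, the following minimal sketch (assuming EnvStats is loaded; the grid of candidate values is illustrative only) uses this recycling behavior to scan several combinations of r and k for m=77 and n=47 and order them by p-value:

  library(EnvStats)
  # Illustrative search over candidate (r, k) pairs for m = 77, n = 47
  cand <- expand.grid(r = 8:10, k = 6:10)
  cand <- cand[cand$k <= cand$r, ]   # k cannot exceed r
  cand$p.value <- quantileTestPValue(m = 77, n = 47, r = cand$r, k = cand$k)
  # Candidates ordered from smallest to largest p-value
  cand[order(cand$p.value), ]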
Value
numeric vector of p-values.
Note
See the help file for quantileTest.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for quantileTest.
See Also
quantileTest, wilcox.test, 
htest.object, Hypothesis Tests.
Examples
  # Reproduce the first column of Table A.9 in USEPA (1994, p.A.25):
  #-----------------------------------------------------------------
  p.vals <- quantileTestPValue(m = 5, n = seq(15, 45, by = 5), 
    r = c(9, 3, 4, 4, 5, 5, 6), k = c(4, 2, 2, 2, 2, 2, 2)) 
  round(p.vals, 3) 
  #[1] 0.098 0.091 0.119 0.089 0.109 0.087 0.103 
  #==========
  # Clean up
  #---------
  rm(p.vals)
Rosner's Test for Outliers
Description
Perform Rosner's generalized extreme Studentized deviate test for up to 
k potential outliers in a dataset, assuming the data without any outliers come 
from a normal (Gaussian) distribution.
Usage
  rosnerTest(x, k = 3, alpha = 0.05, warn = TRUE)
Arguments
| x | numeric vector of observations.  Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| k | positive integer indicating the number of suspected outliers.  The default value is k=3. | 
| alpha | numeric scalar between 0 and 1 indicating the Type I error associated with the test of hypothesis.  The default value is alpha=0.05. | 
| warn | logical scalar indicating whether to issue a warning (warn=TRUE; the default) when the sample size and the value of k are such that the assumed Type I error level may not be maintained (see the DETAILS section below). | 
Details
Let x_1, x_2, \ldots, x_n denote the n observations.  We assume that 
n-k of these observations come from the same normal (Gaussian) distribution, and 
that the k most “extreme” observations may or may not represent observations 
from a different distribution.  Let x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i} denote 
the n-i observations left after omitting the i most extreme observations, where  
i = 0, 1, \ldots, k-1.  Let \bar{x}^{(i)} and s^{(i)} denote the 
mean and standard deviation, respectively, of the n-i observations in the data 
that remain after removing the i most extreme observations.  
Thus, \bar{x}^{(0)} and s^{(0)} denote the 
mean and standard deviation for the full sample, and in general
\bar{x}^{(i)} = \frac{1}{n-i}\sum_{j=1}^{n-i} x^{*}_j \;\;\;\;\;\; (1)
s^{(i)} = \sqrt{\frac{1}{n-i-1} \sum_{j=1}^{n-i} (x^{*}_j - \bar{x}^{(i)})^2} \;\;\;\;\;\; (2)
For a specified value of i, the most extreme observation x^{(i)} is the one 
that is the greatest distance from the mean for that data set, i.e., 
x^{(i)} = \max_{j=1,2,\ldots,n-i} |x^{*}_j - \bar{x}^{(i)}| \;\;\;\;\;\; (3)
Thus, an extreme observation may be the smallest or the largest one in that data set.
Rosner's test is based on the k statistics 
R_1, R_2, \ldots, R_k, which represent the extreme Studentized deviates computed 
from successively reduced samples of size n, n-1, \ldots, n-k+1:
R_{i+1} = \frac{|x^{(i)} - \bar{x}^{(i)}|}{s^{(i)}} \;\;\;\;\;\; (4)
Critical values for R_{i+1} are denoted \lambda_{i+1} and are computed as:
\lambda_{i+1} = \frac{t_{p, n-i-2} (n-i-1)}{\sqrt{(n-i-2 + t^{2}_{p, n-i-2}) (n-i)}} \;\;\;\;\;\; (5)
where t_{p, \nu} denotes the p'th quantile of 
Student's t-distribution with 
\nu degrees of freedom, and in this case
p = 1 - \frac{\alpha/2}{n - i} \;\;\;\;\;\; (6)
where \alpha denotes the Type I error level.
The algorithm for determining the number of outliers is as follows:
- Compare - R_kwith- \lambda_k. If- R_k > \lambda_kthen conclude the- kmost extreme values are outliers.
- If - R_k \le \lambda_kthen compare- R_{k-1}with- \lambda_{k-1}. If- R_{k-1} > \lambda_{k-1}then conclude the- k-1most extreme values are outliers.
- Continue in this fashion until a certain number of outliers have been identified or Rosner's test finds no outliers at all. 
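The computations in Equations (1)-(6) are straightforward to sketch in base R.  The helper below (rosner.stats is a hypothetical name, not the EnvStats implementation; it omits the handling of missing values and the warnings described below) returns R_{i+1} and \lambda_{i+1} for i = 0, 1, \ldots, k-1:

  # Hypothetical helper, not the EnvStats implementation
  rosner.stats <- function(x, k = 3, alpha = 0.05) {
    x <- x[is.finite(x)]
    n <- length(x)
    out <- data.frame(i = 0:(k - 1), R = NA_real_, lambda = NA_real_)
    y <- x
    for (i in 0:(k - 1)) {
      xbar <- mean(y)                                  # Equation (1)
      s    <- sd(y)                                    # Equation (2)
      j    <- which.max(abs(y - xbar))                 # Equation (3)
      out$R[i + 1] <- abs(y[j] - xbar) / s             # Equation (4)
      p   <- 1 - (alpha / 2) / (n - i)                 # Equation (6)
      t.p <- qt(p, df = n - i - 2)
      out$lambda[i + 1] <-                             # Equation (5)
        t.p * (n - i - 1) / sqrt((n - i - 2 + t.p^2) * (n - i))
      y <- y[-j]                                       # drop the extreme value
    }
    out
  }

Applying the decision rule in the steps above amounts to finding the largest i+1 for which R_{i+1} > \lambda_{i+1}; rosnerTest carries out the full procedure and reports which observations are declared outliers.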
Based on a study using N=1,000 simulations, Rosner's (1983) Table 1 shows the estimated  
true Type I error of declaring at least one outlier when none exists for various 
sample sizes n ranging from 10 to 100, and the declared maximum number of outliers 
k ranging from 1 to 10.  Based on that table, Rosner (1983) concluded that for an 
assumed Type I error level of 0.05, as long as n \ge 25, the estimated 
\alpha levels are quite close to 0.05, and that similar results were obtained 
assuming a Type I error level of 0.01.  However, the table below is an expanded version 
of Rosner's (1983) Table 1 and shows results based on N=10,000 simulations.  
You can see that for an assumed Type I error of 0.05, the test maintains the Type I error 
fairly well for sample sizes as small as n = 3 as long as k = 1, and for 
n \ge 15, as long as k \le 2.    
Also, for an assumed Type I error of 0.01, the test maintains the Type I error fairly 
well for sample sizes as small as n = 15 as long as k \le 7.  
Based on these results, when warn=TRUE, a warning is issued for the following cases
indicating that the assumed Type I error may not be correct:
- alpha is greater than 0.01, the sample size is less than 15, and k is greater than 1.
- alpha is greater than 0.01, the sample size is at least 15 and less than 25, and k is greater than 2.
- alpha is less than or equal to 0.01, the sample size is less than 15, and k is greater than 1.
- k is greater than 10, or greater than the floor of half of the sample size (i.e., greater than the greatest integer less than or equal to half of the sample size).  A warning is given for this case because simulations have not been done for this case.
 
Table 1a.  Observed Type I Error Levels based on 10,000 Simulations, n = 3 to 5.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 3 | 1 | 0.047 | 0.043 | 0.051 | 0.009 | 0.007 | 0.01 | 
| 4 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.107 | 0.101 | 0.113 | 0.021 | 0.018 | 0.024 | |
| 5 | 1 | 0.048 | 0.044 | 0.053 | 0.008 | 0.006 | 0.009 | 
| 2 | 0.095 | 0.090 | 0.101 | 0.020 | 0.018 | 0.023 | 
Table 1b.  Observed Type I Error Levels based on 10,000 Simulations, n = 6 to 10.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 6 | 1 | 0.048 | 0.044 | 0.053 | 0.010 | 0.009 | 0.012 | 
| 2 | 0.085 | 0.080 | 0.091 | 0.017 | 0.015 | 0.020 | |
| 3 | 0.141 | 0.134 | 0.148 | 0.028 | 0.025 | 0.031 | |
| 7 | 1 | 0.048 | 0.044 | 0.053 | 0.013 | 0.011 | 0.015 | 
| 2 | 0.080 | 0.075 | 0.086 | 0.017 | 0.015 | 0.020 | |
| 3 | 0.112 | 0.106 | 0.118 | 0.022 | 0.019 | 0.025 | |
| 8 | 1 | 0.048 | 0.044 | 0.053 | 0.011 | 0.009 | 0.013 | 
| 2 | 0.080 | 0.074 | 0.085 | 0.017 | 0.014 | 0.019 | |
| 3 | 0.102 | 0.096 | 0.108 | 0.020 | 0.017 | 0.023 | |
| 4 | 0.143 | 0.136 | 0.150 | 0.028 | 0.025 | 0.031 | |
| 9 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.069 | 0.064 | 0.074 | 0.014 | 0.012 | 0.016 | |
| 3 | 0.097 | 0.091 | 0.103 | 0.018 | 0.015 | 0.021 | |
| 4 | 0.120 | 0.114 | 0.126 | 0.024 | 0.021 | 0.027 | |
| 10 | 1 | 0.051 | 0.047 | 0.056 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.068 | 0.063 | 0.073 | 0.012 | 0.010 | 0.014 | |
| 3 | 0.085 | 0.080 | 0.091 | 0.015 | 0.013 | 0.017 | |
| 4 | 0.106 | 0.100 | 0.112 | 0.021 | 0.018 | 0.024 | |
| 5 | 0.135 | 0.128 | 0.142 | 0.025 | 0.022 | 0.028 | 
Table 1c.  Observed Type I Error Levels based on 10,000 Simulations, n = 11 to 15.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 11 | 1 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 | 
| 2 | 0.070 | 0.065 | 0.075 | 0.014 | 0.012 | 0.017 | |
| 3 | 0.082 | 0.077 | 0.088 | 0.014 | 0.011 | 0.016 | |
| 4 | 0.101 | 0.095 | 0.107 | 0.019 | 0.016 | 0.021 | |
| 5 | 0.116 | 0.110 | 0.123 | 0.022 | 0.019 | 0.024 | |
| 12 | 1 | 0.052 | 0.047 | 0.056 | 0.011 | 0.009 | 0.013 | 
| 2 | 0.067 | 0.062 | 0.072 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.074 | 0.069 | 0.080 | 0.016 | 0.013 | 0.018 | |
| 4 | 0.088 | 0.082 | 0.093 | 0.016 | 0.014 | 0.019 | |
| 5 | 0.099 | 0.093 | 0.105 | 0.016 | 0.013 | 0.018 | |
| 6 | 0.117 | 0.111 | 0.123 | 0.021 | 0.018 | 0.023 | |
| 13 | 1 | 0.048 | 0.044 | 0.052 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.064 | 0.059 | 0.069 | 0.014 | 0.012 | 0.016 | |
| 3 | 0.070 | 0.065 | 0.075 | 0.013 | 0.011 | 0.015 | |
| 4 | 0.079 | 0.074 | 0.084 | 0.014 | 0.012 | 0.017 | |
| 5 | 0.088 | 0.083 | 0.094 | 0.015 | 0.013 | 0.018 | |
| 6 | 0.109 | 0.103 | 0.115 | 0.020 | 0.017 | 0.022 | |
| 14 | 1 | 0.046 | 0.042 | 0.051 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 3 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 | |
| 5 | 0.084 | 0.079 | 0.090 | 0.016 | 0.013 | 0.018 | |
| 6 | 0.091 | 0.085 | 0.097 | 0.017 | 0.014 | 0.019 | |
| 7 | 0.107 | 0.101 | 0.113 | 0.018 | 0.016 | 0.021 | |
| 15 | 1 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.065 | 0.060 | 0.069 | 0.013 | 0.011 | 0.016 | |
| 4 | 0.073 | 0.068 | 0.078 | 0.014 | 0.011 | 0.016 | |
| 5 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.086 | 0.081 | 0.092 | 0.015 | 0.013 | 0.017 | |
| 7 | 0.099 | 0.094 | 0.105 | 0.018 | 0.015 | 0.020 | 
Table 1d.  Observed Type I Error Levels based on 10,000 Simulations, n = 16 to 20.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 16 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.055 | 0.051 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.068 | 0.063 | 0.073 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.074 | 0.069 | 0.079 | 0.015 | 0.013 | 0.017 | |
| 5 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 | |
| 6 | 0.075 | 0.070 | 0.080 | 0.013 | 0.011 | 0.016 | |
| 7 | 0.087 | 0.082 | 0.093 | 0.017 | 0.014 | 0.020 | |
| 8 | 0.096 | 0.090 | 0.101 | 0.016 | 0.014 | 0.019 | |
| 17 | 1 | 0.047 | 0.043 | 0.051 | 0.008 | 0.007 | 0.010 | 
| 2 | 0.059 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.062 | 0.057 | 0.067 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.070 | 0.065 | 0.075 | 0.012 | 0.009 | 0.014 | |
| 5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.015 | |
| 6 | 0.071 | 0.066 | 0.076 | 0.015 | 0.012 | 0.017 | |
| 7 | 0.081 | 0.076 | 0.087 | 0.014 | 0.012 | 0.016 | |
| 8 | 0.083 | 0.078 | 0.088 | 0.015 | 0.013 | 0.017 | |
| 18 | 1 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 | |
| 3 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.015 | |
| 4 | 0.065 | 0.060 | 0.070 | 0.013 | 0.011 | 0.015 | |
| 5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.068 | 0.063 | 0.073 | 0.014 | 0.011 | 0.016 | |
| 7 | 0.072 | 0.067 | 0.077 | 0.014 | 0.011 | 0.016 | |
| 8 | 0.076 | 0.071 | 0.081 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.081 | 0.076 | 0.086 | 0.012 | 0.010 | 0.014 | |
| 19 | 1 | 0.051 | 0.046 | 0.055 | 0.008 | 0.006 | 0.010 | 
| 2 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
| 3 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.061 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 5 | 0.067 | 0.062 | 0.072 | 0.013 | 0.010 | 0.015 | |
| 6 | 0.066 | 0.061 | 0.071 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.069 | 0.064 | 0.074 | 0.013 | 0.011 | 0.015 | |
| 8 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.082 | 0.077 | 0.087 | 0.015 | 0.013 | 0.018 | |
| 20 | 1 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | 
| 2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.060 | 0.056 | 0.065 | 0.009 | 0.007 | 0.011 | |
| 4 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 | |
| 5 | 0.063 | 0.059 | 0.068 | 0.014 | 0.011 | 0.016 | |
| 6 | 0.063 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.065 | 0.061 | 0.070 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.070 | 0.065 | 0.076 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.076 | 0.070 | 0.081 | 0.013 | 0.011 | 0.015 | |
| 10 | 0.081 | 0.076 | 0.087 | 0.012 | 0.010 | 0.014 | 
Table 1e.  Observed Type I Error Levels based on 10,000 Simulations, n = 21 to 25.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 21 | 1 | 0.054 | 0.049 | 0.058 | 0.013 | 0.011 | 0.015 | 
| 2 | 0.054 | 0.049 | 0.058 | 0.012 | 0.010 | 0.014 | |
| 3 | 0.058 | 0.054 | 0.063 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.064 | 0.059 | 0.069 | 0.013 | 0.011 | 0.016 | |
| 6 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.015 | |
| 7 | 0.063 | 0.058 | 0.068 | 0.013 | 0.011 | 0.015 | |
| 8 | 0.066 | 0.061 | 0.071 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.073 | 0.068 | 0.078 | 0.013 | 0.011 | 0.015 | |
| 10 | 0.071 | 0.066 | 0.076 | 0.012 | 0.010 | 0.014 | |
| 22 | 1 | 0.047 | 0.042 | 0.051 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 | |
| 3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 4 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
| 5 | 0.061 | 0.057 | 0.066 | 0.009 | 0.008 | 0.011 | |
| 6 | 0.063 | 0.058 | 0.068 | 0.013 | 0.010 | 0.015 | |
| 7 | 0.065 | 0.060 | 0.070 | 0.013 | 0.010 | 0.015 | |
| 8 | 0.065 | 0.060 | 0.070 | 0.014 | 0.012 | 0.016 | |
| 9 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.067 | 0.062 | 0.072 | 0.012 | 0.009 | 0.014 | |
| 23 | 1 | 0.051 | 0.047 | 0.056 | 0.008 | 0.007 | 0.010 | 
| 2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 | |
| 3 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.061 | 0.056 | 0.065 | 0.010 | 0.009 | 0.012 | |
| 6 | 0.060 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.068 | 0.063 | 0.073 | 0.014 | 0.012 | 0.017 | |
| 24 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.058 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
| 4 | 0.060 | 0.056 | 0.065 | 0.013 | 0.011 | 0.015 | |
| 5 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.065 | 0.060 | 0.069 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 8 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.064 | 0.059 | 0.068 | 0.012 | 0.010 | 0.015 | |
| 25 | 1 | 0.054 | 0.050 | 0.059 | 0.012 | 0.009 | 0.014 | 
| 2 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.057 | 0.052 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.060 | 0.055 | 0.064 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.062 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.061 | 0.057 | 0.066 | 0.010 | 0.008 | 0.012 | 
Table 1f.  Observed Type I Error Levels based on 10,000 Simulations, n = 26 to 30.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 26 | 1 | 0.051 | 0.047 | 0.055 | 0.012 | 0.010 | 0.014 | 
| 2 | 0.057 | 0.053 | 0.062 | 0.013 | 0.011 | 0.015 | |
| 3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
| 6 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.060 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.061 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 | |
| 27 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.063 | 0.058 | 0.068 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.055 | 0.051 | 0.060 | 0.008 | 0.006 | 0.010 | |
| 28 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.011 | 
| 2 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.009 | 0.014 | |
| 4 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.054 | 0.050 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.062 | 0.057 | 0.067 | 0.011 | 0.009 | 0.013 | |
| 29 | 1 | 0.049 | 0.045 | 0.053 | 0.011 | 0.009 | 0.013 | 
| 2 | 0.053 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.056 | 0.051 | 0.060 | 0.010 | 0.009 | 0.012 | |
| 4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 30 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.015 | |
| 4 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.058 | 0.053 | 0.063 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.062 | 0.058 | 0.067 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 | |
| 8 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 | |
| 10 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 | 
Table 1g. Observed Type I Error Levels based on 10,000 Simulations, n = 31 to 35.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 31 | 1 | 0.051 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 | |
| 3 | 0.053 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.053 | 0.049 | 0.057 | 0.011 | 0.009 | 0.013 | |
| 6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 7 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
| 8 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 32 | 1 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 | |
| 4 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 6 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 33 | 1 | 0.051 | 0.046 | 0.055 | 0.011 | 0.009 | 0.013 | 
| 2 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 4 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 34 | 1 | 0.052 | 0.048 | 0.056 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.053 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
| 4 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.053 | 0.048 | 0.057 | 0.009 | 0.007 | 0.011 | |
| 6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 7 | 0.052 | 0.048 | 0.057 | 0.012 | 0.010 | 0.014 | |
| 8 | 0.055 | 0.050 | 0.059 | 0.009 | 0.008 | 0.011 | |
| 9 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 35 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.009 | 0.012 | 
| 2 | 0.054 | 0.049 | 0.058 | 0.010 | 0.009 | 0.012 | |
| 3 | 0.055 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 | |
| 4 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 | |
| 10 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | 
Table 1h. Observed Type I Error Levels based on 10,000 Simulations, n = 36 to 40.
| | | Assumed \alpha=0.05 | | | Assumed \alpha=0.01 | | | 
| n | k | \hat{\alpha} | 95% LCL | 95% UCL | \hat{\alpha} | 95% LCL | 95% UCL | 
| 36 | 1 | 0.047 | 0.043 | 0.051 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 | |
| 3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 | |
| 4 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 | |
| 5 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 | |
| 7 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.056 | 0.051 | 0.060 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 37 | 1 | 0.050 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 3 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.054 | 0.050 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 5 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.054 | 0.050 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
| 8 | 0.055 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.053 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
| 10 | 0.049 | 0.045 | 0.054 | 0.009 | 0.007 | 0.011 | |
| 38 | 1 | 0.049 | 0.045 | 0.053 | 0.009 | 0.007 | 0.011 | 
| 2 | 0.052 | 0.047 | 0.056 | 0.008 | 0.007 | 0.010 | |
| 3 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.055 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 | |
| 6 | 0.055 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 7 | 0.049 | 0.045 | 0.053 | 0.009 | 0.007 | 0.011 | |
| 8 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 9 | 0.054 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 | |
| 10 | 0.055 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 39 | 1 | 0.047 | 0.043 | 0.051 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.055 | 0.051 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 3 | 0.053 | 0.049 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 4 | 0.053 | 0.049 | 0.058 | 0.010 | 0.009 | 0.012 | |
| 5 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.053 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
| 7 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
| 9 | 0.050 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 | |
| 10 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 40 | 1 | 0.049 | 0.045 | 0.054 | 0.010 | 0.008 | 0.012 | 
| 2 | 0.052 | 0.048 | 0.057 | 0.010 | 0.009 | 0.012 | |
| 3 | 0.055 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 4 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 5 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
| 6 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.012 | |
| 7 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
| 8 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
| 9 | 0.047 | 0.043 | 0.052 | 0.010 | 0.008 | 0.011 | |
| 10 | 0.058 | 0.054 | 0.063 | 0.010 | 0.008 | 0.012 | 
Value
A list of class "gofOutlier" containing the results of the hypothesis test.  
See the help file for gofOutlier.object for details.
Note
Rosner's test is a commonly used test for “outliers” when you are willing to assume that the data without outliers follows a normal (Gaussian) distribution. It is designed to avoid masking, which occurs when an outlier goes undetected because it is close in value to another outlier.
Rosner's test is a kind of discordancy test (Barnett and Lewis, 1995). The test statistic of a discordancy test is usually a ratio: the numerator is the difference between the suspected outlier and some summary statistic of the data set (e.g., mean, next largest observation, etc.), while the denominator is always a measure of spread within the data (e.g., standard deviation, range, etc.). Both USEPA (2009) and USEPA (2013a,b) discuss two commonly used discordancy tests: Dixon's test and Rosner's test. Both of these tests assume that all of the data that are not outliers come from a normal (Gaussian) distribution.
There are many forms of Dixon's test (Barnett and Lewis, 1995). The one presented in USEPA (2009) and USEPA (2013a,b) assumes just one outlier (Dixon, 1953). This test is vulnerable to “masking”, in which the presence of several outliers masks the fact that even one outlier is present. There are also other forms of Dixon's test that allow for more than one outlier based on a sequence of sub-tests, but these tests are also vulnerable to masking.
Rosner's test allows you to test for several possible outliers and avoids the problem of 
masking.  Rosner's test requires you to set the number of suspected outliers, k, 
in advance.  As in the case of Dixon's test, there are several forms of Rosner's test, 
so you need to be aware of which one you are using.  The form of Rosner's test presented in 
USEPA (2009) is based on the extreme Studentized deviate (ESD) (Rosner, 1975), whereas the 
form of Rosner's test performed by the EnvStats function rosnerTest and 
presented in USEPA (2013a,b) is based on the generalized ESD (Rosner, 1983; Gilbert, 1987).  
USEPA (2013a, p. 190) cites both Rosner (1975) and Rosner (1983), but presents only the 
test given in Rosner (1983).  Rosner's test based on the ESD has the appropriate Type I 
error level if there are no outliers in the dataset, but if there are actually, say, m 
outliers, where m < k, then the ESD version of Rosner's test tends to declare 
more than m outliers with a probability that is greater than the stated Type I 
error level (referred to as “swamping”).  Rosner's test based on the 
generalized ESD fixes this problem.  USEPA (2013a, pp. 17, 191) incorrectly states that 
the generalized ESD version of Rosner's test is vulnerable to masking.  Surprisingly, 
the well-known book on statistical outliers by Barnett and Lewis (1995) does not 
discuss Rosner's generalized ESD test.
As noted, using Rosner's test requires specifying the number of suspected outliers, 
k, in advance.  USEPA (2013a, pp.190-191) states:  
“A graphical display (Q-Q plot) can be used to identify suspected outliers 
needed to perform the Rosner test”, and USEPA (2009, p. 12-11) notes:  
“A potential drawback of Rosner's test is that the user must first identify 
the maximum number of potential outliers (k) prior to running the test.  Therefore, 
this requirement makes the test ill-advised as an automatic outlier screening tool, 
and somewhat reliant on the user to identify candidate outliers.”
When observations contain non-detect values (NDs), USEPA (2013a, p. 191) states:  
“one may replace the NDs by their respective detection limits (DLs), DL/2, or may 
just ignore them ....”  This is bad advice, as this method of dealing with non-detects 
will produce Type I error rates that are not correct. 
OUTLIERS ARE NOT NECESSARILY INCORRECT VALUES
  
Whether an observation is an “outlier” depends on the underlying assumed 
statistical model.  McBean and Rovers (1992) state: 
   
“It may be possible to ignore the outlier if a physical rationale is available but, 
failing that, the value must be included ....  Note that the use of statistics does not 
interpret the facts, it simply makes the facts easier to see.  Therefore, it is incumbent 
on the analyst to identify whether or not the high value ... is truly representative of 
the chemical being monitored or, instead, is an outlier for reasons such as a result of 
sampling or laboratory error.”
USEPA (2006, p.51) states:  
“If scientific reasoning does not explain the outlier, it should not be 
discarded from the data set.”    
Finally, an editorial by the Editor-in-Chief of the journal Science deals with this topic (McNutt, 2014).
You can use the functions qqPlot and gofTest to explore 
other possible statistical models for the data, or you can use nonparametric statistics 
if you do not want to assume a particular distribution.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Barnett, V., and T. Lewis. (1995). Outliers in Statistical Data. Third Edition. John Wiley & Sons, Chichester, UK, pp. 235–236.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY, pp.188–191.
McBean, E.A, and F.A. Rovers. (1992). Estimation of the Probability of Exceedance of Contaminant Concentrations. Ground Water Monitoring Review Winter, pp. 115–119.
McNutt, M. (2014). Raising the Bar. Science 345(6192), p. 9.
Rosner, B. (1975). On the Detection of Many Outliers. Technometrics 17, 221–227.
Rosner, B. (1983). Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165–172.
USEPA. (2006). Data Quality Assessment: A Reviewer's Guide. EPA QA/G-9R. EPA/240/B-06/002, February 2006. Office of Environmental Information, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C., pp. 12-10 to 12-14.
USEPA. (2013a). ProUCL Version 5.0.00 Technical Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190–195.
USEPA. (2013b). ProUCL Version 5.0.00 User Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190–195.
See Also
gofTest, gofOutlier.object, print.gofOutlier, 
Normal, qqPlot.
Examples
  # Combine 30 observations from a normal distribution with mean 3 and 
  # standard deviation 2, with 3 observations from a normal distribution 
  # with mean 10 and standard deviation 1, then run Rosner's Test on these 
  # data, specifying k=4 potential outliers based on looking at the 
  # normal Q-Q plot. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)
  set.seed(250) 
  dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1)) 
  dev.new()
  qqPlot(dat)
  rosnerTest(dat, k = 4)
  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            dat
  #
  #Sample Size:                     33
  #
  #Test Statistics:                 R.1 = 2.848514
  #                                 R.2 = 3.086875
  #                                 R.3 = 3.033044
  #                                 R.4 = 2.380235
  #
  #Test Statistic Parameter:        k = 4
  #
  #Alternative Hypothesis:          Up to 4 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     3
  #
  #  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
  #2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
  #3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
  #4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE
  #----------
  # Clean up
  rm(dat)
  graphics.off()
  #--------------------------------------------------------------------
  # Example 12-4 of USEPA (2009, page 12-12) gives an example of 
  # using Rosner's test to test for outliers in naphthalene measurements (ppb)
  # taken at 5 background wells over 5 quarters.  The data for this example 
  # are stored in EPA.09.Ex.12.4.naphthalene.df.
  EPA.09.Ex.12.4.naphthalene.df
  #   Quarter Well Naphthalene.ppb
  #1        1 BW.1            3.34
  #2        2 BW.1            5.39
  #3        3 BW.1            5.74
  # ...
  #23       3 BW.5            5.53
  #24       4 BW.5            4.42
  #25       5 BW.5           35.45
  longToWide(EPA.09.Ex.12.4.naphthalene.df, "Naphthalene.ppb", "Quarter", "Well", 
    paste.row.name = TRUE)
  #          BW.1 BW.2  BW.3 BW.4  BW.5
  #Quarter.1 3.34 5.59  1.91 6.12  8.64
  #Quarter.2 5.39 5.96  1.74 6.05  5.34
  #Quarter.3 5.74 1.47 23.23 5.18  5.53
  #Quarter.4 6.88 2.57  1.82 4.43  4.42
  #Quarter.5 5.85 5.39  2.02 1.00 35.45
  # Look at Q-Q plots for both the raw and log-transformed data
  #------------------------------------------------------------
  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, add.line = TRUE, 
      main = "Figure 12-6.  Naphthalene Probability Plot"))
  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, dist = "lnorm", add.line = TRUE, 
      main = "Figure 12-7.  Log Naphthalene Probability Plot"))
  # Test for 2 potential outliers on the original scale:
  #-----------------------------------------------------
  with(EPA.09.Ex.12.4.naphthalene.df, rosnerTest(Naphthalene.ppb, k = 2))
  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Naphthalene.ppb
  #
  #Sample Size:                     25
  #
  #Test Statistics:                 R.1 = 3.930957
  #                                 R.2 = 4.160223
  #
  #Test Statistic Parameter:        k = 2
  #
  #Alternative Hypothesis:          Up to 2 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     2
  #
  #  i  Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 6.44240 7.379271 35.45      25 3.930957   2.821681    TRUE
  #2 1 5.23375 4.325790 23.23      13 4.160223   2.801551    TRUE
  #----------
  # Clean up
  graphics.off()
Test for the Presence of Serial Correlation
Description
serialCorrelationTest is a generic function used to test for the 
presence of lag-one serial correlation using either the rank 
von Neumann ratio test, the normal approximation based on the Yule-Walker 
estimate of lag-one correlation, or the normal approximation based on the 
MLE of lag-one correlation.  The function invokes particular 
methods which depend on the class of the first 
argument. 
Currently, there is a default method and a method for objects of class "lm".
Usage
  serialCorrelationTest(x, ...)
  ## Default S3 method:
serialCorrelationTest(x, test = "rank.von.Neumann", 
    alternative = "two.sided", conf.level = 0.95, ...) 
  ## S3 method for class 'lm'
serialCorrelationTest(x, test = "rank.von.Neumann", 
    alternative = "two.sided", conf.level = 0.95, ...)
Arguments
| x | numeric vector of observations, a numeric univariate time series of class "ts", or an object of class "lm". | 
| test | character string indicating which test to use.  The possible values are "rank.von.Neumann" (rank von Neumann ratio test; the default), "AR1.yw" (test based on the Yule-Walker estimate of lag-one correlation), and "AR1.mle" (test based on the maximum likelihood estimate of lag-one correlation).  See the DETAILS section for more information. | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values are "two.sided" (the default), "greater", and "less". | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with 
the confidence interval for the population lag-one autocorrelation.  The default 
value is conf.level=0.95. | 
| ... | optional arguments for possible future methods. Currently not used. | 
Details
Let \underline{x} = x_1, x_2, \ldots, x_n denote n observations from a 
stationary time series sampled at equispaced points in time with normal (Gaussian) 
errors.  The function serialCorrelationTest tests the null hypothesis:
H_0: \rho_1 = 0 \;\;\;\;\;\; (1)
where \rho_1 denotes the true lag-1 autocorrelation (also called the lag-1 
serial correlation coefficient).  Actually, the null hypothesis is that the 
lag-k autocorrelation is 0 for all values of k greater than 0 (i.e., 
the time series is purely random).
In the case when the argument x is a linear model, the function 
serialCorrelationTest tests the null hypothesis (1) for the 
residuals.
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \rho_1 > 0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less"):
H_a: \rho_1 < 0 \;\;\;\;\;\; (3)
and the two-sided alternative:
H_a: \rho_1 \ne 0 \;\;\;\;\;\; (4)
Testing the Null Hypothesis of No Lag-1 Autocorrelation 
There are several possible methods for testing the null hypothesis (1) versus any 
of the three alternatives (2)-(4). The function serialCorrelationTest allows 
you to use one of three possible tests:
- The rank von Neuman ratio test. 
- The test based on the normal approximation for the distribution of the Yule-Walker estimate of lag-one correlation. 
- The test based on the normal approximation for the distribution of the maximum likelihood estimate (MLE) of lag-one correlation. 
Each of these tests is described below.
Test Based on Yule-Walker Estimate (test="AR1.yw") 
The Yule-Walker estimate of the lag-1 autocorrelation is given by:
\hat{\rho}_1 = \frac{\hat{\gamma}_1}{\hat{\gamma}_0} \;\;\;\;\;\; (5)
where
\hat{\gamma}_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \;\;\;\;\;\; (6)
is the estimate of the lag-k autocovariance.  
(This estimator does not allow for missing values.)
Under the null hypothesis (1), the estimator of lag-1 correlation in Equation (5) is approximately distributed as a normal (Gaussian) random variable with mean 0 and variance given by:
Var(\hat{\rho}_1) \approx \frac{1}{n} \;\;\;\;\;\; (7)
(Box and Jenkins, 1976, pp.34-35). Thus, the null hypothesis (1) can be tested with the statistic
z = \sqrt{n} \hat{\rho_1} \;\;\;\;\;\; (8)
which is distributed approximately as a standard normal random variable under the 
null hypothesis that the lag-1 autocorrelation is 0.
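A minimal base-R sketch of this test, assuming x is a numeric vector with no missing values (the helper names rho1.yw and yw.z.test are hypothetical, not EnvStats functions):

  # Hypothetical helpers, not EnvStats functions
  rho1.yw <- function(x) {
    # Yule-Walker estimate of the lag-1 autocorrelation (Equations (5)-(6));
    # the 1/n factors in Equation (6) cancel in the ratio
    xc <- x - mean(x)
    n  <- length(x)
    sum(xc[-n] * xc[-1]) / sum(xc^2)
  }
  yw.z.test <- function(x) {
    n    <- length(x)
    rho1 <- rho1.yw(x)
    z    <- sqrt(n) * rho1                     # Equation (8)
    c(rho1 = rho1, z = z, p.two.sided = 2 * pnorm(-abs(z)))
  }

  # e.g., yw.z.test(rnorm(100))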
Test Based on the MLE (test="AR1.mle") 
The function serialCorrelationTest uses the R function arima to 
compute the MLE of the lag-one autocorrelation and the estimated variance of this 
estimator.  As for the test based on the Yule-Walker estimate, the z-statistic is 
computed as the estimated lag-one autocorrelation divided by the square root of the 
estimated variance.
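A minimal sketch of this computation, assuming x is a numeric vector with no missing values (the simulated data, AR(1) fit, and two-sided p-value below are illustrative only; serialCorrelationTest handles the details internally):

  # Illustrative data: an AR(1) series (the seed and coefficient are arbitrary)
  set.seed(123)
  x <- as.numeric(arima.sim(model = list(ar = 0.3), n = 100))
  fit  <- arima(x, order = c(1, 0, 0))        # AR(1) fit by maximum likelihood
  rho1 <- fit$coef[["ar1"]]                   # MLE of the lag-1 autocorrelation
  se   <- sqrt(fit$var.coef["ar1", "ar1"])    # estimated standard error
  z    <- rho1 / se                           # z-statistic
  2 * pnorm(-abs(z))                          # two-sided p-value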
Test Based on Rank von Neumann Ratio (test="rank.von.Neumann") 
The null distribution of the serial correlation coefficient may be badly affected 
by departures from normality in the underlying process (Cox, 1966; Bartels, 1977).  
It is therefore a good idea to consider using a nonparametric test for randomness if 
the normality of the underlying process is in doubt (Bartels, 1982). 
Wald and Wolfowitz (1943) introduced the rank serial correlation coefficient, which for lag-1 autocorrelation is simply the Yule-Walker estimate (Equation (5) above) with the actual observations replaced with their ranks.
von Neumann et al. (1941) introduced a test for randomness in the context of testing for trend in the mean of a process. Their statistic is given by:
V = \frac{\sum_{i=1}^{n-1}(x_i - x_{i+1})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (9)
which is the ratio of the sum of squared successive differences to the usual sum of squared deviations from the mean. This statistic is bounded between 0 and 4, and for a purely random process is symmetric about 2. Small values of this statistic indicate possible positive autocorrelation, and large values of this statistic indicate possible negative autocorrelation. Durbin and Watson (1950, 1951, 1971) proposed using this statistic in the context of checking the independence of residuals from a linear regression model and provided tables for the distribution of this statistic. This statistic is therefore often called the “Durbin-Watson statistic” (Draper and Smith, 1998, p.181).
The rank version of the von Neumann ratio statistic is given by:
V_{rank} = \frac{\sum_{i=1}^{n-1}(R_i - R_{i+1})^2}{\sum_{i=1}^n (R_i - \bar{R})^2} \;\;\;\;\;\; (10)
where R_i denotes the rank of the i'th observation (Bartels, 1982).  
(This test statistic does not allow for missing values.)  In the absence of ties, 
the denominator of this test statistic is equal to
\sum_{i=1}^n (R_i - \bar{R})^2 = \frac{n(n^2 - 1)}{12} \;\;\;\;\;\; (11)
The range of the V_{rank} test statistic is given by:
[\frac{12}{(n)(n+1)} , 4 - \frac{12}{(n)(n+1)}] \;\;\;\;\;\; (12)
if n is even, with a negligible adjustment if n is odd (Bartels, 1982), so 
asymptotically the range is from 0 to 4, just as for the V test statistic in 
Equation (9) above.
Bartels (1982) shows that asymptotically, the rank von Neumann ratio statistic is a linear transformation of the rank serial correlation coefficient, so any asymptotic results apply to both statistics.
For any fixed sample size n, the exact distribution of the V_{rank} 
statistic in Equation (10) above can be computed by simply computing the value of 
V_{rank} for all possible permutations of the serial order of the ranks.  
Based on this exact distribution, Bartels (1982) presents a table of critical 
values for the numerator of the RVN statistic for sample sizes between 4 and 10.
Determining the exact distribution of V_{rank} becomes impractical as the 
sample size increases.  For values of n between 10 and 100, Bartels (1982) 
approximated the distribution of V_{rank} by a 
beta distribution over the range 0 to 4 with shape parameters 
shape1=\nu and shape2=\omega and:
\nu = \omega = \frac{5n(n+1)(n-1)^2}{2(n-2)(5n^2 - 2n - 9)} - \frac{1}{2} \;\;\;\;\;\; (13)
Bartels (1982) checked this approximation by simulating the distribution of 
V_{rank} for n=25 and n=50 and comparing the empirical quantiles 
at 0.005, 0.01, 0.025, 0.05, and 0.1 with the 
approximated quantiles based on the beta distribution.  He found that the quantiles 
agreed to 2 decimal places for eight of the 10 values, and differed by 0.01 
for the other two values.
Note: The definition of the beta distribution assumes the 
random variable ranges from 0 to 1.  This definition can be generalized as follows.  
Suppose the random variable Y has a beta distribution over the range 
a \le y \le b, with shape parameters \nu and \omega.  Then the 
random variable X defined as:
X = \frac{Y-a}{b-a} \;\;\;\;\;\; (14)
has the “standard beta distribution” as described in the help file for Beta (Johnson et al., 1995, p.210).
Bartels (1982) shows that asymptotically, V_{rank} has normal distribution 
with mean 2 and variance 4/n, but notes that a slightly better approximation 
is given by using a variance of 20/(5n + 7).
To test the null hypothesis (1) when test="rank.von.Neumann", the function 
serialCorrelationTest does the following:
- When the sample size is between 3 and 10, the exact distribution of - V_{rank}is used to compute the p-value.
- When the sample size is between 11 and 100, the beta approximation to the distribution of - V_{rank}is used to compute the p-value.
- When the sample size is larger than 100, the normal approximation to the distribution of - V_{rank}is used to compute the p-value. (This uses the variance- 20/(5n + 7).)
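A minimal sketch of the beta approximation described above, for sample sizes between 11 and 100 with no ties (the helper name rvn.approx.test is hypothetical, not an EnvStats function; the two-sided p-value is taken here as twice the smaller tail probability):

  # Hypothetical helper, not an EnvStats function
  rvn.approx.test <- function(x) {
    n <- length(x)
    R <- rank(x)
    V.rank <- sum(diff(R)^2) / sum((R - mean(R))^2)     # Equation (10)
    nu <- 5 * n * (n + 1) * (n - 1)^2 /
      (2 * (n - 2) * (5 * n^2 - 2 * n - 9)) - 1/2       # Equation (13)
    # V.rank / 4 is approximately Beta(nu, nu) on [0, 1] (Equation (14))
    p.lower <- pbeta(V.rank / 4, shape1 = nu, shape2 = nu)
    c(RVN = V.rank, p.two.sided = 2 * min(p.lower, 1 - p.lower))
  }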
When ties are present in the observations and midranks are used for the tied 
observations, the distribution of the V_{rank} statistic based on the 
assumption of no ties is not applicable.  If the number of ties is small, however, 
they may not grossly affect the assumed p-value.
When ties are present, the function serialCorrelationTest issues a warning.  
When the sample size is between 3 and 10, the p-value is computed based on 
rounding up the computed value of V_{rank} to the nearest possible value 
that could be observed in the case of no ties.
Computing a Confidence Interval for the Lag-1 Autocorrelation 
The function serialCorrelationTest computes an approximate 
100(1-\alpha)\% confidence interval for the lag-1 autocorrelation as follows:
[\hat{\rho}_1 - z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1},  \hat{\rho}_1 + z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1}] \;\;\;\;\;\; (15)
where \hat{\sigma}_{\hat{\rho}_1} denotes the estimated standard deviation of 
the estimate of lag-1 autocorrelation, and z_p denotes the p'th quantile 
of the standard normal distribution.
When test="AR1.yw" or test="rank.von.Neumann", the Yule-Walker 
estimate of lag-1 autocorrelation is used and the variance of the estimated 
lag-1 autocorrelation is approximately:
Var(\hat{\rho}_1) \approx \frac{1}{n} (1 - \rho_1^2) \;\;\;\;\;\; (16)
(Box and Jenkins, 1976, p.34), so
\hat{\sigma}_{\hat{\rho}_1} = \sqrt{\frac{1 - \hat{\rho}_1^2}{n}} \;\;\;\;\;\; (17)
When test="AR1.mle", the MLE of the lag-1 autocorrelation is used, and its 
standard deviation is estimated with the square root of the estimated variance 
returned by arima.
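A minimal sketch of the confidence interval in Equations (15)-(17), based on the Yule-Walker estimate and assuming x is a numeric vector with no missing values (rho1.ci is a hypothetical helper name, not an EnvStats function):

  # Hypothetical helper, not an EnvStats function
  rho1.ci <- function(x, conf.level = 0.95) {
    n    <- length(x)
    xc   <- x - mean(x)
    rho1 <- sum(xc[-n] * xc[-1]) / sum(xc^2)   # Yule-Walker estimate (Equation (5))
    se   <- sqrt((1 - rho1^2) / n)             # Equation (17)
    z    <- qnorm(1 - (1 - conf.level) / 2)
    c(rho1 = rho1, LCL = rho1 - z * se, UCL = rho1 + z * se)   # Equation (15)
  }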
Value
A list of class "htestEnvStats" containing the results of the hypothesis test.  
See the help file for htestEnvStats.object for details.
Note
Data collected over time on the same phenomenon are called a time series. A time series is usually modeled as a single realization of a stochastic process; that is, if we could go back in time and repeat the experiment, we would get different results that would vary according to some probabilistic law. The simplest kind of time series is a stationary time series, in which the mean value is constant over time, the variability of the observations is constant over time, etc. That is, the probability distribution associated with each future observation is the same.
A common concern in applying standard statistical tests to time series data is the assumption of independence. Most conventional statistical hypothesis tests assume the observations are independent, but data collected sequentially in time may not satisfy this assumption. For example, high observations may tend to follow high observations (positive serial correlation), or low observations may tend to follow high observations (negative serial correlation). One way to investigate the assumption of independence is to estimate the lag-one serial correlation and test whether it is significantly different from 0.
The null distribution of the serial correlation coefficient may be badly affected by departures from normality in the underlying process (Cox, 1966; Bartels, 1977). It is therefore a good idea to consider using a nonparametric test for randomness if the normality of the underlying process is in doubt (Bartels, 1982). Knoke (1977) showed that under normality, the test based on the rank serial correlation coefficient (and hence the test based on the rank von Neumann ratio statistic) has asymptotic relative efficiency of 0.91 with respect to using the test based on the ordinary serial correlation coefficient against the alternative of first-order autocorrelation.
Bartels (1982) performed an extensive simulation study of the power of the rank von Neumann ratio test relative to the standard von Neumann ratio test (based on the statistic in Equation (9) above) and the runs test (Lehmann, 1975, 313-315). He generated a first-order autoregressive process for sample sizes of 10, 25, and 50, using 6 different parent distributions: normal, Cauchy, contaminated normal, Johnson, Stable, and exponential. Values of lag-1 autocorrelation ranged from -0.8 to 0.8. Bartels (1982) found three important results:
- The rank von Neumann ratio test is far more powerful than the runs test. 
- For the normal process, the power of the rank von Neumann ratio test was never less than 89% of the power of the standard von Neumann ratio test. 
- For non-normal processes, the rank von Neumann ratio test was often much more powerful than the standard von Neumann ratio test. 
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bartels, R. (1982). The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association 77(377), 40–46.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and G.M. Jenkins. (1976). Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, Chapter 2.
Cox, D.R. (1966). The Null Distribution of the First Serial Correlation Coefficient. Biometrika 53, 623–626.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.69-70;181-192.
Durbin, J., and G.S. Watson. (1950). Testing for Serial Correlation in Least Squares Regression I. Biometrika 37, 409–428.
Durbin, J., and G.S. Watson. (1951). Testing for Serial Correlation in Least Squares Regression II. Biometrika 38, 159–178.
Durbin, J., and G.S. Watson. (1971). Testing for Serial Correlation in Least Squares Regression III. Biometrika 58, 1–19.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.250–253.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapter 25.
Knoke, J.D. (1975). Testing for Randomness Against Autocorrelation Alternatives: The Parametric Case. Biometrika 62, 571–575.
Knoke, J.D. (1977). Testing for Randomness Against Autocorrelation Alternatives: Alternative Tests. Biometrika 64, 523–529.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, 457pp.
von Neumann, J., R.H. Kent, H.R. Bellinson, and B.I. Hart. (1941). The Mean Square Successive Difference. Annals of Mathematical Statistics 12(2), 153–162.
Wald, A., and J. Wolfowitz. (1943). An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation. Annals of Mathematical Statistics 14, 378–388.
See Also
htestEnvStats.object, acf, ar, 
arima, arima.sim, 
ts.plot, plot.ts,  
lag.plot, Hypothesis Tests.
Examples
  # Generate a purely random normal process, then use serialCorrelationTest 
  # to test for the presence of correlation. 
  # (Note: the call to set.seed allows you to reproduce this example.) 
  set.seed(345) 
  x <- rnorm(100) 
  # Look at the data
  #-----------------
  dev.new()
  ts.plot(x)
  dev.new()
  acf(x)
  # Test for serial correlation
  #----------------------------
  serialCorrelationTest(x) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       Rank von Neumann Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Beta Approximation)
  #
  #Estimated Parameter(s):          rho = 0.02773737
  #
  #Estimation Method:               Yule-Walker
  #
  #Data:                            x
  #
  #Sample Size:                     100
  #
  #Test Statistic:                  RVN = 1.929733
  #
  #P-value:                         0.7253405
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.1681836
  #                                 UCL =  0.2236584
  # Clean up
  #---------
  rm(x)
  graphics.off()
  #==========
  # Now use the R function arima.sim to generate an AR(1) process with a 
  # lag-1 autocorrelation of 0.8, then test for autocorrelation.
  set.seed(432) 
  y <- arima.sim(model = list(ar = 0.8), n = 100) 
  # Look at the data
  #-----------------
  dev.new()
  ts.plot(y)
  dev.new()
  acf(y)
  # Test for serial correlation
  #----------------------------
  serialCorrelationTest(y)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       Rank von Neumann Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Beta Approximation)
  #
  #Estimated Parameter(s):          rho = 0.835214
  #
  #Estimation Method:               Yule-Walker
  #
  #Data:                            y
  #
  #Sample Size:                     100
  #
  #Test Statistic:                  RVN = 0.3743174
  #
  #P-value:                         0
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.7274307
  #                                 UCL = 0.9429973
  #----------
  # Clean up
  #---------
  rm(y)
  graphics.off()
  #==========
  # The data frame Air.df contains information on ozone (ppb^1/3), 
  # radiation (langleys), temperature (degrees F), and wind speed (mph) 
  # for 153 consecutive days between May 1 and September 30, 1973.  
  # First test for serial correlation in (the cube root of) ozone.  
  # Note that we must use the test based on the MLE because the time series 
  # contains missing values.  Serial correlation appears to be present.  
  # Next fit a linear model that includes the predictor variables temperature, 
  # radiation, and wind speed, and test for the presence of serial correlation 
  # in the residuals.  There is no evidence of serial correlation.
  # Look at the data
  #-----------------
  Air.df
  #              ozone radiation temperature wind
  #05/01/1973 3.448217       190          67  7.4
  #05/02/1973 3.301927       118          72  8.0
  #05/03/1973 2.289428       149          74 12.6
  #05/04/1973 2.620741       313          62 11.5
  #05/05/1973       NA        NA          56 14.3
  #...
  #09/27/1973       NA       145          77 13.2
  #09/28/1973 2.410142       191          75 14.3
  #09/29/1973 2.620741       131          76  8.0
  #09/30/1973 2.714418       223          68 11.5
  #----------
  # Test for serial correlation
  #----------------------------
  with(Air.df, 
    serialCorrelationTest(ozone, test = "AR1.mle"))
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       z-Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Wald Test Based on MLE)
  #
  #Estimated Parameter(s):          rho = 0.5641616
  #
  #Estimation Method:               Maximum Likelihood
  #
  #Data:                            ozone
  #
  #Sample Size:                     153
  #
  #Number NA/NaN/Inf's:             37
  #
  #Test Statistic:                  z = 7.586952
  #
  #P-value:                         3.28626e-14
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.4184197
  #                                 UCL = 0.7099034
  #----------
  # Next fit a linear model that includes the predictor variables temperature, 
  # radiation, and wind speed, and test for the presence of serial correlation 
  # in the residuals.  Note setting the argument na.action = na.exclude in the 
  # call to lm to correctly deal with missing values.
  #----------------------------------------------------------------------------
  lm.ozone <- lm(ozone ~ radiation + temperature + wind + 
    I(temperature^2) + I(wind^2), 
    data = Air.df, na.action = na.exclude) 
  # Now test for serial correlation in the residuals.
  #--------------------------------------------------
  serialCorrelationTest(lm.ozone, test = "AR1.mle") 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       z-Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Wald Test Based on MLE)
  #
  #Estimated Parameter(s):          rho = 0.1298024
  #
  #Estimation Method:               Maximum Likelihood
  #
  #Data:                            Residuals
  #
  #Data Source:                     lm.ozone
  #
  #Sample Size:                     153
  #
  #Number NA/NaN/Inf's:             42
  #
  #Test Statistic:                  z = 1.285963
  #
  #P-value:                         0.1984559
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.06803223
  #                                 UCL =  0.32763704
  # Clean up
  #---------
  rm(lm.ozone)
One-Sample or Paired-Sample Sign Test on a Median
Description
Estimate the median, test the null hypothesis that the median is equal to a user-specified value based on the sign test, and create a confidence interval for the median.
Usage
  signTest(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, 
    ci.method = "interpolate", approx.conf.level = 0.95, 
    min.coverage = TRUE, lb = -Inf, ub = Inf)
Arguments
| x | numeric vector of observations.  
Missing ( | 
| y | optional numeric vector of observations that are paired with the observations in 
 | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| mu | numeric scalar indicating the hypothesized value of the median.  The default value is 
 | 
| paired | logical scalar indicating whether to perform a paired or one-sample sign test.  
The possible values are  | 
| ci.method | character string indicating the method to use to construct the confidence interval.
The possible values are  | 
| approx.conf.level | a scalar between 0 and 1 indicating the desired confidence level of the 
confidence interval for the population median.  The default value is 
 | 
| min.coverage | for the case when  | 
| lb,ub | scalars indicating lower and upper bounds on the distribution.  
By default,  | 
Details
One-Sample Case (paired=FALSE) 
Let \underline{x} = x_1, x_2, \ldots, x_n be a vector of n 
independent observations from one or more distributions that all have the 
same median \mu.
Consider the test of the null hypothesis:
H_0: \mu = \mu_0 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: \mu > \mu_0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less")
H_a: \mu < \mu_0 \;\;\;\;\;\; (3)
and the two-sided alternative (alternative="two.sided")
H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)
To perform the test of the null hypothesis (1) versus any of the three alternatives 
(2)-(4), the sign test uses the test statistic T which is simply the number of 
observations that are greater than \mu_0 (Conover, 1980, p. 122; 
van Belle et al., 2004, p. 256; Hollander and Wolfe, 1999, p. 60; 
Lehmann, 1975, p. 120; Sheskin, 2011; Zar, 2010, p. 537).  Under the null 
hypothesis, the distribution of T is a 
binomial random variable with 
parameters size=n and prob=0.5.  Usually, however, cases for 
which the observations are equal to \mu_0 are discarded, so the distribution 
of T is taken to be binomial with parameters size=r and 
prob=0.5, where r denotes the number of observations not equal to 
\mu_0.  The sign test only requires that the observations are independent 
and that they all come from one or more distributions (not necessarily the same 
ones) that all have the same population median.
For a two-sided alternative hypothesis (Equation (4)), the p-value is computed as:
p = Pr(X_{r,0.5} \le r-m) + Pr(X_{r,0.5} \ge m) \;\;\;\;\;\; (5)
where X_{r,p} denotes a binomial random variable 
with parameters size=r and prob=p, and m 
is defined by:
m = max(T, r-T) \;\;\;\;\;\; (6)
For a one-sided lower alternative hypothesis (Equation (3)), the p-value is computed as:
p = Pr(X_{r,0.5} \le T) \;\;\;\;\;\; (7)
and for a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed as:
p = Pr(X_{r,0.5} \ge T) \;\;\;\;\;\; (8)
It is obvious that the sign test is simply a special case of the 
binomial test with p=0.5.
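As a concrete check of Equations (5)-(8), here is a minimal sketch that computes the sign test p-value directly from the binomial distribution; the helper name signTestPval is hypothetical and not part of EnvStats:
  signTestPval <- function(x, mu0 = 0, alternative = "two.sided") {
    x <- x[x != mu0]                  # discard observations equal to mu0
    r <- length(x)
    Tstat <- sum(x > mu0)             # test statistic T: number of observations > mu0
    switch(alternative,
      greater   = pbinom(Tstat - 1, size = r, prob = 0.5, lower.tail = FALSE),
      less      = pbinom(Tstat, size = r, prob = 0.5),
      two.sided = {
        m <- max(Tstat, r - Tstat)
        pbinom(r - m, size = r, prob = 0.5) +
          pbinom(m - 1, size = r, prob = 0.5, lower.tail = FALSE)
      })
  }
For the one-sided alternatives this is the same computation as binom.test(sum(x > mu0), sum(x != mu0), p = 0.5, alternative = alternative).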
Computing Confidence Intervals 
Based on the relationship between hypothesis tests and confidence intervals, 
we can construct a confidence interval for the population median based on the 
sign test (e.g., Hollander and Wolfe, 1999, p. 72; Lehmann, 1975, p. 182).  
It turns out that this is equivalent to using the formulas for a nonparametric 
confidence intervals for the 0.5 quantile (see eqnpar).
Paired-Sample Case (paired=TRUE) 
When the argument paired=TRUE, the arguments x and y are 
assumed to have the same length, and the n differences
d_i = x_i - y_i, \;\; i = 1, 2, \ldots, n are assumed to be independent 
observations from distributions with the same median \mu.  The sign test 
can then be applied to the differences.
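A minimal sketch of this reduction, assuming paired numeric vectors x and y of equal length (the data here are purely illustrative); both calls should report the same p-value:
  set.seed(12)
  x <- rnorm(15, mean = 10)
  y <- rnorm(15, mean = 10)
  signTest(x, y, paired = TRUE)
  signTest(x - y)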
Value
A list of class "htestEnvStats" containing the results of the hypothesis test.  
See the help file for htestEnvStats.object for details.
Note
A frequent question in environmental statistics is “Is the concentration of chemical X greater than Y units?”. For example, in groundwater assessment (compliance) monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well must be compared to a groundwater protection standard (GWPS). If the concentration is “above” the GWPS, then the site enters corrective action monitoring. As another example, soil screening at a Superfund site involves comparing the concentration of a chemical in the soil with a pre-determined soil screening level (SSL). If the concentration is “above” the SSL, then further investigation and possible remedial action are required. Determining what it means for the chemical concentration to be “above” a GWPS or an SSL is a policy decision: it may mean that the average of the distribution of the chemical concentration is above the GWPS or SSL, that the median is above it, that the 95th percentile is above it, or something else. Often, the first interpretation is used.
Hypothesis tests you can use to perform tests of location include: Student's t-test, Fisher's randomization test, the Wilcoxon signed rank test, Chen's modified t-test, the sign test, and a test based on a bootstrap confidence interval. For a discussion comparing the performance of these tests, see Millard and Neerchal (2001, pp.408-409).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, p. 122.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York, p. 60.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, p. 120.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp. 404–406.
Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures. Fifth Edition. CRC Press, Boca Raton, FL.
van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences. Second Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
wilcox.test, Hypothesis Tests, eqnpar, 
htestEnvStats.object.
Examples
  # Generate 10 observations from a lognormal distribution with parameters 
  # meanlog=2 and sdlog=1.  The median of this distribution is e^2 (about 7.4). 
  # Test the null hypothesis that the true median is equal to 5 against the 
  # alternative that the true mean is greater than 5. 
  # (Note: the call to set.seed allows you to reproduce this example).
  set.seed(23) 
  dat <- rlnorm(10, meanlog = 2, sdlog = 1) 
  signTest(dat, mu = 5, lb = 0) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 5
  #
  #Alternative Hypothesis:          True median is not equal to 5
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 19.21717
  #
  #Data:                            dat
  #
  #Test Statistic:                  # Obs > median = 9
  #
  #P-value:                         0.02148438
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Limit Rank(s):        2 3 9 8 
  #
  #Confidence Interval:             LCL =  7.000846
  #                                 UCL = 26.937725
  #----------
  # Redo the above example using an exact confidence interval
  # and specifying min.coverage=FALSE
  #----------------------------------------------------------
  signTest(dat, mu = 5, ci.method = "exact", min.coverage = FALSE, lb = 0) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 5
  #
  #Alternative Hypothesis:          True median is not equal to 5
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 19.21717
  #
  #Data:                            dat
  #
  #Test Statistic:                  # Obs > median = 9
  #
  #P-value:                         0.02148438
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                94.43359%
  #
  #Confidence Limit Rank(s):        1 8 
  #
  #Confidence Interval:             LCL =  4.784196
  #                                 UCL = 22.364849
  #----------
  # Clean up
  #---------
  rm(dat)
  #==========
  # The guidance document "Supplemental Guidance to RAGS: Calculating the 
  # Concentration Term" (USEPA, 1992d) contains an example of 15 observations 
  # of chromium concentrations (mg/kg) which are assumed to come from a 
  # lognormal distribution.  These data are stored in the vector 
  # EPA.92d.chromium.vec.  Here, we will use the sign test to test the null 
  # hypothesis that the median chromium concentration is less than or equal to 
  # 100 mg/kg vs. the alternative that it is greater than 100 mg/kg.  The 
  # estimated median is 110 mg/kg.  There are 8 out of 15 observations greater 
  # than 100 mg/kg, the p-value is equal to 0.5, and the lower 95% confidence 
  # limit is 40.53 mg/kg.
  summaryStats(EPA.92d.chromium.vec) 
  #                      N     Mean      SD Median Min  Max
  #EPA.92d.chromium.vec 15 175.4667 318.544    110  10 1300
  #----------
  signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater") 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 100
  #
  #Alternative Hypothesis:          True median is greater than 100
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 110
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Test Statistic:                  # Obs > median = 8
  #
  #P-value:                         0.5
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                95%
  #
  #Confidence Limit Rank(s):        4 5 NA NA
  #
  #Confidence Interval:             LCL =  40.53074
  #                                 UCL = Inf
  #----------
  # Redo the above example using the exact confidence interval and 
  # setting min.coverage=FALSE
  #---------------------------------------------------------------
  signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater", 
    ci.method = "exact", min.coverage = FALSE) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 100
  #
  #Alternative Hypothesis:          True median is greater than 100
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 110
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Test Statistic:                  # Obs > median = 8
  #
  #P-value:                         0.5
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                94.07654%
  #
  #Confidence Limit Rank(s):        5 
  #
  #Confidence Interval:             LCL =  41
  #                                 UCL = Inf
  #----------
  # Redo the above example using the exact confidence interval and 
  # setting min.coverage=TRUE
  #---------------------------------------------------------------
  signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater", 
    ci.method = "exact") 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 100
  #
  #Alternative Hypothesis:          True median is greater than 100
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 110
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Test Statistic:                  # Obs > median = 8
  #
  #P-value:                         0.5
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                98.24219%
  #
  #Confidence Limit Rank(s):        4 
  #
  #Confidence Interval:             LCL =  36
  #                                 UCL = Inf
Simulate a Multivariate Matrix Based on a Specified Rank Correlation Matrix
Description
Simulate a multivariate matrix of random numbers from specified theoretical probability distributions and/or empirical probability distributions based on a specified rank correlation matrix, using either Latin Hypercube sampling or simple random sampling.
Usage
  simulateMvMatrix(n, distributions = c(Var.1 = "norm", Var.2 = "norm"),
    param.list = list(Var.1 = list(mean = 0, sd = 1), Var.2 = list(mean = 0, sd = 1)),
    cor.mat = diag(length(distributions)), sample.method = "SRS", seed = NULL,
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, .Machine$double.eps),
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, .Machine$double.eps),
    tol.1 = .Machine$double.eps, tol.symmetry = .Machine$double.eps,
    tol.recip.cond.num = .Machine$double.eps, max.iter = 10)
Arguments
| n | a positive integer indicating the number of random vectors (i.e., the number of rows of the matrix) to generate. | 
| distributions | a character vector of length  Alternatively, the character string  | 
| param.list | a list containing  Alternatively, if you specify an empirical distribution for the  | 
| cor.mat | a  | 
| sample.method | a character vector of length 1 or  | 
| seed | integer to supply to the R function  | 
| left.tail.cutoff | a numeric vector of length  | 
| right.tail.cutoff | a numeric vector of length  | 
| tol.1 | a positive numeric scalar indicating the allowable absolute deviation
from 1 for the diagonal elements of  | 
| tol.symmetry | a positive numeric scalar indicating the allowable absolute deviation from
0 for the difference between symmetric elements of  | 
| tol.recip.cond.num | a positive numeric scalar indicating the allowable minimum value of the
reciprocal of the condition number for  | 
| max.iter | a positive integer indicating the maximum number of iterations to use to
produce the  | 
Details
Motivation 
In risk assessment and Monte Carlo simulation, the outcome variable of interest,
say Y, is usually some function of one or more other random variables:
Y = h(\underline{X}) = h(X_1, X_2, \ldots, X_k) \;\;\;\;\;\; (1)
For example, Y may be the incremental lifetime cancer risk due to
ingestion of soil contaminated with benzene (Thompson et al., 1992;
Hamed and Bedient, 1997).  In this case the random vector \underline{X}
may represent observations from several kinds of distributions that characterize
exposure and dose-response, such as benzene concentration in the soil,
soil ingestion rate, average body weight, the cancer potency factor for benzene,
etc.  These distributions may or may not be assumed to be independent of one
another (Smith et al., 1992; Bukowski et al., 1995).  Often, input variables in a
Monte Carlo simulation are in fact known to be correlated, such as body weight
and dermal area.
Characterizing the joint distribution of a random vector \underline{X},
where different elements of \underline{X} come from different distributions,
is usually mathematically complex or impossible unless the elements
(random variables) of \underline{X} are independent.
Iman and Conover (1982) present an algorithm for creating a set of n
multivariate observations with a rank correlation matrix that is approximately
equal to a specified rank correlation matrix.  This method allows for different
probability distributions for each element of the multivariate vector.  The
details of this algorithm are as follows.
Algorithm 
1. Specify n, the desired number of random vectors (i.e., the number of rows of the n \times k output matrix). This is specified by the argument n for the function simulateMvMatrix.
2. Create C, the desired k \times k correlation matrix. This is specified by the argument cor.mat.
3. Compute P, where P is a lower triangular k \times k matrix and
PP^{'} = C \;\;\;\;\;\; (2)
where P^{'} denotes the transpose of P. The function simulateMvMatrix uses the Cholesky decomposition to compute P (see the R help file for chol).
4. Create R, an n \times k matrix whose columns represent k independent permutations of van der Waerden scores. That is, each column of R is a random permutation of the scores
\Phi^{-1}(\frac{i}{n+1}), \; i = 1, 2, \ldots, n \;\;\;\;\;\; (3)
where \Phi denotes the cumulative distribution function of the standard normal distribution.
5. Compute T, the k \times k Pearson sample correlation matrix of R. Make sure T is positive definite; if it is not, then repeat Step 4.
6. Compute Q, where Q is a lower triangular k \times k matrix and
QQ^{'} = T \;\;\;\;\;\; (4)
The function simulateMvMatrix uses the Cholesky decomposition to compute Q (see the R help file for chol).
7. Compute the lower triangular k \times k matrix S, where
S = PQ^{-1} \;\;\;\;\;\; (5)
8. Compute the matrix R^{*}, where
R^{*} = RS^{'} \;\;\;\;\;\; (6)
9. Generate an n \times k matrix of random numbers \underline{X}, where each column of \underline{X} comes from the distribution specified by the arguments distributions and param.list. Generate each column of random numbers independently of the other columns. If the j'th element of sample.method equals "SRS", use simple random sampling to generate the random numbers for the j'th column of \underline{X}. If the j'th element of sample.method equals "LHS", use Latin Hypercube sampling to generate the random numbers for the j'th column of \underline{X}. At this stage in the algorithm, the function simulateMvMatrix calls the function simulateVector to create each column of \underline{X}.
10. Order the observations within each column of \underline{X} so that the order of the ranks within each column of \underline{X} matches the order of the ranks within the corresponding column of R^{*}. This way, \underline{X} and R^{*} have exactly the same sample rank correlation matrix.
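As a concrete illustration of the ten steps above, here is a minimal sketch in R, assuming standard normal margins for every column (simulateMvMatrix itself draws each column from the distributions given by distributions and param.list); the name imanConoverSketch and its internals are hypothetical, not the EnvStats implementation:
  imanConoverSketch <- function(n, C) {
    k <- ncol(C)
    P <- t(chol(C))                                 # Step 3: C = P P'
    scores <- qnorm((1:n) / (n + 1))                # van der Waerden scores
    R <- sapply(1:k, function(j) sample(scores))    # Step 4: k independent permutations
    T.mat <- cor(R)                                 # Step 5: Pearson correlation of R
    Q <- t(chol(T.mat))                             # Step 6: T = Q Q'
    S <- P %*% solve(Q)                             # Step 7: S = P Q^{-1}
    R.star <- R %*% t(S)                            # Step 8: R* = R S'
    X <- sapply(1:k, function(j) rnorm(n))          # Step 9: normal margins (assumption)
    # Step 10: reorder each column of X to match the ranks of R*
    sapply(1:k, function(j) sort(X[, j])[rank(R.star[, j])])
  }
  # the Spearman correlation of the result should be close to the target matrix
  round(cor(imanConoverSketch(1000, matrix(c(1, .8, .8, 1), 2, 2)),
    method = "spearman"), 2)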
Explanation 
Iman and Conover (1982) present two algorithms for computing an n \times k
output matrix with a specified rank correlation.  The algorithm presented above is
the second, more complicated one.  In order to explain the reasoning behind this
algorithm, we need to explain the simple algorithm first.
Simple Algorithm 
Let R_i denote the i'th row vector of the matrix R, the
matrix of scores.  This row vector has a population correlation matrix of I,
where I denotes the k \times k identity matrix.  Thus, the
1 \times k vector R_i P^{'} has a population correlation matrix equal to
C. Therefore, if we define R^{*} by
R^{*} = RP^{'} \;\;\;\;\;\; (7)
each row of R^{*} has the same multivariate distribution with population
correlation matrix C.  The rank correlation matrix of R^{*} should
therefore be close to C.  Ordering the columns of \underline{X} as
described in Step 10 above will yield a matrix of observations with the
specified distributions and the exact same rank correlation matrix as the
rank correlation matrix of R^{*}.
Iman and Conover (1982) use van der Waerden scores instead of raw ranks to create
R because van der Waerden scores yield more "natural-looking" pairwise
scatterplots.
If the Pearson sample correlation matrix of R, denoted T in Step 5
above, is exactly equal to the true population correlation matrix I,
then the sample correlation matrix of R^{*} is exactly equal to C,
and the rank correlation matrix of R^{*} is approximately equal to C.
The Pearson sample correlation matrix of R, however, is an estimate of the
true population correlation matrix I, and is therefore
“bouncing around” I.  Likewise, the Pearson sample correlation matrix
of R^{*} is an estimate of the true population correlation matrix C,
and is therefore bouncing around C.  Using this simple algorithm, the
Pearson sample correlation matrix of R^{*}, as R^{*} is defined in
Equation (7) above, may not be “close” enough to the desired rank
correlation matrix C, and thus the rank correlation of R^{*} will not
be close enough to C.  Iman and Conover (1982) therefore present a more complicated algorithm.
More Complicated Algorithm 
To get around the problem mentioned above, Iman and Conover (1982) find a
k \times k lower triangular matrix S such that the matrix R^{*}
as defined in Equation (6) above has a correlation matrix exactly equal to C.
The formula for S is given in Steps 6 and 7 of the algorithm above.
Iman and Conover (1982, p.330) note that even if the desired rank correlation matrix
C is in fact the identity matrix I, this method of generating the
matrix will produce a matrix with an associated rank correlation that more closely
resembles I than you would get by simply generating random numbers within
each column of \underline{X}.
Value
A numeric matrix of dimension n \times k of random numbers,
where the j'th column of numbers comes from the distribution
specified by the j'th elements of the arguments distributions
and param.list, and the rank correlation of this matrix is
approximately equal to the argument cor.mat.  The value of n
is determined by the argument n, and the value of k is
determined by the length of the argument distributions.
Note
Monte Carlo simulation and risk assessment often involve looking at the distribution or characteristics of the distribution of some outcome variable that depends upon several input variables (see Equation (1) above). Usually these input variables can be considered random variables. An important part of both sensitivity analysis and uncertainty analysis involves looking at how the distribution of the outcome variable changes with changing assumptions on the input variables. One important assumption is the correlation between the input random variables.
Often, the input random variables are assumed to be independent when in fact they are known to be correlated (Smith et al., 1992; Bukowski et al., 1995). It is therefore important to assess the effect of the assumption of independence on the distribution of the outcome variable. One way to assess the effect of this assumption is to run the Monte Carlo simulation assuming independence and then also run it assuming certain forms of correlations among the input variables.
Iman and Davenport (1982) present a series of scatterplots showing “typical”
scatterplots with various distributions on the x- and y-axes and
various assumed rank correlations.  These plots are meant to aid in developing
reasonable estimates of rank correlation between input variables.  These plots can
easily be produced using the simulateMvMatrix and plot functions.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bukowski, J., L. Korn, and D. Wartenberg. (1995). Correlated Inputs in Quantitative Risk Assessment: The Effects of Distributional Shape. Risk Analysis 15(2), 215–219.
Hamed, M., and P.B. Bedient. (1997). On the Effect of Probability Distributions of Input Variables in Public Health Risk Assessment. Risk Analysis 17(1), 97–105.
Iman, R.L., and W.J. Conover. (1980). Small Sample Sensitivity Analysis Techniques for Computer Models, With an Application to Risk Assessment (with Comments). Communications in Statistics–Volume A, Theory and Methods, 9(17), 1749–1874.
Iman, R.L., and W.J. Conover. (1982). A Distribution-Free Approach to Inducing Rank Correlation Among Input Variables. Communications in Statistics–Volume B, Simulation and Computation, 11(3), 311–334.
Iman, R.L., and J.M. Davenport. (1982). Rank Correlation Plots For Use With Correlated Input Variables. Communications in Statistics–Volume B, Simulation and Computation, 11(3), 335–360.
Iman, R.L., and J.C. Helton. (1988). An Investigation of Uncertainty and Sensitivity Analysis Techniques for Computer Models. Risk Analysis 8(1), 71–90.
Iman, R.L. and J.C. Helton. (1991). The Repeatability of Uncertainty and Sensitivity Analyses for Complex Probabilistic Risk Assessments. Risk Analysis 11(4), 591–606.
McKay, M.D., R.J. Beckman., and W.J. Conover. (1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics 21(2), 239–245.
Millard, S.P. (2013). EnvStats: an R Package for Environmental Statistics. Springer, New York. https://link.springer.com/book/10.1007/978-1-4614-8456-1.
Smith, A.E., P.B. Ryan, and J.S. Evans. (1992). The Effect of Neglecting Correlations When Propagating Uncertainty and Estimating the Population Distribution of Risk. Risk Analysis 12(4), 467–474.
Thompson, K.M., D.E. Burmaster, and E.A.C. Crouch. (1992). Monte Carlo Techniques for Quantitative Uncertainty Analysis in Public Health Risk Assessments. Risk Analysis 12(1), 53–63.
Vose, D. (2008). Risk Analysis: A Quantitative Guide. Third Edition. John Wiley & Sons, West Sussex, UK, 752 pp.
See Also
Probability Distributions and Random Numbers, Empirical,
simulateVector, cor, set.seed.
Examples
  # Generate 5 observations from a standard bivariate normal distribution
  # with a rank correlation matrix (approximately) equal to the 2 x 2
  # identity matrix, using simple random sampling for each
  # marginal distribution.
  simulateMvMatrix(5, seed = 47)
  #           Var.1       Var.2
  #[1,]  0.01513086  0.03960243
  #[2,] -1.08573747  0.09147291
  #[3,] -0.98548216  0.49382018
  #[4,] -0.25204590 -0.92245624
  #[5,] -1.46575030 -1.82822917
  #==========
  # Look at the observed rank correlation matrix for 100 observations
  # from a standard bivariate normal distribution with a rank correlation matrix
  # (approximately) equal to the 2 x 2 identity matrix. Compare this observed
  # rank correlation matrix with the observed rank correlation matrix based on
  # generating two independent sets of standard normal random numbers.
  # Note that the cross-correlation is closer to 0 for the matrix created with
  # simulateMvMatrix.
  cor(simulateMvMatrix(100, seed = 47), method = "spearman")
  #             Var.1        Var.2
  #Var.1  1.000000000 -0.005976598
  #Var.2 -0.005976598  1.000000000
  cor(matrix(simulateVector(200, seed = 47), 100, 2), method = "spearman")
  #            [,1]        [,2]
  #[1,]  1.00000000 -0.05374137
  #[2,] -0.05374137  1.00000000
  #==========
  # Generate 1000 observations from a bivariate distribution, where the first
  # distribution is a normal distribution with parameters mean=10 and sd=2,
  # the second distribution is a lognormal distribution with parameters
  # mean=10 and cv=1, and the desired rank correlation between the two
  # distributions is 0.8.  Look at the observed rank correlation matrix, and
  # plot the results.
  mat <- simulateMvMatrix(1000,
    distributions = c(N.10.2 = "norm", LN.10.1 = "lnormAlt"),
    param.list = list(N.10.2  = list(mean=10, sd=2),
                      LN.10.1 = list(mean=10, cv=1)),
    cor.mat = matrix(c(1, .8, .8, 1), 2, 2), seed = 47)
  round(cor(mat, method = "spearman"), 2)
  #        N.10.2 LN.10.1
  #N.10.2    1.00    0.78
  #LN.10.1   0.78    1.00
  dev.new()
  plot(mat, xlab = "Observations from N(10, 2)",
    ylab = "Observations from LN(mean=10, cv=1)",
    main = "Lognormal vs. Normal Deviates with Rank Correlation 0.8")
  #----------
  # Repeat the last example, but use Latin Hypercube sampling for both
  # distributions. Note the wider range on the y-axis.
  mat.LHS <- simulateMvMatrix(1000,
    distributions = c(N.10.2 = "norm", LN.10.1 = "lnormAlt"),
    param.list = list(N.10.2  = list(mean=10, sd=2),
                      LN.10.1 = list(mean=10, cv=1)),
    cor.mat = matrix(c(1, .8, .8, 1), 2, 2),
    sample.method = "LHS", seed = 298)
  round(cor(mat.LHS, method = "spearman"), 2)
  #        N.10.2 LN.10.1
  #N.10.2    1.00    0.79
  #LN.10.1   0.79    1.00
  dev.new()
  plot(mat.LHS, xlab = "Observations from N(10, 2)",
    ylab = "Observations from LN(mean=10, cv=1)",
    main = paste("Lognormal vs. Normal Deviates with Rank Correlation 0.8",
      "(Latin Hypercube Sampling)", sep = "\n"))
  #==========
  # Generate 1000 observations from a multivariate distribution, where the
  # first distribution is a normal distribution with parameters
  # mean=10 and sd=2, the second distribution is a lognormal distribution
  # with parameters mean=10 and cv=1, the third distribution is a beta
  # distribution with parameters shape1=2 and shape2=3, and the fourth
  # distribution is an empirical distribution of 100 observations that
  # we'll generate from a Pareto distribution with parameters
  # location=10 and shape=2. Set the desired rank correlation matrix to:
  cor.mat <- matrix(c(1, .8, 0, .5, .8, 1, 0, .7,
    0, 0, 1, .2, .5, .7, .2, 1), 4, 4)
  cor.mat
  #     [,1] [,2] [,3] [,4]
  #[1,]  1.0  0.8  0.0  0.5
  #[2,]  0.8  1.0  0.0  0.7
  #[3,]  0.0  0.0  1.0  0.2
  #[4,]  0.5  0.7  0.2  1.0
  # Use Latin Hypercube sampling for each variable, look at the observed
  # rank correlation matrix, and plot the results.
  pareto.rns <- simulateVector(100, "pareto",
    list(location = 10, shape = 2), sample.method = "LHS",
    seed = 56)
  mat <- simulateMvMatrix(1000,
    distributions = c(Normal = "norm", Lognormal = "lnormAlt",
      Beta = "beta", Empirical = "emp"),
    param.list = list(Normal = list(mean=10, sd=2),
                      Lognormal = list(mean=10, cv=1),
                      Beta = list(shape1 = 2, shape2 = 3),
                      Empirical = list(obs = pareto.rns)),
    cor.mat = cor.mat, seed = 47, sample.method = "LHS")
  round(cor(mat, method = "spearman"), 2)
  #          Normal Lognormal  Beta Empirical
  #Normal      1.00      0.78 -0.01      0.47
  #Lognormal   0.78      1.00 -0.01      0.67
  #Beta       -0.01     -0.01  1.00      0.19
  #Empirical   0.47      0.67  0.19      1.00
  dev.new()
  pairs(mat)
  #==========
  # Clean up
  #---------
  rm(mat, mat.LHS, pareto.rns)
  graphics.off()
Simulate a Vector of Random Numbers From a Specified Theoretical or Empirical Probability Distribution
Description
Simulate a vector of random numbers from a specified theoretical probability distribution or empirical probability distribution, using either Latin Hypercube sampling or simple random sampling.
Usage
  simulateVector(n, distribution = "norm", param.list = list(mean = 0, sd = 1),
    sample.method = "SRS", seed = NULL, sorted = FALSE,
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, .Machine$double.eps),
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, .Machine$double.eps))
Arguments
| n | a positive integer indicating the number of random numbers to generate. | 
| distribution | a character string denoting the distribution abbreviation.  The default value is
 Alternatively, the character string  | 
| param.list | a list with values for the parameters of the distribution.
The default value is  Alternatively, if you specify an empirical distribution by setting  | 
| sample.method | a character string indicating whether to use simple random sampling  | 
| seed | integer to supply to the R function  | 
| sorted | logical scalar indicating whether to return the random numbers in sorted
(ascending) order.  The default value is  | 
| left.tail.cutoff | a scalar between 0 and 1 indicating what proportion of the left-tail of
the probability distribution to omit for Latin Hypercube sampling.
For densities with a finite support minimum (e.g., Lognormal or
Empirical) the default value is  | 
| right.tail.cutoff | a scalar between 0 and 1 indicating what proportion of the right-tail of
the probability distribution to omit for Latin Hypercube sampling.
For densities with a finite support maximum (e.g., Beta or
Empirical) the default value is  | 
Details
Simple Random Sampling (sample.method="SRS") 
When sample.method="SRS", the function simulateVector simply
calls the function rabb, where abb denotes the
abbreviation of the specified distribution (e.g., rlnorm,
remp, etc.).
Latin Hypercube Sampling (sample.method="LHS") 
When sample.method="LHS", the function simulateVector generates
n random numbers using Latin Hypercube sampling.  The distribution is
divided into n intervals of equal probability 1/n and simple random
sampling is performed once within each interval; i.e., Latin Hypercube sampling
is simply stratified sampling without replacement, where the strata are defined
by the 0'th, 100(1/n)'th, 100(2/n)'th, ..., and 100'th percentiles of the
distribution.
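A minimal sketch of this stratified scheme for a single standard normal variable (simulateVector generalizes it to the distribution given by distribution and param.list); lhsNormSketch is a hypothetical name, not the EnvStats implementation:
  lhsNormSketch <- function(n) {
    # one uniform draw inside each of the n equal-probability strata,
    # visited in random order, then mapped through the normal quantile function
    u <- (sample(1:n) - 1 + runif(n)) / n
    qnorm(u)
  }
  summary(lhsNormSketch(100))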
Latin Hypercube sampling, sometimes abbreviated LHS,
is a method of sampling from a probability distribution that ensures all
portions of the probability distribution are represented in the sample.
It was introduced in the published literature by McKay et al. (1979) to overcome
the following problem in Monte Carlo simulation based on simple random sampling
(SRS).  Suppose we want to generate random numbers from a specified distribution.
If we use simple random sampling, there is a low probability of getting very many
observations in an area of low probability of the distribution.  For example, if
we generate n observations from the distribution, the probability that none
of these observations falls into the upper 98'th percentile of the distribution
is 0.98^n.  So, for example, there is a 13% chance that out of 100
random numbers, none will fall at or above the 98'th percentile.  If we are
interested in reproducing the shape of the distribution, we will need a very large
number of observations to ensure that we can adequately characterize the tails of
the distribution (Vose, 2008, pp. 59–62).
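For example, the chance quoted above for n = 100 is simply:
  0.98^100   # approximately 0.13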
See Millard (2013) for a visual explanation of Latin Hypercube sampling.
Value
a numeric vector of random numbers from the specified distribution.
Note
Latin Hypercube sampling, sometimes abbreviated LHS, is a method of sampling from a probability distribution that ensures all portions of the probability distribution are represented in the sample. It was introduced in the published literature by McKay et al. (1979). Latin Hypercube sampling is often used in probabilistic risk assessment, specifically for sensitivity and uncertainty analysis (e.g., Iman and Conover, 1980; Iman and Helton, 1988; Iman and Helton, 1991; Vose, 2008).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Iman, R.L., and W.J. Conover. (1980). Small Sample Sensitivity Analysis Techniques for Computer Models, With an Application to Risk Assessment (with Comments). Communications in Statistics–Volume A, Theory and Methods, 9(17), 1749–1874.
Iman, R.L., and J.C. Helton. (1988). An Investigation of Uncertainty and Sensitivity Analysis Techniques for Computer Models. Risk Analysis 8(1), 71–90.
Iman, R.L. and J.C. Helton. (1991). The Repeatability of Uncertainty and Sensitivity Analyses for Complex Probabilistic Risk Assessments. Risk Analysis 11(4), 591–606.
McKay, M.D., R.J. Beckman., and W.J. Conover. (1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics 21(2), 239–245.
Millard, S.P. (2013). EnvStats: an R Package for Environmental Statistics. Springer, New York. https://link.springer.com/book/10.1007/978-1-4614-8456-1.
Vose, D. (2008). Risk Analysis: A Quantitative Guide. Third Edition. John Wiley & Sons, West Sussex, UK, 752 pp.
See Also
Probability Distributions and Random Numbers, Empirical,
simulateMvMatrix, set.seed.
Examples
  # Generate 10 observations from a lognormal distribution with
  # parameters mean=10 and cv=1 using simple random sampling:
  simulateVector(10, distribution = "lnormAlt",
    param.list = list(mean = 10, cv = 1), seed = 47,
    sorted = TRUE)
  # [1]  2.086931  2.863589  3.112866  5.592502  5.732602  7.160707
  # [7]  7.741327  8.251306 12.782493 37.214748
  #----------
  # Repeat the above example by calling rlnormAlt directly:
  set.seed(47)
  sort(rlnormAlt(10, mean = 10, cv = 1))
  # [1]  2.086931  2.863589  3.112866  5.592502  5.732602  7.160707
  # [7]  7.741327  8.251306 12.782493 37.214748
  #----------
  # Now generate 10 observations from the same lognormal distribution
  # but use Latin Hypercube sampling.  Note that the largest value
  # is larger than for simple random sampling:
  simulateVector(10, distribution = "lnormAlt",
    param.list = list(mean = 10, cv = 1), seed = 47,
    sample.method = "LHS", sort = TRUE)
  # [1]  2.406149  2.848428  4.311175  5.510171  6.467852  8.174608
  # [7]  9.506874 12.298185 17.022151 53.552699
  #==========
  # Generate 50 observations from a Pareto distribution with parameters
  # location=10 and shape=2, then use this resulting vector of
  # observations as the basis for generating 3 observations from an
  # empirical distribution using Latin Hypercube sampling:
  set.seed(321)
  pareto.rns <- rpareto(50, location = 10, shape = 2)
  simulateVector(3, distribution = "emp",
    param.list = list(obs = pareto.rns), sample.method = "LHS")
  #[1] 11.50685 13.50962 17.47335
  #==========
  # Clean up
  #---------
  rm(pareto.rns)
Coefficient of Skewness
Description
Compute the sample coefficient of skewness.
Usage
  skewness(x, na.rm = FALSE, method = "fisher", l.moment.method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0))
Arguments
| x | numeric vector of observations. | 
| na.rm | logical scalar indicating whether to remove missing values from  | 
| method | character string specifying what method to use to compute the sample coefficient 
of skewness.  The possible values are 
 | 
| l.moment.method | character string specifying what method to use to compute the 
 | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for 
the plotting positions when  | 
Details
Let \underline{x} denote a random sample of n observations from 
some distribution with mean \mu and standard deviation \sigma.
Product Moment Coefficient of Skewness (method="moment" or method="fisher") 
The coefficient of skewness of a distribution is the third 
standardized moment about the mean:
\eta_3 = \sqrt{\beta_1} = \frac{\mu_3}{\sigma^3} \;\;\;\;\;\; (1)
where
\eta_r = E[(\frac{X-\mu}{\sigma})^r] = \frac{1}{\sigma^r} E[(X-\mu)^r] = \frac{\mu_r}{\sigma^r} \;\;\;\;\;\; (2)
and
\mu_r = E[(X-\mu)^r] \;\;\;\;\;\; (3)
denotes the r'th moment about the mean (central moment).
That is, the coefficient of skewness is the third central moment divided by the 
cube of the standard deviation.  The coefficient of skewness is 0 for a symmetric 
distribution.  Distributions with positive skew have heavy right-hand tails, and 
distributions with negative skew have heavy left-hand tails.
When method="moment", the coefficient of skewness is estimated using the 
method of moments estimator for the third central moment and and the method of 
moments estimator for the variance:
\hat{\eta}_3 = \frac{\hat{\mu}_3}{\sigma^3} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^3}{[\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2]^{3/2}} \;\;\;\;\; (5)
where
\hat{\sigma}^2_m = s^2_m = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (6)
This form of estimation should be used when resampling (bootstrap or jackknife).
When method="fisher", the coefficient of skewness is estimated using the 
unbiased estimator for the third central moment 
(Serfling, 1980, p.73; Chen, 1995, p.769) and the unbiased estimator for the 
variance.
\hat{\eta}_3 = \frac{\frac{n}{(n-1)(n-2)} \sum_{i=1}^n (x_i - \bar{x})^3}{s^3} \;\;\;\;\;\; (7)
where
\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (8)
(Note that Serfling, 1980, p.73 contains a typographical error in the numerator for 
the unbiased estimator of the third central moment.) 
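For a concrete check, here is a minimal sketch of the estimators in Equations (5) and (7), assuming a numeric vector x with no missing values; the helper names are hypothetical and not part of EnvStats:
  skewMoment <- function(x) {
    # Equation (5): method-of-moments third central moment divided by the
    # method-of-moments standard deviation cubed
    m3 <- mean((x - mean(x))^3)
    m2 <- mean((x - mean(x))^2)
    m3 / m2^(3/2)
  }
  skewFisher <- function(x) {
    # Equation (7): unbiased third central moment divided by s^3
    n <- length(x)
    k3 <- n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3)
    k3 / sd(x)^3
  }
These should agree with skewness(x, method = "moment") and skewness(x, method = "fisher"), respectively.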
L-Moment Coefficient of skewness (method="l.moments") 
Hosking (1990) defines the L-moment analog of the coefficient of skewness as:
\tau_3 = \frac{\lambda_3}{\lambda_2} \;\;\;\;\;\; (9)
that is, the third L-moment divided by the second L-moment.  He shows 
that this quantity lies in the interval (-1, 1).
When l.moment.method="unbiased", the L-skewness is estimated by:
t_3 = \frac{l_3}{l_2} \;\;\;\;\;\; (10)
that is, the unbiased estimator of the third L-moment divided by the 
unbiased estimator of the second L-moment.
When l.moment.method="plotting.position", the L-skewness is estimated by:
\tilde{\tau}_3 = \frac{\tilde{\lambda}_3}{\tilde{\lambda}_2} \;\;\;\;\;\; (11)
that is, the plotting-position estimator of the third L-moment divided by the 
plotting-position estimator of the second L-moment.
See the help file for lMoment for more information on 
estimating L-moments.
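For a concrete check of Equation (10), here is a minimal sketch assuming the EnvStats function lMoment returns the r'th sample L-moment via its argument r (its default method is the unbiased estimator):
  set.seed(250)
  dat <- rlnormAlt(20, mean = 10, cv = 1)
  # t3 = l3 / l2; should match skewness(dat, method = "l.moments")
  lMoment(dat, r = 3) / lMoment(dat, r = 2)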
Value
A numeric scalar – the sample coefficient of skewness.
Note
Traditionally, the coefficient of skewness has been estimated using product 
moment estimators.  Sometimes an estimate of skewness is used in a 
goodness-of-fit test for normality (e.g., set 
test="skew" in the call to gofTest).
Hosking (1990) introduced the idea of L-moments and L-skewness.  
Vogel and Fennessey (1993) argue that L-moment ratios should replace 
product moment ratios because of their superior performance (they are nearly 
unbiased and better for discriminating between distributions).  
They compare product moment diagrams with L-moment diagrams.
Hosking and Wallis (1995) recommend using unbiased estimators of L-moments 
(vs. plotting-position estimators) for almost all applications.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Chen, L. (1995). Testing the Mean of Skewed Distributions. Journal of the American Statistical Association 90(430), 767–772.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley and Sons, New York, p.73.
Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.
Vogel, R.M., and N.M. Fennessey. (1993).  L Moment Diagrams Should Replace 
Product Moment Diagrams.  Water Resources Research 29(6), 1745–1752.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
var, sd, cv, 
kurtosis, summaryFull, 
Summary Statistics.
Examples
  # Generate 20 observations from a lognormal distribution with parameters 
  # mean=10 and cv=1, and estimate the coefficient of skewness. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean = 10, cv = 1) 
  skewness(dat) 
  #[1] 0.9876632
 
  skewness(dat, method = "moment") 
  #[1] 0.9119889
 
  skewness(dat, method = "l.moments") 
  #[1] 0.2656674
  #----------
  # Clean up
  rm(dat)
Add Text Indicating the Mean and Standard Deviation to a ggplot2 Plot
Description
For a strip plot or scatterplot produced using the package ggplot2
(e.g., with geom_point), 
for each value on the x-axis, add text indicating the mean and 
standard deviation of the y-values for that particular x-value.
Usage
  stat_mean_sd_text(mapping = NULL, data = NULL, 
    geom = ifelse(text.box, "label", "text"), 
    position = "identity", na.rm = FALSE, show.legend = NA, 
    inherit.aes = TRUE, y.pos = NULL, y.expand.factor = 0.2, 
    digits = 1, digit.type = "round", 
    nsmall = ifelse(digit.type == "round", digits, 0), text.box = FALSE, 
    alpha = 1, angle = 0, color = "black", family = "", fontface = "plain", 
    hjust = 0.5, label.padding = ggplot2::unit(0.25, "lines"), 
    label.r = ggplot2::unit(0.15, "lines"), label.size = 0.25, 
    lineheight = 1.2, size = 4, vjust = 0.5, ...)
Arguments
| mapping,data,position,na.rm,show.legend,inherit.aes | See the help file for  | 
| geom | Character string indicating which  | 
| y.pos | Numeric scalar indicating the  | 
| y.expand.factor | For the case when  | 
| digits | Integer indicating the number of digits to use for displaying the 
mean and standard deviation.  When  | 
| digit.type | Character string indicating whether the  | 
| nsmall | Integer passed to the function  | 
| text.box | Logical scalar indicating whether to surround the text with a text box (i.e., 
whether to use  | 
| alpha,angle,color,family,fontface,hjust,vjust,lineheight,size | See the help file for  | 
| label.padding,label.r,label.size | See the help file for  | 
| ... | Other arguments passed on to  | 
Details
See the help file for geom_text for details about how 
geom_text and geom_label work.
See the vignette Extending ggplot2 at https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html for information on how to create a new stat.
Note
The function stat_mean_sd_text is called by the function geom_stripchart.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R!). Second Edition. Springer.
See Also
geom_stripchart, stat_median_iqr_text, 
stat_n_text, stat_test_text, 
geom_text, geom_label, 
mean, sd.
Examples
  # First, load and attach the ggplot2 package.
  #--------------------------------------------
  library(ggplot2)
  #====================
  # Example 1:
  # Using the built-in data frame mtcars, 
  # plot miles per gallon vs. number of cylinders
  # using different colors for each level of the number of cylinders.
  #------------------------------------------------------------------
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
    theme(legend.position = "none")
  p + geom_point() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  # Now add text indicating the mean and standard deviation 
  # for each level of cylinder.
  #--------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #====================
  # Example 2:
  # Repeat Example 1, but:
  # 1) facet by transmission type, 
  # 2) make the size of the text smaller.
  #--------------------------------------
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text(size = 3) + 
    facet_wrap(~ am, labeller = label_both) +  
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #====================
  # Example 3:
  # Repeat Example 1, but specify the y-position for the text.
  #-----------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text(y.pos = 36) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #====================
  # Example 4:
  # Repeat Example 1, but show the 
  # mean and standard deviation in a text box.
  #-------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text(text.box = TRUE) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #====================
  # Example 5:
  # Repeat Example 1, but use the color brown for the text.
  #--------------------------------------------------------
 
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text(color = "brown") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #====================
  # Example 6:
  # Repeat Example 1, but:
  # 1) use the same colors for the text that are used for each group, 
  # 2) use the bold monospaced font.
  #------------------------------------------------------------------
 
  mat <- ggplot_build(p)$data[[1]]
  group <- mat[, "group"]
  colors <- mat[match(1:max(group), group), "colour"]
  dev.new()
  p + geom_point() + 
    stat_mean_sd_text(color = colors, size = 5, 
      family = "mono", fontface = "bold") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #====================
  # Clean up
  #---------
  graphics.off()
  rm(p, mat, group, colors)
Add Text Indicating the Median and Interquartile Range to a ggplot2 Plot
Description
For a strip plot or scatterplot produced using the package ggplot2
(e.g., with geom_point), 
for each value on the x-axis, add text indicating the median and 
interquartile range (IQR) of the y-values for that particular x-value.
Usage
  stat_median_iqr_text(mapping = NULL, data = NULL, 
    geom = ifelse(text.box, "label", "text"), 
    position = "identity", na.rm = FALSE, show.legend = NA, 
    inherit.aes = TRUE, y.pos = NULL, y.expand.factor = 0.2, 
    digits = 1, digit.type = "round", 
    nsmall = ifelse(digit.type == "round", digits, 0), text.box = FALSE, 
    alpha = 1, angle = 0, color = "black", family = "", fontface = "plain", 
    hjust = 0.5, label.padding = ggplot2::unit(0.25, "lines"), 
    label.r = ggplot2::unit(0.15, "lines"), label.size = 0.25, 
    lineheight = 1.2, size = 4, vjust = 0.5, ...)
Arguments
| mapping,data,position,na.rm,show.legend,inherit.aes | See the help file for  | 
| geom | Character string indicating which  | 
| y.pos | Numeric scalar indicating the  | 
| y.expand.factor | For the case when  | 
| digits | Integer indicating the number of digits to use for displaying the 
median and interquartile range.  When  | 
| digit.type | Character string indicating whether the  | 
| nsmall | Integer passed to the function  | 
| text.box | Logical scalar indicating whether to surround the text with a text box (i.e., 
whether to use  | 
| alpha,angle,color,family,fontface,hjust,vjust,lineheight,size | See the help file for  | 
| label.padding,label.r,label.size | See the help file for  | 
| ... | Other arguments passed on to  | 
Details
See the help file for geom_text for details about how 
geom_text and geom_label work.
See the vignette Extending ggplot2 at https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html for information on how to create a new stat.
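For reference, the median and IQR values that this stat displays correspond to ordinary group-wise summaries of the y-values. Below is a minimal sketch of the equivalent base R computation, using the mtcars example from the Examples section below (base R's IQR() is used here for illustration; EnvStats also provides iqr()):
  # Group-wise median and interquartile range of mpg by number of
  # cylinders -- the values stat_median_iqr_text() adds to the plot.
  with(mtcars, tapply(mpg, factor(cyl), median))
  with(mtcars, tapply(mpg, factor(cyl), IQR))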
Note
The function stat_median_iqr_text is called by the function geom_stripchart.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R!). Second Edition. Springer.
See Also
geom_stripchart, stat_mean_sd_text, 
stat_n_text, stat_test_text, 
geom_text, geom_label, 
median, iqr.
Examples
  # First, load and attach the ggplot2 package.
  #--------------------------------------------
  library(ggplot2)
  #====================
  # Example 1:
  # Using the built-in data frame mtcars, 
  # plot miles per gallon vs. number of cylinders
  # using different colors for each level of the number of cylinders.
  #------------------------------------------------------------------
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
    theme(legend.position = "none")
  p + geom_point() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  # Now add text indicating the median and interquartile range 
  # for each level of cylinder.
  #-----------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #====================
  # Example 2:
  # Repeat Example 1, but:
  # 1) facet by transmission type, 
  # 2) make the size of the text smaller.
  #--------------------------------------
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text(size = 2.75) + 
    facet_wrap(~ am, labeller = label_both) +  
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #====================
  # Example 3:
  # Repeat Example 1, but specify the y-position for the text.
  #-----------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text(y.pos = 36) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #====================
  # Example 4:
  # Repeat Example 1, but show the 
  # median and interquartile range in a text box.
  #----------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text(text.box = TRUE) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #====================
  # Example 5:
  # Repeat Example 1, but use the color brown for the text.
  #--------------------------------------------------------
 
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text(color = "brown") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #====================
  # Example 6:
  # Repeat Example 1, but:
  # 1) use the same colors for the text that are used for each group, 
  # 2) use the bold monospaced font.
  #------------------------------------------------------------------
 
  mat <- ggplot_build(p)$data[[1]]
  group <- mat[, "group"]
  colors <- mat[match(1:max(group), group), "colour"]
  dev.new()
  p + geom_point() + 
    stat_median_iqr_text(color = colors,  
      family = "mono", fontface = "bold") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #====================
  # Clean up
  #---------
  graphics.off()
  rm(p, mat, group, colors)
Add Text Indicating the Sample Size to a ggplot2 Plot
Description
For a strip plot or scatterplot produced using the package ggplot2
(e.g., with geom_point), 
for each value on the x-axis, add text indicating the 
number of y-values for that particular x-value.
Usage
  stat_n_text(mapping = NULL, data = NULL, 
    geom = ifelse(text.box, "label", "text"), 
    position = "identity", na.rm = FALSE, show.legend = NA, 
    inherit.aes = TRUE, y.pos = NULL, y.expand.factor = 0.1, 
    text.box = FALSE, alpha = 1, angle = 0, color = "black", 
    family = "", fontface = "plain", hjust = 0.5, 
    label.padding = ggplot2::unit(0.25, "lines"), 
    label.r = ggplot2::unit(0.15, "lines"), label.size = 0.25, 
    lineheight = 1.2, size = 4, vjust = 0.5, ...)
Arguments
| mapping,data,position,na.rm,show.legend,inherit.aes | See the help file for  | 
| geom | Character string indicating which  | 
| y.pos | Numeric scalar indicating the  | 
| y.expand.factor | For the case when  | 
| text.box | Logical scalar indicating whether to surround the text with a text box (i.e., 
whether to use  | 
| alpha,angle,color,family,fontface,hjust,vjust,lineheight,size | See the help file for  | 
| label.padding,label.r,label.size | See the help file for  | 
| ... | Other arguments passed on to  | 
Details
See the help file for geom_text for details about how 
geom_text and geom_label work.
See the vignette Extending ggplot2 at https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html for information on how to create a new stat.
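The sample sizes that this stat displays are simply the counts of y-values at each x-value. Below is a minimal sketch of the equivalent base R computation, using the mtcars example from the Examples section below:
  # Number of observations per cylinder group -- the values
  # stat_n_text() adds to the plot.
  table(factor(mtcars$cyl))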
Note
The function stat_n_text is called by the function geom_stripchart.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R!). Second Edition. Springer.
See Also
geom_stripchart, stat_mean_sd_text, 
stat_median_iqr_text, stat_test_text, 
geom_text, geom_label.
Examples
  # First, load and attach the ggplot2 package.
  #--------------------------------------------
  library(ggplot2)
  #====================
  # Example 1:
  # Using the built-in data frame mtcars, 
  # plot miles per gallon vs. number of cylinders
  # using different colors for each level of the number of cylinders.
  #------------------------------------------------------------------
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
    theme(legend.position = "none")
  p + geom_point() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  # Now add the sample size for each level of cylinder.
  #----------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #==========
  # Example 2:
  # Repeat Example 1, but:
  # 1) facet by transmission type, 
  # 2) make the size of the text smaller.
  #--------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text(size = 3) + 
    facet_wrap(~ am, labeller = label_both) +  
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #==========
  # Example 3:
  # Repeat Example 1, but specify the y-position for the text.
  #-----------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text(y.pos = 5) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #==========
  # Example 4:
  # Repeat Example 1, but show the sample size in a text box.
  #----------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text(text.box = TRUE) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #==========
  # Example 5:
  # Repeat Example 1, but use the color brown for the text.
  #--------------------------------------------------------
  
  dev.new()
  p + geom_point() + 
    stat_n_text(color = "brown") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 6:
  # Repeat Example 1, but:
  # 1) use the same colors for the text that are used for each group, 
  # 2) use the bold monospaced font.
  #------------------------------------------------------------------
  
  mat <- ggplot_build(p)$data[[1]]
  group <- mat[, "group"]
  colors <- mat[match(1:max(group), group), "colour"]
  dev.new()
  p + geom_point() + 
    stat_n_text(color = colors, size = 5, 
      family = "mono", fontface = "bold") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p, mat, group, colors)
Add Text to a ggplot2 Plot Indicating the Results of a Hypothesis Test
Description
For a strip plot or scatterplot produced using the package ggplot2
(e.g., with geom_point), 
add text indicating the results of a hypothesis test comparing locations 
between groups, where the groups are defined based on the unique x-values.
Usage
  stat_test_text(mapping = NULL, data = NULL, 
    geom = ifelse(text.box, "label", "text"), position = "identity", 
    na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, 
    y.pos = NULL, y.expand.factor = 0.35, test = "parametric", 
    paired = FALSE, test.arg.list = list(), two.lines = TRUE, 
    p.value.digits = 3, p.value.digit.type = "round", 
    location.digits = 1, location.digit.type = "round", 
    nsmall = ifelse(location.digit.type == "round", location.digits, 0), 
    text.box = FALSE, alpha = 1, angle = 0, color = "black", 
    family = "", fontface = "plain", hjust = 0.5, 
    label.padding = ggplot2::unit(0.25, "lines"), 
    label.r = ggplot2::unit(0.15, "lines"), label.size = 0.25, 
    lineheight = 1.2, size = 4, vjust = 0.5, ...)
Arguments
| mapping,data,position,na.rm,show.legend,inherit.aes | See the help file for  | 
| geom | Character string indicating which  | 
| y.pos | Numeric scalar indicating the  | 
| y.expand.factor | For the case when  | 
| test | A character string indicating whether to use a standard parametric test 
( | 
| paired | For the case of two groups, a logical scalar indicating whether the data 
should be considered to be paired.  The default value is  NOTE: if the argument  | 
| test.arg.list | An optional list of arguments to pass to the function used to test for 
group differences in location.  The default value is an empty list:  
 NOTE: If  | 
| two.lines | For the case of one or two groups, a logical scalar indicating whether the 
associated confidence interval should be displayed on a second line 
instead of on the same line as the p-value.  The default is  | 
| p.value.digits | An integer indicating the number of digits to use for displaying the 
p-value.  When  | 
| p.value.digit.type | A character string indicating whether the  | 
| location.digits | For the case of one or two groups, an integer indicating the number of digits 
to use for displaying the associated confidence interval.    
When  | 
| location.digit.type | For the case of one or two groups, a character string indicating 
whether the  | 
| nsmall | For the case of one or two groups, an integer passed to the function 
 | 
| text.box | Logical scalar indicating whether to surround the text with a text box (i.e., 
whether to use  | 
| alpha,angle,color,family,fontface,hjust,vjust,lineheight,size | See the help file for  | 
| label.padding,label.r,label.size | See the help file for  | 
| ... | Other arguments passed on to  | 
Details
The table below shows which hypothesis tests are performed based on the number of groups 
and the values of the arguments test and paired.
| # Groups | test | paired | Name | Function Called | 
| 1 | "parametric" |  | One-Sample t-test | t.test | 
| 1 | "nonparametric" |  | Wilcoxon Signed Rank Test | wilcox.test | 
| 2 | "parametric" | FALSE | Two-Sample t-test | t.test | 
| 2 | "parametric" | TRUE | Paired t-test | t.test | 
| 2 | "nonparametric" | FALSE | Wilcoxon Rank Sum Test | wilcox.test | 
| 2 | "nonparametric" | TRUE | Wilcoxon Signed Rank Test on Paired Differences | wilcox.test | 
| ≥ 3 | "parametric" |  | Analysis of Variance | aov, summary.aov | 
| ≥ 3 | "nonparametric" |  | Kruskal-Wallis Test | kruskal.test | 
See the help file for geom_text for details about how 
geom_text and geom_label work.
See the vignette Extending ggplot2 at https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html for information on how to create a new stat.
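As a rough guide to the table above, the following sketch shows the base R calls corresponding to the three-group case used in the Examples section below (stat_test_text itself handles the formatting and placement of the results):
  # Three groups, test = "parametric": one-way analysis of variance.
  summary(aov(mpg ~ factor(cyl), data = mtcars))
  # Three groups, test = "nonparametric": Kruskal-Wallis test.
  kruskal.test(mpg ~ factor(cyl), data = mtcars)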
Note
The function stat_test_text is called by the function geom_stripchart.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Use R!). Second Edition. Springer.
See Also
geom_stripchart, stat_mean_sd_text, 
stat_median_iqr_text, stat_n_text, 
geom_text, geom_label, 
t.test, wilcox.test, 
aov, summary.aov, 
kruskal.test.
Examples
  # First, load and attach the ggplot2 package.
  #--------------------------------------------
  library(ggplot2)
  #==========
  # Example 1:
  # Using the built-in data frame mtcars, 
  # plot miles per gallon vs. number of cylinders
  # using different colors for each level of the number of cylinders.
  #------------------------------------------------------------------
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
    theme(legend.position = "none")
  p + geom_point(show.legend = FALSE) + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  # Now add text indicating the sample size and 
  # mean and standard deviation for each level of cylinder, and 
  # test for the difference in means between groups.
  #------------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text() + stat_mean_sd_text() + 
    stat_test_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  
  #==========
  # Example 2:
  # Repeat Example 1, but show text indicating the median and IQR, 
  # and use the nonparametric test.
  #---------------------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text() + stat_median_iqr_text() + 
    stat_test_text(test = "nonparametric") + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
 
  #==========
  # Example 3:
  # Repeat Example 1, but use only the groups with 
  # 4 and 8 cylinders.
  #-----------------------------------------------
  p <- ggplot(subset(mtcars, cyl %in% c(4, 8)), 
    aes(x = factor(cyl), y = mpg, color = cyl)) + 
    theme(legend.position = "none")
  dev.new()
  p + geom_point() + 
    stat_n_text() + stat_mean_sd_text() + 
    stat_test_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Example 4:
  # Repeat Example 3, but 
  # 1) facet by transmission type,
  # 2) make the text smaller,
  # 3) put the text for the test results in a text box
  #    and make them blue.
  #---------------------------------------------------
  dev.new()
  p + geom_point() + 
    stat_n_text(size = 3) + stat_mean_sd_text(size = 3) + 
    stat_test_text(size = 3, text.box = TRUE, color = "blue") +
    facet_wrap(~ am, labeller = label_both) +  
    labs(x = "Number of Cylinders", y = "Miles per Gallon")
  #==========
  # Clean up
  #---------
  graphics.off()
  rm(p)
1-D Scatter Plots with Confidence Intervals
Description
stripChart is a modification of the R function stripchart.  
It is a generic function used to produce one-dimensional scatter 
plots (or dot plots) of the given data, along with text indicating sample size and 
estimates of location (mean or median) and scale (standard deviation 
or interquartile range), as well as confidence intervals for the population 
location parameter.  
One-dimensional scatterplots are a good alternative to boxplots 
when sample sizes are small or moderate.  The function invokes particular 
methods which depend on the class of the first argument. 
Usage
stripChart(x, ...)
## S3 method for class 'formula'
stripChart(x, data = NULL, dlab = NULL, 
    subset, na.action = NULL, ...)
## Default S3 method:
stripChart(x, 
    method = ifelse(paired && paired.lines, "overplot", "stack"), 
    seed = 47, jitter = 0.1 * cex, offset = 1/2, vertical = TRUE, 
    group.names, group.names.cex = cex, drop.unused.levels = TRUE, 
    add = FALSE, at = NULL, xlim = NULL, ylim = NULL, ylab = NULL, 
    xlab = NULL, dlab = "", glab = "", log = "", pch = 1, col = par("fg"), 
    cex = par("cex"), points.cex = cex, axes = TRUE, frame.plot = axes, 
    show.ci = TRUE, location.pch = 16, location.cex = cex, 
    conf.level = 0.95, min.n.for.ci = 2, 
    ci.offset = 3/ifelse(n > 2, (n-1)^(1/3), 1), ci.bar.lwd = cex, 
    ci.bar.ends = TRUE, ci.bar.ends.size = 0.5 * cex, ci.bar.gap = FALSE, 
    n.text = "bottom", n.text.line = ifelse(n.text == "bottom", 2, 0), 
    n.text.cex = cex, location.scale.text = "top", 
    location.scale.digits = 1, nsmall = location.scale.digits, 
    location.scale.text.line = ifelse(location.scale.text == "top", 0, 3.5), 
    location.scale.text.cex = 
      cex * 0.8 * ifelse(n > 6, max(0.4, 1 - (n-6) * 0.06), 1), 
    p.value = FALSE, p.value.digits = 3, p.value.line = 2, p.value.cex = cex, 
    group.difference.ci = p.value, group.difference.conf.level = 0.95, 
    group.difference.digits = location.scale.digits, 
    ci.and.test = "parametric", ci.arg.list = NULL, test.arg.list = NULL, 
    alternative = "two.sided", plot.diff = FALSE, diff.col = col[1], 
    diff.method = "stack", diff.pch = pch[1], paired = FALSE, paired.lines = paired, 
    paired.lty = 1:6, paired.lwd = 1, paired.pch = 1:14, paired.col = NULL, 
    diff.name = NULL, diff.name.cex = group.names.cex, sep.line = TRUE, 
    sep.lty = 2, sep.lwd = cex, sep.col = "gray", diff.lim = NULL, 
    diff.at = NULL, diff.axis.label = NULL, 
    plot.diff.mar = c(5, 4, 4, 4) + 0.1, ...)
Arguments
| x | the data from which the plots are to be produced.  In the default method the data can be 
specified as a list or data frame where each component is numeric, a numeric matrix, 
or a numeric vector.  In the formula method, a symbolic specification of the form 
 | 
| data | for the formula method, a data.frame (or list) from which the variables in  | 
| subset | for the formula method, an optional vector specifying a subset of observations to be used for plotting. | 
| na.action | for the formula method, a function which indicates what should happen when the data 
contain  | 
| ... | additional parameters passed to the default method, or by it to  | 
| method | the method to be used to separate coincident points.  When  | 
| seed | when  | 
| jitter | when  | 
| offset | when stacking is used, points are stacked this many line-heights (symbol widths) apart. | 
| vertical | when  | 
| group.names | Optional argument (forced to be a character string) explicitly providing the group labels 
that will be printed alongside (or underneath) each plot. When  | 
| group.names.cex | numeric scalar indicating the amount by which the group labels should be scaled 
relative to the default (see the help file for  | 
| drop.unused.levels | when  | 
| add | logical, if true add the chart to the current plot. | 
| at | numeric vector giving the locations where the charts should be drawn, 
particularly when  | 
| xlim,ylim | plot limits: see  | 
| ylab,xlab | labels: see  | 
| dlab,glab | alternate way to specify axis labels.  The  | 
| log | on which axes to use a log scale: see  | 
| pch,col,cex | Graphical parameters: see  | 
| points.cex | Sets the  | 
| axes,frame.plot | Axis control: see  | 
| show.ci | logical scalar indicating whether to plot the confidence interval.  The default is 
 | 
| location.pch | integer indicating which plotting character to use to indicate the estimate of location 
(mean or median) for each group (see the help file for  | 
| location.cex | numeric scalar giving the amount by which the plotting characters indicating the 
estimate of location for each group should be scaled relative to the default 
(see the help file for  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with the 
confidence interval for the group location (population mean or median).  
The default value is  | 
| min.n.for.ci | integer indicating the minimum sample size required in order to plot a confidence interval 
for the group location.  The default value is  | 
| ci.offset | numeric scalar or vector of length equal to the number of groups ( | 
| ci.bar.lwd | numeric scalar indicating the line width for the confidence interval bars.
The default is the current value of the graphics parameter  | 
| ci.bar.ends | logical scalar indicating whether to add flat ends to the confidence interval bars.  
The default value is  | 
| ci.bar.ends.size | numeric scalar in units of  | 
| ci.bar.gap | logical scalar indicating whether to add a gap between the estimate of group location and the 
confidence interval bar.  The default value is  | 
| n.text | character string indicating whether and where to indicate the sample size for each group.  
Possible values are  | 
| n.text.line | integer indicating on which plot margin line to show the sample sizes for each group.  The 
default value is  | 
| n.text.cex | numeric scalar giving the amount by which the text indicating the sample size for 
each group should be scaled relative to the default (see the help file for  | 
| location.scale.text | character string indicating whether and where to indicate the estimates of location 
(mean or median) and scale (standard deviation or interquartile range) for each group.  
Possible values are  | 
| location.scale.digits | integer indicating the number of digits to round the estimates of location and scale.  The 
default value is  | 
| nsmall | integer passed to the function  | 
| location.scale.text.line | integer indicating on which plot margin line to show the estimates of location and scale 
for each group.  The default value is  | 
| location.scale.text.cex | numeric scalar giving the amount by which the text indicating the estimates of 
location and scale for each group should be scaled relative to the default 
(see the help file for  | 
| p.value | logical scalar indicating whether to show the p-value associated with testing whether all groups 
have the same population location.  The default value is  | 
| p.value.digits | integer indicating the number of digits to round to when displaying the p-value associated with 
the test of equal group locations.  The default value is  | 
| p.value.line | integer indicating on which plot margin line to show the p-value associated with the test of 
equal group locations.  The default value is  | 
| p.value.cex | numeric scalar giving the amount by which the text indicating the p-value associated 
with the test of equal group locations should be scaled relative to the default 
(see the help file for  | 
| group.difference.ci | for the case when there are just 2 groups, a logical scalar indicating whether to display  
the confidence interval for the difference between group locations.  The default is 
the value of the  | 
| group.difference.conf.level | for the case when there are just 2 groups, a numeric scalar between 0 and 1 
indicating the confidence level associated with the confidence interval for the 
difference between group locations.  The default is  | 
| group.difference.digits | for the case when there are just 2 groups, an integer indicating the number of digits to 
round to when displaying the confidence interval for the difference between group locations.  
The default value is  | 
| ci.and.test | character string indicating whether confidence intervals and tests should be based on parametric 
or nonparametric ( | 
| ci.arg.list | an optional list of arguments to pass to the function used to compute confidence intervals.  
The default value is  | 
| test.arg.list | an optional list of arguments to pass to the function used to test for group differences in location.  
The default value is  | 
| alternative | character string describing the alternative hypothesis for the test of group differences in the 
case when there are two groups.  Possible values are  | 
| plot.diff | applicable only to the case when there are two groups:  When  When  | 
| diff.col | applicable only to the case when there are two groups and  | 
| diff.method | applicable only to the case when there are two groups,  | 
| diff.pch | applicable only to the case when there are two groups,  | 
| paired | applicable only to the case when there are two groups:  | 
| paired.lines | applicable only to the case when there are two groups and  | 
| paired.lty | applicable only to the case when there are two groups,  | 
| paired.lwd | applicable only to the case when there are two groups,  | 
| paired.pch | applicable only to the case when there are two groups,  | 
| paired.col | applicable only to the case when there are two groups,  | 
| diff.name | applicable only to the case when there are two groups and  | 
| diff.name.cex | applicable only to the case when there are two groups and  | 
| sep.line | applicable only to the case when there are two groups and  | 
| sep.lty | applicable only to the case when there are two groups,  | 
| sep.lwd | applicable only to the case when there are two groups,  | 
| sep.col | applicable only to the case when there are two groups,  | 
| diff.lim | applicable only to the case when there are two groups and  | 
| diff.at | applicable only to the case when there are two groups and  | 
| diff.axis.label | applicable only to the case when there are two groups and  | 
| plot.diff.mar | applicable only to the case when there are two groups,  | 
Value
stripChart invisibly returns a list with the following components:
| group.centers | numeric vector of values on the group axis (the  | 
| group.stats | a matrix with the number of rows equal to the number of groups and six columns indicating the sample size of the group (N), the estimate of the group location parameter (Mean or Median), the estimate of the group scale (SD or IQR), the lower confidence limit for the group location parameter (LCL), the upper confidence limit for the group location parameter (UCL), and the confidence level associated with the confidence interval (Conf.Level) | 
In addition, if the argument p.value=TRUE, and/or there are two groups and plot.diff=TRUE, 
the list also includes these components:
| group.difference.p.value | numeric scalar indicating the p-value associated with the test of equal group locations. | 
| group.difference.conf.int | numeric vector of two elements indicating the confidence interval for the difference between the group locations. Only present when there are two groups. | 
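Because the result is returned invisibly, assign the call to an object to inspect these components. A minimal sketch, using the TcCB data from the Examples section below:
  # Capture the invisible return value and inspect its components.
  res <- stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, p.value = TRUE)
  res$group.stats                # N, location, scale, and confidence limits by group
  res$group.difference.p.value   # p-value for the test of equal group locations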
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
stripchart, t.test, wilcox.test, 
aov, kruskal.test.
Examples
  #------------------------
  # Two Independent Samples
  #------------------------
  # The guidance document USEPA (1994b, pp. 6.22--6.25) 
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB) 
  # concentrations (in parts per billion) from soil samples 
  # at a Reference area and a Cleanup area.  These data are stored 
  # in the data frame EPA.94b.tccb.df.  
  #
  # First create one-dimensional scatterplots to compare the 
  # TcCB concentrations between the areas and use a nonparametric 
  # test to test for a difference between areas.
  dev.new()
  stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"), 
    p.value = TRUE, ci.and.test = "nonparametric", 
    ylab = "TcCB (ppb)")
  #----------
  # Now log-transform the TcCB data and use a parametric test
  # to compare the areas.
  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"), 
    p.value = TRUE, ylab = "log10 [ TcCB (ppb) ]")
  #----------
  # Repeat the above procedure, but also plot the confidence interval  
  # for the difference between the means.
  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"), 
    p.value = TRUE, plot.diff = TRUE, diff.col = "black", 
    ylab = "log10 [ TcCB (ppb) ]")
  #----------
  # Repeat the above procedure, but allow the variances to differ.
  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
    p.value = TRUE, plot.diff = TRUE, diff.col = "black", 
    ylab = "log10 [ TcCB (ppb) ]", test.arg.list = list(var.equal = FALSE))
  #----------
  # Repeat the above procedure, but jitter the points instead of 
  # stacking them.
  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
    p.value = TRUE, plot.diff = TRUE, diff.col = "black", 
    ylab = "log10 [ TcCB (ppb) ]", test.arg.list = list(var.equal = FALSE), 
    method = "jitter", ci.offset = 4)
  #---------- 
  # Clean up
  #---------
  graphics.off()
  #====================
  #--------------------
  # Paired Observations
  #--------------------
  # The data frame ACE.13.TCE.df contains paired observations of 
  # trichloroethylene (TCE; mg/L) at 10 groundwater monitoring wells 
  # before and after remediation.
  #
  # Create one-dimensional scatterplots to compare TCE concentrations 
  # before and after remediation and use a paired t-test to 
  # test for a difference between periods.
  ACE.13.TCE.df
  #   TCE.mg.per.L Well Period
  #1        20.900    1 Before
  #2         9.170    2 Before
  #3         5.960    3 Before
  #...      ......   .. ......
  #18        0.520    8  After
  #19        3.060    9  After
  #20        1.900   10  After
  dev.new()
  stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    col = c("brown", "green"), p.value = TRUE, paired = TRUE, 
    ylab = "TCE (mg/L)")
  #----------
  # Repeat the above procedure, but also plot the confidence interval  
  # for the mean of the paired differences.
  dev.new()
  stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    col = c("brown", "green"), p.value = TRUE, paired = TRUE, 
    ylab = "TCE (mg/L)", plot.diff = TRUE, diff.col = "blue")
  #==========
  # Repeat the last two examples, but use a one-sided alternative since 
  # remediation should decrease TCE concentration.
  dev.new()
  stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    col = c("brown", "green"), p.value = TRUE, paired = TRUE, 
    ylab = "TCE (mg/L)", alternative = "less", 
    group.difference.digits = 2)
  #----------
  # Repeat the above procedure, but also plot the confidence interval  
  # for the mean of the paired differences.
  #
  # NOTE: Although stripChart can *report* one-sided confidence intervals 
  #       for the difference between two groups (see above example), 
  #       when *plotting* the confidence interval for the difference, 
  #       only two-sided CIs are allowed.  
  #       Here, we will set the confidence level of the confidence 
  #       interval for the mean of the paired differences to 90%, 
  #       so that the upper bound of the CI corresponds to the upper 
  #       bound of a 95% one-sided CI.
  dev.new()
  stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    col = c("brown", "green"), p.value = TRUE, paired = TRUE, 
    ylab = "TCE (mg/L)", group.difference.digits = 2, 
    plot.diff = TRUE, diff.col = "blue", group.difference.conf.level = 0.9)
 #---------- 
  # Clean up
  #---------
  graphics.off()
  #==========
  # The data frame Helsel.Hirsch.02.Mayfly.df contains paired counts
  # of mayfly nymphs above and below industrial outfalls in 12 streams.  
  #
  # Create one-dimensional scatterplots to compare the 
  # counts between locations and use a nonparametric test 
  # to compare counts above and below the outfalls.
  Helsel.Hirsch.02.Mayfly.df
  #   Mayfly.Count Stream Location
  #1            12      1    Above
  #2            15      2    Above
  #3            11      3    Above
  #...         ...     ..    .....
  #22           60     10    Below
  #23           53     11    Below
  #24          124     12    Below
  dev.new()
  stripChart(Mayfly.Count ~ Location, data = Helsel.Hirsch.02.Mayfly.df, 
    col = c("green", "brown"), p.value = TRUE, paired = TRUE, 
    ci.and.test = "nonparametric", ylab = "Number of Mayfly Nymphs")
  #---------- 
 
  # Repeat the above procedure, but also plot the confidence interval  
  # for the pseudomedian of the paired differences.
  dev.new()
  stripChart(Mayfly.Count ~ Location, data = Helsel.Hirsch.02.Mayfly.df, 
    col = c("green", "brown"), p.value = TRUE, paired = TRUE, 
    ci.and.test = "nonparametric", ylab = "Number of Mayfly Nymphs", 
    plot.diff = TRUE, diff.col = "blue")
 #---------- 
  # Clean up
  #---------
  graphics.off()
Full Complement of Summary Statistics
Description
summaryFull is a generic function used to produce a full complement of summary statistics.  
The function invokes particular methods which depend on the class of 
the first argument.  The summary statistics include: sample size, number of missing values, 
mean, median, trimmed mean, geometric mean, skew, kurtosis, min, max, range, 1st quartile, 3rd quartile, 
standard deviation, geometric standard deviation, interquartile range, median absolute deviation, and 
coefficient of variation.
Usage
summaryFull(object, ...)
## S3 method for class 'formula'
summaryFull(object, data = NULL, subset, 
  na.action = na.pass, ...)
## Default S3 method:
summaryFull(object, group = NULL, 
    combine.groups = FALSE, drop.unused.levels = TRUE, 
    rm.group.na = TRUE, stats = NULL, trim = 0.1, 
    sd.method = "sqrt.unbiased", geo.sd.method = "sqrt.unbiased", 
    skew.list = list(), kurtosis.list = list(), 
    cv.list = list(), digits = max(3, getOption("digits") - 3), 
    digit.type = "signif", stats.in.rows = TRUE, 
    drop0trailing = TRUE, data.name = deparse(substitute(object)), 
    ...)
## S3 method for class 'data.frame'
summaryFull(object, ...)
## S3 method for class 'matrix'
summaryFull(object, ...)
## S3 method for class 'list'
summaryFull(object, ...)
Arguments
| object | an object for which summary statistics are desired.  In the default method, 
the argument  | 
| data | when  | 
| subset | when  | 
| na.action | when  | 
| group | when  | 
| combine.groups | logical scalar indicating whether to show summary statistics for all groups combined.  
The default value is  | 
| drop.unused.levels | when  | 
| rm.group.na | logical scalar indicating whether to remove missing values from the  | 
| stats | character vector indicating which statistics to compute.  Possible elements of the character 
vector include:   | 
| trim | fraction (between 0 and 0.5 inclusive) of values to be trimmed from each end of the ordered data 
to compute the trimmed mean.  The default value is  | 
| sd.method | character string specifying what method to use to compute the sample standard deviation.  
The possible values are  | 
| geo.sd.method | character string specifying what method to use to compute the sample standard deviation of the 
log-transformed observations prior to exponentiating this quantity.  The possible values are 
 | 
| skew.list | list of arguments to supply to the  | 
| kurtosis.list | list of arguments to supply to the  | 
| cv.list | list of arguments to supply to the  | 
| digits | integer indicating the number of digits to use for the summary statistics.  
When  | 
| digit.type | character string indicating whether the  | 
| stats.in.rows | logical scalar indicating whether to show the summary statistics in the rows or columns of the 
output.  The default is  | 
| drop0trailing | logical scalar indicating whether to drop trailing 0's when printing the summary statistics.  
The value of this argument is added as an attribute to the returned list and is used by the 
 | 
| data.name | character string indicating the name of the data used for the summary statistics. | 
| ... | additional arguments affecting the summary statistics produced. | 
Details
The function summaryFull returns summary statistics that are useful to describe various 
characteristics of one or more variables.  It is an extended version of the built-in R function 
summary specifically for non-factor numeric data.  The table below shows what 
statistics are computed and what functions are called by summaryFull to compute these statistics.
The object returned by summaryFull is useful for printing or reporting purposes.  You may also 
use the functions that summaryFull calls (see table below) to compute summary statistics to 
be used by other functions.
See the help files for the functions listed in the table below for more information on these summary statistics.
| Summary Statistic | Function Used | 
| Mean | mean | 
| Median | median | 
| Trimmed Mean | mean with trim argument | 
| Geometric Mean | geoMean | 
| Skew | skewness | 
| Kurtosis | kurtosis | 
| Min | min | 
| Max | max | 
| Range | range and diff | 
| 1st Quartile | quantile | 
| 3rd Quartile | quantile | 
| Standard Deviation | sd | 
| Geometric Standard Deviation | geoSD | 
| Interquartile Range | iqr | 
| Median Absolute Deviation | mad | 
| Coefficient of Variation | cv | 
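For example, the "10% Trimmed Mean" and "Range" rows correspond to the following base R calls (a minimal sketch, assuming a numeric vector dat such as the one generated in the Examples section below):
  # Trimmed mean: mean() with the trim argument;
  # Range: difference of the extremes via range() and diff().
  mean(dat, trim = 0.1)
  diff(range(dat))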
Value
an object of class "summaryStats" (see summaryStats.object).  
Objects of class "summaryStats" are numeric matrices that contain the 
summary statistics produced by a call to summaryStats or summaryFull.  
These objects have a special printing method that by default removes 
trailing zeros for sample size entries and prints blanks for statistics that are 
normally displayed as NA (see print.summaryStats).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Leidel, N.A., K.A. Busch, and J.R. Lynch. (1977). Occupational Exposure Sampling Strategy Manual. U.S. Department of Health, Education, and Welfare, Public Health Service, Center for Disease Control, National Institute for Occupational Safety and Health, Cincinnati, Ohio 45226, January, 1977, pp.102-103.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis, Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Examples
  # Generate 20 observations from a lognormal distribution with 
  # parameters mean=10 and cv=1, and compute the summary statistics.  
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean=10, cv=1) 
  summary(dat) 
  # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  #2.608   4.995   6.235   7.490   9.295  15.440
  summaryFull(dat) 
  #                             dat     
  #N                            20      
  #Mean                          7.49   
  #Median                        6.235  
  #10% Trimmed Mean              7.125  
  #Geometric Mean                6.674  
  #Skew                          0.9877 
  #Kurtosis                     -0.03539
  #Min                           2.608  
  #Max                          15.44   
  #Range                        12.83   
  #1st Quartile                  4.995  
  #3rd Quartile                  9.295  
  #Standard Deviation            3.803  
  #Geometric Standard Deviation  1.634  
  #Interquartile Range           4.3    
  #Median Absolute Deviation     2.607  
  #Coefficient of Variation      0.5078 
  #----------
  # Compare summary statistics for normal and lognormal data:
  log.dat <- log(dat) 
  summaryFull(list(dat = dat, log.dat = log.dat))
  #                             dat      log.dat
  #N                            20       20     
  #Mean                          7.49     1.898 
  #Median                        6.235    1.83  
  #10% Trimmed Mean              7.125    1.902 
  #Geometric Mean                6.674    1.835 
  #Skew                          0.9877   0.1319
  #Kurtosis                     -0.03539 -0.4288
  #Min                           2.608    0.9587
  #Max                          15.44     2.737 
  #Range                        12.83     1.778 
  #1st Quartile                  4.995    1.607 
  #3rd Quartile                  9.295    2.227 
  #Standard Deviation            3.803    0.4913
  #Geometric Standard Deviation  1.634    1.315 
  #Interquartile Range           4.3      0.62  
  #Median Absolute Deviation     2.607    0.4915
  #Coefficient of Variation      0.5078   0.2588
  # Clean up
  rm(dat, log.dat)
  #--------------------------------------------------------------------
  # Compute summary statistics for 10 observations from a normal 
  # distribution with parameters mean=0 and sd=1.  Note that the 
  # geometric mean and geometric standard deviation are not computed 
  # since some of the observations are non-positive.
  set.seed(287) 
  dat <- rnorm(10) 
  summaryFull(dat) 
  #                          dat     
  #N                         10      
  #Mean                       0.07406
  #Median                     0.1095 
  #10% Trimmed Mean           0.1051 
  #Skew                      -0.1646 
  #Kurtosis                  -0.7135 
  #Min                       -1.549  
  #Max                        1.449  
  #Range                      2.998  
  #1st Quartile              -0.5834 
  #3rd Quartile               0.6966 
  #Standard Deviation         0.9412 
  #Interquartile Range        1.28   
  #Median Absolute Deviation  1.05
  # Clean up
  rm(dat)
  #--------------------------------------------------------------------
  # Compute summary statistics for the TcCB data given in USEPA (1994b) 
  # (the data are stored in EPA.94b.tccb.df).  Arbitrarily set the one 
  # censored observation to the censoring level. Group by the variable 
  # Area.
  summaryFull(TcCB ~ Area, data = EPA.94b.tccb.df)
  #                             Cleanup  Reference
  #N                             77       47      
  #Mean                           3.915    0.5985 
  #Median                         0.43     0.54   
  #10% Trimmed Mean               0.6846   0.5728 
  #Geometric Mean                 0.5784   0.5382 
  #Skew                           7.717    0.9019 
  #Kurtosis                      62.67     0.132  
  #Min                            0.09     0.22   
  #Max                          168.6      1.33   
  #Range                        168.5      1.11   
  #1st Quartile                   0.23     0.39   
  #3rd Quartile                   1.1      0.75   
  #Standard Deviation            20.02     0.2836 
  #Geometric Standard Deviation   3.898    1.597  
  #Interquartile Range            0.87     0.36   
  #Median Absolute Deviation      0.3558   0.2669 
  #Coefficient of Variation       5.112    0.4739 
Summary Statistics
Description
summaryStats is a generic function used to produce summary statistics, confidence intervals, 
and results of hypothesis tests.  The function invokes particular methods which 
depend on the class of the first argument. 
The summary statistics include: sample size, number of missing values, mean, standard deviation, median, min, and max. Optional additional summary statistics include 1st quartile, 3rd quartile, and standard error.
Usage
summaryStats(object, ...)
## S3 method for class 'formula'
summaryStats(object, data = NULL, subset, 
  na.action = na.pass, ...)
## Default S3 method:
summaryStats(object, group = NULL, 
    drop.unused.levels = TRUE, se = FALSE, quartiles = FALSE, 
    digits = max(3, getOption("digits") - 3), 
    digit.type = "round", drop0trailing = TRUE, 
    show.na = TRUE, show.0.na = FALSE, p.value = FALSE, 
    p.value.digits = 2, p.value.digit.type = "signif", 
    test = "parametric", paired = FALSE, test.arg.list = NULL, 
    combine.groups = p.value, rm.group.na = TRUE, 
    group.p.value.type = NULL, alternative = "two.sided", 
    ci = NULL, ci.between = NULL, conf.level = 0.95, 
    stats.in.rows = FALSE, 
    data.name = deparse(substitute(object)), ...)
## S3 method for class 'factor'
summaryStats(object, group = NULL, 
    drop.unused.levels = TRUE, 
    digits = max(3, getOption("digits") - 3), 
    digit.type = "round", drop0trailing = TRUE,  
    show.na = TRUE, show.0.na = FALSE, p.value = FALSE, 
    p.value.digits = 2, p.value.digit.type = "signif", 
    test = "chisq", test.arg.list = NULL, combine.levels = TRUE, 
    combine.groups = FALSE, rm.group.na = TRUE, 
    ci = p.value & test != "chisq", conf.level = 0.95, 
    stats.in.rows = FALSE, ...)
## S3 method for class 'character'
summaryStats(object, ...)
## S3 method for class 'logical'
summaryStats(object, ...)
## S3 method for class 'data.frame'
summaryStats(object, ...)
## S3 method for class 'matrix'
summaryStats(object, ...)
## S3 method for class 'list'
summaryStats(object, ...)
Arguments
| object | an object for which summary statistics are desired.  In the default method, 
the argument  | 
| data | when  | 
| subset | when  | 
| na.action | when  | 
| group | when  | 
| drop.unused.levels | when  | 
| se | for numeric data, logical scalar indicating whether to include 
the standard error of the mean in the summary statistics.  
The default value is  | 
| quartiles | for numeric data, logical scalar indicating whether to include 
the estimated 25th and 75th percentiles in the summary statistics.  
The default value is  | 
| digits | integer indicating the number of digits to use for the summary statistics.  
When  | 
| digit.type | character string indicating whether the  | 
| drop0trailing | logical scalar indicating whether to drop trailing 0's when printing the summary statistics.  
The value of this argument is added as an attribute to the returned list and is used by the 
 | 
| show.na | logical scalar indicating whether to return the number of missing values.  
The default value is  | 
| show.0.na | logical scalar indicating whether to display the number of missing values in the case when 
there are no missing values.  The default value is  | 
| p.value | logical scalar indicating whether to return the p-value associated with a test of hypothesis.  
The default value is  | 
| p.value.digits | integer indicating the number of digits to use for the p-value.  When  | 
| p.value.digit.type | character string indicating whether the  | 
| test | Numeric data:  character string indicating whether to compute p-values and confidence 
intervals based on parametric ( Factors:  character string indicating which test to perform when  | 
| paired | applicable only to the case when there are two groups:  | 
| test.arg.list | a list with additional arguments to pass to the test used to compute p-values and confidence 
intervals.  For numeric data, when  | 
| combine.groups | logical scalar indicating whether to show summary statistics for all groups combined.  
Numeric data:  the default value is  | 
| rm.group.na | logical scalar indicating whether to remove missing values from the  | 
| group.p.value.type | for numeric data, character string indicating which p-value(s) to compute when 
there is more than one group.  When  | 
| alternative | for numeric data, character string indicating which alternative to assume 
for p-values and confidence intervals.  Possible values are  | 
| ci | Numeric data:  logical scalar indicating whether to compute a confidence interval 
for the mean or each group mean.  The default value is  Factors:  logical scalar indicating whether to compute a confidence interval.  A confidence 
interval is computed only if the number of levels in  | 
| ci.between | for numeric data, logical scalar indicating whether to compute a confidence interval 
for the difference between group means when there are two groups.  
The default value is  | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with the confidence intervals.  
The default value is  | 
| stats.in.rows | logical scalar indicating whether to show the summary statistics in the rows or columns of the 
output.  The default is  | 
| data.name | character string indicating the name of the data used for the summary statistics. | 
| combine.levels | for factors, a logical scalar indicating whether to compute summary statistics based on combining all levels of a factor. | 
| ... | additional arguments affecting the summary statistics produced. | 
Value
an object of class "summaryStats" (see summaryStats.object).  
Objects of class "summaryStats" are numeric matrices that contain the 
summary statistics produced by a call to summaryStats or summaryFull.  
These objects have a special printing method that by default removes 
trailing zeros for sample size entries and prints blanks for statistics that are 
normally displayed as NA (see print.summaryStats).
Summary statistics for numeric data include sample size, mean, standard deviation, median, 
min, and max.  Options include the standard error of the mean (when se=TRUE), 
the estimated quartiles (when quartiles=TRUE), p-values (when p.value=TRUE),
and/or confidence intervals (when ci=TRUE and/or ci.between=TRUE).
Summary statistics for factors include the sample size for each level of the factor and the 
percent of the total for that level.  Options include a p-value (when p.value=TRUE).
Note that unlike the R function summary and the EnvStats function 
summaryFull, by default the digits argument for the EnvStats function 
summaryStats refers to how many decimal places to round to, not how many 
significant digits to use (see the explanation of the argument digit.type above).
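A minimal sketch of this difference, using the TcCB data from the Examples section below and assuming the formula method passes digit.type through to the default method:
  # digits interpreted as decimal places (the summaryStats default) ...
  summaryStats(TcCB ~ Area, data = EPA.94b.tccb.df, digits = 2)
  # ... versus digits interpreted as significant digits.
  summaryStats(TcCB ~ Area, data = EPA.94b.tccb.df, digits = 2, digit.type = "signif")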
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
See Also
summary, summaryFull, t.test, anova.lm, 
wilcox.test, kruskal.test, 
chisq.test, 
fisher.test, binom.test.
Examples
  # The guidance document USEPA (1994b, pp. 6.22--6.25)
  # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB)
  # concentrations (in parts per billion) from soil samples
  # at a Reference area and a Cleanup area. These data are stored
  # in the data frame EPA.94b.tccb.df.
  #----------
  # First, create summary statistics by area based on the log-transformed data. 
  summaryStats(log10(TcCB) ~ Area, data = EPA.94b.tccb.df)
  #           N    Mean     SD  Median     Min    Max
  #Cleanup   77 -0.2377 0.5908 -0.3665 -1.0458 2.2270
  #Reference 47 -0.2691 0.2032 -0.2676 -0.6576 0.1239
  #----------
  # Now create summary statistics by area based on the log-transformed data 
  # and use the t-test to compare the areas.
  summaryStats(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, p.value = TRUE)
  summaryStats(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, stats.in.rows = TRUE)
  #                Cleanup  Reference Combined
  #N                77       47       124     
  #Mean             -0.2377  -0.2691   -0.2496
  #SD                0.5908   0.2032    0.481 
  #Median           -0.3665  -0.2676   -0.3143
  #Min              -1.0458  -0.6576   -1.0458
  #Max               2.227    0.1239    2.227 
  #Diff                                -0.0313
  #p.value.between                      0.73  
  #95%.LCL.between                     -0.2082
  #95%.UCL.between                      0.1456
  #====================================================================
  # Page 9-3 of USEPA (2009) lists trichloroethene 
  # concentrations (TCE; mg/L) collected from groundwater at two wells.
  # Here, the seven non-detects have been set to their detection limit.
  #----------
  # First, compute summary statistics for all TCE observations.
  summaryStats(TCE.mg.per.L ~ 1, data = EPA.09.Table.9.1.TCE.df, 
    digits = 3, data.name = "TCE")
  #     N Mean    SD Median   Min  Max NA's N.Total
  #TCE 27 0.09 0.064    0.1 0.004 0.25    3      30
  summaryStats(TCE.mg.per.L ~ 1, data = EPA.09.Table.9.1.TCE.df,
    se = TRUE, quartiles = TRUE, digits = 3, data.name = "TCE")
  #     N Mean    SD    SE Median   Min  Max 1st Qu. 3rd Qu. NA's N.Total
  #TCE 27 0.09 0.064 0.012    0.1 0.004 0.25   0.031    0.12    3      30
  #----------
  # Now compute summary statistics by well.
  summaryStats(TCE.mg.per.L ~ Well, data = EPA.09.Table.9.1.TCE.df, 
    digits = 3)
  #        N  Mean    SD Median   Min  Max NA's N.Total
  #Well.1 14 0.063 0.079  0.031 0.004 0.25    1      15
  #Well.2 13 0.118 0.020  0.110 0.099 0.17    2      15
  summaryStats(TCE.mg.per.L ~ Well, data = EPA.09.Table.9.1.TCE.df,
    digits = 3, stats.in.rows = TRUE)
  #        Well.1 Well.2
  #N       14     13    
  #Mean     0.063  0.118
  #SD       0.079  0.02 
  #Median   0.031  0.11 
  #Min      0.004  0.099
  #Max      0.25   0.17 
  #NA's     1      2    
  #N.Total 15     15 
  # If you want to keep trailing 0's, use the drop0trailing argument:
  summaryStats(TCE.mg.per.L ~ Well, data = EPA.09.Table.9.1.TCE.df,
    digits = 3, stats.in.rows = TRUE, drop0trailing = FALSE)
  #        Well.1 Well.2
  #N       14.000 13.000
  #Mean     0.063  0.118
  #SD       0.079  0.020
  #Median   0.031  0.110
  #Min      0.004  0.099
  #Max      0.250  0.170
  #NA's     1.000  2.000
  #N.Total 15.000 15.000
  #====================================================================
  # Page 13-3 of USEPA (2009) lists iron concentrations (ppm) in 
  # groundwater collected from 6 wells.  
  #----------
  # First, compute summary statistics for each well.
  summaryStats(Iron.ppm ~ Well, data = EPA.09.Ex.13.1.iron.df, 
    combine.groups = FALSE, digits = 2, stats.in.rows = TRUE)
  #       Well.1 Well.2 Well.3 Well.4 Well.5 Well.6
  #N        4      4      4      4      4      4   
  #Mean    47.01  55.73  90.86  70.43 145.24 156.32
  #SD      12.4   20.34  59.35  25.95  92.16  51.2 
  #Median  50.05  57.05  76.73  76.95 137.66 171.93
  #Min     29.96  32.14  39.25  34.12  60.95  83.1 
  #Max     57.97  76.71 170.72  93.69 244.69 198.34
  #----------
  # Note the large differences in standard deviations between wells.
  # Compute summary statistics for log(Iron), by Well.
  summaryStats(log(Iron.ppm) ~ Well, data = EPA.09.Ex.13.1.iron.df, 
    combine.groups = FALSE, digits = 2, stats.in.rows = TRUE)
  #       Well.1 Well.2 Well.3 Well.4 Well.5 Well.6
  #N      4      4      4      4      4      4     
  #Mean   3.82   3.97   4.35   4.19   4.8    5     
  #SD     0.3    0.4    0.66   0.45   0.7    0.4   
  #Median 3.91   4.02   4.29   4.34   4.8    5.14  
  #Min    3.4    3.47   3.67   3.53   4.11   4.42  
  #Max    4.06   4.34   5.14   4.54   5.5    5.29
  #----------
  # Include confidence intervals for the mean log(Fe) concentration
  # at each well, and also the p-value from the one-way 
  # analysis of variance to test for a difference in well means.
  summaryStats(log(Iron.ppm) ~ Well, data = EPA.09.Ex.13.1.iron.df, 
    digits = 1, ci = TRUE, p.value = TRUE, stats.in.rows = TRUE)
  #                Well.1 Well.2 Well.3 Well.4 Well.5 Well.6 Combined
  #N                4      4      4      4      4      4     24      
  #Mean             3.8    4      4.3    4.2    4.8    5      4.4    
  #SD               0.3    0.4    0.7    0.5    0.7    0.4    0.6    
  #Median           3.9    4      4.3    4.3    4.8    5.1    4.3    
  #Min              3.4    3.5    3.7    3.5    4.1    4.4    3.4    
  #Max              4.1    4.3    5.1    4.5    5.5    5.3    5.5    
  #95%.LCL          3.3    3.3    3.3    3.5    3.7    4.4    4.1    
  #95%.UCL          4.3    4.6    5.4    4.9    5.9    5.6    4.6    
  #p.value.between                                            0.025 
  #====================================================================
  # Using the built-in dataset HairEyeColor, summarize the frequencies 
  # of hair color and test whether there is a difference in proportions.
  # NOTE:  The data that was originally factor data has already been 
  #        collapsed into frequency counts by category in the object 
  #        HairEyeColor.  In the examples in this section, we recreate 
  #        the factor objects in order to show how summaryStats works 
  #        for factor objects.
  Hair <- apply(HairEyeColor, 1, sum)
  Hair
  #Black Brown   Red Blond 
  #  108   286    71   127
  Hair.color <- names(Hair)
  Hair.fac <- factor(rep(Hair.color, times = Hair), 
    levels = Hair.color)
  #----------
  # Compute summary statistics and perform the chi-square test 
  # for equal proportions of hair color
  summaryStats(Hair.fac, digits = 1, p.value = TRUE)
  #           N   Pct ChiSq_p
  #Black    108  18.2        
  #Brown    286  48.3        
  #Red       71  12.0        
  #Blond    127  21.5        
  #Combined 592 100.0 2.5e-39
  #----------
  # Now test the hypothesis that 10% of the population from which 
  # this sample was drawn has Red hair, and compute a 95% confidence 
  # interval for the percent of subjects with red hair.
  Red.Hair.fac <- factor(Hair.fac == "Red", levels = c(TRUE, FALSE), 
    labels = c("Red", "Not Red"))
  summaryStats(Red.Hair.fac, digits = 1, p.value = TRUE, 
    ci = TRUE, test = "binom", test.arg.list = list(p = 0.1))
  #           N Pct Exact_p 95%.LCL 95%.UCL
  #Red       71  12             9.5    14.9
  #Not Red  521  88                        
  #Combined 592 100    0.11
  #----------
  # Now test whether the percent of people with Green eyes is the 
  # same for people with and without Red hair.
  HairEye <- apply(HairEyeColor, 1:2, sum)
  Hair.color <- rownames(HairEye)
  Eye.color  <- colnames(HairEye)
  n11 <-     HairEye[Hair.color == "Red", Eye.color == "Green"]
  n12 <- sum(HairEye[Hair.color == "Red", Eye.color != "Green"])
  n21 <- sum(HairEye[Hair.color != "Red", Eye.color == "Green"])
  n22 <- sum(HairEye[Hair.color != "Red", Eye.color != "Green"])
  Hair.fac <- factor(rep(c("Red", "Not Red"), c(n11+n12, n21+n22)), 
    levels = c("Red", "Not Red"))
  Eye.fac  <- factor(c(rep("Green", n11), rep("Not Green", n12), 
    rep("Green", n21), rep("Not Green", n22)), 
    levels = c("Green", "Not Green"))
  #----------
  # Here are the results using the chi-square test and computing 
  # confidence limits for the difference between the two percentages
  summaryStats(Eye.fac, group = Hair.fac, digits = 1, 
    p.value = TRUE, ci = TRUE, test = "prop", 
    stats.in.rows = TRUE, test.arg.list = list(correct = FALSE))
  #                Green Not Green Combined
  #Red(N)           14    57        71     
  #Red(Pct)         19.7  80.3     100     
  #Not Red(N)       50   471       521     
  #Not Red(Pct)      9.6  90.4     100     
  #ChiSq_p                           0.01  
  #95%.LCL.between                   0.5   
  #95%.UCL.between                  19.7
  #----------
  # Here are the results using Fisher's exact test and computing 
  # confidence limits for the odds ratio
  summaryStats(Eye.fac, group = Hair.fac, digits = 1, 
    p.value = TRUE, ci = TRUE, test = "fisher", 
    stats.in.rows = TRUE)
  #             Green Not Green Combined
  #Red(N)        14    57        71     
  #Red(Pct)      19.7  80.3     100     
  #Not Red(N)    50   471       521     
  #Not Red(Pct)   9.6  90.4     100     
  #Fisher_p                       0.015 
  #95%.LCL.OR                     1.1   
  #95%.UCL.OR                     4.6 
  rm(Hair, Hair.color, Hair.fac, Red.Hair.fac, HairEye, Eye.color, 
    n11, n12, n21, n22, Eye.fac)
  #====================================================================
  # The data set EPA.89b.cadmium.df contains information on 
  # cadmium concentrations in groundwater collected from a
  # background and compliance well.  Compare detection frequencies 
  # between the well types and test for a difference using 
  # Fisher's exact test.
  summaryStats(factor(Censored) ~ Well.type, data = EPA.89b.cadmium.df, 
    digits = 1, p.value = TRUE, test = "fisher")
  summaryStats(factor(Censored) ~ Well.type, data = EPA.89b.cadmium.df, 
    digits = 1, p.value = TRUE, test = "fisher", stats.in.rows = TRUE)
  #                FALSE TRUE  Combined
  #Background(N)     8    16    24     
  #Background(Pct)  33.3  66.7 100     
  #Compliance(N)    24    40    64     
  #Compliance(Pct)  37.5  62.5 100     
  #Fisher_p                      0.81  
  #95%.LCL.OR                    0.3   
  #95%.UCL.OR                    2.5
  #====================================================================
  #--------------------
  # Paired Observations
  #--------------------
  # The data frame ACE.13.TCE.df contains paired observations of 
  # trichloroethylene (TCE; mg/L) at 10 groundwater monitoring wells 
  # before and after remediation.
  #
  # Compare TCE concentrations before and after remediation and 
  # use a paired t-test to test for a difference between periods.
  summaryStats(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    p.value = TRUE, paired = TRUE)
  summaryStats(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    p.value = TRUE, paired = TRUE, stats.in.rows = TRUE)
  #                       Before   After    Combined
  #N                       10       10       20     
  #Mean                    21.624    3.6329  12.6284
  #SD                      13.5113   3.5544  13.3281
  #Median                  20.3      2.48     8.475 
  #Min                      5.96     0.272    0.272 
  #Max                     41.5     10.7     41.5   
  #Diff                                     -17.9911
  #paired.p.value.between                     0.0027
  #95%.LCL.between                          -27.9097
  #95%.UCL.between                           -8.0725
  #==========
  # Repeat the last example, but use a one-sided alternative since 
  # remediation should decrease TCE concentration.
  summaryStats(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    p.value = TRUE, paired = TRUE, alternative = "less")
  summaryStats(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df, 
    p.value = TRUE, paired = TRUE, alternative = "less", 
    stats.in.rows = TRUE)
  #                            Before   After    Combined
  #N                            10       10       20     
  #Mean                         21.624    3.6329  12.6284
  #SD                           13.5113   3.5544  13.3281
  #Median                       20.3      2.48     8.475 
  #Min                           5.96     0.272    0.272 
  #Max                          41.5     10.7     41.5   
  #Diff                                          -17.9911
  #paired.p.value.between.less                     0.0013
  #95%.LCL.between                                   -Inf
  #95%.UCL.between                                -9.9537
S3 Class "summaryStats"
Description
Objects of S3 class "summaryStats" are returned by the functions 
summaryStats and summaryFull.
Details
Objects of S3 class "summaryStats" are matrices that contain 
information about the summary statistics.
Value
Required Attributes 
The following attributes must be included in a legitimate matrix of 
class "summaryStats".
| stats.in.rows | logical scalar indicating whether the statistics are stored by row | 
| drop0trailing | logical scalar indicating whether to drop trailing 0's when printing the summary statistics. | 
Methods
Generic functions that have methods for objects of class 
"summaryStats" include: 
print.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
See Also
Examples
  # Create an object of class "summaryStats", then print it out. 
  #-------------------------------------------------------------
  summaryStats.obj <- summaryStats(TCE.mg.per.L ~ Well, 
    data = EPA.09.Table.9.1.TCE.df, digits = 3) 
  is.matrix(summaryStats.obj) 
  #[1] TRUE
  class(summaryStats.obj) 
  #[1] "summaryStats" 
  attributes(summaryStats.obj) 
  #$dim
  #[1] 2 8
  #
  #$dimnames
  #$dimnames[[1]]
  #[1] "Well.1" "Well.2"
  #
  #$dimnames[[2]]
  #[1] "N"       "Mean"    "SD"      "Median"  "Min"     "Max"    
  #[7] "NA's"    "N.Total"
  #
  #
  #$class
  #[1] "summaryStats"
  #
  #$stats.in.rows
  #[1] FALSE
  #
  #$drop0trailing
  #[1] TRUE
  summaryStats.obj 
  #        N  Mean    SD Median   Min  Max NA's N.Total
  #Well.1 14 0.063 0.079  0.031 0.004 0.25    1      15
  #Well.2 13 0.118 0.020  0.110 0.099 0.17    2      15
  #----------
  # Clean up
  #---------
  rm(summaryStats.obj)
Type I Error Level for a One- or Two-Sample t-Test
Description
Compute the Type I Error level necessary to achieve a specified power for a one- or two-sample t-test, given the sample size(s) and scaled difference.
Usage
  tTestAlpha(n.or.n1, n2 = n.or.n1, delta.over.sigma = 0, power = 0.95, 
    sample.type = ifelse(!missing(n2) && !is.null(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE, tol = 1e-07, maxiter = 1000)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of 
 | 
| delta.over.sigma | numeric vector specifying the ratio of the true difference ( | 
| power | numeric vector of numbers between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| tol | numeric scalar indicating the tolerance argument to pass to the  
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
Formulas for the power of the t-test for specified values of 
the sample size, scaled difference, and Type I error level are given in 
the help file for tTestPower.  The function tTestAlpha 
uses the uniroot search algorithm to determine the 
required Type I error level for specified values of the sample size, power, 
and scaled difference.
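For example (a minimal sketch using arbitrary illustrative values for the sample size, power, and scaled difference), the Type I error level returned by tTestAlpha should reproduce the requested power when passed back to tTestPower:
  # Illustrative values only; any consistent choice will do.
  alpha <- tTestAlpha(n.or.n1 = 12, delta.over.sigma = 1, power = 0.9)
  # Passing this alpha back to tTestPower should return (approximately) 
  # the requested power of 0.9:
  tTestPower(n.or.n1 = 12, delta.over.sigma = 1, alpha = alpha)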
Value
numeric vector of Type I error levels.
Note
See tTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See tTestPower.
See Also
tTestPower, tTestScaledMdd, 
tTestN, 
plotTTestDesign, Normal, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the required Type I error level for the one-sample t-test 
  # decreases with increasing sample size.  Set the power to 80% and 
  # the scaled difference to 0.5.
  seq(5, 30, by = 5) 
  #[1] 5 10 15 20 25 30 
  alpha <- tTestAlpha(n.or.n1 = seq(5, 30, by = 5), 
    power = 0.8, delta.over.sigma = 0.5) 
  round(alpha, 2) 
  #[1] 0.65 0.45 0.29 0.18 0.11 0.07
  #----------
  # Repeat the last example, but use the approximation.
  # Note how the approximation yields slightly different required 
  # Type I error levels for the smaller sample sizes.
  #----------------------------------------------------
  alpha <- tTestAlpha(n.or.n1 = seq(5, 30, by = 5), 
    power = 0.8, delta.over.sigma = 0.5, approx = TRUE) 
  round(alpha, 2)
  #[1] 0.63 0.46 0.30 0.18 0.11 0.07
  #----------
  # Look at how the required Type I error level for the two-sample 
  # t-test decreases with increasing scaled difference.  Use 
  # a power of 90% and a sample size of 10 in each group.
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  alpha <- tTestAlpha(10, sample.type = "two.sample", 
    power = 0.9, delta.over.sigma = seq(0.5, 2, by = 0.5)) 
  round(alpha, 2) 
  #[1] 0.82 0.35 0.06 0.01
  #----------
  # Look at how the required Type I error level for the two-sample 
  # t-test increases with increasing values of required power.  Use 
  # a sample size of 20 for each group and a scaled difference of 
  # 1.
  alpha <- tTestAlpha(20, sample.type = "two.sample", delta.over.sigma = 1, 
    power = c(0.8, 0.9, 0.95)) 
  round(alpha, 2)
  #[1] 0.03 0.07 0.14
 
  #----------
  # Clean up
  #---------
  rm(alpha)
Sample Size for a One- or Two-Sample t-Test, Assuming Lognormal Data
Description
Compute the sample size necessary to achieve a specified power for a one- or two-sample t-test, given the ratio of means, coefficient of variation, and significance level, assuming lognormal data.
Usage
  tTestLnormAltN(ratio.of.means, cv = 1, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!is.null(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE, n2 = NULL, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| ratio.of.means | numeric vector specifying the ratio of the first mean to the second mean.  
When  | 
| cv | numeric vector of positive value(s) specifying the coefficient of 
variation.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is 
 | 
| round.up | logical scalar indicating whether to round up the values of the computed 
sample size(s) to the next integer.  The default value is 
 | 
| n.max | positive integer greater than 1 indicating the maximum sample size when  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
If the arguments ratio.of.means, cv, alpha, power, and 
n2 are not all the same length, they are replicated to be the same length as 
the length of the longest argument.
Formulas for the power of the t-test for lognormal data for specified values of 
the sample size, ratio of means, and Type I error level are given in 
the help file for tTestLnormAltPower.  The function 
tTestLnormAltN uses the uniroot search algorithm to determine 
the required sample size(s) for specified values of the power, 
scaled difference, and Type I error level. 
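For example (a minimal sketch using arbitrary illustrative values for the ratio of means, coefficient of variation, and power), a sample size returned by tTestLnormAltN should yield at least the requested power when passed back to tTestLnormAltPower (at least, because the sample size is rounded up by default):
  n <- tTestLnormAltN(ratio.of.means = 1.5, cv = 1, power = 0.9)
  # Passing n back to tTestLnormAltPower should give a power of at 
  # least the requested 0.9:
  tTestLnormAltPower(n.or.n1 = n, ratio.of.means = 1.5, cv = 1)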
Value
When sample.type="one.sample", or sample.type="two.sample" 
and n2 is not supplied (so equal sample sizes for each group are 
assumed), tTestLnormAltN returns a numeric vector of sample sizes.  When 
sample.type="two.sample" and n2 is supplied, 
tTestLnormAltN returns a list with two components called n1 and 
n2, specifying the sample sizes for each group.
Note
See tTestLnormAltPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See tTestLnormAltPower.
See Also
tTestLnormAltPower, tTestLnormAltRatioOfMeans, 
plotTTestLnormAltDesign, LognormalAlt, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the required sample size for the one-sample test increases with 
  # increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  # [1] 0.5 0.6 0.7 0.8 0.9 
  tTestLnormAltN(ratio.of.means = 1.5, power = seq(0.5, 0.9, by = 0.1)) 
  # [1] 19 23 28 36 47
  #----------
  # Repeat the last example, but compute the sample size based on the approximate 
  # power instead of the exact power:
  tTestLnormAltN(ratio.of.means = 1.5, power = seq(0.5, 0.9, by = 0.1), approx = TRUE) 
  # [1] 19 23 29 36 47
  #==========
  # Look at how the required sample size for the two-sample t-test decreases with 
  # increasing ratio of means:
  seq(1.5, 2, by = 0.1) 
  #[1] 1.5 1.6 1.7 1.8 1.9 2.0 
  tTestLnormAltN(ratio.of.means = seq(1.5, 2, by = 0.1), sample.type = "two") 
  #[1] 111  83  65  54  45  39
  #----------
  # Look at how the required sample size for the two-sample t-test decreases with 
  # increasing values of Type I error:
  tTestLnormAltN(ratio.of.means = 1.5, alpha = c(0.001, 0.01, 0.05, 0.1), 
    sample.type = "two") 
  #[1] 209 152 111  92
  #----------
  # For the two-sample t-test, compare the total sample size required to detect a 
  # ratio of means of 2 for equal sample sizes versus the case when the sample size 
  # for the second group is constrained to be 30.  Assume a coefficient of variation 
  # of 1, a 5% significance level, and 95% power.  Note that for the case of equal 
  # sample sizes, a total of 78 samples (39+39) are required, whereas when n2 is 
  # constrained to be 30, a total of 84 samples (54 + 30) are required.
  tTestLnormAltN(ratio.of.means = 2, sample.type = "two") 
  #[1] 39 
  tTestLnormAltN(ratio.of.means = 2, n2 = 30) 
  #$n1: 
  #[1] 54 
  #
  #$n2: 
  #[1] 30
  #==========
  # The guidance document Soil Screening Guidance: Technical Background Document 
  # (USEPA, 1996c, Part 4) discusses sampling design and sample size calculations 
  # for studies to determine whether the soil at a potentially contaminated site 
  # needs to be investigated for possible remedial action. Let 'theta' denote the 
  # average concentration of the chemical of concern.  The guidance document 
  # establishes the following goals for the decision rule (USEPA, 1996c, p.87):
  #
  #     Pr[Decide Don't Investigate | theta > 2 * SSL] = 0.05
  #
  #     Pr[Decide to Investigate | theta <= (SSL/2)] = 0.2
  #
  # where SSL denotes the pre-established soil screening level.
  #
  # These goals translate into a Type I error of 0.2 for the null hypothesis
  #
  #     H0: [theta / (SSL/2)] <= 1
  #
  # and a power of 95% for the specific alternative hypothesis
  #
  #     Ha: [theta / (SSL/2)] = 4
  #
  # Assuming a lognormal distribution and the above values for Type I error and 
  # power, determine the required sample sizes associated with various values of 
  # the coefficient of variation for the one-sample test.  Based on these calculations, 
  # you need to take at least 6 soil samples to satisfy the requirements for the 
  # Type I and Type II errors when the coefficient of variation is 2.
  cv <- c(0.5, 1, 2)
  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alpha = 0.2, 
    alternative = "greater") 
  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     2      3      6 
  #----------
  # Repeat the last example, but use the approximate power calculation instead of the 
  # exact. Using the approximate power calculation, you need 7 soil samples when the 
  # coefficient of variation is 2 (because the approximation underestimates the 
  # true power).
  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alpha = 0.2, 
    alternative = "greater", approx = TRUE) 
  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     3      5      7
  #----------
  # Repeat the last example, but use a Type I error of 0.05.
  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alternative = "greater", 
    approx = TRUE) 
  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     4      6     12
  #==========
  # Reproduce the second column of Table 2 in van Belle and Martin (1993, p.167).
  tTestLnormAltN(ratio.of.means = 1.10, cv = seq(0.1, 0.8, by = 0.1), 
    power = 0.8, sample.type = "two.sample", approx = TRUE) 
  #[1]  19  69 150 258 387 533 691 856
  #==========
  # Clean up
  #---------
  rm(cv, N)
Power of a One- or Two-Sample t-Test Assuming Lognormal Data
Description
Compute the power of a one- or two-sample t-test, given the sample size, ratio of means, coefficient of variation, and significance level, assuming lognormal data.
Usage
  tTestLnormAltPower(n.or.n1, n2 = n.or.n1, ratio.of.means = 1, cv = 1, alpha = 0.05, 
    sample.type = ifelse(!missing(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of 
 | 
| ratio.of.means | numeric vector specifying the ratio of the first mean to the second mean.  
When  | 
| cv | numeric vector of positive value(s) specifying the coefficient of 
variation.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
Details
If the arguments n.or.n1, n2, ratio.of.means, cv, and 
alpha are not all the same length, they are replicated to be the same length 
as the length of the longest argument.
One-Sample Case (sample.type="one.sample") 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a lognormal distribution with mean 
\theta and coefficient of variation \tau, and consider the null hypothesis:
H_0: \theta = \theta_0 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \theta > \theta_0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less")
H_a: \theta < \theta_0 \;\;\;\;\;\; (3)
and the two-sided alternative (alternative="two.sided")
H_a: \theta \ne \theta_0 \;\;\;\;\;\; (4)
To test the null hypothesis (1) versus any of the three alternatives (2)-(4), one might be tempted to use Student's t-test based on the log-transformed observations. Unlike the two-sample case with equal coefficients of variation (see below), in the one-sample case Student's t-test applied to the log-transformed observations will not test the correct hypothesis, as now explained.
Let
y_i = log(x_i), \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (5)
Then \underline{y} = y_1, y_2, \ldots, y_n denote n observations from a 
normal distribution with mean \mu and standard deviation \sigma, where
\mu = log(\frac{\theta}{\sqrt{\tau^2 + 1}}) \;\;\;\;\;\; (6)
\sigma = [log(\tau^2 + 1)]^{1/2} \;\;\;\;\;\; (7)
\theta = exp[\mu + (\sigma^2/2)] \;\;\;\;\;\; (8)
\tau = [exp(\sigma^2) - 1]^{1/2} \;\;\;\;\;\; (9)
(see the help file for LognormalAlt).  Hence, by Equations (6) and (8) above, 
the Student's t-test on the log-transformed data would involve a test of hypothesis 
on both the parameters \theta and \tau, not just on \theta.
To test the null hypothesis (1) above versus any of the alternatives (2)-(4), you 
can use the function elnormAlt to compute a confidence interval for 
\theta, and use the relationship between confidence intervals and hypothesis 
tests.  To test the null hypothesis (1) above versus the upper one-sided alternative 
(2), you can also use 
Chen's modified t-test for skewed distributions.
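For example, here is a minimal sketch of both approaches using simulated lognormal data; the mean, coefficient of variation, and hypothesized value below are arbitrary illustrative choices:
  set.seed(47)
  # Simulate 20 observations from a lognormal distribution with 
  # mean 10 and cv 1 (illustrative values only).
  x <- rlnormAlt(20, mean = 10, cv = 1)
  # Confidence-interval approach:  reject H0: theta = 5 at the 5% level 
  # if 5 falls outside the two-sided 95% confidence interval for theta.
  elnormAlt(x, ci = TRUE)
  # Chen's modified t-test of H0: theta <= 5 vs. Ha: theta > 5:
  chenTTest(x, mu = 5)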
Although you can't use Student's t-test based on the log-transformed observations to 
test a hypothesis about \theta, you can use the t-distribution to estimate the 
power of a test about \theta that is based on confidence intervals or 
Chen's modified t-test, if you are willing to assume the population coefficient of 
variation \tau stays constant for all possible values of \theta you are 
interested in, and you are willing to postulate possible values for \tau.
First, let's re-write the hypotheses (1)-(4) as follows. The null hypothesis (1) is equivalent to:
H_0: \frac{\theta}{\theta_0} = 1 \;\;\;\;\;\; (10)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: \frac{\theta}{\theta_0} > 1 \;\;\;\;\;\; (11)
the lower one-sided alternative (alternative="less")
H_a: \frac{\theta}{\theta_0} < 1 \;\;\;\;\;\; (12)
and the two-sided alternative (alternative="two.sided")
H_a: \frac{\theta}{\theta_0} \ne 1 \;\;\;\;\;\; (13)
For a constant coefficient of variation \tau, the standard deviation of the 
log-transformed observations \sigma is also constant (see Equation (7) above).  
Hence, by Equation (8), the ratio of the true mean to the hypothesized mean can be 
written as:
R = \frac{\theta}{\theta_0} = \frac{exp[\mu + (\sigma^2/2)]}{exp[\mu_0 + (\sigma^2/2)]} = \frac{e^{\mu}}{e^{\mu_0}} = e^{\mu - \mu_0} \;\;\;\;\;\; (14)
which only involves the difference
\mu - \mu_0 \;\;\;\;\;\; (15)
Thus, for given values of R and \tau, the power of the test of the null 
hypothesis (10) against any of the alternatives (11)-(13) can be computed based on 
the power of a one-sample t-test with
\frac{\delta}{\sigma} = \frac{log(R)}{\sqrt{log(\tau^2 + 1)}} \;\;\;\;\;\; (16)
(see the help file for tTestPower).  Note that for the function 
tTestLnormAltPower, R corresponds to the argument ratio.of.means, 
and \tau corresponds to the argument cv.
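For example (a minimal sketch with arbitrary illustrative values), with ratio.of.means = 2 and cv = 1 the scaled difference in Equation (16) is log(2)/sqrt(log(2)), and the following two calls should return the same power:
  tTestLnormAltPower(n.or.n1 = 15, ratio.of.means = 2, cv = 1, 
    alternative = "greater")
  # Equivalent call based on the scaled difference of Equation (16):
  tTestPower(n.or.n1 = 15, delta.over.sigma = log(2)/sqrt(log(1^2 + 1)), 
    alternative = "greater")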
Two-Sample Case (sample.type="two.sample") 
Let \underline{x}_1 = x_{11}, x_{12}, \ldots, x_{1n_1} denote a vector of 
n_1 observations from a lognormal distribution with mean 
\theta_1 and coefficient of variation \tau, and let 
\underline{x}_2 = x_{21}, x_{22}, \ldots, x_{2n_2} denote a vector of 
n_2 observations from a lognormal distribution with mean \theta_2 and 
coefficient of variation \tau, and consider the null hypothesis:
H_0: \theta_1 = \theta_2 \;\;\;\;\;\; (17)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \theta_1 > \theta_2 \;\;\;\;\;\; (18)
the lower one-sided alternative (alternative="less")
H_a: \theta_1 < \theta_2 \;\;\;\;\;\; (19)
and the two-sided alternative (alternative="two.sided")
H_a: \theta_1 \ne \theta_2 \;\;\;\;\;\; (20)
Because we are assuming the coefficient of variation \tau is the same for 
both populations, the test of the null hypothesis (17) versus any of the three 
alternatives (18)-(20) can be based on the Student t-statistic using the 
log-transformed observations.
To show this, first, let's re-write the hypotheses (17)-(20) as follows. The null hypothesis (17) is equivalent to:
H_0: \frac{\theta_1}{\theta_2} = 1 \;\;\;\;\;\; (21)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: \frac{\theta_1}{\theta_2} > 1 \;\;\;\;\;\; (22)
the lower one-sided alternative (alternative="less")
H_a: \frac{\theta_1}{\theta_2} < 1 \;\;\;\;\;\; (23)
and the two-sided alternative (alternative="two.sided")
H_a: \frac{\theta_1}{\theta_2} \ne 1 \;\;\;\;\;\; (24)
If coefficient of variation \tau is the same for both populations, then the 
standard deviation of the log-transformed observations \sigma is also the 
same for both populations (see Equation (7) above).  Hence, by Equation (8), the 
ratio of the means can be written as:
R = \frac{\theta_1}{\theta_2} = \frac{exp[\mu_1 + (\sigma^2/2)]}{exp[\mu_2 + (\sigma^2/2)]} = \frac{e^{\mu_1}}{e^{\mu_2}} = e^{\mu_1 - \mu_2} \;\;\;\;\;\; (25)
which only involves the difference
\mu_1 - \mu_2 \;\;\;\;\;\; (26)
Thus, for given values of R and \tau, the power of the test of the null 
hypothesis (21) against any of the alternatives (22)-(24) can be computed based on 
the power of a two-sample t-test with
\frac{\delta}{\sigma} = \frac{log(R)}{\sqrt{log(\tau^2 + 1)}} \;\;\;\;\;\; (27)
(see the help file for tTestPower).  Note that for the function 
tTestLnormAltPower, R corresponds to the argument ratio.of.means, 
and \tau corresponds to the argument cv.
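The same correspondence holds in the two-sample case.  For example (again with arbitrary illustrative values), the following two calls should return the same power:
  tTestLnormAltPower(n.or.n1 = 15, n2 = 15, ratio.of.means = 2, cv = 1, 
    sample.type = "two.sample")
  # Equivalent call based on the scaled difference of Equation (27):
  tTestPower(n.or.n1 = 15, n2 = 15, 
    delta.over.sigma = log(2)/sqrt(log(1^2 + 1)), 
    sample.type = "two.sample")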
Value
a numeric vector of powers.
Note
The normal distribution and 
lognormal distribution are probably the two most 
frequently used distributions to model environmental data.  Often, you need to 
determine whether a population mean is significantly different from a specified 
standard (e.g., an MCL or ACL, USEPA, 1989b, Section 6), or whether two different 
means are significantly different from each other (e.g., USEPA 2009, Chapter 16).  
When you have lognormally-distributed data, you have to be careful about making 
statements regarding inference for the mean.  For the two-sample case with 
assumed equal coefficients of variation, you can perform the 
Student's t-test on the log-transformed observations.  
For the one-sample case, you can perform a hypothesis test by constructing a 
confidence interval for the mean using elnormAlt, or use 
Chen's t-test modified for skewed data.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether a mean differs from a specified level or two means differ from each other.  
The functions tTestLnormAltPower, tTestLnormAltN, 
tTestLnormAltRatioOfMeans, and plotTTestLnormAltDesign 
can be used to investigate these relationships for the case of 
lognormally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
van Belle, G., and D.C. Martin. (1993). Sample Size as a Function of Coefficient of Variation and Ratio of Means. The American Statistician 47(3), 165–167.
Also see the list of references in the help file for tTestPower.
See Also
tTestLnormAltN, tTestLnormAltRatioOfMeans, 
plotTTestLnormAltDesign, LognormalAlt, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the power of the one-sample test increases with increasing 
  # sample size:
  seq(5, 30, by = 5) 
  #[1]  5 10 15 20 25 30 
  power <- tTestLnormAltPower(n.or.n1 = seq(5, 30, by = 5), 
    ratio.of.means = 1.5, cv = 1) 
  round(power, 2) 
  #[1] 0.14 0.28 0.42 0.54 0.65 0.73
  #----------
  # Repeat the last example, but use the approximation to the power instead of the 
  # exact power.  Note how the approximation underestimates the true power for 
  # the smaller sample sizes:
  power <- tTestLnormAltPower(n.or.n1 = seq(5, 30, by = 5), 
    ratio.of.means = 1.5, cv = 1, approx = TRUE) 
  round(power, 2) 
  #[1] 0.09 0.25 0.40 0.53 0.64 0.73
  #==========
  # Look at how the power of the two-sample t-test increases with increasing 
  # ratio of means:
  power <- tTestLnormAltPower(n.or.n1 = 20, sample.type = "two", 
    ratio.of.means = c(1.1, 1.5, 2), cv = 1) 
  round(power, 2) 
  #[1] 0.06 0.32 0.73
  #----------
  # Look at how the power of the two-sample t-test increases with increasing 
  # values of Type I error:
  power <- tTestLnormAltPower(30, sample.type = "two", ratio.of.means = 1.5, 
    cv = 1, alpha = c(0.001, 0.01, 0.05, 0.1)) 
  round(power, 2) 
  #[1] 0.07 0.23 0.46 0.59
  #==========
  # The guidance document Soil Screening Guidance: Technical Background Document 
  # (USEPA, 1996c, Part 4) discusses sampling design and sample size calculations 
  # for studies to determine whether the soil at a potentially contaminated site 
  # needs to be investigated for possible remedial action. Let 'theta' denote the 
  # average concentration of the chemical of concern.  The guidance document 
  # establishes the following goals for the decision rule (USEPA, 1996c, p.87):
  #
  #     Pr[Decide Don't Investigate | theta > 2 * SSL] = 0.05
  #
  #     Pr[Decide to Investigate | theta <= (SSL/2)] = 0.2
  #
  # where SSL denotes the pre-established soil screening level.
  #
  # These goals translate into a Type I error of 0.2 for the null hypothesis
  #
  #     H0: [theta / (SSL/2)] <= 1
  #
  # and a power of 95% for the specific alternative hypothesis
  #
  #     Ha: [theta / (SSL/2)] = 4
  #
  # Assuming a lognormal distribution with a coefficient of variation of 2, 
  # determine the power associated with various sample sizes for this one-sample test. 
  # Based on these calculations, you need to take at least 6 soil samples to 
  # satisfy the requirements for the Type I and Type II errors.
  power <- tTestLnormAltPower(n.or.n1 = 2:8, ratio.of.means = 4, cv = 2, 
    alpha = 0.2, alternative = "greater") 
  names(power) <- paste("N=", 2:8, sep = "")
  round(power, 2) 
  # N=2  N=3  N=4  N=5  N=6  N=7  N=8 
  #0.65 0.80 0.88 0.93 0.96 0.97 0.98
  #----------
  # Repeat the last example, but use the approximate power calculation instead of 
  # the exact one.  Using the approximate power calculation, you need at least 
  # 7 soil samples instead of 6 (because the approximation underestimates the power).
  power <- tTestLnormAltPower(n.or.n1 = 2:8, ratio.of.means = 4, cv = 2, 
    alpha = 0.2, alternative = "greater", approx = TRUE) 
  names(power) <- paste("N=", 2:8, sep = "")
  round(power, 2)
  # N=2  N=3  N=4  N=5  N=6  N=7  N=8 
  #0.55 0.75 0.84 0.90 0.93 0.95 0.97
  #==========
  # Clean up
  #---------
  rm(power)
Minimal or Maximal Detectable Ratio of Means for One- or Two-Sample t-Test, Assuming Lognormal Data
Description
Compute the minimal or maximal detectable ratio of means associated with a one- or two-sample t-test, given the sample size, coefficient of variation, significance level, and power, assuming lognormal data.
Usage
  tTestLnormAltRatioOfMeans(n.or.n1, n2 = n.or.n1, cv = 1, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!missing(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", two.sided.direction = "greater", approx = FALSE, 
    tol = 1e-07, maxiter = 1000)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of 
 | 
| cv | numeric vector of positive value(s) specifying the coefficient of 
variation.  When  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values 
are  | 
| two.sided.direction | character string indicating the direction (greater than 1 or less than 1) for the 
detectable ratio of means when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
If the arguments n.or.n1, n2, cv, alpha, and 
power are not all the same length, they are replicated to be the same length 
as the length of the longest argument.
Formulas for the power of the t-test for lognormal data for specified values of 
the sample size, ratio of means, and Type I error level are given in 
the help file for tTestLnormAltPower.  The function 
tTestLnormAltRatioOfMeans uses the uniroot search algorithm 
to determine the required ratio of means for specified values of the power, 
sample size, and Type I error level.
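For example (a minimal sketch using arbitrary illustrative values for the sample size, coefficient of variation, and power), a ratio of means returned by tTestLnormAltRatioOfMeans should reproduce the requested power when passed back to tTestLnormAltPower:
  R <- tTestLnormAltRatioOfMeans(n.or.n1 = 20, cv = 1, power = 0.9)
  # Passing R back to tTestLnormAltPower should return (approximately) 
  # the requested power of 0.9:
  tTestLnormAltPower(n.or.n1 = 20, ratio.of.means = R, cv = 1)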
Value
a numeric vector of computed minimal or maximal detectable ratios of means.  When 
alternative="less", or alternative="two.sided" and 
two.sided.direction="less", the computed ratios are less than 1 
(but greater than 0).  Otherwise, the ratios are greater than 1.
Note
See tTestLnormAltPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See tTestLnormAltPower.
See Also
tTestLnormAltPower, tTestLnormAltN, 
plotTTestLnormAltDesign, LognormalAlt, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the minimal detectable ratio of means for the one-sample t-test 
  # increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  ratio.of.means <- tTestLnormAltRatioOfMeans(n.or.n1 = 20, 
    power = seq(0.5, 0.9, by = 0.1)) 
  round(ratio.of.means, 2) 
  #[1] 1.47 1.54 1.63 1.73 1.89
  #----------
  # Repeat the last example, but compute the minimal detectable ratio of means 
  # based on the approximate power instead of the exact:
  ratio.of.means <- tTestLnormAltRatioOfMeans(n.or.n1 = 20, 
    power = seq(0.5, 0.9, by = 0.1), approx = TRUE) 
  round(ratio.of.means, 2) 
  #[1] 1.48 1.55 1.63 1.73 1.89
  #==========
  # Look at how the minimal detectable ratio of means for the two-sample t-test 
  # decreases with increasing sample size:
  seq(10, 50, by = 10) 
  #[1] 10 20 30 40 50 
  ratio.of.means <- tTestLnormAltRatioOfMeans(seq(10, 50, by = 10), sample.type="two") 
  round(ratio.of.means, 2) 
  #[1] 4.14 2.65 2.20 1.97 1.83
  #----------
  # Look at how the minimal detectable ratio of means for the two-sample t-test 
  # decreases with increasing values of Type I error:
  ratio.of.means <- tTestLnormAltRatioOfMeans(n.or.n1 = 20, 
    alpha = c(0.001, 0.01, 0.05, 0.1), sample.type = "two") 
  round(ratio.of.means, 2) 
  #[1] 4.06 3.20 2.65 2.42
  #==========
  # The guidance document Soil Screening Guidance: Technical Background Document 
  # (USEPA, 1996c, Part 4) discusses sampling design and sample size calculations 
  # for studies to determine whether the soil at a potentially contaminated site 
  # needs to be investigated for possible remedial action. Let 'theta' denote the 
  # average concentration of the chemical of concern.  The guidance document 
  # establishes the following goals for the decision rule (USEPA, 1996c, p.87):
  #
  #     Pr[Decide Don't Investigate | theta > 2 * SSL] = 0.05
  #
  #     Pr[Decide to Investigate | theta <= (SSL/2)] = 0.2
  #
  # where SSL denotes the pre-established soil screening level.
  #
  # These goals translate into a Type I error of 0.2 for the null hypothesis
  #
  #     H0: [theta / (SSL/2)] <= 1
  #
  # and a power of 95% for the specific alternative hypothesis
  #
  #     Ha: [theta / (SSL/2)] = 4
  #
  # Assuming a lognormal distribution, the above values for the Type I error and power, and a 
  # coefficient of variation of 2, determine the minimal detectable increase above 
  # the soil screening level associated with various sample sizes for the one-sample 
  # test.  Based on these calculations, you need to take at least 6 soil samples to 
  # satisfy the requirements for the Type I and Type II errors when the coefficient 
  # of variation is 2.
  N <- 2:8
  ratio.of.means <- tTestLnormAltRatioOfMeans(n.or.n1 = N, cv = 2, alpha = 0.2, 
    alternative = "greater") 
  names(ratio.of.means) <- paste("N=", N, sep = "")
  round(ratio.of.means, 1) 
  # N=2  N=3  N=4  N=5  N=6  N=7  N=8 
  #19.9  7.7  5.4  4.4  3.8  3.4  3.1
  #----------
  # Repeat the last example, but use the approximate power calculation instead of 
  # the exact.  Using the approximate power calculation, you need 7 soil samples 
  # when the coefficient of variation is 2.  Note how poorly the approximation 
  # works in this case for small sample sizes!
  ratio.of.means <- tTestLnormAltRatioOfMeans(n.or.n1 = N, cv = 2, alpha = 0.2, 
    alternative = "greater", approx = TRUE) 
  names(ratio.of.means) <- paste("N=", N, sep = "")
  round(ratio.of.means, 1) 
  #  N=2   N=3   N=4   N=5   N=6   N=7   N=8 
  #990.8  18.5   8.3   5.7   4.6   3.9   3.5
  #==========
  # Clean up
  #---------
  rm(ratio.of.means, N)
Sample Size for a One- or Two-Sample t-Test
Description
Compute the sample size necessary to achieve a specified power for a one- or two-sample t-test, given the scaled difference and significance level.
Usage
  tTestN(delta.over.sigma, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!is.null(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE, n2 = NULL, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)
Arguments
| delta.over.sigma | numeric vector specifying the ratio of the true difference  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are: 
 | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is 
 | 
| round.up | logical scalar indicating whether to round up the values of the computed 
sample size(s) to the next integer.  The default value is 
 | 
| n.max | positive integer greater than 1 indicating the maximum sample size when  | 
| tol | numeric scalar indicating the tolerance to use in the 
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
Formulas for the power of the t-test for specified values of 
the sample size, scaled difference, and Type I error level are given in 
the help file for tTestPower.  The function tTestN 
uses the uniroot search algorithm to determine the 
required sample size(s) for specified values of the power, 
scaled difference, and Type I error level. 
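For example (a minimal sketch using arbitrary illustrative values for the scaled difference and power), a sample size returned by tTestN should yield at least the requested power when passed back to tTestPower (at least, because the sample size is rounded up by default):
  n <- tTestN(delta.over.sigma = 0.8, power = 0.9)
  # Passing n back to tTestPower should give a power of at least 
  # the requested 0.9:
  tTestPower(n.or.n1 = n, delta.over.sigma = 0.8)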
Value
When sample.type="one.sample", tTestN returns a numeric vector of sample sizes.
When sample.type="two.sample" and n2 is not supplied, 
equal sample sizes for each group are assumed and tTestN returns a numeric vector of 
sample sizes indicating the required sample size for each group.  
When sample.type="two.sample" and n2 is supplied, 
tTestN returns a list with two components called n1 and 
n2, specifying the sample sizes for each group.
Note
See tTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See tTestPower.
See Also
tTestPower, tTestScaledMdd, tTestAlpha, 
plotTTestDesign, Normal, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the required sample size for the one-sample t-test 
  # increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  tTestN(delta.over.sigma = 0.5, power = seq(0.5, 0.9, by = 0.1)) 
  #[1] 18 22 27 34 44
  #----------
  # Repeat the last example, but compute the sample size based on the 
  # approximation to the power instead of the exact method:
  tTestN(delta.over.sigma = 0.5, power = seq(0.5, 0.9, by = 0.1), 
    approx = TRUE) 
  #[1] 18 22 27 34 45
  #==========
  # Look at how the required sample size for the two-sample t-test 
  # decreases with increasing scaled difference:
  seq(0.5, 2,by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  tTestN(delta.over.sigma = seq(0.5, 2, by = 0.5), sample.type = "two") 
  #[1] 105  27  13   8
  #----------
  # Look at how the required sample size for the two-sample t-test decreases 
  # with increasing values of Type I error:
  tTestN(delta.over.sigma = 0.5, alpha = c(0.001, 0.01, 0.05, 0.1), 
    sample.type="two") 
  #[1] 198 145 105  88
  #----------
  # For the two-sample t-test, compare the total sample size required to 
  # detect a scaled difference of 1 for equal sample sizes versus the case 
  # when the sample size for the second group is constrained to be 20.  
  # Assume a 5% significance level and 95% power.  Note that for the case 
  # of equal sample sizes, a total of 54 samples (27+27) are required, 
  # whereas when n2 is constrained to be 20, a total of 62 samples 
  # (42 + 20) are required.
  tTestN(1, sample.type="two") 
  #[1] 27 
  tTestN(1, n2 = 20)
  #$n1
  #[1] 42
  #
  #$n2
  #[1] 20
  #==========
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), determine the 
  # required sample size to detect a mean aldicarb level greater than the MCL 
  # of 7 ppb at the third compliance well with a power of 95%, assuming the 
  # true mean is 10 or 14.  Use the estimated standard deviation from the 
  # first four months of data to estimate the true population standard 
  # deviation, use a Type I error level of alpha=0.01, and assume an 
  # upper one-sided alternative (third compliance well mean larger than 7).  
  # (The data are stored in EPA.09.Ex.21.1.aldicarb.df.) 
  # Note that the required sample size changes from 11 to 5 as the true mean 
  # increases from 10 to 14.
  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #5      1 Well.2         23.7
  #6      2 Well.2         21.9
  #7      3 Well.2         26.9
  #8      4 Well.2         26.1
  #9      1 Well.3          5.6
  #10     2 Well.3          3.3
  #11     3 Well.3          2.3
  #12     4 Well.3          6.9
  sigma <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well == "Well.3"]))
  sigma
  #[1] 2.101388
  tTestN(delta.over.sigma = (c(10, 14) - 7)/sigma, 
    alpha = 0.01, sample.type="one", alternative="greater") 
  #[1] 11  5
  # Clean up
  #---------
  rm(sigma)
Power of a One- or Two-Sample t-Test
Description
Compute the power of a one- or two-sample t-test, given the sample size, scaled difference, and significance level.
Usage
  tTestPower(n.or.n1, n2 = n.or.n1, delta.over.sigma = 0, alpha = 0.05, 
    sample.type = ifelse(!missing(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of 
 | 
| delta.over.sigma | numeric vector specifying the ratio of the true difference  The default value is  | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are: 
 | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
Details
If the arguments n.or.n1, n2, delta.over.sigma, and 
alpha are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
One-Sample Case (sample.type="one.sample") 
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n 
observations from a normal distribution with mean \mu 
and standard deviation \sigma, and consider the null hypothesis:
H_0: \mu = \mu_0 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \mu > \mu_0 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less")
H_a: \mu < \mu_0 \;\;\;\;\;\; (3)
and the two-sided alternative (alternative="two.sided")
H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)
The test of the null hypothesis (1) versus any of the three alternatives (2)-(4) is based on the Student t-statistic:
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \;\;\;\;\;\; (5)
where
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (6)
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (7)
Under the null hypothesis (1), the t-statistic in (5) follows a 
Student's t-distribution with n-1 degrees of freedom 
(Zar, 2010, Chapter 7; Johnson et al., 1995, pp.362-363).
The formula for the power of the test depends on which alternative is being tested.  
The two subsections below describe exact and approximate formulas for the power of 
the one-sample t-test.  Note that none of the equations for the power of the t-test 
requires knowledge of the values \delta (Equation (12) below) or \sigma 
(the population standard deviation), only the ratio \delta/\sigma.  The 
argument delta.over.sigma is this ratio, and it is referred to as the 
“scaled difference”.
Exact Power Calculations (approx=FALSE) 
This subsection describes the exact formulas for the power of the one-sample 
t-test.
Upper one-sided alternative (alternative="greater") 
The standard Student's t-test rejects the null hypothesis (1) in favor of the 
upper alternative hypothesis (2) at level-\alpha if
t \ge t_{\nu}(1 - \alpha) \;\;\;\;\;\; (8)
where
\nu = n - 1 \;\;\;\;\;\; (9)
and t_{\nu}(p) denotes the p'th quantile of Student's t-distribution 
with \nu degrees of freedom (Zar, 2010; Berthouex and Brown, 2002).  
The power of this test, denoted by 1-\beta, where \beta denotes the 
probability of a Type II error, is given by:
1 - \beta = Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha)] = 1 - G[t_{\nu}(1 - \alpha), \nu, \Delta] \;\;\;\;\;\; (10)
where
\Delta = \sqrt{n} (\frac{\delta}{\sigma}) \;\;\;\;\;\; (11)
\delta = \mu - \mu_0 \;\;\;\;\;\; (12)
and t_{\nu, \Delta} denotes a 
non-central Student's t-random variable with 
\nu degrees of freedom and non-centrality parameter \Delta, and 
G(x, \nu, \Delta) denotes the cumulative distribution function of this 
random variable evaluated at x (Johnson et al., 1995, pp.508-510).
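For example, Equation (10) can be evaluated directly with the qt and pt functions and compared with tTestPower; the sample size, scaled difference, and Type I error level below are arbitrary illustrative values:
  n <- 10; delta.over.sigma <- 1; alpha <- 0.05
  nu <- n - 1
  Delta <- sqrt(n) * delta.over.sigma
  # Exact power from Equation (10), using the non-central t-distribution:
  1 - pt(qt(1 - alpha, df = nu), df = nu, ncp = Delta)
  # Should match:
  tTestPower(n.or.n1 = n, delta.over.sigma = delta.over.sigma, 
    alpha = alpha, alternative = "greater")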
Lower one-sided alternative (alternative="less") 
The standard Student's t-test rejects the null hypothesis (1) in favor of the 
lower alternative hypothesis (3) at level-\alpha if
t \le t_{\nu}(\alpha) \;\;\;\;\;\; (13)
and the power of this test is given by:
1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha)] = G[t_{\nu}(\alpha), \nu, \Delta] \;\;\;\;\;\; (14)
Two-sided alternative (alternative="two.sided") 
The standard Student's t-test rejects the null hypothesis (1) in favor of the 
two-sided alternative hypothesis (4) at level-\alpha if
|t| \ge t_{\nu}(1 - \alpha/2) \;\;\;\;\;\; (15)
and the power of this test is given by:
1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha/2)] + Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha/2)]
= G[t_{\nu}(\alpha/2), \nu, \Delta] + 1 - G[t_{\nu}(1 - \alpha/2), \nu, \Delta] \;\;\;\;\;\; (16)
The power of the t-test given in Equation (16) can also be expressed in terms of the 
cumulative distribution function of the non-central F-distribution 
as follows. Let F_{\nu_1, \nu_2, \Delta} denote a 
non-central F random variable with \nu_1 and 
\nu_2 degrees of freedom and non-centrality parameter \Delta, and let 
H(x, \nu_1, \nu_2, \Delta) denote the cumulative distribution function of this 
random variable evaluated at x. Also, let F_{\nu_1, \nu_2}(p) denote 
the p'th quantile of the central F-distribution with \nu_1 and 
\nu_2 degrees of freedom.  It can be shown that
(t_{\nu, \Delta})^2 \cong F_{1, \nu, \Delta^2} \;\;\;\;\;\; (17)
where \cong denotes “equal in distribution”.  Thus, it follows that
[t_{\nu}(1 - \alpha/2)]^2 = F_{1, \nu}(1 - \alpha) \;\;\;\;\;\; (18)
so the formula for the power of the t-test given in Equation (16) can also be written as:
1 - \beta = Pr\{(t_{\nu, \Delta})^2  \ge [t_{\nu}(1 - \alpha/2)]^2\}
= Pr[F_{1, \nu, \Delta^2} \ge F_{1, \nu}(1 - \alpha)] = 1 - H[F_{1, \nu}(1-\alpha), 1, \nu, \Delta^2] \;\;\;\;\;\; (19)
Approximate Power Calculations (approx=TRUE) 
Zar (2010, pp.115–118) presents an approximation to the power for the t-test 
given in Equations (10), (14), and (16) above.  His approximation to the power 
can be derived by using the approximation
\sqrt{n} (\frac{\delta}{s}) \approx \sqrt{n} (\frac{\delta}{\sigma}) = \Delta \;\;\;\;\;\; (20)
where \approx denotes “approximately equal to”.  Zar's approximation 
can be summarized in terms of the cumulative distribution function of the 
non-central t-distribution as follows:
G(x, \nu, \Delta) \approx G(x - \Delta, \nu, 0) = G(x - \Delta, \nu) \;\;\;\;\;\; (21)
where G(x, \nu) denotes the cumulative distribution function of the 
central Student's t-distribution with \nu degrees of freedom evaluated at 
x.
The following three subsections explicitly derive the approximation to the power of 
the t-test for each of the three alternative hypotheses.
Upper one-sided alternative (alternative="greater") 
The power for the upper one-sided alternative (2) given in Equation (10) can be 
approximated as:
1 - \beta = Pr[t \ge t_{\nu}(1 - \alpha)]
= Pr[\frac{\bar{x} - \mu}{s/\sqrt{n}} \ge t_{\nu}(1 - \alpha) - \sqrt{n}\frac{\delta}{s}]
\approx Pr[t_{\nu} \ge t_{\nu}(1 - \alpha) - \Delta]
= 1 - Pr[t_{\nu} \le t_{\nu}(1 - \alpha) - \Delta]
 = 1 - G[t_{\nu}(1-\alpha) - \Delta, \nu] \;\;\;\;\;\; (22)
where t_{\nu} denotes a central Student's t-random variable with \nu 
degrees of freedom.
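For example, the approximation in Equation (22) can be evaluated with the central t-distribution and compared with tTestPower called with approx=TRUE (the same arbitrary illustrative values as above):
  n <- 10; delta.over.sigma <- 1; alpha <- 0.05
  nu <- n - 1
  Delta <- sqrt(n) * delta.over.sigma
  # Approximate power from Equation (22), using the central t-distribution:
  1 - pt(qt(1 - alpha, df = nu) - Delta, df = nu)
  # Should match:
  tTestPower(n.or.n1 = n, delta.over.sigma = delta.over.sigma, 
    alpha = alpha, alternative = "greater", approx = TRUE)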
Lower one-sided alternative (alternative="less") 
The power for the lower one-sided alternative (3) given in Equation (14) can be 
approximated as:
1 - \beta = Pr[t \le t_{\nu}(\alpha)]
= Pr[\frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\nu}(\alpha) - \sqrt{n}\frac{\delta}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha) - \Delta]
 = G[t_{\nu}(\alpha) - \Delta, \nu] \;\;\;\;\;\; (23)
Two-sided alternative (alternative="two.sided") 
The power for the two-sided alternative (4) given in Equation (16) can be 
approximated as:
1 - \beta = Pr[t \le t_{\nu}(\alpha/2)] + Pr[t \ge t_{\nu}(1 - \alpha/2)]
= Pr[\frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\nu}(\alpha/2) - \sqrt{n}\frac{\delta}{s}] + Pr[\frac{\bar{x} - \mu}{s/\sqrt{n}} \ge t_{\nu}(1 - \alpha/2) - \sqrt{n}\frac{\delta}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha/2) - \Delta] + Pr[t_{\nu} \ge t_{\nu}(1 - \alpha/2) - \Delta]
= G[t_{\nu}(\alpha/2) - \Delta, \nu] + 1 - G[t_{\nu}(1-\alpha/2) - \Delta, \nu] \;\;\;\;\;\; (24)
Two-Sample Case (sample.type="two.sample") 
Let \underline{x}_1 = x_{11}, x_{12}, \ldots, x_{1n_1} denote a vector of 
n_1 observations from a normal distribution with mean 
\mu_1 and standard deviation \sigma, and let 
\underline{x}_2 = x_{21}, x_{22}, \ldots, x_{2n_2} denote a vector of 
n_2 observations from a normal distribution with mean \mu_2 and 
standard deviation \sigma, and consider the null hypothesis:
H_0: \mu_1 = \mu_2 \;\;\;\;\;\; (25)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater"):
H_a: \mu_1 > \mu_2 \;\;\;\;\;\; (26)
the lower one-sided alternative (alternative="less")
H_a: \mu_1 < \mu_2 \;\;\;\;\;\; (27)
and the two-sided alternative (alternative="two.sided")
H_a: \mu_1 \ne \mu_2 \;\;\;\;\;\; (28)
The test of the null hypothesis (25) versus any of the three alternatives (26)-(28) is based on the Student t-statistic:
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \;\;\;\;\;\; (29)
where
\bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} \;\;\;\;\;\; (30)
\bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i} \;\;\;\;\;\; (31)
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \;\;\;\;\;\; (32)
s_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 \;\;\;\;\;\; (33)
s_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2 \;\;\;\;\;\; (34)
Under the null hypothesis (25), the t-statistic in (29) follows a 
Student's t-distribution with n_1 + n_2 - 2 degrees of 
freedom (Zar, 2010, Chapter 8; Johnson et al., 1995, pp.508–510, 
Helsel and Hirsch, 1992, pp.124–128).
The formulas for the power of the two-sample t-test are precisely the same as those for the one-sample case, with the following modifications:
\nu = n_1 + n_2 - 2 \;\;\;\;\;\; (35)
\Delta = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} (\frac{\delta}{\sigma}) \;\;\;\;\;\; (36)
\delta = \mu_1 - \mu_2 \;\;\;\;\;\; (37)
Note that none of the equations for the power of the t-test requires knowledge of 
the values \delta or \sigma (the population standard deviation for 
both populations), only the ratio \delta/\sigma.  The argument 
delta.over.sigma is this ratio, and it is referred to as the 
“scaled difference”.
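For example (a minimal sketch with arbitrary illustrative values), the exact upper one-sided power for the two-sample case can be computed from Equations (10), (35), and (36) and compared with tTestPower:
  n1 <- 10; n2 <- 15; delta.over.sigma <- 1; alpha <- 0.05
  nu <- n1 + n2 - 2
  Delta <- sqrt(n1 * n2 / (n1 + n2)) * delta.over.sigma
  # Exact power based on the non-central t-distribution:
  1 - pt(qt(1 - alpha, df = nu), df = nu, ncp = Delta)
  # Should match:
  tTestPower(n.or.n1 = n1, n2 = n2, delta.over.sigma = delta.over.sigma, 
    alpha = alpha, alternative = "greater")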
Value
a numeric vector of powers.
Note
The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. Often, you need to determine whether a population mean is significantly different from a specified standard (e.g., an MCL or ACL, USEPA, 1989b, Section 6), or whether two different means are significantly different from each other (e.g., USEPA 2009, Chapter 16). In this case, assuming normally distributed data, you can perform the Student's t-test.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, significance level, power, and 
scaled difference if one of the objectives of the sampling program is to determine 
whether a mean differs from a specified level or two means differ from each other.  
The functions tTestPower, tTestN, 
tTestScaledMdd, and plotTTestDesign can be used to 
investigate these relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 28, 31.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
tTestN, tTestScaledMdd, tTestAlpha, 
plotTTestDesign, Normal, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the power of the one-sample t-test increases with 
  # increasing sample size:
  seq(5, 30, by = 5) 
  #[1] 5 10 15 20 25 30 
  power <- tTestPower(n.or.n1 = seq(5, 30, by = 5), delta.over.sigma = 0.5) 
  round(power, 2) 
  #[1] 0.14 0.29 0.44 0.56 0.67 0.75
  #----------
  # Repeat the last example, but use the approximation.
  # Note how the approximation underestimates the power 
  # for the smaller sample sizes.
  #----------------------------------------------------
  power <- tTestPower(n.or.n1 = seq(5, 30, by = 5), delta.over.sigma = 0.5, 
    approx = TRUE) 
  round(power, 2) 
  #[1] 0.10 0.26 0.42 0.56 0.67 0.75
  #----------
  # Look at how the power of the two-sample t-test increases with increasing 
  # scaled difference:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  power <- tTestPower(10, sample.type = "two.sample", 
    delta.over.sigma = seq(0.5, 2, by = 0.5)) 
  round(power, 2) 
  #[1] 0.19 0.56 0.89 0.99
  #----------
  # Look at how the power of the two-sample t-test increases with increasing values 
  # of Type I error:
  power <- tTestPower(20, sample.type = "two.sample", delta.over.sigma = 0.5, 
    alpha = c(0.001, 0.01, 0.05, 0.1)) 
  round(power, 2) 
  #[1] 0.03 0.14 0.34 0.46
  #==========
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), determine how 
  # adding another four months of observations to increase the sample size from 
  # 4 to 8 for any one particular compliance well will affect the power of a 
  # one-sample t-test that compares the mean for the well with the MCL of 
  # 7 ppb.  Use alpha = 0.01, assume an upper one-sided alternative 
  # (i.e., compliance well mean larger than 7 ppb), and assume a scaled 
  # difference of 2.  (The data are stored in EPA.09.Ex.21.1.aldicarb.df.) 
  # Note that the power changes from 49% to 98% by increasing the sample size 
  # from 4 to 8.
  tTestPower(n.or.n1 = c(4, 8), delta.over.sigma = 2, alpha = 0.01, 
    sample.type = "one.sample", alternative = "greater")
  #[1] 0.4865800 0.9835401
  #==========
  # Clean up
  #---------
  rm(power)
Scaled Minimal Detectable Difference for One- or Two-Sample t-Test
Description
Compute the scaled minimal detectable difference necessary to achieve a specified power for a one- or two-sample t-test, given the sample size(s) and Type I error level.
Usage
  tTestScaledMdd(n.or.n1, n2 = n.or.n1, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!missing(n2) && !is.null(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", two.sided.direction = "greater", 
    approx = FALSE, tol = 1e-07, maxiter = 1000)
Arguments
| n.or.n1 | numeric vector of sample sizes.  When  | 
| n2 | numeric vector of sample sizes for group 2.  The default value is the value of 
 | 
| alpha | numeric vector of numbers between 0 and 1 indicating the Type I error level 
associated with the hypothesis test.  The default value is  | 
| power | numeric vector of numbers between 0 and 1 indicating the power  
associated with the hypothesis test.  The default value is  | 
| sample.type | character string indicating whether to compute power based on a one-sample or 
two-sample hypothesis test.  When  | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are: 
 | 
| two.sided.direction | character string indicating the direction (positive or negative) for the 
scaled minimal detectable difference when  | 
| approx | logical scalar indicating whether to compute the power based on an approximation to 
the non-central t-distribution.  The default value is  | 
| tol | numeric scalar indicating the tolerance argument to pass to the  
 | 
| maxiter | positive integer indicating the maximum number of iterations 
argument to pass to the  | 
Details
Formulas for the power of the t-test for specified values of 
the sample size, scaled difference, and Type I error level are given in 
the help file for tTestPower.  The function tTestScaledMdd 
uses the uniroot search algorithm to determine the 
required scaled minimal detectable difference for specified values of the 
sample size, power, and Type I error level.
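As a rough illustration (a minimal sketch, not the EnvStats implementation; scaled.mdd.sketch is a made-up name and the sketch assumes a one-sample, upper one-sided test), the scaled minimal detectable difference can be found by inverting the exact power function with uniroot:
  scaled.mdd.sketch <- function(n, power = 0.95, alpha = 0.05) {
    pow <- function(d) {                                   # exact power at scaled difference d
      nu <- n - 1
      1 - pt(qt(1 - alpha, df = nu), df = nu, ncp = sqrt(n) * d)
    }
    uniroot(function(d) pow(d) - power,                    # solve power(d) = target power
      interval = c(.Machine$double.eps, 20), tol = 1e-7)$root
  }
  # Should be close to 
  # tTestScaledMdd(n.or.n1 = 20, power = 0.9, alternative = "greater")
  scaled.mdd.sketch(n = 20, power = 0.9)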
Value
numeric vector of scaled minimal detectable differences.
Note
See tTestPower.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See tTestPower.
See Also
tTestPower, tTestAlpha, 
tTestN, 
plotTTestDesign, Normal, 
t.test, Hypothesis Tests. 
Examples
  # Look at how the scaled minimal detectable difference for the 
  # one-sample t-test increases with increasing required power:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  scaled.mdd <- tTestScaledMdd(n.or.n1 = 20, power = seq(0.5,0.9,by=0.1)) 
  round(scaled.mdd, 2) 
  #[1] 0.46 0.52 0.59 0.66 0.76
  #----------
  # Repeat the last example, but compute the scaled minimal detectable 
  # differences based on the approximation to the power instead of the 
  # exact formula:
  scaled.mdd <- tTestScaledMdd(n.or.n1 = 20, power = seq(0.5, 0.9, by = 0.1), 
    approx = TRUE) 
  round(scaled.mdd, 2) 
  #[1] 0.47 0.53 0.59 0.66 0.76
  #==========
  # Look at how the scaled minimal detectable difference for the two-sample 
  # t-test decreases with increasing sample size:
  seq(10,50,by=10) 
  #[1] 10 20 30 40 50 
  scaled.mdd <- tTestScaledMdd(seq(10, 50, by = 10), sample.type = "two") 
  round(scaled.mdd, 2) 
  #[1] 1.71 1.17 0.95 0.82 0.73
  #----------
  # Look at how the scaled minimal detectable difference for the two-sample 
  # t-test decreases with increasing values of Type I error:
  scaled.mdd <- tTestScaledMdd(20, alpha = c(0.001, 0.01, 0.05, 0.1), 
    sample.type="two") 
  round(scaled.mdd, 2) 
  #[1] 1.68 1.40 1.17 1.06
  #==========
  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # determine the minimal mean level of aldicarb at the third compliance 
  # well necessary to detect a mean level of aldicarb greater than the 
  # MCL of 7 ppb, assuming 90%, 95%, and 99% power.  Use a 99% significance 
  # level and assume an upper one-sided alternative (third compliance well 
  # mean larger than 7).  Use the estimated standard deviation from the 
  # first four months of data to estimate the true population standard 
  # deviation in order to determine the minimal detectable difference based 
  # on the computed scaled minimal detectable difference, then use this 
  # minimal detectable difference to determine the mean level of aldicarb 
  # necessary to detect a difference.  (The data are stored in 
  # EPA.09.Ex.21.1.aldicarb.df.) 
  #
  # Note that the scaled minimal detectable difference changes from 3.4 to 
  # 3.9 to 4.7 as the power changes from 90% to 95% to 99%.  Thus, the 
  # minimal detectable difference changes from 7.2 to 8.1 to 9.8, and the 
  # minimal mean level of aldicarb changes from 14.2 to 15.1 to 16.8.
  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #5      1 Well.2         23.7
  #6      2 Well.2         21.9
  #7      3 Well.2         26.9
  #8      4 Well.2         26.1
  #9      1 Well.3          5.6
  #10     2 Well.3          3.3
  #11     3 Well.3          2.3
  #12     4 Well.3          6.9
  sigma <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well == "Well.3"]))
  sigma
  #[1] 2.101388
  scaled.mdd <- tTestScaledMdd(n.or.n1 = 4, alpha = 0.01, 
    power = c(0.90, 0.95, 0.99), sample.type="one", alternative="greater") 
  scaled.mdd 
  #[1] 3.431501 3.853682 4.668749
  mdd <- scaled.mdd * sigma 
  mdd 
  #[1] 7.210917 8.098083 9.810856
  minimal.mean <- mdd + 7 
  minimal.mean 
  #[1] 14.21092 15.09808 16.81086
  #==========
  # Clean up
  #---------
  rm(scaled.mdd, sigma, mdd, minimal.mean)
Tolerance Interval for a Gamma Distribution
Description
Construct a \beta-content or \beta-expectation tolerance 
interval for a gamma distribution.
Usage
  tolIntGamma(x, coverage = 0.95, cov.type = "content", 
    ti.type = "two-sided", conf.level = 0.95, method = "exact", 
    est.method = "mle", normal.approx.transform = "kulkarni.powar")
  tolIntGammaAlt(x, coverage = 0.95, cov.type = "content", 
    ti.type = "two-sided", conf.level = 0.95, method = "exact", 
    est.method = "mle", normal.approx.transform = "kulkarni.powar")
Arguments
| x | numeric vector of non-negative observations. Missing ( | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance 
interval.  The default value is  | 
| method | for the case of a two-sided tolerance interval, a character string specifying the 
method for constructing the two-sided normal distribution tolerance interval using 
the transformed data.  This argument is ignored if  | 
| est.method | character string specifying the method of estimation for the shape and scale 
distribution parameters.  The possible values are 
 | 
| normal.approx.transform | character string indicating which power transformation to use.  
Possible values are  | 
Details
The function tolIntGamma returns a tolerance interval as well as 
estimates of the shape and scale parameters.  
The function tolIntGammaAlt returns a tolerance interval as well as 
estimates of the mean and coefficient of variation.
The tolerance interval is computed by 1) using a power transformation on the original 
data to induce approximate normality, 2) using tolIntNorm to compute 
the tolerance interval, and then 3) back-transforming the interval to create a tolerance 
interval on the original scale (Krishnamoorthy et al., 2008).  
The value normal.approx.transform="cube.root" uses 
the cube root transformation suggested by Wilson and Hilferty (1931) and used by 
Krishnamoorthy et al. (2008) and Singh et al. (2010b), and the value 
normal.approx.transform="fourth.root" uses the fourth root transformation suggested 
by Hawkins and Wixley (1986) and used by Singh et al. (2010b).  
The default value normal.approx.transform="kulkarni.powar" 
uses the "Optimum Power Normal Approximation Method" of Kulkarni and Powar (2010).  
The "optimum" power p is determined by:
| p = -0.0705 - 0.178 \, shape + 0.475 \, \sqrt{shape} | if shape \le 1.5 | 
| p = 0.246 | if shape > 1.5 | 
where shape denotes the estimate of the shape parameter.  Although 
Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to 
determine the power p, for the functions 
tolIntGamma and tolIntGammaAlt the power p is based on 
whatever estimate of shape is used (e.g., est.method="mle", est.method="bcmle", etc.).
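The following is a minimal sketch of this three-step construction (it assumes EnvStats is loaded, hand-codes the Kulkarni & Powar power using the MLE of shape, and is not the tolIntGamma implementation itself; tol.int.gamma.sketch is a made-up name):
  tol.int.gamma.sketch <- function(x, coverage = 0.95, conf.level = 0.95) {
    shape <- egamma(x)$parameters["shape"]              # MLE of the shape parameter
    p <- ifelse(shape <= 1.5,                           # "optimum" power of Kulkarni & Powar (2010)
      -0.0705 - 0.178 * shape + 0.475 * sqrt(shape), 0.246)
    y <- x^p                                            # 1) transform to approximate normality
    limits <- tolIntNorm(y, coverage = coverage,        # 2) normal-theory tolerance interval
      ti.type = "two-sided", conf.level = conf.level)$interval$limits
    limits[limits < 0] <- 0                             # see the Warning section below
    limits^(1/p)                                        # 3) back-transform to the original scale
  }
  # Should be close to tolIntGamma(dat) for the data generated in the Examples below
  set.seed(250)
  tol.int.gamma.sketch(rgamma(20, shape = 3, scale = 2))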
Value
A list of class "estimate" containing the estimated parameters, 
the tolerance interval, and other information.  See estimate.object 
for details.
In addition to the usual components contained in an object of class 
"estimate", the returned value also includes an additional 
component within the "interval" component:
| normal.transform.power | the value of the power used to transform the original data to approximate normality. | 
Warning
It is possible for the lower tolerance limit based on the transformed data to be less than 0. In this case, the lower tolerance limit on the original scale is set to 0 and a warning is issued stating that the normal approximation is not accurate in this case.
Note
The gamma distribution takes values on the positive real line. Special cases of the gamma distribution are the exponential distribution and the chi-square distribution. Applications of the gamma distribution include life testing, statistical ecology, queuing theory, inventory control, and precipitation processes. A gamma distribution starts to resemble a normal distribution as the shape parameter tends to infinity.
Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribution instead.
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Evans, M., N. Hastings, and B. Peacock. (1993). Statistical Distributions. Second Edition. John Wiley and Sons, New York, Chapter 18.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.
Krishnamoorthy K., T. Mathew, and S. Mukherjee. (2008). Normal-Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability. Technometrics, 50(1), 69–78.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
GammaDist, estimate.object, 
egamma, tolIntNorm, 
predIntGamma.
Examples
  # Generate 20 observations from a gamma distribution with parameters 
  # shape=3 and scale=2, then create a tolerance interval. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rgamma(20, shape = 3, scale = 2) 
  tolIntGamma(dat)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.203862
  #                                 scale = 2.174928
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                95%
  #
  #Number of Future Observations:   1
  #
  #Tolerance Interval:              LTL =  0.2340438
  #                                 UTL = 21.2996464
  #--------------------------------------------------------------------
  # Using the same data as in the previous example, create an upper 
  # one-sided tolerance interval and use the bias-corrected estimate of 
  # shape.
  tolIntGamma(dat, ti.type = "upper", est.method = "bcmle")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 1.906616
  #                                 scale = 2.514005
  #
  #Estimation Method:               bcmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on bcmle of 'shape'
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.00000
  #                                 UTL = 17.72107
  #----------
  # Clean up
  rm(dat)
  
  #--------------------------------------------------------------------
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal 
  # distribution.  Here we will use the same chrysene data but assume a 
  # gamma distribution.
  attach(EPA.09.Ex.17.3.chrysene.df)
  Chrysene <- Chrysene.ppb[Well.type == "Background"]
  #----------
  # First perform a goodness-of-fit test for a gamma distribution
  gofTest(Chrysene, dist = "gamma")
  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on 
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 2.806929
  #                                 scale = 5.286026
  #
  #Estimation Method:               mle
  #
  #Data:                            Chrysene
  #
  #Sample Size:                     8
  #
  #Test Statistic:                  W = 0.9156306
  #
  #Test Statistic Parameter:        n = 8
  #
  #P-value:                         0.3954223
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.
  #----------
  # Now compute the upper tolerance limit
  tolIntGamma(Chrysene, ti.type = "upper", coverage = 0.95, 
    conf.level = 0.95)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.806929
  #                                 scale = 5.286026
  #
  #Estimation Method:               mle
  #
  #Data:                            Chrysene
  #
  #Sample Size:                     8
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.00000
  #                                 UTL = 69.32425
  #----------
  # Compare this upper tolerance limit of 69 ppb to the upper tolerance limit 
  # assuming a lognormal distribution.
  tolIntLnorm(Chrysene, ti.type = "upper", coverage = 0.95, 
    conf.level = 0.95)$interval$limits["UTL"]
  #    UTL 
  #90.9247
  #----------
  # Clean up
  rm(Chrysene)
  detach("EPA.09.Ex.17.3.chrysene.df")
  #--------------------------------------------------------------------
  # Reproduce some of the example on page 73 of 
  # Krishnamoorthy et al. (2008), which uses alkalinity concentrations 
  # reported in Gibbons (1994) and Gibbons et al. (2009) to construct 
  # two-sided and one-sided upper tolerance limits for various values 
  # of coverage using a 95% confidence level.
  tolIntGamma(Gibbons.et.al.09.Alkilinity.vec, ti.type = "upper", 
    coverage = 0.9, normal.approx.transform = "cube.root")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 9.375013
  #                                 scale = 6.202461
  #
  #Estimation Method:               mle
  #
  #Data:                            Gibbons.et.al.09.Alkilinity.vec
  #
  #Sample Size:                     27
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact using
  #                                 Wilson & Hilferty (1931) cube-root
  #                                 transformation to Normality
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.00000
  #                                 UTL = 97.70502
  tolIntGamma(Gibbons.et.al.09.Alkilinity.vec,  
    coverage = 0.99, normal.approx.transform = "cube.root")
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 9.375013
  #                                 scale = 6.202461
  #
  #Estimation Method:               mle
  #
  #Data:                            Gibbons.et.al.09.Alkilinity.vec
  #
  #Sample Size:                     27
  #
  #Tolerance Interval Coverage:     99%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact using
  #                                 Wilson & Hilferty (1931) cube-root
  #                                 transformation to Normality
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  13.14318
  #                                 UTL = 148.43876
Tolerance Interval for a Lognormal Distribution
Description
Estimate the mean and standard deviation on the log-scale for a 
lognormal distribution, or estimate the mean 
and coefficient of variation for a 
lognormal distribution (alternative parameterization), 
and construct a \beta-content or \beta-expectation tolerance 
interval.
Usage
  tolIntLnorm(x, coverage = 0.95, cov.type = "content", ti.type = "two-sided", 
    conf.level = 0.95, method = "exact")
  tolIntLnormAlt(x, coverage = 0.95, cov.type = "content", ti.type = "two-sided", 
    conf.level = 0.95, method = "exact", est.method = "mvue")
Arguments
| x | For  For  If  | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the 
tolerance interval.  The default value is  | 
| method | for the case of a two-sided tolerance interval, a character string specifying the 
method for constructing the tolerance interval.  This argument is ignored if 
 | 
| est.method | for  | 
Details
The function tolIntLnorm returns a tolerance interval as well as 
estimates of the meanlog and sdlog parameters.  
The function tolIntLnormAlt returns a tolerance interval as well as 
estimates of the mean and coefficient of variation.
A tolerance interval for a lognormal distribution is constructed by taking the 
natural logarithm of the observations and constructing a tolerance interval 
based on the normal (Gaussian) distribution by calling tolIntNorm.  
These tolerance limits are then exponentiated to produce a tolerance interval on 
the original scale of the data.
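For example (a minimal sketch; assumes EnvStats is loaded), the same limits can be assembled by hand:
  set.seed(250)
  x <- rlnorm(20)
  exp(tolIntNorm(log(x), coverage = 0.9)$interval$limits)   # log, normal TI, exponentiate
  # Should match the limits from 
  # tolIntLnorm(x, coverage = 0.9)$interval$limits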
Value
If x is a numeric vector, a list of class 
"estimate" containing the estimated parameters, a component called 
interval containing the tolerance interval information, and other 
information.  See estimate.object for details.
If x is the result of calling an estimation function, a list whose 
class is the same as x.  The list contains the same 
components as x.  If x already has a component called 
interval, this component is replaced with the tolerance interval 
information.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
tolIntNorm, Lognormal, LognormalAlt, 
estimate.object, elnorm, elnormAlt, 
eqlnorm, 
predIntLnorm, 
Tolerance Intervals, Estimating Distribution Parameters, 
Estimating Distribution Quantiles.
Examples
  # Generate 20 observations from a lognormal distribution with parameters 
  # meanlog=0 and sdlog=1.  Use tolIntLnorm to estimate 
  # the mean and standard deviation of the log of the true distribution, and 
  # construct a two-sided 90% beta-content tolerance interval with associated 
  # confidence level 95%. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnorm(20) 
  tolIntLnorm(dat, coverage = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.06941976
  #                                 sdlog   =  0.59011300
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL = 0.237457
  #                                 UTL = 3.665369
  # The exact two-sided interval that contains 90% of this distribution 
  # is given by: [0.193, 5.18].  
  qlnorm(p = c(0.05, 0.95))
  #[1] 0.1930408 5.1802516
  # Clean up
  rm(dat)
  #==========
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal distribution.  
  # The data for this example are stored in EPA.09.Ex.17.3.chrysene.df, 
  # which contains chrysene concentration data (ppb) found in water 
  # samples obtained from two background wells (Wells 1 and 2) and 
  # three compliance wells (Wells 3, 4, and 5).  The tolerance limit 
  # is based on the data from the background wells.
  head(EPA.09.Ex.17.3.chrysene.df)
  #  Month   Well  Well.type Chrysene.ppb
  #1     1 Well.1 Background         19.7
  #2     2 Well.1 Background         39.2
  #3     3 Well.1 Background          7.8
  #4     4 Well.1 Background         12.8
  #5     1 Well.2 Background         10.2
  #6     2 Well.2 Background          7.2
  longToWide(EPA.09.Ex.17.3.chrysene.df, "Chrysene.ppb", "Month", "Well")
  #  Well.1 Well.2 Well.3 Well.4 Well.5
  #1   19.7   10.2   68.0   26.8   47.0
  #2   39.2    7.2   48.9   17.7   30.5
  #3    7.8   16.1   30.1   31.9   15.0
  #4   12.8    5.7   38.1   22.2   23.4
  with(EPA.09.Ex.17.3.chrysene.df, 
    tolIntLnorm(Chrysene.ppb[Well.type == "Background"], 
    ti.type = "upper", coverage = 0.95, conf.level = 0.95))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5085773
  #                                 sdlog   = 0.6279479
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.ppb[Well.type == "Background"]
  #
  #Sample Size:                     8
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.0000
  #                                 UTL = 90.9247
  #----------
  # Repeat the above example, but estimate the mean and 
  # coefficient of variation on the original scale
  #-----------------------------------------------
  with(EPA.09.Ex.17.3.chrysene.df, 
    tolIntLnormAlt(Chrysene.ppb[Well.type == "Background"], 
    ti.type = "upper", coverage = 0.95, conf.level = 0.95))
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          mean = 14.5547353
  #                                 cv   =  0.6390825
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.ppb[Well.type == "Background"]
  #
  #Sample Size:                     8
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.0000
  #                                 UTL = 90.9247
Tolerance Interval for a Lognormal Distribution Based on Censored Data
Description
Construct a \beta-content or \beta-expectation tolerance 
interval for a lognormal distribution based on Type I or Type II 
censored data.
Usage
  tolIntLnormCensored(x, censored, censoring.side = "left", coverage = 0.95, 
    cov.type = "content", ti.type = "two-sided", conf.level = 0.95, 
    method = "mle", ti.method = "exact.for.complete", seed = NULL, 
    nmc = 1000)
Arguments
| x | numeric vector of positive observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are 
 | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance 
interval.  The default value is  | 
| method | character string indicating the method to use for parameter estimation on the log-scale.   | 
| ti.method | character string specifying the method for constructing the tolerance 
interval.  Possible values are:  | 
| seed | for the case when  | 
| nmc | for the case when  | 
Details
A tolerance interval for a lognormal distribution is constructed by taking the 
natural logarithm of the observations and constructing a tolerance interval 
based on the normal (Gaussian) distribution by calling tolIntNormCensored.  
These tolerance limits are then exponentiated to produce a tolerance interval on 
the original scale of the data.
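For example (a minimal sketch; assumes EnvStats is loaded, with data generated as in the Examples below), the same construction can be written out by hand:
  set.seed(250)
  dat <- rlnormAlt(20, mean = 10, cv = 1)
  censored <- dat < 5
  dat[censored] <- 5
  exp(tolIntNormCensored(log(dat), censored, coverage = 0.9,   # log, censored-data normal TI,
    ti.type = "upper")$interval$limits)                        # then exponentiate
  # Should match the limits from 
  # tolIntLnormCensored(dat, censored, coverage = 0.9, ti.type = "upper")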
Value
A list of class "estimateCensored" containing the estimated 
parameters, the tolerance interval, and other information.  
See estimateCensored.object for details.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
tolIntNormCensored, 
gpqTolIntNormSinglyCensored, eqnormCensored,  
enormCensored, 
estimateCensored.object.
Examples
  # Generate 20 observations from a lognormal distribution with parameters 
  # mean=10 and cv=1, censor the observations less than 5, 
  # then create a one-sided upper tolerance interval with 90% 
  # coverage and 95% confidence based on these Type I left, singly 
  # censored data. 
  # (Note: the call to set.seed allows you to reproduce this example.)
  set.seed(250) 
  dat <- rlnormAlt(20, mean = 10, cv = 1)
  sort(dat)
  # [1]  2.608298  3.185459  4.196216  4.383764  4.569752  5.136130
  # [7]  5.209538  5.916284  6.199076  6.214755  6.255779  6.778361
  #[13]  7.074972  7.100494  8.930845 10.388766 11.402769 14.247062
  #[19] 14.559506 15.437340
  censored <- dat < 5
  dat[censored] <- 5
 
  tolIntLnormCensored(dat, censored, coverage = 0.9, ti.type="upper")
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              5 
  #
  #Estimated Parameter(s):          meanlog = 1.8993686
  #                                 sdlog   = 0.4804343
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     20
  #
  #Percent Censored:                25%
  #
  #Assumed Sample Size:             20
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact for
  #                                 Complete Data
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  0.00000
  #                                 UTL = 16.85556
## Not run: 
  # Note:  The true 90'th percentile is 20.55231
  #---------------------------------------------
  qlnormAlt(0.9, mean = 10, cv = 1)
  #[1] 20.55231
  # Compare the result using the method "gpq"
  tolIntLnormCensored(dat, censored, coverage = 0.9, ti.type="upper", 
    ti.method = "gpq", seed = 432)$interval$limits
  #     LTL      UTL 
  # 0.00000 17.85474
  # Clean Up
  #---------
  rm(dat, censored)
  #--------------------------------------------------------------
  # Example 15-1 of USEPA (2009, p. 15-10) shows how to estimate 
  # the mean and standard deviation using log-transformed multiply 
  # left-censored manganese concentration data.  Here we'll construct an 
  # upper tolerance limit with 90% coverage and 95% confidence based on 
  # these data.
  EPA.09.Ex.15.1.manganese.df
  #    Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  # 1       1 Well.1                 <5           5.0     TRUE
  # 2       2 Well.1               12.1          12.1    FALSE
  # 3       3 Well.1               16.9          16.9    FALSE
  # ...
  # 23      3 Well.5                3.3           3.3    FALSE
  # 24      4 Well.5                8.4           8.4    FALSE
  # 25      5 Well.5                 <2           2.0     TRUE
  with(EPA.09.Ex.15.1.manganese.df, 
    tolIntLnormCensored(Manganese.ppb, Censored, coverage = 0.9, 
      ti.type = "upper"))
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Assumed Sample Size:             25
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact for
  #                                 Complete Data
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =   0.0000
  #                                 UTL = 110.9305
  
## End(Not run)
Tolerance Interval for a Normal Distribution
Description
Construct a \beta-content or \beta-expectation tolerance 
interval for a normal distribution.
Usage
  tolIntNorm(x, coverage = 0.95, cov.type = "content", 
    ti.type = "two-sided", conf.level = 0.95, method = "exact")
Arguments
| x | numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a normal (Gaussian) distribution 
(i.e.,  | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance 
interval.  The default value is  | 
| method | for the case of a two-sided tolerance interval, a character string specifying the method for 
constructing the tolerance interval.  This argument is ignored if  | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
A tolerance interval for some population is an interval on the real line constructed so as to 
contain 100 \beta \% of the population (i.e., 100 \beta \% of all 
future observations), where 0 < \beta < 1.  The quantity 100 \beta \% is called 
the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
- A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1. The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
- A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%.
Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is 
equivalent to a prediction interval for one future observation with associated confidence level 
100 \beta \%.  There is no explicit confidence level associated with a 
\beta-expectation tolerance interval; if a \beta-expectation tolerance interval is 
treated as a \beta-content tolerance interval, the confidence level associated with this 
tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).  
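As a quick check of this equivalence (a minimal sketch; assumes EnvStats is loaded), a \beta-expectation tolerance interval with 90% coverage can be compared with a 90% prediction interval for a single future observation; the limits should agree:
  set.seed(250)
  x <- rnorm(20, mean = 10, sd = 2)
  tolIntNorm(x, coverage = 0.9, cov.type = "expectation")$interval$limits
  predIntNorm(x, k = 1, conf.level = 0.9)$interval$limits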
For a normal distribution, the form of a two-sided 100(1-\alpha)\% tolerance 
interval is: 
[\bar{x} - Ks, \, \bar{x} + Ks]
where \bar{x} denotes the sample 
mean, s denotes the sample standard deviation, and K denotes a constant 
that depends on the sample size n, the coverage, and, for a \beta-content 
tolerance interval (but not a \beta-expectation tolerance interval), the 
confidence level.  
Similarly, the form of a one-sided lower tolerance interval is:
[\bar{x} - Ks, \, \infty]
and the form of a one-sided upper tolerance interval is:
[-\infty, \, \bar{x} + Ks]
but K differs for one-sided versus two-sided tolerance intervals.  
The derivation of the constant K is explained in the help file for 
tolIntNormK.
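For example (a minimal sketch; assumes EnvStats is loaded), the two-sided interval can be assembled by hand using the constant K returned by tolIntNormK:
  set.seed(250)
  x <- rnorm(20, mean = 10, sd = 2)
  K <- tolIntNormK(n = length(x), coverage = 0.95, conf.level = 0.95)
  mean(x) + c(-1, 1) * K * sd(x)   # [xbar - K s, xbar + K s]
  # Should match the limits returned by tolIntNorm(x) in the Examples below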
Value
If x is a numeric vector, tolIntNorm returns a list of class 
"estimate" containing the estimated parameters, a component called 
interval containing the tolerance interval information, and other 
information.  See estimate.object for details.
If x is the result of calling an estimation function, tolIntNorm 
returns a list whose class is the same as x.  The list contains the same 
components as x.  If x already has a component called 
interval, this component is replaced with the tolerance interval 
information.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
tolIntNormK, tolIntLnorm, Normal, 
estimate.object, enorm, eqnorm, 
predIntNorm, Tolerance Intervals, 
Estimating Distribution Parameters, Estimating Distribution Quantiles.
Examples
  # Generate 20 observations from a normal distribution with parameters 
  # mean=10 and sd=2, then create a tolerance interval. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)
  set.seed(250) 
  dat <- rnorm(20, mean = 10, sd = 2) 
  tolIntNorm(dat)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 9.861160
  #                                 sd   = 1.180226
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =  6.603328
  #                                 UTL = 13.118993
  #----------
  # Clean up
  rm(dat)
  
  #--------------------------------------------------------------------
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal distribution.  
  # The data for this example are stored in EPA.09.Ex.17.3.chrysene.df, 
  # which contains chrysene concentration data (ppb) found in water 
  # samples obtained from two background wells (Wells 1 and 2) and 
  # three compliance wells (Wells 3, 4, and 5).  The tolerance limit 
  # is based on the data from the background wells.
  # Here we will first take the log of the data and  
  # then construct the tolerance interval; note however that it is 
  # easier to call the function tolIntLnorm instead using the 
  # original data.
  head(EPA.09.Ex.17.3.chrysene.df)
  #  Month   Well  Well.type Chrysene.ppb
  #1     1 Well.1 Background         19.7
  #2     2 Well.1 Background         39.2
  #3     3 Well.1 Background          7.8
  #4     4 Well.1 Background         12.8
  #5     1 Well.2 Background         10.2
  #6     2 Well.2 Background          7.2
  longToWide(EPA.09.Ex.17.3.chrysene.df, "Chrysene.ppb", "Month", "Well")
  #  Well.1 Well.2 Well.3 Well.4 Well.5
  #1   19.7   10.2   68.0   26.8   47.0
  #2   39.2    7.2   48.9   17.7   30.5
  #3    7.8   16.1   30.1   31.9   15.0
  #4   12.8    5.7   38.1   22.2   23.4
  tol.int.list <- with(EPA.09.Ex.17.3.chrysene.df, 
    tolIntNorm(log(Chrysene.ppb[Well.type == "Background"]), 
    ti.type = "upper", coverage = 0.95, conf.level = 0.95))
  tol.int.list
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 2.5085773
  #                                 sd   = 0.6279479
  #
  #Estimation Method:               mvue
  #
  #Data:                            log(Chrysene.ppb[Well.type == "Background"])
  #
  #Sample Size:                     8
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =     -Inf
  #                                 UTL = 4.510032
  # Compute the upper tolerance interval on the original scale
  # by exponentiating the upper tolerance limit:
  exp(tol.int.list$interval$limits["UTL"])
  #    UTL 
  #90.9247
  #----------
  # Clean up
  rm(tol.int.list)
Tolerance Interval for a Normal Distribution Based on Censored Data
Description
Construct a \beta-content or \beta-expectation tolerance 
interval for a normal distribution based on Type I or Type II 
censored data.
Usage
  tolIntNormCensored(x, censored, censoring.side = "left", coverage = 0.95, 
    cov.type = "content", ti.type = "two-sided", conf.level = 0.95, 
    method = "mle", ti.method = "exact.for.complete", seed = NULL, 
    nmc = 1000)
Arguments
| x | numeric vector of observations.  Missing ( | 
| censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs.  The possible values are 
 | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance 
interval.  The default value is  | 
| method | character string indicating the method to use for parameter estimation.   | 
| ti.method | character string specifying the method for constructing the tolerance 
interval.  Possible values are:  | 
| seed | for the case when  | 
| nmc | for the case when  | 
Details
See the help file for tolIntNorm for an explanation of 
tolerance intervals.  When ti.method="gpq", the tolerance interval 
is constructed using the method of Generalized Pivotal Quantities as 
explained in Krishnamoorthy and Mathew (2009, p. 327).  When 
ti.method="exact.for.complete" or 
ti.method="wald.wolfowitz.for.complete", the tolerance interval 
is constructed by first computing the maximum likelihood estimates of 
the mean and standard deviation by calling 
enormCensored, 
then passing these values to the function tolIntNorm to 
produce the tolerance interval as if the estimates were based on 
complete rather than censored data.  These last two methods are purely 
ad hoc and their properties need to be studied.
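As a rough illustration of this two-step idea (a minimal sketch, assuming the EnvStats package is attached; this is not the package's internal code), the upper limit can be reproduced by estimating the parameters with enormCensored and then applying the multiplier from tolIntNormK as if the estimates came from complete data:
  # Sketch only: reproduce the "exact.for.complete" upper limit for the
  # censored data simulated in the first example below.
  set.seed(250)
  dat <- sort(rnorm(20, mean = 10, sd = 3))
  censored <- dat < 9
  dat[censored] <- 9
  est <- enormCensored(dat, censored, method = "mle")$parameters
  K <- tolIntNormK(n = length(dat), ti.type = "upper", coverage = 0.9, 
    conf.level = 0.95)
  est["mean"] + K * est["sd"]
  # This should reproduce the UTL of about 13.25 reported by 
  # tolIntNormCensored(dat, censored, coverage = 0.9, ti.type = "upper") 
  # in the Examples below.
  rm(dat, censored, est, K)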
Value
A list of class "estimateCensored" containing the estimated 
parameters, the tolerance interval, and other information.  
See estimateCensored.object for details.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
gpqTolIntNormSinglyCensored, eqnormCensored,  
enormCensored, estimateCensored.object.
Examples
  # Generate 20 observations from a normal distribution with parameters 
  # mean=10 and sd=3, censor the observations less than 9, 
  # then create a one-sided upper tolerance interval with 90% 
  # coverage and 95% confidence based on these Type I left, singly 
  # censored data. 
  # (Note: the call to set.seed allows you to reproduce this example.)
  set.seed(250) 
  dat <- sort(rnorm(20, mean = 10, sd = 3))
  dat
  # [1]  6.406313  7.126621  8.119660  8.277216  8.426941  8.847961
  # [7]  8.899098  9.357509  9.525756  9.534858  9.558567  9.847663
  #[13] 10.001989 10.014964 10.841384 11.386264 11.721850 12.524300
  #[19] 12.602469 12.813429
  censored <- dat < 9
  dat[censored] <- 9
 
  tolIntNormCensored(dat, censored, coverage = 0.9, ti.type="upper")
  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              9 
  #
  #Estimated Parameter(s):          mean = 9.700962
  #                                 sd   = 1.845067
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     20
  #
  #Percent Censored:                35%
  #
  #Assumed Sample Size:             20
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact for
  #                                 Complete Data
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =     -Inf
  #                                 UTL = 13.25454
## Not run: 
  # Note:  The true 90'th percentile is 13.84465
  #---------------------------------------------
  qnorm(0.9, mean = 10, sd = 3)
  # [1] 13.84465
  # Compare the result using the method "gpq"
  tolIntNormCensored(dat, censored, coverage = 0.9, ti.type="upper", 
    ti.method = "gpq", seed = 432)$interval$limits
  #     LTL      UTL 
  #    -Inf 13.56826 
  # Clean Up
  #---------
  rm(dat, censored)
  #==========
  # Example 15-1 of USEPA (2009, p. 15-10) shows how to estimate 
  # the mean and standard deviation using log-transformed multiply 
  # left-censored manganese concentration data.  Here we'll construct a 
  # 95% confidence one-sided upper tolerance interval with 90% coverage 
  # based on the log-transformed data.
  EPA.09.Ex.15.1.manganese.df
  #    Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  # 1       1 Well.1                 <5           5.0     TRUE
  # 2       2 Well.1               12.1          12.1    FALSE
  # 3       3 Well.1               16.9          16.9    FALSE
  # ...
  # 23      3 Well.5                3.3           3.3    FALSE
  # 24      4 Well.5                8.4           8.4    FALSE
  # 25      5 Well.5                 <2           2.0     TRUE
  with(EPA.09.Ex.15.1.manganese.df, 
    tolIntNormCensored(log(Manganese.ppb), Censored, coverage = 0.9, 
      ti.type = "upper"))
  # Results of Distribution Parameter Estimation
  # Based on Type I Censored Data
  # --------------------------------------------
  #
  # Assumed Distribution:            Normal
  #
  # Censoring Side:                  left
  #
  # Censoring Level(s):              0.6931472 1.6094379 
  #
  # Estimated Parameter(s):          mean = 2.215905
  #                                  sd   = 1.356291
  #
  # Estimation Method:               MLE
  #
  # Data:                            log(Manganese.ppb)
  #
  # Censoring Variable:              Censored
  #
  # Sample Size:                     25
  #
  # Percent Censored:                24%
  #
  # Assumed Sample Size:             25
  #
  # Tolerance Interval Coverage:     90%
  #
  # Coverage Type:                   content
  #
  # Tolerance Interval Method:       Exact for
  #                                  Complete Data
  #
  # Tolerance Interval Type:         upper
  #
  # Confidence Level:                95%
  #
  # Tolerance Interval:              LTL =     -Inf
  #                                  UTL = 4.708904
  
## End(Not run)
Half-Width of a Tolerance Interval for a Normal Distribution
Description
Compute the half-width of a tolerance interval for a normal distribution.
Usage
  tolIntNormHalfWidth(n, sigma.hat = 1, coverage = 0.95, cov.type = "content", 
    conf.level = 0.95, method = "wald.wolfowitz")
Arguments
| n | numeric vector of positive integers greater than 1 indicating the sample size upon 
which the prediction interval is based.
Missing ( | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s).  
The default value is  | 
| coverage | numeric vector of values between 0 and 1 indicating the desired coverage of the 
tolerance interval.  The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  The 
possible values are  | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| method | character string specifying the method for constructing the tolerance interval.  
The possible values are  | 
Details
If the arguments n, sigma.hat, coverage, and 
conf.level are not all the same length, they are replicated to be the 
same length as the length of the longest argument.
The help files for tolIntNorm and tolIntNormK 
give formulas for a two-sided tolerance interval based on the sample size, the 
observed sample mean and sample standard deviation, and specified confidence level 
and coverage.  Specifically, the two-sided tolerance interval is given by:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, and the coverage (see the help file for 
tolIntNormK).  Thus, the half-width of the tolerance interval is 
given by:
HW = Ks \;\;\;\;\;\; (4)
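For example, the half-width can be computed directly from the multiplier returned by tolIntNormK (a minimal sketch, assuming the EnvStats package is attached):
  K <- tolIntNormK(n = 10, coverage = 0.95, conf.level = 0.95, 
    method = "wald.wolfowitz")
  sigma.hat <- 2
  K * sigma.hat
  # This should agree with tolIntNormHalfWidth(n = 10, sigma.hat = 2), 
  # shown in the Examples below to be about 6.76, since 
  # tolIntNormHalfWidth also uses the Wald-Wolfowitz method by default.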
Value
numeric vector of half-widths.
Note
See the help file for tolIntNorm.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, confidence level, and half-width 
if one of the objectives of the sampling program is to produce tolerance intervals.  
The functions tolIntNormHalfWidth, tolIntNormN, and 
plotTolIntNormDesign can be used to investigate these relationships 
for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNorm.
See Also
tolIntNorm, tolIntNormK, 
tolIntNormN, plotTolIntNormDesign, 
Normal.
Examples
  # Look at how the half-width of a tolerance interval increases with 
  # increasing coverage:
  seq(0.5, 0.9, by=0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  round(tolIntNormHalfWidth(n = 10, coverage = seq(0.5, 0.9, by = 0.1)), 2) 
  #[1] 1.17 1.45 1.79 2.21 2.84
  #----------
  # Look at how the half-width of a tolerance interval decreases with 
  # increasing sample size:
  2:5 
  #[1] 2 3 4 5 
  round(tolIntNormHalfWidth(n = 2:5), 2) 
  #[1] 37.67 9.92 6.37 5.08
  #----------
  # Look at how the half-width of a tolerance interval increases with 
  # increasing estimated standard deviation for a fixed sample size:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  round(tolIntNormHalfWidth(n = 10, sigma.hat = seq(0.5, 2, by = 0.5)), 2) 
  #[1] 1.69 3.38 5.07 6.76
  #----------
  # Look at how the half-width of a tolerance interval increases with 
  # increasing confidence level for a fixed sample size:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  round(tolIntNormHalfWidth(n = 5, conf = seq(0.5, 0.9, by = 0.1)), 2) 
  #[1] 2.34 2.58 2.89 3.33 4.15
  #==========
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal distribution.  
  # The data for this example are stored in EPA.09.Ex.17.3.chrysene.df, 
  # which contains chrysene concentration data (ppb) found in water 
  # samples obtained from two background wells (Wells 1 and 2) and 
  # three compliance wells (Wells 3, 4, and 5).  The tolerance limit 
  # is based on the data from the background wells.
  # Here we will first take the log of the data and then estimate the 
  # standard deviation based on the two background wells.  We will use this 
  # estimate of standard deviation to compute the half-widths of 
  # future tolerance intervals on the log-scale for various sample sizes.
  head(EPA.09.Ex.17.3.chrysene.df)
  #  Month   Well  Well.type Chrysene.ppb
  #1     1 Well.1 Background         19.7
  #2     2 Well.1 Background         39.2
  #3     3 Well.1 Background          7.8
  #4     4 Well.1 Background         12.8
  #5     1 Well.2 Background         10.2
  #6     2 Well.2 Background          7.2
  longToWide(EPA.09.Ex.17.3.chrysene.df, "Chrysene.ppb", "Month", "Well")
  #  Well.1 Well.2 Well.3 Well.4 Well.5
  #1   19.7   10.2   68.0   26.8   47.0
  #2   39.2    7.2   48.9   17.7   30.5
  #3    7.8   16.1   30.1   31.9   15.0
  #4   12.8    5.7   38.1   22.2   23.4
  summary.stats <- summaryStats(log(Chrysene.ppb) ~ Well.type, 
    data = EPA.09.Ex.17.3.chrysene.df)
  summary.stats
  #            N   Mean     SD Median    Min    Max
  #Background  8 2.5086 0.6279 2.4359 1.7405 3.6687
  #Compliance 12 3.4173 0.4361 3.4111 2.7081 4.2195
  sigma.hat <- summary.stats["Background", "SD"]
  sigma.hat
  #[1] 0.6279
  tolIntNormHalfWidth(n = c(4, 8, 16), sigma.hat = sigma.hat)
  #[1] 3.999681 2.343160 1.822759
  #==========
  # Clean up
  #---------
  rm(summary.stats, sigma.hat)
Compute the Value of K for a Tolerance Interval for a Normal Distribution
Description
Compute the value of K (the multiplier of estimated standard deviation) used 
to construct a tolerance interval based on data from a normal distribution.
Usage
  tolIntNormK(n, df = n - 1, coverage = 0.95, cov.type = "content", 
    ti.type = "two-sided", conf.level = 0.95, method = "exact", 
    rel.tol = 1e-07, abs.tol = rel.tol)
Arguments
| n | a positive integer greater than 2 indicating the sample size upon which the tolerance interval is based. | 
| df | the degrees of freedom associated with the tolerance interval.  The default is 
 | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  
The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance 
interval.  The default value is  | 
| method | for the case of a two-sided tolerance interval, a character string specifying the method for 
constructing the tolerance interval.  This argument is ignored if  | 
| rel.tol | in the case when  | 
| abs.tol | in the case when  | 
Details
A tolerance interval for some population is an interval on the real line constructed so as to 
contain 100 \beta \% of the population (i.e., 100 \beta \% of all future observations), 
where 0 < \beta < 1.  The quantity 100 \beta \% is called the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
- A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1.  The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
- A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%.
Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is 
equivalent to a prediction interval for one future observation with associated confidence level 
100 \beta \%.  Note that there is no explicit confidence level associated with a 
\beta-expectation tolerance interval.  If a \beta-expectation tolerance interval is 
treated as a \beta-content tolerance interval, the confidence level associated with this 
tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).  
For a normal distribution, the form of a two-sided 100(1-\alpha)\% tolerance 
interval is: 
[\bar{x} - Ks, \, \bar{x} + Ks]
 where \bar{x} denotes the sample 
mean, s denotes the sample standard deviation, and K denotes a constant 
that depends on the sample size n, the coverage, and, for a \beta-content 
tolerance interval (but not a \beta-expectation tolerance interval), 
the confidence level.  
Similarly, the form of a one-sided lower tolerance interval is:
[\bar{x} - Ks, \, \infty]
and the form of a one-sided upper tolerance interval is:
[-\infty, \, \bar{x} + Ks]
but K differs for one-sided versus two-sided tolerance intervals.
The Derivation of K for a \beta-Content Tolerance Interval
One-Sided Case
When ti.type="upper" or ti.type="lower", the constant K for a 
100 \beta \% \beta-content tolerance interval with associated 
confidence level 100(1 - \alpha)\% is given by: 
K = t(n-1, 1 - \alpha, z_\beta \sqrt{n}) / \sqrt{n}
where t(\nu, p, \delta) denotes the p'th quantile of a non-central 
t-distribution with \nu degrees of freedom and noncentrality parameter 
\delta (see the help file for TDist), and z_p denotes the 
p'th quantile of a standard normal distribution.
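This formula is easy to evaluate directly with the non-central t quantile function in base R (a minimal sketch):
  n <- 20
  coverage <- 0.99
  conf.level <- 0.90
  K <- qt(conf.level, df = n - 1, ncp = qnorm(coverage) * sqrt(n)) / sqrt(n)
  K
  # This should match the value of about 3.05 returned by 
  # tolIntNormK(n = 20, ti.type = "upper", coverage = 0.99, conf.level = 0.9) 
  # in the Examples below.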
Two-Sided Case
When ti.type="two-sided" and method="exact", the exact formula for 
the constant K for a 100 \beta \% \beta-content tolerance interval 
with associated confidence level 100(1-\alpha)\% requires numerical integration 
and has been derived by several different authors, including Odeh (1978), 
Eberhardt et al. (1989), Jilek (1988), Fujino (1989), and Janiga and Miklos (2001).  
Specifically, for given values of the sample size n, degrees of freedom \nu, 
confidence level (1-\alpha), and coverage \beta, the constant K is the 
solution to the equation: 
\sqrt{\frac{n}{2 \pi}} \, \int^\infty_{-\infty} {F(x, K, \nu, R) \, e^{(-nx^2)/2}} \, dx = 1 - \alpha
where F(x, K, \nu, R) denotes the upper-tail area from (\nu \, R^2) / K^2 to 
\infty of the chi-squared distribution with \nu degrees of freedom, and 
R is the solution to the equation: 
\Phi (x + R) - \Phi (x - R) = \beta
 where 
\Phi() denotes the standard normal cumulative distribution function.
When ti.type="two-sided" and method="wald.wolfowitz", the approximate formula 
due to Wald and Wolfowitz (1946) for the constant K for a 100 \beta \% 
\beta-content tolerance interval with associated confidence level 
100(1-\alpha)\% is given by: 
K \approx r \, u
where r is the solution to the equation:  
\Phi (\frac{1}{\sqrt{n}} + r) - \Phi (\frac{1}{\sqrt{n}} - r) = \beta
\Phi () denotes the standard normal cumulative distribution function, and u is 
given by: 
u = \sqrt{\frac{n-1}{\chi^{2} (n-1, \alpha)}}
where \chi^{2} (\nu, p) denotes the p'th quantile of the chi-squared 
distribution with \nu degrees of freedom.
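The Wald-Wolfowitz approximation can also be computed directly (a minimal sketch using only base R):
  n <- 20
  coverage <- 0.95
  conf.level <- 0.95
  # Solve Phi(1/sqrt(n) + r) - Phi(1/sqrt(n) - r) = coverage for r
  f <- function(r) pnorm(1/sqrt(n) + r) - pnorm(1/sqrt(n) - r) - coverage
  r <- uniroot(f, interval = c(0, 10))$root
  u <- sqrt((n - 1) / qchisq(1 - conf.level, df = n - 1))
  r * u
  # This should match the value of about 2.75 returned by 
  # tolIntNormK(n = 20, method = "wald") in the Examples below.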
The Derivation of K for a \beta-Expectation Tolerance Interval
As stated above, a \beta-expectation tolerance interval with coverage 100 \beta \% is 
equivalent to a prediction interval for one future observation with associated confidence level 
100 \beta \%.  This is because the probability that any single future observation will fall 
into this interval is 100 \beta \%, so the distribution of the number, out of N future 
observations, that will fall into this interval is binomial with parameters size = N and 
prob = \beta (see the help file for Binomial).  Hence the expected proportion 
of future observations that will fall into this interval is 100 \beta \% and is independent of 
the value of N.  See the help file for predIntNormK for information on 
how to derive K for these intervals.
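For a one-sided upper limit, this equivalence means the \beta-expectation multiplier is simply the usual one-future-observation prediction-interval multiplier, t(n-1, \beta) \sqrt{1 + 1/n}.  A minimal sketch (the comparison calls assume the EnvStats package is attached):
  n <- 20
  coverage <- 0.95
  qt(coverage, df = n - 1) * sqrt(1 + 1/n)
  # This should agree with 
  # tolIntNormK(n = 20, cov.type = "expectation", coverage = 0.95, 
  #   ti.type = "upper") 
  # and with the multiplier for an upper prediction interval for one 
  # future observation with 95% confidence (see predIntNormK).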
Value
The value of K, a numeric scalar used to construct tolerance intervals for a normal 
(Gaussian) distribution.
Note
Tabled values of K are given in Gibbons et al. (2009), Gilbert (1987), 
Guttman (1970), Krishnamoorthy and Mathew (2009), Owen (1962), Odeh and Owen (1980), 
and USEPA (2009). 
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.
Eberhardt, K.R., R.W. Mee, and C.P. Reeve. (1989). Computing Factors for Exact Two-Sided Tolerance Limits for a Normal Distribution. Communications in Statistics, Part B-Simulation and Computation 18, 397-413.
Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.
Fujino, T. (1989). Exact Two-Sided Tolerance Limits for a Normal Distribution. Japanese Journal of Applied Statistics 18, 29-36.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Jilek, M. (1988). Statisticke Tolerancni Meze. SNTL, Praha.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Janiga, I., and R. Miklos. (2001). Statistical Tolerance Intervals for a Normal Distribution. Measurement Science Review 11, 29-32.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Odeh, R.E. (1978). Tables of Two-Sided Tolerance Factors for a Normal Distribution. Communications in Statistics, Part B-Simulation and Computation 7, 183-201.
Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.
Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.
Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.
See Also
tolIntNorm, predIntNorm, Normal, 
estimate.object, enorm, eqnorm, 
Tolerance Intervals, Prediction Intervals, 
Estimating Distribution Parameters, 
Estimating Distribution Quantiles.
Examples
  # Compute the value of K for a two-sided 95% beta-content 
  # tolerance interval with associated confidence level 95% 
  # given a sample size of n=20.
  #----------
  # Exact method
  tolIntNormK(n = 20)
  #[1] 2.760346
  #----------
  # Approximate method due to Wald and Wolfowitz (1946)
  tolIntNormK(n = 20, method = "wald")
  # [1] 2.751789
  #--------------------------------------------------------------------
  # Compute the value of K for a one-sided upper tolerance limit 
  # with 99% coverage and associated confidence level 90% 
  # given a sample size of n=20.
  tolIntNormK(n = 20, ti.type = "upper", coverage = 0.99, 
    conf.level = 0.9)
  #[1] 3.051543
  #--------------------------------------------------------------------
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal 
  # distribution.  The sample size is n = 8 observations from 
  # the two background wells.  Here we will compute the 
  # multiplier for the log-transformed data.
  tolIntNormK(n = 8, ti.type = "upper")
  #[1] 3.187294
Sample Size for a Specified Half-Width of a Tolerance Interval for a Normal Distribution
Description
Compute the sample size necessary to achieve a specified half-width of a tolerance interval for a normal distribution, given the estimated standard deviation, coverage, and confidence level.
Usage
  tolIntNormN(half.width, sigma.hat = 1, coverage = 0.95, cov.type = "content", 
    conf.level = 0.95, method = "wald.wolfowitz", round.up = TRUE, n.max = 5000, 
    tol = 1e-07, maxiter = 1000)
Arguments
| half.width | numeric vector of (positive) half-widths.  
Missing ( | 
| sigma.hat | numeric vector specifying the value(s) of the estimated standard deviation(s).  
The default value is  | 
| coverage | numeric vector of values between 0 and 1 indicating the desired coverage of the 
tolerance interval.  The default value is  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  The 
possible values are  | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the 
prediction interval.  The default value is  | 
| method | character string specifying the method for constructing the tolerance interval.  
The possible values are  | 
| round.up | logical scalar indicating whether to round up the values of the computed sample 
size(s) to the next smallest integer.  The default value is  | 
| n.max | positive integer greater than 1 specifying the maximum possible sample size.  
The default value is  | 
| tol | numeric scalar indicating the tolerance to use in the  | 
| maxiter | positive integer indicating the maximum number of iterations to use in the 
 | 
Details
If the arguments half.width, sigma.hat, coverage, and 
conf.level are not all the same length, they are replicated to be the same 
length as the length of the longest argument.
The help files for tolIntNorm and tolIntNormK 
give formulas for a two-sided tolerance interval based on the sample size, the 
observed sample mean and sample standard deviation, and specified confidence level 
and coverage.  Specifically, the two-sided tolerance interval is given by:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the 
confidence level, and the coverage (see the help file for 
tolIntNormK).  Thus, the half-width of the tolerance interval is 
given by:
HW = Ks \;\;\;\;\;\; (4)
The function tolIntNormN uses the uniroot search algorithm to 
determine the sample size for specified values of the half-width, sample 
standard deviation, coverage, and confidence level.  Note that unlike a 
confidence interval, the half-width of a tolerance interval does not 
approach 0 as the sample size increases.
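A quick consistency check (a minimal sketch, assuming the EnvStats package is attached): the returned sample size should be the smallest n whose half-width does not exceed the requested value.
  n <- tolIntNormN(half.width = 3, sigma.hat = 1)
  n
  #[1] 15
  # The half-width at the returned sample size should not exceed 3, 
  # while the half-width at n - 1 should exceed 3:
  tolIntNormHalfWidth(n = c(n - 1, n), sigma.hat = 1)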
Value
numeric vector of sample sizes.
Note
See the help file for tolIntNorm.
In the course of designing a sampling program, an environmental scientist may wish 
to determine the relationship between sample size, confidence level, and half-width 
if one of the objectives of the sampling program is to produce tolerance intervals.  
The functions tolIntNormHalfWidth, tolIntNormN, and 
plotTolIntNormDesign can be used to investigate these 
relationships for the case of normally-distributed observations.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNorm.
See Also
tolIntNorm, tolIntNormK, 
tolIntNormHalfWidth, plotTolIntNormDesign, 
Normal.
Examples
  # Look at how the required sample size for a tolerance interval increases 
  # with increasing coverage:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  tolIntNormN(half.width = 3, coverage = seq(0.5, 0.9, by = 0.1)) 
  #[1] 4 4 5 6 9
  #----------
  # Look at how the required sample size for a tolerance interval decreases 
  # with increasing half-width:
  3:6 
  #[1] 3 4 5 6 
  tolIntNormN(half.width = 3:6) 
  #[1] 15 8 6 5 
  tolIntNormN(3:6, round = FALSE) 
  #[1] 14.199735  7.022572  5.092374  4.214371
  #----------
  # Look at how the required sample size for a tolerance interval increases 
  # with increasing estimated standard deviation for a fixed half-width:
  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 
  tolIntNormN(half.width = 4, sigma.hat = seq(0.5, 2, by = 0.5)) 
  #[1]    4    8   24 3437
  #----------
  # Look at how the required sample size for a tolerance interval increases 
  # with increasing confidence level for a fixed half-width:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  tolIntNormN(half.width = 3, conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1]  3  4  5  7 11
  #==========
  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence  using chrysene data and assuming a lognormal distribution.  
  # The data for this example are stored in EPA.09.Ex.17.3.chrysene.df, 
  # which contains chrysene concentration data (ppb) found in water 
  # samples obtained from two background wells (Wells 1 and 2) and 
  # three compliance wells (Wells 3, 4, and 5).  The tolerance limit 
  # is based on the data from the background wells.
  # Here we will first take the log of the data and then estimate the 
  # standard deviation based on the two background wells.  We will use this 
  # estimate of standard deviation to compute required sample sizes for 
  # various half-widths on the log-scale.
  head(EPA.09.Ex.17.3.chrysene.df)
  #  Month   Well  Well.type Chrysene.ppb
  #1     1 Well.1 Background         19.7
  #2     2 Well.1 Background         39.2
  #3     3 Well.1 Background          7.8
  #4     4 Well.1 Background         12.8
  #5     1 Well.2 Background         10.2
  #6     2 Well.2 Background          7.2
  longToWide(EPA.09.Ex.17.3.chrysene.df, "Chrysene.ppb", "Month", "Well")
  #  Well.1 Well.2 Well.3 Well.4 Well.5
  #1   19.7   10.2   68.0   26.8   47.0
  #2   39.2    7.2   48.9   17.7   30.5
  #3    7.8   16.1   30.1   31.9   15.0
  #4   12.8    5.7   38.1   22.2   23.4
  summary.stats <- summaryStats(log(Chrysene.ppb) ~ Well.type, 
    data = EPA.09.Ex.17.3.chrysene.df)
  summary.stats
  #            N   Mean     SD Median    Min    Max
  #Background  8 2.5086 0.6279 2.4359 1.7405 3.6687
  #Compliance 12 3.4173 0.4361 3.4111 2.7081 4.2195
  sigma.hat <- summary.stats["Background", "SD"]
  sigma.hat
  #[1] 0.6279
  tolIntNormN(half.width = c(4, 2, 1), sigma.hat = sigma.hat)
  #[1]  4 12 NA
  #Warning message:
  #In tolIntNormN(half.width = c(4, 2, 1), sigma.hat = sigma.hat) :
  #  Value of 'half.width' is too small for element 3.  
  #  Try increasing the value of 'n.max'.
  # NOTE:  We cannot achieve a half-width of 1 for the given value of 
  #        sigma.hat for a tolerance interval with 95% coverage and 
  #        95% confidence.  The default value of n.max is 5000, but in fact, 
  #        even with a million observations the half width is greater than 1.
  tolIntNormHalfWidth(n = 1e6, sigma.hat = sigma.hat)
  #[1] 1.232095
  #==========
  # Clean up
  #---------
  rm(summary.stats, sigma.hat)
Nonparametric Tolerance Interval for a Continuous Distribution
Description
Construct a \beta-content or \beta-expectation tolerance interval 
nonparametrically without making any assumptions about the form of the 
distribution except that it is continuous.
Usage
  tolIntNpar(x, coverage, conf.level, cov.type = "content", 
    ltl.rank = ifelse(ti.type == "upper", 0, 1), 
    n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1), 
    lb = -Inf, ub = Inf, ti.type = "two-sided")
Arguments
| x | numeric vector of observations. Missing ( | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the  | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the  | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ltl.rank | positive integer indicating the rank of the order statistic to use for the lower bound 
of the tolerance interval.  If  | 
| n.plus.one.minus.utl.rank | positive integer related to the rank of the order statistic to use for 
the upper bound of the tolerance interval.  A value of 
 | 
| lb,ub | scalars indicating lower and upper bounds on the distribution.  By default,  | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
Details
A tolerance interval for some population is an interval on the real line constructed so as to 
contain 100 \beta \% of the population (i.e., 100 \beta \% of all 
future observations), where 0 < \beta < 1.  The quantity 100 \beta \% is called 
the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
- A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1.  The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
- A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%.
Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is 
equivalent to a prediction interval for one future observation with associated confidence level 
100 \beta \%.  Note that there is no explicit confidence level associated with a 
\beta-expectation tolerance interval.  If a \beta-expectation tolerance interval is 
treated as a \beta-content tolerance interval, the confidence level associated with this 
tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).  
The Form of a Nonparametric Tolerance Interval 
Let \underline{x} denote a random sample of n independent observations 
from some continuous distribution and let x_{(i)} denote the i'th order 
statistic in \underline{x}.  A two-sided nonparametric tolerance interval is 
constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and 
u < v.  That is, u denotes the rank of the lower tolerance limit, and 
v denotes the rank of the upper tolerance limit.  To make it easier to write 
some equations later on, we can also write the tolerance interval (1) in a slightly 
different way as:
[x_{(u)}, x_{(n+1-w)}] \;\;\;\;\;\; (2)
where
w = n + 1 - v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n-1, and u < n+1-w.  
In terms of the arguments to the function tolIntNpar, the argument 
ltl.rank corresponds to u, and the argument n.plus.one.minus.utl.rank 
corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then equation (2) above can also represent a one-sided lower or one-sided upper tolerance interval. That is, a one-sided lower nonparametric tolerance interval is constructed as:
[x_{(u)}, x_{(n+1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a one-sided upper nonparametric tolerance interval is constructed as:
[x_{(0)}, x_{(v)}] = [lb, x_{(v)}] \;\;\;\;\;\; (7)
Usually, lb = -\infty or lb = 0 and ub = \infty.
Let C be a random variable denoting the coverage of the above nonparametric 
tolerance intervals.  Wilks (1941) showed that the distribution of C follows a 
beta distribution with parameters shape1=v-u and 
shape2=w+u when the unknown distribution is continuous.
Computations for a \beta-Content Tolerance Interval 
For a \beta-content tolerance interval, if the coverage C = \beta is specified, 
then the associated confidence level (1-\alpha)100\% is computed as:
1 - \alpha = 1 - F(\beta, v-u, w+u) \;\;\;\;\;\; (8)
where F(y, \delta, \gamma) denotes the cumulative distribution function of a 
beta random variable with parameters shape1=\delta and 
shape2=\gamma evaluated at y.
Similarly, if the confidence level associated with the tolerance interval is specified as 
(1-\alpha)100\%, then the coverage C = \beta is computed as:
\beta = B(\alpha, v-u, w+u) \;\;\;\;\;\; (9)
where B(p, \delta, \gamma) denotes the p'th quantile of a 
beta distribution with parameters shape1=\delta 
and shape2=\gamma. 
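Both computations can be carried out directly with the beta distribution functions in base R.  A minimal sketch for the one-sided upper case (u = 0, w = 1, so v = n), using the sample size n = 24 from the copper example in the Examples section below:
  n <- 24
  u <- 0
  w <- 1
  v <- n + 1 - w
  # Confidence level associated with 95% coverage (equation (8)):
  1 - pbeta(0.95, shape1 = v - u, shape2 = w + u)
  #[1] 0.708011
  # Coverage associated with a 95% confidence level (equation (9)):
  qbeta(1 - 0.95, shape1 = v - u, shape2 = w + u)
  #[1] 0.8826538
  # These agree with tolIntNparConfLevel(n = 24, coverage = 0.95, 
  # ti.type = "upper") and tolIntNparCoverage(n = 24, conf.level = 0.95, 
  # ti.type = "upper").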
 
Computations for a \beta-Expectation Tolerance Interval 
For a \beta-expectation tolerance interval, the expected coverage is simply 
the mean of a beta random variable with parameters 
shape1=v-u and shape2=w+u, which is given by:
E(C) = \frac{v-u}{n+1} \;\;\;\;\;\; (10)
As stated above, a \beta-expectation tolerance interval with coverage 
\beta 100\% is equivalent to a prediction interval for one future observation 
with associated confidence level \beta 100\%.  This is because the probability 
that any single future observation will fall into this interval is \beta 100\%, 
so the distribution of the number, out of N future observations, that will fall into 
this interval is binomial with parameters size=N 
and prob=\beta.  Hence the expected proportion of future observations 
that fall into this interval is \beta 100\% and is independent of the value of N. 
See the help file for predIntNpar for more information on constructing 
a nonparametric prediction interval.
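For instance, equation (10) can be checked directly for the one-sided upper copper example in the Examples section below (a minimal sketch):
  n <- 24
  u <- 0
  w <- 1
  v <- n + 1 - w
  (v - u) / (n + 1)
  #[1] 0.96
  # This matches the 96% expected coverage reported for the 
  # beta-expectation tolerance interval in the copper example below.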
Value
A list of class "estimate" containing the estimated parameters, 
the tolerance interval, and other information.  See estimate.object 
for details.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.
Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT, Chapter 2.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392pp.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Wilks, S.S. (1941). Determination of Sample Sizes for Setting Tolerance Limits. Annals of Mathematical Statistics 12, 91–96.
See Also
eqnpar, estimate.object, 
tolIntNparN, Tolerance Intervals, 
Estimating Distribution Parameters, Estimating Distribution Quantiles.
Examples
  # Generate 20 observations from a lognormal mixture distribution
  # with parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1.  
  # The exact two-sided interval that contains 90% of this distribution is given by: 
  # [0.682312, 13.32052].  Use tolIntNpar to construct a two-sided 90% 
  # \eqn{\beta}-content tolerance interval.  Note that the associated confidence level 
  # is only 61%.  A larger sample size is required to obtain a larger confidence 
  # level (see the help file for tolIntNparN). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(23) 
  dat <- rlnormMixAlt(20, 1, 0.5, 5, 1, 0.1) 
  tolIntNpar(dat, coverage = 0.9) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                60.8253%
  #
  #Tolerance Limit Rank(s):         1 20 
  #
  #Tolerance Interval:              LTL = 0.5035035
  #                                 UTL = 9.9504662
  #----------
  # Clean up
  rm(dat)
  #----------
  # Reproduce Example 17-4 on page 17-21 of USEPA (2009).  This example uses 
  # copper concentrations (ppb) from 3 background wells to set an upper 
  # limit for 2 compliance wells.  The maximum value from the 3 wells is set 
  # to the 95% confidence upper tolerance limit, and we need to determine the 
  # coverage of this tolerance interval.  The data are stored in EPA.92c.copper2.df.  
  # Note that even though these data are Type I left singly censored, it is still 
  # possible to compute an upper tolerance interval using any of the uncensored 
  # observations as the upper limit. 
  EPA.92c.copper2.df
  #   Copper.orig Copper Censored Month Well  Well.type
  #1           <5    5.0     TRUE     1    1 Background
  #2           <5    5.0     TRUE     2    1 Background
  #3          7.5    7.5    FALSE     3    1 Background
  #...
  #9          9.2    9.2    FALSE     1    2 Background
  #10          <5    5.0     TRUE     2    2 Background
  #11          <5    5.0     TRUE     3    2 Background
  #...
  #17          <5    5.0     TRUE     1    3 Background
  #18         5.4    5.4    FALSE     2    3 Background
  #19         6.7    6.7    FALSE     3    3 Background
  #...
  #29         6.2    6.2    FALSE     5    4 Compliance
  #30          <5    5.0     TRUE     6    4 Compliance
  #31         7.8    7.8    FALSE     7    4 Compliance
  #...
  #38          <5    5.0     TRUE     6    5 Compliance
  #39         5.6    5.6    FALSE     7    5 Compliance
  #40          <5    5.0     TRUE     8    5 Compliance
  with(EPA.92c.copper2.df, 
    tolIntNpar(Copper[Well.type=="Background"], 
      conf.level = 0.95, lb = 0, ti.type = "upper")) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Tolerance Interval Coverage:     88.26538%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Limit Rank(s):         24 
  #
  #Tolerance Interval:              LTL = 0.0
  #                                 UTL = 9.2
  #----------
  # Repeat the last example, except compute an upper 
  # \eqn{\beta}-expectation tolerance interval:
  with(EPA.92c.copper2.df, 
    tolIntNpar(Copper[Well.type=="Background"], 
      cov.type = "expectation", lb = 0, ti.type = "upper")) 
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Tolerance Interval Coverage:     96%
  #
  #Coverage Type:                   expectation
  #
  #Tolerance Interval Method:       Exact
  #
  #Tolerance Interval Type:         upper
  #
  #Tolerance Limit Rank(s):         24 
  #
  #Tolerance Interval:              LTL = 0.0
  #                                 UTL = 9.2
Confidence Level for Nonparametric Tolerance Interval for Continuous Distribution
Description
Compute the confidence level associated with a nonparametric \beta-content tolerance 
interval for a continuous distribution given the sample size, coverage, and ranks of the 
order statistics used for the interval.
Usage
  tolIntNparConfLevel(n, coverage = 0.95, 
    ltl.rank = ifelse(ti.type == "upper", 0, 1), 
    n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1), 
    ti.type = "two.sided")
Arguments
| n | vector of positive integers specifying the sample sizes.  
Missing ( | 
| coverage | numeric vector of values between 0 and 1 indicating the desired coverage of the 
 | 
| ltl.rank | vector of positive integers indicating the rank of the order statistic to use for the lower bound 
of the tolerance interval.  If  | 
| n.plus.one.minus.utl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the tolerance interval.  A value of 
 | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
Details
If the arguments n, coverage, ltl.rank, and 
n.plus.one.minus.utl.rank are not all the same length, they are replicated to be the 
same length as the length of the longest argument.
The help file for tolIntNpar explains how nonparametric \beta-content 
tolerance intervals are constructed and how the confidence level 
associated with the tolerance interval is computed based on specified values 
for the sample size, the coverage, and the ranks of the order statistics used for 
the bounds of the tolerance interval. 
Value
vector of values between 0 and 1 indicating the confidence level associated with the specified nonparametric tolerance interval.
Note
See the help file for tolIntNpar.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, coverage, and confidence level if one of the objectives of 
the sampling program is to produce tolerance intervals.  The functions 
tolIntNparN, tolIntNparCoverage, tolIntNparConfLevel, and 
plotTolIntNparDesign can be used to investigate these relationships for 
constructing nonparametric tolerance intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNpar.
See Also
tolIntNpar, tolIntNparN, tolIntNparCoverage, 
plotTolIntNparDesign.
Examples
  # Look at how the confidence level of a nonparametric tolerance interval increases with 
  # increasing sample size:
  seq(10, 60, by=10) 
  #[1] 10 20 30 40 50 60 
  round(tolIntNparConfLevel(n = seq(10, 60, by = 10)), 2) 
  #[1] 0.09 0.26 0.45 0.60 0.72 0.81
  #----------
  # Look at how the confidence level of a nonparametric tolerance interval decreases with 
  # increasing coverage:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  round(tolIntNparConfLevel(n = 10, coverage = seq(0.5, 0.9, by = 0.1)), 2) 
  #[1] 0.99 0.95 0.85 0.62 0.26
  #----------
  # Look at how the confidence level of a nonparametric tolerance interval decreases with the 
  # rank of the lower tolerance limit:
  round(tolIntNparConfLevel(n = 60, ltl.rank = 1:5), 2) 
  #[1] 0.81 0.58 0.35 0.18 0.08
  #==========
  # Example 17-4 on page 17-21 of USEPA (2009) uses copper concentrations (ppb) from 3 
  # background wells to set an upper limit for 2 compliance wells.  There are 6 observations 
  # per well, and the maximum value from the 3 wells is set to the 95% confidence upper 
  # tolerance limit, and we need to determine the coverage of this tolerance interval.  
  tolIntNparCoverage(n = 24, conf.level = 0.95, ti.type = "upper")
  #[1] 0.8826538
  # Here we will modify the example and determine the confidence level of the tolerance 
  # interval when we set the coverage to 95%. 
  tolIntNparConfLevel(n = 24, coverage = 0.95, ti.type = "upper")
  # [1] 0.708011
Coverage for Nonparametric Tolerance Interval for Continuous Distribution
Description
Compute the coverage associated with a nonparametric tolerance interval for a continuous 
distribution given the sample size, confidence level, coverage type 
(\beta-content versus \beta-expectation), and ranks of the order statistics 
used for the interval.
Usage
  tolIntNparCoverage(n, conf.level = 0.95, cov.type = "content", 
    ltl.rank = ifelse(ti.type == "upper", 0, 1), 
    n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1), 
    ti.type = "two.sided")
Arguments
| n | vector of positive integers specifying the sample sizes.  
Missing ( | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the tolerance interval. | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ltl.rank | vector of positive integers indicating the rank of the order statistic to use for the lower bound 
of the tolerance interval.  If  | 
| n.plus.one.minus.utl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the tolerance interval.  A value of 
 | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
Details
If the arguments n, conf.level, ltl.rank, and 
n.plus.one.minus.utl.rank are not all the same length, they are replicated to be the 
same length as the length of the longest argument.
The help file for tolIntNpar explains how nonparametric \beta-content 
tolerance intervals are constructed and how the coverage  
associated with the tolerance interval is computed based on specified values 
for the sample size, the confidence level, and the ranks of the order statistics used for 
the bounds of the tolerance interval. 
Value
vector of values between 0 and 1 indicating the coverage associated with the specified nonparametric tolerance interval.
Note
See the help file for tolIntNpar.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, coverage, and confidence level if one of the objectives of 
the sampling program is to produce tolerance intervals.  The functions 
tolIntNparN, tolIntNparConfLevel, tolIntNparCoverage, and 
plotTolIntNparDesign can be used to investigate these relationships for 
constructing nonparametric tolerance intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNpar.
See Also
tolIntNpar, tolIntNparN, tolIntNparConfLevel, 
plotTolIntNparDesign.
Examples
  # Look at how the coverage of a nonparametric tolerance interval increases with 
  # increasing sample size:
  seq(10, 60, by=10) 
  #[1] 10 20 30 40 50 60 
  round(tolIntNparCoverage(n = seq(10, 60, by = 10)), 2) 
  #[1] 0.61 0.78 0.85 0.89 0.91 0.92
  #---------
  # Look at how the coverage of a nonparametric tolerance interval decreases with 
  # increasing confidence level:
  seq(0.5, 0.9, by=0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  round(tolIntNparCoverage(n = 10, conf.level = seq(0.5, 0.9, by = 0.1)), 2) 
  #[1] 0.84 0.81 0.77 0.73 0.66
  #----------
  # Look at how the coverage of a nonparametric tolerance interval decreases with 
  # the rank of the lower tolerance limit:
  round(tolIntNparCoverage(n = 60, ltl.rank = 1:5), 2) 
  #[1] 0.92 0.90 0.88 0.85 0.83
  #==========
  # Example 17-4 on page 17-21 of USEPA (2009) uses copper concentrations (ppb) from 3 
  # background wells to set an upper limit for 2 compliance wells.  The maximum value from 
  # the 3 wells is set to the 95% confidence upper tolerance limit, and we need to 
  # determine the coverage of this tolerance interval.  
  tolIntNparCoverage(n = 24, conf.level = 0.95, ti.type = "upper")
  #[1] 0.8826538
Sample Size for Nonparametric Tolerance Interval for Continuous Distribution
Description
Compute the sample size necessary for a nonparametric tolerance interval (for a continuous 
distribution) with a specified coverage and, in the case of a \beta-content tolerance 
interval, a specified confidence level, given the ranks of the order statistics used for the 
interval.
Usage
  tolIntNparN(coverage = 0.95, conf.level = 0.95, cov.type = "content", 
    ltl.rank = ifelse(ti.type == "upper", 0, 1), 
    n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1),  
    ti.type = "two.sided")
Arguments
| coverage | numeric vector of values between 0 and 1 indicating the desired coverage of the tolerance interval. | 
| conf.level | numeric vector of values between 0 and 1 indicating the confidence level of the tolerance interval. | 
| cov.type | character string specifying the coverage type for the tolerance interval.  
The possible values are  | 
| ltl.rank | vector of positive integers indicating the rank of the order statistic to use for the lower bound 
of the tolerance interval.  If  | 
| n.plus.one.minus.utl.rank | vector of positive integers related to the rank of the order statistic to use for 
the upper bound of the tolerance interval.  A value of 
 | 
| ti.type | character string indicating what kind of tolerance interval to compute.  
The possible values are  | 
Details
If the arguments coverage, conf.level, ltl.rank, and 
n.plus.one.minus.utl.rank are not all the same length, they are replicated to be the 
same length as the length of the longest argument.
The help file for tolIntNpar explains how nonparametric tolerance intervals 
are constructed.  
Computing Required Sample Size for a \beta-Content Tolerance Interval (cov.type="content") 
For a \beta-content tolerance interval, if the coverage C=\beta is specified, then the 
associated confidence level (1-\alpha)100\% is computed as:
1 - \alpha = 1 - F(\beta, v-u, w+u) \;\;\;\;\;\; (1)
where F(y, \delta, \gamma) denotes the cumulative distribution function of a 
beta random variable with parameters shape1=\delta and 
shape2=\gamma evaluated at y.  The value of 1-\alpha is determined by 
the argument conf.level.  The value of \beta is determined by the argument 
coverage.  The value of u is determined by the argument ltl.rank.  The value 
of w is determined by the argument 
n.plus.one.minus.utl.rank.  Once these values 
have been determined, the above equation can be solved implicitly for n, since
v = n + 1 - w \;\;\;\;\;\; (2)
Computing Required Sample Size for a \beta-Expectation Tolerance Interval (cov.type="expectation") 
For a \beta-expectation tolerance interval, the expected coverage is simply the mean of a 
beta random variable with parameters shape1=v-u and 
shape2=w+u, which is given by:
E(C) = \frac{v-u}{n+1} \;\;\;\;\;\; (3)
or, using Equation (2) above, we can re-write the formula for the expected coverage as:
E(C) = \frac{n+1-w-u}{n+1} = 1 - \frac{u+w}{n+1} \;\;\;\;\;\; (4)
Thus, for user-specified values of u (ltl.rank), 
w (n.plus.one.minus.utl.rank), and expected coverage, the required sample 
size is computed as:
n = Ceiling\{ [ \frac{u+w}{1-E(C)} ] - 1 \} \;\;\;\;\;\; (5)
where Ceiling(x) denotes the smallest integer greater than or equal to x.  
(See the R help file for ceiling).
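For example, for a two-sided interval based on the minimum and maximum (u = w = 1) and a desired expected coverage of 95%, Equation (5) gives:
  u <- 1; w <- 1
  EC <- 0.95
  ceiling((u + w) / (1 - EC) - 1)
  #[1] 39
so a call such as tolIntNparN(coverage = 0.95, cov.type = "expectation") should return 39.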
Value
A vector of positive integers indicating the required sample size(s) for the specified nonparametric tolerance interval(s).
Note
See the help file for tolIntNpar.
In the course of designing a sampling program, an environmental scientist may wish to determine 
the relationship between sample size, coverage, and confidence level if one of the objectives of 
the sampling program is to produce tolerance intervals.  The functions 
tolIntNparN, tolIntNparCoverage, tolIntNparConfLevel, and 
plotTolIntNparDesign can be used to investigate these relationships for 
constructing nonparametric tolerance intervals.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
See the help file for tolIntNpar.
See Also
tolIntNpar, tolIntNparConfLevel, tolIntNparCoverage, 
plotTolIntNparDesign.
Examples
  # Look at how the required sample size for a nonparametric tolerance interval increases 
  # with increasing confidence level:
  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 
  tolIntNparN(conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1] 34 40 49 59 77
  #----------
  # Look at how the required sample size for a nonparametric tolerance interval increases 
  # with increasing coverage:
  tolIntNparN(coverage = seq(0.5, 0.9, by = 0.1)) 
  #[1]  8 10 14 22 46
  #----------
  # Look at how the required sample size for a nonparametric tolerance interval increases 
  # with the rank of the lower tolerance limit:
  tolIntNparN(ltl.rank = 1:5) 
  #[1]  93 124 153 181 208
  #==========
  # Example 17-4 on page 17-21 of USEPA (2009) uses copper concentrations (ppb) from 3 
  # background wells to set an upper limit for 2 compliance wells.  The maximum value from 
  # the 3 wells is set to the 95% confidence upper tolerance limit, and we need to 
  # determine the coverage of this tolerance interval.  
  tolIntNparCoverage(n = 24, conf.level = 0.95, ti.type = "upper")
  #[1] 0.8826538
  # Here we will modify the example and determine the sample size required to produce 
  # a tolerance interval with 95% confidence level AND 95% coverage. 
  tolIntNparN(coverage = 0.95, conf.level = 0.95, ti.type = "upper")
  #[1] 59
Tolerance Interval for a Poisson Distribution
Description
Construct a \beta-content or \beta-expectation tolerance 
interval for a Poisson distribution.
Usage
  tolIntPois(x, coverage = 0.95, cov.type = "content", ti.type = "two-sided", 
    conf.level = 0.95)
Arguments
| x | numeric vector of observations, or an object resulting from a call to an 
estimating function that assumes a Poisson distribution 
(i.e.,  | 
| coverage | a scalar between 0 and 1 indicating the desired coverage of the tolerance interval.  The default value is coverage=0.95. | 
| cov.type | character string specifying the coverage type for the tolerance interval.  The possible values are "content" (\beta-content; the default) and "expectation" (\beta-expectation). | 
| ti.type | character string indicating what kind of tolerance interval to compute.  The possible values are "two-sided" (the default), "lower", and "upper". | 
| conf.level | a scalar between 0 and 1 indicating the confidence level associated with the tolerance interval.  The default value is conf.level=0.95. | 
Details
If x contains any missing (NA), undefined (NaN) or 
infinite (Inf, -Inf) values, they will be removed prior to 
performing the estimation.
A tolerance interval for some population is an interval on the real line constructed so as to 
contain 100 \beta \% of the population (i.e., 100 \beta \% of all 
future observations), where 0 < \beta < 1.  The quantity 100 \beta \% is called 
the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
- A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1.  The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval. 
- A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%. 
Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is 
equivalent to a prediction interval for one future observation with associated confidence level 
100 \beta \%.  Note that there is no explicit confidence level associated with a 
\beta-expectation tolerance interval.  If a \beta-expectation tolerance interval is 
treated as a \beta-content tolerance interval, the confidence level associated with this 
tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).  
Because of the discrete nature of the Poisson distribution, 
even true tolerance intervals (tolerance intervals based on the true value of 
\lambda) will usually not contain exactly \beta\% of the population.  
For example, for the Poisson distribution with parameter lambda=2, the 
interval [0, 4] contains 94.7% of this distribution and the interval [0, 5] 
contains 98.3% of this distribution.  Thus, no interval can contain exactly 95% 
of this distribution.
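These coverages are easy to verify directly with the Poisson cumulative distribution function:
  ppois(4, lambda = 2)
  #[1] 0.947347
  ppois(5, lambda = 2)
  #[1] 0.9834364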
\beta-Content Tolerance Intervals for a Poisson Distribution 
Zacks (1970) showed that for monotone likelihood ratio (MLR) families of discrete 
distributions, a uniformly most accurate upper \beta100\% \beta-content 
tolerance interval with associated confidence level (1-\alpha)100\% is 
constructed by finding the upper (1-\alpha)100\% confidence limit for the 
parameter associated with the distribution, and then computing the \beta'th 
quantile of the distribution assuming the true value of the parameter is equal to 
the upper confidence limit.  This idea can be extended to one-sided lower and 
two-sided tolerance limits.
It can be shown that all distributions that are one parameter exponential families have the MLR property, and the Poisson distribution is a one-parameter exponential family, so the method of Zacks (1970) can be applied to a Poisson distribution.
Let X denote a Poisson random variable with parameter 
lambda=\lambda.  Let x_{p|\lambda} denote the p'th quantile 
of this distribution. That is,
Pr(X < x_{p|\lambda}) \le p \le Pr(X \le x_{p|\lambda}) \;\;\;\;\;\; (1)
Note that due to the discrete nature of the Poisson distribution, there will be 
several values of p associated with one value of X.  For example, for 
\lambda=2, the value 1 is the p'th quantile for any value of p 
between 0.140 and 0.406.
Let \underline{x} denote a vector of n observations from a 
Poisson distribution with parameter lambda=\lambda.  
When ti.type="upper", the first step is to compute the one-sided upper 
(1-\alpha)100\% confidence limit for \lambda based on the observations 
\underline{x} (see the help file for epois).  Denote this upper 
confidence limit by UCL.  The one-sided upper \beta100\% tolerance limit 
is then given by:
[0, x_{\beta | \lambda = UCL}] \;\;\;\;\;\; (2)
Similarly, when ti.type="lower", the first step is to compute the one-sided 
lower (1-\alpha)100\% confidence limit for \lambda based on the 
observations \underline{x}.  Denote this lower confidence limit by LCL.  
The one-sided lower \beta100\% tolerance limit is then given by:
[x_{1-\beta | \lambda = LCL}, \infty] \;\;\;\;\;\; (3)
Finally, when ti.type="two-sided", the first step is to compute the two-sided 
(1-\alpha)100\% confidence limits for \lambda based on the 
observations \underline{x}.  Denote these confidence limits by LCL and 
UCL. The two-sided \beta100\% tolerance limit is then given by:
[x_{\frac{1-\beta}{2} | \lambda = LCL}, x_{\frac{1+\beta}{2} | \lambda = UCL}] \;\;\;\;\;\; (4)
Note that the function tolIntPois uses the exact confidence limits for 
\lambda when computing \beta-content tolerance limits (see 
epois).
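As an illustration, here is a minimal sketch of the two-sided construction in Equation (4), assuming the usual exact (gamma/chi-square based) confidence limits for \lambda.  The sample statistics below (n = 20, sum = 36, so the estimated mean is 1.8) are those of the simulated data set used in the EXAMPLES section:
  n <- 20;  s <- 36
  beta <- 0.95;  conf <- 0.90;  alpha <- 1 - conf
  lambda.lcl <- qgamma(alpha/2, shape = s) / n          # exact lower limit for lambda
  lambda.ucl <- qgamma(1 - alpha/2, shape = s + 1) / n  # exact upper limit for lambda
  c(LTL = qpois((1 - beta)/2, lambda.lcl),
    UTL = qpois((1 + beta)/2, lambda.ucl))
  #LTL UTL
  #  0   6
which reproduces the tolerance interval [0, 6] returned by tolIntPois in the EXAMPLES section below.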
\beta-Expectation Tolerance Intervals for a Poisson Distribution 
As stated above, a \beta-expectation tolerance interval with coverage 
\beta100\% is equivalent to a prediction interval for one future observation 
with associated confidence level \beta100\%.  This is because the probability 
that any single future observation will fall into this interval is \beta100\%, 
so, out of N future observations, the number that fall into this interval has a 
binomial distribution with parameters 
size=N and prob=\beta.  Hence the expected proportion of 
future observations that fall into this interval is \beta100\% and is 
independent of the value of N.  See the help file for predIntPois 
for information on how these intervals are constructed.
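The equivalence can be checked numerically.  This is only a sketch: it assumes (as for other EnvStats estimation functions) that the interval limits are stored in the interval$limits component of the returned object, and the two calls should produce the same limits:
  set.seed(250)
  dat <- rpois(20, 2)
  tolIntPois(dat, coverage = 0.95, cov.type = "expectation")$interval$limits
  predIntPois(dat, conf.level = 0.95)$interval$limits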
Value
If x is a numeric vector, tolIntPois returns a list of class 
"estimate" containing the estimated parameters, a component called 
interval containing the tolerance interval information, and other 
information.  See estimate.object for details.
If x is the result of calling an estimation function, tolIntPois 
returns a list whose class is the same as x.  The list contains the same 
components as x.  If x already has a component called 
interval, this component is replaced with the tolerance interval 
information.
Note
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Gibbons (1987b) used the Poisson distribution to model the number of detected 
compounds per scan of the 32 volatile organic priority pollutants (VOC), and 
also to model the distribution of chemical concentration (in ppb).  He explained 
the derivation of a one-sided upper \beta-content tolerance limit for a 
Poisson distribution based on the work of Zacks (1970) using the Pearson-Hartley 
approximation to the confidence limits for the mean parameter \lambda 
(see the help file for epois).  Note that there are several 
typographical errors in the derivation and examples on page 575 of Gibbons (1987b) 
because there is confusion between where the value of \beta (the coverage) 
should be and where the value of 1-\alpha (the confidence level) should be.  
Gibbons et al. (2009, pp.103-104) give the correct formulas.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572–580.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
Zacks, S. (1970). Uniformly Most Accurate Upper Tolerance Limits for Monotone Likelihood Ratio Families of Discrete Distributions. Journal of the American Statistical Association 65, 307–316.
See Also
Poisson, epois, eqpois, 
estimate.object, Tolerance Intervals, 
Estimating Distribution Parameters, Estimating Distribution Quantiles.
Examples
  # Generate 20 observations from a Poisson distribution with parameter 
  # lambda=2. The interval [0, 4] contains 94.7% of this distribution and 
  # the interval [0,5] contains 98.3% of this distribution.  Thus, because 
  # of the discrete nature of the Poisson distribution, no interval contains 
  # exactly 95% of this distribution.  Use tolIntPois to estimate the mean 
  # parameter of the true distribution, and construct a one-sided upper 95% 
  # beta-content tolerance interval with associated confidence level 90%. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rpois(20, 2) 
  tolIntPois(dat, conf.level = 0.9)
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Zacks
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                90%
  #
  #Tolerance Interval:              LTL = 0
  #                                 UTL = 6
  #------
  # Clean up
  rm(dat)
Two-Sample Linear Rank Test to Detect a Difference Between Two Distributions
Description
Two-sample linear rank test to detect a difference (usually a shift) between two
distributions.  The Wilcoxon Rank Sum test is a special case of
a linear rank test.  The function 
twoSampleLinearRankTest is part of
EnvStats mainly because this help file gives the necessary background to
explain two-sample linear rank tests for censored data (see 
twoSampleLinearRankTestCensored).
Usage
  twoSampleLinearRankTest(x, y, location.shift.null = 0, scale.shift.null = 1,
    alternative = "two.sided", test = "wilcoxon", shift.type = "location")
Arguments
| x | numeric vector of values for the first sample.
Missing ( | 
| y | numeric vector of values for the second sample.
Missing ( | 
| location.shift.null | numeric scalar indicating the hypothesized value of  | 
| scale.shift.null | numeric scalar indicating the hypothesized value of  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values
are  | 
| test | character string indicating which linear rank test to use.  The possible values are "wilcoxon" (the default), "normal.scores", "moods.median", and "savage.scores". | 
| shift.type | character string indicating which kind of shift is being tested.  The possible values are "location" (the default) and "scale". | 
Details
The function twoSampleLinearRankTest allows you to compare two samples using
a locally most powerful rank test (LMPRT) to determine whether the two samples come
from the same distribution.  The sections below explain the concepts of location and
scale shifts, linear rank tests, and LMPRT's.
Definitions of Location and Scale Shifts 
Let X denote a random variable representing measurements from group 1 with
cumulative distribution function (cdf):
F_1(t) = Pr(X \le t) \;\;\;\;\;\; (1)
and let x_1, x_2, \ldots, x_m denote m independent observations from this
distribution.  Let Y denote a random variable from group 2 with cdf:
F_2(t) = Pr(Y \le t) \;\;\;\;\;\; (2)
and let y_1, y_2, \ldots, y_n denote n independent observations from this
distribution.  Set N = m + n.
General Hypotheses to Test Differences Between Two Populations 
A very general hypothesis to test whether two distributions are the same is
given by:
H_0: F_1(t) = F_2(t), -\infty < t < \infty \;\;\;\;\;\; (3)
versus the two-sided alternative hypothesis:
H_a: F_1(t) \ne F_2(t) \;\;\;\;\;\; (4)
with strict inequality for at least one value of t.
The two possible one-sided hypotheses would be:
H_0: F_1(t) \ge F_2(t) \;\;\;\;\;\; (5)
versus the alternative hypothesis:
H_a: F_1(t) < F_2(t) \;\;\;\;\;\; (6)
and
H_0: F_1(t) \le F_2(t) \;\;\;\;\;\; (7)
versus the alternative hypothesis:
H_a: F_1(t) > F_2(t) \;\;\;\;\;\; (8)
A similar set of hypotheses to test whether the two distributions are the same are given by (Conover, 1980, p. 216):
H_0: Pr(X < Y) = 1/2 \;\;\;\;\;\; (9)
versus the two-sided alternative hypothesis:
H_a: Pr(X < Y) \ne 1/2 \;\;\;\;\;\; (10)
or
H_0: Pr(X < Y) \ge 1/2 \;\;\;\;\;\; (11)
versus the alternative hypothesis:
H_a: Pr(X < Y) < 1/2 \;\;\;\;\;\; (12)
or
H_0: Pr(X < Y) \le 1/2 \;\;\;\;\;\; (13)
versus the alternative hypothesis:
H_a: Pr(X < Y) > 1/2 \;\;\;\;\;\; (14)
Note that this second set of hypotheses (9)–(14) is not equivalent to the
set of hypotheses (3)–(8).  For example, if X takes on the values 1 and 4
with probability 1/2 for each, and Y only takes on values in the interval
(1, 4) with strict inequality at the endpoints (e.g., Y takes on the values
2 and 3 with probability 1/2 for each), then the null hypothesis (9) is
true but the null hypothesis (3) is not true.  However, the null hypothesis (3)
implies the null hypothesis (9), (5) implies (11), and (7) implies (13).
Location Shift 
A special case of the alternative hypotheses (4), (6), and (8) above is the
location shift alternative:
H_a: F_1(t) = F_2(t - \Delta) \;\;\;\;\;\; (15)
where \Delta denotes the shift between the two groups.  (Note: some references
refer to (15) above as a shift in the median, but in fact this kind of shift
represents a shift in every single quantile, not just the median.)
If \Delta is positive, this means that observations in group 1 tend to be
larger than observations in group 2, and if \Delta is negative, observations
in group 1 tend to be smaller than observations in group 2.
The alternative hypothesis (15) is called a location shift: the only difference between the two distributions is a difference in location (e.g., the standard deviation is assumed to be the same for both distributions). A location shift is not applicable to distributions that are bounded below or above by some constant, such as a lognormal distribution. For lognormal distributions, the location shift could refer to a shift in location of the distribution of the log-transformed observations.
For a location shift, the null hypotheses (3) can be generalized as:
H_0: F_1(t) = F_2(t - \Delta_0), -\infty < t < \infty \;\;\;\;\;\; (16)
where \Delta_0 denotes the null shift between the two groups.  Almost always,
however, the null shift is taken to be 0 and we will assume this for the rest of this
help file.
Alternatively, the null and alternative hypotheses can be written as
H_0: \Delta = 0 \;\;\;\;\;\; (17)
versus the alternative hypothesis
H_a: \Delta > 0 \;\;\;\;\;\; (18)
The other one-sided alternative hypothesis (\Delta < 0) and two-sided
alternative hypothesis (\Delta \ne 0) could be considered as well.
The general hypotheses (3)-(14) are not location shift hypotheses
(e.g., the standard deviation does not have to be the same for both distributions),
but they do allow for distributions that are bounded below or above by a constant
(e.g., lognormal distributions).
Scale Shift 
A special kind of scale shift replaces the alternative hypothesis (15) with the
alternative hypothesis:
H_a: F_1(t) = F_2(t/\tau) \;\;\;\;\;\; (19)
where \tau denotes the shift in scale between the two groups.  Alternatively,
the null and alternative hypotheses for this scale shift can be written as
H_0: \tau = 1 \;\;\;\;\;\; (20)
versus the alternative hypothesis
H_a: \tau > 1 \;\;\;\;\;\; (21)
The other one-sided alternative hypothesis (t < 1) and two-sided alternative
hypothesis (t \ne 1) could be considered as well.
This kind of scale shift often involves a shift in both location and scale.  For
example, suppose the underlying distribution for both groups is
exponential, with parameter rate=\lambda.  Then
the mean and standard deviation of the reference group is 1/\lambda, while
the mean and standard deviation of the treatment group is \tau/\lambda.  In
this case, the alternative hypothesis (21) implies the more general alternative
hypothesis (8).
Linear Rank Tests 
The usual nonparametric test to test the null hypothesis of the same distribution
for both groups versus the location-shift alternative (18) is the
Wilcoxon Rank Sum test
(Gilbert, 1987, pp.247-250; Helsel and Hirsch, 1992, pp.118-123;
Hollander and Wolfe, 1999).  Note that the Mann-Whitney U test is equivalent to the
Wilcoxon Rank Sum test (Hollander and Wolfe, 1999; Conover, 1980, p.215,
Zar, 2010).  Hereafter, this test will be abbreviated as the MWW test.  The MWW test
is performed by combining the m X observations with the n Y
observations and ranking them from smallest to largest, and then computing the
statistic
W = \sum_{i=1}^m R_i \;\;\;\;\;\; (22)
where R_1, R_2, \ldots, R_m denote the ranks of the X observations when
the X and Y observations are combined and ranked.  The null
hypothesis (5), (11), or (17) is rejected in favor of the alternative hypothesis
(6), (12)  or (18) if the value of W is too large.  For small sample sizes,
the exact distribution of W under the null hypothesis is fairly easy to
compute and may be found in tables (e.g., Hollander and Wolfe, 1999;
Conover, 1980, pp.448-452).  For larger sample sizes, a normal approximation is
usually used (Hollander and Wolfe, 1999; Conover, 1980, p.217).  For the
R function wilcox.test, an exact p-value is computed if the
samples contain less than 50 finite values and there are no ties.
It is important to note that the MWW test is actually testing the more general hypotheses (9)-(14) (Conover, 1980, p.216; Divine et al., 2013), even though it is often presented as only applying to location shifts.
The MWW W-statistic in Equation (22) is an example of a linear rank statistic (Hettmansperger, 1984, p.147; Prentice, 1985), which is any statistic that can be written in the form:
L = \sum_{i=1}^m a(R_i) \;\;\;\;\;\; (23)
where a() denotes a score function.  Statistics of this form are also called
general scores statistics (Hettmansperger, 1984, p.147).  The MWW test
uses the identity score function:
a(R_i) = R_i \;\;\;\;\;\; (24)
Any test based on a linear rank statistic is called a linear rank test.
Under the null hypothesis (3), (9), (17), or (20), the distribution of the linear
rank statistic L does not depend on the form of the underlying distribution of
the X and Y observations.  Hence, tests based on L are
nonparametric (also called distribution-free).  If the null hypothesis is not true,
however, the distribution of L will depend not only on the distributions of the
X and Y observations, but also upon the form the score function
a().
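As a concrete illustration, using the same simulated data as the first example in the EXAMPLES section below, the MWW statistic in Equation (22) can be computed directly from the ranks; the statistic reported by wilcox.test is this sum minus its smallest possible value m(m+1)/2:
  set.seed(346)
  x <- rnorm(15, mean = 3)
  y <- rnorm(10, mean = 3.5)
  m <- length(x)
  R <- rank(c(x, y))[1:m]       # ranks of the x's in the combined sample
  sum(R)                        # Equation (22)
  #[1] 152
  sum(R) - m * (m + 1) / 2      # the statistic reported by wilcox.test
  #[1] 32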
Locally Most Powerful Linear Rank Tests 
The decision of what scores to use may be based on considering the power of the test.
A locally most powerful rank test (LMPRT) of the null hypothesis (17) versus the
alternative (18) maximizes the slope of the power (as a function of \Delta) in
the neighborhood where \Delta=0.  A LMPRT of the null hypothesis (20) versus
the alternative (21) maximizes the slope of the power (as a function of \tau)
in the neighborhood where \tau=1.  That is, LMPRT's are the best linear rank
test you can use for detecting small shifts in location or scale.
Table 1 below shows the score functions associated with the LMPRT's for various assumed underlying distributions (Hettmansperger, 1984, Chapter 3; Millard and Deverel, 1988, p.2090). A test based on the identity score function of Equation (24) is equivalent to a test based on the score shown in Table 1 associated with the logistic distribution, thus the MWW test is the LMPRT for detecting a location shift when the underlying observations follow the logistic distribution. When the underlying distribution is normal or lognormal, the LMPRT for a location shift uses the “Normal scores” shown in Table 1. When the underlying distribution is exponential, the LMPRT for detecting a scale shift is based on the “Savage scores” shown in Table 1.
Table 1.  Scores of LMPRT's for Various Distributions
| Distribution | Score a(R_i) | Shift Type | Test Name | 
| Logistic | [2/(N+1)]R_i - 1 | Location | Wilcoxon Rank Sum | 
| Normal or Lognormal (log-scale) | \Phi^{-1}[R_i/(N+1)]* | Location | Van der Waerden or Normal scores | 
| Double Exponential | sign[R_i - (N+1)/2] | Location | Mood's Median | 
| Exponential or Extreme Value | \sum_{j=1}^{R_i} (N-j+1)^{-1} | Scale | Savage scores | 
* Denotes an approximation to the true score.  The symbol \Phi denotes the
cumulative distribution function of the standard normal distribution, and sign
denotes the sign function.
A large sample normal approximation to the distribution of the linear rank statistic
L for arbitrary score functions is given by Hettmansperger (1984, p.148).
Under the null hypothesis (17) or (20), the mean and variance of L are given by:
E(L) = \mu_L = \frac{m}{N} \sum_{i=1}^N a_i = m \bar{a} \;\;\;\;\;\; (25)
Var(L) = \sigma_L^2 = \frac{mn}{N(N-1)} \sum_{i=1}^N (a_i - \bar{a})^2 \;\;\;\;\;\; (26)
Hettmansperger (1984, Chapter 3) shows that under the null hypothesis of no difference between the two groups, the statistic
z = \frac{L - \mu_L}{\sigma_L} \;\;\;\;\;\; (27)
is approximately distributed as a standard normal random variable for “large” sample sizes. This statistic will tend to be large if the observations in group 1 tend to be larger than the observations in group 2.
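For example, here is a minimal sketch (not the code used by twoSampleLinearRankTest) of this statistic based on the normal scores from Table 1, applied to the simulated data used in the first example of the EXAMPLES section below:
  set.seed(346)
  x <- rnorm(15, mean = 3)
  y <- rnorm(10, mean = 3.5)
  m <- length(x);  n <- length(y);  N <- m + n
  a <- qnorm(rank(c(x, y)) / (N + 1))       # normal scores for the combined sample
  L <- sum(a[1:m])                          # Equation (23) for the x sample
  mu.L <- m * mean(a)                                            # Equation (25)
  sd.L <- sqrt(m * n / (N * (N - 1)) * sum((a - mean(a))^2))     # Equation (26)
  z <- (L - mu.L) / sd.L                                         # Equation (27)
  c(z = z, p.value = 2 * pnorm(-abs(z)))
These values should match the z statistic (-2.43) and p-value (0.015) reported by twoSampleLinearRankTest(x, y, test = "normal.scores") in the EXAMPLES section below.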
Value
a list of class "htestEnvStats" containing the results of the hypothesis test.
See the help file for htestEnvStats.object for details.
Note
The Wilcoxon Rank Sum test, also known as the Mann-Whitney U test, is the standard nonparametric test used to test for differences between two groups (e.g., Zar, 2010; USEPA, 2009, pp.16-14 to 16-20). Other possible nonparametric tests include linear rank tests based on scores other than the ranks, including the “normal scores” test and the “Savage scores” tests. The normal scores test is actually slightly more powerful than the Wilcoxon Rank Sum test for detecting small shifts in location if the underlying distribution is normal or lognormal. In general, however, there will be little difference between these two tests.
The results of calling the function twoSampleLinearRankTest with the
argument test="wilcoxon" will match those of calling the built-in
R function wilcox.test with the arguments exact=FALSE and
correct=FALSE.  In general, it is better to use the built-in function
wilcox.test for performing the Wilcoxon Rank Sum test, since this
function can compute exact (rather than approximate) p-values.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 4.
Divine, G., H.J. Norton, R. Hunt, and J. Dinemann. (2013). A Review of Analysis and Sample Size Calculation Considerations for Wilcoxon Tests. Anesthesia & Analgesia 117, 699–710.
Hettmansperger, T.P. (1984). Statistical Inference Based on Ranks. John Wiley and Sons, New York, 323pp.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Millard, S.P., and S.J. Deverel. (1988). Nonparametric Statistical Methods for Comparing Two Sites Based on Data With Multiple Nondetect Limits. Water Resources Research, 24(12), 2087–2098.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.432–435.
Prentice, R.L. (1985). Linear Rank Tests. In Kotz, S., and N.L. Johnson, eds. Encyclopedia of Statistical Science. John Wiley and Sons, New York. Volume 5, pp.51–58.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
wilcox.test, twoSampleLinearRankTestCensored,
htestEnvStats.object.
Examples
  # Generate 15 observations from a normal distribution with parameters
  # mean=3 and sd=1.  Call these the observations from the reference group.
  # Generate 10 observations from a normal distribution with parameters
  # mean=3.5 and sd=1.  Call these the observations from the treatment group.
  # Compare the results of calling wilcox.test to those of calling
  # twoSampleLinearRankTest with test="normal.scores".
  # (The call to set.seed allows you to reproduce this example.)
  set.seed(346)
  x <- rnorm(15, mean = 3)
  y <- rnorm(10, mean = 3.5)
  wilcox.test(x, y)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 location shift = 0
  #
  #Alternative Hypothesis:          True location shift is not equal to 0
  #
  #Test Name:                       Wilcoxon rank sum test
  #
  #Data:                            x and y
  #
  #Test Statistic:                  W = 32
  #
  #P-value:                         0.0162759
  twoSampleLinearRankTest(x, y, test = "normal.scores")
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) != Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Normal Scores Test
  #                                 Based on Normal Approximation
  #
  #Data:                            x = x
  #                                 y = y
  #
  #Sample Sizes:                    nx = 15
  #                                 ny = 10
  #
  #Test Statistic:                  z = -2.431099
  #
  #P-value:                         0.01505308
  #----------
  # Clean up
  #---------
  rm(x, y)
  #==========
  # Following Example 6.6 on pages 6.22-6.26 of USEPA (1994b), perform the
  # Wilcoxon Rank Sum test for the TcCB data (stored in EPA.94b.tccb.df).
  # There are m=47 observations from the reference area and n=77 observations
  # from the cleanup unit.  Then compare the results using the other available
  # linear rank tests.  Note that Mood's median test yields a p-value less
  # than 0.10, while the other tests yield non-significant p-values.
  # In this case, Mood's median test is picking up the residual contamination
  # in the cleanup unit. (See the example in the help file for quantileTest.)
  names(EPA.94b.tccb.df)
  #[1] "TcCB.orig" "TcCB"      "Censored"  "Area"
  summary(EPA.94b.tccb.df$Area)
  #  Cleanup Reference
  #       77        47
  with(EPA.94b.tccb.df,
    twoSampleLinearRankTest(TcCB[Area=="Cleanup"], TcCB[Area=="Reference"]))
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) != Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Wilcoxon Rank Sum Test
  #                                 Based on Normal Approximation
  #
  #Data:                            x = TcCB[Area == "Cleanup"]
  #                                 y = TcCB[Area == "Reference"]
  #
  #Sample Sizes:                    nx = 77
  #                                 ny = 47
  #
  #Test Statistic:                  z = -1.171872
  #
  #P-value:                         0.2412485
  with(EPA.94b.tccb.df,
    twoSampleLinearRankTest(TcCB[Area=="Cleanup"],
      TcCB[Area=="Reference"], test="normal.scores"))$p.value
  #[1] 0.3399484
  with(EPA.94b.tccb.df,
    twoSampleLinearRankTest(TcCB[Area=="Cleanup"],
      TcCB[Area=="Reference"], test="moods.median"))$p.value
  #[1] 0.09707393
  with(EPA.94b.tccb.df,
    twoSampleLinearRankTest(TcCB[Area=="Cleanup"],
      TcCB[Area=="Reference"], test="savage.scores"))$p.value
  #[1] 0.2884351
Two-Sample Linear Rank Test to Detect a Difference Between Two Distributions Based on Censored Data
Description
Two-sample linear rank test to detect a difference (usually a shift) between two distributions based on censored data.
Usage
  twoSampleLinearRankTestCensored(x, x.censored, y, y.censored,
    censoring.side = "left", location.shift.null = 0, scale.shift.null = 1,
    alternative = "two.sided", test = "logrank", variance = "hypergeometric",
    surv.est = "prentice", shift.type = "location")
Arguments
| x | numeric vector of values for the first sample.
Missing ( | 
| x.censored | numeric or logical vector indicating which values of  | 
| y | numeric vector of values for the second sample.
Missing ( | 
| y.censored | numeric or logical vector indicating which values of  | 
| censoring.side | character string indicating on which side the censoring occurs for the data in
 | 
| location.shift.null | numeric scalar indicating the hypothesized value of  | 
| scale.shift.null | numeric scalar indicating the hypothesized value of  | 
| alternative | character string indicating the kind of alternative hypothesis.  The possible values
are  | 
| test | character string indicating which linear rank test to use.  The possible values are:
 | 
| variance | character string indicating which kind of variance to compute for the test.  The possible values are "hypergeometric" (the default), "permutation", and "asymptotic". | 
| surv.est | character string indicating what method to use to estimate the survival function.  The possible values are "prentice" (the default), "kaplan-meier", "peto-peto", and "altshuler". | 
| shift.type | character string indicating which kind of shift is being tested.  The possible values are "location" (the default) and "scale". | 
Details
The function twoSampleLinearRankTestCensored allows you to compare two
samples containing censored observations using a linear rank test to determine
whether the two samples came from the same distribution.  The help file for
twoSampleLinearRankTest explains linear rank tests for complete data
(i.e., no censored observations are present), and here we assume you are
familiar with that material.  The sections below explain how linear
rank tests can be extended to the case of censored data.
Notation 
Several authors have proposed extensions of the MWW test to the case of censored
data, mainly in the context of survival analysis (e.g., Breslow, 1970; Cox, 1972;
Gehan, 1965; Mantel, 1966; Peto and Peto, 1972; Prentice, 1978).  Prentice (1978)
showed how all of these proposed tests are extensions of a linear rank test to the
case of censored observations.
Survival analysis usually deals with right-censored data, whereas environmental data is rarely right-censored but often left-censored (some observations are reported as less than some detection limit). Fortunately, all of the methods developed for right-censored data can be applied to left-censored data as well. (See the sub-section Left-Censored Data below.)
In order to explain Prentice's (1978) generalization of linear rank tests to censored
data, we will use the following notation that closely follows Prentice (1978),
Prentice and Marek (1979), and Latta (1981).
Let X denote a random variable representing measurements from group 1 with
cumulative distribution function (cdf):
F_1(t) = Pr(X \le t) \;\;\;\;\;\; (1)
and let x_1, x_2, \ldots, x_m denote m independent observations from this
distribution.  Let Y denote a random variable from group 2 with cdf:
F_2(t) = Pr(Y \le t) \;\;\;\;\;\; (2)
and let y_1, y_2, \ldots, y_n denote n independent observations from this
distribution.  Set N = m + n, the total number of observations.
Assume the data are right-censored so that some observations are only recorded as
greater than some censoring level, with possibly several different censoring levels.
Let t_1, t_2, \ldots, t_k denote the k ordered, unique, uncensored
observations for the combined samples (in the context of survival data, t usually stands
for “time of death”).  For i = 1, 2, \ldots, k, let d_{1i}
denote the number of observations from sample 1 (the X observations) that are
equal to t_i, and let d_{2i} denote the observations from sample 2 (the
Y observations) equal to this value.  Set
d_i = d_{1i} + d_{2i} \;\;\;\;\;\; (3)
the total number of observations equal to t_i.  If there are no tied
uncensored observations, then d_i = 1 for i = 1, 2, \ldots, k;
otherwise, d_i is greater than 1 for at least one value of i.
For i = 1, 2, \ldots, k, let e_{1i} denote the number of censored
observations from sample 1 (the X observations) with censoring levels that
fall into the interval [t_i, t_{i+1}) where t_{k+1} = \infty by
definition, and let e_{2i} denote the number of censored observations from
sample 2 (the Y observations) with censoring levels that fall into this
interval.  Set
e_i = e_{1i} + e_{2i} \;\;\;\;\;\; (4)
the total number of censoring levels that fall into this interval.
Finally, set n_{1i} equal to the number of observations from sample 1
(uncensored and censored) known to be greater than or equal to t_i, i.e.,
that lie in the interval [t_i, \infty),
set n_{2i} equal to the number of observations from sample 2
(uncensored and censored) that lie in this interval, and set
n_i = n_{1i} + n_{2i} \;\;\;\;\;\; (5)
In survival analysis jargon, n_{1i} denotes the number of people from
sample 1 who are “at risk” at time t_i, that is, these people are
known to still be alive at this time.  Similarly, n_{2i} denotes the number
of people from sample 2 who are at risk at time t_i, and n_i denotes
the total number of people at risk at time t_i.
Score Statistics for Multiply Censored Data 
Prentice's (1978) generalization of the two-sample score (linear rank) statistic is
given by:
\nu = \sum_{i=1}^k (d_{1i} c_i + e_{1i} C_i) \;\;\;\;\;\; (6)
where c_i and C_i denote the scores associated with the uncensored and
censored observations, respectively.  As for complete data, the form of the scores
depends upon the assumed underlying distribution.  Table 1 below shows scores for
various assumed distributions as presented in Prentice (1978) and Latta (1981)
(also see Table 5 of Millard and Deverel, 1988, p.2091).
The last column shows what these tests reduce to in the case of complete data
(no censored observations).
Table 1.  Scores Associated with Various Censored Data Rank Tests
| Distribution | Uncensored Score (c_i) | Censored Score (C_i) | Test Name | Uncensored Analogue | 
| Logistic | 2\hat{F}_i - 1 | \hat{F}_i | Peto-Peto | Wilcoxon Rank Sum | 
| " | i - n_i | i | Gehan or Breslow | " | 
| " | i - \sqrt{n_i} | i | Tarone-Ware | " | 
| Normal, Lognormal | \Phi^{-1}(\hat{F}_i) | \phi(c_i)/\hat{S}_i | Normal Scores 1 | Normal Scores | 
| " | " | \frac{n_i C_{i-1} - c_i}{n_i-1} | Normal Scores 2 | " | 
| Double Exponential | sign(\hat{F}_i - 0.5) | \frac{\hat{F}_i}{1-\hat{F}_i} if \hat{F}_i < 0.5;  1 if \hat{F}_i \ge 0.5 | Generalized Sign | Mood's Median | 
| Exponential, Extreme Value | -log(\tilde{S}_i) - 1 | -log(\tilde{S}_i) | Logrank | Savage Scores | 
In Table 1 above, \Phi denotes the cumulative distribution function of the
standard normal distribution, \phi denotes the probability density function
of the standard normal distribution, and sign denotes the sign
function.  Also, the quantities \hat{F}_i and \hat{S}_i
denote the estimates of the cumulative distribution function (cdf) and survival
function, respectively, at time t_i for the combined sample.  The estimated
cdf is related to the estimated survival function by:
\hat{F}_i = 1 - \hat{S}_i \;\;\;\;\;\; (7)
The quantity \tilde{S}_i denotes the Altshuler (1970) estimate of the
survival function at time t_i for the combined sample (see below).
The argument surv.est determines what method to use to estimate the survival
function.  When surv.est="prentice" (the default), the survival function is
estimated as:
\hat{S}_{i,P} = \prod_{j=1}^{i} \frac{n_j - d_j + 1}{n_j + 1} \;\;\;\;\;\; (8)
(Prentice, 1978).  When surv.est="kaplan-meier", the survival function is
estimated as:
\hat{S}_{i,KM} = \prod_{j=1}^{i} \frac{n_j - d_j}{n_j} \;\;\;\;\;\; (9)
(Kaplan and Meier, 1958), and when surv.est="peto-peto", the survival
function is estimated as:
\hat{S}_{i,PP} = \frac{1}{2} (\hat{S}_{i,KM} + \hat{S}_{i-1, KM}) \;\;\;\;\;\; (10)
where \hat{S}_{0,KM} = 1 (Peto and Peto, 1972).  All three of these estimators
of the survival function should produce very similar results.  When
surv.est="altshuler", the survival function is estimated as:
\tilde{S}_i = exp(-\sum_{j=1}^i \frac{d_j}{n_j}) \;\;\;\;\;\; (11)
(Altshuler, 1970). The scores for the logrank test use this estimator of survival.
Lee and Wang (2003, p. 116) present a slightly different version of the Peto-Peto
test. They use the Peto-Peto estimate of the survival function for c_i, but
use the Kaplan-Meier estimate of the survival function for C_i.
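To make the notation concrete, here is a minimal sketch, using a small made-up right-censored data set (none of these numbers come from a real study), that computes n_j, d_j, and the four estimates of the survival function in Equations (8)-(11) for the combined sample:
  obs <- c(3, 5, 5, 6, 8, 9, 12, 15)    # combined sample (both groups together)
  cen <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)  # TRUE = right-censored
  t.j <- sort(unique(obs[!cen]))        # unique uncensored values t_1 < ... < t_k
  d.j <- sapply(t.j, function(t) sum(!cen & obs == t))   # "deaths" d_j at t_j
  n.j <- sapply(t.j, function(t) sum(obs >= t))          # number at risk n_j at t_j
  S.prentice  <- cumprod((n.j - d.j + 1) / (n.j + 1))    # Equation (8)
  S.km        <- cumprod((n.j - d.j) / n.j)              # Equation (9)
  S.peto.peto <- (S.km + c(1, S.km[-length(S.km)])) / 2  # Equation (10)
  S.altshuler <- exp(-cumsum(d.j / n.j))                 # Equation (11)
  round(data.frame(t.j, n.j, d.j, S.prentice, S.km, S.peto.peto, S.altshuler), 3)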
The scores for the “Normal Scores 1” test shown in Table 1 above are based on
the approximation (30) of Prentice (1978).  The scores for the
“Normal Scores 2” test are based on equation (7) of Prentice and Marek (1979).
For the “Normal Scores 2” test, the following rules are used to construct the
scores for the censored observations:  C_0 = 0, and
C_k = 0 if n_k = 1.
The Distribution of the Score Statistic 
Under the null hypothesis that the two distributions are the same, the expected
value of the score statistic \nu in Equation (6) is 0.  The variance of
\nu can be computed in at least three different ways.  If the censoring
mechanism is the same for both groups, the permutation variance is
appropriate (variance="permutation") and its estimate is given by:
\hat{\sigma}_{\nu}^2 = \frac{mn}{N(N-1)} \sum_{i=1}^k (d_i c_i^2 + e_i C_i^2) \;\;\;\;\;\; (12)
Often, however, it is not clear whether this assumption is valid, and both Prentice (1978) and Prentice and Marek (1979) caution against using the permutation variance (Prentice and Marek, 1979, state it can lead to inflated estimates of variance).
If the censoring mechanisms for the two groups are not necessarily the same, a more
general estimator of the variance is based on a conditional permutation approach.  In
this case, the statistic \nu in Equation (6) is re-written as:
\nu = \sum_{i=1}^k w_i [d_{1i} - d_i \frac{n_{1i}}{n_i}] \;\;\;\;\;\; (13)
where
w_i = c_i - C_i \;\;\;\;\;\; (14)
c_i and C_i are given above in Table 1,
and the conditional permutation or hypergeometric estimate
(variance="hypergeometric") is given by:
\hat{\sigma}_{\nu}^2 = \sum_{i=1}^k d_i w_i^2 (\frac{n_{1i}}{n_i}) (1 - \frac{n_{1i}}{n_i}) (\frac{n_i - d_i}{n_i - 1}) \;\;\;\;\;\; (15)
(Prentice and Marek, 1979; Latta, 1981; Millard and Deverel, 1988). Note that Equation (13) can be thought of as the sum of weighted values of observed minus expected observations.
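Continuing with the same made-up right-censored data (repeated here so the sketch is self-contained), Equations (13)-(15) can be computed directly for the logrank test, whose weights are w_i = 1 (up to sign; the sign convention does not affect the two-sided p-value):
  obs <- c(3, 5, 5, 6, 8, 9, 12, 15)
  cen <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
  grp <- c(1, 1, 2, 1, 2, 1, 2, 2)     # 1 = "x" sample, 2 = "y" sample
  t.j  <- sort(unique(obs[!cen]))
  d.j  <- sapply(t.j, function(t) sum(!cen & obs == t))
  n.j  <- sapply(t.j, function(t) sum(obs >= t))
  d1.j <- sapply(t.j, function(t) sum(!cen & obs == t & grp == 1))
  n1.j <- sapply(t.j, function(t) sum(obs >= t & grp == 1))
  w.j  <- rep(1, length(t.j))          # logrank weights
  nu     <- sum(w.j * (d1.j - d.j * n1.j / n.j))               # Equation (13)
  var.nu <- sum(d.j * w.j^2 * (n1.j / n.j) * (1 - n1.j / n.j) *
                (n.j - d.j) / (n.j - 1))                       # Equation (15)
  nu / sqrt(var.nu)    # the z-statistic of Equation (19) below; about 1.48 here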
Prentice (1978) derived an asymptotic estimator of the variance of the score
statistic \nu given in Equation (6) above based on the log likelihood of the
rank vector (variance="asymptotic").  This estimator is the same as the
hypergeometric variance estimator for the logrank and Gehan tests (assuming no
tied uncensored observations), but for the Peto-Peto test, this estimator is
given by:
\hat{\sigma}_{\nu}^2 = \sum_{i=1}^k \{ \hat{S}_i (1 - a_i) b_i - (a_i - \hat{S}_i) b_i [\hat{S}_i b_i + 2 \sum_{j=i+1}^k \hat{S}_j b_j ]\} \;\;\;\;\;\; (16)
where
a_i = \prod_{j=1}^i \frac{n_j + 1}{n_j + 2} \;\;\;\;\;\; (17)
b_i = 2 d_{1i} + e_{1i} \;\;\;\;\;\; (18)
(Prentice, 1978; Latta, 1981; Millard and Deverel, 1988).  Note that equation (14)
of Millard and Deverel (1988) contains a typographical error.
The Treatment of Ties 
If the hypergeometric estimator of variance is being used, no modifications need to
be made for ties; Equations (13)-(15) already account for ties.  For the case of the
permutation or asymptotic variance estimators, Equations (6), (12), and (16) all
assume no ties in the uncensored observations.  If ties exist in the uncensored
observations, Prentice (1978) suggests computing the scores shown in Table 1
above as if there were no ties, and then assigning average scores to the
tied observations.  (This modification also applies to the quantities
a_i and \hat{S}_i in Equation (16) above.)  For this algorithm, the
statistic in Equation (6) is not in general the same as the one in Equation (13).
Computing a Test Statistic 
Under the null hypothesis that the two distributions are the same, the statistic
z = \frac{\nu}{\hat{\sigma}_{\nu}} \;\;\;\;\;\; (19)
is approximately distributed as a standard normal random variable for “large”
sample sizes.  This statistic will tend to be large if the observations in
group 1 (the X observations) tend to be larger than the observations in
group 2 (the Y observations).
Left-Censored Data 
Most of the time, if censored observations occur in environmental data, they are
left-censored (e.g., observations are reported as less than one or more detection
limits).  For the two-sample test of differences between groups, the methods that
apply to right-censored data are easily adapted to left-censored data:  simply
multiply the observations by -1, compute the z-statistic shown in Equation
(19), then reverse the sign of this statistic before computing the p-value.
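A tiny sketch of that recipe (the object names are made up):
  conc     <- c(1, 1, 3, 5, 8)                   # "<1", "<1", 3, 5, 8  (left-censored)
  conc.cen <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
  neg.conc <- -conc   # -1, -1, -3, -5, -8: the censored values are now right-censored
  # Apply the right-censored formulas above to neg.conc (keeping conc.cen as the
  # censoring indicator), then reverse the sign of the resulting z-statistic
  # before computing the p-value.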
Value
a list of class "htestCensored" containing the results of the hypothesis test.
See the help file for htestCensored.object for details.
Note
All of the tests computed by twoSampleLinearRankTestCensored
(logrank, Tarone-Ware, Gehan, Peto-Peto, normal scores, and generalized sign)
are based on a
statistic that is essentially the sum over all uncensored time points of the
weighted difference between the observed and expected number of observations at each
time point (see Equation (13) above).  The tests differ in how they weight the
differences between the observed and expected number of observations.
Prentice and Marek (1979) point out that the Gehan test uses weights that depend on the censoring rates within each group and can lead to non-significant outcomes in the case of heavy censoring when in fact a very large difference between the two groups exists.
Latta (1981) performed a Monte Carlo simulation to study the power of the Gehan,
logrank, and Peto-Peto tests using all three different estimators of variance
(permutation, hypergeometric, and asymptotic).  He used lognormal, Weibull, and
exponential distributions to generate the observations, and studied two different
cases of censoring: uniform censoring for both samples vs. no censoring in the first
sample and uniform censoring in the second sample.  Latta (1981) used sample sizes
of 10 and 50 (both the equal and unequal cases were studied).  Latta (1981) found
that all three tests maintained the nominal Type I error level (\alpha-level)
in the case of equal sample sizes and equal censoring.  Also, the Peto-Peto test
based on the asymptotic variance appeared to maintain the nominal \alpha-level
in all situations, but the other tests were slightly biased in the case of unequal
sample sizes and/or unequal censoring.  In particular, tests based on the
hypergeometric variance are slightly biased for unequal sample sizes.  Latta (1981)
concludes that if there is no censoring or light censoring, any of the tests may be
used (but the hypergeometric variance should not be used if the sample sizes are
very different).  In the case of heavy censoring where sample sizes are far apart
and/or the censoring is very different between samples, the Peto-Peto test based on
the asymptotic variance should be used.
Millard and Deverel (1988) also performed a Monte Carlo simulation similar to
Latta's (1981) study.  They only used the lognormal distribution to generate
observations, but also looked at the normal scores test and two ad-hoc modifications
of the MWW test.  They found the “Normal Scores 2” test shown in Table 1
above to be the best behaved test in terms of maintaining the nominal
\alpha-level, but the other tests behaved almost as well.  As Latta (1981)
found, when sample sizes and censoring are very different between the two groups,
the nominal \alpha-level of most of the tests is slightly biased.  In the
cases where the nominal \alpha-level was maintained, the Peto-Peto test based
on the asymptotic variance appeared to be as powerful or more powerful than the
normal scores tests.
Neither of the Monte Carlo studies performed by Latta (1981) and Millard and Deverel (1988) looked at the behavior of the two-sample linear rank tests in the presence of several tied uncensored observations (because both studies generated observations from continuous distributions). Note that the results shown in Table 9 of Millard and Deverel (1988, p.2097) are not all correct because they did not allow for tied uncensored values. The last example in the EXAMPLES section below shows the correct values that should appear in that table.
Heller and Venkatraman (1996) performed a Monte Carlo simulation study to compare the behaviors of the Peto-Peto test (using the Prentice, 1978, estimator of survival; they call this the Prentice-Wilcoxon test) and logrank test under varying censoring conditions with sample sizes of 20 and 50 per group based on using the following methods to compute p-values: the asymptotic standard normal approximation, a permutation test approach (this is NOT the same as the permutation variance), and a bootstrap approach. Observed times were generated from Weibull and lognormal survival time distributions with independent uniform censoring. They found that for the Peto-Peto test, "the asymptotic test procedure was the most accurate; resampling procedures did not improve upon its accuracy." For the logrank test, with sample sizes of 20 per group, the usual test based on the asymptotic standard normal approximation tended to have a very slightly higher Type I error rate than assumed (however, for an assumed Type I error rate of 0.05, the largest Type I error rate observed was less than 0.065), whereas the permutation and bootstrap tests performed better; with sample sizes of 50 per group there was no difference in test performance.
Fleming and Harrington (1981) introduced a family of tests (sometimes called G-rho
tests) that contain the logrank and Peto-Peto tests as special cases.  A single
parameter \rho (rho) controls the weights given to the uncensored and
censored observations.  Positive values of \rho produce tests more sensitive
to early differences in the survival function, that is, differences in the cdf at
small values.  Negative values of \rho produce tests more sensitive to late
differences in the survival function, that is, differences in the cdf at large
values.
The function survdiff in the R package
survival implements the G-rho family of tests suggested by Fleming and
Harrington (1981).  Calling survdiff with rho=0 (the default) yields
the logrank test.  Calling survdiff with rho=1 yields the Peto-Peto
test based on the Kaplan-Meier estimate of survival.  The function survdiff
always uses the hypergeometric estimate of variance and the Kaplan-Meier estimate of
survival, but it uses the “left-continuous” version of the Kaplan-Meier
estimate.  The left-continuous K-M estimate of survival is defined as
follows:  at each death (unique uncensored observation), the estimated survival is
equal to the estimated survival based on the ordinary K-M estimate at the prior
death time (or 1 for the first death).
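For example (a sketch with simulated right-censored survival data; the variable names are made up):
  library(survival)
  set.seed(123)
  time   <- c(rexp(20, rate = 1), rexp(20, rate = 0.5))
  status <- rbinom(40, size = 1, prob = 0.8)     # 1 = observed, 0 = right-censored
  group  <- rep(c("A", "B"), each = 20)
  survdiff(Surv(time, status) ~ group, rho = 0)  # logrank test
  survdiff(Surv(time, status) ~ group, rho = 1)  # Peto-Peto type test (K-M based)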
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Altshuler, B. (1970). Theory for the Measurement of Competing Risks in Animal Experiments. Mathematical Biosciences 6, 1–11.
Breslow, N.E. (1970). A Generalized Kruskal-Wallis Test for Comparing K Samples Subject to Unequal Patterns of Censorship. Biometrika 57, 579–594.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 4.
Cox, D.R. (1972). Regression Models and Life Tables (with Discussion). Journal of the Royal Statistical Society of London, Series B 34, 187–220.
Divine, G., H.J. Norton, R. Hunt, and J. Dinemann. (2013). A Review of Analysis and Sample Size Calculation Considerations for Wilcoxon Tests. Anesthesia & Analgesia 117, 699–710.
Fleming, T.R., and D.P. Harrington. (1981). A Class of Hypothesis Tests for One and Two Sample Censored Survival Data. Communications in Statistics – Theory and Methods A10(8), 763–794.
Fleming, T.R., and D.P. Harrington. (1991). Counting Processes & Survival Analysis. John Wiley and Sons, New York, Chapter 7.
Gehan, E.A. (1965). A Generalized Wilcoxon Test for Comparing Arbitrarily Singly-Censored Samples. Biometrika 52, 203–223.
Harrington, D.P., and T.R. Fleming. (1982). A Class of Rank Test Procedures for Censored Survival Data. Biometrika 69(3), 553–566.
Heller, G., and E. S. Venkatraman. (1996). Resampling Procedures to Compare Two Survival Distributions in the Presence of Right-Censored Data. Biometrics 52, 1204–1213.
Hettmansperger, T.P. (1984). Statistical Inference Based on Ranks. John Wiley and Sons, New York, 323pp.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457–481.
Latta, R.B. (1981). A Monte Carlo Study of Some Two-Sample Rank Tests with Censored Data. Journal of the American Statistical Association 76(375), 713–719.
Mantel, N. (1966). Evaluation of Survival Data and Two New Rank Order Statistics Arising in its Consideration. Cancer Chemotherapy Reports 50, 163-170.
Millard, S.P., and S.J. Deverel. (1988). Nonparametric Statistical Methods for Comparing Two Sites Based on Data With Multiple Nondetect Limits. Water Resources Research, 24(12), 2087–2098.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.432–435.
Peto, R., and J. Peto. (1972). Asymptotically Efficient Rank Invariant Test Procedures (with Discussion). Journal of the Royal Statistical Society of London, Series A 135, 185–206.
Prentice, R.L. (1978). Linear Rank Tests with Right Censored Data. Biometrika 65, 167–179.
Prentice, R.L. (1985). Linear Rank Tests. In Kotz, S., and N.L. Johnson, eds. Encyclopedia of Statistical Science. John Wiley and Sons, New York. Volume 5, pp.51–58.
Prentice, R.L., and P. Marek. (1979). A Qualitative Discrepancy Between Censored Data Rank Tests. Biometrics 35, 861–867.
Tarone, R.E., and J. Ware. (1977). On Distribution-Free Tests for Equality of Survival Distributions. Biometrika 64(1), 156–160.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
twoSampleLinearRankTest, survdiff,
wilcox.test, htestCensored.object.
Examples
  # The last part of the EXAMPLES section in the help file for
  # cdfCompareCensored compares the empirical distribution of copper and zinc
  # between two sites:  Alluvial Fan and Basin-Trough (Millard and Deverel, 1988).
  # The data for this example are stored in Millard.Deverel.88.df.  Perform a
  # test to determine if there is a significant difference between these two
  # sites (perform a separate test for the copper and the zinc).
  Millard.Deverel.88.df
  #    Cu.orig Cu Cu.censored Zn.orig  Zn Zn.censored         Zone Location
  #1       < 1  1        TRUE     <10  10        TRUE Alluvial.Fan        1
  #2       < 1  1        TRUE       9   9       FALSE Alluvial.Fan        2
  #3         3  3       FALSE      NA  NA       FALSE Alluvial.Fan        3
  #.
  #.
  #.
  #116       5  5       FALSE      50  50       FALSE Basin.Trough       48
  #117      14 14       FALSE      90  90       FALSE Basin.Trough       49
  #118       4  4       FALSE      20  20       FALSE Basin.Trough       50
  #------------------------------
  # First look at the copper data
  #------------------------------
  Cu.AF <- with(Millard.Deverel.88.df,
    Cu[Zone == "Alluvial.Fan"])
  Cu.AF.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Alluvial.Fan"])
  Cu.BT <- with(Millard.Deverel.88.df,
    Cu[Zone == "Basin.Trough"])
  Cu.BT.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Basin.Trough"])
  # Note the large number of tied observations in the copper data
  #--------------------------------------------------------------
  table(Cu.AF[!Cu.AF.cen])
  # 1  2  3  4  5  7  8  9 10 11 12 16 20
  # 5 21  6  3  3  3  1  1  1  1  1  1  1
  table(Cu.BT[!Cu.BT.cen])
  # 1  2  3  4  5  6  8  9 12 14 15 17 23
  # 7  4  8  5  1  2  1  2  1  1  1  1  1
  # Logrank test with hypergeometric variance:
  #-------------------------------------------
  twoSampleLinearRankTestCensored(x = Cu.AF, x.censored = Cu.AF.cen,
    y = Cu.BT, y.censored = Cu.BT.cen)
  #Results of Hypothesis Test
  #Based on Censored Data
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) != Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Logrank Test
  #                                 with Hypergeometric Variance
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              x =  1  5 10 20
  #                                 y =  1  2  5 10 15
  #
  #Data:                            x = Cu.AF
  #                                 y = Cu.BT
  #
  #Censoring Variable:              x = Cu.AF.cen
  #                                 y = Cu.BT.cen
  #
  #Number NA/NaN/Inf's Removed:     x = 3
  #                                 y = 1
  #
  #Sample Sizes:                    nx = 65
  #                                 ny = 49
  #
  #Percent Censored:                x = 26.2%
  #                                 y = 28.6%
  #
  #Test Statistics:                 nu     = -1.8791355
  #                                 var.nu = 13.6533490
  #                                 z      = -0.5085557
  #
  #P-value:                         0.6110637
  # Compare the p-values produced by the Normal Scores 2 test
  # using the hypergeometric vs. permutation variance estimates.
  # Note how much larger the estimated variance is based on
  # the permutation variance estimate:
  #-----------------------------------------------------------
  twoSampleLinearRankTestCensored(x = Cu.AF, x.censored = Cu.AF.cen,
    y = Cu.BT, y.censored = Cu.BT.cen,
    test = "normal.scores.2")$p.value
  #[1] 0.2008913
  twoSampleLinearRankTestCensored(x = Cu.AF, x.censored = Cu.AF.cen,
    y = Cu.BT, y.censored = Cu.BT.cen,
    test = "normal.scores.2", variance = "permutation")$p.value
  #[1] 0.657001
  #--------------------------
  # Now look at the zinc data
  #--------------------------
  Zn.AF <- with(Millard.Deverel.88.df,
    Zn[Zone == "Alluvial.Fan"])
  Zn.AF.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Alluvial.Fan"])
  Zn.BT <- with(Millard.Deverel.88.df,
    Zn[Zone == "Basin.Trough"])
  Zn.BT.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Basin.Trough"])
  # Note the moderate number of tied observations in the zinc data,
  # and the "outlier" of 620 in the Alluvial Fan data.
  #---------------------------------------------------------------
  table(Zn.AF[!Zn.AF.cen])
  #  5   7   8   9  10  11  12  17  18  19  20  23  29  30  33  40  50 620
  #  1   1   1   1  20   2   1   1   1   1  14   1   1   1   1   1   1   1
  table(Zn.BT[!Zn.BT.cen])
  # 3  4  5  6  8 10 11 12 13 14 15 17 20 25 30 40 50 60 70 90
  # 2  2  2  1  1  5  1  2  1  1  1  2 11  1  4  3  2  2  1  1
  # Logrank test with hypergeometric variance:
  #-------------------------------------------
  twoSampleLinearRankTestCensored(x = Zn.AF, x.censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen)
  #Results of Hypothesis Test
  #Based on Censored Data
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) != Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Logrank Test
  #                                 with Hypergeometric Variance
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              x =  3 10
  #                                 y =  3 10
  #
  #Data:                            x = Zn.AF
  #                                 y = Zn.BT
  #
  #Censoring Variable:              x = Zn.AF.cen
  #                                 y = Zn.BT.cen
  #
  #Number NA/NaN/Inf's Removed:     x = 1
  #                                 y = 0
  #
  #Sample Sizes:                    nx = 67
  #                                 ny = 50
  #
  #Percent Censored:                x = 23.9%
  #                                 y =  8.0%
  #
  #Test Statistics:                 nu     = -6.992999
  #                                 var.nu = 17.203227
  #                                 z      = -1.686004
  #
  #P-value:                         0.09179512
  #----------
  # Compare the p-values produced by the Logrank, Gehan, Peto-Peto,
  # and Tarone-Ware tests using the hypergeometric variance.
  #-----------------------------------------------------------
  twoSampleLinearRankTestCensored(x = Zn.AF, x.censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen,
    test = "logrank")$p.value
  #[1] 0.09179512
  twoSampleLinearRankTestCensored(x = Zn.AF, x.censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen,
    test = "gehan")$p.value
  #[1] 0.0185445
  twoSampleLinearRankTestCensored(x = Zn.AF, x.censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen,
    test = "peto-peto")$p.value
  #[1] 0.009704529
  twoSampleLinearRankTestCensored(x = Zn.AF, x.censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen,
    test = "tarone-ware")$p.value
  #[1] 0.03457803
  #----------
  # Clean up
  #---------
  rm(Cu.AF, Cu.AF.cen, Cu.BT, Cu.BT.cen,
     Zn.AF, Zn.AF.cen, Zn.BT, Zn.BT.cen)
  #==========
  # Example 16.5 on pages 16-22 to 16-23 of USEPA (2009) shows how to perform
  # the Tarone-Ware two sample linear rank test based on censored data using
  # observations on tetrachloroethylene (PCE) (ppb) collected at one background
  # and one compliance well.  The data for this example are stored in
  # EPA.09.Ex.16.5.PCE.df.
  EPA.09.Ex.16.5.PCE.df
  #    Well.type PCE.Orig.ppb PCE.ppb Censored
  #1  Background           <4     4.0     TRUE
  #2  Background          1.5     1.5    FALSE
  #3  Background           <2     2.0     TRUE
  #4  Background          8.7     8.7    FALSE
  #5  Background          5.1     5.1    FALSE
  #6  Background           <5     5.0     TRUE
  #7  Compliance          6.4     6.4    FALSE
  #8  Compliance         10.9    10.9    FALSE
  #9  Compliance            7     7.0    FALSE
  #10 Compliance         14.3    14.3    FALSE
  #11 Compliance          1.9     1.9    FALSE
  #12 Compliance           10    10.0    FALSE
  #13 Compliance          6.8     6.8    FALSE
  #14 Compliance           <5     5.0     TRUE
  with(EPA.09.Ex.16.5.PCE.df,
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"],
      x.censored = Censored[Well.type == "Compliance"],
      y = PCE.ppb[Well.type == "Background"],
      y.censored = Censored[Well.type == "Background"],
      test = "tarone-ware", alternative = "greater"))
  #Results of Hypothesis Test
  #Based on Censored Data
  #--------------------------
  #
  #Null Hypothesis:                 Fy(t) = Fx(t)
  #
  #Alternative Hypothesis:          Fy(t) > Fx(t) for at least one t
  #
  #Test Name:                       Two-Sample Linear Rank Test:
  #                                 Tarone-Ware Test
  #                                 with Hypergeometric Variance
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              x = 5
  #                                 y = 2 4 5
  #
  #Data:                            x = PCE.ppb[Well.type == "Compliance"]
  #                                 y = PCE.ppb[Well.type == "Background"]
  #
  #Censoring Variable:              x = Censored[Well.type == "Compliance"]
  #                                 y = Censored[Well.type == "Background"]
  #
  #Sample Sizes:                    nx = 8
  #                                 ny = 6
  #
  #Percent Censored:                x = 12.5%
  #                                 y = 50.0%
  #
  #Test Statistics:                 nu     =  8.458912
  #                                 var.nu = 20.912407
  #                                 z      =  1.849748
  #
  #P-value:                         0.03217495
  # Compare the p-value for the Tarone-Ware test with p-values from
  # the logrank, Gehan, and Peto-Peto tests
  #-----------------------------------------------------------------
  with(EPA.09.Ex.16.5.PCE.df,
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"],
      x.censored = Censored[Well.type == "Compliance"],
      y = PCE.ppb[Well.type == "Background"],
      y.censored = Censored[Well.type == "Background"],
      test = "tarone-ware", alternative = "greater"))$p.value
  #[1] 0.03217495
  with(EPA.09.Ex.16.5.PCE.df,
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"],
      x.censored = Censored[Well.type == "Compliance"],
      y = PCE.ppb[Well.type == "Background"],
      y.censored = Censored[Well.type == "Background"],
      test = "logrank", alternative = "greater"))$p.value
  #[1] 0.02752793
  with(EPA.09.Ex.16.5.PCE.df,
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"],
      x.censored = Censored[Well.type == "Compliance"],
      y = PCE.ppb[Well.type == "Background"],
      y.censored = Censored[Well.type == "Background"],
      test = "gehan", alternative = "greater"))$p.value
  #[1] 0.03656224
  with(EPA.09.Ex.16.5.PCE.df,
    twoSampleLinearRankTestCensored(
      x = PCE.ppb[Well.type == "Compliance"],
      x.censored = Censored[Well.type == "Compliance"],
      y = PCE.ppb[Well.type == "Background"],
      y.censored = Censored[Well.type == "Background"],
      test = "peto-peto", alternative = "greater"))$p.value
  #[1] 0.03127296
  #==========
  # The results shown in Table 9 of Millard and Deverel (1988, p.2097) are correct
  # only for the hypergeometric variance and the modified MWW tests; the other
  # results were computed as if there were no ties.  Re-compute the correct
  # z-statistics and p-values for the copper and zinc data.
  test <- c(rep(c("gehan", "logrank", "peto-peto"), 2), "peto-peto",
    "normal.scores.1", "normal.scores.2", "normal.scores.2")
  variance <- c(rep("permutation", 3), rep("hypergeometric", 3),
    "asymptotic", rep("permutation", 2), "hypergeometric")
  stats.mat <- matrix(as.numeric(NA), ncol = 4, nrow = 10)
  for(i in 1:10) {
    dum.list <- with(Millard.Deverel.88.df,
        twoSampleLinearRankTestCensored(
          x = Cu[Zone == "Basin.Trough"],
          x.censored = Cu.censored[Zone == "Basin.Trough"],
          y = Cu[Zone == "Alluvial.Fan"],
          y.censored = Cu.censored[Zone == "Alluvial.Fan"],
          test = test[i], variance = variance[i]))
    stats.mat[i, 1:2] <- c(dum.list$statistic["z"], dum.list$p.value)
    dum.list <- with(Millard.Deverel.88.df,
        twoSampleLinearRankTestCensored(
          x = Zn[Zone == "Basin.Trough"],
          x.censored = Zn.censored[Zone == "Basin.Trough"],
          y = Zn[Zone == "Alluvial.Fan"],
          y.censored = Zn.censored[Zone == "Alluvial.Fan"],
          test = test[i], variance = variance[i]))
    stats.mat[i, 3:4] <- c(dum.list$statistic["z"], dum.list$p.value)
  }
  dimnames(stats.mat) <- list(paste(test, variance, sep = "."),
    c("Cu.Z", "Cu.p.value", "Zn.Z", "Zn.p.value"))
  round(stats.mat, 2)
  #                               Cu.Z Cu.p.value Zn.Z Zn.p.value
  #gehan.permutation              0.87       0.38 2.49       0.01
  #logrank.permutation            0.79       0.43 1.75       0.08
  #peto-peto.permutation          0.92       0.36 2.42       0.02
  #gehan.hypergeometric           0.71       0.48 2.35       0.02
  #logrank.hypergeometric         0.51       0.61 1.69       0.09
  #peto-peto.hypergeometric       1.03       0.30 2.59       0.01
  #peto-peto.asymptotic           0.90       0.37 2.37       0.02
  #normal.scores.1.permutation    0.94       0.34 2.37       0.02
  #normal.scores.2.permutation    0.98       0.33 2.39       0.02
  #normal.scores.2.hypergeometric 1.28       0.20 2.48       0.01
  #----------
  # Clean up
  #---------
  rm(test, variance, stats.mat, i, dum.list)
Two-Sample or Paired-Sample Randomization (Permutation) Test for Location
Description
Perform a two-sample or paired-sample randomization (permutation) test for location based on either means or medians.
Usage
  twoSamplePermutationTestLocation(x, y, fcn = "mean", alternative = "two.sided", 
    mu1.minus.mu2 = 0, paired = FALSE, exact = FALSE, n.permutations = 5000, 
    seed = NULL, tol = sqrt(.Machine$double.eps))
Arguments
| x | numeric vector of observations from population 1. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| y | numeric vector of observations from population 2. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. In the case when paired=TRUE, x and y must have the same number of elements. | 
| fcn | character string indicating which location parameter to compare between the two groups. The possible values are fcn="mean" (the default) and fcn="median". | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "less", and "greater". | 
| mu1.minus.mu2 | numeric scalar indicating the hypothesized value of the difference between the means or medians. The default value is mu1.minus.mu2=0. | 
| paired | logical scalar indicating whether to perform a paired or two-sample permutation test. The possible values are paired=FALSE (the default; two-sample test) and paired=TRUE (paired test). | 
| exact | logical scalar indicating whether to perform the exact permutation test (i.e., enumerate all possible permutations) or simply sample from the permutation distribution. The default value is exact=FALSE. | 
| n.permutations | integer indicating how many times to sample from the permutation distribution when exact=FALSE. The default value is n.permutations=5000. This argument is ignored when exact=TRUE. | 
| seed | positive integer to pass to the R function set.seed. The default value is seed=NULL. Supplying a seed makes the sampled permutation distribution, and hence the p-value, reproducible. This argument is ignored when exact=TRUE. | 
| tol | numeric scalar indicating the tolerance to use for computing the p-value for the two-sample permutation test. The default value is tol=sqrt(.Machine$double.eps). See the DETAILS section for more information. | 
Details
Randomization Tests 
In 1935, R.A. Fisher introduced the idea of a randomization test 
(Manly, 2007, p. 107; Efron and Tibshirani, 1993, Chapter 15), which is based on 
trying to answer the question:  “Did the observed pattern happen by chance, 
or does the pattern indicate the null hypothesis is not true?”  A randomization 
test works by simply enumerating all of the possible outcomes under the null 
hypothesis, then seeing where the observed outcome fits in.  A randomization test 
is also called a permutation test, because it involves permuting the 
observations during the enumeration procedure (Manly, 2007, p. 3).
In the past, randomization tests have not been used as extensively as they are now 
because of the “large” computing resources needed to enumerate all of the 
possible outcomes, especially for large sample sizes.  The advent of more powerful 
personal computers and software has allowed randomization tests to become much 
easier to perform.  Depending on the sample size, however, it may still be too 
time consuming to enumerate all possible outcomes.  In this case, the randomization 
test can still be performed by sampling from the randomization distribution, and 
comparing the observed outcome to this sampled permutation distribution.
Two-Sample Randomization Test for Location (paired=FALSE) 
Let \underline{x} = x_1, x_2, \ldots, x_{n1} be a vector of n1 
independent and identically distributed (i.i.d.) observations
from some distribution with location parameter (e.g., mean or median) \theta_1, 
and let \underline{y} = y_1, y_2, \ldots, y_{n2} be a vector of n2 
i.i.d. observations from the same distribution with possibly different location 
parameter \theta_2.
Consider the test of the null hypothesis that the difference in the location parameters is equal to some specified value:
H_0: \delta = \delta_0 \;\;\;\;\;\; (1)
where
\delta = \theta_1 - \theta_2 \;\;\;\;\;\; (2)
and \delta_0 denotes the hypothesized difference in the measures of 
location (usually \delta_0 = 0).
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: \delta > \delta_0 \;\;\;\;\;\; (3)
the lower one-sided alternative (alternative="less")
H_a: \delta < \delta_0 \;\;\;\;\;\; (4)
and the two-sided alternative
H_a: \delta \ne \delta_0 \;\;\;\;\;\; (5)
To perform the test of the null hypothesis (1) versus any of the three alternatives (3)-(5), you can use the two-sample permutation test. The two sample permutation test is based on trying to answer the question, “Did the observed difference in means or medians happen by chance, or does the observed difference indicate that the null hypothesis is not true?” Under the null hypothesis, the underlying distributions for each group are the same, therefore it should make no difference which group an observation gets assigned to. The two-sample permutation test works by simply enumerating all possible permutations of group assignments, and for each permutation computing the difference between the measures of location for each group (Manly, 2007, p. 113; Efron and Tibshirani, 1993, p. 202). The measure of location for a group could be the mean, median, or any other measure you want to use. For example, if the observations from Group 1 are 3 and 5, and the observations from Group 2 are 4, 6, and 7, then there are 10 different ways of splitting these five observations into one group of size 2 and another group of size 3. The table below lists all of the possible group assignments, along with the differences in the group means.
| Group 1 | Group 2 | Mean 1 - Mean 2 | 
| 3, 4 | 5, 6, 7 | -2.5 | 
| 3, 5 | 4, 6, 7 | -1.67 | 
| 3, 6 | 4, 5, 7 | -0.83 | 
| 3, 7 | 4, 5, 6 | 0 | 
| 4, 5 | 3, 6, 7 | -0.83 | 
| 4, 6 | 3, 5, 7 | 0 | 
| 4, 7 | 3, 5, 6 | 0.83 | 
| 5, 6 | 3, 4, 7 | 0.83 | 
| 5, 7 | 3, 4, 6 | 1.67 | 
| 6, 7 | 3, 4, 5 | 2.5 | 
In this example, the observed group assignments and difference in means are shown in the second row of the table.
For a one-sided upper alternative (Equation (3)), the p-value is computed as the proportion of times that the differences of the means (or medians) in the permutation distribution are greater than or equal to the observed difference in means (or medians). For a one-sided lower alternative hypothesis (Equation (4)), the p-value is computed as the proportion of times that the differences in the means (or medians) in the permutation distribution are less than or equal to the observed difference in the means (or medians). For a two-sided alternative hypothesis (Equation (5)), the p-value is computed as the proportion of times the absolute values of the differences in the means (or medians) in the permutation distribution are greater than or equal to the absolute value of the observed difference in the means (or medians).
For this simple example, the one-sided upper, one-sided lower, and two-sided p-values are 0.9, 0.2 and 0.4, respectively.
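The enumeration above is easy to reproduce in base R with combn(); the following is only an illustrative sketch (it is not code from the package), with the small tolerance playing the same role as the tol argument described in the Note below.
  # Illustrative sketch: enumerate all choose(5, 2) = 10 group assignments
  # for the toy data above and recompute the permutation p-values.
  obs <- c(3, 5, 4, 6, 7)             # all five observations
  grp1.idx <- combn(5, 2)             # each column = one candidate Group 1
  perm.diffs <- apply(grp1.idx, 2, function(i)
    mean(obs[i]) - mean(obs[-i]))     # Mean 1 - Mean 2 for each split
  obs.diff <- mean(c(3, 5)) - mean(c(4, 6, 7))   # observed difference (-1.67)
  tol <- sqrt(.Machine$double.eps)    # same role as the 'tol' argument
  mean(perm.diffs >= obs.diff - tol)             # one-sided upper p-value: 0.9
  mean(perm.diffs <= obs.diff + tol)             # one-sided lower p-value: 0.2
  mean(abs(perm.diffs) >= abs(obs.diff) - tol)   # two-sided p-value:       0.4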
Note: Because of the nature of machine arithmetic and how the permutation 
distribution is computed, a one-sided upper p-value is computed as the proportion 
of times that the differences of the means (or medians) in the permutation 
distribution are greater than or equal to 
[the observed difference in means (or medians) - a small tolerance value], where the 
tolerance value is determined by the argument tol.  Similarly, a one-sided 
lower p-value is computed as the proportion of times that the differences in the 
means (or medians) in the permutation distribution are less than or equal to 
[the observed difference in the means (or medians) + a small tolerance value].  
Finally, a two-sided p-value is computed as the proportion of times the absolute 
values of the differences in the means (or medians) in the permutation distribution 
are greater than or equal to 
[the absolute value of the observed difference in the means (or medians) - a small tolerance value].
In this simple example, we assumed the hypothesized difference in the means under 
the null hypothesis was \delta_0 = 0.  If we had hypothesized a different 
value for \delta_0, then we would have had to subtract this value from each of 
the observations in Group 1 before permuting the group assignments to compute the 
permutation distribution of the differences of the means.  As in the case of the 
one-sample permutation test, if the sample sizes 
for the groups become too large to compute all possible permutations of the group 
assignments, the permutation test can still be performed by sampling from the 
permutation distribution and comparing the observed difference in locations to the 
sampled permutation distribution of the difference in locations.
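The sampled version of the test can likewise be sketched in a few lines of base R. The data below are hypothetical, and the result only approximates what twoSamplePermutationTestLocation() does with exact=FALSE.
  # Illustrative sketch with hypothetical data: approximate the permutation
  # distribution by sampling group assignments instead of enumerating them.
  set.seed(123)
  x <- rlnorm(40)
  y <- rlnorm(60)
  all.obs <- c(x, y)
  n1 <- length(x)
  N  <- length(all.obs)
  obs.diff <- mean(x) - mean(y)
  perm.diffs <- replicate(5000, {
    idx <- sample.int(N, n1)                     # random assignment to Group 1
    mean(all.obs[idx]) - mean(all.obs[-idx])
  })
  # one-sided upper p-value based on the sampled permutation distribution
  mean(perm.diffs >= obs.diff - sqrt(.Machine$double.eps))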
Unlike the two-sample Student's t-test, we do not have to worry about the normality assumption when we use a permutation test. The permutation test still assumes, however, that under the null hypothesis, the distributions of the observations from each group are exactly the same, and under the alternative hypothesis there is simply a shift in location (that is, the whole distribution of group 1 is shifted by some constant relative to the distribution of group 2). Mathematically, this can be written as follows:
F_1(t) = F_2(t - \delta), \;\; -\infty < t < \infty \;\;\;\;\; (6)
where F_1 and F_2 denote the cumulative distribution functions for 
group 1 and group 2, respectively.  If \delta > 0, this implies that the 
observations in group 1 tend to be larger than the observations in group 2, and 
if \delta < 0, this implies that the observations in group 1 tend to be 
smaller than the observations in group 2.  Thus, the shape and spread (variance) 
of the two distributions should be the same whether the null hypothesis is true or 
not.  Therefore, the Type I error rate for a permutation test can be affected by 
differences in variances between the two groups.
Confidence Intervals for the Difference in Means or Medians 
Based on the relationship between hypothesis tests and confidence intervals, it is 
possible to construct a two-sided or one-sided (1-\alpha)100\% confidence 
interval for the difference in means or medians based on the two-sample permutation 
test by finding the values of \delta_0 that correspond to obtaining a 
p-value of \alpha (Manly, 2007, pp. 18–20, 114).  A confidence interval 
based on the bootstrap, however, will yield a similar type of confidence interval 
(Efron and Tibshirani, 1993, p. 214); see the help file for 
boot in the R package boot.
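One rough way to carry out this inversion is to evaluate the test over a grid of hypothesized differences (via the mu1.minus.mu2 argument) and keep the values that are not rejected at level \alpha. The sketch below uses hypothetical data and a fixed seed, so the resulting limits are only approximate.
  # Rough sketch with hypothetical data: approximate a two-sided 95%
  # confidence interval for the difference in means by inverting the
  # two-sample permutation test over a grid of hypothesized differences.
  set.seed(42)
  x <- rnorm(15, mean = 6)
  y <- rnorm(20, mean = 5)
  delta.grid <- seq(-1, 3, by = 0.05)
  p.vals <- sapply(delta.grid, function(d)
    twoSamplePermutationTestLocation(x, y, mu1.minus.mu2 = d,
      alternative = "two.sided", seed = 397)$p.value)
  range(delta.grid[p.vals >= 0.05])    # approximate 95% confidence limits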
Paired-Sample Randomization Test for Location (paired=TRUE) 
When the argument paired=TRUE, the arguments x and y are 
assumed to have the same length, and the n1 = n2 = n differences
d_i = x_i - y_i, i = 1, 2, \ldots, n are assumed to be independent 
observations from some symmetric distribution with mean \mu.  The 
one-sample permutation test can then be applied 
to the differences.
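Assuming paired observations, the relationship can be illustrated with the sketch below, which compares the paired test to oneSamplePermutationTest applied to the differences. The data are hypothetical, and the mu, alternative, and seed argument names of oneSamplePermutationTest are assumed to mirror those of the two-sample function.
  # Sketch with hypothetical paired data: with paired = TRUE the test
  # reduces to a one-sample permutation test on the paired differences.
  set.seed(15)
  before <- rlnorm(12, meanlog = 1)
  after  <- before * runif(12, min = 0.4, max = 1.1)
  twoSamplePermutationTestLocation(before, after, paired = TRUE,
    alternative = "greater", seed = 645)$p.value
  oneSamplePermutationTest(before - after, mu = 0,
    alternative = "greater", seed = 645)$p.value   # should agree closely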
Value
A list of class "permutationTest" containing the results of the hypothesis 
test.  See the help file for permutationTest.object for details.
Note
A frequent question in environmental statistics is “Is the concentration of chemical X in Area A greater than the concentration of chemical X in Area B?”. For example, in groundwater detection monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well must be compared to “background”. If the concentration is “above” the background, then the site enters assessment monitoring. As another example, soil cleanup at a Superfund site may involve comparing the concentration of a chemical in the soil at a “cleaned up” site with the concentration at a “background” site. If the concentration at the “cleaned up” site is “greater” than the background concentration, then further investigation and remedial action may be required. Determining what it means for the chemical concentration to be “greater” than background is a policy decision: you may want to compare averages, medians, 95th percentiles, etc.
Hypothesis tests you can use to compare “location” between two groups include: Student's t-test, Fisher's randomization test (described in this help file), the Wilcoxon rank sum test, other two-sample linear rank tests, the quantile test, and a test based on a bootstrap confidence interval.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, Chapter 15.
Manly, B.F.J. (2007). Randomization, Bootstrap and Monte Carlo Methods in Biology. Third Edition. Chapman & Hall, New York, Chapter 6.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.426–431.
See Also
permutationTest.object, plot.permutationTest, 
oneSamplePermutationTest, 
twoSamplePermutationTestProportion, 
Hypothesis Tests, boot.
Examples
  # Generate 10 observations from a lognormal distribution with parameters 
  # mean=5 and cv=2, and 20 observations from a lognormal distribution with 
  # parameters mean=10 and cv=2.  Test the null hypothesis that the means of the 
  # two distributions are the same against the alternative that the mean for 
  # group 1 is less than the mean for group 2. 
  # (Note: the call to set.seed allows you to reproduce the same data 
  # (dat1 and dat2), and setting the argument seed=732 in the call to 
  # twoSamplePermutationTestLocation() lets you reproduce this example by 
  # getting the same sample from the permutation distribution).
  set.seed(256) 
  dat1 <- rlnormAlt(10, mean = 5, cv = 2) 
  dat2 <- rlnormAlt(20, mean = 10, cv = 2) 
  test.list <- twoSamplePermutationTestLocation(dat1, dat2, 
    alternative = "less", seed = 732) 
  # Print the results of the test 
  #------------------------------
  test.list 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mu.x-mu.y = 0
  #
  #Alternative Hypothesis:          True mu.x-mu.y is less than 0
  #
  #Test Name:                       Two-Sample Permutation Test
  #                                 Based on Differences in Means
  #                                 (Based on Sampling
  #                                 Permutation Distribution
  #                                 5000 Times)
  #
  #Estimated Parameter(s):          mean of x =  2.253439
  #                                 mean of y = 11.825430
  #
  #Data:                            x = dat1
  #                                 y = dat2
  #
  #Sample Sizes:                    nx = 10
  #                                 ny = 20
  #
  #Test Statistic:                  mean.x - mean.y = -9.571991
  #
  #P-value:                         0.001
  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(test.list)
  #==========
  # The guidance document "Statistical Methods for Evaluating the Attainment of 
  # Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid 
  # Media" (USEPA, 1994b, pp. 6.22-6.25) contains observations of 
  # 1,2,3,4-Tetrachlorobenzene (TcCB) in ppb at a Reference Area and a Cleanup Area.  
  # These data are stored in the data frame EPA.94b.tccb.df.  Use the 
  # two-sample permutation test to test for a difference in means between the 
  # two areas vs. the alternative that the mean in the Cleanup Area is greater.  
  # Do the same thing for the medians.
  #
  # The permutation test based on comparing means shows a significant difference, 
  # while the one based on comparing medians does not.
  # First test for a difference in the means.
  #------------------------------------------
  mean.list <- with(EPA.94b.tccb.df, 
    twoSamplePermutationTestLocation(
      TcCB[Area=="Cleanup"], TcCB[Area=="Reference"], 
      alternative = "greater", seed = 47)) 
  mean.list
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mu.x-mu.y = 0
  #
  #Alternative Hypothesis:          True mu.x-mu.y is greater than 0
  #
  #Test Name:                       Two-Sample Permutation Test
  #                                 Based on Differences in Means
  #                                 (Based on Sampling
  #                                 Permutation Distribution
  #                                 5000 Times)
  #
  #Estimated Parameter(s):          mean of x = 3.9151948
  #                                 mean of y = 0.5985106
  #
  #Data:                            x = TcCB[Area == "Cleanup"]  
  #                                 y = TcCB[Area == "Reference"]
  #
  #Sample Sizes:                    nx = 77
  #                                 ny = 47
  #
  #Test Statistic:                  mean.x - mean.y = 3.316684
  #
  #P-value:                         0.0206
  dev.new()
  plot(mean.list)
  #----------
  # Now test for a difference in the medians.
  #------------------------------------------
  median.list <- with(EPA.94b.tccb.df, 
    twoSamplePermutationTestLocation(
      TcCB[Area=="Cleanup"], TcCB[Area=="Reference"], 
      fcn = "median", alternative = "greater", seed = 47)) 
  median.list
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 mu.x-mu.y = 0
  #
  #Alternative Hypothesis:          True mu.x-mu.y is greater than 0
  #
  #Test Name:                       Two-Sample Permutation Test
  #                                 Based on Differences in Medians
  #                                 (Based on Sampling
  #                                 Permutation Distribution
  #                                 5000 Times)
  #
  #Estimated Parameter(s):          median of x = 0.43
  #                                 median of y = 0.54
  #
  #Data:                            x = TcCB[Area == "Cleanup"]  
  #                                 y = TcCB[Area == "Reference"]
  #
  #Sample Sizes:                    nx = 77
  #                                 ny = 47
  #
  #Test Statistic:                  median.x - median.y = -0.11
  #
  #P-value:                         0.936
  dev.new()
  plot(median.list)
  #==========
  # Clean up
  #---------
  rm(test.list, mean.list, median.list)
  graphics.off()
Randomization (Permutation) Test to Compare Two Proportions (Fisher's Exact Test)
Description
Perform a two-sample randomization (permutation) test to compare two proportions. This is also called Fisher's exact test.
Note: You can perform Fisher's exact test in R using the function 
fisher.test.
Usage
  twoSamplePermutationTestProportion(x, y, x.and.y = "Binomial Outcomes", 
    alternative = "two.sided", tol = sqrt(.Machine$double.eps))
Arguments
| x, y | the data. When x.and.y="Binomial Outcomes" (the default), x and y are vectors of observations from groups 1 and 2, respectively, with each observation coded as 0 ("failure") or 1 ("success"). When x.and.y="Number Successes and Trials", x is a vector of length 2 containing the observed number of successes for groups 1 and 2, and y is a vector of length 2 containing the corresponding numbers of trials. | 
| x.and.y | character string indicating the kind of data stored in the vectors x and y. The possible values are x.and.y="Binomial Outcomes" (the default) and x.and.y="Number Successes and Trials". | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "less", and "greater". | 
| tol | numeric scalar indicating the tolerance to use for computing the p-value for the two-sample permutation test. The default value is tol=sqrt(.Machine$double.eps). | 
Details
Randomization Tests 
In 1935, R.A. Fisher introduced the idea of a randomization test 
(Manly, 2007, p. 107; Efron and Tibshirani, 1993, Chapter 15), which is based on 
trying to answer the question:  “Did the observed pattern happen by chance, 
or does the pattern indicate the null hypothesis is not true?”  A randomization 
test works by simply enumerating all of the possible outcomes under the null 
hypothesis, then seeing where the observed outcome fits in.  A randomization test 
is also called a permutation test, because it involves permuting the 
observations during the enumeration procedure (Manly, 2007, p. 3).
In the past, randomization tests have not been used as extensively as they are now 
because of the “large” computing resources needed to enumerate all of the 
possible outcomes, especially for large sample sizes.  The advent of more powerful 
personal computers and software has allowed randomization tests to become much 
easier to perform.  Depending on the sample size, however, it may still be too 
time consuming to enumerate all possible outcomes.  In this case, the randomization 
test can still be performed by sampling from the randomization distribution, and 
comparing the observed outcome to this sampled permutation distribution.
Two-Sample Randomization Test for Proportions 
Let \underline{x} = x_1, x_2, \ldots, x_{n_1} be a vector of n_1 
independent and identically distributed (i.i.d.) observations
from a binomial distribution with parameter size=1 and 
probability of success prob=p_1, and let 
\underline{y} = y_1, y_2, \ldots, y_{n_2} be a vector of n_2 
i.i.d. observations from a binomial distribution with 
parameter size=1 and probability of success prob=p_2.
Consider the test of the null hypothesis:
H_0: p_1 = p_2 \;\;\;\;\;\; (1)
The three possible alternative hypotheses are the upper one-sided alternative 
(alternative="greater")
H_a: p_1 > p_2 \;\;\;\;\;\; (2)
the lower one-sided alternative (alternative="less")
H_a: p_1 < p_2 \;\;\;\;\;\; (3)
and the two-sided alternative
H_a: p_1 \ne p_2 \;\;\;\;\;\; (4)
To perform the test of the null hypothesis (1) versus any of the three 
alternatives (2)-(4), you can use the two-sample permutation test, which is also 
called Fisher's exact test.  When the observations are 
from a B(1, p) distribution, the sample mean is an estimate of p.  
Fisher's exact test is simply a permutation test for the difference between two 
means from two different groups (see twoSamplePermutationTestLocation), 
where the underlying populations are binomial with size parameter size=1, 
but possibly different values of the prob parameter p.  
Fisher's exact test is usually described in terms of testing hypotheses concerning 
a 2 x 2 contingency table (van Belle et al., 2004, p. 157; 
Hollander and Wolfe, 1999, p. 473; Sheskin, 2011; Zar, 2010, p. 561).  
The probabilities associated with the permutation distribution can be computed by 
using the hypergeometric distribution.
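Because the permutation distribution depends only on how many of the total successes fall in group 1, the one-sided p-value can be written directly as a hypergeometric tail probability. The following sketch is for illustration only and uses the ETU tumor counts from the example further below (16 of 69 exposed rats vs. 2 of 72 controls).
  # Sketch: one-sided upper p-value as a hypergeometric tail probability,
  # using the ETU tumor counts from the example below (16/69 vs. 2/72).
  x1 <- 16; n1 <- 69      # successes and trials, group 1 (exposed)
  x2 <-  2; n2 <- 72      # successes and trials, group 2 (control)
  S <- x1 + x2            # total number of successes, fixed under permutation
  # P(group 1 receives x1 or more of the S successes)
  phyper(x1 - 1, m = S, n = n1 + n2 - S, k = n1, lower.tail = FALSE)
  # fisher.test() gives the same one-sided p-value (about 0.0002)
  fisher.test(matrix(c(x1, x2, n1 - x1, n2 - x2), nrow = 2),
    alternative = "greater")$p.value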
Value
A list of class "permutationTest" containing the results of the hypothesis 
test.  See the help file for permutationTest.object for details.
Note
Sometimes in environmental data analysis we are interested in determining whether two probabilities or rates or proportions differ from each other. For example, we may ask the question: “Does exposure to pesticide X increase the risk of developing cancer Y?”, where cancer Y may be liver cancer, stomach cancer, or some other kind of cancer. One way environmental scientists attempt to answer this kind of question is by conducting experiments on rodents in which one group (the “treatment” or “exposed” group) is exposed to the pesticide and the other group (the control group) is not. The incidence of cancer Y in the exposed group is compared with the incidence of cancer Y in the control group. (See Rodricks (2007) for a discussion of extrapolating results from experiments involving rodents to consequences in humans and the associated difficulties).
Hypothesis tests you can use to compare proportions or probability of 
“success” between two groups include Fisher's exact test and the test 
based on the normal approximation (see the R help file for 
prop.test).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, Chapter 15.
Graham, S.L., K.J. Davis, W.H. Hansen, and C.H. Graham. (1975). Effects of Prolonged Ethylene Thiourea Ingestion on the Thyroid of the Rat. Food and Cosmetics Toxicology, 13(5), 493–499.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York, p.473.
Manly, B.F.J. (2007). Randomization, Bootstrap and Monte Carlo Methods in Biology. Third Edition. Chapman & Hall, New York, Chapter 6.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.441–446.
Rodricks, J.V. (1992). Calculated Risks: The Toxicity and Human Health Risks of Chemicals in Our Environment. Cambridge University Press, New York.
Rodricks, J.V. (2007). Calculated Risks: The Toxicity and Human Health Risks of Chemicals in Our Environment. Second Edition. Cambridge University Press, New York.
Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures Fifth Edition. CRC Press, Boca Raton, FL.
van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York, p. 157.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, p. 561.
See Also
permutationTest.object, plot.permutationTest, 
twoSamplePermutationTestLocation,   
oneSamplePermutationTest, 
Hypothesis Tests, boot.
Examples
  # Generate 10 observations from a binomial distribution with parameters 
  # size=1 and prob=0.3, and 20 observations from a binomial distribution 
  # with parameters size=1 and prob=0.5.  Test the null hypothesis that the 
  # probability of "success" for the two distributions is the same against the 
  # alternative that the probability of "success" for group 1 is less than 
  # the probability of "success" for group 2. 
  # (Note: the call to set.seed allows you to reproduce this example).
  set.seed(23) 
  dat1 <- rbinom(10, size = 1, prob = 0.3) 
  dat2 <- rbinom(20, size = 1, prob = 0.5) 
  test.list <- twoSamplePermutationTestProportion(
    dat1, dat2, alternative = "less") 
  #----------
  # Print the results of the test 
  #------------------------------
  test.list 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 p.x - p.y = 0
  #
  #Alternative Hypothesis:          True p.x - p.y is less than 0
  #
  #Test Name:                       Two-Sample Permutation Test
  #                                 Based on Differences in Proportions
  #                                 (Fisher's Exact Test)
  #
  #Estimated Parameter(s):          p.hat.x = 0.60
  #                                 p.hat.y = 0.65
  #
  #Data:                            x = dat1
  #                                 y = dat2
  #
  #Sample Sizes:                    nx = 10
  #                                 ny = 20
  #
  #Test Statistic:                  p.hat.x - p.hat.y = -0.05
  #
  #P-value:                         0.548026
  #----------
  # Plot the results of the test 
  #------------------------------
  dev.new()
  plot(test.list)
  #----------
  # Compare to the results of fisher.test
  #--------------------------------------
  x11 <- sum(dat1)
  x21 <- length(dat1) - sum(dat1)
  x12 <- sum(dat2)
  x22 <- length(dat2) - sum(dat2)
  mat <- matrix(c(x11, x12, x21, x22), ncol = 2)
  fisher.test(mat, alternative = "less")
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 odds ratio = 1
  #
  #Alternative Hypothesis:          True odds ratio is less than 1
  #
  #Test Name:                       Fisher's Exact Test for Count Data
  #
  #Estimated Parameter(s):          odds ratio = 0.8135355
  #
  #Data:                            mat
  #
  #P-value:                         0.548026
  #
  #95% Confidence Interval:         LCL = 0.000000
  #                                 UCL = 4.076077
  #==========
  # Rodricks (1992, p. 133) presents data from an experiment by 
  # Graham et al. (1975) in which different groups of rats were exposed to 
  # various concentration levels of ethylene thiourea (ETU), a decomposition 
  # product of a certain class of fungicides that can be found in treated foods.  
  # In the group exposed to a dietary level of 250 ppm of ETU, 16 out of 69 rats 
  # (23%) developed thyroid tumors, whereas in the control group 
  # (no exposure to ETU) only 2 out of 72 (3%) rats developed thyroid tumors.  
  # If we use Fisher's exact test to test the null hypothesis that the proportion 
  # of rats exposed to 250 ppm of ETU who will develop thyroid tumors over their 
  # lifetime is no greater than the proportion of rats not exposed to ETU who will 
  # develop tumors, we get a one-sided upper p-value of 0.0002.  Therefore, we 
  # conclude that the true underlying rate of tumor incidence in the exposed group 
  # is greater than in the control group.
  #
  # The data for this example are stored in Graham.et.al.75.etu.df.
  # Look at the data
  #-----------------
  Graham.et.al.75.etu.df
  #  dose tumors  n proportion
  #1    0      2 72 0.02777778
  #2    5      2 75 0.02666667
  #3   25      1 73 0.01369863
  #4  125      2 73 0.02739726
  #5  250     16 69 0.23188406
  #6  500     62 70 0.88571429
  # Perform the test for a difference in tumor rates
  #-------------------------------------------------
  Num.Tumors <- with(Graham.et.al.75.etu.df, tumors[c(5, 1)])
  Sample.Sizes <- with(Graham.et.al.75.etu.df, n[c(5, 1)]) 
  test.list <- twoSamplePermutationTestProportion( 
    x = Num.Tumors, y = Sample.Sizes, 
    x.and.y="Number Successes and Trials", alternative = "greater") 
  #----------
  # Print the results of the test 
  #------------------------------
  test.list 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 p.x - p.y = 0
  #
  #Alternative Hypothesis:          True p.x - p.y is greater than 0
  #
  #Test Name:                       Two-Sample Permutation Test
  #                                 Based on Differences in Proportions
  #                                 (Fisher's Exact Test)
  #
  #Estimated Parameter(s):          p.hat.x = 0.23188406
  #                                 p.hat.y = 0.02777778
  #
  #Data:                            x = Num.Tumors  
  #                                 n = Sample.Sizes
  #
  #Sample Sizes:                    nx = 69
  #                                 ny = 72
  #
  #Test Statistic:                  p.hat.x - p.hat.y = 0.2041063
  #
  #P-value:                         0.0002186462
  #----------
  # Plot the results of the test 
  #------------------------------
  dev.new()
  plot(test.list)
  #==========
  # Clean up
  #---------
  rm(test.list, x11, x12, x21, x22, mat, Num.Tumors, Sample.Sizes)
  #graphics.off()
Test for Homogeneity of Variance Among Two or More Groups
Description
Test the null hypothesis that the variances of two or more normal distributions are the same using Levene's or Bartlett's test.
Usage
varGroupTest(object, ...)
## S3 method for class 'formula'
varGroupTest(object, data = NULL, subset, 
  na.action = na.pass, ...)
## Default S3 method:
varGroupTest(object, group, test = "Levene", 
  correct = TRUE, data.name = NULL, group.name = NULL, 
  parent.of.data = NULL, subset.expression = NULL, ...)
## S3 method for class 'data.frame'
varGroupTest(object, ...)
## S3 method for class 'matrix'
varGroupTest(object, ...)
## S3 method for class 'list'
varGroupTest(object, ...)
Arguments
| object | an object containing data for 2 or more groups whose variances are to be compared. In the default method, object must be a numeric vector, and the argument group must also be supplied. When object is a data frame, matrix, or list, each column or component is treated as a separate group. When object is a formula of the form y ~ g, the observations in y are grouped according to the levels of g. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| data | when object is a formula, an optional data frame, list, or environment containing the variables named in the formula. | 
| subset | when object is a formula, an optional vector specifying a subset of observations to be used. | 
| na.action | when object is a formula, a function indicating what should happen when the data contain missing (NA) values. The default value is na.action=na.pass. | 
| group | when object is a numeric vector, a factor or character vector of the same length as object indicating which group each observation belongs to. This argument is ignored for the other methods. | 
| test | character string indicating which test to use. The possible values are test="Levene" (Levene's test; the default) and test="Bartlett" (Bartlett's test). | 
| correct | logical scalar indicating whether to use the correction factor for Bartlett's test. The default value is correct=TRUE. This argument is ignored when test="Levene". | 
| data.name | character string indicating the name of the data used for the group variance test. The default value is data.name=NULL. | 
| group.name | character string indicating the name of the data used to create the groups. The default value is group.name=NULL. | 
| parent.of.data | character string indicating the source of the data used for the group variance test. | 
| subset.expression | character string indicating the expression used to subset the data. | 
| ... | additional arguments affecting the group variance test. | 
Details
The function varGroupTest performs Levene's or Bartlett's test for 
homogeneity of variance among two or more groups.  The R function var.test 
compares two variances.
Bartlett's test is very sensitive to the assumption of normality and will tend to give 
significant results even when the null hypothesis is true if the underlying distributions 
have long tails (e.g., are leptokurtic).  Levene's test is almost as powerful as Bartlett's 
test when the underlying distributions are normal, and unlike Bartlett's test it tends to 
maintain the assumed alpha-level when the underlying distributions are not normal 
(Snedecor and Cochran, 1989, p.252; Milliken and Johnson, 1992, p.22; Conover et al., 1981).  
Thus, Levene's test is generally recommended over Bartlett's test.
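For reference, Levene's statistic is the usual one-way ANOVA F-statistic computed on the absolute deviations of each observation from its group mean. The following sketch uses the arsenic data from the Examples section below and assumes the mean-based form of Levene's test; it is not the package's own code.
  # Sketch: Levene's test as a one-way ANOVA on absolute deviations from
  # the group means (arsenic data from the Examples section below).
  abs.dev <- with(EPA.09.Ex.11.1.arsenic.df,
    abs(Arsenic.ppb - ave(Arsenic.ppb, Well)))     # |x_ij - group mean_i|
  summary(aov(abs.dev ~ factor(Well), data = EPA.09.Ex.11.1.arsenic.df))
  # The F-statistic and p-value should essentially match the varGroupTest()
  # output shown in the Examples (F = 4.56, p = 0.0073), assuming the
  # mean-based form of Levene's test.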
Value
a list of class "htestEnvStats" containing the results of the group variance test.  
Objects of class "htestEnvStats" have special printing and plotting methods.  
See the help file for htestEnvStats.object for details.
Note
Chapter 11 of USEPA (2009) discusses using Levene's test to test the assumption of equal variances between monitoring wells or to test that the variance is stable over time when performing intrawell tests.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Conover, W.J., M.E. Johnson, and M.M. Johnson. (1981). A Comparative Study of Tests for Homogeneity of Variances, with Applications to the Outer Continental Shelf Bidding Data. Technometrics 23(4), 351-361.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817-865.
Milliken, G.A., and D.E. Johnson. (1992). Analysis of Messy Data, Volume I: Designed Experiments. Chapman & Hall, New York.
Snedecor, G.W., and W.G. Cochran. (1989). Statistical Methods, Eighth Edition. Iowa State University Press, Ames Iowa.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Examples
  # Example 11-2 of USEPA (2009, page 11-7) gives an example of 
  # testing the assumption of equal variances across wells for arsenic  
  # concentrations (ppb) in groundwater collected at 6 monitoring 
  # wells over 4 months.  The data for this example are stored in 
  # EPA.09.Ex.11.1.arsenic.df.
  head(EPA.09.Ex.11.1.arsenic.df)
  #  Arsenic.ppb Month Well
  #1        22.9     1    1
  #2         3.1     2    1
  #3        35.7     3    1
  #4         4.2     4    1
  #5         2.0     1    2
  #6         1.2     2    2
  longToWide(EPA.09.Ex.11.1.arsenic.df, "Arsenic.ppb", "Month", "Well", 
    paste.row.name = TRUE, paste.col.name = TRUE)
  #        Well.1 Well.2 Well.3 Well.4 Well.5 Well.6
  #Month.1   22.9    2.0    2.0    7.8   24.9    0.3
  #Month.2    3.1    1.2  109.4    9.3    1.3    4.8
  #Month.3   35.7    7.8    4.5   25.9    0.8    2.8
  #Month.4    4.2   52.0    2.5    2.0   27.0    1.2
  varGroupTest(Arsenic.ppb ~ Well, data = EPA.09.Ex.11.1.arsenic.df)
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Ratio of each pair of variances = 1
  #
  #Alternative Hypothesis:          At least one variance differs
  #
  #Test Name:                       Levene's Test for
  #                                 Homogenity of Variance
  #
  #Estimated Parameter(s):          Well.1 =  246.8158
  #                                 Well.2 =  592.6767
  #                                 Well.3 = 2831.4067
  #                                 Well.4 =  105.2967
  #                                 Well.5 =  207.4467
  #                                 Well.6 =    3.9025
  #
  #Data:                            Arsenic.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.11.1.arsenic.df
  #
  #Sample Sizes:                    Well.1 = 4
  #                                 Well.2 = 4
  #                                 Well.3 = 4
  #                                 Well.4 = 4
  #                                 Well.5 = 4
  #                                 Well.6 = 4
  #
  #Test Statistic:                  F = 4.564176
  #
  #Test Statistic Parameters:       num df   =  5
  #                                 denom df = 18
  #
  #P-value:                         0.007294084
One-Sample Chi-Squared Test on Variance
Description
Estimate the variance, test the null hypothesis using the chi-squared test that the variance is equal to a user-specified value, and create a confidence interval for the variance.
Usage
  varTest(x, alternative = "two.sided", conf.level = 0.95, 
    sigma.squared = 1, data.name = NULL)
Arguments
| x | numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less". | 
| conf.level | numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population variance. The default value is conf.level=0.95. | 
| sigma.squared | a numeric scalar indicating the hypothesized value of the variance. The default value is sigma.squared=1. | 
| data.name | character string indicating the name of the data used for the test of variance. | 
Details
The function varTest performs the one-sample chi-squared test of the hypothesis 
that the population variance is equal to the user specified value given by the argument 
sigma.squared, and it also returns a confidence interval for the population variance.  
The R function var.test performs the F-test for comparing two variances.
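The underlying computation is simple enough to verify directly. The sketch below reuses the simulated data from the Examples section (set.seed(23)) and should reproduce, approximately, the statistic, p-value, and confidence limits shown there; it is an illustration, not the package's implementation.
  # Sketch: the chi-squared statistic, p-value, and confidence interval
  # behind varTest(), computed directly.  The data are the same simulated
  # values used in the Examples section below (set.seed(23)).
  set.seed(23)
  dat <- rnorm(20, mean = 2, sd = 1)
  sigma0.sq <- 0.5                        # hypothesized variance
  n <- length(dat)
  s.sq <- var(dat)                        # sample variance (about 0.754)
  chi.sq <- (n - 1) * s.sq / sigma0.sq    # test statistic (about 28.64), df = 19
  p.lower <- pchisq(chi.sq, df = n - 1)
  2 * min(p.lower, 1 - p.lower)           # two-sided p-value (about 0.144)
  (n - 1) * s.sq / qchisq(c(0.975, 0.025), df = n - 1)  # 95% CI (about 0.44, 1.61)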
Value
A list of class "htest" containing the results of the hypothesis test.  
See the help file for htest.object for details.
Note
Just as you can perform tests of hypothesis on measures of location (mean, median, percentile, etc.), you can do the same thing for measures of spread or variability. Usually, we are interested in estimating variability only because we want to quantify the uncertainty of our estimated location or percentile. Sometimes, however, we are interested in estimating variability and quantifying the uncertainty in our estimate of variability (for example, for performing a sensitivity analysis for power or sample size calculations), or testing whether the population variability is equal to a certain value. There are at least two possible methods of performing a one-sample hypothesis test on variability:
- Perform a hypothesis test for the population variance based on the chi-squared statistic, assuming the underlying population is normal. 
- Perform a hypothesis test for any kind of measure of spread, assuming any kind of underlying distribution, based on a bootstrap confidence interval (using, for example, the package boot); a sketch of this approach is shown below. 
You can use varTest for the first method.
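A sketch of the second approach, using the boot package (listed in Suggests) to obtain a nonparametric percentile confidence interval for the variance; the data are hypothetical.
  # Sketch of the second approach: a nonparametric bootstrap percentile
  # confidence interval for the variance (hypothetical skewed data).
  library(boot)
  set.seed(82)
  x <- rlnorm(30, meanlog = 0, sdlog = 0.7)
  boot.out <- boot(x, statistic = function(d, i) var(d[i]), R = 2000)
  boot.ci(boot.out, conf = 0.95, type = "perc")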
Note: For a one-sample test of location, Student's t-test is fairly robust to departures from normality (i.e., the Type I error rate is maintained), as long as the sample size is reasonably "large." The chi-squared test on the population variance, however, is extremely sensitive to departures from normality. For example, if the underlying population is skewed, the actual Type I error rate will be larger than assumed.
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
Examples
  # Generate 20 observations from a normal distribution with parameters 
  # mean=2 and sd=1.  Test the null hypothesis that the true variance is 
  # equal to 0.5 against the alternative that the true variance is not 
  # equal to 0.5.  
  # (Note: the call to set.seed allows you to reproduce this example).
  set.seed(23) 
  dat <- rnorm(20, mean = 2, sd = 1) 
  varTest(dat, sigma.squared = 0.5) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 variance = 0.5
  #
  #Alternative Hypothesis:          True variance is not equal to 0.5
  #
  #Test Name:                       Chi-Squared Test on Variance
  #
  #Estimated Parameter(s):          variance = 0.753708
  #
  #Data:                            dat
  #
  #Test Statistic:                  Chi-Squared = 28.64090
  #
  #Test Statistic Parameter:        df = 19
  #
  #P-value:                         0.1436947
  #
  #95% Confidence Interval:         LCL = 0.4359037
  #                                 UCL = 1.6078623
  # Note that in this case we would not reject the 
  # null hypothesis at the 5% or even the 10% level.
  # Clean up
  rm(dat)
Test Whether the Shape Parameter of a Generalized Extreme Value Distribution is Equal to 0
Description
Estimate the shape parameter of a generalized extreme value distribution and test the null hypothesis that the true value is equal to 0.
Usage
  zTestGevdShape(x, pwme.method = "unbiased", 
    plot.pos.cons = c(a = 0.35, b = 0), alternative = "two.sided")
Arguments
| x | numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. | 
| pwme.method | character string specifying the method of estimating the probability-weighted moments. Possible values are "unbiased" (method based on the U-statistic; the default) and "plotting.position" (method based on the plotting position formula). See the help file for egevd for more information. | 
| plot.pos.cons | numeric vector of length 2 specifying the constants used in the formula for the plotting positions. The default value is plot.pos.cons=c(a=0.35, b=0). This argument is relevant only when pwme.method="plotting.position". | 
| alternative | character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less". | 
Details
Let \underline{x} = x_1, x_2, \ldots, x_n be a vector of n observations 
from a generalized extreme value distribution with parameters 
location=\eta, scale=\theta, and shape=\kappa.  
Furthermore, let \hat{\kappa}_{pwme} denote the probability-weighted moments 
estimator (PWME) of the shape parameter \kappa (see the help file for 
egevd).  Then the statistic
z = \frac{\hat{\kappa}_{pwme}}{\sqrt{0.5633/n}} \;\;\;\;\;\; (1)
is asymptotically distributed as a N(0,1) random variable under the null hypothesis 
H_0: \kappa = 0 (Hosking et al., 1985).  The function zTestGevdShape 
performs the usual one-sample z-test using the statistic computed in Equation (1).  
The PWME of \kappa may be computed using either U-statistic type 
probability-weighted moments estimators or plotting-position type estimators 
(see egevd).  Although Hosking et al. (1985) base their statistic on 
plotting-position type estimators, Hosking and Wallis (1995) recommend using the 
U-statistic type estimators for almost all applications.
This test is only asymptotically correct.  Hosking et al. (1985), however, found 
that the \alpha-level is adequately maintained for samples as small as 
n = 25.
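The statistic in Equation (1) can be reproduced from the probability-weighted moments estimate of the shape parameter. The sketch below assumes that egevd() with method="pwme" stores the estimates in its parameters component (as EnvStats estimation functions generally do); it is an illustration only, not the function's implementation.
  # Sketch: the z-statistic of Equation (1) computed from the PWME of the
  # shape parameter (hypothetical Gumbel data, so the true shape is 0).
  set.seed(398)
  x <- revd(50, location = 10, scale = 2)          # Gumbel data (shape = 0)
  n <- length(x)
  kappa.hat <- egevd(x, method = "pwme")$parameters["shape"]
  z <- kappa.hat / sqrt(0.5633 / n)                # Equation (1)
  2 * pnorm(-abs(z))                               # two-sided p-value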
Value
A list of class "htestEnvStats" containing the results of the hypothesis test.  
See the help file for htestEnvStats.object for details.
Note
Two-parameter extreme value distributions (EVD) have been applied extensively since the 1930s to several fields of study, including the distributions of hydrological and meteorological variables, human lifetimes, and strength of materials. The three-parameter generalized extreme value distribution (GEVD) was introduced by Jenkinson (1955) to model annual maximum and minimum values of meteorological events. Since then, it has been used extensively in the hydrological and meteorological fields.
The three families of EVDs are all special kinds of GEVDs.  When the shape parameter 
\kappa = 0, the GEVD reduces to the Type I extreme value (Gumbel) distribution.  
When \kappa > 0, the GEVD is the same as the Type II extreme value 
distribution, and when \kappa < 0 it is the same as the Type III extreme value 
distribution.
Hosking et al. (1985) introduced the test used by the function zTestGevdShape 
to test the null hypothesis H_0: \kappa = 0.  They found this test has power 
comparable to the modified likelihood-ratio test, which was found by Hosking (1984) 
to be the best overall test of the thirteen tests he considered.
Fill and Stedinger (1995) denote this test the “kappa test” and compare it 
with the L-Cs test suggested by Chowdhury et al. (1991) and the probability 
plot correlation coefficient goodness-of-fit test for the Gumbel distribution given 
by Vogel (1986) (see the sub-section for test="ppcc" under the Details section 
of the help file for gofTest).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chowdhury, J.U., J.R. Stedinger, and L. H. Lu. (1991). Goodness-of-Fit Tests for Regional Generalized Extreme Value Flood Distributions. Water Resources Research 27(7), 1765–1776.
Fill, H.D., and J.R. Stedinger. (1995). L Moment and Probability Plot Correlation Coefficient Goodness-of-Fit Tests for the Gumbel Distribution and Impact of Autocorrelation. Water Resources Research 31(1), 225–229.
Hosking, J.R.M. (1984). Testing Whether the Shape Parameter is Zero in the Generalized Extreme-Value Distribution. Biometrika 71(2), 367–374.
Hosking, J.R.M., and J.R. Wallis (1995). A Comparison of Unbiased and Plotting-Position Estimators of L Moments. Water Resources Research 31(8), 2019–2025.
Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.
Jenkinson, A.F. (1955). The Frequency Distribution of the Annual Maximum (or Minimum) of Meteorological Events. Quarterly Journal of the Royal Meteorological Society 81, 158–171.
Vogel, R.M. (1986). The Probability Plot Correlation Coefficient Test for the Normal, Lognormal, and Gumbel Distributional Hypotheses. Water Resources Research 22(4), 587–590. (Correction, Water Resources Research 23(10), 2013, 1987.)
See Also
GEVD, egevd, EVD, eevd, 
Goodness-of-Fit Tests, htestEnvStats.object.
Examples
  # Generate 25 observations from a generalized extreme value distribution with 
  # parameters location=2, scale=1, and shape=1, and test the null hypothesis 
  # that the shape parameter is equal to 0. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(250) 
  dat <- rgevd(25, location = 2, scale = 1, shape = 1) 
  zTestGevdShape(dat) 
  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 shape = 0
  #
  #Alternative Hypothesis:          True shape is not equal to 0
  #
  #Test Name:                       Z-test of shape=0 for GEVD
  #
  #Estimated Parameter(s):          shape = 0.6623014
  #
  #Estimation Method:               Unbiased pwme
  #
  #Data:                            dat
  #
  #Sample Size:                     25
  #
  #Test Statistic:                  z = 4.412206
  #
  #P-value:                         1.023225e-05
  #----------
  # Clean up
  #---------
  rm(dat)