---
title: "tikatuwq: Water Quality Indices and Temporal Trends"
author: "tikatuwq developers"
output:
  rmarkdown::html_vignette:
    number_sections: true
vignette: >
  %\VignetteIndexEntry{tikatuwq: Water Quality Indices and Temporal Trends}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4,
  dpi = 96,
  message = FALSE,
  warning = FALSE,
  fig.alt = "Figure generated by tikatuwq package"
)
```

## Introduction

This vignette focuses on the methods implemented in tikatuwq for computing water quality indices and analyzing temporal trends. We cover:

- Water Quality Index (IQA/WQI) calculation methods
- Trophic State Index (IET) for lakes and reservoirs
- Temporal trend analysis using robust and parametric methods
- Parameter-specific analysis tools

## Water Quality Index (IQA/WQI)

### Method overview

The IQA combines sub-indices (Qi) for individual parameters using a weighted arithmetic mean. Each sub-index is obtained by piecewise-linear interpolation over approximate rating curves (CETESB/NSF style).
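
As a minimal sketch of these two steps, the base R chunk below interpolates one sub-index from a made-up rating curve with `approx()` and then combines three sub-indices with a weighted arithmetic mean. The breakpoints, the sub-index values, and the three-parameter weights are illustrative assumptions, not the curves or weights used by `iqa()`.

```{r sketch-qi-interp}
# Hypothetical rating curve for dissolved oxygen (mg/L -> Qi on a 0-100 scale).
# These breakpoints are illustrative only.
od_curve_x <- c(0, 2, 4, 6, 8)
od_curve_q <- c(0, 20, 50, 80, 100)

# Piecewise-linear interpolation; rule = 2 clamps inputs outside the curve
qi_od_sketch <- approx(od_curve_x, od_curve_q, xout = 5, rule = 2)$y
qi_od_sketch

# Weighted arithmetic mean of (assumed) sub-indices
qi_sketch  <- c(od = qi_od_sketch, pH = 90, turbidez = 70)
w_sketch   <- c(od = 0.5, pH = 0.3, turbidez = 0.2)
iqa_sketch <- sum(w_sketch * qi_sketch) / sum(w_sketch)
iqa_sketch
```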

```{r load-package}
library(tikatuwq)
data("wq_demo", package = "tikatuwq")
```

### Default parameters and weights

The default IQA implementation uses 9 parameters with standard weights:

```{r iqa-weights}
# Default weights
default_weights <- c(
  od = 0.17,
  coliformes = 0.15,
  dbo = 0.10,
  nt_total = 0.10,
  p_total = 0.10,
  turbidez = 0.08,
  tds = 0.08,
  pH = 0.12,
  temperatura = 0.10
)

# Check sum equals 1
sum(default_weights)
```

### Computing IQA

```{r compute-iqa}
# Compute IQA with default settings
df_iqa <- iqa(wq_demo, na_rm = TRUE)

# View results
cols_show <- intersect(c("ponto", "IQA", "IQA_status"), names(df_iqa))
head(df_iqa[, cols_show, drop = FALSE])

# Distribution
hist(df_iqa$IQA, breaks = 10, main = "IQA Distribution", xlab = "IQA")
```

### Handling missing parameters

When `na_rm = TRUE`, weights are rescaled per row to use only available parameters:

```{r iqa-missing}
# Example with missing parameters
df_missing <- wq_demo
df_missing$tds <- NULL  # Remove one parameter

df_iqa_missing <- iqa(df_missing, na_rm = TRUE)
head(df_iqa_missing$IQA)
```
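
The per-row rescaling can be pictured as follows. This is a sketch of the general idea with assumed sub-index values and weights, not the internal code of `iqa()`: the weights of missing parameters are dropped and the remaining weights are renormalised so that they sum to 1.

```{r sketch-na-rescale}
# Hypothetical sub-indices for one sample; tds was not measured
qi_row <- c(od = 80, pH = 90, tds = NA)
w_row  <- c(od = 0.5, pH = 0.3, tds = 0.2)   # illustrative weights

# Keep only the parameters that are present and renormalise their weights
available  <- !is.na(qi_row)
w_rescaled <- w_row[available] / sum(w_row[available])
w_rescaled

# Weighted mean over the available parameters
iqa_row <- sum(w_rescaled * qi_row[available])
iqa_row
```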

### Custom weights

You can provide custom weights:

```{r iqa-custom-weights}
# Custom weights (must sum to 1)
custom_weights <- c(
  od = 0.20,
  coliformes = 0.20,
  dbo = 0.10,
  nt_total = 0.10,
  p_total = 0.10,
  turbidez = 0.10,
  tds = 0.10,
  pH = 0.05,
  temperatura = 0.05
)

df_iqa_custom <- iqa(wq_demo, pesos = custom_weights, na_rm = TRUE)
cols_show2 <- intersect(c("IQA", "IQA_status"), names(df_iqa_custom))
head(df_iqa_custom[, cols_show2, drop = FALSE])
```
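
Before passing custom weights, it can help to validate them. The helper below is hypothetical (it is not part of tikatuwq) and assumes that weights should be named, non-negative, and sum to 1 within a small floating-point tolerance.

```{r sketch-check-weights}
# Hypothetical validator: TRUE when the weights sum to 1 within `tol`
check_weights_sketch <- function(w, tol = 1e-8) {
  stopifnot(is.numeric(w), !is.null(names(w)), all(w >= 0))
  abs(sum(w) - 1) < tol
}

ok_weights <- c(od = 0.20, coliformes = 0.20, dbo = 0.10, nt_total = 0.10,
                p_total = 0.10, turbidez = 0.10, tds = 0.10, pH = 0.05,
                temperatura = 0.05)
bad_weights <- c(od = 0.60, pH = 0.60)   # sums to 1.2

check_weights_sketch(ok_weights)
check_weights_sketch(bad_weights)
```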

### Classification

The IQA values are automatically classified into qualitative categories:

```{r iqa-classification}
# Classification function
classify_iqa(c(15, 40, 65, 80, 95))

# English labels
classify_iqa(c(15, 40, 65, 80, 95), locale = "en")

# Distribution in demo data
table(df_iqa$IQA_status)
```
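
Classification of this kind amounts to binning the index with `cut()`. The sketch below uses breakpoints in the style of the CETESB bands (19, 36, 51, 79) and English labels chosen here for illustration. Both are assumptions, and the cut points and labels returned by `classify_iqa()` may differ.

```{r sketch-classify}
# Hypothetical classifier; intervals are right-closed: (19, 36], (36, 51], ...
classify_iqa_sketch <- function(x) {
  cut(x,
      breaks = c(0, 19, 36, 51, 79, 100),
      labels = c("very poor", "poor", "fair", "good", "excellent"),
      include.lowest = TRUE)   # also include 0 in the first interval
}

classes_sketch <- classify_iqa_sketch(c(15, 40, 65, 80, 95))
classes_sketch
```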

## Trophic State Index (IET)

### Carlson IET

For lentic systems, the Carlson Trophic State Index uses Secchi depth, chlorophyll-a, and total phosphorus:

```{r iet-carlson, eval=FALSE}
# Example dataset with the required parameters
df_lake <- data.frame(
  ponto = c("L1", "L2"),
  secchi = c(2.5, 1.0),     # meters
  clorofila = c(5, 20),     # ug/L
  p_total = c(0.02, 0.10)   # mg/L (converted to ug/L internally)
)

iet_carlson(df_lake, .keep_ids = TRUE)
```

The function automatically:

- Converts `p_total` (mg/L) to `tp` (ug/L) if needed
- Accepts aliases (`sd` for `secchi`, `chla` for `clorofila`)
- Returns TSI values with a classification
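
For reference, the equations published by Carlson (1977) fit in a few lines of base R. This is a sketch of the published formulas, not the source of `iet_carlson()`. The function name `tsi_carlson_sketch()` and its argument names are hypothetical, and averaging the three components into a single value is a common convention that is assumed here.

```{r sketch-carlson}
# Carlson (1977) component indices; log() is the natural logarithm
tsi_carlson_sketch <- function(secchi_m, chla_ugL, p_total_mgL) {
  tp_ugL  <- p_total_mgL * 1000               # mg/L -> ug/L
  tsi_sd  <- 60 - 14.41 * log(secchi_m)       # Secchi depth (m)
  tsi_chl <- 9.81 * log(chla_ugL) + 30.6      # chlorophyll-a (ug/L)
  tsi_tp  <- 14.42 * log(tp_ugL) + 4.15       # total phosphorus (ug/L)
  data.frame(tsi_sd, tsi_chl, tsi_tp,
             tsi_mean = (tsi_sd + tsi_chl + tsi_tp) / 3)  # assumed convention
}

tsi_carlson_sketch(secchi_m = c(2.5, 1.0),
                   chla_ugL = c(5, 20),
                   p_total_mgL = c(0.02, 0.10))
```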

### Lamparelli IET

The Lamparelli index follows the same approach as Carlson's but uses different equations and class thresholds:

```{r iet-lamparelli, eval=FALSE}
# Reuses df_lake from the Carlson example
iet_lamparelli(df_lake, .keep_ids = TRUE)
```

## Temporal Trend Analysis

### Single parameter trend

The `trend_param()` function computes the Theil-Sen slope and the Spearman correlation for a single parameter:

```{r trend-single}
# Add temporal structure to demo data
df_temporal <- wq_demo
df_temporal$data <- as.Date("2025-01-01") + seq_len(nrow(df_temporal)) - 1

# Compute trend for turbidity
trend_result <- trend_param(df_temporal, param = "turbidez")

print(trend_result)
```

The result includes:

- `slope`: Theil-Sen slope
- `p_value`: Spearman correlation p-value
- `trend`: classification (increasing, decreasing, stable)
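
One plausible way to derive such a label is to combine the sign of the slope with a significance threshold. The rule below is an assumption made for illustration, including the 0.05 cut-off and the function name; this vignette does not state the exact rule applied by `trend_param()`.

```{r sketch-trend-class}
# Hypothetical rule: non-significant trends are "stable",
# otherwise the sign of the slope decides the label
classify_trend_sketch <- function(slope, p_value, alpha = 0.05) {
  ifelse(is.na(p_value) | p_value >= alpha, "stable",
         ifelse(slope > 0, "increasing", "decreasing"))
}

trend_labels <- classify_trend_sketch(slope   = c(0.8, -0.3, 0.1),
                                      p_value = c(0.01, 0.02, 0.40))
trend_labels
```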

### Plotting trends

```{r plot-trend}
library(ggplot2)

# Plot with trend line
p_trend <- plot_trend(df_temporal, param = "turbidez", method = "theilsen")
print(p_trend)

# With LOESS smoothing
p_loess <- plot_trend(df_temporal, param = "turbidez", method = "loess")
print(p_loess)
```

### Multiple parameters

Use `param_trend_multi()` to analyze trends across multiple parameters:

```{r trend-multi}
# Trends for multiple parameters
params <- c("turbidez", "od", "dbo")
trends_multi <- param_trend_multi(df_temporal, parametros = params)

print(trends_multi)
```

## Parameter-specific Analysis

### Summary statistics

```{r param-summary}
# Summary for one parameter
summary_turb <- param_summary(df_temporal, parametro = "turbidez")
print(summary_turb)

# Multi-parameter summary
summary_multi <- param_summary_multi(df_temporal, parametros = c("turbidez", "od", "dbo"))
print(summary_multi)
```

### Parameter plots

```{r param-plot}
# Single parameter plot
p1 <- param_plot(df_temporal, parametro = "turbidez")
print(p1)

# Multi-parameter plot
p2 <- param_plot_multi(df_temporal, parametros = c("turbidez", "od", "dbo"))
print(p2)
```

## Statistical Methods

### Theil-Sen estimator

The Theil-Sen estimator takes the median of all pairwise slopes, which makes it robust to outliers and more reliable than ordinary least squares (OLS) when outliers are present. It is used by default in `trend_param()` and `plot_trend()`.
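
The median-of-pairwise-slopes idea can be sketched in base R and compared with OLS on a series that contains one gross outlier. The function `theil_sen_sketch()` is a hypothetical helper written for this illustration, not the package's implementation; taking the intercept as the median of `y - slope * x` is one common convention and is assumed here.

```{r sketch-theil-sen}
theil_sen_sketch <- function(x, y) {
  pairs  <- utils::combn(length(x), 2)        # all index pairs i < j
  dx     <- x[pairs[2, ]] - x[pairs[1, ]]
  dy     <- y[pairs[2, ]] - y[pairs[1, ]]
  slopes <- dy[dx != 0] / dx[dx != 0]         # skip pairs with equal x
  slope  <- median(slopes)
  c(intercept = median(y - slope * x), slope = slope)
}

# A linear series with slope 2 and one gross outlier at the end
x_ts <- 1:10
y_ts <- 2 * x_ts + 1
y_ts[10] <- 100

ts_fit  <- theil_sen_sketch(x_ts, y_ts)
ols_fit <- coef(lm(y_ts ~ x_ts))

ts_fit["slope"]    # stays at the slope of the clean points
ols_fit["x_ts"]    # pulled upward by the outlier
```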

### Spearman correlation

The Spearman rank correlation is a non-parametric test for monotonic trends. It tests for a monotonic relationship without assuming linearity, and its p-value indicates the significance of the trend.
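
A short base R illustration with `cor.test()` shows the point about linearity: the made-up series below rises strongly but not linearly, and the rank-based test still detects the monotonic trend. The data are invented for this sketch.

```{r sketch-spearman}
# Increasing overall, with a few local reversals, and far from linear
t_sp <- 1:10
y_sp <- c(1, 3, 2, 5, 4, 7, 6, 9, 8, 10)^3

sp_test <- cor.test(t_sp, y_sp, method = "spearman")
unname(sp_test$estimate)   # Spearman's rho
sp_test$p.value            # small values suggest a monotonic trend
```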

## Best Practices

### Choosing parameters for IQA

- Include all 9 default parameters when possible
- Use `na_rm = TRUE` if some parameters are missing
- Adjust weights only if you have domain knowledge

### Handling censored values

- Use `nd_policy = "ld2"` (default) for conservative estimates
- Consider `nd_policy = "na"` if censored values should not influence results
- Document your choice in reports
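
The difference between the two policies can be sketched in base R. The code assumes that `"ld2"` substitutes half of the reported detection limit and that `"na"` replaces censored values with `NA`; the option names suggest this, but the vignette does not define them, so treat the mapping as an assumption. The input strings are invented.

```{r sketch-censored}
# Hypothetical raw lab values; "<0.05" is below the detection limit
raw_sketch <- c("0.12", "<0.05", "0.30")

is_censored     <- grepl("^<", raw_sketch)
reported_sketch <- as.numeric(sub("^<", "", raw_sketch))

# Assumed "ld2" behaviour: half the detection limit for censored values
ld2_values <- ifelse(is_censored, reported_sketch / 2, reported_sketch)
# Assumed "na" behaviour: censored values become missing
na_values  <- ifelse(is_censored, NA_real_, reported_sketch)

ld2_values
na_values
```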

### Trend analysis

- Use Theil-Sen for robust estimates with outliers
- Require at least 4 observations per group for reliable trends
- Consider seasonal effects when analyzing temporal data
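
A quick way to enforce the minimum sample size is to count observations per group before fitting trends. The data frame below is invented for this sketch; only the `ponto` and `turbidez` column names follow the demo data used in this vignette.

```{r sketch-min-obs}
obs_sketch <- data.frame(
  ponto    = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2"),
  turbidez = c(4.1, 4.8, 5.0, 5.6, 6.2, 3.0, 3.4, 2.9)
)

# Count observations per sampling point and keep points with at least 4
n_per_point   <- table(obs_sketch$ponto)
points_enough <- names(n_per_point)[n_per_point >= 4]
points_enough

obs_ok <- obs_sketch[obs_sketch$ponto %in% points_enough, ]
nrow(obs_ok)
```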

### Units consistency

- Ensure all parameters use standard units (mg/L, NTU, etc.)
- Use `clean_units()` to convert if needed
- Document unit conversions in methodology sections

## References

- Carlson, R. E. (1977). A trophic state index for lakes. *Limnology and Oceanography*, 22(2), 361-369.
- Lamparelli, M. C. (2004). *Graus de trofia em corpos d'água do estado de São Paulo: avaliação dos métodos de monitoramento*. Tese de Doutorado, Universidade de São Paulo.
- CETESB. (2021). *Águas superficiais: índice de qualidade das águas (IQA)*. Companhia Ambiental do Estado de São Paulo.

## Summary

This vignette covered:

1. IQA calculation with default and custom weights
2. Handling missing parameters
3. IET methods for lentic systems
4. Temporal trend analysis (Theil-Sen, Spearman)
5. Parameter-specific analysis tools

For workflow examples, see the "From raw water quality data to CONAMA report" vignette.