---
title: "Overview of bigPLScox"
shorttitle: "Overview of bigPLScox"
author:
  - name: "Frédéric Bertrand"
    affiliation:
      - Cedric, Cnam, Paris
    email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Overview of bigPLScox}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/overview-",
  fig.width = 7,
  fig.height = 4.5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)
```

# Introduction

The goal of **bigPLScox** is to provide Partial Least Squares (PLS) variants of the Cox proportional hazards model that scale to high-dimensional survival settings. The package implements several algorithms tailored to large-scale problems, including sparse, grouped, and deviance-residual-based approaches. It integrates with the **bigmemory** ecosystem so that data stored on disk can be analysed without exhausting RAM.

This vignette gives a quick tour of the core workflows: preparing data, fitting a model, assessing model quality, and exploring advanced extensions. The complementary vignette "Getting started with bigPLScox" offers a more hands-on tutorial, while "Benchmarking bigPLScox" focuses on performance comparisons.

# Package highlights

* **Generalised PLS Cox regression** via `coxgpls()`, with support for grouped predictors.
* **Sparse and structured-sparse extensions** through `coxsgpls()` and `coxspls_sgpls()`.
* **Deviance-residual estimators** such as `coxgplsDR()` for increased robustness.
* **Cross-validation helpers** (`cv.coxgpls()`, `cv.coxsgpls()`, …) to select the number of latent components.
* **Big-memory interfaces** (`big_pls_cox()`, `big_pls_cox_gd()`) designed for file-backed matrices stored with **bigmemory**.

# Available algorithms

The following modelling functions are provided:

* `coxgpls()` for generalised PLS Cox regression.
* `coxsgpls()` and `coxspls_sgpls()` for sparse and structured-sparse extensions.
* `coxgplsDR()` and `coxsgplsDR()` for deviance-residual-based estimation.
* `cv.coxgpls()` and related `cv.*` helpers for component selection.

For large data the package also includes `big_pls_cox()` and its stochastic-gradient-descent counterpart `big_pls_cox_gd()`.

# Loading an example dataset

The package ships with a small allelotyping dataset that is used throughout this vignette. The data include survival times and censoring indicators alongside a large set of predictors.

```{r load-data}
library(bigPLScox)

data(micro.censure)
data(Xmicro.censure_compl_imp)

# Use the first 80 observations as the training set
train_idx <- seq_len(80)
Y_train <- micro.censure$survyear[train_idx]
C_train <- micro.censure$DC[train_idx]
X_train <- Xmicro.censure_compl_imp[train_idx, -40]
```

# Fitting a PLS-Cox model

`coxgpls()` provides a matrix interface that mirrors `survival::coxph()` but adds latent components to stabilise estimation in high dimensions.

```{r fit-coxgpls}
fit <- coxgpls(
  X_train,
  Y_train,
  C_train,
  ncomp = 6,
  ind.block.x = c(3, 10, 15)
)
fit
```

The printed summary includes convergence diagnostics, information on the latent components, and the fitted linear predictor, which can be used for risk stratification, as sketched below.
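As a quick illustration, the sketch below splits the training samples at the median of the linear predictor and compares the two groups with Kaplan–Meier curves. It assumes that `predict(fit)` returns the training-set linear predictor, which may not hold for every version of **bigPLScox**, so the chunk is not evaluated; inspect `str(fit)` and adapt the extraction step before running it.

```{r risk-stratification, eval = FALSE}
library(survival)

## Assumption: the fitted object yields the training-set linear predictor,
## for example through predict(); inspect str(fit) to locate it otherwise.
lp <- predict(fit)

## Split at the median score to form two provisional risk groups
## (a higher linear predictor corresponds to a higher hazard).
risk_group <- factor(lp > median(lp), levels = c(FALSE, TRUE),
                     labels = c("low", "high"))

## Compare the groups with Kaplan-Meier curves
km_fit <- survfit(Surv(Y_train, C_train) ~ risk_group)
plot(km_fit, col = c("black", "red"), lty = 1,
     xlab = "Time (years)", ylab = "Survival probability")
legend("bottomleft", legend = levels(risk_group),
       col = c("black", "red"), lty = 1, bty = "n")
```

Dichotomising at the median is only a convention; tertiles or externally chosen cut-points work with the same code.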
# Model assessment

Cross-validation helps decide how many components to retain. The `cv.coxgpls()` helper accepts either a matrix or a list containing `x`, `time`, and `status` elements.

```{r cv-coxgpls}
set.seed(123)
cv_res <- cv.coxgpls(
  list(x = X_train, time = Y_train, status = C_train),
  nt = 10,
  ind.block.x = c(3, 10, 15)
)
cv_res
```

The resulting object can be plotted to visualise the cross-validated deviance or to apply the one-standard-error rule when choosing the number of components.

# Alternative estimators

Deviance-residual-based estimators gain robustness by iteratively updating the residuals, while sparse variants enable feature selection in extremely high-dimensional designs.

```{r alternative-estimators}
dr_fit <- coxgplsDR(
  X_train,
  Y_train,
  C_train,
  ncomp = 6,
  ind.block.x = c(3, 10, 15)
)
dr_fit
```

Additional sparse estimators can be invoked via `coxsgpls()` and `coxspls_sgpls()` by supplying `keepX` or `penalty` arguments that control the number of active predictors per component.

# Working with big data

For extremely large problems, the stochastic gradient descent routines operate on memory-mapped matrices created with **bigmemory**. The example below converts a standard matrix to a `big.matrix` and runs a small fit.

```{r bigmemory-example}
X_big <- bigmemory::as.big.matrix(X_train)
big_fit <- big_pls_cox(
  X_big,
  time = Y_train,
  status = C_train,
  ncomp = 6
)
big_fit
```

The `big_pls_cox_gd()` function exposes a gradient-descent variant that is often preferred for streaming workloads. Both functions can be combined with `foreach::foreach()` for multi-core execution.

# Further reading

* `vignette("getting-started", package = "bigPLScox")` for a detailed walkthrough of data preparation and model diagnostics.
* `vignette("bigPLScox-benchmarking", package = "bigPLScox")` for reproducible performance comparisons.
* The package website hosts reference documentation and additional examples.
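# Session information

The session information below records the R version and package versions used to build this vignette.

```{r session-info}
sessionInfo()
```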