---
title: "Families and model types in fbrglm"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Families and model types in fbrglm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment  = "#>",
    fig.width  = 6,
    fig.height = 4
)
set.seed(20260301)
```

## Overview

`fbrglm` exposes a single formula/data interface for regularized GLMs
and reuses one set of S3 methods across model types. The common
operations are:

- `fbrglm(formula, data, family, lambda, ...)`
- `print()` / `summary()` / `coef()` / `nobs()`
- `predict(newdata, type = c("link", "response", "class"), newoffset)`
- `plot()` (delegates to the underlying `glmnet` / `cv.glmnet` plot)
- `as_glmnet()` / `as_cv_glmnet()`

Classical standard errors, *z* values, and *p* values are **not**
produced under `infer = "none"` (the only inference mode currently
implemented). The package returns regularized point estimates;
inference modes are the next milestone.

The `family` argument accepts the same forms as `glmnet::glmnet()`
itself: the six canonical character strings (`"gaussian"`,
`"binomial"`, `"poisson"`, `"cox"`, `"multinomial"`, `"mgaussian"`),
any GLM family object (e.g. `stats::Gamma(link = "log")`), or a family
object from another package (e.g. `MASS::negative.binomial(theta = 2)`).
fbrglm validates the argument shape, stores both the value passed to
glmnet and a display name on the fit, and wraps glmnet's errors with
the offending family name when something downstream fails.

```{r setup}
library(fbrglm)
```

## Summary of supported model types

| Model                                | fbrglm formulation                                                  | Response               | Status                |
|--------------------------------------|---------------------------------------------------------------------|------------------------|-----------------------|
| Linear regression                    | `family = "gaussian"`                                               | continuous `y`         | **supported**         |
| Logistic regression                  | `family = "binomial"`                                               | 0 / 1 `y`              | **supported**         |
| Poisson regression                   | `family = "poisson"`                                                | count `y`              | **supported**         |
| Piecewise exponential survival       | `"poisson"` + interval factor + `offset(log(exposure))`             | event count / interval | **supported (formulation)** |
| Native Cox regression                | `family = "cox"` + `Surv(time, status) ~ ...`                       | survival object        | **supported (experimental)** |
| Gamma regression                     | `family = stats::Gamma(link = "log")`                               | positive continuous    | **supported (experimental)** |
| Negative binomial (fixed θ)          | `family = MASS::negative.binomial(theta = ...)`                     | overdispersed counts   | **supported (fixed θ only)** |
| Multinomial regression               | `family = "multinomial"`                                            | factor `y` (≥ 3 levels) | **experimental**     |
| Multi-response Gaussian              | `family = "mgaussian"`, `cbind(y1, y2) ~ ...` LHS                  | matrix `y`             | **experimental**     |

Two important caveats:

- The **piecewise exponential survival** model and **native Cox**
  model are *not* the same thing. The former is a Poisson regression
  on a long-format dataset with an interval factor and
  `offset(log(exposure))`; the latter is glmnet's partial-likelihood
  Cox path triggered by `family = "cox"` together with a `Surv()` LHS.
  Both are useful, but the parameters they estimate and the data they
  consume are different. Do not conflate them.
- **Negative binomial** support is for the **fixed-θ** family object
  (`MASS::negative.binomial(theta = ...)`). Joint estimation of θ in
  the style of `MASS::glm.nb()` is a different problem (alternating
  optimization plus a separate likelihood for θ) and is **not**
  implemented here.

## Linear regression

```{r linear}
set.seed(101)
n <- 150
dat_lm <- data.frame(
    y  = 0.5 + 0.8 * rnorm(n) - 0.3 * rnorm(n) + rnorm(n, sd = 0.5),
    x1 = rnorm(n),
    x2 = rnorm(n)
)
fit_lm <- fbrglm(y ~ x1 + x2, data = dat_lm,
                 family = "gaussian",
                 lambda = "fix", lambda_value = 0.05)
coef(fit_lm)
head(predict(fit_lm, type = "response"))
```

## Logistic regression

```{r logistic}
set.seed(102)
n <- 200
dat_logit <- data.frame(
    y  = rbinom(n, 1, 0.5),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_logit <- fbrglm(y ~ x1 + x2, data = dat_logit,
                    family = "binomial",
                    lambda = "fix", lambda_value = 0.05)
head(predict(fit_logit, type = "response"))   # probabilities in [0, 1]
head(predict(fit_logit, type = "class"))      # 0 / 1
```

## Poisson regression with an offset

A common count-data setup attaches an *exposure* (time, area,
population) to each row. The Poisson likelihood is written on the
event count with `log(exposure)` as an additive offset on the linear
predictor.

```{r poisson}
set.seed(103)
n <- 200
exposure <- runif(n, min = 0.5, max = 2)
x1 <- rnorm(n); x2 <- rnorm(n)
rate <- exp(-1 + 0.4 * x1 - 0.2 * x2)
dat_pois <- data.frame(
    y  = rpois(n, rate * exposure),
    x1 = x1, x2 = x2,
    exposure = exposure
)

fit_pois <- fbrglm(y ~ x1 + x2, data = dat_pois,
                   family = "poisson",
                   offset = log(dat_pois$exposure),
                   lambda = "fix", lambda_value = 0.05)

new_dat <- dat_pois[1:5, ]
predict(fit_pois,
        newdata   = new_dat,
        newoffset = log(new_dat$exposure),
        type = "response")
```

## Piecewise exponential survival model via Poisson regression

A Cox-like survival model can be fit with a Poisson regression on a
long-format dataset whose rows are *subject* × *time-interval*. The
construction is:

- split each subject's follow-up time into a small number of
  intervals;
- for each interval the subject is at risk in, generate one row with
  the time at risk (`exposure`), an event indicator (`event`), the
  interval label (a factor), and the time-fixed covariates;
- model `event ~ x1 + x2 + interval + offset(log(exposure))` with a
  Poisson likelihood.

This is the *piecewise exponential* model and is closely related to
Cox regression when the intervals are fine enough. It is **not** the
same as glmnet's partial-likelihood Cox path (see the next section).

```{r pwe-data}
set.seed(104)
n_subj <- 120
breaks <- c(0, 1, 2, 3)
n_int  <- length(breaks) - 1

x1 <- rnorm(n_subj)
x2 <- rnorm(n_subj)
rate    <- exp(-1 + 0.4 * x1 - 0.2 * x2)
T_event <- rexp(n_subj, rate)
T_obs   <- pmin(T_event, 3)
delta   <- as.integer(T_event <= 3)

rows <- list()
for (i in seq_len(n_subj)) {
    for (j in seq_len(n_int)) {
        lo <- breaks[j]; hi <- breaks[j + 1]
        if (T_obs[i] <= lo) next
        exposure   <- min(T_obs[i], hi) - lo
        if (exposure <= 0) next
        event_here <- as.integer(delta[i] == 1L &
                                 T_obs[i] >  lo &
                                 T_obs[i] <= hi)
        rows[[length(rows) + 1]] <- data.frame(
            id       = i,
            interval = j,
            exposure = exposure,
            event    = event_here,
            x1       = x1[i],
            x2       = x2[i]
        )
    }
}
long_dat <- do.call(rbind, rows)
long_dat$interval <- factor(long_dat$interval, levels = seq_len(n_int))

fit_pwe <- fbrglm(event ~ x1 + x2 + interval,
                  data   = long_dat,
                  family = "poisson",
                  offset = log(long_dat$exposure),
                  lambda = "fix", lambda_value = 0.05)
coef(fit_pwe)
```

## Native Cox regression

`family = "cox"` together with `survival::Surv(time, status) ~ ...`
on the LHS hands the problem to glmnet's partial-likelihood Cox path
directly. fbrglm carries the `Surv` response through `model.frame()`
without converting it. `type = "class"` is not meaningful here and is
rejected.

```{r cox, message = FALSE, warning = FALSE}
if (requireNamespace("survival", quietly = TRUE)) {
    set.seed(105)
    n <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    rate    <- exp(0.4 * x1 - 0.2 * x2)
    T_event <- rexp(n, rate)
    t_obs   <- pmin(T_event, 3)
    delta   <- as.integer(T_event <= 3)
    dat_cox <- data.frame(t_obs = t_obs, delta = delta,
                          x1 = x1, x2 = x2)
    fit_cox <- fbrglm(survival::Surv(t_obs, delta) ~ x1 + x2,
                      data   = dat_cox,
                      family = "cox",
                      lambda = "fix", lambda_value = 0.05)
    fit_cox$family_name
    head(predict(fit_cox, type = "link"))
}
```

Cox support is marked **experimental**: glmnet's `coxnet` is mature on
its own, but fbrglm has not yet been exercised against the full
breadth of Cox usage (strata, ties, time-varying covariates), so
unusual setups may need to fall back to `as_glmnet(fit_cox)`.

## Gamma regression

```{r gamma}
set.seed(106)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
eta <- 0.4 * x1 - 0.2 * x2
dat_g <- data.frame(
    y  = rgamma(n, shape = 2, rate = exp(-eta)),
    x1 = x1, x2 = x2
)
fit_g <- fbrglm(y ~ x1 + x2, data = dat_g,
                family = stats::Gamma(link = "log"),
                lambda = "fix", lambda_value = 0.05)
fit_g$family_name
head(predict(fit_g, type = "response"))
```

## Negative binomial regression (fixed θ)

```{r negbin, message = FALSE}
if (requireNamespace("MASS", quietly = TRUE)) {
    set.seed(107)
    n <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    mu <- exp(0.4 * x1 - 0.2 * x2)
    dat_nb <- data.frame(
        y  = rnbinom(n, mu = mu, size = 2),
        x1 = x1, x2 = x2
    )
    fit_nb <- fbrglm(y ~ x1 + x2, data = dat_nb,
                     family = MASS::negative.binomial(theta = 2),
                     lambda = "fix", lambda_value = 0.05)
    fit_nb$family_name
    head(predict(fit_nb, type = "response"))
}
```

Joint estimation of θ (`MASS::glm.nb()` style) is **not** in scope
here; θ must be chosen externally.

## Multinomial regression (experimental)

```{r multinomial}
set.seed(108)
n <- 200
dat_mult <- data.frame(
    y  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_mult <- fbrglm(y ~ x1 + x2, data = dat_mult,
                   family = "multinomial",
                   lambda = "fix", lambda_value = 0.05)
fit_mult$family_name
p_resp <- predict(fit_mult, type = "response")
dim(p_resp)     # one column per class
head(p_resp)
head(predict(fit_mult, type = "class"))
```

For multinomial fits, `coef(fit)` returns a list of (sparse) matrices
— one per class — exactly as glmnet does; callers who want the
underlying object can use `as_glmnet(fit_mult)`.

## Multi-response Gaussian (experimental)

```{r mgaussian}
set.seed(109)
n <- 150
dat_mg <- data.frame(
    y1 = rnorm(n), y2 = rnorm(n),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_mg <- fbrglm(cbind(y1, y2) ~ x1 + x2, data = dat_mg,
                 family = "mgaussian",
                 lambda = "fix", lambda_value = 0.05)
fit_mg$family_name
pred_mg <- predict(fit_mg, type = "response")
dim(pred_mg)   # one column per response
```

## Limitations

- `infer = "none"` only — no classical standard errors, *z* values,
  *p* values, or confidence intervals. Honest post-selection inference
  is the next milestone.
- Native Cox, multinomial, and mgaussian are marked **experimental**:
  fit and predict work for the basic cases shown here, but the more
  unusual variants (Cox strata, ties handling, multinomial grouped /
  ungrouped) have not been exhaustively tested.
- Negative binomial supports **fixed θ** only. Joint θ estimation in
  the spirit of `MASS::glm.nb()` is out of scope.
- The piecewise exponential survival example is a Poisson formulation
  and is *not* identical to native Cox partial likelihood.