Skip to contents

Introduction

Generalized Linear Models (GLMs) are widely used for analyzing multivariate data with non-normal responses. The Iteratively Reweighted Least Squares (IRLS) algorithm is the main workhorse for fitting these models in small to medium-scale problems because it reduces the estimation problem to a series of Weighted Least Squares (WLS) subproblems, which are computationally less expensive than solving the full maximum likelihood optimization directly.

However, IRLS can be sensitive to convergence issues—especially when using the standard Ordinary Least Squares (OLS) update in each iteration. In cases of model mis-specification or high variability in the data, the OLS estimator may yield significant estimation error. The glm2 package improves upon the standard glm by incorporating a step-halving strategy to better handle convergence challenges. This robust framework is implemented in its glm.fit2 function.

In the savvyGLM package, we build on the robust framework provided by the glm2 package (and its function glm.fit2) by replacing the standard OLS update in the IRLS algorithm with a shrinkage estimator. The rationale is that shrinkage estimation introduces a small bias to substantially reduce the Mean Squared Error (MSE) of the OLS estimates, resulting in more reliable parameter estimates and enhanced convergence stability. The corresponding shrinkage paper can be find in Slab and Shrinkage Linear Regression Estimation.

Simulation studies and real-data applications demonstrate that replacing the OLS update in IRLS with a shrinkage estimator improves the convergence and overall performance of GLM estimation compared to the traditional non-penalized implementation.

In summary, our package modifies glm.fit2 from glm2 to:

  • Leverage robust features, such as step-halving, already present in glm2.

  • Replace the OLS update with one of several shrinkage estimators (e.g., St_ost, DSh_ost, SR_ost, GSR_ost, and optionally Sh_ost).

  • Employ parallelization when multiple candidate methods are specified.

For more details, please see:

Asimit, V., Chen, Z., Dimitrova, D., Xie, Y., & Zhang, Y. (2025). Shrinkage GLM Modeling.

Main Features

savvyGLM provides five classes of shrinkage methods for GLMs, although by default four are evaluated unless the user explicitly includes Sh(since Sh is time-consuming):

  1. Custom Optimization Methods: Implements shrinkage estimators (St_ost, DSh_ost, SR_ost, GSR_ost, and optionally Sh_ost) that replace the standard OLS update in IRLS.

  2. Flexible Model Choice: The model_class argument allows users to choose one or more shrinkage methods. By default, only "St", "DSh", "SR", and "GSR" are evaluated. Including "Sh" is optional because solving the Sylvester equation required by Sh_ost is computationally intensive.

  • Parallel Evaluation: When multiple candidate methods are specified, the function evaluates them in parallel (if use_parallel = TRUE) and selects the final model based on the lowest Akaike Information Criterion (AIC).

Key Argument and Output Summary

Argument Description Default
x A numeric matrix of predictors. As for glm.fit. No default; user must supply.
y A numeric vector of responses. As for glm.fit. No default; user must supply.
weights An optional vector of weights to be used in the fitting process. rep(1, nobs)
model_class A character vector specifying the shrinkage model(s) to be used. Allowed values are "St", "DSh", "SR", "GSR", and "Sh". If multiple values are given, they are run in parallel and the best is chosen by AIC. If "Sh" is not included, it is omitted from the evaluation. c("St", "DSh", "SR", "GSR")
start, etastart, mustart Optional starting values for the parameters, the linear predictor, and the mean, respectively. NULL
offset An optional offset to be included in the model. rep(0, nobs)
family A function specifying the conditional distribution of YXY \mid X, e.g., YXfamily(g1(Xβ))Y \mid X \sim \text{family}(g^{-1}(X\beta)). (e.g., gaussian(), binomial(), poisson()). gaussian()
control A list of parameters for controlling the fitting process (e.g., maxit, epsilon). Similar to glm.control. list()
intercept A logical value indicating whether an intercept should be included in the model. TRUE
use_parallel Logical. If TRUE, multiple methods in model_class are evaluated in parallel. If only one method is specified, it is run directly. The best model is chosen by the lowest AIC among the methods tested. TRUE
Key Output Description Notes
chosen_fit The name of the shrinkage method selected based on the lowest AIC. If only one method is specified in model_class, then chosen_fit will simply return that method’s name. Returned in the final model object.
coefficients, residuals, etc. The usual GLM components, similar to those returned by glm.fit. See documentation for full details.

This vignette provides an overview of the methods implemented in savvyGLM and demonstrates how to use them on simulated data and real data. We begin with a theoretical overview of each shrinkage class and GLM with IRLS, then walk through simulation examples that illustrate how to apply savvyGLM for each method.


Theoretical Overview

Shrinkage estimators

The underlying optimization methods in savvyGLM are designed to reduce the theoretical mean square error by introducing a small bias that yields a substantial reduction in variance. This trade-off leads to more reliable parameter estimates and improved convergence properties in Generalized Linear Models.

Here, we just provide a table summary since for a detailed theoretical overview of the shrinkage estimators applied to GLMs, please refer to the savvySh package, which covers many of the same shrinkage estimators used here, but in the context of linear regression. The methods in savvyGLM extend these ideas to generalized linear models, ensuring that the IRLS updates incorporate shrinkage at each iteration.

Method Shrinkage Approach Simple Description Extra Overhead? Included by Default?
OLS No shrinkage Baseline estimator without any regularization. No No (baseline method)
St Scalar shrinkage Scales all OLS coefficients by a common factor to reduce MSE. No Yes
DSh Diagonal matrix Shrinks each OLS coefficient separately using individual factors. No Yes
SR Slab penalty (1 direction) Adds a penalty that shrinks the estimate along a fixed direction (e.g., all-ones vector). No Yes
GSR Slab penalty (multiple directions) Generalizes SR by penalizing multiple directions, often from eigenvectors. No Yes
Sh Full matrix (Sylvester equation) Applies a matrix transformation to OLS using a solution to the Sylvester equation. Yes (computational) No (only if specified in model_class)

Generalized Linear Model and its IRLS Implementation

A GLM assumes that a response variable YY, defined on some set 𝒴\mathcal{Y} \subseteq \Re, is related to a set of covariates 𝐗\mathbf{X} in 𝒳d\mathcal{X} \subseteq \Re{^d}. The conditional distribution of YY belongs to the exponential dispersion family, with probability density or mass function

fY(y;θ,ϕ)=exp{θyb(θ)a(ϕ)+c(y,ϕ)}, f_Y(y; \theta, \phi) \;=\; \exp\left\{\frac{\theta\,y - b(\theta)}{a(\phi)} \;+\; c(y, \phi)\right\},

where θ\theta is the canonical parameter, ϕ\phi is the dispersion parameter, and aa, bb, cc are known functions. The mean of YY is linked to a linear predictor ηi=𝐱i𝛃\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta} through a link function gg, so that

𝔼[Yi𝐗i=𝐱i]=h(𝐱i𝛃), \mathbb{E}[Y_i \mid \mathbf{X}_i = \mathbf{x}_i] \;=\; h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right),

where h=g1h = g^{-1} is the inverse of the link. Under a canonical link, h()=b()h(\cdot) = b^\prime(\cdot).

For an independent sample {(yi,𝐱i)}i=1n\left\{(y_i, \mathbf{x}_i)\right\}_{i=1}^n, the log-likelihood for 𝛃\boldsymbol{\beta} can be written as

(𝛃)=i=1nθiyib(θi)a(ϕ)+c(yi,ϕ), \ell(\boldsymbol{\beta}) \;=\; \sum_{i=1}^n \frac{\theta_i\,y_i \;-\; b(\theta_i)}{a(\phi)} \;+\; c\left(y_i, \phi\right),

where θi=(b)1h(𝐱i𝛃)\theta_i = (b^\prime)^{-1}\circ\, h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right). Maximizing this log-likelihood is equivalent to minimizing the following objective function,

𝒞(𝛃)=i=1n(θiyib(θi)). \mathcal{C}(\boldsymbol{\beta}) \;=\; -\sum_{i=1}^n \left(\theta_i\,y_i \;-\; b(\theta_i)\right).

Taking the derivative of 𝒞(𝛃)\mathcal{C}(\boldsymbol{\beta}) with respect to 𝛃\boldsymbol{\beta} leads to the so-called normal equations, which can be solved iteratively. In particular, the IRLS algorithm reformulates each iteration as a WLS problem:

𝛃̂(t+1)=argmin𝛃(𝐳(t)𝐗𝛃)𝐖(t)(𝐳(t)𝐗𝛃), \widehat{\boldsymbol{\beta}}^{(t+1)} \;=\; \underset{\boldsymbol{\beta}}{\mathrm{arg\,min}} \;\left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right)^\top \mathbf{W}^{(t)} \left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right),

where 𝐖(t)\mathbf{W}^{(t)} is the diagonal matrix of weights and 𝐳(t)\mathbf{z}^{(t)} is the pseudo-response at iteration tt. Specifically,

𝐖(t)=diag((h(ηi(t)))2/V(μi(t))),𝐳i(t)=ηi(t)+yiμi(t)h(ηi(t)), \mathbf{W}^{(t)} \;=\; \mathrm{diag}\left(\left(h^\prime(\eta_i^{(t)})\right)^2 / V\left(\mu_i^{(t)}\right)\right), \quad \mathbf{z}_i^{(t)} \;=\; \eta_i^{(t)} \;+\; \frac{y_i - \mu_i^{(t)}}{h^\prime(\eta_i^{(t)})},

with μi(t)=h(ηi(t))\mu_i^{(t)} = h(\eta_i^{(t)}) and ηi(t)=𝐱i𝛃̂(t)\eta_i^{(t)} = \mathbf{x}_i^\top \widehat{\boldsymbol{\beta}}^{(t)}. Here, V(μi)V(\mu_i) is the variance function determined by the exponential family, and hh^\prime is the derivative of the inverse link.

By iterating this WLS problem until convergence (or until a maximum number of iterations is reached), IRLS provides the maximum likelihood estimates for the GLM parameters. In savvyGLM, these WLS updates are further enhanced by applying shrinkage estimators (St_ost, DSh_ost, SR_ost, GSR_ost}, orSh_ost`) at each iteration, thereby improving estimation accuracy and convergence stability.


Simulation Examples

This section presents simulation studies to compare the performance of the shrinkage estimators implemented in the savvyGLM package. We compare the standard IRLS update (as implemented in glm.fit2) with various shrinkage methods implemented in savvyGLM by measuring the L2L_2 distance between the estimated and true coefficients. Lower L2L_2 values indicate better estimation accuracy.

The following data-generating processes (DGPs) are considered for logistic regression (LR), Poisson, and Gamma GLMs with log link and sqrt link:

  1. Covariate Matrix: The covariate matrix 𝐗n×p\mathbf{X} \in \Re^{n \times p} is drawn from a multivariate normal distribution with mean zero, unit variances, and a Toeplitz covariance matrix Σ\Sigma where Cov(Xi,j,Xk,j)=ρ|ik|\text{Cov}(X_{i,j}, X_{k,j}) = \rho^{|i-k|} with ρ(1,1)\rho \in (-1,1).

  2. True Coefficients: The true regression coefficients βj\beta_j are set to alternate in sign and increase in magnitude (e.g., 1,1,2,2,1, -1, 2, -2, \dots for LR with the logit link). For Poisson and Gamma GLMs with the log link, a smaller scaling is used to avoid numerical issues.

  3. Response Generation:

    • LR: Compute the linear predictor: ηi=β0+j=1pβjxij\eta_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} and generate YiBinomial(1,11+eηi)Y_i \sim \text{Binomial}\left(1, \frac{1}{1+e^{-\eta_i}}\right).

    • Poisson GLM:Two link functions are considered:For the log link: μi=eηi\mu_i = e^{\eta_i}; for the sqrt link: μi=ηi2\mu_i = \eta_i^2. Then, generate YiPoisson(μi)Y_i \sim \text{Poisson}(\mu_i).

    • Gamma GLM: Similarly, use μi=eηi\mu_i = e^{\eta_i} (log link) or μi=ηi2\mu_i = \eta_i^2 (sqrt link), and generate YiY_i from a Gamma distribution with mean μi\mu_i.

For each scenario, we fit the GLM using the standard IRLS update (i.e., OLS-based) and the shrinkage-based IRLS update implemented in savvyGLM. The performance is assessed using the L2L_2 distance between the estimated and true coefficients. Tables comparing these distances and other relevant diagnostics are provided to illustrate the benefits of the shrinkage approach. These simulation studies aims to demonstrate that replacing the OLS update with shrinkage estimators in the IRLS algorithm improves convergence and estimation accuracy, especially in challenging settings with multicollinearity or extreme response values.

Note: Due to differences in underlying libraries and random number generation implementations (e.g., BLAS/LAPACK and RNG behavior), the data generated by functions like mvrnorm may differ slightly between Windows and macOS/Linux, even with the same seed.

Logistic Regression

Balanced Data

In this simulation, balanced binary responses are generated using the logit link. Covariates are sampled from a multivariate normal distribution with a Toeplitz covariance matrix determined by various correlation coefficients. To ensure a balanced dataset, a larger dataset is first generated, and then a subset is selected to achieve roughly equal proportions of 0’s and 1’s. We fit the models using the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
#> 
#> Attaching package: 'glm2'
#> The following object is masked from 'package:MASS':
#> 
#>     crabs
library(CVXR)
#> 
#> Attaching package: 'CVXR'
#> The following object is masked from 'package:MASS':
#> 
#>     huber
#> The following object is masked from 'package:stats':
#> 
#>     power
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.5
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

findStartingValues <- function(x, y) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  g_y <- (y + 0.5) / 2
  logit_g_y <- log(g_y / (1 - g_y))
  objective <- Minimize(sum_squares(logit_g_y - eta))
  problem <- Problem(objective)
  result <- solve(problem)
  as.numeric(result$getValue(beta))
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  n_val_large <- n_val * 10
  
  X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_large_intercept <- cbind(1, X_large)
  beta_true <- theta_func(p_val + 1)
  mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
  y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)

  y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
  y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
  final_indices <- c(y_zero_indices, y_one_indices)
  X_final <- X_large_intercept[final_indices, ]
  y_final <- y_large[final_indices]
  starting_values <- findStartingValues(X_final, y_final)
  
  fit_ols <- glm.fit2(X_final, y_final, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_final, y_final, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Balanced LR)")
L2 Distance Between Estimated and True Coefficients (Balanced LR)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 113.9223 370.0605 389.5301 85.0934 1411.1209
SR 113.9563 370.6068 389.4979 85.0923 1411.7484
GSR 119.8898 369.5698 391.3903 81.4199 1413.7821
St 116.8701 366.5771 388.6831 82.2640 1409.4099
DSh 117.7217 365.9178 388.7349 81.8181 1414.8157
Sh 12.3169 53.2125 62.8100 71.2555 307.2719

Imbalanced Data

For imbalanced LR, the simulation settings are similar except that the target proportion for the negative class is reduced (for example, only 5% of the responses are 0’s). This results in a skewed binary response, which allows us to evaluate the robustness of the shrinkage estimators under class imbalance. As before, models are fit using both the standard GLM and the various shrinkage-based GLM.

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.05
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

findStartingValues <- function(x, y) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  g_y <- (y + 0.5) / 2
  logit_g_y <- log(g_y / (1 - g_y))
  objective <- Minimize(sum_squares(logit_g_y - eta))
  problem <- Problem(objective)
  result <- solve(problem)
  as.numeric(result$getValue(beta))
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  n_val_large <- n_val * 10
  
  X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_large_intercept <- cbind(1, X_large)
  beta_true <- theta_func(p_val + 1)
  mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
  y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)

  y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
  y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
  final_indices <- c(y_zero_indices, y_one_indices)
  X_final <- X_large_intercept[final_indices, ]
  y_final <- y_large[final_indices]
  starting_values <- findStartingValues(X_final, y_final)
  
  fit_ols <- glm.fit2(X_final, y_final, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_final, y_final, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Imbalanced LR)")
L2 Distance Between Estimated and True Coefficients (Imbalanced LR)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 43.8576 76.1059 149.6920 148.9224 416.6273
SR 43.4368 75.9847 149.3453 148.8895 416.8925
GSR 45.6810 75.6519 149.7902 149.3444 418.4628
St 45.5025 75.1966 149.0896 148.5704 415.2965
DSh 44.0459 74.7376 148.6800 148.5846 417.6846
Sh 26.6729 19.8582 9.2992 12.6352 65.0610

Poisson GLMs

For Poisson GLMs with the log link, the mean response is modeled as μ=exp(η)\mu = \exp(\eta) with η=Xinterceptβtrue\eta = X_{\text{intercept}} \, \beta_{\text{true}}. In this simulation, covariates are generated from a multivariate normal distribution with a Toeplitz covariance matrix. The responses are simulated from a Poisson distribution with rate μ=exp(η)\mu = \exp(\eta). We fit the models using the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "log")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  base_increment <- 0.1
  growth_rate <- 0.95 
  betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
  betas <- rep(betas, each = 2)[1:p_val]
  signs <- rep(c(1, -1), length.out = p_val)
  betas <- betas * signs
  return(betas)
}

findStartingValues <- function(x, y) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(log(y + 0.1) - eta))
  problem <- Problem(objective)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector(exp(X_intercept %*% beta_true))
  y <- rpois(n_val, lambda = mu_y)
  starting_values <- findStartingValues(X_intercept, y)
  
  fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Poisson GLM)")
L2 Distance Between Estimated and True Coefficients (log link for Poisson GLM)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 0.2847 0.2477 0.2115 0.3973 0.3676
SR 0.2728 0.2142 0.1987 0.3986 0.3677
GSR 0.1981 0.1741 0.1870 0.3206 0.3010
St 0.2473 0.2005 0.1855 0.3273 0.2855
DSh 0.2718 0.2473 0.2206 0.3546 0.3501
Sh 0.2750 0.2437 0.2095 0.3914 0.3579

For the Poisson GLM with the sqrt link, the mean response is defined as μ=η2\mu = \eta^2 with η=Xinterceptβtrue\eta = X_{\text{intercept}} \, \beta_{\text{true}}. The simulation procedure is similar to the log link case; however, the responses are generated with Poission distribution with λ\lambdaμ=η2\mu=\eta^2. To address potential numerical issues (such as non-positive responses). Models are then fit using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "sqrt")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

findStartingValues <- function(x, y, epsilon = 1e-6) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
  constraints <- list(eta >= epsilon)
  problem <- Problem(objective, constraints)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector((X_intercept %*% beta_true)^2)
  y <- rpois(n_val, lambda = mu_y)
  starting_values <- findStartingValues(X_intercept, y)
  
  fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Poisson GLM)")
L2 Distance Between Estimated and True Coefficients (sqrt link for Poisson GLM)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 105.6656 82.8127 57.8924 45.0648 40.1208
SR 107.3804 84.1176 58.0029 45.0435 40.0857
GSR 103.4257 81.8108 57.1888 44.9339 40.3197
St 99.2810 80.5491 57.3508 44.6846 39.9475
DSh 104.1294 82.4709 57.5069 45.2324 40.5008
Sh 105.6656 82.8127 57.8924 45.0648 40.1208

Gamma GLMs

In Gamma GLMs with the log link, the mean response is given by μ=exp(η)\mu = \exp(\eta) with η=Xinterceptβtrue\eta = X_{\text{intercept}} \, \beta_{\text{true}}. Covariates are generated as before, and responses are drawn from a Gamma distribution with shape μ=exp(η)\mu =\exp(\eta). A retry mechanism is used to ensure that all simulated responses are positive, which is required by the Gamma model. Models are fit using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  base_increment <- 0.1
  growth_rate <- 0.95 
  betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
  betas <- rep(betas, each = 2)[1:p_val]
  signs <- rep(c(1, -1), length.out = p_val)
  betas <- betas * signs
  return(betas)
}

findStartingValues <- function(x, y) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(log(y + 0.1) - eta))
  problem <- Problem(objective)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector(exp(X_intercept %*% beta_true))
  y <-  rgamma(n_val, shape = mu_y, scale = 1)
  retry_count <- 0
  while(any(y <= 0) && retry_count < 100) {
    bad_idx <- which(y <= 0)
    y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
    retry_count <- retry_count + 1
  }
  starting_values <- findStartingValues(X_intercept, y)
  
  fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Gamma GLM)")
L2 Distance Between Estimated and True Coefficients (log link for Gamma GLM)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 0.4254 0.2821 0.1926 0.3676 0.4349
SR 0.3941 0.2812 0.1919 0.3652 0.4345
GSR 0.3001 0.2103 0.1742 0.3101 0.3238
St 0.2670 0.2179 0.1627 0.2818 0.3337
DSh 0.3171 0.2600 0.2052 0.3366 0.3509
Sh 0.4102 0.2777 0.1894 0.3605 0.4205

For Gamma GLMs with the sqrt link, the mean response is modeled as μ=η2\mu = \eta^2 with η=Xinterceptβtrue\eta = X_{\text{intercept}} \, \beta_{\text{true}}. Covariates are generated as before, and responses are drawn from a Gamma distribution with shape μ=η2\mu =\eta^2, including a retry mechanism to ensure all response values are positive. Fitting the models using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

findStartingValues <- function(x, y, epsilon = 1e-6) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
  constraints <- list(eta >= epsilon)
  problem <- Problem(objective, constraints)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector((X_intercept %*% beta_true)^2)
  y <- rgamma(n_val, shape = mu_y, scale = 1)
  retry_count <- 0
  while(any(y <= 0) && retry_count < 100) {
    bad_idx <- which(y <= 0)
    y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
    retry_count <- retry_count + 1
  }
  starting_values <- findStartingValues(X_intercept, y)
  
  fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
                           start = starting_values, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different ρ\rho values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Gamma GLM)")
L2 Distance Between Estimated and True Coefficients (sqrt link for Gamma GLM)
rho=-0.75 rho=-0.5 rho=0 rho=0.5 rho=0.75
OLS 107.8225 81.8266 56.4835 45.4632 40.4655
SR 107.8121 81.8224 56.4548 45.4602 40.4746
GSR 104.3230 80.3020 55.5819 45.3778 40.4913
St 102.0399 79.3341 55.8032 45.0603 40.2149
DSh 105.1456 81.0958 55.9391 45.8814 40.3181
Sh 107.7353 81.7724 56.4640 45.4739 40.4802

Real Data Analysis

In this section, we evaluate the performance of Gamma GLM estimation methods using real flood insurance data from three states: Florida (FL), Texas (TX), and Louisiana (LA). The dataset covers the period 2014 to 2023 and includes claims from the National Flood Insurance Programme (NFIP). For each state (e.g., Florida (FL) and Louisiana (LA)), the data are no provided here, but you can download from NFIP and follow the preprocess steps provided in the paper.

For each year, the dataset is split into a 70% training set and a 30% test set. Models are fitted using the standard GLM (via glm.fit2) and several shrinkage-based GLM (via savvy_glm.fit2 with model classes such as SR, GSR, St, DSh, and optionally Sh) using two different link functions:

  • For Gamma GLMs with the log link, the mean response is modeled as μ=exp(η)\mu = \exp(\eta).

  • For Gamma GLMs with the sqrt link, the mean response is modeled as μ=η2\mu = \eta^2.

For each replication, the MSE on the test set is computed. For each shrinkage-based GLM, we calculate the ratio

MSE Ratio=MSEBaseline (OLS)MSEShrinkage. \text{MSE Ratio} = \frac{\text{MSE}_{\text{Baseline (OLS)}}}{\text{MSE}_{\text{Shrinkage}}}.

A ratio greater than one indicates that the shrinkage-based GLM achieved a lower MSE (i.e., superior performance) compared to the standard update. This process is repeated multiple times (e.g., N=100N=100 replications per year), and the average ratio is computed for each model and each year.

Here, we just provide two cases and for a detailed account of the full data processing and evaluation procedures, please refer to the main paper.


# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)

set.seed(1234)
years <- 2014:2015
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 10
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")

Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
                             dimnames = list(model_names, years))

findStartingValues <- function(x, y) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(log(y + 0.1) - eta))
  problem <- Problem(objective)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

calculate_mse <- function(true_values, predicted_values) {
  mean((true_values - predicted_values)^2)
}

for (yr in years) {
  cat("Processing year:", yr, "\n")
  filename <- sprintf("FL_data_%d.csv", yr)
  data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
  
  ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
  colnames(ratio_mat) <- model_names[-1]
  for (i in 1:N) {
    set.seed(yr * 1000 + i)
    train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
    train_data <- data_year[train_index, ]
    test_data <- data_year[-train_index, ]
    
    X_train <- as.matrix(train_data[, -1])
    y_train <- train_data[, 1]
    X_test <- as.matrix(test_data[, -1])
    y_test <- test_data[, 1]
    X_train_int <- cbind(1, X_train)
    X_test_int <- cbind(1, X_test)
    starting_values <- findStartingValues(X_train_int, y_train)
  
    model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
                           control = control_list, family = family_type)
    y_pred_glm2 <- exp(X_test_int %*% model_glm2$coefficients)
    mse_ols <- calculate_mse(y_test, y_pred_glm2)
    for (m in model_names[-1]) {
      model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m,
                                      start = starting_values, control = control_list, family = family_type)
      y_pred <- exp(X_test_int %*% model_savvy$coefficients)
      mse_savvy <- calculate_mse(y_test, y_pred)
      ratio_mat[i, m] <- mse_ols / mse_savvy
    }
  }
  avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
  Evaluation_results[, as.character(yr)] <- avg_ratios
}
Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
      caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Log Link)")
# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)

set.seed(1234)
years <- 2014:2023
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 100
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")

Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
                             dimnames = list(model_names, years))

findStartingValues <- function(x, y, epsilon = 1e-6) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
  constraints <- list(eta >= epsilon)
  problem <- Problem(objective, constraints)
  result <- solve(problem)
  starting_values <- as.numeric(result$getValue(beta))
  return(starting_values)
}

calculate_mse <- function(true_values, predicted_values) {
  mean((true_values - predicted_values)^2)
}

for (yr in years) {
  cat("Processing year:", yr, "\n")
  filename <- sprintf("LA_data_%d.csv", yr)
  data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
  
  ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
  colnames(ratio_mat) <- model_names[-1]
  for (i in 1:N) {
    set.seed(yr * 1000 + i)
    train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
    train_data <- data_year[train_index, ]
    test_data <- data_year[-train_index, ]
    
    X_train <- as.matrix(train_data[, -1])
    y_train <- train_data[, 1]
    X_test <- as.matrix(test_data[, -1])
    y_test <- test_data[, 1]
    X_train_int <- cbind(1, X_train)
    X_test_int <- cbind(1, X_test)
    starting_values <- findStartingValues(X_train_int, y_train)
  
    model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
                           control = control_list, family = family_type)
    y_pred_glm2 <- (X_test_int %*% model_glm2$coefficients)^2
    mse_ols <- calculate_mse(y_test, y_pred_glm2)
    for (m in model_names[-1]) {
      model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m,
                                      start = starting_values, control = control_list, family = family_type)
      y_pred <- exp(X_test_int %*% model_savvy$coefficients)
      mse_savvy <- calculate_mse(y_test, y_pred)
      ratio_mat[i, m] <- mse_ols / mse_savvy
    }
  }
  avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
  Evaluation_results[, as.character(yr)] <- avg_ratios
}
Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
      caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Log Link)")