savvyGLM: Shrinkage Methods for Generalized Linear Models
savvyGLM.Rmd
Introduction
Generalized Linear Models (GLMs) are widely used for analyzing data with non-normal responses. The Iteratively Reweighted Least Squares (IRLS) algorithm is the main workhorse for fitting these models in small- to medium-scale problems because it reduces the estimation problem to a sequence of Weighted Least Squares (WLS) subproblems, which are computationally less expensive than solving the full maximum-likelihood optimization directly.
However, IRLS can be sensitive to convergence issues, especially when using the standard Ordinary Least Squares (OLS) update in each iteration. Under model mis-specification or high variability in the data, the OLS estimator may incur substantial estimation error. The `glm2` package improves upon the standard `glm` by incorporating a step-halving strategy to better handle convergence challenges; this robust framework is implemented in its `glm.fit2` function.
In the `savvyGLM` package, we build on the robust framework provided by the `glm2` package (and its function `glm.fit2`) by replacing the standard OLS update in the IRLS algorithm with a shrinkage estimator. The rationale is that shrinkage estimation introduces a small bias in exchange for a substantial reduction in the Mean Squared Error (MSE) of the OLS estimates, resulting in more reliable parameter estimates and enhanced convergence stability. The corresponding shrinkage methodology can be found in Slab and Shrinkage Linear Regression Estimation.
Simulation studies and real-data applications demonstrate that replacing the OLS update in IRLS with a shrinkage estimator improves the convergence and overall performance of GLM estimation compared to the traditional non-penalized implementation.
In summary, our package modifies `glm.fit2` from `glm2` to:

- Leverage robust features, such as step-halving, already present in `glm2`.
- Replace the OLS update with one of several shrinkage estimators (`St_ost`, `DSh_ost`, `SR_ost`, `GSR_ost`, and optionally `Sh_ost`).
- Employ parallelization when multiple candidate methods are specified.
For more details, please see:
Asimit, V., Chen, Z., Dimitrova, D., Xie, Y., & Zhang, Y. (2025). Shrinkage GLM Modeling.
Main Features
`savvyGLM` provides five classes of shrinkage methods for GLMs, although by default only four are evaluated unless the user explicitly includes `Sh` (since `Sh` is time-consuming):

- Custom Optimization Methods: Implements shrinkage estimators (`St_ost`, `DSh_ost`, `SR_ost`, `GSR_ost`, and optionally `Sh_ost`) that replace the standard OLS update in IRLS.
- Flexible Model Choice: The `model_class` argument allows users to choose one or more shrinkage methods. By default, only `"St"`, `"DSh"`, `"SR"`, and `"GSR"` are evaluated. Including `"Sh"` is optional because solving the Sylvester equation required by `Sh_ost` is computationally intensive.
- Parallel Evaluation: When multiple candidate methods are specified, the function evaluates them in parallel (if `use_parallel = TRUE`) and selects the final model based on the lowest Akaike Information Criterion (AIC).
Key Argument and Output Summary
| Argument | Description | Default |
|---|---|---|
| `x` | A numeric matrix of predictors, as for `glm.fit`. | No default; user must supply. |
| `y` | A numeric vector of responses, as for `glm.fit`. | No default; user must supply. |
| `weights` | An optional vector of weights to be used in the fitting process. | `rep(1, nobs)` |
| `model_class` | A character vector specifying the shrinkage model(s) to be used. Allowed values are `"St"`, `"DSh"`, `"SR"`, `"GSR"`, and `"Sh"`. If multiple values are given, they are run in parallel and the best is chosen by AIC. If `"Sh"` is not included, it is omitted from the evaluation. | `c("St", "DSh", "SR", "GSR")` |
| `start`, `etastart`, `mustart` | Optional starting values for the parameters, the linear predictor, and the mean, respectively. | `NULL` |
| `offset` | An optional offset to be included in the model. | `rep(0, nobs)` |
| `family` | A family object specifying the conditional distribution of the response (e.g., `gaussian()`, `binomial()`, `poisson()`). | `gaussian()` |
| `control` | A list of parameters for controlling the fitting process (e.g., `maxit`, `epsilon`), similar to `glm.control`. | `list()` |
| `intercept` | A logical value indicating whether an intercept should be included in the model. | `TRUE` |
| `use_parallel` | Logical. If `TRUE`, multiple methods in `model_class` are evaluated in parallel. If only one method is specified, it is run directly. The best model is chosen by the lowest AIC among the methods tested. | `TRUE` |

| Key Output | Description | Notes |
|---|---|---|
| `chosen_fit` | The name of the shrinkage method selected based on the lowest AIC. If only one method is specified in `model_class`, then `chosen_fit` simply returns that method's name. | Returned in the final model object. |
| `coefficients`, `residuals`, etc. | The usual GLM components, similar to those returned by `glm.fit`. | See documentation for full details. |
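To see how these arguments fit together, here is a minimal call on simulated Gaussian data (a sketch assuming `savvyGLM` is installed; the simulated data and settings below are illustrative choices of ours, not taken from the paper):

library(savvyGLM)

set.seed(1)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix with intercept column
beta <- c(0.5, rep(1, p))
y <- drop(X %*% beta) + rnorm(n)

# Evaluate the four default shrinkage methods in parallel and keep
# the fit with the lowest AIC.
fit <- savvy_glm.fit2(X, y,
                      model_class  = c("St", "DSh", "SR", "GSR"),
                      family       = gaussian(),
                      control      = list(maxit = 100, epsilon = 1e-8),
                      use_parallel = TRUE)

fit$chosen_fit         # which shrinkage method was selected
head(fit$coefficients)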
This vignette provides an overview of the methods implemented in `savvyGLM` and demonstrates how to use them on simulated and real data. We begin with a theoretical overview of each shrinkage class and of GLM fitting with IRLS, then walk through simulation examples that illustrate how to apply `savvyGLM` for each method.
Theoretical Overview
Shrinkage estimators
The underlying optimization methods in `savvyGLM` are designed to reduce the theoretical mean squared error by introducing a small bias that yields a substantial reduction in variance. This trade-off leads to more reliable parameter estimates and improved convergence properties in Generalized Linear Models.
Here we provide only a summary table; for a detailed theoretical overview of the shrinkage estimators, please refer to the `savvySh` package, which covers many of the same shrinkage estimators used here, but in the context of linear regression. The methods in `savvyGLM` extend these ideas to generalized linear models, ensuring that the IRLS updates incorporate shrinkage at each iteration.
| Method | Shrinkage Approach | Simple Description | Extra Overhead? | Included by Default? |
|---|---|---|---|---|
| OLS | No shrinkage | Baseline estimator without any regularization. | No | No (baseline method) |
| St | Scalar shrinkage | Scales all OLS coefficients by a common factor to reduce MSE. | No | Yes |
| DSh | Diagonal matrix | Shrinks each OLS coefficient separately using individual factors. | No | Yes |
| SR | Slab penalty (1 direction) | Adds a penalty that shrinks the estimate along a fixed direction (e.g., the all-ones vector). | No | Yes |
| GSR | Slab penalty (multiple directions) | Generalizes SR by penalizing multiple directions, often from eigenvectors. | No | Yes |
| Sh | Full matrix (Sylvester equation) | Applies a matrix transformation to OLS using a solution to the Sylvester equation. | Yes (computational) | No (only if specified in `model_class`) |
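To build intuition for the scalar-shrinkage (`St`) row above, the following toy sketch applies a single James-Stein-style factor to an OLS fit. It illustrates the bias-variance trade-off only; it is not the exact `St_ost` rule implemented in the package, and it assumes a near-orthogonal design so that the OLS covariance is roughly spherical:

set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)   # near-orthogonal design: cov(beta_ols) is roughly (sigma^2 / n) I
beta <- rep(1, p)
y <- drop(X %*% beta) + rnorm(n, sd = 3)

fit <- lm.fit(X, y)
beta_ols <- fit$coefficients
sigma2   <- sum(fit$residuals^2) / (n - p)   # residual variance estimate

# One common shrinkage factor in [0, 1): a little bias, less variance.
c_hat   <- max(0, 1 - (p - 2) * (sigma2 / n) / sum(beta_ols^2))
beta_st <- c_hat * beta_ols

c(ols = sum((beta_ols - beta)^2), st = sum((beta_st - beta)^2))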
Generalized Linear Model and its IRLS Implementation
A GLM assumes that a response variable $Y$, defined on some set $\mathcal{Y} \subseteq \mathbb{R}$, is related to a set of covariates $\mathbf{x} \in \mathbb{R}^p$. The conditional distribution of $Y$ belongs to the exponential dispersion family, with probability density or mass function

$$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$

where $\theta$ is the canonical parameter, $\phi$ is the dispersion parameter, and $a(\cdot)$, $b(\cdot)$, $c(\cdot, \cdot)$ are known functions. The mean of $Y$, $\mu = b'(\theta)$, is linked to a linear predictor $\eta = \mathbf{x}^\top \boldsymbol{\beta}$ through a link function $g$, so that

$$g(\mu) = \eta, \qquad \mu = g^{-1}(\eta),$$

where $g^{-1}$ is the inverse of the link. Under a canonical link, $\theta = \eta$.

For an independent sample $(y_i, \mathbf{x}_i)_{i=1}^{n}$, the log-likelihood for $\boldsymbol{\beta}$ can be written as

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\},$$

where $\theta_i = \theta(\mathbf{x}_i^\top \boldsymbol{\beta})$. Maximizing this log-likelihood is equivalent to minimizing the objective function $-\ell(\boldsymbol{\beta})$.

Taking the derivative of $-\ell(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ leads to the so-called normal equations, which can be solved iteratively. In particular, the IRLS algorithm reformulates each iteration as a WLS problem:

$$\boldsymbol{\beta}^{(t+1)} = \left( \mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{z}^{(t)},$$

where $\mathbf{W}^{(t)} = \mathrm{diag}\left(w_1^{(t)}, \ldots, w_n^{(t)}\right)$ is the diagonal matrix of weights and $\mathbf{z}^{(t)}$ is the pseudo-response at iteration $t$. Specifically,

$$w_i^{(t)} = \frac{\left( (g^{-1})'(\eta_i^{(t)}) \right)^2}{V\!\left(\mu_i^{(t)}\right)}, \qquad z_i^{(t)} = \eta_i^{(t)} + \frac{y_i - \mu_i^{(t)}}{(g^{-1})'(\eta_i^{(t)})},$$

with $\eta_i^{(t)} = \mathbf{x}_i^\top \boldsymbol{\beta}^{(t)}$ and $\mu_i^{(t)} = g^{-1}(\eta_i^{(t)})$. Here, $V(\cdot)$ is the variance function determined by the exponential family, and $(g^{-1})'(\cdot)$ is the derivative of the inverse link.
By iterating this WLS problem until convergence (or until a maximum number of iterations is reached), IRLS provides the maximum likelihood estimates for the GLM parameters. In `savvyGLM`, these WLS updates are further enhanced by applying shrinkage estimators (`St_ost`, `DSh_ost`, `SR_ost`, `GSR_ost`, or `Sh_ost`) at each iteration, thereby improving estimation accuracy and convergence stability.
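To make the preceding description concrete, here is a minimal IRLS skeleton in R (a conceptual sketch only, using base R family accessors; it omits the step-halving and other safeguards of `glm.fit2`, and the name `irls_sketch` is ours). The commented line marks the WLS solve that `savvyGLM` replaces with a shrinkage estimate:

irls_sketch <- function(X, y, family, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))                  # naive start; real fitters pick better values
  for (it in seq_len(maxit)) {
    eta    <- drop(X %*% beta)
    mu     <- family$linkinv(eta)
    mu_eta <- family$mu.eta(eta)           # d mu / d eta
    w <- mu_eta^2 / family$variance(mu)    # IRLS weights w_i
    z <- eta + (y - mu) / mu_eta           # pseudo-response z_i
    beta_new <- qr.solve(sqrt(w) * X, sqrt(w) * z)  # plain WLS (OLS on weighted data)
    # --- in savvyGLM, a shrinkage estimator (St, DSh, SR, GSR, or Sh)
    # --- replaces this WLS solve at every iteration
    if (sum((beta_new - beta)^2) < tol) return(beta_new)
    beta <- beta_new
  }
  beta
}

On well-behaved data, for example, `irls_sketch(X, y, binomial())` recovers the usual logistic-regression fit.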
Simulation Examples
This section presents simulation studies comparing the performance of the shrinkage estimators implemented in the `savvyGLM` package. We compare the standard IRLS update (as implemented in `glm.fit2`) with the various shrinkage methods in `savvyGLM` by measuring the $\ell_2$ distance between the estimated and true coefficients. Lower values indicate better estimation accuracy.
The following data-generating processes (DGPs) are considered for logistic regression (LR), Poisson, and Gamma GLMs with log and sqrt links:

- Covariate Matrix: The covariate matrix is drawn from a multivariate normal distribution with mean zero, unit variances, and a Toeplitz covariance matrix $\boldsymbol{\Sigma}$ with $\Sigma_{ij} = \rho^{|i - j|}$, where $\rho \in \{-0.75, -0.5, 0, 0.5, 0.75\}$.
- True Coefficients: The true regression coefficients are set to alternate in sign and increase in magnitude (e.g., $\boldsymbol{\beta} = (1, -1, 2, -2, \ldots)$ for LR with the logit link). For Poisson and Gamma GLMs with the log link, a smaller scaling is used to avoid numerical issues.
- Response Generation (a compact sketch follows this list):
  - LR: Compute the linear predictor $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ and generate $y_i \sim \mathrm{Bernoulli}\!\left(1 / (1 + e^{-\eta_i})\right)$.
  - Poisson GLM: Two link functions are considered. For the log link, $\mu_i = \exp(\eta_i)$; for the sqrt link, $\mu_i = \eta_i^2$. Then generate $y_i \sim \mathrm{Poisson}(\mu_i)$.
  - Gamma GLM: Similarly, use $\mu_i = \exp(\eta_i)$ (log link) or $\mu_i = \eta_i^2$ (sqrt link), and generate $y_i$ from a Gamma distribution with mean $\mu_i$.
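The following compact sketch shows these generating rules side by side (it assumes the objects `X_intercept`, `beta_true`, and `n_val` as created in the code blocks below):

eta <- drop(X_intercept %*% beta_true)   # linear predictor eta_i = x_i' beta

y_lr        <- rbinom(n_val, size = 1, prob = 1 / (1 + exp(-eta)))  # LR, logit link
y_pois_log  <- rpois(n_val, lambda = exp(eta))                      # Poisson, log link
y_pois_sqrt <- rpois(n_val, lambda = eta^2)                         # Poisson, sqrt link
y_gam_log   <- rgamma(n_val, shape = exp(eta), scale = 1)           # Gamma, log link (mean = shape * scale)
y_gam_sqrt  <- rgamma(n_val, shape = eta^2, scale = 1)              # Gamma, sqrt link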
For each scenario, we fit the GLM using the standard IRLS update (i.e., OLS-based) and the shrinkage-based IRLS update implemented in `savvyGLM`. The performance is assessed using the $\ell_2$ distance between the estimated and true coefficients. Tables comparing these distances and other relevant diagnostics are provided to illustrate the benefits of the shrinkage approach. These simulation studies aim to demonstrate that replacing the OLS update with shrinkage estimators in the IRLS algorithm improves convergence and estimation accuracy, especially in challenging settings with multicollinearity or extreme response values.
Note: Due to differences in underlying libraries and random number generation implementations (e.g., BLAS/LAPACK and RNG behavior), the data generated by functions like `mvrnorm` may differ slightly between Windows and macOS/Linux, even with the same seed.
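When comparing results across machines, it can therefore help to record the RNG and linear-algebra configuration alongside the outputs, e.g.:

RNGkind()       # RNG algorithms in use for the current session
sessionInfo()   # R version and (on recent R versions) the BLAS/LAPACK libraries in use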
Logistic Regression
Balanced Data
In this simulation, balanced binary responses are generated using the logit link. Covariates are sampled from a multivariate normal distribution with a Toeplitz covariance matrix determined by various correlation coefficients. To ensure a balanced dataset, a larger dataset is first generated, and then a subset is selected to achieve roughly equal proportions of 0's and 1's. We fit the models using the standard GLM (via `glm.fit2`) and the shrinkage-based GLM (via `savvy_glm.fit2` in the `savvyGLM` package).
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
#>
#> Attaching package: 'glm2'
#> The following object is masked from 'package:MASS':
#>
#> crabs
library(CVXR)
#>
#> Attaching package: 'CVXR'
#> The following object is masked from 'package:MASS':
#>
#> huber
#> The following object is masked from 'package:stats':
#>
#> power
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.5
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
sgn <- rep(c(1, -1), length.out = p_val)
mag <- ceiling(seq_len(p_val) / 2)
sgn * mag
}
findStartingValues <- function(x, y) {
beta <- Variable(ncol(x))
eta <- x %*% beta
g_y <- (y + 0.5) / 2
logit_g_y <- log(g_y / (1 - g_y))
objective <- Minimize(sum_squares(logit_g_y - eta))
problem <- Problem(objective)
result <- solve(problem)
as.numeric(result$getValue(beta))
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
n_val_large <- n_val * 10
X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
X_large_intercept <- cbind(1, X_large)
beta_true <- theta_func(p_val + 1)
mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)
y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
final_indices <- c(y_zero_indices, y_one_indices)
X_final <- X_large_intercept[final_indices, ]
y_final <- y_large[final_indices]
starting_values <- findStartingValues(X_final, y_final)
fit_ols <- glm.fit2(X_final, y_final, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_final, y_final, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Balanced LR)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 113.9223 | 370.0605 | 389.5301 | 85.0934 | 1411.1209 |
| SR | 113.9563 | 370.6068 | 389.4979 | 85.0923 | 1411.7484 |
| GSR | 119.8898 | 369.5698 | 391.3903 | 81.4199 | 1413.7821 |
| St | 116.8701 | 366.5771 | 388.6831 | 82.2640 | 1409.4099 |
| DSh | 117.7217 | 365.9178 | 388.7349 | 81.8181 | 1414.8157 |
| Sh | 12.3169 | 53.2125 | 62.8100 | 71.2555 | 307.2719 |
Imbalanced Data
For imbalanced LR, the simulation settings are similar except that the target proportion for the negative class is reduced (for example, only 5% of the responses are 0's). This results in a skewed binary response, which allows us to evaluate the robustness of the shrinkage estimators under class imbalance. As before, models are fit using both the standard GLM and the various shrinkage-based GLMs.
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.05
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
sgn <- rep(c(1, -1), length.out = p_val)
mag <- ceiling(seq_len(p_val) / 2)
sgn * mag
}
findStartingValues <- function(x, y) {
beta <- Variable(ncol(x))
eta <- x %*% beta
g_y <- (y + 0.5) / 2
logit_g_y <- log(g_y / (1 - g_y))
objective <- Minimize(sum_squares(logit_g_y - eta))
problem <- Problem(objective)
result <- solve(problem)
as.numeric(result$getValue(beta))
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
n_val_large <- n_val * 10
X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
X_large_intercept <- cbind(1, X_large)
beta_true <- theta_func(p_val + 1)
mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)
y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
final_indices <- c(y_zero_indices, y_one_indices)
X_final <- X_large_intercept[final_indices, ]
y_final <- y_large[final_indices]
starting_values <- findStartingValues(X_final, y_final)
fit_ols <- glm.fit2(X_final, y_final, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_final, y_final, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Imbalanced LR)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 43.8576 | 76.1059 | 149.6920 | 148.9224 | 416.6273 |
| SR | 43.4368 | 75.9847 | 149.3453 | 148.8895 | 416.8925 |
| GSR | 45.6810 | 75.6519 | 149.7902 | 149.3444 | 418.4628 |
| St | 45.5025 | 75.1966 | 149.0896 | 148.5704 | 415.2965 |
| DSh | 44.0459 | 74.7376 | 148.6800 | 148.5846 | 417.6846 |
| Sh | 26.6729 | 19.8582 | 9.2992 | 12.6352 | 65.0610 |
Poisson GLMs
Log Link Function
For Poisson GLMs with the log link, the mean response is modeled as $\mu_i = \exp(\eta_i)$ with $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$. In this simulation, covariates are generated from a multivariate normal distribution with a Toeplitz covariance matrix. The responses are simulated from a Poisson distribution with rate $\mu_i$. We fit the models using the standard GLM (via `glm.fit2`) and the shrinkage-based GLM (via `savvy_glm.fit2` in the `savvyGLM` package).
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "log")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
base_increment <- 0.1
growth_rate <- 0.95
betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
betas <- rep(betas, each = 2)[1:p_val]
signs <- rep(c(1, -1), length.out = p_val)
betas <- betas * signs
return(betas)
}
findStartingValues <- function(x, y) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(log(y + 0.1) - eta))
problem <- Problem(objective)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
mu_y <- as.vector(exp(X_intercept %*% beta_true))
y <- rpois(n_val, lambda = mu_y)
starting_values <- findStartingValues(X_intercept, y)
fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Poisson GLM)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 0.2847 | 0.2477 | 0.2115 | 0.3973 | 0.3676 |
| SR | 0.2728 | 0.2142 | 0.1987 | 0.3986 | 0.3677 |
| GSR | 0.1981 | 0.1741 | 0.1870 | 0.3206 | 0.3010 |
| St | 0.2473 | 0.2005 | 0.1855 | 0.3273 | 0.2855 |
| DSh | 0.2718 | 0.2473 | 0.2206 | 0.3546 | 0.3501 |
| Sh | 0.2750 | 0.2437 | 0.2095 | 0.3914 | 0.3579 |
sqrt Link Function
For the Poisson GLM with the sqrt link, the mean response is defined as $\mu_i = \eta_i^2$ with $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$. The simulation procedure is similar to the log link case; however, the responses are generated from a Poisson distribution with rate $\mu_i = \eta_i^2$, and the starting-value optimization constrains $\eta_i \geq \epsilon$ to address potential numerical issues (such as a non-positive linear predictor). Models are then fit using both the standard GLM (via `glm.fit2`) and the shrinkage-based GLM (via `savvy_glm.fit2` in the `savvyGLM` package).
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "sqrt")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
sgn <- rep(c(1, -1), length.out = p_val)
mag <- ceiling(seq_len(p_val) / 2)
sgn * mag
}
findStartingValues <- function(x, y, epsilon = 1e-6) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
constraints <- list(eta >= epsilon)
problem <- Problem(objective, constraints)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
mu_y <- as.vector((X_intercept %*% beta_true)^2)
y <- rpois(n_val, lambda = mu_y)
starting_values <- findStartingValues(X_intercept, y)
fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Poisson GLM)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 105.6656 | 82.8127 | 57.8924 | 45.0648 | 40.1208 |
| SR | 107.3804 | 84.1176 | 58.0029 | 45.0435 | 40.0857 |
| GSR | 103.4257 | 81.8108 | 57.1888 | 44.9339 | 40.3197 |
| St | 99.2810 | 80.5491 | 57.3508 | 44.6846 | 39.9475 |
| DSh | 104.1294 | 82.4709 | 57.5069 | 45.2324 | 40.5008 |
| Sh | 105.6656 | 82.8127 | 57.8924 | 45.0648 | 40.1208 |
Gamma GLMs
log Link Function
In Gamma GLMs with the log link, the mean response is given by $\mu_i = \exp(\eta_i)$ with $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$. Covariates are generated as before, and responses are drawn from a Gamma distribution with shape $\mu_i$ and scale $1$ (so that the mean equals $\mu_i$). A retry mechanism is used to ensure that all simulated responses are positive, which is required by the Gamma model. Models are fit using both the standard GLM (via `glm.fit2`) and the shrinkage-based GLM (via `savvy_glm.fit2` in the `savvyGLM` package).
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
base_increment <- 0.1
growth_rate <- 0.95
betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
betas <- rep(betas, each = 2)[1:p_val]
signs <- rep(c(1, -1), length.out = p_val)
betas <- betas * signs
return(betas)
}
findStartingValues <- function(x, y) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(log(y + 0.1) - eta))
problem <- Problem(objective)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
mu_y <- as.vector(exp(X_intercept %*% beta_true))
y <- rgamma(n_val, shape = mu_y, scale = 1)
retry_count <- 0
while(any(y <= 0) && retry_count < 100) {
bad_idx <- which(y <= 0)
y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
retry_count <- retry_count + 1
}
starting_values <- findStartingValues(X_intercept, y)
fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Gamma GLM)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 0.4254 | 0.2821 | 0.1926 | 0.3676 | 0.4349 |
| SR | 0.3941 | 0.2812 | 0.1919 | 0.3652 | 0.4345 |
| GSR | 0.3001 | 0.2103 | 0.1742 | 0.3101 | 0.3238 |
| St | 0.2670 | 0.2179 | 0.1627 | 0.2818 | 0.3337 |
| DSh | 0.3171 | 0.2600 | 0.2052 | 0.3366 | 0.3509 |
| Sh | 0.4102 | 0.2777 | 0.1894 | 0.3605 | 0.4205 |
sqrt Link Function
For Gamma GLMs with the sqrt link, the mean response is modeled as $\mu_i = \eta_i^2$ with $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$. Covariates are generated as before, and responses are drawn from a Gamma distribution with shape $\mu_i$ and scale $1$, including a retry mechanism to ensure all response values are positive. Models are fit using both the standard GLM (via `glm.fit2`) and the shrinkage-based GLM (via `savvy_glm.fit2` in the `savvyGLM` package).
# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)
set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")
sigma.rho <- function(rho_val, p_val) {
rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}
theta_func <- function(p_val) {
sgn <- rep(c(1, -1), length.out = p_val)
mag <- ceiling(seq_len(p_val) / 2)
sgn * mag
}
findStartingValues <- function(x, y, epsilon = 1e-6) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
constraints <- list(eta >= epsilon)
problem <- Problem(objective, constraints)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
model_names <- c("OLS", "SR", "GSR", "St", "DSh", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
rho_val <- rho_vals[j]
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
mu_y <- as.vector((X_intercept %*% beta_true)^2)
y <- rgamma(n_val, shape = mu_y, scale = 1)
retry_count <- 0
while(any(y <= 0) && retry_count < 100) {
bad_idx <- which(y <= 0)
y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
retry_count <- retry_count + 1
}
starting_values <- findStartingValues(X_intercept, y)
fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
control = control_list, family = family_type)
l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")
for (m in model_names[-1]) {
fit <- savvy_glm.fit2(X_intercept, y, model_class = m,
start = starting_values, control = control_list, family = family_type)
l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
}
}
A summary table is produced to compare these distances across the different $\rho$ values.
l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Gamma GLM)")
| | rho=-0.75 | rho=-0.5 | rho=0 | rho=0.5 | rho=0.75 |
|---|---|---|---|---|---|
| OLS | 107.8225 | 81.8266 | 56.4835 | 45.4632 | 40.4655 |
| SR | 107.8121 | 81.8224 | 56.4548 | 45.4602 | 40.4746 |
| GSR | 104.3230 | 80.3020 | 55.5819 | 45.3778 | 40.4913 |
| St | 102.0399 | 79.3341 | 55.8032 | 45.0603 | 40.2149 |
| DSh | 105.1456 | 81.0958 | 55.9391 | 45.8814 | 40.3181 |
| Sh | 107.7353 | 81.7724 | 56.4640 | 45.4739 | 40.4802 |
Real Data Analysis
In this section, we evaluate the performance of Gamma GLM estimation methods using real flood insurance data from three states: Florida (FL), Texas (TX), and Louisiana (LA). The dataset covers the period 2014 to 2023 and includes claims from the National Flood Insurance Program (NFIP). The data are not provided here, but they can be downloaded from the NFIP and prepared by following the preprocessing steps described in the paper.
For each year, the dataset is split into a 70% training set and a 30% test set. Models are fitted using the standard GLM (via `glm.fit2`) and several shrinkage-based GLMs (via `savvy_glm.fit2` with model classes such as SR, GSR, St, DSh, and optionally Sh) using two different link functions:

- For Gamma GLMs with the log link, the mean response is modeled as $\mu_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta})$.
- For Gamma GLMs with the sqrt link, the mean response is modeled as $\mu_i = (\mathbf{x}_i^\top \boldsymbol{\beta})^2$.
For each replication, the MSE on the test set is computed. For each shrinkage-based GLM, we calculate the ratio

$$\text{ratio} = \frac{\text{MSE}_{\text{OLS}}}{\text{MSE}_{\text{shrinkage}}}.$$

A ratio greater than one indicates that the shrinkage-based GLM achieved a lower MSE (i.e., superior performance) than the standard update. This process is repeated multiple times ($N$ replications per year, with $N$ set in the code below), and the average ratio is computed for each model and each year.
Here we provide only two cases; for a detailed account of the full data processing and evaluation procedures, please refer to the main paper.
FL dataset with log Link Function
# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)
set.seed(1234)
years <- 2014:2015
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 10
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")
Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
dimnames = list(model_names, years))
findStartingValues <- function(x, y) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(log(y + 0.1) - eta))
problem <- Problem(objective)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
calculate_mse <- function(true_values, predicted_values) {
mean((true_values - predicted_values)^2)
}
for (yr in years) {
cat("Processing year:", yr, "\n")
filename <- sprintf("FL_data_%d.csv", yr)
data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
colnames(ratio_mat) <- model_names[-1]
for (i in 1:N) {
set.seed(yr * 1000 + i)
train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
train_data <- data_year[train_index, ]
test_data <- data_year[-train_index, ]
X_train <- as.matrix(train_data[, -1])
y_train <- train_data[, 1]
X_test <- as.matrix(test_data[, -1])
y_test <- test_data[, 1]
X_train_int <- cbind(1, X_train)
X_test_int <- cbind(1, X_test)
starting_values <- findStartingValues(X_train_int, y_train)
model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
control = control_list, family = family_type)
y_pred_glm2 <- exp(X_test_int %*% model_glm2$coefficients)
mse_ols <- calculate_mse(y_test, y_pred_glm2)
for (m in model_names[-1]) {
model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m,
start = starting_values, control = control_list, family = family_type)
y_pred <- exp(X_test_int %*% model_savvy$coefficients)
mse_savvy <- calculate_mse(y_test, y_pred)
ratio_mat[i, m] <- mse_ols / mse_savvy
}
}
avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
Evaluation_results[, as.character(yr)] <- avg_ratios
}
Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Log Link)")
LA dataset with sqrt Link Function
# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)
set.seed(1234)
years <- 2014:2023
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 100
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")
Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
dimnames = list(model_names, years))
findStartingValues <- function(x, y, epsilon = 1e-6) {
beta <- Variable(ncol(x))
eta <- x %*% beta
objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
constraints <- list(eta >= epsilon)
problem <- Problem(objective, constraints)
result <- solve(problem)
starting_values <- as.numeric(result$getValue(beta))
return(starting_values)
}
calculate_mse <- function(true_values, predicted_values) {
mean((true_values - predicted_values)^2)
}
for (yr in years) {
cat("Processing year:", yr, "\n")
filename <- sprintf("LA_data_%d.csv", yr)
data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
colnames(ratio_mat) <- model_names[-1]
for (i in 1:N) {
set.seed(yr * 1000 + i)
train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
train_data <- data_year[train_index, ]
test_data <- data_year[-train_index, ]
X_train <- as.matrix(train_data[, -1])
y_train <- train_data[, 1]
X_test <- as.matrix(test_data[, -1])
y_test <- test_data[, 1]
X_train_int <- cbind(1, X_train)
X_test_int <- cbind(1, X_test)
starting_values <- findStartingValues(X_train_int, y_train)
model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
control = control_list, family = family_type)
y_pred_glm2 <- (X_test_int %*% model_glm2$coefficients)^2
mse_ols <- calculate_mse(y_test, y_pred_glm2)
for (m in model_names[-1]) {
model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m,
start = starting_values, control = control_list, family = family_type)
y_pred <- (X_test_int %*% model_savvy$coefficients)^2  # sqrt link: mean = eta^2
mse_savvy <- calculate_mse(y_test, y_pred)
ratio_mat[i, m] <- mse_ols / mse_savvy
}
}
avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
Evaluation_results[, as.character(yr)] <- avg_ratios
}
Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Sqrt Link)")