
Introduction

savvySh is an R package that implements a suite of shrinkage estimators for multivariate linear regression. The motivation for shrinkage estimation originates from Stein's paradox, which shows that it is possible to improve on the classical Ordinary Least Squares (OLS) estimator when estimating multiple parameters simultaneously. This improvement is achieved by introducing structured bias that reduces variance, yielding estimators with smaller overall Mean Squared Error (MSE).

The savvySh package provides a unified interface to four shrinkage strategies, each designed to improve prediction accuracy and estimation stability in finite samples. These methods do not require cross-validation or tuning parameter selection, with the exception of the Shrinkage Ridge Regression (SRR), for which the shrinkage intensity is chosen by minimizing an explicit MSE criterion.

The package is particularly suited for high-dimensional settings or cases where the design matrix exhibits near multicollinearity. Empirical results in both simulation and real data settings show that at least one of the shrinkage estimators consistently performs as well as or better than OLS, and in many cases leads to substantial improvements, especially when applied to generalized linear models.

For more details, please see:

Asimit, V., Cidota, M. A., Chen, Z., & Asimit, J. (2025). Slab and Shrinkage Linear Regression Estimation.

Main features: savvySh provides four classes of shrinkage estimators for linear regression:

  • Multiplicative Shrinkage: Modifies OLS estimates by applying data-driven multiplicative factors. This includes the Stein estimator (St), Diagonal Shrinkage (DSh), and the more general Shrinkage estimator (Sh) that solves a Sylvester equation.

  • Slab Regression: Adds a quadratic penalty to shrink the coefficients in specific directions. The Simple Slab Regression (SR) penalizes along a fixed direction, and the Generalized Slab Regression (GSR) allows penalties along multiple directions, typically aligned with the data structure.

  • Linear Shrinkage: Combines the OLS estimator (through the origin) with a target that assumes uncorrelated covariates. This method is designed for standardized data and avoids estimating the intercept.

  • Shrinkage Ridge Regression: Improves on Ridge Regression (RR) by shrinking toward a diagonal matrix with equal entries, balancing the sample structure with a stable target.

Inputs: The primary inputs to savvySh are similar to those of a regression function:

  • x: The design matrix (predictor matrix) of dimension $n \times p$.

  • y: The response vector of length $n$.

  • model_class: The shrinkage method to use (one of "Multiplicative", "Slab", "Linear", or "ShrinkageRR").

  • Additional method-specific parameters (e.g., include_Sh to request the Sh estimator), with sensible defaults or automatic selection if not provided.

Output: The output of savvySh is a list containing several elements:

  • call: the original function call.

  • model: The data frame of y and x used in the analysis.

  • optimal_lambda: the penalty parameter used (if applicable).

  • model_class: the shrinkage method used (e.g., "Multiplicative", "Slab", "Linear", or "ShrinkageRR").

  • coefficients: A list of estimated coefficients for each applicable estimator under the chosen model_class (a minimal usage sketch follows this list).

  • Additional method-specific diagnostics (e.g., fitted values and predicted MSE).
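
To tie the inputs and outputs together, here is a minimal usage sketch on toy data. The call signature and the accessors (coef(fit, "DSh"), fit$coefficients) mirror the examples later in this vignette; the simulated data and the chosen settings are only illustrative.

library(savvySh)

# Toy data: 100 observations, 5 predictors
set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100)
y <- rnorm(100, mean = as.vector(x %*% (1:5)), sd = 2)

# Fit the multiplicative shrinkage family (St, DSh, and optionally Sh)
fit <- savvySh(x, y, model_class = "Multiplicative", include_Sh = TRUE)

fit$model_class          # the shrinkage method used
names(fit$coefficients)  # one coefficient vector per applicable estimator
coef(fit, "DSh")         # extract the DSh estimates, as in the examples below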

This vignette provides an overview of the methods implemented in savvySh and demonstrates how to use them on simulated data and real data. We begin with a theoretical overview of each shrinkage class, then walk through simulation examples that illustrate how to apply savvySh for each method.


Theoretical Overview

The savvySh function encompasses four shrinkage strategies for linear regression. All these methods aim to improve predictive accuracy and interpretability by trading a small amount of bias for a larger reduction in variance. Below we briefly summarize the theoretical background of each method.

Consider a response vector $\mathbf{y} \in \Re^n$ and a predictor matrix $\mathbf{X} \in \Re^{n \times (p+1)}$. Let $\widehat{\boldsymbol{\beta}}^{\text{OLS}}$ be the OLS estimator of the coefficient vector, and $\Sigma = \mathbf{X}^\top \mathbf{X}$ the Gram matrix of the design.

Multiplicative Shrinkage

This class of estimators modifies the OLS solution by multiplying it with a matrix $\mathbf{D}$, yielding the form $\widehat{\boldsymbol{\beta}} = \mathbf{D} \widehat{\boldsymbol{\beta}}^{\text{OLS}}$. The goal is to choose $\mathbf{D}$ to minimize the MSE of the resulting estimator. While earlier work such as Hocking et al. (1976) focused on the case where the design matrix is in canonical form (orthonormal), our framework generalizes these results to arbitrary design matrices without requiring that assumption. Common choices for $\mathbf{D}$ include the following; a hand-computed sketch of the St and DSh estimators appears after this list.

  1. Stein Estimator (St): The St estimator shrinks all coefficients uniformly by a single factor $\widehat{a^*}$:

$$\widehat{\boldsymbol{\beta}}^{\text{St}} = \widehat{a^*}\, \widehat{\boldsymbol{\beta}}^{\text{OLS}}, \quad \text{where} \quad \widehat{a^*} = \frac{\left(\widehat{\boldsymbol{\beta}}^{\text{OLS}}\right)^\top \widehat{\boldsymbol{\beta}}^{\text{OLS}}}{\left(\widehat{\boldsymbol{\beta}}^{\text{OLS}}\right)^\top \widehat{\boldsymbol{\beta}}^{\text{OLS}} + \widehat{\text{MSE}\left(\widehat{\boldsymbol{\beta}}^{\text{OLS}}\right)}}.$$

This corresponds to $\mathbf{D} = a \mathbf{I}_{p+1}$. It reduces variance by scaling all coefficients equally.

  2. Diagonal Shrinkage Estimator (DSh): Extending the St estimator, DSh applies a separate shrinkage factor $\widehat{b_k^*}$ to each coefficient:

$$\widehat{\boldsymbol{\beta}}^{\text{DSh}} = \text{diag}\left(\widehat{\mathbf{b}^*}\right) \widehat{\boldsymbol{\beta}}^{\text{OLS}}, \quad \text{where} \quad \widehat{b_k^*} = \frac{\left(\widehat{\beta}_k^{\text{OLS}}\right)^2}{\left(\widehat{\beta}_k^{\text{OLS}}\right)^2 + \widehat{\sigma^2} \sigma_k}.$$

This corresponds to $\mathbf{D} = \text{diag}\left(\mathbf{b}\right)$, where $\sigma_k$ is the $k^{\text{th}}$ diagonal entry of $\Sigma^{-1}$ and $\widehat{\sigma^2}$ is the estimated residual variance.

  3. Shrinkage Estimator (Sh): The Sh estimator is the most general form in the multiplicative shrinkage family. It applies a full (non-diagonal) matrix $\widehat{\mathbf{C}^*}$ to the OLS estimate and requires solving a Sylvester equation:

$$\widehat{\boldsymbol{\beta}}^{\text{Sh}} = \widehat{\mathbf{C}^*} \widehat{\boldsymbol{\beta}}^{\text{OLS}}, \quad \text{where } \widehat{\mathbf{C}^*} \text{ solves the Sylvester equation} \quad \Sigma^{-1} \mathbf{C} + \mathbf{C}\, \widehat{\boldsymbol{\beta}}^{\text{OLS}} \left(\widehat{\boldsymbol{\beta}}^{\text{OLS}}\right)^\top = \widehat{\boldsymbol{\beta}}^{\text{OLS}} \left(\widehat{\boldsymbol{\beta}}^{\text{OLS}}\right)^\top.$$

This corresponds to $\mathbf{D} = \mathbf{C}$. Unlike St and DSh, the Sh estimator allows interactions across coefficients through off-diagonal entries in $\widehat{\mathbf{C}^*}$, capturing richer shrinkage structures.
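
The sketch below computes the St and DSh estimators directly from the formulas above on simulated data. It is a hand-rolled illustration rather than the package's internal code; it assumes the design matrix already contains an intercept column and estimates the total MSE of the OLS vector by $\widehat{\sigma^2}\,\mathrm{Tr}(\Sigma^{-1})$.

# Hand-computed St and DSh estimators (illustration only; savvySh does this internally)
set.seed(1)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))      # design with intercept column
beta <- c(1, 2, -2, 3, -3, 4)
y <- as.vector(X %*% beta) + rnorm(n, sd = 2)

Sigma      <- crossprod(X)                     # X'X
Sigma_inv  <- solve(Sigma)
beta_ols   <- as.vector(Sigma_inv %*% crossprod(X, y))
sigma2_hat <- sum((y - X %*% beta_ols)^2) / (n - ncol(X))

# Stein: a single global factor a*
mse_ols <- sigma2_hat * sum(diag(Sigma_inv))   # estimated MSE of the OLS vector
a_star  <- sum(beta_ols^2) / (sum(beta_ols^2) + mse_ols)
beta_St <- a_star * beta_ols

# Diagonal shrinkage: one factor per coefficient
b_star   <- beta_ols^2 / (beta_ols^2 + sigma2_hat * diag(Sigma_inv))
beta_DSh <- b_star * beta_ols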

Slab Regression

Slab regression introduces a structured quadratic penalty that shrinks the regression coefficients in specified directions. It is closely related to the Generalized LASSO estimator introduced by Tibshirani and Taylor (2011), but with key differences. Unlike the Generalized LASSO, slab regression does not aim for variable selection or sparsity, and it avoids cross-validation by selecting shrinkage levels through theoretical MSE criteria. The penalties are based on projection constraints that can be fixed (e.g., the constant direction $\mathbf{u} = \mathbf{1}$) or structured (e.g., eigenvectors of $\Sigma$). This leads to two main estimators; a short worked sketch follows the list.

  1. (Simple) Slab Regression (SR): The SR estimator applies a single penalty along a fixed direction vector $\mathbf{u} \in \Re^{p+1}$. A common choice is $\mathbf{u} = \mathbf{1}$. The estimator is given by:

$$\widehat{\boldsymbol{\beta}}^{\text{SR}}\left(\mu;\mathbf{u}\right) := \arg\min_{\boldsymbol{\beta}\in\Re^{p+1}} \left( \sum_{i=1}^n \big(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}\big)^2 + \mu \big(\mathbf{u}^\top \boldsymbol{\beta}\big)^2 \right),$$

where $\mu \geq 0$ is the penalty parameter that controls the amount of shrinkage along $\mathbf{u}$; it has a closed form and is chosen analytically to minimize the MSE under the slab constraint $\big(\boldsymbol{\beta}^\top \mathbf{u}\big)^2$.

  2. Generalized Slab Regression (GSR): The GSR estimator extends SR by allowing shrinkage in multiple directions. These directions are typically chosen as eigenvectors of $\Sigma$. The estimator is:

$$\widehat{\boldsymbol{\beta}}^{\text{GSR}}(\boldsymbol{\mu}) := \arg\min_{\boldsymbol{\beta} \in \Re^{p+1}} \left(\sum_{i=1}^n \big(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}\big)^2 + \sum_{l \in \mathcal{L}} \mu_l \big(\mathbf{u}_l^\top \boldsymbol{\beta}\big)^2 \right),$$

where each $\mu_l \geq 0$ controls the strength of shrinkage along $\mathbf{u}_l$, and the set $\mathcal{L}$ indexes the selected directions. A special case arises when the $\mathbf{u}_l$ form the standard basis vectors, recovering the classical Generalized RR of Hoerl and Kennard (1970).
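
To make the penalties concrete, the sketch below solves the SR and GSR objectives for hand-picked penalty values; minimizing the quadratic objective gives $\widehat{\boldsymbol{\beta}} = \big(\Sigma + \sum_{l} \mu_l \mathbf{u}_l \mathbf{u}_l^\top\big)^{-1}\mathbf{X}^\top\mathbf{y}$. The analytic, MSE-minimizing choices of the penalties used by savvySh are not reproduced here, so the values of mu and mus below are arbitrary placeholders.

# SR with the fixed direction u = 1 and a placeholder penalty mu
set.seed(1)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))
y <- as.vector(X %*% c(1, 2, -2, 3, -3, 4)) + rnorm(n, sd = 2)

u  <- rep(1, ncol(X))
mu <- 10                                       # placeholder; savvySh picks mu analytically
beta_SR <- solve(crossprod(X) + mu * tcrossprod(u), crossprod(X, y))

# GSR: several directions (here two eigenvectors of X'X), each with its own penalty
eig <- eigen(crossprod(X), symmetric = TRUE)
U   <- eig$vectors[, 1:2]                      # two shrinkage directions
mus <- c(10, 5)                                # placeholder penalties
beta_GSR <- solve(crossprod(X) + U %*% diag(mus) %*% t(U), crossprod(X, y))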

Linear Shrinkage

The LSh estimator combines the OLS estimator (computed through the origin) with a target estimator that assumes uncorrelated covariates ($\widetilde{\Sigma} = \mathrm{diag}(\Sigma)$). This target estimator is given by $\widehat{\boldsymbol{\beta}}^{\text{ind}} = \widetilde{\Sigma}^{-1}\mathbf{X}^\top\mathbf{y}$. The method assumes that the data are standardized (zero mean and unit variance) so that the intercept is not needed. The LSh estimator is defined as:

$$\widehat{\boldsymbol{\beta}}^{\text{ind}}(\rho) := (1-\rho)\,\widehat{\widehat{\boldsymbol{\beta}}}^{\text{OLS}} + \rho\, \widehat{\boldsymbol{\beta}}^{\text{ind}} = \left(\rho\,\widetilde{\Sigma}^{-1}\Sigma + (1-\rho)\,\mathbf{I}_p\right)\widehat{\widehat{\boldsymbol{\beta}}}^{\text{OLS}} := \Sigma(\rho)\,\widehat{\widehat{\boldsymbol{\beta}}}^{\text{OLS}},$$

where ๐›ƒฬ‚ฬ‚OLS\widehat{\widehat{\boldsymbol{\beta}}}^{OLS} is the OLS estimator without intercept, and ฯ\rho is the shrinkage intensity chosen to improve stability and reduce MSE.

Shrinkage Ridge Regression

The SRR estimator improves upon standard RR by shrinking the Gram matrix $\Sigma$ toward a diagonal matrix with equal entries. Specifically, it shrinks toward $v \mathbf{I}_{p+1}$, where $v = \frac{1}{p+1} \mathrm{Tr}(\Sigma)$ is the average variance across all covariates and the intercept. The SRR estimator is given by:

$$\widehat{\boldsymbol{\beta}}^{\text{SRR}}(\rho) = \big(\Sigma^*(\rho)\big)^{-1}\mathbf{X}^\top\mathbf{y} \quad \text{with} \quad \Sigma^*(\rho) = (1-\rho)\,\Sigma + \rho\, v\, \mathbf{I}_{p+1}.$$

Unlike the approach in Ledoit and Wolf (2004), which minimizes the MSE of the covariance matrix, SRR chooses the optimal $\rho^*$ by directly minimizing the MSE of the coefficient estimator. Although $\rho^*$ does not have a closed form, it can be selected numerically. This approach is especially useful when the design matrix has a low-rank or unstable covariance structure, as often encountered in high-dimensional or collinear settings.
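
The sketch below evaluates the SRR estimator on a small grid of $\rho$ values. The numerical minimization that savvySh performs to find $\rho^*$ is not reproduced here, so the grid is purely illustrative.

# SRR estimator for a few candidate rho values (savvySh chooses rho* numerically)
set.seed(1)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))
y <- as.vector(X %*% c(1, 2, -2, 3, -3, 4)) + rnorm(n, sd = 2)

Sigma <- crossprod(X)
v     <- sum(diag(Sigma)) / ncol(X)            # Tr(Sigma) / (p + 1)
Xty   <- crossprod(X, y)

srr_beta <- function(rho) {
  Sigma_star <- (1 - rho) * Sigma + rho * v * diag(ncol(X))
  solve(Sigma_star, Xty)
}

beta_SRR_grid <- sapply(c(0.1, 0.3, 0.5), srr_beta)   # one column per candidate rho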

Summary of Shrinkage Estimators

| Method | Shrinkage Direction | Shrinkage Form | Assumes Standardized Data? | Tuning or Optimizing Required? | Handles Multicollinearity? |
|--------|---------------------|----------------|----------------------------|--------------------------------|----------------------------|
| OLS | None | No shrinkage | No | No | No |
| RR | Toward origin | $\left(\mathbf{X}^\top\mathbf{X} + \lambda I\right)^{-1} \mathbf{X}^\top \mathbf{y}$ | No | Yes (cross-validation) | Yes |
| St | Global (scalar) | $\widehat{a^*} \cdot \widehat{\beta}^{\text{OLS}}$ | No | No | No |
| DSh | Coordinate-wise (diagonal) | $\text{diag}\big(\widehat{\mathbf{b}^*}\big) \cdot \widehat{\beta}^{\text{OLS}}$ | No | No | No |
| Sh | Matrix shrinkage (non-diagonal) | $\widehat{\mathbf{C}^*} \cdot \widehat{\beta}^{\text{OLS}}$ | No | No | No |
| SR | Fixed direction (e.g., $\mathbf{1}$) | Penalized projection | No | No | No |
| GSR | Multiple directions (eigenvectors) | Penalized projection | No | No | No |
| LSh | Toward diagonal target | Convex combination | Yes | No | No |
| SRR | Toward scaled identity | Modified Ridge | No | Yes (numerical minimization) | Yes |

Simulation Examples

This section presents a set of simulation studies that demonstrate the performance of the shrinkage estimators implemented in savvySh. We compare them against OLS using synthetic datasets. The performance metric is the $L_2$ distance between the estimated coefficients and the true parameter vector. Lower $L_2$ values indicate better recovery of the true coefficients.

We explore several generative scenarios based on the design matrix and error distribution. The structure of the design matrix includes different forms of correlation and dependence. Each simulation is followed by a table comparing $L_2$ distances and estimated coefficients.

The following data-generating processes (DGPs) are considered:

  • Multivariate Gaussian distributed covariates with Toeplitz covariance matrix - $N(\pmb{\mu}, \pmb{\Psi}(\rho))$, where $\pmb{\Psi}_{st}(\rho) = \rho^{|s-t|}$ for all $1 \leq s, t \leq p$. Here, $\pmb{\mu} = (\mu_1, \mu_2, \ldots, \mu_p)^\top$ is the mean vector, $\pmb{\Psi}(\rho)$ is the covariance matrix, and $\rho$ represents the correlation coefficient that controls the dependence between covariates. The response is generated using normal noise.

  • Multivariate Gaussian distributed covariates with Toeplitz covariance matrix and Student's $t$ distributed noise - $N(\pmb{\mu}, \pmb{\Psi}(\rho))$, where $\pmb{\Psi}_{st}(\rho) = \rho^{|s-t|}$ for all $1 \leq s, t \leq p$. Here, $\pmb{\mu} = (\mu_1, \mu_2, \ldots, \mu_p)^\top$ is the mean vector, $\pmb{\Psi}(\rho)$ is the covariance matrix, and $\rho$ represents the correlation coefficient that controls the dependence between covariates. The response includes heavier-tailed variation.

  • Multivariate Gaussian Copula with Binomial marginal distributed covariates and Toeplitz covariance matrix - $\mathbf{Z}_i \sim N(\mathbf{0}, \Psi(\rho))$ with $X_{ik} = F^{-1}(\Phi(Z_{ik}))$ for $1 \leq k \leq p$, where $\Phi$ is the cumulative distribution function (CDF) of $N(0,1)$ and $F^{-1}$ is the inverse CDF of the binomial distribution.

  • Latent Space Features - Covariates are generated by combining a low-rank structure with random variation. Specifically, we simulate $\mathbf{X} = \mathbf{A} \mathbf{Z} + \mathbf{E}$, where $\mathbf{A}$ is an $n \times f$ matrix of factor loadings with entries drawn independently from a standard normal distribution $N(0,1)$, and $\mathbf{Z}$ is an $f \times p$ matrix of latent factors, also with entries drawn independently from $N(0,1)$. The term $\mathbf{E}$ is an $n \times p$ matrix of independent Gaussian noise with variance $10^{-6}$, i.e., $N(0, 10^{-6})$.

In all four settings, the true regression coefficient vector alternates in sign with increasing magnitude, making it possible to assess the shrinkage effect across varying coefficient scales.

Note: Due to differences in underlying libraries and random number generation implementations (e.g., BLAS/LAPACK and RNG behavior), the data generated by functions like mvrnorm may differ slightly between Windows and macOS/Linux, even with the same seed.

Multiplicative Shrinkage and Slab Regression

This section shows how savvySh implements multiplicative shrinkage (St, DSh, and Sh) and slab regression (SR and GSR). Each method is compared to OLS.

Multivariate Gaussian Distribution

This example corresponds to the Multivariate Gaussian distributed covariates with Toeplitz covariance matrix setup described above. We simulate a design matrix from a multivariate normal distribution with a Toeplitz covariance matrix. The response is generated with Gaussian noise with variance $\sigma^2 = 25$.

# Load packages
library(savvySh)
library(MASS)
library(knitr)

# Parameters
set.seed(123)
n_val <- 1000
p_val <- 10
rho_val <- 0.75
sigma_val <- 5
mu_val <- 0

# Correlation matrix
sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

# True beta
theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

# Simulate data
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
y <- rnorm(n_val, mean = X_intercept %*% beta_true, sd = sigma_val)

# Fit models
ols_fit <- lm(y ~ X)
beta_ols <- coef(ols_fit)

multi_results <- savvySh(X, y, model_class = "Multiplicative", include_Sh = TRUE)
beta_St  <- coef(multi_results, "St")
beta_DSh <- coef(multi_results, "DSh")
beta_Sh  <- coef(multi_results, "Sh")

slab_results <- savvySh(X, y, model_class = "Slab")
beta_SR  <- coef(slab_results, "SR")
beta_GSR <- coef(slab_results, "GSR")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.

# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "St", "DSh", "Sh", "SR", "GSR"),
  L2_Distance = c(
    sqrt(sum((beta_ols - beta_true)^2)),
    sqrt(sum((beta_St  - beta_true)^2)),
    sqrt(sum((beta_DSh - beta_true)^2)),
    sqrt(sum((beta_Sh  - beta_true)^2)),
    sqrt(sum((beta_SR  - beta_true)^2)),
    sqrt(sum((beta_GSR - beta_true)^2))
  )
)

kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 0.9791
St 0.9292
DSh 0.9261
Sh 0.9787
SR 1.0092
GSR 0.9491

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_ols, 3),
  St = round(beta_St, 3),
  DSh = round(beta_DSh, 3),
  Sh = round(beta_Sh, 3),
  SR = round(beta_SR, 3),
  GSR = round(beta_GSR, 3),
  True = round(beta_true, 3)
)

kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS St DSh Sh SR GSR True
(Intercept) 1.135 1.129 1.113 1.135 0.751 1.150 1
X1 -1.148 -1.142 -1.103 -1.148 -1.382 -1.135 -1
X2 2.348 2.336 2.311 2.348 2.262 2.310 2
X3 -2.079 -2.068 -2.035 -2.079 -2.053 -2.061 -2
X4 3.120 3.103 3.090 3.120 3.014 3.144 3
X5 -2.923 -2.908 -2.890 -2.923 -2.958 -2.919 -3
X6 4.286 4.263 4.264 4.285 4.195 4.288 4
X7 -4.759 -4.734 -4.741 -4.759 -4.791 -4.763 -4
X8 5.196 5.168 5.178 5.195 5.171 5.201 5
X9 -5.120 -5.093 -5.103 -5.120 -5.204 -5.074 -5
X10 6.245 6.212 6.235 6.245 6.045 6.146 6

Student's t Distribution

This example corresponds to the Multivariate Gaussian distributed covariates with Toeplitz covariance matrix and Student's $t$ distributed noise setup. We repeat the same design as above, but instead of Gaussian noise, we add scaled $t$-distributed noise with degrees of freedom $\nu = \frac{50}{24}$.

# Load packages
library(savvySh)
library(MASS)
library(knitr)

# Parameters
set.seed(123)
n_val <- 1000
p_val <- 10
rho_val <- 0.75
df_val <- 50 / 24
mu_val <- 0

# Correlation matrix
sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

# True beta
theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

# Simulate data
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_intercept <- cbind(1, X)
beta_true <- theta_func(p_val + 1)
y <- as.vector(X_intercept %*% beta_true) + rt(n = n_val, df = df_val)

# Fit models
ols_fit <- lm(y ~ X)
beta_ols <- coef(ols_fit)

multi_results <- savvySh(X, y, model_class = "Multiplicative", include_Sh = TRUE)
beta_St  <- coef(multi_results, "St")
beta_DSh <- coef(multi_results, "DSh")
beta_Sh  <- coef(multi_results, "Sh")

slab_results <- savvySh(X, y, model_class = "Slab")
beta_SR  <- coef(slab_results, "SR")
beta_GSR <- coef(slab_results, "GSR")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.

# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "St", "DSh", "Sh", "SR", "GSR"),
  L2_Distance = c(
    sqrt(sum((beta_ols - beta_true)^2)),
    sqrt(sum((beta_St  - beta_true)^2)),
    sqrt(sum((beta_DSh - beta_true)^2)),
    sqrt(sum((beta_Sh  - beta_true)^2)),
    sqrt(sum((beta_SR  - beta_true)^2)),
    sqrt(sum((beta_GSR - beta_true)^2))
  )
)

kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 5.8495
St 5.8271
DSh 5.2781
Sh 5.8494
SR 5.9631
GSR 5.1581

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_ols, 3),
  St = round(beta_St, 3),
  DSh = round(beta_DSh, 3),
  Sh = round(beta_Sh, 3),
  SR = round(beta_SR, 3),
  GSR = round(beta_GSR, 3),
  True = round(beta_true, 3)
)

kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS St DSh Sh SR GSR True
(Intercept) -0.345 -0.252 -0.023 -0.345 -0.623 -0.368 1
X1 -0.115 -0.084 0.000 -0.115 -0.285 0.488 -1
X2 -0.811 -0.592 -0.082 -0.811 -0.873 -0.426 2
X3 1.835 1.339 0.653 1.835 1.853 0.068 -2
X4 3.308 2.413 2.118 3.308 3.232 3.587 3
X5 -5.577 -4.069 -4.621 -5.577 -5.602 -4.396 -3
X6 3.148 2.297 1.955 3.148 3.082 2.635 4
X7 -2.880 -2.101 -1.677 -2.879 -2.903 -1.918 -4
X8 5.121 3.736 4.178 5.121 5.103 3.825 5
X9 -5.497 -4.011 -4.614 -5.497 -5.558 -4.731 -5
X10 5.805 4.235 5.207 5.805 5.660 4.447 6

Gaussian Copula with Binomial Margins

This example corresponds to the Multivariate Gaussian Copula with Binomial marginal distributed covariates and Toeplitz covariance matrix setup. The covariates are transformed from a Gaussian copula to Binomial marginals using the inverse CDF transform. This simulates discrete predictor variables.

# Load packages
library(savvySh)
library(MASS)
library(knitr)

# Parameters
set.seed(123)
n_val <- 1000
p_val <- 10
rho_val <- 0.75
q_0 <- 0.01
sigma_val <- 5

# Correlation matrix
sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

sigma.temp <- sigma.rho(rho_val, p_val) 
Z <- mvrnorm(n_val, mu = rep(0, p_val), Sigma = sigma.temp) 
X <- apply(Z, 2, function(z_col) qbinom(pnorm(z_col), size = 2, prob = q_0))  
X_intercept <- cbind(1, X)
beta_true <- c(0, runif(p_val, 0.01, 0.3))
y <- rnorm(n_val, mean = as.vector(X_intercept %*% beta_true), sd = sigma_val)

# Fit models
ols_fit <- lm(y ~ X)
beta_ols <- coef(ols_fit)

multi_results <- savvySh(X, y, model_class = "Multiplicative", include_Sh = TRUE)
beta_St  <- coef(multi_results, "St")
beta_DSh <- coef(multi_results, "DSh")
beta_Sh  <- coef(multi_results, "Sh")

slab_results <- savvySh(X, y, model_class = "Slab")
beta_SR  <- coef(slab_results, "SR")
beta_GSR <- coef(slab_results, "GSR")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.

# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "St", "DSh", "Sh", "SR", "GSR"),
  L2_Distance = c(
    sqrt(sum((beta_ols - beta_true)^2)),
    sqrt(sum((beta_St  - beta_true)^2)),
    sqrt(sum((beta_DSh - beta_true)^2)),
    sqrt(sum((beta_Sh  - beta_true)^2)),
    sqrt(sum((beta_SR  - beta_true)^2)),
    sqrt(sum((beta_GSR - beta_true)^2))
  )
)

kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 3.0967
St 1.0756
DSh 1.8059
Sh 3.0724
SR 3.1182
GSR 1.6176

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_ols, 3),
  St = round(beta_St, 3),
  DSh = round(beta_DSh, 3),
  Sh = round(beta_Sh, 3),
  SR = round(beta_SR, 3),
  GSR = round(beta_GSR, 3),
  True = round(beta_true, 3)
)

kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS St DSh Sh SR GSR True
(Intercept) 0.122 0.042 0.043 0.123 0.133 0.048 0.000
X1 0.442 0.153 0.045 0.439 0.420 0.060 0.297
X2 -0.681 -0.235 -0.112 -0.674 -0.842 -0.352 0.098
X3 0.549 0.190 0.070 0.545 0.459 0.081 0.136
X4 0.302 0.104 0.010 0.307 0.225 0.462 0.057
X5 1.634 0.564 0.979 1.622 1.542 1.167 0.249
X6 -0.049 -0.017 0.000 -0.048 -0.153 0.030 0.070
X7 -0.278 -0.096 -0.014 -0.277 -0.308 -0.194 0.093
X8 -1.633 -0.564 -1.036 -1.623 -1.668 -0.784 0.168
X9 1.858 0.642 1.223 1.841 1.797 0.759 0.181
X10 -0.628 -0.217 -0.105 -0.620 -0.761 -0.007 0.138

Linear Shrinkage

This section describes the LSh estimator, which shrinks the OLS coefficients (through the origin) toward a target based on the diagonal of $\Sigma$. The method assumes standardized data.

Multivariate Gaussian Distribution

This example corresponds to the Multivariate Gaussian distributed covariates with Toeplitz covariance matrix setup. We simulate from a Gaussian design with Toeplitz correlation, using additive Gaussian noise with variance $\sigma^2 = 25$. Centering is applied before estimation.

# Load packages
library(savvySh)
library(MASS)
library(knitr)

# Parameters
set.seed(1)
n_val <- 1000
p_val <- 10
rho_val <- 0.75
sigma_val <- 5
mu_val <- 0

# Correlation matrix
sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

# True beta
theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

# Simulate data
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_centred <- scale(X, center = TRUE, scale = FALSE)
beta_true <- theta_func(p_val)
y <- rnorm(n_val, mean = X_centred %*% beta_true, sd = sigma_val)
y_centred <- scale(y, center = TRUE, scale = FALSE)

# Fit models
ols_fit <- lm(y_centred ~ X_centred-1)
beta_ols <- coef(ols_fit)

linear_results <- savvySh(X_centred, y_centred, model_class = "Linear")
beta_LSh <- coef(linear_results, "LSh")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.


# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "LSh"),
  L2_Distance = c(
    sqrt(sum((beta_ols - beta_true)^2)),
    sqrt(sum((beta_LSh  - beta_true)^2))
  )
)

kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 0.7403
LSh 0.7274

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_ols, 3),
  LSh = round(beta_LSh, 3),
  True = round(beta_true, 3)
)

kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS LSh True
X_centred1 1.238 1.238 1
X_centred2 -1.518 -1.503 -1
X_centred3 2.340 2.334 2
X_centred4 -2.089 -2.072 -2
X_centred5 2.867 2.855 3
X_centred6 -3.075 -3.053 -3
X_centred7 4.204 4.183 4
X_centred8 -4.059 -4.037 -4
X_centred9 4.897 4.867 5
X_centred10 -4.858 -4.839 -5

Student's t Distribution

This example corresponds to the Multivariate Gaussian distributed covariates with Toeplitz covariance matrix and Student's $t$ distributed noise setup. We repeat the setup from above but replace Gaussian noise with $t$-distributed noise with degrees of freedom $\nu = \frac{50}{24}$. Centering is applied before estimation.

# Load packages
library(savvySh)
library(MASS)
library(knitr)

# Parameters
set.seed(123)
n_val <- 1000
p_val <- 10
df_val <- 50 / 24
sigma_val <- 5
mu_val <- 0

# Correlation matrix
sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

# True beta
theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

# Simulate data
Sigma <- sigma.rho(rho_val, p_val)
X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
X_centred <- scale(X, center = TRUE, scale = FALSE)
beta_true <- theta_func(p_val)
y <- as.vector(X_centred %*% beta_true) + rt(n = n_val, df = df_val)
y_centred <- scale(y, center = TRUE, scale = FALSE)

# Fit models
ols_fit <- lm(y_centred ~ X_centred-1)
beta_ols <- coef(ols_fit)

linear_results <- savvySh(X_centred, y_centred, model_class = "Linear")
beta_LSh <- coef(linear_results, "LSh")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.

# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "LSh"),
  L2_Distance = c(
    sqrt(sum((beta_ols - beta_true)^2)),
    sqrt(sum((beta_LSh  - beta_true)^2))
  )
)

kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 5.6927
LSh 4.8586

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_ols, 3),
  LSh = round(beta_LSh, 3),
  True = round(beta_true, 3)
)

kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS LSh True
X_centred1 1.885 1.698 1
X_centred2 -3.811 -2.715 -1
X_centred3 5.835 4.813 2
X_centred4 -1.692 -1.179 -2
X_centred5 0.423 0.297 3
X_centred6 -3.852 -3.078 -3
X_centred7 5.120 3.907 4
X_centred8 -3.879 -3.122 -4
X_centred9 4.503 3.199 5
X_centred10 -5.195 -4.492 -5

Shrinkage Ridge Regression

The SRR estimator improves upon RR by replacing $\Sigma$ with a convex combination of $\Sigma$ and a scaled identity matrix, chosen to minimize the MSE of the coefficients.

Latent Space Features

This example corresponds to the Latent Space Features setup. Here, covariates are generated using a latent factor model with a small amount of additive Gaussian noise. This simulates a low-rank structure in the design matrix. We compare ridge regression estimates using glmnet against the SRR from savvySh.

# Load packages
library(savvySh)
library(MASS)
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-8
library(knitr)

# Parameters
set.seed(123)
n_val <- 1000
p_val <- 10
f_val <- 5
sigma_val <- 5

# True beta
theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

A <- matrix(rnorm(n_val * f_val), nrow = n_val)  
Z <- matrix(rnorm(f_val * p_val), nrow = f_val) 
X <- A %*% Z  # n x p matrix
noise <- matrix(rnorm(n_val * p_val, sd = sqrt(10^(-6))), nrow = n_val)
X_noisy <- X + noise
X_intercept <- cbind(rep(1, n_val), X_noisy)
beta_true <- theta_func(p_val + 1)
y <- rnorm(n_val,mean=as.vector(X_intercept%*%beta_true),sd=sigma_val)

# Fit models
OLS_results <- lm(y~X_noisy)
beta_OLS <- coef(OLS_results)

glmnet_fit <- cv.glmnet(X_noisy, y, alpha = 0)
lambda_min_RR_glmnet <- glmnet_fit$lambda.min
beta_RR <- as.vector(coef(glmnet_fit, s = "lambda.min"))

SRR_results <- savvySh(X_noisy, y, model_class = "ShrinkageRR")
beta_SRR <- coef(SRR_results, "SRR")

The tables below report the results of each estimator: the first table shows the $L_2$ distance between the estimated coefficient vector and the ground truth, while the second table displays the actual estimated coefficients (rounded) for each method alongside the true values.

# L2 comparison
l2_table <- data.frame(
  Method = c("OLS", "RR", "SRR"),
  L2_Distance = c(
     sqrt(sum((beta_OLS - beta_true)^2)),
    sqrt(sum((beta_RR - beta_true)^2)),
    sqrt(sum((beta_SRR  - beta_true)^2))
  )
)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients")
L2 Distance Between Estimated and True Coefficients
Method L2_Distance
OLS 313.7002
RR 13.8878
SRR 9.0543

# Coefficient comparison table
coef_table <- data.frame(
  OLS = round(beta_OLS, 3),
  RR = round(beta_RR, 3),
  SRR = round(beta_SRR, 3),
  True = round(beta_true, 3)
)
kable(coef_table, caption = "Estimated Coefficients by Method (rounded)")
Estimated Coefficients by Method (rounded)
OLS RR SRR True
(Intercept) 1.010 1.011 1.007 1
X_noisy1 45.350 3.057 2.838 -1
X_noisy2 -4.960 -8.059 -2.278 2
X_noisy3 147.454 -0.907 -0.780 -2
X_noisy4 42.069 2.850 4.477 3
X_noisy5 -63.730 0.541 0.154 -3
X_noisy6 -186.510 1.760 1.627 4
X_noisy7 118.817 0.063 -0.727 -4
X_noisy8 -17.079 2.482 3.598 5
X_noisy9 -132.102 -0.035 -2.048 -5
X_noisy10 31.292 3.009 3.117 6

Real Data Analysis

This section provides example code and references for applying the shrinkage estimators implemented in savvySh to real-world datasets. These applications demonstrate how shrinkage methods can be used in areas such as generalized linear models, portfolio investment, and fine-mapping in genetics. We do not run the code or present specific results here; instead, we offer clean, reproducible examples that users can adapt. Full details and preprocessing steps can be found in the relevant repositories or the main paper.

  • The cybersickness dataset is used to illustrate shrinkage GLMs with a Poisson model. The main shrinkage GLM fitting function is part of the related savvyGLM package, available on GitHub (Ziwei-ChenChen/savvyGLM).

  • The returns_441 dataset contains portfolio returns and is used for demonstrating shrinkage estimation in financial asset allocation.

  • The Statistical Fine-Mapping Application demonstrates the use of shrinkage-based estimators in a genetic context. We do not provide detailed descriptions here, as the full R code and analysis, including data preparation and model fitting, are available in the flashfm-savvySh repository.

Cybersickness data

This example demonstrates how to apply Poisson GLMs and various shrinkage estimators to a dataset related to cybersickness severity scores. We use the cybersickness_10lags dataset, included in the package under data/cybersickness_10lags.rda. The response variable is ordinal and grouped into four categories. The goal is to predict this ordinal outcome using lagged predictors. Here we show how to preprocess, fit models, and evaluate prediction performance using metrics such as MSE, RMSE, and MAE.

remotes::install_github("Ziwei-ChenChen/savvyGLM")
library(savvyGLM)
library(savvySh)
library(MASS)
library(glm2)     # assumed source of glm.fit2() used below
library(Metrics)  # assumed source of mse(), rmse(), and mae() used below

standardize_features<-function(dataset){  
  dataset[2:(ncol(dataset)-1)] <- as.data.frame(scale(dataset[2:(ncol(dataset)-1)])) 
  return(dataset)
}

set_classes<-function(data){  
  data[,ncol(data)]<-replace(data[,ncol(data)], data[,ncol(data)]<1, 0) 
  data[,ncol(data)]<-replace(data[,ncol(data)], data[,ncol(data)] %in% c(1,2,2.5,3,3.5), 1) 
  data[,ncol(data)]<-replace(data[,ncol(data)], data[,ncol(data)] %in% c(4,5,6), 2) 
  data[,ncol(data)]<-replace(data[,ncol(data)], data[,ncol(data)]>=7, 3) 
  return(data)
}
fit_and_return_coefficients <- function(x, y) {
  
  control_list <- list(maxit = 200, epsilon = 1e-6, trace = TRUE)
  family_type <- poisson(link = "log")
  
  # Fitting models
  opt_glm2_OLS <- glm.fit2(x, y, family = family_type, control = control_list)
  opt_glm2_SR <- savvy_glm.fit2(x, y, model_class = "SR", family = family_type, control = control_list)
  opt_glm2_GSR <- savvy_glm.fit2(x, y, model_class = "GSR", family = family_type, control = control_list)
  opt_glm2_St <- savvy_glm.fit2(x, y, model_class = "St", family = family_type, control = control_list)
  opt_glm2_DSh <- savvy_glm.fit2(x, y, model_class = "DSh", family = family_type, control = control_list)
  opt_glm2_Sh <- savvy_glm.fit2(x, y, model_class = "Sh", family = family_type, control = control_list)
  
  return(list(
    glm2_OLS_result = opt_glm2_OLS$coefficients,
    glm2_SR_result = opt_glm2_SR$coefficients,
    glm2_GSR_result = opt_glm2_GSR$coefficients,
    glm2_St_result = opt_glm2_St$coefficients,
    glm2_DSh_result = opt_glm2_DSh$coefficients,
    glm2_Sh_result = opt_glm2_Sh$coefficients
  ))
}

test_model <- function(glm_coefficients, data_X, data_Y) {
  upper_limit <- 3  # =3 for 4 classes; =9 for 10 classes
  
  ### Model 1 ---> OLS ###
  predicted_glm2_OLS <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_OLS_result)))
  predicted_glm2_OLS <- ifelse(predicted_glm2_OLS <= upper_limit, predicted_glm2_OLS, upper_limit)
  
  ### Model 2 ---> SR ###
  predicted_glm2_SR <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_SR_result)))
  predicted_glm2_SR <- ifelse(predicted_glm2_SR <= upper_limit, predicted_glm2_SR, upper_limit)
  
  ### Model 3 ---> GSR ###
  predicted_glm2_GSR <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_GSR_result)))
  predicted_glm2_GSR <- ifelse(predicted_glm2_GSR <= upper_limit, predicted_glm2_GSR, upper_limit)
  
  ### Model 4 ---> Stein ###
  predicted_glm2_St <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_St_result)))
  predicted_glm2_St <- ifelse(predicted_glm2_St <= upper_limit, predicted_glm2_St, upper_limit)
  
  ### Model 5 ---> Diagonal Shrinkage ###
  predicted_glm2_DSh <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_DSh_result)))
  predicted_glm2_DSh <- ifelse(predicted_glm2_DSh <= upper_limit, predicted_glm2_DSh, upper_limit)
  
  ### Model 6 ---> Sh ###
  predicted_glm2_Sh <- floor(exp(data_X %*% as.matrix(glm_coefficients$glm2_Sh_result)))
  predicted_glm2_Sh <- ifelse(predicted_glm2_Sh <= upper_limit, predicted_glm2_Sh, upper_limit)
  
  print(max(predicted_glm2_OLS))
  print(max(predicted_glm2_SR))
  
  r_OLS <- c(mse(data_Y, predicted_glm2_OLS), rmse(data_Y, predicted_glm2_OLS), mae(data_Y, predicted_glm2_OLS))
  r_SR <- c(mse(data_Y, predicted_glm2_SR), rmse(data_Y, predicted_glm2_SR), mae(data_Y, predicted_glm2_SR))
  r_GSR <- c(mse(data_Y, predicted_glm2_GSR), rmse(data_Y, predicted_glm2_GSR), mae(data_Y, predicted_glm2_GSR))
  r_St <- c(mse(data_Y, predicted_glm2_St), rmse(data_Y, predicted_glm2_St), mae(data_Y, predicted_glm2_St))
  r_DSh <- c(mse(data_Y, predicted_glm2_DSh), rmse(data_Y, predicted_glm2_DSh), mae(data_Y, predicted_glm2_DSh))
  r_Sh <- c(mse(data_Y, predicted_glm2_Sh), rmse(data_Y, predicted_glm2_Sh), mae(data_Y, predicted_glm2_Sh))
  
  return(list(
    results_OLS = r_OLS,
    results_SR = r_SR,
    results_GSR = r_GSR,
    results_St = r_St,
    results_DSh = r_DSh,
    results_Sh = r_Sh
  ))
}
out_of_sample_performance <- function(percentage = 0.3, no_trials = 50, filein = "data_for_regression10.csv", fileout = "results10.csv") {
  
  data <- read.csv(filein)
  aggregated_results <- rep(0, 6 * 3)   # 6 models x 3 metrics (MSE, RMSE, MAE)
  
  data<-set_classes(data)   # for 4 classes
  #data[,ncol(data)]<-floor(data[,ncol(data)])  # for 10 classes -- the scores 2.5 and 3.5 are floored
  
  data<-standardize_features(data)
  
  for (r in 1:no_trials){
    #for train-test split with percentage e.g. 70-30
    bound <- floor(nrow(data)*(1-percentage)) #define % of training
    data <- data[sample(nrow(data)), ]           #sample rows 
    train <- data[1:bound, ]            
    test <- data[(bound+1):nrow(data), ]
    
    #training
    X.tilde <- train[,-ncol(train)]
    X <-  X.tilde[,-1]
    train_X <- as.matrix(train[,-ncol(train)])
    train_Y <-  train[,ncol(train)]
    glm_coefficients<-fit_and_return_coefficients(train_X, train_Y)
    
    #test
    test_X <- as.matrix(test[,-ncol(test)])
    test_Y <-  as.matrix(test[,ncol(test)])
    results<-test_model(glm_coefficients, test_X, test_Y)
    aggregated_results <- aggregated_results + unlist(results)
  }
  
  # Average results over trials
  aggregated_results <- aggregated_results / no_trials
  df <- data.frame(matrix(aggregated_results, nrow = length(results), byrow = TRUE))
  colnames(df) <- c("MSE", "RMSE", "MAE")
  rownames(df) <- c("GLM2", "SR", "GSR", "St", "DSh", "Sh")
  write.csv(df, fileout)
}

input_file_path <- "data_for_regression10.csv"
output_file_path <- "results10.csv"
run_performance_test <- out_of_sample_performance(
  percentage = 0.3,        
  no_trials = 50,            
  filein = input_file_path,  
  fileout = output_file_path 
)

Portfolio Investment

This second example applies shrinkage methods to asset return data for constructing optimal portfolios. We use the returns_441 dataset, included in the package under data/returns_441.rda. The dataset consists of daily returns for 441 stocks including the index, with Date formatted as YYYYMMDD. For each rolling window, we:

  • Fit multiple regression-based models (OLS, RR, POET_99%, St, DSh, Sh, SR, GSR, SRR).

  • Extract the estimated coefficients and compute portfolio weights.

  • Simulate out-of-sample portfolio performance.

  • Compute rolling statistics: expected annual returns, volatility, and Sharpe ratios.

library(savvySh)
library(MASS)
library(glmnet)
library(PerformanceAnalytics)
library(lubridate)
library(quadprog)
library(xts)
library(POET)

data <- returns_441
data$Date <- as.Date(as.character(data$Date), format = "%Y%m%d")
colnames(data)[2:442] <- paste0("Company", 1:441)

training_size <- 5 * 252  
testing_size  <- 3 * 21  
step_size <- 3 * 21   
n_total <- nrow(data)
max_windows <- floor((n_total - training_size - testing_size) / step_size) + 1
cat("Total rows:", n_total, "\n")
cat("Max windows:", max_windows, "\n")

get_full_weights <- function(est_vector) {
  w <- est_vector[-1]      
  w_last <- 1 - sum(w)    
  return(c(w, w_last))
}
poet_est_99 <- function(x, y) {
  x <- as.matrix(x)
  y <- as.numeric(y)
  n <- nrow(x)
  p <- ncol(x)
  
  x_mean <- colMeans(x)
  y_mean <- mean(y)
  x_c <- x - matrix(rep(x_mean, each = n), n, p)
  y_c <- y - y_mean
  Y <- t(x_c)
  
  # Choose K to explain 99% variance
  eigvals <- eigen(cov(x_c), symmetric = TRUE, only.values = TRUE)$values
  eigvals_sorted <- sort(eigvals, decreasing = TRUE)
  cumsum_vals <- cumsum(eigvals_sorted) / sum(eigvals_sorted)
  K <- which(cumsum_vals >= 0.99)[1]
  poet_result <- POET::POET(Y, K = K)
  Sigma_hat <- poet_result$SigmaY
  
  xcyc <- crossprod(x_c, y_c) / n
  beta_1p <- as.numeric(solve(Sigma_hat, xcyc))
  beta_0 <- y_mean - sum(x_mean * beta_1p)
  beta_full <- c(beta_0, beta_1p)
  
  return(as.vector(beta_full))
}
rolling_annual_expected_returns <- data.frame()
rolling_annual_sharpe_ratios <- data.frame()
rolling_annual_volatilities <- data.frame()

for (window_index in seq_len(max_windows)) {
  start_index <- 1 + (window_index - 1) * step_size
  train_start <- start_index
  train_end <- start_index + training_size - 1
  test_start <- train_end + 1
  test_end <- train_end + testing_size

  train_data <- data[train_start:train_end, ]
  test_data <- data[test_start:test_end, ]
  train_returns <- as.matrix(train_data[, -1])  
  test_returns <- as.matrix(test_data[, -1])  
  
  # Create X_train and Y_train
  # Y_train is last column (Company441)
  # X_train is (Y_train - first 440 columns)
  Y_train <- train_returns[, 441]
  X_train <- matrix(Y_train, nrow = nrow(train_returns), ncol = 440) - train_returns[, 1:440]
  
  # Fit all estimators
  est_results <- list(
    OLS = OLS_est(X_train, Y_train)$est,
    RR = RR_est(X_train, Y_train)$est,
    POET_99 = poet_est_99(X_train, Y_train),
    St = St_ost(X_train, Y_train),
    DSh = DSh_ost(X_train, Y_train),
    Sh = Sh_ost(X_train, Y_train), 
    SR = SR_ost(X_train, Y_train),
    GSR = GSR_ost(X_train, Y_train),
    SRR = SRR_shrinkage_ost(X_train, Y_train)
  )
  weights_list <- lapply(est_results, get_full_weights)
  names(weights_list) <- names(est_results)
  
  test_dates <- as.Date(test_data$Date)
  test_returns_xts <- xts(test_returns, order.by = test_dates)
  daily_returns_list <- lapply(weights_list, function(w) {
    rp <- Return.portfolio(R = test_returns_xts, weights = w)
    return(as.numeric(rp)) 
  })
  daily_values_list <- lapply(daily_returns_list, function(r) {
    cum_val <- cumprod(1 + r)
    return(cum_val)
  })
  
  model_names <- names(daily_returns_list)
  n_test <- length(test_start:test_end)
  daily_values_mat <- matrix(0, nrow = length(model_names), ncol = n_test)
  daily_returns_mat <- matrix(0, nrow = length(model_names), ncol = n_test)
  rownames(daily_values_mat) <- model_names
  rownames(daily_returns_mat) <- model_names
  for (i in seq_along(model_names)) {
    daily_values_mat[i, ]  <- daily_values_list[[i]]
    daily_returns_mat[i, ] <- daily_returns_list[[i]]
  }
  
  returns_xts_mat <- xts(t(daily_returns_mat), order.by = test_dates)
  annual_returns <- as.numeric(Return.annualized(R = returns_xts_mat, scale = 252))
  names(annual_returns) <- colnames(returns_xts_mat)
  annual_vols <- as.numeric(StdDev.annualized(x = returns_xts_mat, scale = 252))
  names(annual_vols) <- colnames(returns_xts_mat)
  annual_sharp <- as.numeric(SharpeRatio.annualized(R = returns_xts_mat, scale = 252))
  names(annual_sharp) <- colnames(returns_xts_mat)
  
  window_result_returns <- as.data.frame(t(annual_returns))
  window_result_returns$Window <- window_index
  rolling_annual_expected_returns <- rbind(rolling_annual_expected_returns, window_result_returns)
  
  window_result_sharpe <- as.data.frame(t(annual_sharp))
  window_result_sharpe$Window <- window_index
  rolling_annual_sharpe_ratios <- rbind(rolling_annual_sharpe_ratios, window_result_sharpe)
  
  window_result_vols <- as.data.frame(t(annual_vols))
  window_result_vols$Window <- window_index
  rolling_annual_volatilities <- rbind(rolling_annual_volatilities, window_result_vols)
  
  cat("Completed window", window_index,
      ": Training rows [", train_start, "to", train_end, 
      "] (Dates:", format(train_data$Date[1], "%Y-%m-%d"), 
      "to", format(train_data$Date[nrow(train_data)], "%Y-%m-%d"), 
      "), Testing rows [", test_start, "to", test_end, 
      "] (Dates:", format(test_data$Date[1], "%Y-%m-%d"), 
      "to", format(test_data$Date[nrow(test_data)], "%Y-%m-%d"), ")\n")
}