Cross-Validation for Parity Regression Model Estimation

Performs k-fold cross-validation for Parity Regression (PR) models to select optimal tuning parameters. The underlying PR methodology distributes the total prediction error evenly across all parameters, ensuring stability in the presence of high multicollinearity and substantial noise (such as time series data with structural changes and evolving trends). This function supports both Budget-based and Target-based parameterizations and evaluates models across a variety of loss metrics.

Usage

cv.savvyPR(
  x,
  y,
  method = c("budget", "target"),
  vals = NULL,
  nval = 100,
  lambda_vals = NULL,
  nlambda = 100,
  folds = 10,
  model_type = c("PR3", "PR1", "PR2"),
  measure_type = c("mse", "mae", "rmse", "mape"),
  foldid = FALSE,
  use_feature_selection = FALSE,
  standardize = FALSE,
  intercept = TRUE,
  exclude = NULL
)

Arguments

x: A matrix of predictors with rows as observations and columns as variables. Must not contain NA values, and should not include an intercept column of ones.
y: A numeric vector of the response variable, should have the same number of observations as x. Must not contain NA values.
method: Character string specifying the parameterization method to use: "budget" (default) or "target".
vals: Optional; a numeric vector of values for tuning the PR model (acts as c for budget, or t for target). If NULL, a default sequence is generated based on the selected method. Must contain at least two values.
nval: Numeric value specifying the number of tuning values to try in the optimization process if vals=NULL. Defaults to 100.
lambda_vals: Optional; a numeric vector of lambda values used for regularization in the "PR2" and "PR3" model types. If NULL and model_type is "PR2" or "PR3", a default sequence is used. Must contain at least two values.
nlambda: Numeric value specifying the number of lambda values to try in the optimization process if lambda_vals=NULL. Defaults to 100.
folds: The number of folds to be used in the cross-validation, default is 10. Must be an integer >= 3.
model_type: Character string specifying the type of model to fit. Defaults to "PR3". Can be one of "PR3", "PR1", or "PR2". See details for further clarification.
measure_type: Character vector specifying the measure to use for model evaluation. Defaults to "mse". Supported types include "mse", "mae", "rmse", and "mape".
foldid: Logical indicating whether to return fold assignments. Defaults to FALSE.
use_feature_selection: Logical indicating whether to perform feature selection during the model fitting process. Defaults to FALSE.
standardize: Logical indicating whether to standardize predictor variables. Defaults to FALSE.
intercept: Logical indicating whether to include an intercept in the model. Defaults to TRUE.
exclude: Optional; indicate if any variables should be excluded in the model fitting process.
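
The folds and foldid arguments concern how observations are partitioned for cross-validation. As a rough illustration only, the base R sketch below shows one common way to draw balanced random fold labels. The helper make_folds is hypothetical and not part of the package, and the package's internal assignment scheme may differ.

```r
# Hypothetical helper (not part of savvyPR): balanced random fold labels.
make_folds <- function(n, folds = 10) {
  # Mirrors the documented constraint that folds must be an integer >= 3.
  stopifnot(folds >= 3, folds <= n)
  # Repeat 1..folds until n labels exist, then shuffle them.
  sample(rep(seq_len(folds), length.out = n))
}

set.seed(1)
fold_id <- make_folds(n = 100, folds = 10)
table(fold_id)  # number of observations assigned to each fold
```

With n divisible by folds, every fold receives the same number of observations; otherwise fold sizes differ by at most one.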

Value

A list of class "cv.savvyPR" containing the following components based on the specified model_type:

call: The matched call used to invoke the function.
coefficients: The coefficients of the final model fitted at the optimal tuning parameters.
mean_error_cv: A vector of computed error values across all tested parameters.
model_type: The type of PR model used: PR1, PR2, or PR3.
measure_type: The loss measure used for evaluation, with a descriptive name.
method: The parameterization method used: "budget" or "target".
PR_fit: The final fitted model object from the savvyPR function.
coefficients_cv: A matrix of average coefficients across all cross-validation folds for each tuning parameter.
vals: The tuning values (acting as c or t) used in the cross-validation process.
lambda_vals: The lambda values used in the cross-validation process, applicable to PR2 and PR3.
optimal_val: The optimal tuning value found from cross-validation, applicable to PR1 and PR2.
fixed_val: The fixed tuning value used in PR3, derived from an initial PR1-style optimization.
optimal_lambda_val: The optimal lambda value found in PR3.
fixed_lambda_val: The fixed lambda value used in PR2, derived from cv.glmnet.
optimal_index: A list detailing the indices of the optimal parameters within the cross-validation matrix.
fold_assignments: (Optional) The fold assignments used during the cross-validation, provided if foldid=TRUE.

Details


This function facilitates cross-validation for parity regression models across a range of tuning values (val) and regularization values (\(\lambda\)), depending on the model type specified. Each model type handles the parameters differently:

PR1: Performs cross-validation only over the val sequence while fixing \(\lambda=0\). This model type is primarily used when the focus is on understanding how different levels of risk parity constraints impact the model performance purely based on the parity mechanism without the influence of ridge \(\lambda\) shrinkage.
PR2: Uses a fixed \(\lambda\) value determined by performing a ridge regression (lambda optimization) using cv.glmnet on the dataset. It then performs cross-validation over the val sequence while using this optimized \(\lambda\) value. This approach is useful when one wishes to maintain a stable amount of standard shrinkage while exploring the impact of varying levels of the proportional contribution constraint.
PR3: First determines an optimal val using the same method as PR1. Then, keeping this val fixed, it cross-validates over the \(\lambda\) sequence (lambda_vals). This two-stage optimization can be particularly effective when the initial parity regularization needs further refinement via \(\lambda\) adjustment.
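
All three model types share one pattern: hold one tuning parameter fixed and cross-validate over a grid of the other. The base R sketch below illustrates that pattern for a single grid. It uses a closed-form ridge fit as a stand-in estimator, so it is not the PR estimator, and the names ridge_fit and cv_grid are hypothetical.

```r
# Closed-form ridge fit used as a stand-in estimator (NOT the PR estimator).
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Mean held-out MSE for each value in a tuning grid.
cv_grid <- function(x, y, grid, fold_id) {
  fold_labels <- sort(unique(fold_id))
  sapply(grid, function(lambda) {
    mean(sapply(fold_labels, function(k) {
      test <- fold_id == k
      beta <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambda)
      mean((y[test] - x[test, , drop = FALSE] %*% beta)^2)
    }))
  })
}

set.seed(123)
n <- 100
p <- 12
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% rnorm(p) + rnorm(n, sd = 0.5))
fold_id <- sample(rep(1:5, length.out = n))

# One-dimensional search: the other tuning parameter is treated as fixed.
lambda_grid <- 10^seq(-3, 2, length.out = 20)
cv_error <- cv_grid(x, y, lambda_grid, fold_id)
best_lambda <- lambda_grid[which.min(cv_error)]
```

In a PR3-style search this one-dimensional step would run twice: once over val with \(\lambda=0\), and once over \(\lambda\) with val held at its first-stage optimum.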

The function supports several types of loss metrics for assessing model performance:

mse: Mean Squared Error. The average squared difference between the predicted and actual values. Large errors are penalized more heavily than small ones.
mae: Mean Absolute Error. The average absolute difference between the predicted and actual values, ignoring direction. Every error carries equal weight, which makes the measure less sensitive to outliers than MSE.
rmse: Root Mean Squared Error. The square root of the MSE, expressed in the same units as the response. It is a common criterion when the main purpose of the model is prediction.
mape: Mean Absolute Percentage Error. The average absolute error expressed relative to the actual values. Because it is based on relative errors, it is independent of the scale of the response, but it becomes unstable when actual values are close to zero.

The choice of measure impacts how the model's performance is assessed during cross-validation. Users should select the measure that best reflects the requirements of their specific analytical context.
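
As a minimal sketch, the four measures can be written in base R as follows. The function loss is hypothetical and not the package's internal code. In particular, reporting MAPE as a percentage (multiplied by 100) is an assumption.

```r
# Hypothetical helper (not part of savvyPR): the four loss measures.
loss <- function(actual, predicted,
                 measure = c("mse", "mae", "rmse", "mape")) {
  measure <- match.arg(measure)
  err <- actual - predicted
  switch(measure,
    mse  = mean(err^2),
    mae  = mean(abs(err)),
    rmse = sqrt(mean(err^2)),
    mape = mean(abs(err / actual)) * 100  # assumption: percentage scale
  )
}

actual    <- c(2, 4, 5, 10)
predicted <- c(1, 4, 7, 8)

loss(actual, predicted, "mse")   # 2.25
loss(actual, predicted, "mae")   # 1.25
loss(actual, predicted, "rmse")  # 1.5
loss(actual, predicted, "mape")  # 27.5
```

Note that the mape branch divides by actual, so any zero in the response yields an infinite or undefined value.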

References

Asimit, V., Chen, Z., Ichim, B., & Millossovich, P. (2026). Parity Regression Estimation. Retrieved from https://openaccess.city.ac.uk/id/eprint/37017/

The optimization technique employed follows the algorithm described by: F. Spinu (2013). An Algorithm for Computing Risk Parity Weights. SSRN Preprint. doi:10.2139/ssrn.2297383

Author

Ziwei Chen, Vali Asimit and Pietro Millossovich
Maintainer: Ziwei Chen <ziwei.chen.3@citystgeorges.ac.uk>

Examples

# \donttest{
# Generate synthetic data
set.seed(123)
n <- 100 # Number of observations
p <- 12  # Number of variables
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(rnorm(p), p, 1)
y <- x %*% beta + rnorm(n, sd = 0.5)

# Example 1: PR1 with "budget" method (focusing on c values with MSE)
result_pr1_budget <- cv.savvyPR(x, y, method = "budget", model_type = "PR1")
print(result_pr1_budget)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "budget", model_type = "PR1") 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  budget                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100

# Example 2: PR1 with "target" method
result_pr1_target <- cv.savvyPR(x, y, method = "target", model_type = "PR1")
print(result_pr1_target)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "target", model_type = "PR1") 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  target                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100

# Example 3: PR3 (default model_type) exploring budget parameter
result_pr3 <- cv.savvyPR(x, y, method = "budget", folds = 5)
print(result_pr3)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "budget", folds = 5) 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Fixed Val
#>  budget                              13                Yes         0
#>  Optimal Lambda Value
#>               0.01456
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)   0.0025
#>           X1   0.5988
#>           X2  -0.7233
#>           X3   0.8566
#>           X4  -0.7468
#>           X5   0.5470
#>           X6   1.2085
#>           X7  -0.9096
#>           X8   1.1625
#>           X9  -0.4891
#>          X10   0.2774
#>          X11   0.2175
#>          X12  -0.5078
# }
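
The examples above only print the fitted object. The sketch below, which assumes the package is installed under the name savvyPR, selects a non-default loss measure and inspects a few of the components listed under Value. The whole block is guarded so that it is skipped when the package is unavailable.

```r
# Assumption: the package is installed and named savvyPR.
has_pkg <- requireNamespace("savvyPR", quietly = TRUE)

if (has_pkg) {
  library(savvyPR)

  set.seed(123)
  x <- matrix(rnorm(100 * 12), 100, 12)
  y <- drop(x %*% rnorm(12) + rnorm(100, sd = 0.5))

  # PR1 with MAE as the loss, 5 folds, and fold assignments returned.
  fit <- cv.savvyPR(x, y, method = "budget", model_type = "PR1",
                    measure_type = "mae", folds = 5, foldid = TRUE)

  print(fit$optimal_val)      # tuning value selected by cross-validation
  str(fit$mean_error_cv)      # CV error for each tested tuning value
  str(fit$fold_assignments)   # present because foldid = TRUE
}
```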
