Performs k-fold cross-validation for Parity Regression (PR) models to select optimal tuning parameters. The underlying PR methodology distributes the total prediction error evenly across all parameters, ensuring stability in the presence of high multicollinearity and substantial noise (such as time series data with structural changes and evolving trends). This function supports both Budget-based and Target-based parameterizations and evaluates models across a variety of loss metrics.

Usage

cv.savvyPR(
  x,
  y,
  method = c("budget", "target"),
  vals = NULL,
  nval = 100,
  lambda_vals = NULL,
  nlambda = 100,
  folds = 10,
  model_type = c("PR3", "PR1", "PR2"),
  measure_type = c("mse", "mae", "rmse", "mape"),
  foldid = FALSE,
  use_feature_selection = FALSE,
  standardize = TRUE,
  intercept = TRUE,
  exclude = NULL
)

Arguments

x

A matrix of predictors with rows as observations and columns as variables. Must not contain NA values, and should not include an intercept column of ones.

y

A numeric vector of response values with the same number of observations as x. Must not contain NA values.

method

Character string specifying the parameterization method to use: "budget" (default) or "target".

vals

Optional; a numeric vector of values for tuning the PR model (acts as c for budget, or t for target). If NULL, a default sequence is generated based on the selected method. Must contain at least two values.

nval

Numeric value specifying the number of tuning values to try in the optimization process if vals=NULL. Defaults to 100.

lambda_vals

Optional; a numeric vector of lambda values used for regularization in the "PR2" and "PR3" model types. If NULL and model_type is "PR2" or "PR3", a default sequence is used. Must contain at least two values.

nlambda

Numeric value specifying the number of lambda values to try in the optimization process if lambda_vals=NULL. Defaults to 100.

folds

The number of folds to be used in the cross-validation, default is 10. Must be an integer >= 3.

model_type

Character string specifying the type of model to fit. Defaults to "PR3". Can be one of "PR3", "PR1", or "PR2". See details for further clarification.

measure_type

Character vector specifying the measure to use for model evaluation. Defaults to "mse". Supported types include "mse", "mae", "rmse", and "mape".

foldid

Logical indicating whether to return fold assignments. Defaults to FALSE.

use_feature_selection

Logical indicating whether to perform feature selection during the model fitting process. Defaults to FALSE.

standardize

Logical indicating whether to standardize predictor variables. Defaults to TRUE.

intercept

Logical indicating whether to include an intercept in the model. Defaults to TRUE.

exclude

Optional; indicate if any variables should be excluded in the model fitting process.
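
The folds argument controls how observations are partitioned for cross-validation. As a rough illustration only, the base R sketch below shows one common way to assign observations to folds at random. The helper make_fold_ids is hypothetical and is not part of the package; the package's internal assignment scheme may differ.

```r
# Hypothetical helper (not a savvyPR function): randomly assign each of
# n observations to one of `folds` roughly equal-sized folds.
make_fold_ids <- function(n, folds = 10) {
  stopifnot(folds >= 3, folds <= n)
  # Repeat 1..folds until n labels exist, then shuffle the labels.
  sample(rep(seq_len(folds), length.out = n))
}

set.seed(1)
fold_ids <- make_fold_ids(n = 100, folds = 10)
table(fold_ids)  # each of the 10 folds holds 10 observations
```

Each fold is held out once as a test set while the model is fitted on the remaining observations.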

Value

A list of class "cv.savvyPR" containing the following components based on the specified model_type:

call

The matched call used to invoke the function.

coefficients

The coefficients of the final model fitted at the optimal tuning parameters.

mean_error_cv

A vector of computed error values across all tested parameters.

model_type

The type of PR model used: PR1, PR2, or PR3.

measure_type

The loss measure used for evaluation, with a descriptive name.

method

The parameterization method used: "budget" or "target".

PR_fit

The final fitted model object from the savvyPR function.

coefficients_cv

A matrix of average coefficients across all cross-validation folds for each tuning parameter.

vals

The tuning values (acting as c or t) used in the cross-validation process.

lambda_vals

The lambda values used in the cross-validation process, applicable to PR2 and PR3.

optimal_val

The optimal tuning value found from cross-validation, applicable to PR1 and PR2.

fixed_val

The fixed tuning value used in PR3, derived from an initial PR1-style optimization.

optimal_lambda_val

The optimal lambda value found in PR3.

fixed_lambda_val

The fixed lambda value used in PR2, derived from cv.glmnet.

optimal_index

A list detailing the indices of the optimal parameters within the cross-validation matrix.

fold_assignments

(Optional) The fold assignments used during the cross-validation, provided if foldid=TRUE.
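
A minimal sketch of inspecting the returned components listed above. It assumes that cv.savvyPR is exported by an installed package named savvyPR (the package name is an assumption here), and the call is skipped when that package is not available. The simulated data are illustrative only.

```r
# Simulated data for illustration.
set.seed(123)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- as.numeric(x %*% rnorm(5) + rnorm(100, sd = 0.5))

# Assumption: the function lives in a package called "savvyPR".
if (requireNamespace("savvyPR", quietly = TRUE)) {
  fit <- savvyPR::cv.savvyPR(x, y, method = "budget", model_type = "PR1",
                             folds = 5, foldid = TRUE)
  print(names(fit))                  # components present for this model_type
  print(fit$optimal_val)             # tuning value chosen by cross-validation
  print(fit$coefficients)            # coefficients of the final fitted model
  print(length(fit$mean_error_cv))   # number of CV error values stored
  str(fit$fold_assignments)          # returned because foldid = TRUE
}
```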

Details

Cross-Validation for Parity Regression Model Estimation

This function facilitates cross-validation for parity regression models across a range of tuning values (val) and regularization values (\(\lambda\)), depending on the model type specified. Each model type handles the parameters differently:

PR1

Performs cross-validation only over the val sequence while fixing \(\lambda=0\). This model type is primarily used when the focus is on understanding how different levels of risk parity constraints impact the model performance purely based on the parity mechanism without the influence of ridge \(\lambda\) shrinkage.

PR2

Uses a fixed \(\lambda\) value determined by performing a ridge regression (lambda optimization) using cv.glmnet on the dataset. It then performs cross-validation over the val sequence while using this optimized \(\lambda\) value. This approach is useful when one wishes to maintain a stable amount of standard shrinkage while exploring the impact of varying levels of the proportional contribution constraint.

PR3

First determines an optimal val using the same method as PR1. Then, keeping this val fixed, it performs cross-validation over the \(\lambda\) sequence (lambda_vals). This two-stage optimization can be particularly effective when the initial parity regularization needs further refinement via \(\lambda\) adjustment.
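
Each search described above is one-dimensional: one parameter is held fixed, the mean cross-validated error is computed for every value in a grid, and the value with the smallest error is kept. The base R sketch below illustrates that pattern for a \(\lambda\) grid. It uses a closed-form ridge fit without intercept as a stand-in estimator, so it does not implement parity regression, and the helper names ridge_fit and cv_mse are hypothetical.

```r
# Stand-in estimator (NOT parity regression): closed-form ridge, no intercept.
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Mean k-fold cross-validated MSE for every lambda in `grid`.
cv_mse <- function(x, y, grid, fold_ids) {
  sapply(grid, function(lambda) {
    mean(sapply(sort(unique(fold_ids)), function(k) {
      test <- fold_ids == k
      b <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambda)
      mean((y[test] - x[test, , drop = FALSE] %*% b)^2)
    }))
  })
}

set.seed(123)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% rnorm(p) + rnorm(n, sd = 0.5))
fold_ids <- sample(rep(1:5, length.out = n))

lambda_grid <- 10^seq(-3, 1, length.out = 20)
errs <- cv_mse(x, y, lambda_grid, fold_ids)
best_lambda <- lambda_grid[which.min(errs)]  # grid value with smallest CV error
```

In the two-stage PR3 setting, the same loop would run once over the val grid with \(\lambda=0\) and then once over the \(\lambda\) grid with val fixed at the first-stage optimum.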

The function supports several types of loss metrics for assessing model performance:

mse

Mean Squared Error: Measures the average of the squared errors, that is, the average squared difference between the estimated values and the actual values.

mae

Mean Absolute Error: Measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight.

rmse

Root Mean Squared Error: The square root of the mean of the squared errors. RMSE is expressed in the same units as the response, which makes it easier to interpret than MSE when the main purpose of the model is prediction.

mape

Mean Absolute Percentage Error: Measures the size of the error in percentage terms. It is calculated as the average of the absolute errors divided by the absolute actual values. Because it is based on relative errors, it does not depend on the scale of the response, but it is undefined when an actual value is zero and becomes very large when actual values are close to zero.

The choice of measure impacts how the model's performance is assessed during cross-validation. Users should select the measure that best reflects the requirements of their specific analytical context.
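
The four measures can be written in a few lines of base R. The functions below are illustrative definitions only, not the package's calcLoss implementation. In particular, reporting MAPE on a 0 to 100 percentage scale is an assumption; the package may report it as a fraction.

```r
# Illustrative definitions (hypothetical helpers, not the package's calcLoss).
mse  <- function(actual, predicted) mean((actual - predicted)^2)
mae  <- function(actual, predicted) mean(abs(actual - predicted))
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))
# Assumption: MAPE reported in percent; undefined if any actual value is 0.
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100

actual    <- c(2, 4, 5, 10)
predicted <- c(1, 4, 7, 8)

mse(actual, predicted)   # 2.25
mae(actual, predicted)   # 1.25
rmse(actual, predicted)  # 1.5
mape(actual, predicted)  # 27.5
```

Note how the first observation (actual value 2, error 1) contributes 50% to the MAPE while the last (actual value 10, error 2) contributes only 20%, even though its absolute error is larger.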

References

Asimit, V., Chen, Z., Ichim, B., & Millossovich, P. (2026). Parity Regression Estimation. Retrieved from https://openaccess.city.ac.uk/id/eprint/37017/

The optimization technique employed follows the algorithm described by: F. Spinu (2013). An Algorithm for Computing Risk Parity Weights. SSRN Preprint. doi:10.2139/ssrn.2297383

See also

savvyPR, glmnet, cv.glmnet, calcLoss, getMeasureName, optimizeRiskParityBudget, optimizeRiskParityTarget

Author

Ziwei Chen, Vali Asimit and Pietro Millossovich
Maintainer: Ziwei Chen <ziwei.chen.3@citystgeorges.ac.uk>

Examples

# \donttest{
# Generate synthetic data
set.seed(123)
n <- 100 # Number of observations
p <- 12  # Number of variables
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(rnorm(p), p, 1)
y <- x %*% beta + rnorm(n, sd = 0.5)

# Example 1: PR1 with "budget" method (focusing on c values with MSE)
result_pr1_budget <- cv.savvyPR(x, y, method = "budget", model_type = "PR1")
print(result_pr1_budget)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "budget", model_type = "PR1") 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  budget                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100

# Example 2: PR1 with "target" method
result_pr1_target <- cv.savvyPR(x, y, method = "target", model_type = "PR1")
print(result_pr1_target)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "target", model_type = "PR1") 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  target                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100

# Example 3: PR3 (default model_type) exploring budget parameter
result_pr3 <- cv.savvyPR(x, y, method = "budget", folds = 5)
print(result_pr3)
#> 
#> Call:  cv.savvyPR(x = x, y = y, method = "budget", folds = 5) 
#> 
#>  Method Number of Non-Zero Coefficients Intercept Included Fixed Val
#>  budget                              13                Yes         0
#>  Optimal Lambda Value
#>               0.01456
#> 
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)   0.0025
#>           X1   0.5988
#>           X2  -0.7233
#>           X3   0.8566
#>           X4  -0.7468
#>           X5   0.5470
#>           X6   1.2085
#>           X7  -0.9096
#>           X8   1.1625
#>           X9  -0.4891
#>          X10   0.2774
#>          X11   0.2175
#>          X12  -0.5078
# }