Performs k-fold cross-validation for Parity Regression (PR) models to select optimal tuning parameters. The underlying PR methodology distributes the total prediction error evenly across all parameters, ensuring stability in the presence of high multicollinearity and substantial noise (such as time series data with structural changes and evolving trends). This function supports both Budget-based and Target-based parameterizations and evaluates models across a variety of loss metrics.
Usage
cv.savvyPR(
  x,
  y,
  method = c("budget", "target"),
  vals = NULL,
  nval = 100,
  lambda_vals = NULL,
  nlambda = 100,
  folds = 10,
  model_type = c("PR3", "PR1", "PR2"),
  measure_type = c("mse", "mae", "rmse", "mape"),
  foldid = FALSE,
  use_feature_selection = FALSE,
  standardize = FALSE,
  intercept = TRUE,
  exclude = NULL
)
Arguments
- x
A matrix of predictors with rows as observations and columns as variables. Must not contain NA values, and should not include an intercept column of ones.
- y
A numeric vector of the response variable, with the same number of observations as x. Must not contain NA values.
- method
Character string specifying the parameterization method to use: "budget" (default) or "target".
- vals
Optional; a numeric vector of values for tuning the PR model (acts as c for budget, or t for target). If NULL, a default sequence is generated based on the selected method. Must contain at least two values.
- nval
Numeric value specifying the number of tuning values to try in the optimization process if vals = NULL. Defaults to 100.
- lambda_vals
Optional; a numeric vector of lambda values used for regularization in the "PR2" and "PR3" model types. If NULL and model_type is "PR2" or "PR3", a default sequence is used. Must contain at least two values.
- nlambda
Numeric value specifying the number of lambda values to try in the optimization process if lambda_vals = NULL. Defaults to 100.
- folds
The number of folds to be used in the cross-validation. Defaults to 10. Must be an integer >= 3.
- model_type
Character string specifying the type of model to fit. Defaults to "PR3". Can be one of "PR3", "PR1", or "PR2". See Details for further clarification.
- measure_type
Character vector specifying the measure to use for model evaluation. Defaults to "mse". Supported types include "mse", "mae", "rmse", and "mape".
- foldid
Logical indicating whether to return fold assignments. Defaults to FALSE.
- use_feature_selection
Logical indicating whether to perform feature selection during the model fitting process. Defaults to FALSE.
- standardize
Logical indicating whether to standardize predictor variables. Defaults to FALSE.
- intercept
Logical indicating whether to include an intercept in the model. Defaults to TRUE.
- exclude
Optional; specifies any variables to be excluded from the model fitting process.
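The requirements on x, y, and folds listed above can be checked before the call. The following is a minimal base R sketch of such a check; check_pr_inputs is a hypothetical helper written for illustration and is not part of the package, which may perform its own, different validation.

```r
# Hypothetical helper (not part of the package): checks the documented
# requirements on x, y, and folds before calling cv.savvyPR().
check_pr_inputs <- function(x, y, folds = 10) {
  stopifnot(is.matrix(x), is.numeric(x), is.numeric(y))
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain NA values")
  }
  if (nrow(x) != length(y)) {
    stop("x and y must have the same number of observations")
  }
  # A column that is identically 1 would duplicate the model intercept.
  is_ones <- apply(x, 2, function(col) all(col == 1))
  if (any(is_ones)) {
    stop("x should not include an intercept column of ones")
  }
  if (folds != round(folds) || folds < 3) {
    stop("folds must be an integer >= 3")
  }
  invisible(TRUE)
}

# Example usage on synthetic data
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)
y <- rnorm(10)
check_pr_inputs(x, y, folds = 5)
```

Running the check first turns a violated requirement into an explicit error message rather than a failure deeper inside the fitting routine.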
Value
A list of class "cv.savvyPR" containing the following components based on the specified model_type:
- call
The matched call used to invoke the function.
- coefficients
The coefficient estimates of the final fitted model at the optimal tuning parameters.
- mean_error_cv
A vector of computed error values across all tested parameters.
- model_type
The type of PR model used: PR1, PR2, or PR3.
- measure_type
The loss measure used for evaluation, with a descriptive name.
- method
The parameterization method used: "budget" or "target".
- PR_fit
The final fitted model object from the savvyPR function.
- coefficients_cv
A matrix of average coefficients across all cross-validation folds for each tuning parameter.
- vals
The tuning values (acting as c or t) used in the cross-validation process.
- lambda_vals
The lambda values used in the cross-validation process, applicable to PR2 and PR3.
- optimal_val
The optimal tuning value found from cross-validation, applicable to PR1 and PR2.
- fixed_val
The fixed tuning value used in PR3, derived from an initial PR1-style optimization.
- optimal_lambda_val
The optimal lambda value found in PR3.
- fixed_lambda_val
The fixed lambda value used in PR2, derived from cv.glmnet.
- optimal_index
A list detailing the indices of the optimal parameters within the cross-validation matrix.
- fold_assignments
(Optional) The fold assignments used during the cross-validation, provided if foldid = TRUE.
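For orientation, fold assignments in k-fold cross-validation are typically a vector with one fold label per observation. The sketch below shows one common way to build such a vector in base R. The variable names are illustrative, and the exact scheme the function uses internally is not stated here, so treat this as an assumption rather than a description of the package.

```r
# A common way to create balanced, randomly shuffled fold labels.
# Illustrative only; the package's internal scheme may differ.
set.seed(42)
n <- 100      # number of observations
folds <- 10   # number of folds

# Repeat the labels 1..folds to length n, then shuffle them.
fold_id <- sample(rep(seq_len(folds), length.out = n))

# Each fold serves once as the held-out set, for example fold 1:
test_idx <- which(fold_id == 1)
train_idx <- which(fold_id != 1)

table(fold_id)  # number of observations per fold
```

Note that in this function foldid is a logical flag requesting that the assignments be returned. It is not a user-supplied vector of fold labels.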
Details
Cross-Validation for Parity Regression Model Estimation
This function facilitates cross-validation for parity regression models across a range
of tuning values (val) and regularization values (\(\lambda\)), depending
on the model type specified. Each model type handles the parameters differently:
- PR1
Performs cross-validation only over the val sequence while fixing \(\lambda=0\). This model type is primarily used when the focus is on understanding how different levels of risk parity constraints impact the model performance purely based on the parity mechanism without the influence of ridge \(\lambda\) shrinkage.
- PR2
Uses a fixed \(\lambda\) value determined by performing a ridge regression (lambda optimization) using cv.glmnet on the dataset. It then performs cross-validation over the val sequence while using this optimized \(\lambda\) value. This approach is useful when one wishes to maintain a stable amount of standard shrinkage while exploring the impact of varying levels of the proportional contribution constraint.
- PR3
First, determines an optimal val using the same method as PR1. Then, keeping this val fixed, it conducts a cross-validation over the sequence of candidate \(\lambda\) values. This dual-stage optimization can be particularly effective when the initial parity regularization needs further refinement via \(\lambda\) adjustment.
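The sequential search described for PR1 and PR3 can be illustrated with a toy example. In the sketch below, cv_error is a hypothetical stand-in for the cross-validated error of a model fitted at a given val and lambda. It is a made-up smooth function used only so the search logic is runnable, not the package's actual error computation, and the grids are arbitrary.

```r
# Hypothetical stand-in for a cross-validated error surface (illustration only).
cv_error <- function(val, lambda) (val - 0.3)^2 + (lambda - 0.1)^2

vals <- seq(0, 1, length.out = 11)           # candidate tuning values
lambda_vals <- seq(0, 0.5, length.out = 11)  # candidate lambda values

# PR1-style search: lambda is fixed at 0, only val is tuned.
err_val <- sapply(vals, cv_error, lambda = 0)
fixed_val <- vals[which.min(err_val)]

# PR3-style second stage: val stays fixed, only lambda is tuned.
err_lambda <- sapply(lambda_vals, function(l) cv_error(fixed_val, l))
optimal_lambda <- lambda_vals[which.min(err_lambda)]

c(fixed_val = fixed_val, optimal_lambda = optimal_lambda)
```

A PR2-style search would run the same one-dimensional loop over vals, but with lambda held at the value obtained from cv.glmnet instead of 0. Because each stage is a one-dimensional search, the cost grows with the sum of the two grid sizes rather than their product, at the price of not exploring the joint grid.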
The function supports several types of loss metrics for assessing model performance:
- mse
Mean Squared Error: Measures the average of the squared errors, that is, the average squared difference between the estimated values and the actual values.
- mae
Mean Absolute Error: Measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight.
- rmse
Root Mean Squared Error: The square root of the mean of the squared errors. RMSE is expressed in the same units as the response and is a commonly used criterion for fit when the main purpose of the model is prediction.
- mape
Mean Absolute Percentage Error: Measures the size of the error in percentage terms. It is calculated as the average of the absolute percentage errors. Because it is based on relative errors, it does not depend on the scale of the response, but it can become very large, or undefined, when true values are close to or equal to zero.
The choice of measure impacts how the model's performance is assessed during cross-validation. Users should select the measure that best reflects the requirements of their specific analytical context.
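The four measures can be written in a few lines of base R. The helper functions below are illustrative definitions, not the package's internal code. In particular, reporting MAPE multiplied by 100 is an assumption, since the scaling used by the function is not stated here.

```r
# Illustrative definitions of the four loss measures (not the package's code).
mse  <- function(actual, predicted) mean((actual - predicted)^2)
mae  <- function(actual, predicted) mean(abs(actual - predicted))
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))
# Assumption: MAPE reported in percent; undefined if any actual value is 0.
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100

actual    <- c(2, 4, 5, 10)
predicted <- c(1, 5, 5, 8)

mse(actual, predicted)   # 1.5
mae(actual, predicted)   # 1
rmse(actual, predicted)  # sqrt(1.5), about 1.2247
mape(actual, predicted)  # 23.75
```

In this small example the error of 1 on the true value 2 contributes 50 percentage points to MAPE, while the error of 2 on the true value 10 contributes only 20, which shows how the relative measures weight errors differently from the absolute ones.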
References
Asimit, V., Chen, Z., Ichim, B., & Millossovich, P. (2026). Parity Regression Estimation. Retrieved from https://openaccess.city.ac.uk/id/eprint/37017/
The optimization technique employed follows the algorithm described by: F. Spinu (2013). An Algorithm for Computing Risk Parity Weights. SSRN Preprint. doi:10.2139/ssrn.2297383
Author
Ziwei Chen, Vali Asimit and Pietro Millossovich
Maintainer: Ziwei Chen <ziwei.chen.3@citystgeorges.ac.uk>
Examples
# \donttest{
# Generate synthetic data
set.seed(123)
n <- 100 # Number of observations
p <- 12 # Number of variables
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(rnorm(p), p, 1)
y <- x %*% beta + rnorm(n, sd = 0.5)
# Example 1: PR1 with "budget" method (focusing on c values with MSE)
result_pr1_budget <- cv.savvyPR(x, y, method = "budget", model_type = "PR1")
print(result_pr1_budget)
#>
#> Call: cv.savvyPR(x = x, y = y, method = "budget", model_type = "PR1")
#>
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  budget                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#>
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100
# Example 2: PR1 with "target" method
result_pr1_target <- cv.savvyPR(x, y, method = "target", model_type = "PR1")
print(result_pr1_target)
#>
#> Call: cv.savvyPR(x = x, y = y, method = "target", model_type = "PR1")
#>
#>  Method Number of Non-Zero Coefficients Intercept Included Optimal Val
#>  target                              13                Yes           0
#>  Fixed Lambda Value
#>                   0
#>
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)  -0.0005
#>           X1   0.6012
#>           X2  -0.7266
#>           X3   0.8628
#>           X4  -0.7532
#>           X5   0.5502
#>           X6   1.2139
#>           X7  -0.9149
#>           X8   1.1698
#>           X9  -0.4934
#>          X10   0.2795
#>          X11   0.2194
#>          X12  -0.5100
# Example 3: PR3 (default model_type) exploring budget parameter
result_pr3 <- cv.savvyPR(x, y, method = "budget", folds = 5)
print(result_pr3)
#>
#> Call: cv.savvyPR(x = x, y = y, method = "budget", folds = 5)
#>
#>  Method Number of Non-Zero Coefficients Intercept Included Fixed Val
#>  budget                              13                Yes         0
#>  Optimal Lambda Value
#>               0.01456
#>
#> Coefficients:
#>  Coefficient Estimate
#>  (Intercept)   0.0025
#>           X1   0.5988
#>           X2  -0.7233
#>           X3   0.8566
#>           X4  -0.7468
#>           X5   0.5470
#>           X6   1.2085
#>           X7  -0.9096
#>           X8   1.1625
#>           X9  -0.4891
#>          X10   0.2774
#>          X11   0.2175
#>          X12  -0.5078
# }