--- title: "lessSEM" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{lessSEM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Regularized Structural Equation Modeling Regularized structural equation modeling has been proposed by Jacobucci et al. (2016) and Huang et al. (2017). The objective is to reduce overfitting in small samples and to allow for more flexibility. The general idea is to push some parameters towards zero. To this end, a penalty function $p(\pmb\theta)$ is added to the vanilla objective function. In lessSEM, this objective function is given by the full information maximum likelihood function $F_{\text{ML}}(\pmb\theta)$. The new objective function is defined as: $$F_{\text{REGSEM},\lambda}(\pmb\theta) = F_{\text{ML}}(\pmb\theta)+ \lambda N p(\pmb\theta)$$ Think of this function as a tug-of-war: * $F_{\text{ML}}(\pmb\theta)$ wants all parameters to be close to the ordinary maximum likelihood estimates * $p(\pmb\theta)$ wants regularized parameters to be close to zero * $\lambda$ allows us to fine tune which of the two forces mentioned above gets more influence on the final parameter estimates * $N$ is the sample size. Scaling with $N$ is done to stay consistent with results returned by **regsem** and **lslx**. There are many different penalty functions which could be used. In **lessSEM**, we have implemented the following functions: $$ \begin{array}{l|llll} \text{penalty} & \text{function} & \text{optimizer} & \text{reference}\\ \hline \text{ridge} & p( x_j) = \lambda x_j^2 & \text{glmnet, ista} & \text{(Hoerl \& Kennard, 1970)}\\ \text{lasso} & p( x_j) = \lambda| x_j| & \text{glmnet, ista} & \text{(Tibshirani, 1996)}\\ \text{adaptiveLasso} & p( x_j) = \frac{1}{w_j}\lambda| x_j| & \text{glmnet, ista} & \text{(Zou, 2006)}\\ \text{elasticNet} & p( x_j) = \alpha\lambda| x_j| + (1-\alpha)\lambda x_j^2 & \text{glmnet, ista} & \text{(Zou \& Hastie, 2005)}\\ \text{cappedL1} & p( x_j) = \lambda \min(| x_j|, \theta); \theta > 0 &\text{glmnet, ista}& \text{(Zhang, 2010)}\\ \text{lsp} & p( x_j) = \lambda \log(1 + |x_j|/\theta); \theta > 0 &\text{glmnet, ista}& \text{(Candès et al., 2008)} \\ \text{scad} & p( x_j) = \begin{cases} \lambda |x_j| & \text{if } |x_j| \leq \lambda\\ \frac{-x_j^2 + 2\theta\lambda |x_j| - \lambda^2}{2(\theta -1)} & \text{if } \lambda < |x_j| \leq \lambda\theta \\ (\theta + 1) \lambda^2/2 & \text{if } |x_j| \geq \theta\lambda\\ \end{cases}; \theta > 2 &\text{glmnet, ista}& \text{(Fan \& Li, 2001)} \\ \text{mcp} & p( x_j) = \begin{cases} \lambda |x_j| - x_j^2/(2\theta) & \text{if } |x_j| \leq \theta\lambda\\ \theta\lambda^2/2 & \text{if } |x_j| > \lambda\theta \end{cases}; \theta > 0 &\text{glmnet, ista}& \text{(Zhang, 2010)} \end{array} $$ ## Objectives The objectives of **lessSEM** are to provide ... 1. a flexible framework for regularizing SEM. 2. optimizers for other packages that can handle non-differentiable penalty functions. ## Regularizing SEM **lessSEM** is heavily inspired by the **regsem** package. It also builds on **lavaan** to set up the model. ### Setting up a model First, start with lavaan: ```r library(lavaan) library(lessSEM) set.seed(4321) # let's simulate data for a simple # cfa with 7 observed variables data <- lessSEM::simulateExampleData(N = 50, loadings = c(rep(1,4), rep(0,3)) ) head(data) #> y1 y2 y3 y4 y5 y6 y7 #> [1,] -0.1737175 -0.1970204 1.1888412 1.8520403 0.16257957 1.8825526 1.1383999 #> [2,] -1.5179940 0.9029781 -0.1726986 -0.3596920 -0.02092956 -0.5798953 0.9020861 #> [3,] 0.6136418 0.2578986 -0.1359237 0.7703602 0.23502463 0.2001872 0.7986506 #> [4,] -0.5920933 0.2157830 1.6784758 1.8568433 -0.60458482 0.2219578 0.4736751 #> [5,] 0.0763996 -1.1442382 -2.8122156 0.4899892 0.03453494 2.0457604 -2.6721417 #> [6,] 2.2504896 2.9742206 0.4353705 1.2338364 0.04693253 -0.6438847 -1.1386235 # we assume a single factor structure lavaanSyntax <- " f =~ l1*y1 + l2*y2 + l3*y3 + l4*y4 + l5*y5 + l6*y6 + l7*y7 f ~~ 1*f " # estimate the model with lavaan lavaanModel <- cfa(lavaanSyntax, data = data) ``` Next, decide which parameters should be regularized. Let's go with l5-l7. In **lessSEM**, we always use the parameter labels to specify which parameters should be regularized! ```r regularized <- c("l5", "l6", "l7") # tip: we can use paste0 to make this easier: regularized <- paste0("l", 5:7) ``` Finally, we set up the regularized model. To this end, we must first decide which penalty function we want to use. If we want to shrink parameters without setting them to zero, we can use ridge regularization. Otherwise, we must use any of the other penalty functions mentioned above. In **lessSEM**, there is a dedicated function for each of these penalties. The names of these functions are identical to the "penalty" column in the table above. For instance, let's have a look at the lasso penalty: ```r fitLasso <- lasso(lavaanModel = lavaanModel, regularized = regularized, # please use much larger nLambdas in practice (e.g., 100)! nLambdas = 5) ``` Plot the paths to see what is going on: ```r plot(fitLasso) ``` ![plot of chunk unnamed-chunk-6](lessSEMFigures/lessSEM-unnamed-chunk-6-1.png) Note that the parameters are pulled towards zero as $\lambda$ increases. Note also that we did not specify specific values for $\lambda$ in the lasso function above. Instead, we only specified how many $\lambda$s we want to have (`nLambdas=50`). If we use the lasso or adaptive lasso, **lessSEM** can automatically compute which $\lambda$ is necessary to set all parameters to zero. This is currently not supported for any of the other penalties. The plots returned by **lessSEM** are either **ggplot2** elements (in case of a single tuning parameter), or created with **plotly** (in case of 2 tuning parameters). You can change a plot post-hoc: ```r plot(fitLasso) + ggplot2::theme_bw() ``` ![plot of chunk unnamed-chunk-7](lessSEMFigures/lessSEM-unnamed-chunk-7-1.png) The `coef` function gives access to all parameter estimates: ```r coef(fitLasso) #> #> Tuning ||--|| Estimates #> ------- ------- ||--|| ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- #> lambda alpha ||--|| l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 #> ======= ======= ||--|| ========== ========== ========== ========== ========== ========== ========== ========== ========== #> 0.1034 1.0000 ||--|| 0.7523 0.7536 0.5742 . . . 0.8812 1.1477 1.9273 #> 0.0776 1.0000 ||--|| 0.7477 0.7480 0.5720 -0.0104 . . 0.8742 1.1523 1.9331 #> 0.0517 1.0000 ||--|| 0.7399 0.7396 0.5688 -0.0266 . -0.0090 0.8631 1.1602 1.9417 #> 0.0259 1.0000 ||--|| 0.7301 0.7332 0.5677 -0.0418 . -0.0478 0.8528 1.1706 1.9482 #> 0.0000 1.0000 ||--|| 0.7239 0.7319 0.5688 -0.0562 0.0166 -0.0894 0.8491 1.1779 1.9496 #> #> #> ---------- ---------- ---------- ---------- #> y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> ========== ========== ========== ========== #> 1.0804 0.5710 0.9628 1.5320 #> 1.0818 0.5705 0.9628 1.5320 #> 1.0838 0.5697 0.9628 1.5312 #> 1.0841 0.5689 0.9628 1.5282 #> 1.0830 0.5685 0.9626 1.5255 ``` If you are only interested in the estimates, use ```r estimates(fitLasso) #> l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 y4~~y4 y5~~y5 y6~~y6 #> [1,] 0.7522904 0.7536140 0.5742228 0.00000000 0.00000000 0.000000000 0.8812416 1.147737 1.927350 1.080353 0.5710041 0.9628056 #> [2,] 0.7476862 0.7480112 0.5720039 -0.01042497 0.00000000 0.000000000 0.8741617 1.152318 1.933141 1.081837 0.5704936 0.9628058 #> [3,] 0.7398991 0.7395932 0.5688300 -0.02660810 0.00000000 -0.008974807 0.8630783 1.160213 1.941722 1.083766 0.5696926 0.9628057 #> [4,] 0.7301019 0.7331646 0.5677000 -0.04181445 0.00000000 -0.047849804 0.8528254 1.170580 1.948163 1.084065 0.5689390 0.9628055 #> [5,] 0.7239003 0.7318701 0.5688230 -0.05624391 0.01658325 -0.089365627 0.8491201 1.177878 1.949609 1.082966 0.5684702 0.9625805 #> y7~~y7 #> [1,] 1.531997 #> [2,] 1.532015 #> [3,] 1.531243 #> [4,] 1.528245 #> [5,] 1.525473 ``` Now, let's assume you also want to try out the scad penalty. In this case, all you have to do is to replace the `lasso()` function with the `scad()` function: ```r fitScad <- scad(lavaanModel = lavaanModel, regularized = regularized, lambdas = seq(0,1,length.out = 4), thetas = seq(2.1, 5,length.out = 2)) ``` The scad penalty has two tuning parmeters $\lambda$ and $\theta$. The naming follows that used by Gong et al. (2013). We can plot the results again, however this requires the **plotly** package and is currently not supported in Rmarkdown. ```r plot(fitScad) ``` The parameter estimates can again be accessed with the `coef()` function: ```r coef(fitScad) #> #> Tuning ||--|| Estimates #> ------- ------- ||--|| ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- #> lambda theta ||--|| l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 #> ======= ======= ||--|| ========== ========== ========== ========== ========== ========== ========== ========== ========== #> 0.0000 2.1000 ||--|| 0.7240 0.7320 0.5689 -0.0562 0.0166 -0.0894 0.8492 1.1778 1.9495 #> 0.3333 2.1000 ||--|| 0.7523 0.7535 0.5742 . . . 0.8812 1.1478 1.9274 #> 0.6667 2.1000 ||--|| 0.7522 0.7536 0.5742 . . . 0.8812 1.1478 1.9274 #> 1.0000 2.1000 ||--|| 0.7522 0.7536 0.5742 . . . 0.8812 1.1478 1.9274 #> 0.0000 5.0000 ||--|| 0.7242 0.7323 0.5690 -0.0562 0.0166 -0.0894 0.8495 1.1776 1.9493 #> 0.3333 5.0000 ||--|| 0.7522 0.7535 0.5740 . . . 0.8811 1.1479 1.9275 #> 0.6667 5.0000 ||--|| 0.7522 0.7535 0.5742 . . . 0.8811 1.1479 1.9274 #> 1.0000 5.0000 ||--|| 0.7522 0.7535 0.5742 . . . 0.8811 1.1478 1.9275 #> #> #> ---------- ---------- ---------- ---------- #> y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> ========== ========== ========== ========== #> 1.0829 0.5684 0.9626 1.5255 #> 1.0804 0.5710 0.9628 1.5320 #> 1.0804 0.5710 0.9628 1.5320 #> 1.0804 0.5710 0.9628 1.5320 #> 1.0829 0.5684 0.9626 1.5255 #> 1.0805 0.5711 0.9628 1.5320 #> 1.0804 0.5710 0.9628 1.5320 #> 1.0804 0.5710 0.9628 1.5320 ``` ### Selecting a model To select a model and report the final parameter estimates, you can use the AIC or BIC. There are two ways to use these information criteria. First, you can compute them and select the model yourself: ```r AICs <- AIC(fitLasso) head(AICs) #> lambda alpha objectiveValue regObjectiveValue m2LL regM2LL nonZeroParameters convergence AIC #> 1 0.10340887 1 1071.078 1071.078 1071.078 1071.078 10 TRUE 1091.078 #> 2 0.07755666 1 1071.033 1071.074 1071.033 1071.074 11 TRUE 1093.033 #> 3 0.05170444 1 1070.956 1071.048 1070.956 1071.048 12 TRUE 1094.956 #> 4 0.02585222 1 1070.851 1070.967 1070.851 1070.967 12 TRUE 1094.851 #> 5 0.00000000 1 1070.810 1070.810 1070.810 1070.810 13 TRUE 1096.810 fitLasso@parameters[which.min(AICs$AIC),] #> lambda alpha l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> 1 0.1034089 1 0.7522904 0.753614 0.5742228 0 0 0 0.8812416 1.147737 1.92735 1.080353 0.5710041 0.9628056 1.531997 ``` An easier way is to use the `coef()` function again: ```r coef(fitLasso, criterion = "AIC") #> #> Tuning ||--|| Estimates #> ------- ------- ||--|| ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- #> lambda alpha ||--|| l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 #> ======= ======= ||--|| ========== ========== ========== ========== ========== ========== ========== ========== ========== #> 0.1034 1.0000 ||--|| 0.7523 0.7536 0.5742 . . . 0.8812 1.1477 1.9273 #> #> #> ---------- ---------- ---------- ---------- #> y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> ========== ========== ========== ========== #> 1.0804 0.5710 0.9628 1.5320 ``` Alternatively, you can extract just the estimates with: ```r estimates(fitLasso, criterion = "AIC") #> l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> [1,] 0.7522904 0.753614 0.5742228 0 0 0 0.8812416 1.147737 1.92735 1.080353 0.5710041 0.9628056 1.531997 ``` #### Cross-Validation A very good alternative to information criteria is the use of cross-validation. In **lessSEM**, there is a dedicated cross-validation function for each of the penalties discussed above. Let's look at the `lsp()` penalty this time. Now, for your non-cross-validated lsp, you would use ```r fitLsp <- lsp(lavaanModel = lavaanModel, regularized = regularized, lambdas = seq(0,1,.1), thetas = seq(.1,2,length.out = 4)) ``` To use a cross-validated version of the lsp, simply use the `cv` prefix. The function is called `cvLsp()`: ```r fitCvLsp <- cvLsp(lavaanModel = lavaanModel, regularized = regularized, lambdas = seq(0,1,.1), thetas = seq(.1,2,length.out = 4)) ``` The best model can now be accessed with ```r coef(fitCvLsp) #> #> Tuning ||--|| Estimates #> ------- ------- ||--|| ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- #> lambda theta ||--|| l2 l3 l4 l5 l6 l7 y1~~y1 y2~~y2 y3~~y3 #> ======= ======= ||--|| ========== ========== ========== ========== ========== ========== ========== ========== ========== #> 1.0000 0.1000 ||--|| 0.7523 0.7536 0.5742 . . . 0.8813 1.1477 1.9273 #> #> #> ---------- ---------- ---------- ---------- #> y4~~y4 y5~~y5 y6~~y6 y7~~y7 #> ========== ========== ========== ========== #> 1.0804 0.5710 0.9628 1.5320 ``` ### Missing Data Most psychological data sets will have missing data. In **lessSEM**, we use the full information maximum likelihood function to account for this missingness. **lessSEM** expects that you already use the full information maximum likelihood method in **lavaan**. ```r # let's simulate data for a simple # cfa with 7 observed variables # and 10 % missing data data <- lessSEM::simulateExampleData(N = 100, loadings = c(rep(1,4), rep(0,3)), percentMissing = 10 ) head(data) #> y1 y2 y3 y4 y5 y6 y7 #> [1,] 0.60367543 -0.3206755 -0.5712115 0.36626658 0.6138552 0.8207451 0.6346473 #> [2,] 0.37497661 2.0100766 -1.5925242 -0.02983920 0.2409065 1.1250778 0.8865902 #> [3,] NA 0.8134143 1.7803075 3.27710938 -0.3651732 NA -0.8283463 #> [4,] -0.04379503 0.1369219 -1.9424719 0.40304282 -0.6435542 1.5412868 0.0635044 #> [5,] -0.32969221 NA -1.6536493 -2.20991516 1.2462449 0.6725163 NA #> [6,] 0.61738032 0.9116425 0.9196841 0.03340633 0.5553805 0.1209500 2.0956358 # we assume a single factor structure lavaanSyntax <- " f =~ l1*y1 + l2*y2 + l3*y3 + l4*y4 + l5*y5 + l6*y6 + l7*y7 f ~~ 1*f " # estimate the model with lavaan lavaanModel <- cfa(lavaanSyntax, data = data, missing = "ml") # important: use fiml for missing data ``` Note that we added the argument `missing = 'ml'` to the **lavaan** model. This tells **lavaan** to use the full information maximum likelihood function. Next, pass this model to any of the penalty functions in **lessSEM**. **lessSEM** will automatically switch to the full information maximum likelihood function as well: ```r fitLasso <- lasso(lavaanModel = lavaanModel, regularized = regularized, nLambdas = 10) ``` To check if **lessSEM** did actually use the full information maximum likelihood, we can compare the 2log-likelihood of **lavaan** and **lessSEM** when no penalty is used ($\lambda = 0$): ```r fitLasso <- lasso(lavaanModel = lavaanModel, regularized = regularized, lambdas = 0) ``` ```r fitLasso@fits$m2LL #> [1] 2034.104 ``` Compare this to: ```r -2*logLik(lavaanModel) #> 'log Lik.' 2034.104 (df=20) ``` ## Using multiple cores By default, **lessSEM** will only use one computer core. However, if a model has many parameters, parallel computations can be faster. Multi-Core support is therefore provided using the **RcppParallel** package (Allaire et. al, 2023). To make use of multiple cores, the number of cores must be specified in the `control` argument (see below). Before doing that, it makes sense to check how many cores the computer has: ```r library(RcppParallel) # Print the number of threads (we call them cores for simplicity, but technically they are threads) RcppParallel::defaultNumThreads() #> [1] 16 ``` Note that using all cores can block the computer because there are no resources left for other tasks than R. To use 2 cores, we can set `nCores = 2` as follows: ```r fitLasso <- lasso(lavaanModel = lavaanModel, regularized = regularized, nLambdas = 10, control = controlGlmnet(nCores = 2)) ``` Note that multi-core support is only provided for SEM. Using the optimizers implemented in **lessSEM** for models other than SEM (e.g., in the [**lessLM**](https://github.com/jhorzek/lessLM) package) will not automatically allow for multi-core execution. ## Changing the optimizer **lessSEM** comes with two specialized optimization procedures: ista and glmnet. Currently, the default is glmnet for all penalties. Ista does not require the computation of a Hessian matrix. However, this comes at a price: ista optimization tends to call the fit and gradient function a lot more than glment. We recommend that you first test the glmnet optimizer and then switch to ista if glmnet results in errors due to the Hessian matrix. Switching to ista is done as follows: ```r fitLasso <- lasso(lavaanModel = lavaanModel, regularized = regularized, nLambdas = 10, method = "ista", # change the method control = controlIsta() # change the control argument ) ``` ## Parameter transformations **lessSEM** allows for parameter transformations. This is explained in detail in the vignette Parameter-transformations (see `vignette("Parameter-transformations", package = "lessSEM")`). To provide a short example, let's have a look at the political democracy data set: ```r # example from ?lavaan::sem library(lavaan) modelSyntax <- ' # latent variable definitions ind60 =~ x1 + x2 + x3 dem60 =~ y1 + a*y2 + b*y3 + c*y4 dem65 =~ y5 + a*y6 + b*y7 + c*y8 # regressions dem60 ~ ind60 dem65 ~ ind60 + dem60 # residual correlations y1 ~~ y5 y2 ~~ y4 y3 ~~ y7 y4 ~~ y8 y6 ~~ y8 ' lavaanFit <- sem(model = modelSyntax, data = PoliticalDemocracy) ``` Note that in the model estimated above, loadings on the latent variables are constrained to equality over time. We could also relax this assumption by allowing for time point specific loadings: ```r library(lavaan) modelSyntax <- ' # latent variable definitions ind60 =~ x1 + x2 + x3 dem60 =~ y1 + a1*y2 + b1*y3 + c1*y4 dem65 =~ y5 + a2*y6 + b2*y7 + c2*y8 # regressions dem60 ~ ind60 dem65 ~ ind60 + dem60 # residual correlations y1 ~~ y5 y2 ~~ y4 y3 ~~ y7 y4 ~~ y8 y6 ~~ y8 ' lavaanFit <- sem(model = modelSyntax, data = PoliticalDemocracy) ``` Deciding between both approaches can be difficult as there may be some parameters for which equality over time holds, while others violate the assumption. Here, transformations can be used to regularize differences between parameters. To this end, we define the transformations: ```r transformations <- " // IMPORTANT: Our transformations always have to start with the follwing line: parameters: a1, a2, b1, b2, c1, c2, delta_a2, delta_b2, delta_c2 // In the line above, we defined the names of the parameters which we // want to use in our transformations. EACH AND EVERY PARAMETER USED IN // THE FOLLOWING MUST BE STATED ABOVE. The line must always start with // the keyword 'parameters' followed by a colon. The parameters must be // separated by commata. // Comments are added with double-backslash // Now we can state our transformations: a2 = a1 + delta_a2; // statements must end with semicolon b2 = b1 + delta_b2; c2 = c1 + delta_c2; " ``` Next, we have to pass the `transformations` variable to the penalty function: ```r lassoFit <- lasso(lavaanModel = lavaanFit, regularized = c("delta_a2", "delta_b2", "delta_c2"),# we want to regularize # the differences between the parameters nLambdas = 100, # Our model modification must make use of the modifyModel - function: modifyModel = modifyModel(transformations = transformations) ) ``` To check if measurement invariance can be assumed, we can select the best model using information criteria: ```r coef(lassoFit, criterion = "BIC") #> #> Tuning ||--|| Estimates #> ------- ------- ||--|| ---------- ---------- ---------- ---------- ---------- ----------- ----------- ----------- ---------- #> lambda alpha ||--|| ind60=~x2 ind60=~x3 a1 b1 c1 dem60~ind60 dem65~ind60 dem65~dem60 y1~~y5 #> ======= ======= ||--|| ========== ========== ========== ========== ========== =========== =========== =========== ========== #> 0.2216 1.0000 ||--|| 2.1825 1.8189 1.2110 1.1679 1.2340 1.4534 0.5935 0.8659 0.5552 #> #> #> ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- #> y2~~y4 y3~~y7 y4~~y8 y6~~y8 x1~~x1 x2~~x2 x3~~x3 y1~~y1 y2~~y2 y3~~y3 y4~~y4 #> ========== ========== ========== ========== ========== ========== ========== ========== ========== ========== ========== #> 1.5947 0.7807 0.6537 1.5350 0.0820 0.1177 0.4675 1.7929 7.3843 5.0175 3.4074 #> #> ||--|| #> ---------- ---------- ---------- ---------- ------------ ------------ ------------ ---------- ---------- ---------- ||--|| #> y5~~y5 y6~~y6 y7~~y7 y8~~y8 ind60~~ind60 dem60~~dem60 dem65~~dem65 delta_a2 delta_b2 delta_c2 ||--|| #> ========== ========== ========== ========== ============ ============ ============ ========== ========== ========== ||--|| #> 2.2857 4.8977 3.5510 3.4511 0.4480 3.9408 0.2034 . . . ||--|| #> #> Transform #> ---------- ---------- ---------- #> a2 b2 c2 #> ========== ========== ========== #> 1.2110 1.1679 1.2340 ``` More details are provided in `vignette("Parameter-transformations", package = "lessSEM")`. # Experimental Features The following features are relatively new and you may still experience some bugs. Please be aware of that when using these features. ## From **lessSEM** to **lavaan** **lessSEM** supports exporting specific models to **lavaan**. This can be very useful when plotting the final model. ```r lavaanModel <- lessSEM2Lavaan(regularizedSEM = rsem, criterion = "BIC") ``` The result can be plotted with, for instance, [**semPlot**](https://github.com/SachaEpskamp/semPlot): ```r library(semPlot) semPaths(lavaanModel, what = "est", fade = FALSE) ``` ## Multi-Group Models and Definition Variables **lessSEM** supports multi-group SEM and, to some degree, definition variables. Regularized multi-group SEM have been proposed by Huang (2018) and are implemented in **lslx** (Huang, 2020). Here, differences between groups are regularized. A detailed introduction can be found in `vignette(topic = "Definition-Variables-and-Multi-Group-SEM", package = "lessSEM")`. Therein it is also explained how the multi-group SEM can be used to implement definition variables (e.g., for latent growth curve models). ## Mixed Penalties **lessSEM** allows for defining different penalties for different parts of the model. This feature is new and very experimental. Please keep that in mind when using the procedure. A detailed introduction can be found in `vignette(topic = "Mixed-Penalties", package = "lessSEM")`. To provide a short example, we will regularize the loadings and the regression parameters of the Political Democracy data set with different penalties. The following script is adapted from `?lavaan::sem`. ```r model <- ' # latent variable definitions ind60 =~ x1 + x2 + x3 + c2*y2 + c3*y3 + c4*y4 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + c*y8 # regressions dem60 ~ r1*ind60 dem65 ~ r2*ind60 + r3*dem60 ' lavaanModel <- sem(model, data = PoliticalDemocracy) # Let's add a lasso penalty on the cross-loadings c2 - c4 and # scad penalty on the regressions r1-r3 mp <- lavaanModel |> mixedPenalty() |> addLasso(regularized = c("c2", "c3", "c4"), lambdas = seq(0,1,.1)) |> addLasso(regularized = c("r1", "r2", "r3"), lambdas = seq(0,1,.2)) |> fit() ``` The best model according to the BIC can be extracted with: ```r coef(fitMp, criterion = "BIC") ``` # More information We provide more information in the documentation of the individual functions. For instance, see `?lessSEM::lasso` for more details on the lasso penalty. If you are interested in the general purpose interface, have a look at `?lessEM::gpLasso`, `?lesssEM::gpMcp`, etc. To get more details on implementing the **lessSEM** optimizers in your own package, have a look at the vignettes `vignette('General-Purpose-Optimization')` and `vignette('The-optimizer-interface')` and at the [**lessLM**](https://github.com/jhorzek/lessLM) package. # References ## R - Packages / Software * [lavaan](https://github.com/yrosseel/lavaan) Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02 * [regsem](https://github.com/Rjacobucci/regsem): Jacobucci, R. (2017). regsem: Regularized Structural Equation Modeling. ArXiv:1703.08489 [Stat]. https://arxiv.org/abs/1703.08489 * [lslx](https://github.com/psyphh/lslx): Huang, P.-H. (2020). lslx: Semi-confirmatory structural equation modeling via penalized likelihood. Journal of Statistical Software, 93(7). https://doi.org/10.18637/jss.v093.i07 * [fasta](https://cran.r-project.org/package=fasta): Another implementation of the fista algorithm (Beck & Teboulle, 2009) * [ensmallen](https://ensmallen.org/): Curtin, R. R., Edel, M., Prabhu, R. G., Basak, S., Lou, Z., & Sanderson, C. (2021). The ensmallen library for flexible numerical optimization. Journal of Machine Learning Research, 22, 1–6. * [RcppParallel](https://rcppcore.github.io/RcppParallel/) Allaire J, Francois R, Ushey K, Vandenbrouck G, Geelnard M, Intel (2023). _RcppParallel: Parallel Programming Tools for 'Rcpp'_. R package version 5.1.6, . ## Regularized Structural Equation Modeling * Huang, P.-H., Chen, H., & Weng, L.-J. (2017). A Penalized Likelihood Method for Structural Equation Modeling. Psychometrika, 82(2), 329–354. https://doi.org/10.1007/s11336-017-9566-9 * Huang, P.-H. (2018). A penalized likelihood method for multi-group structural equation modelling. British Journal of Mathematical and Statistical Psychology, 71(3), 499–522. https://doi.org/10.1111/bmsp.12130 * Jacobucci, R., Grimm, K. J., & McArdle, J. J. (2016). Regularized Structural Equation Modeling. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 555–566. https://doi.org/10.1080/10705511.2016.1154793 ## Penalty Functions * Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing Sparsity by Reweighted l1 Minimization. Journal of Fourier Analysis and Applications, 14(5–6), 877–905. https://doi.org/10.1007/s00041-008-9045-x * Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273 * Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67. https://doi.org/10.1080/00401706.1970.10488634 * Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288. * Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942. https://doi.org/10.1214/09-AOS729 * Zhang, T. (2010). Analysis of Multi-stage Convex Relaxation for Sparse Regularization. Journal of Machine Learning Research, 11, 1081–1107. * Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429. https://doi.org/10.1198/016214506000000735 * Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x ## Optimizer ### GLMNET * Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01 * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). An improved GLMNET for l1-regularized logistic regression. The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421 ### Variants of ISTA * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1), 183–202. https://doi.org/10.1137/080716542 * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. Proceedings of the 30th International Conference on Machine Learning, 28(2)(2), 37–45. * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and Trends in Optimization, 1(3), 123–231. # LICENSE NOTE THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.