--- title: "General-Purpose-Optimization" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{General-Purpose-Optimization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- **lessSEM** can be used for regularized SEM and for general purpose optimization. That is, you can use all optimizers and penalty functions implemented in **lessSEM** for your own models. To this end, you must define a fitting function; i.e., a function which takes in the parameters and returns a single value - the unregularized fit. **lessSEM** uses this fitting function and adds the penalty terms. The combined fitting function is then optimized. Currently, there are four ways to use the optimizers in **lessSEM** 1. You can use the R interface. This interface is very similar to that of optim (see e.g., `?lessSEM::gpLasso`). 2. If your functions are defined in C++, you can use a faster interface which is a bit more involved (see e.g., `?lessSEM::gpLassoCpp`). 3. You can include the header files of **lessSEM** in your package to directly interface to the underlying C++ functions. This is the most complicated approach. 4. The optimizers are implemented in the separate C++ header only library [lesstimate](https://jhorzek.github.io/lesstimate/) that can be used as a submodule in R packages. In general, the approaches get faster as you transition from 1 to 4. You will see the largest performance gains when implementing a gradient function and not just a fitting function, however. As a rule of thumb: Use approach 1 if you intend to run your model a few times, don't want to create a new package and your model runs fairly fast. Use approach 2 if you want to increase the speed a bit, while keeping the changes necessary to your files manageable. Use approach 3 or 4 if you create a new package, have some experience with **RcppArmadillo** and want to get the best performance. In the following, we will demonstrate all three approaches using a linear regression model as an example. ## The example Let's start by setting up our linear regression model. To this end, we will simulate a data set: ```r set.seed(123) # first, we simulate data for our # linear regression. N <- 100 # number of persons p <- 10 # number of predictors X <- matrix(rnorm(N*p), nrow = N, ncol = p) # design matrix b <- c(rep(1,4), rep(0,6)) # true regression weights y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2) ``` ## The first approach: Interfacing from R We will now try to implement a lasso regularized linear regression using the gpLasso interface. This interface is very similar to `optim`. To use it, we must define our fitting function in R: ```r # defining the sum-squared-errors: sseFun <- function(par, y, X, N){ # par is the parameter vector # y is the observed dependent variable # X is the design matrix # N is the sample size pred <- X %*% matrix(par, ncol = 1) #be explicit here: # we need par to be a column vector sse <- sum((y - pred)^2) # we scale with .5/N to get the same results as glmnet return((.5/N)*sse) } ``` Additionally, we need a *labeled* vector with starting values: ```r par <- rep(0, p+1) names(par) <- paste0("b", 0:p) print(par) #> b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 #> 0 0 0 0 0 0 0 0 0 0 0 ``` Note that we defined one more parameter than there are variables in X. This is because we also want to estimate the intercept. To this end, we extend X: ```r Xext <- cbind(1,X) head(Xext) #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] #> [1,] 1 -0.56047565 -0.71040656 2.1988103 -0.7152422 -0.07355602 -0.60189285 1.07401226 -0.7282191 0.3562833 -1.0141142 #> [2,] 1 -0.23017749 0.25688371 1.3124130 -0.7526890 -1.16865142 -0.99369859 -0.02734697 -1.5404424 -0.6580102 -0.7913139 #> [3,] 1 1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826 1.02678506 -0.03333034 -0.6930946 0.8552022 0.2995937 #> [4,] 1 0.07050839 -0.34754260 0.5431941 -1.0525133 -0.02884155 0.75106130 -1.51606762 0.1188494 1.1529362 1.6390519 #> [5,] 1 0.12928774 -0.95161857 -0.4143399 -0.4371595 0.67069597 -1.50916654 0.79038534 -1.3647095 0.2762746 1.0846170 #> [6,] 1 1.71506499 -0.04502772 -0.4762469 0.3311792 -1.65054654 -0.09514745 -0.21073418 0.5899827 0.1441047 -0.6245675 ``` Finally, we need to decide which parameters should be regularized and the values for lambda. We want to regularize everything except for the intercept: ```r (regularized <- paste0("b", 1:p)) #> [1] "b1" "b2" "b3" "b4" "b5" "b6" "b7" "b8" "b9" "b10" lambdas <- seq(0,.1,length.out = 20) ``` Now, we are ready to estimate the model: ```r library(lessSEM) l1 <- gpLasso(par = par, regularized = regularized, fn = sseFun, lambdas = lambdas, X = Xext, y = y, N = length(y) ) head(l1@parameters) ``` ``` #> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8 #> 1 0.000000000 1 0 0.02738472 1.0129194 0.9991454 0.9705725 1.027626 0.014036009 -0.007460964 0.0185899238 0.021930771 #> 2 0.005263158 1 0 0.02935302 1.0043737 0.9908934 0.9626258 1.025138 0.003365832 0.000000000 0.0143411319 0.015434822 #> 3 0.010526316 1 0 0.02995132 0.9967095 0.9846674 0.9552789 1.021891 0.000000000 0.000000000 0.0096220740 0.010707768 #> 4 0.015789474 1 0 0.03010607 0.9897339 0.9789423 0.9481496 1.018672 0.000000000 0.000000000 0.0049333012 0.006364327 #> 5 0.021052632 1 0 0.03029739 0.9827288 0.9732058 0.9409861 1.015363 0.000000000 0.000000000 0.0001782005 0.002036124 #> 6 0.026315789 1 0 0.03112551 0.9753354 0.9670622 0.9338614 1.011552 0.000000000 0.000000000 0.0000000000 0.000000000 #> b9 b10 #> 1 -0.009900077 0.027401044 #> 2 -0.007939443 0.022297575 #> 3 -0.005256845 0.017465446 #> 4 -0.002392921 0.012713123 #> 5 0.000000000 0.007969754 #> 6 0.000000000 0.003304080 ``` Note that we did not specify the gradients of our function. In this case, **lessSEM** will use **numDeriv** to compute the gradients. However, if you know how to specify the gradients, this can result in faster estimation: ```r sseGrad <- function(par, y, X, N){ gradients = (-2.0*t(X) %*% y + 2.0*t(X)%*%X%*%matrix(par,ncol = 1)) gradients = (.5/length(y))*gradients return(t(gradients)) } ``` ```r l1 <- gpLasso(par = par, regularized = regularized, fn = sseFun, gr = sseGrad, lambdas = lambdas, X = Xext, y = y, N = length(y) ) head(l1@parameters) ``` ``` #> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8 #> 1 0.000000000 1 0 0.02738485 1.0129200 0.9991452 0.9705725 1.027626 0.014034994 -0.007460252 0.0185901898 0.021930702 #> 2 0.005263158 1 0 0.02935325 1.0043732 0.9908928 0.9626258 1.025139 0.003364951 0.000000000 0.0143418480 0.015434447 #> 3 0.010526316 1 0 0.02995023 0.9967094 0.9846669 0.9552792 1.021892 0.000000000 0.000000000 0.0096217574 0.010707383 #> 4 0.015789474 1 0 0.03010649 0.9897330 0.9789426 0.9481493 1.018672 0.000000000 0.000000000 0.0049332838 0.006363868 #> 5 0.021052632 1 0 0.03029729 0.9827286 0.9732062 0.9409869 1.015362 0.000000000 0.000000000 0.0001772169 0.002036300 #> 6 0.026315789 1 0 0.03112481 0.9753368 0.9670620 0.9338616 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000 #> b9 b10 #> 1 -0.009900699 0.027400748 #> 2 -0.007939640 0.022297136 #> 3 -0.005257048 0.017465134 #> 4 -0.002393522 0.012713116 #> 5 0.000000000 0.007969996 #> 6 0.000000000 0.003303777 ``` Here is a short comparison of running both models 5 times each: Runtime in seconds without gradients: ``` #> [1] 0.2155101 0.2029781 0.1960189 0.2026141 0.2020009 ``` Runtime in seconds with gradients: ``` #> [1] 0.02016687 0.02007699 0.02003384 0.01954794 0.02730393 ``` That's quite a speedup! Note that you can also pass a C++ function to gpLasso similar to the approach above: ```r library(RcppArmadillo) library(Rcpp) linreg <- ' // [[Rcpp::depends(RcppArmadillo)]] #include // [[Rcpp::export]] double fitfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){ // compute the sum of squared errors: arma::mat sse = arma::trans(y-X*parameters)*(y-X*parameters); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well sse *= 1.0/(2.0 * N); // note: We must return a double, but the sse is a matrix // To get a double, just return the single value that is in // this matrix: return(sse(0,0)); } // [[Rcpp::export]] arma::rowvec gradientfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){ // note: we want to return our gradients as row-vector; therefore, // we have to transpose the resulting column-vector: arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*parameters); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well gradients *= (.5/N); return(gradients); }' Rcpp::sourceCpp(code = linreg) ``` Run the model as before: ```r l1 <- gpLasso(par = par, regularized = regularized, fn = fitfunction, gr = gradientfunction, lambdas = lambdas, X = Xext, y = y, N = length(y) ) head(l1@parameters) ``` ``` #> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8 #> 1 0.000000000 1 0 0.02738527 1.0129194 0.9991448 0.9705722 1.027624 0.014035744 -0.007459583 0.0185892822 0.021930374 #> 2 0.005263158 1 0 0.02935316 1.0043741 0.9908936 0.9626259 1.025139 0.003366124 0.000000000 0.0143421453 0.015435255 #> 3 0.010526316 1 0 0.02995019 0.9967089 0.9846670 0.9552796 1.021892 0.000000000 0.000000000 0.0096214361 0.010707812 #> 4 0.015789474 1 0 0.03010669 0.9897326 0.9789426 0.9481493 1.018672 0.000000000 0.000000000 0.0049330908 0.006364232 #> 5 0.021052632 1 0 0.03029700 0.9827285 0.9732063 0.9409868 1.015362 0.000000000 0.000000000 0.0001764845 0.002037008 #> 6 0.026315789 1 0 0.03112464 0.9753365 0.9670623 0.9338615 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000 #> b9 b10 #> 1 -0.009900549 0.027401230 #> 2 -0.007938658 0.022297962 #> 3 -0.005256606 0.017465214 #> 4 -0.002393757 0.012713307 #> 5 0.000000000 0.007971003 #> 6 0.000000000 0.003304894 ``` The runtime in seconds with C++ is: ``` #> [1] 0.01715302 0.01618695 0.01546001 0.01603198 0.01513100 ``` Which is even lower than what we had before! ## The second approach: Using C++ function pointers While using the Rcpp functions defined above was quite fast for our linear regression, it can still be fairly slow for more involved models (e.g., SEM). This is due to our optimizer having to go back and forth between R and C++. To reduce this overhead, we can use the second approach. Here, instead of passing an Rcpp function which is then executed in R, we pass a pointer to the underlying C++ functions. This approach is more constrained than the one presented above: 1. We must define both, a fitting function and a gradient function in Rcpp. We cannot rely on numDeriv any more! 2. The fitting function and the gradient function are only allowed two parameters each: a `const Rcpp::NumericVector&` (the parameters) and an `Rcpp::List&` (everything else). While this seems restrictive, note that we can virtually pass anything we want in a list. 3. We must [create pointers](https://gallery.rcpp.org/articles/passing-cpp-function-pointers/) to the fit and gradient function. This is difficult, however we will provide some guidance below. This may be a bit overwhelming at first, so we will go through it step by step. ### 1. Creating a fitting function and a gradient function We already defined a fitting function and a gradient function for our linear regression model in the example above. However, we often do not know the gradients in closed form. If you don't have a gradient function, you can try a numerical approximation. More details can be found [here](https://en.wikipedia.org/wiki/Finite_difference). ### 2. Adapting the functions to the constraints Note that our fitting function and our gradient function do not comply with the constraints mentioned above. That is, they do take more than two parameters as arguments (`const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N`), and these arguments are not a `const Rcpp::NumericVector&` and an `Rcpp::List&`. How can we make this work? The parameter vector `const Rcpp::NumericVector&` will hold all elements in the `arma::colvec pararameters` of our old function. The `Rcpp::List&` must contain all of the other elements (`X,y,N`). Let's start by creating this list, which we will call data: ```r data <- list("X" = Xext, "y" = y, "N" = length(y)) ``` Next, we have to change our functions to make things work: ```r linreg <- ' // [[Rcpp::depends(RcppArmadillo)]] #include // [[Rcpp::export]] double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){ // our function now only takes the two specified arguments: a // const Rcpp::NumericVector& and an Rcpp::List&. // We have to extract all elements from the list: arma::colvec y = Rcpp::as(data["y"]); // the dependent variable arma::mat X = Rcpp::as(data["X"]); // the design matrix int N = Rcpp::as(data["N"]); // the sample size // Next, we want to get the parameters as a column-vector: arma::colvec b = Rcpp::as(parameters); // compute the sum of squared errors: arma::mat sse = arma::trans(y-X*b)*(y-X*b); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well sse *= 1.0/(2.0 * N); // note: We must return a double, but the sse is a matrix // To get a double, just return the single value that is in // this matrix: return(sse(0,0)); } // [[Rcpp::export]] arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){ // our function now only takes the two specified arguments: a // const Rcpp::NumericVector& and an Rcpp::List&. // We have to extract all elements from the list: arma::colvec y = Rcpp::as(data["y"]); // the dependent variable arma::mat X = Rcpp::as(data["X"]); // the design matrix int N = Rcpp::as(data["N"]); // the sample size // Next, we want to get the parameters as a column-vector: arma::colvec b = Rcpp::as(parameters); // note: we want to return our gradients as row-vector; therefore, // we have to transpose the resulting column-vector: arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well gradients *= (.5/N); return(gradients); } ' ``` That's it, our functions have been transformed! ### Step 3: Creating pointers to our functions This is where it get's really tricky! We can't just pass our functions to C++. However, we can [create pointers](https://gallery.rcpp.org/articles/passing-cpp-function-pointers/). These have to be generated in C++ and this can be tricky to get right. To simplify the process, we have created a function which helps setting things up: ```r cat(lessSEM::makePtrs(fitFunName = "fitfunction", # name of the function in C++ gradFunName = "gradientfunction" # name of the function in C++ ) ) #> #> // INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS #> #> // IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES #> // // [[Rcpp::depends(RcppArmadillo)]] #> // #include #> #> // Dirk Eddelbuettel at #> // https://gallery.rcpp.org/articles/passing-cpp-function-pointers/ #> #> typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters #> Rcpp::List& //additional elements #> ); #> typedef Rcpp::XPtr fitFunPtr_t; #> #> typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters #> Rcpp::List& //additional elements #> ); #> typedef Rcpp::XPtr gradientFunPtr_t; #> #> // [[Rcpp::export]] #> fitFunPtr_t fitfunctionPtr() { #> return(fitFunPtr_t(new fitFunPtr(&fitfunction))); #> } #> #> // [[Rcpp::export]] #> gradientFunPtr_t gradientfunctionPtr() { #> return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction))); #> } ``` Let's follow the instructions and add the lines to our C++ functions: ```r linreg <- ' // [[Rcpp::depends(RcppArmadillo)]] #include // [[Rcpp::export]] double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){ // our function now only takes the two specified arguments: a // const Rcpp::NumericVector& and an Rcpp::List&. // We have to extract all elements from the list: arma::colvec y = Rcpp::as(data["y"]); // the dependent variable arma::mat X = Rcpp::as(data["X"]); // the design matrix int N = Rcpp::as(data["N"]); // the sample size // Next, we want to get the parameters as a column-vector: arma::colvec b = Rcpp::as(parameters); // compute the sum of squared errors: arma::mat sse = arma::trans(y-X*b)*(y-X*b); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well sse *= 1.0/(2.0 * N); // note: We must return a double, but the sse is a matrix // To get a double, just return the single value that is in // this matrix: return(sse(0,0)); } // [[Rcpp::export]] arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){ // our function now only takes the two specified arguments: a // const Rcpp::NumericVector& and an Rcpp::List&. // We have to extract all elements from the list: arma::colvec y = Rcpp::as(data["y"]); // the dependent variable arma::mat X = Rcpp::as(data["X"]); // the design matrix int N = Rcpp::as(data["N"]); // the sample size // Next, we want to get the parameters as a column-vector: arma::colvec b = Rcpp::as(parameters); // note: we want to return our gradients as row-vector; therefore, // we have to transpose the resulting column-vector: arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b); // other packages, such as glmnet, scale the sse with // 1/(2*N), where N is the sample size. We will do that here as well gradients *= (.5/N); return(gradients); } /// THE FOLLOWING PART IS NEW: // INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS // IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES // // [[Rcpp::depends(RcppArmadillo)]] // #include // Dirk Eddelbuettel at // https://gallery.rcpp.org/articles/passing-cpp-function-pointers/ typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters Rcpp::List& //additional elements ); typedef Rcpp::XPtr fitFunPtr_t; typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters Rcpp::List& //additional elements ); typedef Rcpp::XPtr gradientFunPtr_t; // [[Rcpp::export]] fitFunPtr_t fitfunctionPtr() { return(fitFunPtr_t(new fitFunPtr(&fitfunction))); } // [[Rcpp::export]] gradientFunPtr_t gradientfunctionPtr() { return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction))); } ' ``` Compile the functions using Rcpp: ```r Rcpp::sourceCpp(code = linreg) ``` Great! Now that this is out of the way, we can create the pointers to our functions: ```r ffp <- fitfunctionPtr() # create the pointer to the fitting function # Note that the name of this function will depend on the name of your fitting function. # For instance, if your fitting function is called sse, then the pointer will be created # with ffp <- ssePtr() gfp <- gradientfunctionPtr() # create the pointer to the gradient function # Note that the name of this function will depend on the name of your gradient function. # For instance, if your gradient function is called sseGradient, then the pointer will be created # with gfp <- sseGradientPtr() ``` ### Optimizing the model The last step is to call the general purpose optimization. To this end, use the `gpLassoCpp` function: ```r l1 <- gpLassoCpp(par = par, regularized = regularized, # important: pass the poinnters! fn = ffp, gr = gfp, lambdas = lambdas, # finally, pass the list which the fitting function and the # gradient function need: additionalArguments = data ) head(l1@parameters) ``` ``` #> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8 #> 1 0.000000000 1 0 0.02738542 1.0129198 0.9991455 0.9705733 1.027625 0.014037188 -0.007460885 0.0185907073 0.021930984 #> 2 0.005263158 1 0 0.02935271 1.0043736 0.9908928 0.9626259 1.025139 0.003365862 0.000000000 0.0143413234 0.015434697 #> 3 0.010526316 1 0 0.02995027 0.9967094 0.9846668 0.9552792 1.021892 0.000000000 0.000000000 0.0096220212 0.010707439 #> 4 0.015789474 1 0 0.03010668 0.9897329 0.9789425 0.9481493 1.018672 0.000000000 0.000000000 0.0049333261 0.006364019 #> 5 0.021052632 1 0 0.03029739 0.9827288 0.9732059 0.9409868 1.015363 0.000000000 0.000000000 0.0001773121 0.002036026 #> 6 0.026315789 1 0 0.03112461 0.9753368 0.9670620 0.9338617 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000 #> b9 b10 #> 1 -0.009900023 0.027401255 #> 2 -0.007939417 0.022297387 #> 3 -0.005256688 0.017464767 #> 4 -0.002393495 0.012713141 #> 5 0.000000000 0.007969742 #> 6 0.000000000 0.003303734 ``` Benchmarking this approach results in: ``` #> [1] 0.01474500 0.01400304 0.01196003 0.01131010 0.01107502 ``` So, we have reduced our runtime even more! ## The third and fourth approach: Including the header files This approach requires a more elaborate setup which is why we have created a whole package to demonstrate it. You will find more information in the vignette `The-optimizer-interface` and in the [lessLM](https://github.com/jhorzek/lessLM) package. If you just want the optimizers and don't want to depend on the **lessSEM** package, we recommend that you copy the [lesstimate](https://jhorzek.github.io/lesstimate/) C++ library in your packages inst/include folder. It will come to the same parameter estimates: ``` #> b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 #> [1,] 0.02734701 1.0129361 0.9991629 0.9705501 1.027728 0.013993181 -0.007491533 0.0186210155 0.021974963 -0.009975776 #> [2,] 0.02939675 1.0043635 0.9908681 0.9626493 1.025035 0.003400965 0.000000000 0.0143089363 0.015388336 -0.007860992 #> [3,] 0.02998680 0.9967117 0.9846504 0.9552960 1.021803 0.000000000 0.000000000 0.0095927088 0.010679371 -0.005183552 #> [4,] 0.03006777 0.9897315 0.9789595 0.9481301 1.018774 0.000000000 0.000000000 0.0049636831 0.006397664 -0.002474334 #> [5,] 0.03032345 0.9827556 0.9731799 0.9409662 1.015441 0.000000000 0.000000000 0.0001374354 0.002100692 0.000000000 #> [6,] 0.03111085 0.9753325 0.9670615 0.9338506 1.011607 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000 #> b10 #> [1,] 0.027466466 #> [2,] 0.022231196 #> [3,] 0.017406513 #> [4,] 0.012779758 #> [5,] 0.008033475 #> [6,] 0.003352431 ``` And the run times are even lower: ``` #> [1] 0.002497911 0.001735926 0.001685858 0.001821041 0.001929998 ```