---
title: "Latent-Growth-Curve"
output: rmarkdown::html_vignette
bibliography: mxsem.bib
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Latent-Growth-Curve}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\DeclareUnicodeCharacter{2194}{$\leftrightarrow$}
  %\DeclareUnicodeCharacter{2192}{$\rightarrow$}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, include=FALSE}
library(mxsem)
```

Latent growth curve models are structural equation models (SEMs) that are used
to analyze longitudinal data. Let's assume that we measured variable `y` at 
five distinct measurement occasions: (`y1`-`y5`). Our dataset could look as
follows:
```{r, echo=FALSE}
set.seed(123)
head(mxsem::simulate_latent_growth_curve(N = 100)[,paste0("y",1:5)])
```

Let us further assume that the measurements took place at baseline (time $t_1 = 0$),
after $t_2 = 1$ week, $t_3 = 5$ weeks, $t_4 = 7$ weeks, and $t_5 = 11$ weeks. With 
linear latent growth curve models,
the observations of individual $i$ at the five measurement occasions $u=1,...,5$ are predicted 
with a latent intercept ($I$) and a latent slope ($S$) using the equation 
$$y_{it_u} = I_i + S_i\times t_u+\varepsilon_{it_u}$$
Note how time is used as a predictor here; that is, we assume a linear growth over
time. However, we also assume that individuals may differ in the intercept $I$ and
the slope $S$. More precisely, we assume that $I$ and $S$ are mutivariate normally
distributed and $I$ may be $0.4$ for the first person, but $-1.2$ for the second. SEMs allow us
to capture these assumptions in a single model:

![](figures/latent_growth_curve.png)

In the Figure shown above, blue paths denote estimated parameters and gray
paths are fixed to specific values. Note that the paths of the latent intercept 
($I$) to the observations ($y_1$-$y_5$) are constrained to $1$, while the paths of
the latent slope ($S$) are set to the times
$t_1 = 0$, $t_2 = 1$, $t_3 = 5$, $t_4 = 7$, and $t_5 = 11$. Because $I$ and $S$ 
are modeled as latent variables
with variances, covariances, and means, the model allows for person-specific
parameters (this is identical to a random effect in mixed models).

Such latent growth curve models can be set up with **lavaan** [@rosseelLavaanPackageStructural2012]. 
For instance, the model shown above can be defined with:
```{r}
model <- "
  # specify latent intercept
     I =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
  # specify latent slope
     S =~ 0 * y1 + 1 * y2 + 5 * y3 + 7 * y4 + 11 * y5
    
  # specify means of latent intercept and slope
     I ~ int*1
     S ~ slp*1
  
  # set intercepts of manifest variables to zero
     y1 ~ 0*1; y2 ~ 0*1; y3 ~ 0*1; y4 ~ 0*1; y5 ~ 0*1;
  "
```

## Person-Specific Occasions

> **Note**: Such models can also be specified with **metaSEM** [@cheungMetaSEMPackageMetaanalysis2015]

In the model outlined above it was assumed that all individuals were observed
at the same time points ($0$, $1$, $5$, $7$, and $11$). In many studies, however,
this is not the case. For instance, measurements may have been at random occasions
to provide more insights into everyday life. Or reports may have been provided
by the participants at self-selected occasions. The following is an example of
such a data set:
```{r}
library(mxsem)
lgc_dat <- simulate_latent_growth_curve(N = 100)
head(lgc_dat)
```
The columns `t_1`-`t_5` indicate the person-specific time points of observations.
For our latent growth curve model this implies that the loading of the latent 
slope variable `S` on the observations `y1`-`y4` must be person-specific.
This is expressed in the following equation, where the time $t_{ui}$ is person-specific:
$$y_{it_{ui}} = I_i + S_i\times t_{ui}+\varepsilon_{it_{ui}}$$ 
To this end, so-called definition variables are used [see @mehtaPuttingIndividualBack2000;
@sterbaFittingNonlinearLatent2014].
With **mxsem**, this can be achieved as follows:

```{r}
model <- "
  # specify latent intercept
     I =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
  # specify latent slope
     S =~ data.t_1 * y1 + data.t_2 * y2 + data.t_3 * y3 + data.t_4 * y4 + data.t_5 * y5
    
  # specify means of latent intercept and slope
     I ~ int*1
     S ~ slp*1
  
  # set intercepts of manifest variables to zero
     y1 ~ 0*1; y2 ~ 0*1; y3 ~ 0*1; y4 ~ 0*1; y5 ~ 0*1;
  "
```

Note how the loadings of the latent slope `S` on the items are now specified
with `data.t_1`-`data.t_5`. This will tell **OpenMx** [@bokerOpenMxOpenSource2011]
that these parameters should be replaced by the person-specific variables `t_1`-`t_5`
found in the data set `lgc_dat`. Everything else stayed the same.

> Important: The prefix `data.` is indidepent of the name of the data set in R. 
That is, even if our data set is called `lgc_dat`, we have to use `data.t_1`
to refer to the `t_1` variable located in `lgc_dat`.

The model can be set up and fitted with **mxsem**:
```{r}
# set up model
lgc_mod <- mxsem(model = model, 
                 data = lgc_dat, 
                 # we set scale_loadings to FALSE because the 
                 # loadings were already fixed to specific values.
                 # This just avoids a warning from mxsem
                 scale_loadings = FALSE)
# fit 
lgc_fit <- mxRun(model = lgc_mod)

summary(lgc_fit)
```


## Bibliography