Foundations of Deep Learning for the Social Sciences:
Day 2 Python Tutorial

Today, we will demonstrate how deep learning methods can be used to fit and extend traditional latent variable models used in structural equation modeling and item response theory.

The methods used here come primarily from two papers. The first paper is by van Kesteren and Oberski (2022) and demonstrates how to fit structural equation models using backpropagation and stochastic gradient-based optimization. The second paper is by Urban and Bauer (2021) and demonstrates how to fit item response theory models using deep learning-based approximate inference methods.

Both papers have Python packages that make using the methods convenient: these are called tensorsem and DeepIRTools, respectively. If you have not already installed these packages, you can do so now using:
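The exact commands may differ depending on how each package is distributed; assuming DeepIRTools is available on PyPI and Tensorsem installs from its GitHub repository, something like the following should work:

```python
# Assumed install commands -- check each package's README if these fail.
!pip install deepirtools
!pip install git+https://github.com/vankesteren/tensorsem.git
```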

Now, let's import the packages we'll be using.
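The module names below are my assumptions about how the packages import; adjust if needed.

```python
import numpy as np
import pandas as pd
import torch

import deepirtools   # assumed import name for DeepIRTools
import tensorsem     # assumed import name for Tensorsem
```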

Holzinger-Swineford Example

We begin by analyzing the classic Holzinger-Swineford (HS) data set. HS consists of mental ability test scores of seventh- and eighth-grade children from two different schools; these scores are typically analyzed with a confirmatory three-factor model, which is the model we'll fit in the following examples.

We will first fit the model using the Tensorsem package. Recall from lecture that structural equation models (SEMs) are fitted by minimizing a fitting function. Here, we will minimize the maximum likelihood fitting function:

\begin{equation} F_{\text{ML}}(\boldsymbol{\theta}) = \log \lvert \boldsymbol{\Sigma}(\boldsymbol{\theta}) \rvert + \text{tr}\big[\mathbf{S} \boldsymbol{\Sigma}^{-1}(\boldsymbol{\theta}) \big], \nonumber \end{equation}

where $\mathbf{S}$ is the sample covariance matrix and $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ is the model-implied covariance matrix:

\begin{equation} \boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda} (\mathbf{I} - \mathbf{B}_0)^{-1} \boldsymbol{\Psi} (\mathbf{I} - \mathbf{B}_0)^{-\top} \boldsymbol{\Lambda}^\top + \boldsymbol{\Theta}, \nonumber \end{equation}

with $\boldsymbol{\Lambda}$ the factor loadings matrix, $\mathbf{B}_0$ the structural regression weight matrix, $\boldsymbol{\Psi}$ the factor covariance matrix, and $\boldsymbol{\Theta}$ the residual covariance matrix. This formulation makes SEM straightforward to implement in deep learning frameworks: the fitting function can be constructed as the output of a computational graph, which the framework can then differentiate automatically via backpropagation.

![The $F_{\text{ML}}$ computation graph (reproduced from van Kesteren and Oberski, 2022)](fml_computation_graph.png)
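To make the computational-graph idea concrete, here is a minimal PyTorch sketch of $F_{\text{ML}}$ (my own illustration of the equations above, not Tensorsem's implementation):

```python
import torch

def fml_loss(S, lam, b0, psi, theta):
    """Maximum likelihood SEM fitting function built as a differentiable
    PyTorch computation graph. All arguments are torch tensors."""
    eye = torch.eye(b0.shape[0], dtype=b0.dtype)
    b0_inv = torch.linalg.inv(eye - b0)                    # (I - B0)^{-1}
    sigma = lam @ b0_inv @ psi @ b0_inv.T @ lam.T + theta  # model-implied covariance
    return torch.logdet(sigma) + torch.trace(S @ torch.linalg.inv(sigma))
```

Because every operation here is differentiable, calling `.backward()` on the output gives the gradient of $F_{\text{ML}}$ with respect to all free parameters.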

Tensorsem is a Python package that implements SEM in PyTorch. The following code for Tensorsem is taken directly from the Tensorsem GitHub repository with a few basic changes.
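The original notebook cell is not reproduced here. As a stand-in, below is a hedged sketch of gradient-based fitting in plain PyTorch for the HS three-factor model, reusing the `fml_loss` function defined above (this is my own illustration, not Tensorsem's actual API; Tensorsem additionally handles parameter constraints and standard errors for you):

```python
import torch

torch.manual_seed(0)
p, m = 9, 3                                    # 9 indicators, 3 factors
S = torch.eye(p, dtype=torch.float64)          # placeholder: replace with the 9 x 9 HS sample covariance

mask = torch.zeros(p, m, dtype=torch.float64)  # simple-structure loading pattern
mask[0:3, 0] = mask[3:6, 1] = mask[6:9, 2] = 1.0

lam_free = torch.randn(p, m, dtype=torch.float64, requires_grad=True)  # free loadings
psi_chol = torch.eye(m, dtype=torch.float64).requires_grad_()          # Cholesky factor of Psi
log_theta = torch.zeros(p, dtype=torch.float64, requires_grad=True)    # log residual variances
b0 = torch.zeros(m, m, dtype=torch.float64)                            # no structural regressions (CFA)

optimizer = torch.optim.Adam([lam_free, psi_chol, log_theta], lr=0.01)
for step in range(2000):
    optimizer.zero_grad()
    lam = lam_free * mask                      # zero out non-pattern loadings
    L = psi_chol.tril()
    L = L / L.norm(dim=1, keepdim=True)        # unit factor variances for identification
    psi = L @ L.T
    theta = torch.diag(log_theta.exp())        # positive residual variances
    loss = fml_loss(S, lam, b0, psi, theta)
    loss.backward()
    optimizer.step()
```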

We now fit the same model using DeepIRTools, another PyTorch-based Python package that implements a number of latent factor models.
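Again, the original cell is omitted; the sketch below shows roughly what the fit looks like, based on my reading of DeepIRTools' `IWAVE` class (the argument names are assumptions and should be checked against the package documentation):

```python
import torch
from deepirtools import IWAVE

Q = torch.zeros(9, 3)                  # loading pattern: which item loads on which factor
Q[0:3, 0] = Q[3:6, 1] = Q[6:9, 2] = 1.0

model = IWAVE(
    model_type="normal",               # linear factor model for the continuous HS scores
    latent_size=3,
    n_items=9,
    Q=Q,
    correlated_factors=[0, 1, 2],      # let all three factors correlate freely
)

data = torch.randn(301, 9)             # placeholder: replace with the N x 9 HS test scores
model.fit(data, batch_size=32, iw_samples=5)
print(model.loadings)                  # estimated factor loadings
print(model.cov)                       # estimated factor covariance matrix
```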

DeepIRTools uses an importance-weighted amortized variational estimator (I-WAVE) for the model parameters (Urban and Bauer, 2021). This estimator can be used for latent variable models with very complicated measurement models (e.g., models with neural networks). Unlike the SEM fitting function $F_{\text{ML}}(\boldsymbol{\theta})$, the I-WAVE loss — called the importance-weighted evidence lower bound, or IW-ELBO — is inherently stochastic, meaning that its value will change slightly every time it is computed.
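For reference, with $K$ importance samples the IW-ELBO for a single observation $\mathbf{x}$ can be written as:

\begin{equation} \text{IW-ELBO}_K = \mathbb{E}_{\mathbf{z}_1, \ldots, \mathbf{z}_K \sim q_{\boldsymbol{\phi}}(\mathbf{z} \mid \mathbf{x})} \bigg[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{z}_k)}{q_{\boldsymbol{\phi}}(\mathbf{z}_k \mid \mathbf{x})} \bigg], \nonumber \end{equation}

where $q_{\boldsymbol{\phi}}(\mathbf{z} \mid \mathbf{x})$ is the inference (encoder) network's approximation to the posterior over the latent factors. The stochasticity comes from the Monte Carlo sampling of the $\mathbf{z}_k$.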

The IW-ELBO is a lower bound on the exact log-likelihood that can be improved by increasing the number of importance samples drawn during fitting (e.g., increasing the iw_samples argument in the fit() method in DeepIRTools). As DeepIRTools fits the model and prints updates below, notice that it converges to essentially the same loss value as the SEM fit, indicating that for this example the IW-ELBO is very close to the true log-likelihood.

Determining convergence can be somewhat tricky for stochastic loss functions. DeepIRTools determines convergence by computing the average loss value over the past $100$ fitting iterations. If the best average loss does not improve for a specified number of $100$-iteration intervals, the model is deemed converged. With this approach, convergence is a bit slower in small-sample settings like this one, but quite fast at large sample sizes.
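The sketch below illustrates the idea behind this stopping rule (my own illustration of the logic described above, not DeepIRTools' internal code):

```python
def converged(window_means, patience=5, tol=1e-4):
    """Stopping rule for a stochastic loss. window_means is the running list
    of average losses over successive 100-iteration windows; we stop when the
    best window average has not improved for `patience` windows."""
    if len(window_means) <= patience:
        return False
    best_before = min(window_means[:-patience])  # best average before the recent windows
    best_recent = min(window_means[-patience:])  # best average in the recent windows
    return best_recent > best_before - tol       # no meaningful improvement
```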

The estimates obtained by DeepIRTools are very close to the SEM estimates.

Big-Five Personality Factors Example

HS is quite small, so let's try a large-scale example.

We begin by downloading and pre-processing the data set, which consists of over 1 million people's responses to 50 Big-Five personality items. There are 10 items designed to measure each personality factor (i.e., extraversion, emotional stability, agreeableness, conscientiousness, and openness), and each item has 5 response categories ranging from 1 = Disagree to 5 = Agree.

After some basic pre-processing, we cut the data set down to around 621K people.
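The download and pre-processing cell is not shown; the sketch below gives one plausible version, assuming the data come from the public Open Psychometrics data dump (the URL, archive layout, column names, and missing-data coding are all assumptions to verify against the actual source):

```python
import io
import urllib.request
import zipfile

import pandas as pd

# Assumed location of the Open Psychometrics Big-Five data dump
url = "https://openpsychometrics.org/_rawdata/IPIP-FFM-data-8Nov2018.zip"
with urllib.request.urlopen(url) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))

# Assumed archive layout: one tab-separated file of raw responses
df = pd.read_csv(archive.open("IPIP-FFM-data-8Nov2018/data-final.csv"), sep="\t")

# Keep only the 50 item-response columns (e.g., EXT1-EXT10, EST1-EST10, ...)
prefixes = ("EXT", "EST", "AGR", "CSN", "OPN")
item_cols = [c for c in df.columns if c.startswith(prefixes) and c[3:].isdigit()]
df = df[item_cols]

# Drop people with any missing responses (assumed to be coded as 0)
df = df[(df > 0).all(axis=1)]
print(df.shape)
```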

Let's fit a five-factor model to the Big-Five data using Tensorsem. We'll now use mini-batching due to the huge sample size.
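As before, here is a hedged plain-PyTorch stand-in for the omitted cell (not Tensorsem's actual API) showing the key change: each optimization step computes $F_{\text{ML}}$ on the covariance matrix of a random mini-batch rather than the full-sample covariance.

```python
import torch

# df: the pre-processed data frame from above
X = torch.tensor(df.values, dtype=torch.float64)   # ~621K x 50 response matrix
X = X - X.mean(dim=0)                              # center the items

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X), batch_size=1024, shuffle=True
)
for (batch,) in loader:
    S_batch = batch.T @ batch / (batch.shape[0] - 1)   # mini-batch sample covariance
    # ... build the model-implied covariance and take an Adam step on
    # fml_loss(S_batch, lam, b0, psi, theta), as in the HS example
```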

We now fit a five-factor model to the Big-Five data using DeepIRTools, which is a little faster than Tensorsem in this example. This is likely because the SEM fitting function $F_{\text{ML}}(\boldsymbol{\theta})$ requires computing the determinant and the inverse of the model-implied covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ at each iteration, and that matrix is now somewhat large (i.e., $50 \times 50$).
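A hedged sketch of the corresponding DeepIRTools fit, mirroring the HS example (the argument names are again my assumptions):

```python
import torch
from deepirtools import IWAVE

Q = torch.zeros(50, 5)
for f in range(5):                       # simple structure: 10 items per factor
    Q[10 * f : 10 * (f + 1), f] = 1.0

model = IWAVE(
    model_type="normal",                 # treat the 1-5 responses as continuous, matching the SEM fit
    latent_size=5,
    n_items=50,
    Q=Q,
    correlated_factors=[0, 1, 2, 3, 4],  # all five factors correlate freely
)

data = torch.tensor(df.values, dtype=torch.float32)   # ~621K x 50 responses from above
model.fit(data, batch_size=1024, iw_samples=5)
```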

The DeepIRTools estimates are close to the SEM estimates, although the differences are larger than for the Holzinger-Swineford example due to the extra stochasticity introduced by using mini-batch sampling. The average values across random starts for both methods would likely be quite close.