Discretization in 3 languages

There is an obscure and useful function that makes it easy to fit stochastic differential equations to data, insofar as the model can be linearized without causing too much trouble. The function discretizes the continuous-time (i.e., differential equation) state matrices: A, the drift or state transition matrix; B, the input or covariate coefficient matrix; and Q, the diffusion or noise covariance matrix. In other words, it takes the differential equation in matrix form and solves it for a given time step, so that the discretized matrices behave like those of an autoregressive process. Some details of this approach can be found here, but not a complete implementation that handles the B matrix. So this is one of those code blocks I just keep backed up in several project folders in various languages. Thanks to Mike Hunter of Georgia Tech for originally creating this and providing it in OpenMx.
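In case it helps to see what the code below is computing, these are the standard exact-discretization quantities for a linear stochastic differential equation with drift A, input matrix B, and diffusion covariance Q over a step of length \Delta t (the block-matrix exponentials in the code are the usual trick for getting the two integrals without explicit quadrature):

A_d = e^{A \Delta t}, \qquad
B_d = \left( \int_0^{\Delta t} e^{A s} \, ds \right) B, \qquad
Q_d = \int_0^{\Delta t} e^{A s} \, Q \, e^{A^\top s} \, ds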

R:

discretize <- function(A, B, Q, deltaT){
  deltaT <- c(deltaT)
  I <- diag(ncol(A))
  BLOCK <- matrix(0, nrow=2*nrow(A), ncol=2*ncol(A))
  
  # First Block expm for A integral, and expm(A*deltaT)
  BLOCK[1:(2*nrow(A)), 1:ncol(A)] <- 0
  BLOCK[1:nrow(A), (nrow(A)+1):(2*nrow(A))] <- I
  BLOCK[(nrow(A)+1):(2*nrow(A)), (nrow(A)+1):(2*nrow(A))] <- A
  BLOCK <- OpenMx::expm(BLOCK*deltaT)
  expA <- BLOCK[(nrow(A)+1):(2*nrow(A)), (nrow(A)+1):(2*nrow(A))]
  intA <- BLOCK[1:nrow(A), (nrow(A)+1):(2*nrow(A))]
  
  # Second Block expm for discretized Q
  BLOCK[1:(nrow(A)), 1:ncol(A)] <- -t(A)
  BLOCK[(nrow(A)+1):(2*nrow(A)), 1:ncol(A)] <- 0
  BLOCK[1:nrow(A), (nrow(A)+1):(2*nrow(A))] <- Q
  BLOCK[(nrow(A)+1):(2*nrow(A)), (nrow(A)+1):(2*nrow(A))] <- A
  BLOCK <- OpenMx::expm(BLOCK*deltaT)
  
  Bd <- intA %*% B
  Qd <- t(expA) %*% BLOCK[1:nrow(A), (nrow(A)+1):(2*nrow(A))]
  return(list(Ad=expA, Bd=Bd, Qd=Qd))
}
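A quick sanity check of the R version (the matrices here are made up purely for illustration, and OpenMx must be installed for the expm call):

A <- matrix(c(-0.5,  0.2,
               0.1, -0.3), 2, 2, byrow=TRUE)  #drift
B <- diag(2)                                  #covariate effects
Q <- diag(c(0.4, 0.4))                        #diffusion covariance
d <- discretize(A, B, Q, deltaT=1)
d$Ad  #discrete-time autoregressive matrix
d$Bd  #discrete-time covariate effects
d$Qd  #discrete-time innovation covariance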

C++ (RcppArmadillo):

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;
using namespace arma;
using namespace std;

// [[Rcpp::export]]
List discretize(mat A, mat B, mat Q, double dt) {
	mat M, expA, intA, I;
	List output;
	I.eye(A.n_rows, A.n_cols);
	M.zeros(2*A.n_rows, 2*A.n_cols);
	
	// First Block expm for A integral, and expm(A*deltaT)
	M.submat(0,  A.n_rows, A.n_rows-1, 2*A.n_rows-1) = I;
	M.submat(A.n_rows,  A.n_rows, 2*A.n_rows-1, 2*A.n_rows-1) = A;
	M=expmat(M*dt);
	
	intA = M.submat(0,  A.n_rows, A.n_rows-1, 2*A.n_rows-1);
	expA = M.submat(A.n_rows,  A.n_rows, 2*A.n_rows-1, 2*A.n_rows-1);
	
	// Second Block expm for discretized Q
	M.submat(0,  0, A.n_rows-1, A.n_rows-1) = -1*A.t();
	M.submat(A.n_rows,  0, 2*A.n_rows-1, A.n_rows-1) = zeros(A.n_rows, A.n_cols);
	M.submat(0,  A.n_rows, A.n_rows-1, 2*A.n_rows-1)=Q;
	M.submat(A.n_rows,  A.n_rows, 2*A.n_rows-1, 2*A.n_rows-1) = A;
	M=expmat(M*dt);
	
	output("Ad")=expA;
	output("Bd")=intA*B;
	output("Qd")=expA.t() * M.submat(0,  A.n_rows, A.n_rows-1, 2*A.n_rows-1);

	return output;
}
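To call the C++ version from R, it can be compiled on the fly with Rcpp (a minimal sketch; the file name is just an example):

# assuming the code above is saved as discretize.cpp
Rcpp::sourceCpp("discretize.cpp")
d <- discretize(A, B, Q, 1.0)  #same example matrices as above
str(d)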

Stan:

functions{
	matrix[] discretize(matrix A, matrix B, matrix Q, real dt) {
		int m = rows(A);
		int n = m;
		matrix[m,n] expA;
		matrix[m,n] intA; 
		matrix[m,n] I = diag_matrix(rep_vector(1.0, m));
		matrix[m,n] output[3];
		
		// First Block expm for A integral, and expm(A*deltaT)
		matrix[2*m,2*n] M = append_col(rep_matrix(0, 2*m, n), append_row(I, A));
		M = matrix_exp(M*dt);
		
		intA = block(M, 1,  n+1, m, n);
		expA = block(M, m+1,  n+1, m, n);
		
		// Second Block expm for discretized Q
		M = append_col( append_row(-1*A', rep_matrix(0, m, n)), append_row(Q, A));
		M = matrix_exp(M*dt);
				
		output[1] = expA;
		output[2] = intA*B;
		output[3] = expA' * block(M, 1,  n+1, m, n);

		return output;
	}	
}
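The Stan version is meant to live in the functions block of a larger model, but for quick testing it can also be exposed to R with rstan (a sketch, assuming the functions block above is saved as discretize.stan; the array-of-matrices return value should come back as a list of three matrices):

rstan::expose_stan_functions("discretize.stan")
d <- discretize(A, B, Q, 1.0)  #Ad, Bd, Qd in that order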

Understanding MCMC and autodifferentiation

In putting together educational materials related to Stan and posterior sampling, I was reminded of two good resources.

The MCMC interactive gallery is the best set of MCMC visualizations I’ve found.

Stan uses NUTS, so it has to calculate numerous gradients for each new sample, and it does so by automatic differentiation. I recommend this video for understanding autodiff. It helps a lot to know what Stan is doing with your model code so that you avoid giving it more work than necessary.

Stan: Structural Ordinal Item Response Theory with a Latent Interaction

The Non-Gaussian factor model introduces the idea of using other distributions for a factor analysis. But that code is not very generalized, and in reality we’ll tend to need something more like structural equation modeling with non-Gaussian variables.

The name for a factor model with logit-linked indicators, whether dichotomous or ordinal, is Item Response Theory. It has been used for decades to develop instruments and in particular, tests. Because of its history, factor loadings are called the “discrimination” parameters, intercepts are the item “difficulty”, and the factor scores represent each person’s “ability”.

I do not know if there’s a cool brand name for IRT where multiple latent variables go on to have structure amongst themselves. But it seems useful to establish the possibility. We are essentially talking about structural equation modeling with discrete distributions.

I wrote a model to explore a number of difficult computing challenges related to that pursuit and common to the kinds of analysis requests that I get:

  • Non-Gaussian factor indicators
  • of which there are a large number (>>20) arising from questionnaire data,
  • each having 5 or more response categories, a common property of Likert scales,
  • and latent variable interactions, because everyone wants to know if their variables involve moderation.

And a bonus idea: when there are a large number of indicators and we are assuming that they function similarly in representing the factor, we can model the distribution of the factor loadings and perhaps gain a little extra power to estimate them. The discrimination parameters in IRT tend to be the least precise.

So here is a model that addresses all of that, first with simulation of the data:

D<-4 #Number of factors
Q<-50 #Indicators per factor
K<-7 #Levels per indicator
N<-600 #Sample size


z<-matrix(NA, N, D)
z[,1]<-rnorm(N) #exogenous factor
z[,4]<-rnorm(N) #exogenous factor that will moderate the effect of z1 on z3
z[,2]<-rnorm(N) + .3*z[,1]
z[,3]<-rnorm(N) + 2.5*z[,1] + .3*z[,2] + 2*z[,4]*z[,1] #structural model with a latent interaction

pairs(z)

#I know, this is actually a probit model. 
dat<-array(NA, dim=c(N,D,Q))
for(d in 1:D){
  for(q in 1:Q){
    dat[,d,q]<- findInterval( rnorm(1,1,.05)*z[,d] + rnorm(N), seq(-1, 2, l=K-1) )+1 #random loading near 1, unit residual, cut into K ordered categories
  }
}
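Before fitting, it is worth a quick check that every item actually uses all K categories; cutpoints that bound a category with no observations can be very weakly identified:

table(dat[,1,1])  #category counts for a single item
all(apply(dat, c(2,3), function(x) length(unique(x))) == K)  #TRUE if every item shows all K levels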

And then the Stan code:

functions{
}
data {
  int<lower=0> N; 	 //sample size
  int<lower=1> D; 	 //num factors
  int<lower=1> Q;	 //num indicators per factor
  int<lower=2> K; 	 //num levels
  int<lower=1,upper=K> y[N, D, Q];
}
parameters {
  matrix[Q,D] lambda_z;
  ordered[K-1] c[Q, D];
  matrix[N,D] z;
  real beta[3];
  real alpha;
  
//optional multi-level distribution of discrimination parameters, helpful for smaller samples but many indicators
  real<lower=0> lambda_mu;
  real<lower=0> lambda_sd;
}
transformed parameters{
//non-centered parameterization for discrimination parameters (shared mean and SD across loadings)
	matrix[Q,D] lambda = lambda_z*lambda_sd + lambda_mu;
}
model {
	z[,1] ~ std_normal();
	z[,4] ~ std_normal();
	z[,2] ~ normal(beta[1]*z[,1], 1.0);
	z[,3] ~ normal(beta[2]*z[,1] + beta[3]*z[,2] + alpha*(z[,1].*z[,4]), 1.0);
	
	for(d in 1:D){
		for(q in 1:Q){
			for(n in 1:N){
			y[n, d, q] ~ ordered_logistic( z[n,d] * lambda[q, d], c[q, d]);
			}
		 }
		 lambda_z[,d] ~ std_normal();
	}

//optional priors	
  lambda_sd ~ normal(0, 5.0);
  lambda_mu ~ normal(0, 5.0);
  beta ~ normal(0.0,5.0);
  alpha ~ normal(0.0,5.0);
}

Notable features include grouping variables together into matrices instead of declaring them one by one, and factor loadings (“discrimination” parameters) that share a mean and standard deviation through a non-centered parameterization. Most important for me was getting the latent interaction by element-wise multiplication of the z-score vectors, alpha*(z[,1].*z[,4]). No higher-order moments necessary!
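For completeness, here is roughly how the simulated array and the model above could be put together with rstan (a sketch; irt.src is just an assumed name for a string holding the Stan code above):

library(rstan)
stan.dat <- list(N=N, D=D, Q=Q, K=K, y=dat)
irt <- stan_model(model_code = irt.src)
irt.fit <- sampling(irt, stan.dat, chains=3, cores=3, iter=1500, warmup=500)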

Stan: Non-Normal Factor Model

There are so many factor analyses in the world and so few truly normally distributed variables. I have not seen much careful tailoring of residual distributions in medical or psychological research, probably because most software doesn’t support it or make it convenient. It was a revelation to me that you can use Markov chain Monte Carlo (MCMC) to sample latent variables and then do all kinds of things, like non-Gaussian distributions and latent variable interactions. For Stan users this will be trivial, but I know I would have benefited from a simple demo earlier on. So that’s what I’m posting here.

In R, a non-normal factor can be simulated this way:

n<-1000 #sample size
f<-rnorm(n) #latent factor
x1<-rnorm(n, mean=.8*f, sd=.5) #Gaussian indicator
x2<-rbinom(n, size=1, prob=1/(1+exp(-2*f-.5))) #Bernoulli indicator, logit link
x3<-rpois(n, lambda=exp(-1 + 1.5*f)) #Poisson indicator, log link

Here is the Stan code for a simplistic model with 3 indicators distributed as normal, Bernoulli, and Poisson:

data { 
  int<lower=0> N;
  vector[N] x1;
  int x2[N];
  int x3[N];
}
parameters {
  real x1_mu;
  real<lower=0> x1_sd;
  real x2_mu;
  real x3_mu;

  real<lower=0> f_x1; //lower bound prevents sign indeterminacy
  real<lower=0> f_x2;
  real<lower=0> f_x3;
  vector[N] z; //Latent variable scores - i.e., person-intercepts
}
model {
  z  ~ std_normal(); // N(0,1) prior for latent variables.
  x1 ~ normal( x1_mu + f_x1 * z, x1_sd );
  x2 ~ bernoulli_logit( x2_mu + f_x2 * z ); //logit link via the built-in, matching the simulation's intercept sign
  x3 ~ poisson_log( x3_mu + f_x3 * z ); //log link via the built-in
}

Note that z doesn’t really need to be normally distributed either. With the above code stored in a string object model3.src, the model is run as follows with the rstan package:

library(rstan)
model3<-stan_model(model_code = model3.src)
stan.dat<-list(N=n, x1=x1, x2=x2, x3=x3)
model3.fit<-sampling(model3, stan.dat, 
                     iter = 1500, warmup = 500, #Reducing the amount of warmup to cut down on time.
                     chains=3, cores=3, 
                     control=list(max_treedepth=8)) #Reduced treedepth from 10 to 8 to cut down time.
model3.draws<-extract(model3.fit)
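A quick check on recovery, using the true values from the simulation (object names follow the call above):

mean(model3.draws$f_x1)  #should land near the simulated .8
mean(model3.draws$f_x2)  #near 2
mean(model3.draws$f_x3)  #near 1.5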

Sampling z takes much longer than marginalizing over it (i.e., integrating the latent variables out and working with the model-implied covariance matrix), of course, but I do not know how to do that without assuming the indicators are normal. And really, most of us are not using research models to shoot down missiles or dodge pedestrians, so a little extra time to do things right is worth the payoff.