There are so many factor analyses in the world and so few truly normally distributed variables. I have not seen much careful tailoring of residual distributions in medical or psychological research. It is probably because most software don’t support it or make it convenient. It was a revelation to me that you can use Markov Chain Monte Carlo (MCMC) to sample latent variables and then do all kinds of things, like non-Gaussian distributions and latent variable interactions. For Stan users, this will be trivial, but I know that I would have benefited from a simple demo earlier on. So that’s what I’m posting here.
In R, a non-normal factor can be simulated this way:
f<-rnorm(n) #latent factor
x1<-rnorm(n, mean=.8*f, sd=.5)
x2<-rbinom(n, size=1, p=1/(1+exp(-2*f-.5)))
x3<-rpois(n, lambda=exp(-1 + 1.5*f))
Here is the Stan code for a simplistic model with 3 indicators distributed as normal, Bernoulli, and Poisson:
data {
int<lower=0> N;
vector[N] x1;
int x2[N];
int x3[N];
}
parameters {
real x1_mu;
real<lower=0> x1_sd;
real x2_mu;
real x3_mu;
real<lower=0> f_x1; //lower bound prevents sign indeterminacy
real<lower=0> f_x2;
real<lower=0> f_x3;
vector[N] z; //Latent variable scores- i.e., person-intercepts
}
model {
z ~ std_normal(); // N(0,1) prior for latent variables.
x1 ~ normal( x1_mu + f_x1 * z, x1_sd );
x2 ~ bernoulli( 1.0 ./( 1+exp( - f_x2 * z + x2_mu )) );
x3 ~ poisson( exp( x3_mu + f_x3 * z) );
}
Note that z
doesn’t really need to be normally distributed either.
Putting the above code in a string object model3.src
, the model is then run as follows with package rstan
:
model3<-stan_model(model_code = model3.src)
stan.dat<-list(N=n,"x1"=x1,"x2"=x2,"x3"=x3)
model3.fit<-sampling(model3, stan.dat,
iter = 1500, warmup = 500, #Reducing the amount of warmup to cut down on time.
chains=3, cores=3,
control=list(max_treedepth=8)) #Reduced treedepth from 10 to 8 to cut down time.
model3.draws<-extract(model3.fit)
Sampling z
takes a much longer time than marginalizing over it (i.e., by using matrix decomposition algebra), of course, but I do not know how to do that without assuming the indicators are normal.
And really, most of us are not using research models to shoot down missiles or dodge pedestrians, so a little extra time to do things right is worth the payoff.