This is a follow-on to "A meta-transfer objective for learning to disentangle causal mechanisms."
Here we describe an algorithm for predicting the causal graph structure of a set of visible random variables, each possibly causally dependent on any of the other variables.
There are two sets of parameters: the structural parameters and the functional parameters. The structural parameters form a matrix in which \(\sigma(\gamma_{ij})\) represents the belief that variable \(X_j\) is a direct cause of \(X_i\). The functional parameters are the parameters of the neural networks that model the conditional probability distribution of each random variable given its parent set.
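To make the two parameter sets concrete, here is a minimal PyTorch sketch of the parameterization. The variable names, network sizes, and the Gaussian output are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

N, H = 5, 32  # number of variables and hidden width (illustrative)

# Structural parameters: a matrix of logits, where sigmoid(gamma[i, j]) is the
# belief that X_j is a direct cause of X_i.
gamma = nn.Parameter(torch.zeros(N, N))

# Functional parameters: one small network per variable X_i, mapping the
# (masked) values of the other variables to the parameters of X_i's
# conditional distribution -- here, the mean of a Gaussian for simplicity.
conditionals = nn.ModuleList(
    nn.Sequential(nn.Linear(N, H), nn.ReLU(), nn.Linear(H, 1)) for _ in range(N)
)

edge_beliefs = torch.sigmoid(gamma)  # current edge-belief matrix, entries in (0, 1)
```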
From here, the algorithm used to learn the graph structure is composed of three steps:

1. Sample graphs by drawing Bernoulli samples (with \(\sigma(\gamma_{ij})\) as the parameter) to create a possible graph configuration.
2. Fit the functional parameters to observational data under the sampled configurations, then score each configuration by how well it predicts data gathered after an intervention.
3. Update the structural parameters toward the configurations that scored well, with regularization that encourages sparsity and prevents \(\gamma_{ij}\) and \(\gamma_{ji}\) from both being high (which reinforces the assumption that the graph is directed and acyclic).

This continues until the structural parameters are all near 0 or 1, meaning that we have become confident in our estimates of the graph structure.
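Below is a rough sketch of one round of these steps, reusing `gamma` and `conditionals` from the sketch above. The optimizers, the Gaussian likelihood, the softmax weighting of sampled graphs, and the regularizer coefficients are all illustrative assumptions; the paper's exact scoring and gradient estimator differ in detail:

```python
import torch
import torch.nn.functional as F

func_opt = torch.optim.Adam(conditionals.parameters(), lr=1e-3)
struct_opt = torch.optim.Adam([gamma], lr=5e-2)

def sample_graph():
    # Step 1: draw Bernoulli samples with sigmoid(gamma_ij) as the parameter.
    adj = torch.bernoulli(torch.sigmoid(gamma.detach()))
    return adj * (1 - torch.eye(adj.shape[0]))  # no self-loops

def log_likelihood(adj, x):
    # Gaussian log-density (up to constants) of each X_i given only its
    # sampled parents: row adj[i] zeroes out non-parent inputs.
    ll = 0.0
    for i, net in enumerate(conditionals):
        mean = net(x * adj[i]).squeeze(-1)
        ll = ll + (-0.5 * (x[:, i] - mean) ** 2).sum()
    return ll

def functional_step(x_obs):
    # Step 2 (first half): fit the conditional networks to observational
    # data under a sampled graph configuration.
    loss = -log_likelihood(sample_graph(), x_obs)
    func_opt.zero_grad(); loss.backward(); func_opt.step()

def structural_step(x_interv, n_samples=8, lam_sparse=0.01, lam_dag=0.1):
    # Steps 2-3: score sampled graphs on interventional data, then push
    # gamma toward configurations that explain it well (REINFORCE-style).
    with torch.no_grad():
        adjs = [sample_graph() for _ in range(n_samples)]
        scores = torch.stack([log_likelihood(a, x_interv) for a in adjs])
        weights = torch.softmax(scores, dim=0)  # better graphs get more credit
    def graph_log_prob(a):
        return (F.logsigmoid(gamma) * a + F.logsigmoid(-gamma) * (1 - a)).sum()
    surrogate = -sum(w * graph_log_prob(a) for w, a in zip(weights, adjs))
    p = torch.sigmoid(gamma)
    # Regularizers: sparsity, plus a penalty on gamma_ij and gamma_ji both
    # being high, which discourages 2-cycles (the DAG assumption).
    loss = surrogate + lam_sparse * p.sum() + lam_dag * (p * p.t()).triu(1).sum()
    struct_opt.zero_grad(); loss.backward(); struct_opt.step()
```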
The sampled graphs are important because they force each neural network to rely only on the values of a node's parents when predicting that node's value. This method appears to produce a model that generalizes to unseen interventions.
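Concretely, continuing the sketch above, a sampled row of the adjacency matrix acts as a hard input mask, so the network for a node literally cannot see the values of non-parents (the batch here is illustrative):

```python
adj = sample_graph()
x = torch.randn(4, N)           # a toy batch of observations (illustrative)
masked = x * adj[2]             # only X_2's sampled parents stay nonzero
pred = conditionals[2](masked)  # X_2's prediction depends on its parents alone
```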
If part of the graph structure is already known, those edges can be fixed in place (by replacing the corresponding \(\gamma_{ij}\) with the known value; see the sketch below). Large graphs and dense graphs are progressively more difficult to learn. It also appears that smaller graphs are less sensitive to the regularization hyperparameters.
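One simple way to encode known edges, assuming the parameterization sketched above (this is an illustrative mechanic, not necessarily the paper's exact procedure), is to pin the corresponding logits to large magnitudes so that \(\sigma(\gamma_{ij})\) stays near 0 or 1:

```python
with torch.no_grad():
    gamma[0, 1] = 10.0   # known edge X_1 -> X_0: belief sigmoid(10) ~ 1
    gamma[2, 3] = -10.0  # known non-edge: belief sigmoid(-10) ~ 0
# In practice these values would be re-applied after each structural update
# so that optimizer steps do not drift them away from the known structure.
```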