The Prolog Factor Language (PFL) is a language that extends Prolog by providing a syntax to describe first-order probabilistic graphical models. These models can be either directed (Bayesian networks) or undirected (Markov networks). This language replaces the old one known as CLP(BN).
The package also includes implementations of a set of well-known inference algorithms for solving probabilistic queries on these models. Both ground and lifted inference methods are supported.
### Installation
PFL is included with the YAP Prolog system. The commands to perform a default installation of YAP in your home directory in a Unix-based environment are shown next.
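A typical sequence is sketched below, assuming a classic `configure`-based build; the repository URL and build options are indicative and may differ for your YAP version (recent versions use CMake).

~~~~
$ cd $HOME
$ git clone https://github.com/vscosta/yap yap
$ cd yap
$ ./configure --prefix=$HOME
$ make
$ make install
~~~~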
In case you want to install YAP somewhere else or with different settings, please consult the YAP documentation. From now on, we will assume that the directory `$HOME/bin` (where the binary is) is in your `$PATH` environment variable.
Once in a while, we will refer to the PFL examples directory. In a default installation, this directory will be located at `$HOME/share/doc/Yap/packages/examples/CLPBN`.
### Language
A first-order probabilistic graphical model is described using parametric factors, commonly known as parfactors. The PFL syntax for a parfactor is
_Type F ; Phi ; C._
Where,
+ _Type_ refers to the type of network over which the parfactor is defined. It can be `bayes` for directed networks, or `markov` for undirected ones.
+ _F_ is a comma-separated sequence of Prolog terms that will define sets of random variables under the constraint _C_. If _Type_ is `bayes`, the first term defines the node while the remaining terms define its parents.
+ _Phi_ is either a Prolog list of potential values or a Prolog goal that unifies with one. Notice that if _Type_ is `bayes`, this will correspond to the conditional probability table. Domain combinations are implicitly assumed in ascending order, with the first term being the 'most significant' (e.g. _x_0y_0_, _x_0y_1_, _x_0y_2_, _x_1y_0_, _x_1y_1_, _x_1y_2_).
+ _C_ is a (possibly empty) list of Prolog goals that will instantiate the logical variables that appear in _F_, that is, the successful substitutions for the goals in _C_ will be the valid values for the logical variables. This allows the constraint to be defined as any relation (set of tuples) over the logical variables.
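As a first example, consider the classic sprinkler network. A minimal PFL encoding is sketched below; the probability values are the usual textbook ones and are only illustrative.

~~~~
:- use_module(library(pfl)).

% one parfactor per node; empty constraints, so the network is ground
bayes cloudy ; cloudy_table ; [].
bayes sprinkler, cloudy ; sprinkler_table ; [].
bayes rain, cloudy ; rain_table ; [].
bayes wet_grass, sprinkler, rain ; wet_grass_table ; [].

% conditional probability tables, with domain combinations laid out
% following the ordering convention described above
cloudy_table([0.5, 0.5]).
sprinkler_table([0.1, 0.5, 0.9, 0.5]).
rain_table([0.8, 0.2, 0.2, 0.8]).
wet_grass_table([0.99, 0.9, 0.9, 0.0, 0.01, 0.1, 0.1, 1.0]).
~~~~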
In the example above, we start by loading the PFL library, then define one parfactor for each node, and finally specify the probabilities for each conditional probability table.
Notice that this network is fully grounded, as all constraints are empty. Next we present the PFL representation for a well-known Markov logic network: the social network model. For convenience, the two main weighted formulas of this model are shown below.
~~~~
1.5 : Smokes(x) => Cancer(x)
1.1 : Smokes(x) ^ Friends(x,y) => Smokes(y)
~~~~
Next, we show the PFL representation for this model.
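A sketch of this model in PFL follows. Here we assume the standard Markov logic network conversion: each weighted formula with weight _w_ becomes a `markov` parfactor whose potential is exp(_w_) for every assignment that satisfies the formula and 1.0 for the assignment that falsifies it. The placement of the 1.0 entries assumes the ordering convention described above (first term most significant, with `t` preceding `f`); check the exact table layout against your PFL version.

~~~~
:- use_module(library(pfl)).

person(anna).
person(bob).

% exp(1.5) ~ 4.482; the 1.0 entry corresponds to the falsifying
% assignment smokes(X)=t, cancer(X)=f
markov smokes(X), cancer(X) ;
    [4.482, 1.0, 4.482, 4.482] ;
    [person(X)].

% exp(1.1) ~ 3.004; the 1.0 entry corresponds to the falsifying
% assignment friends(X,Y)=t, smokes(X)=t, smokes(Y)=f
markov friends(X,Y), smokes(X), smokes(Y) ;
    [3.004, 1.0, 3.004, 3.004, 3.004, 3.004, 3.004, 3.004] ;
    [person(X), person(Y)].
~~~~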
Notice that we have defined the world to consist of only two persons, `anna` and `bob`. We can easily add as many persons as we want by inserting in the program a fact like `person @ 10.` This would automatically create ten persons named `p1`, `p2`, ..., `p10`.
Unlike other first-order probabilistic languages, in PFL the logical variables that appear in the terms are not directly typed; they are only constrained by the goals that appear in the constraint of the parfactor. This allows the logical variables to be constrained to any relation (set of tuples), and not only pairwise (in)equalities. For instance, the next example defines a network with three ground factors, defined respectively over the random variables `p(a,b)`, `p(b,d)` and `p(d,e)`.
~~~~
constraint(a,b).
constraint(b,d).
constraint(d,e).
markov p(A,B); some_table; [constraint(A,B)].
~~~~
We can easily add static evidence to PFL programs by inserting a fact with the same functor and arguments as the random variable, plus one extra argument with the observed state or value. For instance, suppose we know that `anna` and `bob` are friends. We can add this knowledge to the program with the following fact: `friends(anna,bob,t).`
One last note on the domains of the random variables. By default, all terms instantiate boolean (`t`/`f`) random variables. It is possible to choose a different domain for a term by appending a list of its possible values or states. Next we present a self-explanatory example of how this can be done.
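A sketch of such a declaration is shown below; the `::` notation and the particular random variable follow the examples in the PFL distribution, but are illustrative here.

~~~~
:- use_module(library(pfl)).

% professor_ability takes one of three values instead of t/f
bayes professor_ability::[high, medium, low] ; [0.5, 0.4, 0.1] ; [].
~~~~

To pose a query, we call a goal for the random variable of interest with an unbound extra argument. For instance, assuming the sprinkler network shown earlier, the marginal of _WetGrass_ could be queried as follows.

~~~~
?- wet_grass(X).
~~~~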
The output of this goal will show the marginal probability for each of _WetGrass_'s possible states or values, that is, `t` and `f`. Notice that in PFL a random variable is identified by a term with the same functor and arguments plus one extra argument.
Now let's suppose that we want to estimate the probability for the same random variable, but this time we have evidence that it had rained the day before. We can estimate this probability without resorting to static evidence, as shown next.
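Assuming the sprinkler model above, the evidence can be given by instantiating the extra argument of the evidence variable directly in the query (a sketch):

~~~~
?- wet_grass(X), rain(t).
~~~~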
It is possible to tweak some parameters of PFL through the `set_pfl_flag/2` predicate. The first argument is an option name that identifies the parameter we want to tweak, and the second argument is a possible value for this option. Next we explain the available options in detail.
This option controls when message passing should cease. Define the residual of a message as the difference (according to some metric) between the message sent in the current iteration and the one sent in the previous iteration. If the highest residual is smaller than the given value, message passing stops and the probabilities are calculated using the last messages that were sent.
+ `max_residual`, the next message to be sent is the one with the maximum residual (as explained in the paper _Residual Belief Propagation: Informed Scheduling for Asynchronous Message Passing_).
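As an illustration, setting an option is a single call. Here we assume the option is named `solver` and accepts the solver names used elsewhere in this document, such as `hve`; check your installation for the exact flag names.

~~~~
?- set_pfl_flag(solver, hve).
~~~~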
Parameter learning is done by calling the `em/5` predicate. Its arguments are the following.
`em(+Data, +MaxError, +MaxIters, -CPTs, -LogLik)`
Where,
+ `Data` is a list of samples for the distribution that we want to estimate. Each sample is a list of random variables, either observed or unobserved (an unobserved variable is denoted by leaving its state or value uninstantiated).
+ `MaxError` is the maximum error allowed before stopping the EM loop.
+ `MaxIters` is the maximum number of iterations for the EM loop.
+ `CPTs` is a list with the estimated conditional probability tables.
+ `LogLik` is the log-likelihood of the data given the estimated parameters.
It is possible to choose the solver that will be used for inference during parameter learning with the `set_em_solver/1` predicate (defaults to `hve`). At the moment, only the following solvers support parameter learning: `ve`, `hve`, `bdd`, `bp` and `cbp`.
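As an illustration, a call for the sprinkler model above might look as follows; the samples are made up, and an unobserved variable is represented by an uninstantiated argument.

~~~~
?- em([[cloudy(t), rain(t), wet_grass(t)],
       [cloudy(f), rain(_), wet_grass(f)]],
      0.01, 50, CPTs, LogLik).
~~~~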
This package also includes an external command for performing inference over probabilistic graphical models described in formats other than PFL. Currently, two formats are supported: the [libDAI file format](http://cs.ru.nl/~jorism/libDAI/doc/fileformats.html) and the [UAI08 file format](http://graphmod.ics.uci.edu/uai08/FileFormat).
Let's assume that the current directory is the one where the examples are located. We can perform inference on any supported model by passing the file name where the model is defined as an argument. Next, we show how to load a model with `hcli`.
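For instance, one might load one of the example models like this; the file name here is illustrative, and any file in a supported format will do.

~~~~
$ hcli burglary-alarm.uai
~~~~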
With the above command, the program will load the model and print the marginal probabilities for all defined random variables. We can view only the marginal probability for a variable with identifier _X_ by passing _X_ as an extra argument following the file name. For instance, the following command will output only the marginal probability for the variable with identifier _0_.
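Reusing the illustrative file name from above:

~~~~
$ hcli burglary-alarm.uai 0
~~~~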
Evidence can be given as a pair containing a variable identifier and its observed state (index), separated by a `=`. For instance, we can introduce knowledge that some variable with identifier _0_ has evidence on its second state as follows.
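Assuming states are indexed from zero, so that the second state has index 1, the call might look like this:

~~~~
$ hcli burglary-alarm.uai 0=1
~~~~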
The options that are available with the `set_pfl_flag/2` predicate can be used in `hcli` too. The syntax is a pair `<Option>=<Value>` before the model's file name.
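For example, to select the belief propagation solver (assuming `bp` is accepted as a value for the `solver` option, as with `set_pfl_flag/2`):

~~~~
$ hcli solver=bp burglary-alarm.uai
~~~~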