\ifnum\pdfoutput>0
% pdflatex compilation
\documentclass[a4paper,10pt]{article}
\usepackage[pdftex]{graphicx}
\DeclareGraphicsExtensions{.pdf,.png,.jpg}
\RequirePackage[hyperindex]{hyperref}
\else
% htlatex compilation
\documentclass{article}
\usepackage{graphicx}
\DeclareGraphicsExtensions{.png, .gif, .jpg}
\newcommand{\href}[2]{\Link[#1]{}{} #2 \EndLink}
\newcommand{\hypertarget}[2]{\Link[]{}{#1} #2 \EndLink}
\newcommand{\hyperlink}[2]{\Link[]{#1}{} #2 \EndLink}
\newcommand{\url}[1]{\Link[#1]{}{} #1 \EndLink}
\fi

\begin{document}

\title{\texttt{cplint} Manual}
\author{Fabrizio Riguzzi\\
fabrizio.riguzzi@unife.it}
\maketitle

\section{Introduction}
\texttt{cplint} is a suite of programs for reasoning with ICL \cite{DBLP:journals/ai/Poole97}, LPADs \cite{VenVer03-TR,VenVer04-ICLP04-IC} and CP-logic programs \cite{VenDenBru-JELIA06,DBLP:journals/tplp/VennekensDB09}.
It contains programs for both inference and learning.

\section{Installation}
\texttt{cplint} is distributed as source code in the development tree of Yap. It includes Prolog and C files. Download it by following the instructions at \url{http://www.ncc.up.pt/~vsc/Yap/downloads.html}.
\texttt{cplint} requires \href{http://vlsi.colorado.edu/~fabio/CUDD/}{CUDD}.
You can download CUDD from \url{ftp://vlsi.colorado.edu/pub/cudd-2.4.2.tar.gz}.
Compile CUDD:
\begin{enumerate}
\item decompress \texttt{cudd-2.4.2.tar.gz}
\item \texttt{cd cudd-2.4.2}
\item see the \texttt{README} file for instructions on compilation
\end{enumerate}
Install Yap together with \texttt{cplint}: when compiling Yap following the instructions in the \texttt{INSTALL} file in the root of the Yap folder, use
\begin{verbatim}
configure --enable-cplint=DIR
\end{verbatim}
where \verb|DIR| is the directory where CUDD is, i.e., the directory ending with \texttt{cudd-2.4.2}.
Under Windows, you have to use Cygwin (CUDD does not compile under MinGW), so use
\begin{verbatim}
configure --enable-cplint=DIR --enable-cygwin
\end{verbatim}
After running \texttt{make install}, you can run \texttt{make installcheck}, which executes a suite of tests of the various programs. If no error is reported, you have a working installation of \texttt{cplint}.

\section{Syntax}
LPAD and CP-logic programs consist of a set of annotated disjunctive clauses.
Disjunction in the head is represented with a semicolon and atoms in the head are separated from probabilities by a colon. For the rest, the usual syntax of Prolog is used.
For example, the CP-logic clause
$$h_1:p_1\vee \ldots \vee h_n:p_n\leftarrow b_1,\ldots,b_m ,\neg c_1,\ldots,\neg c_l$$
is represented by
\begin{verbatim}
h1:p1 ; ... ; hn:pn :- b1,...,bm,\+ c1,...,\+ cl.
\end{verbatim}
No parentheses are necessary. The \texttt{pi} are numeric expressions. It is up to the user to ensure that the numeric expressions are legal, i.e., that they sum to at most one.
If the clause has an empty body, it can be represented like this
\begin{verbatim}
h1:p1 ; ... ; hn:pn.
\end{verbatim}
If the clause has a single head with probability 1, the annotation can be omitted and the clause takes the form of a normal Prolog clause, i.e.,
\begin{verbatim}
h1 :- b1,...,bm,\+ c1,...,\+ cl.
\end{verbatim}
stands for
\begin{verbatim}
h1:1 :- b1,...,bm,\+ c1,...,\+ cl.
\end{verbatim}
The coin example of \cite{VenVer04-ICLP04-IC} is represented as (see file \texttt{coin.cpl})
\begin{verbatim}
heads(Coin):1/2 ; tails(Coin):1/2:-
    toss(Coin),\+biased(Coin).

heads(Coin):0.6 ; tails(Coin):0.4:-
    toss(Coin),biased(Coin).

fair(Coin):0.9 ; biased(Coin):0.1.

toss(coin).
\end{verbatim}
The first clause states that if we toss a coin that is not biased, it has equal probability of landing heads and tails. The second states that if the coin is biased, it has a slightly higher probability of landing heads.
The third states that the coin is fair with probability 0.9 and biased with probability 0.1, and the last clause states that we toss a coin with certainty.
Moreover, the bodies of rules can contain the built-in predicates:
\begin{verbatim}
is/2, >/2, </2, >=/2, =</2, ...
\end{verbatim}
\begin{verbatim}
..., Av>1.5.
student_rank(S,h):0.4 ; student_rank(S,l):0.6:-
    bagof(G,R^(registr_stu(R,S),registr_gr(R,G)),L),
    average(L,Av),Av =< 1.5.
\end{verbatim}
where \verb|registr_stu(R,S)| expresses that registration \texttt{R} refers to student \texttt{S} and \verb|registr_gr(R,G)| expresses that registration \texttt{R} reports grade \texttt{G}, which is a natural number. The two clauses express a dependency of the rank of the student on the average of her grades.
Another extension can be used with \texttt{lpadsld.pl} to represent the reference uncertainty of PRMs. Reference uncertainty means that the link structure of a relational model is not fixed but is uncertain: this is represented by having the instance referenced in a relationship be chosen uniformly from a set. For example, consider a domain modeling scientific papers: you have a single entity, paper, and a relationship, cites, between paper and itself that connects the citing paper to the cited paper. To represent the fact that the cited paper and the citing paper are selected uniformly from certain sets, the following clauses can be used (see file \verb|paper_ref_simple.cpl|):
\begin{verbatim}
uniform(cites_cited(C,P),P,L):-
    bagof(Pap,paper_topic(Pap,theory),L).

uniform(cites_citing(C,P),P,L):-
    bagof(Pap,paper_topic(Pap,ai),L).
\end{verbatim}
The first clause states that the paper \texttt{P} cited in a citation \texttt{C} is selected uniformly from the set of all papers with topic theory.
The second clause expresses that the citing paper is selected uniformly from the papers with topic ai.
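As a concrete reading of the first of these clauses, suppose, purely for illustration, that exactly three papers \texttt{p1}, \texttt{p2} and \texttt{p3} have topic theory (these constants are hypothetical and are not taken from \verb|paper_ref_simple.cpl|). Under that assumption \texttt{bagof/3} binds \texttt{L} to \texttt{[p1,p2,p3]}, and the clause should behave roughly like the following annotated disjunctive clause, with one disjunct per paper and probability $1/3$ each:
\begin{verbatim}
% hypothetical expansion, assuming L = [p1,p2,p3]
cites_cited(C,p1):1/3 ;
cites_cited(C,p2):1/3 ;
cites_cited(C,p3):1/3.
\end{verbatim}
This is only a sketch of the intended meaning, not the way \texttt{lpadsld.pl} necessarily represents the clause internally.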
The two clauses of \verb|paper_ref_simple.cpl| make use of the predicate
\begin{verbatim}
uniform(Atom,Variable,List)
\end{verbatim}
in the head, where \texttt{Atom} must contain \texttt{Variable}. The meaning is the following: the set of all the atoms obtained by instantiating \texttt{Variable} of \texttt{Atom} with a term taken from \texttt{List} is generated, and the head is obtained by having a disjunct for each instantiation with probability $1/N$, where $N$ is the length of \texttt{List}.
A more elaborate example is present in file \verb|paper_ref.cpl|:
\begin{verbatim}
uniform(cites_citing(C,P),P,L):-
    setof(Pap,paper(Pap),L).

cites_cited_group(C,theory):0.9 ; cites_cited_group(C,ai):0.1:-
    cites_citing(C,P),paper_topic(P,theory).

cites_cited_group(C,theory):0.01;cites_cited_group(C,ai):0.99:-
    cites_citing(C,P),paper_topic(P,ai).

uniform(cites_cited(C,P),P,L):-
    cites_cited_group(C,T),bagof(Pap,paper_topic(Pap,T),L).
\end{verbatim}
where the cited paper depends on the topic of the citing paper. In particular, if the topic is theory, the cited paper is selected uniformly from the papers about theory with probability 0.9 and from the papers about ai with probability 0.1. If the topic is ai, the cited paper is selected uniformly from the papers about theory with probability 0.01 and from the papers about ai with probability 0.99.
PRMs also take into account existence uncertainty, where the existence of instances is itself probabilistic. For example, in the paper domain, the total number of citations may be unknown and a citation between any two papers may have a probability of existing. For example, a citation between two papers may be more probable if they are about the same topic:
\begin{verbatim}
cites(X,Y):0.005 :-
    paper_topic(X,theory),paper_topic(Y,theory).

cites(X,Y):0.001 :-
    paper_topic(X,theory),paper_topic(Y,ai).

cites(X,Y):0.003 :-
    paper_topic(X,ai),paper_topic(Y,theory).

cites(X,Y):0.008 :-
    paper_topic(X,ai),paper_topic(Y,ai).
\end{verbatim}
This is an example where the probabilities in the head do not sum to one, so the null event is automatically added to the head. The first clause states that, if the topic of a paper \texttt{X} is theory and the topic of paper \texttt{Y} is theory, there is a probability of 0.005 that there is a citation from \texttt{X} to \texttt{Y}. The other clauses consider the remaining cases for the topics.

\subsection{Files}
In the directory where Yap keeps the library files (usually \texttt{/usr/local/share/Yap}) you can find the directory \texttt{cplint} that contains the files:
\begin{itemize}
\item \texttt{testlpadsld\_gbtrue.pl, testlpadsld\_gbfalse.pl, testlpad.pl, testcpl.pl, testsemlpadsld.pl, testsemlpad.pl, testsemcpl.pl}: Prolog programs for testing the modules. They are executed when issuing the command \texttt{make installcheck} during the installation. To execute them afterwards, load the file and issue the command \texttt{t.}
\item Subdirectory \texttt{examples}:
\begin{itemize}
\item \texttt{alarm.cpl}: representation of the Bayesian network in Figure 2 of \cite{VenVer04-ICLP04-IC}.
\item \texttt{coin.cpl}: coin example from \cite{VenVer04-ICLP04-IC}.
\item \texttt{coin2.cpl}: coin example with two coins.
\item \texttt{dice.cpl}: dice example from \cite{VenVer04-ICLP04-IC}.
\item \verb|twosideddice.cpl, threesideddice.cpl|: game with idealized dice with two or three sides. Used in the experiments in \cite{Rig-RCRA07-IC}.
\item \texttt{ex.cpl}: first example in \cite{Rig-RCRA07-IC}.
\item \texttt{exapprox.cpl}: example showing the problems of approximate inference (see \cite{Rig-RCRA07-IC}).
\item \texttt{exrange.cpl}: example showing the problems with non range restricted programs (see \cite{Rig-RCRA07-IC}).
\item \texttt{female.cpl}: example showing the dependence of probabilities in the head on variables in the body (from \cite{VenVer04-ICLP04-IC}).
\item \texttt{mendel.cpl, mendels.cpl}: programs describing the Mendelian rules of inheritance, taken from \cite{Blo04-ILP04WIP-IC}.
\item \verb|paper_ref.cpl, paper_ref_simple.cpl|: paper citation examples, showing reference uncertainty, inspired by \cite{Getoor+al:JMLR02}.
\item \verb|paper_ref_not.cpl|: paper citation example showing that negation can also be used for predicates defined by clauses with \texttt{uniform} in the head.
\item \texttt{school.cpl}: example inspired by the example \verb|school_32.yap| from the source distribution of Yap in the \texttt{CLPBN} directory.
\item \verb|school_simple.cpl|: simplified version of \texttt{school.cpl}.
\item \verb|student.cpl|: student example from Figure 1.3 of \cite{GetFri01-BC}.
\item \texttt{win.cpl, light.cpl, trigger.cpl, throws.cpl, hiv.cpl,}\\
\texttt{invalid.cpl}: programs taken from \cite{DBLP:journals/tplp/VennekensDB09}. \texttt{invalid.cpl} is an example of a program that is invalid but sound.
\end{itemize}
The files \texttt{*.uni} that are present for some of the examples are used by the semantic modules. Some of the example files contain, in an initial comment, some queries together with their results.
\item Subdirectory \texttt{doc}: contains this manual in LaTeX, HTML and PDF.
\end{itemize}

\section{Learning}
\texttt{cplint} contains the following learning algorithms:
\begin{itemize}
\item CEM (\texttt{cplint} EM): an implementation of EM for learning parameters that is based on \texttt{lpadsld.pl} \cite{RigDiM11-ML-IJ}
\item RIB (Relational Information Bottleneck): an algorithm for learning parameters based on the Information Bottleneck \cite{RigDiM11-ML-IJ}
\item EMBLEM (EM over Bdds for probabilistic Logic programs Efficient Mining): an implementation of EM for learning parameters that computes expectations directly on BDDs \cite{BelRig11-IDA,BelRig11-CILC11-NC,BelRig11-TR}
\item SLIPCASE (Structure LearnIng of ProbabilistiC logic progrAmS with Em over bdds): an algorithm for learning the structure of programs that is based on EMBLEM \cite{BelRig11-ILP11-IC}
\end{itemize}

\subsection{Input}
To execute the learning algorithms, prepare four files in the same folder:
\begin{itemize}
\item \texttt{<stem>.kb}: contains the example interpretations
\item \texttt{<stem>.bg}: contains the background knowledge, i.e., knowledge valid for all interpretations
\item \texttt{<stem>.l}: contains language bias information
\item \texttt{<stem>.cpl}: contains the LPAD whose parameters you want to learn, or the initial LPAD for SLIPCASE
\end{itemize}
where \texttt{<stem>} is your dataset name. Examples of these files can be found in the dataset pages.
In \texttt{<stem>.kb} the example interpretations have to be given as a list of Prolog facts initiated by
\texttt{begin(model(<name>)).} and terminated by \texttt{end(model(<name>)).} as in
\begin{verbatim}
begin(model(b1)).
sameperson(1,2).
movie(f1,1).
movie(f1,2).
workedunder(1,w1).
workedunder(2,w1).
gender(1,female).
gender(2,female).
actor(1).
actor(2).
end(model(b1)).
\end{verbatim}
The interpretations may contain a fact of the form
\begin{verbatim}
prob(0.3).
\end{verbatim}
assigning a probability (0.3 in this case) to the interpretation.
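As an illustration, a \texttt{.kb} file with two weighted interpretations might look as follows. This is only a sketch: the model names \texttt{m1} and \texttt{m2}, their facts and the two probabilities are made up for the example, and placing \verb|prob/1| right after \texttt{begin} is an arbitrary choice.
\begin{verbatim}
% hypothetical interpretations, for illustration only
begin(model(m1)).
prob(0.3).
actor(1).
gender(1,female).
end(model(m1)).

begin(model(m2)).
prob(0.7).
actor(2).
gender(2,female).
end(model(m2)).
\end{verbatim}
With the default value 1000 of the \verb|sample_size| parameter described in the Parameters subsection, the first model would count as $1000\times 0.3=300$ examples and the second as $1000\times 0.7=700$.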
If the \verb|prob/1| fact is omitted, the probability of each interpretation is considered equal to $1/n$, where $n$ is the total number of interpretations. \verb|prob/1| can be used to set a different multiplicity for each interpretation.
In order for RIB to work, the input interpretations must share the Herbrand universe. If this is not the case, you have to translate the interpretations so that they do; see, for example, the \texttt{sp1} files in RIB's folder, which are the result of converting the first fold of the IMDB dataset.
\texttt{<stem>.bg} can contain Prolog clauses that can be used to derive additional conclusions from the atoms in the interpretations.
\texttt{<stem>.l} contains the declarations of the input and output predicates, of the unseen predicates and the commands for setting the algorithms' parameters.
Output predicates are declared as
\begin{verbatim}
output(<predicate>/<arity>).
\end{verbatim}
and define the predicates whose atoms in the input interpretations are used as goals, i.e., the predicates for whose prediction you want to optimize the parameters.
Derivations for these goals are built by the systems.
Input predicates are those for whose prediction you do not want to optimize the parameters. You can declare closed world input predicates with
\begin{verbatim}
input_cw(<predicate>/<arity>).
\end{verbatim}
For these predicates, the only true atoms are those in the interpretations: the clauses in the input program are not used to derive atoms not present in the interpretations.
Open world input predicates are declared with
\begin{verbatim}
input(<predicate>/<arity>).
\end{verbatim}
In this case, if a subgoal for such a predicate is encountered when deriving the atoms for the output predicates, both the facts in the interpretations and the clauses of the input program are used.
For RIB, if there are unseen predicates, i.e., predicates that are present in the input program but not in the interpretations, you have to declare them with
\begin{verbatim}
unseen(<predicate>/<arity>).
\end{verbatim}
For SLIPCASE, you have to specify the language bias by means of mode declarations in the style of \href{http://www.doc.ic.ac.uk/~shm/progol.html}{Progol}.
\begin{verbatim}
modeh(<recall>,<predicate>(<arg1>,...)).
\end{verbatim}
specifies the atoms that can appear in the head of clauses, while
\begin{verbatim}
modeb(<recall>,<predicate>(<arg1>,...)).
\end{verbatim}
specifies the atoms that can appear in the body of clauses.
\texttt{<recall>} can be an integer or \texttt{*} (currently unused).
The arguments are of the form
\begin{verbatim}
+<type>
\end{verbatim}
for specifying an input variable of type \texttt{<type>}, or
\begin{verbatim}
-<type>
\end{verbatim}
for specifying an output variable of type \texttt{<type>}, or
\begin{verbatim}
<constant>
\end{verbatim}
for specifying a constant.

\subsection{Parameters}
In order to set the algorithms' parameters, you have to insert in \texttt{<stem>.l} commands of the form
\begin{verbatim}
:- set(<parameter>,<value>).
\end{verbatim}
The available parameters are:
\begin{itemize}
\item \verb|depth| (values: integer or \verb|inf|, default value: 3): depth of derivations if \verb|depth_bound| is set to \verb|true|
\item \verb|single_var| (values: \verb|{true,false}|, default value: \verb|false|, valid for CEM, EMBLEM and SLIPCASE): if set to \verb|true|, there is a random variable for each clause, instead of a separate random variable for each grounding of a clause
\item \verb|sample_size| (values: integer, default value: 1000): total number of examples when the models in the \verb|.kb| file contain a \verb|prob(P).| fact.
In that case, one model corresponds to \verb|sample_size*P| examples
\item \verb|epsilon_em| (values: real, default value: 0.1, valid for CEM, EMBLEM and SLIPCASE): if the difference in the log likelihood in two successive EM iterations is smaller than \verb|epsilon_em|, then EM stops
\item \verb|epsilon_em_fraction| (values: real, default value: 0.01, valid for CEM, EMBLEM and SLIPCASE): if the difference in the log likelihood in two successive EM iterations is smaller than \verb|epsilon_em_fraction|*(-current log likelihood), then EM stops
\item \verb|iter| (values: integer, default value: 1, valid for EMBLEM and SLIPCASE): maximum number of iterations of EM parameter learning. If set to -1, no maximum number of iterations is imposed
\item \verb|iterREF| (values: integer, default value: 1, valid for SLIPCASE): maximum number of iterations of EM parameter learning for refinements. If set to -1, no maximum number of iterations is imposed.
\item \verb|random_restarts_number| (values: integer, default value: 1, valid for CEM, EMBLEM and SLIPCASE): number of random restarts of EM learning
\item \verb|random_restarts_REFnumber| (values: integer, default value: 1, valid for SLIPCASE): number of random restarts of EM learning for refinements
\item \verb|setrand| (values: rand(integer,integer,integer)): seed for the random functions, see the Yap manual for allowed values
\item \verb|minimal_step| (values: [0,1], default value: 0.005, valid for RIB): minimal increment of $\gamma$
\item \verb|maximal_step| (values: [0,1], default value: 0.1, valid for RIB): maximal increment of $\gamma$
\item \verb|logsize_fraction| (values: [0,1], default value: 0.9, valid for RIB): RIB stops when $\mathbf{I}(CH,T;Y)$ is above \verb|logsize_fraction| times its maximum value ($\log |CH,T|$, see \cite{DBLP:journals/jmlr/ElidanF05})
\item \verb|delta| (values: negative integer, default value: -10, valid for RIB): value assigned to $\log 0$
\item \verb|epsilon_fraction| (values: integer, default value
100, valid for RIB): in the computation of the step, the value of $\epsilon$ of \cite{DBLP:journals/jmlr/ElidanF05} is obtained as $\log |CH,T|\times$\verb|epsilon_fraction|
\item \verb|max_rules| (values: integer, default value: 6000, valid for RIB and SLIPCASE): maximum number of ground rules. Used to set the size of arrays for storing internal statistics. Can be increased as much as memory allows.
\item \verb|logzero| (values: negative real, default value: $\log(0.000001)$, valid for SLIPCASE): value assigned to $\log 0$
\item \verb|examples| (values: \verb|atoms|, \verb|interpretations|, default value: \verb|atoms|, valid for SLIPCASE): determines how BDDs are built: if set to \verb|interpretations|, a BDD for the conjunction of all the atoms for the target predicates in each interpretation is built.
If set to \verb|atoms|, a BDD is built for the conjunction of a group of atoms for the target predicates in each interpretation. The number of atoms in each group is determined by the parameter \verb|group|
\item \verb|group| (values: integer, default value: 1, valid for SLIPCASE): number of target atoms in the groups that are used to build BDDs
\item \verb|max_iter| (values: integer, default value: 10, valid for SLIPCASE): number of iterations of beam search
\item \verb|max_var| (values: integer, default value: 1, valid for SLIPCASE): maximum number of distinct variables in a clause
\item \verb|verbosity| (values: integer in [1,3], default value: 1): level of verbosity of the algorithms
\item \verb|beamsize| (values: integer, default value: 20, valid for SLIPCASE): size of the beam in SLIPCASE
\end{itemize}

\subsection{Commands}
To execute CEM, load \texttt{em.pl} with
\begin{verbatim}
?- use_module(library('cplint/em')).
\end{verbatim}
and call:
\begin{verbatim}
?- em(stem).
\end{verbatim}
To execute RIB, load \texttt{rib.pl} with
\begin{verbatim}
?- use_module(library('cplint/rib')).
\end{verbatim}
and call:
\begin{verbatim}
?- ib_par(stem).
\end{verbatim}
To execute EMBLEM, load \texttt{slipcase.pl} with
\begin{verbatim}
?- use_module(library('cplint/slipcase')).
\end{verbatim}
and call
\begin{verbatim}
?- em(stem).
\end{verbatim}
To execute SLIPCASE, load \texttt{slipcase.pl} with
\begin{verbatim}
?- use_module(library('cplint/slipcase')).
\end{verbatim}
and call
\begin{verbatim}
?- sl(stem).
\end{verbatim}

\section{License}
\label{license}
\texttt{cplint}, like Yap, follows the Artistic License 2.0, which you can find in the root directory of the Yap CVS tree. The copyright is by Fabrizio Riguzzi.

\vspace{3mm}

The modules in the approx subdirectory use SimplecuddLPADs, a modification of the \href{http://www.cs.kuleuven.be/~theo/tools/simplecudd.html}{Simplecudd} library whose copyright is by Katholieke Universiteit Leuven and that follows the Artistic License 2.0.

\vspace{3mm}

Some modules use the \href{http://vlsi.colorado.edu/~fabio/}{CUDD} library for manipulating BDDs, which is included in glu.
For the use of CUDD, the following license must be accepted:

\vspace{3mm}

Copyright (c) 1995-2004, Regents of the University of Colorado

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
\begin{itemize}
\item Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
\item Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
\item Neither the name of the University of Colorado nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
\end{itemize}

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\texttt{lpad.pl}, \texttt{semlpad.pl} and \texttt{cpl.pl} are based on the SLG system by \href{http://engr.smu.edu/~wchen/}{Weidong Chen} and \href{http://www.cs.sunysb.edu/~warren/}{David Scott Warren}, Copyright (C) 1993 Southern Methodist University, 1993 SUNY at Stony Brook; see the file COYPRIGHT\_SLG for detailed information on this copyright.

\bibliographystyle{plain}
\bibliography{bib}

\end{document}