initial version of LEMUR

This commit is contained in:
Fabrizio Riguzzi 2014-10-15 15:56:49 +02:00
parent b25c9e5b61
commit ce12c424f3


@ -441,6 +441,9 @@ The files \texttt{*.uni} that are present for some of the examples are used by
\item EMBLEM (EM over Bdds for probabilistic Logic programs Efficient Mining): an implementation of EM for learning parameters that computes expectations directly on BDDs \cite{BelRig11-IDA,BelRig11-CILC11-NC,BelRig11-TR}
\item SLIPCASE (Structure LearnIng of ProbabilistiC logic progrAmS with Em over bdds): an algorithm for learning the structure of programs by searching the theory space directly \cite{BelRig11-ILP11-IC}
\item SLIPCOVER (Structure LearnIng of Probabilistic logic programs by searChing OVER the clause space): an algorithm for learning the structure of programs by searching the clause space and the theory space separately \cite{BelRig13-TPLP-IJ}
\item LEMUR (LEarning with a Monte carlo Upgrade of tRee search): an algorithm
for learning the structure of programs by searching the clause space using
Monte Carlo tree search.
\end{itemize}
\subsection{Input}
@ -449,7 +452,7 @@ To execute the learning algorithms, prepare four files in the same folder:
\item \texttt{<stem>.kb}: contains the example interpretations
\item \texttt{<stem>.bg}: contains the background knowledge, i.e., knowledge valid for all interpretations
\item \texttt{<stem>.l}: contains language bias information
\item \texttt{<stem>.cpl}: contains the LPAD for which you want to learn the parameters, or the initial LPAD for SLIPCASE and LEMUR. For SLIPCOVER, this file should be absent
\end{itemize}
where \texttt{<stem>} is your dataset name. Examples of these files can be found in the dataset pages.
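For instance, for a hypothetical dataset with stem \texttt{registration}, the folder would contain:
\begin{verbatim}
registration.kb    example interpretations
registration.bg    background knowledge
registration.l     language bias
registration.cpl   LPAD for parameter learning, or initial LPAD
                   for SLIPCASE and LEMUR (absent for SLIPCOVER)
\end{verbatim}
The stem \texttt{registration} is purely illustrative; any dataset name can be used.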
@ -504,7 +507,7 @@ For RIB, if there are unseen predicates, i.e., predicates that are present in th
unseen(<predicate>/<arity>).
\end{verbatim}
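For example, if the hypothetical predicate \verb|professor/1| occurred in the initial LPAD but in none of the interpretations, one would add:
\begin{verbatim}
unseen(professor/1).
\end{verbatim}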
For SLIPCASE, SLIPCOVER and LEMUR, you have to specify the language bias by means of mode declarations in the style of
\href{http://www.doc.ic.ac.uk/\string ~shm/progol.html}{Progol}.
\begin{verbatim}
modeh(<recall>,<predicate>(<arg1>,...)).
@ -558,7 +561,7 @@ modeb(*,samecourse(+course, -course)).
modeb(*,samecourse(-course, +course)).
....
\end{verbatim}
SLIPCOVER and LEMUR also require facts for the \verb|determination/2| predicate that indicate which predicates can appear in the body of clauses.
For example
\begin{verbatim}
determination(professor/1,student/1).
@ -592,17 +595,21 @@ In order to set the algorithms' parameters, you have to insert in \texttt{<stem>
The available parameters are:
\begin{itemize}
\item \verb|depth| (values: integer or \verb|inf|, default value: 3): depth of derivations if \verb|depth_bound| is set to \verb|true|
\item \verb|single_var| (values: \verb|{true,false}|, default value: \verb|false|, valid for CEM, EMBLEM, SLIPCASE, SLIPCOVER and LEMUR): if set to \verb|true|, there is a single random variable for each clause, instead of a separate random variable for each grounding of a clause
\item \verb|sample_size| (values: integer, default value: 1000): total number of examples when the models in the \verb|.kb| file contain a \verb|prob(P).| fact. In that case, one model corresponds to \verb|sample_size*P| examples
\item \verb|epsilon_em| (values: real, default value: 0.1, valid for CEM,
EMBLEM, SLIPCASE, SLIPCOVER and LEMUR): if the difference in the log likelihood in two successive EM iterations is smaller
than \verb|epsilon_em|, then EM stops
\item \verb|epsilon_em_fraction| (values: real, default value: 0.01, valid for
CEM, EMBLEM, SLIPCASE, SLIPCOVER and LEMUR): if the difference in the log likelihood in two successive EM iterations is smaller
than \verb|epsilon_em_fraction|*(-current log likelihood), then EM stops
\item \verb|iter| (values: integer, default value: 1, valid for EMBLEM,
SLIPCASE, SLIPCOVER and LEMUR): maximum number of iterations of EM parameter learning. If set to -1, no maximum number of iterations is imposed
\item \verb|iterREF| (values: integer, default value: 1, valid for SLIPCASE,
SLIPCOVER and LEMUR):
maximum number of iterations of EM parameter learning for refinements. If set to -1, no maximum number of iterations is imposed.
\item \verb|random_restarts_number| (values: integer, default value: 1, valid for CEM, EMBLEM, SLIPCASE, SLIPCOVER and LEMUR): number of random restarts of EM learning
\item \verb|random_restarts_REFnumber| (values: integer, default value: 1, valid for SLIPCASE, SLIPCOVER and LEMUR): number of random restarts of EM learning for refinements
\item \verb|setrand| (values: rand(integer,integer,integer)): seed for the random functions, see Yap manual for allowed values
\item \verb|minimal_step| (values: [0,1], default value: 0.005, valid for RIB): minimal increment of $\gamma$
\item \verb|maximal_step| (values: [0,1], default value: 0.1, valid for RIB): maximal increment of $\gamma$
@ -610,14 +617,19 @@ than \verb|epsilon_em_fraction|*(-current log likelihood), then EM stops
\item \verb|delta| (values: negative integer, default value -10, valid for RIB): value assigned to $\log 0$
\item \verb|epsilon_fraction| (values: integer, default value 100, valid for RIB): in the computation of the step, the value of $\epsilon$ of \cite{DBLP:journals/jmlr/ElidanF05} is obtained as $\log |CH,T|\times$\verb|epsilon_fraction|
\item \verb|max_rules| (values: integer, default value: 6000, valid for RIB and SLIPCASE): maximum number of ground rules. Used to set the size of arrays for storing internal statistics. Can be increased as much as memory allows.
\item \verb|logzero| (values: negative real, default value $\log(0.000001)$, valid for SLIPCASE, SLIPCOVER and LEMUR): value assigned to $\log 0$
\item \verb|examples| (values: \verb|atoms|,\verb|interpretations|, default value \verb|atoms|, valid for SLIPCASE): determines how BDDs are built: if set to \verb|interpretations|, a BDD for the conjunction of all the atoms for the target predicates in each interpretation is built.
If set to \verb|atoms|, a BDD is built for the conjunction of a group of atoms for the target predicates in each interpretation. The number of atoms in each group is determined by the parameter \verb|group|
\item \verb|group| (values: integer, default value: 1, valid for SLIPCASE): number of target atoms in the groups that are used to build BDDs
\item \verb|max_iter| (values: integer, default value: 10, valid for SLIPCASE and SLIPCOVER): number of iterations of beam search
\item \verb|max_var| (values: integer, default value: 1, valid for SLIPCASE,
SLIPCOVER and LEMUR): maximum number of distinct variables in a clause
\item \verb|verbosity| (values: integer in [1,3], default value: 1): level of verbosity of the algorithms
\item \verb|beamsize| (values: integer, default value: 20, valid for SLIPCASE and SLIPCOVER): size of the beam
\item \verb|mcts_beamsize| (values: integer, default value: 3, valid for LEMUR): size of the MCTS beam
\item \verb|mcts_visits| (values: integer, default value: +inf, valid for LEMUR): maximum number of visits (Nicola: please check)
\item \verb|megaex_bottom| (values: integer, default value: 1, valid for SLIPCOVER): number of mega-examples on which to build the bottom clauses
\item \verb|initial_clauses_per_megaex| (values: integer, default value: 1, valid for SLIPCOVER):
number of bottom clauses to build for each mega-example
@ -627,7 +639,7 @@ If set to \verb|atoms|, a BDD is built for the conjunction of a group of atoms f
maximum number of theory search iterations
\item \verb|background_clauses| (values: integer, default value: 50, valid for SLIPCOVER):
maximum numbers of background clauses
\item \verb|maxdepth_var| (values: integer, default value: 2, valid for SLIPCOVER and LEMUR): maximum depth of
variables in clauses (as defined in \cite{DBLP:journals/ai/Cohen95}).
\item \verb|score| (values: \verb|ll|, \verb|aucpr|, default value \verb|ll|, valid for SLIPCOVER): determines the score function for refinement: if set to \verb|ll|, log likelihood is used, if set to \verb|aucpr|, the area under the
Precision-Recall curve is used.
@ -673,7 +685,19 @@ and call
\begin{verbatim}
?:- sl(stem).
\end{verbatim}
To execute LEMUR, load \texttt{lemur.pl} with
\begin{verbatim}
?:- use_module(library('cplint/lemur')).
\end{verbatim}
and call
\begin{verbatim}
?:- mcts(stem,depth,c,iter,rules,covering).
\end{verbatim}
where \verb|depth| (integer) is the maximum number
of random specialization steps in the default policy, \verb|c| (real) is the value of the MCTS $C$ constant, \verb|iter| (integer) is the number of UCT rounds, \verb|rules| (integer) is
the maximum number of clauses to be
learned and \verb|covering| (Boolean) denotes whether the search is performed in
the space of clauses (true) or theories (false) (Nicola: please check).
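For example, a call for a hypothetical dataset with stem \texttt{registration}, using purely illustrative parameter values, could be:
\begin{verbatim}
?:- mcts(registration,3,0.7,100,10,true).
\end{verbatim}
Here at most 3 random specialization steps are taken in the default policy, the MCTS $C$ constant is 0.7, 100 UCT rounds are run, at most 10 clauses are learned, and the search is performed in the space of clauses.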
\subsection{Testing}
To test the theories learned, load \texttt{test.pl} with